WO2023123021A1 - Method and apparatus for acquiring feature description of molecule, and storage medium - Google Patents

Info

Publication number
WO2023123021A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
target molecule
slater
basis
obtaining
Prior art date
Application number
PCT/CN2021/142379
Other languages
French (fr)
Chinese (zh)
Inventor
曾群
付文博
袁久闯
Original Assignee
深圳晶泰科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳晶泰科技有限公司
Priority to PCT/CN2021/142379
Publication of WO2023123021A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Definitions

  • The present application relates to the field of computer-aided drug development, and in particular to a method, device and storage medium for obtaining molecular feature descriptions.
  • A molecular feature description usually refers to a measure of some property of a molecule. It includes the molecule's physical and chemical properties as well as numerical indicators derived from the molecular structure by various algorithms, such as molecular mass, number of rings, number of hydrogen bond donors and acceptors, and molecular shape descriptors.
  • Molecular feature descriptions must be designed in advance, and a reasonable model can be obtained only if the chosen description correlates with the target property. Because of their data attributes, molecular feature descriptions are usually difficult to use directly when building deep neural networks.
  • The present application provides a method, device and storage medium for obtaining molecular feature descriptions.
  • The technical solution can quickly obtain molecular feature descriptions for use in training deep neural networks.
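  • The first aspect of the present application provides a method for obtaining molecular feature descriptions, including:
  • obtaining the structural feature value of the target molecule;
  • obtaining an overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule;
  • based on the overlap matrix, solving the wave function coefficients of the target molecule as a quantum system by using a semi-empirical quantum mechanical method;
  • solving the density matrix of the target molecule as a quantum system according to the wave function coefficients;
  • regularizing the minimal Slater basis set to obtain a transformation matrix;
  • obtaining the feature description of the target molecule according to the density matrix and the transformation matrix.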
  • The second aspect of the present application provides a device for obtaining molecular feature descriptions, including:
  • a first acquisition module, used to obtain the structural feature value of the target molecule;
  • a first calculation module, used to obtain an overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule;
  • a second calculation module, used to solve, based on the overlap matrix, the wave function coefficients of the target molecule as a quantum system by using a semi-empirical quantum mechanical method;
  • a third calculation module, used to solve the density matrix of the target molecule as a quantum system according to the wave function coefficients;
  • a regularization module, used to regularize the minimal Slater basis set to obtain a transformation matrix;
  • a second acquisition module, used to obtain the feature description of the target molecule according to the density matrix and the transformation matrix.
  • The third aspect of the present application provides an electronic device, including:
  • a processor; and a memory on which executable code is stored, which, when executed by the processor, causes the processor to perform the method described above.
  • The fourth aspect of the present application provides a storage medium on which executable code is stored; when the executable code is executed by a processor of an electronic device, the processor is caused to execute the method described above.
  • After the structural feature value of the target molecule is obtained, the overlap matrix is computed from the minimal Slater basis set, and the wave function coefficients of the target molecule as a quantum system are solved from that overlap matrix with a semi-empirical quantum mechanical method. Because the calculation is based on the minimal Slater basis set and uses a semi-empirical quantum mechanical method, the process is non-self-consistent: only a single round of matrix eigenvalue solving is required. This greatly reduces the computational cost and yields the feature description of the target molecule quickly, which lowers training cost and improves training efficiency when the description is used to train deep neural networks.
  • FIG. 1 is a schematic flowchart of a method for obtaining molecular feature descriptions shown in an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a device for obtaining molecular feature descriptions shown in an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
  • Although the terms first, second, third and so on may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another.
  • For example, first information may also be called second information, and similarly, second information may also be called first information.
  • A feature defined as “first” or “second” may explicitly or implicitly include one or more of that feature.
  • “Plurality” means two or more, unless otherwise specifically defined.
  • The embodiment of the present application provides a method for obtaining molecular feature descriptions that obtains them quickly, reducing training cost and improving training efficiency when they are used to train deep neural networks.
  • FIG. 1 is a schematic flowchart of the method for obtaining molecular feature descriptions shown in the embodiment of the present application.
  • The method mainly includes steps S101 to S106, detailed as follows:
  • Step S101: Acquire the structural feature value of the target molecule.
  • The structural feature value of the target molecule may be the spatial relative position $r_{ij}$ between the electrons of the target molecule on the i-th basis function $\phi_{\mathrm{slater},i}$ and the j-th basis function $\phi_{\mathrm{slater},j}$, where $\phi_{\mathrm{slater},i}$ and $\phi_{\mathrm{slater},j}$ are basis functions of the minimal Slater basis set. Since $r_{ij}$ is determined by the structure of the molecule, it is known once the structure of the target molecule has been obtained.
  • The spatial relative position $r_{ij}$ may be that between electrons belonging to the same atom of the target molecule, or that between electrons belonging to different atoms of the target molecule.
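Since the structural feature value is fixed by the molecular geometry, it can be tabulated directly from atomic coordinates. The following is a minimal sketch, assuming that the relative position is summarized by the distance between the atomic centers carrying the two basis functions. The function names and the bond length of 1.0 are hypothetical placeholders; 100.04° is the bond angle quoted for the water example in the drawings.

```python
import math

def water_coordinates(bond_length, angle_deg):
    """Place O at the origin and the two H atoms symmetrically in the xy-plane."""
    half = math.radians(angle_deg) / 2.0
    return {
        "O": (0.0, 0.0, 0.0),
        "H1": (bond_length * math.sin(half), bond_length * math.cos(half), 0.0),
        "H2": (-bond_length * math.sin(half), bond_length * math.cos(half), 0.0),
    }

def pairwise_distances(coords):
    """Return {(a, b): distance} for every ordered pair of atomic centers."""
    return {
        (a, b): math.dist(pa, pb)
        for a, pa in coords.items()
        for b, pb in coords.items()
    }

# Hypothetical bond length (arbitrary units); the angle is taken from the water example.
coords = water_coordinates(bond_length=1.0, angle_deg=100.04)
r = pairwise_distances(coords)
print(r[("O", "H1")], r[("H1", "H2")])
```

Pairs on the same atom have zero separation, which corresponds to the case in which both basis functions belong to one atom.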
  • Step S102: Calculate an overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule.
  • In the embodiment where the structural feature value of the target molecule includes the spatial relative position $r_{ij}$ between the electrons belonging to the target atom of the target molecule on the i-th basis function $\phi_{\mathrm{slater},i}$ and the j-th basis function $\phi_{\mathrm{slater},j}$, the overlap matrix can be calculated as follows: taking $r_{ij}$ as the integration variable, integrate the product of the i-th basis function $\phi_{\mathrm{slater},i}$ and the j-th basis function $\phi_{\mathrm{slater},j}$ to obtain the element $S_{ij}$ of the overlap matrix, that is,
  • $S_{ij} = \int \phi_{\mathrm{slater},i}\,\phi_{\mathrm{slater},j}\,\mathrm{d}r_{ij}$.
  • When i and j differ, or when $\phi_{\mathrm{slater},i}$, $\phi_{\mathrm{slater},j}$ and/or $r_{ij}$ differ, different elements $S_{ij}$ are obtained, which together form the overlap matrix S.
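For the simplest case the overlap integral has a well-known closed form, which makes the construction of S easy to illustrate. The sketch below assumes two normalized 1s Slater-type orbitals with the same exponent ζ, for which the overlap at separation R is exp(-ζR)(1 + ζR + (ζR)²/3). The exponent, geometry and function names are hypothetical, and a real minimal basis would also need 2s and 2p functions.

```python
import math

def sto_1s_overlap(zeta, distance):
    """Overlap of two normalized 1s Slater orbitals with equal exponent zeta."""
    rho = zeta * distance
    return math.exp(-rho) * (1.0 + rho + rho * rho / 3.0)

def overlap_matrix(centers, zeta):
    """Build S for one 1s function per center; S is symmetric with unit diagonal."""
    n = len(centers)
    return [
        [sto_1s_overlap(zeta, math.dist(centers[i], centers[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical H2-like geometry: two centers 1.4 bohr apart, zeta = 1.0.
centers = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
S = overlap_matrix(centers, zeta=1.0)
for row in S:
    print(row)
```

The diagonal elements equal one because each normalized function overlaps completely with itself, and the off-diagonal elements decay as the centers move apart.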
  • Step S103: Based on the overlap matrix, solve the wave function coefficients of the target molecule as a quantum system by using a semi-empirical quantum mechanical method.
  • The semi-empirical quantum mechanical method may be the extended Hückel method (EHT), which is a non-self-consistent semi-empirical quantum mechanical method.
  • STO-3G Gaussian basis set
  • The semi-empirical quantum mechanical method of this application, which supplies the elements $H_{ij}$ of the one-electron Hamiltonian matrix H, is a non-self-consistent method and can therefore greatly reduce the computational cost.
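To make the non-self-consistent character concrete, the sketch below builds a two-orbital extended Hückel Hamiltonian and solves the generalized eigenproblem H c = E S c once. It assumes the commonly used Wolfsberg-Helmholz approximation H_ij = K S_ij (H_ii + H_jj)/2 with K = 1.75; neither the approximation, the constant nor the on-site energy of -13.6 eV is stated in the text, and all function names are hypothetical.

```python
import math

K = 1.75  # Wolfsberg-Helmholz constant (assumed; not given in the source text)

def eht_hamiltonian(S, onsite):
    """H_ii = onsite[i]; H_ij = K * S_ij * (H_ii + H_jj) / 2 for i != j."""
    n = len(S)
    return [
        [onsite[i] if i == j else 0.5 * K * S[i][j] * (onsite[i] + onsite[j])
         for j in range(n)]
        for i in range(n)
    ]

def solve_2x2(H, S):
    """Solve H c = E S c for real symmetric 2x2 H and S; returns [(E, c)] sorted by E.

    Assumes the two levels are not fully degenerate.
    """
    a = S[0][0] * S[1][1] - S[0][1] ** 2
    b = -(H[0][0] * S[1][1] + H[1][1] * S[0][0] - 2.0 * H[0][1] * S[0][1])
    c = H[0][0] * H[1][1] - H[0][1] ** 2
    disc = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    levels = []
    for E in sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a))):
        v = [-(H[0][1] - E * S[0][1]), H[0][0] - E * S[0][0]]
        if abs(v[0]) + abs(v[1]) < 1e-12:  # first row vanished, use the second row
            v = [H[1][1] - E * S[1][1], -(H[0][1] - E * S[0][1])]
        norm = math.sqrt(sum(v[i] * S[i][j] * v[j] for i in range(2) for j in range(2)))
        levels.append((E, [x / norm for x in v]))  # normalized so that c^T S c = 1
    return levels

S = [[1.0, 0.5], [0.5, 1.0]]          # illustrative overlap matrix
H = eht_hamiltonian(S, [-13.6, -13.6])  # illustrative on-site energies in eV
levels = solve_2x2(H, S)
for energy, coeffs in levels:
    print(energy, coeffs)
```

Because H depends only on S and fixed parameters, no iteration over the density is needed: one eigenvalue solve gives the orbital energies and the coefficient vectors that form the columns of C.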
  • Step S104: Solve the density matrix of the target molecule as a quantum system according to the wave function coefficients.
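The device description of this step refers to the conjugate transpose of C and an orbital occupation matrix. A form consistent with that wording is D = C Θ C†, with Θ diagonal; this formula is an assumption here, since the original equation is not reproduced in the text. A minimal sketch with hypothetical names and illustrative numbers:

```python
import math

def density_matrix(C, occupations):
    """D[m][n] = sum_k C[m][k] * occ[k] * conj(C[n][k]); columns of C are orbitals."""
    n_basis = len(C)
    return [
        [sum(C[m][k] * occ * C[n][k].conjugate() for k, occ in enumerate(occupations))
         for n in range(n_basis)]
        for m in range(n_basis)
    ]

# Illustrative two-orbital system: bonding and antibonding combinations for
# an overlap of 0.5 between the two basis functions.
c = 1.0 / math.sqrt(3.0)
C = [[c, 1.0],
     [c, -1.0]]
occupations = [2.0, 0.0]  # two electrons in the lower orbital (assumed)
D = density_matrix(C, occupations)

# In a non-orthogonal basis the electron count is trace(D S).
S = [[1.0, 0.5], [0.5, 1.0]]
electrons = sum(D[m][n] * S[n][m] for m in range(2) for n in range(2))
print(D, electrons)
```

The trace check is a quick consistency test: with normalized orbitals it should return the number of electrons placed in the occupation list.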
  • Step S105: Regularize the minimal Slater basis set to obtain a transformation matrix.
  • The transformation matrix can transform the density matrix D of the target molecule as a quantum system into another matrix form.
  • Other regularization methods, such as the Schmidt orthogonalization method, can also be used to regularize the minimal Slater basis set.
  • The transformation matrix T obtained by regularizing the minimal Slater basis set with the occupancy-weighted symmetric orthogonalization method allows the description of the target molecule's structure to satisfy rotation invariance.
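As an illustration of symmetric orthogonalization, the sketch below uses the plain (unweighted) Löwdin form T = S^(-1/2) for a 2x2 overlap matrix. This is a simplification and not the occupancy-weighted variant named in the text, which additionally scales the basis functions by occupancy weights. The closed-form 2x2 matrix square root and the function names are choices made for this example only.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def inv_sqrt_2x2(S):
    """Return S^(-1/2) for a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, d) = S
    s = math.sqrt(a * d - b * b)        # square root of det(S)
    t = math.sqrt(a + d + 2.0 * s)
    # sqrt(S) = (S + s*I) / t for a 2x2 positive-definite matrix
    r = [[(a + s) / t, b / t], [b / t, (d + s) / t]]
    rdet = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return [[r[1][1] / rdet, -r[0][1] / rdet],
            [-r[1][0] / rdet, r[0][0] / rdet]]

S = [[1.0, 0.5], [0.5, 1.0]]  # illustrative overlap matrix
T = inv_sqrt_2x2(S)

# The transformed basis is orthonormal when T^T S T is the identity matrix.
check = matmul(matmul(T, S), T)  # T is symmetric, so T^T = T
print(check)
```

Unlike Schmidt orthogonalization, the symmetric form treats all basis functions on an equal footing, so the result does not depend on the order in which they are listed.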
  • Step S106: Obtain the feature description of the target molecule according to the density matrix and the transformation matrix.
  • The feature description of the target molecule can be obtained through steps S1061 to S1063, as described below:
  • Step S1061: Using the transformation matrix T, transform the density matrix D into the transformed density matrix $D^{T}$ according to a formula that involves $(T^{-1})^{\dagger}$, the conjugate transpose of the inverse of the transformation matrix T.
  • Step S1062: Calculate the charge $q_A$ of the atom numbered A in the target molecule from the elements of the transformed density matrix $D^{T}$ and the effective nuclear charge $Z_A$ of the atom numbered A, and take it as the first feature description of the target molecule.
  • Step S1063: Calculate the bond order $BO_{AB}$ between the atom numbered A and the atom numbered B in the target molecule, and take it as the second feature description of the target molecule.
  • Both the charge $q_A$ and the bond order $BO_{AB}$ have clear physical meanings. They can therefore provide purposeful and meaningful edge descriptions for graph convolutional neural networks based on molecular or crystal structures, and they can also reduce the parameter space and the training-set size of deep neural networks for molecules or crystals, which helps improve training efficiency or reduce training cost.
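Steps S1061 to S1063 can be sketched as follows. The formulas themselves are not reproduced in the text, so this example assumes common forms that match the surrounding wording: a transformed density $T^{-1} D (T^{-1})^{\dagger}$, a charge equal to the effective nuclear charge minus the diagonal population on the atom, and a Wiberg-type bond order equal to the sum of squared off-diagonal elements between two atoms. All names and numbers are hypothetical, and real-valued matrices are assumed.

```python
import math

def matmul(A, B):
    """Plain nested-list matrix product."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def transpose(A):
    return [list(col) for col in zip(*A)]

def transform_density(D, T_inv):
    """Assumed form: D_T = T^-1 D (T^-1)^dagger (real case, so dagger = transpose)."""
    return matmul(matmul(T_inv, D), transpose(T_inv))

def atomic_charges(D_T, atom_of, Z):
    """Assumed form: q_A = Z_A - sum of D_T[mu][mu] over basis functions mu on A."""
    population = {atom: 0.0 for atom in Z}
    for mu, atom in enumerate(atom_of):
        population[atom] += D_T[mu][mu]
    return {atom: Z[atom] - population[atom] for atom in Z}

def bond_order(D_T, atom_of, A, B):
    """Assumed Wiberg-type form: sum of D_T[mu][nu]**2 for mu on A and nu on B."""
    return sum(
        D_T[m][n] ** 2
        for m, am in enumerate(atom_of) if am == A
        for n, an in enumerate(atom_of) if an == B
    )

# Illustrative two-center, two-electron system with overlap 0.5.
D = [[2.0 / 3.0, 2.0 / 3.0], [2.0 / 3.0, 2.0 / 3.0]]
p, m = math.sqrt(1.5), math.sqrt(0.5)   # square roots of the eigenvalues of S
T_inv = [[(p + m) / 2.0, (p - m) / 2.0],
         [(p - m) / 2.0, (p + m) / 2.0]]  # S^(1/2), the inverse of T = S^(-1/2)
atom_of = ["A", "B"]                       # which atom each basis function sits on
Z = {"A": 1.0, "B": 1.0}                   # effective nuclear charges (assumed)

D_T = transform_density(D, T_inv)
q = atomic_charges(D_T, atom_of, Z)
bo = bond_order(D_T, atom_of, "A", "B")
print(q, bo)
```

For this symmetric toy case the charges vanish and the bond order equals one, as expected for a two-electron bond between identical centers. In a graph neural network the charges would naturally serve as per-atom values and the bond orders as per-pair values.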
  • the wave function coefficient C of the water molecule shown in Figure 2 as a quantum system is:
  • the density matrix D of the water molecule shown in Figure 2 as a quantum system is:
  • the transformation matrix T is:
  • the transformed density matrix D T is:
  • the wave function coefficient C of the water molecule shown in Figure 3 as a quantum system is:
  • the density matrix D when the water molecule shown in Figure 3 is used as a quantum system is:
  • the transformation matrix T is:
  • the transformed density matrix D T is:
  • the present application also provides a device for obtaining molecular feature descriptions, electronic equipment, and corresponding embodiments.
  • FIG. 4 is a schematic structural diagram of a device for obtaining molecular feature descriptions shown in an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
  • The device illustrated in FIG. 4 may include a first acquisition module 401, a first calculation module 402, a second calculation module 403, a third calculation module 404, a regularization module 405 and a second acquisition module 406, as follows:
  • the first acquisition module 401 is used to obtain the structural feature value of the target molecule;
  • the first calculation module 402 is used to obtain an overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule;
  • the second calculation module 403 is used to solve, based on the overlap matrix, the wave function coefficients of the target molecule as a quantum system by using a semi-empirical quantum mechanical method;
  • the third calculation module 404 is used to solve the density matrix of the target molecule as a quantum system according to the wave function coefficients;
  • the regularization module 405 is used to regularize the minimal Slater basis set to obtain a transformation matrix;
  • the second acquisition module 406 is used to obtain the feature description of the target molecule according to the density matrix and the transformation matrix.
  • In the device illustrated in FIG. 4, after the structural feature value of the target molecule is obtained, the overlap matrix is computed from the minimal Slater basis set, and the wave function coefficients of the target molecule as a quantum system are solved from that overlap matrix with a semi-empirical quantum mechanical method. Because the calculation is based on the minimal Slater basis set and uses a semi-empirical quantum mechanical method, the process is non-self-consistent: only a single round of matrix eigenvalue solving is required. This greatly reduces the computational cost and yields the feature description of the target molecule quickly, which lowers training cost and improves training efficiency when the description is used to train deep neural networks.
  • In the above example, the structural feature value of the target molecule includes the spatial relative position $r_{ij}$ between the electrons belonging to the target atom of the target molecule on the i-th basis function $\phi_{\mathrm{slater},i}$ and the j-th basis function $\phi_{\mathrm{slater},j}$, where $\phi_{\mathrm{slater},i}$ and $\phi_{\mathrm{slater},j}$ are basis functions of the minimal Slater basis set.
  • The first calculation module 402 illustrated in FIG. 4 may include an integration unit that takes the spatial relative position $r_{ij}$ as the integration variable and integrates the product of the i-th basis function $\phi_{\mathrm{slater},i}$ and the j-th basis function $\phi_{\mathrm{slater},j}$ to obtain the element $S_{ij}$ of the overlap matrix.
  • the second calculation module 403 illustrated in FIG. 4 may include a matrix element calculation unit and a wave function coefficient calculation unit, wherein:
  • The third calculation module 404 illustrated in FIG. 4 may include a conjugate transpose calculation unit and a density matrix calculation unit, wherein:
  • the conjugate transpose calculation unit is used to perform the conjugate transpose operation on the wave function coefficients C to obtain $C^{\dagger}$;
  • the density matrix calculation unit is used to solve the density matrix D of the target molecule as a quantum system from C, $C^{\dagger}$ and the orbital occupation matrix.
  • The regularization module 405 shown in FIG. 4 may include a symmetric orthogonalization unit, configured to regularize the minimal Slater basis set using the occupancy-weighted symmetric orthogonalization method to obtain a transformation matrix T.
  • The second acquisition module 406 illustrated in FIG. 4 may include a matrix transformation unit, a first feature description calculation unit, and a second feature description calculation unit, wherein:
  • the matrix transformation unit is used to transform the density matrix D into the transformed density matrix $D^{T}$ using the transformation matrix T, according to a formula that involves $(T^{-1})^{\dagger}$, the conjugate transpose of the inverse of the transformation matrix T;
  • the first feature description calculation unit is used to calculate the charge $q_A$ of the atom numbered A in the target molecule from the elements of the transformed density matrix $D^{T}$ and the effective nuclear charge $Z_A$ of the atom numbered A, as the first feature description of the target molecule;
  • the second feature description calculation unit is used to calculate the bond order $BO_{AB}$ between the atom numbered A and the atom numbered B in the target molecule, as the second feature description of the target molecule.
  • The electronic device 500 includes a memory 510 and a processor 520.
  • The processor 520 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • The memory 510 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM can store static data or instructions required by the processor 520 or other modules of the computer.
  • The persistent storage device may be a readable and writable, non-volatile storage device that does not lose stored instructions and data even when the computer is powered off.
  • The persistent storage device may be a mass storage device (such as a magnetic disk, an optical disk, or flash memory).
  • Alternatively, the persistent storage device may be a removable storage device (such as a floppy disk or an optical drive).
  • The system memory may be a readable and writable storage device, or a volatile readable and writable storage device such as dynamic random access memory.
  • System memory can store some or all of the instructions and data that the processor needs at runtime.
  • The memory 510 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic disks and/or optical disks may also be used.
  • The memory 510 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, a super density disc, a flash memory card (such as an SD card, mini SD card, or Micro-SD card), or a magnetic floppy disk.
  • Computer-readable storage media do not contain carrier waves and transient electronic signals transmitted by wireless or wire.
  • Executable code is stored in the memory 510; when the executable code is executed by the processor 520, the processor 520 is caused to execute part or all of the method described above.
  • The method according to the present application can also be implemented as a computer program or computer program product that includes computer program code instructions for executing some or all of the steps of the above method.
  • The present application may also be implemented as a storage medium (a non-transitory machine-readable storage medium, a computer-readable storage medium, or a machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored. When the executable code is executed by the processor of an electronic device (or a server, etc.), the processor is caused to perform some or all of the steps of the above method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Medical Informatics (AREA)
  • Organic Low-Molecular-Weight Compounds And Preparation Thereof (AREA)

Abstract

A method and an apparatus for acquiring a feature description of a molecule, and a storage medium. The method comprises: acquiring a structural feature value of a target molecule; obtaining an overlap matrix on the basis of a minimal Slater basis set and the structural feature value of the target molecule; on the basis of the overlap matrix, using a semi-empirical quantum mechanical method to solve for a wave function coefficient in the case that the target molecule is used as a quantum system; according to the wave function coefficient, solving for a density matrix in the case that the target molecule is used as the quantum system; regularizing the minimal Slater basis set to obtain a transform matrix; and acquiring a feature description of the target molecule according to the density matrix and the transform matrix. The invention enables quick acquisition of a feature description of a molecule for use in training a deep neural network.

Description

Method, device and storage medium for obtaining molecular feature descriptions
Technical field
The present application relates to the field of computer-aided drug development, and in particular to a method, device and storage medium for obtaining molecular feature descriptions.
Background
A molecular feature description usually refers to a measure of some property of a molecule. It includes the molecule's physical and chemical properties as well as numerical indicators derived from the molecular structure by various algorithms, such as molecular mass, number of rings, number of hydrogen bond donors and acceptors, and molecular shape descriptors. Molecular feature descriptions must be designed in advance, and a reasonable model can be obtained only if the chosen description correlates with the target property. Because of their data attributes, molecular feature descriptions are usually difficult to use directly when building deep neural networks. Some molecular feature descriptions with clear physical meaning already exist, such as the Coulomb matrix (CM), bag of bonds (BoB), smooth overlap of atomic positions (SOAP), atom-centered symmetry functions (ACSF) and deformed radial functions. However, they may involve empirical hyperparameters, and the machine learning models built on them still require a large parameter space and a correspondingly large amount of training data, so they are not well suited to deep neural networks.
Summary of the invention
To solve or partially solve the problems in the related art, the present application provides a method, device and storage medium for obtaining molecular feature descriptions. The technical solution can quickly obtain molecular feature descriptions for use in training deep neural networks.
The first aspect of the present application provides a method for obtaining molecular feature descriptions, including:
obtaining the structural feature value of the target molecule;
obtaining an overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule;
based on the overlap matrix, solving the wave function coefficients of the target molecule as a quantum system by using a semi-empirical quantum mechanical method;
solving the density matrix of the target molecule as a quantum system according to the wave function coefficients;
regularizing the minimal Slater basis set to obtain a transformation matrix;
obtaining the feature description of the target molecule according to the density matrix and the transformation matrix.
The second aspect of the present application provides a device for obtaining molecular feature descriptions, including:
a first acquisition module, used to obtain the structural feature value of the target molecule;
a first calculation module, used to obtain an overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule;
a second calculation module, used to solve, based on the overlap matrix, the wave function coefficients of the target molecule as a quantum system by using a semi-empirical quantum mechanical method;
a third calculation module, used to solve the density matrix of the target molecule as a quantum system according to the wave function coefficients;
a regularization module, used to regularize the minimal Slater basis set to obtain a transformation matrix;
a second acquisition module, used to obtain the feature description of the target molecule according to the density matrix and the transformation matrix.
The third aspect of the present application provides an electronic device, including:
a processor; and
a memory on which executable code is stored, which, when executed by the processor, causes the processor to perform the method described above.
The fourth aspect of the present application provides a storage medium on which executable code is stored; when the executable code is executed by a processor of an electronic device, the processor is caused to execute the method described above.
It can be seen from the technical solution provided by the present application that, after the structural feature value of the target molecule is obtained, the overlap matrix is computed from the minimal Slater basis set, and the wave function coefficients of the target molecule as a quantum system are solved from that overlap matrix with a semi-empirical quantum mechanical method. Because the calculation is based on the minimal Slater basis set and uses a semi-empirical quantum mechanical method, the process is non-self-consistent: only a single round of matrix eigenvalue solving is required. This greatly reduces the computational cost and yields the feature description of the target molecule quickly, which lowers training cost and improves training efficiency when the description is used to train deep neural networks.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Brief Description of the Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following more detailed description of exemplary embodiments of the present application with reference to the accompanying drawings, in which the same reference numerals generally denote the same parts.
FIG. 1 is a schematic flowchart of a method for obtaining a feature description of a molecule according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram, according to an embodiment of the present application, of a water molecule with bond length
Figure PCTCN2021142379-appb-000001
and bond angle R_HOH = 100.04°;
FIG. 3 is a schematic structural diagram, according to an embodiment of the present application, of a water molecule with bond length
Figure PCTCN2021142379-appb-000002
and bond angle R_HOH = 103.49°;
FIG. 4 is a schematic structural diagram of an apparatus for obtaining a feature description of a molecule according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in more detail below with reference to the accompanying drawings. Although the drawings show embodiments of the present application, it should be understood that the present application may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present application is thorough and complete and fully conveys its scope to those skilled in the art.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in the present application and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first", "second", "third" and so on may be used in the present application to describe various information, such information should not be limited by these terms. The terms are used only to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality of" means two or more, unless otherwise explicitly and specifically defined.
In computer-aided drug design, a molecular feature description usually refers to a measure of some property of a molecule. It includes the physical and chemical properties of the molecule as well as numerical indicators derived from the molecular structure by various algorithms, such as molecular mass, number of rings, number of hydrogen bond donors and acceptors, and descriptions of molecular shape. A molecular feature description has to be designed in advance, and a reasonable model can be obtained only if the chosen description correlates with the target property. In terms of data attributes, molecular feature descriptions are usually difficult to use directly in building deep neural networks. In the related art, some molecular feature descriptions with a clear physical meaning already exist, such as the Coulomb matrix (CM), bag of bonds (BoB), smooth overlap of atomic positions (SOAP), atom-centered symmetry functions (ACSF) and modified radial functions. However, these descriptions may contain empirical hyperparameters, and the machine learning models built on them still require a large parameter space and a correspondingly large amount of training data, so they are not well suited to deep neural networks.
In view of the above problems, the embodiments of the present application provide a method for obtaining a feature description of a molecule that obtains the description quickly and, when the description is used to train a deep neural network, lowers training cost and improves training efficiency.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of the method for obtaining a feature description of a molecule according to an embodiment of the present application. The method mainly includes steps S101 to S106, described in detail as follows:
Step S101: Obtain the structural feature value of the target molecule.
In the embodiments of the present application, the structural feature value of the target molecule may be the relative spatial position r_ij between electrons of the target molecule located on the i-th basis function χ_slater,i and the j-th basis function χ_slater,j, where χ_slater,i and χ_slater,j are basis functions of the minimal Slater basis set. Because r_ij is a parameter determined by the molecular structure, it is known once the structure of the target molecule is known. Note that r_ij may be the relative spatial position between electrons belonging to the same atom of the target molecule, or between electrons belonging to different atoms of the target molecule.
Step S102: Compute the overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule.
This step corresponds to the embodiment in which the structural feature value of the target molecule includes the relative spatial position r_ij between electrons of target atoms of the target molecule located on the i-th basis function χ_slater,i and the j-th basis function χ_slater,j. In one embodiment of the present application, the overlap matrix may be computed as follows: with r_ij as the integration variable, the product of the i-th basis function χ_slater,i and the j-th basis function χ_slater,j is integrated to give the element S_ij of the overlap matrix, that is, S_ij = ∫ χ_slater,i χ_slater,j dr_ij. Different elements S_ij are obtained as i and j vary, or as χ_slater,i, χ_slater,j and/or r_ij vary, which yields the overlap matrix S.
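As an illustration of step S102, the following minimal sketch builds an overlap matrix for the simplest possible case: one normalized 1s Slater function per center, all with the same exponent. For that case the overlap integral has a well-known closed form, so no numerical integration is needed. The function names, the exponent value and the geometry are assumptions made for the example; they are not taken from the present application, which integrates over r_ij for the full minimal Slater basis set.

```python
import math


def overlap_1s_1s(zeta: float, distance: float) -> float:
    """Overlap of two normalized 1s Slater orbitals sharing exponent zeta.

    Closed form exp(-p) * (1 + p + p**2 / 3) with p = zeta * distance,
    all quantities in atomic units (bohr).
    """
    p = zeta * distance
    return math.exp(-p) * (1.0 + p + p * p / 3.0)


def overlap_matrix(centers, zeta):
    """Build S for one 1s Slater function per center (hypothetical helper)."""
    n = len(centers)
    return [
        [overlap_1s_1s(zeta, math.dist(centers[i], centers[j])) for j in range(n)]
        for i in range(n)
    ]


# Illustrative input (assumed): two hydrogen-like centers 1.4 bohr apart, zeta = 1.0.
centers = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]
S = overlap_matrix(centers, zeta=1.0)
print(S)
```

The diagonal elements equal 1 because each basis function is normalized, and the off-diagonal elements decay with the distance between centers, which is how the molecular geometry enters the overlap matrix.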
Step S103: Based on the overlap matrix, use a semi-empirical quantum mechanical method to solve for the wave function coefficients of the target molecule treated as a quantum system.
In the embodiments of the present application, the semi-empirical quantum mechanical method may be the extended Hückel method (EHT), which is a non-self-consistent semi-empirical quantum mechanical method. It should be noted that a self-consistent Hückel method (with a minimal or a conventional basis set), or another semi-empirical quantum mechanical method such as DFTB, AM1 or PM7 combined with a Gaussian basis set (STO-3G), may also be used to solve for the wave function coefficients of the target molecule treated as a quantum system. Only EHT is used as the example below.
Specifically, in one embodiment of the present application, solving for the wave function coefficients based on the overlap matrix with a semi-empirical quantum mechanical method may be as follows. The elements H_ij of the one-electron Hamiltonian matrix H of the semi-empirical quantum mechanical method are computed according to the formula
Figure PCTCN2021142379-appb-000003
and the wave function coefficients C of the target molecule treated as a quantum system are then solved from HC = SCe. Here K is an empirical parameter, A is the atom index, S_ij is an element of the overlap matrix S, and e is the energy matrix corresponding to the eigenvectors obtained by diagonalizing the one-electron Hamiltonian matrix H. Unlike self-consistent approaches, in which the one-electron Hamiltonian matrix H has to be solved over multiple rounds of iteration, the computation of the elements H_ij in the present application belongs to a non-self-consistent semi-empirical quantum mechanical method and therefore greatly reduces the computational cost.
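The single-pass character of step S103 can be sketched for a two-function basis. The formula of the present application is available only as an image, so the sketch assumes the Wolfsberg-Helmholz form commonly used in extended Hückel theory, H_ij = K · S_ij · (H_ii + H_jj) / 2 with K = 1.75. The on-site energies, the overlap value and the function names are illustrative assumptions.

```python
import math


def eht_hamiltonian(S, onsite, K=1.75):
    """Assumed Wolfsberg-Helmholz form: H_ii = onsite[i],
    H_ij = K * S_ij * (H_ii + H_jj) / 2 for i != j."""
    n = len(S)
    return [
        [onsite[i] if i == j else 0.5 * K * S[i][j] * (onsite[i] + onsite[j])
         for j in range(n)]
        for i in range(n)
    ]


def solve_two_orbital(H, S):
    """Solve H C = S C e for two real basis functions in one non-iterative pass.

    Returns (energies, C); column k of C holds the coefficients of orbital k,
    normalized so that C^T S C = I. Assumes the two energies are distinct.
    """
    # det(H - e S) = a e^2 + b e + c = 0
    a = S[0][0] * S[1][1] - S[0][1] ** 2
    b = -(H[0][0] * S[1][1] + H[1][1] * S[0][0] - 2.0 * H[0][1] * S[0][1])
    c = H[0][0] * H[1][1] - H[0][1] ** 2
    root = math.sqrt(max(0.0, b * b - 4.0 * a * c))
    energies = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    columns = []
    for e in energies:
        m00 = H[0][0] - e * S[0][0]
        m01 = H[0][1] - e * S[0][1]
        m11 = H[1][1] - e * S[1][1]
        v = (-m01, m00)                      # null vector of the first row
        if abs(v[0]) + abs(v[1]) < 1e-12:    # first row vanished; use the second
            v = (-m11, m01)
        norm = math.sqrt(S[0][0] * v[0] ** 2 + 2.0 * S[0][1] * v[0] * v[1]
                         + S[1][1] * v[1] ** 2)
        columns.append((v[0] / norm, v[1] / norm))
    C = [[columns[0][0], columns[1][0]], [columns[0][1], columns[1][1]]]
    return energies, C


# Illustrative input (assumed): two equivalent 1s functions, energies in eV.
S = [[1.0, 0.7529], [0.7529, 1.0]]
H = eht_hamiltonian(S, onsite=[-13.6, -13.6])
energies, C = solve_two_orbital(H, S)
print(energies)
print(C)
```

Because H depends only on S and fixed parameters, nothing in H changes after the eigenvalue problem is solved, so no second pass is needed. For larger basis sets the same generalized eigenvalue problem would normally be handed to a linear algebra library.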
Step S104: Solve for the density matrix of the target molecule treated as a quantum system according to the wave function coefficients.
Specifically, the conjugate transpose of the wave function coefficients C of the target molecule treated as a quantum system may first be taken to obtain
Figure PCTCN2021142379-appb-000004
Then the density matrix D of the target molecule treated as a quantum system is solved according to the formula
Figure PCTCN2021142379-appb-000005
where λ is the orbital occupation matrix.
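Step S104 can be illustrated with a short sketch. The formula itself is available only as an image; the sketch assumes the standard form D = C λ C†, which uses exactly the three quantities named above (the coefficients C, their conjugate transpose, and a diagonal occupation matrix λ). The coefficient values correspond to two equivalent basis functions with overlap s and are illustrative assumptions.

```python
import math


def density_matrix(C, occupations):
    """Assumed form D = C * diag(occupations) * C^dagger.

    Column k of C holds the coefficients of orbital k; occupations[k] is the
    k-th diagonal entry of the occupation matrix.
    """
    n = len(C)
    return [
        [sum(C[i][k] * occupations[k] * C[j][k].conjugate()
             for k in range(len(occupations)))
         for j in range(n)]
        for i in range(n)
    ]


# Illustrative input (assumed): bonding and antibonding combinations of two
# equivalent functions with overlap s, the bonding orbital doubly occupied.
s = 0.7529
bonding = 1.0 / math.sqrt(2.0 * (1.0 + s))
antibonding = 1.0 / math.sqrt(2.0 * (1.0 - s))
C = [[bonding, antibonding], [bonding, -antibonding]]
occupations = [2.0, 0.0]

D = density_matrix(C, occupations)
S = [[1.0, s], [s, 1.0]]
# In a non-orthogonal basis the electron count is trace(D S), not trace(D).
electrons = sum(D[i][j] * S[j][i] for i in range(2) for j in range(2))
print(D)
print(electrons)
```

The trace of D S recovering the number of electrons is a convenient consistency check on both C and the occupations.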
Step S105: Regularize the minimal Slater basis set to obtain the transformation matrix.
The transformation matrix transforms the density matrix D of the target molecule treated as a quantum system into another matrix form. One approach is to regularize (orthonormalize) the minimal Slater basis set with the occupancy-weighted symmetric orthogonalization method to obtain the transformation matrix T. That is, denoting the minimal Slater basis set by {χ_slater}, the transformation matrix T is obtained from {χ_slater}^T = T{χ_slater}. It should be noted that, besides occupancy-weighted symmetric orthogonalization, other regularization methods such as Schmidt orthogonalization may also be used to regularize the minimal Slater basis set. In addition, obtaining the transformation matrix T by occupancy-weighted symmetric orthogonalization of the minimal Slater basis set allows the representation of the target molecule's structure to satisfy rotational invariance.
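A minimal sketch of an occupancy-weighted symmetric orthogonalization follows. It assumes the form T = W (W S W)^(-1/2), where W is a diagonal matrix of per-function weights; the present application does not state its exact expression or how the weights are chosen, so the formula, the weights and the overlap values are assumptions. The inverse square root is taken through a small Jacobi eigen-solver so that the example needs only the standard library.

```python
import math


def jacobi_eigh(A, tol=1e-30, max_sweeps=50):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) with the eigenvectors stored as the columns of V.
    """
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-300:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for M in (A, V):        # right-multiply A and V by the rotation
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):      # left-multiply A by the transposed rotation
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V


def weighted_symmetric_orthogonalization(S, weights):
    """Assumed form T = W (W S W)^(-1/2) with W = diag(weights)."""
    n = len(S)
    WSW = [[weights[i] * S[i][j] * weights[j] for j in range(n)] for i in range(n)]
    evals, V = jacobi_eigh(WSW)
    inv_sqrt = [
        [sum(V[i][k] * V[j][k] / math.sqrt(evals[k]) for k in range(n))
         for j in range(n)]
        for i in range(n)
    ]
    return [[weights[i] * inv_sqrt[i][j] for j in range(n)] for i in range(n)]


# Illustrative input (assumed): a 3x3 overlap matrix and occupancy-like weights.
S = [[1.0, 0.4, 0.2], [0.4, 1.0, 0.3], [0.2, 0.3, 1.0]]
weights = [2.0, 1.0, 1.0]
T = weighted_symmetric_orthogonalization(S, weights)
print(T)
```

Any T built this way satisfies T^T S T = I, so the transformed functions are orthonormal. With all weights equal the construction reduces to ordinary symmetric (Löwdin) orthogonalization; unequal weights let strongly occupied functions change less than weakly occupied ones.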
Step S106: Obtain the feature description of the target molecule according to the density matrix and the transformation matrix.
Specifically, the feature description of the target molecule may be obtained from the density matrix and the transformation matrix through steps S1061 to S1063, described as follows:
Step S1061: Using the transformation matrix T, transform the density matrix D according to the formula
Figure PCTCN2021142379-appb-000006
to obtain the transformed density matrix D^T, where
Figure PCTCN2021142379-appb-000007
denotes the conjugate transpose of the inverse of the transformation matrix T.
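The transformation in step S1061 can be sketched as follows. The formula is available only as an image; the sketch assumes D^T = T^(-1) D (T^(-1))†, which is consistent with the statement that the formula contains the conjugate transpose of the inverse of T. For a real T that orthonormalizes the basis (T^T S T = I), the inverse can be written as T^(-1) = T^T S, so no explicit matrix inversion is needed. All numerical values and helper names are illustrative assumptions.

```python
import math


def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(row) for row in zip(*A)]


def transform_density(D, S, T):
    """Assumed form T^{-1} D (T^{-1})^T for real matrices.

    Uses T^{-1} = T^T S, which holds only when T^T S T = I.
    """
    T_inv = matmul(transpose(T), S)
    return matmul(matmul(T_inv, D), transpose(T_inv))


# Illustrative input (assumed): two equivalent functions with overlap s,
# T = S^(-1/2) in closed form, and a doubly occupied bonding orbital.
s = 0.7529
S = [[1.0, s], [s, 1.0]]
p = 1.0 / math.sqrt(1.0 + s)
m = 1.0 / math.sqrt(1.0 - s)
T = [[0.5 * (p + m), 0.5 * (p - m)], [0.5 * (p - m), 0.5 * (p + m)]]
D = [[1.0 / (1.0 + s), 1.0 / (1.0 + s)], [1.0 / (1.0 + s), 1.0 / (1.0 + s)]]

D_T = transform_density(D, S, T)
print(D_T)
```

After the transformation the basis is orthonormal, so the plain trace of D_T equals the number of electrons and its diagonal elements can be read directly as orbital populations.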
Step S1062: Compute, according to the formula
Figure PCTCN2021142379-appb-000008
the charge q_A of the atom with index A in the target molecule as the first feature description of the target molecule, where
Figure PCTCN2021142379-appb-000009
is an element of the transformed density matrix D^T, and Z_A is the effective nuclear charge of the atom with index A.
Step S1063: Compute, according to the formula
Figure PCTCN2021142379-appb-000010
the bond order BO_AB between the atom with index A and the atom with index B in the target molecule as the second feature description of the target molecule.
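Steps S1062 and S1063 can be sketched under two assumptions, since both formulas are available only as images. The charge is assumed to be the effective nuclear charge minus the sum of the diagonal elements of D^T over the basis functions on atom A, and the bond order is assumed to be a Wiberg-type sum of squared off-diagonal elements of D^T between the functions on atoms A and B. The names `basis_atom`, `atomic_charges` and `bond_orders` are hypothetical.

```python
def atomic_charges(D_T, basis_atom, Z):
    """Assumed form q_A = Z_A - sum of D_T[i][i] over basis functions i on atom A.

    basis_atom[i] is the index of the atom that carries basis function i.
    """
    populations = [0.0] * len(Z)
    for i, atom in enumerate(basis_atom):
        populations[atom] += D_T[i][i]
    return [z - pop for z, pop in zip(Z, populations)]


def bond_orders(D_T, basis_atom, n_atoms):
    """Assumed Wiberg-type form: BO_AB = sum of D_T[i][j]**2 for i on A, j on B."""
    BO = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, a in enumerate(basis_atom):
        for j, b in enumerate(basis_atom):
            if a != b:
                BO[a][b] += D_T[i][j] ** 2
    return BO


# Illustrative input (assumed): two atoms with one basis function each and
# a doubly occupied bonding orbital in an orthonormal basis.
D_T = [[1.0, 1.0], [1.0, 1.0]]
basis_atom = [0, 1]
Z = [1.0, 1.0]

q = atomic_charges(D_T, basis_atom, Z)
BO = bond_orders(D_T, basis_atom, len(Z))
print(q)
print(BO)
```

In this symmetric two-atom case each atom keeps one electron, so both charges vanish and the bond order between the two atoms is one. The per-atom charges map naturally onto node features and the per-pair bond orders onto edge features of a molecular graph.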
Clearly, both the charge q_A and the bond order BO_AB have a clear physical meaning. They can therefore not only provide edge descriptions with a clear meaning for graph convolutional neural networks based on molecular or crystal structures, but also reduce the parameter space and the amount of training data required by deep neural networks for molecules or crystals, which is very beneficial for improving the training efficiency of the neural network or reducing its training cost.
Taking as the target molecule the water molecule structure shown in FIG. 2, with bond length
Figure PCTCN2021142379-appb-000011
and bond angle R_HOH = 100.04°, the following gives the wave function coefficients, density matrix and other intermediate results produced while obtaining its feature description with the above technical solution, together with the resulting atomic charges and bond orders.
The wave function coefficients C of the water molecule shown in FIG. 2, treated as a quantum system, are:
Figure PCTCN2021142379-appb-000012
Figure PCTCN2021142379-appb-000013
The density matrix D of the water molecule shown in FIG. 2, treated as a quantum system, is:
Figure PCTCN2021142379-appb-000014
The transformation matrix T is:
Figure PCTCN2021142379-appb-000015
The transformed density matrix D^T is:
Figure PCTCN2021142379-appb-000016
The resulting charges q_A are:
q_A = [q_O, q_H, q_H] = [-0.366, 0.183, 0.183]
and the bond orders BO_AB are:
Figure PCTCN2021142379-appb-000017
Figure PCTCN2021142379-appb-000018
The molecular structure shown in FIG. 2 is then adjusted so that the bond length becomes
Figure PCTCN2021142379-appb-000019
and the bond angle R_HOH = 103.49°, as shown in FIG. 3. The wave function coefficients, density matrix and other intermediate results produced while obtaining its feature description with the above technical solution, together with the resulting atomic charges and bond orders, are as follows:
The wave function coefficients C of the water molecule shown in FIG. 3, treated as a quantum system, are:
Figure PCTCN2021142379-appb-000020
The density matrix D of the water molecule shown in FIG. 3, treated as a quantum system, is:
Figure PCTCN2021142379-appb-000021
The transformation matrix T is:
Figure PCTCN2021142379-appb-000022
Figure PCTCN2021142379-appb-000023
The transformed density matrix D^T is:
Figure PCTCN2021142379-appb-000024
The resulting charges q_A are:
q_A = [q_O, q_H, q_H] = [-0.347, 0.182, 0.166]
and the bond orders BO_AB are:
Figure PCTCN2021142379-appb-000025
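Because water is a neutral molecule, the atomic charges reported for each geometry should sum to zero within rounding. The short check below applies this to the two charge vectors given above. The tolerance of three times half a unit in the third decimal place is an assumption that accounts for each of the three values being rounded independently.

```python
# Charges [q_O, q_H, q_H] as reported for the two water geometries.
reported_charges = {
    "FIG. 2 geometry (bond angle 100.04 deg)": [-0.366, 0.183, 0.183],
    "FIG. 3 geometry (bond angle 103.49 deg)": [-0.347, 0.182, 0.166],
}

# Assumed tolerance: three values, each rounded to three decimal places.
ROUNDING_TOLERANCE = 3 * 0.0005

totals = {label: sum(q) for label, q in reported_charges.items()}
for label, total in totals.items():
    status = "neutral within rounding" if abs(total) <= ROUNDING_TOLERANCE else "not neutral"
    print(f"{label}: total charge = {total:+.3f} ({status})")
```

The same kind of check is a cheap guard when such charges are generated in bulk as training features.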
As can be seen from the technical solution illustrated in FIG. 1, after the structural feature value of the target molecule is obtained, the overlap matrix is computed based on the minimal Slater basis set, and, based on that overlap matrix, a semi-empirical quantum mechanical method is used to solve for the wave function coefficients of the target molecule treated as a quantum system. Because the minimal Slater basis set is used and the semi-empirical solution is non-self-consistent, that is, it requires only a single round of matrix eigenvalue solving, the computational cost is greatly reduced and the feature description of the target molecule is obtained quickly. This lowers training cost and improves training efficiency when the feature description is used to train a deep neural network.
Corresponding to the foregoing method embodiments, the present application further provides an apparatus for obtaining a feature description of a molecule, an electronic device, and corresponding embodiments.
FIG. 4 is a schematic structural diagram of the apparatus for obtaining a feature description of a molecule according to an embodiment of the present application. For ease of description, only the parts related to the embodiments of the present application are shown. The apparatus illustrated in FIG. 4 may include a first obtaining module 401, a first calculation module 402, a second calculation module 403, a third calculation module 404, a regularization module 405 and a second obtaining module 406, described as follows:
the first obtaining module 401 is configured to obtain the structural feature value of the target molecule;
the first calculation module 402 is configured to compute the overlap matrix based on the minimal Slater basis set and the structural feature value of the target molecule;
the second calculation module 403 is configured to solve, based on the overlap matrix and using a semi-empirical quantum mechanical method, for the wave function coefficients of the target molecule treated as a quantum system;
the third calculation module 404 is configured to solve for the density matrix of the target molecule treated as a quantum system according to the wave function coefficients;
the regularization module 405 is configured to regularize the minimal Slater basis set to obtain the transformation matrix;
the second obtaining module 406 is configured to obtain the feature description of the target molecule according to the density matrix and the transformation matrix.
As to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not repeated here.
As can be seen from the apparatus illustrated in FIG. 4, after the structural feature value of the target molecule is obtained, the overlap matrix is computed based on the minimal Slater basis set, and, based on that overlap matrix, a semi-empirical quantum mechanical method is used to solve for the wave function coefficients of the target molecule treated as a quantum system. Because the minimal Slater basis set is used and the semi-empirical solution is non-self-consistent, that is, it requires only a single round of matrix eigenvalue solving, the computational cost is greatly reduced and the feature description of the target molecule is obtained quickly. This lowers training cost and improves training efficiency when the feature description is used to train a deep neural network.
Optionally, the structural feature value of the target molecule in the above example includes the relative spatial position r_ij between electrons of target atoms of the target molecule located on the i-th basis function χ_slater,i and the j-th basis function χ_slater,j, where χ_slater,i and χ_slater,j are basis functions of the minimal Slater basis set.
Optionally, the first calculation module 402 illustrated in FIG. 4 may include an integration unit, configured to integrate the product of the i-th basis function χ_slater,i and the j-th basis function χ_slater,j, with the relative spatial position r_ij between electrons of target atoms of the target molecule located on χ_slater,i and χ_slater,j as the integration variable, to obtain the element S_ij of the overlap matrix.
Optionally, the second calculation module 403 illustrated in FIG. 4 may include a matrix element calculation unit and a wave function coefficient calculation unit, wherein:
the matrix element calculation unit is configured to compute, according to the formula
Figure PCTCN2021142379-appb-000026
the elements H_ij of the one-electron Hamiltonian matrix H of the semi-empirical quantum mechanical method, where K is an empirical parameter, A is the atom index, and S_ij is an element of the overlap matrix S;
the wave function coefficient calculation unit is configured to solve for the wave function coefficients C of the target molecule treated as a quantum system according to HC = SCe, where e is the energy matrix corresponding to the eigenvectors obtained by diagonalizing the one-electron Hamiltonian matrix H.
Optionally, the third calculation module 404 illustrated in FIG. 4 may include a conjugate transpose calculation unit and a density matrix calculation unit, wherein:
the conjugate transpose calculation unit is configured to take the conjugate transpose of the wave function coefficients C to obtain
Figure PCTCN2021142379-appb-000027
the density matrix calculation unit is configured to solve, according to the formula
Figure PCTCN2021142379-appb-000028
for the density matrix D of the target molecule treated as a quantum system, where λ is the orbital occupation matrix.
Optionally, the regularization module 405 illustrated in FIG. 4 may include a symmetric orthogonalization unit, configured to regularize the minimal Slater basis set with the occupancy-weighted symmetric orthogonalization method to obtain the transformation matrix T.
Optionally, the second obtaining module 406 illustrated in FIG. 4 may include a matrix transformation unit, a first feature description calculation unit and a second feature description calculation unit, wherein:
the matrix transformation unit is configured to transform the density matrix D, using the transformation matrix T, according to the formula
Figure PCTCN2021142379-appb-000029
to obtain the transformed density matrix D^T, where
Figure PCTCN2021142379-appb-000030
denotes the conjugate transpose of the inverse of the transformation matrix T;
the first feature description calculation unit is configured to compute, according to the formula
Figure PCTCN2021142379-appb-000031
the charge q_A of the atom with index A in the target molecule as the first feature description of the target molecule, where
Figure PCTCN2021142379-appb-000032
is an element of the transformed density matrix D^T, and Z_A is the effective nuclear charge of the atom with index A;
the second feature description calculation unit is configured to compute, according to the formula
Figure PCTCN2021142379-appb-000033
the bond order BO_AB between the atom with index A and the atom with index B in the target molecule as the second feature description of the target molecule.
参见图5,是本申请实施例示出的电子设备的结构示意图。该电子设备500包括存储器510和处理器520。Referring to FIG. 5 , it is a schematic structural diagram of an electronic device shown in an embodiment of the present application. The electronic device 500 includes a memory 510 and a processor 520 .
处理器520可以是中央处理单元(Central Processing Unit,CPU), 还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 520 can be a central processing unit (Central Processing Unit, CPU), and can also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), on-site Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
存储器510可以包括各种类型的存储单元,例如系统内存、只读存储器(ROM),和永久存储装置。其中,ROM可以存储处理器520或者计算机的其他模块需要的静态数据或者指令。永久存储装置可以是可读写的存储装置。永久存储装置可以是即使计算机断电后也不会失去存储的指令和数据的非易失性存储设备。在一些实施方式中,永久性存储装置采用大容量存储装置(例如磁或光盘、闪存)作为永久存储装置。另外一些实施方式中,永久性存储装置可以是可移除的存储设备(例如软盘、光驱)。系统内存可以是可读写存储设备或者易失性可读写存储设备,例如动态随机访问内存。系统内存可以存储一些或者所有处理器在运行时需要的指令和数据。此外,存储器510可以包括任意计算机可读存储媒介的组合,包括各种类型的半导体存储芯片(DRAM,SRAM,SDRAM,闪存,可编程只读存储器),磁盘和/或光盘也可以采用。在一些实施方式中,存储器510可以包括可读和/或写的可移除的存储设备,例如激光唱片(CD)、只读数字多功能光盘(例如DVD-ROM,双层DVD-ROM)、只读蓝光光盘、超密度光盘、闪存卡(例如SD卡、min SD卡、Micro-SD卡等等)、磁性软盘等等。计算机可读存储媒介不包含载波和通过无线或有线传输的瞬间电子信号。The memory 510 may include various types of storage units, such as system memory, read only memory (ROM), and persistent storage. Wherein, the ROM can store static data or instructions required by the processor 520 or other modules of the computer. The persistent storage device may be a readable and writable storage device. Persistent storage may be a non-volatile storage device that does not lose stored instructions and data even if the computer is powered off. In some embodiments, the permanent storage device adopts a mass storage device (such as a magnetic or optical disk, flash memory) as the permanent storage device. In some other implementations, the permanent storage device may be a removable storage device (such as a floppy disk, an optical drive). The system memory can be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory. System memory can store some or all of the instructions and data that the processor needs at runtime. In addition, memory 510 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks and/or optical disks may also be used. 
In some embodiments, memory 510 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), Read-Only Blu-ray Disc, Super Density Disc, Flash memory card (such as SD card, min SD card, Micro-SD card, etc.), magnetic floppy disk, etc. Computer-readable storage media do not contain carrier waves and transient electronic signals transmitted by wireless or wire.
Executable code is stored in the memory 510; when the executable code is processed by the processor 520, it can cause the processor 520 to execute some or all of the methods described above.
In addition, the method according to the present application may also be implemented as a computer program or computer program product that includes computer program code instructions for executing some or all of the steps of the above method of the present application.
Alternatively, the present application may also be implemented as a storage medium, including a non-transitory machine-readable storage medium, a computer-readable storage medium, or a machine-readable storage medium, on which executable code (or a computer program, or computer instruction code) is stored; when the executable code (or computer program, or computer instruction code) is executed by a processor of an electronic device (or a server, etc.), it causes the processor to execute some or all of the steps of the above method according to the present application.
The embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (16)

  1. A method for obtaining a feature description of a molecule, characterized in that the method comprises:
    obtaining structural feature values of a target molecule;
    obtaining an overlap matrix based on a minimal Slater basis set and the structural feature values of the target molecule;
    based on the overlap matrix, solving, by a semi-empirical quantum mechanical method, for the wave function coefficients of the target molecule treated as a quantum system;
    solving, from the wave function coefficients, for the density matrix of the target molecule treated as a quantum system;
    regularizing the minimal Slater basis set to obtain a transformation matrix; and
    obtaining the feature description of the target molecule from the density matrix and the transformation matrix.
  2. The method for obtaining a feature description of a molecule according to claim 1, characterized in that the structural feature values of the target molecule include the spatial relative position r_ij between the electrons, belonging to a target atom of the target molecule, that are located on the i-th basis function χ_slater,i and the j-th basis function χ_slater,j, where the i-th basis function χ_slater,i and the j-th basis function χ_slater,j are basis functions of the minimal Slater basis set.
  3. The method for obtaining a feature description of a molecule according to claim 2, characterized in that obtaining the overlap matrix based on the minimal Slater basis set and the structural feature values of the target molecule comprises:
    taking the spatial relative position r_ij as the integration variable, integrating the product of the i-th basis function χ_slater,i and the j-th basis function χ_slater,j to obtain the element S_ij of the overlap matrix.
  4. The method for obtaining a feature description of a molecule according to claim 1, characterized in that solving, based on the overlap matrix and by a semi-empirical quantum mechanical method, for the wave function coefficients of the target molecule treated as a quantum system comprises:
    calculating the element H_ij of the single-electron Hamiltonian matrix H of the semi-empirical quantum mechanical method according to the formula
    Figure PCTCN2021142379-appb-100001
    where K is an empirical parameter, A is the atom index, and S_ij is an element of the overlap matrix S; and
    solving for the wave function coefficients C of the target molecule treated as a quantum system according to the formula HC = SCe, where e is the matrix of energies corresponding to the eigenvectors obtained by diagonalizing the single-electron Hamiltonian matrix H.
  5. The method for obtaining a feature description of a molecule according to claim 1, characterized in that solving, from the wave function coefficients, for the density matrix of the target molecule treated as a quantum system comprises:
    performing a conjugate transpose operation on the wave function coefficients C to obtain
    Figure PCTCN2021142379-appb-100002
    and solving for the density matrix D of the target molecule treated as a quantum system according to the formula
    Figure PCTCN2021142379-appb-100003
    where λ is the orbital occupation matrix.
  6. The method for obtaining a feature description of a molecule according to any one of claims 1 to 5, characterized in that regularizing the minimal Slater basis set to obtain the transformation matrix comprises:
    regularizing the minimal Slater basis set by an occupancy-weighted symmetric orthogonalization method to obtain the transformation matrix T.
  7. The method for obtaining a feature description of a molecule according to claim 6, characterized in that obtaining the feature description of the target molecule from the density matrix and the transformation matrix comprises:
    transforming the density matrix D with the transformation matrix T according to the formula
    Figure PCTCN2021142379-appb-100004
    to obtain the transformed density matrix D_T, where
    Figure PCTCN2021142379-appb-100005
    denotes the conjugate transpose of the inverse of the transformation matrix T;
    calculating, according to the formula
    Figure PCTCN2021142379-appb-100006
    the charge q_A of the atom with index A in the target molecule as a first feature description of the target molecule, where
    Figure PCTCN2021142379-appb-100007
    is an element of the transformed density matrix D_T and Z_A is the effective nuclear charge of the atom with index A; and
    calculating, according to the formula
    Figure PCTCN2021142379-appb-100008
    the bond order BO_AB between the atom with index A and the atom with index B in the target molecule as a second feature description of the target molecule.
  8. An apparatus for obtaining a feature description of a molecule, characterized in that the apparatus comprises:
    a first obtaining module, configured to obtain structural feature values of a target molecule;
    a first calculation module, configured to obtain an overlap matrix based on a minimal Slater basis set and the structural feature values of the target molecule;
    a second calculation module, configured to solve, based on the overlap matrix and by a semi-empirical quantum mechanical method, for the wave function coefficients of the target molecule treated as a quantum system;
    a third calculation module, configured to solve, from the wave function coefficients, for the density matrix of the target molecule treated as a quantum system;
    a regularization module, configured to regularize the minimal Slater basis set to obtain a transformation matrix; and
    a second obtaining module, configured to obtain the feature description of the target molecule from the density matrix and the transformation matrix.
  9. The apparatus for obtaining a feature description of a molecule according to claim 8, characterized in that the structural feature values of the target molecule include the spatial relative position r_ij between the electrons, belonging to a target atom of the target molecule, that are located on the i-th basis function χ_slater,i and the j-th basis function χ_slater,j, where the i-th basis function χ_slater,i and the j-th basis function χ_slater,j are basis functions of the minimal Slater basis set.
  10. The apparatus for obtaining a feature description of a molecule according to claim 9, characterized in that the first calculation module comprises:
    an integration unit, configured to take the spatial relative position r_ij as the integration variable and integrate the product of the i-th basis function χ_slater,i and the j-th basis function χ_slater,j to obtain the element S_ij of the overlap matrix.
  11. The apparatus for obtaining a feature description of a molecule according to claim 8, characterized in that the second calculation module comprises:
    a matrix element calculation unit, configured to calculate the element H_ij of the single-electron Hamiltonian matrix H of the semi-empirical quantum mechanical method according to the formula
    Figure PCTCN2021142379-appb-100009
    where K is an empirical parameter, A is the atom index, and S_ij is an element of the overlap matrix S; and
    a wave function coefficient calculation unit, configured to solve for the wave function coefficients C of the target molecule treated as a quantum system according to the formula HC = SCe, where e is the matrix of energies corresponding to the eigenvectors obtained by diagonalizing the single-electron Hamiltonian matrix H.
  12. The apparatus for obtaining a feature description of a molecule according to claim 8, characterized in that the third calculation module comprises:
    a conjugate transpose calculation unit, configured to perform a conjugate transpose operation on the wave function coefficients C to obtain
    Figure PCTCN2021142379-appb-100010
    and a density matrix calculation unit, configured to solve for the density matrix D of the target molecule treated as a quantum system according to the formula
    Figure PCTCN2021142379-appb-100011
    where λ is the orbital occupation matrix.
  13. The apparatus for obtaining a feature description of a molecule according to any one of claims 8 to 12, characterized in that the regularization module comprises:
    a symmetric orthogonalization unit, configured to regularize the minimal Slater basis set by an occupancy-weighted symmetric orthogonalization method to obtain the transformation matrix T.
  14. The apparatus for obtaining a feature description of a molecule according to claim 13, characterized in that the second obtaining module comprises:
    a matrix transformation unit, configured to transform the density matrix D with the transformation matrix T according to the formula
    Figure PCTCN2021142379-appb-100012
    to obtain the transformed density matrix D_T, where
    Figure PCTCN2021142379-appb-100013
    denotes the conjugate transpose of the inverse of the transformation matrix T;
    a first feature description calculation unit, configured to calculate, according to the formula
    Figure PCTCN2021142379-appb-100014
    the charge q_A of the atom with index A in the target molecule as a first feature description of the target molecule, where
    Figure PCTCN2021142379-appb-100015
    is an element of the transformed density matrix D_T and Z_A is the effective nuclear charge of the atom with index A; and
    a second feature description calculation unit, configured to calculate, according to the formula
    Figure PCTCN2021142379-appb-100016
    the bond order BO_AB between the atom with index A and the atom with index B in the target molecule as a second feature description of the target molecule.
  15. An electronic device, characterized in that it comprises:
    a processor; and
    a memory on which executable code is stored, wherein the executable code, when executed by the processor, causes the processor to execute the method according to any one of claims 1 to 7.
  16. A storage medium on which executable code is stored, wherein the executable code, when executed by a processor of an electronic device, causes the processor to execute the method according to any one of claims 1 to 7.
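
The pipeline recited in claims 1 to 7 can be illustrated with a small numerical sketch. This is not the applicant's implementation. The claim formulas survive here only as figure references, so every concrete formula below is an assumption: the density matrix is taken as D = C λ C†, the transformed density as T⁻¹ D (T⁻¹)†, the atomic charge as Z_A minus the summed diagonal of the transformed density on atom A (a Löwdin-type population), and the bond order as a Wiberg-type sum of squared off-diagonal elements. The occupancy-weighted transformation is written as T = W (W S W)^(-1/2), which with unit weights reduces to plain symmetric orthogonalization. The toy values of S and H, and all function names, are hypothetical.

```python
import numpy as np


def sym_power(M, p):
    """Real power of a symmetric positive-definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.conj().T


def solve_hc_sce(H, S):
    """Solve the generalized eigenproblem H C = S C e by orthogonalizing with S^(-1/2)."""
    X = sym_power(S, -0.5)
    e, C_orth = np.linalg.eigh(X @ H @ X)
    return e, X @ C_orth


def density_matrix(C, occupations):
    """Assumed form D = C diag(occupations) C^dagger."""
    return (C * occupations) @ C.conj().T


def owso_transform(S, weights=None):
    """Assumed occupancy-weighted symmetric orthogonalization: T = W (W S W)^(-1/2)."""
    n = S.shape[0]
    W = np.diag(np.ones(n) if weights is None else np.asarray(weights, dtype=float))
    return W @ sym_power(W @ S @ W, -0.5)


def transformed_density(D, T):
    """Assumed form D_T = T^(-1) D (T^(-1))^dagger."""
    T_inv = np.linalg.inv(T)
    return T_inv @ D @ T_inv.conj().T


def atomic_charges(D_T, atom_of_basis, Z_eff):
    """Assumed form q_A = Z_A - sum of D_T[i, i] over basis functions i on atom A."""
    populations = np.zeros(len(Z_eff))
    np.add.at(populations, np.asarray(atom_of_basis), np.real(np.diag(D_T)))
    return np.asarray(Z_eff, dtype=float) - populations


def bond_order(D_T, atom_of_basis, a, b):
    """Assumed Wiberg-type form: sum of |D_T[i, j]|^2 for i on atom a, j on atom b."""
    atom_of_basis = np.asarray(atom_of_basis)
    ia = np.flatnonzero(atom_of_basis == a)
    ib = np.flatnonzero(atom_of_basis == b)
    return float(np.sum(np.abs(D_T[np.ix_(ia, ib)]) ** 2))


# Toy two-atom, two-function system (illustrative numbers, not from the patent).
S = np.array([[1.0, 0.6], [0.6, 1.0]])            # overlap matrix
H = np.array([[-0.5, -0.525], [-0.525, -0.5]])    # one-electron Hamiltonian
e, C = solve_hc_sce(H, S)
D = density_matrix(C, np.array([2.0, 0.0]))       # two electrons in the lowest orbital
T = owso_transform(S)                             # unit weights: plain symmetric orthogonalization
D_T = transformed_density(D, T)
atoms = np.array([0, 1])                          # basis function -> atom index
q = atomic_charges(D_T, atoms, [1.0, 1.0])
bo = bond_order(D_T, atoms, 0, 1)
print("charges:", q, "bond order:", bo)
```

For this symmetric toy system the two charges vanish and the bond order equals one, up to floating-point error, and the trace of the transformed density recovers the electron count. Passing non-uniform `weights` changes T but still yields an orthonormal transformed basis, since T†ST remains the identity.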
PCT/CN2021/142379 2021-12-29 2021-12-29 Method and apparatus for acquiring feature description of molecule, and storage medium WO2023123021A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/142379 WO2023123021A1 (en) 2021-12-29 2021-12-29 Method and apparatus for acquiring feature description of molecule, and storage medium

Publications (1)

Publication Number Publication Date
WO2023123021A1 true WO2023123021A1 (en) 2023-07-06

Family

ID=86996771

Country Status (1)

Country Link
WO (1) WO2023123021A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1010094A1 (en) * 1997-09-03 2000-06-21 Commonwealth Scientific And Industrial Research Organisation Compound screening system
CN112185477A (en) * 2020-09-25 2021-01-05 北京望石智慧科技有限公司 Method and device for extracting molecular characteristics and calculating three-dimensional quantitative structure-activity relationship
US20210125095A1 (en) * 2019-10-25 2021-04-29 Samsung Electronics Co., Ltd. Analysis method and analysis system
JP2021117798A (en) * 2020-01-28 2021-08-10 国立大学法人山形大学 Molecular design support system, method for predicting molecular characteristic value, and molecular design support program
WO2021159744A1 (en) * 2020-09-27 2021-08-19 平安科技(深圳)有限公司 Medicine classification method and apparatus, terminal device, and storage medium
CN113409893A (en) * 2021-06-25 2021-09-17 成都职业技术学院 Molecular feature extraction and performance prediction method based on image convolution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21969396

Country of ref document: EP

Kind code of ref document: A1