CN117649676A

CN117649676A - Chemical structural formula identification method based on deep learning model

Info

Publication number: CN117649676A
Application number: CN202410120876.3A
Authority: CN
Inventors: 唐博文; 牛张明; 张龙; 王晓枫; 马超; 黄俊杰; 晋旭锐; 江荧辉; 肖祥路
Original assignee: Hangzhou Derizhi Pharmaceutical Technology Co ltd
Current assignee: Hangzhou Derizhi Pharmaceutical Technology Co ltd
Priority date: 2024-01-29
Filing date: 2024-01-29
Publication date: 2024-03-05

Abstract

The invention relates to a method for identifying a chemical structural formula based on a deep learning model. The method comprises the following steps:

- preparing training images containing chemical structural formulas, together with their corresponding image labels, as a training sample set;
- constructing a deep-learning recognition model comprising an encoder, a decoder and a bond predictor;
- pre-training the recognition model on the training sample set. Each training image is fed to the encoder to obtain image feature vectors. The feature vectors are fed to the decoder and the bond predictor to obtain prediction information, which comprises the geometric information of the atoms and bonds of the chemical structural formula. The model parameters are adjusted by comparing the prediction information with the image labels;
- acquiring a document image containing a target chemical structural formula, feeding it into the pre-trained recognition model, and outputting the predicted geometric information of atoms and bonds;
- identifying and outputting the chemical structural formula based on the predicted atom and bond geometric information.

Description

Chemical structural formula identification method based on deep learning model

Technical Field

The invention relates to a chemical structural formula identification method based on a deep learning model, and belongs to the technical field of chemical structural formula identification.

Background

Scientific literature often presents molecules and reactions in figures, where molecules are usually drawn as two-dimensional images. These drawings come in a wide variety of styles, which makes it harder to convert them into machine-readable molecular structures. Molecular structure recognition is the task of converting a molecular image into its graph structure. Existing models such as MolVec and Img2Mol have the following defects:

1) Molecular images are drawn in many styles, so new rules have to be written to cover edge cases such as bridged structures. These models also operate on SMILES strings rather than explicitly identifying atoms and chemical bonds. As a result, chemical constraints are difficult to integrate, and the accuracy of extracting chemical structures from images in journal articles remains low. When stereochemistry, functional group abbreviations or unusual drawing styles are involved, the models cannot recognize molecular structures that contain non-atomic symbols.

2) Non-chemical content in an image cannot be told apart from chemical structures. It is often recognized as a chemical structure, which leads to misjudgments.

3) Two or more structural formulas may overlap in a segmented image. In addition, small image perturbations may prevent the model from recognizing the structure accurately.

All of the above drawbacks increase the difficulty and cost of converting a structural image into chemical structural data.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a method for identifying a chemical structural formula based on a deep learning model.

The technical scheme of the invention is as follows:

In one aspect, the invention provides a method for identifying a chemical structural formula based on a deep learning model. The method comprises the following steps:

preparing training images containing chemical structural formulas, together with their corresponding image labels, as a training sample set;

constructing a deep-learning recognition model comprising an encoder, a decoder and a bond predictor, and pre-training the recognition model on the training sample set. During pre-training, each training image is fed to the encoder to obtain image feature vectors. The feature vectors are fed to the decoder and the bond predictor to obtain prediction information, which comprises the geometric information of the atoms and bonds of the chemical structural formula. The model parameters are adjusted by comparing the prediction information with the image labels;

acquiring a document image containing a target chemical structural formula, feeding it into the pre-trained recognition model, and outputting the predicted geometric information of atoms and bonds;

and identifying and outputting the chemical structural formula based on the predicted atom and bond geometric information.

As a preferred embodiment, the decoder and the bond predictor are trained with teacher forcing during pre-training. In each iteration, the decoder takes the ground-truth tokens as input and predicts the next token conditioned on the preceding tokens. The bond predictor takes the hidden states of the decoder as input and predicts the bond between each pair of atoms.

As a preferred embodiment, the step of identifying and outputting the chemical structural formula based on the predicted atom and bond geometry information specifically includes:

establishing a molecular structure and corresponding coordinates based on the predicted geometrical information of atoms and bonds;

reconstructing and outputting standardized molecules based on the molecular structure and corresponding coordinates;

judging whether the output standardized molecule is a valid molecular structure;

and, if the molecular structure is valid, outputting the corresponding molecular structure.

As a preferred embodiment, in the step of reconstructing and outputting the standardized molecule based on the molecular structure and corresponding coordinates, chemical rules are applied to determine the stereochemistry. Augmented training strategies are also designed for specific molecular structures, as follows:

for abbreviations, compiling a list of replacement rules for a number of common functional groups, and randomly replacing functional groups in the training data with their abbreviations;

for R groups, randomly adding R-group atoms to the training data, with the R-group labels randomly sampled from the list [R, R1, R2, …, R12, X, Y, Z, A, Ar];

for aromaticity, randomly choosing how aromatic rings are drawn in the training data, either as a circle or as alternating single and double bonds;

for explicit hydrogens, randomly adding hydrogen atoms to the training data as explicit atoms.

In a preferred embodiment, the step of determining whether the outputted standardized molecule has a valid molecular structure specifically includes:

judging whether the output standardized molecule contains chemical bonds;

judging whether the number of chemical bonds conforms to chemical rules;

and judging whether a chemical structure is present, based on the number of unique predicted X and Y coordinate values and on the predicted heavy atom types.

Together, these checks determine whether the output standardized molecule is a valid molecular structure.

As a preferred embodiment, the encoder uses a Swin Transformer model;

the decoder uses a 6-layer Transformer model with multi-head attention;

the bond predictor uses a feed-forward neural network.

In another aspect, the invention also provides a recognition system for chemical structural formulas based on a deep learning model. The system comprises:

a training data preparation module, used to prepare training images containing chemical structural formulas, together with their corresponding image labels, as a training sample set;

a recognition model construction module, used to construct a deep-learning recognition model comprising an encoder, a decoder and a bond predictor, and to pre-train the recognition model on the training sample set. During pre-training, each training image is fed to the encoder to obtain image feature vectors. The feature vectors are fed to the decoder and the bond predictor to obtain prediction information, which comprises the geometric information of the atoms and bonds of the chemical structural formula. The model parameters are adjusted by comparing the prediction information with the image labels;

a basic information acquisition module, used to acquire a document image containing a target chemical structural formula, feed it into the pre-trained recognition model, and output the predicted atom and bond geometric information;

and a structural formula identification output module, used to identify and output the chemical structural formula based on the predicted atom and bond geometric information.

As a preferred embodiment, the recognition model construction module includes a joint training unit. This unit trains the decoder and the bond predictor with teacher forcing during pre-training. In each iteration, the decoder takes the ground-truth tokens as input and predicts the next token conditioned on the preceding tokens. The bond predictor takes the hidden states of the decoder as input and predicts the bond between each pair of atoms.

In still another aspect, the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements a method for identifying a chemical structural formula based on a deep learning model according to any embodiment of the present invention when the program is executed by the processor.

In yet another aspect, the present invention further provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements a method for identifying chemical structural formulas based on a deep learning model according to any of the embodiments of the present invention.

The invention has the following beneficial effects:

The method for identifying chemical structural formulas based on a deep learning model accurately predicts atoms, chemical bonds and their geometric layout, and uses them to construct the molecular structure. The result is aligned with the input image at the atomic level, so users can easily read the predicted structure.

In addition, a comprehensive data augmentation strategy produces diverse training data. Combined with symbolic chemical constraints, this improves the robustness of the system. Chemical constraints can be enforced easily, symbolic reasoning can be performed on the molecular graph, and chemical conventions such as chirality and functional group abbreviations can be recognized accurately. The system is therefore more robust to drawing styles and image perturbations, so it can be generalized to arbitrary molecular images.

Drawings

FIG. 1 is a schematic flow chart of a method according to a first embodiment of the invention;

FIG. 2 is an example of recognizing hand-drawn images in an embodiment of the present invention;

FIG. 3 is an example of recognizing images containing abbreviations in an embodiment of the present invention;

FIG. 4 is an exemplary diagram of identifying images containing disturbances in an embodiment of the invention;

FIG. 5 is an exemplary diagram of identifying and filtering non-chemical structures in an embodiment of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be understood that the step numbers used herein are for convenience of description only and are not limiting as to the order in which the steps are performed.

It is to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

Embodiment one:

Referring to FIG. 1, this embodiment provides a method for identifying a chemical structural formula based on a deep learning model. The method includes the following steps:

S100, preparing training images containing chemical structural formulas, together with their corresponding image labels, as a training sample set.

Each image label comprises three parts:

- the tokens of the SMILES string of the chemical structure;
- the molecular graph;
- the coordinates of the atoms/functional groups.

The training images are resized to a uniform width and height, such as 384 × 384, and converted into image objects that the computer can process. During training, image augmentation randomly perturbs the training images, for example by rotation, padding, cropping and Gaussian noise. Specifically, the following operations are applied to the image:

1) rotating it by a random angle selected from [-90°, 90°];
2) cropping at most 1% from each side;
3) padding one side by up to 40%;
4) downscaling it by 20% to 50% and then upscaling it back;
5) blurring it with a kernel of random size;
6) adding Gaussian noise;
7) adding salt-and-pepper noise (random black pixels).

In addition, to make the model robust to drawing styles, elements of the rendered molecule are varied when the synthetic data is generated. These include fonts, bond widths and bond lengths.

The augmented images therefore vary in style and quality, so the model is trained on diverse inputs. They still retain the information a human needs to understand the molecular structure. This helps the model generalize better to real-world images.
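The perturbation ranges listed in S100 can be captured as one randomly sampled parameter set per training image. Below is a minimal, dependency-free sketch under these assumptions:

- The grayscale image is stored as a list of rows, with 255 for white and 0 for black.
- The names `AugParams`, `sample_params`, `pad_one_side` and `salt_and_pepper` are hypothetical.
- The blur kernel sizes and the Gaussian and salt-and-pepper noise ranges are illustrative choices that the text does not specify.

Only padding and salt-and-pepper noise are actually applied here. Rotation, blurring and rescaling would normally be delegated to an image library.

```python
import random
from dataclasses import dataclass


@dataclass
class AugParams:
    angle_deg: float    # rotation angle, sampled from [-90, 90] degrees
    crop_frac: float    # fraction cropped from each side, at most 1%
    pad_side: str       # the single side that receives padding
    pad_frac: float     # padding of up to 40% on that side
    downscale: float    # shrink by 20-50%, then enlarge back to the original size
    blur_kernel: int    # odd blur kernel size (hypothetical choices)
    noise_sigma: float  # Gaussian noise standard deviation (hypothetical range)
    sp_prob: float      # per-pixel probability of a black pixel (hypothetical range)


def sample_params(rng: random.Random) -> AugParams:
    """Draw one random set of perturbation parameters."""
    return AugParams(
        angle_deg=rng.uniform(-90.0, 90.0),
        crop_frac=rng.uniform(0.0, 0.01),
        pad_side=rng.choice(["top", "bottom", "left", "right"]),
        pad_frac=rng.uniform(0.0, 0.40),
        downscale=rng.uniform(0.20, 0.50),
        blur_kernel=rng.choice([1, 3, 5]),
        noise_sigma=rng.uniform(0.0, 10.0),
        sp_prob=rng.uniform(0.0, 0.01),
    )


def pad_one_side(img, side, frac, fill=255):
    """Pad one side of the image with white pixels, proportional to its size."""
    h, w = len(img), len(img[0])
    if side in ("top", "bottom"):
        extra = [[fill] * w for _ in range(int(h * frac))]
        return extra + img if side == "top" else img + extra
    extra = [fill] * int(w * frac)
    return [extra + row if side == "left" else row + extra for row in img]


def salt_and_pepper(img, prob, rng):
    """Turn each pixel black with probability `prob`."""
    return [[0 if rng.random() < prob else px for px in row] for row in img]


# Usage: perturb a blank 8 x 10 "page" with one sampled parameter set.
rng = random.Random(0)
params = sample_params(rng)
page = [[255] * 10 for _ in range(8)]
page = pad_one_side(page, params.pad_side, params.pad_frac)
page = salt_and_pepper(page, params.sp_prob, rng)
```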

S200, constructing a deep-learning recognition model comprising an encoder, a decoder and a bond predictor, and pre-training it on the training sample set.

During pre-training, each training image is fed to the encoder to obtain image feature vectors, i.e., the feature vector representation of the chemical structural formula. These feature vectors are fed to the decoder and the bond predictor to obtain prediction information, which comprises the geometric information of the atoms and bonds of the chemical structural formula. Specifically, this geometric information comprises:

- atom information, e.g., whether an atom is a C, H or O atom;
- bond type information, covering single, double and triple bonds;
- the geometric layout, including the X and Y coordinates of the atoms in the image, whether two atoms are bonded, and the type of that bond.

The model parameters are adjusted by comparing the prediction information with the image labels.
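The prediction information described in S200 can be held in a small data structure: atom symbols, X/Y coordinates, and one bond label per atom pair. This is an illustrative sketch only. The class names and the bond label set `none/single/double/triple` are assumptions, and stereo bond classes such as wedge and hash bonds are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

BOND_TYPES = ("none", "single", "double", "triple")  # hypothetical label set


@dataclass
class Atom:
    symbol: str  # element or abbreviation, e.g. "C", "O", "Ph"
    x: float     # X coordinate in the image
    y: float     # Y coordinate in the image


@dataclass
class Prediction:
    atoms: List[Atom]
    # Undirected bonds keyed by (i, j) with i < j; absent pairs mean "no bond".
    bonds: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def set_bond(self, i: int, j: int, kind: str) -> None:
        if kind not in BOND_TYPES:
            raise ValueError(f"unknown bond type: {kind}")
        key = (min(i, j), max(i, j))
        if kind == "none":
            self.bonds.pop(key, None)
        else:
            self.bonds[key] = kind

    def bond_matrix(self) -> List[List[str]]:
        """Dense N x N view with one bond label per atom pair."""
        n = len(self.atoms)
        matrix = [["none"] * n for _ in range(n)]
        for (i, j), kind in self.bonds.items():
            matrix[i][j] = matrix[j][i] = kind
        return matrix


# Usage: a C=O fragment drawn horizontally.
pred = Prediction(atoms=[Atom("C", 10.0, 20.0), Atom("O", 40.0, 20.0)])
pred.set_bond(1, 0, "double")
```

Storing bonds sparsely keeps the graph compact. The dense matrix mirrors the "one label per atom pair" output that the text attributes to the bond predictor.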

S300, acquiring a document image containing a target chemical structural formula, resizing it to 384 × 384 resolution, and converting it into an image object that the computer can process. The image object is then fed into the pre-trained recognition model, which outputs the predicted geometric information of atoms and bonds.
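Resizing to 384 × 384 means the predicted coordinates refer to the resized grid. To align the result with the original document image, they have to be scaled back. Below is a dependency-free sketch. It assumes nearest-neighbour resampling, whereas a production pipeline would more likely use bilinear interpolation from an image library. The helper names `resize_nearest` and `to_original_coords` are hypothetical.

```python
def resize_nearest(img, out_h=384, out_w=384):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_original_coords(x, y, in_w, in_h, size=384):
    """Map a coordinate predicted on the size x size grid back to the source image."""
    return x * in_w / size, y * in_h / size


# Usage: an atom predicted at (192, 96) on the 384 x 384 grid of an 800 x 400 page.
orig_x, orig_y = to_original_coords(192, 96, in_w=800, in_h=400)
```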

S400, extracting/reconstructing the molecular structure and coordinates from the predicted atom and bond geometric information. Parts of the chemical structure may be missing from the recognized structural region, so the extracted result is then refined and updated. Finally, the chemical structural formula is identified and output.

As a preferred implementation of this embodiment, the decoder and the bond predictor are trained with teacher forcing during pre-training. In each iteration, the decoder takes the ground-truth image label as input and predicts the next token conditioned on the preceding tokens, until token prediction ends. The bond predictor takes the feature matrix output by the decoder as input and predicts the bond between each pair of atoms. All modules of the model are fully differentiable and are trained jointly.

During inference, greedy decoding generates the atoms, after which the bonds and coordinates are predicted.

For each input image, training provides the ground-truth atoms, the bonds, the tokens obtained by tokenizing the SMILES string, and the coordinate data. These are compared with the corresponding outputs of the predictors.

The steps for extracting/reconstructing the molecular structure and coordinates are as follows:
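Teacher forcing can be illustrated at the data level. The decoder input is the ground-truth token sequence shifted right by one position. At step t, the decoder therefore sees only the true tokens before t and is trained to emit token t.

The sketch below makes two assumptions:

- SMILES is tokenized with a widely used regular expression for SMILES tokenization.
- The start and end tokens are spelled `<SOS>` and `<END>`. These spellings are hypothetical; the text itself only names the end token 'END'.

```python
import re

# Common SMILES tokenization pattern: bracket atoms, two-letter halogens,
# organic-subset atoms, bonds, branches and ring-closure digits.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
BOS, EOS = "<SOS>", "<END>"  # hypothetical special-token spellings


def tokenize(smiles: str) -> list:
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:  # some characters were not matched
        raise ValueError(f"untokenizable SMILES: {smiles}")
    return tokens


def teacher_forcing_pairs(smiles: str):
    """Return (decoder_input, target): the input is the target shifted right by one."""
    tokens = tokenize(smiles)
    decoder_input = [BOS] + tokens  # ground-truth prefix fed at every step
    target = tokens + [EOS]         # token the decoder must predict at each step
    return decoder_input, target
```

Because the ground-truth prefix is supplied at every position, all steps of a sequence can be trained in parallel. Inference has no ground truth available, so it falls back to step-by-step decoding.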

S411, the decoder outputs tokens one step at a time. Each output depends on all previously output tokens and on the feature vectors output by the encoder. Decoding continues until the special token 'END' is output, at which point SMILES sequence prediction is complete.
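The inference loop in S411 is greedy decoding: at each step the highest-scoring token is appended, until the end token appears. In this sketch, `step` is a hypothetical stand-in for the decoder. It maps the current prefix to next-token scores, whereas a real model would also condition on the encoder features. The `max_len` cap is an added safeguard against runaway decoding and is not stated in the text.

```python
from typing import Callable, Dict, List, Sequence

# step(prefix) -> {token: score} for the next position (hypothetical interface)
StepFn = Callable[[Sequence[str]], Dict[str, float]]


def greedy_decode(step: StepFn, bos: str = "<SOS>", eos: str = "<END>",
                  max_len: int = 256) -> List[str]:
    prefix = [bos]
    for _ in range(max_len):
        scores = step(prefix)
        token = max(scores, key=scores.get)  # pick the single best next token
        if token == eos:
            break
        prefix.append(token)
    return prefix[1:]  # drop the start token
```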

S412, building on S411, the bond predictor predicts the bond types from the hidden-state feature matrix output by the decoder. The order of the atoms follows the order in which the atom tokens are predicted, and the order of the bonds follows the atom order. The coordinates are predicted by the coordinate predictor, also in atom order.

S413, the atoms are connected according to the predicted atom types and bond types to reconstruct the complete molecule. The molecule is then checked against chemical standards: whether the atomic valences are saturated, and whether the stereochemistry (chirality) needs to be repaired.
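The valence-saturation check in S413 can be sketched by summing bond orders per atom and comparing the sum with the allowed valences. The difference gives the number of implicit hydrogens. The valence table below is a simplified assumption that covers neutral atoms only; charges, radicals and aromatic bond orders are ignored. Labels outside the table, such as abbreviations or R groups, are skipped rather than checked.

```python
from typing import Dict, List, Optional, Tuple

# Simplified allowed valences for neutral atoms (an assumption for this sketch).
VALENCES = {
    "H": (1,), "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "F": (1,),
    "P": (3, 5), "S": (2, 4, 6), "Cl": (1,), "Br": (1,), "I": (1,),
}
BOND_ORDER = {"single": 1, "double": 2, "triple": 3}


def check_valence(symbols: List[str], bonds: Dict[Tuple[int, int], str]
                  ) -> Tuple[bool, List[Optional[int]]]:
    """Return (all_ok, implicit_h); implicit_h[i] is None for unchecked or failing atoms."""
    used = [0] * len(symbols)
    for (i, j), kind in bonds.items():
        used[i] += BOND_ORDER[kind]
        used[j] += BOND_ORDER[kind]
    all_ok = True
    implicit_h: List[Optional[int]] = []
    for sym, v in zip(symbols, used):
        allowed = VALENCES.get(sym)
        if allowed is None:      # abbreviation or R group: not checked here
            implicit_h.append(None)
            continue
        fitting = [a for a in allowed if a >= v]
        if not fitting:          # more bonds than any allowed valence
            all_ok = False
            implicit_h.append(None)
        else:                    # saturate up to the smallest valence that fits
            implicit_h.append(min(fitting) - v)
    return all_ok, implicit_h
```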

As a preferred implementation of this embodiment, step S400 specifically includes:

S401, establishing the molecular structure and corresponding coordinates based on the predicted atom and bond geometric information;

S402, reconstructing and outputting a standardized molecule based on the molecular structure and corresponding coordinates.

Specifically, chemical rules are applied to the predicted atoms, bonds and coordinates to determine the stereochemical configuration explicitly. This covers chiral centers and the cis/trans isomerism of double bonds in the molecule.

A more flexible method is used to resolve abbreviated structures and R groups. An abbreviated structure is treated as a "superatom" in the molecular graph, and the superatom symbol is split into characters. As a result, the model's prediction space is not limited to the abbreviations seen in the training data, and the model can generalize to unseen ones.

Augmented training strategies are designed for these special molecular structures, as follows:

1) For abbreviations, a list of replacement rules for a number of common functional groups is compiled, and functional groups in the training data are randomly replaced with their abbreviations.
2) For R groups, R-group atoms are randomly added to the training data, with labels randomly sampled from the list [R, R1, R2, …, R12, X, Y, Z, A, Ar].
3) For aromaticity, aromatic rings in the training data are randomly drawn in either circle notation or Kekulé notation, i.e., as a circle or as alternating single and double bonds.
4) For explicit hydrogens, hydrogen atoms in chemical structure images are sometimes hidden and sometimes shown. This embodiment therefore randomly adds hydrogens to the training data as explicit atoms, i.e., randomly displays hydrogen atoms.
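The R-group, aromatic-style and explicit-hydrogen augmentations in S402 can be sketched on a plain atom/bond graph. This is a hypothetical illustration under these assumptions:

- The probabilities `p_rgroup` and `p_h` and the function name are illustrative choices.
- The input graph carries integer implicit-hydrogen counts.

Abbreviation replacement is omitted because it requires substructure matching, which is typically done with a cheminformatics toolkit.

```python
import random

# R-group labels from the list above: R, R1 to R12, X, Y, Z, A, Ar.
R_GROUP_LABELS = ["R"] + [f"R{i}" for i in range(1, 13)] + ["X", "Y", "Z", "A", "Ar"]


def augment_graph(symbols, bonds, implicit_h, rng, p_rgroup=0.5, p_h=0.2):
    """Return (symbols, bonds, aromatic_style) for one augmented training sample.

    symbols    -- atom labels, e.g. ["C", "C", "O"]
    bonds      -- dict {(i, j): "single" | "double" | "triple"}
    implicit_h -- integer implicit-hydrogen count per atom, same order as symbols
    """
    symbols, bonds = list(symbols), dict(bonds)
    h_left = list(implicit_h)

    # R groups: replace one implicit H on a random atom with a labelled R atom.
    if rng.random() < p_rgroup:
        sites = [i for i, h in enumerate(h_left) if h > 0]
        if sites:
            i = rng.choice(sites)
            symbols.append(rng.choice(R_GROUP_LABELS))
            bonds[(i, len(symbols) - 1)] = "single"
            h_left[i] -= 1

    # Explicit hydrogens: turn some remaining implicit H into drawn "H" atoms.
    for i in range(len(h_left)):  # original atoms only
        while h_left[i] > 0 and rng.random() < p_h:
            symbols.append("H")
            bonds[(i, len(symbols) - 1)] = "single"
            h_left[i] -= 1

    # Aromaticity: choose how aromatic rings will be rendered for this sample.
    aromatic_style = rng.choice(["circle", "kekule"])
    return symbols, bonds, aromatic_style
```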

S403, judging whether the output standardized molecule is a valid molecular structure. This involves three checks:

- whether the output standardized molecule contains chemical bonds;
- whether the number of chemical bonds conforms to chemical rules;
- whether a chemical structure is present, based on the predicted heavy atom types and on the number of unique X and Y coordinate values of the predicted atoms in the image. Several atoms are sometimes predicted with identical X and Y coordinates, which is an unreasonable situation in which atoms overlap.

Together, these checks determine whether the output standardized molecule is a valid molecular structure. The recognized structures are screened, and inputs that are not molecular structures are marked with an invalid label.
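The screening rules in S403 map onto a short check: bonds present, bond counts consistent with chemical rules, unique atom coordinates, and heavy atoms present. The sketch below makes these assumptions:

- Atoms are given as a list of symbols plus a parallel list of (x, y) coordinates.
- Bonds are a dict keyed by atom-index pairs.
- The maximum-valence table, the check order and the rounding tolerance `decimals` are illustrative choices.

```python
from typing import Dict, List, Tuple

BOND_ORDER = {"single": 1, "double": 2, "triple": 3}
# Illustrative upper bounds on bond-order sums for neutral atoms.
MAX_VALENCE = {"H": 1, "B": 3, "C": 4, "N": 5, "O": 2, "F": 1,
               "P": 5, "S": 6, "Cl": 1, "Br": 1, "I": 1}


def is_valid_structure(symbols: List[str], coords: List[Tuple[float, float]],
                       bonds: Dict[Tuple[int, int], str], decimals: int = 1):
    """Screen a predicted structure; returns (valid, reason)."""
    if not bonds:
        return False, "no bonds"
    if all(s == "H" for s in symbols):
        return False, "no heavy atoms"
    # Overlapping atoms: fewer unique (x, y) positions than atoms.
    unique = {(round(x, decimals), round(y, decimals)) for x, y in coords}
    if len(unique) < len(coords):
        return False, "overlapping atom coordinates"
    used = [0] * len(symbols)
    for (i, j), kind in bonds.items():
        used[i] += BOND_ORDER[kind]
        used[j] += BOND_ORDER[kind]
    for sym, v in zip(symbols, used):
        limit = MAX_VALENCE.get(sym)  # abbreviations and R groups are not checked
        if limit is not None and v > limit:
            return False, f"valence exceeded on {sym}"
    return True, "ok"
```

Inputs that fail any check would be given the invalid label described above. The overlapping-coordinate case corresponds to the stacked atoms marked by rectangles in FIG. 5.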

S404, if the molecular structure is judged valid, the extracted molecular structure is saved under the name of the input image, in a file format used for computing with and storing molecular structures. If it is judged invalid, the input image is labeled as a non-molecular structure.

As a preferred implementation of this embodiment, the encoder uses a Swin Transformer model, which is among the leading models for many computer vision tasks. The Swin Transformer is a general-purpose computer vision backbone that uses shifted windows and a hierarchical structure to reduce computational complexity. Specifically, the encoder uses the Swin-B model, with 88M parameters in total.

Purpose-built chemical structure images and their corresponding image labels serve as the Swin Transformer training set. The Swin Transformer is trained for thousands of iterations, until it is adequate for chemical structure image recognition.
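The shifted-window idea behind the Swin Transformer can be shown on a small grid. Attention is computed only inside non-overlapping m × m windows. Alternate layers cyclically shift the feature map by m/2, so the next set of windows straddles the previous window boundaries.

This pure-Python sketch covers only the partition and the shift. The attention itself, and the masking Swin applies to wrapped-around regions, are omitted. The function names are illustrative.

```python
def cyclic_shift(grid, shift):
    """Roll a 2-D grid up and left by `shift` cells (a roll with negative shifts)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)] for r in range(h)]


def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows, in row-major order."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid must be divisible by the window size"
    return [
        [row[c:c + m] for row in grid[r:r + m]]
        for r in range(0, h, m)
        for c in range(0, w, m)
    ]


# Usage: a 4 x 4 feature map with 2 x 2 windows, shifted by m // 2 = 1.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(grid, 2)
shifted = window_partition(cyclic_shift(grid, 1), 2)
```

In this example, the first shifted window contains cells 5, 6, 9 and 10, one from each of the four regular windows. This is how information crosses window boundaries, while the cost of attention stays linear in image size.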

The decoder uses a 6-layer Transformer model with 8 attention heads and a hidden dimension of 256. It uses sinusoidal positional encoding and applies a dropout rate of 0.1.
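The sinusoidal positional encoding mentioned above follows the standard Transformer definition:

- PE(pos, 2k) = sin(pos / 10000^(2k/d))
- PE(pos, 2k+1) = cos(pos / 10000^(2k/d))

Here d = 256. Below is a minimal sketch. The stated decoder hyperparameters are collected in a hypothetical `DecoderConfig` class for reference.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    num_layers: int = 6
    num_heads: int = 8
    hidden_dim: int = 256
    dropout: float = 0.1

    @property
    def head_dim(self) -> int:
        # Each attention head works on hidden_dim / num_heads features.
        assert self.hidden_dim % self.num_heads == 0
        return self.hidden_dim // self.num_heads


def sinusoidal_encoding(num_positions: int, d_model: int = 256):
    """Return a num_positions x d_model table of positional encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Paired dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k/d).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table
```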

The bond predictor is a 2-layer feed-forward neural network with a ReLU activation function. It is placed on top of the decoder and has the same hidden dimension as the decoder.
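A 2-layer feed-forward bond predictor can be sketched as follows. Two assumptions are not stated in the text:

- The pair representation is the concatenation of the two atoms' decoder hidden states.
- The logits for (i, j) and (j, i) are averaged, so the predicted bond is symmetric.

The weights here are random purely so the example runs. In the described model they are learned jointly with the encoder and decoder. The bond label set is also hypothetical.

```python
import random

BOND_CLASSES = ["none", "single", "double", "triple"]  # hypothetical label set


def linear(x, rows, bias):
    """Dense layer: one output per weight row."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(rows, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


class BondPredictor:
    def __init__(self, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init(n_out, n_in):
            return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

        # Layer 1: concatenated pair (2 * hidden_dim) -> hidden_dim, followed by ReLU.
        self.w1, self.b1 = init(hidden_dim, 2 * hidden_dim), [0.0] * hidden_dim
        # Layer 2: hidden_dim -> one logit per bond class.
        self.w2, self.b2 = init(len(BOND_CLASSES), hidden_dim), [0.0] * len(BOND_CLASSES)

    def pair_logits(self, hi, hj):
        hidden = relu(linear(hi + hj, self.w1, self.b1))  # list concatenation
        return linear(hidden, self.w2, self.b2)

    def predict(self, atom_states):
        """Return {(i, j): bond_type} for every atom pair predicted to be bonded."""
        bonds = {}
        n = len(atom_states)
        for i in range(n):
            for j in range(i + 1, n):
                a = self.pair_logits(atom_states[i], atom_states[j])
                b = self.pair_logits(atom_states[j], atom_states[i])
                logits = [(x + y) / 2 for x, y in zip(a, b)]  # symmetrize
                k = max(range(len(logits)), key=logits.__getitem__)
                if BOND_CLASSES[k] != "none":
                    bonds[(i, j)] = BOND_CLASSES[k]
        return bonds
```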

In order to verify the effectiveness and superiority of the method proposed in this embodiment, specific recognition cases are provided below:

Case 1: hand-drawn image recognition

Referring to FIG. 2 (input images on the left, recognition results on the right), the method of this embodiment correctly identifies the chemical structures in hand-drawn images of different styles. The positions of the atoms in the predicted molecules are consistent with the input images.

Case 2: abbreviated image recognition

Referring to FIG. 3 (input images on the left, recognition results on the right), the method of this embodiment correctly identifies the chemical structures in images containing abbreviations. It also correctly extracts the abbreviated groups and the chirality in these structures.

Case 3: recognition of images with interference

Referring to FIG. 4 (input images on the left, recognition results on the right), the test images contain non-chemical content attached to the molecular structure. The method of this embodiment excludes the interfering parts of the image and correctly identifies the valid molecular structure.

Case 4: filtering of non-chemical structures

Referring to FIG. 5 (input images on the left, recognition results on the right), the method of this embodiment uses the defined rules to determine whether the input image contains a chemical structure. Images of non-chemical structures are marked with the invalid label and filtered out. The rectangles in the prediction results mark multiple atoms predicted with identical X and Y coordinates, i.e., overlapping atoms.

Embodiment two:

This embodiment provides a recognition system for chemical structural formulas based on a deep learning model. The system comprises:

a training data preparation module, used to prepare training images containing chemical structural formulas, together with their corresponding image labels, as a training sample set. This module implements the function of step S100 in Embodiment one and is not described again here;

a recognition model construction module, used to construct a deep-learning recognition model comprising an encoder, a decoder and a bond predictor, and to pre-train the recognition model on the training sample set. During pre-training, each training image is fed to the encoder to obtain image feature vectors. The feature vectors are fed to the decoder and the bond predictor to obtain prediction information, which comprises the geometric information of the atoms and bonds of the chemical structural formula. The model parameters are adjusted by comparing the prediction information with the image labels. This module implements the function of step S200 in Embodiment one and is not described again here;

a basic information acquisition module, used to acquire a document image containing a target chemical structural formula, feed it into the pre-trained recognition model, and output the predicted atom and bond geometric information. This module implements the function of step S300 in Embodiment one and is not described again here;

a structural formula identification output module, used to identify and output the chemical structural formula based on the predicted atom and bond geometric information. This module implements the function of step S400 in Embodiment one and is not described again here.

As a preferred implementation of this embodiment, the recognition model construction module includes a joint training unit. This unit trains the decoder and the bond predictor with teacher forcing during pre-training. In each iteration, the decoder takes the ground-truth tokens as input and predicts the next token conditioned on the preceding tokens. The bond predictor takes the hidden states of the decoder as input and predicts the bond between each pair of atoms.

Embodiment III:

This embodiment provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the method for identifying a chemical structural formula based on a deep learning model according to any embodiment of the invention.

Embodiment four:

This embodiment provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the method for identifying a chemical structural formula based on a deep learning model according to any embodiment of the invention.

In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more.

"And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean A alone, both A and B, or B alone, where A and B may each be singular or plural.

The character "/" generally indicates an "or" relationship between the associated objects.

"At least one of the following" and similar expressions mean any combination of the listed items, including a single item or multiple items. For example, "at least one of a, b and c" may mean: a; b; c; a and b; a and c; b and c; or a, b and c, where each of a, b and c may be single or multiple.

Those of ordinary skill in the art will appreciate that the units and algorithm steps described in the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the several embodiments provided herein, any of the functions, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.

Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. This may cover the technical solution in essence, the part of it that contributes to the prior art, or part of the technical solution. The software product is stored in a storage medium and includes several instructions that cause a computer device (such as a personal computer, a server or a network device) to perform all or part of the steps of the methods described in the embodiments of the present application.

The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims

1. A method for identifying a chemical structural formula based on a deep learning model, characterized by comprising the following steps:

preparing training images containing chemical structural formulas, together with their corresponding image labels, as a training sample set;

constructing a deep-learning recognition model comprising an encoder, a decoder and a bond predictor;

pre-training the recognition model on the training sample set, wherein during pre-training each training image is fed to the encoder to obtain image feature vectors; the feature vectors are fed to the decoder and the bond predictor to obtain prediction information comprising the geometric information of the atoms and bonds of the chemical structural formula; and the model parameters are adjusted by comparing the prediction information with the image labels;

acquiring a document image containing a target chemical structural formula, feeding it into the pre-trained recognition model, and outputting the predicted geometric information of atoms and bonds;

and identifying and outputting the chemical structural formula based on the predicted atom and bond geometric information.

2. The method for identifying a chemical structural formula based on a deep learning model according to claim 1, wherein the decoder and the bond predictor are trained by teacher forcing during pre-training: in each iteration, the decoder takes the ground-truth tokens as input and predicts the next token conditioned on the preceding tokens; and the bond predictor takes the hidden states of the decoder as input and predicts the bond between each pair of atoms.

3. The method for identifying a chemical structural formula based on a deep learning model according to claim 1, wherein identifying and outputting the chemical structural formula based on the predicted geometric information of atoms and bonds specifically comprises: establishing a molecular structure and corresponding coordinates based on the predicted geometric information of atoms and bonds; reconstructing a standardized molecule from the molecular structure and corresponding coordinates and outputting it; determining whether the output standardized molecule is a valid molecular structure; and, if it is valid, outputting the corresponding molecular structure.

4. The method for identifying a chemical structural formula based on a deep learning model according to claim 3, wherein, in the step of reconstructing and outputting the standardized molecule based on the molecular structure and corresponding coordinates, chemical rules are applied to determine the stereochemistry, and augmentation strategies are designed for specific molecular structures, as follows: for abbreviations, a list of replacement rules for a number of common functional groups is compiled, and functional groups in the training data are randomly replaced with their abbreviations; for R groups, R-group atoms are randomly added to the training data, the R-group atoms being randomly sampled from the list [R, R1, R2, …, R12, X, Y, Z, A, Ar]; for aromaticity, the depiction of each aromatic ring in the training data is randomly chosen between a circle and alternating single and double bonds; and for explicit hydrogens, hydrogen atoms are randomly added to the training data as explicit atoms.

5. The method for identifying a chemical structural formula based on a deep learning model according to claim 3, wherein determining whether the output standardized molecule is a valid molecular structure specifically comprises: determining whether chemical bonds are present in the output standardized molecule; determining whether the number of chemical bonds conforms to chemical rules; and determining, from the number of unique predicted X and Y coordinate values and the predicted heavy-atom types, whether a chemical structure is present, so as to decide whether the output standardized molecule is a valid molecular structure.

6. The method for identifying a chemical structural formula based on a deep learning model according to claim 1, wherein the encoder is a Swin Transformer model; the decoder is a 6-layer Transformer model provided with attention heads; and the bond predictor is a feed-forward neural network.

7. A chemical structural formula recognition system based on a deep learning model, comprising: a training data generation module, configured to generate training images containing chemical structural formulas, together with corresponding image labels, as a training sample set; a recognition model construction module, configured to construct a deep-learning-based recognition model comprising an encoder, a decoder and a bond predictor, and to pre-train the recognition model with the training sample set, wherein during pre-training a training image is input into the encoder to obtain corresponding image feature vectors, the image feature vectors are input into the decoder and the bond predictor to obtain prediction information comprising geometric information of the atoms and bonds of the chemical structural formula, and the model parameters are adjusted by comparing the prediction information with the image labels; a basic information acquisition module, configured to acquire a document image containing a target chemical structural formula, input the document image into the pre-trained recognition model, and output predicted geometric information of atoms and bonds; and a structural formula identification and output module, configured to identify and output the chemical structural formula based on the predicted geometric information of atoms and bonds.

8. The chemical structural formula recognition system based on a deep learning model according to claim 7, wherein the recognition model construction module comprises a joint training unit configured to train the decoder and the bond predictor by teacher forcing during pre-training: in each iteration, the decoder takes the ground-truth tokens as input and predicts the next token conditioned on the preceding tokens; and the bond predictor takes the hidden states of the decoder as input and predicts the bond between each pair of atoms.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for identifying a chemical structural formula based on a deep learning model according to any one of claims 1 to 6.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method for identifying a chemical structural formula based on a deep learning model according to any one of claims 1 to 6.
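Claims 2 and 8 train the decoder by teacher forcing. As a minimal sketch of what that means for a single training example, the Python below builds the decoder input and the target from a ground-truth token sequence. The token format (an atom symbol followed by quantized x and y bins such as `x12`, `y40`) and the `<bos>`/`<eos>` markers are assumptions for illustration only; the patent does not specify its tokenization.

```python
from typing import List, Tuple

BOS, EOS = "<bos>", "<eos>"  # hypothetical start/end markers


def teacher_forcing_pair(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Return (decoder_input, target) for one ground-truth token sequence.

    Under teacher forcing the decoder, at step t, is fed the ground-truth
    tokens 0..t-1 rather than its own earlier predictions, and is trained
    to predict token t.
    """
    seq = [BOS] + list(tokens) + [EOS]
    decoder_input = seq[:-1]  # ground truth shifted right by one position
    target = seq[1:]          # the next token at every position
    return decoder_input, target


if __name__ == "__main__":
    # Hypothetical sequence: each atom is a symbol plus quantized x and y bins.
    tokens = ["C", "x12", "y40", "O", "x30", "y40"]
    decoder_input, target = teacher_forcing_pair(tokens)
    for t, next_token in enumerate(target):
        # The decoder conditions on the ground-truth prefix to predict next_token.
        print(decoder_input[: t + 1], "->", next_token)
```

Because every input position is ground truth, all positions can be scored in one forward pass during training; at inference time the decoder instead feeds back its own predictions one token at a time.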
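The bond predictor of claims 2 and 6 is a feed-forward network applied to the decoder's hidden states. The pure-Python sketch below assumes one common way to do this: take the hidden state at each atom position, concatenate the states of every atom pair, and map the result through a two-layer network to scores over a hypothetical set of bond labels. The label set, layer sizes and random initialization are assumptions, and the weights are untrained, so the predicted labels here are arbitrary.

```python
import random
from typing import List

# Hypothetical bond label set; "none" means no bond between the pair.
BOND_TYPES = ["none", "single", "double", "triple", "aromatic"]

Vector = List[float]


def linear(weights: List[Vector], bias: Vector, x: Vector) -> Vector:
    """Compute W x + b, with W stored as one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class PairwiseBondPredictor:
    """Two-layer feed-forward network scoring the bond between each atom pair."""

    def __init__(self, hidden_dim: int, ffn_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w1, self.b1 = init(ffn_dim, 2 * hidden_dim), [0.0] * ffn_dim
        self.w2, self.b2 = init(len(BOND_TYPES), ffn_dim), [0.0] * len(BOND_TYPES)

    def logits(self, atom_states: List[Vector]) -> List[List[Vector]]:
        """Score every ordered pair (i, j) from the concatenation [h_i; h_j]."""
        return [
            [linear(self.w2, self.b2, relu(linear(self.w1, self.b1, hi + hj)))
             for hj in atom_states]
            for hi in atom_states
        ]

    def predict(self, atom_states: List[Vector]) -> List[List[str]]:
        """Return a symmetric matrix of bond labels (the diagonal is "none")."""
        scores = self.logits(atom_states)
        n = len(atom_states)
        labels = [["none"] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                # Bonds are undirected, so average the (i, j) and (j, i) scores.
                avg = [(a + b) / 2 for a, b in zip(scores[i][j], scores[j][i])]
                best = max(range(len(avg)), key=lambda k: avg[k])
                labels[i][j] = labels[j][i] = BOND_TYPES[best]
        return labels


if __name__ == "__main__":
    # Hypothetical decoder hidden states for three atoms (hidden_dim = 4).
    states = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.4, 0.2], [-0.3, 0.2, 0.1, 0.6]]
    predictor = PairwiseBondPredictor(hidden_dim=4, ffn_dim=8)
    for row in predictor.predict(states):
        print(row)
```

Averaging the two directed scores is one simple way to make the output symmetric; a trained model could equally be given a symmetric input such as the sum of the two hidden states.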
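The first step of claim 3, establishing a molecular structure and coordinates from the predicted atoms and bonds, can be sketched as follows. The sketch assumes a hypothetical decoder output in which each atom is three tokens (symbol, x bin such as `x12`, y bin such as `y40`) plus a pairwise matrix of bond labels; neither format, nor the label-to-bond-order mapping, is specified in the patent. Stereochemistry and standardization are not covered here.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from predicted bond labels to bond orders.
BOND_ORDER: Dict[str, int] = {"single": 1, "double": 2, "triple": 3}

Atom = Tuple[str, int, int]   # (element symbol, x bin, y bin)
Bond = Tuple[int, int, int]   # (atom index i, atom index j, bond order)


def build_molecule(tokens: List[str], bond_labels: List[List[str]]) -> Tuple[List[Atom], List[Bond]]:
    """Assemble atoms with 2D coordinates and an undirected bond list."""
    atoms: List[Atom] = []
    # Assumed format: each atom is three tokens, e.g. "C", "x12", "y40".
    for k in range(0, len(tokens) - 2, 3):
        symbol, x_tok, y_tok = tokens[k : k + 3]
        atoms.append((symbol, int(x_tok[1:]), int(y_tok[1:])))

    bonds: List[Bond] = []
    n = len(atoms)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: bonds are undirected
            label = bond_labels[i][j]
            if label in BOND_ORDER:  # "none" or unknown labels mean no bond
                bonds.append((i, j, BOND_ORDER[label]))
    return atoms, bonds


if __name__ == "__main__":
    tokens = ["C", "x12", "y40", "C", "x20", "y40", "O", "x28", "y40"]
    labels = [
        ["none", "single", "none"],
        ["single", "none", "double"],
        ["none", "double", "none"],
    ]
    atoms, bonds = build_molecule(tokens, labels)
    print(atoms)  # [('C', 12, 40), ('C', 20, 40), ('O', 28, 40)]
    print(bonds)  # [(0, 1, 1), (1, 2, 2)]
```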
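Three of the augmentation strategies in claim 4 (R groups, aromatic ring depiction, explicit hydrogens) can be sketched on a minimal molecule graph, as below. The `Mol` class, the carbon-only valence bookkeeping, the 0.5 probabilities and the `aromatic_style` flag (which would be passed to a hypothetical image renderer) are all assumptions. Abbreviation replacement is omitted because it needs substructure matching against the functional-group rule list, which the patent does not give.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# R-group labels from claim 4: R, R1..R12, X, Y, Z, A, Ar.
R_GROUP_LABELS = ["R"] + [f"R{k}" for k in range(1, 13)] + ["X", "Y", "Z", "A", "Ar"]


@dataclass
class Mol:
    """Hypothetical minimal molecule: atom labels and (i, j, order) bonds."""
    atoms: List[str]
    bonds: List[Tuple[int, int, int]]
    aromatic_style: str = "kekule"  # "kekule" (alternating bonds) or "circle"


def free_valence(mol: Mol, i: int, max_valence: int = 4) -> int:
    """Remaining valence of atom i, assuming a carbon-like maximum of 4."""
    return max_valence - sum(order for a, b, order in mol.bonds if i in (a, b))


def add_r_group(mol: Mol, rng: random.Random) -> Mol:
    """Attach one randomly sampled R-group label to a carbon with free valence."""
    sites = [i for i, a in enumerate(mol.atoms) if a == "C" and free_valence(mol, i) > 0]
    if not sites:
        return mol
    anchor, label = rng.choice(sites), rng.choice(R_GROUP_LABELS)
    new_index = len(mol.atoms)
    return Mol(mol.atoms + [label], mol.bonds + [(anchor, new_index, 1)], mol.aromatic_style)


def add_explicit_hydrogens(mol: Mol, rng: random.Random, p: float = 0.5) -> Mol:
    """With probability p per carbon, draw one implicit hydrogen as an explicit H atom."""
    atoms, bonds = list(mol.atoms), list(mol.bonds)
    for i, a in enumerate(mol.atoms):
        if a == "C" and free_valence(mol, i) > 0 and rng.random() < p:
            atoms.append("H")
            bonds.append((i, len(atoms) - 1, 1))
    return Mol(atoms, bonds, mol.aromatic_style)


def augment(mol: Mol, rng: random.Random) -> Mol:
    """Apply the sketched augmentations to one training molecule."""
    if rng.random() < 0.5:
        mol = add_r_group(mol, rng)
    # Aromatic rings: randomly depict as a circle or as alternating single/double bonds.
    mol = Mol(list(mol.atoms), list(mol.bonds), rng.choice(["kekule", "circle"]))
    if rng.random() < 0.5:
        mol = add_explicit_hydrogens(mol, rng)
    return mol


if __name__ == "__main__":
    benzene = Mol(["C"] * 6, [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)])
    rng = random.Random(7)
    for _ in range(3):
        print(augment(benzene, rng))
```

Each augmented molecule would then be rendered to a new training image, so the model sees the same structure drawn in several conventions.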
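Claim 5 lists three validity checks. The sketch below implements one possible reading of them for a predicted molecule given as atoms with (x, y) coordinates and a bond list: reject predictions with no bonds, reject any atom whose bond orders exceed a maximum valence, and reject predictions that contain no heavy atoms or place two heavy atoms at the same position. The valence table and this interpretation of the coordinate check are assumptions; the claim does not define them precisely.

```python
from typing import List, Tuple

# Hypothetical maximum valences for a few common elements.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1, "S": 6, "P": 5}

Atom = Tuple[str, float, float]  # (element symbol, x, y)
Bond = Tuple[int, int, int]      # (atom index i, atom index j, bond order)


def is_valid_prediction(atoms: List[Atom], bonds: List[Bond]) -> bool:
    """Apply the three checks of claim 5 under one possible interpretation."""
    n = len(atoms)
    # 1. At least one chemical bond must be present, and every bond must be well formed.
    if not bonds:
        return False
    if any(i == j or not (0 <= i < n and 0 <= j < n) for i, j, _ in bonds):
        return False

    # 2. The bond orders on each atom must not exceed its maximum valence.
    used = [0] * n
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    for (symbol, _, _), valence in zip(atoms, used):
        # Labels without an entry (e.g. R groups) are not checked.
        if symbol in MAX_VALENCE and valence > MAX_VALENCE[symbol]:
            return False

    # 3. Heavy (non-hydrogen) atoms must exist and occupy distinct (x, y) positions.
    heavy = [(x, y) for symbol, x, y in atoms if symbol != "H"]
    return len(heavy) > 0 and len(set(heavy)) == len(heavy)


if __name__ == "__main__":
    ethanol = [("C", 0.0, 0.0), ("C", 1.0, 0.0), ("O", 2.0, 0.0)]
    print(is_valid_prediction(ethanol, [(0, 1, 1), (1, 2, 1)]))  # True
    print(is_valid_prediction(ethanol, []))                      # False: no bonds
    print(is_valid_prediction(ethanol, [(1, 2, 3)]))             # False: O exceeds valence 2
```

A production pipeline would more likely delegate the valence check to a cheminformatics toolkit's sanitization step; the table here only keeps the sketch self-contained.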