CN108334839B - Chemical information identification method based on deep learning image identification technology - Google Patents

Info

Publication number
CN108334839B
Authority
CN
China
Prior art keywords
atoms
chemical
deep learning
identified
chemical bond
Prior art date
Legal status
Active
Application number
CN201810098220.0A
Other languages
Chinese (zh)
Other versions
CN108334839A (en)
Inventor
井建军
魏凯
郑成伟
黄麒展
张帅
刘威
李勇
Current Assignee
Qingdao Qingyuan Precision Agriculture Technology Co ltd
Original Assignee
Qingdao Qingyuan Precision Agriculture Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Qingyuan Precision Agriculture Technology Co ltd
Priority to CN201810098220.0A
Publication of CN108334839A
Priority to PCT/CN2018/105414 (WO2019148852A1)
Application granted
Publication of CN108334839B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/457Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by analysing connectivity, e.g. edge linking, connected component analysis or slices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/22Character recognition characterised by the type of writing
    • G06V30/226Character recognition characterised by the type of writing of cursive writing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of image recognition, and particularly relates to a chemical information recognition method based on deep learning image recognition technology. The method comprises the following steps: (1) identifying nodes in the input image using a node target recognizer; (2) recognizing the text content of each node identified in step (1) using a handwritten font target recognizer, thereby determining the specific atom corresponding to the node; (3) combining the recognized atoms pairwise and recognizing the chemical bond between the two atoms of each pair using a chemical bond target recognizer; (4) looking up the attributes of the identified atoms in a database, calculating the related attributes of the structural formula, and outputting them; or storing the identified atoms and the chemical bonds between them as a file in the custom king format and outputting the file; or drawing the identified atoms and chemical bonds in a new picture and outputting the picture. The invention addresses the problem of recognizing chemical structural formulas or reaction formulas in hand-drawn sketches and pictures, and can be widely applied in the daily work of chemists.

Description

Chemical information identification method based on deep learning image identification technology
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a chemical information recognition method based on a deep learning image recognition technology.
Background
At present, deep learning is widely applied to image recognition; its main application scenarios are face recognition, license plate recognition, common object recognition, and plant recognition. However, deep learning image recognition technology has not been applied to recognizing images of chemical structural formulas or reaction formulas.
Disclosure of Invention
To solve the problem of recognizing chemical structural formulas or reaction formulas in hand-drawn sketches and pictures, the invention aims to let a user photograph a chemical structural formula or reaction formula drawn with a drawing tool or on paper, upload the picture, and obtain the composition of the corresponding structural formula or reaction formula together with the related attributes of the structural formula.
In order to achieve the above object, the present invention provides a chemical information recognition method based on a deep learning image recognition technology, which includes the following steps:
(1) identifying nodes in an input image using a node target recognizer based on deep learning image recognition technology;
(2) recognizing the text content of each node identified in step (1) using a handwritten font target recognizer based on deep learning image recognition technology, thereby determining the specific atom corresponding to the node;
(3) combining the recognized atoms pairwise, and recognizing the chemical bond between the two atoms of each pair using a chemical bond target recognizer based on deep learning image recognition technology;
(4) looking up the attributes of the identified atoms in a database, the attributes including relative atomic mass, isotope masses and abundances, and common valences, then calculating the related attributes of the structural formula and outputting them;
or storing the identified atoms and the chemical bonds between them as a file in the custom king format and outputting the file;
or drawing the identified atoms and the chemical bonds between them in a new picture and outputting the picture.
In addition, the method also comprises the following steps:
(5) recognizing arrows in the input image using an arrow target recognizer based on deep learning image recognition technology;
then storing the identified arrows, together with the atoms and chemical bonds identified in steps (2) and (3), as a file in the custom king format and outputting the file;
or drawing the identified arrows, together with the atoms and chemical bonds identified in steps (2) and (3), in a new picture and outputting the picture.
The target recognizers of steps (1), (3) and (5) are trained offline in advance using the fast-rcnn algorithm proposed by Ross Girshick's team, and are used to recognize arrows, atoms and their spatial coordinates, and chemical bonds in the image.
The handwritten font target recognizer of step (2) is trained offline in advance using a LeNet model with Caffe, and is used to recognize text content in the image.
Preferably, the target recognizers are trained offline using an image set.
The image set used to train the target recognizers includes: (a) pictures of handwritten characters; (b) nodes connected by multiple chemical bonds of multiple types; (c) common chemical bonds such as single, double, and triple bonds; (d) arrow pictures commonly used in chemistry.
More preferably, image set (a) is used to train a handwritten character recognizer in the LeNet model, which determines whether a node is an element of the periodic table, plain text, or an implicit "carbon" atom that is not drawn.
Image set (b) is used to train a node target recognizer with the fast-rcnn algorithm, which determines all nodes in the image and their spatial coordinates.
Image set (c) is used to train a chemical bond target recognizer with the fast-rcnn algorithm, which determines whether a chemical bond exists between atoms and, if so, its type.
Image set (d) is used to train an arrow target recognizer with the fast-rcnn algorithm, which determines whether an arrow is present in the input image and, if so, its spatial coordinates.
Step (3) comprises the following steps:
step (31): combining all identified atoms pairwise, using the chemical bond target recognizer to determine whether the two atoms of each pair are joined by a chemical bond, and identifying the bond type when a bond is present;
step (32): adding an association between the two atoms according to whether a bond was found and its type, the association type being the identified bond type.
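The pairwise procedure of steps (31) and (32) can be sketched as below. This is a minimal illustration, not the patented implementation: `classify` is a hypothetical callable standing in for the trained chemical bond target recognizer, and `distance_stub` is a made-up placeholder that only looks at coordinates so the example can run without a model. The atom tuple layout and the bond type names are also assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

Atom = Tuple[str, float, float]  # (element symbol, x, y); layout is an assumption
BondClassifier = Callable[[Atom, Atom], Optional[str]]


def find_bonds(atoms: List[Atom], classify: BondClassifier) -> Dict[Tuple[int, int], str]:
    """Test every unordered atom pair and record an association for each bond found."""
    bonds: Dict[Tuple[int, int], str] = {}
    # enumerate(..., start=1) gives 1-based atom ordinals
    for (i, a), (j, b) in combinations(enumerate(atoms, start=1), 2):
        bond_type = classify(a, b)  # None means "no bond between this pair"
        if bond_type is not None:
            bonds[(i, j)] = bond_type  # association type = recognized bond type
    return bonds


def distance_stub(a: Atom, b: Atom) -> Optional[str]:
    """Placeholder for the trained recognizer: bonds atoms closer than 1.6 units."""
    dist = ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5
    return "single" if dist < 1.6 else None


atoms = [("C", 0.0, 0.0), ("C", 1.5, 0.0), ("O", 3.0, 0.0)]
bonds = find_bonds(atoms, distance_stub)
print(bonds)
```

With n recognized atoms this examines n(n-1)/2 pairs, so the cost of the bond step grows quadratically with the number of atoms.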
Calculating the related attributes of the structural formula in step (4) comprises:
step (41): based on the atoms and the chemical bonds between them, automatically adding hydrogen atoms so that each atom has a stable outermost shell of 8 electrons, counting the atom types and the number of atoms of each type, and generating the molecular formula of the chemical structural formula;
step (42): converting the structural formula, based on the atoms and the chemical bonds between them, into a SMILES string according to the public SMILES format;
step (43): looking up the English name corresponding to the chemical structural formula in the database using the SMILES string;
step (44): calculating the exact molecular mass and relative molecular mass of the molecular formula, and the mass-to-charge ratios with their corresponding abundances.
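Step (41) can be illustrated with the following sketch. It rests on stated assumptions: the stable-shell rule is approximated by a small table of common valences (`COMMON_VALENCE`, chosen here for illustration), bond types are named `single`, `double`, and `triple`, and the formula is written in Hill order (carbon, then hydrogen, then the rest alphabetically). None of these details are specified by the source.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed common valences; the source only says hydrogen is added until the
# outermost shell is stable.
COMMON_VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_ORDER = {"single": 1, "double": 2, "triple": 3}


def implicit_hydrogens(atoms: List[str], bonds: Dict[Tuple[int, int], str]) -> List[int]:
    """Return the number of hydrogens to add to each atom (atoms are 1-indexed in bonds)."""
    used = [0] * len(atoms)
    for (i, j), bond_type in bonds.items():
        order = BOND_ORDER[bond_type]
        used[i - 1] += order
        used[j - 1] += order
    return [max(COMMON_VALENCE[sym] - u, 0) for sym, u in zip(atoms, used)]


def molecular_formula(atoms: List[str], h_counts: List[int]) -> str:
    """Count atoms per element and write the formula in Hill order."""
    counts = Counter(atoms)
    counts["H"] += sum(h_counts)
    symbols = sorted(s for s in counts if counts[s])
    if "C" in symbols:
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if counts["H"] else []) + rest
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in symbols)


# Acetic acid skeleton: C-C(=O)-O
atoms = ["C", "C", "O", "O"]
bonds = {(1, 2): "single", (2, 3): "double", (2, 4): "single"}
h_counts = implicit_hydrogens(atoms, bonds)
formula = molecular_formula(atoms, h_counts)
print(h_counts, formula)  # [3, 0, 0, 1] C2H4O2
```

A fixed valence table is the simplest choice and does not cover charged atoms or elements with several common valences; a fuller implementation would need extra rules for those cases.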
In step (44), the exact molecular mass is obtained by adding, for every atom in the molecular formula, the atomic mass of that element's most abundant isotope; the relative molecular mass is obtained by adding the relative atomic masses of all atoms in the molecular formula; and the abundances corresponding to the mass-to-charge ratios are obtained from the coefficients of the expansion of $(a + b)^n$, where $a$ and $b$ represent isotopes of the same element and $n$ represents the number of atoms of that element in the molecule.
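The two mass calculations of step (44) might look like the sketch below. The lookup tables are stand-ins for the database mentioned in step (4) and hold only a few elements; the Cl isotope mass is the value given in this document, and the remaining numbers are standard reference values rounded for illustration. The function names are hypothetical.

```python
import re
from typing import Dict

# Mass of each element's most abundant isotope (used for the exact mass).
MOST_ABUNDANT_ISOTOPE_MASS = {"H": 1.00782503, "C": 12.0, "O": 15.99491462, "Cl": 34.96885}
# Relative atomic masses (abundance-weighted averages over all isotopes).
RELATIVE_ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45}


def parse_formula(formula: str) -> Dict[str, int]:
    """Split a flat formula such as 'C2H4O2' into element counts (no brackets supported)."""
    counts: Dict[str, int] = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(digits or 1)
    return counts


def total_mass(formula: str, table: Dict[str, float]) -> float:
    """Sum the per-atom masses from the given table over all atoms in the formula."""
    return sum(table[symbol] * n for symbol, n in parse_formula(formula).items())


exact = total_mass("C2H4O2", MOST_ABUNDANT_ISOTOPE_MASS)
relative = total_mass("C2H4O2", RELATIVE_ATOMIC_MASS)
print(round(exact, 4), round(relative, 3))  # 60.0211 60.052
```

The same function serves both quantities; only the table changes, which mirrors the way the text defines the two masses.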
The custom king file in step (4) is a text file encoded in UTF-8; online structural formula editors can parse the file content automatically, and the content can be edited again in the editor.
Compared with the prior art, the invention has the following beneficial effects: by recognizing nodes, atoms, chemical bonds, and arrows, the invention converts a hand-drawn chemical structural formula or reaction formula, or one contained in a picture, into a structural formula or reaction formula that a computer can process, and derives the related attributes of the structural formula by calculation. It can therefore be widely applied in the daily work of chemists, for example in structural formula editors and Word documents, saving drawing time and improving working efficiency.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a diagram illustrating a custom king format according to the present invention.
Detailed Description
The invention is further explained below with reference to specific embodiments and the accompanying drawings.
Example 1
As shown in fig. 1, a chemical information recognition method based on a deep learning image recognition technology includes the following steps:
step 1, recognizing arrows in the input image using an arrow target recognizer based on deep learning image recognition technology;
step 2, identifying nodes in the input image using a node target recognizer based on deep learning image recognition technology;
step 3, recognizing the text content of each node identified in step 2 using a handwritten font target recognizer based on deep learning image recognition technology, thereby determining the specific atom corresponding to the node;
step 4, combining the recognized atoms pairwise, and recognizing the chemical bond between the two atoms of each pair using a chemical bond target recognizer based on deep learning image recognition technology;
step 5, looking up the attributes of the identified atoms in a database, including relative atomic mass, isotope masses and abundances, and common valences, then calculating the related attributes of the structural formula and outputting them;
or step 6, storing the identified arrows, atoms, and chemical bonds between atoms as a file in the custom king format and outputting the file;
or step 7, drawing the identified arrows, atoms, and chemical bonds between atoms in a new picture and outputting the picture.
The target recognizers of steps 1, 2 and 4 are trained offline in advance using the fast-rcnn algorithm proposed by Ross Girshick's team, and are used to recognize atoms and their spatial coordinates and to recognize chemical bonds in the image. The handwritten font recognizer of step 3 is trained offline in advance using the LeNet model of Caffe, and is used to recognize text content in the image. The recognizers are trained offline using an image set, which comprises:
(a) pictures of handwritten characters, used to train a handwritten character recognizer in the LeNet model of Caffe, which determines whether a node is an element of the periodic table, plain text, or an implicit "carbon" atom that is not drawn;
(b) nodes connected by multiple chemical bonds of multiple types, used to train a node target recognizer with the fast-rcnn algorithm proposed by Ross Girshick's team, which determines all nodes in the image and their spatial coordinates;
(c) common chemical bonds such as single, double, and triple bonds, used to train a chemical bond target recognizer with the fast-rcnn algorithm proposed by Ross Girshick's team, which determines whether a chemical bond exists between atoms and, if so, its type;
(d) arrow pictures commonly used in chemistry, used to train an arrow target recognizer with the fast-rcnn algorithm proposed by Ross Girshick's team, which determines whether an arrow is present in the input image and, if so, its spatial coordinates.
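The way steps 2 and 3 fit together, with each detected node resolved to an element, plain text, or an implicit carbon, could be wired up roughly as follows. This is a sketch only: `detect_nodes` and `read_label` are hypothetical callables standing in for the trained node recognizer and handwritten character recognizer, and the `kind` strings and tuple layouts are invented for the example.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Node(NamedTuple):
    x: float
    y: float


class Label(NamedTuple):
    kind: str  # assumed values: "element", "text", "implicit_carbon"
    text: str


def resolve_nodes(
    image: object,
    detect_nodes: Callable[[object], Iterable[Node]],
    read_label: Callable[[object, Node], Label],
) -> Tuple[List[tuple], List[tuple]]:
    """Split detected nodes into atoms (symbol, x, y) and plain texts (text, x, y)."""
    atoms, texts = [], []
    for node in detect_nodes(image):
        label = read_label(image, node)
        if label.kind == "implicit_carbon":
            atoms.append(("C", node.x, node.y))  # undrawn carbon at a line junction
        elif label.kind == "element":
            atoms.append((label.text, node.x, node.y))
        else:
            texts.append((label.text, node.x, node.y))
    return atoms, texts


# Stub recognizers so the sketch runs without any trained model.
fake_nodes = [Node(0, 0), Node(1, 0), Node(2, 0), Node(5, 5)]
fake_labels = {
    Node(0, 0): Label("implicit_carbon", ""),
    Node(1, 0): Label("implicit_carbon", ""),
    Node(2, 0): Label("element", "O"),
    Node(5, 5): Label("text", "reflux"),
}
atoms, texts = resolve_nodes(None, lambda img: fake_nodes, lambda img, n: fake_labels[n])
print(atoms, texts)
```

Passing the recognizers in as callables keeps the control flow testable on its own, independently of how the models are trained or served.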
Step 4 specifically comprises the following steps:
step 41, combining all identified atoms pairwise, using the chemical bond target recognizer to determine whether the two atoms of each pair are joined by a chemical bond, and identifying the bond type when a bond is present;
step 42, adding an association between the two atoms according to whether a bond was found and its type, the association type being the identified bond type.
Calculating the related attributes of the structural formula in step 5 comprises:
step 51, based on the atoms and the chemical bonds between them, automatically adding hydrogen atoms so that each atom has a stable outermost shell of 8 electrons, and counting the atom types and the number of atoms of each type to generate the molecular formula of the chemical structural formula;
step 52, converting the structural formula, based on the atoms and the chemical bonds between them, into a SMILES string according to the public SMILES format;
step 53, looking up the English name corresponding to the chemical structural formula in the database using the SMILES string;
step 54, calculating the exact molecular mass and relative molecular mass of the molecular formula, and the mass-to-charge ratios with their corresponding abundances. The exact molecular mass is obtained by adding, for every atom in the molecular formula, the atomic mass of that element's most abundant isotope. The relative molecular mass is obtained by adding the relative atomic masses of all atoms in the molecular formula. The abundances corresponding to the mass-to-charge ratios are obtained from the coefficients of the expansion of $(a + b)^n$, where $a$ and $b$ represent isotopes of the same element and $n$ represents the number of atoms of that element in the molecule.
For example, the element chlorine (Cl) has the isotopes $^{35}\mathrm{Cl}$ (mass 34.96885) and $^{37}\mathrm{Cl}$ (mass 36.9659), with abundances of 75.78% and 24.22% respectively. For the molecular formula $\mathrm{Cl}_2$, the mass-to-charge ratios and corresponding abundances are calculated from $({}^{35}\mathrm{Cl} + {}^{37}\mathrm{Cl})^2$, whose expansion is $({}^{35}\mathrm{Cl})^2 + 2\,{}^{35}\mathrm{Cl}\,{}^{37}\mathrm{Cl} + ({}^{37}\mathrm{Cl})^2$. There are therefore three mass-to-charge ratios m/z:
$^{35}\mathrm{Cl} + {}^{35}\mathrm{Cl}$: 34.96885 + 34.96885 = 69.9377
$^{35}\mathrm{Cl} + {}^{37}\mathrm{Cl}$: 34.96885 + 36.9659 = 71.93475
$^{37}\mathrm{Cl} + {}^{37}\mathrm{Cl}$: 36.9659 + 36.9659 = 73.9318
The corresponding abundances are:
$^{35}\mathrm{Cl} \times {}^{35}\mathrm{Cl}$: 75.78% × 75.78% = 0.57426084
$2 \times {}^{35}\mathrm{Cl} \times {}^{37}\mathrm{Cl}$: 2 × 75.78% × 24.22% = 0.36707832
$^{37}\mathrm{Cl} \times {}^{37}\mathrm{Cl}$: 24.22% × 24.22% = 0.05866084
The corresponding abundances after normalization, with the largest abundance scaled to 100%, are shown in Table 1:
Table 1. Normalized abundances for the molecular formula $\mathrm{Cl}_2$
m/z         Relative abundance
69.9377     100%
71.93475    63.9%
73.9318     10.2%
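The chlorine example can be reproduced with a short binomial-expansion sketch. It handles only the case the text describes, one element with two isotopes, and the function name and return layout are assumptions made for illustration. Masses are summed directly as in the text, which corresponds to treating the charge as 1.

```python
from math import comb
from typing import List, Tuple


def two_isotope_pattern(
    mass_a: float, abund_a: float, mass_b: float, abund_b: float, n: int
) -> List[Tuple[float, float]]:
    """Expand (a + b)^n: term k has coefficient C(n, k) and contains k atoms of isotope b."""
    peaks = []
    for k in range(n + 1):
        mz = (n - k) * mass_a + k * mass_b
        prob = comb(n, k) * abund_a ** (n - k) * abund_b ** k
        peaks.append((mz, prob))
    top = max(prob for _, prob in peaks)
    # Normalize so that the most abundant peak is 100%.
    return [(round(mz, 5), round(100 * prob / top, 1)) for mz, prob in peaks]


# Cl2 with the isotope data quoted in the text.
pattern = two_isotope_pattern(34.96885, 0.7578, 36.9659, 0.2422, 2)
for mz, percent in pattern:
    print(mz, percent)
```

Run on the Cl2 data this yields the three peaks of Table 1 (100%, 63.9%, and 10.2%). Molecules containing several multi-isotope elements would require combining one such distribution per element, which this sketch does not attempt.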
In step 6, the custom king file is a text file encoded in UTF-8; online structural formula editors can parse the file content automatically, and the content can be edited again in the editor. The format is shown in FIG. 2.
The Atom Block in FIG. 2 stores atoms and has the following format:
Begin Atom
Index Type x y HCount
End Atom
Here, when multiple atoms are identified, multiple records of the same format are added between Begin Atom and End Atom. Index is the ordinal number, increasing from 1; Type is the element symbol, for example "C"; x is the x coordinate of the atom in the plane; y is the y coordinate of the atom in the plane; HCount is the number of hydrogen atoms attached to the atom.
The Bond Block in FIG. 2 stores the chemical bonds between atoms and has the following format:
Begin Bond
Index Type Atom1index Atom2index
End Bond
Here, when multiple chemical bonds are identified, multiple records of the same format are added between Begin Bond and End Bond. Index is the ordinal number, increasing from 1; Type is the chemical bond type; Atom1index is the ordinal number, in the Atom Block, of one of the two connected atoms; Atom2index is the ordinal number, in the Atom Block, of the other connected atom.
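Reading the Atom and Bond blocks back into memory might be done as in the sketch below. Several details are assumptions, since FIG. 2 is not reproduced here: each record is taken to sit on its own line with whitespace-separated fields, and the bond Type is treated as an opaque string (the value `single` in the sample is invented). The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_king_blocks(text: str) -> Dict[str, List[List[str]]]:
    """Group the whitespace-split records of each Begin/End section by block name."""
    blocks: Dict[str, List[List[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Begin "):
            current = line[len("Begin "):]
            blocks[current] = []
        elif line.startswith("End "):
            current = None
        elif current is not None:
            blocks[current].append(line.split())
    return blocks


def load_atoms_and_bonds(text: str) -> Tuple[list, list]:
    """Convert Atom records to (index, type, x, y, hcount) and Bond records to (index, type, a1, a2)."""
    blocks = parse_king_blocks(text)
    atoms = [(int(i), t, float(x), float(y), int(h)) for i, t, x, y, h in blocks.get("Atom", [])]
    bonds = [(int(i), t, int(a1), int(a2)) for i, t, a1, a2 in blocks.get("Bond", [])]
    return atoms, bonds


SAMPLE = """\
Begin Atom
1 C 0.0 0.0 3
2 C 1.5 0.0 2
3 O 3.0 0.0 1
End Atom
Begin Bond
1 single 1 2
2 single 2 3
End Bond
"""
atoms, bonds = load_atoms_and_bonds(SAMPLE)
print(atoms[2], bonds[0])
```

Because bonds refer to atoms by ordinal number, the atom list must be read before the bond records can be resolved to actual atoms.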
The Text Block in FIG. 2 stores plain text information and has the following format:
Begin Text
Index x y Text
End Text
Here, when multiple plain texts are recognized, multiple records of the same format are added between Begin Text and End Text. Index is the ordinal number, increasing from 1; x is the x coordinate of the plain text in the plane; y is the y coordinate of the plain text in the plane; Text is the content of the plain text.
The Shape Block in FIG. 2 stores arrow information and has the following format:
Begin Shape
Index x1,y1;x2,y2
End Shape
Here, when multiple arrows are recognized, multiple records of the same format are added between Begin Shape and End Shape. Index is the ordinal number, increasing from 1; x1 is the x coordinate of the arrow's start point in the plane; y1 is the y coordinate of the arrow's start point in the plane; x2 is the x coordinate of the arrow's end point in the plane; y2 is the y coordinate of the arrow's end point in the plane.
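Putting the four blocks together, a writer for the king format could look like the following sketch. The block order, the one-record-per-line layout, the number formatting, and the bond type string are assumptions; only the field order within each record and the UTF-8 encoding come from the description above. `to_king` is a hypothetical name.

```python
from typing import List, Tuple


def to_king(
    atoms: List[Tuple[str, float, float, int]],        # (Type, x, y, HCount)
    bonds: List[Tuple[str, int, int]],                 # (Type, Atom1index, Atom2index)
    texts: List[Tuple[float, float, str]],             # (x, y, Text)
    arrows: List[Tuple[float, float, float, float]],   # (x1, y1, x2, y2)
) -> str:
    """Serialize recognized items into Atom, Bond, Text, and Shape blocks."""
    lines = ["Begin Atom"]
    lines += [f"{i} {t} {x} {y} {h}" for i, (t, x, y, h) in enumerate(atoms, 1)]
    lines += ["End Atom", "Begin Bond"]
    lines += [f"{i} {t} {a1} {a2}" for i, (t, a1, a2) in enumerate(bonds, 1)]
    lines += ["End Bond", "Begin Text"]
    lines += [f"{i} {x} {y} {s}" for i, (x, y, s) in enumerate(texts, 1)]
    lines += ["End Text", "Begin Shape"]
    lines += [f"{i} {x1},{y1};{x2},{y2}" for i, (x1, y1, x2, y2) in enumerate(arrows, 1)]
    lines.append("End Shape")
    return "\n".join(lines) + "\n"


# Methanol next to a reaction arrow, with a free-text caption.
king_text = to_king(
    atoms=[("C", 0.0, 0.0, 3), ("O", 1.5, 0.0, 1)],
    bonds=[("single", 1, 2)],
    texts=[(0.0, 2.0, "MeOH")],
    arrows=[(3.0, 0.0, 6.0, 0.0)],
)
payload = king_text.encode("utf-8")  # the file is stored as UTF-8 text
print(king_text)
```

Index values are generated while writing rather than stored with the items, which keeps them consecutive and starting from 1 as the format requires.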
Of course, the foregoing is only a preferred embodiment of the invention and should not be taken as limiting the scope of the embodiments of the invention. The present invention is not limited to the above examples, and equivalent changes and modifications made by those skilled in the art within the spirit and scope of the present invention should be construed as being included in the scope of the present invention.

Claims (10)

1. A chemical information identification method based on a deep learning image identification technology is characterized by comprising the following steps:
(1) identifying nodes in an input image using a node target recognizer based on deep learning image recognition technology;
(2) recognizing the text content of each node identified in step (1) using a handwritten font target recognizer based on deep learning image recognition technology, thereby determining the specific atom corresponding to the node;
(3) combining the recognized atoms pairwise, and recognizing the chemical bond between the two atoms of each pair using a chemical bond target recognizer based on deep learning image recognition technology;
(4) looking up the attributes of the identified atoms in a database, calculating the related attributes of the structural formula, and outputting them;
or storing the identified atoms and the chemical bonds between them as a file in the custom king format and outputting the file;
or drawing the identified atoms and the chemical bonds between them in a new picture and outputting the picture.
2. The chemical information identification method based on the deep learning image identification technology according to claim 1, characterized by further comprising the following steps:
(5) recognizing arrows in the input image using an arrow target recognizer based on deep learning image recognition technology;
then storing the identified arrows, together with the atoms and chemical bonds identified in steps (2) and (3), as a file in the custom king format and outputting the file;
or drawing the identified arrows, together with the atoms and chemical bonds identified in steps (2) and (3), in a new picture and outputting the picture.
3. The chemical information identification method based on the deep learning image identification technology according to claim 2, wherein the target recognizers based on deep learning image recognition technology in steps (1), (2), (3) and (5) are obtained by offline training performed in advance using deep learning image recognition technology.
4. The method of claim 3, wherein the offline training of the target recognizers comprises training the target recognizers offline using an image set.
5. The method of claim 4, wherein the image set used to train the target recognizers comprises: (a) pictures of handwritten characters; (b) nodes connected by multiple chemical bonds of multiple types; (c) common chemical bonds; (d) arrow pictures commonly used in chemistry.
6. The method of claim 5, wherein image set (a) is used to train a handwritten character recognizer in the LeNet model for determining whether a node is an element of the periodic table, plain text, or an implicit "carbon" atom that is not drawn;
image set (b) is used to train a node target recognizer with the fast-rcnn algorithm for determining all nodes in the image and their spatial coordinates;
image set (c) is used to train a chemical bond target recognizer with the fast-rcnn algorithm for determining whether a chemical bond exists between atoms and, if so, its type;
image set (d) is used to train an arrow target recognizer with the fast-rcnn algorithm for determining whether an arrow is present in the input image and, if so, its spatial coordinates.
7. The chemical information identification method based on the deep learning image identification technology according to any one of claims 1 to 6, wherein step (3) specifically comprises the following steps:
step (31): combining all identified atoms pairwise, using the chemical bond target recognizer to determine whether the two atoms of each pair are joined by a chemical bond, and identifying the bond type when a bond is present;
step (32): adding an association between the two atoms according to whether a bond was found and its type, the association type being the identified bond type.
8. The chemical information identification method based on the deep learning image identification technology according to any one of claims 1 to 6, wherein calculating the related attributes of the structural formula in step (4) comprises:
step (41): based on the atoms and the chemical bonds between them, automatically adding hydrogen atoms so that each atom has a stable outermost shell of 8 electrons, counting the atom types and the number of atoms of each type, and generating the molecular formula of the chemical structural formula;
step (42): converting the structural formula, based on the atoms and the chemical bonds between them, into a SMILES string according to the public SMILES format;
step (43): looking up the English name corresponding to the chemical structural formula in the database using the SMILES string;
step (44): calculating the exact molecular mass and relative molecular mass of the molecular formula, and the mass-to-charge ratios with their corresponding abundances.
9. The chemical information identification method based on the deep learning image identification technology according to claim 8, wherein in step (44) the exact molecular mass is obtained by adding, for every atom in the molecular formula, the atomic mass of that element's most abundant isotope; the relative molecular mass is obtained by adding the relative atomic masses of all atoms in the molecular formula; and the abundances corresponding to the mass-to-charge ratios are obtained from the coefficients of the expansion of $(a + b)^n$, where $a$ and $b$ represent isotopes of the same element and $n$ represents the number of atoms of that element in the molecule.
10. The chemical information identification method based on the deep learning image identification technology according to any one of claims 1 to 6, wherein the custom king file in step (4) is a text file encoded in UTF-8.
CN201810098220.0A 2018-01-31 2018-01-31 Chemical information identification method based on deep learning image identification technology Active CN108334839B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810098220.0A CN108334839B (en) 2018-01-31 2018-01-31 Chemical information identification method based on deep learning image identification technology
PCT/CN2018/105414 WO2019148852A1 (en) 2018-01-31 2018-09-13 Chemical information identification method based on deep learning image identification technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810098220.0A CN108334839B (en) 2018-01-31 2018-01-31 Chemical information identification method based on deep learning image identification technology

Publications (2)

Publication Number Publication Date
CN108334839A (en) 2018-07-27
CN108334839B (en) 2021-09-14

Family

ID=62927657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810098220.0A Active CN108334839B (en) 2018-01-31 2018-01-31 Chemical information identification method based on deep learning image identification technology

Country Status (2)

Country Link
CN (1) CN108334839B (en)
WO (1) WO2019148852A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334839B (en) * 2018-01-31 2021-09-14 青岛清原精准农业科技有限公司 Chemical information identification method based on deep learning image identification technology
CN110413740B (en) * 2019-08-06 2022-10-14 百度在线网络技术(北京)有限公司 Query method and device of chemical expression, electronic equipment and storage medium
WO2021125206A1 (en) 2019-12-16 2021-06-24 富士フイルム株式会社 Image analysis device, image analysis method, and program
EP3937106A1 (en) * 2020-07-08 2022-01-12 Tata Consultancy Services Limited System and method of extraction of information and graphical representation for design of formulated products
CN111897987B (en) * 2020-07-10 2022-05-31 山西大学 Molecular structure diagram retrieval method based on evolution calculation multi-view fusion
WO2023277725A1 (en) * 2021-06-28 2023-01-05 Autonomous Non-Profit Organization For Higher Education "Skolkovo Institute Of Science And Technology" Method and system for recognizing chemical information from document images
CN115908775A (en) * 2021-08-16 2023-04-04 中国科学院上海药物研究所 Chemical structural formula identification method and device, storage medium and electronic equipment
CN114464273A (en) * 2021-12-22 2022-05-10 天翼云科技有限公司 Molecular structure database construction method and device, electronic equipment and storage medium
CN114581924A (en) * 2022-03-01 2022-06-03 苏州阿尔脉生物科技有限公司 Method and device for extracting elements in chemical reaction flow chart
CN114842486A (en) * 2022-07-04 2022-08-02 南昌大学 Handwritten chemical structural formula recognition method, system, storage medium and equipment
CN114898391A (en) * 2022-07-12 2022-08-12 苏州阿尔脉生物科技有限公司 Method and device for determining chemical reaction route and electronic equipment

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5157736A (en) * 1991-04-19 1992-10-20 International Business Machines Corporation Apparatus and method for optical recognition of chemical graphics
JP3545075B2 (en) * 1994-12-28 2004-07-21 富士通株式会社 Compound analyzer
CN101261554A (en) * 2008-04-21 2008-09-10 东莞市步步高教育电子产品有限公司 Formula, expression hand-written inputting and computing system and method
CN101329731A (en) * 2008-06-06 2008-12-24 南开大学 Automatic recognition method pf mathematical formula in image
US20100163316A1 (en) * 2008-12-30 2010-07-01 Microsoft Corporation Handwriting Recognition System Using Multiple Path Recognition Framework
CN102033866A (en) * 2009-09-29 2011-04-27 国际商业机器公司 Method and system for checking chemical name
US8718375B2 (en) * 2010-12-03 2014-05-06 Massachusetts Institute Of Technology Sketch recognition system
US9558403B2 (en) * 2011-08-26 2017-01-31 Council Of Scientific And Industrial Research Chemical structure recognition tool
CN102693303B (en) * 2012-05-18 2017-06-06 上海极值信息技术有限公司 The searching method and device of a kind of formulation data
CN103700084A (en) * 2012-09-28 2014-04-02 淮海工学院 Chemical molecular structure chart partition method based on area size and curvature
US10346681B2 (en) * 2015-09-26 2019-07-09 Wolfram Research, Inc. Method and computing device for optically recognizing mathematical expressions
CN106980856B (en) * 2016-01-15 2020-11-27 北京字节跳动网络技术有限公司 Formula identification method and system and symbolic reasoning calculation method and system
CN105894931A (en) * 2016-06-06 2016-08-24 宁波市铭时三维科技发展有限公司 Two-dimensional code containing three-dimensional printing method for using molecular structure model as chemical training aid
CN106372456B (en) * 2016-08-26 2019-01-22 浙江工业大学 A kind of Advances in protein structure prediction based on deep learning
CN106650686A (en) * 2016-12-30 2017-05-10 南开大学 Online hand-written chemical symbol identification method based on Hidden Markov model
CN106874688B (en) * 2017-03-01 2019-03-12 中国药科大学 Intelligent lead compound based on convolutional neural networks finds method
CN107169485B (en) * 2017-03-28 2020-10-09 北京捷通华声科技股份有限公司 Mathematical formula identification method and device
CN108334839B (en) * 2018-01-31 2021-09-14 青岛清原精准农业科技有限公司 Chemical information identification method based on deep learning image identification technology

Also Published As

Publication number Publication date
CN108334839A (en) 2018-07-27
WO2019148852A1 (en) 2019-08-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant