CN108334839A - Chemical information recognition method based on deep learning image recognition technology - Google Patents



Publication number
CN108334839A
Authority
CN
China
Prior art keywords
atom
chemical bond
deep learning
identified
recognition technology
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810098220.0A
Other languages
Chinese (zh)
Other versions
CN108334839B (en)
Inventor
井建军
魏凯
郑成伟
黄麒展
张帅
刘威
李勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Qingyuan Precision Agriculture Technology Co Ltd
Original Assignee
Qingdao Qingyuan Precision Agriculture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-01-31
Filing date
2018-01-31
Publication date
2018-07-27
Application filed by Qingdao Qingyuan Precision Agriculture Technology Co Ltd
Priority to CN201810098220.0A
Publication of CN108334839A
Application granted
Publication of CN108334839B
Active legal status (current)
Anticipated expiration (legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for recognising patterns
    • G06K9/62 Methods or arrangements for pattern recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G06K9/627 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches based on distances between the pattern to be recognised and training or reference patterns
    • G06K9/6271 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches based on distances between the pattern to be recognised and training or reference patterns based on distances to prototypes
    • G06K9/6272 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches based on distances between the pattern to be recognised and training or reference patterns based on distances to prototypes based on distances to cluster centroïds
    • G06K9/6273 Smoothing the distance, e.g. Radial Basis Function Networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for recognising patterns
    • G06K9/62 Methods or arrangements for pattern recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06K9/6256 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/457 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by analysing connectivity, e.g. edge linking, connected component analysis or slices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/226 Character recognition characterised by the type of writing of cursive writing

Abstract

The invention belongs to the technical field of image recognition, and specifically relates to a chemical information recognition method based on deep learning image recognition technology. The method comprises the following steps: (1) recognizing nodes in an input image using a node target recognizer; (2) recognizing the text content of the nodes identified in step (1) using a handwritten-character target recognizer, thereby determining the specific atom corresponding to each node; (3) combining the identified atoms pairwise and recognizing the chemical bond between each pair of atoms using a chemical bond target recognizer; (4) looking up the attributes of the identified atoms in a database, calculating the relevant attributes of the structural formula, and outputting them; or storing the identified atoms and the chemical bonds between them as a file in a custom "king" format, or drawing them in a new picture, and outputting the result. The invention addresses the recognition of chemical structural formulas and reaction equations that are hand-drawn or contained in pictures, and can be widely applied in the routine work of chemists.

Description

Chemical information recognition method based on deep learning image recognition technology
Technical field
The invention belongs to the technical field of image recognition, and specifically relates to a chemical information recognition method based on deep learning image recognition technology.
Background art
Deep learning is already widely used for image recognition; its common application scenarios include face recognition, license plate recognition, object recognition and plant identification. However, deep learning image recognition technology has not yet been applied to recognizing images of chemical structural formulas or reaction equations.
Summary of the invention
To solve the problem of recognizing chemical structural formulas and reaction equations that are hand-drawn or contained in pictures, the invention aims to let a user photograph and upload a chemical structural formula or reaction equation drawn with a drawing tool or on paper, and then obtain the composition of that structural formula or reaction equation together with the relevant attributes of the structural formula.
To achieve the above object, the invention provides a chemical information recognition method based on deep learning image recognition technology, comprising the following steps:
(1) recognizing nodes in the input image using a node target recognizer based on deep learning image recognition technology;
(2) recognizing the text content of each node identified in step (1) using a handwritten-character target recognizer based on deep learning image recognition technology, thereby determining the specific atom corresponding to each node;
(3) combining the identified atoms pairwise, and recognizing the chemical bond between each pair of atoms using a chemical bond target recognizer based on deep learning image recognition technology;
(4) looking up the attributes of the identified atoms in a database, including relative atomic mass, isotopic masses and abundances, and common valences, then calculating the relevant attributes of the structural formula and outputting them;
or storing the identified atoms and the chemical bonds between them as a file in the custom king format and outputting it;
or drawing the identified atoms and the chemical bonds between them in a new picture and outputting it.
In addition, the method further comprises the following step:
(5) recognizing arrows in the input image using an arrow target recognizer based on deep learning image recognition technology;
the identified arrows, together with the atoms and inter-atomic chemical bonds identified in steps (2) and (3), are then stored as a file in the custom king format and output;
or the identified arrows, together with the atoms and inter-atomic chemical bonds identified in steps (2) and (3), are drawn in a new picture and output.
The target recognizers based on deep learning image recognition technology described in steps (1), (3) and (5) are obtained in advance by offline training with the faster-rcnn (Faster R-CNN) algorithm proposed by Ross Girshick's team, and are used to recognize the arrows, the atoms and their spatial coordinates, and the chemical bonds in the image.
The handwritten-character target recognizer described in step (2) is obtained in advance by offline training of the LeNet model in Caffe, and is used to recognize the text content in the image.
Preferably, the offline training of the target recognizers comprises training them offline on image sets.
The image sets used to train the target recognizers include: (a) pictures of handwritten characters; (b) nodes connected by multiple chemical bonds of various types; (c) common chemical bonds such as single, double and triple bonds; (d) arrow patterns commonly used in chemistry.
More preferably, a handwritten-character recognizer is trained with the LeNet model on image set (a), to determine whether a node is an element of the periodic table, plain text, or an implicit (undisplayed) "carbon" atom.
A node target recognizer is trained with the faster-rcnn algorithm on image set (b), to determine all nodes in the image and their spatial coordinates.
A chemical bond target recognizer is trained with the faster-rcnn algorithm on image set (c), to determine whether a chemical bond exists between two atoms and, if so, its type.
An arrow target recognizer is trained with the faster-rcnn algorithm on image set (d), to determine whether the input image contains arrows and, if so, their spatial coordinates.
Step (3) specifically comprises the following steps:
Step (31): for all identified atoms, taken in pairs, use the chemical bond target recognizer to determine whether a chemical bond exists between the two atoms and, if so, its type;
Step (32): where a chemical bond exists, add an association between the two atoms whose association type is the recognized chemical bond type.
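The pairing loop of steps (31) and (32) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the chemical bond target recognizer is replaced by a hypothetical callable `classify_pair` that is assumed to return a bond-type label or `None`, and the atom record (element symbol plus box centre) is likewise a hypothetical representation. How the real recognizer crops or scores the image region between two atoms is not described in the source.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical atom record: element symbol plus the (x, y) centre of its detection box.
Atom = Tuple[str, float, float]
# Hypothetical stand-in for the chemical bond target recognizer: returns a bond
# type label such as "single", "double" or "triple", or None if there is no bond.
BondClassifier = Callable[[Atom, Atom], Optional[str]]


def associate_bonds(atoms: List[Atom], classify_pair: BondClassifier) -> Dict[Tuple[int, int], str]:
    """Test every unordered pair of atoms (step 31) and record typed associations (step 32)."""
    associations: Dict[Tuple[int, int], str] = {}
    for i, j in combinations(range(len(atoms)), 2):  # n * (n - 1) / 2 pairs
        bond_type = classify_pair(atoms[i], atoms[j])
        if bond_type is not None:
            associations[(i, j)] = bond_type  # association type = recognized bond type
    return associations


if __name__ == "__main__":
    # Toy classifier for demonstration only: a "single" bond whenever two atoms are close.
    def toy_classifier(a: Atom, b: Atom) -> Optional[str]:
        distance = ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5
        return "single" if distance < 1.5 else None

    ethanol_heavy_atoms = [("C", 0.0, 0.0), ("C", 1.0, 0.0), ("O", 2.0, 0.0)]
    print(associate_bonds(ethanol_heavy_atoms, toy_classifier))  # {(0, 1): 'single', (1, 2): 'single'}
```

Because every pair is tested, the number of recognizer calls grows quadratically with the number of atoms; a practical system might pre-filter pairs by distance, but the source does not say whether it does.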
Calculating the relevant attributes of the structural formula in step (4) comprises:
Step (41): based on the atoms and the chemical bonds between them, automatically add hydrogens so that the outermost shell of each atom reaches a stable 8-electron (octet) configuration, count the number of atoms of each type, and generate the molecular formula of the chemical structural formula;
Step (42): convert the atoms and inter-atomic chemical bonds into a general SMILES name, i.e. convert the structural formula into a SMILES string following the published SMILES specification;
Step (43): using the SMILES string, look up the English name of the chemical structural formula in the database;
Step (44): calculate the exact molecular mass and the relative molecular mass of the molecular formula, and the abundances corresponding to its mass-to-charge ratios.
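Steps (41) and (42) can be illustrated with a small sketch over a hypothetical graph representation (a list of element symbols plus `(i, j, order)` bond tuples). It is an approximation, not the patent's code: fixed default valences stand in for the octet check, the formula is written in Hill order (the source does not specify an ordering), and the SMILES writer only handles connected, acyclic molecules and produces non-canonical output. A production system would more likely rely on a cheminformatics toolkit such as RDKit.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order 1/2/3)

# Assumed default valences of neutral atoms (octet rule for C, N, O, halogens);
# charges, radicals and hypervalent S/P are not handled in this sketch.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are implicit in SMILES


def implicit_hydrogens(atoms: List[str], bonds: List[Bond]) -> List[int]:
    """Step (41): number of hydrogens each atom needs to reach its default valence."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [max(VALENCE[symbol] - u, 0) for symbol, u in zip(atoms, used)]


def molecular_formula(atoms: List[str], bonds: List[Bond]) -> str:
    """Step (41): count atom types, including added H, and write them in Hill order."""
    counts = Counter(atoms)
    counts["H"] += sum(implicit_hydrogens(atoms, bonds))
    if counts["C"] > 0:
        order = ["C", "H"] + sorted(s for s in counts if s not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "")
                   for s in order if counts[s] > 0)


def acyclic_smiles(atoms: List[str], bonds: List[Bond]) -> str:
    """Step (42), simplified: depth-first SMILES for a connected, acyclic graph."""
    if not atoms:
        return ""
    neighbours: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))
    visited: Set[int] = set()

    def visit(atom: int, parent: int) -> str:
        if atoms[atom] not in VALENCE:
            raise ValueError(f"{atoms[atom]} is outside this sketch's element set")
        visited.add(atom)
        branches = []
        for nxt, order in neighbours[atom]:
            if nxt == parent:
                continue
            if nxt in visited:
                raise NotImplementedError("ring closures are not handled in this sketch")
            branches.append(BOND_SYMBOL[order] + visit(nxt, atom))
        # Every branch except the last one is wrapped in parentheses.
        return atoms[atom] + "".join(f"({b})" for b in branches[:-1]) + "".join(branches[-1:])

    smiles = visit(0, -1)
    if len(visited) != len(atoms):
        raise ValueError("molecular graph is not connected")
    return smiles


if __name__ == "__main__":
    atoms = ["C", "C", "O", "O"]               # heavy atoms of acetic acid
    bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
    print(molecular_formula(atoms, bonds))     # C2H4O2
    print(acyclic_smiles(atoms, bonds))        # CC(=O)O
```

The SMILES string depends on atom order, so two drawings of the same molecule can yield different strings; a database lookup as in step (43) would normally canonicalize first.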
In step (44), the exact molecular mass is obtained by summing, over all atoms in the molecular formula, the mass of each atom's most abundant isotope; the relative molecular mass is obtained by summing the relative atomic masses of all atoms in the molecular formula; the abundances corresponding to the mass-to-charge ratios are calculated from the coefficients of the expansion of $(a+b)^n$, where $a$ and $b$ represent isotopes of the same element and $n$ is the number of atoms of that element in the molecule.
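The two mass sums of step (44) are weighted sums over the molecular formula. The sketch below assumes the formula is given as an element-to-count mapping (a hypothetical representation); the isotope masses and relative atomic masses are standard reference values for a handful of elements, not the patent's database. The patent's own chlorine example uses 34.96885 for $^{35}$Cl, slightly rounded relative to the 34.968853 used here.

```python
from typing import Dict

# Mass (u) of each element's most abundant isotope: 1H, 12C, 14N, 16O, 35Cl.
MOST_ABUNDANT_ISOTOPE_MASS = {
    "H": 1.007825, "C": 12.0, "N": 14.003074, "O": 15.994915, "Cl": 34.968853,
}
# Conventional standard relative atomic masses.
RELATIVE_ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}


def exact_mass(formula: Dict[str, int]) -> float:
    """Sum of most-abundant-isotope masses over all atoms in the formula."""
    return sum(MOST_ABUNDANT_ISOTOPE_MASS[el] * n for el, n in formula.items())


def relative_molecular_mass(formula: Dict[str, int]) -> float:
    """Sum of relative atomic masses over all atoms in the formula."""
    return sum(RELATIVE_ATOMIC_MASS[el] * n for el, n in formula.items())


if __name__ == "__main__":
    ethanol = {"C": 2, "H": 6, "O": 1}
    print(round(exact_mass(ethanol), 4))               # 46.0419
    print(round(relative_molecular_mass(ethanol), 3))  # 46.069
```

For these light elements the most abundant isotope is also the lightest, so the exact mass coincides with the usual monoisotopic mass; that is not true for every element.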
The custom king format in step (4) is a text file using UTF-8 encoding; an online structural formula editor can parse the file content itself, so the structure can be edited again in the editor.
Compared with the prior art, the beneficial effects of the invention are as follows: for hand-drawn chemical structural formulas or reaction equations, and for those contained in pictures, the invention recognizes nodes, atoms, chemical bonds and arrows, and from the recognized content obtains a chemical structural formula or reaction equation that a computer can understand; it can also calculate relevant attributes of the structural formula. The invention can be widely applied in the routine work of chemists, for example in structural formula editors and Word documents, saving drawing time and improving work efficiency.
Description of the drawings
Fig. 1 is a flow diagram of the method of the invention;
Fig. 2 is a schematic diagram of the custom king format of the invention.
Detailed description of the embodiments
The invention is further explained below with reference to specific embodiments and the drawings.
Embodiment 1
As shown in Fig. 1, a chemical information recognition method based on deep learning image recognition technology comprises the following steps:
Step 1: recognize arrows in the input image using an arrow target recognizer based on deep learning image recognition technology;
Step 2: recognize nodes in the input image using a node target recognizer based on deep learning image recognition technology;
Step 3: recognize the text content of the nodes identified in step 2 using a handwritten-character target recognizer based on deep learning image recognition technology, thereby determining the specific atom corresponding to each node;
Step 4: combine the identified atoms pairwise, and recognize the chemical bond between each pair of atoms using a chemical bond target recognizer based on deep learning image recognition technology;
Step 5: look up the attributes of the identified atoms in a database, including relative atomic mass, isotopic masses and abundances, and common valences, then calculate the relevant attributes of the structural formula and output them;
or, step 6: store the identified arrows, atoms and inter-atomic chemical bonds as a file in the custom king format and output it;
or, step 7: draw the identified arrows, atoms and inter-atomic chemical bonds in a new picture and output it.
The target recognizers based on deep learning image recognition technology used in steps 1, 2 and 4 are obtained in advance by offline training with the faster-rcnn algorithm proposed by Ross Girshick's team, and are used to recognize arrows, atoms and their spatial coordinates, and chemical bonds in the image. The handwritten-character recognizer used in step 3 is obtained in advance by offline training of the LeNet model in Caffe, and is used to recognize the text content in the image. Offline training of the target recognizers comprises training them on image sets, which include:
(a) pictures of handwritten characters: a handwritten-character recognizer is trained with Caffe's LeNet model, to determine whether a node is an element of the periodic table, plain text, or an implicit (undisplayed) "carbon" atom;
(b) nodes connected by multiple chemical bonds of various types: a node target recognizer is trained with the faster-rcnn algorithm proposed by Ross Girshick's team, to determine all nodes in the image and their spatial coordinates;
(c) common chemical bonds such as single, double and triple bonds: a chemical bond target recognizer is trained with the same faster-rcnn algorithm, to determine whether a chemical bond exists between two atoms and, if so, its type;
(d) arrow patterns commonly used in chemistry: an arrow target recognizer is trained with the same faster-rcnn algorithm, to determine whether the input image contains arrows and, if so, their spatial coordinates.
Step 4 specifically comprises the following steps:
Step 41: for all identified atoms, taken in pairs, use the chemical bond target recognizer to determine whether a chemical bond exists between the two atoms and, if so, its type;
Step 42: where a chemical bond exists, add an association between the two atoms whose association type is the recognized chemical bond type.
Calculating the relevant attributes of the structural formula in step 5 comprises:
Step 51: based on the atoms and the chemical bonds between them, automatically add hydrogens so that the outermost shell of each atom reaches a stable 8-electron (octet) configuration, count the number of atoms of each type, and generate the molecular formula of the chemical structural formula;
Step 52: convert the atoms and inter-atomic chemical bonds into a general SMILES name, i.e. convert the structural formula into a SMILES string following the published SMILES specification;
Step 53: using the SMILES string, look up the English name of the chemical structural formula in the database;
Step 54: calculate the exact molecular mass, the relative molecular mass, and the abundances corresponding to the mass-to-charge ratios of the molecular formula. The exact molecular mass is obtained by summing, over all atoms in the molecular formula, the mass of each atom's most abundant isotope; the relative molecular mass is obtained by summing the relative atomic masses of all atoms in the molecular formula; the abundances corresponding to the mass-to-charge ratios are calculated from the coefficients of the expansion of $(a+b)^n$, where $a$ and $b$ represent isotopes of the same element and $n$ is the number of atoms of that element in the molecule. For example, chlorine (Cl) has the isotopes $^{35}\mathrm{Cl}$ (34.96885) and $^{37}\mathrm{Cl}$ (36.9659), with abundances of 75.78% and 24.22% respectively. For the molecular formula $\mathrm{Cl_2}$, the expansion is $(^{35}\mathrm{Cl} + {}^{37}\mathrm{Cl})^2 = (^{35}\mathrm{Cl})^2 + 2\,{}^{35}\mathrm{Cl}\,{}^{37}\mathrm{Cl} + (^{37}\mathrm{Cl})^2$, so there are three mass-to-charge ratios m/z:
$^{35}\mathrm{Cl} + {}^{35}\mathrm{Cl} = 34.96885 + 34.96885 = 69.9377$
$^{35}\mathrm{Cl} + {}^{37}\mathrm{Cl} = 34.96885 + 36.9659 = 71.93475$
$^{37}\mathrm{Cl} + {}^{37}\mathrm{Cl} = 36.9659 + 36.9659 = 73.9318$
The corresponding abundances are:
$^{35}\mathrm{Cl} \times {}^{35}\mathrm{Cl} = 75.78\% \times 75.78\% = 0.57426084$
$2 \times {}^{35}\mathrm{Cl} \times {}^{37}\mathrm{Cl} = 2 \times 75.78\% \times 24.22\% = 0.36707832$
$^{37}\mathrm{Cl} \times {}^{37}\mathrm{Cl} = 24.22\% \times 24.22\% = 0.05866084$
After normalization to the largest abundance, the abundances are as shown in Table 1:
Table 1: Normalized abundances for the molecular formula $\mathrm{Cl_2}$
m/z Abundance
69.9377 100%
71.93475 63.9%
73.9318 10.2%
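The $(a+b)^n$ rule used for Table 1 can be checked with a short sketch. It assumes singly charged ions (so m/z equals the ion mass) and an element with exactly two isotopes, as in the chlorine example; the function names are hypothetical. Molecules containing several such elements would need the per-element patterns combined (for example by convolving them), which this sketch does not do.

```python
from math import comb
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, abundance); singly charged ions assumed, so m/z = m


def binomial_pattern(light: Peak, heavy: Peak, n: int) -> List[Peak]:
    """Expand (a + b)^n for n atoms of an element with two isotopes a and b.

    The k-th term, C(n, k) * a^(n - k) * b^k, is the abundance of the ion that
    contains k heavy atoms; its mass is (n - k) * m_a + k * m_b.
    """
    (m_a, p_a), (m_b, p_b) = light, heavy
    return [
        ((n - k) * m_a + k * m_b, comb(n, k) * p_a ** (n - k) * p_b ** k)
        for k in range(n + 1)
    ]


def normalise(peaks: List[Peak]) -> List[Peak]:
    """Scale abundances so that the largest peak is 100 %."""
    top = max(abundance for _, abundance in peaks)
    return [(mz, 100.0 * abundance / top) for mz, abundance in peaks]


if __name__ == "__main__":
    # Isotope masses and abundances taken from the Cl2 example above.
    cl35, cl37 = (34.96885, 0.7578), (36.9659, 0.2422)
    for mz, abundance in normalise(binomial_pattern(cl35, cl37, 2)):
        print(f"{mz:.5f}  {abundance:.1f}%")
```

The three rows it prints agree with Table 1 (69.9377 at 100%, 71.93475 at 63.9%, 73.9318 at 10.2%); the binomial coefficient 2 in the middle term is what makes the mixed $^{35}$Cl/$^{37}$Cl ion so prominent.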
The custom king format described in step 6 is a text file using UTF-8 encoding; an online structural formula editor can parse the file content itself, so the structure can be edited again in the editor. The format is shown in Fig. 2:
The Atom Block in Fig. 2 stores the atoms, in the following format:
Begin Atom
Index Type x y HCount
End Atom
When multiple atoms are recognized, one line of this format per atom is added between Begin Atom and End Atom. Index is the ordinal number, incrementing from 1; Type is the element symbol, e.g. "C"; x and y are the atom's x- and y-coordinates in the plane; HCount is the number of hydrogens attached to the atom to satisfy its valence.
The Bond Block in Fig. 2 stores the chemical bonds between atoms, in the following format:
Begin Bond
Index Type Atom1index Atom2index
End Bond
When multiple chemical bonds are recognized, one line of this format per bond is added between Begin Bond and End Bond. Index is the ordinal number, incrementing from 1; Type is the chemical bond type; Atom1index is the ordinal number, in the Atom Block, of one of the connected atoms; Atom2index is the ordinal number, in the Atom Block, of the other connected atom.
The Text Block in Fig. 2 stores plain-text information, in the following format:
Begin Text
Index x y Text
End Text
When multiple plain-text items are recognized, one line of this format per item is added between Begin Text and End Text. Index is the ordinal number, incrementing from 1; x and y are the coordinates of the plain text in the plane; Text is the content of the plain text.
The Shape Block in Fig. 2 stores arrow information, in the following format:
Begin Shape
Index x1,y1;x2,y2
End Shape
When multiple arrows are recognized, one line of this format per arrow is added between Begin Shape and End Shape. Index is the ordinal number, incrementing from 1; x1 and y1 are the coordinates of the arrow's start point in the plane; x2 and y2 are the coordinates of its end point in the plane.
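The Begin/End block layout is regular enough to generate and read back with a few lines of code. The sketch below is an illustration under assumptions, not the patent's implementation: it assumes single spaces between fields, the block order Atom, Bond, Text, Shape, and an integer bond order in the Bond Type field, none of which the text pins down exactly; the record types and function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

# Hypothetical in-memory records mirroring the four blocks of Fig. 2.
AtomRec = Tuple[str, float, float, int]                     # Type, x, y, HCount
BondRec = Tuple[int, int, int]                              # Type (assumed: bond order), Atom1index, Atom2index
TextRec = Tuple[float, float, str]                          # x, y, Text
ArrowRec = Tuple[Tuple[float, float], Tuple[float, float]]  # (x1, y1), (x2, y2)


def write_king(atoms: Sequence[AtomRec], bonds: Sequence[BondRec],
               texts: Sequence[TextRec] = (), arrows: Sequence[ArrowRec] = ()) -> str:
    """Serialize records into the Begin/End block layout; Index counts from 1."""
    lines = ["Begin Atom"]
    lines += [f"{i} {t} {x} {y} {h}" for i, (t, x, y, h) in enumerate(atoms, 1)]
    lines += ["End Atom", "Begin Bond"]
    lines += [f"{i} {t} {a1} {a2}" for i, (t, a1, a2) in enumerate(bonds, 1)]
    lines += ["End Bond", "Begin Text"]
    lines += [f"{i} {x} {y} {s}" for i, (x, y, s) in enumerate(texts, 1)]
    lines += ["End Text", "Begin Shape"]
    lines += [f"{i} {x1},{y1};{x2},{y2}" for i, ((x1, y1), (x2, y2)) in enumerate(arrows, 1)]
    lines.append("End Shape")
    return "\n".join(lines) + "\n"


def parse_king(content: str) -> Dict[str, List[List[str]]]:
    """Split each block's lines into fields; Text lines keep spaces in their last field."""
    blocks: Dict[str, List[List[str]]] = {}
    current = None
    for line in (raw.strip() for raw in content.splitlines()):
        if line.startswith("Begin "):
            current = line[len("Begin "):]
            blocks[current] = []
        elif line.startswith("End "):
            current = None
        elif line and current is not None:
            fields = line.split(maxsplit=3) if current == "Text" else line.split()
            blocks[current].append(fields)
    return blocks


def save_king(path: str, content: str) -> None:
    """Write the file with the UTF-8 encoding the source specifies."""
    Path(path).write_text(content, encoding="utf-8")
```

Because a Text entry may itself contain spaces, the parser splits Text lines into at most four fields, while the other blocks hold no free text and are split on all whitespace.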
The above are merely preferred embodiments of the invention and should not be construed as limiting the scope of its embodiments. The invention is not limited to the above examples; all changes and improvements made by those skilled in the art within the essential scope of the invention shall fall within the scope of protection of this patent.

Claims (10)

1. A chemical information recognition method based on deep learning image recognition technology, characterized by comprising the following steps:
(1) recognizing nodes in an input image using a node target recognizer based on deep learning image recognition technology;
(2) recognizing the text content of each node identified in step (1) using a handwritten-character target recognizer based on deep learning image recognition technology, thereby determining the specific atom corresponding to each node;
(3) combining the identified atoms pairwise, and recognizing the chemical bond between each pair of atoms using a chemical bond target recognizer based on deep learning image recognition technology;
(4) looking up the attributes of the identified atoms in a database, calculating the relevant attributes of the structural formula, and outputting them;
or storing the identified atoms and the chemical bonds between them as a file in the custom king format and outputting it;
or drawing the identified atoms and the chemical bonds between them in a new picture and outputting it.
2. The chemical information recognition method based on deep learning image recognition technology according to claim 1, characterized by further comprising the following step:
(5) recognizing arrows in the input image using an arrow target recognizer based on deep learning image recognition technology;
then storing the identified arrows, together with the atoms and inter-atomic chemical bonds identified in steps (2) and (3), as a file in the custom king format and outputting it;
or drawing the identified arrows, together with the atoms and inter-atomic chemical bonds identified in steps (2) and (3), in a new picture and outputting it.
3. The chemical information recognition method based on deep learning image recognition technology according to claim 2, characterized in that the target recognizers based on deep learning image recognition technology described in steps (1), (2), (3) and (5) are obtained in advance by offline training based on deep learning image recognition technology.
4. The chemical information recognition method based on deep learning image recognition technology according to claim 3, characterized in that the offline training of the target recognizers comprises training the target recognizers offline on image sets.
5. The chemical information recognition method based on deep learning image recognition technology according to claim 4, characterized in that the image sets used to train the target recognizers include: (a) pictures of handwritten characters; (b) nodes connected by multiple chemical bonds of various types; (c) common chemical bonds; (d) arrow patterns commonly used in chemistry.
6. The chemical information recognition method based on deep learning image recognition technology according to claim 5, characterized in that a handwritten-character recognizer is trained with the LeNet model on image set (a), to determine whether a node is an element of the periodic table, plain text, or an implicit (undisplayed) "carbon" atom;
a node target recognizer is trained with the faster-rcnn algorithm on image set (b), to determine all nodes in the image and their spatial coordinates;
a chemical bond target recognizer is trained with the faster-rcnn algorithm on image set (c), to determine whether a chemical bond exists between two atoms and, if so, its type;
an arrow target recognizer is trained with the faster-rcnn algorithm on image set (d), to determine whether the input image contains arrows and, if so, their spatial coordinates.
7. The chemical information recognition method based on deep learning image recognition technology according to any one of claims 1 to 6, characterized in that step (3) specifically comprises the following steps:
Step (31): for all identified atoms, taken in pairs, using the chemical bond target recognizer to determine whether a chemical bond exists between the two atoms and, if so, its type;
Step (32): where a chemical bond exists, adding an association between the two atoms whose association type is the recognized chemical bond type.
8. The chemical information recognition method based on deep learning image recognition technology according to any one of claims 1 to 7, characterized in that calculating the relevant attributes of the structural formula in step (4) comprises:
Step (41): based on the atoms and the chemical bonds between them, automatically adding hydrogens so that the outermost shell of each atom reaches a stable 8-electron (octet) configuration, counting the number of atoms of each type, and generating the molecular formula of the chemical structural formula;
Step (42): converting the atoms and inter-atomic chemical bonds into a general SMILES name, i.e. converting the structural formula into a SMILES string following the published SMILES specification;
Step (43): using the SMILES string, looking up the English name of the chemical structural formula in the database;
Step (44): calculating the exact molecular mass and the relative molecular mass of the molecular formula, and the abundances corresponding to its mass-to-charge ratios.
9. The chemical information recognition method based on deep learning image recognition technology according to claim 8, characterized in that, in step (44), the exact molecular mass is obtained by summing, over all atoms in the molecular formula, the mass of each atom's most abundant isotope; the relative molecular mass is obtained by summing the relative atomic masses of all atoms in the molecular formula; and the abundances corresponding to the mass-to-charge ratios are calculated from the coefficients of the expansion of $(a+b)^n$, where $a$ and $b$ represent isotopes of the same element and $n$ is the number of atoms of that element in the molecule.
10. The chemical information recognition method based on deep learning image recognition technology according to any one of claims 1 to 9, characterized in that the custom king format of step (4) is a text file using UTF-8 encoding.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810098220.0A CN108334839B (en) 2018-01-31 2018-01-31 Chemical information identification method based on deep learning image identification technology

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810098220.0A CN108334839B (en) 2018-01-31 2018-01-31 Chemical information identification method based on deep learning image identification technology
PCT/CN2018/105414 WO2019148852A1 (en) 2018-01-31 2018-09-13 Chemical information identification method based on deep learning image identification technology

Publications (2)

Publication Number Publication Date
CN108334839A true CN108334839A (en) 2018-07-27
CN108334839B CN108334839B (en) 2021-09-14

Family

ID=62927657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810098220.0A Active CN108334839B (en) 2018-01-31 2018-01-31 Chemical information identification method based on deep learning image identification technology

Country Status (2)

Country Link
CN (1) CN108334839B (en)
WO (1) WO2019148852A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019148852A1 (en) * 2018-01-31 2019-08-08 青岛清原精准农业科技有限公司 Chemical information identification method based on deep learning image identification technology
CN110413740A (en) * 2019-08-06 2019-11-05 百度在线网络技术(北京)有限公司 Querying method, device, electronic equipment and the storage medium of chemical expression
CN111897987A (en) * 2020-07-10 2020-11-06 山西大学 Molecular structure diagram retrieval method based on evolution calculation multi-view fusion
WO2021125206A1 (en) * 2019-12-16 2021-06-24 富士フイルム株式会社 Image analysis device, image analysis method, and program
CN114842486A (en) * 2022-07-04 2022-08-02 南昌大学 Handwritten chemical structural formula recognition method, system, storage medium and equipment
CN114898391A (en) * 2022-07-12 2022-08-12 苏州阿尔脉生物科技有限公司 Method and device for determining chemical reaction route and electronic equipment

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
EP3937106A1 (en) * 2020-07-08 2022-01-12 Tata Consultancy Services Limited System and method of extraction of information and graphical representation for design of formulated products

Citations (16)

Publication number Priority date Publication date Assignee Title
JPH08173155A (en) * 1994-12-28 1996-07-09 Fujitsu Ltd Method for analyzing compound and system therefor
CN101261554A (en) * 2008-04-21 2008-09-10 东莞市步步高教育电子产品有限公司 Formula, expression hand-written inputting and computing system and method
CN101329731A (en) * 2008-06-06 2008-12-24 南开大学 Automatic recognition method of mathematical formula in image
US20100163316A1 (en) * 2008-12-30 2010-07-01 Microsoft Corporation Handwriting Recognition System Using Multiple Path Recognition Framework
CN102033866A (en) * 2009-09-29 2011-04-27 国际商业机器公司 Method and system for checking chemical name
US20120141032A1 (en) * 2010-12-03 2012-06-07 Massachusetts Institute Of Technology Sketch recognition system
CN102693303A (en) * 2012-05-18 2012-09-26 上海极值信息技术有限公司 Method and device for searching formulation data
CN103700084A (en) * 2012-09-28 2014-04-02 淮海工学院 Chemical molecular structure chart partition method based on area size and curvature
US20140301608A1 (en) * 2011-08-26 2014-10-09 Council Of Scientific & Industrial Research Chemical structure recognition tool
CN105894931A (en) * 2016-06-06 2016-08-24 宁波市铭时三维科技发展有限公司 Two-dimensional code containing three-dimensional printing method for using molecular structure model as chemical training aid
CN106372456A (en) * 2016-08-26 2017-02-01 浙江工业大学 Deep learning Residue2vec-based protein structure prediction method
US20170091597A1 (en) * 2015-09-26 2017-03-30 Wolfram Research, Inc. Method and computing device for optically recognizing mathematical expressions
CN106650686A (en) * 2016-12-30 2017-05-10 南开大学 Online hand-written chemical symbol identification method based on Hidden Markov model
CN106874688A (en) * 2017-03-01 2017-06-20 中国药科大学 Intelligent lead compound based on convolutional neural networks finds method
CN106980856A (en) * 2016-01-15 2017-07-25 上海谦问万答吧云计算科技有限公司 Formula identification method and system and symbolic reasoning computational methods and system
CN107169485A (en) * 2017-03-28 2017-09-15 北京捷通华声科技股份有限公司 A kind of method for identifying mathematical formula and device

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US5157736A (en) * 1991-04-19 1992-10-20 International Business Machines Corporation Apparatus and method for optical recognition of chemical graphics
CN108334839B (en) * 2018-01-31 2021-09-14 青岛清原精准农业科技有限公司 Chemical information identification method based on deep learning image identification technology

Non-Patent Citations (5)

Title
Bradley Emi: "Optical Recognition of Hand-Drawn Chemical Structures", HTTPS://WEB.STANFORD.EDU *
Igor V. Filippov et al.: "Modern Approaches to Chemical Image", Current Challenges in Patent Information Retrieval *
Peng Tang et al.: "Online Chemical Symbol Recognition for Handwritten Chemical Expression Recognition", 2013 IEEE ICIS *
Prerana Jana et al.: "Generation of Search-able PDF of the Chemical Equations segmented from Document Images", DocEng '16 *
杨巨峰 et al.: "联机手写化学公式识别与分析" (Recognition and Analysis of Online Handwritten Chemical Formulas), 中国图象图形学报 (Journal of Image and Graphics) *

Cited By (7)

Publication number Priority date Publication date Assignee Title
WO2019148852A1 (en) * 2018-01-31 2019-08-08 青岛清原精准农业科技有限公司 Chemical information identification method based on deep learning image identification technology
CN110413740A (en) * 2019-08-06 2019-11-05 百度在线网络技术(北京)有限公司 Querying method, device, electronic equipment and the storage medium of chemical expression
WO2021125206A1 (en) * 2019-12-16 2021-06-24 富士フイルム株式会社 Image analysis device, image analysis method, and program
CN111897987A (en) * 2020-07-10 2020-11-06 山西大学 Molecular structure diagram retrieval method based on evolution calculation multi-view fusion
CN111897987B (en) * 2020-07-10 2022-05-31 山西大学 Molecular structure diagram retrieval method based on evolution calculation multi-view fusion
CN114842486A (en) * 2022-07-04 2022-08-02 南昌大学 Handwritten chemical structural formula recognition method, system, storage medium and equipment
CN114898391A (en) * 2022-07-12 2022-08-12 苏州阿尔脉生物科技有限公司 Method and device for determining chemical reaction route and electronic equipment

Also Published As

Publication number Publication date
WO2019148852A1 (en) 2019-08-08
CN108334839B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN108334839A (en) A kind of chemical information recognition methods based on deep learning image recognition technology
US20180276245A1 (en) Relationship Mapping Employing Multi-Dimensional Context Including Facial Recognition
CN104778258B (en) A kind of data pick-up method of Protocol-oriented data flow
Friz et al. Large deviations and asymptotic methods in finance
DE10322725A1 (en) Image annotation information generation method in computer system, involves generating annotation information automatically from selected images, using associated information and annotating images with generated information
CN104899340B (en) A kind of IETM technical information fragment retrieval device and its search method based on fragment of most compacting
CN103886020B (en) A kind of real estate information method for fast searching
CN103399857B (en) General method for extracting document structural information
CN111931269A (en) Automatic checking method and system for consistency of information of BIM and important components in drawings
CN103678593A (en) Interactive space scene retrieval method based on space scene draft description
CN103294791A (en) Extensible markup language pattern matching method
CN104217025A (en) System and method for extracting record items of multi-record web page
Baumann Publish and perish? The impact of citation indexing on the development of new fields of environmental research
CN103927168B (en) A kind of method and device of object-oriented data model persistence
JP4348357B2 (en) Related document display device
CN105868189A (en) Method and device for establishing spatial index of electronic map
CN102486767B (en) Method and device for labeling content
CN110096640A (en) User's similarity calculating method in Collaborative Filtering Recommendation System based on classification of the items
Lüscher et al. Matching road data of scales with an order of magnitude difference
CN109388683A (en) A kind of log sheet information batch extracting method
CN107506339A (en) A kind of SCD nodes verification error localization method and device based on character skew
JP6511954B2 (en) Information processing apparatus and program
CN109885797B (en) Relational network construction method based on multi-identity space mapping
CN110334237B (en) Multi-mode data-based three-dimensional object retrieval method and system
Li et al. Administrative Divisions of Addresses Matching Algorithm Based on Moving Window Algorithm for Maximal Matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant