CN111709293A - Chemical structural formula segmentation method based on Resunet neural network - Google Patents


Info

Publication number
CN111709293A
CN111709293A
Authority
CN
China
Prior art keywords
size
multiplied
chemical structural
neural network
resunet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010419502.3A
Other languages
Chinese (zh)
Other versions
CN111709293B (en)
Inventor
王毅刚
邵锦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202010419502.3A priority Critical patent/CN111709293B/en
Publication of CN111709293A publication Critical patent/CN111709293A/en
Application granted granted Critical
Publication of CN111709293B publication Critical patent/CN111709293B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a chemical structural formula segmentation method based on a ResUNet neural network, comprising the following steps. Step (1): construct a training set T consisting of a manually labeled training set T-1 and an automatically generated training set T-2. Step (2): feed the training set T into the ResUNet neural network and train until a specified number of iterations is reached, or the loss curve no longer decreases and the accuracy no longer improves; save the trained ResUNet neural network model. Step (3): segment chemical structural formulas with the ResUNet model trained in step (2). The invention proposes an improved ResUNet neural network, together with a method for automatically generating a large training set of chemical structural formulas, so that the network can segment chemical structural formulas and a large amount of data can be used to improve its recognition accuracy.

Description

Chemical structural formula segmentation method based on Resunet neural network
Technical Field
The invention belongs to the technical field of computer detection, and particularly relates to a chemical structural formula segmentation method based on a Resunet neural network.
Background
A critical part of scientific work is the rapid processing and absorption of newly acquired data. In addition, new research often needs to collect, analyze, and utilize previously published experimental data. This is particularly true for small-molecule drug discovery, where pools of experimentally tested molecules are used for virtual screening programs, quantitative structure-activity/property relationship (QSAR/QSPR) analysis, or validation of physics-based modeling methods. Because generating large amounts of experimental data is difficult and expensive, many drug discovery programs are forced to rely on relatively small internal experimental databases. One promising way to address the general lack of suitable training data in drug discovery is to exploit data that has already been published. Medline records more than 2,000 new life-science papers every day; given that new experimental data enters the public literature at such a high rate, it is increasingly important to address the problems of data extraction and management and to automate these processes as much as possible. Yet extracting chemical structures from published sources such as life-science journal articles and patent documents remains difficult and very time-consuming.
At present, a large number of books and other publications are still available only in paper or scanned form, which makes reuse difficult. On the one hand, paper and scanned materials are hard to search, so information scattered across a large body of documents is hard to find and therefore underused. On the other hand, further processing of these materials requires tedious and error-prone re-entry of their content.
Research on the recognition of chemical structural formulas has progressed slowly, mainly for two reasons. First, in a document a formula is surrounded by natural language and is difficult to locate. Second, chemical structural formulas have complex structures: their symbols are numerous, appear in many fonts and sizes, and their notation is irregular and intricate.
Existing methods for recognizing chemical structural formulas work in two steps: first, locate and segment the chemical structural formula out of the surrounding natural language; second, feed the segmented formula into a recognition engine. Current segmentation methods are essentially based on traditional image processing; their accuracy is low, and they cannot handle special cases such as natural language lying very close to the chemical formula.
Disclosure of Invention
Based on the above, in order to improve the accuracy of locating and segmenting chemical structural formulas, the invention proposes an improved ResUNet neural network, together with a method for automatically generating a large training set of chemical structural formulas, so that the network can segment chemical structural formulas and a large amount of data can be used to improve its recognition accuracy.
A method for segmenting a chemical structural formula based on a ResUNet neural network comprises the following steps:
Step (1): construct a training set T consisting of a manually labeled training set T-1 and an automatically generated training set T-2. Chemical formulas manually labeled in publications form training set T-1, and the method for automatically generating a chemical structural formula training set produces training set T-2; the capacity ratio of T-1 to T-2 is 1:50.
Step (2): feed the training set T into the improved ResUNet neural network and train until a specified number of iterations is reached, or the loss curve no longer decreases and the accuracy no longer improves; save the trained ResUNet neural network model.
Step (3): segment chemical structural formulas with the ResUNet model trained in step (2).
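The stopping criterion in step (2), stop after a fixed number of epochs or once the loss no longer decreases and the accuracy no longer improves, can be sketched as a simple early-stopping check. This is an illustrative sketch, not the patent's code; the `patience` window and the metric histories are assumptions:

```python
def should_stop(loss_history, acc_history, max_epochs, patience=5):
    """Return True when training should stop: the epoch budget is exhausted,
    or neither the loss nor the accuracy has improved within the last
    `patience` epochs compared with all earlier epochs."""
    epoch = len(loss_history)
    if epoch >= max_epochs:
        return True
    if epoch <= patience:
        return False  # not enough history to judge stagnation
    loss_stalled = min(loss_history[-patience:]) >= min(loss_history[:-patience])
    acc_stalled = max(acc_history[-patience:]) <= max(acc_history[:-patience])
    return loss_stalled and acc_stalled
```

Called once per epoch with the validation loss and accuracy recorded so far, this stops training exactly when the loss curve is no longer decreasing and the precision is no longer improving.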
Further, the method for automatically generating the chemical structural formula training set produces training data by randomly filling images into typesetting templates; it is constructed as follows:
a. and constructing a typesetting template, and randomly generating text data in the character area.
b. A large number of chemical structural images are generated.
c. And searching blank positions in the typesetting template, randomly filling the chemical structural formula image formula and marking.
Further, the method for constructing the typesetting template comprises the following steps:
a-1. Manually annotate the character areas in 200 pages of publications, then rotate and flip the pages vertically and horizontally to augment the data, producing 1,000 pages of typesetting templates; a manually annotated template is shown in FIG. 2.
a-2. Use text collected from the internet and text produced by a random text generator as text data, and randomly fill it into the character areas of the typesetting templates; a generated result is shown in FIG. 3.
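The random text data of step a-2 can be produced with a trivial generator; the word-length range below is an arbitrary illustrative choice, and a real implementation would mix in internet-sourced text as the patent describes:

```python
import random
import string

def random_text(rng, n_words):
    """Generate n_words of lowercase filler text for a template's character area."""
    words = []
    for _ in range(n_words):
        length = rng.randint(2, 10)  # arbitrary word-length range
        words.append("".join(rng.choice(string.ascii_lowercase)
                             for _ in range(length)))
    return " ".join(words)
```

For example, `random_text(random.Random(0), 40)` fills one text block; repeating this per character area populates a full template page.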
Further, the method for generating a plurality of chemical structural formula images comprises the following steps:
and b-1, rendering 5700 ten thousand molecule data available in a PubChem database into a 3-channel PNG format image of 256x256 pixels of various types (key width, character size and the like) at random by using Indigo software.
b-2. Apply angle rotations and vertical and horizontal flips to these images for data augmentation, generating 100,000 small-molecule chemical structural formula images.
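The augmentation in b-2 (angle rotations plus vertical and horizontal flips) can be sketched with plain nested lists standing in for images; generating all eight dihedral variants of a square image is an illustrative interpretation of the rotation-and-flip expansion:

```python
def rot90(img):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_h(img):
    """Flip left-right."""
    return [row[::-1] for row in img]

def flip_v(img):
    """Flip up-down."""
    return img[::-1]

def augment(img):
    """Return the distinct variants reachable by 90-degree rotations and flips
    (the dihedral group of the square, at most 8 variants)."""
    variants = []
    cur = img
    for _ in range(4):
        for candidate in (cur, flip_h(cur), flip_v(cur)):
            if candidate not in variants:
                variants.append(candidate)
        cur = rot90(cur)
    return variants
```

A fully asymmetric image yields eight variants, a fully symmetric one only itself, which is why augmenting rendered formulas multiplies the dataset size.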
Further, the method for searching the blank position in the typesetting template to randomly fill the chemical structural formula image and mark comprises the following steps:
and c-1, randomly taking out the generated chemical structural formula image, and placing the chemical structural formula image at a blank position outside the text area after random scaling to obtain a data part in the training set T-2, as shown in the attached figure 4.
c-2. Label, pixel by pixel, the positions occupied by the chemical structural formula image to obtain the label part of training set T-2, as shown in FIG. 5.
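Steps c-1 and c-2 can be sketched on a coarse occupancy grid: search for a blank patch outside the text areas, paste the formula there, and write class-1 labels pixel by pixel. The brute-force search and the 0/1 grid representation are illustrative assumptions, not the patent's actual algorithm:

```python
import random

def find_blank_position(occupied, h, w, rng=None):
    """Return a random (row, col) at which an h x w patch fits entirely on
    blank cells of the 0/1 occupancy grid, or None if no position fits."""
    rng = rng or random.Random(0)
    rows, cols = len(occupied), len(occupied[0])
    candidates = [(r, c)
                  for r in range(rows - h + 1)
                  for c in range(cols - w + 1)
                  if all(occupied[r + i][c + j] == 0
                         for i in range(h) for j in range(w))]
    return rng.choice(candidates) if candidates else None

def paste_formula(occupied, label, r, c, h, w):
    """Mark the patch as occupied and label its pixels as formula (class 1)."""
    for i in range(h):
        for j in range(w):
            occupied[r + i][c + j] = 1
            label[r + i][c + j] = 1
```

In a real pipeline the occupancy grid would come from binarizing the filled template page, and the randomly scaled formula image of step c-1 would be composited at (r, c) rather than merely marked.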
Further, the improved ResUNet neural network is implemented as:
taking the training set T as an input image of the improved ResUNet neural network, wherein the input image is 512 multiplied by 3, and outputting a feature map res-1 with the size of 256 multiplied by 64 after 7 multiplied by 7 convolution of a first layer; then, pooling by using the maximum value of 3 × 3, repeating the convolution for three times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 for 9 times, and outputting a feature map res-2 with the size of 128 × 128 × 256; then, after repeating convolution for 12 times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 four times, outputting a feature map res-3 with the size of 64 × 64 × 512, and then repeating convolution for 18 times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 six times, outputting a feature map res-4 with the size of 32 × 32 × 1024; then, after repeating convolution for three times with the size of 1 multiplied by 1, the size of 3 multiplied by 3 and the size of 1 multiplied by 1 for 9 times, outputting a characteristic diagram res-5 with the size of 16 multiplied by 2048; then carrying out convolution with the size of 1 multiplied by 1, and outputting a characteristic diagram conv-1 with the size of 16 multiplied by 1024; then 2 x2 upsampling is carried out, and the output characteristic diagram up-1 and the characteristic diagram res-4 are spliced to obtain a 32 x 2048 size characteristic diagram concat-1; then, carrying out convolution with the size of 3 multiplied by 3, and outputting a feature map conv-2 with the size of 32 multiplied by 512; then 2 x2 upsampling is carried out, and the output characteristic graph up-2 and the characteristic graph res-3 are spliced to obtain a characteristic graph concat-2 with the size of 64 x 1024; then, carrying out convolution with the size of 3 multiplied by 3 to output a feature map conv-3 with the size of 64 multiplied by 256; then 2 x2 upsampling is carried out, and the output characteristic graph 
up-3 and the characteristic graph res-2 are spliced to obtain a 128 x 512 size characteristic graph concat-3; then carrying out convolution with the size of 3 multiplied by 3 to output a feature map conv-4 with the size of 128 multiplied by 64; then 2 x2 upsampling is carried out, and the output characteristic diagram up-4 and the characteristic diagram res-1 are spliced to obtain a 256x 128 size characteristic diagram concat-4; then carrying out convolution with the size of 3 multiplied by 3 to output a characteristic diagram conv-5 with the size of 256 multiplied by 64; finally, after 2 × 2 upsampling and 1 × 1 size convolution, a 512 × 512 × 2 result graph corresponding to the size of the original input image is output.
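The feature-map sizes quoted above follow mechanically from one stride-2 stage per encoder level and one 2 × 2 upsampling per decoder level; a small bookkeeping sketch (channel counts taken from the text, everything else arithmetic) confirms they are mutually consistent:

```python
def encoder_shapes(size=512):
    """(spatial size, channels) of res-1 .. res-5: the stride-2 7x7 conv halves
    the input once, and each later stage (max pooling or a stride-2 bottleneck
    stage) halves it again."""
    channels = [64, 256, 512, 1024, 2048]
    shapes = [(size // 2, channels[0])]           # res-1
    for ch in channels[1:]:
        shapes.append((shapes[-1][0] // 2, ch))   # res-2 .. res-5
    return shapes

def concat_shapes(enc):
    """(spatial size, channels) of concat-1 .. concat-4: 2x2 upsampling restores
    the skip level's spatial size, and since each conv-i halves the channels,
    concatenation with the skip map doubles that level's channel count."""
    return [(s, ch * 2) for s, ch in reversed(enc[:-1])]
```

This reproduces res-1 through res-5 and concat-1 through concat-4 exactly as listed in the paragraph, under the stated assumption that each conv-1..conv-4 halves the channel count before upsampling.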
The invention has the following beneficial effects:
The invention proposes an improved ResUNet neural network, together with a method for automatically generating a large training set of chemical structural formulas, so that the network can segment chemical structural formulas and a large amount of data can be used to improve its recognition accuracy.
Drawings
FIG. 1 is a schematic flow chart of the improved ResUNet neural network of the present invention;
FIG. 2 is a schematic diagram of a manually annotated template sample of the present invention;
FIG. 3 is a schematic diagram of a template sample after random text filling according to the present invention;
FIG. 4 is a schematic diagram of a template sample after random filling of chemical structural formulas according to the present invention;
FIG. 5 is a schematic diagram of an example label corresponding to the template of the present invention.
Detailed Description
In order to describe the present invention more specifically, the method for segmenting a chemical structural formula based on a ResUNet neural network is described in detail below with reference to the accompanying drawings and specific embodiments.
A method for segmenting a chemical structural formula based on a ResUNet neural network comprises the following steps:
Step (1): construct a training set T consisting of a manually labeled training set T-1 and an automatically generated training set T-2. Chemical formulas manually labeled in publications form training set T-1, and the method for automatically generating a chemical structural formula training set produces training set T-2; the capacity ratio of T-1 to T-2 is 1:50.
the method for automatically generating the chemical structural formula training set is used for generating the training set based on the random filling of the images of the typesetting template, and the construction method comprises the following steps:
a. Construct typesetting templates and randomly generate text data in the character areas: manually annotate the character areas in 200 pages of publications, then rotate and flip the pages vertically and horizontally to augment the data, producing 1,000 pages of templates (a manually annotated template is shown in FIG. 2); use text collected from the internet and text produced by a random text generator as text data, and randomly fill it into the character areas of the templates (a generated result is shown in FIG. 3).
b. Generate a large number of chemical structural formula images: drawing on the 57 million molecules available in the PubChem database, randomly render part of the molecular data with the Indigo software into 3-channel, 256 × 256-pixel PNG images of various styles (bond line width, font size, etc.), then apply angle rotations and vertical and horizontal flips for data augmentation, generating 100,000 small-molecule chemical structural formula images.
c. Search for blank positions in the typesetting templates, randomly fill in chemical structural formula images, and label them: randomly take a generated chemical structural formula image, scale it randomly, and place it at a blank position outside the text areas, as shown in FIG. 4; label, pixel by pixel, the positions occupied by the chemical structural formula image, as shown in FIG. 5.
Step (2): construct the improved ResUNet neural network shown in FIG. 1, feed the training data set into it, and train until a specified number of iterations is reached or the loss curve no longer decreases and the accuracy no longer improves; save the trained model.
further, the improved ResUNet neural network comprises the following steps: taking the training set T as an input image of the improved ResUNet neural network, wherein the input image is 512 multiplied by 3, and outputting a feature map res-1 with the size of 256 multiplied by 64 after 7 multiplied by 7 convolution of a first layer; then, pooling by using the maximum value of 3 × 3, repeating the convolution for three times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 for 9 times, and outputting a feature map res-2 with the size of 128 × 128 × 256; then, after repeating convolution for 12 times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 four times, outputting a feature map res-3 with the size of 64 × 64 × 512, and then repeating convolution for 18 times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 six times, outputting a feature map res-4 with the size of 32 × 32 × 1024; then, after repeating convolution for 9 times with the size of 1 multiplied by 1, the size of 3 multiplied by 3 and the size of 1 multiplied by 1 for three times, outputting a size characteristic diagram res-5 of 16 multiplied by 2048; then carrying out convolution with the size of 1 multiplied by 1, and outputting a characteristic diagram conv-1 with the size of 16 multiplied by 1024; then 2 x2 upsampling is carried out, and the output characteristic diagram up-1 and the characteristic diagram res-4 are spliced to obtain a 32 x 2048 size characteristic diagram concat-1; then, carrying out convolution with the size of 3 multiplied by 3, and outputting a feature map conv-2 with the size of 32 multiplied by 512; then 2 x2 upsampling is carried out, and the output characteristic graph up-2 and the characteristic graph res-3 are spliced to obtain a characteristic graph concat-2 with the size of 64 x 1024; then, carrying out convolution with the size of 3 multiplied by 3 to output a feature map conv-3 with the size of 64 multiplied by 256; then 
2 x2 upsampling is carried out, and the output characteristic graph up-3 and the characteristic graph res-2 are spliced to obtain a 128 x 512 size characteristic graph concat-3; then carrying out convolution with the size of 3 multiplied by 3 to output a feature map conv-4 with the size of 128 multiplied by 64; then 2 x2 upsampling is carried out, and the output characteristic diagram up-4 and the characteristic diagram res-1 are spliced to obtain a 256x 128 size characteristic diagram concat-4; then carrying out convolution with the size of 3 multiplied by 3 to output a characteristic diagram conv-5 with the size of 256 multiplied by 64; finally, after 2 × 2 upsampling and 1 × 1 size convolution, a 512 × 512 × 2 result graph corresponding to the size of the original input image is output.
The improved ResUNet neural network was constructed according to the layer table of the original filing (the table appears there as image figures BDA0002496323150000071 and BDA0002496323150000081, not reproduced here).
Step (3): perform segmentation with the neural network trained in step (2) to obtain the segmentation result.

Claims (6)

1. A method for segmenting a chemical structural formula based on a ResUNet neural network is characterized by comprising the following steps:
constructing a training set T, wherein the training set T comprises a manual labeling training set T-1 and an automatic generation training set T-2;
step (2): feeding the training set T into the ResUNet neural network and training until a specified number of iterations is reached or the loss curve no longer decreases and the accuracy no longer improves, and saving the trained ResUNet neural network model;
step (3) segmenting the chemical structural formula by using the ResUNet neural network model trained in the step (2);
the training set T-2 is generated by image random filling based on typesetting template through a method for automatically generating a chemical structural formula training set, and the construction method comprises the following steps:
a. constructing typesetting templates and randomly generating text data in their character areas;
b. generating a plurality of chemical structural formula images;
c. searching for blank positions in the typesetting templates, randomly filling in chemical structural formula images, and labeling them.
2. The method for segmenting a chemical structural formula based on a ResUNet neural network as claimed in claim 1, wherein chemical formulas manually labeled in publications are used as training set T-1, and the capacity ratio of training set T-1 to training set T-2 is 1:50.
3. the method for segmenting chemical structural formulas based on ResUNet neural network as claimed in claim 1 or 2, wherein the method for constructing typeset templates comprises the following steps:
a-1. manually annotating the character areas in 200 pages of publications, then rotating and flipping the pages vertically and horizontally to augment the data, producing 1,000 pages of typesetting templates;
a-2. using text collected from the internet and text produced by a random text generator as text data, and randomly filling it into the character areas of the typesetting templates.
4. The method for segmenting chemical structural formulas based on ResUNet neural network as claimed in claim 3, wherein the method for generating a plurality of chemical structural formula images comprises the following steps:
b-1. drawing on the 57 million molecules available in the PubChem database, randomly rendering molecules with the Indigo software into various styles of 3-channel, 256 × 256-pixel PNG images;
b-2. applying angle rotations and vertical and horizontal flips to these images for data augmentation, generating 100,000 small-molecule chemical structural formula images.
5. The method of claim 4, wherein the method of searching for the blank position in the typesetting template to randomly fill and mark the chemical structural formula image comprises the following steps:
c-1. randomly taking a generated chemical structural formula image, scaling it randomly, and placing it at a blank position outside the text areas to obtain the data part of training set T-2;
c-2. labeling, pixel by pixel, the positions occupied by the chemical structural formula image to obtain the label part of training set T-2.
6. The method for segmenting a chemical structural formula based on a ResUNet neural network as claimed in claim 5, wherein the ResUNet neural network is an improved ResUNet neural network, implemented as follows:
taking the training set T as an input image of the improved ResUNet neural network, wherein the input image is 512 multiplied by 3, and outputting a feature map res-1 with the size of 256 multiplied by 64 after 7 multiplied by 7 convolution of a first layer; then, pooling by using the maximum value of 3 × 3, repeating the convolution for three times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 for 9 times, and outputting a feature map res-2 with the size of 128 × 128 × 256; then, after repeating convolution for 12 times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 four times, outputting a feature map res-3 with the size of 64 × 64 × 512, and then repeating convolution for 18 times with the size of 1 × 1, the size of 3 × 3 and the size of 1 × 1 six times, outputting a feature map res-4 with the size of 32 × 32 × 1024; then, after repeating convolution for three times with the size of 1 multiplied by 1, the size of 3 multiplied by 3 and the size of 1 multiplied by 1 for 9 times, outputting a size characteristic diagram res-5 of 16 multiplied by 2048; then carrying out convolution with the size of 1 multiplied by 1, and outputting a characteristic diagram conv-1 with the size of 16 multiplied by 1024; then 2 x2 upsampling is carried out, and the output characteristic diagram up-1 and the characteristic diagram res-4 are spliced to obtain a 32 x 2048 size characteristic diagram concat-1; then, carrying out convolution with the size of 3 multiplied by 3, and outputting a feature map conv-2 with the size of 32 multiplied by 512; then 2 x2 upsampling is carried out, and the output characteristic graph up-2 and the characteristic graph res-3 are spliced to obtain a characteristic graph concat-2 with the size of 64 x 1024; then, carrying out convolution with the size of 3 multiplied by 3 to output a feature map conv-3 with the size of 64 multiplied by 256; then 2 x2 upsampling is carried out, and the output characteristic graph up-3 and 
the characteristic graph res-2 are spliced to obtain a 128 x 512 size characteristic graph concat-3; then carrying out convolution with the size of 3 multiplied by 3 to output a feature map conv-4 with the size of 128 multiplied by 64; then 2 x2 upsampling is carried out, and the output characteristic diagram up-4 and the characteristic diagram res-1 are spliced to obtain a 256x 128 size characteristic diagram concat-4; then carrying out convolution with the size of 3 multiplied by 3 to output a characteristic diagram conv-5 with the size of 256 multiplied by 64; finally, after 2 × 2 upsampling and 1 × 1 size convolution, a 512 × 512 × 2 result graph corresponding to the size of the original input image is output.
CN202010419502.3A 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network Active CN111709293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010419502.3A CN111709293B (en) 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010419502.3A CN111709293B (en) 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network

Publications (2)

Publication Number Publication Date
CN111709293A true CN111709293A (en) 2020-09-25
CN111709293B CN111709293B (en) 2023-10-03

Family

ID=72538017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010419502.3A Active CN111709293B (en) 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network

Country Status (1)

Country Link
CN (1) CN111709293B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241505A (en) * 2021-12-20 2022-03-25 苏州阿尔脉生物科技有限公司 Method and device for extracting chemical structure image, storage medium and electronic equipment
CN114842486A (en) * 2022-07-04 2022-08-02 南昌大学 Handwritten chemical structural formula recognition method, system, storage medium and equipment

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1381795A (en) * 2001-04-18 2002-11-27 无敌科技(西安)有限公司 Automatic format setting method for palm-type browser
CN101561805A (en) * 2008-04-18 2009-10-21 日电(中国)有限公司 Document classifier generation method and system
WO2018138104A1 (en) * 2017-01-27 2018-08-02 Agfa Healthcare Multi-class image segmentation method
CN109087306A (en) * 2018-06-28 2018-12-25 众安信息技术服务有限公司 Arteries iconic model training method, dividing method, device and electronic equipment
CN109118491A (en) * 2018-07-30 2019-01-01 深圳先进技术研究院 A kind of image partition method based on deep learning, system and electronic equipment
CN109191476A (en) * 2018-09-10 2019-01-11 重庆邮电大学 The automatic segmentation of Biomedical Image based on U-net network structure
WO2019015785A1 (en) * 2017-07-21 2019-01-24 Toyota Motor Europe Method and system for training a neural network to be used for semantic instance segmentation
CN109658422A (en) * 2018-12-04 2019-04-19 大连理工大学 A kind of retinal images blood vessel segmentation method based on multiple dimensioned deep supervision network
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
US20190205606A1 (en) * 2016-07-21 2019-07-04 Siemens Healthcare Gmbh Method and system for artificial intelligence based medical image segmentation
CN110210362A (en) * 2019-05-27 2019-09-06 中国科学技术大学 A kind of method for traffic sign detection based on convolutional neural networks
CN110705459A (en) * 2019-09-29 2020-01-17 北京爱学习博乐教育科技有限公司 Automatic identification method and device for mathematical and chemical formulas and model training method and device
US20200057917A1 (en) * 2018-08-17 2020-02-20 Shenzhen Dorabot Inc. Object Location Method, Device and Storage Medium Based on Image Segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU YUHAI: "Research on Key Technologies of Image Pattern Recognition for Medical Literature", no. 2 *

Also Published As

Publication number Publication date
CN111709293B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
JP2022541199A (en) A system and method for inserting data into a structured database based on image representations of data tables.
Li et al. Tablebank: A benchmark dataset for table detection and recognition
CN114005123A (en) System and method for digitally reconstructing layout of print form text
CN108090400A (en) A kind of method and apparatus of image text identification
Clausner et al. Efficient and effective OCR engine training
CN113869017B (en) Table image reconstruction method, device, equipment and medium based on artificial intelligence
CN111709293B (en) Chemical structural formula segmentation method based on Resunet neural network
US9159147B2 (en) Method and apparatus for personalized handwriting avatar
CN114005126A (en) Table reconstruction method and device, computer equipment and readable storage medium
CN112560849A (en) Neural network algorithm-based grammar segmentation method and system
CN109685061A (en) The recognition methods of mathematical formulae suitable for structuring
Godfrey et al. An adaptable approach for generating vector features from scanned historical thematic maps using image enhancement and remote sensing techniques in a geographic information system
Vafaie et al. Handwritten and printed text identification in historical archival documents
CN116610304B (en) Page code generation method, device, equipment and storage medium
CN116721713B (en) Data set construction method and device oriented to chemical structural formula identification
Aswatha et al. A method for extracting text from stone inscriptions using character spotting
CN109410662B (en) Method and device for manufacturing Chinese character multimedia card
CN115019310B (en) Image-text identification method and equipment
CN111026899A (en) Product generation method based on deep learning
CN116341489A (en) Text information reading method, device and terminal
CN114913382A (en) Aerial photography scene classification method based on CBAM-AlexNet convolutional neural network
CN114565749A (en) Method and system for identifying key content of visa document of power construction site
CN114861595A (en) Vector line transformation-based individual font generation method
Scius-Bertrand et al. Annotation-free keyword spotting in historical Vietnamese manuscripts using graph matching
Hamplová et al. Character Segmentation in the Development of Palmyrene Aramaic OCR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant