CN111709293B - Chemical structural formula segmentation method based on Resunet neural network - Google Patents

Chemical structural formula segmentation method based on Resunet neural network

Info

Publication number
CN111709293B
CN111709293B (application CN202010419502.3A)
Authority
CN
China
Prior art keywords
size
structural formula
training set
neural network
chemical structural
Prior art date
Legal status
Active
Application number
CN202010419502.3A
Other languages
Chinese (zh)
Other versions
CN111709293A (en)
Inventor
王毅刚
邵锦涛
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202010419502.3A
Publication of CN111709293A
Application granted
Publication of CN111709293B

Classifications

    • G06V30/413: Classification of content, e.g. text, photographs or tables (character recognition; document-oriented image-based pattern recognition; analysis of document content)
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (pattern recognition)
    • G06N3/045: Combinations of networks (computing arrangements based on biological models; neural networks; architecture)
    • G06N3/08: Learning methods (neural networks)
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds (image preprocessing)
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images (scene-specific elements)
    • G06V30/10: Character recognition
    • Y02P90/30: Computing systems specially adapted for manufacturing (climate change mitigation technologies in the production or processing of goods)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a chemical structural formula segmentation method based on a Resunet neural network. The method comprises the following steps: step (1), constructing a training set T, wherein the training set T comprises a manually labeled training set T-1 and an automatically generated training set T-2; step (2), feeding the training set T into the Resunet neural network for training, and saving the trained Resunet neural network model once the specified number of training iterations is reached or the loss curve no longer decreases and the accuracy no longer improves; step (3), segmenting chemical structural formulas using the Resunet neural network model trained in step (2). Building on the Resunet neural network, the invention provides an improved Resunet neural network together with a method for automatically generating a large number of chemical structural formula training samples, so that the Resunet neural network can segment chemical structural formulas and a large amount of data can improve the recognition accuracy of the neural network.

Description

Chemical structural formula segmentation method based on Resunet neural network
Technical Field
The invention belongs to the technical field of computer detection, and particularly relates to a chemical structural formula segmentation method based on a Resunet neural network.
Background
A critical part of scientific experimentation is the rapid processing and absorption of newly acquired data. Moreover, new research can also draw on the collection, analysis and utilization of previously published experimental data. This is particularly true for small-molecule drug discovery, where experimentally tested molecule sets are used for virtual screening programs, quantitative structure-activity/property relationship (QSAR/QSPR) analysis, or the validation of physics-based modeling methods. Because generating large amounts of experimental data is difficult and expensive, many drug discovery projects are forced to rely on relatively small internal experimental databases. One promising way to address the general lack of adequate training data in drug discovery is to exploit the data that has already been published. Medline reports that more than 2,000 new life-science papers are published every day; given that new experimental data enters the public literature at such a pace, it is increasingly important to solve the problems of data extraction and management and to automate these processes as much as possible. Extracting chemical structures from published sources in the life sciences, such as journal articles and patent documents, remains difficult and time-consuming.
Currently, a large number of books and other publications are available only in paper or scanned form, which makes reuse difficult. On the one hand, paper or scanned materials are hard to search, so information dispersed across a large number of documents is not easily found and is under-utilized. On the other hand, further processing of these materials involves tedious and error-prone re-entry work.
Research on chemical structural formula recognition has progressed slowly, mainly for two reasons: first, formulas are surrounded by natural language in documents, which makes them difficult to locate; second, chemical structural formulas have complex structures, with characters of many kinds, fonts and sizes, and exhibit irregular, logical and complex characteristics.
Existing chemical structural formula recognition methods consist of two steps: (1) locating and segmenting the chemical structural formula out of the natural language; (2) feeding the segmented chemical structural formulas into a recognition engine. Current chemical structural formula segmentation methods are mostly based on traditional image processing, have low segmentation accuracy, and cannot handle special cases such as natural language lying very close to the chemical molecular formula.
Disclosure of Invention
Accordingly, to improve the accuracy of locating and segmenting chemical structural formulas, the invention provides an improved Resunet neural network based on the Resunet neural network, together with a method for automatically generating a large number of chemical structural formula training samples, so that the Resunet neural network can segment chemical structural formulas and a large amount of data can improve the recognition accuracy of the neural network.
A chemical structural formula segmentation method based on a Resunet neural network comprises the following steps:
Step (1): construct a training set T, wherein the training set T comprises a manually labeled training set T-1 and an automatically generated training set T-2. Chemical formulas in publications are manually labeled to form the training set T-1, and the training set T-2 is generated using the method for automatically generating a chemical structural formula training set; the capacity ratio of the training set T-1 to the training set T-2 is 1:50;
Step (2): feed the training set T into the improved Resunet neural network for training, and save the trained Resunet neural network model once the specified number of training iterations is reached or the loss curve no longer decreases and the accuracy no longer improves;
Step (3): segment chemical structural formulas using the Resunet neural network model trained in step (2).
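The stopping rule in step (2) (stop after a specified number of training iterations, or once the loss curve no longer decreases and the accuracy no longer improves) can be sketched as a small helper. This is an illustrative reading, not code from the patent; `patience` and `eps` are assumed hyperparameters:

```python
def should_stop(losses, accuracies, max_iters, patience=5, eps=1e-4):
    """Decide whether training should stop: either the iteration budget is
    reached, or the loss has not decreased AND the accuracy has not improved
    over the last `patience` epochs (within tolerance `eps`)."""
    if len(losses) >= max_iters:
        return True
    if len(losses) <= patience:
        return False
    # Compare the best recent values against the best earlier values.
    loss_stalled = min(losses[-patience:]) > min(losses[:-patience]) - eps
    acc_stalled = max(accuracies[-patience:]) < max(accuracies[:-patience]) + eps
    return loss_stalled and acc_stalled
```

Training would call this once per epoch with the loss and accuracy histories and break out of the loop when it returns `True`.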
Further, the method for automatically generating the chemical structural formula training set produces the training set by randomly filling images into typesetting templates; the construction method comprises the following steps:
a. and constructing a typesetting template, and randomly generating text data in the text area.
b. A large number of chemical structural images are generated.
c. And searching blank positions in the typesetting template, randomly filling the chemical structural formula image formula, and marking.
Further, the method for constructing the typesetting template comprises the following steps:
a-1. Manually calibrate the text areas in 200 pages of publications and expand the data by rotation and up-down and left-right flipping, generating a total of 1000 pages of typesetting templates; a manually labeled typesetting template is shown in figure 2.
a-2. Use Internet text and text produced by a random text generator as text data, and randomly fill it into the text areas of the typesetting template; the generated result is shown in figure 3.
Further, the method for generating a large number of chemical structural formula images comprises the following steps:
b-1. From the 57 million molecular entries available in the PubChem database, part of the molecular data is randomly rendered with the Indigo software into 256×256-pixel 3-channel PNG format images of various styles (bond width, character size, etc.).
b-2. Expand the data by rotating the images and flipping them up-down and left-right, generating 100,000 small-molecule chemical structural formula images.
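The rotation and flip expansion in b-2 can be sketched with NumPy array operations. This is a minimal sketch assuming the rendered 256×256 PNG images have already been loaded as arrays; the exact set of rotation angles kept is not specified in the patent:

```python
import numpy as np

def augment(image):
    """Expand one chemical-structure image into rotated and flipped variants,
    mirroring the rotation and up-down/left-right flipping described in b-2."""
    variants = [image]
    for k in (1, 2, 3):                  # 90/180/270 degree rotations
        variants.append(np.rot90(image, k))
    variants.append(np.flipud(image))    # up-down flip
    variants.append(np.fliplr(image))    # left-right flip
    return variants
```

Applied to each rendered PubChem image, this multiplies the dataset size several-fold without re-rendering.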
Further, the method for finding blank positions in the typesetting template, randomly filling in the chemical structural formula images and labeling them comprises the following steps:
c-1. Randomly take out a generated chemical structural formula image and, after random scaling, place it at a blank position outside the text areas, obtaining the data part of the training set T-2, as shown in figure 4.
c-2. Label, pixel by pixel, the positions occupied by the chemical structural formula image, obtaining the label part of the training set T-2, as shown in figure 5.
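Steps c-1 and c-2 can be sketched together: paste a scaled formula image into a blank region of the template page and record, pixel by pixel, which positions it occupies. A minimal sketch, assuming images are NumPy arrays and the blank position (`top`, `left`) has already been chosen:

```python
import numpy as np

def paste_and_label(page, formula, top, left):
    """Paste a formula image into a blank page region and build the
    pixel-wise label mask described in c-1/c-2 (1 = formula pixel)."""
    h, w = formula.shape[:2]
    data = page.copy()
    data[top:top + h, left:left + w] = formula       # data part of T-2
    label = np.zeros(page.shape[:2], dtype=np.uint8)
    label[top:top + h, left:left + w] = 1            # label part of T-2
    return data, label
```

Repeating this for several formulas per page yields the image/mask pairs that the network is trained on.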
Further, the improved ResUNet neural network is implemented as:
The training set T serves as the input image of the improved Resunet neural network, with input size 512×512×3. The first layer applies a 7×7 convolution and outputs a feature map res-1 of size 256×256×64. A 3×3 max pooling follows, then three repetitions of [1×1, 3×3, 1×1] convolutions (9 convolutions in total) output a feature map res-2 of size 128×128×256. Four repetitions of [1×1, 3×3, 1×1] convolutions (12 in total) output a feature map res-3 of size 64×64×512, and six repetitions (18 in total) output a feature map res-4 of size 32×32×1024. Three further repetitions (9 in total) output a feature map res-5 of size 16×16×2048. A 1×1 convolution then outputs a feature map conv-1 of size 16×16×1024. Next, 2×2 up-sampling produces a feature map up-1, which is concatenated with res-4 to obtain a feature map concat-1 of size 32×32×2048; a 3×3 convolution outputs a feature map conv-2 of size 32×32×512. Another 2×2 up-sampling produces up-2, which is concatenated with res-3 to obtain concat-2 of size 64×64×1024; a 3×3 convolution outputs conv-3 of size 64×64×256. Another 2×2 up-sampling produces up-3, which is concatenated with res-2 to obtain concat-3 of size 128×128×512; a 3×3 convolution outputs conv-4 of size 128×128×64. A final 2×2 up-sampling produces up-4, which is concatenated with res-1 to obtain concat-4 of size 256×256×128; a 3×3 convolution outputs conv-5 of size 256×256×64. Finally, 2×2 up-sampling and a 1×1 convolution output a 512×512×2 result map matching the original input image size.
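The sizes above follow a ResNet-50-style encoder and a U-Net-style decoder. The size bookkeeping can be checked with a small function; this traces spatial sizes and channel counts only and is not an implementation of the network:

```python
def trace_shapes(size=512):
    """Trace the feature-map sizes of the improved Resunet described above:
    a ResNet-50-style encoder halving the spatial size at each stage, then a
    decoder doubling it back up via 2x2 up-sampling and skip concatenations."""
    shapes = {}
    s = size // 2                 # first-layer 7x7 stride-2 convolution
    shapes["res-1"] = (s, s, 64)
    s //= 2                       # 3x3 max pooling
    shapes["res-2"] = (s, s, 256)
    for name, ch in (("res-3", 512), ("res-4", 1024), ("res-5", 2048)):
        s //= 2                   # each further bottleneck stage halves size
        shapes[name] = (s, s, ch)
    shapes["conv-1"] = (s, s, 1024)   # 1x1 convolution at the bottleneck
    for name, ch in (("conv-2", 512), ("conv-3", 256),
                     ("conv-4", 64), ("conv-5", 64)):
        s *= 2                    # 2x2 up-sampling + skip concat + 3x3 conv
        shapes[name] = (s, s, ch)
    return shapes
```

Running `trace_shapes(512)` reproduces every size quoted in the description, e.g. res-5 at 16×16×2048 and conv-5 at 256×256×64, before the final up-sampling restores the 512×512 input resolution.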
The invention has the following beneficial effects:
Building on the Resunet neural network, the invention provides an improved Resunet neural network together with a method for automatically generating a large number of chemical structural formula training samples, so that the Resunet neural network can segment chemical structural formulas and a large amount of data can improve the recognition accuracy of the neural network.
Drawings
FIG. 1 is a schematic flow diagram of a modified Resunet neural network of the present invention;
FIG. 2 is a schematic illustration of a manually labeled template sample according to the present invention;
FIG. 3 is a schematic representation of a template sample after random text population in accordance with the present invention;
FIG. 4 is a schematic representation of a template sample after random filling of chemical formulas in accordance with the present invention;
FIG. 5 is a schematic diagram of a corresponding label sample of the template of the present invention.
Detailed Description
In order to more specifically describe the present invention, a chemical structural formula segmentation method based on a Resunet neural network according to the present invention is described in detail below with reference to the accompanying drawings and the detailed description.
A chemical structural formula segmentation method based on a Resunet neural network comprises the following steps:
Step (1): construct a training set T, wherein the training set T comprises a manually labeled training set T-1 and an automatically generated training set T-2. Chemical formulas in publications are manually labeled to form the training set T-1, and the training set T-2 is generated using the method for automatically generating a chemical structural formula training set; the capacity ratio of the training set T-1 to the training set T-2 is 1:50;
the method for automatically generating the chemical structural training set is used for generating the training set based on the random filling of the images of the typesetting templates, and the construction method comprises the following steps:
a. Construct a typesetting template and randomly generate text data in its text areas: manually calibrate the text areas in 200 pages of publications and expand the data by rotation and up-down and left-right flipping, generating a total of 1000 pages of templates (a manually labeled template is shown in figure 2); then use Internet text and text produced by a random text generator as text data and randomly fill it into the text areas of the typesetting template (the generated result is shown in figure 3).
b. Generate a large number of chemical structural formula images: from the 57 million molecular entries available in the PubChem database, randomly render part of the molecular data with the Indigo software into 256×256-pixel 3-channel PNG format images of various styles (bond width, character size, etc.), then expand the data by rotating the images and flipping them up-down and left-right, generating 100,000 small-molecule chemical structural formula images.
c. Find blank positions in the typesetting template, randomly fill in chemical structural formulas, and label them: randomly take out a generated chemical structural formula image and, after random scaling, place it at a blank position outside the text areas, as shown in figure 4. Label, pixel by pixel, the positions occupied by the chemical structural formula image, as shown in figure 5.
Step (2): construct the improved Resunet neural network shown in fig. 1, feed the training data set into it for training, and save the trained model once the specified number of training iterations is reached or the loss curve no longer decreases and the accuracy no longer improves;
Further, the improved Resunet neural network is implemented as follows: the training set T serves as the input image of the improved Resunet neural network, with input size 512×512×3. The first layer applies a 7×7 convolution and outputs a feature map res-1 of size 256×256×64. A 3×3 max pooling follows, then three repetitions of [1×1, 3×3, 1×1] convolutions (9 convolutions in total) output a feature map res-2 of size 128×128×256. Four repetitions of [1×1, 3×3, 1×1] convolutions (12 in total) output a feature map res-3 of size 64×64×512, and six repetitions (18 in total) output a feature map res-4 of size 32×32×1024. Three further repetitions (9 in total) output a feature map res-5 of size 16×16×2048. A 1×1 convolution then outputs a feature map conv-1 of size 16×16×1024. Next, 2×2 up-sampling produces a feature map up-1, which is concatenated with res-4 to obtain a feature map concat-1 of size 32×32×2048; a 3×3 convolution outputs a feature map conv-2 of size 32×32×512. Another 2×2 up-sampling produces up-2, which is concatenated with res-3 to obtain concat-2 of size 64×64×1024; a 3×3 convolution outputs conv-3 of size 64×64×256. Another 2×2 up-sampling produces up-3, which is concatenated with res-2 to obtain concat-3 of size 128×128×512; a 3×3 convolution outputs conv-4 of size 128×128×64. A final 2×2 up-sampling produces up-4, which is concatenated with res-1 to obtain concat-4 of size 256×256×128; a 3×3 convolution outputs conv-5 of size 256×256×64. Finally, 2×2 up-sampling and a 1×1 convolution output a 512×512×2 result map matching the original input image size.
An improved ResUNet neural network was constructed according to the following table:
Step (3): segment using the neural network trained in step (2) to obtain the segmentation results.
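The 512×512×2 result map of step (3) can be reduced to a per-pixel segmentation mask by taking an argmax over the two channels. A minimal sketch, assuming (the patent does not state this) that channel 1 scores "chemical structural formula" pixels and channel 0 scores background:

```python
import numpy as np

def result_to_mask(result):
    """Turn an H x W x 2 result map into a binary segmentation mask by
    picking, for each pixel, the channel with the higher score."""
    return np.argmax(result, axis=-1).astype(np.uint8)
```

The resulting mask marks the regions that would then be cropped out and passed to a chemical structure recognition engine.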

Claims (1)

1. A chemical structural formula segmentation method based on the Resunet neural network, characterized by comprising the following steps:
constructing a training set T, wherein the training set T comprises a manual labeling training set T-1 and an automatic generating training set T-2;
step (2), feeding the training set T into the Resunet neural network for training, and saving the trained Resunet neural network model once the specified number of training iterations is reached or the loss curve no longer decreases and the accuracy no longer improves;
step (3), segmenting the chemical structural formula by using the Resunet neural network model trained in the step (2);
the training set T-2 is generated by a method for automatically generating a chemical structural formula training set and randomly filling images based on typesetting templates, and the construction method comprises the following steps:
a. constructing a typesetting template, and randomly generating text data in a text area;
b. generating a plurality of chemical structural formula images;
c. searching blank positions in the typesetting template, randomly filling chemical structural formula images and marking;
taking the manually labeled chemical structural formulas in publications as the training set T-1, wherein the capacity ratio of the training set T-1 to the training set T-2 is 1:50;
the method for constructing the typesetting template comprises the following steps:
a-1, manually calibrating the text areas in 200 pages of publications and expanding the data by rotation and up-down and left-right flipping, generating a total of 1000 pages of typesetting templates;
a-2, taking Internet text and text produced by a random text generator as text data, and randomly filling the text data into the text areas of the typesetting template;
the method for generating a large number of chemical structural formula images comprises the following steps:
b-1, randomly rendering, using the Indigo software, part of the molecular data among the 57 million molecular entries available in the PubChem database into various styles of 256×256-pixel 3-channel PNG format images;
b-2, expanding the data by rotating the images and flipping them up-down and left-right, generating 100,000 small-molecule chemical structural formula images;
the method for searching blank positions in the typesetting template to randomly fill chemical structural formula images and mark comprises the following steps:
c-1, randomly taking out a generated chemical structural formula image and, after random scaling, placing it at a blank position outside the text areas to obtain the data part of the training set T-2;
c-2, labeling, pixel by pixel, the positions occupied by the chemical structural formula image to obtain the label part of the training set T-2;
the Resunet neural network is an improved Resunet neural network, which is realized by the following steps:
the training set T serves as the input image of the improved Resunet neural network, with input size 512×512×3. The first layer applies a 7×7 convolution and outputs a feature map res-1 of size 256×256×64. A 3×3 max pooling follows, then three repetitions of [1×1, 3×3, 1×1] convolutions (9 convolutions in total) output a feature map res-2 of size 128×128×256. Four repetitions of [1×1, 3×3, 1×1] convolutions (12 in total) output a feature map res-3 of size 64×64×512, and six repetitions (18 in total) output a feature map res-4 of size 32×32×1024. Three further repetitions (9 in total) output a feature map res-5 of size 16×16×2048. A 1×1 convolution then outputs a feature map conv-1 of size 16×16×1024. Next, 2×2 up-sampling produces a feature map up-1, which is concatenated with res-4 to obtain a feature map concat-1 of size 32×32×2048; a 3×3 convolution outputs a feature map conv-2 of size 32×32×512. Another 2×2 up-sampling produces up-2, which is concatenated with res-3 to obtain concat-2 of size 64×64×1024; a 3×3 convolution outputs conv-3 of size 64×64×256. Another 2×2 up-sampling produces up-3, which is concatenated with res-2 to obtain concat-3 of size 128×128×512; a 3×3 convolution outputs conv-4 of size 128×128×64. A final 2×2 up-sampling produces up-4, which is concatenated with res-1 to obtain concat-4 of size 256×256×128; a 3×3 convolution outputs conv-5 of size 256×256×64. Finally, 2×2 up-sampling and a 1×1 convolution output a 512×512×2 result map matching the original input image size.
CN202010419502.3A 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network Active CN111709293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010419502.3A CN111709293B (en) 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010419502.3A CN111709293B (en) 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network

Publications (2)

Publication Number Publication Date
CN111709293A CN111709293A (en) 2020-09-25
CN111709293B (en) 2023-10-03

Family

ID=72538017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010419502.3A Active CN111709293B (en) 2020-05-18 2020-05-18 Chemical structural formula segmentation method based on Resunet neural network

Country Status (1)

Country Link
CN (1) CN111709293B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241505B (en) * 2021-12-20 2023-04-07 苏州阿尔脉生物科技有限公司 Method and device for extracting chemical structure image, storage medium and electronic equipment
CN114842486A (en) * 2022-07-04 2022-08-02 南昌大学 Handwritten chemical structural formula recognition method, system, storage medium and equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1381795A (en) * 2001-04-18 2002-11-27 无敌科技(西安)有限公司 Automatic format setting method for palm-type browser
CN101561805A (en) * 2008-04-18 2009-10-21 日电(中国)有限公司 Document classifier generation method and system
WO2018138104A1 (en) * 2017-01-27 2018-08-02 Agfa Healthcare Multi-class image segmentation method
CN109087306A (en) * 2018-06-28 2018-12-25 众安信息技术服务有限公司 Arteries iconic model training method, dividing method, device and electronic equipment
CN109118491A (en) * 2018-07-30 2019-01-01 深圳先进技术研究院 A kind of image partition method based on deep learning, system and electronic equipment
CN109191476A (en) * 2018-09-10 2019-01-11 重庆邮电大学 The automatic segmentation of Biomedical Image based on U-net network structure
WO2019015785A1 (en) * 2017-07-21 2019-01-24 Toyota Motor Europe Method and system for training a neural network to be used for semantic instance segmentation
CN109658422A (en) * 2018-12-04 2019-04-19 大连理工大学 A kind of retinal images blood vessel segmentation method based on multiple dimensioned deep supervision network
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 A kind of three-dimensional image segmentation method and system based on full convolutional neural networks
CN110210362A (en) * 2019-05-27 2019-09-06 中国科学技术大学 A kind of method for traffic sign detection based on convolutional neural networks
CN110705459A (en) * 2019-09-29 2020-01-17 北京爱学习博乐教育科技有限公司 Automatic identification method and device for mathematical and chemical formulas and model training method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109690554B (en) * 2016-07-21 2023-12-05 西门子保健有限责任公司 Method and system for artificial intelligence based medical image segmentation
CN109102543B (en) * 2018-08-17 2021-04-02 深圳蓝胖子机器智能有限公司 Object positioning method, device and storage medium based on image segmentation

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1381795A (en) * 2001-04-18 2002-11-27 无敌科技(西安)有限公司 Automatic format setting method for palm-type browser
CN101561805A (en) * 2008-04-18 2009-10-21 日电(中国)有限公司 Document classifier generation method and system
WO2018138104A1 (en) * 2017-01-27 2018-08-02 Agfa Healthcare Multi-class image segmentation method
WO2019015785A1 (en) * 2017-07-21 2019-01-24 Toyota Motor Europe Method and system for training a neural network to be used for semantic instance segmentation
CN109087306A (en) * 2018-06-28 2018-12-25 众安信息技术服务有限公司 Arterial vessel image model training method, segmentation method, device and electronic device
CN109118491A (en) * 2018-07-30 2019-01-01 深圳先进技术研究院 Image segmentation method, system and electronic device based on deep learning
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109191476A (en) * 2018-09-10 2019-01-11 重庆邮电大学 Automatic segmentation of biomedical images based on the U-net network structure
CN109658422A (en) * 2018-12-04 2019-04-19 大连理工大学 Retinal image blood vessel segmentation method based on a multi-scale deeply supervised network
CN109903292A (en) * 2019-01-24 2019-06-18 西安交通大学 Three-dimensional image segmentation method and system based on fully convolutional neural networks
CN110210362A (en) * 2019-05-27 2019-09-06 中国科学技术大学 Traffic sign detection method based on convolutional neural networks
CN110705459A (en) * 2019-09-29 2020-01-17 北京爱学习博乐教育科技有限公司 Automatic identification method and device for mathematical and chemical formulas and model training method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Yuhai. Research on key technologies of image pattern recognition for medical literature. Information Science and Technology. 2019, (2), full text. *

Also Published As

Publication number Publication date
CN111709293A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
Li et al. Tablebank: Table benchmark for image-based table detection and recognition
CN101253514B (en) Grammatical parsing of document visual structures
Li et al. Tablebank: A benchmark dataset for table detection and recognition
Rebelo et al. Optical music recognition: state-of-the-art and open issues
Shahab et al. An open approach towards the benchmarking of table structure recognition systems
JP3086702B2 (en) Method for identifying text or line figure and digital processing system
JP3822277B2 (en) Character template set learning machine operation method
CN108090400B (en) Image text recognition method and device
CN111709293B (en) Chemical structural formula segmentation method based on Resunet neural network
Clausner et al. Efficient and effective OCR engine training
EP1457917A2 (en) Apparatus and methods for converting network drawings from raster format to vector format
CN114005123A (en) System and method for digitally reconstructing layout of print form text
Karasneh et al. Img2uml: A system for extracting uml models from images
Chiang et al. Automatic and accurate extraction of road intersections from raster maps
Baluja Learning typographic style: from discrimination to synthesis
JP2022541199A (en) A system and method for inserting data into a structured database based on image representations of data tables.
CN109685061A (en) Recognition method for mathematical formulae suitable for structuring
WO2021068364A1 (en) Stroke skeleton information extracting method, apparatus, electronic device and storage medium
Lopresti et al. Issues in ground-truthing graphic documents
CN115019310B (en) Image-text identification method and equipment
CN111026899A (en) Product generation method based on deep learning
CN114861595B (en) Vector line transformation-based individual font generation method
CN116341489A (en) Text information reading method, device and terminal
CN114821222A (en) Test paper image generation method and device, storage medium and electronic equipment
Drapeau et al. Extraction of ancient map contents using trees of connected components

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant