WO2021125206A1 - Image analysis device, image analysis method, and program - Google Patents
- Publication number
- WO2021125206A1 (international application PCT/JP2020/046887)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- structural formula
- model
- information
- symbol information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/12—Detection or correction of errors, e.g. by rescanning the pattern
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/1801—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/18—Extraction of features or characteristics of the image
- G06V30/1801—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
- G06V30/18019—Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
- G06V30/18038—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
- G06V30/18048—Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
- G06V30/18057—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/30—Character recognition based on the type of data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/42—Document-oriented image-based pattern recognition based on the type of document
- G06V30/422—Technical drawings; Geographical maps
Definitions
- The present invention relates to an image analysis apparatus, an image analysis method, and a program, and more particularly to an image analysis apparatus, an image analysis method, and a program for analyzing an image showing the structural formula of a compound.
- Patent Document 1 recognizes character information (for example, the atoms constituting a chemical substance) in a chemical structure diagram by pattern recognition, and recognizes the diagram information (for example, the bonds between atoms) of the diagram by a predetermined algorithm.
- Patent Document 2 reads an image of the structural formula of a compound, assigns a value indicating the atomic-symbol attribute to the regions (pixels) of the image that show atomic symbols, and assigns a value indicating the bond-symbol attribute to the regions (pixels) that show bond symbols.
- The present invention has been made in view of the above circumstances and solves the above-mentioned problems of the prior art. Specifically, its purpose is to provide an image analysis device, an image analysis method, and a program capable of coping with changes in how a structural formula is drawn when generating character information of the structural formula from an image showing the structural formula of a compound.
- The image analysis apparatus of the present invention includes a processor and analyzes an image showing the structural formula of a compound. Using an analysis model, the processor generates, from the feature of a target image showing the structural formula of a target compound, symbol information that expresses that structural formula in linear notation. The analysis model is constructed by machine learning using training images and symbol information expressing, in linear notation, the structural formulas of the compounds shown in those training images.
- The processor detects the target image in a document containing it and inputs the detected target image into the analysis model to generate symbol information of the structural formula of the target compound.
- The processor detects the target image in the document using an object detection algorithm.
- More preferably, the processor detects a plurality of target images in a document containing them and inputs each detected target image into the analysis model separately, thereby generating symbol information of the structural formula of the target compound shown in each target image.
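The per-image loop described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Box` format, the `crop` helper, and the `detect` and `analysis_model` callables are hypothetical stand-ins (for an object detector and for the trained analysis model), not interfaces defined in the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A page is assumed to be a grayscale raster: a list of rows of pixel values.
Page = List[List[int]]


@dataclass
class Box:
    """Bounding box of one detected structural-formula image (hypothetical format)."""
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


def crop(page: Page, box: Box) -> Page:
    """Cut the region covered by `box` out of the page."""
    return [row[box.left:box.right] for row in page[box.top:box.bottom]]


def analyze_document(
    page: Page,
    detect: Callable[[Page], Sequence[Box]],
    analysis_model: Callable[[Page], str],
) -> List[str]:
    """Detect every target image on the page and run the model once per image."""
    return [analysis_model(crop(page, box)) for box in detect(page)]
```

Passing the detector and the model in as callables keeps the loop independent of any particular detection algorithm or network; each detected image yields its own SMILES string.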
- The analysis model may include a feature output model that outputs a feature when the target image is input, and a symbol information output model that outputs the symbol information corresponding to a feature when that feature is input.
- The feature output model may include a convolutional neural network (CNN).
- The symbol information output model may include a recurrent neural network (RNN).
- Preferably, the symbol information of the structural formula of the target compound consists of a plurality of symbols, and the symbol information output model identifies the symbols of the symbol information corresponding to the feature one by one from the beginning and outputs symbol information in which the symbols are arranged in the order identified.
- Using the analysis model, the processor may generate multiple pieces of symbol information about the structural formula of the target compound from the feature of the target image.
- More preferably, for each piece of symbol information, the symbol information output model calculates the output probability of each of the symbols constituting it, calculates an output score for that piece from those output probabilities, and outputs a predetermined number of pieces of symbol information according to the calculated output scores.
- More preferably, the processor executes a determination process that checks each piece of symbol information output by the symbol information output model for notational abnormalities, and outputs the normal symbol information, in which no abnormality is found, as the symbol information of the structural formula of the target compound.
- More preferably, the processor uses a collation model to generate, from the target image, first descriptive information in which the structural formula of the target compound is described by a description method other than linear notation; generates second descriptive information in which the structural formula represented by the normal symbol information is described by the same method; collates the first and second descriptive information; and outputs the normal symbol information as the symbol information of the structural formula of the target compound according to the degree of agreement between the two.
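One plausible way to measure the degree of agreement between two binary fingerprints is the Tanimoto coefficient. The patent does not name a metric, so Tanimoto similarity, the `collate` helper, and its 0.8 threshold are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple


def tanimoto(fp_a: Sequence[int], fp_b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    return both / either if either else 1.0  # two all-zero vectors count as identical


def collate(
    first_info: Sequence[int],
    candidates: Sequence[Tuple[str, Sequence[int]]],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Keep normal SMILES whose second descriptive information agrees with the first.

    `first_info` is the fingerprint predicted from the image by the collation model;
    each candidate pairs a SMILES string with the fingerprint computed from it.
    Results are sorted from best to worst agreement.
    """
    scored = [(smiles, tanimoto(first_info, fp)) for smiles, fp in candidates]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

A fixed threshold is only one option; a system could equally output the best-agreeing candidate regardless of its absolute score.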
- More preferably, the collation model is constructed by machine learning using second training images and descriptive information in which the structural formulas of the compounds shown in the second training images are described by that description method.
- More preferably, the collation model includes a feature output model that outputs a feature when the target image is input, and a descriptive information output model that receives the feature output by the feature output model and outputs the first descriptive information corresponding to it.
- Alternatively, the analysis model may be constructed by machine learning using training images, symbol information expressing the structural formulas of the compounds shown in the training images in linear notation, and descriptive information describing those structural formulas by a description method other than linear notation.
- In that case, the analysis model may include a feature output model that outputs a feature when the target image is input, a descriptive information output model that outputs descriptive information of the structural formula of the target compound when the target image is input, and a symbol information output model that receives composite information obtained by combining the output feature and descriptive information and outputs the symbol information corresponding to that composite information.
- The feature output model outputs a vectorized feature.
- The descriptive information output model outputs descriptive information consisting of a vectorized molecular fingerprint.
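The text says only that the feature and the descriptive information are "synthesized" into composite information. A minimal sketch, assuming simple concatenation of the two vectors (the function name `synthesize` is hypothetical):

```python
from typing import List, Sequence


def synthesize(feature: Sequence[float], fingerprint: Sequence[int]) -> List[float]:
    """Build composite information from a CNN feature vector and a fingerprint.

    Concatenation is an assumed choice; the source does not specify how the
    two outputs are combined before entering the symbol information output model.
    """
    return [float(x) for x in feature] + [float(bit) for bit in fingerprint]
```

With a feature of length d and a 167-bit fingerprint, the composite vector has length d + 167; an element-wise sum or a learned projection would be other possible designs.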
- The linear notation may be Simplified Molecular Input Line Entry System (SMILES) notation or canonical SMILES notation.
- The above object can also be achieved by an image analysis method for analyzing an image showing the structural formula of a compound, in which a processor performs a step of generating, using an analysis model and from the feature of a target image showing the structural formula of a target compound, symbol information expressing that structural formula in linear notation, the analysis model being constructed by machine learning using training images and symbol information expressing the structural formulas of the compounds shown in the training images in linear notation. A program that causes the processor to execute the steps of this image analysis method can also be realized.
- An image analysis apparatus, an image analysis method, and a program according to an embodiment of the present invention (hereinafter, "the present embodiment") are described below with reference to the accompanying drawings.
- The following embodiments are merely examples intended to explain the present invention in an easy-to-understand manner and do not limit it. That is, the present invention is not limited to the following embodiments, and various improvements or modifications can be made without departing from its gist. The present invention naturally includes equivalents thereof.
- Documents and images here are electronic (data) documents and images, that is, information (data) that can be processed by a computer.
- The image analysis apparatus of this embodiment includes a processor and analyzes images showing the structural formulas of compounds.
- Its main function is to analyze an image showing the structural formula of a target compound (the target image) and generate symbol information of the structural formula shown in that image.
- The "target compound" is a compound for which symbol information of the structural formula is to be generated, for example an organic compound whose structural formula appears in an image included in a document.
- An "image showing a structural formula" is an image of a diagram of the structural formula.
- The diagram may vary with how it is drawn (for example, the thickness and length of the bond lines between atoms and the direction in which the lines extend).
- How a structural formula is drawn also includes the resolution of the image showing it.
- Symbol information expresses the structural formula of a compound in linear notation and consists of a plurality of symbols (for example, ASCII characters) arranged in a row.
- Linear notations include SMILES (Simplified Molecular Input Line Entry System) notation, canonical SMILES, SMARTS (SMILES Arbitrary Target Specification) notation, SLN (Sybyl Line Notation), WLN (Wiswesser Line-Formula Notation), ROSDAL (Representation of Structure Diagram Arranged Linearly), InChI (International Chemical Identifier), and InChIKey (hashed InChI).
- Any of these notations may be used, but SMILES is preferable because it is relatively simple and widely used.
- Canonical SMILES is also preferable because the notation is uniquely determined, taking into account the order of the atoms in the molecule.
- In this embodiment, symbol information expressing the structural formula in SMILES notation is generated. Below, a representation written in SMILES notation is also called a SMILES representation.
- SMILES notation converts the structural formula of a compound into a single line of symbol information (character information) consisting of a plurality of symbols.
- The symbols used in SMILES represent atom types (elements), bonds between atoms, branches, and the points at which ring structures are cut open into chains, and are defined by predetermined rules.
- FIG. 1 shows an example for (S)-bromochlorofluoromethane: the structural formula is on the left and its symbol information (the structural formula expressed in SMILES) is on the right.
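To make "a plurality of symbols" concrete, the following sketch splits SMILES strings into symbols with a regular expression. The pattern is a simplified, assumed tokenizer covering the organic subset, bracket atoms, bonds, branches, and ring-closure labels; it is not taken from the patent.

```python
import re
from typing import List

# Bracket atoms such as [C@H] stay whole, two-letter organic-subset atoms
# (Br, Cl) are tried before one-letter ones, and %nn ring-closure labels are
# kept together. This is an assumed, simplified pattern, not a full grammar.
SMILES_TOKEN = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[()=#\-+\\/:~@*.]|\d)")


def tokenize(smiles: str) -> List[str]:
    """Return the symbols of a SMILES string, or raise if something is unrecognized."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:  # findall silently skips unmatched characters
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens
```

For example, `tokenize("F[C@H](Cl)Br")` returns `['F', '[C@H]', '(', 'Cl', ')', 'Br']`: a SMILES for bromochlorofluoromethane whose stereocentre is written as one bracket symbol (this sketch makes no claim about which enantiomer that string denotes).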
- The image analysis device of the present embodiment performs machine learning using, as a training data set, training images showing the structural formulas of compounds and the symbol information (correct-answer label information) of the structural formulas shown in those images.
- This builds an analysis model that generates, from the feature of an image showing the structural formula of a compound, the symbol information of that structural formula.
- The analysis model is described in detail in a later section.
- The image analysis apparatus of the present embodiment also has a function of detecting images (target images) showing the structural formula of a compound in a document that contains them. By inputting a detected target image into the above analysis model, it generates symbol information of the structural formula shown in that image.
- Thus, when a document such as a paper or a patent specification contains an image showing the structural formula of a compound, the image can be detected and the structural formula it shows can be converted into symbol information.
- Because the structural formula converted into symbol information can then be used as a search key, documents containing an image showing the structural formula of a target compound can be found easily.
- The image analysis apparatus of the present embodiment also has a function of checking whether the symbol information generated by the analysis model is correct. More specifically, multiple pieces of symbol information are obtained from the feature of one target image, and each is checked for notational abnormalities (for example, an erroneous notation in SMILES). The collation process described later is then performed on each piece in which no abnormality is found (normal symbol information), and a predetermined number of pieces of normal symbol information are output as the symbol information of the structural formula of the target compound according to the result of the collation. Checking the symbol information generated by the analysis model in this way yields accurate symbol information for the structural formula of the target compound.
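The determination process can be approximated by a purely syntactic check. The sketch below is a deliberate simplification and its function name is hypothetical: `has_notation_abnormality` only verifies that parentheses, square brackets, and ring-closure labels are paired, and does not check valence or aromaticity, which a full SMILES parser (for example RDKit's) would also catch.

```python
def has_notation_abnormality(smiles: str) -> bool:
    """Return True if the SMILES string fails simple syntactic checks."""
    depth = 0           # current branch nesting from '(' and ')'
    open_rings = set()  # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.find("]", i)
            if end == -1:
                return True  # bracket atom is never closed
            i = end + 1      # skip the whole bracket atom, including any digits in it
            continue
        if ch == "]":
            return True  # closing bracket without an opening one
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return True  # branch closed before it was opened
        elif ch.isdigit() or ch == "%":
            if ch == "%":
                label = smiles[i + 1:i + 3]  # two-digit label such as %12
                if len(label) != 2 or not label.isdigit():
                    return True
                i += 2
            else:
                label = ch
            # The first occurrence of a label opens a ring bond; the second closes it.
            open_rings ^= {label}
        i += 1
    return depth != 0 or bool(open_rings)
```

Filtering the generated candidates down to normal symbol information then amounts to `[s for s in candidates if not has_notation_abnormality(s)]`.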
- The analysis model used in this embodiment (hereinafter, analysis model M1) is described next. As shown in FIG. 2, analysis model M1 consists of a feature output model Ma and a symbol information output model Mb. It is constructed by machine learning using, as a training data set, training images showing the structural formulas of compounds and the symbol information (correct-answer data) of the structural formulas shown in them.
- From the viewpoint of learning accuracy, the number of training data sets used for machine learning should be large, preferably 50,000 or more.
- The machine learning is supervised learning, and its method is deep learning (that is, a multilayer neural network), but the method is not limited to this.
- The type (algorithm) of machine learning may instead be unsupervised learning, semi-supervised learning, reinforcement learning, or transduction.
- The machine learning technique may also be genetic programming, inductive logic programming, a support vector machine, clustering, a Bayesian network, an extreme learning machine (ELM), or decision tree learning.
- As the method of minimizing the objective function (loss function) in the machine learning of the neural network, gradient descent or error backpropagation may be used.
- The feature output model Ma receives an image showing the structural formula of the target compound (the target image) and outputs the feature of the target image.
- The feature output model Ma is composed of a convolutional neural network (CNN) whose intermediate layers include convolutional layers and pooling layers.
- The image feature is a feature learned by the CNN, that is, a feature identified in the course of general image recognition (pattern recognition).
- The feature output model Ma outputs a vectorized feature.
- The feature output model Ma may use a network model designed for image classification, such as the 16-layer CNN of the Oxford Visual Geometry Group (VGG16), Google's Inception model (GoogLeNet), Kaiming He's 152-layer CNN (ResNet), or Chollet's improved Inception model (Xception).
- The size of the image input to the feature output model Ma is not particularly limited; for example, the compound image may be 75 × 75 (height × width). Alternatively, a larger size (for example, 300 × 300) may be used to increase the output accuracy of the model. A color image is preferably converted to a single-channel monochrome image before being input to the feature output model Ma, to reduce computation.
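The monochrome conversion and fixed input size described above can be sketched in plain Python. The ITU-R BT.601 luma weights and nearest-neighbour resampling are assumed choices (the text specifies neither), and a production pipeline would normally use an image library instead.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_grayscale(image: List[List[RGB]]) -> List[List[int]]:
    """Convert RGB pixels to single-channel luma using BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in image]


def resize_nearest(image: List[List[int]], height: int = 75, width: int = 75) -> List[List[int]]:
    """Nearest-neighbour resize to height x width (75 x 75 by default, as in the text)."""
    src_h, src_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the source pixel it falls in.
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]
```

Nearest-neighbour sampling keeps thin bond lines pure black or white, but it can drop one-pixel lines when shrinking; area averaging is a common alternative.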
- In the intermediate layers, the convolutional and pooling layers are repeated and followed by a fully connected layer, which outputs the multidimensional vectorized feature.
- The feature (multidimensional vector) output from the fully connected layer passes through a linear layer and is then input to the symbol information output model Mb.
- The symbol information output model Mb receives the feature output by the feature output model Ma and outputs the symbol information of the structural formula of the target compound (character information expressing the structural formula in SMILES).
- The symbol information output model Mb consists of, for example, an LSTM (Long Short-Term Memory) network, a type of recurrent neural network (RNN).
- An LSTM network replaces the hidden layers of an RNN with LSTM layers.
- An embedding layer (denoted Wemb in FIG. 2) is placed before each LSTM layer and assigns a unique vector to each input to that LSTM layer.
- A softmax function (denoted softmax in FIG. 2) is applied to the output of each LSTM layer, converting it into probabilities.
- The n output probabilities (n is a natural number) obtained by applying the softmax function sum to 1.0.
- During training, the output of each LSTM layer is converted into probabilities by the softmax function, and the loss (the difference between the learning result and the correct-answer data) is computed using cross-entropy error as the loss function.
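The softmax conversion and cross-entropy loss above can be written out as follows. This is a plain-Python sketch for one output sequence; averaging the per-symbol losses is an assumption, since the text does not say how the losses of individual steps are combined.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert one LSTM step's raw outputs into probabilities that sum to 1.0."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], target_index: int) -> float:
    """Cross-entropy error for one step when the correct symbol is target_index."""
    return -math.log(probs[target_index])


def sequence_loss(step_logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Average per-symbol cross-entropy over one output sequence."""
    losses = [cross_entropy(softmax(z), t) for z, t in zip(step_logits, targets)]
    return sum(losses) / len(losses)
```

Each inner list of `step_logits` holds one score per vocabulary symbol, and `targets` gives the index of the correct symbol at each position of the correct-answer SMILES.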
- Although the symbol information output model Mb is an LSTM network here, the present invention is not limited to this; Mb may instead be configured as a GRU (Gated Recurrent Unit) network.
- Analysis model M1, configured as described above, generates multiple pieces of symbol information about the structural formula of the target compound from the feature of the target image.
- Specifically, the feature output model Ma outputs the feature of the target image, and that feature is input to the symbol information output model Mb.
- Mb identifies the symbols of the symbol information corresponding to the input feature one by one from the beginning, and outputs symbol information in which the symbols are arranged in the order identified.
- When Mb outputs symbol information consisting of m symbols (m is a natural number of 2 or more), the corresponding LSTM layers output multiple candidates for each of the 1st to m-th symbols.
- The number of symbol combinations output is not limited to the number obtained when all candidates identified for each of the 1st to m-th symbols are combined.
- A search algorithm such as beam search is applied to the candidates identified for each of the 1st to m-th symbols, and the top K candidates (K is a natural number) are kept.
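The following sketch shows beam search over per-step symbol probabilities, keeping the top K partial sequences at each step. `next_probs` is a hypothetical stand-in for one decoder step of Mb, `END` is an assumed end-of-sequence symbol, and ranking by summed log-probability (equivalent to ranking by the product of probabilities) is one common choice, not necessarily the patent's.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

END = "<end>"  # hypothetical end-of-sequence symbol

Beam = Tuple[List[str], float]  # (symbols so far, summed log-probability)


def beam_search(
    next_probs: Callable[[Sequence[str]], Dict[str, float]],
    beam_width: int,
    max_len: int,
) -> List[Beam]:
    """Return up to beam_width (= K) complete symbol sequences, best first."""
    beams: List[Beam] = [([], 0.0)]
    finished: List[Beam] = []
    for _ in range(max_len):
        expanded: List[Beam] = []
        for prefix, logp in beams:
            # Extend every live sequence by every candidate next symbol.
            for symbol, p in next_probs(prefix).items():
                if p > 0:
                    expanded.append((prefix + [symbol], logp + math.log(p)))
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = []
        # Keep only the top K; sequences that just ended are set aside.
        for seq, logp in expanded[:beam_width]:
            (finished if seq[-1] == END else beams).append((seq, logp))
        if not beams:
            break
    finished.extend(beams)  # sequences still open when max_len is reached
    return sorted(finished, key=lambda item: item[1], reverse=True)[:beam_width]
```

As an illustration with toy numbers: if the first step gives `C` 0.6 and `O` 0.4, after `C` the options are `O` or end at 0.5 each, after `O` they are `C` at 0.9 or end at 0.1, and any longer prefix ends with probability 1.0, then greedy decoding commits to `C` and reaches at most 0.3, whereas a beam of width 2 finds `O`, `C` with probability 0.36.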
- The symbol information output model Mb calculates an output score for each piece of symbol information from the calculated output probabilities of its symbols.
- The output score is the sum of the output probabilities of the m symbols constituting that piece of symbol information.
- The present invention is not limited to this; the product of the output probabilities of the m symbols may be used as the output score instead.
- Mb outputs a predetermined number of pieces of symbol information according to the calculated output scores.
- Q pieces of symbol information are output in descending order of output score.
- The number Q may be chosen freely but is preferably about 2 to 20.
- The present invention is not limited to this: only the single piece of symbol information with the highest output score may be output for the structural formula of the target compound, or as many pieces as there are combinations of all candidates for each symbol may be output.
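The two scoring rules and the selection of the top Q candidates can be compared in a short sketch. The function names are hypothetical, and the candidate strings and per-symbol probabilities in the usage note are made-up toy values.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[str, Sequence[float]]  # (SMILES, per-symbol output probabilities)


def output_score(probs: Sequence[float], method: str = "sum") -> float:
    """Score one piece of symbol information from its symbols' output probabilities."""
    if method == "sum":
        return sum(probs)
    if method == "product":
        return math.prod(probs)
    raise ValueError(f"unknown method: {method}")


def top_q(candidates: Sequence[Candidate], q: int, method: str = "sum") -> List[str]:
    """Return the q candidates with the highest output scores, best first."""
    ranked = sorted(candidates, key=lambda c: output_score(c[1], method), reverse=True)
    return [smiles for smiles, _ in ranked[:q]]
```

With candidates `CCO` (0.9, 0.8, 0.7), `CO` (0.95, 0.9), and `CCN` (0.9, 0.8, 0.2), `top_q(candidates, 2, "sum")` gives `['CCO', 'CCN']` while `top_q(candidates, 2, "product")` gives `['CO', 'CCO']`. A sum grows with the number of symbols and so tends to favour longer strings, whereas a product shrinks with length; which behaviour is preferable is a design choice.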
- The image analysis device 10 is a computer in which a processor 11, a memory 12, an external interface 13, an input device 14, an output device 15, and a storage 16 are electrically connected to one another.
- Although the image analysis device 10 is configured as a single computer here, it may be configured by multiple computers.
- The processor 11 executes a program 21, described later, to carry out a series of image analysis processes.
- The processor 11 consists of one or more CPUs (Central Processing Units) and the program 21 described later.
- The hardware processor constituting the processor 11 is not limited to a CPU; it may be an FPGA (Field Programmable Gate Array), a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), a GPU (Graphics Processing Unit), an MPU (Micro-Processing Unit), another IC (Integrated Circuit), or a combination of these. The processor 11 may also be a single IC chip that provides the functions of the entire image analysis device 10, as typified by an SoC (System on Chip).
- The hardware processor described above may be electric circuitry combining circuit elements such as semiconductor elements.
- The memory 12 consists of semiconductor memories such as ROM (Read Only Memory) and RAM (Random Access Memory). It provides a work area for the processor 11 by temporarily storing programs and data, and also temporarily stores various data generated by the processing the processor 11 executes.
- The programs stored in the memory 12 include the program 21 for image analysis.
- The program 21 includes a program for performing machine learning to construct analysis model M1, a program for detecting a target image in a document, and a program for generating, with analysis model M1, symbol information of the structural formula of the target compound from the feature of the target image.
- The program 21 further includes a program for executing the determination process and the collation process on the generated symbol information.
- The program 21 may be obtained by reading it from a computer-readable recording medium or by receiving (downloading) it over a network such as the Internet or an intranet.
- The external interface 13 is an interface for connecting to external devices.
- The image analysis device 10 communicates via the external interface 13 with external devices such as a scanner or another computer on the Internet. Through such communication it can acquire some or all of the data for machine learning, as well as documents containing target images.
- The input device 14 consists of, for example, a mouse and a keyboard, and accepts user input operations.
- The image analysis device 10 can acquire part of the machine learning data when, for example, a user enters character information corresponding to symbol information through the input device 14.
- The output device 15 includes, for example, a display and a speaker, and displays the symbol information generated by analysis model M1 or reproduces it as audio.
- The storage 16 consists of, for example, a flash memory, an HDD (Hard Disk Drive), an SSD (Solid State Drive), an FD (Flexible Disk), an MO disk (Magneto-Optical disk), a CD (Compact Disc), a DVD (Digital Versatile Disc), an SD card (Secure Digital card), or a USB memory (Universal Serial Bus memory).
- Various data, including data for machine learning, are stored in the storage 16.
- The storage 16 also stores the data of the various models constructed by machine learning, including analysis model M1.
- The symbol information of the structural formulas of target compounds generated by analysis model M1 can be stored in the storage 16 and registered as a database.
- Although the storage 16 is built into the image analysis device 10 here, the present invention is not limited to this: the storage 16 may be an external device connected to the image analysis device 10, or an external computer (for example, a server computer for a cloud service) communicably connected via a network.
- The hardware configuration of the image analysis device 10 is not limited to the above; constituent devices can be added, omitted, or replaced as appropriate for a specific embodiment.
- the image analysis flow of the present embodiment proceeds in the order of the learning phase S001, the symbol information generation phase S002, and the symbol information check phase S003. Each phase will be described below.
- the learning phase S001 is a phase in which machine learning is performed in order to build a model required in the subsequent phases.
- the first machine learning S011, the second machine learning S012, and the third machine learning S013 are carried out.
- the first machine learning S011 is machine learning for constructing the analysis model M1, and as described above, the learning image and the symbolic information of the structural formula of the compound shown by the learning image are used as the learning data set. It is done using.
- the second machine learning S012 is machine learning for constructing a collation model used in the symbol information check phase S003.
- the collation model is a model that generates descriptive information from the target image. In this descriptive information, the structural formula of the target compound is described by a description method different from the above-mentioned linear notation.
- an example of a description method different from the linear notation is a description method using a molecular fingerprint.
- Molecular fingerprints are used to identify molecules having certain characteristics. As shown in FIG. 5, a molecular fingerprint converts a structural formula into a binary multidimensional vector that indicates the presence or absence of each type of partial structure (fragment) in the structural formula.
- the partial structure is an element representing a part of the structural formula, and includes a plurality of atoms and bonds between atoms.
- the number of dimensions of the vector constituting the molecular fingerprint can be arbitrarily determined, and is set to, for example, tens to thousands of dimensions.
- in the present embodiment, a molecular fingerprint represented by a 167-dimensional vector is used, following MACCS Keys, which is a typical fingerprint.
- the description method different from the linear notation is not limited to the molecular fingerprint. Other description methods may also be used, including:
  - the KEGG (Kyoto Encyclopedia of Genes and Genomes) Chemical Function format (KCF format);
  - the MOL notation, which is the input format of the chemical structure database (MACCS) operated by Molecular Design Limited;
  - the SDF format, which is a modified version of MOL.
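The following minimal sketch makes the fragment-presence idea concrete using a toy molecular graph with implicit hydrogens. The five keys in `KEYS` are hypothetical stand-ins chosen only for illustration. Real MACCS keys are a fixed set of substructure patterns and are usually computed with a toolkit such as RDKit (`rdkit.Chem.MACCSkeys.GenMACCSKeys`), which returns a 167-bit vector.

```python
from typing import Callable, List, Tuple

# Toy molecular graph (hydrogens implicit): element symbol per atom,
# and bonds as (atom_index, atom_index, bond_order).
Molecule = Tuple[List[str], List[Tuple[int, int, int]]]


def has_ring(mol: Molecule) -> bool:
    """Union-find cycle check: a bond joining two already-connected atoms closes a ring."""
    atoms, bonds = mol
    parent = list(range(len(atoms)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _order in bonds:
        ri, rj = find(i), find(j)
        if ri == rj:
            return True
        parent[ri] = rj
    return False


def has_double_bond(mol: Molecule, a: str, b: str) -> bool:
    """True if some double bond joins an atom of element a to an atom of element b."""
    atoms, bonds = mol
    return any(order == 2 and {atoms[i], atoms[j]} == {a, b} for i, j, order in bonds)


# Hypothetical fragment keys (illustrative only, NOT the real MACCS definitions).
KEYS: List[Callable[[Molecule], bool]] = [
    lambda m: "N" in m[0],                   # bit 0: contains nitrogen
    lambda m: "O" in m[0],                   # bit 1: contains oxygen
    lambda m: has_double_bond(m, "C", "O"),  # bit 2: carbonyl C=O
    lambda m: has_double_bond(m, "C", "C"),  # bit 3: C=C double bond
    has_ring,                                # bit 4: at least one ring
]


def fingerprint(mol: Molecule) -> List[int]:
    """Binary vector: bit k is 1 if and only if fragment key k is present in the molecule."""
    return [1 if key(mol) else 0 for key in KEYS]
```

As an example, encode acetic acid, `CC(=O)O`, as `(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])`. For this input, `fingerprint` returns `[0, 1, 1, 0, 0]`.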
- the second machine learning S012 uses the following as learning data:
  - a learning image showing the structural formula of a compound (the second learning image);
  - descriptive information of the structural formula shown by the second learning image (specifically, descriptive information consisting of molecular fingerprints).
- the second learning image used in the second machine learning S012 may be the same image as the learning image used in the first machine learning S011, or it may be an image prepared separately from that learning image.
- a collation model is constructed by performing the second machine learning S012 using the above learning data.
- the collation model will be described in detail later.
- the third machine learning S013 is machine learning for constructing a model (hereinafter referred to as an image detection model) for detecting the image from a document in which an image showing the structural formula of the compound is posted.
- the image detection model is a model that detects an image of a structural formula from a document by using an object detection algorithm.
- as the object detection algorithm, for example, R-CNN (Region-based CNN), Fast R-CNN, YOLO (You Only Look Once), or SSD (Single Shot MultiBox Detector) can be used.
- an image detection model using YOLO is constructed from the viewpoint of detection speed.
- the learning data (teacher data) used in the third machine learning S013 is created by applying an annotation tool to a learning image showing the structural formula of the compound.
- the annotation tool is a tool that adds related information such as a correct label (tag) and coordinates of an object to the target data as annotations.
- Learning data is created by starting the annotation tool, displaying the document containing the learning image, surrounding the area showing the structural formula of the compound with a bounding box, and annotating that area.
- as the annotation tool, for example, labelImg by tzutalin or VoTT by Microsoft can be used.
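labelImg can save annotations either in Pascal VOC XML or in the YOLO text format. The hypothetical helper below sketches what one YOLO-format label line contains. It converts a pixel bounding box drawn around a structural formula into the normalized `class x_center y_center width height` form. Treating structural-formula regions as class 0 is an assumption; the text does not specify a class scheme.

```python
from typing import Tuple


def to_yolo_line(class_id: int, box: Tuple[float, float, float, float],
                 img_w: int, img_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line.

    YOLO-style label files store one object per line as
    "class x_center y_center width height", with coordinates normalized to [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive size")
    xc = (x_min + x_max) / 2 / img_w  # normalized box center (x)
    yc = (y_min + y_max) / 2 / img_h  # normalized box center (y)
    w = (x_max - x_min) / img_w       # normalized width
    h = (y_max - y_min) / img_h       # normalized height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

Consider a box from (100, 200) to (300, 400) on a 1000 x 800 page image. The helper gives the line `0 0.200000 0.375000 0.200000 0.250000`.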
- the symbol information generation phase S002 is a phase in which an image (target image) of the structural formula of the target compound contained in the document is analyzed and symbol information of the structural formula of the target compound is generated.
- the processor 11 of the image analysis device 10 applies the above-mentioned image detection model to the document including the target image, and detects the target image in the document (S021). That is, in this step S021, the processor 11 detects the target image from the document by using the object detection algorithm (specifically, YOLO).
- for example, as shown in FIG. 6, the processor 11 detects a plurality of target images from the document (the image portions surrounded by broken lines in FIG. 6).
- the processor 11 inputs the detected target image into the analysis model M1 (S022).
- in the analysis model M1, the first-stage feature amount output model Ma outputs the feature amount of the target image.
- the second-stage symbol information output model Mb then outputs the symbolic information of the structural formula of the target compound, based on the input feature amount of the target image.
- a predetermined number of pieces of symbol information are output in descending order of output score.
- in this way, the processor 11 uses the analysis model M1 to generate a plurality of pieces of symbolic information about the structural formula of the target compound, based on the feature amount of the target image (S023).
- when a plurality of target images are detected in step S021, the processor 11 inputs each detected target image into the analysis model M1. In that case, a plurality of pieces of symbolic information are generated for each target image, covering the structural formula of the target compound shown by that image.
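The per-image bookkeeping in steps S022 and S023 can be sketched as follows. `analysis_model` is a hypothetical callable standing in for M1 that returns scored SMILES candidates. The text does not say how the decoder produces and scores those candidates (for example, by beam search), so that part is not modeled here; the sketch keeps only the top `k` candidates per image.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (SMILES string, output score)


def top_k_candidates(candidates: Sequence[Candidate], k: int) -> List[Candidate]:
    """Keep the k candidates with the highest output score, best first."""
    return heapq.nlargest(k, candidates, key=lambda c: c[1])


def generate_symbol_info(images: Dict[str, object],
                         analysis_model: Callable[[object], Sequence[Candidate]],
                         k: int) -> Dict[str, List[Candidate]]:
    """Run the (hypothetical) analysis model on every detected target image
    and keep k scored candidates per image."""
    return {img_id: top_k_candidates(analysis_model(img), k)
            for img_id, img in images.items()}
```

The candidates kept here are not yet validated. A syntactically broken string can still rank highly, which is why the determination process follows.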
- the symbol information check phase S003 is a phase in which a determination process and a collation process are executed for each of the plurality of symbol information generated for the structural formula of the target compound in the symbol information generation phase S002.
- the processor 11 first executes the determination process (S031).
- the determination process checks each of the predetermined number of pieces of symbol information output by the symbol information output model Mb of the analysis model M1 for an abnormality in its SMILES notation. More specifically, the processor 11 must determine whether the character string forming each piece of symbol information follows correct SMILES syntax (word order). To do so, it attempts to convert the symbol information output by Mb from a character string into a structural formula. If the conversion succeeds, the symbol information is determined to have no notational abnormality (in other words, the symbol information is normal).
- the symbol information having no abnormality will be referred to as "normal symbol information" below.
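The determination process depends on whether a string can be parsed. In practice a cheminformatics toolkit would do this. For example, RDKit's `Chem.MolFromSmiles` returns `None` when a SMILES string cannot be parsed or sanitized. The checker below is a dependency-free sketch that tests only a simplified SMILES grammar: balanced branches, paired ring-closure labels, and a plausible token order. It performs no valence or aromaticity checks, so it accepts some strings that a real parser would reject.

```python
import re

# Simplified SMILES token grammar (assumption: common subset only; no chirality,
# valence, or aromaticity checks, unlike a real cheminformatics parser).
TOKEN_RE = re.compile(
    r"\[[^\[\]]+\]"        # bracket atom, e.g. [NH4+]
    r"|Cl|Br|[BCNOPSFI]"   # aliphatic organic-subset atoms
    r"|[bcnops]"           # aromatic atoms
    r"|%\d\d|\d"           # ring-closure labels
    r"|[-=#$:/\\.]"        # bond symbols and '.' (disconnected parts)
    r"|[()]"               # branch open / close
)

# Which token kinds may directly precede each kind (None = start of string).
ALLOWED_PREV = {
    "atom": {None, "atom", "ring", "bond", "open", "close"},
    "ring": {"atom", "ring", "bond"},
    "bond": {"atom", "ring", "open", "close"},
    "open": {"atom", "ring", "close"},
    "close": {"atom", "ring", "close"},
}


def _kind(tok: str) -> str:
    """Classify a token as atom, ring, open, close, or bond."""
    if tok[0] == "[" or tok[0].isalpha():
        return "atom"
    if tok[0] == "%" or tok.isdigit():
        return "ring"
    if tok == "(":
        return "open"
    if tok == ")":
        return "close"
    return "bond"


def is_plausible_smiles(s: str) -> bool:
    """Return True if s passes basic SMILES syntax checks."""
    pos, depth, prev = 0, 0, None
    open_rings = set()
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if m is None:
            return False                  # character outside the grammar
        tok, pos = m.group(), m.end()
        kind = _kind(tok)
        if prev not in ALLOWED_PREV[kind]:
            return False                  # e.g. leading bond or empty branch "()"
        if kind == "open":
            depth += 1
        elif kind == "close":
            depth -= 1
            if depth < 0:
                return False              # ')' without a matching '('
        elif kind == "ring":
            open_rings ^= {tok}           # first use opens a ring, second closes it
        prev = kind
    return prev in {"atom", "ring", "close"} and depth == 0 and not open_rings
```

Filtering the candidates from step S023 then yields the normal symbol information, for example `normal = [s for s in candidates if is_plausible_smiles(s)]`.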
- the processor 11 executes a collation process for the normal symbol information (S032).
- the collation process is a process of collating the first descriptive information of the structural formula of the target compound generated by the collation model with the second descriptive information generated from the normal symbol information.
- the first description information describes the structural formula of the target compound by the description method of the molecular fingerprint.
- the first description information is generated by inputting the target image into the collation model M2 shown in FIG. 7.
- the collation model M2 is constructed by the second machine learning S012 described above, and includes the feature amount output model Mc and the descriptive information output model Md as shown in FIG. 7.
- the feature amount output model Mc is a model that outputs the feature amount of the target image by inputting an image (target image) showing the structural formula of the target compound, similar to the feature amount output model Ma of the analysis model M1.
- like the feature amount output model Ma, the feature amount output model Mc is composed of a CNN.
- the feature amount output model Mc outputs a vectorized feature amount as in the feature amount output model Ma.
- the descriptive information output model Md takes as input the feature amount output from the feature amount output model Mc. It outputs descriptive information corresponding to that feature amount (specifically, descriptive information consisting of molecular fingerprints).
- the descriptive information output model Md is configured by, for example, a neural network (NN).
- the descriptive information output model Md outputs descriptive information composed of vectorized molecular fingerprints as the first descriptive information.
- the descriptive information output from the descriptive information output model Md is the descriptive information of the structural formula of the target compound.
- the feature amount output model Ma of the analysis model M1 may also be reused as the feature amount output model Mc of the collation model M2. That is, the weights of the intermediate layers of the CNN may be set to common values in the feature amount output models Ma and Mc.
- in that case, the second machine learning S012 keeps the CNN intermediate-layer weights determined in the first machine learning S011 fixed. It then determines only the weights of the intermediate layers of the NN that forms the descriptive information output model Md. This reduces the load (computational load) of model construction.
- alternatively, the collation model M2 need not reuse the CNN (feature amount output model Ma) of the analysis model M1 and may be configured with a separate CNN.
- the second descriptive information is descriptive information in which the structural formula represented by the normal symbol information is described by the description method of the molecular fingerprint.
- the second descriptive information is generated by converting the symbolic information in SMILES notation into a molecular fingerprint according to a conversion rule.
- the conversion rule used at this time is defined by specifying the correspondence between the structural formula of SMILES notation and the molecular fingerprint for many compounds and making a rule.
- the first descriptive information and the second descriptive information generated as described above are collated, and the degree of agreement between the two descriptive information is calculated.
- when there are a plurality of pieces of normal symbol information, second descriptive information is generated from each of them. The degree of agreement with the first descriptive information is then calculated for each piece of second descriptive information.
- the degree of agreement can be calculated with any known method for calculating the similarity between molecular fingerprints, for example the Tanimoto coefficient.
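For binary fingerprints $A$ and $B$, the Tanimoto coefficient mentioned above is the ratio of bits set in both to bits set in either. Let $a$ and $b$ be the number of bits set in $A$ and $B$, and let $c$ be the number of bits set in both:

$$
T(A, B) = \frac{c}{a + b - c}
$$

For example, take $A = (1, 1, 0, 1)$ and $B = (1, 0, 0, 1)$. Then $a = 3$, $b = 2$, and $c = 2$, so $T = 2 / (3 + 2 - 2) = 2/3 \approx 0.667$. $T = 1$ for identical non-empty fingerprints, and $T = 0$ when no bit is shared.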
- the output process finally outputs (for example, displays) the normal symbol information as the symbol information of the structural formula of the target compound, according to the degree of agreement calculated by the collation process.
- outputting the normal symbol information according to the degree of agreement may mean, for example:
  - outputting only the normal symbol information whose degree of agreement exceeds a reference value; or
  - outputting the normal symbol information in descending order of degree of agreement.
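The collation and output steps can be combined in a short sketch. It assumes that second descriptive information has already been computed for each normal SMILES string (for example, by a toolkit's MACCS-key generator). The function names, the dictionary layout, and the choice to treat two all-zero fingerprints as identical are all assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def tanimoto(fp1: Sequence[int], fp2: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(fp1, fp2) if x and y)    # c
    either = sum(1 for x, y in zip(fp1, fp2) if x or y)   # a + b - c
    return 1.0 if either == 0 else both / either          # all-zero pair: assumed identical


def rank_normal_symbols(first_fp: Sequence[int],
                        second_fps: Dict[str, Sequence[int]],
                        reference: Optional[float] = None) -> List[Tuple[str, float]]:
    """Score each normal SMILES against the collation model's fingerprint,
    optionally keep only scores above `reference`, and sort best first."""
    scored = [(smiles, tanimoto(first_fp, fp)) for smiles, fp in second_fps.items()]
    if reference is not None:
        scored = [item for item in scored if item[1] > reference]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

With `reference=None`, this implements the ordering option. Passing a value implements the threshold option, and the returned list remains sorted in that case too.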
- the image analysis device 10 of the present embodiment uses the analysis model M1 constructed by the first machine learning. From the feature amount of the target image showing the structural formula of the target compound, it generates symbolic information that expresses the structural formula in SMILES notation. This makes it possible to respond appropriately to changes in how the structural formula is drawn in the target image.
- specifically, symbol information is generated from the feature amount of the target image by the analysis model M1, which is the result of machine learning. Even if the way the structural formula is drawn changes, the feature amount of the image showing the structural formula can still be identified. Once the feature amount is identified, the symbol information can be generated from it. Thus, according to the present embodiment, symbolic information can be acquired appropriately even when the drawing style of the structural formula of the target compound changes.
- although the image analysis device, the image analysis method, and the program of the present invention have been described above with specific examples, the above-described embodiment is merely an example, and other embodiments are also conceivable.
- for example, the computer constituting the image analysis device may be a server used for ASP (Application Service Provider), SaaS (Software as a Service), PaaS (Platform as a Service), IaaS (Infrastructure as a Service), or the like.
- a user who uses a service such as the ASP operates a terminal (not shown) to transmit a document including a target image to a server.
- when the server receives the document sent from the user, it detects the target image in the document. It then generates symbolic information of the structural formula of the target compound indicated by the target image, based on the feature amount of that image. The server then outputs (transmits) the generated symbol information to the user's terminal. On the user side, the symbol information sent from the server is displayed or played back as audio.
- in the above embodiment, both the determination process and the collation process are executed. However, the present invention is not limited to this: only one of the two processes may be executed, or neither.
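The server-side flow can be sketched as a pipeline whose stages are injected as callables. Everything here is hypothetical:

- `detect` stands in for the image detection model;
- `analyze` stands in for the analysis model M1;
- `is_normal` stands in for the determination process;
- `agreement` stands in for the collation process.

The network transport between the terminal and the server is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SymbolInfoPipeline:
    """Hypothetical server-side pipeline; every stage is an injected callable."""
    detect: Callable[[bytes], List[object]]        # S021: document -> target images
    analyze: Callable[[object], List[str]]         # S022-S023: image -> candidate SMILES
    is_normal: Callable[[str], bool]               # S031: determination process
    agreement: Callable[[object, str], float]      # S032: collation score for (image, SMILES)

    def handle(self, document: bytes) -> List[List[Tuple[str, float]]]:
        """Return, per detected image, the normal candidates sorted by agreement (best first)."""
        results = []
        for image in self.detect(document):
            normal = [s for s in self.analyze(image) if self.is_normal(s)]
            scored = [(s, self.agreement(image, s)) for s in normal]
            scored.sort(key=lambda item: item[1], reverse=True)
            results.append(scored)
        return results
```

Injecting the stages keeps the server logic independent of the model implementations. For example, passing `is_normal=lambda s: True` skips the determination process, which matches the variation in which only one check, or neither, is executed.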
- machine learning (first to third machine learning) for constructing various models is performed by the image analysis device 10, but the present invention is not limited to this. Some or all machine learning may be performed by another device (computer) different from the image analysis device 10.
- the image analysis device 10 acquires a model constructed by machine learning performed by another device. For example, when the first machine learning is performed by another device, the image analysis device 10 acquires the analysis model M1 constructed by the first machine learning from the other device. Then, the image analysis device 10 analyzes the target image by the acquired analysis model M1 and generates symbolic information about the structural formula of the target compound indicated by the image.
- the above analysis model M1 is constructed by machine learning using a learning image and symbol information representing the structural formula of the compound shown by the learning image in a linear notation. Then, the analysis model M1 generates symbolic information of the structural formula of the target compound indicated by the target image based on the feature amount of the target image.
- the present invention is not limited to this, and another model can be considered as an analysis model for generating symbolic information of the structural formula of the target compound.
- one example is the analysis model according to a modified example shown in FIG. (hereinafter, analysis model M3).
- the analysis model M3 according to the modified example has a feature amount output model Me, a descriptive information output model Mf, and a symbol information output model Mg.
- the analysis model M3 according to the modified example is constructed by machine learning (hereinafter, machine learning related to the modified example).
- the machine learning related to the modified example uses the following as a training data set:
  - a learning image showing the structural formula of a compound;
  - the symbolic information of that structural formula (for example, symbolic information in SMILES notation);
  - the descriptive information of that structural formula (for example, descriptive information consisting of molecular fingerprints).
- the feature amount output model Me outputs the feature amount of the target image when an image (target image) showing the structural formula of the target compound is input. It is composed of, for example, a CNN.
- the feature amount output model Me outputs a vectorized feature amount (for example, a 2048-dimensional vector).
- the descriptive information output model Mf is a model that outputs descriptive information of the structural formula of the target compound (specifically, descriptive information consisting of molecular fingerprints) when the target image is input.
- the descriptive information output model Mf is based on the above-mentioned collation model M2. It is composed of, for example, a CNN, and outputs descriptive information consisting of a vectorized molecular fingerprint (for example, a 167-dimensional vector).
- the feature amount output from the feature amount output model Me and the descriptive information output from the descriptive information output model Mf are combined to generate vectorized composite information.
- the vector dimension number of the composite information is the sum of the vector dimension number of the feature quantity and the vector dimension number of the description information (that is, 2215 dimensions).
- Symbol information output model Mg is a model that outputs symbol information (specifically, symbol information in SMILES notation) corresponding to the composite information by inputting the above composite information.
- the symbol information output model Mg has almost the same configuration as the symbol information output model Mb of the analysis model M1. It is composed of, for example, an RNN, and an LSTM network is one example.
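The composite dimension is stated to be the sum of the two input dimensions (2048 + 167 = 2215), so combining the inputs amounts to vector concatenation. The minimal sketch below hard-codes the example dimensions from the text as assumptions.

```python
from typing import List, Sequence

FEATURE_DIM = 2048     # example feature-vector size from model Me (per the text)
FINGERPRINT_DIM = 167  # example fingerprint size from model Mf (MACCS-style)


def make_composite(feature: Sequence[float], fingerprint: Sequence[int]) -> List[float]:
    """Concatenate image features and fingerprint bits into the input vector for model Mg."""
    if len(feature) != FEATURE_DIM or len(fingerprint) != FINGERPRINT_DIM:
        raise ValueError("unexpected input dimensions")
    return [float(x) for x in feature] + [float(bit) for bit in fingerprint]
```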
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biodiversity & Conservation Biology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
- Character Input (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021565611A JP7268198B2 (ja) | 2019-12-16 | 2020-12-16 | 画像解析装置、画像解析方法、及びプログラム |
| CN202080087306.9A CN114846508B (zh) | 2019-12-16 | 2020-12-16 | 图像分析装置、图像分析方法及计算机程序产品 |
| US17/839,468 US12417648B2 (en) | 2019-12-16 | 2022-06-13 | Image analysis apparatus, image analysis method, and program |
| JP2023068920A JP7472358B2 (ja) | 2019-12-16 | 2023-04-20 | 画像解析装置、端末、画像解析方法、表記情報取得方法、及びプログラム |
| US19/300,689 US20250371899A1 (en) | 2019-12-16 | 2025-08-15 | Image analysis apparatus, image analysis method, and program |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019-226239 | 2019-12-16 | | |
| JP2019226239 | 2019-12-16 | | |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/839,468 Continuation US12417648B2 (en) | 2019-12-16 | 2022-06-13 | Image analysis apparatus, image analysis method, and program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021125206A1 (ja) | 2021-06-24 |
Family
ID=76478653
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/046887 Ceased WO2021125206A1 (ja) | 2019-12-16 | 2020-12-16 | 画像解析装置、画像解析方法、及びプログラム |
Country Status (4)
| Country | Link |
|---|---|
| US (2) | US12417648B2 |
| JP (2) | JP7268198B2 |
| CN (1) | CN114846508B |
| WO (1) | WO2021125206A1 |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114846508B (zh) * | 2019-12-16 | 2025-06-27 | 富士胶片株式会社 | 图像分析装置、图像分析方法及计算机程序产品 |
| JP7449961B2 (ja) * | 2019-12-26 | 2024-03-14 | 富士フイルム株式会社 | 情報処理装置、情報処理方法、及びプログラム |
| US12159227B2 (en) * | 2020-03-13 | 2024-12-03 | Korea University Research And Business Foundation | System for predicting optical properties of molecules based on machine learning and method thereof |
| US12499419B2 (en) * | 2020-09-30 | 2025-12-16 | X Development Llc | Techniques for predicting the spectra of materials using molecular metadata |
| US11822599B2 (en) * | 2020-12-16 | 2023-11-21 | International Business Machines Corporation | Visualization resonance for collaborative discourse |
| US12147407B2 (en) * | 2022-04-21 | 2024-11-19 | William Marsh Rice University | Method for mathematical language processing via tree embeddings |
| CN120826300A (zh) | 2023-03-06 | 2025-10-21 | 软银集团股份有限公司 | 机器人的控制系统、机器人的控制程序和机器人的管理系统 |
| CN117649676A (zh) * | 2024-01-29 | 2024-03-05 | 杭州德睿智药科技有限公司 | 一种基于深度学习模型的化学结构式的识别方法 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2012021806A (ja) * | 2010-07-12 | 2012-02-02 | Noguchi Institute | 糖鎖構造認識用解析方法、糖鎖構造認識用解析装置およびプログラム |
| JP2013101510A (ja) * | 2011-11-08 | 2013-05-23 | Fujitsu Ltd | 情報提供装置、情報提供プログラムおよび情報提供方法 |
| CN108334839A (zh) * | 2018-01-31 | 2018-07-27 | 青岛清原精准农业科技有限公司 | 一种基于深度学习图像识别技术的化学信息识别方法 |
Family Cites Families (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5157736A (en) * | 1991-04-19 | 1992-10-20 | International Business Machines Corporation | Apparatus and method for optical recognition of chemical graphics |
| US20040090439A1 (en) * | 2002-11-07 | 2004-05-13 | Holger Dillner | Recognition and interpretation of graphical and diagrammatic representations |
| US8718375B2 (en) * | 2010-12-03 | 2014-05-06 | Massachusetts Institute Of Technology | Sketch recognition system |
| WO2013030850A2 (en) * | 2011-08-26 | 2013-03-07 | Council Of Scientific & Industrial Research | Chemical structure recognition tool |
| JP2013061886A (ja) | 2011-09-14 | 2013-04-04 | Kyushu Univ | 化学構造図認識システム及び化学構造図認識システム用のコンピュータプログラム |
| JP6051988B2 (ja) | 2013-03-19 | 2016-12-27 | 富士通株式会社 | 情報処理プログラム、情報処理方法および情報処理装置 |
| CN105893338B (zh) * | 2015-02-17 | 2021-07-09 | 北京三星通信技术研究有限公司 | 用于输入公式的方法、装置和电子设备 |
| US9530102B2 (en) * | 2015-02-17 | 2016-12-27 | The Mathworks, Inc. | Multimodal input processing |
| US9904847B2 (en) * | 2015-07-10 | 2018-02-27 | Myscript | System for recognizing multiple object input and method and product for same |
| US9881208B2 (en) * | 2016-06-20 | 2018-01-30 | Machine Learning Works, LLC | Neural network based recognition of mathematical expressions |
| WO2019004437A1 (ja) * | 2017-06-30 | 2019-01-03 | 学校法人 明治薬科大学 | 予測装置、予測方法、予測プログラム、学習モデル入力データ生成装置および学習モデル入力データ生成プログラム |
| KR102587959B1 (ko) | 2018-01-17 | 2023-10-11 | 삼성전자주식회사 | 뉴럴 네트워크를 이용하여 화학 구조를 생성하는 장치 및 방법 |
| US12217834B2 (en) * | 2019-05-31 | 2025-02-04 | D. E. Shaw Research, Llc | Molecular graph generation from structural features using an artificial neural network |
| CN110413814A (zh) * | 2019-07-12 | 2019-11-05 | 智慧芽信息科技(苏州)有限公司 | 图像数据库建立方法、搜索方法、电子设备和存储介质 |
| CN114846508B (zh) * | 2019-12-16 | 2025-06-27 | 富士胶片株式会社 | 图像分析装置、图像分析方法及计算机程序产品 |
| JP7449961B2 (ja) * | 2019-12-26 | 2024-03-14 | 富士フイルム株式会社 | 情報処理装置、情報処理方法、及びプログラム |
- 2020-12-16 CN CN202080087306.9A patent/CN114846508B/zh active Active
- 2020-12-16 WO PCT/JP2020/046887 patent/WO2021125206A1/ja not_active Ceased
- 2020-12-16 JP JP2021565611A patent/JP7268198B2/ja active Active
- 2022-06-13 US US17/839,468 patent/US12417648B2/en active Active
- 2023-04-20 JP JP2023068920A patent/JP7472358B2/ja active Active
- 2025-08-15 US US19/300,689 patent/US20250371899A1/en active Pending
Non-Patent Citations (1)
| Title |
|---|
| SATO F., FUJIYOSHI A.: "Proposal of the Uses of a Formal Grammar to Recognize Condensed Structural Formulas for Optical Chemical Structure Recognition", TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN (JOURNAL, vol. 57, no. 11, 1 November 2016 (2016-11-01), pages 2467 - 2474, XP055836107 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP7472358B2 (ja) | 2024-04-22 |
| JPWO2021125206A1 | 2021-06-24 |
| US20250371899A1 (en) | 2025-12-04 |
| CN114846508A (zh) | 2022-08-02 |
| US20220309815A1 (en) | 2022-09-29 |
| JP7268198B2 (ja) | 2023-05-02 |
| US12417648B2 (en) | 2025-09-16 |
| JP2023083462A (ja) | 2023-06-15 |
| CN114846508B (zh) | 2025-06-27 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20903804; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021565611; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20903804; Country of ref document: EP; Kind code of ref document: A1 |
| | WWG | Wipo information: grant in national office | Ref document number: 202080087306.9; Country of ref document: CN |