WO2023173536A1

WO2023173536A1 - Chemical formula identification method and apparatus, computer device, and storage medium

Info

Publication number: WO2023173536A1
Application number: PCT/CN2022/089509
Authority: WO
Inventors: 郑喜民; 朱翌; 舒畅; 陈又新
Original assignee: 平安科技（深圳）有限公司
Priority date: 2022-03-15
Filing date: 2022-04-27
Publication date: 2023-09-21
Also published as: CN114627462A

Abstract

The present application relates to the field of artificial intelligence, and in particular to a chemical formula identification method and apparatus, a computer device, and a storage medium. The method comprises: obtaining an image to be detected comprising a chemical formula; inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image; inputting the chemical formula region image into a chemical formula identification model to obtain a candidate chemical formula in the chemical formula region image; performing existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a check result; and when it is determined, according to the check result, that the candidate chemical formula exists, determining the candidate chemical formula as an identified chemical formula. In addition, the present application also relates to blockchain technology, and the image to be detected can be stored in a blockchain. The present application improves the efficiency of chemical formula identification.

Description

Chemical formula identification method, device, computer equipment and storage medium

This application claims priority to the Chinese patent application filed with the China Patent Office on March 15, 2022, with application number 202210255360.0 and the invention title "Chemical formula identification method, device, computer equipment and storage medium", the entire content of which is incorporated by reference in this application.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a chemical formula identification method, device, computer equipment and storage medium.

Background

With the development of computer technology, object detection through computers has become more and more widespread. Target detection, also called target extraction, uses computers to find targets or objects of interest in images and determine their categories and locations. Target detection is an important topic in the field of computer vision. Target detection is often associated with image description (Image Caption), which means that the computer generates corresponding descriptive text based on the input image.

The recognition of chemical formulas is a branch of target detection and image description tasks. The inventors realized that traditional chemical formula recognition technology is based on computer vision technology and requires a series of hand-designed rules, including image vectorization, image decomposition, image thinning, line enhancement, optical character recognition, and reconstruction of the molecular graph representation. The process is complex, which makes chemical formula recognition inefficient.

Summary of the invention

The purpose of the embodiments of the present application is to propose a chemical formula recognition method, device, computer equipment and storage medium to solve the problem of low chemical formula recognition efficiency.

In order to solve the above technical problems, embodiments of the present application provide a chemical formula identification method, which adopts the following technical solution:

Obtain the image to be detected containing the chemical formula;

Input the image to be detected into a multi-target detection model to obtain a chemical formula region image;

Input the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image;

Perform existence verification on the candidate chemical formula according to the pre-established chemical formula database to obtain verification results;

When it is determined that the candidate chemical formula exists according to the verification result, the candidate chemical formula is determined as the recognized chemical formula.

In order to solve the above technical problems, embodiments of the present application also provide a chemical formula identification device, which adopts the following technical solution:

Image acquisition module, used to acquire images to be detected containing chemical formulas;

A region detection module, used to input the image to be detected into a multi-target detection model to obtain a chemical formula region image;

A chemical formula recognition module, used to input the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image;

A chemical formula verification module, used to verify the existence of the candidate chemical formula according to a pre-established chemical formula database and obtain verification results;

A chemical formula determination module, configured to determine the candidate chemical formula as a recognized chemical formula when it is determined that the candidate chemical formula exists according to the verification result.

In order to solve the above technical problems, embodiments of the present application also provide a computer device, including a memory and a processor. Computer-readable instructions are stored in the memory. When the processor executes the computer-readable instructions, the following steps are implemented:

Obtain the image to be detected containing the chemical formula;

In order to solve the above technical problems, embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions. When the computer-readable instructions are executed by a processor, the following steps are implemented:

Obtain the image to be detected containing the chemical formula;

Compared with the existing technology, the embodiments of the present application mainly have the following beneficial effects: after the image to be detected is obtained, it is first input into the multi-target detection model to determine the image area where the chemical formula is located and obtain the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to recognize the chemical formula and obtain the candidate chemical formula; the chemical formula database stores known chemical formulas, and when the candidate chemical formula is determined to actually exist according to the chemical formula database, the candidate chemical formula is determined as the recognized chemical formula, which ensures the accuracy of chemical formula recognition; the multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.

Description of the drawings

In order to illustrate the solutions in this application more clearly, the drawings needed to describe the embodiments of this application are briefly introduced below. Obviously, the drawings in the following description show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

Figure 1 is an exemplary system architecture diagram to which the present application can be applied;

Figure 2 is a flow chart of an embodiment of a chemical formula identification method according to the present application;

Figure 3 is a schematic structural diagram of an embodiment of a chemical formula identification device according to the present application;

Figure 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed description

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs; the terms used in the specification of the application are for the purpose of describing specific embodiments only and are not intended to limit the application; the terms "including" and "having" and any variations thereof in the description and claims of the application and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", etc. in the description and claims of this application or the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific sequence.

Reference herein to "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings.

As shown in Figure 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, etc. Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, etc.

The terminal devices 101, 102, and 103 may be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops and desktop computers.

The server 105 may be a server that provides various services, such as a backend server that provides support for pages displayed on the terminal devices 101, 102, and 103.

It should be noted that the chemical formula identification method provided in the embodiments of the present application is generally executed by a server, and accordingly, the chemical formula identification device is generally installed in the server.

It should be understood that the numbers of terminal devices, networks and servers in Figure 1 are only illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers.

With continued reference to Figure 2, a flow chart of one embodiment of a chemical formula identification method according to the present application is shown. The chemical formula identification method includes the following steps:

Step S201: Obtain an image to be detected containing a chemical formula.

In this embodiment, the electronic device (such as the server shown in Figure 1) on which the chemical formula identification method runs can communicate with the terminal through a wired connection or a wireless connection. It should be pointed out that the above wireless connection methods may include but are not limited to 3G/4G/5G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, and other wireless connection methods that are now known or developed in the future.

Specifically, the server obtains the image to be detected, the image to be detected contains the chemical formula, and the server needs to extract the chemical formula from the image to be detected. The image to be detected can be obtained through various methods, for example, it can be obtained by scanning; or, the image to be detected can be obtained by taking a picture by the terminal and then sent to the server.

It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned image to be detected, the above-mentioned image to be detected can also be stored in a node of the blockchain.

The blockchain referred to in this application is a new application model of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks generated using cryptographic methods, in which each data block contains a batch of network transaction information that is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.

Step S202: Input the image to be detected into the multi-target detection model to obtain a chemical formula region image.

The multi-target detection model may be a model that detects image areas containing chemical formulas in the image to be detected.

Specifically, the image to be detected may include information such as chemical formulas and other text information. Therefore, the position of the chemical formula in the image to be detected can be determined first, that is, the chemical formula region image in which the chemical formula exists is determined.

The image to be detected can be input into a pre-trained multi-target detection model, and the multi-target detection model can be composed of a neural network capable of realizing target detection. It can identify the image area containing the chemical formula in the image to be detected and output the position information of the image area.

The multi-target detection model undergoes multi-label supervised training in advance, and can identify multiple chemical formula region images from the image to be detected. In one embodiment, chemical formula region image recognition can be performed through DETR (DEtection TRansformer, an end-to-end target detection network).

Step S203: Input the chemical formula region image into the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image.

The chemical formula recognition model may be a model used for chemical formula recognition.

Specifically, the chemical formula region image is input into the chemical formula recognition model. The chemical formula recognition model can be built based on the neural network and has been trained with end-to-end image description (Image Caption) in advance. The chemical formula recognition model first encodes the chemical formula region image, then decodes it, and outputs the chemical formula in the chemical formula region image to obtain the candidate chemical formula.

In one embodiment, the candidate chemical formula may be a chemical formula expressed in SMILES or InChI, where SMILES and InChI are two existing expression methods for chemical formulas.

Building a chemical formula recognition model based on neural networks does not require traditional image processing technologies such as image vectorization, image decomposition, and molecular reconstruction, and the process is simplified. For software designers and developers in this field, there is no need to carry out a large amount of manual feature design, and the process is simpler; for ordinary users, chemical formula recognition becomes faster.

Step S204: Existence verification is performed on the candidate chemical formulas based on the pre-established chemical formula database to obtain verification results.

The chemical formula database may be a pre-established database and may store all known chemical formulas.

Specifically, after the candidate chemical formula is obtained, an existence check is performed on it according to the pre-established chemical formula database. The existence check searches the chemical formula database for a chemical formula that is the same as the candidate chemical formula. The search result is either that such a chemical formula exists in the database or that it does not, and this search result is the verification result.
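The existence check described above can be sketched as an exact-match lookup. This is only an illustration: the use of SQLite, the table name `formulas`, the column name `smiles`, and the helper names are all assumptions that do not come from the application, which does not specify how the database is stored or queried.

```python
import sqlite3


def build_formula_db(formulas):
    """Create an in-memory database of known formulas (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE formulas (smiles TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO formulas (smiles) VALUES (?)", [(f,) for f in formulas]
    )
    conn.commit()
    return conn


def formula_exists(conn, candidate):
    """Return True if an identical formula string is stored in the database."""
    row = conn.execute(
        "SELECT 1 FROM formulas WHERE smiles = ? LIMIT 1", (candidate,)
    ).fetchone()
    return row is not None


conn = build_formula_db(["CCO", "O", "c1ccccc1"])
verification_result = formula_exists(conn, "CCO")
```

Because the comparison is on raw strings, a practical system would probably need to store and query formulas in one canonical form, since the same molecule can be written as several different SMILES strings.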

Step S205: When it is determined that the candidate chemical formula exists according to the verification result, the candidate chemical formula is determined as the recognized chemical formula.

Specifically, when it is determined according to the verification results that the chemical formula that is the same as the candidate chemical formula exists in the chemical formula database, it indicates that the identified candidate chemical formula actually exists, and the candidate chemical formula is determined to be the identified chemical formula and output, and the result of chemical formula identification is obtained.

In this embodiment, after the image to be detected is obtained, it is first input into the multi-target detection model to determine the image area where the chemical formula is located and obtain the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to recognize the chemical formula and obtain the candidate chemical formula. Known chemical formulas are stored in the chemical formula database. When the candidate chemical formula is determined to actually exist according to the chemical formula database, the candidate chemical formula is determined as the recognized chemical formula, which ensures the accuracy of chemical formula recognition. The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
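The flow of steps S201 to S205 can be summarized in a short sketch. The two models are passed in as plain callables, and `identify_formulas`, `fake_detector`, and `fake_recognizer` are hypothetical names used only so that the example runs without trained networks; they are not part of the application.

```python
from typing import Callable, Iterable, List, Set


def identify_formulas(
    image: object,
    detect_regions: Callable[[object], Iterable[object]],
    recognize: Callable[[object], str],
    known_formulas: Set[str],
) -> List[str]:
    """Run steps S202 to S205 on one image and return the confirmed formulas."""
    confirmed = []
    for region in detect_regions(image):      # S202: chemical formula region images
        candidate = recognize(region)         # S203: candidate chemical formula
        exists = candidate in known_formulas  # S204: existence check
        if exists:                            # S205: keep only verified candidates
            confirmed.append(candidate)
    return confirmed


# Toy stand-ins for the two neural networks (assumptions for illustration).
def fake_detector(image):
    return image["regions"]


def fake_recognizer(region):
    return region["text"]


demo_image = {"regions": [{"text": "CCO"}, {"text": "C1CC"}]}
demo_result = identify_formulas(
    demo_image, fake_detector, fake_recognizer, {"CCO", "O"}
)
```

In this toy run only the first region yields a string that is present in the set of known formulas, so only that candidate is returned.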

Further, the above step S202 may include: inputting the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected; inputting the first feature map into the feature extraction network in the multi-target detection model to obtain the second feature map; and inputting the second feature map into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.

Specifically, the multi-target detection model has a three-layer architecture, including a feature generation network, a feature extraction network, and a detection layer. The image to be detected is first input into the feature generation network, which converts the input image into a feature map, thereby generating the first feature map.

In one embodiment, the feature generation network may be a CNN (Convolutional Neural Networks, convolutional neural network) backbone network.

The first feature map is input into the feature extraction network for feature extraction, and the second feature map is obtained. In one embodiment, the feature extraction network may be a Transformer network. Transformer is a model proposed by Google that makes extensive use of the self-attention (Self-Attention) mechanism. The Transformer network includes an encoder and a decoder. The first feature map can be converted into a one-dimensional feature map by the multi-target detection model and then input to the Transformer encoder. The output of the Transformer encoder is N fixed-length embedding vectors, where N is the number of objects in the image hypothesized by the network. In this application, N is the number of chemical formula region images hypothesized by the network. The Transformer decoder processes the embedding vectors based on the self-attention mechanism to obtain the second feature map.

The second feature map is input to the detection layer. The detection layer includes a feedforward neural network and can output the category and location information of each image area in the image to be detected. The category indicates whether the image area contains a chemical formula, and the location information indicates the position of the image area in the image to be detected. According to the category and location information output by the detection layer, the chemical formula region image in the image to be detected can be obtained.

In one embodiment, after obtaining the category and location information output by the detection layer, the image to be detected is cropped according to the category and location information, and the image area containing the chemical formula is cropped out to obtain a chemical formula area image.
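The cropping step can be illustrated as follows. The sketch assumes that the image is a list of pixel rows, that each detection is a pair of a class label and an `(x0, y0, x1, y1)` box in pixel coordinates, and that label 1 means "contains a chemical formula". None of these conventions are given in the application.

```python
def crop_formula_regions(image, detections, formula_label=1):
    """Cut out every detected box whose category is 'chemical formula'.

    image: list of rows of pixel values.
    detections: list of (label, (x0, y0, x1, y1)) with exclusive x1 and y1.
    """
    crops = []
    for label, (x0, y0, x1, y1) in detections:
        if label != formula_label:
            continue  # background areas are skipped
        crops.append([row[x0:x1] for row in image[y0:y1]])
    return crops


image = [[10 * r + c for c in range(6)] for r in range(4)]
detections = [(1, (1, 0, 3, 2)), (0, (0, 0, 6, 4))]
crops = crop_formula_regions(image, detections)
```

Here only the first detection is labeled as a formula, so a single 2 x 2 crop is produced.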

The multi-target detection model needs to be trained in advance, and the training can be end-to-end. Since the multi-target detection model only needs to distinguish chemical formula region images from background images, the number of classes is set to 2; since the background occupies most of the image area, focal loss can be used to address the imbalance between positive and negative samples. During training, the Transformer decoder outputs a dictionary consisting of chemical elements.
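As an illustration of why focal loss helps with class imbalance, the sketch below implements the usual binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults `gamma=2.0` and `alpha=0.25` are commonly used values and are assumptions here; the application does not state which settings are used.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one prediction.

    p: predicted probability that the area contains a chemical formula.
    y: ground-truth label, 1 for formula and 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently classified background area versus a missed formula area.
easy_background = binary_focal_loss(0.05, 0)
hard_formula = binary_focal_loss(0.05, 1)
```

The factor (1 - p_t)^gamma shrinks the contribution of the many easy background areas, so the rarer, harder formula areas dominate the gradient.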

The multi-target detection model can be trained not only on data sets that originally contain chemical formulas, but also on large synthetic data sets. For example, the cut-paste method can be used to combine collected document data sets with chemical formula data sets, with random scaling and automatic labeling of the location information. This saves the labor cost of manually labeling data sets and improves training efficiency.
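A minimal sketch of the cut-paste idea follows, assuming grayscale images stored as lists of rows, nearest-neighbor scaling, and a uniformly random paste position. The helper names and the scale range are hypothetical choices for illustration only.

```python
import random


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbor resize of a list-of-rows image."""
    h, w = len(img), len(img[0])
    return [
        [img[r * h // new_h][c * w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def paste_formula(document, formula, rng, scale_range=(0.5, 1.5)):
    """Paste a randomly scaled formula image into a document image.

    Returns the synthetic image and its automatically generated box label
    (x0, y0, x1, y1), so no manual annotation is needed.
    """
    doc_h, doc_w = len(document), len(document[0])
    scale = rng.uniform(*scale_range)
    new_h = max(1, min(doc_h, round(len(formula) * scale)))
    new_w = max(1, min(doc_w, round(len(formula[0]) * scale)))
    patch = resize_nearest(formula, new_h, new_w)
    y0 = rng.randint(0, doc_h - new_h)
    x0 = rng.randint(0, doc_w - new_w)
    out = [row[:] for row in document]  # leave the source document untouched
    for r in range(new_h):
        out[y0 + r][x0:x0 + new_w] = patch[r]
    return out, (x0, y0, x0 + new_w, y0 + new_h)


rng = random.Random(0)
document = [[255] * 20 for _ in range(12)]  # blank white page
formula = [[0] * 6 for _ in range(4)]       # solid dark formula patch
synthetic, box = paste_formula(document, formula, rng)
```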

In this embodiment, the multi-target detection model includes a three-layer architecture. The image to be detected is first converted into a first feature map, feature extraction is then performed to obtain a second feature map, and the position information of the image area containing the chemical formula is then obtained through the detection layer, thereby successfully obtaining the chemical formula region image.

Further, before the above step S203, it may also include: preprocessing the chemical formula region image, and the preprocessing includes binarization processing, image thinning processing, and image scaling processing.

Specifically, before inputting the chemical formula region image into the chemical formula recognition model, the chemical formula region image may also be preprocessed, where the preprocessing may include binarization processing, image thinning processing, and image scaling processing. Among them, the image scaling process is to adjust the chemical formula region image to a preset size.
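Two of the three preprocessing operations can be sketched in a few lines. The threshold of 128, the 4 x 4 target size, and the convention that dark ink becomes 1 are assumptions for illustration. Image thinning is left out because it needs a longer skeletonization algorithm, and the application does not name the one it uses.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image to 0/1, treating dark pixels as ink (1)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def resize_nearest(img, out_h, out_w):
    """Scale an image to a preset size with nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    return [
        [img[r * h // out_h][c * w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(gray, size=(4, 4), threshold=128):
    """Binarize, then scale to the preset size expected by the recognizer."""
    out_h, out_w = size
    return resize_nearest(binarize(gray, threshold), out_h, out_w)


gray = [[0, 0, 255, 255], [0, 0, 255, 255]]
processed = preprocess(gray)
```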

In this embodiment, by preprocessing the chemical formula region image, the chemical formula region image can be image optimized to facilitate the processing of the chemical formula recognition model and ensure the accuracy of chemical formula recognition.

Further, the above step S203 may include: inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map; and inputting the encoded feature map into the decoder in the chemical formula recognition model to obtain the candidate chemical formula in the chemical formula region image.

Specifically, the chemical formula recognition model can include a multi-level encoder and a decoder. The multi-level encoder is responsible for encoding the input chemical formula region image to obtain the encoded feature map; the decoder is responsible for decoding the encoded feature map and outputting the candidate chemical formula in the chemical formula region image.

In one embodiment, the multi-level encoder can be built based on the Swin Transformer encoder. Swin Transformer includes sliding window operations and has a hierarchical design. It is a Transformer specially used for image processing tasks. Swin Transformer adopts a hierarchical design, including 4 stages. Each stage will reduce the resolution of the input feature map and expand the receptive field layer by layer like a CNN network. Window attention divides the image into different windows according to a certain size. Each transformer's attention is only calculated inside the window. The shift window attention in Swin Transformer changes the way the window is divided, so that the window block for attention calculation of each pixel is changing. Its sliding window operation includes non-overlapping local window and overlapping cross-window. Limiting the attention calculation to a window can, on the one hand, introduce the locality of the CNN convolution operation, and on the other hand, save the amount of calculation.
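The difference between regular and shifted windows can be made concrete with a small sketch. It assumes a single-channel feature map stored as a list of rows and realizes the shift as a cyclic roll of the map before partitioning, which is one common way to implement shifted windows; the application does not say how the shift is carried out.

```python
def window_partition(fmap, window):
    """Split an H x W map into non-overlapping window x window blocks."""
    h, w = len(fmap), len(fmap[0])
    assert h % window == 0 and w % window == 0
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append(
                [fmap[r][left:left + window] for r in range(top, top + window)]
            )
    return windows


def cyclic_shift(fmap, shift):
    """Roll the map up and to the left by `shift` positions, wrapping around."""
    h, w = len(fmap), len(fmap[0])
    return [
        [fmap[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


fmap = [[4 * r + c for c in range(4)] for r in range(4)]
regular = window_partition(fmap, 2)                   # window attention
shifted = window_partition(cyclic_shift(fmap, 1), 2)  # shifted-window attention
```

In the regular partition, position 5 shares a window with 0, 1 and 4; after the shift it shares a window with 6, 9 and 10, which is how information crosses the original window borders while attention is still computed inside each window.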

In one embodiment, the decoder in the chemical formula recognition model can be built based on the decoder in Transformer. It uses masked multi-head attention, so that the model sees only past data and not future data. It uses the value of the hidden layer of the decoder (as Q) and the value of the hidden layer of the encoder part (as K) to compute attention, and then uses the input of the encoder as V, which is weighted into the input of the decoder.
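The masking that hides future positions can be illustrated with a single attention head over raw score values. The sketch covers only the mask and the softmax; the Q, K and V projections are omitted, and the function names are hypothetical.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight to masked (future) positions."""
    weights = []
    for row, mask_row in zip(scores, mask):
        peak = max(s for s, m in zip(row, mask_row) if m)  # numerical stability
        exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(row, mask_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.0, 1.0, 2.0], [1.0, 1.0, 5.0], [0.0, 0.0, 0.0]]
weights = masked_softmax(scores, causal_mask(3))
```

The large score 5.0 in the second row belongs to a future position, so it receives zero weight and the two visible positions share the attention equally.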

In this embodiment, the chemical formula recognition model can be a neural network, including a multi-level encoder and a decoder, which first extracts features of the chemical formula region image and then decodes them to obtain candidate chemical formulas.

Further, the multi-level encoder includes several sequentially connected coding layers; the above step of inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map may include: partitioning the chemical formula region image into patches to obtain patch images; inputting the patch images into the first coding layer to obtain a coding feature map; for each coding layer after the first, downsampling the coding feature map and inputting the downsampled coding feature map into the next coding layer, iterating until the last coding layer; and determining the coding feature map output by the last coding layer as the encoded feature map.

Specifically, a multi-level encoder includes several sequentially connected coding layers. In one embodiment, the multi-level encoder may include four sequentially connected coding layers, that is, four stages (stage1, stage2, stage3, and stage4). The four stages gradually reduce the resolution of the input feature map to expand the receptive field.

The multi-level encoder first partitions the chemical formula region image into patches (Patch Partition) to obtain multiple patch images, which are then input to the first coding layer for encoding and feature extraction, yielding the coding feature map output by the first coding layer.

After the coding feature map output by the first coding layer is input to the second coding layer, it first undergoes downsampling (Patch Merging) to reduce its resolution. In the second and each subsequent coding layer, the input coding feature map is downsampled according to the downsampling standard set by that coding layer, so that the number of channels of the output coding feature map gradually increases. After downsampling, the feature map is input into the coding network in the second coding layer for encoding, and the coding feature map output by the second coding layer is obtained.
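The effect of Patch Merging on resolution and channel count can be sketched as follows. The example assumes a feature map stored as an H x W grid whose cells are lists of C channel values and shows only the 2 x 2 concatenation; in the published Swin Transformer design a linear layer then maps the 4C channels to 2C, and that projection is omitted here.

```python
def patch_merging(fmap):
    """Merge each 2 x 2 neighborhood into one position with 4x the channels.

    fmap: H x W grid whose cells are lists of C channel values.
    Returns an (H/2) x (W/2) grid whose cells hold 4C values.
    """
    h, w = len(fmap), len(fmap[0])
    assert h % 2 == 0 and w % 2 == 0
    merged = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            # Concatenate the channels of the four neighboring positions.
            row.append(
                fmap[r][c] + fmap[r + 1][c] + fmap[r][c + 1] + fmap[r + 1][c + 1]
            )
        merged.append(row)
    return merged


fmap = [[[10 * r + c] for c in range(4)] for r in range(4)]  # 4 x 4 map, C = 1
merged = patch_merging(fmap)
```

The 4 x 4 map with one channel becomes a 2 x 2 map with four channels, which matches the description above: resolution falls while the channel count rises.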

The coding feature map output by the second coding layer is input to the subsequent coding layer for iteration until the last coding layer; the coding feature map output by the last coding layer will be used as the coded feature map.

In this embodiment, the sequentially connected coding layers in the multi-level encoder encode the chemical formula region image layer by layer and reduce the resolution layer by layer, which allows the features of the chemical formula region image to be fully extracted and ensures the accuracy of chemical formula recognition.

Further, the coding layer includes a coding network; the above step of inputting the patch image into the first coding layer to obtain the coding feature map may include: performing linear embedding processing on the patch image; and inputting the linearly embedded patch image into the coding network in the first coding layer, which performs feature extraction on it through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multi-layer perception mechanism and layer normalization, to obtain the coding feature map.

Specifically, after the patch image is obtained, the first coding layer first performs linear embedding on it and then passes the linearly embedded patch image to the coding network in the first coding layer. The coding network is a Swin Transformer Block. Each coding layer in the multi-level encoder has a Swin Transformer Block, and all Swin Transformer Blocks share the same internal processing logic and differ only in their input and output dimensions.

The coding network in the first coding layer performs feature extraction on the linearly embedded patch image through the window-based multi-head self-attention mechanism, the sliding-window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization, and obtains the coding feature map. The coding networks in the second and subsequent coding layers apply the same four mechanisms to the coding feature map they receive and output a new coding feature map.

Residual connections are used inside each Swin Transformer Block coding network. The network input $z^{l-1}$ first undergoes layer normalization (LN); the output of the LN is processed by the window-based multi-head self-attention mechanism (window based multi-head self-attention, W-MSA), and the output of the W-MSA is added to $z^{l-1}$ to obtain $\hat{z}^{l}$:

$$\hat{z}^{l} = \text{W-MSA}(\text{LN}(z^{l-1})) + z^{l-1}$$

$\hat{z}^{l}$ is input to the second LN, the output of the second LN is processed by the multi-layer perception mechanism (MLP), and the output of the MLP is added to $\hat{z}^{l}$ to obtain $z^{l}$:

$$z^{l} = \text{MLP}(\text{LN}(\hat{z}^{l})) + \hat{z}^{l}$$

$z^{l}$ is input to the third LN, the output of the third LN is processed by the sliding window-based multi-head self-attention mechanism (shifted window based multi-head self-attention, SW-MSA), and the output of the SW-MSA is added to $z^{l}$ to obtain $\hat{z}^{l+1}$:

$$\hat{z}^{l+1} = \text{SW-MSA}(\text{LN}(z^{l})) + z^{l}$$

$\hat{z}^{l+1}$ is input to the fourth LN, the output of the fourth LN is processed by the second MLP, and the output of the second MLP is added to $\hat{z}^{l+1}$ to obtain $z^{l+1}$:

$$z^{l+1} = \text{MLP}(\text{LN}(\hat{z}^{l+1})) + \hat{z}^{l+1}$$

$z^{l+1}$ is the coding feature map output by this coding layer.
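The residual wiring described above can be illustrated with a minimal Python sketch. It shows only the order of layer normalization, sub-layer and residual addition; the attention and MLP sub-layers are passed in as placeholder callables, and the names `layer_norm` and `swin_block_pair` are hypothetical and do not come from the application. Learnable scale and shift parameters of layer normalization are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last (channel) axis to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def swin_block_pair(z_prev, w_msa, sw_msa, mlp1, mlp2):
    """Apply the four residual sub-steps; all sub-layers are placeholder callables."""
    z_hat = w_msa(layer_norm(z_prev)) + z_prev            # first LN -> W-MSA -> add input
    z = mlp1(layer_norm(z_hat)) + z_hat                   # second LN -> MLP -> add
    z_hat_next = sw_msa(layer_norm(z)) + z                # third LN -> SW-MSA -> add
    z_next = mlp2(layer_norm(z_hat_next)) + z_hat_next    # fourth LN -> second MLP -> add
    return z_next

# Usage: with sub-layers that output zeros, the residual path passes the input through unchanged.
zero = lambda x: np.zeros_like(x)
tokens = np.arange(24, dtype=float).reshape(2, 3, 4)  # (windows, tokens, channels), illustrative only
out = swin_block_pair(tokens, zero, zero, zero, zero)
```

Because every sub-step adds its input back to the sub-layer output, the shape of the feature map is unchanged within a block; changes in dimension happen only between coding layers.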

Extracting features from the input coding feature map through the window-based multi-head self-attention mechanism, the sliding window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization processing, and outputting a coding feature map, introduces the locality of CNN convolution operations while keeping the overall amount of computation under control, so that features are extracted from the image more accurately and the accuracy of image processing tasks is improved.

In this embodiment, feature extraction is performed through the window-based multi-head self-attention mechanism, the sliding window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization processing, and the coding feature map is output, so that features are extracted from the image more accurately and the accuracy of chemical formula recognition is improved.

Further, after the above step S204, it may also include: when it is determined that the candidate chemical formula does not exist according to the verification result, determining similar chemical formulas of the candidate chemical formula based on edit distance; determining the similar chemical formula as the recognized chemical formula.

Specifically, when it is determined from the verification result that the candidate chemical formula does not exist, the edit distance between the candidate chemical formula and each chemical formula in the chemical formula database can be calculated. The edit distance, also called the Levenshtein distance, is a quantitative measure of the degree of difference between two strings: the minimum number of single-character edits (insertions, deletions or substitutions) required to turn one string into the other.

The chemical formula with the shortest edit distance to the candidate chemical formula is selected; this chemical formula is the most similar to the candidate chemical formula and is determined as the recognized chemical formula.

In this embodiment, when it is determined that the candidate chemical formula does not exist according to the verification results, similar chemical formulas of the candidate chemical formula are searched based on the edit distance, and the similar chemical formulas are regarded as the identified chemical formulas, thereby correcting the recognition result.
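As an illustration of this correction step, the following Python sketch computes the Levenshtein distance by dynamic programming and falls back to the nearest database entry when the candidate is not found. The function names, the plain-string representation of formulas and the small example database are assumptions made for the sketch; the application does not specify how formulas are encoded or how ties are broken (here ties are resolved by sorted order).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def resolve_formula(candidate: str, database: list) -> str:
    """Return the candidate if it exists, otherwise the closest known formula."""
    if candidate in database:
        return candidate
    # sorted() makes tie-breaking deterministic (an assumption of this sketch)
    return min(sorted(database), key=lambda f: edit_distance(candidate, f))

# Usage: a digit zero misread in place of the letter O is corrected.
known = ["H2O", "CO2", "NaCl"]  # hypothetical database contents
corrected = resolve_formula("H20", known)
```

A linear scan over the database is the simplest choice; for a large database an index over formulas would likely be needed, which is outside the scope of this sketch.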

This application involves neural networks, machine learning and computer vision in the field of artificial intelligence.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above-mentioned method embodiments. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts of the accompanying drawings may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and they are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

With further reference to Figure 3, as an implementation of the method shown in Figure 2, the present application provides an embodiment of a chemical formula identification device. The device embodiment corresponds to the method embodiment shown in Figure 2. The device can specifically be used in various electronic equipment.

As shown in Figure 3, the chemical formula identification device 300 in this embodiment includes: an image acquisition module 301, an area detection module 302, a chemical formula identification module 303, a chemical formula verification module 304 and a chemical formula determination module 305, wherein:

The image acquisition module 301 is used to acquire an image to be detected containing a chemical formula.

The region detection module 302 is used to input the image to be detected into a multi-target detection model to obtain a chemical formula region image.

The chemical formula recognition module 303 is used to input the chemical formula region image into the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image.

The chemical formula verification module 304 is used to verify the existence of candidate chemical formulas based on a pre-established chemical formula database and obtain verification results.

The chemical formula determination module 305 is configured to determine the candidate chemical formula as the recognized chemical formula when it is determined that the candidate chemical formula exists according to the verification result.
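The cooperation of modules 301 to 305 can be sketched as a short Python function. This is a minimal sketch under stated assumptions: the detection and recognition models are passed in as callables, the database is a set of formula strings, and all names (`identify_formulas`, `detect_regions`, `recognize`) are hypothetical stand-ins rather than the applicant's implementation.

```python
from typing import Any, Callable, Iterable, List, Set

def identify_formulas(image: Any,
                      detect_regions: Callable[[Any], Iterable[Any]],
                      recognize: Callable[[Any], str],
                      database: Set[str]) -> List[str]:
    """Run detection, recognition and existence verification on one image."""
    recognized = []
    for region in detect_regions(image):   # region detection module 302
        candidate = recognize(region)      # chemical formula recognition module 303
        exists = candidate in database     # chemical formula verification module 304
        if exists:                         # chemical formula determination module 305
            recognized.append(candidate)
    return recognized

# Usage with stub models standing in for the trained networks.
stub_detect = lambda img: img["regions"]   # pretend each region is already cropped
stub_recognize = lambda region: region     # pretend recognition returns the text
result = identify_formulas({"regions": ["H2O", "XyZ9"]},
                           stub_detect, stub_recognize, {"H2O", "CO2"})
```

Candidates that fail the existence check are simply dropped here; the edit-distance correction described for the method embodiment would replace that branch.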

In some optional implementations of this embodiment, the region detection module 302 may include: a feature map generation sub-module, a feature extraction sub-module and a region detection sub-module, where:

The feature map generation submodule is used to input the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected.

The feature extraction submodule is used to input the first feature map into the feature extraction network in the multi-target detection model to obtain the second feature map.

The region detection submodule is used to input the second feature map into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.

In some optional implementations of this embodiment, the chemical formula identification device 300 may include a preprocessing module, which is used to preprocess the chemical formula region image. The preprocessing includes binarization processing, image thinning processing, and image scaling processing.

In some optional implementations of this embodiment, the chemical formula identification module 303 may include: an encoding sub-module and a decoding sub-module, where:

The encoding submodule is used to input the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map.

The decoding submodule is used to input the encoded feature map into the decoder in the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula area image.

In some optional implementations of this embodiment, the multi-level encoder includes several sequentially connected coding layers; the encoding sub-module may include: an image segmentation unit, a fragment input unit, an iteration unit and a feature map determination unit, wherein:

The image segmentation unit is used to segment the chemical formula region image to obtain segmented images.

The fragment input unit is used to input the fragmented image into the first coding layer to obtain the coding feature map.

The iteration unit is used to downsample the coding feature map for the coding layer after the first layer, and input the downsampled coding feature map into the next coding layer for iteration until the last coding layer.

The feature map determination unit is used to determine the coded feature map output by the last coding layer as the coded feature map.
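The flow through the four units can be sketched in Python with NumPy. The sketch makes several assumptions that the application does not state: the image is split into non-overlapping square patches, downsampling between coding layers merges each 2x2 group of neighbouring tokens by concatenating their channels (without the linear projection a trained model would normally apply), and each coding layer is a placeholder callable. The names `partition_patches`, `downsample` and `encode` are hypothetical.

```python
import numpy as np

def partition_patches(image, patch):
    """Split an (H, W, C) array into non-overlapping patch x patch tiles and flatten each tile."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "height and width must be divisible by patch"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tiles.reshape(h // patch, w // patch, patch * patch * c)

def downsample(features):
    """Merge each 2x2 group of tokens: halves height and width, quadruples channels."""
    return partition_patches(features, 2)

def encode(image, layers, patch=4):
    """Feed the patches to the first layer, then downsample before every later layer."""
    features = layers[0](partition_patches(image, patch))  # first coding layer
    for layer in layers[1:]:                               # coding layers after the first
        features = layer(downsample(features))
    return features                                        # output of the last coding layer

# Usage with identity functions standing in for the coding networks.
identity = lambda x: x
encoded = encode(np.zeros((32, 32, 3)), [identity, identity, identity])
```

With a 32x32x3 input, 4x4 patches and three layers, the feature map goes from 8x8x48 to 4x4x192 to 2x2x768 in this sketch, which shows how spatial resolution shrinks while channel depth grows from layer to layer.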

In some optional implementations of this embodiment, the coding layer includes a coding network; the fragment input unit may include: an embedding processing subunit and a feature extraction subunit, wherein:

The embedding processing subunit is used to perform linear embedding processing on fragmented images.

The feature extraction subunit is used to input the linearly embedded sliced image into the coding network in the first coding layer, so that features are extracted from the linearly embedded sliced image through the window-based multi-head self-attention mechanism, the sliding window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization processing to obtain the encoded feature map.

In some optional implementations of this embodiment, the chemical formula identification device 300 may include: a similarity determination module and a similarity determination module, wherein:

The similarity determination module is used to determine similar chemical formulas of the candidate chemical formula based on edit distance when it is determined according to the verification results that the candidate chemical formula does not exist.

The similarity determination module is used to determine the similar chemical formula as the recognized chemical formula.

For specific limitations on the chemical formula identification device, please refer to the above limitations on the chemical formula identification method, which will not be repeated here. Each module in the above-mentioned chemical formula identification device can be implemented in whole or in part by software, hardware, or a combination thereof. For example, in one embodiment, the region detection module corresponds to the multi-target detection model, and the chemical formula recognition module corresponds to the chemical formula recognition model. Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In order to solve the above technical problems, embodiments of the present application also provide computer equipment. Please refer to Figure 4 for details. Figure 4 is a basic structural block diagram of the computer equipment in this embodiment.

The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other through a system bus. It should be noted that the figure shows only a computer device 4 having components 41-43; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculations and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.

The computer device may be a desktop computer, a notebook, a PDA, a cloud server and other computing devices. The computer device can perform human-computer interaction with the user through keyboard, mouse, remote control, touch panel or voice control device.

The memory 41 includes at least one type of computer-readable storage medium. The computer-readable storage medium can be non-volatile or volatile, and includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device 4. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is usually used to store the operating system and various application software installed on the computer device 4, such as the computer-readable instructions of the chemical formula identification method. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the chemical formula identification method.

The network interface 43 may include a wireless network interface or a wired network interface. The network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.

The computer device provided in this embodiment can execute the above chemical formula identification method. The chemical formula identification method here may be the chemical formula identification method of each of the above embodiments.

The present application also provides another implementation, namely a computer-readable storage medium that stores computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the chemical formula identification method described above.

Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions to cause a terminal device (which can be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of the present application.

Obviously, the above-described embodiments are only some of the embodiments of the present application, rather than all of them. The preferred embodiments of the present application are given in the drawings, but they do not limit the patent scope of the present application. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application can be understood thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent substitutions for some of the technical features. Any equivalent structure made using the contents of the specification and drawings of this application and directly or indirectly used in other related technical fields shall likewise fall within the scope of patent protection of this application.

Claims

A chemical formula identification method includes the following steps:

Obtain the image to be detected containing the chemical formula;

Input the image to be detected into a multi-target detection model to obtain a chemical formula region image;

Input the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image;

Perform existence verification on the candidate chemical formula according to the pre-established chemical formula database to obtain verification results;

When it is determined that the candidate chemical formula exists according to the verification result, the candidate chemical formula is determined as the recognized chemical formula.
The chemical formula recognition method according to claim 1, wherein the step of inputting the image to be detected into a multi-target detection model to obtain the chemical formula region image includes:

Input the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected;

Input the first feature map into the feature extraction network in the multi-target detection model to obtain a second feature map;

The second feature map is input into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
The chemical formula recognition method according to claim 1, wherein before the step of inputting the chemical formula region image into a chemical formula recognition model to obtain the candidate chemical formula in the chemical formula region image, it further includes:

The chemical formula region image is preprocessed, and the preprocessing includes binarization processing, image thinning processing, and image scaling processing.
The chemical formula recognition method according to claim 1, wherein the step of inputting the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image includes:

Input the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map;

The encoded feature map is input into the decoder in the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula area image.
The chemical formula recognition method according to claim 4, wherein the multi-level encoder includes several sequentially connected coding layers; the step of inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map includes:

Perform segmentation processing on the chemical formula region image to obtain segmented images;

Input the fragmented image into the first coding layer to obtain a coding feature map;

For the coding layers after the first layer, downsample the coding feature map, and input the downsampled coding feature map into the next coding layer for iteration until the last coding layer;

The coding feature map output by the last coding layer is determined as a coded feature map.
The chemical formula identification method according to claim 5, wherein the coding layer includes a coding network; the step of inputting the sliced image into the first coding layer to obtain the coding feature map includes:

Perform linear embedding processing on the sliced images;

The linearly embedded sliced image is input into the encoding network in the first encoding layer, so that feature extraction is performed on the linearly embedded sliced image through the window-based multi-head self-attention mechanism, the sliding window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization processing to obtain a coded feature map.
The chemical formula identification method according to claim 1, wherein after the step of performing an existence check on the candidate chemical formula according to the pre-established chemical formula database and obtaining the verification result, it further includes:

When it is determined that the candidate chemical formula does not exist according to the verification result, determining similar chemical formulas of the candidate chemical formula based on edit distance;

The similar chemical formula is determined as a recognized chemical formula.
A chemical formula identification device including:

Image acquisition module, used to acquire images to be detected containing chemical formulas;

A region detection module, used to input the image to be detected into a multi-target detection model to obtain a chemical formula region image;

A chemical formula recognition module, used to input the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image;

A chemical formula verification module, used to verify the existence of the candidate chemical formula according to a pre-established chemical formula database and obtain verification results;

A chemical formula determination module, configured to determine the candidate chemical formula as a recognized chemical formula when it is determined that the candidate chemical formula exists according to the verification result.
A computer device includes a memory and a processor. Computer-readable instructions are stored in the memory. When the processor executes the computer-readable instructions, the following steps are implemented:

Obtain the image to be detected containing the chemical formula;

Input the image to be detected into a multi-target detection model to obtain a chemical formula region image;

Input the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image;

Perform existence verification on the candidate chemical formula according to the pre-established chemical formula database to obtain verification results;

When it is determined that the candidate chemical formula exists according to the verification result, the candidate chemical formula is determined as the recognized chemical formula.
The computer device according to claim 9, wherein the step of inputting the image to be detected into a multi-target detection model to obtain the chemical formula region image includes:

Input the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected;

Input the first feature map into the feature extraction network in the multi-target detection model to obtain a second feature map;

The second feature map is input into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
The computer device according to claim 9, wherein the step of inputting the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image includes:

Input the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map;

The encoded feature map is input into the decoder in the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula area image.
The computer device according to claim 11, wherein the multi-level encoder includes several sequentially connected coding layers; the step of inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map includes:

Perform segmentation processing on the chemical formula region image to obtain segmented images;

Input the fragmented image into the first coding layer to obtain a coding feature map;

For the coding layers after the first layer, downsample the coding feature map, and input the downsampled coding feature map into the next coding layer for iteration until the last coding layer;

The coding feature map output by the last coding layer is determined as a coded feature map.
The computer device according to claim 12, wherein the coding layer includes a coding network; the step of inputting the sliced image into the first coding layer to obtain the coding feature map includes:

Perform linear embedding processing on the sliced images;

The linearly embedded sliced image is input into the encoding network in the first encoding layer, so that feature extraction is performed on the linearly embedded sliced image through the window-based multi-head self-attention mechanism, the sliding window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization processing to obtain a coded feature map.
The computer device according to claim 9, wherein after the step of verifying the existence of the candidate chemical formula according to the pre-established chemical formula database and obtaining the verification result, the processor further implements the following steps when executing the computer-readable instructions:

When it is determined that the candidate chemical formula does not exist according to the verification result, determining similar chemical formulas of the candidate chemical formula based on edit distance;

The similar chemical formula is determined as a recognized chemical formula.
A computer-readable storage medium, the computer-readable storage medium stores computer-readable instructions; when the computer-readable instructions are executed by a processor, the following steps are implemented:

Obtain the image to be detected containing the chemical formula;

Input the image to be detected into a multi-target detection model to obtain a chemical formula region image;

Input the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image;

Perform existence verification on the candidate chemical formula according to the pre-established chemical formula database to obtain verification results;

When it is determined that the candidate chemical formula exists according to the verification result, the candidate chemical formula is determined as the recognized chemical formula.
The computer-readable storage medium according to claim 15, wherein the step of inputting the image to be detected into a multi-target detection model to obtain the chemical formula region image includes:

Input the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected;

Input the first feature map into the feature extraction network in the multi-target detection model to obtain a second feature map;

The second feature map is input into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
The computer-readable storage medium according to claim 15, wherein the step of inputting the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image includes:

Input the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map;

The encoded feature map is input into the decoder in the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula area image.
The computer-readable storage medium according to claim 17, wherein the multi-level encoder includes several sequentially connected coding layers; the step of inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map includes:

Perform segmentation processing on the chemical formula region image to obtain segmented images;

Input the fragmented image into the first coding layer to obtain a coding feature map;

For the coding layers after the first layer, downsample the coding feature map, and input the downsampled coding feature map into the next coding layer for iteration until the last coding layer;

The coding feature map output by the last coding layer is determined as a coded feature map.
The computer-readable storage medium according to claim 18, wherein the coding layer includes a coding network; the step of inputting the sliced image into the first coding layer to obtain the coding feature map includes:

Perform linear embedding processing on the sliced images;

The linearly embedded sliced image is input into the encoding network in the first encoding layer, so that feature extraction is performed on the linearly embedded sliced image through the window-based multi-head self-attention mechanism, the sliding window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization processing to obtain a coded feature map.
The computer-readable storage medium according to claim 15, wherein after the step of verifying the existence of the candidate chemical formula according to the pre-established chemical formula database and obtaining the verification result, the computer-readable instructions, when executed by the processor, further implement the following steps:

When it is determined that the candidate chemical formula does not exist according to the verification result, determining similar chemical formulas of the candidate chemical formula based on edit distance;

The similar chemical formula is determined as a recognized chemical formula.