WO2023173536A1 - Chemical formula identification method and apparatus, computer device, and storage medium - Google Patents

Chemical formula identification method and apparatus, computer device, and storage medium

Info

Publication number
WO2023173536A1
WO2023173536A1 PCT/CN2022/089509 CN2022089509W WO2023173536A1 WO 2023173536 A1 WO2023173536 A1 WO 2023173536A1 CN 2022089509 W CN2022089509 W CN 2022089509W WO 2023173536 A1 WO2023173536 A1 WO 2023173536A1
Authority
WO
WIPO (PCT)
Prior art keywords
chemical formula
image
feature map
coding
input
Prior art date
Application number
PCT/CN2022/089509
Other languages
French (fr)
Chinese (zh)
Inventor
郑喜民
朱翌
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023173536A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a chemical formula identification method, device, computer equipment and storage medium.
  • Target detection, also called target extraction, uses computers to find targets or objects of interest in images and determine their categories and locations.
  • Target detection is an important topic in the field of computer vision.
  • Target detection is often associated with image description (Image Caption), which means that the computer generates corresponding descriptive text based on the input image.
  • The recognition of chemical formulas is a branch of target detection and image description tasks.
  • The inventor realized that traditional chemical formula recognition technology is based on computer vision technology and requires a series of hand-designed rules, including image vectorization, image decomposition, image thinning, line enhancement, optical character recognition, and reconstruction of the molecular graph representation. The process is complex, which makes chemical formula recognition inefficient.
  • The purpose of the embodiments of the present application is to propose a chemical formula recognition method, device, computer equipment and storage medium to solve the problem of low chemical formula recognition efficiency.
  • the candidate chemical formula is determined as the recognized chemical formula.
  • Image acquisition module used to acquire images to be detected containing chemical formulas
  • a region detection module used to input the image to be detected into a multi-target detection model to obtain a chemical formula region image
  • a chemical formula recognition module used to input the chemical formula region image into a chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image;
  • a chemical formula verification module used to verify the existence of the candidate chemical formula according to a pre-established chemical formula database and obtain verification results
  • a chemical formula determination module configured to determine the candidate chemical formula as a recognized chemical formula when it is determined that the candidate chemical formula exists according to the verification result.
  • embodiments of the present application also provide a computer device, including a memory and a processor.
  • Computer-readable instructions are stored in the memory.
  • When the processor executes the computer-readable instructions, the following steps are implemented:
  • the candidate chemical formula is determined as the recognized chemical formula.
  • embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores computer-readable instructions. When the computer-readable instructions are executed by a processor, the following steps are implemented:
  • the candidate chemical formula is determined as the recognized chemical formula.
  • The embodiments of the present application mainly have the following beneficial effects: after the image to be detected is obtained, it is first input into the multi-target detection model to determine the image area where the chemical formula is located, yielding the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to recognize the chemical formula and obtain the candidate chemical formula. The chemical formula database stores known chemical formulas; when the candidate chemical formula is determined to actually exist according to the chemical formula database, it is determined as the recognized chemical formula, ensuring the accuracy of chemical formula recognition. The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
  • Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
  • Figure 2 is a flow chart of an embodiment of a chemical formula identification method according to the present application.
  • Figure 3 is a schematic structural diagram of an embodiment of a chemical formula identification device according to the present application.
  • Figure 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • Reference herein to "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application.
  • The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105.
  • the network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105.
  • Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • Users can use terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, etc.
  • Various communication client applications can be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, etc.
  • Terminal devices 101, 102, and 103 may be various electronic devices with display screens that support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops, and desktop computers.
  • the server 105 may be a server that provides various services, such as a backend server that provides support for pages displayed on the terminal devices 101, 102, and 103.
  • the chemical formula identification method provided in the embodiments of the present application is generally executed by a server, and accordingly, the chemical formula identification device is generally installed in the server.
  • the chemical formula identification method includes the following steps:
  • Step S201 Obtain an image to be detected containing a chemical formula.
  • the electronic device (such as the server shown in Figure 1) on which the chemical formula identification method runs can communicate with the terminal through a wired connection or a wireless connection.
  • The above wireless connection methods may include but are not limited to 3G/4G/5G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, and other wireless connection methods that are now known or developed in the future.
  • the server obtains the image to be detected, the image to be detected contains the chemical formula, and the server needs to extract the chemical formula from the image to be detected.
  • the image to be detected can be obtained through various methods, for example, it can be obtained by scanning; or, the image to be detected can be obtained by taking a picture by the terminal and then sent to the server.
  • the above-mentioned image to be detected can also be stored in a node of the blockchain.
  • Blockchain is a new application model of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database. It is a series of data blocks generated using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • Blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • Step S202 Input the image to be detected into the multi-target detection model to obtain a chemical formula region image.
  • the multi-target detection model may be a model that detects image areas containing chemical formulas in the image to be detected.
  • the image to be detected may include information such as chemical formulas and other text information. Therefore, the position of the chemical formula in the image to be detected can be determined first, that is, the chemical formula region image in which the chemical formula exists is determined.
  • the image to be detected can be input into a pre-trained multi-target detection model, and the multi-target detection model can be composed of a neural network capable of realizing target detection. It can identify the image area containing the chemical formula in the image to be detected and output the position information of the image area.
  • the multi-target detection model undergoes multi-label supervised training in advance, and can identify multiple chemical formula region images from the image to be detected.
  • chemical formula region image recognition can be performed through DETR (DEtection TRansformer, an end-to-end target detection network).
  • Step S203 Input the chemical formula region image into the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image.
  • the chemical formula recognition model may be a model used for chemical formula recognition.
  • the chemical formula region image is input into the chemical formula recognition model.
  • the chemical formula recognition model can be built based on the neural network and has been trained with end-to-end image description (Image Caption) in advance.
  • the chemical formula recognition model first encodes the chemical formula region image, then decodes it, and outputs the chemical formula in the chemical formula region image to obtain the candidate chemical formula.
  • The candidate chemical formula may be a chemical formula expressed in SMILES or InChI, where SMILES and InChI are two existing notations for chemical formulas.
  • Step S204 Existence verification is performed on the candidate chemical formulas based on the pre-established chemical formula database to obtain verification results.
  • the chemical formula database may be a pre-established database and may store all known chemical formulas.
  • an existence check is performed on the candidate chemical formula according to a pre-established chemical formula database; the existence check refers to searching in the chemical formula database whether there is a chemical formula that is the same as the candidate chemical formula.
  • The search result is either that the chemical formula database contains a chemical formula identical to the candidate chemical formula, or that it contains none; the search result is the verification result.
  • Step S205 When it is determined that the candidate chemical formula exists according to the verification result, the candidate chemical formula is determined as the recognized chemical formula.
  • In this embodiment, after the image to be detected is obtained, it is first input into the multi-target detection model to determine the image area where the chemical formula is located, yielding the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to recognize the chemical formula and obtain the candidate chemical formula. Known chemical formulas are stored in the chemical formula database.
  • When the candidate chemical formula is determined to actually exist according to the chemical formula database, it is determined as the recognized chemical formula, ensuring the accuracy of chemical formula recognition. The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
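  • The flow of steps S201 to S205 can be sketched as a small driver function. This is a minimal sketch, not the patented implementation: the names `recognize_formulas`, `fake_detector` and `fake_recognizer` are hypothetical, the two models are passed in as plain callables, and the chemical formula database is assumed to be an in-memory set of strings.

```python
from typing import Any, Callable, Iterable, List, Set


def recognize_formulas(
    image: Any,
    detect_regions: Callable[[Any], Iterable[Any]],  # stands in for the multi-target detection model (S202)
    recognize: Callable[[Any], str],                 # stands in for the chemical formula recognition model (S203)
    known_formulas: Set[str],                        # stands in for the chemical formula database (S204)
) -> List[str]:
    """Return the candidates that pass the existence check (S205)."""
    recognized = []
    for region in detect_regions(image):
        candidate = recognize(region)
        if candidate in known_formulas:  # existence verification
            recognized.append(candidate)
    return recognized


# Toy stand-ins (assumptions for illustration only).
def fake_detector(image):
    return image["regions"]


def fake_recognizer(region):
    return region["smiles"]


image = {"regions": [{"smiles": "CCO"}, {"smiles": "C1CC"}]}
result = recognize_formulas(image, fake_detector, fake_recognizer, {"CCO", "O"})
```

Here the second candidate is dropped because it is not found in the set of known formulas.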
  • step S202 may include: inputting the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected; inputting the first feature map into the feature extraction network in the multi-target detection model, Obtain the second feature map; input the second feature map into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
  • the multi-target detection model has a three-layer architecture, including a feature generation network, a feature extraction network, and a detection layer.
  • The image to be detected is first input into the feature generation network, which converts it into a feature map, thereby generating the first feature map.
  • The feature generation network may be a CNN (Convolutional Neural Network) backbone network.
  • the first feature map is input into the feature extraction network for feature extraction, and the second feature map is obtained.
  • the feature extraction network may be a Transformer network.
  • Transformer is a model proposed by Google that makes extensive use of the self-attention (Self-Attention) mechanism.
  • the Transformer network includes an encoder and a decoder.
  • the first feature map can be converted into a one-dimensional feature map by the multi-target detection model, and then input to the Transformer encoder.
  • The output of the Transformer encoder is N fixed-length embedding vectors, where N is the number of objects in the image hypothesized by the network. In this application, N is the number of chemical formula region images hypothesized by the network.
  • the Transformer decoder processes the embedding vector based on the self-attention mechanism to obtain the second feature map.
  • the second feature map is input to the detection layer.
  • the detection layer includes a feedforward neural network and can output the category and location information of each image area in the image to be detected.
  • The category indicates whether the image area contains a chemical formula, and the location information indicates the position of the image area in the image to be detected. According to the category and location information output by the detection layer, the chemical formula region image in the image to be detected can be obtained.
  • the image to be detected is cropped according to the category and location information, and the image area containing the chemical formula is cropped out to obtain a chemical formula area image.
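  • The cropping step can be illustrated as follows. This is a sketch under stated assumptions: the image is a nested list of pixel values, each detection is a hypothetical `(label, box)` pair with label 1 meaning "contains a chemical formula", and boxes are `(left, top, right, bottom)` in pixels with exclusive right and bottom edges. The source does not specify the output format of the detection layer.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive (assumed convention)
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle described by `box` out of `image`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def crop_formula_regions(image: Image, detections, formula_label: int = 1) -> List[Image]:
    """Keep only detections whose category marks a chemical formula, and crop them."""
    return [crop(image, box) for label, box in detections if label == formula_label]


# 4 x 6 toy image whose pixel value encodes its row and column.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
detections = [(1, (1, 0, 3, 2)), (0, (0, 0, 6, 4))]  # one formula box, one background box
regions = crop_formula_regions(image, detections)
```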
  • the multi-target detection model needs to be trained in advance.
  • The training can be end-to-end. Since the multi-target detection model only needs to distinguish chemical formula region images from background images, the number of classes is set to 2; since the background occupies most of the image area, focal loss can be used to address the imbalance between positive and negative samples.
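  • To show why focal loss helps with the formula/background imbalance, the following sketch computes the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single prediction. The defaults `alpha=0.25` and `gamma=2.0` are commonly used values and are an assumption here; the source does not state the hyperparameters used.

```python
import math


def binary_focal_loss(p: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one sample.

    p      -- predicted probability that the region contains a chemical formula
    target -- 1 for a formula region, 0 for background
    """
    p_t = p if target == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)                  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background sample contributes far less than a hard one.
easy_background = binary_focal_loss(0.05, target=0)
hard_background = binary_focal_loss(0.60, target=0)
```

The factor `(1 - p_t) ** gamma` shrinks the loss of well-classified samples, so the many easy background regions do not dominate training.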
  • the Transformer decoder outputs a dictionary consisting of chemical elements.
  • the multi-target detection model can be trained not only on data sets that originally contain chemical formulas, but also on a large number of synthetic data sets.
  • The cut-paste method is used to combine the collected document data sets and chemical formula data sets. Applying random scaling and labeling the pasted location automatically saves the labor cost of manually labeling data sets and improves training efficiency.
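  • A cut-paste synthesis step with random scaling and automatic labeling might look like the sketch below. Everything here is an illustrative assumption: images are nested lists of pixel values, scaling is nearest-neighbour, the scale range is arbitrary, and the helper names `scale_nearest` and `cut_paste` are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), exclusive right/bottom


def scale_nearest(img: Image, factor: float) -> Image:
    """Resize `img` by `factor` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def cut_paste(document: Image, formula: Image, rng: random.Random,
              scale_range: Tuple[float, float] = (0.5, 1.5)) -> Tuple[Image, Box]:
    """Paste a randomly scaled formula image at a random position.

    Returns the synthetic image and the bounding box label, which is known
    exactly because the paste position was chosen by this function.
    """
    patch = scale_nearest(formula, rng.uniform(*scale_range))
    doc_h, doc_w = len(document), len(document[0])
    ph, pw = len(patch), len(patch[0])
    if ph > doc_h or pw > doc_w:
        raise ValueError("scaled formula does not fit in the document image")
    top = rng.randint(0, doc_h - ph)
    left = rng.randint(0, doc_w - pw)
    out = [row[:] for row in document]  # copy so the source document is untouched
    for y in range(ph):
        out[top + y][left:left + pw] = patch[y]
    return out, (left, top, left + pw, top + ph)


document = [[0] * 20 for _ in range(20)]  # blank "document"
formula = [[1] * 4 for _ in range(4)]     # solid 4 x 4 "formula"
synthetic, label = cut_paste(document, formula, random.Random(0))
```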
  • the multi-target detection model includes a three-layer architecture.
  • The image to be detected is first converted into a first feature map, feature extraction is then performed to obtain a second feature map, and the position information of the image area containing the chemical formula is then obtained through the detection layer, thereby obtaining the chemical formula region image.
  • Before step S203, the method may also include: preprocessing the chemical formula region image, where the preprocessing includes binarization processing, image thinning processing, and image scaling processing.
  • the chemical formula region image may also be preprocessed, where the preprocessing may include binarization processing, image thinning processing, and image scaling processing.
  • the image scaling process is to adjust the chemical formula region image to a preset size.
  • the chemical formula region image can be image optimized to facilitate the processing of the chemical formula recognition model and ensure the accuracy of chemical formula recognition.
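  • Two of the three preprocessing operations can be sketched in a few lines. This is a simplified illustration that assumes a grayscale image stored as a nested list, a fixed threshold of 128, dark ink on a light background, and nearest-neighbour scaling; image thinning (skeletonization) is omitted for brevity. None of these choices come from the source.

```python
from typing import List, Tuple

Image = List[List[int]]


def binarize(img: Image, threshold: int = 128) -> Image:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Scale `img` to the preset size using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(img: Image, size: Tuple[int, int] = (4, 4), threshold: int = 128) -> Image:
    # Thinning would run between these two steps in a fuller pipeline.
    return resize_nearest(binarize(img, threshold), *size)


gray = [[0, 255],
        [255, 0]]
prepared = preprocess(gray)
```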
  • Step S203 may include: inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map; and inputting the encoded feature map into the decoder in the chemical formula recognition model to obtain the candidate chemical formula in the chemical formula region image.
  • the chemical formula recognition model can include a multi-level encoder and a decoder.
  • the multi-level encoder is responsible for encoding the input chemical formula region image to obtain the encoded feature map; the decoder is responsible for decoding the encoded feature map and outputting the candidate chemical formula in the chemical formula region image.
  • the multi-level encoder can be built based on the Swin Transformer encoder.
  • Swin Transformer includes sliding window operations and has a hierarchical design. It is a Transformer specially used for image processing tasks. Swin Transformer adopts a hierarchical design, including 4 stages. Each stage will reduce the resolution of the input feature map and expand the receptive field layer by layer like a CNN network. Window attention divides the image into different windows according to a certain size. Each transformer's attention is only calculated inside the window. The shift window attention in Swin Transformer changes the way the window is divided, so that the window block for attention calculation of each pixel is changing. Its sliding window operation includes non-overlapping local window and overlapping cross-window. Limiting the attention calculation to a window can, on the one hand, introduce the locality of the CNN convolution operation, and on the other hand, save the amount of calculation.
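  • Window partitioning and the shifted-window idea can be illustrated on a tiny feature map. This sketch makes several assumptions: the feature map is a single-channel nested list, and the shift is realized as a cyclic roll (one common way shifted windows are implemented). The function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[int]]


def window_partition(fmap: FeatureMap, window: int) -> List[FeatureMap]:
    """Split the feature map into non-overlapping window x window blocks."""
    h, w = len(fmap), len(fmap[0])
    if h % window or w % window:
        raise ValueError("feature map size must be a multiple of the window size")
    return [[row[left:left + window] for row in fmap[top:top + window]]
            for top in range(0, h, window)
            for left in range(0, w, window)]


def cyclic_shift(fmap: FeatureMap, shift: int) -> FeatureMap:
    """Roll the map up and to the left by `shift` positions, wrapping around."""
    rows = fmap[shift:] + fmap[:shift]
    return [row[shift:] + row[:shift] for row in rows]


fmap = [[r * 4 + c for c in range(4)] for r in range(4)]      # values 0..15
regular = window_partition(fmap, window=2)                    # attention stays inside each block
shifted = window_partition(cyclic_shift(fmap, 1), window=2)   # blocks now straddle old borders
```

In the regular partition, positions 5, 6, 9 and 10 fall into four different windows; after the shift they share one window, which is how information crosses window boundaries.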
  • The decoder in the chemical formula recognition model can be built based on the Transformer decoder. It uses masked multi-head attention, so that the model sees only past data and not future data. Attention is computed from the values of the decoder's hidden layer (as Q) and the values of the encoder's hidden layer (as K), and the input of the encoder is then used as V, weighted into the input of the decoder.
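  • The effect of the mask in masked attention can be shown with a single-head sketch in plain Python. It assumes a square matrix of raw attention scores (query position by key position) and simply excludes future positions before the softmax; the function name is hypothetical and multi-head projection is left out.

```python
import math
from typing import List


def causal_attention_weights(scores: List[List[float]]) -> List[List[float]]:
    """Softmax over each row, restricted to positions at or before the query."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]                  # future positions are masked out
        peak = max(visible)                           # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


weights = causal_attention_weights([[0.0] * 3 for _ in range(3)])
```

With equal scores, the first token attends only to itself, the second splits its attention over two positions, and so on; the upper triangle is always zero.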
  • the chemical formula recognition model can be a neural network, including a multi-level encoder and a decoder, which first extracts features of the chemical formula region image and then decodes them to obtain candidate chemical formulas.
  • a multi-level encoder includes several sequentially connected coding layers.
  • The multi-level encoder may include four sequentially connected coding layers, that is, four stages (stage1, stage2, stage3, and stage4). The four stages gradually reduce the resolution of the input feature map to expand the receptive field.
  • The chemical formula region image is first partitioned into patches by the multi-level encoder (Patch Partition) to obtain multiple patch images, which are then input to the first coding layer for coding processing (feature extraction) to obtain the coding feature map output by the first coding layer.
  • When the coding feature map output by the first coding layer is input to the second coding layer, it first undergoes downsampling (Patch Merging) to reduce the resolution. Whenever a coding feature map is input into the second or a subsequent coding layer, downsampling is performed according to the downsampling standard set by that coding layer, so that the number of channels of the output coding feature map gradually increases.
  • After downsampling, the feature map is input to the coding network in the second coding layer for coding processing, and the coding feature map output by the second coding layer is obtained.
  • the coding feature map output by the second coding layer is input to the subsequent coding layer for iteration until the last coding layer; the coding feature map output by the last coding layer will be used as the coded feature map.
  • In this embodiment, the sequentially connected coding layers in the multi-level encoder encode the chemical formula region image layer by layer and reduce the resolution layer by layer, which fully extracts the features of the chemical formula region image and ensures the accuracy of chemical formula recognition.
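  • How resolution falls and channel count rises across the four stages can be tabulated with a short helper. The concrete numbers are assumptions borrowed from the published Swin Transformer design (4 x 4 patches, an initial embedding width of 96, and each patch-merging step halving height and width while doubling channels); the source only states that resolution decreases and the number of channels increases.

```python
from typing import List, Tuple


def stage_shapes(height: int, width: int, patch: int = 4,
                 embed_dim: int = 96, num_stages: int = 4) -> List[Tuple[int, int, int]]:
    """Return (height, width, channels) of the feature map after each stage."""
    h, w, c = height // patch, width // patch, embed_dim  # patch partition + linear embedding
    shapes = [(h, w, c)]
    for _ in range(num_stages - 1):
        h, w, c = h // 2, w // 2, c * 2                   # patch merging before the next stage
        shapes.append((h, w, c))
    return shapes


shapes = stage_shapes(224, 224)
for stage, shape in enumerate(shapes, start=1):
    print(f"stage{stage}: {shape}")
```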
  • The coding layer includes a coding network. The above step of inputting the segmented image into the first coding layer to obtain the coding feature map may include: performing linear embedding processing on the segmented image; and inputting the linearly embedded segmented image into the coding network in the first coding layer, which performs feature extraction on it through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multi-layer perception mechanism and layer normalization to obtain the coding feature map.
  • the first coding layer first performs linear embedding on the segmented image, and then outputs the linearly embedded segmented image to the coding network in the first coding layer.
  • The coding network is a Swin Transformer Block.
  • Each coding layer in the multi-level encoder has a Swin Transformer Block.
  • Each Swin Transformer Block has the same internal processing logic; only the input and output dimensions differ.
  • The coding network in the first coding layer performs feature extraction on the linearly embedded segmented images through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multi-layer perception mechanism and layer normalization to obtain the coding feature map; the coding networks in the second and subsequent coding layers perform feature extraction in the same way on the input coding feature map and output a new coding feature map.
  • Residual connections are used inside each coding network (Swin Transformer Block).
  • The network input $z^{l-1}$ first undergoes layer normalization (LN); the output of LN is then processed by the window-based multi-head self-attention mechanism (W-MSA), and the W-MSA output is added to $z^{l-1}$ to obtain $\hat{z}^{l}$: $\hat{z}^{l} = \text{W-MSA}(\text{LN}(z^{l-1})) + z^{l-1}$.
  • $\hat{z}^{l}$ is input to the second LN; the output of the second LN is processed by the multi-layer perception mechanism (MLP), and the MLP output is added to $\hat{z}^{l}$ to obtain $z^{l}$: $z^{l} = \text{MLP}(\text{LN}(\hat{z}^{l})) + \hat{z}^{l}$.
  • $z^{l}$ is input to the third LN; the output of the third LN is processed by the shifted-window-based multi-head self-attention mechanism (SW-MSA), and the SW-MSA output is added to $z^{l}$ to obtain $\hat{z}^{l+1}$: $\hat{z}^{l+1} = \text{SW-MSA}(\text{LN}(z^{l})) + z^{l}$.
  • Through the window-based multi-head self-attention mechanism, the sliding-window-based multi-head self-attention mechanism, the multi-layer perception mechanism and layer normalization, features are extracted from the input coding feature map and a coding feature map is output. This introduces the locality of the CNN convolution operation while also controlling the overall amount of computation, so that features are extracted from the image more accurately and the accuracy of image processing tasks improves.
  • In this embodiment, feature extraction is carried out through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multi-layer perception mechanism and layer normalization, and the coding feature map is output, so that features can be extracted from the image more accurately, improving the accuracy of chemical formula recognition.
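  • The residual data flow inside the coding network can be sketched with the sub-layers passed in as plain functions. This is only an outline of the wiring: `ln`, `w_msa`, `sw_msa` and `mlp` are hypothetical placeholders operating on flat lists of floats, real blocks use separate weights for each normalization and MLP, and the final MLP step follows the published Swin Transformer design rather than anything spelled out in the text above.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise residual addition."""
    return [x + y for x, y in zip(a, b)]


def swin_block_pair(z_prev: Vector, ln: Layer, w_msa: Layer,
                    sw_msa: Layer, mlp: Layer) -> Vector:
    z_hat = add(w_msa(ln(z_prev)), z_prev)         # LN -> W-MSA -> residual
    z = add(mlp(ln(z_hat)), z_hat)                 # LN -> MLP -> residual
    z_hat_next = add(sw_msa(ln(z)), z)             # LN -> SW-MSA -> residual
    return add(mlp(ln(z_hat_next)), z_hat_next)    # LN -> MLP -> residual


def identity(v: Vector) -> Vector:
    return list(v)


def zeros(v: Vector) -> Vector:
    return [0.0] * len(v)


# If every sub-layer outputs zero, the residual path returns the input unchanged.
unchanged = swin_block_pair([1.0, 2.0], identity, zeros, zeros, zeros)
# If every sub-layer is the identity, each residual step doubles the signal.
doubled = swin_block_pair([1.0, 2.0], identity, identity, identity, identity)
```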
  • After step S204, the method may also include: when it is determined according to the verification result that the candidate chemical formula does not exist, determining a similar chemical formula of the candidate chemical formula based on edit distance; and determining the similar chemical formula as the recognized chemical formula.
  • the edit distance between the candidate chemical formula and each chemical formula in the chemical formula database can be calculated.
  • The edit distance, also called the Levenshtein distance, is a quantitative measure of the degree of difference between two strings: it is the minimum number of edit operations required to turn one string into the other.
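  • The fallback based on edit distance can be sketched as follows. The Levenshtein computation is the standard dynamic-programming algorithm; the helper `most_similar`, the in-memory list standing in for the chemical formula database, and the alphabetical tie-break are assumptions made for illustration.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))           # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def most_similar(candidate: str, database: Iterable[str]) -> str:
    """Return the stored formula closest to the candidate (ties broken alphabetically)."""
    return min(database, key=lambda formula: (levenshtein(candidate, formula), formula))


database = ["CCO", "c1ccccc1", "O"]
corrected = most_similar("CCN", database)
```

Scanning the whole database is linear in its size; a production system would likely need an index, which is outside the scope of this sketch.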
  • This application involves neural networks, machine learning and computer vision in the field of artificial intelligence.
  • The computer-readable instructions can be stored in a computer-readable storage medium; when executed, the computer-readable instructions may include the processes of the above-mentioned method embodiments.
  • the aforementioned storage media can be non-volatile storage media such as magnetic disks, optical disks, read-only memory (Read-Only Memory, ROM), or random access memory (Random Access Memory, RAM), etc.
  • the present application provides an embodiment of a chemical formula identification device.
  • The device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be used in various electronic equipment.
  • the chemical formula identification device 300 in this embodiment includes: an image acquisition module 301, an area detection module 302, a chemical formula identification module 303, a chemical formula verification module 304 and a chemical formula determination module 305, wherein:
  • the image acquisition module 301 is used to acquire an image to be detected containing a chemical formula.
  • the region detection module 302 is used to input the image to be detected into a multi-target detection model to obtain a chemical formula region image.
  • the chemical formula recognition module 303 is used to input the chemical formula region image into the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image.
  • the chemical formula verification module 304 is used to verify the existence of candidate chemical formulas based on a pre-established chemical formula database and obtain verification results.
  • the chemical formula determination module 305 is configured to determine the candidate chemical formula as the recognized chemical formula when it is determined that the candidate chemical formula exists according to the verification result.
  • In this embodiment, after the image to be detected is obtained, it is first input into the multi-target detection model to determine the image area where the chemical formula is located, yielding the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to recognize the chemical formula and obtain the candidate chemical formula. Known chemical formulas are stored in the chemical formula database.
  • When the candidate chemical formula is determined to actually exist according to the chemical formula database, it is determined as the recognized chemical formula, ensuring the accuracy of chemical formula recognition. The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
  • the region detection module 302 may include: a feature map generation sub-module, a feature extraction sub-module and a region detection sub-module, where:
  • the feature map generation submodule is used to input the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected.
  • the feature extraction submodule is used to input the first feature map into the feature extraction network in the multi-target detection model to obtain the second feature map.
  • the region detection submodule is used to input the second feature map into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
  • the multi-target detection model includes a three-layer architecture.
  • the image to be detected can first be converted into a first feature map, then feature extraction is performed to obtain a second feature map, and then the position information of the image area containing the chemical formula is obtained through the detection layer, thereby obtaining the chemical formula region image.
  • the chemical formula identification device 300 may include: a preprocessing module, which is used to preprocess the chemical formula region image.
  • the preprocessing includes binarization processing, image thinning processing, and image scaling processing.
  • preprocessing optimizes the chemical formula region image, which facilitates processing by the chemical formula recognition model and ensures the accuracy of chemical formula recognition.
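Two of the three preprocessing operations can be illustrated in a few lines. The sketch below is an assumption-laden example, not the method of the application: the threshold of 128, the nearest-neighbour scaling rule and the function names are all chosen here for illustration, and image thinning (skeletonization) is left out to keep the example short.

```python
from typing import List

Image = List[List[int]]  # grayscale, 0 (black) to 255 (white)


def binarize(img: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) if it is darker than the threshold, else 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def scale_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize to out_h x out_w with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


region = [[250, 10], [20, 240]]
prepared = scale_nearest(binarize(region), 4, 4)
```

Scaling every region to a fixed size gives the recognition model inputs of a uniform shape.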
  • the chemical formula identification module 303 may include: an encoding sub-module and a decoding sub-module, where:
  • the encoding submodule is used to input the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map.
  • the decoding submodule is used to input the encoded feature map into the decoder in the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image.
  • the chemical formula recognition model can be a neural network, including a multi-level encoder and a decoder, which first extracts features of the chemical formula region image and then decodes them to obtain candidate chemical formulas.
  • the multi-level encoder includes several sequentially connected coding layers;
  • the encoding sub-module may include: an image slicing unit, a slice input unit, an iteration unit and a feature map determination unit, where:
  • the image slicing unit is used to slice the chemical formula region image to obtain sliced images.
  • the slice input unit is used to input the sliced images into the first coding layer to obtain a coding feature map.
  • the iteration unit is used, for each coding layer after the first, to downsample the coding feature map and input the downsampled coding feature map into the next coding layer, iterating until the last coding layer.
  • the feature map determination unit is used to determine the coding feature map output by the last coding layer as the encoded feature map.
  • the multiple sequentially connected coding layers in the multi-level encoder encode the chemical formula region image layer by layer and reduce the resolution layer by layer, which fully extracts the features of the chemical formula region image and ensures the accuracy of chemical formula recognition.
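Image slicing and the layer-by-layer reduction in resolution can be made concrete with a small sketch. The patch size of 4, the 224 x 224 input, the four coding layers and the factor-of-two downsampling are assumptions picked for illustration; the application does not state these values, and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def slice_image(img: Image, patch: int) -> List[List[int]]:
    """Split an image into non-overlapping patch x patch slices, each flattened to a vector."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        [img[top + r][left + c] for r in range(patch) for c in range(patch)]
        for top in range(0, h, patch)
        for left in range(0, w, patch)
    ]


def resolution_schedule(h: int, w: int, patch: int, num_layers: int) -> List[Tuple[int, int]]:
    """Feature-map size seen by each coding layer, assuming 2x downsampling between layers."""
    size = (h // patch, w // patch)
    schedule = [size]
    for _ in range(num_layers - 1):
        size = (size[0] // 2, size[1] // 2)
        schedule.append(size)
    return schedule


img = [[r * 8 + c for c in range(8)] for r in range(8)]
slices = slice_image(img, 4)                    # four slices of 16 values each
schedule = resolution_schedule(224, 224, 4, 4)  # one (height, width) pair per coding layer
```

Under these assumptions the feature map shrinks from 56 x 56 at the first coding layer to 7 x 7 at the last.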
  • the coding layer includes a coding network;
  • the slice input unit may include: an embedding processing subunit and a feature extraction subunit, where:
  • the embedding processing subunit is used to perform linear embedding processing on the sliced images.
  • the feature extraction subunit is used to input the linearly embedded sliced images into the coding network in the first coding layer, so that features are extracted from them through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multi-layer perceptron and layer normalization, yielding the coding feature map.
  • feature extraction through the window-based multi-head self-attention mechanism, the sliding-window-based multi-head self-attention mechanism, the multi-layer perceptron and layer normalization allows image features to be extracted more accurately, which improves the accuracy of chemical formula recognition.
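The difference between the window-based and the sliding-window-based mechanisms lies in how the feature map is grouped before attention is computed. The sketch below shows only that grouping step, using window partitioning and a cyclic shift in the style of shifted-window transformers; this correspondence is an assumption, the names are hypothetical, and the attention computation and the masking of wrapped-around cells are omitted.

```python
from typing import List

Grid = List[List[int]]


def window_partition(grid: Grid, win: int) -> List[List[int]]:
    """Group the cells of a feature map into non-overlapping win x win windows."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[top + r][left + c] for r in range(win) for c in range(win)]
        for top in range(0, h, win)
        for left in range(0, w, win)
    ]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the map up and to the left by `shift` cells, wrapping around the edges."""
    rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rows]


# 4x4 map whose cells are numbered 0..15, grouped into 2x2 windows.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(fmap, 2)                   # windows used by window-based attention
shifted = window_partition(cyclic_shift(fmap, 1), 2)  # windows used after sliding the grid
```

After the shift, cells that sat in different regular windows (for example 5, 6, 9 and 10) share a window, which lets information cross the original window borders.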
  • the chemical formula identification device 300 may include: a similarity determination module and a similarity determination module, wherein:
  • the similarity determination module is used to determine similar chemical formulas of the candidate chemical formula based on edit distance when it is determined according to the verification results that the candidate chemical formula does not exist.
  • the similarity determination module is used to determine the similar chemical formula as the recognized chemical formula.
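The edit-distance fallback can be sketched with the classic Levenshtein dynamic program. This is a minimal illustration under stated assumptions: the `resolve` helper, the `max_distance` cutoff of 2 and the three-entry database of SMILES strings are hypothetical, since the application only says that a similar chemical formula is determined based on edit distance.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def resolve(candidate: str, known: Iterable[str], max_distance: int = 2) -> Optional[str]:
    """Return the candidate if it exists, else the closest known formula within max_distance."""
    known = list(known)
    if candidate in known:
        return candidate
    best = min(known, key=lambda k: edit_distance(candidate, k), default=None)
    if best is not None and edit_distance(candidate, best) <= max_distance:
        return best
    return None


# Ethanol, acetic acid and benzene as SMILES strings.
database = ["CCO", "CC(=O)O", "c1ccccc1"]
```

A candidate such as `"CC(=O)0"`, where the final letter O was misread as a zero, is one substitution away from `"CC(=O)O"` and would be resolved to it.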
  • Each module in the above-mentioned chemical formula recognition device can be implemented in whole or in part by software, hardware, and combinations thereof.
  • the region detection module corresponds to the multi-target detection model, and the chemical formula recognition module corresponds to the chemical formula recognition model.
  • Each of the above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • Figure 4 is a basic structural block diagram of the computer equipment in this embodiment.
  • the computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to each other through a system bus. It should be noted that the figure shows only the computer device 4 with components 41-43, but it should be understood that not all of the components shown need to be implemented, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculations and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.
  • the computer device may be a desktop computer, a notebook, a PDA, a cloud server and other computing devices.
  • the computer device can perform human-computer interaction with the user through keyboard, mouse, remote control, touch panel or voice control device.
  • the memory 41 includes at least one type of computer-readable storage medium.
  • the computer-readable storage medium can be non-volatile or volatile.
  • the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 41 may be an internal storage unit of the computer device 4 , such as a hard disk or memory of the computer device 4 .
  • the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 4.
  • the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device.
  • the memory 41 is usually used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the chemical formula identification method.
  • the memory 41 can also be used to temporarily store various types of data that have been output or will be output.
  • in some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • the processor 42 is generally used to control the overall operation of the computer device 4 .
  • the processor 42 is configured to run computer-readable instructions stored in the memory 41 or process data, such as running computer-readable instructions for the chemical formula identification method.
  • the network interface 43 may include a wireless network interface or a wired network interface.
  • the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.
  • the computer device provided in this embodiment can execute the above chemical formula identification method.
  • the chemical formula identification method here may be the chemical formula identification method of each of the above embodiments.
  • After the image to be detected is acquired, it is first input into the multi-target detection model to determine the image area where the chemical formula is located, yielding the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to identify the chemical formula, yielding candidate chemical formulas. Known chemical formulas are stored in the chemical formula database.
  • When the candidate chemical formula is determined to actually exist according to the chemical formula database, the candidate chemical formula is determined as the recognized chemical formula, which ensures the accuracy of chemical formula recognition.
  • The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
  • the present application also provides another implementation, namely a computer-readable storage medium that stores computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the chemical formula identification method described above.
  • After the image to be detected is acquired, it is first input into the multi-target detection model to determine the image area where the chemical formula is located, yielding the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to identify the chemical formula, yielding candidate chemical formulas. Known chemical formulas are stored in the chemical formula database.
  • When the candidate chemical formula is determined to actually exist according to the chemical formula database, the candidate chemical formula is determined as the recognized chemical formula, which ensures the accuracy of chemical formula recognition.
  • The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform. They can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and includes several instructions to cause a terminal device (which can be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the various embodiments of the present application.

Abstract

The present application relates to the field of artificial intelligence, and in particular to a chemical formula identification method and apparatus, a computer device, and a storage medium. The method comprises: obtaining an image to be detected comprising a chemical formula; inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image; inputting the chemical formula region image into a chemical formula identification model to obtain a candidate chemical formula in the chemical formula region image; performing existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a check result; and when it is determined, according to the check result, that the candidate chemical formula exists, determining the candidate chemical formula as an identified chemical formula. In addition, the present application also relates to blockchain technology, and the image to be detected can be stored in a blockchain. The present application improves the efficiency of chemical formula identification.

Description

Chemical formula identification method, device, computer equipment and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on March 15, 2022, with application number 202210255360.0 and the invention title "Chemical formula identification method, device, computer equipment and storage medium", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to a chemical formula identification method, device, computer equipment and storage medium.
Background
With the development of computer technology, target detection (Object Detection) by computer has become increasingly widespread. Target detection, also called target extraction, uses a computer to find targets or objects of interest in images and to determine their categories and locations; it is an important topic in the field of computer vision. Target detection is often associated with image description (Image Caption), in which the computer generates descriptive text corresponding to the input image.
The recognition of chemical formulas is a branch of the target detection and image description tasks. The inventor realized that traditional chemical formula recognition technology is based on computer vision techniques and requires a series of hand-designed rules, including image vectorization, image decomposition, image thinning, line enhancement, optical character recognition, and reconstruction of the graphical representation of the molecule. The process is complex, which makes chemical formula recognition inefficient.
Summary of the invention
The purpose of the embodiments of the present application is to propose a chemical formula identification method, device, computer equipment and storage medium to solve the problem of low chemical formula recognition efficiency.
In order to solve the above technical problem, embodiments of the present application provide a chemical formula identification method, which adopts the following technical solution:
Obtain an image to be detected containing a chemical formula;
Input the image to be detected into a multi-target detection model to obtain a chemical formula region image;
Input the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
Perform existence verification on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
When it is determined according to the verification result that the candidate chemical formula exists, determine the candidate chemical formula as the recognized chemical formula.
In order to solve the above technical problem, embodiments of the present application also provide a chemical formula identification device, which adopts the following technical solution:
An image acquisition module, used to acquire an image to be detected containing a chemical formula;
A region detection module, used to input the image to be detected into a multi-target detection model to obtain a chemical formula region image;
A chemical formula recognition module, used to input the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
A chemical formula verification module, used to perform existence verification on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
A chemical formula determination module, used to determine the candidate chemical formula as the recognized chemical formula when it is determined according to the verification result that the candidate chemical formula exists.
In order to solve the above technical problem, embodiments of the present application also provide a computer device, including a memory and a processor. Computer-readable instructions are stored in the memory, and the processor implements the following steps when executing the computer-readable instructions:
Obtain an image to be detected containing a chemical formula;
Input the image to be detected into a multi-target detection model to obtain a chemical formula region image;
Input the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
Perform existence verification on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
When it is determined according to the verification result that the candidate chemical formula exists, determine the candidate chemical formula as the recognized chemical formula.
In order to solve the above technical problem, embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions, and the following steps are implemented when the computer-readable instructions are executed by a processor:
Obtain an image to be detected containing a chemical formula;
Input the image to be detected into a multi-target detection model to obtain a chemical formula region image;
Input the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
Perform existence verification on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
When it is determined according to the verification result that the candidate chemical formula exists, determine the candidate chemical formula as the recognized chemical formula.
Compared with the existing technology, the embodiments of the present application mainly have the following beneficial effects: after the image to be detected is obtained, it is first input into the multi-target detection model to determine the image area where the chemical formula is located, yielding the chemical formula region image; the chemical formula region image is then input into the chemical formula recognition model to identify the chemical formula, yielding the candidate chemical formula; known chemical formulas are stored in the chemical formula database, and when the candidate chemical formula is determined to actually exist according to the chemical formula database, the candidate chemical formula is determined as the recognized chemical formula, which ensures the accuracy of chemical formula recognition; the multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces manual participation in image processing rule design, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
Description of the drawings
In order to explain the solutions in this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
Figure 2 is a flow chart of an embodiment of a chemical formula identification method according to the present application;
Figure 3 is a schematic structural diagram of an embodiment of a chemical formula identification device according to the present application;
Figure 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed description
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which this application belongs; the terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the application; the terms "including" and "having" and any variations thereof in the specification and claims of the application and in the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", etc. in the specification and claims of this application or in the above drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
In order to enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
As shown in Figure 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, etc. Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 105 may be a server that provides various services, such as a backend server that supports the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the chemical formula identification method provided in the embodiments of the present application is generally executed by the server, and accordingly, the chemical formula identification device is generally installed in the server.
It should be understood that the numbers of terminal devices, networks and servers in Figure 1 are only illustrative. Depending on implementation needs, there can be any number of terminal devices, networks and servers.
Continuing to refer to Figure 2, a flow chart of one embodiment of the chemical formula identification method according to the present application is shown. The chemical formula identification method includes the following steps:
Step S201: obtain an image to be detected containing a chemical formula.
In this embodiment, the electronic device on which the chemical formula identification method runs (such as the server shown in Figure 1) can communicate with the terminal through a wired connection or a wireless connection. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G/5G, WiFi, Bluetooth, WiMAX, Zigbee and UWB (ultra wideband) connections, as well as other wireless connection methods now known or developed in the future.
Specifically, the server obtains the image to be detected; the image contains a chemical formula, and the server needs to extract the chemical formula from it. The image to be detected can be obtained in various ways: for example, it can be obtained by scanning, or it can be photographed by the terminal and then sent to the server.
It should be emphasized that, to further ensure the privacy and security of the image to be detected, the image can also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application model of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another using cryptographic methods. Each data block contains the information of a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
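The idea of data blocks linked by cryptographic methods can be illustrated with a toy hash chain. This is a simplified sketch under stated assumptions, not a description of any blockchain platform: there is no network or consensus mechanism, each block stores only a SHA-256 digest of the image bytes, and `make_block` and `chain_is_valid` are hypothetical helper names.

```python
import hashlib
import json
from typing import Dict, List


def make_block(payload: bytes, prev_hash: str) -> Dict[str, str]:
    """Create a block that records a digest of the payload and links to the previous block."""
    body = {"payload_sha256": hashlib.sha256(payload).hexdigest(), "prev_hash": prev_hash}
    body_bytes = json.dumps(body, sort_keys=True).encode("utf-8")
    return {**body, "hash": hashlib.sha256(body_bytes).hexdigest()}


def chain_is_valid(chain: List[Dict[str, str]]) -> bool:
    """Recompute every block hash and check each link to the preceding block."""
    for i, block in enumerate(chain):
        body = {"payload_sha256": block["payload_sha256"], "prev_hash": block["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


genesis = make_block(b"image-bytes-1", prev_hash="0" * 64)
second = make_block(b"image-bytes-2", prev_hash=genesis["hash"])
chain = [genesis, second]
```

Changing the recorded digest of any block makes its stored hash, and therefore the link held by the next block, fail verification.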
Step S202: input the image to be detected into the multi-target detection model to obtain a chemical formula region image.
Here, the multi-target detection model may be a model that detects the image areas containing chemical formulas in the image to be detected.
Specifically, the image to be detected may include chemical formulas, other text, and further information. Therefore, the position of the chemical formula in the image to be detected can be determined first, that is, the chemical formula region image in which the chemical formula appears is determined.
The image to be detected can be input into a pre-trained multi-target detection model. The multi-target detection model can be composed of a neural network capable of target detection, which identifies the image areas containing chemical formulas in the image to be detected and outputs the position information of those areas.
The multi-target detection model undergoes multi-label supervised training in advance and can identify multiple chemical formula region images in the image to be detected. In one embodiment, the chemical formula region images can be identified through DETR (DEtection TRansformer, an end-to-end target detection network).
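Once a detector has returned position information, the chemical formula region images are obtained by cropping. The sketch below assumes each box is given as normalized (cx, cy, w, h) coordinates with a confidence score, a convention commonly used by DETR-style detectors; the `crop_regions` name and the 0.5 score cutoff are hypothetical, and no detection model is run here.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[float, float, float, float]  # (cx, cy, w, h), each in [0, 1]


def crop_regions(img: Image, boxes: List[Box], scores: List[float], min_score: float = 0.5) -> List[Image]:
    """Crop one region image per sufficiently confident box."""
    h, w = len(img), len(img[0])
    regions = []
    for (cx, cy, bw, bh), score in zip(boxes, scores):
        if score < min_score:
            continue  # drop low-confidence detections
        left = max(0, round((cx - bw / 2) * w))
        right = min(w, round((cx + bw / 2) * w))
        top = max(0, round((cy - bh / 2) * h))
        bottom = min(h, round((cy + bh / 2) * h))
        if right > left and bottom > top:
            regions.append([row[left:right] for row in img[top:bottom]])
    return regions


img = [[r * 4 + c for c in range(4)] for r in range(4)]
regions = crop_regions(img, [(0.25, 0.25, 0.5, 0.5), (0.75, 0.75, 0.5, 0.5)], [0.9, 0.2])
```

Here the second box is discarded because its score is below the cutoff, so a single top-left crop is returned.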
步骤S203,将化学式区域图像输入化学式识别模型,得到化学式区域图像中的候选化学式。Step S203: Input the chemical formula region image into the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image.
其中,化学式识别模型可以是用于进行化学式识别的模型。The chemical formula recognition model may be a model used for chemical formula recognition.
Specifically, the chemical formula region image is input into the chemical formula recognition model. The model can be built on a neural network and trained in advance, end to end, as an image captioning (Image Caption) task. It first encodes the chemical formula region image and then decodes it, outputting the chemical formula contained in the region image, that is, the candidate chemical formula.
In one embodiment, the candidate chemical formula may be expressed in SMILES or InChI notation, which are two existing representations of chemical formulas.
基于神经网络构建化学式识别模型,不需要传统的图像矢量化、图像分解、分子重建等图像处理技术,在流程上有所简化。对该领域的软件设计和开发人员来说,不需要进行大量的手工特征设计,流程更加简单;对于普通用户来说,化学式识别速度变得更快。Building a chemical formula recognition model based on neural networks does not require traditional image processing technologies such as image vectorization, image decomposition, and molecular reconstruction, and the process is simplified. For software designers and developers in this field, there is no need to carry out a large amount of manual feature design, and the process is simpler; for ordinary users, chemical formula recognition becomes faster.
步骤S204,根据预先建立的化学式数据库对候选化学式进行存在性校验,得到校验结果。Step S204: Existence verification is performed on the candidate chemical formulas based on the pre-established chemical formula database to obtain verification results.
其中,化学式数据库可以是预先建立的数据库,可以存储已知的全部化学式。The chemical formula database may be a pre-established database and may store all known chemical formulas.
Specifically, after the candidate chemical formula is obtained, an existence check is performed on it against the pre-established chemical formula database. The existence check searches the database for a chemical formula identical to the candidate. The search result is either that an identical chemical formula exists in the database or that none exists, and this search result is the verification result.
步骤S205,当根据校验结果确定候选化学式存在时,将候选化学式确定为已识别化学式。Step S205: When it is determined that the candidate chemical formula exists according to the verification result, the candidate chemical formula is determined as the recognized chemical formula.
Specifically, when the verification result shows that the chemical formula database contains a chemical formula identical to the candidate chemical formula, the recognized candidate actually exists. The candidate chemical formula is then determined to be the recognized chemical formula and output as the result of chemical formula recognition.
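Steps S204 and S205 can be sketched as a simple membership test. The following is a minimal illustration, assuming the database is held in memory as a set of strings; the names `check_existence`, `confirm_formula`, and the sample SMILES entries are hypothetical and do not come from this application.

```python
def check_existence(candidate, formula_db):
    """Return True if the candidate string is present in the formula database."""
    return candidate.strip() in formula_db


def confirm_formula(candidate, formula_db):
    """Return the candidate as the recognized formula if it exists, else None."""
    if check_existence(candidate, formula_db):
        return candidate.strip()
    return None


# Hypothetical database content, for illustration only.
formula_db = {"CCO", "C1=CC=CC=C1", "CC(=O)O"}
recognized = confirm_formula("CCO", formula_db)
```

An exact string comparison only works if candidates and database entries use the same notation and the same canonical form; a practical system would probably normalize both sides before comparing.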
In this embodiment, after the image to be detected is acquired, it is first input into the multi-target detection model to determine the image region where the chemical formula is located, yielding a chemical formula region image. The region image is then input into the chemical formula recognition model to recognize the chemical formula, yielding a candidate chemical formula. Known chemical formulas are stored in the chemical formula database; when the database confirms that the candidate chemical formula actually exists, the candidate is determined to be the recognized chemical formula, which ensures the accuracy of recognition. The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces the manual design of image processing rules, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
进一步的,上述步骤S202可以包括:将待检测图像输入多目标检测模型中的特征生成网络,得到待检测图像的第一特征图;将第一特征图输入多目标检测模型中的特征提取网络,得到第二特征图;将第二特征图输入多目标检测模型中的检测层,得到待检测图像中的化学式区域图像。Further, the above step S202 may include: inputting the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected; inputting the first feature map into the feature extraction network in the multi-target detection model, Obtain the second feature map; input the second feature map into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
Specifically, the multi-target detection model has a three-layer architecture comprising a feature generation network, a feature extraction network, and a detection layer. The image to be detected is first input into the feature generation network, which converts it into a feature map, thereby generating the first feature map.
在一个实施例中,特征生成网络可以是CNN(Convolutional Neural Networks,卷积神经网络)骨干网。In one embodiment, the feature generation network may be a CNN (Convolutional Neural Networks, convolutional neural network) backbone network.
The first feature map is input into the feature extraction network for feature extraction, yielding the second feature map. In one embodiment, the feature extraction network may be a Transformer network. Transformer is a model proposed by Google that makes extensive use of the self-attention (Self-Attention) mechanism. The Transformer network contains an encoder and a decoder. The multi-target detection model can convert the first feature map into a one-dimensional feature map and input it into the Transformer encoder. The output of the Transformer encoder is N fixed-length embedding vectors, where N is the number of objects the network assumes to be in the image; in this application, N is the number of chemical formula region images assumed by the network. The Transformer decoder processes the embedding vectors based on the self-attention mechanism to obtain the second feature map.
第二特征图输入检测层,检测层包括前馈神经网络,可以输出待检测图像中各图像区域的类别以及位置信息,其中类别用以表示图像区域是否包含化学式,位置信息用于表示该图像区域在待检测图像中的位置。根据检测层输出的类别以及位置信息,可以得到待检测图像中的化学式区域图像。The second feature map is input to the detection layer. The detection layer includes a feedforward neural network and can output the category and location information of each image area in the image to be detected. The category is used to indicate whether the image area contains a chemical formula, and the location information is used to indicate the image area. position in the image to be detected. According to the category and position information output by the detection layer, the chemical formula region image in the image to be detected can be obtained.
在一个实施例中,在得到检测层输出的类别以及位置信息后,根据类别和位置信息, 对待检测图像进行剪裁处理,将包含化学式的图像区域剪裁出来,得到化学式区域图像。In one embodiment, after obtaining the category and location information output by the detection layer, the image to be detected is cropped according to the category and location information, and the image area containing the chemical formula is cropped out to obtain a chemical formula area image.
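The cropping step can be sketched as follows. This is a minimal illustration, assuming the image is a list of pixel rows and each detection is a `(label, (x0, y0, x1, y1))` pair in pixel coordinates with label 1 meaning "contains a chemical formula"; these conventions and the name `crop_formula_regions` are assumptions made for the example, not the output format of the detection layer.

```python
def crop_formula_regions(image, detections, formula_label=1):
    """Crop every region whose predicted class is the formula class.

    image: list of rows, each row a list of pixel values.
    detections: list of (label, (x0, y0, x1, y1)), with x1 and y1 exclusive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    crops = []
    for label, (x0, y0, x1, y1) in detections:
        if label != formula_label:
            continue  # background regions are skipped
        # Clamp the box to the image bounds before slicing.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        if x1 > x0 and y1 > y0:
            crops.append([row[x0:x1] for row in image[y0:y1]])
    return crops


# A 4x4 toy image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
detections = [(1, (1, 1, 3, 3)), (0, (0, 0, 2, 2))]
crops = crop_formula_regions(image, detections)
```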
The multi-target detection model needs to be trained in advance, and the training can be end to end. Because the model only needs to distinguish two classes, chemical formula region images and background images, the number of classes is set to 2. Because background occupies most of the image area, focal loss can be used to address the imbalance between positive and negative samples. During training, the dictionary output by the Transformer decoder consists of chemical elements.
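To illustrate how focal loss down-weights easy examples, a per-sample binary form can be written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The sketch below assumes the commonly used defaults alpha = 0.25 and gamma = 2.0; this application does not specify these values, and the function name is hypothetical.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one sample.

    p: predicted probability of the positive (formula) class.
    y: ground-truth label, 1 for formula region and 0 for background.
    alpha, gamma: assumed hyperparameters, not taken from the application.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # well-classified positive sample
hard = focal_loss(0.1, 1)   # badly classified positive sample
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross entropy, which is why larger gamma values shift the training signal toward hard samples.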
The multi-target detection model can be trained not only on datasets that already contain chemical formulas but also on large synthetic datasets. For example, a cut-paste method can combine a collected document dataset with a chemical formula dataset under random scaling and then automatically label the position information, which saves the labor cost of manually annotating datasets and improves training efficiency.
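The cut-paste idea can be sketched as pasting a formula image into a document image at a random position and recording the resulting box as the label. The sketch below is a simplified assumption: images are lists of pixel rows, random scaling is omitted for brevity, and the name `paste_formula` is hypothetical.

```python
import random


def paste_formula(document, formula, rng):
    """Paste `formula` into a copy of `document` and return (image, bbox).

    bbox is (x0, y0, x1, y1) with x1 and y1 exclusive, which serves as the
    automatically generated position label. Random scaling is not shown.
    """
    doc_h, doc_w = len(document), len(document[0])
    f_h, f_w = len(formula), len(formula[0])
    if f_h > doc_h or f_w > doc_w:
        raise ValueError("formula image must fit inside the document image")
    top = rng.randint(0, doc_h - f_h)      # randint is inclusive on both ends
    left = rng.randint(0, doc_w - f_w)
    image = [row[:] for row in document]   # copy so the source image is untouched
    for r in range(f_h):
        image[top + r][left:left + f_w] = formula[r]
    return image, (left, top, left + f_w, top + f_h)


document = [[0] * 8 for _ in range(6)]
formula = [[1, 1, 1], [1, 1, 1]]
sample, bbox = paste_formula(document, formula, random.Random(0))
```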
本实施例中,多目标检测模型包含三层架构,可以先将待检测图像转换为第一特征图,然后进行特征提取得到第二特征图,再通过检测层获取包含化学式的图像区域的位置信息,从而顺利得到化学式区域图像。In this embodiment, the multi-target detection model includes a three-layer architecture. The image to be detected can be first converted into a first feature map, then feature extraction is performed to obtain a second feature map, and then the position information of the image area containing the chemical formula is obtained through the detection layer. , thereby successfully obtaining the chemical formula region image.
进一步的,上述步骤S203之前,还可以包括:对化学式区域图像进行预处理,预处理包括二值化处理、图像细化处理和图像缩放处理。Further, before the above step S203, it may also include: preprocessing the chemical formula region image, and the preprocessing includes binarization processing, image thinning processing, and image scaling processing.
具体地,在将化学式区域图像输入化学式识别模型之前,还可以先对化学式区域图像进行预处理,其中预处理可以包括二值化处理、图像细化处理以及图像缩放处理。其中,图像缩放处理是将化学式区域图像调整至预设尺寸。Specifically, before inputting the chemical formula region image into the chemical formula recognition model, the chemical formula region image may also be preprocessed, where the preprocessing may include binarization processing, image thinning processing, and image scaling processing. Among them, the image scaling process is to adjust the chemical formula region image to a preset size.
本实施例中,通过对化学式区域图像进行预处理,可以对化学式区域图像进行图像优化,方便化学式识别模型的处理,保证化学式识别的准确性。In this embodiment, by preprocessing the chemical formula region image, the chemical formula region image can be image optimized to facilitate the processing of the chemical formula recognition model and ensure the accuracy of chemical formula recognition.
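Two of the three preprocessing operations can be sketched in a few lines. The example assumes a grayscale image stored as a list of rows with values from 0 to 255, a fixed threshold of 128, dark strokes mapped to 1, and nearest-neighbor scaling; image thinning is left out, and all names and parameter values are assumptions for illustration.

```python
def binarize(image, threshold=128):
    """Map dark pixels (strokes) to 1 and light pixels (background) to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def resize_nearest(image, out_h, out_w):
    """Scale an image to a preset size with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(image, size=(4, 4), threshold=128):
    """Binarize, then scale to the preset (height, width); thinning is omitted."""
    return resize_nearest(binarize(image, threshold), size[0], size[1])


result = preprocess([[0, 255], [255, 0]])
```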
进一步的,上述步骤S203可以包括:将化学式区域图像输入化学式识别模型中的多层次编码器,得到已编码特征图;将已编码特征图输入化学式识别模型中的解码器,得到化学式区域图像中的候选化学式。Further, the above step S203 may include: inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map; inputting the encoded feature map into the decoder in the chemical formula recognition model to obtain the decoder in the chemical formula region image. Candidate chemical formula.
Specifically, the chemical formula recognition model may consist of two parts, a multi-level encoder and a decoder. The multi-level encoder encodes the input chemical formula region image to obtain the encoded feature map; the decoder decodes the encoded feature map and outputs the candidate chemical formula in the chemical formula region image.
In one embodiment, the multi-level encoder can be built on the Swin Transformer encoder. Swin Transformer includes sliding-window operations, has a hierarchical design, and is a Transformer designed for image processing tasks. Its hierarchical design contains 4 stages; each stage reduces the resolution of the input feature map and enlarges the receptive field layer by layer, as a CNN does. Window attention divides the image into windows of a given size, and each attention computation is performed only inside a window. Shift window attention in Swin Transformer changes how the windows are divided, so that the window in which each pixel takes part in the attention computation keeps changing. The sliding-window operation thus covers both non-overlapping local windows and overlapping cross-window connections. Restricting the attention computation to a window introduces the locality of the CNN convolution operation and also reduces the amount of computation.
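The effect of changing the window division can be illustrated by computing, for every position of a small feature map, the index of the window it falls into. The sketch below assumes a cyclic shift of the partition grid, which is how the published Swin Transformer realizes shifted windows; the function name and the 4x4 example are hypothetical, and the attention masking that a full implementation applies to wrapped-around regions is not shown.

```python
def window_ids(height, width, window, shift=0):
    """Return a grid giving the window index of each position.

    The partition grid is cyclically shifted by `shift` positions in both
    directions before the map is cut into window x window blocks.
    """
    per_row = width // window
    return [
        [
            ((r - shift) % height) // window * per_row
            + ((c - shift) % width) // window
            for c in range(width)
        ]
        for r in range(height)
    ]


regular = window_ids(4, 4, window=2)           # plain window partition
shifted = window_ids(4, 4, window=2, shift=1)  # shifted window partition
```

In the regular partition the four central positions belong to four different windows; after the shift they share one window, so information can pass between windows that were previously isolated.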
In one embodiment, the decoder in the chemical formula recognition model can be built on the Transformer decoder. It uses masked multi-head attention so that the model sees only past data and not future data. It performs attention between the values of the decoder's hidden layer (as Q) and the values of the encoder's hidden layer (as K), and then takes the encoder's input as V, weighting it into the decoder's input.
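The masking described above can be illustrated with single-head scaled dot-product attention. This is a minimal sketch without learned projections or multiple heads; in the cross-attention call, keys and values are both taken from the encoder output, as in the standard Transformer, which is an assumption of the example. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention; with causal=True position i ignores j > i."""
    dim = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        if causal:
            # Future positions get -inf so their softmax weight is exactly 0.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
masked = attention(tokens, tokens, tokens, causal=True)  # decoder self-attention
cross = attention([[1.0, 0.0]], tokens, tokens)          # decoder query over encoder output
```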
本实施例中,化学式识别模型可以是神经网络,包括多层次编码器和解码器,先对化学式区域图像进行特征提取,然后再进行解码,从而得到候选化学式。In this embodiment, the chemical formula recognition model can be a neural network, including a multi-level encoder and a decoder, which first extracts features of the chemical formula region image and then decodes them to obtain candidate chemical formulas.
Further, the multi-level encoder includes several sequentially connected coding layers. The above step of inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain the encoded feature map may include: partitioning the chemical formula region image into patches to obtain patch images; inputting the patch images into the first coding layer to obtain a coding feature map; for each coding layer after the first, downsampling the coding feature map and inputting the downsampled coding feature map into the next coding layer, iterating until the last coding layer; and determining the coding feature map output by the last coding layer as the encoded feature map.
Specifically, the multi-level encoder includes several sequentially connected coding layers. In one embodiment, it may contain four sequentially connected coding layers, that is, 4 stages (stage1, stage2, stage3, and stage4), which gradually reduce the resolution of the input feature map to enlarge the receptive field.
The chemical formula region image is first partitioned into patches by the multi-level encoder (Patch Partition), yielding multiple patch images. These are then input into the first coding layer for coding, that is, feature extraction, yielding the coding feature map output by the first coding layer.
After the coding feature map output by the first coding layer is input into the second coding layer, it first undergoes downsampling (Patch Merging) to reduce its resolution. Whenever a coding feature map is input into the second or any later coding layer, it is downsampled according to the downsampling standard set for that layer, so that the number of channels of the output coding feature maps gradually increases. After downsampling, the feature map is input into the coding network of the second coding layer for coding, yielding the coding feature map output by the second coding layer.
将第二层编码层输出的编码特征图输入后续的编码层进行迭代,直至最后一层编码层;最后一层编码层输出的编码特征图将作为已编码特征图。The coding feature map output by the second coding layer is input to the subsequent coding layer for iteration until the last coding layer; the coding feature map output by the last coding layer will be used as the coded feature map.
本实施例中,多层次编码器中的多个顺序相连的编码层对化学式区域图像进行逐层的编码,并逐层降低分辨率,可以充分地对化学式区域图像进行特征提取,确保了化学式识别的准确性。In this embodiment, multiple sequentially connected coding layers in the multi-level encoder encode the chemical formula region image layer by layer and reduce the resolution layer by layer, which can fully extract features of the chemical formula region image and ensure chemical formula recognition. accuracy.
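The layer-by-layer reduction in resolution can be made concrete by tracking the shape of the feature map through the stages. The sketch below assumes the settings of the published Swin-T configuration (4x4 patches, 96 embedding channels, and a patch merging step that halves each spatial side and doubles the channels); this application does not state these numbers, and the function name is hypothetical.

```python
def encoder_stage_shapes(height, width, patch=4, embed_dim=96, num_stages=4):
    """Return the (height, width, channels) of the feature map after each stage.

    Stage 1 follows patch partition and linear embedding; every later stage
    starts with patch merging, assumed to halve H and W and double C.
    """
    h, w, c = height // patch, width // patch, embed_dim
    shapes = [(h, w, c)]
    for _ in range(num_stages - 1):
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    return shapes


shapes = encoder_stage_shapes(224, 224)
```

Under these assumptions a 224x224 region image yields feature maps of 56x56x96, 28x28x192, 14x14x384, and 7x7x768, the last of which would serve as the encoded feature map.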
进一步的,编码层中包括编码网络;上述将分片图像输入第一层编码层,得到编码特征图的步骤可以包括:对分片图像进行线性嵌入处理;将线性嵌入处理后的分片图像输入第一层编码层中的编码网络,以通过基于窗口的多头自注意力机制、基于滑动窗口的多头自注意力机制、多层感知机制和层归一化处理对线性嵌入处理后的分片图像进行特征提取,得到编码特征图。Further, the coding layer includes a coding network; the above steps of inputting the segmented image into the first coding layer to obtain the coding feature map may include: performing linear embedding processing on the segmented image; inputting the segmented image after linear embedding processing The encoding network in the first encoding layer uses a window-based multi-head self-attention mechanism, a sliding window-based multi-head self-attention mechanism, a multi-layer perception mechanism and layer normalization to process the sliced image after linear embedding. Perform feature extraction to obtain the encoded feature map.
Specifically, after obtaining the patch images, the first coding layer first applies linear embedding (Linear Embedding) to them and then inputs the linearly embedded patch images into the coding network of the first coding layer. The coding network is a Swin Transformer Block. Each of the coding layers has its own Swin Transformer Block; apart from differing input and output dimensions, the blocks share the same internal processing logic.
第一层编码层中的编码网络通过基于窗口的多头自注意力机制、基于滑动窗口的多头自注意力机制、多层感知机制和层归一化处理对线性嵌入处理后的分片图像进行特征提取,得到编码特征图;第二层编码层及以后的编码层中的编码网络通过基于窗口的多头自注意力机制、基于滑动窗口的多头自注意力机制、多层感知机制和层归一化处理对输入的编码特征图进行特征提取,输出新的编码特征图。The encoding network in the first coding layer characterizes the sliced images after linear embedding processing through window-based multi-head self-attention mechanism, sliding window-based multi-head self-attention mechanism, multi-layer perception mechanism and layer normalization processing. Extract and obtain the coding feature map; the coding network in the second coding layer and subsequent coding layers uses a window-based multi-head self-attention mechanism, a sliding window-based multi-head self-attention mechanism, a multi-layer perception mechanism and layer normalization The process performs feature extraction on the input encoding feature map and outputs a new encoding feature map.
The idea of residual connections is adopted inside each coding network (Swin Transformer Block). The network input $z^{l-1}$ first undergoes layer normalization (LN); the output of the LN is processed by window-based multi-head self-attention (W-MSA); and the W-MSA output is added to $z^{l-1}$ to obtain $\hat{z}^{l}$:
$$\hat{z}^{l} = \text{W-MSA}\left(\text{LN}\left(z^{l-1}\right)\right) + z^{l-1}$$
$\hat{z}^{l}$ is input into the second LN; the output of the second LN is processed by a multi-layer perceptron (MLP); and the MLP output is added to $\hat{z}^{l}$ to obtain $z^{l}$:
$$z^{l} = \text{MLP}\left(\text{LN}\left(\hat{z}^{l}\right)\right) + \hat{z}^{l}$$
$z^{l}$ is input into the third LN; the output of the third LN is processed by sliding-window-based multi-head self-attention (shifted window based multi-head self-attention, SW-MSA); and the SW-MSA output is added to $z^{l}$ to obtain $\hat{z}^{l+1}$:
$$\hat{z}^{l+1} = \text{SW-MSA}\left(\text{LN}\left(z^{l}\right)\right) + z^{l}$$
$\hat{z}^{l+1}$ is input into the fourth LN; the output of the fourth LN passes through a second MLP; and the output of the second MLP is added to $\hat{z}^{l+1}$ to obtain $z^{l+1}$:
$$z^{l+1} = \text{MLP}\left(\text{LN}\left(\hat{z}^{l+1}\right)\right) + \hat{z}^{l+1}$$
$z^{l+1}$ is the coding feature map of this coding layer.
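The residual wiring of the four sub-steps can be sketched with the sub-layers passed in as plain functions. This is only an illustration of the data flow: a real block has separately parameterized LN and MLP modules for each sub-step, whereas here one callable of each kind is reused, and the name `swin_block_pair` and the stub sub-layers are hypothetical.

```python
def swin_block_pair(z_prev, ln, w_msa, sw_msa, mlp):
    """Apply LN -> W-MSA -> LN -> MLP -> LN -> SW-MSA -> LN -> MLP with residuals.

    z_prev is a flat list of floats standing in for the feature map; each
    sub-layer is any function mapping a list of floats to a list of floats.
    """
    def add(a, b):
        return [x + y for x, y in zip(a, b)]

    z_hat = add(w_msa(ln(z_prev)), z_prev)         # window attention + residual
    z = add(mlp(ln(z_hat)), z_hat)                 # MLP + residual
    z_hat_next = add(sw_msa(ln(z)), z)             # shifted-window attention + residual
    z_next = add(mlp(ln(z_hat_next)), z_hat_next)  # MLP + residual
    return z_next


def identity(v):
    return list(v)


def zeros(v):
    return [0.0] * len(v)


# With sub-layers that output zeros, the residual path returns the input unchanged.
passthrough = swin_block_pair([1.0, 2.0], identity, zeros, zeros, zeros)
```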
Extracting features from the input coding feature map through the window-based multi-head self-attention mechanism, the sliding-window-based multi-head self-attention mechanism, the multi-layer perceptron, and layer normalization, and outputting a coding feature map, introduces the locality of the CNN convolution operation while keeping the overall amount of computation under control. Features can be extracted from the image more accurately, which improves the accuracy of image processing tasks.
In this embodiment, feature extraction is performed through the window-based multi-head self-attention mechanism, the sliding-window-based multi-head self-attention mechanism, the multi-layer perceptron, and layer normalization, and a coding feature map is output. Features can therefore be extracted from the image more accurately, which improves the accuracy of chemical formula recognition.
进一步的,上述步骤S204之后,还可以包括:当根据校验结果确定候选化学式不存在时,基于编辑距离确定候选化学式的相似化学式;将相似化学式确定为已识别化学式。Further, after the above step S204, it may also include: when it is determined that the candidate chemical formula does not exist according to the verification result, determining similar chemical formulas of the candidate chemical formula based on edit distance; determining the similar chemical formula as the recognized chemical formula.
Specifically, when the verification result shows that the candidate chemical formula does not exist, the edit distance between the candidate chemical formula and each chemical formula in the chemical formula database can be calculated. Edit distance, also called Levenshtein distance (Levenshtein Distance), quantifies how different two strings are: it is the minimum number of edit operations needed to turn one string into the other.
The chemical formula with the shortest edit distance to the candidate is selected; it is the one most similar to the candidate chemical formula and is determined to be the recognized chemical formula.
本实施例中,当根据校验结果确定候选化学式不存在时,基于编辑距离查找候选化学式的相似化学式,将相似化学式作为已识别化学式,从而对识别结果进行修正。In this embodiment, when it is determined that the candidate chemical formula does not exist according to the verification results, similar chemical formulas of the candidate chemical formula are searched based on the edit distance, and the similar chemical formulas are regarded as the identified chemical formulas, thereby correcting the recognition result.
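The correction step can be sketched with the standard dynamic-programming form of Levenshtein distance followed by a linear scan of the database. The function names and the sample SMILES entries are hypothetical; a large database would likely need an index rather than a full scan.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_formula(candidate, formula_db):
    """Return the database entry with the smallest edit distance to the candidate.

    Entries are scanned in sorted order so that ties resolve deterministically.
    """
    return min(sorted(formula_db), key=lambda f: levenshtein(candidate, f))


# Hypothetical database content, for illustration only.
formula_db = {"CCO", "C1=CC=CC=C1", "CC(=O)O"}
corrected = nearest_formula("CCQ", formula_db)
```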
本申请涉及人工智能领域中的神经网络、机器学习以及计算机视觉等。This application involves neural networks, machine learning and computer vision in the field of artificial intelligence.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,该计算机可读指令可存储于一计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,前述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等。Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a computer-readable storage medium. , when executed, the computer-readable instructions may include the processes of the above-mentioned method embodiments. Among them, the aforementioned storage media can be non-volatile storage media such as magnetic disks, optical disks, read-only memory (Read-Only Memory, ROM), or random access memory (Random Access Memory, RAM), etc.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to Figure 3, as an implementation of the method shown in Figure 2, this application provides an embodiment of a chemical formula identification device. The device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied in various electronic devices.
如图3所示,本实施例所述的化学式识别装置300包括:图像获取模块301、区域检测模块302、化学式识别模块303、化学式校验模块304以及化学式确定模块305,其中:As shown in Figure 3, the chemical formula identification device 300 in this embodiment includes: an image acquisition module 301, an area detection module 302, a chemical formula identification module 303, a chemical formula verification module 304 and a chemical formula determination module 305, wherein:
图像获取模块301,用于获取包含化学式的待检测图像。The image acquisition module 301 is used to acquire an image to be detected containing a chemical formula.
区域检测模块302,用于将待检测图像输入多目标检测模型,得到化学式区域图像。The region detection module 302 is used to input the image to be detected into a multi-target detection model to obtain a chemical formula region image.
化学式识别模块303,用于将化学式区域图像输入化学式识别模型,得到化学式区域图像中的候选化学式。The chemical formula recognition module 303 is used to input the chemical formula region image into the chemical formula recognition model to obtain candidate chemical formulas in the chemical formula region image.
化学式校验模块304,用于根据预先建立的化学式数据库对候选化学式进行存在性校验,得到校验结果。The chemical formula verification module 304 is used to verify the existence of candidate chemical formulas based on a pre-established chemical formula database and obtain verification results.
化学式确定模块305,用于当根据校验结果确定候选化学式存在时,将候选化学式确定为已识别化学式。The chemical formula determination module 305 is configured to determine the candidate chemical formula as the recognized chemical formula when it is determined that the candidate chemical formula exists according to the verification result.
In this embodiment, after the image to be detected is acquired, it is first input into the multi-target detection model to determine the image region where the chemical formula is located, yielding a chemical formula region image. The region image is then input into the chemical formula recognition model to recognize the chemical formula, yielding a candidate chemical formula. Known chemical formulas are stored in the chemical formula database; when the database confirms that the candidate chemical formula actually exists, the candidate is determined to be the recognized chemical formula, which ensures the accuracy of recognition. The multi-target detection model and the chemical formula recognition model in this application can be neural networks, which reduces the manual design of image processing rules, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
在本实施例的一些可选的实现方式中,区域检测模块302可以包括:特征图生成子模 块、特征提取子模块以及区域检测子模块,其中:In some optional implementations of this embodiment, the region detection module 302 may include: a feature map generation sub-module, a feature extraction sub-module and a region detection sub-module, where:
特征图生成子模块,用于将待检测图像输入多目标检测模型中的特征生成网络,得到待检测图像的第一特征图。The feature map generation submodule is used to input the image to be detected into the feature generation network in the multi-target detection model to obtain the first feature map of the image to be detected.
特征提取子模块,用于将第一特征图输入多目标检测模型中的特征提取网络,得到第二特征图。The feature extraction submodule is used to input the first feature map into the feature extraction network in the multi-target detection model to obtain the second feature map.
区域检测子模块,用于将第二特征图输入多目标检测模型中的检测层,得到待检测图像中的化学式区域图像。The region detection submodule is used to input the second feature map into the detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
本实施例中,多目标检测模型包含三层架构,可以先将待检测图像转换为第一特征图,然后进行特征提取得到第二特征图,再通过检测层获取包含化学式的图像区域的位置信息,从而顺利得到化学式区域图像。In this embodiment, the multi-target detection model includes a three-layer architecture. The image to be detected can be first converted into a first feature map, then feature extraction is performed to obtain a second feature map, and then the position information of the image area containing the chemical formula is obtained through the detection layer. , thereby successfully obtaining the chemical formula region image.
In some optional implementations of this embodiment, the chemical formula identification device 300 may include a preprocessing module configured to preprocess the chemical formula region image, where the preprocessing includes binarization, image thinning, and image scaling.
In this embodiment, preprocessing optimizes the chemical formula region image, which makes it easier for the chemical formula recognition model to process and helps ensure the accuracy of chemical formula recognition.
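Two of the three preprocessing steps can be sketched in a few lines. The sketch below assumes a grayscale image stored as a list of pixel rows with values in 0 to 255, a fixed threshold of 128, and nearest-neighbor scaling; none of these choices comes from the application. Image thinning is deliberately left out, because it needs a full skeletonization algorithm that the text does not specify.

```python
from typing import List

Gray = List[List[int]]


def binarize(gray: Gray, threshold: int = 128) -> Gray:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def resize_nearest(img: Gray, out_h: int, out_w: int) -> Gray:
    """Scale an image to out_h x out_w with nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(gray: Gray, out_h: int, out_w: int) -> Gray:
    """Binarize, then scale; thinning would sit between the two steps."""
    return resize_nearest(binarize(gray), out_h, out_w)
```

Scaling to a fixed size gives the recognition model a uniform input shape, which is one plausible reason for including the step.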
In some optional implementations of this embodiment, the chemical formula recognition module 303 may include an encoding submodule and a decoding submodule, wherein:
The encoding submodule is configured to input the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain an encoded feature map.
The decoding submodule is configured to input the encoded feature map into the decoder in the chemical formula recognition model to obtain the candidate chemical formula in the chemical formula region image.
In this embodiment, the chemical formula recognition model may be a neural network that includes a multi-level encoder and a decoder: features are first extracted from the chemical formula region image and then decoded to obtain the candidate chemical formula.
In some optional implementations of this embodiment, the multi-level encoder includes several sequentially connected coding layers; the encoding submodule may include an image partitioning unit, a patch input unit, an iteration unit, and a feature map determination unit, wherein:
The image partitioning unit is configured to partition the chemical formula region image into patches to obtain patch images.
The patch input unit is configured to input the patch images into the first coding layer to obtain a coding feature map.
The iteration unit is configured to, for each coding layer after the first, downsample the coding feature map and input the downsampled coding feature map into the next coding layer, iterating until the last coding layer.
The feature map determination unit is configured to determine the coding feature map output by the last coding layer as the encoded feature map.
In this embodiment, the sequentially connected coding layers in the multi-level encoder encode the chemical formula region image layer by layer while reducing the resolution at each layer, so that features are extracted thoroughly from the chemical formula region image, which helps ensure the accuracy of chemical formula recognition.
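The layer-by-layer flow above can be illustrated with plain lists. In this sketch the patch size `p`, the function names, and the downsampling rule (concatenating each 2x2 group of neighboring tokens, which halves the resolution) are assumptions chosen for illustration; the application only states that the coding feature map is downsampled between layers. The coding layers are passed in as arbitrary callables.

```python
from typing import Callable, List, Sequence

Token = List[float]
Grid = List[List[Token]]  # grid[i][j] is the feature vector of one patch


def partition_patches(img: Sequence[Sequence[float]], p: int) -> Grid:
    """Split an image into non-overlapping p x p patches, each flattened."""
    h, w = len(img), len(img[0])
    return [
        [[img[r + dr][c + dc] for dr in range(p) for dc in range(p)]
         for c in range(0, w - w % p, p)]
        for r in range(0, h - h % p, p)
    ]


def downsample_2x2(grid: Grid) -> Grid:
    """Halve the resolution by concatenating each 2 x 2 group of tokens."""
    rows, cols = len(grid) - len(grid) % 2, len(grid[0]) - len(grid[0]) % 2
    return [
        [grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]
         for j in range(0, cols, 2)]
        for i in range(0, rows, 2)
    ]


def encode(img, p: int, coding_layers: Sequence[Callable[[Grid], Grid]]) -> Grid:
    """First layer sees the patches; later layers see downsampled maps."""
    feature_map = coding_layers[0](partition_patches(img, p))
    for layer in coding_layers[1:]:
        feature_map = layer(downsample_2x2(feature_map))
    return feature_map  # output of the last coding layer
```

With an 8x8 input, `p = 2`, and three layers, the token grid shrinks from 4x4 to 2x2 to 1x1 while each token grows longer, which mirrors the trade of resolution for feature depth described in the text.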
In some optional implementations of this embodiment, each coding layer includes a coding network; the patch input unit may include an embedding processing subunit and a feature extraction subunit, wherein:
The embedding processing subunit is configured to perform linear embedding on the patch images.
The feature extraction subunit is configured to input the linearly embedded patch images into the coding network in the first coding layer, so that features are extracted from the linearly embedded patch images through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multilayer perceptron mechanism, and layer normalization, yielding the coding feature map.
In this embodiment, features are extracted through the window-based multi-head self-attention mechanism, the sliding-window-based multi-head self-attention mechanism, the multilayer perceptron mechanism, and layer normalization, and the coding feature map is output. This extracts features from the image more accurately and improves the accuracy of chemical formula recognition.
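The difference between the window-based and the sliding-window-based mechanisms lies in how tokens are grouped before attention is computed. The sketch below shows only that grouping, under the assumption that the sliding window is realized as a cyclic shift of the token grid before partitioning (as in shifted-window transformer designs); the application does not confirm this detail. The window size `m`, the shift `s`, and the function names are hypothetical, the grid dimensions are assumed to be divisible by `m`, and the attention computation itself is omitted.

```python
from typing import List, TypeVar

T = TypeVar("T")


def window_partition(grid: List[List[T]], m: int) -> List[List[T]]:
    """Group a token grid into non-overlapping m x m windows.

    Self-attention would then be computed inside each window only.
    Assumes both grid dimensions are divisible by m.
    """
    h, w = len(grid), len(grid[0])
    return [
        [grid[r + dr][c + dc] for dr in range(m) for dc in range(m)]
        for r in range(0, h, m)
        for c in range(0, w, m)
    ]


def cyclic_shift(grid: List[List[T]], s: int) -> List[List[T]]:
    """Roll the grid up and to the left by s positions, wrapping around."""
    rows = grid[s:] + grid[:s]
    return [row[s:] + row[:s] for row in rows]


def shifted_window_partition(grid: List[List[T]], m: int, s: int) -> List[List[T]]:
    """Windows of the shifted grid straddle the original window borders."""
    return window_partition(cyclic_shift(grid, s), m)
```

Alternating the two groupings lets information flow between tokens that the plain window partition keeps apart, which is the usual motivation for pairing them.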
In some optional implementations of this embodiment, the chemical formula identification device 300 may include a first similarity determination module and a second similarity determination module, wherein:
The first similarity determination module is configured to determine, based on edit distance, a similar chemical formula of the candidate chemical formula when the verification result indicates that the candidate chemical formula does not exist.
The second similarity determination module is configured to determine the similar chemical formula as the recognized chemical formula.
In this embodiment, when the verification result indicates that the candidate chemical formula does not exist, a similar chemical formula of the candidate chemical formula is searched for based on edit distance and taken as the recognized chemical formula, thereby correcting the recognition result.
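The existence check and the edit-distance fallback can be sketched as follows. The sketch assumes that chemical formulas are handled as plain strings, that the database is a non-empty in-memory set, that Levenshtein distance is the edit distance in question, and that ties are broken alphabetically; the function names are hypothetical, and the application does not specify any of these points.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def resolve_formula(candidate: str, database: Iterable[str]) -> str:
    """Return the candidate if it exists, else its nearest known formula."""
    known = set(database)
    if candidate in known:
        return candidate  # existence check passed
    # Fallback: closest formula by edit distance (ties broken alphabetically).
    return min(known, key=lambda f: (edit_distance(candidate, f), f))
```

For example, a candidate such as `"H20"` (digit zero misread for the letter O) would resolve to `"H2O"` when that formula is in the database. A production system would likely also cap the allowed distance so that an unrelated formula is never substituted; the text does not mention such a threshold.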
For the specific limitations of the chemical formula identification device, refer to the limitations of the chemical formula identification method above; they are not repeated here. Each module in the above chemical formula identification device may be implemented in whole or in part by software, hardware, or a combination thereof. For example, in one embodiment, the region detection module corresponds to the multi-target detection model, and the chemical formula recognition module corresponds to the chemical formula recognition model. Each of the above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
To solve the above technical problem, embodiments of the present application further provide a computer device. Refer to Figure 4, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to one another through a system bus. Note that the figure shows only the computer device 4 with components 41-43; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
The memory 41 includes at least one type of computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. The memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the chemical formula identification method. In addition, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the chemical formula identification method.
The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
The computer device provided in this embodiment can execute the chemical formula identification method described above, which may be the chemical formula identification method of any of the foregoing embodiments.
In this embodiment, after the image to be detected is acquired, it is first input into the multi-target detection model to locate the image region containing the chemical formula, yielding a chemical formula region image. The chemical formula region image is then input into the chemical formula recognition model, which recognizes the formula and outputs a candidate chemical formula. The chemical formula database stores known chemical formulas; when the database confirms that the candidate chemical formula actually exists, the candidate chemical formula is determined as the recognized chemical formula, which ensures the accuracy of chemical formula recognition. The multi-target detection model and the chemical formula recognition model in this application may be neural networks, which reduces the manual design of image processing rules, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
The present application further provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the chemical formula identification method described above.
In this embodiment, after the image to be detected is acquired, it is first input into the multi-target detection model to locate the image region containing the chemical formula, yielding a chemical formula region image. The chemical formula region image is then input into the chemical formula recognition model, which recognizes the formula and outputs a candidate chemical formula. The chemical formula database stores known chemical formulas; when the database confirms that the candidate chemical formula actually exists, the candidate chemical formula is determined as the recognized chemical formula, which ensures the accuracy of chemical formula recognition. The multi-target detection model and the chemical formula recognition model in this application may be neural networks, which reduces the manual design of image processing rules, simplifies the recognition process, and improves the efficiency of chemical formula recognition.
From the above description of the implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
Obviously, the embodiments described above are only some, rather than all, of the embodiments of the present application. The drawings show preferred embodiments of the present application but do not limit its patent scope. The present application may be implemented in many different forms; these embodiments are provided so that the disclosure of the present application is understood more thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific implementations or make equivalent substitutions for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. A chemical formula identification method, comprising the following steps:
    acquiring an image to be detected that contains a chemical formula;
    inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image;
    inputting the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
    performing an existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
    when it is determined according to the verification result that the candidate chemical formula exists, determining the candidate chemical formula as the recognized chemical formula.
  2. The chemical formula identification method according to claim 1, wherein the step of inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image comprises:
    inputting the image to be detected into a feature generation network in the multi-target detection model to obtain a first feature map of the image to be detected;
    inputting the first feature map into a feature extraction network in the multi-target detection model to obtain a second feature map;
    inputting the second feature map into a detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
  3. The chemical formula identification method according to claim 1, wherein before the step of inputting the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image, the method further comprises:
    preprocessing the chemical formula region image, the preprocessing including binarization, image thinning, and image scaling.
  4. The chemical formula identification method according to claim 1, wherein the step of inputting the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image comprises:
    inputting the chemical formula region image into a multi-level encoder in the chemical formula recognition model to obtain an encoded feature map;
    inputting the encoded feature map into a decoder in the chemical formula recognition model to obtain the candidate chemical formula in the chemical formula region image.
  5. The chemical formula identification method according to claim 4, wherein the multi-level encoder includes several sequentially connected coding layers, and the step of inputting the chemical formula region image into a multi-level encoder in the chemical formula recognition model to obtain an encoded feature map comprises:
    partitioning the chemical formula region image into patches to obtain patch images;
    inputting the patch images into the first coding layer to obtain a coding feature map;
    for each coding layer after the first, downsampling the coding feature map and inputting the downsampled coding feature map into the next coding layer, iterating until the last coding layer;
    determining the coding feature map output by the last coding layer as the encoded feature map.
  6. The chemical formula identification method according to claim 5, wherein each coding layer includes a coding network, and the step of inputting the patch images into the first coding layer to obtain a coding feature map comprises:
    performing linear embedding on the patch images;
    inputting the linearly embedded patch images into the coding network in the first coding layer, so as to extract features from the linearly embedded patch images through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multilayer perceptron mechanism, and layer normalization, to obtain the coding feature map.
  7. The chemical formula identification method according to claim 1, wherein after the step of performing an existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result, the method further comprises:
    when it is determined according to the verification result that the candidate chemical formula does not exist, determining a similar chemical formula of the candidate chemical formula based on edit distance;
    determining the similar chemical formula as the recognized chemical formula.
  8. A chemical formula identification device, comprising:
    an image acquisition module, configured to acquire an image to be detected that contains a chemical formula;
    a region detection module, configured to input the image to be detected into a multi-target detection model to obtain a chemical formula region image;
    a chemical formula recognition module, configured to input the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
    a chemical formula verification module, configured to perform an existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
    a chemical formula determination module, configured to determine the candidate chemical formula as the recognized chemical formula when it is determined according to the verification result that the candidate chemical formula exists.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring an image to be detected that contains a chemical formula;
    inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image;
    inputting the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
    performing an existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
    when it is determined according to the verification result that the candidate chemical formula exists, determining the candidate chemical formula as the recognized chemical formula.
  10. The computer device according to claim 9, wherein the step of inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image comprises:
    inputting the image to be detected into a feature generation network in the multi-target detection model to obtain a first feature map of the image to be detected;
    inputting the first feature map into a feature extraction network in the multi-target detection model to obtain a second feature map;
    inputting the second feature map into a detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
  11. The computer device according to claim 9, wherein the step of inputting the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image comprises:
    inputting the chemical formula region image into a multi-level encoder in the chemical formula recognition model to obtain an encoded feature map;
    inputting the encoded feature map into a decoder in the chemical formula recognition model to obtain the candidate chemical formula in the chemical formula region image.
  12. The computer device according to claim 11, wherein the multi-level encoder includes several sequentially connected coding layers, and the step of inputting the chemical formula region image into a multi-level encoder in the chemical formula recognition model to obtain an encoded feature map comprises:
    partitioning the chemical formula region image into patches to obtain patch images;
    inputting the patch images into the first coding layer to obtain a coding feature map;
    for each coding layer after the first, downsampling the coding feature map and inputting the downsampled coding feature map into the next coding layer, iterating until the last coding layer;
    determining the coding feature map output by the last coding layer as the encoded feature map.
  13. The computer device according to claim 12, wherein each coding layer includes a coding network, and the step of inputting the patch images into the first coding layer to obtain a coding feature map comprises:
    performing linear embedding on the patch images;
    inputting the linearly embedded patch images into the coding network in the first coding layer, so as to extract features from the linearly embedded patch images through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multilayer perceptron mechanism, and layer normalization, to obtain the coding feature map.
  14. The computer device according to claim 9, wherein after the step of performing an existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result, the processor further implements the following steps when executing the computer-readable instructions:
    when it is determined according to the verification result that the candidate chemical formula does not exist, determining a similar chemical formula of the candidate chemical formula based on edit distance;
    determining the similar chemical formula as the recognized chemical formula.
  15. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring an image to be detected that contains a chemical formula;
    inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image;
    inputting the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image;
    performing an existence check on the candidate chemical formula according to a pre-established chemical formula database to obtain a verification result;
    when it is determined according to the verification result that the candidate chemical formula exists, determining the candidate chemical formula as the recognized chemical formula.
  16. The computer-readable storage medium according to claim 15, wherein the step of inputting the image to be detected into a multi-target detection model to obtain a chemical formula region image comprises:
    inputting the image to be detected into a feature generation network in the multi-target detection model to obtain a first feature map of the image to be detected;
    inputting the first feature map into a feature extraction network in the multi-target detection model to obtain a second feature map;
    inputting the second feature map into a detection layer in the multi-target detection model to obtain the chemical formula region image in the image to be detected.
  17. The computer-readable storage medium according to claim 15, wherein the step of inputting the chemical formula region image into a chemical formula recognition model to obtain a candidate chemical formula in the chemical formula region image comprises:
    inputting the chemical formula region image into a multi-level encoder in the chemical formula recognition model to obtain an encoded feature map;
    inputting the encoded feature map into a decoder in the chemical formula recognition model to obtain the candidate chemical formula in the chemical formula region image.
  18. The computer-readable storage medium according to claim 17, wherein the multi-level encoder comprises several sequentially connected coding layers, and the step of inputting the chemical formula region image into the multi-level encoder in the chemical formula recognition model to obtain an encoded feature map comprises:
    slicing the chemical formula region image to obtain a sliced image;
    inputting the sliced image into the first coding layer to obtain a coding feature map;
    for each coding layer after the first, downsampling the coding feature map and inputting the downsampled coding feature map into the next coding layer, iterating until the last coding layer; and
    determining the coding feature map output by the last coding layer as the encoded feature map.
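The control flow of claim 18 (slice, encode, then alternate downsampling and encoding until the last layer) can be sketched in plain Python. The claim does not specify the slicing or downsampling operators, so this sketch assumes non-overlapping square patches and a 2 x 2 merge that concatenates neighbouring feature vectors; the function names are hypothetical and each coding layer is an arbitrary callable.

```python
from typing import Callable, List

Grid = List[List[List[float]]]  # H x W grid of feature vectors


def partition_into_patches(image: List[List[float]], patch: int) -> Grid:
    """Split a single-channel image into non-overlapping patch x patch tiles,
    flattening each tile into one feature vector."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    return [
        [
            [image[r + dr][c + dc] for dr in range(patch) for dc in range(patch)]
            for c in range(0, w, patch)
        ]
        for r in range(0, h, patch)
    ]


def downsample(grid: Grid) -> Grid:
    """Halve height and width by concatenating the vectors of each 2 x 2 block."""
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("grid size must be even to downsample")
    return [
        [
            grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def encode(image: List[List[float]], patch: int,
           layers: List[Callable[[Grid], Grid]]) -> Grid:
    """First layer sees the sliced image; every later layer sees a downsampled map."""
    feature_map = layers[0](partition_into_patches(image, patch))
    for layer in layers[1:]:
        feature_map = layer(downsample(feature_map))
    return feature_map  # output of the last coding layer
```

With an 8 x 8 image, a patch size of 2 and three layers, the feature map shrinks from 4 x 4 to 2 x 2 to 1 x 1 while the vector length grows from 4 to 16 to 64.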
  19. The computer-readable storage medium according to claim 18, wherein each coding layer comprises a coding network, and the step of inputting the sliced image into the first coding layer to obtain a coding feature map comprises:
    performing linear embedding on the sliced image; and
    inputting the linearly embedded sliced image into the coding network in the first coding layer, so as to perform feature extraction on the linearly embedded sliced image through a window-based multi-head self-attention mechanism, a sliding-window-based multi-head self-attention mechanism, a multi-layer perceptron mechanism, and layer normalization, thereby obtaining the coding feature map.
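The two attention mechanisms named in claim 19 differ mainly in how tokens are grouped before self-attention is applied. The sketch below shows only that grouping, under the assumption that the "sliding window" corresponds to the shifted-window scheme of the Swin Transformer (a cyclic shift of the token grid followed by the same partition). The attention itself, the multi-layer perceptron and layer normalization are omitted, and the function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def window_partition(grid: Sequence[Sequence[T]], window: int) -> List[List[T]]:
    """Group an H x W token grid into non-overlapping window x window groups.
    Self-attention would then be computed independently inside each group."""
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    return [
        [grid[r + dr][c + dc] for dr in range(window) for dc in range(window)]
        for r in range(0, h, window)
        for c in range(0, w, window)
    ]


def cyclic_shift(grid: Sequence[Sequence[T]], shift: int) -> List[List[T]]:
    """Roll the grid up and to the left by `shift` positions, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)] for r in range(h)]


def shifted_window_partition(grid: Sequence[Sequence[T]], window: int) -> List[List[T]]:
    """Partition after shifting by half a window, so groups straddle the
    boundaries of the regular windows and information can cross them."""
    return window_partition(cyclic_shift(grid, window // 2), window)
```

In a full implementation, tokens that become neighbours only because of the wrap-around are usually masked out of each other's attention; that masking is not shown here.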
  20. The computer-readable storage medium according to claim 15, wherein, after the step of performing an existence check on the candidate chemical formula against a pre-established chemical formula database to obtain a check result, the computer-readable instructions, when executed by the processor, further implement the following steps:
    when it is determined from the check result that the candidate chemical formula does not exist, determining a similar chemical formula of the candidate chemical formula based on edit distance; and
    determining the similar chemical formula as the recognized chemical formula.
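The fallback in claim 20 can be illustrated with a standard Levenshtein edit distance. This is a minimal sketch that assumes the chemical formula database is a non-empty in-memory set of strings and that "similar" means the entry with the smallest edit distance; `resolve_formula` is a hypothetical name, and ties are broken alphabetically only to keep the result deterministic.

```python
from typing import Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def resolve_formula(candidate: str, database: Set[str]) -> str:
    """Return the candidate if it exists in the database; otherwise return the
    closest database entry by edit distance (database assumed non-empty)."""
    if candidate in database:
        return candidate
    return min(sorted(database), key=lambda known: edit_distance(candidate, known))
```

For example, a recognition result of `H20` (with a zero misread for the letter O) would be resolved to `H2O` if that entry is in the database, since the two strings differ by a single substitution.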
PCT/CN2022/089509 2022-03-15 2022-04-27 Chemical formula identification method and apparatus, computer device, and storage medium WO2023173536A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210255360.0 2022-03-15
CN202210255360.0A CN114627462A (en) 2022-03-15 2022-03-15 Chemical formula identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023173536A1 (en) 2023-09-21

Family

ID=81902610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089509 WO2023173536A1 (en) 2022-03-15 2022-04-27 Chemical formula identification method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN114627462A (en)
WO (1) WO2023173536A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721713B (en) * 2023-08-09 2023-10-31 北京望石智慧科技有限公司 Data set construction method and device oriented to chemical structural formula identification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868728A (en) * 2016-04-12 2016-08-17 中国传媒大学 Method for detecting chemical formula in image based on characteristics of chemical formula
US20170364744A1 (en) * 2016-06-20 2017-12-21 Machine Learning Works, LLC Neural network based recognition of mathematical expressions
CN110413740A (en) * 2019-08-06 2019-11-05 百度在线网络技术(北京)有限公司 Querying method, device, electronic equipment and the storage medium of chemical expression
CN114121179A (en) * 2022-01-28 2022-03-01 药渡经纬信息科技(北京)有限公司 Extraction method and extraction device of chemical structural formula

Also Published As

Publication number Publication date
CN114627462A (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN107679039B (en) Method and device for determining statement intention
US20230106873A1 (en) Text extraction method, text extraction model training method, electronic device and storage medium
WO2022105125A1 (en) Image segmentation method and apparatus, computer device, and storage medium
CN110442856B (en) Address information standardization method and device, computer equipment and storage medium
CN110532381B (en) Text vector acquisition method and device, computer equipment and storage medium
CN113989593A (en) Image processing method, search method, training method, device, equipment and medium
CN112632278A (en) Labeling method, device, equipment and storage medium based on multi-label classification
CN112287069A (en) Information retrieval method and device based on voice semantics and computer equipment
CN113869138A (en) Multi-scale target detection method and device and computer readable storage medium
CN112749695A (en) Text recognition method and device
CN115565177B (en) Character recognition model training, character recognition method, device, equipment and medium
CN114780746A (en) Knowledge graph-based document retrieval method and related equipment thereof
US20230114673A1 (en) Method for recognizing token, electronic device and storage medium
CN115438215A (en) Image-text bidirectional search and matching model training method, device, equipment and medium
CN112528029A (en) Text classification model processing method and device, computer equipment and storage medium
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
WO2023173536A1 (en) Chemical formula identification method and apparatus, computer device, and storage medium
CN115438149A (en) End-to-end model training method and device, computer equipment and storage medium
CN115983271A (en) Named entity recognition method and named entity recognition model training method
CN113723077B (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN115312033A (en) Speech emotion recognition method, device, equipment and medium based on artificial intelligence
CN113434636A (en) Semantic-based approximate text search method and device, computer equipment and medium
US20240021000A1 (en) Image-based information extraction model, method, and apparatus, device, and storage medium
CN112329454A (en) Language identification method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22931563
Country of ref document: EP
Kind code of ref document: A1