CN116704537B - Lightweight pharmacopoeia picture and text extraction method - Google Patents

Info

Publication number
CN116704537B
CN116704537B (application CN202211539551.6A)
Authority
CN
China
Prior art keywords
pharmacopoeia
input
stage
characteristic
low
Prior art date
Legal status
Active
Application number
CN202211539551.6A
Other languages
Chinese (zh)
Other versions
CN116704537A (en)
Inventor
李朋 (Li Peng)
于硕 (Yu Shuo)
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202211539551.6A
Publication of CN116704537A
Application granted
Publication of CN116704537B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19127 Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of visual document understanding and discloses a lightweight pharmacopoeia picture and text extraction method comprising two key steps. 1) Constructing a pharmacopoeia-feature lightweight focusing module: low-rank neural network layers are first built from the principal components of full-rank network weights, and a focusing strategy is then designed to extract key information from the input features. 2) Constructing a pharmacopoeia document information recognition and extraction network: eight pharmacopoeia-feature lightweight focusing modules are connected in series to form a network backbone, and a multi-stage encoder is constructed to extract pharmacopoeia data feature embeddings; eight further pharmacopoeia-feature lightweight focusing modules are then connected in series to form a backbone for a multi-stage decoder that converts the pharmacopoeia data information into text of a specific format, realizing pharmacopoeia electronization; finally, a cross-entropy loss measures the difference between the pharmacopoeia data text extracted by the decoder and the original pharmacopoeia data text, and the network parameters are optimized by minimizing this loss.

Description

Lightweight pharmacopoeia picture and text extraction method
Technical Field
The invention belongs to the technical field of visual document understanding, and relates to a lightweight pharmacopoeia picture and text extraction method.
Background
Currently, the great variety and large scale of pharmaceutical agents make pharmacopoeias difficult to manage and maintain. Meanwhile, the vigorous development of the biomedical industry and the rapid growth of novel pharmaceutical preparations further increase the difficulty of pharmacopoeia management. Realizing pharmacopoeia electronization with information technology is a promising way to reform pharmacopoeia management. However, pharmacopoeia electronization still faces great challenges, mainly the low speed, low degree of structuring and low degree of integration of data acquisition in the pharmacopoeia data-collection process, which make it difficult to efficiently manage and exploit the effective knowledge in pharmacopoeias. There is therefore an urgent need for new methods that advance pharmacopoeia electronization more effectively.
Information extraction means recognizing and extracting character information from document image data; it is a key task of visual document understanding and arises widely in data electronization. Conventional information extraction methods generally rely on optical character recognition (OCR): document materials are scanned, the scanned images are analyzed to obtain text content, and character recognition and layout recovery are then performed with algorithms such as image classification or template matching. In recent years, as acquired data have grown in volume and complexity, the usability of traditional OCR-based information extraction has gradually declined, and researchers have developed information extraction methods based on deep OCR. Among current deep-OCR-based information extraction methods, those combining convolutional neural networks (CNN), recurrent neural networks (RNN) and attention mechanisms are mainstream and have achieved good results in the recognition and extraction of pharmacopoeia-related documents.
However, most current OCR-dependent information extraction methods have inherent disadvantages. As a preprocessing step, OCR generally requires expensive training expenditure, incurs additional inference cost in scenarios pursuing high-quality extraction output, and may propagate its internal errors to the rest of the information extraction pipeline, degrading overall performance. In addition, current information extraction methods rely on attention mechanisms to extract key information from the data, whose computational cost is too high to meet the demand of pharmacopoeia electronization for efficient data acquisition.
In summary, the invention provides a lightweight pharmacopoeia picture text extraction method: the idea of weight principal-component approximation is used to design a low-rank pharmacopoeia-feature lightweight focusing module for efficient key-information extraction, and an effective pharmacopoeia document information recognition and extraction network is then built on the encoder-decoder idea to realize accurate and efficient electronization of pharmacopoeia data.
Disclosure of Invention
In order to solve the problems, the invention provides a lightweight pharmacopoeia picture and text extraction method, which comprises the following steps:
step 1, constructing a pharmacopoeia characteristic light focusing module
The construction of the pharmacopoeia characteristic light focusing module comprises the construction of a low-rank neural network layer and the realization of a focusing strategy;
Construction of the low-rank neural network layer: according to the tensor CP decomposition principle, the weighted computation of a neural network layer is performed with the principal components of the network weights, constructing a low-rank neural network layer;
Specifically, the low-rank neural network layer involves the layer's input feature z, output feature z′, an activation function σ, a bias vector b, importance factors λ_r, and K weight vectors w_r^(1), …, w_r^(K) of the layer. The pharmacopoeia-feature lightweight focusing module multiplies the feature z by the K weight vectors in turn, performs a weighted summation with the importance factors λ_r over the value range of the tensor rank r, superimposes the bias vector b, and finally obtains the result z′ through the activation function σ. The low-rank fully connected layer is computed as
z′ = σ( Σ_{r=1}^{R} λ_r ( w_r^(1) ∘ w_r^(2) ∘ … ∘ w_r^(K) ) z + b ),
where W = Σ_{r=1}^{R} λ_r w_r^(1) ∘ w_r^(2) ∘ … ∘ w_r^(K) is the equivalent network weight tensor, ∘ denotes the vector outer product, and R is the set value range of the tensor rank r;
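The low-rank fully connected computation above can be sketched in numpy for the K = 2 case (a matrix weight represented by rank-R factor pairs). This is an illustrative sketch, not the patented implementation; the function name and factor layout are assumptions.

```python
import numpy as np

def low_rank_linear(z, U, V, lam, b, sigma=np.tanh):
    """Low-rank fully connected layer sketch (K = 2 factor vectors).

    The d_out x d_in weight matrix is never formed explicitly; it is
    represented by rank-R CP factors: W = sum_r lam[r] * outer(U[r], V[r]).
    U: (R, d_out), V: (R, d_in), lam: (R,), b: (d_out,).
    """
    # Multiply z by each pair of weight vectors, weight the terms by the
    # importance factors lam[r], sum over the rank range, add the bias b,
    # and apply the activation sigma.
    out = sum(lam[r] * U[r] * (V[r] @ z) for r in range(len(lam)))
    return sigma(out + b)

# The layer matches sigma(W z + b) with the reconstructed equivalent W:
rng = np.random.default_rng(0)
R, d_in, d_out = 4, 8, 6
U, V = rng.normal(size=(R, d_out)), rng.normal(size=(R, d_in))
lam, b, z = rng.normal(size=R), rng.normal(size=d_out), rng.normal(size=d_in)
W = sum(lam[r] * np.outer(U[r], V[r]) for r in range(R))
assert np.allclose(low_rank_linear(z, U, V, lam, b), np.tanh(W @ z + b))
```

The factorized form stores R(d_in + d_out) + R values instead of d_in · d_out, which is the source of the parameter reduction claimed for the module.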
The low-rank convolution layer is computed as
z′(i1, i2, i3) = σ( Σ_{j1, j2, j3} W(j1, j2, j3, i3) · z(i1 + j1 − 1, i2 + j2 − 1, j3) + b(i3) ),
with the convolution kernel in CP-factorized form, W(j1, j2, j3, i3) = Σ_{r=1}^{R} λ_r w_r^(1)(j1) w_r^(2)(j2) w_r^(3)(j3) w_r^(4)(i3),
where z′(i1, i2, i3) and z(j1, j2, j3) are the output elements and input elements of the low-rank convolution layer, respectively; i1, i2, i3 are the subscripts of elements in the output feature, whose value ranges are the dimensions of the output feature; j1, j2, j3 are the subscripts of the convolution kernel, whose value ranges are the dimensions of the convolution kernel;
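A CP-factorized convolution can be sketched for a single channel: a rank-R k×k kernel stored as 1-D factors is applied as a row pass followed by a column pass, costing O(k) per pixel per rank term instead of O(k²). This is a hedged numpy sketch under the single-channel assumption; the function name is illustrative, not the patent's layer.

```python
import numpy as np

def cp_conv2d_single(x, w1, w2, lam):
    """Rank-R separable 2D correlation on a single channel (valid padding).

    The k x k kernel is never materialised; it is stored as CP factors,
    K = sum_r lam[r] * outer(w1[r], w2[r]).
    """
    out = None
    for r in range(len(lam)):
        # row pass with the horizontal factor w2[r] (correlation is
        # convolution with the reversed vector), then column pass with w1[r]
        rows = np.array([np.convolve(row, w2[r][::-1], mode="valid") for row in x])
        cols = np.array([np.convolve(c, w1[r][::-1], mode="valid") for c in rows.T]).T
        out = lam[r] * cols if out is None else out + lam[r] * cols
    return out
```

The two 1-D passes per rank term reproduce exactly the correlation with the reconstructed 2-D kernel, which can be checked against a naive sliding-window reference.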
Implementation of the focusing strategy: the pharmacopoeia-feature lightweight focusing module comprises a low-rank fully connected layer f that maps the input feature, L low-rank convolution layers c that extract a multi-layer representation of the input feature, L gating factors G_l, and low-rank fully connected layers h and q that map the modulation feature and the query feature, respectively. The module cascades the low-rank convolution layers to map the input feature into a multi-layer representation; it then fuses the multi-layer representation with a gating mechanism to obtain the multi-layer integrated feature of the input; finally, two low-rank fully connected layers map the input feature and the multi-layer integrated feature into a query feature and a modulation feature, respectively, and element-wise multiplication of the query feature and the modulation feature yields the key information of the input feature;
Specifically: given the input feature z, the pharmacopoeia-feature lightweight focusing module obtains an initial representation z_0 = f(z) of the input via the low-rank fully connected layer; the cascade of L low-rank convolution layers then yields the multi-layer representations z_l = c(z_{l−1}), l = 1, 2, …, L; the L gating factors G_l are multiplied element-wise with the corresponding representations z_l and the results are superimposed to obtain the multi-layer integrated feature; finally, a low-rank fully connected layer maps the multi-layer integrated feature to a modulation feature and another maps the original input feature z to a query feature in a common feature space, and the element-wise product of the two yields the key information of the input feature, i.e. the focused feature Z. The computation is:
z_0 = f(z);  z_l = c(z_{l−1}), l = 1, …, L;  z_sum = Σ_{l=1}^{L} G_l ⊙ z_l;  Z = q(z) ⊙ h(z_sum),
where ⊙ denotes element-wise multiplication. By using the low-rank fully connected layer and low-rank convolution layers to extract the key information of the input feature, the pharmacopoeia-feature lightweight focusing module effectively reduces the number of model parameters while preserving the extraction quality, improving the module's operating efficiency;
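The focusing strategy above can be sketched with the sub-layers passed in as callables (plain functions stand in for the low-rank layers; names are illustrative assumptions):

```python
import numpy as np

def focusing_module(z, f, convs, gates, h, q):
    """Sketch of the focusing strategy.

    f, h, q stand in for the three low-rank fully connected layers;
    convs for the L cascaded low-rank convolution layers; gates are the
    L gating factors G_l.
    """
    reps = [f(z)]                    # initial representation z_0 = f(z)
    for c in convs:                  # multi-layer representations z_l = c(z_{l-1})
        reps.append(c(reps[-1]))
    # gated fusion of z_1 .. z_L into the multi-layer integrated feature
    z_sum = sum(g * r for g, r in zip(gates, reps[1:]))
    # query feature (from z) modulated element-wise by h(z_sum)
    return q(z) * h(z_sum)
```

With identity stand-ins for f, h and q and two toy "convolutions", the data flow is easy to trace by hand, e.g. z = [1, 2] with convs [v ↦ v + 1, v ↦ 2v] and unit gates gives z_sum = 3(z + 1) and output z ⊙ z_sum.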
step 2, constructing a pharmacopoeia document information identification extraction network
The pharmacopoeia document information recognition and extraction network comprises an encoder and a decoder. The encoder comprises five computation stages: the first stage converts the input H×W×3 pharmacopoeia image into a sequence of HW/16 vectors of length 48; the second stage comprises 2 pharmacopoeia-feature lightweight focusing modules, takes the HW/16 × 48 two-dimensional sequence as input, and converts it into HW/16 output features of length 128; the third stage comprises 2 pharmacopoeia-feature lightweight focusing modules and converts the HW/16 × 128 output of the second stage into HW/64 output features of length 256; the fourth stage comprises 14 pharmacopoeia-feature lightweight focusing modules and converts the HW/64 × 256 features into HW/256 output features of length 512; the fifth stage comprises 2 pharmacopoeia-feature lightweight focusing modules and converts the HW/256 × 512 output of the fourth stage into HW/1024 output features of length 1024. The decoder comprises four computation stages: the first stage takes the encoder output features (HW/1024 × 1024) as input and converts them into output features of length 1024; the second, third and fourth stages perform the same computation as the first. The specific construction process is as follows:
construction of an encoder: the encoder comprises five stages connected end to end, refines the input pharmacopoeia image data to be processed stage by stage, and extracts characteristic information contained in the pharmacopoeia image data;
The first stage is a block-division stage. Given input pharmacopoeia image data x to be processed, with height H, width W and 3 channels, the block-division stage divides the input image into non-overlapping blocks of size 4×4×3; each block has dimension 4×4×3 = 48 and the number of blocks is (H/4)×(W/4) = HW/16, i.e. the input pharmacopoeia image data to be processed is converted into an HW/16 × 48 two-dimensional sequence;
The second stage takes the output of the first stage as input and passes sequentially through a focusing-feature extraction step, a cyclic-shift step, and a second focusing-feature extraction step; it comprises a low-rank fully connected layer and two pharmacopoeia-feature lightweight focusing modules. Specifically: first, a low-rank fully connected layer maps each 48-dimensional block to 128 dimensions, yielding an HW/16 × 128 two-dimensional linear embedding sequence; a pharmacopoeia-feature lightweight focusing module then extracts the focused features of this sequence; the original block-division boundary is then cyclically shifted by half a block along the block diagonal direction to realize information interaction between blocks; finally, a second pharmacopoeia-feature lightweight focusing module extracts the focused features under the new block division as the feature embedding output by this stage;
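The half-block cyclic shift can be sketched with `np.roll` on a 2-D feature map (a hedged sketch assuming block size 4 and a diagonal shift of 2; function names are illustrative, and the shifted-window bookkeeping of the real network is omitted):

```python
import numpy as np

def cyclic_shift(feat, block=4):
    """Shift the map by half a block along the diagonal, so that a fixed
    block grid covers new cross-block neighbourhoods."""
    s = block // 2
    return np.roll(feat, shift=(-s, -s), axis=(0, 1))

def inverse_cyclic_shift(feat, block=4):
    """Undo the shift, restoring the original block-division boundary."""
    s = block // 2
    return np.roll(feat, shift=(s, s), axis=(0, 1))
```

Because `np.roll` wraps elements around, the shift is lossless and exactly invertible, which is what lets the stage alternate between the two block divisions without discarding border content.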
The third stage takes the output of the second stage as input and comprises a low-rank fully connected layer and two pharmacopoeia-feature lightweight focusing modules. First, adjacent 2×2 blocks in the input are spliced, so that the number of blocks is reduced from HW/16 to HW/64 while the block dimension increases to 512; a low-rank fully connected layer then reduces each block dimension to 256; finally, the same focusing-feature extraction, cyclic shift, and focusing-feature extraction flow as in the second stage computes the HW/64 × 256 feature embedding;
The fourth and fifth stages follow the same flow as the third stage; the feature embedding output by the fifth stage is the final output of the encoder;
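The shape bookkeeping of the five encoder stages can be summarized in a short sketch (the function name and 2×2-merging assumption for stages three to five follow the description above; dimensions are from the text):

```python
def encoder_shapes(H, W):
    """Return (sequence length, feature dimension) after each encoder stage
    for an H x W x 3 input: 4x4 patches in stage 1, then feature dims
    128/256/512/1024, with 2x2 block merging from stage 3 onward."""
    n = (H // 4) * (W // 4)          # stage 1: HW/16 blocks of dimension 48
    shapes = [(n, 48)]
    for i, d in enumerate([128, 256, 512, 1024]):
        if i > 0:                    # stages 3-5 merge 2x2 adjacent blocks
            n //= 4
        shapes.append((n, d))
    return shapes
```

For a 224×224 input this yields 3136 × 48, 3136 × 128, 784 × 256, 196 × 512 and 49 × 1024, matching the HW/16 through HW/1024 sequence lengths stated above.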
construction of the decoder: the decoder takes the output of the encoder as input and comprises four stages connected end to end, converts the key information extracted by the encoder into text data conforming to a specific format, and realizes the identification and extraction of the pharmacopoeia document information;
The first stage takes the output of the encoder as input and comprises two pharmacopoeia-feature lightweight focusing modules and two low-rank fully connected layers. First, two low-rank fully connected layers map the position information and the input features into the same dimension space so that they can be combined; two consecutive pharmacopoeia-feature lightweight focusing modules then refine the position-aware input features; finally, two consecutive low-rank fully connected layers expand the refined feature dimension to 4 times and then restore the original dimension, a scaling process that effectively fuses the feature's internal information and generates the stage's output features;
The second, third and fourth stages take the output of the previous stage as input and further integrate the feature's internal information with two consecutive pharmacopoeia-feature lightweight focusing modules and two consecutive low-rank fully connected layers; the output features of the fourth stage are mapped through a low-rank fully connected layer to the same dimension as the encoder output, yielding the text data conforming to the specific format;
step 3, calculating network model loss
The prediction loss of the pharmacopoeia image data feature extraction process is measured, and minimizing it drives the optimization of the parameters of the pharmacopoeia document information recognition and extraction network. Specifically, the prediction loss L_ce measures the difference between the pharmacopoeia data text predicted by the network decoder and the original pharmacopoeia data text, forcing the encoder and decoder to accurately learn the pharmacopoeia image data information. The prediction loss is calculated as:
L_ce = −(1/N) Σ_{i=1}^{N} y_i log ŷ_i,
where y_i and ŷ_i are the i-th original pharmacopoeia data text and the predicted pharmacopoeia data text, respectively, and N is the total number of pharmacopoeia data.
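A cross-entropy prediction loss of this shape can be sketched as follows (a hedged numpy sketch over token indices; the function name and the (N, V) probability layout are assumptions, not the patent's exact formulation):

```python
import numpy as np

def prediction_loss(probs, targets):
    """Cross-entropy L_ce = -(1/N) * sum_i log p_i(y_i).

    probs: (N, V) predicted distributions over the text vocabulary;
    targets: (N,) indices of the original pharmacopoeia text tokens.
    """
    n = len(targets)
    # pick each row's probability of the true token; the small epsilon
    # guards log(0) for badly calibrated predictions
    picked = probs[np.arange(n), targets]
    return -np.mean(np.log(picked + 1e-12))
```

A confident correct prediction contributes nearly zero loss while an uncertain one contributes -log of its assigned probability, which is what pushes the encoder and decoder toward accurate text reconstruction.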
Drawings
Fig. 1 is a flow chart of a lightweight pharmacopoeia picture text extraction method;
figure 2 is a frame diagram of a lightweight pharmacopoeia picture text extraction method.
Detailed Description
Embodiments of the present invention will be further described with reference to the accompanying drawings.
Fig. 2 is a frame diagram of the lightweight pharmacopoeia picture text extraction method. The original pharmacopoeia image data is first fed into the encoder of the pharmacopoeia document information recognition and extraction network, and the key information contained in the input image is extracted by the pharmacopoeia-feature lightweight focusing modules in the encoder. The key information extracted by the encoder is then converted by the decoder of the network and mapped into text data conforming to a specific format, realizing the recognition and extraction of the pharmacopoeia document information. Finally, the model loss is computed with the prediction loss function to guide the optimization learning of all network parameters and improve the accuracy of the network's extraction.
The method comprises the following steps:
step 1, constructing a pharmacopoeia characteristic light focusing module
The construction of the pharmacopoeia characteristic light focusing module comprises the construction of a low-rank neural network layer and the realization of a focusing strategy;
Construction of the low-rank neural network layer: according to the tensor CP decomposition principle, the weighted computation of a neural network layer is performed with the principal components of the network weights, constructing a low-rank neural network layer;
Specifically, the low-rank neural network layer involves the layer's input feature z, output feature z′, an activation function σ, a bias vector b, importance factors λ_r, and K weight vectors w_r^(1), …, w_r^(K) of the layer. The pharmacopoeia-feature lightweight focusing module multiplies the feature z by the K weight vectors in turn, performs a weighted summation with the importance factors λ_r over the value range of the tensor rank r, superimposes the bias vector b, and finally obtains the result z′ through the activation function σ. The low-rank fully connected layer is computed as
z′ = σ( Σ_{r=1}^{R} λ_r ( w_r^(1) ∘ w_r^(2) ∘ … ∘ w_r^(K) ) z + b ),
where W = Σ_{r=1}^{R} λ_r w_r^(1) ∘ w_r^(2) ∘ … ∘ w_r^(K) is the equivalent network weight tensor, ∘ denotes the vector outer product, and R is the set value range of the tensor rank r.
The low-rank convolution layer is computed as
z′(i1, i2, i3) = σ( Σ_{j1, j2, j3} W(j1, j2, j3, i3) · z(i1 + j1 − 1, i2 + j2 − 1, j3) + b(i3) ),
with the convolution kernel in CP-factorized form, W(j1, j2, j3, i3) = Σ_{r=1}^{R} λ_r w_r^(1)(j1) w_r^(2)(j2) w_r^(3)(j3) w_r^(4)(i3),
where z′(i1, i2, i3) and z(j1, j2, j3) are the output elements and input elements of the low-rank convolution layer, respectively. i1, i2, i3 are the subscripts of elements in the output feature, whose value ranges are the dimensions of the output feature. j1, j2, j3 are the subscripts of the convolution kernel, whose value ranges are the dimensions of the convolution kernel.
Implementation of the focusing strategy: the pharmacopoeia-feature lightweight focusing module comprises a low-rank fully connected layer f that maps the input feature, L low-rank convolution layers c that extract a multi-layer representation of the input feature, L gating factors G_l, and low-rank fully connected layers h and q that map the modulation feature and the query feature, respectively. The module cascades the low-rank convolution layers to map the input feature into a multi-layer representation; it then fuses the multi-layer representation with a gating mechanism to obtain the multi-layer integrated feature of the input; finally, two low-rank fully connected layers map the input feature and the multi-layer integrated feature into a query feature and a modulation feature, respectively, and element-wise multiplication of the query feature and the modulation feature yields the key information of the input feature;
Specifically: given the input feature z, the pharmacopoeia-feature lightweight focusing module obtains an initial representation z_0 = f(z) of the input via the low-rank fully connected layer; the cascade of L low-rank convolution layers then yields the multi-layer representations z_l = c(z_{l−1}), l = 1, 2, …, L; the L gating factors G_l are multiplied element-wise with the corresponding representations z_l and the results are superimposed to obtain the multi-layer integrated feature; finally, a low-rank fully connected layer maps the multi-layer integrated feature to a modulation feature and another maps the original input feature z to a query feature in a common feature space, and the element-wise product of the two yields the key information of the input feature, i.e. the focused feature Z. The computation is:
z_0 = f(z);  z_l = c(z_{l−1}), l = 1, …, L;  z_sum = Σ_{l=1}^{L} G_l ⊙ z_l;  Z = q(z) ⊙ h(z_sum),
where ⊙ denotes element-wise multiplication and h and q are the low-rank fully connected layers generating the modulation feature and the query feature, respectively. By using the low-rank fully connected layer and low-rank convolution layers to extract the key information of the input feature, the pharmacopoeia-feature lightweight focusing module effectively reduces the number of model parameters while preserving the extraction quality, improving the module's operating efficiency;
step 2, constructing a pharmacopoeia document information identification extraction network
The pharmacopoeia document information recognition and extraction network comprises an encoder and a decoder. The encoder comprises five computation stages: the first stage converts the input H×W×3 pharmacopoeia image into a sequence of HW/16 vectors of length 48; the second stage comprises 2 pharmacopoeia-feature lightweight focusing modules, takes the HW/16 × 48 two-dimensional sequence as input, and converts it into HW/16 output features of length 128; the third stage comprises 2 pharmacopoeia-feature lightweight focusing modules and converts the HW/16 × 128 output of the second stage into HW/64 output features of length 256; the fourth stage comprises 14 pharmacopoeia-feature lightweight focusing modules and converts the HW/64 × 256 features into HW/256 output features of length 512; the fifth stage comprises 2 pharmacopoeia-feature lightweight focusing modules and converts the HW/256 × 512 output of the fourth stage into HW/1024 output features of length 1024. The decoder comprises four computation stages: the first stage takes the encoder output features (HW/1024 × 1024) as input and converts them into output features of length 1024; the second, third and fourth stages perform the same computation as the first. The specific construction process is as follows.
Construction of an encoder: the encoder comprises five stages connected end to end, refines the input pharmacopoeia image data to be processed stage by stage, and extracts characteristic information contained in the pharmacopoeia image data;
The first stage is a block-division stage. Given input pharmacopoeia image data x to be processed, with height H, width W and 3 channels, the block-division stage divides the input image into non-overlapping blocks of size 4×4×3; each block has dimension 4×4×3 = 48 and the number of blocks is (H/4)×(W/4) = HW/16, i.e. the input pharmacopoeia image data to be processed is converted into an HW/16 × 48 two-dimensional sequence;
The second stage takes the output of the first stage as input and passes sequentially through a focusing-feature extraction step, a cyclic-shift step, and a second focusing-feature extraction step; it comprises a low-rank fully connected layer and two pharmacopoeia-feature lightweight focusing modules. Specifically: first, a low-rank fully connected layer maps each 48-dimensional block to 128 dimensions, yielding an HW/16 × 128 two-dimensional linear embedding sequence; a pharmacopoeia-feature lightweight focusing module then extracts the focused features of this sequence; the original block-division boundary is then cyclically shifted by half a block along the block diagonal direction to realize information interaction between blocks; finally, a second pharmacopoeia-feature lightweight focusing module extracts the focused features under the new block division as the feature embedding output by this stage;
The third stage takes the output of the second stage as input and comprises a low-rank fully connected layer and two pharmacopoeia-feature lightweight focusing modules. First, adjacent 2×2 blocks in the input are spliced, so that the number of blocks is reduced from HW/16 to HW/64 while the block dimension increases to 512; a low-rank fully connected layer then reduces each block dimension to 256; finally, the same focusing-feature extraction, cyclic shift, and focusing-feature extraction flow as in the second stage computes the HW/64 × 256 feature embedding;
The fourth and fifth stages follow the same flow as the third stage; the feature embedding output by the fifth stage is the final output of the encoder;
construction of the decoder: the decoder takes the output of the encoder as input and comprises four stages connected end to end, converts the key information extracted by the encoder into text data conforming to a specific format, and realizes the identification and extraction of the pharmacopoeia document information;
The first stage takes the output of the encoder as input and comprises two pharmacopoeia-feature lightweight focusing modules and two low-rank fully connected layers. First, two low-rank fully connected layers map the position information and the input features into the same dimension space so that they can be combined; two consecutive pharmacopoeia-feature lightweight focusing modules then refine the position-aware input features; finally, two consecutive low-rank fully connected layers expand the refined feature dimension to 4 times and then restore the original dimension, a scaling process that effectively fuses the feature's internal information and generates the stage's output features;
The second, third and fourth stages take the output of the previous stage as input and further integrate the feature's internal information with two consecutive pharmacopoeia-feature lightweight focusing modules and two consecutive low-rank fully connected layers; the output features of the fourth stage are mapped through a low-rank fully connected layer to the same dimension as the encoder output, yielding the text data conforming to the specific format;
step 3, calculating network model loss
Measure the prediction loss in the pharmacopoeia image data feature extraction process, and drive the optimization of the pharmacopoeia document information recognition and extraction network parameters by minimizing this loss; specifically, the prediction loss L_ce (cross entropy) measures the difference between the pharmacopoeia data text predicted by the decoder of the pharmacopoeia document information recognition and extraction network and the original pharmacopoeia data text, forcing the encoder and decoder to accurately learn the pharmacopoeia image data information; the prediction loss is calculated as follows:

L_ce = -(1/N) · Σ_{i=1}^{N} y_i · log(ŷ_i)

where y_i and ŷ_i are the i-th original pharmacopoeia data text and the i-th predicted pharmacopoeia data text respectively, and N is the total number of pharmacopoeia data.
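The prediction loss above is a standard cross entropy; a minimal sketch, assuming the decoder emits a probability row over the vocabulary for each target token (function name and toy values are hypothetical):

```python
import numpy as np

def cross_entropy_loss(probs, targets):
    """Mean cross entropy L_ce between predicted token distributions and
    ground-truth token ids.
    probs: (N, V) array of predicted probability rows.
    targets: length-N sequence of true token indices."""
    n = len(targets)
    # average negative log-probability assigned to the correct tokens
    return -sum(np.log(probs[i, t]) for i, t in enumerate(targets)) / n
```

Minimizing this quantity pushes the predicted distribution toward the original pharmacopoeia text, which is exactly the pressure the loss places on the encoder and decoder.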
Table 1 The pharmacopoeia document information recognition and extraction network structure of the present invention
In Table 1, LRLinear(48, 128, 20, True) is a low-rank fully-connected layer with input dimension 48, output dimension 128, rank 20 and bias, representing the linear embedding mapping of the second stage of the encoder that maps 48-dimensional blocks to 128 dimensions. LRConv(8, (3, 3), 1, True) is a low-rank convolution layer with 8 input channels, 8 output channels, a 3×3 convolution kernel, rank 1, stride 1, one layer of zero padding and bias, representing the multi-layer representation extraction process of focusing module a in the second stage of the encoder. The low-rank fully-connected layers marked (f), (h) and (q) in the table are, respectively, the layer mapping the input features, the layer mapping the modulation features and the layer mapping the query features in the pharmacopoeia feature lightweight focusing module, and the low-rank convolution layers marked (c1) and (c2) are the 2 low-rank convolution layers extracting the multi-layer representation of the input features. Each focusing module in Table 1 has two consecutive low-rank convolution layers, reflecting that the L value of the focusing module is set to 2. In the table, z1 is the output of low-rank convolution layer c1, z2 is the output of low-rank convolution layer c2, g1 and g2 are the gating factors corresponding to z1 and z2 respectively, and z_sum is the multi-layer integration feature obtained by the weighted sum of z1·g1 and z2·g2. GERU consists of a low-rank fully-connected layer, an activation function and another low-rank fully-connected layer: the input features are first expanded to 4 times their original dimension, an activation function is applied, and the dimension is then reduced back to the original, promoting the fusion of internal feature information.
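The parameter saving behind Table 1's low-rank layers can be illustrated with a sketch of LRLinear (a hypothetical implementation; only the factored-storage idea is taken from the table):

```python
import numpy as np

rng = np.random.default_rng(0)

class LRLinear:
    """Sketch of a low-rank fully-connected layer: instead of a dense
    d_out x d_in weight matrix, store two thin factors U (d_out x rank)
    and V (rank x d_in), so the full product W = U @ V is never
    materialized."""
    def __init__(self, d_in, d_out, rank, bias=True):
        self.U = rng.standard_normal((d_out, rank)) / np.sqrt(rank)
        self.V = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
        self.b = np.zeros(d_out) if bias else None

    def n_params(self):
        return self.U.size + self.V.size + (self.b.size if self.b is not None else 0)

    def __call__(self, z):
        out = self.U @ (self.V @ z)   # factored product, O(rank*(d_in+d_out))
        return out + self.b if self.b is not None else out

# LRLinear(48, 128, 20, True): 20*(48+128) + 128 = 3648 parameters,
# versus 48*128 + 128 = 6272 for a dense layer with bias.
```

This is the mechanism by which the network keeps its total parameter count low while preserving the mapping dimensions listed in Table 1.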
Verification result
In the experiments of the present invention, the public CORD consolidated receipt dataset was selected, and a dataset composed of entries randomly selected from Part I of the Chinese Pharmacopoeia 2020 edition was constructed, to verify the validity of the present invention; the specific information of the datasets is shown in Table 2.
CORD consolidated receipt dataset: consists of 1000 Latin-script receipt images, which contain relatively complex nested structures in addition to some common fields.
Chinese pharmacopoeia dataset: 1000 pictures randomly selected from Part I of the 2020 edition, where each document picture contains information such as prescription, preparation method, description and identification.
Table 2 Specific information of the datasets

Data set | Number of samples | Language
CORD | 1000 | Latin
Chinese pharmacopoeia | 1000 | Chinese
Evaluation criteria used in the present invention: parameter count required for information extraction (Params, unit: M), field-level F1-score (F1), and Tree Edit Distance (TED) based accuracy (ACC).
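Of these metrics, the field-level F1 can be sketched as follows. This uses one common convention (a predicted field counts as correct only on an exact (key, value) match); the patent does not spell out its matching rule, so the helper and its values are illustrative.

```python
def field_f1(pred_fields, gold_fields):
    """Field-level F1 over sets of (key, value) pairs extracted from a
    document: precision over predicted fields, recall over gold fields,
    harmonic mean of the two."""
    pred, gold = set(pred_fields), set(gold_fields)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)          # exact (key, value) matches
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)
```

For example, predicting one of two gold fields correctly and one incorrectly yields precision = recall = 0.5 and thus F1 = 0.5.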
To verify the effect of the present invention, 4 general visual document understanding models were selected for comparison: the spatial-information-based bidirectional encoder representations model (BROS), the improved general document understanding pre-training model (LayoutLMv2), the spatial-dependency semi-structured document information extraction parsing model (SPADE), and the end-to-end weakly supervised document generation parsing model (WYVERN).
The comparative results of the Params, F1 and ACC performance of the proposed method on the CORD dataset and the Chinese pharmacopoeia dataset are shown in Tables 3, 4 and 5.
Table 3 Comparison of the number of parameters required by each method

Method | BROS | LayoutLMv2 | SPADE | WYVERN | Present invention
Params/M | 141 | 190 | 156 | 170 | 31
Table 4 Comparison of the performance of the methods on the CORD dataset

Method | F1/% | ACC/%
BROS | 83.7 | 80.3
LayoutLMv2 | 88.9 | 87.0
SPADE | 83.1 | 84.5
WYVERN | 62.8 | 70.5
Present invention | 89.6 | 88.5
Table 5 comparison of the performance of the methods on the chinese pharmacopoeia dataset
From Tables 3, 4 and 5 it can be observed that, on all three evaluation metrics (Params, F1 and ACC), the method of the present invention outperforms the baseline methods on both the standard CORD dataset and the real Chinese pharmacopoeia dataset. Specifically, on the CORD dataset, whose information structure is complex, the ACC obtained by the present method is superior to the other baselines, showing that the method not only effectively extracts the key information in a document but also applies well to complex information structures. On the Chinese pharmacopoeia dataset, the F1 score of the present invention only slightly leads LayoutLMv2, but on the ACC metric the present invention has a clear advantage over the other baselines, indicating a stronger extraction capability on real pharmacopoeia data. In addition, the Chinese pharmacopoeia data are written in Chinese, with its complex character set, and the good scores on this dataset reflect an efficient and accurate information extraction capability on documents with complex character sets. Meanwhile, the number of parameters required during computation and inference is far lower than that of the other methods, effectively reducing the computation and inference cost.

Claims (1)

1. A lightweight pharmacopoeia picture and text extraction method, characterized by comprising the following steps:
step 1, constructing a pharmacopoeia feature lightweight focusing module
the construction of the pharmacopoeia feature lightweight focusing module comprises the construction of a low-rank neural network layer and the realization of a focusing strategy;
construction of the low-rank neural network layer: according to the tensor CP decomposition principle, the principal components of the network weights are used to perform the weighting calculations within the neural network layer, constructing a low-rank neural network layer;
specifically, the low-rank neural network layer comprises an input feature z, an output feature z′, an activation function σ, a bias vector b, importance factors λ_r, and K weight vectors w_r^(1), …, w_r^(K) of the layer; the pharmacopoeia feature lightweight focusing module multiplies the input feature z by the K weight vectors in turn, performs a weighted summation with the importance factors λ_r over the value range R of the tensor rank, adds the bias vector b, and finally obtains the output feature z′ through the activation function σ; the specific formula of the low-rank fully-connected layer calculation process is as follows:

z′ = σ(Wz + b),  W = Σ_{r=1}^{R} λ_r · w_r^(1) ∘ w_r^(2) ∘ … ∘ w_r^(K)

where W is the equivalent network weight tensor, ∘ denotes the vector outer product, and R is the preset value range of the tensor rank;
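The fully-connected case of this construction (K = 2 weight vectors per rank component, so the equivalent weight is W = Σ_r λ_r · u_r ∘ v_r) can be checked numerically with a short sketch. Names are illustrative and the activation σ is omitted:

```python
import numpy as np

def low_rank_linear(z, lam, U, V, b):
    """Apply z' = W z + b with W = sum_r lam[r] * outer(U[:, r], V[:, r])
    without ever forming W: project z onto the R right factors, weight
    the projections by the importance factors, then expand through the
    R left factors."""
    return U @ (lam * (V.T @ z)) + b
```

By construction this is identical to multiplying by the explicit rank-R sum of outer products, while touching only R·(d_in + d_out) weights.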
the specific formula of the low-rank convolution layer calculation process is as follows:

z′_{i1,i2,i3} = σ( Σ_{j1,j2,j3} Σ_{r=1}^{R} λ_r · w_r^(1)(j1) · w_r^(2)(j2) · w_r^(3)(j3) · w_r^(4)(i3) · z_{i1+j1, i2+j2, j3} + b_{i3} )

where z′_{i1,i2,i3} and z_{i1+j1,i2+j2,j3} are the output elements and input elements of the low-rank convolution layer respectively; i1, i2, i3 are the subscripts of the elements in the output feature, whose value ranges are the dimensions of the output feature; j1, j2, j3 are the subscripts of the elements in the convolution kernel, whose value ranges are the dimensions of the convolution kernel;
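A direct (unoptimized) sketch of this low-rank convolution, reconstructing the CP-factored kernel for clarity. Stride 1 with no padding is used and the activation σ is omitted; all names are illustrative:

```python
import numpy as np

def cp_conv2d(x, lam, w1, w2, w3, w4, bias):
    """Convolve x (H x W x c_in) with a kernel stored only through its
    CP factors: K[j1, j2, c, o] = sum_r lam[r]*w1[j1,r]*w2[j2,r]*w3[c,r]*w4[o,r].
    The kernel is rebuilt here for readability; an efficient version
    would apply the factors as a chain of cheap 1-D and pointwise
    convolutions instead."""
    K = np.einsum('r,ar,br,cr,dr->abcd', lam, w1, w2, w3, w4)
    kh, kw, cin, cout = K.shape
    H, W, _ = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1, cout))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # correlate the kernel with the local patch (valid mode)
            out[i, j] = np.einsum('abc,abcd->d', x[i:i+kh, j:j+kw, :], K) + bias
    return out
```

Only the rank-R factors are stored, so for a 3×3 kernel with rank 1 (as in Table 1's LRConv) the weight count drops from kh·kw·c_in·c_out to roughly kh + kw + c_in + c_out.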
implementation of the focusing strategy: the pharmacopoeia feature lightweight focusing module comprises a low-rank fully-connected layer f mapping the input features, L low-rank convolution layers c extracting the multi-layer representation of the input features, L gating factors G_l, and low-rank fully-connected layers h and q mapping the modulation features and the query features respectively; the pharmacopoeia feature lightweight focusing module cascades the several low-rank convolution layers to map the input features into a multi-layer representation; the multi-layer representation is then fused by a gating mechanism to obtain the multi-layer integration feature of the input features; finally, the input features and the multi-layer integration feature are mapped into query features and modulation features by the two low-rank fully-connected layers respectively, and the key information of the input features is obtained by the element-wise multiplication of the query features and the modulation features;
specifically: given the input feature z, the pharmacopoeia feature lightweight focusing module obtains an initial representation z_0 = f(z) of the input feature through the low-rank fully-connected layer mapping; the cascade of L low-rank convolution layers then yields the multi-layer representation z_l = c(z_{l-1}), l = 1, 2, …, L; the L gating factors G_l and the multi-layer representations z_l are multiplied element-wise and summed to obtain the multi-layer integration feature z_sum = Σ_{l=1}^{L} G_l ⊙ z_l; finally, the multi-layer integration feature and the original input feature z are mapped by low-rank fully-connected layers into the modulation feature and the query feature in a common feature space, and the key information of the input features, namely the focus feature Z, is obtained by the element-wise multiplication of the two: Z = q(z) ⊙ h(z_sum);
the pharmacopoeia feature lightweight focusing module extracts the key information of the input features using the low-rank fully-connected layers and low-rank convolution layers, effectively reducing the number of model parameters while guaranteeing the feature extraction effect and improving the operating efficiency of the module;
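The control flow of the focusing strategy can be sketched end to end. The maps f, h, q and the conv layers are passed in as plain callables and the gating factors as scalars, so what is shown is the dataflow of the module, not any particular low-rank layer:

```python
import numpy as np

def focusing_module(z, f, convs, gates, h, q):
    """Focusing strategy sketch: initial map z_0 = f(z), cascaded
    multi-layer representations z_l = c(z_{l-1}), gated accumulation
    z_sum = sum_l G_l * z_l, and focus feature Z = q(z) * h(z_sum)
    (elementwise products throughout)."""
    rep = f(z)                      # initial representation z_0
    z_sum = np.zeros_like(rep)
    for c, g in zip(convs, gates):  # L conv layers and gating factors
        rep = c(rep)                # multi-layer representation z_l
        z_sum = z_sum + g * rep     # gated accumulation into z_sum
    return q(z) * h(z_sum)          # query (*) modulation -> focus feature
```

With identity maps for f, h and q and two toy "conv" layers, the output is simply the input modulated by the gated sum of the two intermediate representations.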
step 2, constructing a pharmacopoeia document information identification extraction network
The pharmacopoeia document information recognition and extraction network comprises an encoder and a decoder; the encoder comprises five calculation stages: the first stage converts the pharmacopoeia image input of size H×W×3 into a (H/4)×(W/4) two-dimensional sequence of length 48; the second stage comprises 2 pharmacopoeia feature lightweight focusing modules and takes the (H/4)×(W/4) two-dimensional sequence as input, converting it into (H/4)×(W/4) output features of length 128; the third stage comprises 2 pharmacopoeia feature lightweight focusing modules and takes the (H/4)×(W/4) two-dimensional features of length 128 as input, converting them into (H/8)×(W/8) output features of length 256; the fourth stage comprises 14 pharmacopoeia feature lightweight focusing modules and takes the (H/8)×(W/8) features as input, converting them into (H/16)×(W/16) output features of length 512; the fifth stage comprises 2 pharmacopoeia feature lightweight focusing modules and takes the (H/16)×(W/16) features as input, converting them into (H/32)×(W/32) output features of length 1024; the decoder comprises four calculation stages: the first stage takes the (H/32)×(W/32) encoder output features of length 1024 as input and converts them into output features of the same size and length 1024; the second, third and fourth stages perform the same calculation as the first stage; the specific construction process is as follows:
construction of the encoder: the encoder comprises five stages connected end to end, refining the input pharmacopoeia image data to be processed stage by stage and extracting the feature information contained in the pharmacopoeia image data;
the first stage is the block division stage; given x as the input pharmacopoeia image data to be processed, with height H, width W and 3 channels, the block division stage splits the input image into non-overlapping blocks of size 4×4×3, each block having dimension 4×4×3 = 48 and the number of blocks being (H/4)×(W/4); i.e., the input pharmacopoeia image data to be processed is converted into a (H/4)×(W/4) two-dimensional sequence;
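The block division stage can be sketched in a few lines (a hypothetical helper, using the 4×4×3 blocks of the text):

```python
import numpy as np

def patch_partition(img, p=4):
    """Split an H x W x C image into non-overlapping p x p blocks and
    flatten each block into a p*p*C-dimensional vector, yielding a
    sequence of (H/p)*(W/p) tokens (H and W assumed divisible by p)."""
    H, W, C = img.shape
    blocks = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return blocks.reshape((H // p) * (W // p), p * p * C)
```

An 8×8×3 image, for instance, becomes 4 tokens of dimension 48, matching the 48-dimensional blocks fed to the encoder's linear embedding.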
the second stage takes the output of the first stage as input and passes sequentially through a focus feature extraction step, a cyclic shift step and another focus feature extraction step; it comprises one low-rank fully-connected layer and two pharmacopoeia feature lightweight focusing modules; specifically: first, a low-rank fully-connected layer maps each 48-dimensional block to 128 dimensions, obtaining a (H/4)×(W/4) two-dimensional linearly embedded sequence; a pharmacopoeia feature lightweight focusing module then extracts the focus features of the (H/4)×(W/4) sequence; the original block division boundary is then cyclically shifted by half a block distance along the block diagonal direction to realize information interaction between blocks; finally, a second pharmacopoeia feature lightweight focusing module extracts the focus features under the new block division as the feature embedding output by the second stage;
the third stage takes the output of the second stage as input and comprises one low-rank fully-connected layer and two pharmacopoeia feature lightweight focusing modules; first, adjacent 2×2 blocks in the input are spliced, so that the number of blocks is reduced from (H/4)×(W/4) to (H/8)×(W/8) while the block dimension increases to 512; the dimension of each block is then reduced to 256 by a low-rank fully-connected layer; finally, the same focus feature extraction - cyclic shift - focus feature extraction process as in the second stage is used to compute the (H/8)×(W/8) feature embedding;
the fourth and fifth stages follow the same flow as the third stage; the feature embedding output by the fifth stage is the final output of the encoder;
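The cyclic shift between the two focus feature extractions can be sketched as rolling the feature map by half a block along both spatial axes, so that the new block partition straddles the old block boundaries. The helper and the shift direction are illustrative choices:

```python
import numpy as np

def cyclic_shift(feat, p=4):
    """Roll a 2-D feature map by half a block (p // 2 positions) along
    both spatial axes, i.e. along the block diagonal; elements pushed
    past one edge wrap around to the other."""
    s = p // 2
    return np.roll(feat, shift=(-s, -s), axis=(0, 1))
```

After this shift, blocks cut at the new boundaries mix rows and columns that belonged to four different blocks of the original partition, which is what enables the inter-block information interaction described above.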
construction of the decoder: the decoder takes the output of the encoder as input and comprises four stages connected end to end; it converts the key information extracted by the encoder into text data conforming to a specific format, realizing the recognition and extraction of the pharmacopoeia document information;
the first stage takes the output of the encoder as input and comprises two pharmacopoeia feature lightweight focusing modules and two low-rank fully-connected layers; first, two low-rank fully-connected layers map the position information and the input features into the same dimensional space so they can be combined; then two consecutive pharmacopoeia feature lightweight focusing modules refine the input features carrying the position information; finally, two consecutive low-rank fully-connected layers expand the refined feature dimension to 4 times and then restore the original dimension, and this scaling process effectively fuses the internal information of the features to generate the output features of the stage;
the second, third and fourth stages each take the output of the previous stage as input and further integrate the internal feature information using two consecutive pharmacopoeia feature lightweight focusing modules and two consecutive low-rank fully-connected layers; the output features of the fourth stage are mapped through a low-rank fully-connected layer to the same dimension as the encoder output, i.e., the text data conforming to the specific format;
step 3, calculating network model loss
Measure the prediction loss in the pharmacopoeia image data feature extraction process, and drive the optimization of the pharmacopoeia document information recognition and extraction network parameters by minimizing this loss; specifically, the prediction loss L_ce (cross entropy) measures the difference between the pharmacopoeia data text predicted by the decoder of the pharmacopoeia document information recognition and extraction network and the original pharmacopoeia data text, forcing the encoder and decoder to accurately learn the pharmacopoeia image data information; the prediction loss is calculated as follows:

L_ce = -(1/N) · Σ_{i=1}^{N} y_i · log(ŷ_i)

where y_i and ŷ_i are the i-th original pharmacopoeia data text and the i-th predicted pharmacopoeia data text respectively, and N is the total number of pharmacopoeia data.
CN202211539551.6A 2022-12-02 2022-12-02 Lightweight pharmacopoeia picture and text extraction method Active CN116704537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211539551.6A CN116704537B (en) 2022-12-02 2022-12-02 Lightweight pharmacopoeia picture and text extraction method


Publications (2)

Publication Number Publication Date
CN116704537A CN116704537A (en) 2023-09-05
CN116704537B true CN116704537B (en) 2023-11-03

Family

ID=87842135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211539551.6A Active CN116704537B (en) 2022-12-02 2022-12-02 Lightweight pharmacopoeia picture and text extraction method

Country Status (1)

Country Link
CN (1) CN116704537B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985369A (en) * 2020-08-07 2020-11-24 西北工业大学 Course field multi-modal document classification method based on cross-modal attention convolution neural network
CN113920210A (en) * 2021-06-21 2022-01-11 西北工业大学 Image low-rank reconstruction method based on adaptive graph learning principal component analysis method
CN114418886A (en) * 2022-01-19 2022-04-29 电子科技大学 Robustness denoising method based on deep convolution self-encoder

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant