WO2023221293A1 - Document information extraction method, apparatus, device and medium based on image processing

Document information extraction method, apparatus, device and medium based on image processing

Info

Publication number
WO2023221293A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
information
image
vector
coding
Prior art date
Application number
PCT/CN2022/108443
Other languages
English (en)
French (fr)
Inventor
陈东来
Original Assignee
深圳前海环融联易信息科技服务有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海环融联易信息科技服务有限公司
Publication of WO2023221293A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/25 — Fusion techniques
    • G06F 18/253 — Fusion techniques of extracted features
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 — Handling natural language data
    • G06F 40/10 — Text processing
    • G06F 40/12 — Use of codes for handling textual entities
    • G06F 40/126 — Character encoding
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks

Definitions

  • the present application relates to the technical field of document information recognition, and in particular to a document information extraction method, device, equipment and medium based on image processing.
  • the embodiments of the present application provide a document information extraction method, device, equipment and medium based on image processing, aiming to solve the problem in the existing technical methods that the required document information cannot be accurately extracted from the image efficiently.
  • embodiments of the present application provide a method for extracting document information based on image processing.
  • the method includes:
  • the character encoding sequence is parsed according to preset encoding parsing rules to obtain document information corresponding to the information extraction task.
  • embodiments of the present application provide a document information extraction device based on image processing, which includes:
  • a coding feature information acquisition unit configured to receive the input information extraction task and perform feature coding processing on the document image to be processed in the information extraction task to obtain corresponding coding feature information;
  • an input vector set acquisition unit configured to segment and convert the coding feature information according to the pixel coordinate position of the document image to be processed, so as to obtain an input vector set composed of multiple coding feature vectors;
  • an image weight feature vector acquisition unit configured to input the input vector set to a preset multi-head self-attention neural network to calculate the corresponding image weight feature vector;
  • a combined feature vector acquisition unit configured to combine the task information in the information extraction task with the image weight feature vector to obtain a combined feature vector;
  • a character encoding sequence acquisition unit configured to simultaneously input the image weight feature vector and the combined feature vector to a preset decoder to perform vector integration decoding to obtain the corresponding character encoding sequence; and
  • a document information acquisition unit configured to parse the character encoding sequence according to preset encoding parsing rules to obtain document information corresponding to the information extraction task.
  • embodiments of the present application provide a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the image processing-based document information extraction method described in the first aspect.
  • embodiments of the present application further provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when executed by a processor, the computer program causes the processor to execute the image processing-based document information extraction method described in the first aspect.
  • Embodiments of the present application provide a document information extraction method, device, equipment and medium based on image processing.
  • the document image to be processed in the information extraction task is feature encoded to obtain the coded feature information and segmented and converted to obtain an input vector set.
  • the input vector set is input into the multi-head self-attention neural network to calculate the image weight feature vector.
  • the task information in the information extraction task is combined with the image weight feature vector to obtain the combined feature vector.
  • the image weight feature vector and the combined feature vector are simultaneously input to the decoder for vector integration decoding to obtain a character encoding sequence, and the character encoding sequence is parsed to obtain the document information corresponding to the information extraction task.
  • Figure 1 is a schematic flowchart of a document information extraction method based on image processing provided by an embodiment of the present application
  • Figure 2 is a schematic sub-flow diagram of the document information extraction method based on image processing provided by the embodiment of the present application;
  • Figure 3 is another sub-flow schematic diagram of the document information extraction method based on image processing provided by the embodiment of the present application;
  • Figure 4 is a schematic diagram of another sub-flow of the document information extraction method based on image processing provided by the embodiment of the present application;
  • Figure 5 is a schematic diagram of another sub-flow of the document information extraction method based on image processing provided by the embodiment of the present application;
  • Figure 6 is a schematic diagram of another sub-flow of the document information extraction method based on image processing provided by the embodiment of the present application;
  • Figure 7 is a schematic diagram of another sub-flow of the document information extraction method based on image processing provided by the embodiment of the present application.
  • Figure 8 is a schematic block diagram of a document information extraction device based on image processing provided by an embodiment of the present application.
  • Figure 9 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • Figure 1 is a schematic flow chart of an image processing-based document information extraction method provided by an embodiment of the present application.
  • the image processing-based document information extraction method is applied in a user terminal or a management server and is executed by application software installed on the user terminal or the management server.
  • the user terminal can be used to execute the image processing-based document information extraction method to analyze the input information extraction task and extract, from the document image to be processed in the task, the document information corresponding to the task.
  • the user terminal can be a terminal device such as a desktop computer, laptop computer, tablet computer or mobile phone.
  • the management server is used to execute the image processing-based document information extraction method to analyze the information extraction tasks uploaded by the user terminal and extract, from the document image to be processed in each task, the document information corresponding to the task; it can be, for example, a server built within an enterprise or a government department. As shown in Figure 1, the method includes steps S110 to S160.
  • after the input information extraction task is received, the document image to be processed in the task can be processed.
  • the information extraction task includes the document image to be processed and the task information.
  • the information extraction task can include one or more document images to be processed; if it contains multiple document images to be processed, their document types are the same. The task information is the information corresponding to one type of document. For example, if the information extraction task is an invoice information extraction task, it can include one or more invoice images.
  • each invoice image is then a document image to be processed, and the task information is the task setting information corresponding to invoices.
  • an information extraction task can also be a contract information extraction task, a form information extraction task, and so on.
  • the method in the embodiment of the present application is implemented based on the Transformer model.
  • feature encoding of the document image to be processed can be performed first, and the obtained encoded feature information is input into the Transformer model for subsequent analysis to realize document information extraction.
  • the embodiment of this application takes an information extraction task that contains only one document image to be processed as an example.
  • the method of extracting document information from multiple document images to be processed with the same document type in one information extraction task can be deduced in the same way.
  • Feature coding can be performed on the document image to be processed in the information extraction task to obtain the corresponding coding feature information.
  • the coding feature information is information that uses image coding to represent the characteristics of the document image to be processed.
  • step S110 includes sub-steps S111 and S112.
  • the document image to be processed can first be converted into an image of a preset size, and the image of the preset size can be converted into corresponding tensor feature information according to the image conversion rules.
  • for example, the document image to be processed can be converted into an image of 384×384 size and then converted into tensor feature information.
  • the image is converted into a third-order tensor, and every numerical value in the tensor feature information lies in the range [0, 1].
  • an image with a size of 384×384 is used here only as an example; the document image to be processed can be converted into an image of any other size.
  • if the image conversion rule is the RGB conversion rule, the image of the preset size is converted into tensor feature information corresponding to RGB: the third-order tensor contains three slices corresponding to the R, G, and B color channels, and each slice contains the pixel values of all pixels for one color channel.
  • if the image conversion rule is the HSB conversion rule, the image of the preset size obtained after conversion is converted into tensor feature information corresponding to HSB: the third-order tensor contains three slices corresponding to the H (hue), S (saturation), and B (brightness) dimensions, and each slice contains the pixel values of all pixels for one dimension.
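  • As a hedged illustration of the conversion step above (the function and constant names are invented for this sketch, and a tiny 2×2 image stands in for the 384×384 example), an 8-bit RGB image can be turned into a third-order tensor with all values in [0, 1] like this:

```python
# Hypothetical sketch of the tensor-conversion step, assuming the RGB
# conversion rule and 8-bit pixel values; names are illustrative only.

def to_tensor(rgb_image):
    """Convert an H x W image of 8-bit (R, G, B) pixels into a third-order
    tensor (3 x H x W) whose values all lie in [0, 1]."""
    height = len(rgb_image)
    width = len(rgb_image[0])
    # One slice per colour channel, matching the R/G/B decomposition above.
    return [
        [[rgb_image[y][x][c] / 255.0 for x in range(width)]
         for y in range(height)]
        for c in range(3)
    ]

# A tiny 2x2 "image": each pixel is an (R, G, B) triple.
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
tensor = to_tensor(img)
```

The HSB rule described above would follow the same pattern, only with the three slices holding hue, saturation, and brightness values instead of colour channels.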
  • the obtained tensor feature information is then encoded through a preset coding neural network, which can be a convolutional neural network (CNN).
  • the tensor feature information can be used as the input of the coding neural network; the middle layers of the coding neural network perform correlation calculations on the input tensor feature information, and the output layer of the coding neural network outputs the corresponding coding feature information.
  • the size of the obtained coding feature information is the same as the size of the tensor feature information.
  • step S112 includes sub-steps S1121 and S1122.
  • the coding neural network is configured with multiple convolutional layers.
  • the tensor feature information can be convolved through the multiple convolutional layers to obtain the convolution feature vectors corresponding to them.
  • the multiple convolutional layers in the coding neural network are set in series: each convolutional layer convolves the tensor feature values it receives, and the convolution result of the previous convolutional layer is used as the input information of the next convolutional layer for further convolution processing.
  • the convolution feature vectors of the multiple convolutional layers can then each be affine transformed through an affine transformation network.
  • the convolution feature vector corresponding to a convolutional layer can be regarded as a basic feature vector of the document image to be processed; applying the affine transformation network to each convolution feature vector yields a feature encoding.
  • each convolution feature vector corresponds to one feature encoding, and combining the feature encodings corresponding to all convolution feature vectors yields the coding feature information corresponding to the tensor feature information.
  • Each pixel point in the document image to be processed corresponds to a pixel coordinate position.
  • the pixel coordinate position can represent the position of the pixel point in the document image to be processed.
  • each feature encoding in the coding feature information can be assigned a corresponding pixel coordinate position based on the pixel coordinate positions of the document image to be processed.
  • the coding feature information is then segmented according to the pixel coordinate positions of the feature encodings, the segmented multi-dimensional coding features are converted to obtain single-dimensional coding feature vectors, and the coding feature vectors together form the input vector set.
  • step S120 includes sub-steps S121, S122 and S123.
  • the corresponding pixel coordinate position can be added to each feature code, and the pixel coordinate position added to the feature code is expressed in a plane coordinate manner.
  • the pixel coordinate position corresponding to each feature encoding can also be determined based on the pixel coordinate position of the document image to be processed converted into an image of a preset size.
  • the segmentation rules include the specific information for segmenting the coding feature information; the coding feature information can be segmented according to the segmentation rules and the pixel coordinate position of each feature encoding to obtain multiple coding feature blocks, each containing a certain number of coding features.
  • for example, if the segmentation rule is to divide the length and the width into 24 equal parts each, and the coding feature information contains 384×384 feature encodings in total, then the coding feature information is divided into small blocks of 16×16, that is, 576 blocks in total, and each coding feature block contains 256 coding features.
  • dividing the length and width of the image into 24 equal parts is only for illustration; in actual application, the side length of the image can be divided into any number of equal parts to achieve segmentation of the image.
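  • The segmentation arithmetic in the 384×384 example above can be sketched directly (the variable names are illustrative; the patent does not prescribe an implementation): 24 parts per side gives 16×16 blocks, 576 blocks in total, 256 encodings per block, each flattened into a single-dimensional vector.

```python
import numpy as np

# Sketch of the segmentation example described above.
SIDE, PARTS = 384, 24
BLOCK = SIDE // PARTS                         # 16

features = np.arange(SIDE * SIDE).reshape(SIDE, SIDE)

# Split into PARTS x PARTS blocks of BLOCK x BLOCK, then flatten each
# block into a one-dimensional coding feature vector.
blocks = (features
          .reshape(PARTS, BLOCK, PARTS, BLOCK)
          .transpose(0, 2, 1, 3)
          .reshape(PARTS * PARTS, BLOCK * BLOCK))

input_vector_set = [blocks[i] for i in range(blocks.shape[0])]
```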
  • the encoding feature vector in the input vector set can be input to the multi-head self-attention neural network, so that the corresponding image weight feature vector is calculated through the multi-head self-attention neural network.
  • the multi-head self-attention (Multi-Head Self-Attention) neural network represents the input coding feature vectors as a set of key-value pairs (K, V) and a query Q; K, V, and Q are three separate elements, and the dimensions of K and Q are equal.
  • the multi-heads in the multi-head self-attention neural network are multiple self-attention directions. The number of self-attention directions can be preset by the user.
  • the Transformer model of the technical method of this application is composed of an encoder (Transformer encoder) and a decoder (Transformer decoder).
  • the encoder can be constructed based on the above-mentioned multi-head self-attention neural network.
  • the corresponding image weight feature vector can be obtained by processing the input vector set through the encoder.
  • step S130 includes sub-steps S131 and S132.
  • the coding feature vectors can be input into the multiple feature encoding layers of the multi-head self-attention neural network respectively.
  • each feature encoding layer takes K, V, and Q as input simultaneously, and the weight parameters configured in each feature encoding layer are different; the coding calculation performed by a feature encoding layer can be expressed by the following formulas:
  • Attention(Q, K, V) = softmax(QK^T / √d_K) · V; head_i = Attention(Q·W_Q, K·W_K, V·W_V); MultiHead(Q, K, V) = Concat(head_1, …, head_i),
  • from which the corresponding multi-head vector matrix can be calculated, where d_K is the number of dimensions of Q and K, K^T is the vector matrix obtained by transposing K, W_Q, W_K, and W_V are the weight matrices corresponding to Q, K, and V respectively, i is the number of self-attention directions included in the multi-head self-attention network, and head_i is the calculation result of the i-th self-attention direction in the current feature encoding layer.
  • the obtained multi-head vector matrix of each feature encoding layer can be feature combined through the feature combination layer to obtain the corresponding image weight feature vector.
  • the feature combination layer can be composed of a normalization layer and a fully connected layer, both of which can be constructed based on a convolutional neural network (CNN).
  • the multi-head vector matrix of each feature encoding layer is taken as the input of the normalization layer, the output of the normalization layer is input to the fully connected layer, and the image weight feature vector is obtained from the output of the fully connected layer.
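  • The multi-head calculation described by the formulas above can be sketched in a few lines. This is a generic scaled dot-product attention illustration, not the patent's configured network: the sizes, head count, and random weight matrices are all assumptions.

```python
import numpy as np

# Minimal sketch of softmax(Q K^T / sqrt(d_K)) V, computed once per
# self-attention head with head-specific W_Q, W_K, W_V, then concatenated.
rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

n, d_model, heads, d_k = 6, 16, 4, 8
X = rng.random((n, d_model))          # input coding feature vectors

head_outputs = []
for _ in range(heads):
    W_Q = rng.random((d_model, d_k))  # per-head weight matrices
    W_K = rng.random((d_model, d_k))
    W_V = rng.random((d_model, d_k))
    head_outputs.append(attention(X @ W_Q, X @ W_K, X @ W_V))

multi_head = np.concatenate(head_outputs, axis=-1)  # (n, heads * d_k)
```

In the patent's pipeline this multi-head vector matrix would then pass through the normalization and fully connected layers to give the image weight feature vector.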
  • the task information and the image weight feature vector can be combined according to different task information in the information extraction task, that is, the position identifier in the task information is added to the image weight feature vector.
  • for example, for image information recognition, the [BOS] identifier can be added to the head of the image weight feature vector and the [EOS] identifier to its tail.
  • the [BOS] identifier marks the start of image information recognition, and the [EOS] identifier marks its end.
  • in the process of extracting a specific text field from the image, identifiers such as [Start of project date] and [End of project date] can instead be added to the head and the tail; the special characters indicating the corresponding meanings are enclosed in [ ].
  • the combined feature vector can be obtained.
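  • The combination step above can be sketched as prepending and appending embeddings for the position identifiers. The identifier embeddings here are invented stand-ins; in practice they would presumably be learned alongside the model.

```python
import numpy as np

# Illustrative sketch: embeddings for the position identifiers (e.g.
# [BOS]/[EOS], or [Start of project date]/[End of project date]) are
# prepended and appended to the image weight feature vectors.
rng = np.random.default_rng(2)

d = 8
image_weight_vectors = rng.random((5, d))  # stand-in encoder output
bos = rng.random((1, d))                   # embedding for the start identifier
eos = rng.random((1, d))                   # embedding for the end identifier

combined = np.concatenate([bos, image_weight_vectors, eos], axis=0)
```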
  • the image weight feature vector and the combined feature vector can be input to the decoder at the same time.
  • the decoder can combine the combined feature vector to perform vector integration decoding on the image weight feature vector to obtain the corresponding character encoding sequence.
  • the character encoding sequence is character information recorded in encoded form; each character uniquely corresponds to a character code.
  • the decoder (Transformer decoder) of the technical method of this application is constructed from the first multi-head self-attention neural network, the second multi-head self-attention neural network and the feature decoding layer.
  • the image weight feature vector and the combined feature vector are processed by the decoder to obtain the corresponding character encoding sequence.
  • step S150 includes sub-steps S151, S152 and S153.
  • the decoder includes a first multi-head self-attention neural network; the combined feature vector can be input to it so that the corresponding first weight feature vector is calculated.
  • the structure of the first multi-head self-attention neural network is similar to the multi-head self-attention neural network described above, so the specific calculation of the first weight feature vector is also similar to the calculation of the image weight feature vector; the difference lies only in the number of heads, the number of layers, and the weight parameters of the first multi-head self-attention neural network.
  • Feature weighted fusion is performed on the image weight feature vector according to the first weight feature vector and the second multi-head self-attention neural network of the decoder to obtain a fusion feature vector corresponding to the image weight feature vector. Since some of the image information contained in the document image to be processed is of high importance, it needs to be focused on; some of the image information is of low importance and can reduce the degree of attention.
  • the image features in the document image to be processed are characterized by the image weight feature vector; the first weight feature vector can therefore be fused with the image weight feature vector so that the key image information contained in the document image to be processed receives focused attention.
  • the first weight feature vector and the image weight feature vector are simultaneously input into the second multi-head self-attention neural network for calculation, and the corresponding fusion weighting coefficient is obtained.
  • the first weight feature vector can be used as the Query value (Q value) of the second multi-head self-attention neural network;
  • the image weight feature vector can be used as the Key value (K value) of the second multi-head self-attention neural network.
  • the calculation process of self-attention analysis through the second multi-head self-attention neural network is similar to the calculation process of obtaining the image weight feature vector mentioned above.
  • the image weight feature vector is weighted according to the obtained fusion weighting coefficient.
  • each feature value in the image weight feature vector needs to be weighted separately.
  • the corresponding weighted feature value can be obtained.
  • all weighted feature values are combined into a fusion feature vector corresponding to the image weight feature vector.
  • the calculation process of the weighted calculation can be expressed by formula (3): fusion feature vector = weight × K (3),
  • where weight is the fusion weighting coefficient and K is the image weight feature vector.
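  • The weighted fusion of formula (3) can be sketched as follows. The way the fusion weighting coefficient is produced here (a softmax over element-wise Q·K scores) is only a stand-in for the second multi-head self-attention network; the element-wise weighting of K is the part the formula itself describes.

```python
import numpy as np

# Sketch of formula (3): each feature value in the image weight feature
# vector K is multiplied by its fusion weighting coefficient, and the
# weighted values form the fusion feature vector.
rng = np.random.default_rng(3)

K = rng.random(8)   # image weight feature vector
Q = rng.random(8)   # first weight feature vector (query)

scores = Q * K      # stand-in attention scores
weight = np.exp(scores) / np.exp(scores).sum()  # fusion weighting coefficients

fusion_feature_vector = weight * K  # formula (3): element-wise weighting
```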
  • the decoder is also configured with a feature decoding layer, through which the fusion feature vector can be decoded.
  • the value range of each vector value in the fused feature vector is [0,1].
  • the feature decoding layer can be constructed based on a convolutional neural network (CNN).
  • the fusion feature vector can be input to the feature decoding layer and subjected to correlation calculation there, so that the corresponding character encoding sequence is obtained from the output layer of the feature decoding layer.
  • Each character code in the character code sequence can consist of multiple digits, such as [3521].
  • S160 Parse the character encoding sequence according to preset encoding parsing rules to obtain document information corresponding to the information extraction task.
  • the character encoding sequence can be parsed according to the encoding parsing rules, thereby restoring the character encoding sequence into characters, and combining the characters with task information to obtain document information corresponding to the information extraction task.
  • step S160 includes sub-steps S161 and S162.
  • the character encoding sequence can be parsed according to the encoding parsing rules.
  • the encoding parsing rules include the correspondence between each character encoding and its character; according to this correspondence, the character encodings contained in the character encoding sequence can be converted into the corresponding parsed characters, which can include Chinese characters, English characters, numbers, punctuation marks, and so on.
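  • The parsing step above amounts to a lookup from character codes to characters. The codes and table below are invented for illustration (the original only gives [3521] as an example code shape); the patent's actual correspondence table is not specified.

```python
# Hypothetical encoding parsing rules: a lookup table from character
# codes to characters restores the character encoding sequence to text.
code_to_char = {
    3521: "2", 3522: "0", 3523: "1", 3524: "-",
}

def parse_sequence(character_codes, table):
    """Convert each character code to its character and join the result."""
    return "".join(table[c] for c in character_codes)

sequence = [3521, 3522, 3521, 3523, 3524, 3523, 3522]
parsed = parse_sequence(sequence, code_to_char)  # "2021-10"
```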
  • the task information includes the parsing position.
  • the parsing position can be used to locate the document information to be extracted in the document image to be processed.
  • the parsing position corresponds to the position identifiers added to the image weight feature vector; it is also the specific location in the task information where the parsed text content needs to be added.
  • specifically, if the parsing position corresponds to a start identifier and an end identifier in the combined feature vector, the parsed characters located between that start identifier and that end identifier can be added at the corresponding parsing position in the task information, so that the parsed characters are combined with the task information to obtain the complete document information.
  • for example, if the task information is "[Information extraction starts]…[Project date starts][Project date ends]…[Information extraction ends]" and the parsed characters are "October 10, 2021", then according to the correspondence between the identifiers and the parsing position, "October 10, 2021" can be inserted into the task information, and the resulting document information is "[Information extraction starts] October 10, 2021…[Information extraction ends]".
  • with the image processing-based document information extraction method provided by the embodiment of the present application, feature encoding is performed on the document image to be processed in the information extraction task to obtain the coding feature information, which is segmented and converted to obtain the input vector set.
  • the input vector set is input into the multi-head self-attention neural network to calculate the image weight feature vector, the task information in the information extraction task is combined with the image weight feature vector to obtain the combined feature vector, the image weight feature vector and the combined feature vector are simultaneously input into the decoder for vector integration decoding to obtain the character encoding sequence, and the character encoding sequence is parsed to obtain the document information corresponding to the information extraction task.
  • Embodiments of the present application also provide an image processing-based document information extraction device.
  • the image processing-based document information extraction device can be configured in a user terminal or a management server.
  • the image processing-based document information extraction device is used to perform any embodiment of the aforementioned image processing-based document information extraction method.
  • Figure 8 is a schematic block diagram of a document information extraction device based on image processing provided by an embodiment of the present application.
  • the document information extraction device 100 based on image processing includes a coding feature information acquisition unit 110, an input vector set acquisition unit 120, an image weight feature vector acquisition unit 130, a combined feature vector acquisition unit 140, a character encoding sequence acquisition unit 150, and a document information acquisition unit 160.
  • the coding feature information acquisition unit 110 is configured to receive the input information extraction task, and perform feature coding processing on the document image to be processed in the information extraction task to obtain corresponding coding feature information.
  • the input vector set acquisition unit 120 is configured to segment and convert the encoding feature information according to the pixel coordinate position of the document image to be processed, so as to obtain an input vector set composed of multiple encoding feature vectors.
  • the image weight feature vector acquisition unit 130 is used to input the input vector set into a preset multi-head self-attention neural network to calculate the corresponding image weight feature vector.
  • the combined feature vector acquisition unit 140 is configured to combine the task information in the information extraction task with the image weight feature vector to obtain a combined feature vector.
  • the character encoding sequence acquisition unit 150 is configured to input the image weight feature vector and the combined feature vector to a preset decoder at the same time, so as to perform vector integration decoding to obtain the corresponding character encoding sequence.
  • the document information acquisition unit 160 is configured to parse the character encoding sequence according to preset encoding parsing rules to obtain document information corresponding to the information extraction task.
  • the above image processing-based document information extraction device performs feature encoding on the document image to be processed in the information extraction task to obtain the coding feature information and performs segmentation and conversion to obtain the input vector set.
  • the input vector set is input into the multi-head self-attention neural network to calculate the image weight feature vector, the task information in the information extraction task is combined with the image weight feature vector to obtain the combined feature vector, the image weight feature vector and the combined feature vector are input into the decoder for vector integration decoding to obtain the character encoding sequence, and the character encoding sequence is parsed to obtain the document information corresponding to the information extraction task.
  • the above image processing-based document information extraction device can be implemented in the form of a computer program, and the computer program can be run on the computer device as shown in Figure 9.
  • FIG. 9 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • the computer device may be a user terminal or a management server for executing a document information extraction method based on image processing to extract document information corresponding to the information extraction task from a document image to be processed in the information extraction task.
  • the computer device 500 includes a processor 502 , a memory, and a network interface 505 connected through a system bus 501 , where the memory may include a storage medium 503 and an internal memory 504 .
  • the storage medium 503 can store an operating system 5031 and a computer program 5032.
  • When executed, the computer program 5032 can cause the processor 502 to execute the document information extraction method based on image processing; the storage medium 503 may be a volatile or a non-volatile storage medium.
  • the processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500 .
  • the internal memory 504 provides an environment for the execution of the computer program 5032 in the storage medium 503.
  • When executed by the processor 502, the computer program 5032 can cause the processor 502 to execute the document information extraction method based on image processing.
  • the network interface 505 is used for network communication, such as providing transmission of data information, etc.
  • the specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
  • the processor 502 is used to run the computer program 5032 stored in the memory to implement the corresponding functions in the above image processing-based document information extraction method.
  • the embodiment of the computer device shown in Figure 9 does not constitute a limitation on the specific configuration of the computer device; the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • for example, the computer device may include only a memory and a processor; in such an embodiment, the structure and functions of the memory and processor are consistent with the embodiment shown in FIG. 9 and are not described again.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), Application Specific Integrated Circuit (ASIC), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • a computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, wherein when the computer program is executed by a processor, the steps included in the above image processing-based document information extraction method are implemented.
  • the disclosed equipment, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function; in actual implementation there may be other division methods: units with the same function may be combined into one unit, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, or may be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present application.
  • each functional unit in various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a computer-readable storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application.
  • the aforementioned computer-readable storage media include: USB flash drives, removable hard disks, read-only memory (ROM, Read-Only Memory), magnetic disks, optical discs and other media that can store program code.


Abstract

The present application discloses a document information extraction method, apparatus, device and medium based on image processing. The method includes: performing feature encoding on a to-be-processed document image of an information extraction task to obtain coded feature information, and segmenting and converting it to obtain an input vector set; feeding the input vector set into a multi-head self-attention neural network to compute an image weight feature vector; combining the task information in the information extraction task with the image weight feature vector to obtain a combined feature vector; feeding the image weight feature vector and the combined feature vector simultaneously into a decoder for vector integration and decoding to obtain a character code sequence; and parsing the character code sequence to obtain the document information corresponding to the information extraction task.

Description

Document information extraction method, apparatus, device and medium based on image processing
This application claims priority to the Chinese patent application No. 202210533116.6, entitled "Document information extraction method, apparatus, device and medium based on image processing", filed with the China National Intellectual Property Administration on May 17, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of document information recognition, and in particular to a document information extraction method, apparatus, device and medium based on image processing.
Background
With the development of artificial intelligence, existing techniques can process a user-supplied image to recognize the text it contains and further extract the required textual information. However, the inventor has found that existing image recognition methods are typically built on complex model structures: the image analysis and recognition are computationally expensive and time-consuming, so extraction efficiency is low, downstream tasks are delayed, and the required document information cannot be extracted from images both accurately and efficiently. Existing technical methods therefore suffer from the problem of being unable to extract the required document information from images accurately and efficiently.
Summary
Embodiments of the present application provide a document information extraction method, apparatus, device and medium based on image processing, aiming to solve the problem in existing technical methods of being unable to extract the required document information from images accurately and efficiently.
In a first aspect, an embodiment of the present application provides a document information extraction method based on image processing, the method including:
receiving an input information extraction task, and performing feature encoding on a to-be-processed document image in the information extraction task to obtain corresponding coded feature information;
segmenting and converting the coded feature information according to pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors;
inputting the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector;
combining task information in the information extraction task with the image weight feature vector to obtain a combined feature vector;
inputting the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence;
parsing the character code sequence according to preset code parsing rules to obtain document information corresponding to the information extraction task.
In a second aspect, an embodiment of the present application provides a document information extraction apparatus based on image processing, including:
a coded feature information acquisition unit, configured to receive an input information extraction task and perform feature encoding on a to-be-processed document image in the information extraction task to obtain corresponding coded feature information;
an input vector set acquisition unit, configured to segment and convert the coded feature information according to pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors;
an image weight feature vector acquisition unit, configured to input the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector;
a combined feature vector acquisition unit, configured to combine task information in the information extraction task with the image weight feature vector to obtain a combined feature vector;
a character code sequence acquisition unit, configured to input the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence;
a document information acquisition unit, configured to parse the character code sequence according to preset code parsing rules to obtain document information corresponding to the information extraction task.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the document information extraction method based on image processing described in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the document information extraction method based on image processing described in the first aspect.
Embodiments of the present application provide a document information extraction method, apparatus, device and medium based on image processing: feature encoding is performed on the to-be-processed document image of an information extraction task to obtain coded feature information, which is segmented and converted into an input vector set; the input vector set is fed into a multi-head self-attention neural network to compute an image weight feature vector; the task information in the information extraction task is combined with the image weight feature vector to obtain a combined feature vector; the image weight feature vector and the combined feature vector are fed simultaneously into a decoder for vector integration and decoding to obtain a character code sequence; and the character code sequence is parsed to obtain the document information corresponding to the information extraction task. By combining image analysis and recognition with text information extraction, this method greatly improves the efficiency of document information extraction, and by flexibly adjusting the multi-head self-attention neural network and the information extraction task, it can be applied to a wide variety of document images, improving the flexibility of document information extraction.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the document information extraction method based on image processing provided by an embodiment of the present application;
FIG. 2 is a schematic sub-flowchart of the document information extraction method based on image processing provided by an embodiment of the present application;
FIG. 3 is another schematic sub-flowchart of the document information extraction method based on image processing provided by an embodiment of the present application;
FIG. 4 is a further schematic sub-flowchart of the document information extraction method based on image processing provided by an embodiment of the present application;
FIG. 5 is yet another schematic sub-flowchart of the document information extraction method based on image processing provided by an embodiment of the present application;
FIG. 6 is still another schematic sub-flowchart of the document information extraction method based on image processing provided by an embodiment of the present application;
FIG. 7 is a subsequent schematic sub-flowchart of the document information extraction method based on image processing provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of the document information extraction apparatus based on image processing provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the document information extraction method based on image processing provided by an embodiment of the present application. The method is applied in a user terminal or a management server and is executed by application software installed therein. The user terminal can execute the method to analyze an input information extraction task and extract, from the to-be-processed document image of the task, the document information corresponding to the task; the user terminal may be a terminal device such as a desktop computer, a laptop, a tablet or a mobile phone. The management server is a server side, such as one built inside an enterprise or a government department, that executes the method to analyze information extraction tasks uploaded by user terminals and extract the corresponding document information from their to-be-processed document images. As shown in FIG. 1, the method includes steps S110 to S160.
S110: receiving an input information extraction task, and performing feature encoding on a to-be-processed document image in the information extraction task to obtain corresponding coded feature information.
Upon receiving an information extraction task input by a user, the to-be-processed document image in the task can be processed. The information extraction task includes to-be-processed document images and task information; it may contain one or more images, and if it contains multiple images they all share the same document type. The task information may be information corresponding to one document type. For example, if the information extraction task is an invoice information extraction task, it may contain one or more invoice images (the to-be-processed document images), and the task information is the task configuration corresponding to invoices. The information extraction task may also be a contract information extraction task, a table information extraction task, and so on.
The method in the embodiments of the present application is implemented based on the Transformer model. Before the to-be-processed document image is analyzed by the Transformer model, the document can first be feature-encoded, and the resulting coded feature information is fed into the Transformer model for subsequent analysis to achieve document information extraction.
The embodiments of the present application use an information extraction task containing a single to-be-processed document image as an example; the method for extracting document information from multiple images of the same document type follows by analogy. The to-be-processed document image in the task can be feature-encoded to obtain corresponding coded feature information, that is, information that represents the features of the to-be-processed document image in an image-coding form.
In one embodiment, as shown in FIG. 2, step S110 includes sub-steps S111 and S112.
S111: converting the to-be-processed document image into corresponding tensor feature information according to preset image conversion rules.
For example, the to-be-processed document image can first be converted into an image of a preset size, which is then converted into tensor feature information according to the image conversion rules. In this embodiment of the present application, the image is converted to a size of 384×384 and then to a third-order tensor whose values all lie in [0, 1]. The 384×384 size is only an example; in practice the to-be-processed document image can be converted to any other size for subsequent processing, e.g. to an image whose side length is smaller than 384, so as to obtain a picture size the Transformer model can accept. Specifically, if the image conversion rule is an RGB conversion rule, the resized image is converted into RGB tensor feature information containing a third-order tensor for the three colour channels R, G and B, each order holding the pixel values of all pixels of one colour channel; if the image conversion rule is an HSB conversion rule, the resized image is converted into HSB tensor feature information containing a third-order tensor for the three dimensions H (hue), S (saturation) and B (brightness), each order holding the pixel values of all pixels of one dimension.
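As an illustrative sketch of the image-to-tensor conversion described above (not the patent's actual implementation: the nearest-neighbour resizing, numpy usage and function name are assumptions), a resize followed by scaling pixel values into [0, 1] might look like:

```python
import numpy as np

def image_to_tensor(image, size=384):
    """Resize an H x W x 3 uint8 RGB image to size x size (nearest neighbour)
    and scale pixel values into [0, 1], yielding a third-order tensor with
    one slice per colour channel."""
    h, w, _ = image.shape
    rows = (np.arange(size) * h // size).clip(0, h - 1)  # source row per target row
    cols = (np.arange(size) * w // size).clip(0, w - 1)  # source col per target col
    resized = image[rows][:, cols]                # size x size x 3
    tensor = resized.astype(np.float32) / 255.0   # values in [0, 1]
    return tensor.transpose(2, 0, 1)              # 3 x size x size

# toy 2 x 2 RGB image, resized to 4 x 4 for brevity
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
t = image_to_tensor(img, size=4)
print(t.shape)  # (3, 4, 4)
```

An HSB variant would differ only in the colour-space conversion applied before scaling.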
S112: encoding the tensor feature information with a preset encoding neural network to obtain corresponding coded feature information.
The obtained tensor feature information is encoded by a preset encoding neural network. For example, a convolutional neural network (CNN) can be used as the encoding neural network: the tensor feature information serves as the input, the intermediate layers of the encoding neural network perform correlation computation on it, and the output layer outputs the corresponding coded feature information, whose size is the same as that of the tensor feature information.
In one embodiment, as shown in FIG. 3, step S112 includes sub-steps S1121 and S1122.
S1121: convolving the tensor feature information with a plurality of convolutional layers in the encoding neural network to obtain convolutional feature vectors respectively corresponding to the plurality of convolutional layers.
The encoding neural network is configured with multiple convolutional layers, arranged in series, which convolve the tensor feature information to obtain a convolutional feature vector for each layer. A convolutional layer convolves each tensor feature value in the tensor feature information, and the convolution result of the previous convolutional layer serves as the input to the next convolutional layer.
S1122: applying affine transformations to the convolutional feature vectors of the plurality of convolutional layers through the affine transformation network in the encoding neural network to obtain the coded feature information corresponding to the tensor feature information.
The convolutional feature vectors of the convolutional layers can be affine-transformed by the affine transformation network. Specifically, a convolutional layer's convolutional feature vector serves as a basic feature vector of the to-be-processed document image; the affine transformation network transforms each layer's convolutional feature vector into a feature code, one feature code per convolutional feature vector, and collecting the feature code of every convolutional feature vector yields the coded feature information corresponding to the tensor feature information.
S120: segmenting and converting the coded feature information according to the pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors.
Each pixel of the to-be-processed document image corresponds to a pixel coordinate position, which represents the pixel's location in the image. The pixel coordinate position of each feature code in the coded feature information can be determined from the image's pixel coordinate positions; the coded feature information is then segmented according to the feature codes' pixel coordinate positions, the segmented multi-dimensional coded features are converted into one-dimensional coded feature vectors, and the coded feature vectors are combined into the input vector set corresponding to the coded feature information.
In one embodiment, as shown in FIG. 4, step S120 includes sub-steps S121, S122 and S123.
S121: adding the corresponding pixel coordinate position to each feature code of the coded feature information according to the pixel coordinate positions of the to-be-processed document image.
Specifically, based on the correspondence between the pixel coordinate positions of the to-be-processed document image and the feature codes, the corresponding pixel coordinate position is added to each feature code, expressed in planar coordinates such as (x, y). In practice, the pixel coordinate position of each feature code can also be determined from the pixel coordinate positions of the image after it has been resized to the preset size.
S122: segmenting the coded feature information into a plurality of coded feature blocks according to preset segmentation rules and the pixel coordinate positions added to the coded feature information.
The segmentation rules specify how the coded feature information is to be segmented; segmenting according to the rules and each feature code's pixel coordinate position yields multiple coded feature blocks, each containing a certain number of coded features.
For example, if the segmentation rule divides both the length and the width into 24 equal parts and the coded feature information contains 384×384 entries, the coded feature information can be segmented into 16×16 blocks, i.e. 576 blocks in total, each containing 256 coded features. Dividing the image's length and width into 24 equal parts is only an example; in practice the side length can be divided into any number of equal parts to segment the image.
S123: flattening each coded feature block into a coded feature vector and combining them to obtain the input vector set corresponding to the coded feature information.
Flattening a coded feature block means converting the block, represented as a two-dimensional array, into a one-dimensional coded feature vector; combining all coded feature vectors yields the input vector set corresponding to the coded feature information.
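The segmentation and flattening of S121–S123 can be sketched roughly as follows (a hypothetical numpy implementation; the function name and array layout are assumptions, chosen to reproduce the 576-block, 256-feature example above):

```python
import numpy as np

def patchify(features, patch=16):
    """Split a C x H x W coded-feature map into patch x patch blocks and
    flatten each block into a single one-dimensional coded feature vector."""
    c, h, w = features.shape
    gh, gw = h // patch, w // patch
    return (features
            .reshape(c, gh, patch, gw, patch)
            .transpose(1, 3, 0, 2, 4)   # grid-row x grid-col x C x patch x patch
            .reshape(gh * gw, -1))      # one flat vector per block

feats = np.zeros((1, 384, 384), dtype=np.float32)
vecs = patchify(feats, patch=16)
print(vecs.shape)  # (576, 256): 24 x 24 blocks, 256 features each
```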
S130: inputting the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector.
The coded feature vectors of the input vector set can be fed into the multi-head self-attention neural network to compute the corresponding image weight feature vector. The multi-head self-attention (Multi-Head Self-Attention) neural network represents each input coded feature vector as a set of key-value pairs (K, V) together with a query Q; K, V and Q are three elements, and K and Q have the same number of dimensions. The "multi-head" refers to multiple self-attention directions, whose number can be preset by the user. The Transformer model of the present technical method consists of an encoder (Transformer encoder) and a decoder (Transformer decoder); the encoder can be built on the multi-head self-attention neural network described above, and processing the input vector set with the encoder yields the corresponding image weight feature vector.
In one embodiment, as shown in FIG. 5, step S130 includes sub-steps S131 and S132.
S131: inputting the coded feature vectors contained in the input vector set into a plurality of feature encoding layers of the multi-head self-attention neural network for encoding computation, to obtain a multi-head vector matrix corresponding to each feature encoding layer.
The coded feature vectors can be fed into the multiple feature encoding layers of the multi-head self-attention neural network; each feature encoding layer takes K, V and Q as simultaneous inputs, and the weight parameters configured in each feature encoding layer differ. The encoding computation performed by a feature encoding layer can be expressed by the following formulas:
Attention(Q, K, V) = softmax(QK^T / √d_K)V      (1);
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)      (2);
Combining formula (1) with formula (2) yields the corresponding multi-head vector matrix, where d_K is the number of dimensions of Q and K, K^T is the vector matrix obtained by transposing K, W^Q, W^K and W^V are the weight matrices corresponding to Q, K and V respectively, i indexes the self-attention directions of the multi-head self-attention network, and head_i is the computation result of the i-th self-attention direction in the current feature encoding layer.
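Formulas (1) and (2) can be illustrated with a minimal numpy sketch (an assumed implementation, not the patent's model: the head count, dimensions and random weights are illustrative, and the feature-combination layer of S132 is omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # formula (1): Attention(Q, K, V) = softmax(Q K^T / sqrt(d_K)) V
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head_self_attention(X, head_weights):
    # formula (2): head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V),
    # with Q = K = V = X for self-attention; heads are concatenated
    heads = [attention(X @ Wq, X @ Wk, X @ Wv) for Wq, Wk, Wv in head_weights]
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
X = rng.standard_normal((576, 64))   # one row per coded feature vector
head_weights = [tuple(rng.standard_normal((64, 16)) for _ in range(3))
                for _ in range(4)]   # 4 heads, each projecting 64 -> 16
out = multi_head_self_attention(X, head_weights)
print(out.shape)  # (576, 64)
```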
S132: performing feature combination on the multi-head vector matrix of each feature encoding layer through the feature combination layer of the multi-head self-attention neural network to obtain the corresponding image weight feature vector.
The feature combination layer combines the multi-head vector matrices obtained from the feature encoding layers to produce the corresponding image weight feature vector. Specifically, the feature combination layer can consist of a normalization layer and a fully connected layer, both of which can be built on a convolutional neural network (CNN); the multi-head vector matrix of every feature encoding layer is fed into the normalization layer, the output of the normalization layer is fed into the fully connected layer, and the fully connected layer outputs the image weight feature vector.
S140: combining the task information in the information extraction task with the image weight feature vector to obtain a combined feature vector.
Depending on the task information of the information extraction task, the task information is combined with the image weight feature vector, that is, the position markers in the task information are added to the image weight feature vector. For example, during image information recognition, a [BOS] marker can be added at the head of the image weight feature vector and an [EOS] marker at the tail; [BOS] marks the start of image information recognition and [EOS] marks its end. During image text information extraction, [project date start] is added at the head and [project date end] at the tail, where the brackets enclose special characters expressing the corresponding meaning. Combining the position markers of the task information with the image weight feature vector yields the combined feature vector.
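The combination step can be pictured as simply bracketing the image-weight sequence with the task's position markers (a toy sketch; the marker strings, list representation and function name are assumptions):

```python
def combine_with_task_markers(markers, image_weight_sequence):
    """Bracket the image-weight feature sequence with the start/end
    position markers taken from the task information."""
    start, end = markers
    return [start] + list(image_weight_sequence) + [end]

combined = combine_with_task_markers(("[BOS]", "[EOS]"), ["v1", "v2", "v3"])
print(combined)  # ['[BOS]', 'v1', 'v2', 'v3', '[EOS]']
```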
S150: inputting the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence.
The image weight feature vector and the combined feature vector are fed simultaneously into the decoder, which, using the combined feature vector, performs vector integration and decoding on the image weight feature vector to obtain the corresponding character code sequence, that is, information that records characters in coded form, each character uniquely corresponding to one character code. The decoder (Transformer decoder) of the present technical method is built from a first multi-head self-attention neural network, a second multi-head self-attention neural network and a feature decoding layer; processing the image weight feature vector and the combined feature vector with the decoder yields the corresponding character code sequence.
In one embodiment, as shown in FIG. 6, step S150 includes sub-steps S151, S152 and S153.
S151: inputting the combined feature vector into the first multi-head self-attention neural network of the decoder to compute a corresponding first weight feature vector.
Specifically, the decoder contains a first multi-head self-attention neural network into which the combined feature vector is fed to compute the corresponding first weight feature vector. The first multi-head self-attention neural network is similar in structure to the multi-head self-attention neural network described above, so the computation of the first weight feature vector is also similar to that of the image weight feature vector, differing only in the number of heads, the number of layers and the weight parameters of the first multi-head self-attention neural network.
S152: performing feature-weighted fusion on the image weight feature vector according to the first weight feature vector and the second multi-head self-attention neural network of the decoder, to obtain a fused feature vector corresponding to the image weight feature vector.
Feature-weighted fusion is performed on the image weight feature vector according to the first weight feature vector and the decoder's second multi-head self-attention neural network, giving the fused feature vector corresponding to the image weight feature vector. Some of the image information contained in the to-be-processed document image is of high importance and requires close attention, while other image information is less important and can receive less attention. Since the image features of the to-be-processed document image are represented by the image weight feature vector, feature-weighted fusion of the first weight feature vector with the image weight feature vector makes it possible to focus on the key image information contained in the to-be-processed document image.
The first weight feature vector and the image weight feature vector are fed simultaneously into the second multi-head self-attention neural network for computation, yielding the corresponding fusion weighting coefficients.
Specifically, in the second multi-head self-attention neural network, the first weight feature vector serves as the Query value (Q) of the attention network and the image weight feature vector as the Key value (K); the self-attention computation performed by the second multi-head self-attention neural network is similar to the computation of the image weight feature vector described above.
Weighted computation is performed on the image weight feature vector according to the fusion weighting coefficients to obtain the fused feature vector corresponding to the image weight feature vector.
In the weighted computation, each feature value of the image weight feature vector is weighted separately; each weighted feature value is a weighted result, and all weighted feature values combine into the fused feature vector corresponding to the image weight feature vector. Specifically, the weighted computation can be expressed by formula (3):
output = weight × K       (3);
where weight is the fusion weighting coefficient and K is the image weight feature vector.
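Formula (3) is an element-wise rescaling; a minimal sketch (the coefficient and feature values below are hypothetical toy numbers):

```python
import numpy as np

def fuse(weight, K):
    # formula (3): output = weight * K, applied element-wise, so every feature
    # value of the image weight feature vector K is rescaled by its coefficient
    return weight * K

K = np.array([0.2, 0.8, 0.5])        # image weight feature vector (toy values)
weight = np.array([0.1, 0.9, 0.5])   # fusion weighting coefficients (toy values)
fused = fuse(weight, K)
print(fused)
```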
S153: decoding the fused feature vector through the feature decoding layer of the decoder to obtain the corresponding character code sequence.
The decoder is further configured with a feature decoding layer that decodes the fused feature vector, each of whose values lies in [0, 1]. Specifically, the feature decoding layer can be built on a convolutional neural network (CNN); the fused feature vector is fed into the feature decoding layer, which performs correlation computation on it, and the corresponding character code sequence is obtained from the output layer of the feature decoding layer. Each character code in the character code sequence can consist of multiple digits, e.g. [3521].
S160: parsing the character code sequence according to preset code parsing rules to obtain the document information corresponding to the information extraction task.
The character code sequence can be parsed according to the code parsing rules, restoring the character code sequence to characters, which are combined with the task information to obtain the document information corresponding to the information extraction task.
In one embodiment, as shown in FIG. 7, step S160 includes sub-steps S161 and S162.
S161: parsing the character code sequence according to the code parsing rules to restore the parsed characters corresponding to the character code sequence.
The character code sequence can be parsed according to the code parsing rules, which include the correspondence between each character code and its character; based on this correspondence, the character codes contained in the character code sequence are converted into the corresponding parsed characters, which may include Chinese characters, English characters, digits, punctuation marks and so on.
S162: adding the parsed characters to the region of the task information corresponding to the parsing position according to the parsing position, and combining the parsed characters with the task information to obtain the corresponding document information.
The task information contains a parsing position, which can be used to locate the document information to be extracted in the to-be-processed document image. The parsing position corresponds to the position markers added to the image weight feature vector above, that is, it is the specific position in the task information at which the text content to be parsed is to be added. Specifically, the parsing position corresponds to the start and end markers in the combined feature vector; based on the correspondence between the start/end markers and the parsing position, the parsed characters lying between a start marker and its end marker are added to the region of the task information corresponding to the parsing position, combining the parsed characters with the task information into complete document information.
For example, if the task information is "[information extraction start][project date start]XXXXX[project date end]…[information extraction end]" and a segment of parsed characters is "[project date start]October 10, 2021[project date end]", then according to the correspondence between markers and parsing positions, "October 10, 2021" is added to the task information, giving the document information "[information extraction start]October 10, 2021…[information extraction end]".
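The parsing and slot-filling of S161–S162 can be sketched as follows (the code table, marker strings and `XXXXX` placeholder are hypothetical, mirroring the example above):

```python
def parse_sequence(codes, code_table):
    """Restore each character code to its character via the parsing rules."""
    return "".join(code_table[c] for c in codes)

def fill_slot(template, start_tag, end_tag, parsed_text, placeholder="XXXXX"):
    """Insert the parsed characters between the matching start/end markers
    of the task information, yielding the final document information."""
    return template.replace(start_tag + placeholder + end_tag,
                            start_tag + parsed_text + end_tag)

table = {3521: "2", 17: "0", 9: "1"}               # hypothetical code table
text = parse_sequence([3521, 17, 3521, 9], table)  # "2021"
doc = fill_slot("[start]XXXXX[end]", "[start]", "[end]", text)
print(doc)  # [start]2021[end]
```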
In the document information extraction method based on image processing provided by the embodiments of the present application, feature encoding is performed on the to-be-processed document image of an information extraction task to obtain coded feature information, which is segmented and converted into an input vector set; the input vector set is fed into a multi-head self-attention neural network to compute an image weight feature vector; the task information in the information extraction task is combined with the image weight feature vector to obtain a combined feature vector; the image weight feature vector and the combined feature vector are fed simultaneously into a decoder for vector integration and decoding to obtain a character code sequence; and the character code sequence is parsed to obtain the document information corresponding to the information extraction task. By combining image analysis and recognition with text information extraction, the method greatly improves the efficiency of document information extraction, and by flexibly adjusting the multi-head self-attention neural network and the information extraction task it can be applied to a wide variety of document images, improving the flexibility of document information extraction.
An embodiment of the present application further provides a document information extraction apparatus based on image processing, which can be configured in a user terminal or a management server and is used to perform any embodiment of the aforementioned document information extraction method based on image processing. Specifically, referring to FIG. 8, FIG. 8 is a schematic block diagram of the document information extraction apparatus based on image processing provided by an embodiment of the present application.
As shown in FIG. 8, the document information extraction apparatus 100 based on image processing includes a coded feature information acquisition unit 110, an input vector set acquisition unit 120, an image weight feature vector acquisition unit 130, a combined feature vector acquisition unit 140, a character code sequence acquisition unit 150 and a document information acquisition unit 160.
The coded feature information acquisition unit 110 is configured to receive an input information extraction task and perform feature encoding on the to-be-processed document image in the information extraction task to obtain corresponding coded feature information.
The input vector set acquisition unit 120 is configured to segment and convert the coded feature information according to the pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors.
The image weight feature vector acquisition unit 130 is configured to input the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector.
The combined feature vector acquisition unit 140 is configured to combine the task information in the information extraction task with the image weight feature vector to obtain a combined feature vector.
The character code sequence acquisition unit 150 is configured to input the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence.
The document information acquisition unit 160 is configured to parse the character code sequence according to preset code parsing rules to obtain the document information corresponding to the information extraction task.
The document information extraction apparatus based on image processing provided by the embodiments of the present application applies the above method: feature encoding is performed on the to-be-processed document image of the information extraction task to obtain coded feature information, which is segmented and converted into an input vector set; the input vector set is fed into a multi-head self-attention neural network to compute an image weight feature vector; the task information is combined with the image weight feature vector to obtain a combined feature vector; the image weight feature vector and the combined feature vector are fed simultaneously into a decoder for vector integration and decoding to obtain a character code sequence; and the character code sequence is parsed to obtain the document information corresponding to the information extraction task. By combining image analysis and recognition with text information extraction, the apparatus greatly improves the efficiency of document information extraction, and by flexibly adjusting the multi-head self-attention neural network and the information extraction task it can be applied to a wide variety of document images, improving the flexibility of document information extraction.
The above document information extraction apparatus based on image processing can be implemented in the form of a computer program, which can run on a computer device as shown in FIG. 9.
Referring to FIG. 9, FIG. 9 is a schematic block diagram of the computer device provided by an embodiment of the present application. The computer device may be a user terminal or a management server for executing the document information extraction method based on image processing, so as to extract, from the to-be-processed document image of an information extraction task, the document information corresponding to the task.
Referring to FIG. 9, the computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to perform the document information extraction method based on image processing; the storage medium 503 may be a volatile or a non-volatile storage medium.
The processor 502 provides computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the storage medium 503; when executed by the processor 502, the computer program 5032 can cause the processor 502 to perform the document information extraction method based on image processing.
The network interface 505 is used for network communication, such as the transmission of data information. Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of part of the structure relevant to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 runs the computer program 5032 stored in the memory to implement the corresponding functions of the above document information extraction method based on image processing.
Those skilled in the art will understand that the embodiment of the computer device shown in FIG. 9 does not limit the specific configuration of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor, in which case the structure and functions of the memory and processor are consistent with the embodiment shown in FIG. 9 and are not described again here.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (Central Processing Unit, CPU); the processor 502 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Another embodiment of the present application provides a computer-readable storage medium, which may be a volatile or a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps included in the above document information extraction method based on image processing.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses and units described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed device, apparatus and method can be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division of the units is only a division by logical function, and in actual implementation there may be other division methods: units with the same function may be combined into one unit, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or other forms of connection.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a computer-readable storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned computer-readable storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk or an optical disc.
The above are only specific implementations of the present application, but the scope of protection of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall all fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (21)

  1. A document information extraction method based on image processing, the method comprising:
    receiving an input information extraction task, and performing feature encoding on a to-be-processed document image in the information extraction task to obtain corresponding coded feature information;
    segmenting and converting the coded feature information according to pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors;
    inputting the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector;
    combining task information in the information extraction task with the image weight feature vector to obtain a combined feature vector;
    inputting the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence;
    parsing the character code sequence according to preset code parsing rules to obtain document information corresponding to the information extraction task.
  2. The document information extraction method based on image processing according to claim 1, wherein performing feature encoding on the to-be-processed document image to obtain corresponding coded feature information comprises:
    converting the to-be-processed document image into corresponding tensor feature information according to preset image conversion rules;
    encoding the tensor feature information with a preset encoding neural network to obtain corresponding coded feature information.
  3. The document information extraction method based on image processing according to claim 2, wherein encoding the tensor feature information with the preset encoding neural network to obtain corresponding coded feature information comprises:
    convolving the tensor feature information with a plurality of convolutional layers in the encoding neural network to obtain convolutional feature vectors respectively corresponding to the plurality of convolutional layers;
    applying affine transformations to the convolutional feature vectors of the plurality of convolutional layers through an affine transformation network in the encoding neural network to obtain coded feature information corresponding to the tensor feature information.
  4. The document information extraction method based on image processing according to claim 1, wherein segmenting and converting the coded feature information according to the pixel coordinate positions of the to-be-processed document image to obtain the input vector set composed of a plurality of coded feature vectors comprises:
    adding the corresponding pixel coordinate position to each feature code of the coded feature information according to the pixel coordinate positions of the to-be-processed document image;
    segmenting the coded feature information into a plurality of coded feature blocks according to preset segmentation rules and the pixel coordinate positions added to the coded feature information;
    flattening each coded feature block into a coded feature vector and combining the coded feature vectors to obtain the input vector set corresponding to the coded feature information.
  5. The document information extraction method based on image processing according to claim 1, wherein inputting the input vector set into the preset multi-head self-attention neural network to compute the corresponding image weight feature vector comprises:
    inputting the coded feature vectors contained in the input vector set into a plurality of feature encoding layers of the multi-head self-attention neural network for encoding computation, to obtain a multi-head vector matrix corresponding to each feature encoding layer;
    performing feature combination on the multi-head vector matrix of each feature encoding layer through a feature combination layer of the multi-head self-attention neural network to obtain the corresponding image weight feature vector.
  6. The document information extraction method based on image processing according to claim 1, wherein inputting the image weight feature vector and the combined feature vector simultaneously into the preset decoder for vector integration and decoding to obtain the corresponding character code sequence comprises:
    inputting the combined feature vector into a first multi-head self-attention neural network of the decoder to compute a corresponding first weight feature vector;
    performing feature-weighted fusion on the image weight feature vector according to the first weight feature vector and a second multi-head self-attention neural network of the decoder, to obtain a fused feature vector corresponding to the image weight feature vector;
    decoding the fused feature vector through a feature decoding layer of the decoder to obtain the corresponding character code sequence.
  7. The document information extraction method based on image processing according to claim 1, wherein the task information includes a parsing position for locating the document information to be extracted in the to-be-processed document image, and parsing the character code sequence according to the preset code parsing rules to obtain the document information corresponding to the information extraction task comprises:
    parsing the character code sequence according to the code parsing rules to restore parsed characters corresponding to the character code sequence;
    adding the parsed characters to a region of the task information corresponding to the parsing position according to the parsing position, and combining the parsed characters with the task information to obtain the corresponding document information.
  8. The document information extraction method based on image processing according to claim 1, wherein combining the task information with the image weight feature vector comprises: adding each position marker in the task information to the corresponding image weight feature vector.
  9. A document information extraction apparatus based on image processing, wherein the apparatus comprises:
    a coded feature information acquisition unit, configured to receive an input information extraction task and perform feature encoding on a to-be-processed document image in the information extraction task to obtain corresponding coded feature information;
    an input vector set acquisition unit, configured to segment and convert the coded feature information according to pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors;
    an image weight feature vector acquisition unit, configured to input the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector;
    a combined feature vector acquisition unit, configured to combine task information in the information extraction task with the image weight feature vector to obtain a combined feature vector;
    a character code sequence acquisition unit, configured to input the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence;
    a document information acquisition unit, configured to parse the character code sequence according to preset code parsing rules to obtain document information corresponding to the information extraction task.
  10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    receiving an input information extraction task, and performing feature encoding on a to-be-processed document image in the information extraction task to obtain corresponding coded feature information;
    segmenting and converting the coded feature information according to pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors;
    inputting the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector;
    combining task information in the information extraction task with the image weight feature vector to obtain a combined feature vector;
    inputting the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence;
    parsing the character code sequence according to preset code parsing rules to obtain document information corresponding to the information extraction task.
  11. The computer device according to claim 10, wherein performing feature encoding on the to-be-processed document image to obtain corresponding coded feature information comprises:
    converting the to-be-processed document image into corresponding tensor feature information according to preset image conversion rules;
    encoding the tensor feature information with a preset encoding neural network to obtain corresponding coded feature information.
  12. The computer device according to claim 11, wherein encoding the tensor feature information with the preset encoding neural network to obtain corresponding coded feature information comprises:
    convolving the tensor feature information with a plurality of convolutional layers in the encoding neural network to obtain convolutional feature vectors respectively corresponding to the plurality of convolutional layers;
    applying affine transformations to the convolutional feature vectors of the plurality of convolutional layers through an affine transformation network in the encoding neural network to obtain coded feature information corresponding to the tensor feature information.
  13. The computer device according to claim 10, wherein segmenting and converting the coded feature information according to the pixel coordinate positions of the to-be-processed document image to obtain the input vector set composed of a plurality of coded feature vectors comprises:
    adding the corresponding pixel coordinate position to each feature code of the coded feature information according to the pixel coordinate positions of the to-be-processed document image;
    segmenting the coded feature information into a plurality of coded feature blocks according to preset segmentation rules and the pixel coordinate positions added to the coded feature information;
    flattening each coded feature block into a coded feature vector and combining the coded feature vectors to obtain the input vector set corresponding to the coded feature information.
  14. The computer device according to claim 10, wherein inputting the input vector set into the preset multi-head self-attention neural network to compute the corresponding image weight feature vector comprises:
    inputting the coded feature vectors contained in the input vector set into a plurality of feature encoding layers of the multi-head self-attention neural network for encoding computation, to obtain a multi-head vector matrix corresponding to each feature encoding layer;
    performing feature combination on the multi-head vector matrix of each feature encoding layer through a feature combination layer of the multi-head self-attention neural network to obtain the corresponding image weight feature vector.
  15. The computer device according to claim 10, wherein inputting the image weight feature vector and the combined feature vector simultaneously into the preset decoder for vector integration and decoding to obtain the corresponding character code sequence comprises:
    inputting the combined feature vector into a first multi-head self-attention neural network of the decoder to compute a corresponding first weight feature vector;
    performing feature-weighted fusion on the image weight feature vector according to the first weight feature vector and a second multi-head self-attention neural network of the decoder, to obtain a fused feature vector corresponding to the image weight feature vector;
    decoding the fused feature vector through a feature decoding layer of the decoder to obtain the corresponding character code sequence.
  16. The computer device according to claim 10, wherein the task information includes a parsing position for locating the document information to be extracted in the to-be-processed document image, and parsing the character code sequence according to the preset code parsing rules to obtain the document information corresponding to the information extraction task comprises:
    parsing the character code sequence according to the code parsing rules to restore parsed characters corresponding to the character code sequence;
    adding the parsed characters to a region of the task information corresponding to the parsing position according to the parsing position, and combining the parsed characters with the task information to obtain the corresponding document information.
  17. The computer device according to claim 10, wherein combining the task information with the image weight feature vector comprises: adding each position marker in the task information to the corresponding image weight feature vector.
  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, performs the following operations:
    receiving an input information extraction task, and performing feature encoding on a to-be-processed document image in the information extraction task to obtain corresponding coded feature information;
    segmenting and converting the coded feature information according to pixel coordinate positions of the to-be-processed document image to obtain an input vector set composed of a plurality of coded feature vectors;
    inputting the input vector set into a preset multi-head self-attention neural network to compute a corresponding image weight feature vector;
    combining task information in the information extraction task with the image weight feature vector to obtain a combined feature vector;
    inputting the image weight feature vector and the combined feature vector simultaneously into a preset decoder for vector integration and decoding to obtain a corresponding character code sequence;
    parsing the character code sequence according to preset code parsing rules to obtain document information corresponding to the information extraction task.
  19. The computer-readable storage medium according to claim 18, wherein performing feature encoding on the to-be-processed document image to obtain corresponding coded feature information comprises:
    converting the to-be-processed document image into corresponding tensor feature information according to preset image conversion rules;
    encoding the tensor feature information with a preset encoding neural network to obtain corresponding coded feature information.
  20. The computer-readable storage medium according to claim 19, wherein encoding the tensor feature information with the preset encoding neural network to obtain corresponding coded feature information comprises:
    convolving the tensor feature information with a plurality of convolutional layers in the encoding neural network to obtain convolutional feature vectors respectively corresponding to the plurality of convolutional layers;
    applying affine transformations to the convolutional feature vectors of the plurality of convolutional layers through an affine transformation network in the encoding neural network to obtain coded feature information corresponding to the tensor feature information.
  21. The computer-readable storage medium according to claim 18, wherein combining the task information with the image weight feature vector comprises: adding each position marker in the task information to the corresponding image weight feature vector.
PCT/CN2022/108443 2022-05-17 2022-07-28 Document information extraction method, apparatus, device and medium based on image processing WO2023221293A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210533116.6A CN114663896B (zh) 2022-05-17 2022-05-17 Document information extraction method, apparatus, device and medium based on image processing
CN202210533116.6 2022-05-17

Publications (1)

Publication Number Publication Date
WO2023221293A1 true WO2023221293A1 (zh) 2023-11-23

Family

ID=82037453

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/108443 WO2023221293A1 (zh) 2022-05-17 2022-07-28 基于图像处理的文档信息抽取方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN114663896B (zh)
WO (1) WO2023221293A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663896B (zh) 2022-05-17 2022-08-23 深圳前海环融联易信息科技服务有限公司 Document information extraction method, apparatus, device and medium based on image processing

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180089533A1 (en) * 2016-09-27 2018-03-29 Abbyy Development Llc Automated methods and systems for locating document subimages in images to facilitate extraction of information from the located document subimages
CN112036292A (zh) * 2020-08-27 2020-12-04 平安科技(深圳)有限公司 Neural-network-based character recognition method and apparatus, and readable storage medium
CN112269872A (zh) * 2020-10-19 2021-01-26 北京希瑞亚斯科技有限公司 Resume parsing method and apparatus, electronic device and computer storage medium
CN112633290A (zh) * 2021-03-04 2021-04-09 北京世纪好未来教育科技有限公司 Text recognition method, electronic device and computer-readable medium
CN113283427A (zh) * 2021-07-20 2021-08-20 北京世纪好未来教育科技有限公司 Text recognition method, apparatus, device and medium
CN114359941A (zh) * 2022-01-10 2022-04-15 上海亿保健康管理有限公司 Invoice information extraction method and apparatus, electronic device and storage medium
CN114419642A (zh) * 2021-12-14 2022-04-29 北京易道博识科技有限公司 Method, apparatus and system for extracting key-value pair information from document images
CN114663896A (zh) * 2022-05-17 2022-06-24 深圳前海环融联易信息科技服务有限公司 Document information extraction method, apparatus, device and medium based on image processing

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6895552B1 (en) * 2000-05-31 2005-05-17 Ricoh Co., Ltd. Method and an apparatus for visual summarization of documents
US7702673B2 (en) * 2004-10-01 2010-04-20 Ricoh Co., Ltd. System and methods for creation and use of a mixed media environment
US8233726B1 (en) * 2007-11-27 2012-07-31 Googe Inc. Image-domain script and language identification
US7996390B2 (en) * 2008-02-15 2011-08-09 The University Of Utah Research Foundation Method and system for clustering identified forms
CN111433812A (zh) 2017-12-03 2020-07-17 脸谱公司 Optimization of dynamic object instance detection, segmentation and structure mapping
CN111368542A (zh) 2018-12-26 2020-07-03 北京大学 Recurrent-neural-network-based text language association extraction method and system
CN110097019B (zh) 2019-05-10 2023-01-10 腾讯科技(深圳)有限公司 Character recognition method and apparatus, computer device and storage medium
CN110737413A (zh) 2019-10-15 2020-01-31 深圳前海环融联易信息科技服务有限公司 Face-recognition-based print management method and apparatus, and computer device
WO2021195133A1 (en) * 2020-03-23 2021-09-30 Sorcero, Inc. Cross-class ontology integration for language modeling
KR20220050356A (ko) 2020-10-16 2022-04-25 삼성에스디에스 주식회사 Apparatus and method for document recognition
CN112989970A (zh) 2021-02-26 2021-06-18 北京百度网讯科技有限公司 Document layout analysis method and apparatus, electronic device and readable storage medium
CN113065549A (zh) 2021-03-09 2021-07-02 国网河北省电力有限公司 Deep-learning-based document information extraction method and apparatus
CN113378580B (zh) 2021-06-23 2022-11-01 北京百度网讯科技有限公司 Document layout analysis method, model training method, apparatus and device
CN114266230A (zh) 2021-12-30 2022-04-01 安徽科大讯飞医疗信息技术有限公司 Text structuring method and apparatus, storage medium and computer device


Also Published As

Publication number Publication date
CN114663896A (zh) 2022-06-24
CN114663896B (zh) 2022-08-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22942324

Country of ref document: EP

Kind code of ref document: A1