WO2021128578A1 - Image processing method and device, electronic equipment, and storage medium
- Publication number
- WO2021128578A1 (PCT/CN2020/077247)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- target
- text
- extracted
- relative position
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/42—Document-oriented image-based pattern recognition based on the type of document
Definitions
- the present disclosure relates to the field of computer technology, and in particular to an image processing method and device, electronic equipment, and storage medium.
- the extraction of key text information from images plays a very important role in automated office and other scenarios. For example, by extracting key text information in images, functions such as receipt information extraction, invoice information extraction, and identity information extraction can be realized.
- When text in an image is extracted, the recognized text is mapped to different fields for subsequent operations such as structured storage and display. For example, if the recognized text is "19.88 yuan", it is necessary to determine whether "19.88 yuan" corresponds to the field "total price" or to the field "unit price", so that "19.88 yuan" can then be stored as the value of that field.
- A template is defined in advance according to the arrangement rules of the text in the image, and the template defines the correspondence between the text at a given position and a field, so that the field corresponding to the text recognized at that position can be determined.
- For example, the field corresponding to the text in the lower right corner of the image is predefined as "Total Price", so that the field corresponding to "19.88 Yuan" recognized in the lower right corner of the image can be determined to be "Total Price".
- the present disclosure proposes a technical solution for image processing.
- An image processing method, including: recognizing an image and determining a plurality of target regions in the image, where a target region is a region where text to be extracted is located; determining the relative position feature between the target regions in the image; determining the target feature of each target region, where the target feature includes the feature of the text to be extracted; performing feature extraction on the relative position feature and the target feature through a graph convolutional neural network to obtain the extracted feature; and determining, according to the extracted feature, the field corresponding to the text to be extracted.
- a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position feature between the target regions and the feature of the text to be extracted.
- the text extraction can be performed without relying on a fixed template. Compared with the method of text extraction based on a template, the accuracy of text extraction is higher when the text is extracted from an image without a suitable template.
- Performing feature extraction on the relative position feature and the target feature through a graph convolutional neural network to obtain the extracted feature includes: taking each target feature as a node of a graph and each relative position feature as an edge connecting two nodes to construct a connected graph; and iteratively updating the connected graph through the graph convolutional neural network, and using the connected graph that meets the convergence condition after the iterative update as the extracted feature.
- The constructed connected graph includes not only the target features in the image but also the relative position features between the target features, so it can characterize the text in the image as a whole and therefore improve the accuracy of the key information extraction result.
- graph convolutional neural networks can represent images in the form of connected graphs and extract features.
- A connected graph is composed of several nodes (Node) and edges (Edge), where each edge connects two nodes and describes the relationship between them. Therefore, the features extracted by the graph convolutional neural network can accurately characterize the relative positions between the target regions and the features of the text to be extracted, which improves the accuracy of subsequent text extraction.
- Determining the field corresponding to the text to be extracted according to the extracted features includes: classifying the nodes in the connected graph output by the graph convolutional neural network according to a plurality of predefined preset categories to obtain the category of each node.
- The preset categories include: a category indicating that the text is the identifier of a preset field, and a category indicating that the text is the field value of a preset field. According to the category of the node, the identifier or field value of the preset field to which the text to be extracted corresponds is determined.
- In this way, the identifier or field value of the preset field corresponding to the text to be extracted can be obtained, which improves the accuracy of text extraction.
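The node classification described above can be sketched as follows. This is a minimal illustration, not the disclosed network: the category list `PRESET_CATEGORIES`, the function `classify_nodes`, and the per-node score rows are all hypothetical, and the scores stand in for whatever the classifier on top of the graph convolutional neural network would output.

```python
# Hypothetical preset categories: each pairs a preset field with a role,
# either the field's identifier (e.g. the words "total price") or its value.
PRESET_CATEGORIES = [
    ("total price", "identifier"),
    ("total price", "value"),
    ("unit price", "identifier"),
    ("unit price", "value"),
]


def classify_nodes(texts, scores):
    """Assign each node (one per target area) to its highest-scoring category.

    texts  -- recognized text of each node
    scores -- one row of per-category scores per node (assumed classifier output)
    Returns a list of (text, field, role) tuples.
    """
    results = []
    for text, row in zip(texts, scores):
        best = max(range(len(row)), key=row.__getitem__)  # argmax over categories
        field, role = PRESET_CATEGORIES[best]
        results.append((text, field, role))
    return results


example = classify_nodes(
    ["total price", "19.88 yuan"],
    [[0.90, 0.05, 0.03, 0.02], [0.10, 0.70, 0.10, 0.10]],
)
```

With these illustrative scores, "19.88 yuan" is assigned to the value of the "total price" field rather than to "unit price".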
- Determining the relative position features between the target areas in the image includes: determining the relative position parameters of a first target area and a second target area in the image; and performing characterization processing on the relative position parameters to obtain the relative position feature of the first target area and the second target area.
- The relative position parameters include at least one of the following: the lateral distance and the longitudinal distance of the first target area relative to the second target area; the aspect ratio of the first target area; the aspect ratio of the second target area; and the relative size relationship between the first target area and the second target area.
- Because the relative position parameters include the horizontal and vertical distances, the aspect ratio of the first target area, and the relative size relationship between the first target area and the second target area, the extraction result of key information is more accurate.
- Performing characterization processing on the relative position parameters to obtain the relative position feature of the first target area and the second target area includes: mapping the relative position parameters to a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional feature vector, where D is a positive integer; converting the D-dimensional feature vector into a 1-dimensional weight value through a preset weight matrix; and processing the weight value through a preset activation function to obtain the relative position feature.
- the relative position parameter can be converted into the data format required by the edge of the graph convolutional neural network through the feature processing, which is convenient for subsequent feature extraction through the graph convolutional neural network.
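The three steps above (sine-cosine mapping, weight matrix, activation) can be sketched as below. The frequency schedule, the even split of the D dimensions across parameters, and the choice of ReLU as the activation are assumptions made for illustration; the text only states that a sine-cosine transformation, a preset weight matrix, and a preset activation function are used. The names `sincos_embed` and `relative_position_feature` are hypothetical.

```python
import math


def sincos_embed(params, d):
    """Map relative position parameters to a d-dimensional vector.

    Each parameter receives d / len(params) dimensions, filled with
    alternating sine and cosine values at decreasing frequencies
    (an assumed schedule, similar to common positional encodings).
    """
    n = len(params)
    if d % (2 * n) != 0:
        raise ValueError("d must be a multiple of 2 * len(params)")
    per_param = d // n
    vec = []
    for p in params:
        for k in range(per_param // 2):
            freq = 1.0 / (1000.0 ** (2.0 * k / per_param))
            vec.extend((math.sin(p * freq), math.cos(p * freq)))
    return vec


def relative_position_feature(params, weights):
    """Reduce the embedding to one scalar edge value.

    weights -- a 1 x D weight matrix stored as a flat list (in practice learned)
    """
    embedding = sincos_embed(params, len(weights))
    weight_value = sum(w * e for w, e in zip(weights, embedding))
    return max(0.0, weight_value)  # ReLU, an assumed activation function
```

The resulting scalar can serve as the weight of the edge connecting the two nodes.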
- Determining the target feature of each target area includes: determining the pixel data in the target area and performing feature extraction on the pixel data to obtain visual features; determining the text characters in the target area and performing feature extraction on the text characters to obtain character features; and determining the target feature of the target area according to the extracted visual features and character features.
- Determining the target feature of the target area according to the extracted visual features and character features includes: assigning different weights to the visual features and the character features; and fusing the weighted visual features and character features to obtain the target feature of the target area.
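A minimal sketch of the weighted fusion follows. It assumes the visual and character features have already been extracted as vectors of equal length and that fusion is an element-wise weighted sum; the text does not fix the fusion operator, and concatenation would be another option. The function name and default weights are hypothetical.

```python
def fuse_features(visual, character, w_visual=0.4, w_char=0.6):
    """Fuse visual and character features into one target feature.

    visual, character -- feature vectors of equal length (assumed)
    w_visual, w_char  -- weights assigned to the two kinds of features
    """
    if len(visual) != len(character):
        raise ValueError("feature vectors must have the same length")
    return [w_visual * v + w_char * c for v, c in zip(visual, character)]


target_feature = fuse_features([1.0, 2.0], [3.0, 4.0], w_visual=0.25, w_char=0.75)
```

In a trained network the two weights would typically be learned rather than fixed by hand.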
- The method is implemented by a pre-built classification network, and the classification network is trained as follows: a sample image is input into the classification network for processing to obtain a first prediction category of the text to be extracted in the sample image and the correspondence between the categories in the first prediction category; the classification network is trained according to the first prediction category and the label category of the sample image, where the label category includes a category indicating that the text is the identifier of a preset field and a category indicating that the text is the field value of a preset field; and the classification network is trained according to the predicted correspondence and the labeled correspondence between the texts to be extracted.
- Labeling both the categories of the sample image and the correspondence between the categories allows the classification network to be trained more accurately.
- As a result, the trained classification network achieves higher accuracy when extracting text from images that have no suitable template.
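The two supervision signals described above (categories and correspondences) could be combined into a single training objective along the following lines. The loss functions are assumptions: the text does not name them, so cross-entropy is used for the categories and binary cross-entropy for the pairwise correspondences. All function and parameter names are hypothetical.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def cross_entropy(probs, label):
    """Category loss for one text: negative log-probability of the labeled category."""
    p = min(max(probs[label], EPS), 1.0 - EPS)
    return -math.log(p)


def binary_cross_entropy(p, y):
    """Correspondence loss for one pair of texts (y is 1 if they correspond)."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def training_loss(pred_probs, labels, pred_links, true_links, link_weight=1.0):
    """Average category loss plus weighted average correspondence loss.

    pred_probs -- per-text category probabilities (first prediction category)
    labels     -- labeled category index of each text
    pred_links -- {(i, j): predicted probability that texts i and j correspond}
    true_links -- {(i, j): 1 or 0} labeled correspondence
    """
    category = sum(cross_entropy(p, y) for p, y in zip(pred_probs, labels)) / len(labels)
    link = sum(
        binary_cross_entropy(pred_links[pair], y) for pair, y in true_links.items()
    ) / len(true_links)
    return category + link_weight * link
```

A prediction that matches both the labeled categories and the labeled correspondences yields a lower loss than one that matches neither, which is the property the two training steps rely on.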
- the image includes at least one of the following: a receipt image, an invoice image, and a business card image.
- An image processing device, including: a recognition module, used to recognize an image and determine a plurality of target regions in the image, where a target region is a region where text to be extracted is located;
- a location feature determination module, used to determine the relative position feature between the target regions in the image;
- a target feature determination module, used to determine the target feature of each target region, where the target feature includes the feature of the text to be extracted;
- a graph convolution module, used to perform feature extraction on the relative position feature and the target feature through a graph convolutional neural network to obtain the extracted feature; and a field determination module, used to determine, according to the extracted feature, the field corresponding to the text to be extracted.
- a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position feature between the target regions and the feature of the text to be extracted.
- the text extraction can be performed without relying on a fixed template. Compared with the method of text extraction based on a template, the accuracy of text extraction is higher when the text is extracted from an image without a suitable template.
- The graph convolution module includes a first graph convolution sub-module and a second graph convolution sub-module, where the first graph convolution sub-module is configured to take each target feature as a node of a graph and each relative position feature as an edge connecting two nodes to construct a connected graph; and the second graph convolution sub-module is used to iteratively update the connected graph through the graph convolutional neural network and to use the connected graph that meets the convergence condition after the iterative update as the extracted feature.
- The constructed connected graph includes not only the target features in the image but also the relative position features between the target features, so it can characterize the text in the image as a whole and therefore improve the accuracy of the key information extraction result.
- graph convolutional neural networks can represent images in the form of connected graphs and extract features.
- a connected graph is composed of several nodes and edges connecting two nodes. The edges are used to describe the relationship between different nodes. Therefore, the features extracted by the graph convolutional neural network can accurately characterize the relative position between the target regions and the features of the text to be extracted, so as to improve the accuracy of subsequent text extraction.
- The field determination module includes a first field determination sub-module and a second field determination sub-module, where the first field determination sub-module is configured to classify the nodes in the connected graph output by the graph convolutional neural network according to a plurality of predefined preset categories to obtain the category of each node.
- The preset categories include: a category indicating that the text is the identifier of a preset field, and a category indicating that the text is the field value of a preset field;
- the second field determination sub-module is used to determine, according to the category of the node, the identifier or field value of the preset field to which the text to be extracted corresponds.
- In this way, the identifier or field value of the preset field corresponding to the text to be extracted can be obtained, which improves the accuracy of text extraction.
- The relative position feature determination module includes a first relative position feature determination sub-module and a second relative position feature determination sub-module, where the first relative position feature determination sub-module is used to determine the relative position parameters of a first target area and a second target area in the image; and the second relative position feature determination sub-module is used to perform characterization processing on the relative position parameters to obtain the relative position feature of the first target area and the second target area.
- The relative position parameters include at least one of the following: the lateral distance and the longitudinal distance of the first target area relative to the second target area; the aspect ratio of the first target area; the aspect ratio of the second target area; and the relative size relationship between the first target area and the second target area.
- Because the relative position parameters include the horizontal and vertical distances, the aspect ratio of the first target area, and the relative size relationship between the first target area and the second target area, the extraction result of key information is more accurate.
- The second relative position feature determination sub-module is used to: map the relative position parameters to a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional feature vector, where D is a positive integer; convert the D-dimensional feature vector into a 1-dimensional weight value through a preset weight matrix; and process the weight value through a preset activation function to obtain the relative position feature.
- the relative position parameter can be converted into the data format required by the edge of the graph convolutional neural network through the feature processing, which is convenient for subsequent feature extraction through the graph convolutional neural network.
- The target feature determination module includes a first target feature determination sub-module, a second target feature determination sub-module, and a third target feature determination sub-module, where the first target feature determination sub-module is used to determine the pixel data in the target area and perform feature extraction on the pixel data to obtain visual features; the second target feature determination sub-module is used to determine the text characters in the target area and perform feature extraction on the text characters to obtain character features; and the third target feature determination sub-module is used to determine the target feature of the target area according to the extracted visual features and character features.
- The third target feature determination sub-module is used to assign different weights to the visual features and the character features, and to fuse the weighted visual features and character features to obtain the target feature of the target area.
- The device is implemented by a pre-built classification network, and the device further includes: a first training module, used to input a sample image into the classification network for processing to obtain a first prediction category of the text to be extracted in the sample image and the correspondence between the categories in the first prediction category; a second training module, used to train the classification network according to the first prediction category and the label category of the sample image,
- where the label category includes a category indicating that the text is the identifier of a preset field and a category indicating that the text is the field value of a preset field; and a third training module, used to train the classification network according to the predicted correspondence and the labeled correspondence between the texts to be extracted.
- Labeling both the categories of the sample image and the correspondence between the categories allows the classification network to be trained more accurately.
- As a result, the trained classification network achieves higher accuracy when extracting text from images that have no suitable template.
- the image includes at least one of the following: a receipt image, an invoice image, and a business card image.
- an electronic device including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute the foregoing method.
- a computer-readable storage medium having computer program instructions stored thereon, and the computer program instructions implement the above-mentioned method when executed by a processor.
- A computer program including computer-readable code, where, when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the above method.
- a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position feature between the target regions and the feature of the text to be extracted. It does not rely on a fixed template for text extraction. Compared with the method of text extraction based on a template, the accuracy of text extraction for images without a suitable template is higher.
- Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure
- Fig. 2 shows a schematic structural diagram of a connected graph according to an embodiment of the present disclosure
- Fig. 3 shows a schematic structural diagram of a classification network according to an embodiment of the present disclosure
- Fig. 4 shows a block diagram of an image processing device according to an embodiment of the present disclosure
- Fig. 5 shows a block diagram of an electronic device according to an embodiment of the present disclosure
- Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
- the technology of extracting key information based on images has made great progress.
- the text in the image can be recognized.
- After the text is recognized, the structural information of the recognized text is determined, that is, which field in the structured data a given recognized text corresponds to, so as to facilitate subsequent operations such as structured storage and display of the recognized data.
- The embodiments of the present disclosure provide an image processing method that uses a graph convolutional neural network to determine the field corresponding to the text to be extracted in an image, based on the relative position features between the target regions and the features of the text to be extracted.
- This method does not rely on a fixed template for text extraction.
- Compared with template-based text extraction, its accuracy is higher when extracting text information from an image without a suitable template.
- the image processing method provided by the embodiments of the present disclosure can be applied to the extraction of key information in the image, can realize functions such as receipt information extraction, invoice information extraction, and identity information extraction, and has high application value.
- Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the image processing method includes:
- Step S11 the image is recognized, and multiple target regions in the image are determined.
- the target area is the area where the text to be extracted is located.
- The text to be extracted is often scattered across the image; for example, there is a certain interval between the text "total price" and "19.88 yuan". Therefore, the target areas can be determined according to the distribution of the text on the image:
- the image is divided based on the intervals between the texts to obtain multiple target regions.
- the target area may also be divided according to other methods, and the specific division method may depend on the specific application scenarios of the present disclosure, which is not limited in the present disclosure.
- The area occupied by text that constitutes a word or a sentence, or that expresses a certain meaning, can be determined as one target area. For example, the area where the text "total price" is located is one target area, and the area where "19.88 yuan" is located is another target area.
- the present disclosure does not limit this.
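Dividing by the interval between texts can be illustrated with a simple gap threshold on one line of recognized characters. This is only one possible division rule, under the assumption that character boxes and their horizontal extents are already available (for example from a text detector); the function name, the box format, and the threshold are hypothetical.

```python
def split_line_into_regions(char_boxes, max_gap):
    """Group characters on one text line into target regions.

    char_boxes -- list of (x_left, x_right, char) tuples for a single line
    max_gap    -- largest horizontal gap still treated as the same region
    Returns a list of (text, x_left, x_right) tuples, one per target region.
    """
    regions = []
    current = []
    for box in sorted(char_boxes, key=lambda b: b[0]):
        # A gap wider than max_gap closes the current region.
        if current and box[0] - current[-1][1] > max_gap:
            regions.append(current)
            current = []
        current.append(box)
    if current:
        regions.append(current)
    return [("".join(b[2] for b in r), r[0][0], r[-1][1]) for r in regions]


line = [(0, 10, "A"), (11, 20, "B"), (60, 70, "1"), (71, 80, "2")]
regions = split_line_into_regions(line, max_gap=15)
```

Here the 40-pixel gap between "B" and "1" separates the line into two target regions, while the 1-pixel gaps inside each group do not.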
- Step S12 Determine the relative position characteristics between the target regions in the image.
- the relative position feature can characterize the relative position relationship between the target areas.
- Specifically, the relative position feature can be determined according to the center points of the two target areas, or according to a vertex of each of the two target areas, which is not limited in the present disclosure.
- the relative position feature in the present disclosure can also be determined according to some other parameters, which will be specifically discussed in the possible implementation manners disclosed in the following text, and will not be repeated here.
- Step S13 Determine the target feature of each target area.
- the target feature includes the feature of the text to be extracted.
- The feature of the text to be extracted may include both the visual feature of the text as a whole and the feature of its text characters, or only one of the two.
- Step S14 Perform feature extraction on the relative position feature and the target feature through the graph convolutional neural network to obtain the extracted feature.
- the relative position feature and the target feature are input into the graph convolutional neural network, and feature extraction is performed to obtain the extracted features.
- graph convolutional neural networks can represent images in the form of connected graphs and extract features.
- A connected graph is composed of several nodes (Node) and edges (Edge), where each edge connects two nodes and describes the relationship between them.
- the features extracted by the graph convolutional neural network can accurately characterize the relative position between the target regions and the features of the text to be extracted, so as to improve the accuracy of subsequent text extraction.
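Step S14 can be sketched as iterative message passing over the connected graph. The sketch below is a simplification under stated assumptions: node features are plain lists, edge weights are the scalar relative position features, each update averages a node with its edge-weighted neighbors, and convergence is declared when no feature changes by more than a tolerance. A real graph convolutional layer would also apply learned weight matrices and a nonlinearity, which are omitted here. All names are hypothetical.

```python
def graph_conv_step(nodes, edges):
    """One update of every node from its neighbors.

    nodes -- list of feature vectors, one per target area
    edges -- square matrix; edges[i][j] is the non-negative weight of edge i-j
    """
    n = len(nodes)
    updated = []
    for i in range(n):
        total = sum(edges[i][j] for j in range(n) if j != i)
        if total <= 0:
            updated.append(list(nodes[i]))  # isolated node stays unchanged
            continue
        message = [0.0] * len(nodes[i])
        for j in range(n):
            if j == i:
                continue
            share = edges[i][j] / total  # normalized edge weight
            for d, value in enumerate(nodes[j]):
                message[d] += share * value
        updated.append([0.5 * s + 0.5 * m for s, m in zip(nodes[i], message)])
    return updated


def run_graph_conv(nodes, edges, tol=1e-6, max_iters=100):
    """Iterate until the largest feature change falls below tol."""
    for _ in range(max_iters):
        new_nodes = graph_conv_step(nodes, edges)
        change = max(
            abs(a - b) for old, new in zip(nodes, new_nodes) for a, b in zip(old, new)
        )
        nodes = new_nodes
        if change < tol:
            break
    return nodes
```

The node features returned after convergence play the role of the extracted features passed to Step S15.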
- Step S15 Determine the field corresponding to the text to be extracted according to the extracted features.
- The network can classify the text to be extracted based on the extracted features.
- The classification category characterizes the field corresponding to the text to be extracted.
- After the category of the text to be extracted is determined according to the extracted features, the field corresponding to the text to be extracted is determined.
- a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position feature between the target regions and the feature of the text to be extracted.
- the text extraction can be performed without relying on a fixed template. Compared with the method of text extraction based on a template, the accuracy of text extraction is higher when the text is extracted from an image without a suitable template.
- Determining the relative position features between the target areas in the image includes: determining the relative position parameters of a first target area and a second target area in the image; and performing characterization processing on the relative position parameters to obtain the relative position feature of the first target area and the second target area.
- the first target area and the second target area are any two target areas in the image.
- the relative position parameters of the first target area and the second target area in the image include at least one of the following:
- The horizontal and vertical distances of the first target area relative to the second target area may be the horizontal and vertical distances between the reference point of the first target area and the reference point of the second target area; the reference point of a target area can be the center point of the target area or a vertex of the target area.
- the selection of a specific reference point is not limited in the present disclosure.
- The horizontal distance $\Delta x_{ij}$ and the vertical distance $\Delta y_{ij}$ of the first target area relative to the second target area are expressed as follows:
- The first target area is the area where the text $t_i$ to be extracted is located, and the second target area is the area where the text $t_j$ to be extracted is located.
- The horizontal distance $\Delta x_{ij}$ and the vertical distance $\Delta y_{ij}$ can also be normalized by an image size parameter to obtain the normalized horizontal and vertical distances.
- For example, when normalizing by the width W of the image, the relative position parameter is obtained, with the following expression:
- The height H of the image can also be used for normalization, which will not be repeated here.
- The aspect ratio of the first target area is $w_i/h_i$.
- The aspect ratio of the second target area is $w_j/h_j$.
- the relative size relationship between the first target area and the second target area may represent the relative size relationship between the size of the first target area and the size of the second target area. Since there are some specific relationships between the text sizes of certain fields, the relative position feature takes into account the relative size relationship between the first target area and the second target area, which can make the extraction result of the key information more accurate.
- For example, the text "address" is short while the text "xx city xx street xx road xx number" is long, so the difference between the two sizes is large; in contrast, the gap between the sizes of the texts "total price" and "19.88 yuan" is small. Therefore, the relative size relationship of the target areas can reflect the field category corresponding to the text to a certain extent.
- The relative position parameter includes the normalized horizontal and vertical distances, the aspect ratio of the first target area, and the relative size relationship between the first target area and the second target area, which can make the extraction result of key information more accurate.
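- As an illustration only, the following minimal sketch shows how such a relative position parameter could be assembled for two target areas. The `Box` type, the use of the center point as reference point, the sign convention (first area minus second area), the normalization of both distances by the image width, and the use of width and height ratios as the "relative size relationship" are all assumptions made for this example; the disclosure does not fix them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned target area; (x, y) is the center point (an assumed reference point)."""
    x: float
    y: float
    w: float
    h: float


def relative_position_params(a: Box, b: Box, image_width: float) -> List[float]:
    """Return an illustrative relative position parameter for area a relative to area b."""
    dx = (a.x - b.x) / image_width   # normalized horizontal distance (assumed sign: a minus b)
    dy = (a.y - b.y) / image_width   # normalized vertical distance, also divided by the width W
    aspect_a = a.w / a.h             # aspect ratio w_i / h_i of the first area
    size_w = b.w / a.w               # assumed relative size: width ratio
    size_h = b.h / a.h               # assumed relative size: height ratio
    return [dx, dy, aspect_a, size_w, size_h]


key = Box(x=100, y=50, w=80, h=20)      # e.g. the area of the text "address"
value = Box(x=300, y=50, w=320, h=20)   # e.g. the area of the address itself
r_ij = relative_position_params(key, value, image_width=1000)
```

- In this example the value area is four times as wide as the key area, which is the kind of size gap the "address" example above describes.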
- the relative position parameters can be characterized to obtain the relative position characteristics of the first target area and the second target area.
- Characterizing the relative position parameters to obtain the relative position features of the first target area and the second target area includes: mapping the relative position parameters to a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional feature vector, where D is a positive integer; multiplying the D-dimensional feature vector by a preset weight matrix to obtain a 1-dimensional weight value; and processing the weight value with a preset activation function to obtain the relative position feature.
- the sine-cosine transformation matrix here is the transformation matrix used in Fourier sine transformation or cosine transformation.
- The specific value of the preset weight matrix can be determined by network training, and its initial value can be determined randomly. During network training, the preset weight matrix is tuned. The training process of the network is described later and is not repeated here.
- the preset activation function here may be, for example, a linear rectification function (Rectified Linear Unit, ReLU), and the specific activation function may depend on the actual application scenario of the present disclosure, which is not limited in the present disclosure.
- Here, M represents the sine-cosine transformation matrix; $M(r_{ij})$ represents the relative position parameter $r_{ij}$ mapped to a D-dimensional space through M; $W_m$ is the preset weight matrix; and ReLU represents the linear rectification function.
- the relative position parameter can be converted into the data format required by the edge of the graph convolutional neural network through the feature processing, which is convenient for subsequent feature extraction through the graph convolutional neural network.
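- The three steps described above (sine-cosine mapping, projection by the preset weight matrix, activation) amount to computing ReLU(W_m · M(r_ij)). The sketch below is one possible reading, not the disclosed implementation: the frequency schedule in `sincos_embed` is borrowed from sinusoidal position encodings and is an assumption, as are the function names and the random initial weights.

```python
import math
import random
from typing import List


def sincos_embed(r: List[float], d: int) -> List[float]:
    """Map parameters r to a d-dimensional vector of sine and cosine terms.

    Assumes d is divisible by 2 * len(r); the frequency schedule is illustrative.
    """
    per_param = d // len(r)          # dimensions spent on each parameter
    half = per_param // 2            # half sines, half cosines
    out: List[float] = []
    for value in r:
        freqs = [1.0 / (1000.0 ** (k / half)) for k in range(half)]
        out += [math.sin(value * f) for f in freqs]
        out += [math.cos(value * f) for f in freqs]
    return out


def relative_position_feature(r: List[float], w_m: List[float]) -> float:
    """ReLU(W_m . M(r)): project the embedding to one value, then rectify it."""
    emb = sincos_embed(r, len(w_m))
    weight = sum(w * e for w, e in zip(w_m, emb))   # 1-dimensional weight value
    return max(0.0, weight)                          # ReLU


random.seed(0)
D = 40
r_ij = [-0.2, 0.0, 4.0, 4.0, 1.0]                    # five illustrative parameters
W_m = [random.uniform(-0.1, 0.1) for _ in range(D)]  # random initial weights, tuned by training
e_ij = relative_position_feature(r_ij, W_m)
```

- The result is a single non-negative number per pair of target areas, which is the form an edge weight of the graph needs.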
- the target features in the embodiments of the present disclosure may include the visual features of the text to be extracted as a whole, or the text characters of the text to be extracted.
- Determining the target feature of each target area includes: determining the pixel data in the target area and performing feature extraction on the pixel data to obtain visual features; determining the text characters in the target area and performing feature extraction on the text characters to obtain character features; and determining the target feature of the target area according to the extracted visual features and character features.
- the visual features can reflect the overall visual information of the text in the target area.
- the specific extraction can be performed by a region of interest alignment (Region of Interest Align, RoI Align) method, and the present disclosure does not limit the specific way of extracting visual features.
- The text characters can be recognized and extracted through text recognition technology. For example, the text characters can be obtained by recognizing the target area through optical character recognition (OCR).
- the present disclosure does not limit the specific method of extracting text characters.
- Performing feature extraction on the text characters to obtain character features includes: mapping the text characters to a low-dimensional feature space through one-hot encoding; and then processing the text characters in the low-dimensional feature space through a bidirectional long short-term memory network (Bi-LSTM) to obtain the feature representation of the text, that is, the character features of the text to be extracted.
- With one-hot encoding, each value of a discrete feature (here, a text character) corresponds to a point in Euclidean space, which makes the calculation between features more reasonable.
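- As a small illustration of the one-hot step, the sketch below encodes each character as a one-hot vector and multiplies it by a projection matrix, which simply selects one row of that matrix. The alphabet `VOCAB`, the dimension D = 2, and the placeholder matrix values are assumptions for this example; in practice the projection would be learned, and the resulting sequence would then be passed to the Bi-LSTM.

```python
from typing import List

VOCAB = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "."]  # illustrative alphabet


def one_hot(ch: str) -> List[float]:
    """Return the one-hot vector of a character over VOCAB."""
    return [1.0 if ch == v else 0.0 for v in VOCAB]


def project(vec: List[float], matrix: List[List[float]]) -> List[float]:
    """Multiply a length-C row vector by a C x D projection matrix."""
    cols = len(matrix[0])
    return [sum(vec[c] * matrix[c][d] for c in range(len(vec))) for d in range(cols)]


# C x D projection matrix with C = len(VOCAB) and D = 2; the values are placeholders.
W = [[float(c), float(c) * 0.5] for c in range(len(VOCAB))]

# One low-dimensional vector per character of the recognized text "19.88".
embedded = [project(one_hot(ch), W) for ch in "19.88"]
```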
- Determining the target feature of the target area according to the extracted visual features and character features includes: assigning different weights to the visual features and the character features; and combining (for example, adding) the weighted visual features and character features to obtain the target feature of the target area.
- weights can be optimized through network training, and the specific training process is described in detail later, and will not be repeated here.
- The process of performing feature extraction on the text characters $s_i$ to obtain the character feature $t_i$ can be expressed by formula (7).
- $W \in \mathbb{R}^{C \times D}$ represents the projection matrix of the one-hot encoding.
- Bi-LSTM represents the processing of the one-hot encoded text characters through the bidirectional long short-term memory network. The formula also contains a symbol representing the j-th character in the text characters $s_i$.
- The target feature $n_i$ can be obtained by referring to formulas (8) and (9).
- $\alpha_i = \sigma(W_t t_i + W_v v_i)$ (8)
- $n_i = \alpha_i U_t t_i + (1 - \alpha_i) U_v v_i$ (9)
- $W_t \in \mathbb{R}^{1 \times D_t}$ and $W_v \in \mathbb{R}^{1 \times D_v}$ are one-dimensional projection matrices, which can be optimized through network training, and $\sigma$ is the activation function.
- $U_t \in \mathbb{R}^{D_h \times D_t}$ and $U_v \in \mathbb{R}^{D_h \times D_v}$ are projection parameters, which can also be obtained through network training.
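- The weighted combination in formulas (8) and (9) can be sketched as follows. This is a minimal example under stated assumptions: the activation is taken to be the logistic sigmoid, the helper names `matvec`, `sigmoid` and `fuse` are hypothetical, and all matrices hold placeholder values rather than trained parameters.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    """Logistic sigmoid, assumed here as the activation of formula (8)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(t: Vector, v: Vector, w_t: Vector, w_v: Vector, u_t: Matrix, u_v: Matrix) -> Vector:
    """Gated sum of character feature t and visual feature v, following (8) and (9)."""
    gate = sigmoid(sum(a * b for a, b in zip(w_t, t)) + sum(a * b for a, b in zip(w_v, v)))
    pt = matvec(u_t, t)   # character feature projected to D_h dimensions
    pv = matvec(u_v, v)   # visual feature projected to D_h dimensions
    return [gate * a + (1.0 - gate) * b for a, b in zip(pt, pv)]


t_i = [1.0, 0.0]          # character feature, D_t = 2
v_i = [0.0, 2.0, 0.0]     # visual feature, D_v = 3
n_i = fuse(
    t_i, v_i,
    w_t=[0.0, 0.0], w_v=[0.0, 0.0, 0.0],        # zero weights give a gate of 0.5
    u_t=[[1.0, 0.0], [0.0, 1.0]],                # D_h x D_t
    u_v=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],      # D_h x D_v
)
```

- Because the two weights sum to one, a badly recognized character sequence can be down-weighted in favor of the visual feature, which matches the motivation given for combining both features.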
- the relative location feature and the target feature can be extracted through the graph convolutional neural network.
- Performing feature extraction on the relative position feature and the target feature through the graph convolutional neural network to obtain the extracted features includes: taking each target feature as a node of the graph and each relative position feature as an edge connecting two nodes to construct a connected graph; and iteratively updating the connected graph through the graph convolutional neural network, with the connected graph that meets the convergence condition after the iterative update used as the extracted feature.
- When the connected graph is constructed with the relative position feature of the target areas as the edge connecting two nodes, the relative position feature is used as a parameter of the adjacency matrix between the nodes.
- The adjacency matrix can also include other parameters, such as the semantic similarity of the nodes; the present disclosure does not limit the specific settings of other parameters.
- FIG. 2 is a schematic diagram of a connected graph provided in the present disclosure.
- the nodes of the graph are target features
- the edges connecting two nodes are the relative position features of the target area.
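- A minimal sketch of such a graph structure is given below, assuming a fully connected graph in which every ordered pair of target areas receives one edge value. The function `build_graph` and the toy edge function are hypothetical; the edge function here is only a stand-in for the relative position feature.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def build_graph(
    node_features: List[Vector],
    edge_feature: Callable[[int, int], float],
) -> Tuple[List[Vector], Dict[Tuple[int, int], float]]:
    """Fully connect the nodes; each ordered pair (i, j) gets one edge feature."""
    count = len(node_features)
    edges = {(i, j): edge_feature(i, j) for i, j in permutations(range(count), 2)}
    return node_features, edges


# Three target areas with placeholder target features and x-coordinates.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
xs = [10.0, 50.0, 90.0]

# Hypothetical edge feature: rectified horizontal offset between two areas.
nodes, edges = build_graph(features, lambda i, j: max(0.0, (xs[j] - xs[i]) / 100.0))
```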
- The connected graph constructed by the embodiments of the present disclosure includes not only the target features in the image but also the relative position features between the target features in the image, so it can characterize the text in the image as a whole and can therefore improve the accuracy of the key information extraction results.
- the connected graph can be iteratively updated through the graph convolutional neural network, and the connected graph that meets the convergence condition after the iterative update is used as the extracted feature.
- The feature of any node i is updated by projecting, through the adjacency matrix, the feature values of the nodes connected to node i.
- After multiple iterations, the feature value of each node no longer changes as the number of iterations increases, that is, the feature values of the nodes remain unchanged. At this point the convergence condition can be regarded as met, and the connected graph meeting the convergence condition can be used as the extracted feature.
- $N^{l+1} = \sigma\big((A^{l} N^{l}) W^{l}\big)$ (10)
- $N^{l}$ is the feature of the nodes in the $l$-th iteration;
- $W^{l}$ is the conversion matrix, which can be optimized through network training;
- $A^{l}$ is the adjacency matrix of the nodes.
- The expression of the adjacency matrix entry $A^{l}_{ij}$ for nodes i and j is as follows:
- $(n^{l}_{i})^{T}$ represents the transpose of $n^{l}_{i}$; the expression also contains normalization parameters, which can be optimized through network training.
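- The iterative update of formula (10) and its stopping rule can be sketched as below. Several simplifications are assumed: the adjacency matrix is held fixed instead of being recomputed in each iteration, tanh stands in for the activation function, and the matrices contain placeholder values rather than trained parameters. The names `gcn_step` and `iterate` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_step(adj: Matrix, nodes: Matrix, weight: Matrix) -> Matrix:
    """One update N^{l+1} = sigma((A N^l) W), with tanh standing in for sigma."""
    z = matmul(matmul(adj, nodes), weight)
    return [[math.tanh(x) for x in row] for row in z]


def iterate(adj: Matrix, nodes: Matrix, weight: Matrix,
            tol: float = 1e-6, max_iter: int = 200) -> Tuple[Matrix, int]:
    """Repeat gcn_step until the node features stop changing (convergence condition)."""
    for step in range(1, max_iter + 1):
        updated = gcn_step(adj, nodes, weight)
        change = max(abs(u - n) for ur, nr in zip(updated, nodes) for u, n in zip(ur, nr))
        nodes = updated
        if change < tol:
            return nodes, step
    return nodes, max_iter


A = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]   # fixed, row-normalized adjacency
N0 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]                  # initial node features
W = [[0.5, 0.1], [0.1, 0.5]]                               # placeholder conversion matrix
final_nodes, steps = iterate(A, N0, W)
```

- With these placeholder values the features simply shrink until they stop changing; the example is meant to show the update rule and the convergence check, not meaningful features.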
- Determining the field corresponding to the text to be extracted according to the extracted features includes: classifying the nodes in the connected graph output by the graph convolutional neural network according to a plurality of predefined preset categories to obtain the category of each node, where the preset categories include: a category characterizing that the text belongs to the identifier of a preset field, and a category characterizing that the text belongs to the field value of a preset field; and determining, according to the category of the node, the identifier or field value of the preset field corresponding to the text to be extracted.
- In the recognized text, there may be text that characterizes the identifier of a preset field, and there may also be text that characterizes the field value of a preset field.
- The text that characterizes the identifier of a preset field is the text in the image that indicates which field a field value belongs to, and the field value is the specific value under that field. For example, for the preset field "total price", recognized texts such as "total price" or "sub total" are all specific identifiers of the preset field "total price", while recognized texts such as "19.88 yuan" or "¥: 19.88" are field values of the preset field.
- For one preset field, two categories can be set: one category characterizes that the text belongs to the identifier of the preset field, and the other characterizes that the text belongs to the field value of the preset field. When there are multiple different preset fields, two categories can be set for each preset field, so there will be multiple categories characterizing that the text belongs to the identifier of a preset field and multiple categories characterizing that the text belongs to the field value of a preset field.
- For example, the preset fields can be set to "name", "address", "phone number", "date", "time", "product category", "product name", "commodity unit price", "single product total price", "taxes", "total price", and "reminder", a total of 12 preset fields. Then 24 categories can be preset, which respectively indicate the identifier and the field value of each preset field. In addition, the category "Others" can be set to distinguish and extract texts that do not belong to the above categories, that is, a total of 25 categories are set.
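- The category layout in this example (12 preset fields, two categories each, plus "Others") can be written down directly. The sketch below is illustrative: the ordering of the categories and the helper `decode` are assumptions, and the field names are the English renderings used in this translation.

```python
FIELDS = [
    "name", "address", "phone number", "date", "time", "product category",
    "product name", "commodity unit price", "single product total price",
    "taxes", "total price", "reminder",
]

# Two categories per preset field (identifier and field value), plus "others".
CATEGORIES = [(field, role) for field in FIELDS for role in ("identifier", "value")]
CATEGORIES.append(("others", None))


def decode(category_index: int):
    """Translate a predicted category index into (preset field, role)."""
    return CATEGORIES[category_index]
```

- For instance, under this assumed ordering `decode(21)` yields the pair for the field value of "total price", so a node classified as category 21 would be reported as the total price value.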
- the image processing method of the embodiment of the present disclosure may be implemented by a pre-built classification network, and the training steps of the classification network are as follows:
- The label category includes: a category characterizing that the text belongs to the identifier of a preset field, and a category characterizing that the text belongs to the field value of a preset field.
- the classification network can be used to implement the image processing technology of the present disclosure.
- the classification network can include the graph convolutional neural network described above.
- The classification network can also include other networks, for example the Bi-LSTM network. The networks included in the classification network of the present disclosure may be determined according to the specific application scenarios of the embodiments of the present disclosure, which is not limited in the present disclosure.
- FIG. 3 is a schematic structural diagram of a specific implementation of a classification network provided in this application.
- The network includes a target feature extraction module, a relative position feature extraction module, a convolutional network feature extraction module, and a classification module. The target feature extraction module extracts the target features of the image containing the text to be extracted, and the relative position feature extraction module extracts the relative position features of the image; the target features and relative position features are input to the convolutional network feature extraction module for iterative updating to obtain the iteratively extracted features; the classification module then classifies the iteratively extracted features to obtain the predicted category of each node.
- The category characterizes the field corresponding to the text to be extracted; after the category of the text to be extracted is determined according to the extracted features, the field corresponding to the text to be extracted is determined.
- For the specific functions of each module, please refer to the relevant discussion in this disclosure, which will not be repeated here.
- the label category may be the preset category described above, which will not be repeated here.
- The parameters in the classification network can be adjusted according to the loss of the first prediction category relative to the label category, so that the difference between the category predicted by the classification network and the label category of the sample image is minimized.
- Information on whether two texts are the identifier and the field value of the same preset field is also beneficial to the classification accuracy of the classification network.
- Two texts that are respectively the identifier and the field value of the same preset field are referred to as a field pair; for example, the texts "total price" and "19.88 yuan" constitute a field pair.
- When training the classification network, the classification network also outputs the correspondence between the categories in the first prediction category, and the correspondence between the texts is also labeled in the sample image. The classification network can then be trained according to the correspondence output by the classification network and the labeled correspondence between the texts to be extracted.
- the loss function used during training may specifically be a cross entropy loss function (Cross Entropy Loss, CE), and the specific loss function may be selected according to actual requirements, which is not specifically limited in the present disclosure.
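- For reference, a cross entropy loss over per-node category scores can be sketched as follows. This is a generic, dependency-free formulation; averaging over the nodes of one sample image and the function names are assumptions, and the scores are made-up numbers.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over category scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-probability of the labeled category for one node."""
    return -math.log(softmax(logits)[label])


def batch_loss(all_logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross entropy over the nodes of one sample image."""
    losses = [cross_entropy(scores, y) for scores, y in zip(all_logits, labels)]
    return sum(losses) / len(losses)


# Two nodes, three categories; the first node is predicted confidently and correctly.
loss = batch_loss([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]], labels=[0, 1])
```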
- the trained classification network can be used to determine the field corresponding to the text to be extracted during the extraction of key text information.
- the trained classification network has higher accuracy when extracting text from images without adapted templates.
- the recognized image includes at least one of the following: a receipt image, an invoice image, and a business card image.
- an invoice image a payment image
- a business card image a payment image that specifies the payment amount.
- the embodiments of the present disclosure can also be used to recognize other images, and the present disclosure does not specifically limit this.
- a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position feature between the target regions and the feature of the text to be extracted.
- the text extraction can be performed without relying on a fixed template. Compared with the method of text extraction based on a template, the accuracy of text extraction is higher when the text is extracted from an image without a suitable template.
- In the embodiments of the present disclosure, when text extraction is performed, not only the character features of the text in the target area but also the visual features of the target area are used, which reduces the influence of misrecognized text characters on the final classification and improves the accuracy of text extraction. In addition, by establishing the spatial position relationship between the text areas, the method does not depend on pre-designed templates and can handle unseen templates, which gives it better scalability.
- the image processing method can be executed by electronic equipment such as a terminal device or a server, and the terminal device can be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, or a cordless
- the method can be implemented by a processor invoking computer-readable instructions stored in a memory.
- the method can be executed by a server.
- the present disclosure also provides image processing devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any of the image processing methods provided in the present disclosure.
- FIG. 4 shows a block diagram of an image processing device according to an embodiment of the present disclosure.
- the image processing device 20 includes:
- the recognition module 21 is configured to recognize an image and determine multiple target regions in the image, where the target region is the region where the text to be extracted is located;
- the relative position feature determining module 22 is used to determine the relative position feature between each target area in the image
- the target feature determining module 23 is configured to determine the target feature of each target area, where the target feature includes the feature of the text to be extracted;
- the graph convolution module 24 is configured to perform feature extraction on the relative position feature and the target feature through the graph convolution neural network to obtain the extracted feature;
- the field determination module 25 is configured to determine the field corresponding to the text to be extracted according to the extracted features.
- a graph convolutional neural network can be used to determine the field corresponding to the text to be extracted in the image based on the relative position feature between the target regions and the feature of the text to be extracted.
- the text extraction can be performed without relying on a fixed template. Compared with the method of text extraction based on a template, the accuracy of text extraction is higher when the text is extracted from an image without a suitable template.
- the graph convolution module 24 includes: a first graph convolution sub-module and a second graph convolution sub-module, wherein:
- the first graph convolution submodule is used to construct a connected graph by taking each of the target features as the nodes of the graph, and using each of the relative position features as the edges connecting the two nodes;
- the second graph convolution submodule is used to iteratively update the connected graph through the graph convolutional neural network, and use the connected graph that meets the convergence condition after the iterative update as the extracted feature.
- The constructed connected graph includes not only the target features in the image but also the relative position features between the target features in the image, so it can characterize the text in the image as a whole and can therefore improve the accuracy of the key information extraction results.
- graph convolutional neural networks can represent images in the form of connected graphs and extract features.
- Connected graph is composed of several nodes (Node) and edges (Edge) connecting two nodes. Edges are used to describe the relationship between different nodes. Therefore, the features extracted by the graph convolutional neural network can accurately characterize the relative position between the target regions and the features of the text to be extracted, so as to improve the accuracy of subsequent text extraction.
- the field determination module 25 includes: a first field determination sub-module and a second field determination sub-module, where:
- the first field determination sub-module is used to classify the nodes in the connected graph output by the graph convolutional neural network according to a plurality of pre-defined preset categories to obtain the category of the node, and the preset category includes: the characterization text belongs to The category of the identifier of the preset field, and the category of the field value of the characterizing text belonging to the preset field;
- the second field determination submodule is used to determine the identifier or field value of the preset field corresponding to the text to be extracted according to the category of the node.
- The identifier or field value of the preset field corresponding to the text to be extracted can be obtained, which improves the accuracy of text extraction.
- the relative position feature determination module 22 includes: a first relative position feature determination sub-module and a second relative position feature determination sub-module, wherein:
- the first relative position feature determining sub-module is used to determine the relative position parameters of the first target area and the second target area in the image;
- the second relative position feature determination sub-module is used to perform characterization processing on the relative position parameters to obtain the relative position features of the first target area and the second target area.
- the relative position parameter includes at least one of the following:
- The relative position parameter includes the horizontal distance and the vertical distance, the aspect ratio of the first target area, and the relative size relationship between the first target area and the second target area, so that the extraction result of key information is more accurate.
- The second relative position feature determination sub-module is used to: map the relative position parameter to a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional feature vector, where D is a positive integer; transform the D-dimensional feature vector into a 1-dimensional weight value through a preset weight matrix; and process the weight value through a preset activation function to obtain the relative position feature.
- the relative position parameter can be converted into the data format required by the edge of the graph convolutional neural network through the feature processing, which is convenient for subsequent feature extraction through the graph convolutional neural network.
- the target feature determination module 23 includes a first target feature determination sub-module, a second target feature determination sub-module, and a third target feature determination sub-module, wherein:
- the first target feature determination sub-module is used to determine pixel data in the target area, and perform feature extraction on the pixel data to obtain visual features;
- the second target feature determination sub-module is used to determine text characters in the target area, and perform feature extraction on the text characters to obtain character features;
- the third target feature determination sub-module is used to determine the target feature of the target area according to the extracted visual features and character features.
- The third target feature determination sub-module is used to assign different weights to the visual features and the character features, and to fuse the weighted visual features and character features to obtain the target feature of the target area.
- the device is implemented through a pre-built classification network, and the device further includes:
- the first training module is configured to input the sample image into the classification network for processing to obtain the first prediction category of the text to be extracted in the sample image, and the corresponding relationship between each category in the first prediction category;
- the second training module is configured to train the classification network according to the first prediction category and the label category of the sample image.
- The label category includes: a category characterizing that the text belongs to the identifier of a preset field, and a category characterizing that the text belongs to the field value of a preset field;
- the third training module is configured to train the classification network according to the corresponding relationship and the corresponding relationship between the labeled texts to be extracted.
- the classification network can be trained more accurately by labeling the classification of the sample image and the corresponding relationship between each classification.
- When the trained classification network performs text extraction on an image without a suitable template, the accuracy is higher.
- the image includes at least one of the following: a receipt image, an invoice image, and a business card image.
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
- the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
- the computer-readable storage medium may be a volatile computer-readable storage medium or a non-volatile computer-readable storage medium.
- An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute the above method.
- The embodiments of the present disclosure also provide a computer program product including computer-readable code; when the code runs on a device, the processor in the device executes instructions for implementing the image processing method provided by any of the above embodiments.
- the embodiments of the present disclosure also provide another computer program product for storing computer-readable instructions, which when executed, cause the computer to perform the operations of the image processing method provided by any of the foregoing embodiments.
- the electronic device can be provided as a terminal, server or other form of device.
- FIG. 5 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
- the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
- The electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
- the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
- the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
- the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
- the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
- the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phone book data, messages, images, videos, etc.
- The memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
- the power supply component 806 provides power for various components of the electronic device 800.
- the power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.
- the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
- the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
- the audio component 810 is configured to output and/or input audio signals.
- the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
- the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
- the audio component 810 further includes a speaker for outputting audio signals.
- the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
- the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
- the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
- the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example the display and the keypad of the electronic device 800.
- the sensor component 814 can also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a temperature change of the electronic device 800.
- the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
- the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
- the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
- the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
- the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, to perform the above methods.
- a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
- FIG. 6 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
- the electronic device 1900 may be provided as a server.
- the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by the memory 1932, for storing instructions executable by the processing component 1922, such as application programs.
- the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
- the processing component 1922 is configured to execute instructions to perform the above-described methods.
- the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
- the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
- a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
- the present disclosure may be a system, method and/or computer program product.
- the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
- the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, and mechanical encoding devices, such as a printer with instructions stored thereon.
- the computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through wires.
- the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
- the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- an electronic circuit such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by using the status information of the computer-readable program instructions.
- the computer-readable program instructions are executed to realize various aspects of the present disclosure.
- These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce a device that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium; they cause computers, programmable data processing apparatuses, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
- each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for realizing the specified logical function.
- The functions marked in the blocks may also occur in a different order from the order marked in the drawings. For example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
- the computer program product can be specifically implemented by hardware, software, or a combination thereof.
- the computer program product is specifically embodied as a computer storage medium.
- the computer program product is specifically embodied as a software product, such as a software development kit (SDK).
Claims (23)
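- An image processing method, comprising: recognizing an image to determine a plurality of target regions in the image, each target region being a region in which text to be extracted is located; determining relative position features between the target regions in the image; determining a target feature of each target region, the target feature including a feature of the text to be extracted; performing feature extraction on the relative position features and the target features through a graph convolutional neural network to obtain extracted features; and determining, according to the extracted features, the field corresponding to the text to be extracted.
- The method according to claim 1, wherein performing feature extraction on the relative position features and the target features through a graph convolutional neural network to obtain extracted features comprises: constructing a connected graph with each target feature as a node of the graph and each relative position feature as an edge connecting two nodes; and iteratively updating the connected graph through the graph convolutional neural network, and taking the iteratively updated connected graph that satisfies a convergence condition as the extracted features.
- The method according to claim 2, wherein determining, according to the extracted features, the field corresponding to the text to be extracted comprises: classifying the nodes in the connected graph output by the graph convolutional neural network according to a plurality of predefined preset categories to obtain the category of each node, the preset categories including a category indicating that text is the identifier of a preset field and a category indicating that text is the field value of a preset field; and determining, according to the category of the node, whether the text to be extracted corresponds to the identifier or the field value of a preset field.
- The method according to any one of claims 1-3, wherein determining relative position features between the target regions in the image comprises: determining relative position parameters of a first target region and a second target region in the image; and featurizing the relative position parameters to obtain the relative position feature of the first target region and the second target region.
- The method according to claim 4, wherein the relative position parameters include at least one of the following: the horizontal distance and the vertical distance of the first target region relative to the second target region; the aspect ratio of the first target region; the aspect ratio of the second target region; and the relative size relationship between the first target region and the second target region.
- The method according to claim 4 or 5, wherein featurizing the relative position parameters to obtain the relative position feature of the first target region and the second target region comprises: mapping the relative position parameters into a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional feature vector, D being a positive integer; converting the D-dimensional feature vector into a 1-dimensional weight value through a preset weight matrix; and processing the weight value through a preset activation function to obtain the relative position feature.
- The method according to any one of claims 1-6, wherein determining a target feature of each target region comprises: determining pixel data in the target region and performing feature extraction on the pixel data to obtain a visual feature; determining text characters in the target region and performing feature extraction on the text characters to obtain a character feature; and determining the target feature of the target region according to the extracted visual feature and character feature.
- The method according to claim 7, wherein determining the target feature of the target region according to the extracted visual feature and character feature comprises: assigning different weights to the visual feature and the character feature; and fusing the weighted visual feature and character feature to obtain the target feature of the target region.
- The method according to any one of claims 1-8, wherein the method is implemented through a pre-constructed classification network, and the classification network is trained as follows: inputting a sample image into the classification network for processing to obtain first predicted categories of the text to be extracted in the sample image, and correspondences between the categories in the first predicted categories; training the classification network according to the first predicted categories and the labeled categories of the sample image, the labeled categories including a category indicating that text is the identifier of a preset field and a category indicating that text is the field value of a preset field; and training the classification network according to the correspondences and the labeled correspondences between the texts to be extracted.
- The method according to any one of claims 1-9, wherein the image includes at least one of the following: a receipt image, an invoice image, and a business card image.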
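- An image processing apparatus, comprising: a recognition module configured to recognize an image and determine a plurality of target regions in the image, each target region being a region in which text to be extracted is located; a relative position feature determination module configured to determine relative position features between the target regions in the image; a target feature determination module configured to determine a target feature of each target region, the target feature including a feature of the text to be extracted; a graph convolution module configured to perform feature extraction on the relative position features and the target features through a graph convolutional neural network to obtain extracted features; and a field determination module configured to determine, according to the extracted features, the field corresponding to the text to be extracted.
- The apparatus according to claim 11, wherein the graph convolution module comprises a first graph convolution submodule and a second graph convolution submodule, wherein: the first graph convolution submodule is configured to construct a connected graph with each target feature as a node of the graph and each relative position feature as an edge connecting two nodes; and the second graph convolution submodule is configured to iteratively update the connected graph through the graph convolutional neural network, and to take the iteratively updated connected graph that satisfies a convergence condition as the extracted features.
- The apparatus according to claim 12, wherein the field determination module comprises a first field determination submodule and a second field determination submodule, wherein: the first field determination submodule is configured to classify the nodes in the connected graph output by the graph convolutional neural network according to a plurality of predefined preset categories to obtain the category of each node, the preset categories including a category indicating that text is the identifier of a preset field and a category indicating that text is the field value of a preset field; and the second field determination submodule is configured to determine, according to the category of the node, whether the text to be extracted corresponds to the identifier or the field value of a preset field.
- The apparatus according to any one of claims 11-13, wherein the relative position feature determination module comprises a first relative position feature determination submodule and a second relative position feature determination submodule, wherein: the first relative position feature determination submodule is configured to determine relative position parameters of a first target region and a second target region in the image; and the second relative position feature determination submodule is configured to featurize the relative position parameters to obtain the relative position feature of the first target region and the second target region.
- The apparatus according to claim 14, wherein the relative position parameters include at least one of the following: the horizontal distance and the vertical distance of the first target region relative to the second target region; the aspect ratio of the first target region; the aspect ratio of the second target region; and the relative size relationship between the first target region and the second target region.
- The apparatus according to claim 14 or 15, wherein the second relative position feature determination submodule is configured to: map the relative position parameters into a D-dimensional space through a sine-cosine transformation matrix to obtain a D-dimensional feature vector, D being a positive integer; convert the D-dimensional feature vector into a 1-dimensional weight value through a preset weight matrix; and process the weight value through a preset activation function to obtain the relative position feature.
- The apparatus according to any one of claims 11-16, wherein the target feature determination module comprises a first target feature determination submodule, a second target feature determination submodule, and a third target feature determination submodule, wherein: the first target feature determination submodule is configured to determine pixel data in the target region and perform feature extraction on the pixel data to obtain a visual feature; the second target feature determination submodule is configured to determine text characters in the target region and perform feature extraction on the text characters to obtain a character feature; and the third target feature determination submodule is configured to determine the target feature of the target region according to the extracted visual feature and character feature.
- The apparatus according to claim 17, wherein the third target feature determination submodule is configured to: assign different weights to the visual feature and the character feature; and fuse the weighted visual feature and character feature to obtain the target feature of the target region.
- The apparatus according to any one of claims 11-18, wherein the apparatus is implemented through a pre-constructed classification network, and the apparatus further comprises: a first training module configured to input a sample image into the classification network for processing to obtain first predicted categories of the text to be extracted in the sample image, and correspondences between the categories in the first predicted categories; a second training module configured to train the classification network according to the first predicted categories and the labeled categories of the sample image, the labeled categories including a category indicating that text is the identifier of a preset field and a category indicating that text is the field value of a preset field; and a third training module configured to train the classification network according to the correspondences and the labeled correspondences between the texts to be extracted.
- The apparatus according to any one of claims 11-19, wherein the image includes at least one of the following: a receipt image, an invoice image, and a business card image.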
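- An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to call the instructions stored in the memory to execute the method according to any one of claims 1 to 10.
- A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 10.
- A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1-10.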
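The following is a minimal Python sketch of how the steps recited in claims 2, 6 and 8 could fit together; it is an illustration under stated assumptions, not the patented implementation. The function names, the frequency schedule of the sine-cosine mapping, the normalisation of the distances, the ReLU activation, the fusion weight `alpha`, and the residual neighbour-averaging rule that stands in for a trained graph convolutional network are all hypothetical choices made for this example.

```python
import math

def sincos_embed(params, dim=8):
    """Map each scalar parameter to `dim` sine/cosine values and concatenate them."""
    half = dim // 2
    freqs = [1.0 / (1000.0 ** (k / half)) for k in range(half)]  # assumed schedule
    out = []
    for p in params:
        out.extend(math.sin(p * f) for f in freqs)
        out.extend(math.cos(p * f) for f in freqs)
    return out  # length D = len(params) * dim

def relative_position_feature(box_a, box_b, weights, dim=8):
    """Scalar edge feature for two regions given as (x, y, width, height)."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    params = [
        (xa - xb) / wb,      # horizontal distance (normalisation is an assumption)
        (ya - yb) / hb,      # vertical distance (normalisation is an assumption)
        wa / ha,             # aspect ratio of the first region
        wb / hb,             # aspect ratio of the second region
        math.log(wa / wb),   # relative size, here a log width ratio (assumption)
    ]
    embedded = sincos_embed(params, dim)                     # D-dimensional vector
    score = sum(w * e for w, e in zip(weights, embedded))    # D values -> one weight
    return max(score, 0.0)                                   # ReLU as assumed activation

def fuse(visual, chars, alpha=0.6):
    """Weighted fusion of a visual feature and a character feature of equal length."""
    return [alpha * v + (1.0 - alpha) * c for v, c in zip(visual, chars)]

def graph_update(nodes, edges, beta=0.5, max_iters=100, tol=1e-6):
    """Iterate a residual, edge-weighted neighbour average until the change is below tol.

    This weight-free rule is only a stand-in for a trained graph convolution.
    """
    n = len(nodes)
    initial = [row[:] for row in nodes]
    current = initial
    for _ in range(max_iters):
        updated = []
        for i in range(n):
            # Self-loop weight 1.0 keeps the row sum positive.
            row_w = [1.0 if i == j else edges[i][j] for j in range(n)]
            total = sum(row_w)
            updated.append([
                (1.0 - beta) * initial[i][k]
                + beta * sum(row_w[j] * current[j][k] for j in range(n)) / total
                for k in range(len(initial[i]))
            ])
        delta = max(abs(u - c) for ur, cr in zip(updated, current)
                    for u, c in zip(ur, cr))
        current = updated
        if delta < tol:  # convergence condition assumed for this sketch
            break
    return current

# Toy data: three text regions with made-up features and a placeholder weight vector.
boxes = [(10, 10, 80, 20), (100, 10, 60, 20), (10, 40, 80, 20)]
weights = [0.05] * (5 * 8)
visual = [[0.2, 0.9], [0.8, 0.1], [0.3, 0.7]]
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

edges = [[0.0 if i == j else relative_position_feature(a, b, weights)
          for j, b in enumerate(boxes)] for i, a in enumerate(boxes)]
nodes = [fuse(v, c) for v, c in zip(visual, chars)]
refined = graph_update(nodes, edges)
print(refined)
```

In a real system the refined node features would then be passed to a classifier that labels each region as a field identifier or a field value, as described in claim 3; that classifier is omitted here.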
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
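KR1020217020203A KR20210113192A (ko) | 2019-12-27 | 2020-02-28 | Image processing method and apparatus, electronic device, and storage medium |
JP2021538344A JP7097513B2 (ja) | 2019-12-27 | 2020-02-28 | Image processing method and apparatus, electronic device, and storage medium |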
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911387827.1 | 2019-12-27 | ||
CN201911387827.1A CN111191715A (zh) | 2019-12-27 | 2019-12-27 | Image processing method and apparatus, electronic device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021128578A1 (zh) | 2021-07-01 |
Family
ID=70707802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/077247 WO2021128578A1 (zh) | Image processing method and apparatus, electronic device, and storage medium | 2019-12-27 | 2020-02-28 |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP7097513B2 (zh) |
KR (1) | KR20210113192A (zh) |
CN (1) | CN111191715A (zh) |
TW (1) | TWI736230B (zh) |
WO (1) | WO2021128578A1 (zh) |
2019
- 2019-12-27 CN CN201911387827.1A patent/CN111191715A/zh active Pending
2020
- 2020-02-28 WO PCT/CN2020/077247 patent/WO2021128578A1/zh active Application Filing
- 2020-02-28 JP JP2021538344A patent/JP7097513B2/ja active Active
- 2020-02-28 KR KR1020217020203A patent/KR20210113192A/ko not_active Application Discontinuation
- 2020-04-22 TW TW109113453A patent/TWI736230B/zh active
Also Published As
Publication number | Publication date |
---|---|
CN111191715A (zh) | 2020-05-22 |
JP2022518889A (ja) | 2022-03-17 |
JP7097513B2 (ja) | 2022-07-07 |
TW202125307A (zh) | 2021-07-01 |
KR20210113192A (ko) | 2021-09-15 |
TWI736230B (zh) | 2021-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | ENP | Entry into the national phase | Ref document number: 2021538344; Country of ref document: JP; Kind code of ref document: A |
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20906061; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 31.10.2022) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 20906061; Country of ref document: EP; Kind code of ref document: A1 |