CN112149663A - RPA and AI combined image character extraction method and device and electronic equipment - Google Patents
RPA and AI combined image character extraction method and device and electronic equipment
- Publication number: CN112149663A
- Application number: CN202010886737.3A
- Authority
- CN
- China
- Prior art keywords
- detection
- image
- detection frame
- processed
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/187—Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The application provides a method and an apparatus for extracting image text by combining RPA and AI, an electronic device, and a storage medium, belonging to the technical field of image processing. The method comprises: performing target detection on an image to be processed to determine the position information of each detection frame and the type of each detection frame contained in the image to be processed, wherein the type of a detection frame is one of: character, non-character, text line beginning, and text line ending; merging the detection frames whose type is character according to the position information and type of each detection frame, so as to determine each text box contained in the image to be processed; and performing character recognition on each text box to determine the text contained in the image to be processed. With this RPA-and-AI-combined extraction method, different types of data content in an image can be determined in a single detection pass, which simplifies the image-text extraction process and improves extraction efficiency.
Description
Technical Field
The present application relates to the field of automation technologies, and in particular, to a method and an apparatus for extracting image text by combining RPA and AI, an electronic device, and a storage medium.
Background
Robotic Process Automation (RPA) is a technology in which specific robot software simulates human operations on a computer to automatically execute process tasks according to rules.
Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence.
With the development of AI, Optical Character Recognition (OCR) technology has entered various fields to help people reduce repetitive, inefficient work, especially the work of transcribing text information into a computer. Combining RPA technology with OCR technology has become a new trend in the RPA field, helping enterprises process text and image data more efficiently and improve work efficiency.
However, in the related art, for documents containing multiple types of content, such as text, tables, and red seals, multiple models are usually required to extract text from the different content types separately and sequentially, so the text-extraction process is cumbersome and inefficient.
Disclosure of Invention
The method, apparatus, electronic device, and storage medium for extracting image text by combining RPA and AI provided herein address the problem in the related art that, for documents containing multiple types of content such as text, tables, and red seals, multiple models are usually required to extract text from the different content types separately and sequentially, making the text-extraction process cumbersome and inefficient.
An embodiment of the application provides a method for extracting image characters by combining an RPA and an AI, which includes: performing target detection on an image to be processed to determine position information of each detection frame and a type of each detection frame included in the image to be processed, wherein the type of each detection frame comprises: characters, non-characters, text line beginning and text line ending; combining the detection frames with the types of characters according to the position information of each detection frame and the type of each detection frame to determine each text frame contained in the image to be processed; and performing character recognition on each text box to determine characters contained in the image to be processed.
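As a rough illustration of the claimed three-step flow (this sketch is not part of the patent; `detect`, `merge`, and `recognize` are hypothetical stand-ins for the target-detection model, the merging logic, and the character recognizer), the pipeline can be written in Python as:

```python
from dataclasses import dataclass
from typing import Callable, List

# The four detection-frame types named in the method (string names are this sketch's own).
CHAR, NON_CHAR, LINE_START, LINE_END = "char", "non-char", "line-start", "line-end"

@dataclass
class Box:
    y: float   # coordinate in the first direction (which text row)
    x: float   # offset in the second direction (position within the row)
    kind: str  # one of the four types above

def extract_text(image, detect: Callable, merge: Callable, recognize: Callable) -> List[str]:
    """One detection pass, then merging, then recognition."""
    boxes = detect(image)                                # boxes + types in a single pass
    text_boxes = merge(boxes)                            # character boxes -> text boxes
    return [recognize(image, tb) for tb in text_boxes]   # OCR each text box
```

The point of the single `detect` call is the one stated in the summary: position and type come out of one detection pass, so no second model is needed to distinguish content types.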
Optionally, in a possible implementation manner of the embodiment of the first aspect of the present application, the performing target detection on the image to be processed to determine the position information of each detection frame and the type of each detection frame included in the image to be processed specifically includes:
extracting a plurality of dimensional features of each detection frame from the image to be processed respectively;
performing attention mechanism learning on the plurality of dimensional features to acquire adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame;
and determining the type of each detection frame according to the adjacent frame information of each detection frame and the corresponding text line head and tail information.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, before the extracting, from the image to be processed, the multiple dimensional features of each detection frame respectively, the method further includes:
preprocessing the image to be processed to obtain a plurality of characteristic graphs corresponding to the image to be processed;
the extracting the multiple dimensional features of each detection frame from the image to be processed specifically includes:
and extracting a plurality of dimensional features of each detection frame from the plurality of feature maps respectively.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, the extracting, from the image to be processed, the multiple dimensional features of each detection frame respectively includes:
and performing convolution processing on the image to be processed by utilizing at least two filters to acquire at least two dimensional characteristics of each detection frame, wherein the receptive fields of the at least two filters are different.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, before performing attention mechanism learning on the multiple dimensional features to obtain adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame, the method further includes:
and splicing the plurality of dimensional features to generate the feature of each detection frame.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, before performing attention mechanism learning on the multiple dimensional features to obtain adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame, the method further includes:
and carrying out normalization processing on the plurality of dimension characteristics.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, the position information of each detection frame includes a coordinate of each detection frame in the first direction and an offset of each detection frame in the second direction, and the combining, according to the position information of each detection frame and the type of each detection frame, the detection frames of which the types are characters to determine each text frame included in the image to be processed specifically includes:
if the type of any detection frame is the beginning of a text line, acquiring candidate detection frames matched with a second coordinate in the first direction and the first coordinate from each detection frame according to the first coordinate of any detection frame in the first direction;
acquiring an adjacent detection frame adjacent to any detection frame in a second direction from the candidate detection frames according to a first offset of the detection frame in the second direction;
and if the type of the adjacent detection frame is a character, combining the adjacent detection frame with any detection frame.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, after the merging the detection frames of which the types are characters according to the position information of each detection frame and the type of each detection frame to determine each text frame included in the image to be processed, the method further includes:
performing connected domain analysis on each text box to determine a connected domain shape corresponding to each text box;
and if the shape of the connected domain corresponding to any text box is circular, determining that the text box contains a red seal.
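The circular-connected-domain test for red seals can be illustrated with a minimal pure-Python sketch (the function names, 4-connectivity, and the 15% tolerance are assumptions of this sketch, not taken from the patent). It relies on the fact that a filled circle covers about π/4 ≈ 0.785 of its bounding box, and that this bounding box is square:

```python
from collections import deque

def connected_components(mask):
    """4-connected component labelling on a binary grid (list of lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                q, pixels = deque([(sy, sx)]), []
                seen[sy][sx] = True
                while q:
                    y, x = q.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                comps.append(pixels)
    return comps

def looks_circular(pixels, tol=0.15):
    """A filled circle covers ~pi/4 (~0.785) of its square bounding box."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    h = max(ys) - min(ys) + 1
    w = max(xs) - min(xs) + 1
    if max(h, w) / min(h, w) > 1 + tol:   # a circle's bounding box is square
        return False
    fill = len(pixels) / (h * w)          # filled squares would give ~1.0
    return abs(fill - 3.14159 / 4) < tol
```

A square stamp region fails the fill-ratio test (ratio 1.0), while a round seal passes it, which is the distinction the connected-domain analysis above needs.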
Another embodiment of the present application provides an apparatus for extracting image text in combination with RPA and AI, including: the first determining module is configured to perform target detection on an image to be processed to determine position information of each detection frame and a type of each detection frame included in the image to be processed, where the type of each detection frame includes: characters, non-characters, text line beginning and text line ending; the second determining module is used for combining the detection frames with the types of characters according to the position information of each detection frame and the type of each detection frame so as to determine each text frame contained in the image to be processed; and the third determining module is used for performing character recognition on each text box so as to determine characters contained in the image to be processed.
Optionally, in a possible implementation manner of the embodiment of the first aspect of the present application, the first determining module specifically includes:
the extraction unit is used for respectively extracting a plurality of dimensional features of each detection frame from the image to be processed;
a first obtaining unit, configured to perform attention mechanism learning on the multiple dimensional features to obtain adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame;
and the determining unit is used for determining the type of each detection box according to the adjacent box information of each detection box and the corresponding text line head and tail information.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, the first determining module further includes:
the second acquisition unit is used for preprocessing the image to be processed to acquire a plurality of characteristic graphs corresponding to the image to be processed;
the extraction unit specifically comprises:
and the extracting subunit is used for extracting a plurality of dimensional features of each detection frame from the plurality of feature maps respectively.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, the extracting unit specifically includes:
and the acquisition subunit is configured to perform convolution processing on the image to be processed by using at least two filters to acquire at least two dimensional features of each detection frame, where receptive fields of the at least two filters are different.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, the first determining module further includes:
and the splicing unit is used for splicing the plurality of dimensional features to generate the feature of each detection frame.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, the first determining module further includes:
and the normalization unit is used for performing normalization processing on the plurality of dimension characteristics.
Optionally, in another possible implementation manner of the embodiment of the first aspect of the present application, the position information of each detection frame includes a coordinate of each detection frame in the first direction and an offset of each detection frame in the second direction, and the second determining module specifically includes:
a third obtaining unit, configured to, when a type of any detection frame is a start of a text line, obtain, from each detection frame, a candidate detection frame that matches a second coordinate in a first direction with a first coordinate according to the first coordinate in the first direction of the detection frame;
a fourth acquiring unit, configured to acquire, from the candidate detection frames, an adjacent detection frame adjacent to the any detection frame in the second direction according to a first offset amount of the any detection frame in the second direction;
and the merging unit is used for merging the adjacent detection frame and any detection frame when the type of the adjacent detection frame is a character.
Optionally, in yet another possible implementation manner of the embodiment of the first aspect of the present application, the apparatus further includes:
the fourth determining module is used for analyzing the connected domain of each text box to determine the shape of the connected domain corresponding to each text box;
and the fifth determining module is used for determining that a text box contains a red seal when the connected domain corresponding to that text box is circular.
An embodiment of another aspect of the present application provides an electronic device, which includes: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for extracting image text in combination with RPA and AI as described above when executing the program.
In another aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for extracting image text by combining RPA and AI as described above.
In another aspect of the present application, a computer program is provided, which is executed by a processor to implement the method for extracting image text by combining RPA and AI according to the embodiment of the present application.
The method, the device, the electronic device, the computer-readable storage medium, and the computer program for extracting image characters by combining RPA and AI provided in the embodiments of the present application perform target detection on an image to be processed to determine position information of each detection box and a type of each detection box included in the image to be processed, merge the detection boxes of which the types are characters according to the position information of each detection box and the type of each detection box to determine each text box included in the image to be processed, and perform character recognition on each text box to determine characters included in the image to be processed. Therefore, the type of each detection frame is determined while the target detection is carried out on the image to be processed, so that the data contents of different types in the image can be determined through one-time detection, the process of extracting the image characters is simplified, and the efficiency of extracting the characters is improved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of an image and text extraction method in combination with an RPA and an AI according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the positions of an image to be processed and a detection frame;
fig. 3 is a schematic flowchart of another method for extracting image and text in combination with RPA and AI according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of another method for extracting image and text in combination with RPA and AI according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an image and text extraction device combining an RPA and an AI according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the like or similar elements throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The embodiment of the application provides a method for extracting image text by combining RPA and AI, aiming at the problem in the related art that, for documents containing multiple types of content such as text, tables, and red seals, multiple models are usually required to extract text from the different content types separately and sequentially, making the text-extraction process cumbersome and inefficient.
According to the extraction method of the image characters combining the RPA and the AI, the position information of each detection frame and the type of each detection frame contained in the image to be processed are determined by performing target detection on the image to be processed, the detection frames with the types of characters are combined according to the position information of each detection frame and the type of each detection frame, each text frame contained in the image to be processed is determined, and then character recognition is performed on each text frame, so that the characters contained in the image to be processed are determined. Therefore, the type of each detection frame is determined while the target detection is carried out on the image to be processed, so that the data contents of different types in the image can be determined through one-time detection, the process of extracting the image characters is simplified, and the efficiency of extracting the characters is improved.
The following describes in detail an extraction method, an extraction device, an electronic device, a storage medium, and a computer program for extracting image text in combination with RPA and AI provided by the present application with reference to the drawings.
Fig. 1 is a schematic flow chart of an image and text extraction method combining an RPA and an AI according to an embodiment of the present disclosure.
As shown in fig. 1, the method for extracting image text by combining RPA and AI includes the following steps:

Step 101: performing target detection on the image to be processed to determine the position information of each detection frame and the type of each detection frame contained in the image to be processed.
It should be noted that RPA technology can intelligently understand the existing applications of an electronic device through the user interface and automate repetitive, rule-based operations in large batches, such as repeatedly reading mails, reading Office documents, operating databases, web pages, and client software, collecting data, and performing complex calculations, so as to generate files and reports in large batches; RPA technology can thereby greatly reduce labor costs and effectively improve office efficiency. Therefore, in a scenario of extracting image text, an RPA program can be configured in the electronic device used for extracting image text, so that the electronic device automatically extracts text from acquired images according to the rules set in the RPA program.
In practical use, the method for extracting image characters by combining RPA and AI according to the embodiment of the present application can be applied to any scene where characters in an image are extracted, and the embodiment of the present application does not limit this. For example, the method can be applied to the recording scene of paper files such as certificates and bills.
The image to be processed may refer to an image acquired by the RPA robot. For example, when the extraction method of image characters combining the RPA and the AI according to the embodiment of the present application is applied to a document information uploading scenario of an accounting department, the image to be processed may be an image of various documents, such as travel fees, transportation fees, banquet fees, and the like, which is acquired by an RPA robot and uploaded by a user through an electronic device.
The position information of the detection frame may include coordinates of each vertex of the detection frame in the image to be processed; or, when the coordinate system corresponding to the image to be processed includes the first direction and the second direction, the position information of the detection frame may also include a coordinate of the detection frame in the first direction and an offset in the second direction, so as to determine a specific position of the detection frame in the image to be processed by the position information of the detection frame.
For example, fig. 2 is a schematic diagram of the positions of an image to be processed and a detection frame. Here, O is the origin of the coordinate system corresponding to the image 20 to be processed, Y is the first direction of that coordinate system, and X is its second direction; y1 is the coordinate of the detection frame 21 in the first direction, and x1 is the offset of the detection frame 21 in the second direction, i.e. the position information of the detection frame 21 is (y1, x1).
As a possible implementation manner, an OCR algorithm based on CTPN (Connectionist Text Proposal Network; see "Detecting Text in Natural Image with Connectionist Text Proposal Network") may be adopted to perform target detection on the image to be processed, so as to determine the position information of each detection frame and the type of each detection frame contained in the image to be processed. Specifically, a Transformer may be used to replace the Long Short-Term Memory (LSTM) network in the CTPN framework to perform target detection on the image to be processed and obtain text-line association information in the image, so that not only can the position information of the detection frame corresponding to each target in the image be determined, and whether the type of each detection frame is a character, but also information such as whether each detection frame is a text line beginning or a text line ending can be determined.
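The core computation such a Transformer performs over per-frame features can be illustrated with a minimal single-head scaled dot-product attention in pure Python (a sketch of the generic mechanism only, not of the patent's actual network or its dimensions):

```python
import math

def attend(queries, keys, values):
    """Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    Each argument is a list of equal-length feature vectors, one per detection frame;
    the output for each query mixes in information from neighbouring frames."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Because every frame attends to every other frame, the learned weights can carry the adjacent-frame and line-head/line-tail information that the LSTM in the original CTPN would otherwise propagate sequentially.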
Step 102: merging the detection frames whose type is character according to the position information and type of each detection frame, so as to determine each text box contained in the image to be processed.
The text box included in the image to be processed may include a complete and independent text in the image to be processed.
In the embodiment of the application, after the position information of each detection frame and the type of each detection frame in the image to be processed are determined, the detection frames in the same row may be determined according to the position information of each detection frame, and then whether the detection frame is the beginning of a text line may be determined according to the types of the detection frames in the same row. If one detection box A is the beginning of a text line, determining the next detection box B which is positioned in the same line and adjacent to the detection box A according to the position information of the detection box A; if the type of the detection box B is character and the detection box B is the end of the text line, the detection box A and the detection box B can be combined to be used as a text box; if the type of the detection box B is a character and is not the end of a text line, the detection box A and the detection box B can be merged, the next detection box C adjacent to the detection box B is continuously determined, the steps are further repeated to determine whether the detection box C can be merged with the detection box A and the detection box B or not until the detection box D with the type of the end of the text line is traversed, and all the detection boxes between the detection box A and the detection box D can be merged to generate a text box. Thus, repeating the above steps can determine all text boxes included in the image to be processed.
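The traversal described above can be sketched as a small Python routine (the dictionary keys, the row tolerance, and the treatment of non-character frames are assumptions of this sketch, not specified by the patent):

```python
def merge_into_text_boxes(boxes, row_tol=5.0):
    """Merge character detection frames into text boxes: start at each 'line-start'
    frame, walk the frames in the same row in order of their second-direction offset,
    and stop when a 'line-end' frame is reached.
    Each box is a dict: {'y': ..., 'x': ..., 'kind': one of
    'char' | 'non-char' | 'line-start' | 'line-end'}."""
    text_boxes = []
    for start in (b for b in boxes if b["kind"] == "line-start"):
        # candidate frames: same row (first-direction coordinate within tolerance),
        # at or after the start frame in the second direction
        row = sorted((b for b in boxes
                      if abs(b["y"] - start["y"]) <= row_tol and b["x"] >= start["x"]),
                     key=lambda b: b["x"])
        line = []
        for b in row:
            if b["kind"] == "non-char":
                break                 # a non-character frame is not merged
            line.append(b)
            if b["kind"] == "line-end":
                break                 # the line-end frame closes the text box
        text_boxes.append(line)
    return text_boxes
```

Repeating this for every line-start frame yields all text boxes in the image, matching the repeated traversal described in the paragraph above.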
Step 103: performing character recognition on each text box to determine the text contained in the image to be processed.
In the embodiment of the application, after the text box included in the image to be processed is determined, the characters in each text box can be identified by adopting any character identification algorithm, so as to determine the characters corresponding to each text box, and further determine the characters included in the image to be processed.
According to the extraction method of the image characters combining the RPA and the AI, the position information of each detection frame and the type of each detection frame contained in the image to be processed are determined by performing target detection on the image to be processed, the detection frames with the types of characters are combined according to the position information of each detection frame and the type of each detection frame, each text frame contained in the image to be processed is determined, and then character recognition is performed on each text frame, so that the characters contained in the image to be processed are determined. Therefore, the type of each detection frame is determined while the target detection is carried out on the image to be processed, so that the data contents of different types in the image can be determined through one-time detection, the process of extracting the image characters is simplified, and the efficiency of extracting the characters is improved.
In a possible implementation form of the method, the multiple dimensional features of the detection frame can be extracted to determine the type of the detection frame, so that the accuracy of image character extraction is further improved.
The method for extracting image and text in combination with RPA and AI provided in the embodiment of the present application is further described with reference to fig. 3.
Fig. 3 is a schematic flowchart of another method for extracting image and text by combining an RPA and an AI according to an embodiment of the present disclosure.
As shown in fig. 3, the method for extracting image text by combining RPA and AI includes the following steps:
The detailed implementation process and principle of step 201 may refer to the detailed description of the above embodiments, and are not described herein again.
The plurality of dimensional features may be features that characterize a detection box in the image to be processed at different granularities.
In the embodiment of the application, convolution processing can be performed on the image to be processed with convolution kernels of different sizes to generate the plurality of dimensional features of each detection box, where the result of each convolution of the image to be processed serves as one dimensional feature of each detection box; alternatively, convolution processing can be performed with convolution kernels of the same size but different convolution modes, where the convolution result corresponding to one convolution mode serves as one dimensional feature of each detection box. In this way, features of the image to be processed at different granularities are extracted, which improves the accuracy of identifying the detection boxes.
As a possible implementation, the image to be processed may be subjected to convolution processing by different filters to generate different dimensional features of each detection frame. That is, in a possible implementation form of the embodiment of the present application, the step 202 may include:
and performing convolution processing on the image to be processed by utilizing at least two filters to acquire at least two dimensional characteristics of each detection frame, wherein the receptive fields of the at least two filters are different.
The filter may be a convolution kernel with a dilation (expansion) coefficient. For example, the filter may have a size of 3 × 3 and a dilation coefficient of 1, 2, 5, and so on.
As a possible implementation, n1 filters of size 3 × 3 (i.e., with a receptive field of 3 × 3) may be used to perform convolution processing on the image to be processed, generating n1 dimensional features for each detection box in the image; n2 filters of size 3 × 3 with a dilation coefficient of 2 (i.e., with a receptive field of 7 × 7) may be used to perform dilated (hole) convolution on the image, generating n2 dimensional features for each detection box; finally, n3 filters of size 3 × 3 with a dilation coefficient of 5 (i.e., with a receptive field of 19 × 19) may be used to perform dilated convolution on the image, generating n3 dimensional features for each detection box, so that n1 + n2 + n3 dimensional features can be generated for each detection box of the image to be processed.
In practice, the specific values of n1, n2 and n3 can be determined according to actual needs, which is not limited in the embodiment of the application. For example, n1 may be 256, n2 may be 128, and n3 may be 128.
Furthermore, before extracting the multi-dimensional features of the image to be processed, the image to be processed can be preprocessed to generate a feature map corresponding to the image to be processed, so that the accuracy of recognizing the image to be processed is further improved. That is, in a possible implementation form of the embodiment of the present application, before the step 202, the method may further include:
preprocessing an image to be processed to obtain a plurality of characteristic graphs corresponding to the image to be processed;
accordingly, the step 202 may include:
and extracting a plurality of dimensional features of each detection frame from the plurality of feature maps respectively.
As a possible implementation manner, DenseNet121 may be adopted to perform feature extraction on the image to be processed, so as to generate a plurality of feature maps corresponding to it. For example, 512 feature maps of size (pic_height / 8) × (pic_width / 8) may be generated, where pic_height is the height of the image to be processed and pic_width is its width. After the feature maps corresponding to the image to be processed are determined, convolution processing may be performed on each feature map so as to extract the plurality of dimensional features of each detection box from them.
Specifically, the feature maps may be convolved in the same manner as described above for the image to be processed, so as to extract the plurality of dimensional features of each detection box from the feature maps. For example, if there are 512 feature maps corresponding to the image to be processed, n1 filters of size 3 × 3 × 512 may perform convolution processing on the 512 feature maps to generate n1 dimensional features of each detection box; n2 filters of size 3 × 3 × 512 with a dilation coefficient of 2 may perform dilated convolution on the 512 feature maps to generate n2 dimensional features of each detection box; and n3 filters of size 3 × 3 × 512 with a dilation coefficient of 5 may perform dilated convolution on the 512 feature maps to generate n3 dimensional features of each detection box, so that n1 + n2 + n3 dimensional features can be generated for each detection box of the image to be processed.
In this embodiment of the present application, a target detection model including a multilayer decoder and an attention mechanism may be used to learn the plurality of dimensional features, so as to obtain the adjacent box information and the corresponding text line head and tail information of each detection box, that is, the type information of the boxes adjacent to each detection box, and whether each detection box is the beginning of a text line and whether it is the end of a text line.
As a possible implementation manner, when the target detection model identifies the image to be processed, if it detects that the image contains k detection boxes, the model may output 2k pieces of coordinate information indicating the position of each detection box (for example, the coordinate of each detection box in the first direction and its offset in the second direction), and 4k pieces of score information indicating the category of each detection box; that is, each detection box corresponds to 4 scores representing, respectively, the probability that the box is a character, the probability that it is a non-character, the probability that it is the beginning of a text line, and the probability that it is the end of a text line.
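The interpretation of the 4 scores per detection box can be sketched as follows. The score ordering [character, non-character, line-start, line-end] follows the examples in the text; the 0.5 decision threshold is an assumption introduced for this sketch.

```python
# Illustrative interpretation of the 4 scores per detection box; the score
# ordering follows the examples in the text, while the 0.5 threshold for the
# line-start/line-end flags is an assumption of this sketch.
def box_type_from_scores(scores, threshold=0.5):
    p_char, p_non_char, p_start, p_end = scores
    types = []
    # Character vs. non-character: pick the more probable class.
    types.append("char" if p_char >= p_non_char else "non_char")
    # Line-start and line-end are independent flags on top of the class.
    if p_start > threshold:
        types.append("line_start")
    if p_end > threshold:
        types.append("line_end")
    return types
```

With the example scores [0.99, 0, 1, 0] given later in the text, this yields a box typed as a character and the beginning of a text line.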
Furthermore, before the attention mechanism learning is carried out on the multiple dimension features, the multiple dimension features can be fused to identify the multiple dimension features integrally, and the accuracy of identifying the detection frame in the image to be processed is further improved. That is, in a possible implementation form of the embodiment of the present application, before step 203, the method may further include:
and splicing the plurality of dimensional features to generate the feature of each detection frame.
As a possible implementation manner, in order to enable the target detection model to recognize the image to be processed, the plurality of dimensional features of each detection box, which reflect the image at different granularities, may be spliced to generate the feature of each detection box; that is, the feature of each detection box is represented by one overall feature vector. The feature of each detection box thus contains feature information of the image to be processed at different granularities, which improves the accuracy of recognizing the detection boxes in the image.
For example, the image to be processed is subjected to convolution processing through 256 filters with the size of 3 × 3, and 256-dimensional features of each detection frame in the image to be processed are generated; performing hole convolution on the image to be processed through 128 filters with the size of 3 multiplied by 3 and the expansion coefficient of 2 to generate 128-dimensional characteristics of each detection frame in the image to be processed; the method comprises the steps of performing hole convolution on an image to be processed through 128 filters with the size of 3 x 3 and the expansion coefficient of 5 to generate 128-dimensional features of each detection frame in the image to be processed, and accordingly performing splicing processing on the generated 256-dimensional features and the two 128-dimensional features to generate 512-dimensional features of each detection frame of the image to be processed.
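The splicing of the 256- and two 128-dimensional branch features into a 512-dimensional vector, as in the example above, amounts to a simple concatenation; a minimal sketch:

```python
# Minimal sketch of the feature-splicing step: per-box feature vectors from
# the different convolution branches are concatenated into one overall vector.
def concat_features(*feature_vectors):
    merged = []
    for vec in feature_vectors:
        merged.extend(vec)
    return merged
```

For branch outputs of 256, 128 and 128 dimensions, the spliced feature is 512-dimensional, which (per the text) may then be compressed, e.g. to 256 dimensions, by a fully connected layer to reduce computation.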
As a possible implementation manner, after the multi-dimensional features are spliced, the features generated after splicing can be compressed through the full connection layer, so as to reduce the size of the spliced features, and reduce the calculation amount for identifying the spliced features. For example, if the feature after splicing is 512 dimensions, the feature after splicing can be compressed into 256 dimensions through the full connection layer.
Further, since features determined in different ways may use different metrics, features with too small magnitudes may easily be ignored, which affects the reliability of recognizing the image to be processed. That is, in a possible implementation form of the embodiment of the present application, before step 203, the method may further include:
and carrying out normalization processing on the multiple dimension characteristics.
As a possible implementation manner, after the multiple dimensional features of each detection frame in the image to be processed are determined in different manners, normalization processing may be performed on the multiple dimensional features, so that the metrics of the dimensional features are in the same numerical range, and thus, the influence of different metrics of the multiple dimensional features on the recognition accuracy of the image to be processed can be reduced.
As another possible implementation, after the multiple dimensional features are spliced, normalization processing may be performed on the spliced features.
In the embodiment of the application, after the information of the adjacent detection frame and the corresponding text line head and tail information of each detection frame in the image to be processed are determined through an attention mechanism, the type of each detection frame can be determined according to the information of the adjacent detection frame and the text line head and tail information of each detection frame.
As a possible implementation manner, the type of each detection box may also be determined only according to the text line head and tail information corresponding to that box. For example, if the 4 scores output by the target detection model for a detection box A are [0.99, 0, 1, 0], representing in order the probability that detection box A is a character, a non-character, the beginning of a text line, and the end of a text line, then the type of the detection box can be determined as a character and the beginning of a text line.
As another possible implementation manner, the type of each detection box may also be determined by the adjacent box information of each detection box and the corresponding text line head and tail information. Specifically, the type of each detection frame may be determined according to the head and tail information of the text line corresponding to each detection frame, and then the type of each detection frame is checked according to the adjacent frame information of each detection frame, so as to assist in judging whether the determined type of the detection frame is accurate, and further improve the accuracy of determining the type of the detection frame.
For example, the 4 scores output by the target detection model for a detection box A are [0.99, 0, 1, 0], representing in order the probability that detection box A is a character, a non-character, the beginning of a text line, and the end of a text line, so that the type of detection box A can be preliminarily determined as a character and the beginning of a text line. The score information of the adjacent box B located before detection box A is [0.9, 0.05, 0.1, 0.95], i.e., box B is very likely the end of a text line, and the score information of the adjacent box C located after detection box A is [0.92, 0.1, 0.1, 0.1]; this agrees with detection box A starting a new text line, so its type is confirmed as a character and the beginning of a text line.
And step 206, performing character recognition on each text box to determine characters contained in the image to be processed.
The detailed implementation process and principle of the steps 205 and 206 can refer to the detailed description of the above embodiments, and are not described herein again.
The method for extracting image characters by combining the RPA and the AI according to the embodiment of the present application includes extracting multiple dimensional features of each detection box from an image to be processed, and performing attention mechanism learning on the multiple dimensional features to obtain adjacent box information of each detection box and text line head and tail information corresponding to each detection box, then determining a type of each detection box according to the adjacent box information of each detection box and the text line head and tail information corresponding to each detection box, and further combining the detection boxes of which the types are characters according to position information of each detection box and the type of each detection box to determine each text box included in the image to be processed, and performing character recognition on each text box to determine characters included in the image to be processed. Therefore, by extracting the multi-dimensional features of the images to be processed with different granularities and performing feature representation on each detection frame in the images to be processed, the data contents of different types in the images can be determined by one-time detection, the image character extraction process is simplified, the character extraction efficiency is improved, and the character extraction accuracy and reliability are further improved.
In a possible implementation form of the embodiment of the present application, the character-type detection boxes can be merged according to their position information to determine each text box included in the image, and connected component analysis can be performed on the text boxes to identify the red seal in the image, thereby further improving the practicability and universality of image character extraction.
The method for extracting image text in combination with RPA and AI provided in the embodiment of the present application is further described below with reference to fig. 4.
Fig. 4 is a schematic flowchart of another method for extracting image and text by combining RPA and AI according to an embodiment of the present disclosure.
As shown in fig. 4, the method for extracting image text by combining RPA and AI includes the following steps:
The detailed implementation process and principle of step 301 may refer to the detailed description of the above embodiments, and are not described herein again.
The position information of the detection boxes includes the coordinate of each detection box in the first direction and the offset of each detection box in the second direction. It should be noted that the first direction may be the Y-axis direction of the coordinate system corresponding to the image to be processed, and the second direction may be the X-axis direction of that coordinate system, which is not limited in the embodiment of the present application. For a specific schematic diagram, reference may be made to fig. 2 and its description in the above embodiment, which are not repeated here.
In the embodiment of the present application, if the type of one detection box is the beginning of a text line, the detection boxes in the same independent text can be determined from the detection boxes in the same line and merged. Specifically, assuming that the type of the detection frame a is the beginning of a text line, a detection frame with a difference between the second coordinate and the first coordinate smaller than the first threshold may be determined according to the first coordinate of the detection frame a in the first direction and the second coordinate of each of the other detection frames in the image to be processed in the first direction, and the detection frame is determined as a candidate detection frame with the second coordinate matching the first coordinate, that is, the candidate detection frame and the detection frame a are in the same line.
It should be noted that, in actual use, the specific value of the first threshold may be determined according to actual needs and the height of the detection box, which is not limited in this application. For example, the first threshold may be 1/3 of the detection box height.
In the embodiment of the present application, after determining the candidate detection frame in the same line as the detection frame whose type is the beginning of the text line, an adjacent detection frame adjacent to the detection frame may be determined according to the first offset of the detection frame in the second direction and the second offset of each candidate detection frame in the second direction. Specifically, assuming that the detection box a is a detection box with a type of the beginning of a text line, if a difference between a second offset of a candidate detection box B corresponding to the detection box a and a first offset of the detection box a is less than or equal to a width of the detection box, it may be determined that the candidate detection box B is an adjacent detection box of the detection box a; otherwise, it may be determined that the candidate detection box B is not an adjacent detection box to the detection box a.
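The row-matching and adjacency tests described above can be sketched as two small predicates. The height/3 row threshold follows the example given in the text; the function names are illustrative:

```python
# Sketch of the row-matching and adjacency tests described above.
# The height/3 row threshold follows the example in the text.
def same_row(y_a, y_b, box_height):
    """Candidate test: first-direction coordinates differ by less than the first threshold."""
    return abs(y_b - y_a) < box_height / 3

def is_adjacent(offset_a, offset_b, box_width):
    """Adjacency test: second-direction offsets differ by at most one box width."""
    return abs(offset_b - offset_a) <= box_width
```

A candidate box passes `same_row` relative to the line-start box; among candidates, `is_adjacent` then selects the neighbour that may be merged if its type is character.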
And 304, if the type of the adjacent detection frame is a character, combining the adjacent detection frame with any detection frame.
In the embodiment of the present application, after determining the adjacent detection frame of the detection frame, if the type of the adjacent detection frame is a character, the adjacent detection frame and the detection frame may be merged. Thereafter, in the same manner, an adjacent detection frame adjacent to the adjacent detection frame is determined, and it is determined whether or not the merging process can be performed. It can be understood that after traversing all the detection boxes in the image to be processed by the above method, all the text boxes included in the image to be processed can be determined.
And 305, performing connected component analysis on each text box to determine the shape of the connected component corresponding to each text box.
In the embodiment of the application, connected component analysis may further be performed on the text boxes to identify the red seal included in the image to be processed. Specifically, a text box whose content is characters is usually square, so the connected component generated by analyzing such a text box is usually square as well. A red seal, in contrast, is generally circular and usually contains different types of content such as characters and images, with the characters not distributed in rows; the red seal region is therefore generally divided into a plurality of text boxes, and these text boxes share the same image characteristics. Consequently, when connected component analysis is performed on each text box included in the image to be processed, the text boxes corresponding to the red seal can be connected together to form one complete connected component, and the connected component corresponding to the red seal is usually circular.
In the embodiment of the present application, since the connected component of a text box corresponding to characters is generally square while the shape of a red seal is generally circular, after performing connected component analysis on each text box in the image to be processed, a text box whose corresponding connected component is circular can be determined as a text box containing a red seal.
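One common way to decide whether a connected component is circular rather than square is a circularity measure; the patent does not specify one, so the measure and cut-off below are assumptions:

```python
# Illustrative circular-vs-square test for a connected component, using the
# common circularity measure 4*pi*area / perimeter**2 (1.0 for a perfect
# circle, ~0.785 for a square). The 0.9 cut-off is an assumption.
import math

def looks_circular(area, perimeter, cutoff=0.9):
    circularity = 4.0 * math.pi * area / (perimeter ** 2)
    return circularity >= cutoff
```

A unit circle (area π, perimeter 2π) passes this test, while a unit square (area 1, perimeter 4) does not, matching the square-vs-circle distinction drawn above.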
The detailed implementation process and principle of the step 307 may refer to the detailed description of the above embodiments, and are not described herein again.
In the method for extracting image characters by combining RPA and AI provided in the embodiment of the present application, when a detection box is the beginning of a text line, candidate detection boxes are determined according to the first coordinate of that detection box in the first direction and the second coordinates of the other detection boxes in the first direction; according to the first offset of the detection box in the second direction and the types of the candidate detection boxes, an adjacent detection box of character type is merged with it; the red seal in the image to be processed is then determined through connected component analysis, and character recognition is performed on each text box to determine the characters included in the image. Therefore, by performing connected component analysis on the text boxes to identify the red seal contained in the image, data contents of different types in the image can be determined through a single detection, which simplifies the image character extraction process, improves extraction efficiency, and further improves the practicability and universality of image character extraction.
In order to implement the above embodiments, the present application further provides an image text extraction device combining RPA and AI.
Fig. 5 is a schematic structural diagram of an image and text extraction device combining an RPA and an AI according to an embodiment of the present disclosure.
As shown in fig. 5, the RPA and AI combined image character extracting apparatus 40 includes:
a first determining module 41, configured to perform target detection on the image to be processed to determine position information of each detection frame and a type of each detection frame included in the image to be processed, where the type of each detection frame includes: characters, non-characters, text line beginning and text line ending;
a second determining module 42, configured to combine the detection frames with the types of characters according to the position information of each detection frame and the type of each detection frame, so as to determine each text frame included in the image to be processed;
and a third determining module 43, configured to perform character recognition on each text box to determine characters included in the image to be processed.
In practical use, the extraction device for RPA and AI combined image characters provided in the embodiment of the present application may be configured in any electronic device to execute the extraction method for RPA and AI combined image characters.
The extraction device of image characters combining RPA and AI provided in the embodiment of the present application determines, by performing target detection on an image to be processed, location information of each detection box and a type of each detection box included in the image to be processed, and merges, according to the location information of each detection box and the type of each detection box, the detection boxes whose types are characters to determine each text box included in the image to be processed, and then performs character recognition on each text box to determine characters included in the image to be processed. Therefore, the type of each detection frame is determined while the target detection is carried out on the image to be processed, so that the data contents of different types in the image can be determined through one-time detection, the process of extracting the image characters is simplified, and the efficiency of extracting the characters is improved.
In a possible implementation form of the present application, the first determining module 41 specifically includes:
the extraction unit is used for respectively extracting a plurality of dimensional features of each detection frame from the image to be processed;
a first obtaining unit, configured to perform attention mechanism learning on the multiple dimensional features to obtain adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame;
and the determining unit is used for determining the type of each detection box according to the adjacent box information of each detection box and the corresponding text line head and tail information.
Further, in another possible implementation form of the present application, the first determining module 41 further includes:
the second acquisition unit is used for preprocessing the image to be processed so as to acquire a plurality of characteristic graphs corresponding to the image to be processed;
correspondingly, the extraction unit specifically includes:
and the extracting subunit is used for extracting the multiple dimensional features of each detection frame from the multiple feature maps respectively.
Further, in another possible implementation form of the present application, the extracting unit specifically includes:
and the acquisition subunit is used for performing convolution processing on the image to be processed by utilizing at least two filters so as to acquire at least two dimensional characteristics of each detection frame, wherein the receptive fields of the at least two filters are different.
Further, in another possible implementation form of the present application, the first determining module 41 further includes:
and the splicing unit is used for splicing the plurality of dimensional features to generate the feature of each detection frame.
Further, in another possible implementation form of the present application, the first determining module 41 further includes:
and the normalization unit is used for performing normalization processing on the multiple dimension characteristics.
Further, in another possible implementation form of the present application, the position information of each detection frame includes a coordinate of each detection frame in the first direction and an offset of each detection frame in the second direction; correspondingly, the second determining module 42 specifically includes:
a third acquiring unit, configured to acquire, when the type of any one of the detection frames is a start of a text line, a candidate detection frame that matches a second coordinate in the first direction with the first coordinate from each of the detection frames according to the first coordinate in the first direction of any one of the detection frames;
a fourth acquiring unit configured to acquire, from the candidate detection frames, an adjacent detection frame adjacent to any detection frame in the second direction according to a first offset amount of the any detection frame in the second direction;
and the merging unit is used for merging the adjacent detection frame with any detection frame when the type of the adjacent detection frame is a character.
Further, in another possible implementation form of the present application, the device 40 for extracting image text in combination with RPA and AI further includes:
the fourth determining module is used for analyzing the connected domain of each text box so as to determine the shape of the connected domain corresponding to each text box;
and the fifth determining module is used for determining that a red seal is contained in any text box when the shape of the connected domain corresponding to that text box is a circle.
It should be noted that the foregoing explanation of the embodiment of the method for extracting image characters by combining RPA and AI shown in fig. 1, fig. 3, and fig. 4 is also applicable to the device 40 for extracting image characters by combining RPA and AI in this embodiment, and will not be repeated here.
The extraction device for the image characters combining the RPA and the AI provided by the embodiment of the application extracts a plurality of dimensional features of each detection frame from an image to be processed respectively, and performs attention mechanism learning on the plurality of dimensional features to obtain adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame, and then determines the type of each detection frame according to the adjacent frame information of each detection frame and the text line head and tail information corresponding to each detection frame, and then combines the detection frames with the type of characters according to the position information of each detection frame and the type of each detection frame to determine each text frame contained in the image to be processed, and performs character recognition on each text frame to determine characters contained in the image to be processed. Therefore, by extracting the multi-dimensional features of the images to be processed with different granularities and performing feature representation on each detection frame in the images to be processed, the data contents of different types in the images can be determined by one-time detection, the image character extraction process is simplified, the character extraction efficiency is improved, and the character extraction accuracy and reliability are further improved.
In order to implement the above embodiments, the present application further provides an electronic device.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 6, the electronic device 200 includes:
a memory 210 and a processor 220, and a bus 230 connecting different components (including the memory 210 and the processor 220), wherein the memory 210 stores a computer program, and when the processor 220 executes the program, the method for extracting image characters by combining RPA and AI according to the embodiments of the present application is implemented.
The memory 210 may store, for example, a program/utility 280 having a set (at least one) of program modules 270, including but not limited to an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may comprise an implementation of a network environment. The program modules 270 generally perform the functions and/or methodologies of the embodiments described herein.
The processor 220 executes various functional applications and data processing by executing programs stored in the memory 210.
It should be noted that, for the implementation process and the technical principle of the electronic device of this embodiment, reference may be made to the foregoing explanation of the method for extracting image characters by combining RPA and AI in the embodiments of the present application, and details are not repeated here.
The electronic device provided in the embodiments of the present application may execute the above method for extracting image characters by combining RPA and AI: it performs target detection on an image to be processed to determine the position information and the type of each detection frame contained in the image, combines the detection frames whose type is character according to that position information and type to determine each text box contained in the image, and then performs character recognition on each text box to determine the characters contained in the image. Because the type of each detection frame is determined at the same time as target detection is performed, data contents of different types in the image can be determined in a single detection pass, which simplifies the image character extraction process and improves the efficiency of character extraction.
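The detect-merge-recognise flow summarised above can be sketched as a plain-Python skeleton. The box representation, the stand-in `detect_boxes` and `recognise_text` functions, and the merge rule (flush a run of character boxes when a line-end box is seen) are illustrative assumptions, not the patented models.

```python
def detect_boxes(image):
    # stand-in for the detector: returns (position, type) pairs, where a
    # position is (x0, y0, x1, y1) and the type is one of the four box types
    return [((0, 0, 10, 10), "line-start"), ((10, 0, 20, 10), "character"),
            ((20, 0, 30, 10), "line-end"), ((5, 20, 15, 30), "non-character")]

def merge_into_text_boxes(boxes):
    # keep only character/line boxes and merge each run, from a line-start
    # through a line-end, into one bounding text box
    text_boxes, current = [], []
    for pos, kind in boxes:
        if kind == "non-character":
            continue
        current.append(pos)
        if kind == "line-end":
            xs = [p[0] for p in current] + [p[2] for p in current]
            ys = [p[1] for p in current] + [p[3] for p in current]
            text_boxes.append((min(xs), min(ys), max(xs), max(ys)))
            current = []
    return text_boxes

def recognise_text(image, box):
    return "<ocr result>"   # stand-in for the character recognition step

image = None                # placeholder for the image to be processed
boxes = detect_boxes(image)
lines = merge_into_text_boxes(boxes)
words = [recognise_text(image, b) for b in lines]
print(lines, words)
```

The single detection pass yields both the non-character box (dropped here) and the character run, which is merged into one text box before recognition.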
In order to implement the above embodiments, the present application also proposes a computer-readable storage medium.
The computer readable storage medium stores thereon a computer program which, when executed by a processor, implements the method for extracting image characters by combining RPA and AI according to the embodiments of the present application.
In order to implement the foregoing embodiments, a further embodiment of the present application provides a computer program, which when executed by a processor, implements the method for extracting image text by combining RPA and AI according to the embodiments of the present application.
In an alternative implementation, the embodiments may be implemented in any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device, as a stand-alone software package, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external electronic device (for example, through the Internet using an Internet service provider).
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (18)
1. A method for extracting image characters by combining RPA and AI, characterized by comprising the following steps:
performing target detection on an image to be processed to determine position information of each detection frame and a type of each detection frame included in the image to be processed, wherein the type of each detection frame comprises: characters, non-characters, text line beginning and text line ending;
combining the detection frames with the types of characters according to the position information of each detection frame and the type of each detection frame to determine each text frame contained in the image to be processed;
and performing character recognition on each text box to determine characters contained in the image to be processed.
2. The method according to claim 1, wherein the performing target detection on the image to be processed to determine the position information of each detection frame and the type of each detection frame included in the image to be processed specifically includes:
extracting a plurality of dimensional features of each detection frame from the image to be processed respectively;
performing attention mechanism learning on the plurality of dimensional features to acquire adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame;
and determining the type of each detection frame according to the adjacent frame information of each detection frame and the corresponding text line head and tail information.
3. The method of claim 2, wherein before said extracting the plurality of dimensional features of each of the detection frames from the image to be processed, respectively, further comprises:
preprocessing the image to be processed to obtain a plurality of characteristic graphs corresponding to the image to be processed;
the extracting the multiple dimensional features of each detection frame from the image to be processed specifically includes:
and extracting a plurality of dimensional features of each detection frame from the plurality of feature maps respectively.
4. The method according to claim 2, wherein the extracting the dimensional features of each of the detection frames from the image to be processed respectively comprises:
and performing convolution processing on the image to be processed by utilizing at least two filters to acquire at least two dimensional characteristics of each detection frame, wherein the receptive fields of the at least two filters are different.
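The claim above can be pictured with two averaging filters of different kernel sizes, and hence different receptive fields, applied to the same input. The kernel sizes (3 and 5) and the uniform weights are assumptions for illustration; a real detector would learn its filter weights.

```python
import numpy as np

def conv2d_valid(img, k):
    # naive "valid" 2-D convolution (cross-correlation), enough for a sketch
    kh, kw = k.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (img[i:i+kh, j:j+kw] * k).sum()
    return out

img = np.arange(64.0).reshape(8, 8)                # toy image to be processed
small = conv2d_valid(img, np.ones((3, 3)) / 9)     # small receptive field
large = conv2d_valid(img, np.ones((5, 5)) / 25)    # large receptive field
print(small.shape, large.shape)
```

The two outputs give each location two dimensional features computed at different spatial scales, which is the sense in which the receptive fields of the at least two filters differ.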
5. The method of claim 2, before the learning of the attention mechanism on the plurality of dimensional features to obtain the adjacent box information of each detection box and the head and tail information of the text line corresponding to each detection box, further comprising:
and splicing the plurality of dimensional features to generate the feature of each detection frame.
6. The method according to any one of claims 2-5, further comprising, before the learning of the attention mechanism on the plurality of dimensional features to obtain the adjacent box information of each of the detection boxes and the text line head-tail information corresponding to each of the detection boxes, the following steps:
and carrying out normalization processing on the plurality of dimension characteristics.
7. The method according to any one of claims 1 to 5, wherein the position information of each of the detection boxes includes a coordinate of each of the detection boxes in a first direction and an offset of each of the detection boxes in a second direction, and the combining the detection boxes of which the types are characters according to the position information of each of the detection boxes and the type of each of the detection boxes to determine each text box included in the image to be processed specifically includes:
if the type of any detection frame is the beginning of a text line, acquiring, according to a first coordinate of the any detection frame in the first direction, candidate detection frames whose second coordinates in the first direction match the first coordinate from among the detection frames;
acquiring, from the candidate detection frames, an adjacent detection frame adjacent to the any detection frame in the second direction according to a first offset of the any detection frame in the second direction;
and if the type of the adjacent detection frame is a character, combining the adjacent detection frame with any detection frame.
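A minimal sketch of the neighbour search in the steps above, under assumed conventions: the first direction is taken as the vertical row coordinate `y`, the second direction as the horizontal coordinate `x`, and the expected offset of the neighbour is the line-start box's `x` plus its width. The dictionary layout and the matching tolerance are illustrative assumptions.

```python
def find_neighbour(start, boxes, tol=2):
    y0, x0, w0 = start["y"], start["x"], start["w"]
    # candidate boxes: those whose first-direction coordinate matches (same row)
    candidates = [b for b in boxes if abs(b["y"] - y0) <= tol and b is not start]
    # adjacent box: second-direction coordinate closest to the expected offset
    expected_x = x0 + w0
    adjacent = min(candidates, key=lambda b: abs(b["x"] - expected_x), default=None)
    # merge only if the adjacent box's type is character
    return adjacent if adjacent and adjacent["type"] == "character" else None

boxes = [
    {"y": 0,  "x": 0,  "w": 10, "type": "line-start"},
    {"y": 0,  "x": 10, "w": 10, "type": "character"},
    {"y": 30, "x": 0,  "w": 10, "type": "character"},
]
nb = find_neighbour(boxes[0], boxes)
print(nb)
```

Repeating this search from each line-start box until a line-end box is reached chains the character boxes of one row into a single text box.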
8. The method according to any one of claims 1 to 5, wherein after said combining the detection boxes of which the types are characters according to the position information of each detection box and the type of each detection box to determine each text box included in the image to be processed, further comprising:
performing connected domain analysis on each text box to determine a connected domain shape corresponding to each text box;
and if the shape of the connected domain corresponding to any text box is a circle, determining that the any text box contains a red seal.
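One common way to test whether a connected domain is roughly circular, as the steps above require, is the circularity measure 4·π·area/perimeter², which equals 1 for a perfect circle. The 0.85 threshold below is an illustrative assumption, not a value taken from the claims; a real pipeline would obtain area and perimeter from connected domain analysis of the text box region.

```python
import math

def is_circular(area, perimeter, threshold=0.85):
    # circularity is 1.0 for a circle and strictly lower for any other shape
    return 4 * math.pi * area / perimeter ** 2 >= threshold

r = 50.0
print(is_circular(math.pi * r * r, 2 * math.pi * r))   # circle
print(is_circular(100.0 * 100.0, 4 * 100.0))           # square, circularity pi/4
```

A text box whose connected domain passes this test would then be flagged as containing a red seal.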
9. An extraction device for combining RPA and AI images and texts, comprising:
the first determining module is configured to perform target detection on an image to be processed to determine position information of each detection frame and a type of each detection frame included in the image to be processed, where the type of each detection frame includes: characters, non-characters, text line beginning and text line ending;
the second determining module is used for combining the detection frames with the types of characters according to the position information of each detection frame and the type of each detection frame so as to determine each text frame contained in the image to be processed;
and the third determining module is used for performing character recognition on each text box so as to determine characters contained in the image to be processed.
10. The apparatus of claim 9, wherein the first determining module specifically comprises:
the extraction unit is used for respectively extracting a plurality of dimensional features of each detection frame from the image to be processed;
a first obtaining unit, configured to perform attention mechanism learning on the multiple dimensional features to obtain adjacent frame information of each detection frame and text line head and tail information corresponding to each detection frame;
and the determining unit is used for determining the type of each detection box according to the adjacent box information of each detection box and the corresponding text line head and tail information.
11. The apparatus of claim 10, wherein the first determining module further comprises:
the second acquisition unit is used for preprocessing the image to be processed to acquire a plurality of characteristic graphs corresponding to the image to be processed;
the extraction unit specifically comprises:
and the extracting subunit is used for extracting a plurality of dimensional features of each detection frame from the plurality of feature maps respectively.
12. The apparatus according to claim 10, wherein the extracting unit specifically comprises:
and the acquisition subunit is configured to perform convolution processing on the image to be processed by using at least two filters to acquire at least two dimensional features of each detection frame, where receptive fields of the at least two filters are different.
13. The apparatus of claim 10, wherein the first determining module further comprises:
and the splicing unit is used for splicing the plurality of dimensional features to generate the feature of each detection frame.
14. The apparatus of any of claims 10-13, wherein the first determining module further comprises:
and the normalization unit is used for performing normalization processing on the plurality of dimension characteristics.
15. The apparatus according to any one of claims 9-13, wherein the position information of each of the detection frames includes coordinates of each of the detection frames in a first direction and an offset of each of the detection frames in a second direction, and the second determining module specifically includes:
a third obtaining unit, configured to, when the type of any detection frame is the beginning of a text line, acquire, according to a first coordinate of the any detection frame in the first direction, candidate detection frames whose second coordinates in the first direction match the first coordinate from among the detection frames;
a fourth acquiring unit, configured to acquire, from the candidate detection frames, an adjacent detection frame adjacent to the any detection frame in the second direction according to a first offset amount of the any detection frame in the second direction;
and the merging unit is used for merging the adjacent detection frame and any detection frame when the type of the adjacent detection frame is a character.
16. The apparatus of any of claims 9-13, further comprising:
the fourth determining module is used for performing connected domain analysis on each text box to determine the connected domain shape corresponding to each text box;
and the fifth determining module is used for determining that the any text box contains a red seal when the shape of the connected domain corresponding to the any text box is a circle.
17. An electronic device, comprising: a memory, a processor, and a program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method for extracting image characters by combining RPA and AI according to any one of claims 1-8.
18. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method for extracting image characters by combining RPA and AI according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010886737.3A CN112149663A (en) | 2020-08-28 | 2020-08-28 | RPA and AI combined image character extraction method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112149663A true CN112149663A (en) | 2020-12-29 |
Family
ID=73890014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010886737.3A Pending CN112149663A (en) | 2020-08-28 | 2020-08-28 | RPA and AI combined image character extraction method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112149663A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105809164A (en) * | 2016-03-11 | 2016-07-27 | 北京旷视科技有限公司 | Character identification method and device |
CN109902622A (en) * | 2019-02-26 | 2019-06-18 | 中国科学院重庆绿色智能技术研究院 | A kind of text detection recognition methods for boarding pass information verifying |
US20190272438A1 (en) * | 2018-01-30 | 2019-09-05 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for detecting text |
CN110287952A (en) * | 2019-07-01 | 2019-09-27 | 中科软科技股份有限公司 | A kind of recognition methods and system for tieing up sonagram piece character |
CN110442744A (en) * | 2019-08-09 | 2019-11-12 | 泰康保险集团股份有限公司 | Extract method, apparatus, electronic equipment and the readable medium of target information in image |
WO2020010547A1 (en) * | 2018-07-11 | 2020-01-16 | 深圳前海达闼云端智能科技有限公司 | Character identification method and apparatus, and storage medium and electronic device |
2020-08-28: Application CN202010886737.3A filed; published as CN112149663A (en); legal status: Pending
Non-Patent Citations (2)
Title |
---|
LIANG YAO et al.: "Graph convolutional networks for text classification", Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, 17 July 2019 (2019-07-17), pages 7370 - 7377 *
XIE Jinbao et al.: "Multi-feature fusion Chinese text classification based on attention neural networks for semantic understanding", Journal of Electronics & Information Technology, vol. 40, no. 5, 31 May 2018 (2018-05-31), pages 1258 - 1265 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112651394A (en) * | 2020-12-31 | 2021-04-13 | 北京一起教育科技有限责任公司 | Image detection method and device and electronic equipment |
CN112651394B (en) * | 2020-12-31 | 2023-11-14 | 北京一起教育科技有限责任公司 | Image detection method and device and electronic equipment |
CN112766892A (en) * | 2021-01-11 | 2021-05-07 | 北京来也网络科技有限公司 | Method and device for combining fund ratio of RPA and AI and electronic equipment |
CN112926420A (en) * | 2021-02-09 | 2021-06-08 | 海信视像科技股份有限公司 | Display device and menu character recognition method |
CN112926420B (en) * | 2021-02-09 | 2022-11-08 | 海信视像科技股份有限公司 | Display device and menu character recognition method |
WO2022247823A1 (en) * | 2021-05-25 | 2022-12-01 | 阿里巴巴(中国)有限公司 | Image detection method, and device and storage medium |
CN113778303A (en) * | 2021-08-23 | 2021-12-10 | 深圳价值在线信息科技股份有限公司 | Character extraction method and device and computer readable storage medium |
CN114419636A (en) * | 2022-01-10 | 2022-04-29 | 北京百度网讯科技有限公司 | Text recognition method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10853638B2 (en) | System and method for extracting structured information from image documents | |
CN112149663A (en) | RPA and AI combined image character extraction method and device and electronic equipment | |
CN109858555B (en) | Image-based data processing method, device, equipment and readable storage medium | |
CN108564035B (en) | Method and system for identifying information recorded on document | |
CN110148084B (en) | Method, device, equipment and storage medium for reconstructing 3D model from 2D image | |
CN110232340B (en) | Method and device for establishing video classification model and video classification | |
CN110084289B (en) | Image annotation method and device, electronic equipment and storage medium | |
CN109408829B (en) | Method, device, equipment and medium for determining readability of article | |
CN109947924B (en) | Dialogue system training data construction method and device, electronic equipment and storage medium | |
CN110188766B (en) | Image main target detection method and device based on convolutional neural network | |
CN110826494A (en) | Method and device for evaluating quality of labeled data, computer equipment and storage medium | |
CN112509661B (en) | Methods, computing devices, and media for identifying physical examination reports | |
CN112016638A (en) | Method, device and equipment for identifying steel bar cluster and storage medium | |
JP2022185143A (en) | Text detection method, and text recognition method and device | |
CN111401309A (en) | CNN training and remote sensing image target identification method based on wavelet transformation | |
CN111563429A (en) | Drawing verification method and device, electronic equipment and storage medium | |
CN111008624A (en) | Optical character recognition method and method for generating training sample for optical character recognition | |
CN110737770B (en) | Text data sensitivity identification method and device, electronic equipment and storage medium | |
CN110796108A (en) | Method, device and equipment for detecting face quality and storage medium | |
CN111414889B (en) | Financial statement identification method and device based on character identification | |
CN113762455A (en) | Detection model training method, single character detection method, device, equipment and medium | |
CN111898528A (en) | Data processing method and device, computer readable medium and electronic equipment | |
CN113807416B (en) | Model training method and device, electronic equipment and storage medium | |
CN115761778A (en) | Document reconstruction method, device, equipment and storage medium | |
CN111428724B (en) | Examination paper handwriting statistics method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||