CN115546796A - Non-contact data acquisition method and system based on visual computation


Info

Publication number
CN115546796A
Authority
CN
China
Prior art keywords
text
image
line
character
data acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211155147.9A
Other languages
Chinese (zh)
Inventor
闵圣捷
方波
饶定远
唐雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ceic Metadata Technology Co ltd
Original Assignee
Ceic Metadata Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ceic Metadata Technology Co ltd
Priority to CN202211155147.9A
Publication of CN115546796A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Abstract

The invention discloses a non-contact data acquisition method and system based on visual computation, wherein the acquisition and identification method comprises the following steps: S1: capturing contents in a scene, setting a capturing frame to correct the captured contents, and acquiring picture data of better quality; S2: preprocessing the acquired image data; S3: extracting connected components of the image data for iterative processing; S4: performing character recognition with an OCR model and outputting the recognition result. The invention adds an image acquisition flow and an image data processing flow; these two flows improve the quality of image processing and benefit subsequent image recognition. The invention also provides an effective iterative model for processing the data used to train the classifier, so that the classification performance of the classifier is enhanced.

Description

Non-contact data acquisition method and system based on visual computation
Technical Field
The invention relates to the field of practical application of visual computation, in particular to a non-contact data acquisition method and system based on visual computation.
Background
Character recognition is a vital part of OCR systems, and it mainly consists of two methods: pattern recognition and feature extraction. The principle of pattern recognition is to analyze each character as a whole and compare it with a matrix of characters stored in software; the input character is matched to a stored character by comparing their similarity in shape and length ratio. Feature extraction is a more complex and more versatile character recognition method that more closely mimics how the human brain processes text. With feature extraction, each character is decomposed into independent features, and straight lines, curves, angles and intersection points are identified; these character features are then matched to the corresponding letters. However, many factors, such as the sharpness and shape of the collected image, the accuracy of feature extraction, and the selection and training of the classifier, affect the accuracy of character recognition.
In view of the above disadvantages of conventional recognition models, researchers and experts have proposed various algorithms to improve the OCR model, but shortcomings remain: (1) only the OCR model itself is improved, without improving the whole process; (2) although neural networks and similar algorithms and models are mostly used for the classifier, the influence of the recognition result on the model is not considered; (3) the recognition result often suffers from missing words or non-fluent semantics.
In summary, in order to ensure that the system can acquire image data with high quality without damage, and the character recognition model of the system can have high accuracy and good processing capability for various complicated parts of speech and continuous sentences, it is necessary to design a non-contact data acquisition method based on visual computation to solve the above problems.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a non-contact data acquisition method based on visual computation, so as to overcome the limitations of traditional contact-based data acquisition, namely its high cost and the damage it causes to the acquired data, and to solve the problems encountered in the image data acquisition process using the theory of visual computation.
One of the purposes of the invention is realized by the following technical scheme:
a method of contactless data acquisition based on visual computation, the method comprising:
acquiring image data in a non-contact mode;
processing the acquired image data;
training a classifier by an iterative model;
and performing character recognition by using the OCR model, and outputting a recognition result.
Further, the processing of the collected image data is specifically as follows;
step S201: rotating the image; adjusting the direction of the text area to be identified in the image so that its text is parallel to the image boundary, and setting a straight line as the left margin of the text area, the slope of which defines the page rotation angle;
step S202: scanning the image; scanning each horizontal line from left to right and storing the first black pixel appearing on each scan line in an array D = {D1, D2, …, Dn}, where n denotes the nth scan line and m denotes the last column of the text area; e.g. the first black pixel on the first scan line is at (x1, y1), and, continuing in turn, the first black pixel on the nth scan line is at (xn, y1);
D1 = ((x1, y1), (x1, y2), …, (x1, ym));
D2 = ((x2, y1), (x2, y2), …, (x2, ym));
………
Dn = ((xn, y1), (xn, y2), …, (xn, ym));
wherein ym denotes the last column of the text area and xn denotes its last row;
step S203: viewing-angle correction: when the position of the image acquisition equipment is not perpendicular to the screen, the obtained area is not a rectangle, which affects the fitting of OCR; the specific processing method is as follows:
define an image containing the entire page; first define the four vertices of the quadrangle F1, F2, F3, F4, and then define the boundaries of the identified content as X = (ΔF12 + ΔF34)/2 and Y = (ΔF13 + ΔF24)/2;
wherein F1 = (x1, y1), F2 = (x1, ym), F3 = (xn, y1), F4 = (xn, ym), ΔF12 = F2 - F1, ΔF34 = F4 - F3, ΔF13 = F3 - F1, ΔF24 = F4 - F2;
performing an internal interpolation operation on the colors of the pixels of all point sets of the image containing the text region; by calculating the coordinate differences of the displaced point pairs F1 and F2, F3 and F4, F1 and F3, F2 and F4 between the view-angle-corrected image and the original image, the angular deviation of the corrected image relative to the original image can finally be calculated;
step S204: if the recognized text is curved, a nonlinear image transformation is carried out to correct the upper and lower boundaries of the previously recognized text region; the left and right boundaries are set as straight lines, and a sample line with four nodes is set to interpolate the curve; after the upper and lower sample curves are marked, the correction operation is carried out; each point of a sample curve is described by a parameter g ∈ [0; 1], and for a specific value of g a straight line between the upper and lower sample curves is defined; the length between the starting points (xi(0), yi(0)) and (xg(0), yg(0)) is the height of the corrected image; each row (xi(t), yi(t)) is transformed into a vertical line using an interpolation formula, and finally a rectangular image is created.
Further, the iterative model training classifier specifically includes:
step S301: constructing a two-step iteration process, and defining two rules for generating the domain relation graph: the method comprises relaxed domain relation rules and strict domain relation rules;
step S302: in the first iteration process, strict domain relation rules are applied for part-of-speech tagging, and connected components are classified into text connected components and non-text connected components; a "text connected component" refers to continuous text, i.e. a connected region of black pixels, and a "non-text connected component" refers to a connected region of white pixels that does not belong to text; the text connected components are assigned to rectangular areas, which are then put into the separation module of the OCR model, where the OCR module assigns each rectangular area a weight coefficient for being a text line; areas with high weight coefficients are identified as "text lines", the corresponding connected components as "text connected components", and the rest as "uncertain connected components";
step S303: in the second iteration process, a domain graph is constructed using the relaxed domain relation rule and part-of-speech tagging is performed, and the weight threshold for accepting a text line is set to a lower level; finally, the estimated text region, namely the iteration result and the recognized text region, is obtained;
step S304: and extracting connected components after the second iteration process, wherein the text and the background in the scene have distinguishable colors and brightness, and the characters in a certain text line also have similar colors, the text region and the background are separated according to the different color attributes of the text region and the background region, then extracting the connected components by adopting image binarization and connected text filtering, and finally putting the extracted connected components into a classifier for training and recognition.
Further, the relaxed domain relation rule requires that the graph contain all node pairs with closed (adjacent) pixels, which can connect background components to a greater extent; the strict domain relation rule additionally requires that each pair of adjacent connected components have overlapping horizontal projections, which prevents regions from crossing text lines.
Further, character recognition is performed by using the OCR model, and the specific step of outputting the recognition result includes:
step S401: image preprocessing: including but not limited to binarization, image enhancement, noise processing, and image filtering;
step S402: and (3) carrying out feature extraction:
step S403: character recognition, namely putting the extracted character features into a classifier for classification and comparison to recognize characters; and finally outputting a character recognition result.
Further, the feature extraction of step S402 includes:
dividing a text area, and dividing a text content area and a background area of the image through analysis of pixel values;
extracting each line of text, horizontally projecting the image in a text area, starting from the first black pixel point on the left side of each line to the last black pixel point on the right side, extracting the text content of one line and storing the text content in a dictionary;
character cutting, namely performing vertical projection on each line of text content, namely each line of text in a dictionary, finding the left and right boundaries of each character, and performing single character cutting on each line of text;
and (3) feature extraction, wherein the position features of black or white pixels in each character image are extracted and stored in an array A, and the structural feature vectors of each black pixel in the character images are extracted and stored in an array B, wherein the structural features comprise singular points, straight lines, curves, angle intersection points, convex arcs and strokes.
Further, step S403 includes:
extracting features, namely extracting feature arrays A and B obtained in the step S402 in sequence;
merging the characteristics, namely fusing the characteristics A and B extracted from each character;
classifying by a classifier, namely putting the combined features of each character into a trained classifier with a machine learning algorithm, namely combining a conditional random field and a BP neural network, and carrying out feature classification comparison;
finally, outputting a character recognition result;
further, a post-processing step of step S404 is included, and step S404 specifically includes:
checking, correcting and supplementing grammar and information of the recognized text content in relation to the context, so that the text content is smooth;
converting the extracted text data into a computerized file, and outputting a recognition result.
Further, the step of acquiring image data specifically includes:
constructing a position framework of the camera, so that the visual angle of the camera can completely contain the content in the scene range;
locating an identified target in a scene;
setting a monitoring frame to correct the content, pressing a trigger button, then capturing pictures and successfully acquiring image data;
and correcting the image frame: when the position of the image acquisition equipment is not perpendicular to the screen, the obtained area is not a rectangle, and a quadrangle containing the text range is marked by manual correction.
The invention also provides a non-contact data acquisition system based on visual computation for implementing the above method; the system locates a target in a scene and captures an image, then processes the image data, extracts the text information of the image data and finally outputs the recognition result, and comprises a data acquisition module, a recognition module and an iteration module;
the data acquisition module performs rotary moving focusing by using a plurality of cameras, positions contents to be identified in a scene, performs high-definition image capture and is used for acquiring image resources of the whole system; the plurality of cameras can acquire and capture contents needing to be identified in a scene from different angles, and image data of the identified contents can be acquired more efficiently and accurately;
the recognition module is used for extracting and recognizing characters after acquiring a high-definition image, and comprises an image preprocessing submodule, namely, image data is subjected to primary processing so as to facilitate subsequent work; the characteristic extraction submodule extracts the characteristics of the characters through characteristic extraction; the character recognition submodule is used for recognizing and comparing the extracted features through character recognition to obtain a text recognition result, and finally correcting the recognition result and converting the corrected result into a computer file for output;
the iteration module is used for extracting the connected components of the text characters, extracting the features of these connected components, putting the extracted features into a classifier combining a conditional random field algorithm and a BP neural network, and training the classifier.
The invention has the beneficial effects that:
(1) The invention provides the method for increasing the image acquisition and image data processing flows, and the two flows are provided independently, so that the quality of image processing can be enhanced, and the method is beneficial to subsequent identification;
(2) The invention provides an effective iterative model to process the data trained by the classifier, so that the classification performance of the classifier can be enhanced; the difference from other recognition models is characterized in that other models are not added with a training module to train the recognition models. In addition, the training model of the invention combines the conditional random field algorithm and the BP neural network.
(3) The classifier combining the conditional random field algorithm and the BP neural network can effectively improve the accuracy of the model for character recognition and can also effectively reverse the recognition result to the optimization of the parameters;
(4) The invention provides an effective characteristic extraction and characteristic fusion technology, and during characteristic extraction, the positions of character pixels are considered, and various characteristics of straight lines, bends, cross points and characteristic vectors of character characteristics are considered, and the characteristics are put into an identification model, so that the performance of the identification model is improved;
(5) In the invention, during the correction of the recognition result by the post-processing module, the factors such as the part of speech, the context, the original image format and the like are fully considered, so that the effective correction of the recognition result is realized, and the final recognition result is accurate and smooth in content and more neat in format.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the present invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings, in which:
fig. 1 is a flow chart of a non-contact data acquisition method based on visual computation in the present invention.
Fig. 2 is a schematic system structure diagram of a non-contact data acquisition method based on visual computation in the present invention.
Fig. 3 is an iterative flow diagram in the present invention.
FIG. 4 is a schematic diagram of the OCR model structure in the present invention.
Fig. 5 is a schematic diagram of a definition structure of a page image in the present invention.
Detailed Description
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be understood that the preferred embodiments are illustrative of the invention only and are not limiting upon the scope of the invention.
As shown in the figure, the non-contact data acquisition method based on the vision calculation comprises the following steps:
acquiring image data in a non-contact mode;
and processing the acquired image data.
And (5) training a classifier by using the iterative model.
And performing character recognition by adopting an OCR model, and outputting a recognition result.
Specifically, the invention collects data in a non-contact way, which reduces labor cost, avoids damage to the data during collection, and lets the system perform iterative learning at lower cost. For character recognition and classification of the text areas in the high-quality image data collected without contact, a classifier combining a back-propagation neural network with a conditional random field algorithm is adopted, so the invention recognizes text with high precision; by feeding the recognition result back as an input for parameter tuning, the recognition precision is continuously optimized, so that the acquisition and recognition system runs in an increasingly virtuous cycle.
It should be noted that any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
Each step will be described in detail below, wherein the image data acquisition process in step S1 specifically includes:
step S101: constructing a position framework of the camera, so that the visual angle of the camera can completely contain the content in the scene range;
s102, positioning an identification target in a scene;
s103, setting a monitoring frame to correct the content, pressing a trigger button, then capturing pictures and successfully acquiring image data;
step S104: correcting the image frame; when the position of the image acquisition equipment (a camera) is not perpendicular to the screen, the obtained area is not a rectangle, so manual correction is performed and a quadrangle containing the text range is marked.
Specifically, the image data collection subsystem is used for carrying out remote data collection by methods such as a camera, so that damage to collected data can be avoided, collection of high-quality image data is guaranteed, and labor cost consumption is reduced due to mechanical automation of the subsystem.
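As a concrete illustration of this acquisition flow, the following is a minimal Python/OpenCV sketch of steps S101-S104 (not the patented implementation); the camera index, window name, space-bar trigger and mouse-driven corner marking are assumptions introduced for the example.
```python
import cv2

corners = []  # the four manually marked vertices of the text quadrangle (S104)

def mark_corner(event, x, y, flags, param):
    # Collect up to four clicks as the quadrangle containing the text range.
    if event == cv2.EVENT_LBUTTONDOWN and len(corners) < 4:
        corners.append((x, y))

cap = cv2.VideoCapture(0)              # S101: one camera of the position framework
cv2.namedWindow("monitor")             # S103: the monitoring frame
cv2.setMouseCallback("monitor", mark_corner)

while True:
    ok, frame = cap.read()             # S102: observe the scene to locate the target
    if not ok:
        break
    for p in corners:
        cv2.circle(frame, p, 4, (0, 0, 255), -1)
    cv2.imshow("monitor", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord(" ") and len(corners) == 4:   # "trigger button": capture the picture
        cv2.imwrite("capture.png", frame)
        break

cap.release()
cv2.destroyAllWindows()
```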
In addition, the image data processing in step S2 specifically includes:
step S201: rotating the image; adjusting the direction of the text region to be identified in the image so that the text is parallel to the image boundary, and setting a straight line as the left margin of the text region, the slope of which defines the page rotation angle;
step S202: scanning the image; scanning each horizontal line from left to right and storing the first black pixel appearing on each scan line in an array D = {D1, D2, …, Dn}, where n denotes the nth scan line and m denotes the last column of the text area; for example, the first black pixel on the first scan line is at (x1, y1), and, continuing in turn, the first black pixel on the nth scan line is at (xn, y1);
D1 = ((x1, y1), (x1, y2), …, (x1, ym));
D2 = ((x2, y1), (x2, y2), …, (x2, ym));
………
Dn = ((xn, y1), (xn, y2), …, (xn, ym));
wherein ym denotes the last column of the text area and xn denotes its last row.
Rotating the image before scanning makes it easier to find the boundary points in the image. The boundary lines of the text regions in the image are usually vertical, so when the text region of the image is scanned, the points that make the boundary line most vertical are selected and stored in the array D;
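Under the assumption that the page has already been binarized (black text, value 0, on a white background, value 255), steps S201-S202 can be sketched as follows: the array D of first black pixels per scan line is used to estimate the slope of the left margin, and that slope defines the rotation angle.
```python
import cv2
import numpy as np

def deskew(binary):
    rows, cols = [], []
    for i, line in enumerate(binary):      # S202: scan each horizontal line left to right
        black = np.flatnonzero(line == 0)
        if black.size:                     # store the first black pixel of this scan line
            rows.append(i)
            cols.append(black[0])
    # S201: fit a straight line to the left margin; its slope gives the rotation angle
    slope, _ = np.polyfit(rows, cols, 1)
    angle = np.degrees(np.arctan(slope))
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), -angle, 1.0)
    return cv2.warpAffine(binary, M, (w, h), borderValue=255)
```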
step S203: viewing-angle correction: when the position of the image acquisition equipment is not perpendicular to the screen, the obtained area is not a rectangle, which affects the fitting of OCR; the specific processing method is as follows:
define an image containing the entire page, as shown in fig. 5; first define the four vertices of the quadrangle F1, F2, F3, F4, and then define the boundaries of the identified content as X = (ΔF12 + ΔF34)/2 and Y = (ΔF13 + ΔF24)/2;
wherein F1 = (x1, y1), F2 = (x1, ym), F3 = (xn, y1), F4 = (xn, ym), ΔF12 = F2 - F1, ΔF34 = F4 - F3, ΔF13 = F3 - F1, ΔF24 = F4 - F2;
performing an internal interpolation operation on the colors of the pixels of all point sets of the image containing the text region; by calculating the coordinate differences of the displaced point pairs F1 and F2, F3 and F4, F1 and F3, F2 and F4 between the view-angle-corrected image and the original image, the angular deviation of the corrected image relative to the original image can finally be calculated;
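A sketch of this viewing-angle correction: the four vertices F1-F4 are mapped onto an axis-aligned rectangle whose width and height follow the X and Y definitions above. Using cv2.getPerspectiveTransform, which interpolates pixel colors while warping, is an assumption of this example; the patent only specifies an internal interpolation over the pixel colors.
```python
import cv2
import numpy as np

def correct_view(img, F1, F2, F3, F4):
    # F1 top-left, F2 top-right, F3 bottom-left, F4 bottom-right, as defined above.
    src = np.float32([F1, F2, F3, F4])
    X = (np.linalg.norm(np.subtract(F2, F1)) + np.linalg.norm(np.subtract(F4, F3))) / 2
    Y = (np.linalg.norm(np.subtract(F3, F1)) + np.linalg.norm(np.subtract(F4, F2))) / 2
    w, h = int(round(X)), int(round(Y))
    dst = np.float32([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])
    M = cv2.getPerspectiveTransform(src, dst)  # interior pixel colors are interpolated
    return cv2.warpPerspective(img, M, (w, h))
```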
step S204: nonlinear image transformation; if the recognized text paper is not flat and the characters are curved, the upper and lower boundaries of the previously recognized text area are corrected; the left and right boundaries are set as straight lines, and a sample line with four nodes is set to interpolate the curve. After the upper and lower sample curves are plotted, the calibration operation is performed. Each point of a sample curve is described by a parameter g ∈ [0; 1]; for a specific value of g, a straight line is drawn between the upper and lower sample curves. The length between the starting points (xi(0), yi(0)) and (xg(0), yg(0)) is the height of the new image after correction; each row (xi(t), yi(t)) is transformed into a vertical line using an interpolation formula, and finally a rectangular image is created.
The acquired image data is subjected to preliminary processing such as image rotation, visual angle correction and nonlinear image transformation, so that the image data better meets the requirements of an identification model on the image, and the identification error caused by irregular image input data is reduced.
In this embodiment, the specific process of the classifier with the iterative training function in step S3 is as follows:
step S301: constructing a two-step iteration process, and defining two rules for generating the domain relation graph:
(1) The relaxed domain relation rule requires that the graph contain all node pairs with closed (adjacent) pixels, which can connect background components to a greater extent;
(2) The strict domain relation rule additionally requires that each pair of adjacent connected components have overlapping horizontal projections, which prevents regions from crossing text lines.
Step S302: in the first iteration process, strict domain relation rules are applied to part-of-speech tagging, and connected components are classified into text connected components and non-text connected components. The text connected components are distributed to rectangular areas and then put into a separation module of an OCR model, and the OCR module adds weight coefficients for the positions of each rectangular area as text lines. The region with the high weight coefficient is identified as "text line", the corresponding connected component is identified as "text connected component", and the others are identified as "uncertain connected component".
Step S303: in the second iteration process, the domain graph constructed by the relaxed domain relation rule is utilized and part of speech tagging is performed, the weight threshold value serving as the text line is set to be a lower level, the lower level is the effect of maximally including the text region in the image region, namely, blank gaps between texts are also included in the range, and therefore the text connected components can be retained to the maximum extent. And finally, obtaining the estimated text region, namely the iteration result and the identified text region.
Step S304: and (3) extracting connected components after the second iteration process, wherein the text and the background in the scene have distinguishable colors and brightness, and the characters in a certain text line also have similar colors, separating the text region from the background according to the different color attributes of the text region and the background region, extracting the connected components by image binarization and connected text filtering, and finally putting the extracted connected components into a classifier for training and recognition.
Specifically, a classifier with iterative training is arranged, the feature connected components extracted from the image data are processed, the final character recognition result is subjected to back propagation iteration, and parameters and a model of the classifier are trained, so that the recognition accuracy of the system is improved.
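A minimal sketch of the connected-component extraction of step S304, assuming dark text on a lighter background; the global Otsu threshold and the area filter bounds are assumptions standing in for the patent's color-attribute separation and connected-text filtering.
```python
import cv2

def extract_components(gray, min_area=20, max_area=5000):
    # Separate text from background by their different brightness attributes.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    components = []
    for i in range(1, n):                    # label 0 is the background
        x, y, w, h, area = stats[i]
        if min_area <= area <= max_area:     # "connected text filtering" by size
            components.append(binary[y:y + h, x:x + w])
    return components                        # fed to the classifier for training
```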
It should be noted that the specific flow of the OCR recognition model in step S4 includes:
step S401: image preprocessing:
(1) Binarization: and carrying out binarization processing on the rectangular image by adopting local self-adaptive binarization.
(2) Image enhancement: and (3) carrying out appropriate speckle removal on the black and white image after binarization processing, enhancing the brightness of local black pixels, and enhancing the contrast of the background and the text.
(3) Noise processing: and (3) carrying out denoising model training by adopting a random forest algorithm model in machine learning, then constructing a sliding window on the basis of a model algorithm, and carrying out denoising treatment on each pixel in the image.
(4) Image filtering: and performing noise reduction smoothing, sharpening and pixel edge smoothing by adopting a bilateral filtering method.
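The four preprocessing operations of step S401 can be sketched as below; the random-forest denoiser of item (3) is replaced by OpenCV's non-local-means denoiser as a clearly labeled stand-in, since training a pixel-wise random forest is beyond a short example.
```python
import cv2

def preprocess(gray):
    # (4) Bilateral filtering: noise-reduction smoothing that preserves pixel edges.
    filtered = cv2.bilateralFilter(gray, 9, 75, 75)
    # (3) Noise processing: stand-in for the patent's random-forest denoising model.
    denoised = cv2.fastNlMeansDenoising(filtered, None, 10)
    # (2) Image enhancement: raise the contrast between background and text.
    enhanced = cv2.equalizeHist(denoised)
    # (1) Locally adaptive binarization of the rectangular image (block size 31, C 15).
    return cv2.adaptiveThreshold(enhanced, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 31, 15)
```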
Step S402: feature extraction:
(1) And dividing the text area, namely dividing the text content area and the background area of the image through analysis of pixel values.
(2) Extracting each line of text, horizontally projecting the image in a text area, starting from the first black pixel point on the left side of each line to the last black pixel point on the right side, extracting the text content of one line, and storing the text content in a dictionary.
(3) And character cutting, namely performing vertical projection on each line of text content, namely each line of text in the dictionary, finding the left and right boundaries of each character, and performing single character cutting on each line of text.
(4) Feature extraction: the position features of black or white pixels in each character image are extracted and stored in array A, and the structural feature vectors of each black pixel in the character image, including singular points, straight lines, curves, angle intersection points, convex arcs and strokes, are extracted and stored in array B.
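A minimal sketch of the projection-based segmentation in items (2) and (3) of step S402, assuming a binarized page with black text (pixel value 0): lines are found by horizontal projection and single characters by vertical projection within each line.
```python
import numpy as np

def split_lines(binary):
    # Horizontal projection: consecutive rows containing black pixels form one line.
    ink = (binary == 0).sum(axis=1)
    lines, start = {}, None
    for i, count in enumerate(ink):
        if count and start is None:
            start = i
        elif not count and start is not None:
            lines[len(lines)] = binary[start:i]   # store each line in a dictionary
            start = None
    return lines

def split_chars(line):
    # Vertical projection: gaps in the column ink profile give character boundaries.
    ink = (line == 0).sum(axis=0)
    chars, start = [], None
    for j, count in enumerate(ink):
        if count and start is None:
            start = j
        elif not count and start is not None:
            chars.append(line[:, start:j])        # cut out one single character
            start = None
    return chars
```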
Step S403: character recognition, namely putting the extracted character features into a classifier for classification and comparison to recognize characters;
(1) And (4) extracting features, namely extracting feature arrays A and B obtained in the step S402 in sequence.
(2) And combining the characteristics, namely fusing the characteristics A and B extracted from each character.
(3) And (4) classifying by a classifier, namely putting the combined features of each character into a trained classifier with a machine learning algorithm, namely combining a conditional random field and a BP neural network, and carrying out feature classification comparison.
(4) And finally, outputting a character recognition result.
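A sketch of the feature fusion and classification of step S403: arrays A and B are fused by concatenation, and a BP (back-propagation) neural network stands in for the combined conditional-random-field/BP classifier, whose exact coupling the patent does not specify; scikit-learn's MLPClassifier is an assumption of this example.
```python
import numpy as np
from sklearn.neural_network import MLPClassifier   # a BP (back-propagation) network

def fuse(A, B):
    # Merge the position features (A) and structural features (B) of one character.
    return np.concatenate([np.ravel(A), np.ravel(B)])

def train_classifier(features_A, features_B, labels):
    X = np.stack([fuse(a, b) for a, b in zip(features_A, features_B)])
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
    clf.fit(X, labels)                             # train on fused character features
    return clf

def recognize(clf, A, B):
    # Classify one character by comparing its fused features with the learned classes.
    return clf.predict(fuse(A, B).reshape(1, -1))[0]
```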
Step S404: post-processing, another error-correction technique that ensures high OCR accuracy; accuracy can be further improved if the output is limited by a dictionary.
(1) The context is connected to check, modify and supplement the recognized text content in grammar and information, so that the text content is smooth.
(2) Converting the extracted text data into a computerized file, and outputting a recognition result.
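A sketch of the dictionary-limited correction of step S404, using the standard library's difflib to snap each recognized word to its closest dictionary entry; a real post-processor would also weigh the surrounding context, as item (1) describes.
```python
import difflib

def postprocess(words, dictionary):
    corrected = []
    for w in words:
        # Limit the output by the dictionary: take the closest match if one exists.
        match = difflib.get_close_matches(w, dictionary, n=1, cutoff=0.6)
        corrected.append(match[0] if match else w)
    return " ".join(corrected)        # content of the computerized output file

print(postprocess(["recogn1tion", "resu1t"], ["recognition", "result", "model"]))
```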
Specifically, the feature extraction and classifier in the recognition process of the OCR model are improved, the feature extraction process is improved and optimized, the features of the characters are extracted more accurately, the trained classifier is used for classifying and recognizing the extracted character features, and the system is operated more accurately and efficiently.
In order to realize the non-contact data acquisition method based on the visual computation, the invention also provides a non-contact data acquisition system based on the visual computation, the system is used for positioning a target in a scene and capturing an image, then processing the image data, extracting text information of the image data, and finally outputting an identification result, and the system comprises a data acquisition module, an identification module and an iteration module;
the data acquisition module performs rotary moving focusing by using a plurality of cameras, positions contents to be identified in a scene, performs high-definition image capture and is used for acquiring image resources of the whole system; the plurality of cameras can acquire and capture contents needing to be identified in a scene from different angles, and image data of the identified contents can be acquired more efficiently and accurately;
the recognition module is used for extracting and recognizing characters after acquiring a high-definition image, and comprises an image preprocessing submodule, namely, image data is subjected to primary processing so as to facilitate subsequent work; the characteristic extraction submodule extracts the characteristics of the characters through characteristic extraction; the character recognition submodule is used for recognizing and comparing the extracted features through character recognition to obtain a text recognition result, and finally correcting the recognition result through post-processing and converting the result into a computer file to output;
the iteration module is used for extracting the connected components of the text characters, extracting the features of these connected components, putting the extracted features into a classifier combining a conditional random field algorithm and a BP neural network, and training the classifier.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (10)

1. A non-contact data acquisition method based on visual computation is characterized in that: the method comprises the following steps:
acquiring image data in a non-contact mode;
processing the acquired image data;
training a classifier by an iterative model;
and performing character recognition by using the OCR model, and outputting a recognition result.
2. A method of contactless data acquisition based on visual computation of claim 1, characterized in that: processing the acquired image data, specifically:
step S201: rotating the image; adjusting the direction of a text region to be identified in an image to enable the text to be parallel to the image boundary, setting a straight line as the left margin of the text region, wherein the slope of the straight line defines the page rotation angle;
step S202, scanning the image; scanning each horizontal line from left to right and storing the first black pixel appearing on each scan line in an array D = {D1, D2, …, Dn}, where n denotes the nth scan line and m denotes the last column of the text area; e.g. the first black pixel on the first scan line is at (x1, y1), and, continuing in turn, the first black pixel on the nth scan line is at (xn, y1);
D1 = ((x1, y1), (x1, y2), …, (x1, ym));
D2 = ((x2, y1), (x2, y2), …, (x2, ym));
………
Dn = ((xn, y1), (xn, y2), …, (xn, ym));
wherein ym denotes the last column of the text area and xn denotes its last row;
step S203: and (3) visual angle correction: when the position of the image acquisition equipment is not perpendicular to the screen, the obtained area is not a rectangle, and fitting of OCR (optical character recognition) is influenced; the specific treatment method comprises the following steps:
defining an image containing the entire page: first define the four vertices of the quadrangle F1, F2, F3, F4, and then define the boundaries of the identified content as X = (ΔF12 + ΔF34)/2 and Y = (ΔF13 + ΔF24)/2;
wherein F1 = (x1, y1), F2 = (x1, ym), F3 = (xn, y1), F4 = (xn, ym), ΔF12 = F2 - F1, ΔF34 = F4 - F3, ΔF13 = F3 - F1, ΔF24 = F4 - F2;
performing an internal interpolation operation on the colors of the pixels of all point sets of the image containing the text region; by calculating the coordinate differences of the displaced point pairs F1 and F2, F3 and F4, F1 and F3, F2 and F4 between the view-angle-corrected image and the original image, the angular deviation of the corrected image relative to the original image can finally be calculated;
step S204: performing a nonlinear image transformation for correcting the upper and lower boundaries of the previously identified text region; the left and right boundaries are set as straight lines, and a sample line with four nodes is set to interpolate the curve; after the upper and lower sample curves are marked, the correction operation is carried out; each point of a sample curve is described by a parameter g ∈ [0; 1], and for a specific value of g a straight line between the upper and lower sample curves is defined; the length between the starting points (xi(0), yi(0)) and (xg(0), yg(0)) is the height of the corrected image; each row (xi(t), yi(t)) is transformed into a vertical line using an interpolation formula, and finally a rectangular image is created.
3. A method for contactless data acquisition based on visual computation of claim 1 or 2, characterized in that: the iterative model training classifier specifically comprises:
step S301: constructing a two-step iteration process, and defining two rules for generating a domain relation graph: the method comprises the following steps of (1) including relaxed domain relation rules and strict domain relation rules;
step S302: in the first iteration process, strict domain relation rules are applied for part-of-speech tagging, and connected components are classified into text connected components and non-text connected components; a "text connected component" refers to continuous text, i.e. a connected region of black pixels, and a "non-text connected component" refers to a connected region of white pixels that does not belong to text; the text connected components are assigned to rectangular areas, which are then put into the separation module of the OCR model, where the OCR module assigns each rectangular area a weight coefficient for being a text line; areas with high weight coefficients are identified as "text lines", the corresponding connected components as "text connected components", and the rest as "uncertain connected components";
step S303: in the second iteration process, a domain graph is constructed using the relaxed domain relation rule and part-of-speech tagging is performed, and the weight threshold for accepting a text line is set to a lower level; finally, the estimated text region, namely the iteration result and the recognized text region, is obtained;
step S304: and extracting connected components after the second iteration process, wherein the text and the background in the scene have distinguishable colors and brightness, and the characters in a certain text line also have similar colors, the text region and the background are separated according to the different color attributes of the text region and the background region, then extracting the connected components by adopting image binarization and connected text filtering, and finally putting the extracted connected components into a classifier for training and recognition.
4. A visual computing-based contactless data acquisition method according to claim 3, characterized in that: the relaxed domain relation rule requires that the graph comprise all node pairs with closed (adjacent) pixels, which can connect background components to a greater extent; the strict domain relation rule additionally requires that each pair of adjacent connected components have overlapping horizontal projections, which prevents regions from crossing text lines.
5. A method of contactless data acquisition based on visual computation of claim 1, characterized in that: the method comprises the following specific steps of performing character recognition by using an OCR model and outputting a recognition result:
step S401: image preprocessing: including but not limited to binarization, image enhancement, noise processing, and image filtering;
step S402: carrying out feature extraction;
step S403: character recognition, namely putting the extracted character features into a classifier for classification and comparison to recognize characters; and finally, outputting a character recognition result.
6. A visual computing-based contactless data acquisition method according to claim 5, characterized in that: the feature extraction of step S402 includes:
dividing a text area, and dividing a text content area and a background area of the image through analysis of pixel values;
extracting each line of text, horizontally projecting the image in a text area, starting from the first black pixel point on the left side of each line to the last black pixel point on the right side, extracting the text content of one line and storing the text content in a dictionary;
character cutting, namely performing vertical projection on each line of text content, namely each line of text in a dictionary, finding the left and right boundaries of each character, and performing single character cutting on each line of text;
and (3) feature extraction, wherein the position features of black or white pixels in each character image are extracted and stored in an array A, and the structural feature vectors of each black pixel in the character images are extracted and stored in an array B, wherein the structural features comprise singular points, straight lines, curves, angle intersection points, convex arcs and strokes.
7. A visual computing-based contactless data acquisition method according to claim 6, characterized in that: step S403 includes:
extracting features, namely extracting feature arrays A and B obtained in the step S402 in sequence;
combining the characteristics, namely fusing the characteristics A and B extracted from each character;
classifying by a classifier, namely putting the combined features of each character into a trained classifier with a machine learning algorithm, namely combining a conditional random field and a BP neural network, and carrying out feature classification comparison;
and finally outputting a character recognition result.
8. A visual computing-based contactless data acquisition method according to claim 5, characterized in that: further comprising a post-processing step of step S404, wherein step S404 specifically comprises:
checking, correcting and supplementing the recognized text content in grammar and information by contacting with the context, so that the text content is smooth;
converting the extracted text data into a computerized file, and outputting a recognition result.
9. A method of contactless data acquisition based on visual computation of claim 1, characterized in that: the step of acquiring image data specifically comprises:
constructing a position framework of a camera, so that the view angle of the camera can completely contain the content in the scene range;
locating an identified target in a scene;
setting a monitoring frame to correct the content, pressing a trigger button, then capturing pictures and successfully acquiring image data;
and correcting the image frame, wherein the position of the image acquisition equipment is not vertical to the screen, the obtained area is not a rectangle, and manually correcting is carried out to mark a quadrangle containing a text range.
10. A system for contactless data acquisition based on visual computation, for implementing the method according to any one of claims 1 to 9, characterized in that: the system is used for positioning a target and capturing an image in a scene, then processing image data, extracting text information of the image data, and finally outputting an identification result, and comprises a data acquisition module, an identification module and an iteration module;
the data acquisition module utilizes a camera to perform rotary moving focusing, positions contents to be identified in a scene, and captures high-definition images for acquiring image resources of the whole system;
the recognition module is used for extracting and recognizing characters after acquiring a high-definition image, and comprises an image preprocessing submodule, namely, image data is subjected to primary processing so as to facilitate subsequent work; the characteristic extraction submodule extracts the characteristics of the characters through characteristic extraction; the character recognition submodule is used for recognizing and comparing the extracted features through character recognition to obtain a text recognition result, and finally correcting the recognition result and converting the corrected result into a computer file for output;
the iteration module is used for extracting the connected components of the text characters, extracting the features of these connected components, putting the extracted features into a classifier combining a conditional random field algorithm and a BP neural network, and training the classifier.
CN202211155147.9A 2022-09-22 2022-09-22 Non-contact data acquisition method and system based on visual computation Pending CN115546796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211155147.9A CN115546796A (en) 2022-09-22 2022-09-22 Non-contact data acquisition method and system based on visual computation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211155147.9A CN115546796A (en) 2022-09-22 2022-09-22 Non-contact data acquisition method and system based on visual computation

Publications (1)

Publication Number Publication Date
CN115546796A true CN115546796A (en) 2022-12-30

Family

ID=84728988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211155147.9A Pending CN115546796A (en) 2022-09-22 2022-09-22 Non-contact data acquisition method and system based on visual computation

Country Status (1)

Country Link
CN (1) CN115546796A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912845A (en) * 2023-06-16 2023-10-20 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI
CN116912845B (en) * 2023-06-16 2024-03-19 广东电网有限责任公司佛山供电局 Intelligent content identification and analysis method and device based on NLP and AI

Similar Documents

Publication Publication Date Title
CN111723585B (en) Style-controllable image text real-time translation and conversion method
CN109993160B (en) Image correction and text and position identification method and system
EP2344980B1 (en) Device, method and computer program for detecting a gesture in an image, and said device, method and computer program for controlling a device
CN101673338B (en) Fuzzy license plate identification method based on multi-angle projection
CN111160352B (en) Workpiece metal surface character recognition method and system based on image segmentation
CN112733797B (en) Method, device and equipment for correcting sight of face image and storage medium
CN109472249A (en) A kind of method and device of determining script superiority and inferiority grade
Peyrard et al. ICDAR2015 competition on text image super-resolution
CN110647885B (en) Test paper splitting method, device, equipment and medium based on picture identification
CN112101262B (en) Multi-feature fusion sign language recognition method and network model
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN110163211B (en) Image recognition method, device and storage medium
WO2015131468A1 (en) Method and system for estimating fingerprint pose
Kölsch et al. Recognizing challenging handwritten annotations with fully convolutional networks
CN112541422A (en) Expression recognition method and device with robust illumination and head posture and storage medium
CN113435240A (en) End-to-end table detection and structure identification method and system
CN115546796A (en) Non-contact data acquisition method and system based on visual computation
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN109919128B (en) Control instruction acquisition method and device and electronic equipment
CN109711420B (en) Multi-affine target detection and identification method based on human visual attention mechanism
CN114758341A (en) Intelligent contract image identification and contract element extraction method and device
Keefer et al. A survey on document image processing methods useful for assistive technology for the blind
CN112597998A (en) Deep learning-based distorted image correction method and device and storage medium
CN111914749A (en) Lane line recognition method and system based on neural network
CN110991440A (en) Pixel-driven mobile phone operation interface text detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination