CN113392848A - Deep learning-based reading method and device for OCR on cylinder - Google Patents


Info

Publication number
CN113392848A
CN113392848A
Authority
CN
China
Prior art keywords
ocr
camera
characters
character
cylinder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110948821.8A
Other languages
Chinese (zh)
Inventor
施晨涛
任世强
吴潘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitery Tianjin Technology Co ltd
Original Assignee
Hitery Tianjin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitery Tianjin Technology Co ltd filed Critical Hitery Tianjin Technology Co ltd
Priority to CN202110948821.8A priority Critical patent/CN113392848A/en
Publication of CN113392848A publication Critical patent/CN113392848A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a deep learning-based method and device for reading OCR on a cylinder, which solve the problem that the arc surface of a cylinder cannot be imaged stably enough for OCR reading, and which are compatible with OCR characters of different colors and fonts. The invention is practical, achieves a stable effect in a short time, requires no manual participation, and lays a good foundation for digital product management and tracing. The method can be flexibly applied to the transformation of existing production lines with few hardware changes and without designing a complex industrial vision system, and it offers useful reference for solutions to related problems.

Description

Deep learning-based reading method and device for OCR on cylinder
Technical Field
The invention relates to the technical field of image recognition, in particular to a deep learning-based reading method and device for OCR (optical character recognition) on a cylinder.
Background
OCR recognition and detection is an important application of computer image processing in machine vision. Its main function is to process and characterize images in order to extract character information and the corresponding position information, then integrate that position information according to the arrangement order of the characters to form logically coherent text, providing a basis for the subsequent digital archiving of objects and product tracing management. OCR reading is widely applied in industry and is an indispensable link in a digital factory. Planar OCR characters have relatively mature applications and solutions, but in some special industries, such as the bicycle industry, OCR characters are printed on arc surfaces, which occur widely in industrial production and design. Because lighting and camera imaging are difficult in this setting, reading and calibration of OCR information are currently completed mainly by hand, with the following main problems: (1) labor cost is too high; (2) efficiency is low; (3) accuracy cannot be guaranteed; (4) character problems arising in the production and manufacturing process cannot be resolved, preventing the digital factory from forming a closed loop.
Existing curved-surface OCR recognition methods fall mainly into four types: (1) X-ray imaging methods, which read the image information formed by X-ray irradiation; (2) line-laser 3D imaging methods, which read the three-dimensional image information formed by calibrated shooting with a point laser and an area camera; (3) line-scan camera imaging methods, in which the camera or the recognized object moves to form the image for reading; (4) multi-angle light imaging methods, which control the camera to shoot several times under light sources at different angles and synthesize a 2.5D image for reading.
The X-ray imaging method causes a certain amount of damage to the surface of the measured object and cannot be applied to reading fragile products. The line-laser 3D imaging method requires the object or the camera to move, so it cannot be applied at a fixed detection position; its imaging of low-reflectivity materials is not ideal, which affects the reading result; and its high cost makes it difficult to deploy. The line-scan camera has difficulty accommodating detection objects with different depths of field, and the object and camera must move relative to each other, so it cannot be applied to fixed scenes. In the multi-angle light imaging method, the product image depends on the synthesis algorithm and the shooting position; without solving the parameter problems of light-source angle and synthesis algorithm, OCR text at a non-fixed position cannot be imaged stably, and because forming the image takes time, the method is slow compared with ordinary detection and unsuitable for high-speed detection scenes.
The main existing difficulties of curved-surface OCR reading are: (1) imaging is hard to stabilize, since the reflection angle on a curved surface is large and the imaging effect worsens as the field of view grows; (2) the OCR characters cannot be stably imaged into a single picture for processing; (3) the OCR appearance varies widely across products of different materials and colors; (4) the size and position of the OCR characters change, and the shape of the characters changes as well.
Existing methods cannot achieve stable imaging and reading on a cylinder. By the surface-integral principle, the part of the cylindrical surface where the OCR characters lie can be intercepted as a surface patch; as long as the height from the plane of the patch's cross-section to the highest point of the patch is smaller than the depth of field of the selected camera, this patch can be treated approximately as a plane. Stable reading can then be achieved with multiple cameras, combining the multi-camera setup with a highly robust algorithm so as to be compatible with OCR characters at different positions, of different colors, and at different angles.
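The depth-of-field condition above can be checked numerically. The sketch below computes the sagitta of a cylindrical patch, i.e. the height from the chord plane to the top of the arc; the radius, angle, and depth-of-field values in the example are illustrative assumptions, not figures from the patent.

```python
import math

def sagitta(radius_mm: float, theta_rad: float) -> float:
    """Height from the chord plane to the highest point of an arc patch
    that subtends angle theta on a cylinder of the given radius."""
    return radius_mm * (1.0 - math.cos(theta_rad / 2.0))

def patch_is_planar(radius_mm: float, theta_rad: float, dof_mm: float) -> bool:
    """True if the patch's sagitta is within the camera's depth of field,
    so the patch can be treated approximately as a plane."""
    return sagitta(radius_mm, theta_rad) < dof_mm

def max_patch_angle(radius_mm: float, dof_mm: float) -> float:
    """Largest arc angle (radians) whose sagitta stays within the depth of
    field; a purely geometric bound obtained by inverting the sagitta formula."""
    ratio = max(min(1.0 - dof_mm / radius_mm, 1.0), -1.0)
    return 2.0 * math.acos(ratio)

# Example with assumed values: 25 mm radius cylinder, 60-degree patch, 5 mm depth of field.
ok = patch_is_planar(25.0, math.radians(60.0), 5.0)  # sagitta is about 3.35 mm, under 5 mm
```

Each camera then only needs to cover a patch narrow enough to satisfy this bound, which is why two or more cameras with overlapping fields of view suffice.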
Disclosure of Invention
To overcome the defects of the background art, the invention provides a deep learning-based method and device for reading OCR on a cylinder, solving the problems of difficult imaging and difficult OCR recognition on the curved surface of a cylinder by recognizing the cylinder's OCR characters in a visible-light image. Based on deep learning processing, the method can perform OCR recognition on cylinder surfaces of various colors. It identifies accurately, takes little time, requires no manual participation, and lays a good foundation for subsequent digital factory management.
In order to achieve the above object, one aspect of the present invention provides the following embodiment: a deep learning-based method for reading OCR on a cylinder, comprising the following steps:
Step one, selecting at least two cameras for photographing and acquisition, and choosing suitable camera installation angles according to the OCR character size, so that the overlapping field of view of two adjacent cameras is not less than 1/3 of each camera's field of view;
Step two, setting the camera parameters and photographing for acquisition;
Step three, building a basic object-detection model with YOLOv3 and a back-end classification model with a ResNet34 model to obtain a deep learning model;
Step four, labeling the picture data samples collected by the cameras to generate corresponding sample label files;
Step five, training the deep learning model with the sample label files to obtain an OCR model;
Step six, taking pictures with the cameras to acquire picture data;
Step seven, inputting the picture data collected in step six into the OCR model from step five for picture-data recognition and character-labeling processing to obtain character data;
Step eight, processing and integrating the character data obtained in step seven to realize OCR reading.
Further, in step five, the deep learning model includes a cbr convolution module, a crc convolution module and a Deep convolution module: the cbr module is formed by connecting a convolution layer (conv), a batch normalization layer (bn) and a ReLU activation function in series; the crc module is formed by connecting a convolution layer, a ReLU activation function and a convolution layer in series; and the Deep module is formed by connecting two cbr modules in series. The cbr, crc and Deep modules extract features from the OCR picture to form the OCR detection model.
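The series composition of these modules can be sketched structurally. The following is a minimal layout sketch in plain Python: the module names (cbr, crc, Deep) follow the text, while kernel sizes, channel counts, and the deep learning framework are unspecified in the patent and therefore omitted here.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    kind: str  # "conv", "bn", or "relu"

def cbr() -> list:
    """cbr module: convolution layer (conv) -> batch normalization (bn) -> ReLU, in series."""
    return [Layer("conv"), Layer("bn"), Layer("relu")]

def crc() -> list:
    """crc module: conv -> ReLU -> conv, in series."""
    return [Layer("conv"), Layer("relu"), Layer("conv")]

def deep() -> list:
    """Deep module: two cbr modules connected in series."""
    return cbr() + cbr()
```

In a real framework each `Layer` descriptor would map to the corresponding convolution, batch-normalization, or activation operator, composed in the same order.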
Further, step eight includes three parts: removing repeated character data, longitudinally arranging the character data, and segmenting and integrating the character set according to the actual situation.
Further, in step eight, the repeated character data are removed: the characteristic positions and the maximum position range of the OCR text are detected in each single picture; the deep learning model maps the character at each position to a plurality of classes, each class carrying a similarity score between 0 and 1 for that character; and repeated character data are removed by keeping only the highest-scoring result at the same position.
Further, in step eight, the character data are arranged longitudinally: the detected characters are sorted in ascending order of their y-coordinate values $y_i$. For a total of $M$ characters arranged in $N$ lines, the line number of each character is

$n_k = \left\lceil \dfrac{N\,(y_i - y_1)}{y_2 - y_1} \right\rceil$, taking $n_k = 1$ when $y_i = y_1$,

where $n_k$ is the number of the line where the character lies, $M$ is the total number of characters, $N$ is the number of lines, $i$ is the character index with value range 0 to $M-1$, $y_1$ is the minimum of the y-coordinates of all characters, $y_2$ is the maximum of the y-coordinates of all characters, and $y_j$ is the average of the minimum and maximum y-coordinates over all characters.
Further, in step eight, the character set is segmented and integrated according to the actual situation: for a single line of characters from either camera Cam_L or Cam_R, the characters over the entire field of view are taken so that the left camera contributes its leftmost $\lceil L/2 \rceil$ characters and the right camera contributes its rightmost $L - \lceil L/2 \rceil$ characters, where the effective total number of characters $L$ satisfies

$L \le L_L + L_R$,

where $L_L$ is the number of characters seen in the left camera and $L_R$ is the number of characters seen in the right camera; the left camera takes characters starting from the left side of its image, and the right camera starts from the right side of its image. Performing the above operation on each line of characters yields all the characters on the photographed arc surface.
Another aspect of the invention provides the following examples: an on-cylinder OCR reading device comprising:
at least two cameras;
a light source;
a processor connected with the cameras and the light source, the processor comprising picture fusion and recognition software; and
a controller for implementing the above deep learning-based method for reading OCR on a cylinder;
the controller is connected to the cameras, the light source and the processor respectively; when the light source illuminates the OCR text on a cylindrical surface, the controller controls the cameras to collect images at preset times, obtaining several partial planar images of the OCR text, and controls the processor to run the picture fusion and recognition software, which fuses the partial planar images into a complete planar image of the OCR text and then processes and recognizes that image.
Further, the camera is a CCD image sensing camera or a CMOS image sensing camera.
Further, the device also comprises a rotatable cylindrical workpiece supporting mechanism.
Compared with the prior art, the invention has the following beneficial effects:
the invention fully uses the principle of the surface integral of the curved surface of the cylinder for reference, solves the problem that the arc surface of the cylinder cannot be imaged stably to read the OCR, and can be compatible with OCR characters with different colors and fonts; the invention is feasible, obtains stable effect, has short required time, does not need manual participation, and can lay a good foundation for product digital management and tracing; the method can be flexibly used for the transformation of the existing production line, the required hardware change is little, a complex industrial vision system is not required to be designed, and certain reference significance is designed for the scheme of the related problems.
Drawings
Fig. 1 is a flowchart illustrating a deep learning-based reading method for OCR on a cylinder according to the present invention.
Fig. 2 is a camera detection primitive diagram in embodiment 1 of the present invention.
Fig. 3 shows the reading result of a single camera in embodiment 1 of the present invention.
FIG. 4 is a diagram of the integrated final read effect in embodiment 1 of the present invention.
FIG. 5 is a schematic structural diagram of a reading apparatus for OCR on a cylinder according to the present invention.
Figure 6 is a top view of an OCR on cylinder reading apparatus of the present invention.
In the figure: 1. a camera; 2. a light source; 3. a processor; 4. and a controller.
Detailed Description
To make the technical means, creative features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Example 1
Referring to fig. 1, the present invention provides a method for reading an OCR on a cylinder based on deep learning, including the following steps:
the following takes two cameras as an example:
Step one, selecting at least two cameras for photographing and image detection, and choosing each camera's installation angle according to the maximum OCR character size, so that the two cameras each cover 2/3 of the length of the cylindrical OCR text and the overlapping field of view of two adjacent cameras is not less than 1/3 of each camera's field of view;
Specifically, as seen in fig. 5, two cameras are used for photographing and image detection, and the two installation angles are chosen according to the maximum OCR size so that cameras Cam_L and Cam_R each cover 2/3 of the length of the cylindrical OCR text; the overlapping field of view of the two adjacent cameras is not less than 1/3 of each camera's field of view, so OCR text at different positions and of different sizes can be imaged clearly.
As shown in fig. 2, step two, setting appropriate camera parameters and photographing for acquisition;
Specifically, appropriate camera parameters are set and a large number of OCR pictures of different colors and different states are collected to form a data set;
Step three, to guarantee reading speed and stability, building a basic object-detection model with YOLOv3 and a back-end classification model with a ResNet34 model to obtain a deep learning model.
Specifically, DFAPI is selected as the basic deep learning development interface, with secondary development on the open-source framework PaddleX, chosen mainly for its performance and for the domestic deep learning ecosystem in China; the framework follows the Apache License, which is friendly to commercial use. To guarantee reading speed and stability, the basic object-detection model is YOLOv3, and recognition uses a ResNet34 back-end classification model.
Step four, labeling the pictures acquired by the cameras to generate corresponding sample label files;
Specifically, full-image data labeling of the two cameras' pictures is done with the self-developed software Dolphin Focus (DF): characteristic characters are selected and labeled in each picture, generating a corresponding sample label data file. Alternatively, a previously trained OCR model labels the existing picture data automatically and the labels are then verified manually; or software labeling is used with a set character size, and low-score characters that the software cannot label are labeled manually.
More specifically, the deep learning model maps the character at each position to a plurality of classes, each class carrying a similarity score between 0 and 1 for that character; characters whose software-assigned score is below 0.5 must be manually re-labeled.
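The 0.5 score threshold suggests a simple triage of software-generated labels. A minimal sketch follows; the (character, score) pair format is an assumption, while the threshold itself comes from the text.

```python
SCORE_THRESHOLD = 0.5  # characters scoring below this must be re-labeled manually

def split_labels(detections):
    """Split auto-generated labels into accepted ones and ones needing manual review.

    `detections` is an assumed format: a list of (character, score) pairs
    produced by the existing OCR model, with scores in [0, 1].
    """
    accepted, needs_manual = [], []
    for char, score in detections:
        (accepted if score >= SCORE_THRESHOLD else needs_manual).append((char, score))
    return accepted, needs_manual
```

The accepted labels go straight into the sample label file, while the low-score remainder is queued for manual labeling.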
Step five, training the deep learning model by using a sample marking file to obtain an OCR model;
specifically, the deep learning model is trained and tested through the marked data to obtain an OCR model, and the OCR model is used for performing OCR detection on an image to be detected containing characters to form a detection result.
Further specifically, the Deep learning model comprises an cbr convolution module, a crc convolution module and a Deep convolution module, the cbr convolution module is formed by mutually connecting convolution layer conv, batch normalization layer bn and Relu activation functions in series, the crc convolution module is formed by connecting convolution layer conv, Relu activation function and convolution layer conv in series, the Deep convolution module is formed by connecting two cbr convolution modules in series, and the cbr convolution module, the crc convolution module and the Deep convolution module are used for extracting features of the OCR picture to obtain the OCR detection model.
Meanwhile, the light source and camera can be controlled to be compatible with different product backgrounds, enabling OCR recognition of objects with different background colors and materials. The camera background can be adjusted by changing the camera exposure; the exposure parameters are bound to the product number, and different exposures are used for photographing and image detection under different detection schemes.
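Binding exposure parameters to product numbers can be as simple as a lookup table; the product numbers and exposure values below are purely illustrative assumptions, not values from the patent.

```python
# Hypothetical product-number -> exposure (microseconds) binding; values are illustrative.
EXPOSURE_BY_PRODUCT = {
    "P-1001": 800,  # e.g. a dark, low-reflectivity background needs a longer exposure
    "P-1002": 300,  # e.g. a bright background needs a shorter exposure
}

def exposure_for(product_number: str, default_us: int = 500) -> int:
    """Look up the exposure bound to a product number, falling back to a default."""
    return EXPOSURE_BY_PRODUCT.get(product_number, default_us)
```

Before each detection run, the controller would look up the current product number and push the returned exposure to the camera.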
As shown in figs. 3-4, step six, taking pictures with the cameras to acquire picture data;
Step seven, inputting the picture data collected in step six into the OCR model from step five for picture-data recognition and character-labeling processing to obtain character data;
Step eight, processing and integrating the character data obtained in step seven to realize OCR reading.
Further, step eight comprises three parts: removing repeated character data, longitudinally arranging the character data, and segmenting and integrating the character set according to the actual situation.
Specifically, in step eight, the repeated character data are removed: the characteristic positions and the maximum position range of the OCR text are detected in each single picture; the deep learning model maps the character at each position to a plurality of classes, each class carrying a similarity score between 0 and 1 for that character; and repeated character data are removed by keeping only the highest-scoring result at the same position.
specifically, in step eight, the character data are arranged longitudinally: by means of the coordinate y of the output point in the y directioniThe values are arranged in ascending order, and the character processing mode for the total number of M and the number of lines of N is that
The number of lines thereof
Figure 696795DEST_PATH_IMAGE004
Wherein n iskIs the number of lines where the character is located, M is the total number of characters, N is the number of lines of the character, i is the index of the character, the value range is 0 to M-1, y1Is the minimum value of the y coordinates of all characters, y2Is the maximum value of the y-coordinate of all characters, yjIs the average of the minimum and maximum values of the y-coordinate in all characters.
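The longitudinal arrangement can be sketched as follows, under the assumption that the line-number rule amounts to bucketing characters into N equal y-intervals between the minimum and maximum y-coordinate; the (character, y) input format is also an assumption.

```python
def assign_lines(chars, n_lines):
    """Group detected characters into text lines by their y-coordinates.

    `chars` is a list of (character, y) pairs. Characters are sorted by
    ascending y and bucketed into `n_lines` equal y-intervals between the
    minimum and maximum y-coordinate over all characters.
    """
    ys = [y for _, y in chars]
    y1, y2 = min(ys), max(ys)
    span = (y2 - y1) or 1.0  # avoid division by zero when all characters share one line
    lines = [[] for _ in range(n_lines)]
    for ch, y in sorted(chars, key=lambda t: t[1]):
        k = min(int((y - y1) / span * n_lines), n_lines - 1)  # 0-based line index
        lines[k].append(ch)
    return lines
```

Within each returned line, characters can then be sorted by x-coordinate before the segmentation-and-integration step.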
Specifically, in step eight, the character set is segmented and integrated according to the actual situation: for a single line of characters from either camera Cam_L or Cam_R, the characters over the entire field of view are taken so that the left camera contributes its leftmost $\lceil L/2 \rceil$ characters and the right camera contributes its rightmost $L - \lceil L/2 \rceil$ characters, where the effective total number of characters $L$ satisfies

$L \le L_L + L_R$,

where $L_L$ is the number of characters seen in the left camera and $L_R$ is the number of characters seen in the right camera; the left camera takes characters starting from the left side of its image, and the right camera starts from the right side of its image. Performing the above operation on each line of characters yields all the characters on the photographed arc surface.
An illustrative example:
There are 2 lines of characters; the first line is "12345678" and the second line is "abcdefgh". There are 2 cameras: in the field of view of the 1st camera, 2 lines of characters are visible, line 1 reading "12345" and line 2 reading "abcde"; in the field of view of the 2nd camera, 2 lines are visible, line 1 reading "45678" and line 2 reading "defgh".
First, the repeated characters are removed. The same character may produce multiple recognition results: for example, the character "6" may also be recognized as "8", but if the score for "6" is 0.8 and the score for "8" is 0.1, the final recognition result of each character is determined by the maximum score.
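The max-score de-duplication can be sketched as follows; the (position, character, score) candidate format is an assumption, while the keep-the-highest-score rule comes from the text.

```python
def dedup_by_score(candidates):
    """Keep, for each character position, only the highest-scoring candidate.

    `candidates` is a list of (position, character, score) tuples; the same
    position may appear several times with different candidate characters.
    """
    best = {}
    for pos, char, score in candidates:
        if pos not in best or score > best[pos][1]:
            best[pos] = (char, score)
    return {pos: char for pos, (char, _) in best.items()}

# The "6"-versus-"8" case from the example above:
result = dedup_by_score([(0, "8", 0.1), (0, "6", 0.8)])  # keeps "6" at position 0
```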
Second, the longitudinal arrangement: after the character content is recognized it is out of order in the vertical direction and must be arranged longitudinally. The number of the line where each character lies is calculated by the formula, giving line 1 or line 2: the 1st camera's line-1 content is "12345" and line-2 content is "abcde"; the 2nd camera's line-1 content is "45678" and line-2 content is "defgh".
Third, segmentation and integration: it is known that line 1 has 8 characters and line 2 has 8 characters, so in the 1st camera line 1 takes the left 4 characters "1234" and line 2 takes the left 4 characters "abcd"; in the 2nd camera line 1 takes the right 4 characters "5678" and line 2 takes the right 4 characters "efgh". Integration finally yields the line-1 character content "12345678" and the line-2 character content "abcdefgh".
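The segmentation and integration of this example can be sketched as follows; taking the left camera's leftmost ceil(total/2) characters and the right camera's remaining rightmost characters is one reading of the split rule, consistent with the worked example above.

```python
import math

def merge_line(left_chars: str, right_chars: str, total: int) -> str:
    """Merge one line of characters seen by the left and right cameras.

    The left camera contributes its leftmost ceil(total/2) characters and the
    right camera contributes the remaining rightmost characters.
    """
    n_left = math.ceil(total / 2)
    n_right = total - n_left
    return left_chars[:n_left] + (right_chars[-n_right:] if n_right else "")

# Reproducing the example: each camera sees 5 of the 8 characters per line.
line1 = merge_line("12345", "45678", 8)
line2 = merge_line("abcde", "defgh", 8)
```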
Example 2
For the detailed structure of the invention, refer to fig. 5. A reading apparatus for OCR on a cylinder comprises:
at least two cameras 1;
Light source 2: specifically, the light source is adjustable. To accommodate OCR recognition objects of different background colors and materials, this scheme also adjusts the camera background by adjusting the camera exposure; the camera exposure parameters are bound to the product number, and different exposures are used for photographing and image detection under different detection schemes.
A processor connected with the camera 1 and the light source 2, the processor comprising picture fusion and recognition software; and
a controller for implementing the above deep learning-based method for reading OCR on a cylinder;
the controller is connected to the cameras, the light source and the processor respectively; when the light source illuminates the OCR text on a cylindrical surface, the controller controls the cameras to collect images at preset times to obtain several partial planar images of the OCR text, and controls the processor to run the picture fusion and recognition software to fuse the partial planar images, generating a complete planar image of the OCR text, which is then processed and recognized.
Further, the camera 1 is a CCD image sensing camera or a CMOS image sensing camera.
Specifically, the camera 1 may use a CCD or CMOS image sensor, preferably with a resolution of not less than 300,000 pixels, and can output a digital image signal so that the microprocessor can run the image fusion and recognition software. As a newer type of photoelectric converter, the CCD camera is widely used in video capture, image acquisition, scanners, and industrial measurement, and offers small size, light weight, high resolution, high sensitivity, wide dynamic range, low power consumption, good shock and impact resistance, and high reliability. The CMOS camera offers advantages such as random window readout, radiation resistance, and high reliability.
Further, a rotatable cylindrical workpiece support mechanism 3 is included.
Specifically, as shown in fig. 6, the cylindrical workpiece support mechanism 3 may be formed by a support mounting frame having a rotary driving member.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
While there have been shown and described what are at present considered the fundamental principles and essential features of the invention and its advantages, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment contains only a single technical solution; this manner of description is adopted for clarity only. Those skilled in the art should take the description as a whole, and the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (9)

1. A deep learning-based reading method for OCR on a cylinder, characterized by comprising the following steps:
step one, selecting at least two cameras for image acquisition, and choosing a suitable camera installation angle according to the size of the OCR (optical character recognition) characters, so that the overlapping view area of two adjacent cameras is not less than 1/3 of each camera's view area;
step two, setting the camera parameters and acquiring images;
step three, building a target detection base model with YOLOv3 and a back-end classification model with ResNet34 to obtain a deep learning model;
step four, labeling the picture data samples collected by the cameras to generate corresponding sample label files;
step five, training the deep learning model with the sample label files to obtain an OCR model;
step six, taking pictures with the cameras to acquire picture data;
step seven, inputting the picture data collected in step six into the OCR model of step five for picture recognition and character labeling to obtain character data;
step eight, processing and integrating the character data obtained in step seven to realize OCR reading.
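The 1/3-overlap requirement of step one can be checked numerically before mounting the cameras. A minimal sketch of such a check, approximating the fields of view as 1-D intervals along the cylinder arc; the field-of-view width and camera spacing are hypothetical parameters for illustration, not values from the claim:

```python
def overlap_fraction(fov_width: float, camera_spacing: float) -> float:
    """Fraction of one camera's field of view covered by the overlap
    with an adjacent camera (1-D approximation along the arc)."""
    overlap = max(0.0, fov_width - camera_spacing)
    return overlap / fov_width

def spacing_is_valid(fov_width: float, camera_spacing: float) -> bool:
    # Step one requires the overlap to be at least 1/3 of each camera's view.
    return overlap_fraction(fov_width, camera_spacing) >= 1.0 / 3.0
```

For example, cameras with a 90 mm view spaced 60 mm apart overlap by exactly one third and pass; spaced 70 mm apart they overlap by only 2/9 and fail.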
2. The deep learning-based reading method for OCR on a cylinder as claimed in claim 1, wherein: in the fifth step, the deep learning model comprises a cbr convolution module, a crc convolution module and a Deep convolution module; the cbr convolution module is formed by connecting a convolution layer (conv), a batch normalization layer (bn) and a ReLU activation function in series; the crc convolution module is formed by connecting a convolution layer, a ReLU activation function and a second convolution layer in series; the Deep convolution module is formed by connecting two cbr convolution modules in series; the cbr, crc and Deep convolution modules are used to extract features of the OCR picture to form the OCR detection model.
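The three modules named in claim 2 can be sketched in PyTorch as below. The kernel sizes, padding, and channel counts are illustrative assumptions; the claim only specifies the ordering of the layers in each module:

```python
import torch
import torch.nn as nn

def cbr(in_ch: int, out_ch: int) -> nn.Sequential:
    # cbr module: convolution -> batch normalization -> ReLU, in series
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def crc(in_ch: int, out_ch: int) -> nn.Sequential:
    # crc module: convolution -> ReLU -> convolution, in series
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
    )

def deep(in_ch: int, out_ch: int) -> nn.Sequential:
    # Deep module: two cbr modules connected in series
    return nn.Sequential(cbr(in_ch, out_ch), cbr(out_ch, out_ch))
```

With 3x3 convolutions and padding 1, each module preserves spatial resolution and changes only the channel count, which keeps the modules freely stackable for feature extraction.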
3. The deep learning-based reading method for OCR on a cylinder as claimed in claim 1, wherein step eight comprises: removing repeated character data, arranging the character data by line, and segmenting and integrating the character set.
4. The deep learning-based on-cylinder OCR reading method as claimed in claim 3, wherein:
step eight removes repeated character data as follows: the characteristic position and maximum position range of the OCR characters are detected from a single picture; the deep learning model maps the character data at each position to a plurality of classes, each class corresponding to a similarity score between the character and that class, with values ranging from 0 to 1; repeated character data are removed by keeping only the highest-scoring result at each position.
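The duplicate-removal rule of claim 4 (keep, per detected position, only the highest-scoring class) can be sketched as follows; the `(position, char, score)` tuple layout is an assumption for illustration, not a format specified by the claim:

```python
def deduplicate(detections):
    """detections: iterable of (position, char, score), score in [0, 1].
    Keeps, for each position, the character with the highest score."""
    best = {}  # position -> (char, score)
    for pos, char, score in detections:
        if pos not in best or score > best[pos][1]:
            best[pos] = (char, score)
    # Drop the scores, returning position -> winning character.
    return {pos: char for pos, (char, _score) in best.items()}

# Two detections at position 0 compete; 'A' (0.9) beats '4' (0.4).
result = deduplicate([(0, 'A', 0.9), (0, '4', 0.4), (1, 'B', 0.8)])
```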
5. The deep learning based on-cylinder OCR reading method as claimed in claim 3, wherein:
in the eighth step, the character data are arranged by line: the y-direction coordinates y_i of the output points are sorted in ascending order, and for a total of M characters in N lines, each character's line number n_k is given by

[formula image: DEST_PATH_IMAGE001]

where n_k is the line number of the character, M is the total number of characters, N is the number of lines, i is the character index ranging from 0 to M-1, y_1 is the minimum y-coordinate over all characters, y_2 is the maximum y-coordinate over all characters, and y_j is the mean of the minimum and maximum y-coordinates over all characters.
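Since claim 5's formula is published only as an image, the exact line-number rule is not recoverable from this text. The following sketch uses a plausible substitute rule as an assumption: each character is assigned to one of N equal-height bands between the minimum and maximum y-coordinates:

```python
def assign_lines(chars, n_lines):
    """chars: list of (character, y_coordinate) pairs.
    Returns the characters grouped into n_lines rows by y-coordinate.
    The equal-band rule below is an assumption; the claim's exact
    formula is published only as an image."""
    ys = [y for _, y in chars]
    y1, y2 = min(ys), max(ys)           # y_1, y_2 from the claim
    band = (y2 - y1) / n_lines or 1.0   # avoid division by zero
    rows = [[] for _ in range(n_lines)]
    for ch, y in chars:
        k = min(int((y - y1) / band), n_lines - 1)
        rows[k].append(ch)
    return rows
```

Characters near y_1 land in the first row and characters near y_2 in the last, matching the ascending-order arrangement described in the claim.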
6. The deep learning based on-cylinder OCR reading method as claimed in claim 3, wherein:
in the eighth step, the character set is segmented and integrated: for a single line of characters seen by either of the cameras Cam_L and Cam_R, the characters over the entire field of view are taken according to

[formula image: DEST_PATH_IMAGE002]

and the effective total number of characters satisfies

[formula image: DEST_PATH_IMAGE003]

where L_L is the number of characters in the left camera and L_R is the number of characters in the right camera; the left camera takes characters starting from the left side of its image and the right camera starting from the right side of its image; performing this operation on each line of characters yields all characters on the photographed arc surface.
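Claim 6's integration step (left camera contributes characters from its image's left edge, right camera from its right edge, up to the effective total) can be sketched as below. Because the claimed formulas are published only as images, the exact split between the two cameras is an assumption here; the sketch takes as many characters as possible from the left camera and the remainder from the right end of the right camera's reading:

```python
def merge_line(left_chars, right_chars, total):
    """Merge one text line seen by two overlapping cameras.
    left_chars / right_chars: characters read left-to-right by each camera.
    total: effective number of distinct characters on the arc
    (assumed known; the claim's formula for it is an image).
    """
    n_left = min(len(left_chars), total)   # taken from the left edge
    n_right = total - n_left               # remainder, from the right edge
    right_part = right_chars[len(right_chars) - n_right:] if n_right else []
    return list(left_chars[:n_left]) + list(right_part)

# Overlap 'CD' is seen by both cameras; the merged line has 6 characters.
merged = merge_line('ABCD', 'CDEF', 6)
```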
7. An on-cylinder OCR reading apparatus, characterized by comprising:
at least two cameras;
a light source;
a processor connected with the cameras and the light source, the processor comprising picture fusion and recognition software; and
a controller for implementing a deep learning-based on-cylinder OCR reading method as claimed in any one of claims 1-5;
the controller is respectively connected with the cameras, the light source and the processor; when the light source illuminates the OCR characters on a cylindrical surface, the controller controls the cameras to collect images at preset times to obtain a plurality of partial plane images of the OCR characters, and controls the processor to run the picture fusion and recognition software, which fuses the partial plane images into a complete plane image of the OCR characters and then processes and recognizes that complete image.
8. An on-cylinder OCR reading apparatus as claimed in claim 7 wherein the camera is a CCD image sensing camera or a CMOS image sensing camera.
9. An on-cylinder OCR reading apparatus according to claim 7 further comprising a rotatable cylinder workpiece support mechanism.
CN202110948821.8A 2021-08-18 2021-08-18 Deep learning-based reading method and device for OCR on cylinder Pending CN113392848A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110948821.8A CN113392848A (en) 2021-08-18 2021-08-18 Deep learning-based reading method and device for OCR on cylinder

Publications (1)

Publication Number Publication Date
CN113392848A true CN113392848A (en) 2021-09-14

Family

ID=77622921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110948821.8A Pending CN113392848A (en) 2021-08-18 2021-08-18 Deep learning-based reading method and device for OCR on cylinder

Country Status (1)

Country Link
CN (1) CN113392848A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488507A (en) * 2016-01-22 2016-04-13 吉林大学 Cylindrical surface character recognition system and method
CN110175603A (en) * 2019-04-01 2019-08-27 佛山缔乐视觉科技有限公司 A kind of engraving character recognition methods, system and storage medium
CN110569341A (en) * 2019-07-25 2019-12-13 深圳壹账通智能科技有限公司 method and device for configuring chat robot, computer equipment and storage medium
CN112150354A (en) * 2019-06-26 2020-12-29 四川大学 Single image super-resolution method combining contour enhancement and denoising statistical prior
CN112528998A (en) * 2021-02-18 2021-03-19 成都新希望金融信息有限公司 Certificate image processing method and device, electronic equipment and readable storage medium
CN112699860A (en) * 2021-03-24 2021-04-23 成都新希望金融信息有限公司 Method for automatically extracting and sorting effective information in personal tax APP operation video


Similar Documents

Publication Publication Date Title
CN111855664B (en) Adjustable three-dimensional tunnel defect detection system
CN109100741B (en) Target detection method based on 3D laser radar and image data
CN109993086B (en) Face detection method, device and system and terminal equipment
EP0669593B1 (en) Two-dimensional code recognition method
CN100380393C (en) Precise location method of QR code image symbol region at complex background
CN110400315B (en) Defect detection method, device and system
WO2022121283A1 (en) Vehicle key point information detection and vehicle control
CN110598743A (en) Target object labeling method and device
JP6305171B2 (en) How to detect objects in a scene
CN110619279B (en) Road traffic sign instance segmentation method based on tracking
JP2014511772A (en) Method to invalidate sensor measurement value after picking motion in robot system
CN110766758A (en) Calibration method, device, system and storage device
CN108961262B (en) Bar code positioning method in complex scene
CN111724445A (en) Method and system for identifying large-view small-size identification code
CN113569679B (en) Method, device and system for measuring elongation at break
Li et al. Face detection based on depth information using HOG-LBP
CN106645045A (en) Bi-directional scanning imaging method based on TDI-CCD (time delay integration-charge coupled device) in fluorescent optical micro-imaging
CN117314986A (en) Unmanned aerial vehicle cross-mode power distribution equipment inspection image registration method based on semantic segmentation
CN113392848A (en) Deep learning-based reading method and device for OCR on cylinder
CN112329893A (en) Data-driven heterogeneous multi-target intelligent detection method and system
CN116883483A (en) Fish body measuring method based on laser camera system
CN116012712A (en) Object general feature-based target detection method, device, equipment and medium
CN115760860A (en) Multi-type workpiece dimension visual measurement method based on DXF file import
CN116125489A (en) Indoor object three-dimensional detection method, computer equipment and storage medium
CN113971799A (en) Vehicle nameplate information position detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914