CN115527210A - OCR character detection method and device based on YOLOv7 algorithm


Info

Publication number
CN115527210A
CN115527210A (application CN202211170987.2A)
Authority
CN
China
Prior art keywords
ocr character
module
character detection
yolov7
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211170987.2A
Other languages
Chinese (zh)
Inventor
姚正
刘超
张庆庆
李建勋
李欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunshan Baiao Software Co ltd
Original Assignee
Kunshan Baiao Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunshan Baiao Software Co ltd filed Critical Kunshan Baiao Software Co ltd
Priority to CN202211170987.2A
Publication of CN115527210A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/147Determination of region of interest
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses an OCR character detection method and device based on the YOLOv7 algorithm. The detection method comprises the following steps: constructing an OCR character detection network model, wherein the model adopts the YOLOv7 network structure as its basic framework, performs deep feature extraction in the backbone feature extraction network of the YOLOv7 structure by means of depthwise separable convolution operations, and embeds an SE attention mechanism module in the head prediction module of the YOLOv7 structure to make key features explicit and extract them; establishing a training set and a verification set for training, learning and detection verification of the OCR character detection network model; and configuring a calling and deployment module that converts the trained OCR character detection network model into a general model in a unified file format, deploys the general model on an engineering platform, detects the training set and/or the verification set, and outputs OCR character prediction results. The detection method has a small parameter count and computation cost, achieving a lightweight OCR character detection effect with high accuracy and high speed.

Description

OCR character detection method and device based on YOLOv7 algorithm
Technical Field
The invention relates to the technical field of OCR character image processing, in particular to an OCR character detection method and device based on a YOLOv7 algorithm.
Background
In modern life, electronic devices are widely used in production and daily life. OCR characters here refer to characters printed on electronic devices; through them, industry can quickly obtain a device's related production information. Because such characters are constrained by device size and manufacturing processes and are difficult to read quickly and directly with the naked eye, OCR character recognition has long been a subject of close attention for researchers. With the continuous development of computer technology, deep-learning-based OCR methods have made new progress in OCR character detection and, thanks to their powerful feature generalization and expression capability, have become a main research approach in many fields of image processing. At present, two kinds of methods are mainly used for OCR character detection/recognition: traditional methods and deep-learning-based methods.
Traditional OCR character detection methods set corresponding thresholds according to the shape and character features of the OCR characters on the electronic device. Features are screened manually by choosing reasonable thresholds or by applying multiple image-morphology operations, and a detection effect of a certain precision can usually be achieved depending on the actual situation.
Previous OCR character detection models were implemented with deep and wide convolutional neural networks, which usually yield high-precision detection. However, as network depth increases, the parameter count and computation cost of the network also grow greatly, seriously affecting the detection speed and the program memory footprint of the OCR character detection model in industrial deployment.
Therefore, how to obtain a high-accuracy, high-speed OCR character detection model using a detection network with few parameters and a small weight-file footprint has become a technical problem to be solved urgently. In view of the above, the present invention is proposed.
Disclosure of Invention
In order to overcome the above defects, the invention provides an OCR character detection method and device based on the YOLOv7 algorithm, which have few parameters and low computation cost and achieve a lightweight OCR character detection effect with high accuracy and high speed.
The technical scheme adopted by the invention to solve the technical problem is as follows: an OCR character detection method based on the YOLOv7 algorithm comprises the following steps:
constructing an OCR character detection network model, wherein the model adopts the YOLOv7 network structure as its basic framework, performs deep feature extraction in the backbone feature extraction network of the YOLOv7 structure by means of depthwise separable convolution operations, and embeds an SE attention mechanism module in the head prediction module of the YOLOv7 structure to make key features explicit and extract them;
establishing a training set and a verification set for training, learning and detection verification of the OCR character detection network model;
and configuring a calling and deployment module that converts the trained OCR character detection network model into a general model in a unified file format, deploys the general model on an engineering platform, detects the training set and/or the verification set, and outputs OCR character prediction results.
As a further improvement of the present invention, the backbone feature extraction network has four groups of first CBS modules, each composed of a depthwise separable convolution layer, a batch normalization layer BN and the activation function SiLu, and three groups of first combination modules, each composed of an MP1 module and an ELAN module. The four groups of first CBS modules and the three groups of first combination modules are arranged sequentially in data-processing order; the three groups of first combination modules each output a feature map, and the feature maps they output differ in size.
The head prediction module has a second combination module consisting of an SPPCSPC module, two UP modules, four ELAN-H modules and two MP2 modules, and three groups of third combination modules each consisting of a REP module and a CONV module. The second combination module receives and fuses the feature maps output by the first combination modules to obtain three groups of fused feature results; the three groups of third combination modules respectively receive and process the three groups of fused feature results to obtain three groups of network prediction results of different sizes. In addition, the SE attention mechanism module is embedded in the SPPCSPC module.
As a further refinement of the present invention, the depthwise separable convolution layer includes a depthwise convolution with a 3 × 3 kernel and a pointwise convolution with a 1 × 1 kernel;
in each first combination module, the MP1 module is composed of three second CBS modules and one max-pooling layer, the ELAN module is composed of seven third CBS modules and one fully connected layer, and the second and third CBS modules each have the same structure as the first CBS module.
As a further improvement of the present invention, the SPPCSPC module is composed of seven fourth CBS modules, three max-pooling layers and the SE attention mechanism module, the MP2 module is composed of three fifth CBS modules and one max-pooling layer, and the fourth and fifth CBS modules each have the same structure as the first CBS module.
As a further improvement of the present invention, a method for establishing the training set and the verification set comprises:
acquiring a plurality of target object pictures with OCR characters;
labeling the obtained target object picture to obtain a corresponding labeling frame; dividing the marked target object picture into a training sample set and a verification sample set;
performing enhancement processing on the obtained training sample set to obtain the training set;
the validation sample set is used directly as the validation set.
As a further improvement of the invention, based on the training set, the method for performing iterative training on the OCR character detection network model comprises the following steps:
inputting the training set into the OCR character detection network model in batches for forward propagation training, and obtaining a network prediction result of each batch of forward propagation training after passing through an intermediate layer of the OCR character detection network model;
updating and optimizing the OCR character detection network model with a back-propagation algorithm, namely: first, using the loss function of the OCR character detection network model to compute the error between the network prediction result of each batch's forward-propagation training and the labeling frames corresponding to that batch of training samples; then, using an Adam optimizer to update the convolution-kernel weight parameters and intermediate-layer connection parameters of the model until the error value stabilizes or the model reaches the set number of training epochs, thus obtaining the trained OCR character detection network model.
As a further improvement of the invention, based on the verification set, the method for performing detection verification by the OCR character detection network model comprises the following steps:
and inputting the verification set into the trained OCR character detection network model to obtain the size, position and category of an OCR character image, namely completing a target detection task and verifying the OCR character detection network model.
As a further improvement of the invention, the unified file format of the general model is the ONNX format.
The invention also provides an OCR character detection device based on the YOLOv7 algorithm, which comprises the following components:
the OCR character detection network model is configured to adopt the YOLOv7 network structure as its basic framework, perform deep feature extraction in the backbone feature extraction network of the YOLOv7 structure by means of depthwise separable convolution operations, and embed an SE attention mechanism module in the head prediction module of the YOLOv7 structure to make key features explicit and extract them;
a training set configured for training learning for the OCR character detection network model;
a validation set configured for the OCR character detection network model to perform detection validation;
and the calling and deploying module is configured to convert the trained OCR character detection network model into a general model in a unified file format, deploy the general model into an engineering platform, detect the training set and/or the verification set and output an OCR character prediction result.
The beneficial effects of the invention are: through technical innovation, the constructed OCR character detection network model embeds depthwise separable convolution operations and an SE attention mechanism on the basis of the YOLOv7 network structure, so that the model's parameter count and computation cost are small, a lightweight OCR character detection effect with high accuracy and high speed is achieved, and the problems of low accuracy and slow inference in existing OCR detection networks are well resolved.
Drawings
FIG. 1 is a flow chart of the OCR character detection method based on the YOLOv7 algorithm according to the present invention;
FIG. 2 is a block diagram of the architecture of an OCR character detection network model according to the present invention;
FIG. 3 is a block diagram of the MP module according to the present invention;
FIG. 4 is a block diagram of the ELAN module according to the present invention;
FIG. 5 is a block diagram of the structure of the SPPCSPC module according to the present invention;
FIG. 6 is a block diagram of a configuration of an SE attention mechanism module according to the present invention;
fig. 7 is a block diagram of the OCR character detection apparatus based on the YOLOv7 algorithm according to the present invention.
The following description is made with reference to the accompanying drawings:
1. an OCR character detection network model; 10. a backbone feature extraction network; 100. a first CBS module; 101. a first combination module; 11. a head prediction module; 110. a second combination module; 111. a third combination module; 20. a training set; 21. a verification set; 3. a calling and deploying module.
Detailed Description
A preferred embodiment of the present invention will be described in detail below with reference to the accompanying drawings.
Example 1:
please refer to fig. 1, which is a flowchart illustrating an OCR character detection method based on the YOLOv7 algorithm according to the present invention.
The OCR character detection method based on the YOLOv7 algorithm comprises the following operation steps:
s1: constructing an OCR character detection network model 1, wherein the OCR character detection network model 1 adopts a YOLOv7 network structure as a basic frame, deep feature extraction is carried out in a backbone feature extraction network 10 of the YOLOv7 network structure by adopting a depth separable convolution operation mode, specifically, OCR character feature information is extracted by adopting depth separable convolution, and an SE attention mechanism module is embedded in a head prediction module 11 of the YOLOv7 network structure for visualization and extraction of key features, so that the detection precision is improved;
s2: establishing a training set 20 and a verification set 21 for the OCR character detection network model 1 to perform training, learning and detection verification;
s3: and the configuration calling and deploying module 3 is used for converting the trained OCR character detection network model 1 into a universal model in a unified file format, deploying the universal model into an engineering platform, detecting the training set 20 and/or the verification set 21, and outputting an OCR character prediction result.
Through technical innovation, the constructed OCR character detection network model embeds depthwise separable convolution operations and an SE attention mechanism on the basis of the YOLOv7 network structure, so that the model's parameter count and computation cost are small, a lightweight OCR character detection effect with high accuracy and high speed is achieved, and the problems of low accuracy and slow inference in existing OCR detection networks are well resolved.
Example 2:
embodiment 2 provides an OCR character detection network model structure, and the specific structure can be referred to fig. 2.
The OCR character detection network model 1 comprises a backbone feature extraction network 10 and a head prediction module 11. The backbone feature extraction network 10 comprises four first CBS modules 100, each composed of a depthwise separable convolution layer, a batch normalization layer BN and the activation function SiLu, and three first combination modules 101, each composed of an MP1 module and an ELAN module. The four first CBS modules 100 and the three first combination modules 101 are arranged sequentially in data-processing order; the three first combination modules 101 each output a feature map, and the feature maps they output differ in size. The head prediction module 11 has a second combination module 110 composed of an SPPCSPC module, two UP modules, four ELAN-H modules and two MP2 modules, and three groups of third combination modules 111 each composed of a REP module and a CONV module. The second combination module 110 receives and fuses the three groups of feature maps output by the first combination modules 101 to obtain three groups of fused feature results; the three groups of third combination modules 111 respectively receive and process the three groups of fused feature results to obtain three groups of network prediction results (i.e., prediction boxes) of different sizes. In addition, the SE attention mechanism module is embedded in the SPPCSPC module.
Further preferably, the depthwise separable convolution layer includes a depthwise convolution with a 3 × 3 kernel and a pointwise convolution with a 1 × 1 kernel, which reduces the parameter count.
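For illustration, a minimal PyTorch sketch of such a first CBS module 100 is given below; the class and argument names are illustrative, not taken from the patent.

```python
import torch.nn as nn

class DepthwiseSeparableCBS(nn.Module):
    """Sketch of a first CBS module: depthwise separable convolution,
    batch normalization BN, and SiLu activation."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # 3 x 3 depthwise convolution: one filter per input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # 1 x 1 pointwise convolution mixes channels and sets the output width
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```

For C_in input and C_out output channels, the 3 × 3 depthwise plus 1 × 1 pointwise pair needs 9·C_in + C_in·C_out weights instead of the 9·C_in·C_out of a standard 3 × 3 convolution, which is the parameter reduction relied on here.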
Further preferably, in each first combination module 101, the MP1 module is composed of three second CBS modules and one max-pooling layer, the ELAN module is composed of seven third CBS modules and one fully connected layer, and the second and third CBS modules have the same structure as the first CBS module 100. The MP1 module (see fig. 3 for the MP module's structure) mainly serves to make features explicit while reducing dimensionality with minimal information loss. The ELAN module (see fig. 4) expands the channels and cardinality, applies the same group parameters and channel multipliers to the convolution kernels of a computational layer, shuffles the result into several groups, and then concatenates and fuses the groups, enabling the OCR character detection network model 1 to learn diversified and multi-scale feature information.
Further preferably, each first CBS module 100 corresponds to 2 layers. Accordingly, per the backbone feature extraction network 10 structure shown in fig. 2, the MP1 module corresponds to 7 layers, the ELAN module corresponds to 15 layers, and the backbone feature extraction network 10 comprises 93 layers in total.
Further preferably, the SPPCSPC module is composed of seven fourth CBS modules, three max-pooling layers and the SE attention mechanism module, and the MP2 module is composed of three fifth CBS modules and one max-pooling layer; the fourth and fifth CBS modules have the same structure as the first CBS module 100. Regarding the SPPCSPC module, from its structure diagram in fig. 5 together with the SE attention mechanism module diagram in fig. 6 it can be seen that the SPPCSPC module splits its input into two branches: one branch undergoes feature extraction and then max-pooling of features of different sizes through a feature pyramid, so that feature information is explicitly extracted, and the result is concatenated with the feature map of the other branch, which does not pass through the feature-pyramid pooling. As for the MP2 module, the MP1 and MP2 modules are both MP modules and differ mainly in their channel counts.
In addition, the SE attention mechanism module uses conventional techniques and is therefore not described in detail here. The ELAN-H module has the same structure as the ELAN module, but its number of output channels is 1/4 of its number of input channels. The REP module includes one depthwise separable convolution layer with a 3 × 3 kernel and one convolution layer with a 1 × 1 kernel.
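Since the SE module follows conventional practice, the standard squeeze-and-excitation block can be sketched in PyTorch as follows; the reduction ratio of 16 is the common default from the SE literature, not a value stated in this patent.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard squeeze-and-excitation block."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global spatial context
        self.fc = nn.Sequential(              # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                          # reweight the input channels
```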
In summary, with reference to fig. 2, the overall structure of the OCR character detection network model 1 is as follows. (1) After the four groups of first CBS modules 100, one ELAN module is applied, followed by the three groups of first combination modules 101; the outputs of the three groups of first combination modules 101 correspond to the C3/C4/C5 outputs, and the three output feature maps have sizes 80 × 80 × 512, 40 × 40 × 1024 and 20 × 20 × 1024, respectively. (2) The three groups of feature maps output by the backbone feature extraction network 10 enter the corresponding-size branches of the head prediction module 11: one group (20 × 20 × 1024) enters the SPPCSPC module, where the channel count is reduced from 1024 to 512; the remaining two groups (80 × 80 × 512 and 40 × 40 × 1024) are fused top-down with the outputs of the UP modules to obtain results P3, P4 and P5, which are then fused bottom-up with P4 and P5 to obtain the three groups of fused feature results. Finally, the three groups of fused feature results undergo channel-count adjustment and convolution through the three groups of third combination modules 111; specifically, three output boxes of different sizes are obtained, the intersection-over-union between the ground truth of the training set and the three output boxes is computed and compared, and the prediction box with the highest intersection-over-union value (i.e., the network prediction result) is output.
In addition, the loss function defined in the OCR character detection network model 1 according to the present invention is:
L_total = Σ_{i=1}^{N} ( λ1 · L_conf + λ2 · L_local + λ3 · L_cls )
in the above formula, L_total is the total loss function, N is the number of detection layers, L_conf is the confidence loss function, L_local is the localization loss function, L_cls is the classification loss function, and λ1, λ2 and λ3 are the weights of the three loss functions, respectively.
The three loss functions are specifically described below:
(1) The confidence loss function L_conf has the binary cross-entropy form:
L_conf = -(1/n) · Σ_{i=1}^{n} [ y_i · ln(σ(x_i)) + (1 - y_i) · ln(1 - σ(x_i)) ]
y_i = IOU(b, b^gt)
σ(x_i) = Sigmoid(x_i)
in the above formulas, y_i ∈ [0,1] is the intersection-over-union (IOU) of the predicted target bounding box b and the real target bounding box b^gt, σ denotes the resulting prediction confidence, n is the number of positive and negative samples, and x_i is the probability of the current class obtained after the activation function.
(2) The localization loss function L_local is:
L_local = (x - x1)^2 + (y - y1)^2 + (w - w1)^2 + (h - h1)^2
in the above formula, x, y, w and h are the center coordinates, width and height of the predicted box, and x1, y1, w1 and h1 are the corresponding values of the labeled box.
(3) The classification loss function L_cls likewise has a binary cross-entropy form, summed over the prediction grid:
L_cls = -Σ_{grid ∈ S_i × S_j} Σ_{i=1}^{n} [ y_i · ln(x_i) + (1 - y_i) · ln(1 - x_i) ]
in the above formula, S_i × S_j is the number of grids into which the scale is divided, n is the number of positive and negative samples, x_i is the current class prediction, and y_i is the true value of the current category.
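Under the binary cross-entropy reading of L_conf and L_cls given above, the three components and their weighted combination might be sketched in PyTorch as follows; all function and argument names are illustrative.

```python
import torch.nn.functional as F

def conf_loss(obj_logits, iou_targets):
    # L_conf: binary cross-entropy between predicted confidence sigma(x_i)
    # and the IOU soft label y_i = IOU(b, b^gt)
    return F.binary_cross_entropy_with_logits(obj_logits, iou_targets)

def local_loss(pred_box, gt_box):
    # L_local: squared error over (x, y, w, h), predicted vs. labeled box
    return ((pred_box - gt_box) ** 2).sum(dim=-1).mean()

def cls_loss(cls_logits, cls_targets):
    # L_cls: binary cross-entropy over the per-grid class predictions
    return F.binary_cross_entropy_with_logits(cls_logits, cls_targets)

def total_loss(per_layer_terms, lambdas=(1.0, 1.0, 1.0)):
    # L_total: weighted sum over the N detection layers; the lambda
    # values are placeholders, the patent does not state them
    l1, l2, l3 = lambdas
    return sum(l1 * c + l2 * b + l3 * k for c, b, k in per_layer_terms)
```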
Example 3:
this embodiment 3 provides a method for establishing the training set 20 and the validation set 21.
The method for establishing the training set 20 and the verification set 21 comprises the following steps:
s21: acquiring a plurality of target object pictures with OCR characters;
specifically, a wafer image is acquired through an acquisition device (such as a camera), then a wafer image without OCR characters is removed, and a wafer image with OCR characters is reserved, so that a plurality of target object images with OCR characters are obtained.
S22: labeling the obtained target object picture to obtain a corresponding labeling frame; dividing the marked target object picture into a training sample set and a verification sample set;
specifically, labelme label making software is used for manually labeling each wafer picture with OCR characters, namely, each OCR character is labeled with a label, and after labeling, a corresponding labeling frame, position information (including coordinates of a central point of the labeling frame, frame height and frame width) and category information can be obtained; then, dividing the marked target object picture into a training sample set and a verification sample set, wherein the number proportion of the pictures in the training sample set and the verification sample set is not limited, and the number proportion is preferably 9:1 or 8:2, or others.
S23: performing enhancement processing on the obtained training sample set to obtain the training set 20; the validation sample set may be used directly as the validation set 21.
Specifically, the training sample set is enhanced with modes such as mosaic, mosaic9, mixup and copy_paste to obtain the training set 20. Mosaic randomly draws 4 images (mosaic9, 9 images) from the dataset and splices random regions of them for data enhancement; mixup enhances by blending two random images; and copy_paste enhances by randomly swapping the backgrounds of different images.
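As an example of these modes, a minimal mixup sketch is given below; the beta parameter alpha = 8.0 is a common YOLO default assumed here, since the patent does not specify one, and the two images are assumed to have the same shape.

```python
import numpy as np

def mixup(img_a, img_b, boxes_a, boxes_b, alpha=8.0):
    """Blend two training images pixel-wise and keep both sets of
    labeled boxes; lam is drawn from a Beta(alpha, alpha) distribution."""
    lam = np.random.beta(alpha, alpha)
    mixed = (lam * img_a.astype(np.float32)
             + (1.0 - lam) * img_b.astype(np.float32))
    return mixed.astype(img_a.dtype), boxes_a + boxes_b
```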
Example 4:
based on the training set 20 and the verification set 21 established in embodiment 3, this embodiment 4 provides a method for performing training, learning, detection, and verification on the OCR character detection network model 1, which specifically includes:
i) based on the training set 20, the method for performing iterative training on the OCR character detection network model 1 comprises the following steps:
s24: setting an initial iteration number (e.g. 2000), an initial learning rate (e.g. 10) -4 ) And batch _ size (e.g., 320), and set data set path parameters and category parameters (e.g., 38), trained with the parallel GPU;
s25: inputting the training set 20 into the OCR character detection network model 1 in batches for forward propagation training, and obtaining a network prediction result of each batch of forward propagation training after passing through the middle layer of the OCR character detection network model 1;
specifically, clustering target object pictures in the training set 20 to obtain K prior frames, inputting all the prior frames and original pictures in the training set 20 into the OCR character detection network model 1 to generate a feature map, and obtaining position information, category information and confidence of the feature map relative to the prior frames; obtaining a certain number of candidate frames based on a set confidence threshold value, and position information, category information and confidence of the feature map relative to the prior frame; and performing non-maximum suppression on all the candidate frames to obtain a prediction frame, namely a network prediction result.
S26: update and optimize the OCR character detection network model 1 with a back-propagation algorithm, namely: first, use the loss function of the OCR character detection network model 1 to compute the error between the network prediction result (prediction boxes) of each batch's forward-propagation training and the labeling frames corresponding to that batch of training samples; then, use an Adam optimizer to update the convolution-kernel weight parameters and intermediate-layer connection parameters of the model 1 until the error value stabilizes or the model reaches the set number of training epochs, thus obtaining the trained OCR character detection network model 1.
Specifically, the update formula is:
ω_{t+1} = ω_t - η · ∂L/∂ω_t,   v_{t+1} = v_t - η · ∂L/∂v_t
in the above formulas, η is the step size, typically η = 1 × 10^-5; ω_{t+1} and v_{t+1} denote the updated values of ω_t and v_t, respectively; and ∂L/∂ω_t and ∂L/∂v_t denote the partial derivatives of the loss with respect to the parameters.
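Put together, one iteration of S25 and S26 might look as follows in PyTorch; model, loss_fn and the tensors are placeholders standing for the network, the composite loss L_total and a batch from the training set 20.

```python
import torch

def train_step(model, optimizer, images, targets, loss_fn):
    """One batch of forward-propagation training (S25) followed by the
    back-propagation update of S26."""
    optimizer.zero_grad()
    preds = model(images)            # forward propagation
    loss = loss_fn(preds, targets)   # error against the labeling frames
    loss.backward()                  # back-propagate gradients
    optimizer.step()                 # Adam updates weights and connections
    return loss.item()

# Usage sketch with the example hyperparameters of S24:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```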
II) based on the verification set 21, the method for the OCR character detection network model 1 to perform detection verification (specifically, the method is used for verifying the model detection result of the OCR character detection network model 1 after each iteration training) comprises the following steps:
and inputting the verification set 21 into the trained OCR character detection network model 1 to obtain the size, position and category of an OCR character image, namely completing a target detection task and verifying the OCR character detection network model 1.
Example 5:
this embodiment 5 provides a method for the calling and deploying module 3 to convert and deploy the trained OCR character detection network model 1, which specifically includes:
s51: applying the trained OCR character detection network model 1 weight to a Yolov7 official program, converting a suffix pt format file into a suffix ONNX format, and obtaining the general model, namely, the file format in the general model is the ONNX format;
s52: downloading visual studio 2022 from the official network, downloading NET platform deployment engineering from the Yolov7 official network, replacing ONNX format files in the original Yolov7 with ONNX format files of the general model, and then modifying type parameters, type quantity parameters, size parameters, confidence coefficient parameters and the like to finish the deployment of the general model in the engineering platform;
s53: and inputting the training set 20 and/or the verification set 21 into an engineering platform for detection to obtain all prediction results, and simultaneously performing test analysis.
From the above, the calling and deploying module 3 can be understood as the set comprising the official YOLOv7 program, Visual Studio 2022, and the .NET-platform deployment project downloaded from the YOLOv7 official repository.
The engineering platform detects the training set 20 and/or the verification set 21 as follows: during detection, six evaluation indexes, namely precision, recall, intersection-over-union (IOU), average precision (AP), mean average precision (mAP) and detection speed, are adopted as the evaluation standards of the OCR character detection experiments.
The precision is defined as:
Precision = TP / (TP + FP)
The recall is defined as:
Recall = TP / (TP + FN)
The intersection-over-union is defined as:
IOU = (A ∩ B) / (A ∪ B)
The average precision is defined as:
AP = Σ_k P(k) · ΔR(k)
The mean average precision is defined as:
mAP = (1/n) · Σ_{i=1}^{n} AP_i
In the above formulas, TP denotes positive samples predicted by the model as the positive class, TN denotes negative samples predicted as the negative class, FP denotes negative samples predicted as the positive class, FN denotes positive samples predicted as the negative class, A denotes the predicted OCR character result, B denotes the OCR character label, n denotes the number of classes, and P(k) and ΔR(k) denote the precision and the recall increment, respectively.
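These indexes are standard; for reference, a plain-Python sketch of the first four follows, assuming the TP/FP/FN counts and box coordinates are computed upstream.

```python
def precision(tp, fp):
    # Precision = TP / (TP + FP)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Recall = TP / (TP + FN)
    return tp / (tp + fn) if (tp + fn) else 0.0

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def average_precision(p_values, delta_r_values):
    # AP = sum over k of P(k) * delta-R(k), per the definition above
    return sum(p * dr for p, dr in zip(p_values, delta_r_values))
```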
Example 6:
this embodiment 6 provides an OCR character detection device based on the YOLOv7 algorithm, and the specific structure can be seen from fig. 7.
The OCR character detection apparatus includes an OCR character detection network model 1, a training set 20, a verification set 21 and a calling and deploying module 3. The OCR character detection network model 1 is configured to adopt the YOLOv7 network structure as its basic framework, perform deep feature extraction in the backbone feature extraction network 10 of the YOLOv7 structure by means of depthwise separable convolution operations, and embed an SE attention mechanism module in the head prediction module 11 of the YOLOv7 structure to make key features explicit and extract them, improving detection accuracy. The training set 20 is configured for the training and learning of the OCR character detection network model 1. The verification set 21 is configured for the detection verification of the OCR character detection network model 1, specifically for verifying the model's detection results after each round of iterative training. The calling and deploying module 3 is configured to convert the trained OCR character detection network model 1 into a general model in a unified file format, deploy the general model on an industrial-application engineering platform, and then detect the training set 20 and/or the verification set 21 and output OCR character prediction results.
Further preferably, the OCR character detection network model 1 may adopt the network model structure shown in embodiment 2.
The training set 20 and the verification set 21 may be established as shown in embodiment 3.
The calling and deploying module 3 may adopt the official YOLOv7 program, Visual Studio 2022 downloaded from its official website, and the .NET-platform deployment project downloaded from the YOLOv7 official repository, as shown in embodiment 5.
The OCR character detection apparatus provided in this embodiment 6 can execute the OCR character detection method based on the YOLOv7 algorithm provided in any embodiment of the present invention, and has corresponding beneficial effects of the execution method.
In conclusion, through technical innovation, the constructed OCR character detection network model embeds depthwise separable convolution operations and an SE attention mechanism on the basis of the YOLOv7 network structure, so that the model's parameter count and computation cost are small, a lightweight OCR character detection effect with high accuracy and high speed is achieved, and the problems of low accuracy and slow inference in existing OCR detection networks are well resolved.
In the preceding description, numerous specific details were set forth to provide a thorough understanding of the present invention. The foregoing describes only preferred embodiments; the invention can be practiced in many ways other than those described here, and it is therefore not limited to the specific implementations disclosed above. Those skilled in the art may, using the methods and techniques disclosed above, make many possible variations and modifications to the disclosed embodiments, or modify them into equivalent embodiments, without departing from the scope of the claimed subject matter. Any simple modification, equivalent change or modification of the above embodiments according to the technical essence of the present invention remains within the scope of the technical solution of the present invention.

Claims (9)

1. An OCR character detection method based on a YOLOv7 algorithm is characterized in that: the method comprises the following steps:
constructing an OCR character detection network model (1), wherein the OCR character detection network model (1) adopts the YOLOv7 network structure as its basic framework, deep feature extraction is performed in a backbone feature extraction network (10) of the YOLOv7 network structure by means of depthwise separable convolution operations, and an SE attention mechanism module is embedded in a head prediction module (11) of the YOLOv7 network structure to make key features explicit and extract them;
establishing a training set (20) and a verification set (21) for the OCR character detection network model (1) to perform training learning and detection verification;
and configuring a calling and deploying module (3) that converts the trained OCR character detection network model (1) into a general model in a unified file format, deploys the general model on an engineering platform, detects the training set (20) and/or the verification set (21), and outputs OCR character prediction results.
2. An OCR character detection method based on the YOLOv7 algorithm according to claim 1, characterized in that: the backbone feature extraction network (10) has four first CBS modules (100), each composed of a depthwise separable convolution layer, a batch normalization layer BN and the activation function SiLu, and three first combination modules (101), each composed of an MP1 module and an ELAN module; the four first CBS modules (100) and the three first combination modules (101) are arranged sequentially in data-processing order, the three first combination modules (101) each output a feature map, and the feature maps output by the three first combination modules (101) differ in size;
the head prediction module (11) has a second combination module (110) consisting of an SPPCSPC module, two UP modules, four ELAN-H modules and two MP2 modules, and three third combination modules (111) each consisting of a REP module and a CONV module, wherein the second combination module (110) is used for receiving and fusing the three groups of feature maps output by the first combination modules (101) to obtain three groups of fused feature results; the three groups of third combination modules (111) are respectively used for receiving and processing the three groups of fused feature results to obtain three groups of network prediction results of different sizes; in addition, the SE attention mechanism module is embedded in the SPPCSPC module.
3. An OCR character detection method based on the YOLOv7 algorithm according to claim 2, characterized in that: the depthwise separable convolution layer comprises a depthwise convolution with a 3 × 3 kernel and a pointwise convolution with a 1 × 1 kernel;
in each first combination module (101), the MP1 module is composed of three second CBS modules and one max-pooling layer, the ELAN module is composed of seven third CBS modules and one fully connected layer, and the second and third CBS modules each have the same structure as the first CBS module (100).
4. An OCR character detection method based on the YOLOv7 algorithm according to claim 2, characterized in that: the SPPCSPC module is composed of seven fourth CBS modules, three max-pooling layers and the SE attention mechanism module, the MP2 module is composed of three fifth CBS modules and one max-pooling layer, and the fourth and fifth CBS modules each have the same structure as the first CBS module (100).
5. An OCR character detection method based on the YOLOv7 algorithm according to claim 1, characterized in that: the method for establishing the training set (20) and the verification set (21) comprises:
acquiring a plurality of target object pictures with OCR characters;
labeling the obtained target object picture to obtain a corresponding labeling frame; dividing the marked target object picture into a training sample set and a verification sample set;
performing enhancement processing on the obtained training sample set to obtain the training set (20);
the verification sample set is directly used as the verification set (21).
6. An OCR character detection method based on the YOLOv7 algorithm according to claim 5, characterized in that: based on the training set (20), the method for iteratively training the OCR character detection network model (1) comprises the following steps:
inputting the training set (20) into the OCR character detection network model (1) in batches for forward propagation training, and obtaining a network prediction result of each batch of forward propagation training after passing through the middle layer of the OCR character detection network model (1);
updating and optimizing the OCR character detection network model (1) with a back-propagation algorithm, namely: first, using the loss function of the OCR character detection network model (1) to compute the error between the network prediction result of each batch's forward-propagation training and the labeling frames corresponding to that batch of training samples; then, using an Adam optimizer to update the convolution-kernel weight parameters and intermediate-layer connection parameters of the OCR character detection network model (1) until the error value stabilizes or the model reaches the set number of training epochs, to obtain the trained OCR character detection network model (1).
7. An OCR character detection method based on the YOLOv7 algorithm according to claim 5, characterized in that: based on the verification set (21), the method for detection verification of the OCR character detection network model (1) comprises the following steps:
and inputting the verification set (21) into the trained OCR character detection network model (1) to obtain the size, position and category of an OCR character image, namely completing a target detection task and verifying the OCR character detection network model (1).
8. An OCR character detection method based on the YOLOv7 algorithm according to claim 1, characterized in that: the unified file format of the general model is the ONNX format.
9. An OCR character detection device based on a YOLOv7 algorithm is characterized in that: the method comprises the following steps:
the OCR character detection network model (1) is configured to adopt the YOLOv7 network structure as its basic framework, perform deep feature extraction in a backbone feature extraction network (10) of the YOLOv7 network structure by means of depthwise separable convolution operations, and embed an SE attention mechanism module in a head prediction module (11) of the YOLOv7 network structure to make key features explicit and extract them;
a training set (20) configured for training learning for the OCR character detection network model (1);
a validation set (21) configured for detection validation by the OCR character detection network model (1);
and the calling and deploying module (3) is configured to convert the trained OCR character detection network model (1) into a general model in a unified file format and, after the general model is deployed on an engineering platform, detect the training set (20) and/or the verification set (21) and output OCR character prediction results.
Application CN202211170987.2A, filed 2022-09-23 (priority 2022-09-23): OCR character detection method and device based on YOLOv7 algorithm. Status: Pending. Published as CN115527210A.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211170987.2A CN115527210A (en) 2022-09-23 2022-09-23 OCR character detection method and device based on YOLOv7 algorithm


Publications (1)

Publication Number Publication Date
CN115527210A 2022-12-27

Family

ID=84698960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211170987.2A Pending CN115527210A (en) 2022-09-23 2022-09-23 OCR character detection method and device based on YOLOv7 algorithm

Country Status (1)

Country Link
CN (1) CN115527210A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246282A (en) * 2023-02-10 2023-06-09 青海师范大学 Scene Tibetan detection method based on improved double-attention YOLOv7
CN116129566A (en) * 2023-04-18 2023-05-16 松立控股集团股份有限公司 Intelligent parking spot lock linkage method
CN116563650A (en) * 2023-07-10 2023-08-08 邦世科技(南京)有限公司 Deep learning-based endplate inflammatory degeneration grading method and system
CN116563650B (en) * 2023-07-10 2023-10-13 邦世科技(南京)有限公司 Deep learning-based endplate inflammatory degeneration grading method and system
CN116935473A (en) * 2023-07-28 2023-10-24 山东智和创信息技术有限公司 Real-time detection method and system for wearing safety helmet based on improved YOLO v7 under complex background
CN117315614A (en) * 2023-11-28 2023-12-29 南昌大学 Traffic target detection method based on improved YOLOv7
CN117315614B (en) * 2023-11-28 2024-03-29 南昌大学 Traffic target detection method based on improved YOLOv7


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination