CN115661703A - Method for extracting shop signboard information based on deep learning - Google Patents

Method for extracting shop signboard information based on deep learning

Info

Publication number
CN115661703A
Authority
CN
China
Prior art keywords
model
deep learning
target
training
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211256545.XA
Other languages
Chinese (zh)
Inventor
Dai Jiguang (戴激光)
Gu Yue (谷越)
Zhang Zi'en (张子恩)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Technical University
Original Assignee
Liaoning Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Technical University
Priority to CN202211256545.XA
Publication of CN115661703A
Current legal status: Pending

Abstract

The invention provides a deep learning-based method for extracting shop signboard information, which specifically comprises the following steps: acquiring data for constructing a deep learning model and dividing the data set; improving the feature extraction structure and enhancing pyramid features to construct a model for extracting shop signboard information; training and verifying the target detection model to obtain an optimal detection model weight file; and importing a target image to be detected, detecting target positions in it, and outputting target detection results according to a preset threshold value. By applying deep learning target detection to shop signboards, the generated model is small and easy to deploy and port to edge devices for inference, which improves the working efficiency of inspection personnel and reduces cost.

Description

Method for extracting shop signboard information based on deep learning
Technical Field
The invention belongs to the technical field of industrial vision and smart cities, and particularly relates to a method for extracting shop signboard information based on deep learning.
Background
Early urban infrastructure information was mostly counted and aggregated through field surveys. Although such data are accurate, surveying urban infrastructure over a large area consumes enormous time and material costs, and the results lag behind reality, which greatly reduces their value. Shop signboard information is among the most important urban infrastructure information and is valuable for city appearance management, urban economic development analysis, urban three-dimensional reconstruction and similar work.
To obtain ground object target information efficiently, scholars at home and abroad mainly adopt two approaches. The first is the object-oriented approach, which uses an image segmentation algorithm to cluster pixels with similar characteristics into segmentation region units and then classifies those units with an image classification algorithm. In traditional methods, however, image objects are formed during segmentation and classification by bottom-up aggregation based on the spectral and texture information of the images, i.e., based on pixel homogeneity. As recognition targets, shop signboards differ in dimensions and color rendering, which inevitably leads to large spatial heterogeneity and makes extracting shop signboard information difficult. At the technical level, obtaining image objects by segmentation and classification essentially ignores high-level image features such as morphological information, context and semantic information; it relies on pixel aggregation over spectral features and does not fully exploit the other features of street view images. Consequently, the object units obtained by traditional methods do not match people's perception of the shapes of actual ground objects, and object-level classification results cannot be converted into interpretation results with real-world meaning. The second is the deep learning approach. It is mainly based on data-driven decisions and self-iterative optimization, and the representation learning of deep networks allows multi-level abstraction in semantic image analysis, so its performance in remote sensing applications far exceeds that of traditional methods. For example, for occlusion and small-target problems, Xu et al. used the repulsion-based loss function Rep Loss to address target occlusion, detected targets at two scales separately with scale-aware two-way sub-networks, and finally fused the results at the inference end. For localizing targets covered by perspective panels under multiple variables, Morera et al. used a self-labeled data set and combined loss functions such as localization loss with various data augmentation strategies. For detecting partially occluded targets at low resolution, Li Linsheng et al. designed candidate-frame sizes that generate nine different candidate areas, improving the model's ability to recognize different features, and added two sliding windows on top of the original RPN to enlarge the model's information intake range.
These approaches aim to classify and extract different types of targets in an image with an off-the-shelf deep learning model, yet they never reach the quality of manual visual interpretation; in other words, each type of target requires a dedicated deep learning extraction algorithm designed around its specific visual features. Leveraging the high detection accuracy and fast inference of YOLOv5, the invention proposes a shop signboard information extraction network that integrates position attention and topology reconstruction under a rectangular-feature constraint.
Disclosure of Invention
To solve the problems of traditional manual inspection of shop signboard information, such as the waste of time and labor and the high cost, the invention provides a deep learning-based method for extracting shop signboard information that addresses these technical defects.
The invention provides a deep learning-based method for extracting shop signboard information, which specifically comprises the following steps:
S1: acquiring data for constructing a deep learning model and dividing the data set;
S2: improving the feature extraction structure and enhancing pyramid features to construct a model for extracting shop signboard information;
S3: training and verifying the target detection model to obtain an optimal detection model weight file;
S4: importing a target image to be detected, detecting target positions in it, and outputting target detection results according to a preset threshold value.
With this deep learning target detection method, the generated model is small and easy to deploy and port to edge devices for inference, which improves the working efficiency of inspection personnel and reduces cost.
Further, step S1 mainly includes the following steps:
S11, sorting the data, considering only objects imaged from a front view angle as initial data;
S12, importing the initial data and marking the position information of each shop signboard with a manual bounding box in the labeling software LabelImg, thereby producing the target detection labels; the labeled text file follows the format <object-class-ID> <x_center> <y_center> <box_width> <box_height>;
and S13, converting the obtained VOC-format data set into TXT format, and dividing the data set into a training set, a verification set and a test set at a ratio of 8:1:1.
Further, the deep learning model is constructed based on a Yolov5 target detection model.
Further, step S2 mainly includes the following steps:
S21, integrating an SP module and a PSA module for the multi-scale rectangular features of shop signboards to form a multi-scale rectangular feature constraint, and building an MFSPP module that improves the generalization ability of the model while preserving multi-scale perception;
S22, for the spatial distribution characteristics of shop signboards, introducing a CoordConv layer to increase attention to the central region of the image and reduce interference from similar ground objects elsewhere, and reconstructing topological relations through an integrated CoT module to resolve the target fragmentation caused by occlusion;
and S23, for the weakening of feature propagation from high to low network levels, introducing CBAM for feature enhancement to ensure that features are used sufficiently and accurately.
Further, step S3 mainly includes the following steps:
S31, setting the pre-training model path and the data set reading path, then setting the number of training iterations and the incoming picture size, leaving the other parameter options at their defaults, and starting to train the target model;
S32, visualizing the training process and checking the change curves of the model's indicators;
S33, after training finishes, comparing, checking and saving the best target model and outputting it to a specified path;
and S34, running predictive inference on the test samples with the trained target model to verify it.
Beneficial effects: compared with the prior art, the deep learning-based method for extracting shop signboard information has the following advantages:
(1) with the deep learning target detection method, the generated model is small and easy to deploy and port to edge devices for inference, which improves the working efficiency of inspection personnel and reduces cost;
(2) the single-stage target detection model effectively improves prediction and inference speed.
Drawings
The invention will become apparent and readily understood from the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of a method for extracting information of a store signboard based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic overall flow chart of a deep learning-based method for extracting information of store signboards according to an embodiment of the present invention;
FIGS. 3-1 and 3-2 are schematic diagrams respectively illustrating the screening of valid data into the data set and the labeling of the data set in the method for extracting shop signboard information based on deep learning according to an embodiment of the present invention;
FIGS. 4-1 to 4-9 are schematic diagrams of, respectively, the overall architecture of the improved YOLOv5 target detection model, the CoordConv module, the SP module, the CoT module, the PSA module, the BottleNeck module in C3_P, the MFSPP module, the BottleNeck module in C3_SC, and the attention module in the method for extracting shop signboard information based on deep learning according to an embodiment of the present invention;
FIGS. 5-1 to 5-8 show, respectively, the visualized prediction process, the label_correlogram, the performance of the training and verification sets on the model, the P_curve, the PR_curve, the R_curve, the positive-sample prediction results, and a preview of sampled positive-sample predictions in the method for extracting shop signboard information based on deep learning according to an embodiment of the present invention;
fig. 6 shows the visualization results of the method for extracting shop signboard information based on deep learning according to an embodiment of the present invention and of other methods on the CTW data set.
Detailed Description
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced.
Fig. 1 illustrates an embodiment of the present invention, which discloses a method for extracting shop signboard information based on deep learning; as shown in fig. 1 and fig. 2, it specifically includes the following steps:
S1: acquiring data for constructing a deep learning model and dividing the data set;
further, S1 specifically includes:
S11, sorting the data, considering only objects imaged from a front view angle as initial data;
S12, importing the initial data and marking the position information of each shop signboard with a manual bounding box in the labeling software LabelImg, thereby producing the target detection labels; the labeled text file follows the format <object-class-ID> <x_center> <y_center> <box_width> <box_height>;
and S13, converting the obtained VOC-format data set into TXT format, and dividing the data set into a training set, a verification set and a test set at a ratio of 8:1:1.
Data sets for deep learning models fall into several categories according to the task performed, namely target detection, image segmentation, image classification and instance segmentation. In this embodiment, a data set labeled for target detection is used, and the labeled data set is divided into a training set, a verification set and a test set at a ratio of 8:1:1.
Specifically, the VOC-format data set used in this embodiment consists mainly of street view data collected by a mobile acquisition vehicle. Images covering shop signboard targets are selected as initial data; target detection information is labeled with the labeling software LabelImg, i.e., the position of each target object is marked manually with a bounding box; the VOC-format data set is then converted into the TXT format used by the YOLO model; and finally the data set is divided at the above ratio to obtain the data set required for training.
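For illustration, the VOC-to-YOLO conversion and the 8:1:1 split described above can be sketched in Python as follows; the directory layout, file names and class mapping here are assumptions of the sketch, not details fixed by the invention:

import glob
import random
import xml.etree.ElementTree as ET

def voc_box_to_yolo(size, box):
    # Convert a VOC (xmin, ymin, xmax, ymax) box into the normalized
    # <x_center> <y_center> <width> <height> form used by YOLO labels.
    w, h = size
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h,
            (xmax - xmin) / w, (ymax - ymin) / h)

def convert_annotation(xml_path, txt_path, class_ids):
    # Write one YOLO txt label file from one VOC xml annotation.
    root = ET.parse(xml_path).getroot()
    w = int(root.find("size/width").text)
    h = int(root.find("size/height").text)
    with open(txt_path, "w") as f:
        for obj in root.iter("object"):
            cls = class_ids[obj.find("name").text]  # e.g. {"signboard": 0}
            b = obj.find("bndbox")
            box = tuple(float(b.find(k).text)
                        for k in ("xmin", "ymin", "xmax", "ymax"))
            f.write("%d %.6f %.6f %.6f %.6f\n"
                    % ((cls,) + voc_box_to_yolo((w, h), box)))

# 8:1:1 split of the labeled images into training/verification/test sets
images = sorted(glob.glob("dataset/images/*.jpg"))
random.seed(0)  # fixed seed keeps the split reproducible
random.shuffle(images)
n = len(images)
train = images[:int(0.8 * n)]
val = images[int(0.8 * n):int(0.9 * n)]
test = images[int(0.9 * n):]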
Multiple data sets are usually used to build the final model, and three data sets are typically involved at different stages of model building: the training set, the validation set and the test set.
First, the model is fitted on the training set (training dataset). For supervised learning, the training set is the set of samples used to fit the parameters (e.g., the weights of the links between neurons in an artificial neural network). In practice, a training set typically consists of pairs of an input vector (or scalar) and an output vector (or scalar), where the output is called the target or label. During training, the current model predicts each sample in the training set and the prediction is compared with the target.
Based on the comparison, the learning algorithm updates the parameters of the model. The process of model fitting may include both feature selection and parameter estimation.
Next, the fitted model is evaluated on a second data set, the validation set (validation dataset). When tuning the model's hyper-parameters (e.g., the number of neurons in a hidden layer of a neural network), the validation set provides an unbiased evaluation of the model fitted on the training set. The validation set can also serve for early stopping as regularization: training is stopped when the validation error rises, since this signals overfitting on the training set. In practice this sometimes fails because the validation error fluctuates during training, so more robust rules for deciding when overfitting has begun have been devised.
Finally, the test set (test dataset) can be used to provide an unbiased evaluation of the final model. If the test set is never used during training (e.g., not used in cross-validation), it is also called a holdout set.
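Where early stopping is used as described above, a minimal sketch is shown below; the patience rule, the training/validation callbacks and the PyTorch-style weight snapshot are assumptions of the sketch, not prescriptions of the invention:

def train_with_early_stopping(model, train_one_epoch, val_loss_fn,
                              max_epochs=300, patience=10):
    # Stop when validation loss has not improved for `patience` epochs,
    # then restore the best weights observed so far.
    best_loss = float("inf")
    best_state = None
    stale_epochs = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        loss = val_loss_fn(model)
        if loss < best_loss:
            best_loss = loss
            best_state = model.state_dict()  # snapshot of the best weights
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # validation error keeps rising: overfitting signal
    if best_state is not None:
        model.load_state_dict(best_state)
    return model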
Referring to fig. 3-1 and 3-2, the specific data-set processing flow in this embodiment is as follows:
1. organize the street view images acquired by the mobile acquisition vehicle;
2. store the collected and sorted images that cover shop signboards under front-view imaging conditions into the data set;
3. import the sorted pictures into the labeling software LabelImg for data cleaning and data labeling;
4. convert the labeled data files into YOLO format and train.
S2: improving a feature extraction structure, enhancing pyramid features, and further constructing a model for extracting shop signboard information;
Referring to figs. 4-1 to 4-9, S2 further includes:
S21, integrating an SP module and a PSA module for the multi-scale rectangular features of shop signboards to form a multi-scale rectangular feature constraint, and building an MFSPP module that improves the generalization ability of the model while preserving multi-scale perception;
S22, for the spatial distribution characteristics of shop signboards, introducing a CoordConv layer to increase attention to the central region of the image and reduce interference from similar ground objects elsewhere (see the sketch after this list), and reconstructing topological relations through an integrated CoT module to resolve the target fragmentation caused by occlusion;
and S23, for the weakening of feature propagation from high to low network levels, introducing CBAM for feature enhancement to ensure that features are used sufficiently and accurately.
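As an illustration of the CoordConv idea referenced in S22, coordinate channels normalized to [-1, 1] are concatenated to the feature map before a convolution so the filters can condition on absolute image position; the following PyTorch layer is a minimal sketch, not the exact module of the invention:

import torch
import torch.nn as nn

class CoordConv(nn.Module):
    # Convolution preceded by concatenation of normalized x/y coordinate
    # channels, giving the filters access to absolute position.
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        coords = torch.stack((xx, yy)).expand(b, -1, -1, -1)  # (B, 2, H, W)
        return self.conv(torch.cat((x, coords), dim=1))

# Usage sketch: a drop-in replacement for a 3x3 convolution
# layer = CoordConv(256, 256, kernel_size=3, padding=1)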
S3: training and verifying the target detection model to obtain an optimal detection model weight file;
The training and verification part mainly performs the corresponding settings according to the hyper-parameter configuration of the network model. Several important parameters are involved in training: data is the data set reading path, epochs is the number of training iterations, img-size is the size of the incoming pictures, and the other parameters keep their defaults. Since the data set was already divided into a training set, a verification set and a test set, the model is verified on the verification set while it trains.
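A minimal sketch of such a training run against the Ultralytics YOLOv5 repository is given below; the data file name, epoch count and image size are illustrative values, not values fixed by the invention:

import subprocess

# Equivalent to: python train.py --data shop_sign.yaml --weights yolov5s.pt
#                --epochs 300 --img 640
subprocess.run([
    "python", "train.py",
    "--data", "shop_sign.yaml",  # data set reading path (assumed file name)
    "--weights", "yolov5s.pt",   # pre-training model path
    "--epochs", "300",           # number of training iterations (assumed)
    "--img", "640",              # incoming picture size (assumed)
], check=True)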
Referring to fig. 5-1 to 5-8, S3 specifically includes:
S31, setting the pre-training model path and the data set reading path, then setting the number of training iterations and the incoming picture size, leaving the other parameter options at their defaults, and starting to train the target model;
S32, visualizing the training process and checking the change curves of the model's indicators;
S33, after training finishes, comparing, checking and saving the best target model and outputting it to a specified path;
and S34, running predictive inference on the test samples with the trained target model to verify it.
S4: importing a target image to be detected, detecting target positions in it, and outputting target detection results according to a preset threshold value.
Specifically, in this embodiment the preset threshold is set to 0.45. The threshold mainly limits the confidence of the output: detections whose confidence does not exceed 0.45 are filtered out as redundant results.
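A minimal sketch of this confidence filtering is given below; the detection tuple layout is an assumption of the sketch:

def filter_detections(detections, conf_threshold=0.45):
    # Keep only detections whose confidence exceeds the preset threshold.
    # Each detection is assumed to be
    # (x_center, y_center, width, height, confidence, class_id).
    return [d for d in detections if d[4] > conf_threshold]

# Example: of two candidate boxes, only the confident one survives
boxes = [(0.50, 0.40, 0.30, 0.10, 0.91, 0),
         (0.20, 0.75, 0.12, 0.08, 0.30, 0)]
print(filter_detections(boxes))  # keeps only the 0.91-confidence box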
Fig. 6 shows the visualization results of the method for extracting shop signboard information based on deep learning according to the embodiment of the present invention and of other methods on the CTW data set; the detailed analysis follows.
The validity of the proposed method is verified below through a set of experiments, namely the extraction results on the CTW data set compared with existing algorithms. Fig. 6 visualizes the results on the CTW data set, including the extraction results of the invention and those of the comparison methods.
As can be seen from fig. 6, the biggest difficulty for automatic detection on the CTW data set is that shop signboards in the images are occluded by trees, leaving boundaries blurred or even unobservable. Compared with the other tested methods, the proposed algorithm is less affected by occlusion and extracts shop signboards accurately. The first column shows a large-scale, unoccluded shop signboard, while the third to fifth columns show signboards with increasing degrees of occlusion, i.e., increasing difficulty of automatic extraction. For the unoccluded large-scale signboard in the first column, our method and the other four methods except YOLOv5 obtain complete extraction results. Under occlusion, however, the methods behave differently. The second column shows that only our method, EfficientDet and YOLOX can extract small-scale, partially occluded shop signboards, and the boundaries extracted by EfficientDet and YOLOX are not very accurate. For the severely occluded signboards in the third and fourth columns, the extraction results of our method match the actual signboard regions in the images more closely than the other tested algorithms, showing that the proposed OSO-YOLOv5 extraction network effectively improves detection accuracy and eliminates the influence of occlusion on signboard extraction. In summary, the method perceives multi-scale information and infers occluded parts from context, yielding higher detection accuracy.
Table 1 shows that our method obtained the highest scores on all five indicators on the self-labeled CTW data set, indicating that our algorithm balances precision and recall. The corresponding precision and recall are 82.7% and 87.6%, meaning that 82.7% of the detected elements in the test images are shop signboards and that 87.6% of all shop signboard elements in the images are correctly detected. The F1 value of our method is 82.4%, which is 6.0 to 33.8 percentage points higher than the other methods, demonstrating a clear advantage. Our method achieves an IoU of 78.1%, 16.3 percentage points higher than the second-ranked YOLOv5_l. Compared with the original YOLOv5_l, our method improves recall by 7.5% and IoU by 3.5%, indicating that it successfully detects more shop signboards.
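The indicators above follow their standard definitions; as a reference sketch (not code of the invention):

def precision(tp, fp):
    # Fraction of detected elements that are true shop signboards.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of ground-truth signboards that are detected.
    return tp / (tp + fn)

def f1(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

def iou(a, b):
    # Intersection over union of two (xmin, ymin, xmax, ymax) boxes.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union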
TABLE 1 quantitative evaluation of CTW data sets
According to the technical scheme, deep learning target detection is combined with shop signboard detection: the generated model is small and easy to deploy and port to edge devices for inference, which improves the working efficiency of inspection personnel and reduces cost, and the single-stage target detection model effectively improves prediction and inference speed.
The foregoing description covers only preferred embodiments of the invention and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the invention is not limited to the specific combination of the above features, and also covers other embodiments formed by any combination of the above features or their equivalents without departing from the scope defined by the appended claims, for example, embodiments in which the above features are replaced with (but not limited to) features having similar functions disclosed in the present invention.

Claims (5)

1. A method for extracting shop signboard information based on deep learning, characterized by comprising the following steps:
S1: acquiring data for constructing a deep learning model and dividing the data set;
S2: improving the feature extraction structure and enhancing pyramid features to construct a model for extracting shop signboard information;
S3: training and verifying the target detection model to obtain an optimal detection model weight file;
S4: importing a target image to be detected, detecting target positions in it, and outputting target detection results according to a preset threshold value.
2. The method for extracting shop signboard information based on deep learning according to claim 1, wherein S1 further comprises:
S11, sorting the data, considering only objects imaged from a front view angle as initial data;
S12, importing the initial data and marking the position information of each shop signboard with a manual bounding box in the labeling software LabelImg, thereby producing the target detection labels; the labeled text file follows the format <object-class-ID> <x_center> <y_center> <box_width> <box_height>;
and S13, converting the obtained VOC-format data set into TXT format, and dividing the data set into a training set, a verification set and a test set at a ratio of 8:1:1.
3. The method for extracting shop signboard information based on deep learning according to claim 2, wherein the deep learning model is constructed based on a YOLOv5 target detection model.
4. The method for extracting shop signboard information based on deep learning according to claim 3, wherein step S2 further comprises:
S21, integrating an SP module and a PSA module for the multi-scale rectangular features of shop signboards to form a multi-scale rectangular feature constraint, and building an MFSPP module that improves the generalization ability of the model while preserving multi-scale perception;
S22, for the spatial distribution characteristics of shop signboards, introducing a CoordConv layer to increase attention to the central region of the image and reduce interference from similar ground objects elsewhere, and reconstructing topological relations through an integrated CoT module to resolve the target fragmentation caused by occlusion;
and S23, for the weakening of feature propagation from high to low network levels, introducing CBAM for feature enhancement to ensure that features are used sufficiently and accurately.
5. The method for extracting shop signboard information based on deep learning according to claim 4, wherein S3 further comprises:
S31, setting the pre-training model path and the data set reading path, then setting the number of training iterations and the incoming picture size, leaving the other parameter options at their defaults, and starting to train the target model;
S32, visualizing the training process and checking the change curves of the model's indicators;
S33, after training finishes, comparing, checking and saving the best target model and outputting it to a specified path;
and S34, running predictive inference on the test samples with the trained target model to verify it.
CN202211256545.XA 2022-10-14 2022-10-14 Method for extracting shop signboard information based on deep learning Pending CN115661703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211256545.XA CN115661703A (en) 2022-10-14 2022-10-14 Method for extracting shop signboard information based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211256545.XA CN115661703A (en) 2022-10-14 2022-10-14 Method for extracting shop signboard information based on deep learning

Publications (1)

Publication Number Publication Date
CN115661703A (en) 2023-01-31

Family

ID=84986895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211256545.XA Pending CN115661703A (en) 2022-10-14 2022-10-14 Method for extracting shop signboard information based on deep learning

Country Status (1)

Country Link
CN (1) CN115661703A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116227474A (en) * 2023-05-09 2023-06-06 之江实验室 Method and device for generating countermeasure text, storage medium and electronic equipment
CN116227474B (en) * 2023-05-09 2023-08-25 之江实验室 Method and device for generating countermeasure text, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
Yin et al. Hot region selection based on selective search and modified fuzzy C-means in remote sensing images
CN108830209B (en) Remote sensing image road extraction method based on generation countermeasure network
CN109359684A (en) Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN109871875B (en) Building change detection method based on deep learning
CN106096613A (en) Image multi-target detection method and device based on corner feature
CN111914634B (en) Automatic detection method and system for well lid class resisting complex scene interference
CN112465880B (en) Target detection method based on multi-source heterogeneous data cognitive fusion
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN111626994A (en) Equipment fault defect diagnosis method based on improved U-Net neural network
CN108229551A (en) A kind of Classification of hyperspectral remote sensing image method based on compact dictionary rarefaction representation
CN109145846A (en) Material microstructure intelligent recognition analysis system and analysis method
CN113657324A (en) Urban functional area identification method based on remote sensing image ground object classification
CN109615604A (en) Accessory appearance flaw detection method based on image reconstruction convolutional neural networks
CN106295498A (en) Remote sensing image target area detection apparatus and method
CN107886066A (en) A kind of pedestrian detection method based on improvement HOG SSLBP
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
CN115661703A (en) Method for extracting shop signboard information based on deep learning
CN113269119B (en) Night vehicle detection method and device
CN104331708B (en) A kind of zebra crossing automatic detection analysis method and system
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN110910497B (en) Method and system for realizing augmented reality map
Yousefi et al. Hierarchical segmentation of urban satellite imagery
CN111797795A (en) Pedestrian detection algorithm based on YOLOv3 and SSR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination