CN116664883A - Cargo image recognition method and system based on convolutional neural network - Google Patents

Cargo image recognition method and system based on convolutional neural network

Info

Publication number
CN116664883A
CN116664883A (application CN202310551013.7A)
Authority
CN
China
Prior art keywords
image
neural network
convolutional neural
cargo
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310551013.7A
Other languages
Chinese (zh)
Inventor
林阿勇
武博
祁锋
吴天愉
符莉婷
陈文�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Port And Channel Logistics Co ltd
Original Assignee
Hainan Port And Channel Logistics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Port And Channel Logistics Co ltd filed Critical Hainan Port And Channel Logistics Co ltd
Priority to CN202310551013.7A priority Critical patent/CN116664883A/en
Publication of CN116664883A publication Critical patent/CN116664883A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; Using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Using clustering, e.g. of similar faces in social networks
    • G06V10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; Using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Using neural networks

Abstract

The invention discloses a cargo image recognition method and system based on a convolutional neural network, belonging to the technical field of artificial intelligence. X-ray images of a container to be inspected are acquired; a trained 2D convolutional neural network model based on the U-Net architecture performs image segmentation on each X-ray image to be processed, yielding a plurality of segmented images; similar image detection based on BOW (bag of words) and K-means then matches each segmented image against the standard pictures in a cargo database, and the match with the highest similarity is taken as the recognition result. By combining a segmentation model with a similar image detection model, the segmentation effect in the logistics cargo counting scene is greatly improved, and cargo recognition accuracy is improved accordingly.

Description

Cargo image recognition method and system based on convolutional neural network
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a cargo image recognition method and system based on a convolutional neural network.
Background
Because of cost limitations, logistics companies can assign only very few staff to cargo inventory, so an image-based automatic logistics judging system is particularly important.
Patent CN 115035129A discloses a method for identifying goods, comprising: acquiring, via monitoring equipment, an image to be processed containing at least one cargo; processing the image through a trained instance segmentation model to generate an instance segmentation result for the cargo; generating a cargo-pile outline from the instance segmentation result; and matching the outline against data in a cargo database to obtain the cargo information. The segmentation model consists of a Detector module and a mask branch (BlendMask module); the Detector network uses the FCOS (Fully Convolutional One-Stage Object Detection) algorithm, which detects objects in a per-pixel prediction manner. FCOS employs multi-level detection, such as an FPN (Feature Pyramid Network), to detect targets of different sizes at feature layers of different scales.
However, this method has the following drawbacks: it targets intelligent cargo inspection for warehouse management, the images are shot directly by monitoring equipment, and the goods it segments and matches (steel bars, steel coils, rubber, crops) are numerous in quantity but simple in structure. This scene differs greatly from logistics cargo, and when the segmentation network is applied to logistics cargo counting, the segmentation effect is poor and the accuracy is low.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a cargo image recognition method and system based on a convolutional neural network, to solve the problems of poor segmentation effect and low accuracy in existing recognition methods.
To achieve the above object, in a first aspect, the present invention provides a cargo image recognition method based on a convolutional neural network, the method comprising:
acquiring an X-ray image of a container to be inspected;
performing image segmentation on each X-ray image to be processed by adopting a trained 2D convolutional neural network model based on a U-Net architecture to obtain a plurality of segmented images;
and performing similar image detection based on BOW and K-means, matching each segmented image with the standard pictures in a cargo database, and taking the picture with the highest similarity as the recognition result.
Preferably, the detection of similar images based on BOW and K-means is specifically as follows:
firstly, generating characteristic points and descriptors of each image in an image database by using a SIFT algorithm;
clustering the feature points in the image library by using a K-means algorithm, wherein the number of clustering centers is K; combining the clustering centers to form a dictionary, and calculating the TF-IDF weight of each visual word according to the IDF principle to represent how important the visual word is for distinguishing images;
counting the number of times that each word in the dictionary appears in the feature set of each image in the image database, and representing each image as a histogram;
after the histogram vector of each image is obtained, constructing an inverted list of features to the images, and rapidly indexing the related candidate images through the inverted list;
and for the image to be detected, computing its SIFT features, converting them into a frequency histogram according to the TF-IDF weights, and judging the similarity between histogram vectors over the indexed candidate images.
Preferably, the 2D convolutional neural network model based on the U-Net architecture comprises:
an encoder consisting of a plurality of convolution layers and a pooling layer for extracting features of an input image;
a decoder composed of a plurality of deconvolution layers and convolution layers for upsampling the feature map extracted in the encoder and restoring to the size of the input image, and outputting a predicted segmentation result through the convolution layers;
a jump connection is added between the encoder and the decoder for fusing the shallow features and the deep features.
Preferably, the encoder is VGG or ResNet.
Preferably, the output of each convolution and transpose convolution layer in the U-Net architecture based 2D convolutional neural network model uses batch normalization for accelerating convergence and yielding better results.
Preferably, the method further comprises:
counting the quantity and the types of the goods to be detected, performing data matching with a declared goods list, and reminding or warning if the goods are inconsistent.
To achieve the above object, in a second aspect, the present invention provides a cargo image recognition system based on a convolutional neural network, including: a processor and a memory; the memory is used for storing computer execution instructions; the processor is configured to execute the computer-executable instructions such that the method of the first aspect is performed.
Preferably, a Spring Boot application is deployed on the server with Nginx as a reverse proxy; the application listens only on the local loopback address of the virtual machine, so that it can be accessed only through the Nginx proxy, and picture files can be uploaded to a server-specified folder.
Preferably, the system further comprises:
the accelerator is used for generating X-rays, and the X-rays have different degrees of energy loss after passing through the container to be detected;
the detector is used for receiving the X-rays, converting the X-rays into electric signals with different voltages according to the degree of energy loss and sending the electric signals to the image acquisition module;
the image acquisition module is used for converting the electric signals into image information through an image processing algorithm;
and the transmission and scanning module is used for controlling the relative movement of the inspected container and the X-ray source so as to obtain perspective images of the inspected container at different visual angles.
Preferably, the detector employs a photon counting X-ray detector.
In general, the above technical solutions conceived by the present invention have the following beneficial effects compared with the prior art:
the invention provides a cargo image recognition method and system based on a convolutional neural network, which are characterized in that an X-ray image of a container to be detected is acquired, a trained 2D convolutional neural network model based on a U-Net architecture is adopted to carry out image segmentation on each X-ray image to be processed, a plurality of segmented images are obtained, each segmented image is matched with a standard picture in a cargo database based on similar image detection of BOW and K-means, the highest similarity is used as a recognition result, the segmented model is combined with a similar image detection model, the object cargo checking scene is faced, the segmentation effect is greatly improved, and the recognition accuracy is further improved.
Drawings
Fig. 1 is a flowchart of a cargo image recognition method based on a convolutional neural network.
Fig. 2 is a schematic X-ray image of a container under inspection provided in an embodiment of the present invention.
FIG. 3 is a schematic diagram of a similar image detection principle based on BOW and K-means according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, the invention provides a cargo image recognition method based on a convolutional neural network, which comprises the following steps:
as shown in fig. 2, an X-ray image of the container under inspection is acquired;
performing image segmentation on each X-ray image to be processed by adopting a trained 2D convolutional neural network model based on a U-Net architecture to obtain a plurality of segmented images;
and performing similar image detection based on BOW and K-means, matching each segmented image with the standard pictures in a cargo database, and taking the picture with the highest similarity as the recognition result.
Preferably, the X-ray image to be processed is padded before segmentation, to ensure that the image size meets the requirements of the convolution operations. The padding mode is 'CONSTANT' with a fill value of 0.
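The constant-value zero padding described above can be sketched with NumPy; the array size and the choice of padding to a multiple of 16 (so that four max-pooling stages divide evenly) are illustrative assumptions, not values from the patent:

```python
import numpy as np

# A to-be-processed X-ray image whose sides are not multiples of 16;
# a U-Net with 4 pooling stages halves the spatial size four times,
# so each side should be divisible by 2**4 = 16.
image = np.ones((250, 300), dtype=np.float32)

def pad_to_multiple(img, multiple=16):
    """Zero-pad ('CONSTANT' mode, fill value 0) so both sides divide `multiple`."""
    h, w = img.shape
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    return np.pad(img, ((0, pad_h), (0, pad_w)),
                  mode="constant", constant_values=0)

padded = pad_to_multiple(image)
```

The padded image can then be fed to the segmentation network, and the padding region cropped away from the predicted mask afterwards.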
Preferably, the 2D convolutional neural network model based on the U-Net architecture comprises:
an encoder, which is composed of a plurality of convolution layers and a pooling layer, is used to extract features of an input image.
The encoder comprises 4 stages, each comprising two convolutional layers followed by a max-pooling layer. Each convolutional layer is implemented by a conv2d-with-batch-normalization helper function, i.e. convolution followed by Batch Normalization (BN). The convolutional layers extract features of the input image, while the max-pooling layers reduce the spatial size of the image. The number of filters in the convolutional layers doubles as the encoder goes deeper, increasing from 64 to 1024. After the output of the last encoder layer, two convolutional layers are connected to form an intermediate layer.
And a decoder composed of a plurality of deconvolution layers and convolution layers for upsampling the feature map extracted in the encoder and restoring to the size of the input image, and outputting a predicted segmentation result through the convolution layers.
The decoder comprises 4 stages, each comprising a deconvolution layer followed by two convolutional layers. Each deconvolution layer is implemented by a deconv2d-with-batch-normalization helper function, i.e. deconvolution followed by batch normalization. The deconvolution layers expand the spatial dimensions of the feature map and recover the spatial detail of the image. The number of filters in the convolutional layers halves as the decoder goes deeper, decreasing from 512 to 64.
At each stage of the decoder, the output of the corresponding encoder stage is cropped and concatenated with the deconvolution output of the current stage. These jump connections help combine the high-level features extracted by the encoder with the spatial information recovered by the decoder, thereby improving model performance.
At the final layer of the decoder, the output feature map is converted into a prediction using a 1x1 convolutional layer (also with batch normalization); the number of filters here equals nlabels, the number of classes.
Finally, the function returns a prediction tensor pred, which can be used to compute the loss function and optimize the model during training.
A jump connection is added between the encoder and the decoder for fusing the shallow features and the deep features.
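The layer counts above (four encoder stages with filters doubling from 64 to 1024, four decoder stages halving back to 64, then a 1x1 output layer with nlabels filters) can be traced as a shape walk-through. This is bookkeeping only, not an implementation of the network; the 256x256 input size and the function name are illustrative assumptions:

```python
def unet_shapes(h, w, base_filters=64, stages=4):
    """Trace feature-map sizes through the U-Net encoder, middle and decoder."""
    trace, f = [], base_filters
    # Encoder: two convs then a 2x2 max-pool per stage; filters double per stage.
    for _ in range(stages):
        trace.append(("enc_conv", h, w, f))
        h, w = h // 2, w // 2          # max pooling halves the spatial size
        f *= 2
    trace.append(("middle", h, w, f))   # 1024 filters for base 64 and 4 stages
    # Decoder: deconv doubles spatial size; conv filters halve per stage.
    for _ in range(stages):
        h, w = h * 2, w * 2
        f //= 2
        trace.append(("dec_conv", h, w, f))
    trace.append(("output_1x1", h, w, "nlabels"))
    return trace

layers = unet_shapes(256, 256)
```

Walking the trace confirms that the decoder restores the input resolution while the skip connections pair stages of equal spatial size.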
In this embodiment, a cross-entropy loss function is used as the objective function for training the U-Net, measuring the difference between the predicted result and the real label.
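Per-pixel cross entropy is the mean negative log-likelihood of the true class. A minimal NumPy sketch (the probabilities and labels are illustrative values, not taken from the patent):

```python
import numpy as np

def cross_entropy(pred_probs, labels):
    """Mean negative log-likelihood of the true class for each pixel."""
    eps = 1e-12  # guard against log(0)
    picked = pred_probs[np.arange(labels.size), labels]
    return -np.mean(np.log(picked + eps))

# Two pixels, three cargo classes: softmax outputs and true labels.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)
```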
Preferably, the encoder is VGG or ResNet.
Preferably, the output of each convolution and transpose convolution layer in the U-Net architecture based 2D convolutional neural network model uses batch normalization for accelerating convergence and yielding better results.
Batch normalization helps reduce internal covariate shift by normalizing the input to each layer to the same mean and variance, so that each layer can learn features more independently. In addition, batch normalization allows higher learning rates to be used, further speeding up the training process.
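The normalization step itself can be sketched in NumPy; the learnable scale and shift parameters (gamma, beta) present in a full BN layer are omitted for brevity, and the tensor shape is an illustrative assumption:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize a batch of feature maps per channel; x has (N, H, W, C) layout."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# A batch of 8 feature maps, 16x16, 4 channels, far from zero mean/unit variance.
x = np.random.default_rng(0).normal(5.0, 3.0, size=(8, 16, 16, 4))
y = batch_norm(x)
```

After normalization each channel has approximately zero mean and unit variance, which is what lets subsequent layers train with larger learning rates.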
Preferably, the present invention performs similar image detection based on BOW and K-means. As shown in fig. 3, feature points and descriptors of each image in the image database are first generated using the SIFT algorithm. The feature points in the image library are then clustered with a K-means algorithm, the number of clustering centers being K, and the clustering centers are combined into a dictionary; according to the IDF principle, the TF-IDF weight of each visual word is calculated to represent how important that word is for distinguishing images. For each image in the database, the number of times each dictionary word appears in its feature set is counted, representing each image as a histogram. After the histogram vector of each image is obtained, an inverted list from features to images is constructed, through which the related candidate images are rapidly indexed. For the image to be detected, its SIFT features are computed and converted into a frequency histogram according to the TF-IDF weights, and the similarity between histogram vectors is judged over the indexed results.
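The pipeline above (descriptors, K-means dictionary, TF-IDF-weighted histograms, similarity ranking) can be sketched end-to-end. A real system would use SIFT descriptors (e.g. from OpenCV); here random 128-dimensional vectors stand in for them, and the tiny K, image count, and cosine similarity measure are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def kmeans(points, k, iters=20):
    """Plain K-means: returns the k cluster centers (the visual dictionary)."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def bow_histogram(desc, centers):
    """Count how often each visual word (nearest center) occurs in an image."""
    d = np.linalg.norm(desc[:, None] - centers[None], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)

# Stand-in "SIFT descriptors" for 3 database images (128-dim, like SIFT).
db_descs = [rng.normal(size=(40, 128)) for _ in range(3)]
centers = kmeans(np.vstack(db_descs), k=8)
hists = np.array([bow_histogram(d, centers) for d in db_descs])

# IDF weighting: words occurring in fewer database images weigh more.
idf = np.log(len(hists) / (1.0 + (hists > 0).sum(axis=0)))
tfidf = hists * idf

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Query with the descriptors of image 0: it should match itself best.
q = bow_histogram(db_descs[0], centers) * idf
best = int(np.argmax([cosine(q, h) for h in tfidf]))
```

The inverted list described in the text would restrict the final loop to candidate images sharing at least one visual word with the query, instead of scanning the whole database.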
Preferably, the method further comprises:
counting the quantity and the types of the goods to be detected, performing data matching with a declared goods list, and reminding or warning if the goods are inconsistent.
A table is built in the MySQL database for storing recognition results. The table contains a special mark field (SI); each SI corresponds to a standard picture of a logistics company and can represent that company's container characteristics or cargo characteristics.
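The declared-list check described above can be sketched as a simple count comparison; the field names, cargo types, and counts are illustrative assumptions, not from the patent:

```python
def check_against_declaration(recognized, declared):
    """Compare recognized cargo counts with the declared goods list.

    Returns a list of warning strings; an empty list means everything matches.
    """
    warnings = []
    for cargo, count in declared.items():
        found = recognized.get(cargo, 0)
        if found != count:
            warnings.append(f"{cargo}: declared {count}, detected {found}")
    for cargo in recognized.keys() - declared.keys():
        warnings.append(f"{cargo}: detected but not declared")
    return warnings

declared = {"steel coil": 4, "rubber": 10}
recognized = {"steel coil": 4, "rubber": 8, "unknown box": 1}
alerts = check_against_declaration(recognized, declared)
```

In the deployed system such warnings would trigger the reminder or alarm mentioned above and could be written to the MySQL results table.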
The invention provides a cargo image recognition system based on a convolutional neural network, which comprises: a processor and a memory; the memory is used for storing computer execution instructions; the processor is configured to execute the computer-executable instructions such that the method of the first aspect is performed.
Preferably, a Spring Boot application is deployed on the server with Nginx as a reverse proxy; the application listens only on the local loopback address of the virtual machine, so that it can be accessed only through the Nginx proxy, and picture files can be uploaded to a server-specified folder.
The following configuration is added to the Spring Boot configuration file:
server.address=127.0.0.1
server.port=8080
together with an upload property giving the specific path of the folder for uploaded files.
Nginx is installed on the virtual machine, the program is packaged into a jar and uploaded to the server, and the Nginx configuration file, usually located at /etc/nginx/sites-available/default, is edited on the virtual machine. The following is added within the server block:
location / {
proxy_pass http://127.0.0.1:8080;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
After configuration, Nginx is restarted, and the pictures uploaded by users through the web page are saved to the designated folder on the virtual machine.
Preferably, the system further comprises:
the accelerator is used for generating X-rays, and the X-rays have different degrees of energy loss after passing through the container to be detected;
the detector is used for receiving the X-rays, converting the X-rays into electric signals with different voltages according to the degree of energy loss and sending the electric signals to the image acquisition module;
the image acquisition module is used for converting the electric signals into image information through an image processing algorithm;
and the transmission and scanning module is used for controlling the relative movement of the inspected container and the X-ray source so as to obtain perspective images of the inspected container at different visual angles.
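The energy loss described above follows the Beer-Lambert law: the transmitted intensity is I = I0 * exp(-mu * t), where mu is the material's linear attenuation coefficient and t its thickness. A sketch with illustrative coefficients (the numeric values are assumptions for demonstration, not measured data):

```python
import math

def transmitted_fraction(mu_per_cm, thickness_cm):
    """Fraction of X-ray intensity surviving a material layer (Beer-Lambert law)."""
    return math.exp(-mu_per_cm * thickness_cm)

# Illustrative linear attenuation coefficients at some fixed X-ray energy.
materials = {"air": 0.0, "rubber": 0.18, "steel": 3.0}
profile = {m: transmitted_fraction(mu, 2.0) for m, mu in materials.items()}
```

Denser cargo transmits less intensity, which is why the detector output voltage varies with the goods in the container and yields the contrast visible in the X-ray image.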
Preferably, the detector employs a photon counting X-ray detector.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. The cargo image recognition method based on the convolutional neural network is characterized by comprising the following steps of:
acquiring an X-ray image of a container to be inspected;
performing image segmentation on each X-ray image to be processed by adopting a trained 2D convolutional neural network model based on a U-Net architecture to obtain a plurality of segmented images;
and performing similar image detection based on BOW and K-means, matching each segmented image with the standard pictures in a cargo database, and taking the picture with the highest similarity as the recognition result.
2. The method of claim 1, wherein the BOW and K-means based similar image detection is specifically as follows:
firstly, generating characteristic points and descriptors of each image in an image database by using a SIFT algorithm;
clustering the feature points in the image library by using a K-means algorithm, wherein the number of clustering centers is K; combining the clustering centers to form a dictionary, and calculating the TF-IDF weight of each visual word according to the IDF principle to represent how important the visual word is for distinguishing images;
counting the number of times that each word in the dictionary appears in the feature set of each image in the image database, and representing each image as a histogram;
after the histogram vector of each image is obtained, constructing an inverted list of features to the images, and rapidly indexing the related candidate images through the inverted list;
and for the image to be detected, computing its SIFT features, converting them into a frequency histogram according to the TF-IDF weights, and judging the similarity between histogram vectors over the indexed candidate images.
3. The method of claim 1, wherein the U-Net architecture based 2D convolutional neural network model comprises:
an encoder consisting of a plurality of convolution layers and a pooling layer for extracting features of an input image;
a decoder composed of a plurality of deconvolution layers and convolution layers for upsampling the feature map extracted in the encoder and restoring to the size of the input image, and outputting a predicted segmentation result through the convolution layers;
a jump connection is added between the encoder and the decoder for fusing the shallow features and the deep features.
4. The method of claim 3, wherein the encoder is VGG or ResNet.
5. The method of claim 3, wherein the output of each convolution and transpose convolution layer in the U-Net architecture based 2D convolutional neural network model uses batch normalization for accelerating convergence and yielding better results.
6. The method of any one of claims 1 to 5, further comprising:
counting the quantity and the types of the goods to be detected, performing data matching with a declared goods list, and reminding or warning if the goods are inconsistent.
7. A cargo image recognition system based on convolutional neural network, comprising: a processor and a memory;
the memory is used for storing computer execution instructions;
the processor for executing the computer-executable instructions such that the method of any one of claims 1 to 6 is performed.
8. The system of claim 7, wherein a Spring Boot application is deployed on the server with Nginx as a reverse proxy; the Spring Boot application listens only on the local loopback address of the virtual machine, so that it can be accessed only through the Nginx proxy, and picture files are able to be uploaded to a server-specified folder.
9. The system of claim 7 or 8, wherein the system further comprises:
the accelerator is used for generating X-rays, and the X-rays have different degrees of energy loss after passing through the container to be detected;
the detector is used for receiving the X-rays, converting the X-rays into electric signals with different voltages according to the degree of energy loss and sending the electric signals to the image acquisition module;
the image acquisition module is used for converting the electric signals into image information through an image processing algorithm;
and the transmission and scanning module is used for controlling the relative movement of the inspected container and the X-ray source so as to obtain perspective images of the inspected container at different visual angles.
10. The system of claim 9, wherein the detector employs a photon counting X-ray detector.
CN202310551013.7A 2023-05-12 2023-05-12 Cargo image recognition method and system based on convolutional neural network Pending CN116664883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310551013.7A CN116664883A (en) 2023-05-12 2023-05-12 Cargo image recognition method and system based on convolutional neural network


Publications (1)

Publication Number Publication Date
CN116664883A (en) 2023-08-29

Family

ID=87719852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310551013.7A Pending CN116664883A (en) 2023-05-12 2023-05-12 Cargo image recognition method and system based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN116664883A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751163A (en) * 2013-12-27 2015-07-01 同方威视技术股份有限公司 Fluoroscopy examination system and method for carrying out automatic classification recognition on goods
CN106156118A (en) * 2015-04-07 2016-11-23 阿里巴巴集团控股有限公司 Picture analogies degree computational methods based on computer system and system thereof
CN109948565A (en) * 2019-03-26 2019-06-28 浙江啄云智能科技有限公司 A kind of not unpacking detection method of the contraband for postal industry
CN113963161A (en) * 2021-11-11 2022-01-21 杭州电子科技大学 System and method for segmenting and identifying X-ray image based on ResNet model feature embedding UNet
CN115512283A (en) * 2021-06-21 2022-12-23 顺丰科技有限公司 Parcel image processing method and device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination