CN115311632A

CN115311632A - Vehicle weight recognition method and device based on multiple cameras

Info

Publication number: CN115311632A
Application number: CN202210961626.3A
Authority: CN
Inventors: 卢鑫; 贾军营
Original assignee: Shenyang Fengchi Software Co ltd
Current assignee: Shenyang Fengchi Software Co ltd
Priority date: 2022-08-11
Filing date: 2022-08-11
Publication date: 2022-11-08

Abstract

The invention provides a vehicle weight recognition method and device based on multiple cameras. The method comprises the steps of obtaining an initial image, carrying out target detection, obtaining the category of a target, extracting a target vehicle image in the initial image and storing the target vehicle image in a target image library; storing the vehicle image to be matched to an image library to be matched; selecting a target vehicle image from a target image library for target segmentation; introducing various vehicle characteristics for coding the target vehicle image to obtain coding information; inputting the coding information into a multi-scale hierarchical feature extraction network based on a Transformer to obtain global features; extracting local features according to the convolutional neural network and the target segmentation result; and calculating the approximation degree between the target vehicle image and all the vehicle images to be matched in the image library to be matched according to the global features and the local features, and re-identifying the target vehicle. In this way, the heavy matching of the vehicle under the multiple cameras is realized, the detection efficiency and the quality are improved, the manpower and the material resources are saved, and the cost and the resources are saved.

Description

Vehicle weight recognition method and device based on multiple cameras

Technical Field

The present invention relates generally to the field of vehicle identification, and more particularly, to a method and apparatus for vehicle weight identification based on multiple cameras.

Background

The multi-camera vehicle weight recognition means that the same vehicle is found out at different times in a plurality of cameras. By analyzing the cameras in which the vehicle appears, the driving track of the vehicle can be judged according to the positions of the cameras, so that the vehicle can be tracked.

At present, vehicle tracking identification under multiple cameras mainly depends on manual analysis, workers are arranged to carefully check images shot by each camera, and the same vehicle to be tracked is found out from each camera. In recent years, there are also technologies for implementing vehicle heavy identification by using a computer vision method, including:

1. the license plate of the vehicle is used for recognizing the vehicle weight, the license plates of the vehicle are recognized under different cameras, when the same license plate is detected under different cameras, the vehicle is judged to be the same vehicle, the driving track of the vehicle can be judged according to the positions of the cameras, and the vehicle weight recognition is realized.

2. The vehicle weight recognition is carried out by utilizing information such as the appearance color of the vehicle, and the vehicle with the closest color is recognized under different cameras to be determined as the same vehicle. The driving track of the vehicle can be judged according to the positions of the cameras, and vehicle weight recognition is achieved.

The existing manual detection technology needs to arrange a large number of workers to carefully check shot images and find out the same vehicle shot under different cameras, so that a large amount of manpower, material resources and financial resources are consumed. The biggest defect of vehicle re-identification by using license plate information is that when the license plate is shielded and the like, and a camera cannot shoot the license plate or shoot the license plate but cannot identify the license plate, the vehicle re-identification fails due to the fact that the license plate information cannot be obtained, and the same vehicle cannot be found out under multiple cameras. The biggest deficiency of vehicle weight recognition by using vehicle color information is that under different cameras, the colors of vehicles may be different under different illumination environments due to different environmental conditions such as illumination around the cameras, and therefore, many detection errors may occur when the vehicle appearance colors are used as the basis for judging the vehicle weight recognition.

Disclosure of Invention

According to an embodiment of the invention, a multi-camera based vehicle weight recognition scheme is provided. The scheme replaces the traditional manual detection method, can solve the problem of low accuracy of vehicle weight identification based on the vehicle license plate, realizes vehicle weight matching under multiple cameras, improves detection efficiency and quality, saves manpower and material resources, and saves cost and resources.

In a first aspect of the invention, a multi-camera based vehicle weight recognition method is provided. The method comprises the following steps:

acquiring initial images acquired by a plurality of cameras;

carrying out target detection on the initial image by using a target detection algorithm, acquiring the category of a target, extracting a target vehicle image in the initial image and storing the target vehicle image in a target image library; storing the vehicle image to be matched to an image library to be matched;

performing target segmentation on the images in the target image library and the image library to be matched to obtain a target segmentation result;

coding the images in the target image library and the image library to be matched, and introducing various vehicle characteristics to code the vehicle to obtain coding information of the images;

inputting the coding information of the image into a multi-scale hierarchical feature extraction network based on a Transformer to obtain the global features of the image; extracting local features of the image according to the convolutional neural network and the target segmentation result;

and acquiring a target vehicle image from the target image library, calculating the approximation degree between the target vehicle image and all the vehicle images to be matched in the image library to be matched according to the global characteristic and the local characteristic of the target vehicle image, and re-identifying the target vehicle according to the approximation degree.

Further, the acquiring of the initial images acquired by the plurality of cameras includes:

selecting a plurality of cameras, wherein one camera is used as a collecting camera and is used for acquiring a target image; the other cameras are used as cameras to be matched and used for acquiring images to be matched;

and setting an image acquisition time interval, and acquiring the image to be matched and the target image as initial images according to the time interval.

Further, the target detection is carried out on the initial image by using a target detection algorithm, the category of the target is obtained, and a target vehicle image in the initial image is extracted and stored in a target image library; and storing the vehicle image to be matched to an image library to be matched, wherein the method comprises the following steps:

inputting the initial image into a deep learning convolutional neural network for target detection, outputting position information and belonging category information of all vehicles in the initial image in the corresponding initial image, and marking all vehicles in the initial image by using a rectangular frame;

and extracting the part in the rectangular frame to obtain a target vehicle image and a vehicle image to be matched.

Further, the target segmentation is performed on the images in the target image library and the image library to be matched to obtain a target segmentation result, and the method includes:

and utilizing an image segmentation algorithm to segment each image in the target image library and the image library to be matched to obtain a plurality of region segmentation blocks as a target segmentation result of each image, wherein the region segmentation blocks are used for describing a front region, a roof region, a side region, a tail region and a background region in the image.

Further, the encoding the images in the target image library and the image library to be matched to obtain the encoding information of the images includes:

dividing each image in the target image library and the image library to be matched into a plurality of image blocks with equal areas, and coding according to the positions of the image blocks in the corresponding images to obtain position coding information of the image blocks;

performing convolution and linear normalization processing on the image information of the image block to obtain image coding information;

obtaining target category coding information according to the category to which the target belongs;

obtaining target segmentation result coding information according to the target segmentation result;

and taking the sum of the image coding information, the target category coding information, the target segmentation result coding information and the position coding information of the image block as the coding information of the current image.

Further, the Transformer-based feature extraction network comprises a first Transformer structural layer, a second Transformer structural layer, a third Transformer structural layer and a fourth Transformer structural layer, wherein the first Transformer structural layer, the second Transformer structural layer and the fourth Transformer structural layer respectively comprise 2 Transformer structures; the fourth Transformer structural layer comprises 6 Transformer structures; and 1 upsampling layer is arranged between adjacent Transformer structural layers.

Further, extracting local features of the image according to the convolutional neural network and the target segmentation result, comprising:

extracting the image characteristics of each image in the target image library and the image library to be matched through a ResNet convolution neural network;

respectively manufacturing masks of corresponding parts according to the target segmentation result;

multiplying the image characteristics of each image with the mask of the corresponding part, and then carrying out linear transformation to obtain the local characteristics of the current image.

Further, the calculating the approximation degree between the target vehicle image and all the vehicle images to be matched in the image library to be matched according to the global features and the local features of the target vehicle image includes:

traversing all vehicle images to be matched in the image library to be matched by the target vehicle image, calculating the Euclidean distance of the local features of the corresponding parts of the target vehicle image and the vehicle images to be matched, and accumulating the weights of the corresponding parts to obtain a plurality of local feature distance values; calculating the Euclidean distance of the global features of the target vehicle image and the vehicle image to be matched to obtain a plurality of global feature distance values;

and accumulating the local characteristic distance values and the global characteristic distance values corresponding to the same vehicle image to be matched to obtain a plurality of characteristic distance values between the target vehicle image and the vehicle image to be matched.

Further, the re-identifying the target vehicle according to the approximation degree includes:

obtaining the minimum value of the plurality of characteristic distance values;

if the minimum value of the plurality of characteristic distance values is smaller than a preset threshold value, the target vehicle is successfully re-identified, and the vehicle image which is successfully re-identified is deleted from the corresponding image library;

if the minimum value in the characteristic distance values is not smaller than a preset threshold value, the target vehicle is failed to be identified again, and the local characteristic information, the global characteristic information and the mask information of the target vehicle are stored

In a second aspect of the invention, an electronic device is provided. The electronic device at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect of the invention.

It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of any embodiment of the invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.

Drawings

The above and other features, advantages and aspects of various embodiments of the present invention will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:

FIG. 1 shows a flow diagram of a multi-camera based vehicle weight recognition method according to an embodiment of the invention;

FIG. 2 shows a schematic diagram of an improved YOLOv4 network structure according to an embodiment of the invention;

FIG. 3 illustrates an initial image schematic of a marked rectangular box according to an embodiment of the invention;

FIG. 4 shows a schematic diagram of image segmentation results according to an embodiment of the invention;

FIG. 5 shows a schematic diagram of a mask made from the result of image segmentation according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a Transformer-based feature extraction network structure according to an embodiment of the present invention;

FIG. 7 shows a schematic structural diagram of a Transformer layer according to an embodiment of the invention;

FIG. 8 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present invention;

here, 800 is an electronic apparatus, 801 is a CPU, 802 is a ROM, 803 is a RAM, 804 is a bus, 805 is an I/O interface, 806 is an input unit, 807 is an output unit, 808 is a storage unit, and 809 is a communication unit.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

In addition, the term "and/or" herein is only one kind of association relationship describing an associated object, and means that there may be three kinds of relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.

The invention replaces the traditional manual detection method, can solve the problem of low accuracy of vehicle weight identification based on the vehicle license plate, realizes vehicle weight matching under multiple cameras, improves the detection efficiency and quality, saves manpower and material resources, and saves cost and resources.

Fig. 1 shows a flowchart of a multi-camera based vehicle weight recognition method according to an embodiment of the present invention.

The method comprises the following steps:

s101, acquiring initial images acquired by a plurality of cameras.

As an embodiment of the present invention, the acquiring initial images collected by a plurality of cameras includes:

firstly, selecting a plurality of cameras, wherein one camera is arbitrarily selected as a collecting camera for obtaining a target image; and the other cameras are used as the cameras to be matched and used for acquiring images to be matched.

Then, setting an image acquisition time interval, and acquiring an image to be matched and a target image as an initial image according to the time interval.

In this embodiment, several numbers may be represented by N, where N is a positive integer not less than 2. The camera captures one image every q seconds, so that N images can be acquired every q seconds. Any one camera is selected as a collecting camera, and the rest N-1 cameras are used as cameras to be matched. Every other time interval q seconds, 1 target image and N-1 images to be matched can be acquired.

S102, performing target detection on the initial image by using a target detection algorithm, extracting a target vehicle image in the initial image and storing the target vehicle image in a target image library; and storing the vehicle image to be matched to an image library to be matched.

As an embodiment of the present invention, the performing target detection on the initial image by using a target detection algorithm, obtaining a category to which a target belongs, extracting a target vehicle image in the initial image, and storing the target vehicle image in a target image library includes:

inputting the initial image into a deep learning convolutional neural network for target detection, outputting position information of all target vehicles in the initial image in the corresponding initial image, and marking out the target vehicles in the initial image and the vehicles to be matched in the images to be matched by using rectangular frames; and extracting the part in the rectangular frame to obtain a target vehicle image and a vehicle image to be matched.

In this embodiment, the deep learning convolutional neural network for target detection is a YOLO v4 network structure.

As an embodiment of the present invention, as shown in fig. 2, improvement can be made on the basis of YOLO v4 algorithm, and the improved YOLO v4 network structure is formed by: CSPDarknet5, CBL, SPP, upSample, conv and 3 output modules.

The CSPDarknet53 network is composed of CSP and Darknet 53; CSP is Cross, stage, partial respectively; the CSP can enhance the learning ability of the CNN, keep the accuracy while lightening the weight, and reduce the calculation difficulty and the memory cost. Darknet53 is a Darknet-based network structure that contains 5 large residual network blocks, each of which contains a certain number of residual network structures. The CSPDarknet53 adds CSP structure to each large residual block of Darknet 53.

The CBL consists of Conv + BN + Leaky _ relu activation functions, wherein Conv represents a convolutional layer, BN represents normalization, and Leaky _ relu is an activation function.

The SPP is a spatial pyramid pooling layer, can generate fixed output for input with any size, and solves the problem of image deformation error caused by non-proportional compression of an input image.

UpSample denotes an upsampling operation on data.

The 3 output modules are 76 + 3 (4 +1+ class _num), 38 + 3 (4 +1+ class _num) and 19 + 3 (4 +1+ class _num), respectively.

In the above embodiment, the improved YOLO v4 network structure adds two SPP modules in the YOLO v4 network, so that the error of image deformation is reduced, and the detection capability of detecting small targets is improved.

For N initial images acquired in each acquisition cycle, firstly, performing target detection on 1 target image, acquiring vehicle position information, and marking w detected target vehicles in the target image by using a rectangular frame, as shown in fig. 3, where w is a positive integer not less than 1. Extracting the image in the rectangular frame to obtain w target vehicle images, and recording the w target vehicle images as P _0w . Will P _0w And putting the vehicle weight data into a target image library for subsequent vehicle weight recognition and matching. Secondly, carrying out target detection on N-1 images to be detected, and setting M for detection on the ith image to be detected _i Marking the vehicle to be detected by a rectangular frame, extracting the image in the rectangular frame, and regarding the jth image, marking as P _ij (i＝1～N-1；j＝1～M _i ). And putting all the vehicle images to be matched into an image library to be matched for subsequent vehicle re-identification matching with the target images in the target image library.

In the present embodiment, a total of S vehicle images are extracted. Wherein the content of the first and second substances,

wherein S is the number of the extracted vehicle images; w is the number of target vehicles in the target image; m _i The number of the vehicles to be detected in the ith image to be detected is obtained; and N is the number of the vehicle images acquired in each acquisition period.

S103, performing target segmentation on the images in the target image library and the image library to be matched to obtain a target segmentation result.

As an embodiment of the present invention, the images in the target image library and the image library to be matched are segmented by using an image segmentation algorithm, and each image obtains a plurality of region segmentation blocks as a target segmentation result, where the region segmentation blocks are used for describing a front region, a roof region, a side region, a tail region and a background region in the image.

In the embodiment, the image segmentation can adopt an image segmentation algorithm based on deep learning, such as Deeplab series, PSPNet network, U-Net + + network, and the like.

In this embodiment, the vehicle image may be divided into 4 parts, respectively the front, side, top and rear of the vehicle. Wherein the front part of the vehicle can be a part displayed by taking the vehicle head as a front view; the side portions of the vehicle may be portions shown in front view on both side portions of the vehicle, the top portion of the vehicle may be portions shown in front view on the roof, and the rear portion of the vehicle may be portions shown in front view on the rear of the vehicle. Wherein the divided front, side, top and rear portions do not have a repetition area and cover the entire portion of the vehicle body. The remaining part of the image is the background part of the image, which is not taken into account when calculating the local features. As shown in fig. 4, where fig. 4 (a) is a vehicle image and fig. 4 (b) is an image segmentation result, including the front, side and top of the vehicle.

And S104, coding the images in the target image library and the image library to be matched to obtain coding information of the images.

As an embodiment of the present invention, encoding an image includes:

dividing each image in a target image library and an image library to be matched into a plurality of image blocks with equal areas, and coding according to the positions of the image blocks in corresponding images to obtain position coding information of the image blocks;

In the above embodiment, the sizes of the images obtained by the target detection are unified into 224 × 3.

Position coding of image blocks:

the vehicle image is divided into image blocks with equal areas, the width of the input image is 224, the height of the input image is 224, and the number of image channels is 3. The input image is first partitioned into 4*4 as a window, i.e., the 224 × 224 image is partitioned into 56 × 56=3136 image blocks. And carrying out position coding on the 3136 image blocks from 0 to 3135 to obtain position coding information of the image blocks.

Image coding:

and (3) performing convolution and linear normalization operation on the input image by taking the convolution kernel size as 4 and the number of channels as 96 to obtain an output result of Bx96-56-96, and finishing the output result into Bx3136-96, namely image coding information.

Target class encoding:

the target detection divides the vehicles into 9 categories, namely, sedan (two-carriage), SUV, van, sedan (three-carriage), MPV, pickup truck, bus, truck and multifunctional vehicle. Their corresponding class IDs are 0 to 8, respectively. And coding by adopting the class ID according to the vehicle class to obtain target class coding information.

And (3) encoding an image segmentation result:

in the above-described embodiment, the image of the vehicle has been subjected to object segmentation by an image segmentation algorithm, dividing the vehicle into 5 parts, i.e., the front region of the vehicle, the roof region, the side region of the vehicle, the rear region of the vehicle, and the background region. And counting the area occupied by each part in the image. Binary coding is carried out in the sequence of the front part of the vehicle, the top part of the vehicle, the side part of the vehicle and the tail part of the vehicle, if 4 parts are not detected, the code is 0000, and the corresponding decimal code is 0. The vehicle front region area is S1, the roof region area is S2, the vehicle side region area is S3, and the vehicle rear region area is S4. When S1 is greater than 0.2 × (S1 + S2+ S3+ S4), the front of the vehicle is considered to be detected, and the binary code is 1000, i.e., the decimal code is 8. If the side portion and the rear portion of the vehicle are detected simultaneously in one image of the target vehicle, the binary code is 0011, i.e., the decimal code is 3. And counting the segmentation results of all the images, wherein each image corresponds to one digital code.

And expanding the feature codes to a uniform dimension, adding the feature codes in the same dimension to obtain the integral feature code information of the target vehicle image, and inputting the integral feature code information into a feature extraction network based on a Transformer.

S105, inputting the coding information of the target vehicle image into a transform-based feature extraction network to obtain the global features of the image; and extracting local features of the image according to the convolutional neural network and the target segmentation result.

Specifically, in the embodiment, a transform technology is introduced to extract the global target, so that the method has excellent performance in feature extraction, solves the problem that a convolutional neural network is easy to lose detail features in small-size image feature extraction, and improves the capability of re-identifying small-size images.

In this embodiment, as shown in fig. 6, the Transformer-based feature extraction network includes a first Transformer structural layer, a second Transformer structural layer, a third Transformer structural layer, and a fourth Transformer structural layer, where the first Transformer structural layer, the second Transformer structural layer, and the fourth Transformer structural layer respectively include 2 Transformer structures; the fourth Transformer structural layer comprises 6 Transformer structures; and 1 upper sampling layer is arranged between adjacent Transformer structural layers.

The coded information of a picture is first input into a first Transformer layer, which is composed of two Transformer structures, wherein the Transformer structures are shown in fig. 7. The vehicle image is both 224 long and wide, so the input size is B56 x 96. The output after the first Transformer layer is also B56 x 96. And then input to a first upsampling layer, which is used to perform a field-of-view expansion operation in order to obtain a larger field-of-view. The convolution and linear transformation are carried out on the output result, the width and the height of the output are respectively reduced by half compared with the input, and the number of channels is doubled. And then sequentially passing through a second transform layer, an upper sampling layer, a third transform layer, an upper sampling layer and a fourth transform layer, performing transform layer transformation for 3 times and field expansion operation for 2 times of the upper sampling layer, and outputting the final global characteristics. The transform structure is shown in fig. 7, and includes two linear transformation layers, a Mulit-headed Self-attention layer, a multi-layer sensor, and the like.

Through the process, the global feature information of a plurality of images can be acquired, but the vehicle weight is difficult to accurately identify under different cameras only through the global information of the images, so that the local features of the images are introduced in addition to the global features of the images.

As an embodiment of the present invention, the extracting local features of an image according to a convolutional neural network and the target segmentation result includes:

extracting the characteristics of a plurality of target vehicle images through a ResNet convolution neural network;

and multiplying the image characteristics of each image by the corresponding part of the mask, and then performing linear transformation to obtain the local characteristics of the target vehicle image.

In this embodiment, the manufacturing the masks of the corresponding portions according to the target segmentation result respectively includes:

and respectively manufacturing masks at the front part, the side part, the top part and the tail part of the vehicle according to the target segmentation result. If the image is divided into portions where no vehicle is detected, the mask for the corresponding portions is blank. As shown in fig. 5, in which fig. 5 (a) shows a mask of a front portion of the cart, fig. 5 (b) shows a mask of a rear portion of the cart, fig. 5 (c) shows a mask of a ceiling portion of the cart, and fig. 5 (d) shows a mask of a side portion of the cart. It can be seen that, in the vehicle image, since the vehicle tail part is not detected by image segmentation, the mask corresponding to the vehicle tail part is almost empty.

Each mask is 32 x 32 in size, so a total of S x 4 x 32 masks are formed.

In the embodiment, the characteristics of S vehicle images are extracted through a convolutional neural network; the output layer dimension of the convolutional neural network is 32 x 32, so the total number of features output is S x 32.

According to the above embodiment, for each vehicle image, the corresponding 4 masks are multiplied by the features with the size of 32 × 32, so as to obtain 4 × 32 features, and the features are changed into 4 × 516 features after linear transformation. Thus, for S vehicle images, there are S × 4 × 516 local features.

S106, obtaining a target vehicle image from the target image library, calculating the approximation degree between the target vehicle image and all the vehicle images to be matched in the image library to be matched according to the global features and the local features of the target vehicle image, and re-identifying the target vehicle according to the approximation degree.

As an embodiment of the present invention, acquiring a target vehicle image from the target image library, and calculating an approximation degree between the target vehicle image and all vehicle images to be matched in the image library to be matched according to a global feature and a local feature includes:

traversing all vehicle images to be matched in the image library to be matched by the target vehicle image, calculating Euclidean distances of local features of corresponding parts of the target vehicle image and the vehicle images to be matched, and accumulating the weights of the corresponding parts to obtain a plurality of local feature distance values; and the number of the local characteristic distance values is the number of the vehicle images to be matched in the traversal image library to be matched.

In this embodiment, for any vehicle image to be matched in the image library to be matched, the euclidean distance between the target vehicle image and the local feature of the corresponding portion of the vehicle image to be matched is calculated, and the weight of the corresponding portion is accumulated to obtain a local feature distance value. For example, the target vehicle library has S ₁ A vehicle image, the image library to be matched has S ₂ Selecting one vehicle image N from the target vehicle library ₃ The vehicle image N ₃ Respectively with S in the image library to be matched ₂ Calculating the Euclidean distance of local features by using the vehicle image, and if the vehicle image N is used ₃ Divided into 4 parts, a total of 4*S results ₂ The Euclidean distance values are weighted and calculated according to the respective partial weights to obtain S ₂ Local feature distance values.

When comparing local features of two images, the weights of different local features should be different. The weight calculation formula is as follows:

wherein the content of the first and second substances,

and

areas of the ith part in the target vehicle image and the vehicle image to be matched are respectively;

the weight of the ith part in the target vehicle image and the vehicle image to be matched is taken as the weight of the ith part in the target vehicle image and the vehicle image to be matched; z is the number of parts of the image after segmentation.

The same vehicle is under different cameras, and the size of each local mask is different from the size proportion of the whole vehicle. For example, there are some car images that have front, top, and side masks of the car, where the front and side masks of the car are larger in area, and the back masks are almost absent. Also for example, there are some vehicle images that have rear, top, and side masks of the vehicle, where the rear and side masks of the vehicle are larger in area, and the front mask is almost absent. Therefore, when local feature comparison is performed, the local feature information of the overlapped part (such as the top part, the side part and the like) is considered emphatically, and the weight of the feature comparison of the part is increased; and for parts with very low repetition (e.g., front, back, etc.), the weight should be reduced. Through the above calculation formula, a reasonable weight can be calculated for comparison of subsequent local features.

Further, traversing all the vehicle images to be matched in the image library to be matched, and calculating the Euclidean distance of the global features of the target vehicle image and the vehicle images to be matched as a global feature distance value. The number of the characteristic distance values is the same as the number of the vehicle images to be matched in the traversal image library to be matched.

In the present embodiment, the target vehicle library has S ₁ Sheet vehicleImage, the image library to be matched has S ₂ Selecting one vehicle image N from the target vehicle library ₃ The vehicle image N ₃ Respectively with S in the image library to be matched ₂ Calculating the Euclidean distance of the global features by using the image of the vehicle to obtain S ₂ A global feature distance value.

And further, accumulating the local characteristic distance values and the global characteristic distance values corresponding to the same vehicle image to be matched to obtain a plurality of characteristic distance values between the target vehicle image and the vehicle image to be matched.

In this embodiment, the feature distance value between the target vehicle image and the vehicle image to be matched is calculated as follows:

feature distance value =0.5 local feature distance value +0.5 global feature distance value

In the above embodiment, the euclidean distances of the global feature and the local feature are calculated for the two images, respectively, and are added to obtain the feature distance value between the two images. The characteristic distance value between the target vehicle image and the vehicle image to be matched is used for representing the approximation degree between the two vehicle images, and the smaller the characteristic distance value is, the higher the approximation degree is; conversely, the larger the characteristic distance value, the lower the degree of approximation.

As an embodiment of the present invention, the re-identifying the target vehicle according to the degree of approximation includes:

obtaining the minimum value of the plurality of characteristic distance values;

and if the minimum value of the plurality of characteristic distance values is smaller than a preset threshold value, the target vehicle is successfully re-identified, and the vehicle image which is successfully re-identified is deleted from the corresponding image library.

In the present embodiment, the target vehicle library has S ₁ A vehicle image, the image library to be matched has S ₂ A vehicle image. For any target vehicle image in the target image library, S is obtained ₂ A characteristic distance value. Obtaining S ₂ Judging whether the minimum value is smaller than a preset threshold value, wherein the preset threshold value is set between 0.45 and 0.6, for exampleIf the minimum value is smaller than the preset threshold value, the approximation degree of the two vehicle images is high, and the two vehicle images are successfully paired, namely the vehicle weight identification is successful.

And if the vehicle re-identification is successful, recording the camera information of the image with the successful vehicle re-identification, and deleting the image with the successful vehicle re-identification from the database.

As another embodiment of the present invention, the re-identifying the target vehicle according to the approximation degree further includes:

obtaining the minimum value of the characteristic distance values;

and if the minimum value in the characteristic distance values is not smaller than a preset threshold value, the target vehicle is failed to be identified again, and the local characteristic information, the global characteristic information and the mask information of the target vehicle are stored for the subsequent vehicle identification.

In the above embodiment, S is obtained for the target vehicle image of any one of the target image libraries ₂ A characteristic distance value. Obtaining S ₂ And judging whether the minimum value is smaller than a preset threshold value or not, wherein the preset threshold value is set between 0.45 and 0.6, and if the minimum value is not smaller than the preset threshold value, namely is larger than or equal to the preset threshold value, the approximation degree of the two vehicle images is low, and the two vehicle images are unsuccessfully paired, namely the vehicle re-identification fails.

And in the next image acquisition period, obtaining the local features and the global features of the newly extracted image, putting the local features and the global features together with the features which are not successfully re-identified in the previous period, calculating the approximation degree of the image features again, and performing vehicle re-identification.

According to the embodiment of the invention, the vehicle weight matching under multiple cameras is intelligently realized by adopting a computer vision algorithm, the traditional manual detection method is replaced, and manpower and material resources are saved. And meanwhile, the method does not depend on basic information such as license plates, vehicle body colors and the like.

According to the embodiment of the invention, the local feature and the global feature are respectively extracted, and the two features are combined for vehicle weight identification. The performance and efficiency of vehicle weight identification are greatly improved.

According to the embodiment of the invention, a Transformer technology is introduced for global target extraction, so that the method has excellent performance when extracting features, solves the problem that detailed features are easy to lose when a convolutional neural network extracts the features of small pictures, and improves the capability of re-identifying the small-size pictures.

According to the embodiment of the invention, when the local features are extracted, the technology of target segmentation and local feature extraction is introduced, and the feature weight is added, so that the capability of local feature comparison is improved.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required to practice the invention.

In the technical scheme of the invention, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations without violating the good customs of the public order.

According to an embodiment of the invention, the invention further provides an electronic device.

FIG. 8 shows a schematic block diagram of an electronic device 800 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

The device 800 comprises a computing unit 801 which may perform various suitable actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 801 executes the respective methods and processes described above, such as the methods S101 to S105. For example, in some embodiments, methods S101-S105 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the methods S101-S105 described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the methods S101-S105 by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server with a combined blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.

The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A vehicle weight recognition method based on multiple cameras is characterized by comprising the following steps:

acquiring initial images acquired by a plurality of cameras;

and acquiring a target vehicle image from the target image library, calculating the approximation degree between the target vehicle image and all vehicle images to be matched in the image library to be matched according to the global features and the local features of the target vehicle image, and re-identifying the target vehicle according to the approximation degree.

2. The method of claim 1, wherein the acquiring initial images acquired by a plurality of cameras comprises:

3. The method according to claim 2, wherein the target detection is performed on the initial image by using a target detection algorithm, a category to which a target belongs is obtained, a target vehicle image in the initial image is extracted and stored in a target image library; and storing the vehicle image to be matched to an image library to be matched, wherein the method comprises the following steps:

4. The method according to claim 1, wherein the performing the target segmentation on the images in the target image library and the image library to be matched to obtain a target segmentation result comprises:

5. The method according to claim 1, wherein the encoding the images in the target image library and the image library to be matched to obtain the encoding information of the images comprises:

6. The method of claim 1, wherein the Transformer-based feature extraction network comprises a first Transformer structural layer, a second Transformer structural layer, a third Transformer structural layer, and a fourth Transformer structural layer, wherein the first Transformer structural layer, the second Transformer structural layer, and the fourth Transformer structural layer each comprise 2 Transformer structures; the fourth Transformer structural layer comprises 6 Transformer structures; and 1 upper sampling layer is arranged between adjacent Transformer structural layers.

7. The method of claim 1, wherein extracting local features of an image from a convolutional neural network and the target segmentation result comprises:

8. The method according to claim 1, wherein the calculating the degree of approximation between the target vehicle image and all the vehicle images to be matched in the image library to be matched according to the global features and the local features of the target vehicle image comprises:

9. The method of claim 8, wherein said re-identifying the target vehicle based on the degree of approximation comprises:

obtaining the minimum value of the characteristic distance values;

if the minimum value in the plurality of characteristic distance values is not smaller than a preset threshold value, the target vehicle is failed to be identified again, and the local characteristic information, the global characteristic information and the mask information of the target vehicle are stored.

10. An electronic device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.