Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. Embodiments of the present application provide a road monitoring method, system, and electronic device based on object re-identification with a channel attention mechanism, which convert the problem of monitoring objects that violate street management regulations into an object re-identification problem, combined with an object classification problem, over street view images acquired at different times, and process the street view images with a convolutional neural network with a channel attention mechanism, so as to accurately obtain a monitoring result of whether the street view images include an object violating the road management regulations.
According to one aspect of the present application, there is provided a road monitoring method based on object re-identification with a channel attention mechanism, comprising:
acquiring a first street view image and a second street view image at a preset time interval;
inputting the first street view image and the second street view image into a convolutional neural network respectively to obtain a first feature map and a second feature map;
inputting the first feature map and the second feature map into a channel attention network respectively to obtain a first channel feature vector and a second channel feature vector;
fusing the first feature map with the first channel feature vector to obtain a first fused feature map and fusing the second feature map with the second channel feature vector to obtain a second fused feature map, respectively;
cascading the first fused feature map and the second fused feature map according to channels to obtain a classification feature map; and
classifying the classification feature map by a classification function to obtain a classification result, wherein the classification result is used for indicating whether an object violating a road management regulation is included in the first street view image and the second street view image.
In the above road monitoring method based on object re-identification with a channel attention mechanism, acquiring a first street view image and a second street view image at a predetermined time interval includes: acquiring a street view video through a mobile camera for street patrol; acquiring a first video clip and a second video clip of the same geographic position based on the path information of the mobile camera; and extracting the first street view image and the second street view image from the first video clip and the second video clip, respectively.
In the above road monitoring method based on object re-identification with a channel attention mechanism, inputting the first feature map and the second feature map into a channel attention network respectively to obtain a first channel feature vector and a second channel feature vector includes: inputting the first feature map into a first global pooling layer to obtain a first channel pooling vector; inputting the second feature map into a second global pooling layer to obtain a second channel pooling vector; inputting the first channel pooling vector into a first fully-connected layer and activating it with a Sigmoid activation function to obtain the first channel feature vector; and inputting the second channel pooling vector into a second fully-connected layer and activating it with a Sigmoid activation function to obtain the second channel feature vector.
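The pooling-plus-activation step described above can be sketched in a few lines of NumPy. The layer sizes, random weights, and function names below are illustrative assumptions for exposition, not the actual parameters of the described network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feature_map, fc_weight, fc_bias):
    # Global average pooling over the spatial dimensions: (C, H, W) -> (C,)
    pooled = feature_map.mean(axis=(1, 2))
    # Fully-connected layer followed by Sigmoid: per-channel weights in (0, 1)
    return sigmoid(fc_weight @ pooled + fc_bias)

# Illustrative shapes and randomly initialized weights (assumptions)
rng = np.random.default_rng(0)
channels, height, width = 8, 4, 4
feature_map = rng.standard_normal((channels, height, width))
fc_weight = rng.standard_normal((channels, channels)) * 0.1
fc_bias = np.zeros(channels)
channel_vector = channel_attention(feature_map, fc_weight, fc_bias)
```

Because of the Sigmoid, every entry of the resulting channel feature vector lies strictly between 0 and 1, which is what allows it to act as a per-channel weighting in the subsequent fusion step.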
In the above road monitoring method based on object re-identification with a channel attention mechanism, fusing the first feature map and the first channel feature vector to obtain a first fused feature map and fusing the second feature map and the second channel feature vector to obtain a second fused feature map respectively includes: multiplying the first feature map by the first channel feature vector channel by channel to obtain the first fused feature map; and multiplying the second feature map by the second channel feature vector channel by channel to obtain the second fused feature map.
In the above road monitoring method based on object re-identification with a channel attention mechanism, fusing the first feature map and the first channel feature vector to obtain a first fused feature map and fusing the second feature map and the second channel feature vector to obtain a second fused feature map respectively includes: multiplying the first feature map by the first channel feature vector channel by channel to obtain a first weighted feature map; adding the first weighted feature map to the first feature map to obtain the first fused feature map; multiplying the second feature map by the second channel feature vector channel by channel to obtain a second weighted feature map; and adding the second weighted feature map to the second feature map to obtain the second fused feature map.
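Both fusion variants described above reduce to channel-wise multiplication, optionally followed by a residual addition. A minimal NumPy sketch (shapes and values are illustrative assumptions):

```python
import numpy as np

def fuse(feature_map, channel_vector, residual=True):
    # Channel-wise multiplication: each (H, W) slice is scaled by its channel weight
    weighted = feature_map * channel_vector[:, None, None]
    # With the residual connection, the original features are added back,
    # so the fused map retains the original image features to a greater extent
    return weighted + feature_map if residual else weighted

feature_map = np.ones((3, 2, 2))
channel_vector = np.array([0.5, 1.0, 0.0])
fused = fuse(feature_map, channel_vector)                           # residual variant
weighted_only = fuse(feature_map, channel_vector, residual=False)   # plain variant
```

Note the effect of the residual: a channel whose attention weight is 0 is removed entirely in the plain variant, but still contributes its original features in the residual variant.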
In the above road monitoring method based on object re-identification with channel attention mechanism, the convolutional neural network and the classification function are obtained by training a training image set.
In the above road monitoring method based on object re-identification with a channel attention mechanism, the training process includes: calculating an attention loss function value based on the first channel feature vector and the second channel feature vector, the attention loss function value being an average over channel positions of a weighted sum of the logarithm of the first channel feature vector and the logarithm of the second channel feature vector; calculating a classification loss function value of the classification feature map classified by the classification function; and updating the convolutional neural network and the classification function by back-propagation of gradient descent to minimize a weighted sum of the attention loss function value and the classification loss function value.
In the above road monitoring method based on object re-identification with a channel attention mechanism, in the training process, a weighting coefficient of a logarithm of the first channel feature vector and a logarithm of the second channel feature vector, and a weighting coefficient of the attention loss function value and the classification loss function value are taken as hyper-parameters.
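The two loss terms and their hyper-parameter weights can be sketched as follows. The specific values of the weighting coefficients, and the use of cross-entropy as the classification loss, are illustrative assumptions (the application only states that the coefficients are hyper-parameters):

```python
import numpy as np

def attention_loss(v1, v2, alpha=0.5, beta=0.5):
    # Weighted sum of the logarithms of the two channel feature vectors,
    # averaged over channel positions; alpha and beta are hyper-parameters
    return float(np.mean(alpha * np.log(v1) + beta * np.log(v2)))

def classification_loss(probs, label):
    # Cross-entropy of the classification function's output (an assumed choice)
    return float(-np.log(probs[label]))

# Illustrative channel feature vectors (Sigmoid outputs, so entries lie in (0, 1))
v1 = np.array([0.6, 0.7])
v2 = np.array([0.5, 0.8])
probs = np.array([0.2, 0.8])
# Total loss: weighted sum of the two terms; the weights are hyper-parameters
total = 0.1 * attention_loss(v1, v2) + 1.0 * classification_loss(probs, 1)
```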
According to another aspect of the present application, there is provided a road monitoring system based on object re-identification with a channel attention mechanism, comprising:
a street view image acquisition unit configured to acquire a first street view image and a second street view image at a preset time interval;
a feature map generation unit configured to input the first street view image and the second street view image obtained by the street view image acquisition unit into a convolutional neural network respectively to obtain a first feature map and a second feature map;
a channel feature vector generation unit, configured to input the first feature map and the second feature map obtained by the feature map generation unit into a channel attention network respectively to obtain a first channel feature vector and a second channel feature vector;
a fusion unit configured to fuse the first feature map obtained by the feature map generation unit and the first channel feature vector obtained by the channel feature vector generation unit to obtain a first fused feature map and fuse the second feature map obtained by the feature map generation unit and the second channel feature vector obtained by the channel feature vector generation unit to obtain a second fused feature map, respectively;
a classification feature map generation unit, configured to cascade the first fused feature map and the second fused feature map obtained by the fusion unit according to channels to obtain a classification feature map; and
a classification unit configured to classify the classification feature map obtained by the classification feature map generation unit by a classification function to obtain a classification result, where the classification result is used to indicate whether an object violating a road management regulation is included in the first street view image and the second street view image.
In the above road monitoring system based on object re-identification with a channel attention mechanism, the street view image obtaining unit is further configured to: acquiring a street view video through a mobile camera for street patrol; acquiring a first video clip and a second video clip of the same geographic position based on the path information of the mobile camera; and respectively intercepting the first street view image and the second street view image from the first video clip and the second video clip.
In the above road monitoring system based on object re-identification with a channel attention mechanism, the channel feature vector generation unit includes:
a first pooling subunit for inputting the first feature map into a first global pooling layer to obtain a first channel pooling vector;
a second pooling subunit for inputting the second feature map into a second global pooling layer to obtain a second channel pooling vector;
a first channel vector transformation unit configured to input the first channel pooling vector into a first fully-connected layer and activate it with a Sigmoid activation function to obtain a first channel feature vector; and
a second channel vector transformation unit configured to input the second channel pooling vector into a second fully-connected layer and activate it with a Sigmoid activation function to obtain a second channel feature vector.
In the above road monitoring system based on object re-identification with a channel attention mechanism, the fusion unit is further configured to: multiply the first feature map by the first channel feature vector channel by channel to obtain a first fused feature map; and multiply the second feature map by the second channel feature vector channel by channel to obtain a second fused feature map.
In the above road monitoring system based on object re-identification with a channel attention mechanism, the fusion unit includes:
a first weighted feature map generating subunit, configured to multiply the first feature map by the first channel feature vector according to a channel to obtain a first weighted feature map;
a first fusion subunit, configured to add the first weighted feature map and the first feature map to obtain the first fused feature map;
a second weighted feature map generating subunit, configured to multiply the second feature map by the second channel feature vector according to a channel to obtain a second weighted feature map; and
a second fusion subunit configured to add the second weighted feature map and the second feature map to obtain the second fused feature map.
In the above road monitoring system based on object re-identification with channel attention mechanism, the convolutional neural network and the classification function are obtained by training a training image set.
In the above road monitoring system based on object re-identification with channel attention mechanism, further comprising a training unit for:
calculating an attention loss function value based on the first channel feature vector and the second channel feature vector, the attention loss function value being an average over channel positions of a weighted sum of the logarithm of the first channel feature vector and the logarithm of the second channel feature vector;
calculating a classification loss function value of the classification feature map classified by the classification function; and
updating the convolutional neural network and the classification function by back-propagation of gradient descent to minimize a weighted sum of the attention loss function value and the classification loss function value.
In the above road monitoring system based on object re-identification with a channel attention mechanism, in the training process, the weighting coefficients of the logarithm of the first channel feature vector and the logarithm of the second channel feature vector and the weighting coefficients of the attention loss function value and the classification loss function value are taken as hyper-parameters.
According to still another aspect of the present application, there is provided an electronic device, including: a processor; and a memory storing computer program instructions which, when executed by the processor, cause the processor to perform the road monitoring method based on object re-identification with a channel attention mechanism as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform a road monitoring method based on object re-identification with channel attention mechanism as described above.
Compared with the prior art, the road monitoring method, system, and electronic device based on object re-identification with a channel attention mechanism provided by the present application convert the problem of monitoring objects that violate street management regulations into an object re-identification problem, combined with an object classification problem, over street view images acquired at different times, and process the street view images with a convolutional neural network with a channel attention mechanism, so as to accurately obtain a monitoring result of whether an object violating the road management regulations is included in the street view images.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Overview of a scene
As described above, in the field of urban management, it is necessary to monitor roads to detect objects violating the regulations for road management. For example, such objects include mobile vendors who occupy the road and the articles the vendors use in doing so, such as tables and chairs placed on the road and umbrellas set up on both sides of the road. In recent years, the development of deep learning, and especially of neural networks, has provided new solutions for monitoring objects that violate street management regulations.
When detecting these objects violating the road management regulations, on the one hand, the objects may be of a plurality of types, such as persons and articles; on the other hand, other objects that are only occasionally present in the road area need to be taken into account.
In view of the above, the inventors of the present application converted the detection of such objects into an object re-identification problem over street view images taken at different times, combined with an object classification problem. That is, by determining the same illegal object in street view images taken at different times, it is determined that the illegal object is associated with long-term road occupation in violation of the road management regulations, as distinguished from other objects that appear in the street view images only occasionally.
In addition, in the present application, there are many types of objects to be detected, and it is necessary to enhance feature extraction for various types of objects in the process of extracting features by the convolutional neural network. In consideration of the fact that different channels of the feature map play different roles in specifying the objects, a channel attention mechanism is introduced into the convolutional neural network, so that the convolutional neural network can extract useful features for each type of objects to be detected, and an effective classification result can be obtained in the subsequent classification process through a classification function.
Based on this, the present application proposes a road monitoring method based on object re-identification with a channel attention mechanism, comprising: acquiring a first street view image and a second street view image at a preset time interval; inputting the first street view image and the second street view image into a convolutional neural network respectively to obtain a first feature map and a second feature map; inputting the first feature map and the second feature map into a channel attention network respectively to obtain a first channel feature vector and a second channel feature vector; fusing the first feature map with the first channel feature vector to obtain a first fused feature map and fusing the second feature map with the second channel feature vector to obtain a second fused feature map, respectively; cascading the first fused feature map and the second fused feature map according to channels to obtain a classification feature map; and classifying the classification feature map by a classification function to obtain a classification result, wherein the classification result is used for indicating whether an object violating a road management regulation is included in the first street view image and the second street view image.
Fig. 1 illustrates an application scenario diagram of a road monitoring method based on object re-identification with a channel attention mechanism according to an embodiment of the present application.
As shown in fig. 1, in the application scenario, street view images at a preset time interval are collected by an image collecting device (e.g., a camera C as illustrated in fig. 1); for example, the preset time interval is half an hour, that is, street view images of the same road are collected every half an hour. Then, the street view images acquired at different time points are input into a server (e.g., S as illustrated in fig. 1) deployed with a road monitoring algorithm based on object re-identification with a channel attention mechanism, wherein the server can perform object re-identification processing on the street view images based on the algorithm to detect whether an object violating the road management regulations is included on the road.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary method
FIG. 2 illustrates a flow diagram of a road monitoring method based on object re-identification with a channel attention mechanism according to an embodiment of the present application. As shown in fig. 2, a road monitoring method based on object re-identification with a channel attention mechanism according to an embodiment of the present application includes: s110, acquiring a first street view image and a second street view image at a preset time interval; s120, inputting the first street view image and the second street view image into a convolutional neural network respectively to obtain a first feature map and a second feature map; s130, inputting the first feature map and the second feature map into a channel attention network respectively to obtain a first channel feature vector and a second channel feature vector; s140, respectively fusing the first feature map and the first channel feature vector to obtain a first fused feature map and fusing the second feature map and the second channel feature vector to obtain a second fused feature map; s150, cascading the first fused feature map and the second fused feature map according to channels to obtain a classification feature map; and S160, classifying the classification feature map by a classification function to obtain a classification result, wherein the classification result is used for indicating whether the first street view image and the second street view image comprise objects violating the road management regulation or not.
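The steps S130 to S160 above can be sketched end to end as follows. The CNN backbone of step S120 is abstracted away (the two feature maps are taken as given), the fusion of S140 uses the plain channel-wise multiplication variant, and all shapes and weights are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def monitor(f1, f2, fc_w, fc_b, clf_w, clf_b):
    # S130: channel attention -> per-channel weight vectors
    v1 = sigmoid(fc_w @ f1.mean(axis=(1, 2)) + fc_b)
    v2 = sigmoid(fc_w @ f2.mean(axis=(1, 2)) + fc_b)
    # S140: fuse by channel-wise multiplication
    fm1 = f1 * v1[:, None, None]
    fm2 = f2 * v2[:, None, None]
    # S150: cascade the fused maps along the channel dimension
    fc = np.concatenate([fm1, fm2], axis=0)
    # S160: classify (binary: violating object present or not)
    return softmax(clf_w @ fc.reshape(-1) + clf_b)

# Illustrative feature maps and randomly initialized weights (assumptions)
rng = np.random.default_rng(1)
c, h, w = 4, 3, 3
f1 = rng.standard_normal((c, h, w))
f2 = rng.standard_normal((c, h, w))
fc_w = rng.standard_normal((c, c)) * 0.1
fc_b = np.zeros(c)
clf_w = rng.standard_normal((2, 2 * c * h * w)) * 0.05
clf_b = np.zeros(2)
probs = monitor(f1, f2, fc_w, fc_b, clf_w, clf_b)
```

The Softmax output is a two-class probability vector over "violating object present" versus "not present", which is one plausible form of the classification function described above.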
Fig. 3 illustrates an architecture diagram of a road monitoring method based on object re-identification with a channel attention mechanism according to an embodiment of the present application.
As shown in fig. 3, the architecture of the road monitoring method according to the embodiment of the present application includes: a convolutional neural network (e.g., CNN as illustrated in fig. 3) for performing convolution processing on a first street view image (e.g., Fs1 as illustrated in fig. 3) and a second street view image (e.g., Fs2 as illustrated in fig. 3) to obtain a first feature map (e.g., F1 as illustrated in fig. 3) and a second feature map (e.g., F2 as illustrated in fig. 3); and a channel attention network (e.g., CAN as illustrated in fig. 3) for performing channel-based weighting processing on the first feature map and the second feature map to obtain a first channel feature vector (e.g., Vc1 as illustrated in fig. 3) and a second channel feature vector (e.g., Vc2 as illustrated in fig. 3). Further, the first feature map is fused with the first channel feature vector to obtain a first fused feature map (e.g., Fm1 as illustrated in fig. 3), and the second feature map is fused with the second channel feature vector to obtain a second fused feature map (e.g., Fm2 as illustrated in fig. 3). Further, the first fused feature map and the second fused feature map are concatenated by channel to obtain a classification feature map (e.g., Fc as illustrated in fig. 3). Further, the classification feature map is classified by a classification function (e.g., circle S as illustrated in fig. 3) to obtain a classification result, which is used to indicate whether an object violating road management regulations is included in the first street view image and the second street view image.
In step S110, a first street view image and a second street view image at a predetermined time interval are acquired. As described above, when detecting these objects that violate the road management regulations, on the one hand, the objects may be of a plurality of types, such as persons and articles; on the other hand, other objects that are only occasionally present in the road area need to be taken into account. In view of the above, the inventors of the present application converted the detection of such objects into an object re-identification problem over street view images taken at different times, combined with an object classification problem. That is, by determining the same illegal object in street view images taken at different times, it is determined that the illegal object is associated with long-term road occupation in violation of the road management regulations, as distinguished from other objects that appear in the street view images only occasionally. Therefore, street view images at different time points need to be acquired for object re-identification detection.
In a specific example of the present application, the process of acquiring a first street view image and a second street view image at a predetermined time interval includes: firstly, acquiring a street view video through a mobile camera for street patrol; then, acquiring a first video clip and a second video clip at the same geographic position based on the path information of the mobile camera; then, the first street view image and the second street view image are respectively cut out from the first video clip and the second video clip.
That is, the first street view image and the second street view image at the predetermined time interval are acquired by a mobile camera used for street patrol. In particular, during a street patrol the camera passes the same position of the street at different times, so the first video clip and the second video clip of the same geographical position can be obtained from the street view video based on the path information of the mobile camera; the two clips thus contain the first street view image and the second street view image at the predetermined time interval. In addition, the first street view image and the second street view image obtained in this way usually have different shooting angles and backgrounds, which makes it easier for the convolutional neural network for object re-identification to extract targeted image features.
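The clip-matching step based on path information can be sketched as follows. The path-log format (timestamp plus latitude/longitude), the coordinate tolerance, and the function name are all hypothetical; a real patrol system would use its own trajectory representation:

```python
def clips_at_same_position(path_log, tolerance=1e-4, min_gap=1800):
    """Pair timestamps whose recorded coordinates coincide within `tolerance`
    degrees and that are at least `min_gap` seconds apart (the preset interval)."""
    pairs = []
    for i, (t1, (lat1, lon1)) in enumerate(path_log):
        for t2, (lat2, lon2) in path_log[i + 1:]:
            same_place = abs(lat1 - lat2) < tolerance and abs(lon1 - lon2) < tolerance
            if same_place and t2 - t1 >= min_gap:
                pairs.append((t1, t2))
    return pairs

# Hypothetical patrol log: the camera passes the same spot twice, 30 minutes apart
log = [
    (0,    (31.2304, 121.4737)),
    (900,  (31.2310, 121.4750)),
    (1800, (31.2304, 121.4737)),
]
pairs = clips_at_same_position(log)
```

Each returned pair identifies two moments in the street view video from which the first and second street view images of the same geographic position can then be extracted.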
In step S120, the first street view image and the second street view image are respectively input to a convolutional neural network to obtain a first feature map and a second feature map. That is, the first street view image and the second street view image are processed by a convolutional neural network to extract a high-dimensional feature in the first street view image and the second street view image, wherein the high-dimensional feature can represent an object included in the street view image.
In step S130, the first feature map and the second feature map are respectively input into a channel attention network to obtain a first channel feature vector and a second channel feature vector. Here, in the present application, since there are many types of objects to be detected, it is necessary to enhance feature extraction for various types of objects in the process of extracting features by the convolutional neural network. In consideration of the fact that different channels of the feature map play different roles in specifying the objects, a channel attention mechanism is introduced into the convolutional neural network, so that the convolutional neural network can extract useful features for each type of objects to be detected, and an effective classification result can be obtained in the subsequent classification process through a classification function.
In a specific example of the present application, the process of inputting the first feature map and the second feature map into a channel attention network to obtain a first channel feature vector and a second channel feature vector includes the following steps.
Firstly, inputting the first feature map into a first global pooling layer to obtain a first channel pooling vector, and inputting the second feature map into a second global pooling layer to obtain a second channel pooling vector;
then, inputting the first channel pooling vector into a first full-connection layer and activating with a Sigmoid activation function to obtain a first channel feature vector, and inputting the second channel pooling vector into a second full-connection layer and activating with a Sigmoid activation function to obtain a second channel feature vector.
In particular, different channels of the feature map play different roles in specifying objects, and by obtaining channel feature vectors from the feature map based on a channel attention mechanism, different channels in the feature map can be given different weights, thereby making the categories of the feature map "attention" objects different and obtaining the feature map associated with the categories of the objects.
Fig. 4 illustrates a flowchart of inputting the first feature map and the second feature map into a channel attention network to obtain a first channel feature vector and a second channel feature vector, respectively, in a road monitoring method based on object re-identification with a channel attention mechanism according to an embodiment of the present application.
As shown in fig. 4, inputting the first feature map and the second feature map into a channel attention network to obtain a first channel feature vector and a second channel feature vector, respectively, includes the steps of: S210, inputting the first feature map into a first global pooling layer to obtain a first channel pooling vector; S220, inputting the second feature map into a second global pooling layer to obtain a second channel pooling vector; S230, inputting the first channel pooling vector into a first fully-connected layer and activating it with a Sigmoid activation function to obtain a first channel feature vector; and S240, inputting the second channel pooling vector into a second fully-connected layer and activating it with a Sigmoid activation function to obtain a second channel feature vector.
Fig. 5 is a schematic diagram illustrating an architecture in which the first feature map and the second feature map are respectively input into a channel attention network to obtain a first channel feature vector and a second channel feature vector in a road monitoring method based on object re-identification with a channel attention mechanism according to an embodiment of the present application.
As shown in fig. 5, the architecture of the channel attention network includes: a first global pooling layer (e.g., lp1 as illustrated in fig. 5), a second global pooling layer (e.g., lp2 as illustrated in fig. 5), a first fully-connected layer (e.g., Fcl1 as illustrated in fig. 5), and a second fully-connected layer (e.g., Fcl2 as illustrated in fig. 5), wherein the first global pooling layer is used to globally pool the first feature map (e.g., F1 as illustrated in fig. 5) to obtain a first channel pooling vector (e.g., Vp1 as illustrated in fig. 5); the second global pooling layer is used to globally pool the second feature map (e.g., F2 as illustrated in fig. 5) to obtain a second channel pooling vector (e.g., Vp2 as illustrated in fig. 5); the first fully-connected layer is used to convert the first channel pooling vector into the first channel feature vector (e.g., Vc1 as illustrated in fig. 5) with a Sigmoid activation function; and the second fully-connected layer is used to convert the second channel pooling vector into the second channel feature vector (e.g., Vc2 as illustrated in fig. 5) with a Sigmoid activation function.
In step S140, the first feature map is fused with the first channel feature vector to obtain a first fused feature map, and the second feature map is fused with the second channel feature vector to obtain a second fused feature map, respectively. It should be appreciated that the first channel feature vector and the second channel feature vector can assign different weights to different channels in the feature maps, thereby making the feature maps pay "attention" to different categories of objects. Accordingly, a feature map associated with the category of an object in the first street view image, that is, the first fused feature map, can be obtained by fusing the first channel feature vector with the first feature map; and a feature map associated with the category of an object in the second street view image, that is, the second fused feature map, can be obtained by fusing the second channel feature vector with the second feature map.
In a specific example of the present application, fusing the first feature map and the first channel feature vector to obtain a first fused feature map and fusing the second feature map and the second channel feature vector to obtain a second fused feature map respectively includes: multiplying the first feature map by the first channel feature vector by channel to obtain a first fused feature map; and multiplying the second feature map by the second channel feature vector according to channels to obtain a second fused feature map.
That is, in this example, the first feature map is fused with the first channel feature vector, and the second feature map with the second channel feature vector, directly by channel-wise weighting; this manner of fusion is computationally simple.
In another example of the present application, a process of fusing the first feature map and the first channel feature vector to obtain a first fused feature map and fusing the second feature map and the second channel feature vector to obtain a second fused feature map respectively includes: firstly, multiplying the first feature map by the first channel feature vector according to channels to obtain a first weighted feature map; then, adding the first weighted feature map and the first feature map to obtain the first fused feature map; then, multiplying the second feature map by the second channel feature vector according to channels to obtain a second weighted feature map; then, the second weighted feature map and the second feature map are added to obtain the second fused feature map.
That is, in this example, the first feature map is first multiplied by the first channel feature vector by channel to obtain a first weighted feature map, and then the first weighted feature map is added to the first feature map to obtain the first fused feature map, so that the first fused feature map can retain the original image features, i.e., the features in the first feature map, to a greater extent. Likewise, the second feature map is multiplied by the second channel feature vector by channel to obtain a second weighted feature map, and then the second weighted feature map is added to the second feature map to obtain the second fused feature map, so that the second fused feature map can retain the original image features, i.e., the features in the second feature map, to a greater extent.
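The multiply-then-add variant amounts to a residual-style fusion; a minimal sketch, again with hypothetical shapes:

```python
import numpy as np

def fuse_with_residual(feature_map, channel_vector):
    """Channel-wise weighting followed by adding back the original feature map,
    so original image features are retained even where weights are small."""
    weighted = feature_map * channel_vector[:, None, None]  # weighted feature map
    return weighted + feature_map                           # fused feature map

# Hypothetical example: 2 channels, 2x2 spatial resolution.
F = np.ones((2, 2, 2))
Vc = np.array([0.25, 0.75])
fused = fuse_with_residual(F, Vc)
print(fused[0, 0, 0], fused[1, 0, 0])  # 1.25 1.75
```

Note that even a channel weight of zero leaves the original features intact, which is exactly the retention property the text attributes to this example.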
In step S150, the first fused feature map and the second fused feature map are cascaded by channels to obtain a classification feature map. It should be understood that, by cascading the first fused feature map and the second fused feature map by channels, the two fused feature maps are, to some extent, processed by the same neural nodes in the subsequent computation. In this way, the associations among objects within the first fused feature map and the associations among objects within the second fused feature map are preserved, while the associations between objects in the first fused feature map and objects in the second fused feature map are also taken into account, thereby improving the accuracy of object re-identification.
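Cascading by channels is simply concatenation along the channel axis; a small sketch with assumed shapes:

```python
import numpy as np

# Two hypothetical (C, H, W) fused feature maps with C = 4, H = W = 8.
Ff1 = np.zeros((4, 8, 8))
Ff2 = np.ones((4, 8, 8))

# Cascading by channels stacks them along axis 0, doubling the channel count
# while leaving the spatial resolution unchanged.
classification_map = np.concatenate([Ff1, Ff2], axis=0)
print(classification_map.shape)  # (8, 8, 8)
```

Every spatial position of the resulting classification feature map thus carries features from both street view images, which is what lets a downstream classifier compare the two.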
In step S160, the classification feature map is classified by a classification function to obtain a classification result, where the classification result is used to indicate whether an object violating a road management rule is included in the first street view image and the second street view image.
In a specific example of the present application, the classification feature map is converted into a classification feature vector, and then a function value corresponding to the classification feature vector is obtained based on a Softmax classification function, so as to obtain the classification result. Of course, in other examples of the present application, the classification function may also be set as another classification function, and the present application is not limited in this respect.
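The flatten-then-Softmax step in this example can be sketched as below. The projection matrix that maps the classification feature vector to two logits (violation / no violation) is a hypothetical stand-in, since the text does not specify how the vector is produced from the map:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

def classify(classification_map, proj_weights):
    """Flatten the classification feature map into a classification feature
    vector, project it to two class logits, and apply Softmax to obtain the
    probability of each classification result."""
    vec = classification_map.reshape(-1)   # classification feature vector
    return softmax(proj_weights @ vec)

# Hypothetical shapes: a (8, 4, 4) classification feature map, 2 classes.
rng = np.random.default_rng(1)
cmap = rng.standard_normal((8, 4, 4))
W = rng.standard_normal((2, 8 * 4 * 4))
probs = classify(cmap, W)
```

The two Softmax outputs sum to one, and the larger of the two indicates whether an object violating the road management regulation is deemed present.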
In summary, a road monitoring method based on object re-identification with a channel attention mechanism according to an embodiment of the present application has been described, which converts an object monitoring problem of violating a street management rule into an object re-identification problem, combined with an object classification problem, over street view images acquired at different times, and processes the street view images with a convolutional neural network with a channel attention mechanism to accurately obtain a monitoring result of whether an object violating the street management rule is included in the street view images.
It is worth mentioning that the convolutional neural network and the classification function according to the embodiment of the present application are obtained by training a training image set, where the training image set includes a label of whether a street view image includes an object violating a road management rule.
In particular, in the embodiment of the present application, the process of training the convolutional neural network and the classification function includes: first, calculating an attention loss function value based on the first channel feature vector and the second channel feature vector, the attention loss function value being an average value, over channel positions, of a weighted sum of a logarithm of the first channel feature vector and a logarithm of the second channel feature vector; then, calculating a classification loss function value of the classification feature map classified by the classification function; and then, updating the convolutional neural network and the classification function by back propagation of gradient descent so as to minimize a weighted sum of the attention loss function value and the classification loss function value.
And, in the training process, a weighting coefficient of a logarithm of the first channel feature vector and a logarithm of the second channel feature vector, and a weighting coefficient of the attention loss function value and the classification loss function value are taken as hyper-parameters.
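The two loss terms described above can be sketched as follows. This is a hedged illustration only: the weighting coefficients (here `alpha`, `beta`, `lam_att`, `lam_cls`) are the hyper-parameters the text mentions, the cross-entropy form of the classification loss is an assumption, and the numeric inputs are made up for demonstration:

```python
import numpy as np

def attention_loss(vc1, vc2, alpha, beta):
    """Average, over channel positions, of the weighted sum of the logarithms
    of the first and second channel feature vectors (both in (0, 1) after
    the Sigmoid activation)."""
    return np.mean(alpha * np.log(vc1) + beta * np.log(vc2))

def classification_loss(probs, label):
    """Assumed cross-entropy loss on the Softmax output for the true label."""
    return -np.log(probs[label])

def total_loss(vc1, vc2, probs, label, lam_att, lam_cls):
    # Weighted sum of the two loss values; the weights are hyper-parameters
    # that would be minimized by back propagation of gradient descent.
    return (lam_att * attention_loss(vc1, vc2, alpha=0.5, beta=0.5)
            + lam_cls * classification_loss(probs, label))

# Hypothetical channel feature vectors, Softmax output, and label.
vc1 = np.array([0.6, 0.8])
vc2 = np.array([0.7, 0.9])
probs = np.array([0.9, 0.1])
loss = total_loss(vc1, vc2, probs, label=0, lam_att=1.0, lam_cls=1.0)
```

In an actual training loop, this scalar would be differentiated with respect to the network parameters and minimized by gradient descent, as the text states.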
Exemplary System
FIG. 6 illustrates a block diagram of a road monitoring system based on object re-identification with channel attention mechanism according to an embodiment of the present application.
As shown in fig. 6, a road monitoring system 600 according to an embodiment of the present application includes: a street view image acquiring unit 610 for acquiring a first street view image and a second street view image at a predetermined time interval; a feature map generation unit 620, configured to input the first street view image and the second street view image obtained by the street view image obtaining unit 610 into a convolutional neural network respectively to obtain a first feature map and a second feature map; a channel feature vector generating unit 630, configured to input the first feature map and the second feature map obtained by the feature map generating unit 620 into a channel attention network respectively to obtain a first channel feature vector and a second channel feature vector; a fusion unit 640, configured to fuse the first feature map obtained by the feature map generation unit 620 and the first channel feature vector obtained by the channel feature vector generation unit 630 to obtain a first fused feature map and fuse the second feature map obtained by the feature map generation unit 620 and the second channel feature vector obtained by the channel feature vector generation unit 630 to obtain a second fused feature map, respectively; a classification feature map generating unit 650, configured to cascade the first fused feature map and the second fused feature map obtained by the fusing unit 640 according to channels to obtain a classification feature map; and a classification unit 660 configured to classify the classification feature map obtained by the classification feature map generation unit 650 by a classification function to obtain a classification result, where the classification result is used to indicate whether an object violating a road management regulation is included in the first street view image and the second street view image.
In an example, in the road monitoring system 600, the street view image obtaining unit 610 is further configured to: acquiring a street view video through a mobile camera for street patrol; acquiring a first video clip and a second video clip of the same geographic position based on the path information of the mobile camera; and respectively intercepting the first street view image and the second street view image from the first video clip and the second video clip.
In one example, in the road monitoring system 600, as shown in fig. 7, the channel feature vector generating unit 630 includes: a first pooling subunit 631 for inputting the first feature map into a first global pooling layer to obtain a first channel pooling vector; a second pooling subunit 632, configured to input the second feature map into a second global pooling layer to obtain a second channel pooling vector; a first channel vector transformation unit 633, configured to input the first channel pooling vector into a first full connection layer and activate the first channel pooling vector with a Sigmoid activation function to obtain a first channel feature vector; and a second channel vector transformation unit 634, configured to input the second channel pooling vector into a second full connection layer and activate with a Sigmoid activation function to obtain a second channel feature vector.
In an example, in the above road monitoring system 600, the fusion unit 640 is further configured to: multiplying the first feature map by the first channel feature vector by channel to obtain a first fused feature map; and multiplying the second feature map by the second channel feature vector according to channels to obtain a second fused feature map.
In one example, in the road monitoring system 600, as shown in fig. 8, the fusion unit 640 includes: a first weighted feature map generation subunit 641, configured to multiply the first feature map by the first channel feature vector according to channels to obtain a first weighted feature map; a first fusion subunit 642, configured to add the first weighted feature map and the first feature map to obtain the first fused feature map; a second weighted feature map generation subunit 643, configured to multiply the second feature map by the second channel feature vector according to channels to obtain a second weighted feature map; and a second fusion subunit 644, configured to add the second weighted feature map and the second feature map to obtain the second fused feature map.
In one example, in the road monitoring system 600, the convolutional neural network and the classification function are obtained by training through a training image set.
In one example, the road monitoring system 600 described above further comprises a training unit 670 for: calculating an attention loss function value based on the first channel feature vector and the second channel feature vector, the attention loss function value being an average value, over channel positions, of a weighted sum of a logarithm of the first channel feature vector and a logarithm of the second channel feature vector; calculating a classification loss function value of the classification feature map classified by the classification function; and updating the convolutional neural network and the classification function by back propagation of gradient descent so as to minimize a weighted sum of the attention loss function value and the classification loss function value.
In one example, in the road monitoring system 600 described above, the weighting coefficients of the logarithm of the first channel feature vector and the logarithm of the second channel feature vector and the weighting coefficients of the attention loss function value and the classification loss function value are taken as hyper-parameters in the training process.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described road monitoring system 600 have been described in detail in the description of the road monitoring method based on object re-identification with a channel attention mechanism above with reference to fig. 1 to 5, and thus a repetitive description thereof will be omitted.
As described above, the road monitoring system 600 according to the embodiment of the present application may be implemented in various terminal devices, such as a server for monitoring a road, and the like. In one example, the road monitoring system 600 according to the embodiment of the present application may be integrated into the terminal device as one software module and/or hardware module. For example, the road monitoring system 600 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the road monitoring system 600 may also be one of many hardware modules of the terminal device.
Alternatively, in another example, the road monitoring system 600 and the terminal device may be separate devices, and the road monitoring system 600 may be connected to the terminal device through a wired and/or wireless network and transmit the interactive information according to an agreed data format.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 9.
FIG. 9 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 9, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 11 to implement the functions in the road monitoring method based on object re-identification with channel attention mechanism of the various embodiments of the present application described above and/or other desired functions. Various contents such as a street view image may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 13 may include, for example, a keyboard, a mouse, and the like.
The output device 14 can output various information including the classification result to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 9, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the functions in a road monitoring method based on object re-identification with a channel attention mechanism according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer program product may include program code written in any combination of one or more programming languages for performing the operations of embodiments of the present application, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in functions in a road monitoring method based on object re-identification with a channel attention mechanism according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, or configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.