CN116798010A - Training method, device, equipment and medium for vehicle image retrieval model - Google Patents


Info

Publication number
CN116798010A
Authority
CN
China
Prior art keywords: image, retrieval model, training, vehicle image, image retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310863394.2A
Other languages
Chinese (zh)
Inventor
陈思宝
张明周
芮箕祥
罗斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202310863394.2A priority Critical patent/CN116798010A/en
Publication of CN116798010A publication Critical patent/CN116798010A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method, apparatus, device and medium for a vehicle image retrieval model. The training method comprises: acquiring an image data set of a vehicle; normalizing the image data set to generate a training data set; constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function for the initial model; and training and optimizing the initial vehicle image retrieval model with the training data set as its input variable, to obtain a target vehicle image retrieval model. The application supports data retrieval over large-scale vehicle images while ensuring high retrieval efficiency.

Description

Training method, device, equipment and medium for vehicle image retrieval model
Technical Field
The application relates to the technical field of image processing, in particular to a training method, device, equipment and medium of a vehicle image retrieval model.
Background
As the scale of road cargo transportation continues to expand, vehicle image retrieval has gradually moved from conventional settings to big-data scenarios. As the image retrieval library grows, prior-art approaches to large-scale vehicle image retrieval suffer from low retrieval speed and low retrieval efficiency. There is therefore a need for improvement.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present application provides a training method, apparatus, device and medium for a vehicle image retrieval model, so as to solve the above technical problems.
The application provides a training method of a vehicle image retrieval model, which comprises the following steps:
acquiring an image dataset of a vehicle;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
training and optimizing the initial vehicle image retrieval model with the training data set as its input variable, to obtain a target vehicle image retrieval model.
In one embodiment of the application, the step of acquiring an image dataset of the vehicle comprises:
acquiring vehicle images of a plurality of scenes;
respectively carrying out label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data; and
dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
In one embodiment of the application, the step of configuring the optimizer and loss function of the initial vehicle image retrieval model comprises:
setting the optimizer of the initial vehicle image retrieval model as a stochastic gradient descent optimizer; and
and calculating the loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
In one embodiment of the application, the loss function Loss_final of the initial vehicle image retrieval model may satisfy the following formula:
Loss_final = L_CE( Σ_{R∈Ω} Φ(k, J(V_I)) · M(R) )
where L_CE denotes the cross-entropy classification loss, k represents a local feature vector, J(V_I) represents the global feature vector, Φ(k, J(V_I)) represents the local feature weight coefficients generated from the local and global feature vectors, M(R) represents the feature vector obtained by pooling a local region, Ω represents all local regions obtained by multi-scale sampling, and R represents one region among them.
In one embodiment of the present application, the step of training and optimizing the initial vehicle image retrieval model using the training data set as an input variable of the initial vehicle image retrieval model includes:
performing feature extraction processing on the training data set to generate a local feature vector of each local image;
carrying out hash processing on the local feature vectors to generate hash codes of the local images; and
and generating the hash codes of the whole image based on the hash codes of the local images.
In one embodiment of the present application, the step of performing feature extraction processing on the training data set to generate a local feature vector of each local image includes:
carrying out overall feature extraction processing on the training data set to generate overall features of the vehicle image;
performing multi-scale sampling and co-scale transformation on the vehicle image to generate multi-scale features; and
and carrying out maximum pooling treatment and post-treatment on the multi-scale features to generate feature vectors corresponding to each local image.
In one embodiment of the present application, the step of generating a hash code of the whole image based on the hash codes of the partial images includes:
calculating the weight coefficient of each local image; and
and generating the hash code of the whole image based on the weight coefficient and the hash code of the local image.
The application also provides a training device of the vehicle image retrieval model, which comprises the following steps:
the system comprises an image acquisition module, a test image acquisition module and a display module, wherein the image acquisition module is used for acquiring an image data set of a vehicle, and the image data set comprises a training image set, a verification image set and a test image set;
the data processing module is used for carrying out normalization processing on the image data set so as to generate a training data set;
the model construction module is used for constructing an initial vehicle image retrieval model based on a local attention mechanism and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
the model training module is used for training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
The application also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the training method of the vehicle image retrieval model as described in any one of the above.
The present application also provides a computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the training method of the vehicle image retrieval model as set forth in any one of the above.
In summary, the training method, device, equipment and medium of the vehicle image retrieval model have the following beneficial effects: the application can search similar vehicle pictures through analyzing and comparing the searched vehicle pictures and the pictures in the vehicle gallery, thereby realizing the search of the vehicle images and ensuring higher search efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain its principles. The drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art can obtain a further understanding of the application from these drawings without undue effort. In the drawings:
fig. 1 is a schematic flow chart of a training method of a vehicle image retrieval model provided by the application.
Fig. 2 is a flow chart of an embodiment of step S100 in fig. 1.
Fig. 3 is a flowchart illustrating an embodiment of step S300 in fig. 1.
Fig. 4 is a flowchart illustrating an embodiment of step S400 in fig. 1.
Fig. 5 is a flowchart illustrating an embodiment of step S410 in fig. 4.
Fig. 6 is a schematic diagram of a multi-scale sampling structure according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a local feature hashing architecture in accordance with an embodiment of the present application.
Fig. 8 is a flowchart illustrating an embodiment of step S430 in fig. 4.
Fig. 9 is a schematic diagram of a training device for a vehicle image retrieval model according to the present application.
Fig. 10 is a schematic diagram of a computer device of the present application.
FIG. 11 is a schematic diagram of another computer device of the present application.
Detailed Description
Further advantages and effects of the present application will become readily apparent to those skilled in the art from the description herein and the accompanying drawings. The application may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present application. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.
The drawings provided in the following embodiments merely illustrate the basic idea of the present application. Only the components related to the present application are shown, and they are not drawn according to the number, shape and size of components in an actual implementation; the form, number and proportion of each component in an actual implementation may vary arbitrarily, and the component layout may be more complicated.
In the following description, numerous details are set forth to provide a more thorough explanation of the embodiments of the present application. However, it will be apparent to one skilled in the art that the embodiments may be practiced without these specific details. In other instances, well-known structures and devices are shown in block-diagram form rather than in detail, to avoid obscuring the embodiments of the present application.
"Search by image" mainly relies on content-based image retrieval (CBIR), in which image feature extraction is the key step. To better express the rich semantic information of images, global high-dimensional features are generally used for representation, such as GIST or SIFT features, or a fusion of multiple features. When the data volume is very large, content-based image retrieval by sequential search is inefficient. One solution is to build an index structure, such as a K-D tree, B-tree or R-tree, to improve retrieval efficiency. However, the search efficiency of these tree index structures degrades rapidly as the feature dimension grows, and can even fall below that of linear search. The application therefore introduces a semantic-hashing algorithm, i.e., hashing-based image retrieval. Built on the notion of approximate matching, it prioritizes retrieval efficiency in large-scale data retrieval over very high retrieval accuracy, which satisfies the needs of most users.
Referring to fig. 1, fig. 1 is a flow chart illustrating a training method of a vehicle image retrieval model according to the present application. The application provides a training method of a vehicle image retrieval model, which retrieves similar vehicle images from a video-surveillance vehicle image library by analyzing and comparing the query vehicle image with the images in the library, thereby realizing vehicle image retrieval. The application can satisfy large-scale data retrieval while ensuring high retrieval efficiency. The training method may include the following steps:
step S100, acquiring an image data set of a vehicle;
step S200, normalizing the image data set to generate a training data set;
step S300, constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model;
step S400, training and optimizing the initial vehicle image retrieval model with the training data set as its input variable, to obtain a target vehicle image retrieval model.
Referring to fig. 2, fig. 2 is a flow chart illustrating an embodiment of step S100 in fig. 1. In one embodiment of the present application, when step S100 is performed, steps S110 to S130 may be included, which are described in detail as follows:
step S110, acquiring vehicle images of a plurality of scenes;
step S120, respectively performing label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data;
and S130, dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
In one embodiment of the present application, steps S110 to S130 may be performed as follows. Specifically, a plurality of intelligent traffic imaging devices may be used to acquire vehicle images; for example, passing vehicles may be photographed by cameras installed on roads and bridges, at traffic lights, or on mobile monitoring vehicles, generating vehicle images of a plurality of scenes. The acquired vehicle images of the plurality of scenes may then be labeled with a picture annotation tool to generate the corresponding image data. In this embodiment, the annotation tool may be the open-source image labeling tool Labelme. Each image datum may include a vehicle image and its corresponding label image. The plurality of image data are integrated to generate an image data set.
Further, since there are multiple image data, they need to be distributed into a training image set, a verification image set and a test image set according to a preset ratio. For example, the preset ratio may be 4:1:1, i.e., the ratio of the number of image data in the training image set, the verification image set and the test image set may be 4:1:1. The specific value of the preset ratio is not limited, as long as it supports training of the vehicle image retrieval model. The m vehicle images in the training image set may be expressed as T = {T_1, T_2, …, T_i, …, T_m}, and the corresponding m label images as TL = {TL_1, TL_2, …, TL_i, …, TL_m}, where T_i represents the i-th vehicle image in the training image set, TL_i represents the label image of the i-th vehicle image in the training image set, and i < m. The n vehicle images in the verification image set may be represented as V = {V_1, V_2, …, V_i, …, V_n}, and the corresponding n label images as VL = {VL_1, VL_2, …, VL_i, …, VL_n}, where V_i represents the i-th vehicle image in the verification image set, VL_i represents the label image of the i-th vehicle image in the verification image set, and i < n. The test image set may include several vehicle images for testing.
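The 4:1:1 split described above can be sketched as follows; the function and variable names are illustrative and not part of the application:

```python
import random

def split_dataset(image_data, ratios=(4, 1, 1), seed=0):
    """Split labeled image data into training/verification/test sets
    by a preset ratio (4:1:1 in the text)."""
    data = list(image_data)
    random.Random(seed).shuffle(data)   # shuffle reproducibly before splitting
    total = sum(ratios)
    n_train = len(data) * ratios[0] // total
    n_val = len(data) * ratios[1] // total
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

# Example: 600 (vehicle image, label image) pairs -> a 400/100/100 split
pairs = [(f"img_{i}.jpg", f"label_{i}.png") for i in range(600)]
train, val, test = split_dataset(pairs)
```

With 600 items and the 4:1:1 ratio this yields 400 training, 100 verification and 100 test pairs.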
In one embodiment of the present application, when step S200 is performed, the image data set is normalized to generate a training data set. Specifically, all image data in the image data set may be normalized so that each vehicle image and its corresponding label image are unified to a preset size. In this embodiment, the preset size may be set to 800×800, which facilitates model convergence.
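A minimal sketch of this normalization step follows; the text does not specify the resampling method or value scaling, so nearest-neighbour sampling and [0, 1] scaling are assumptions:

```python
import numpy as np

def normalize_image(img, size=800):
    """Resize an H x W x C image array to size x size (nearest-neighbour,
    an assumed choice) and scale pixel values to [0, 1].
    800 is the preset size from the text."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# A 600x900 sample image becomes an 800x800 float array in [0, 1]
sample = np.random.randint(0, 256, (600, 900, 3), dtype=np.uint8)
out = normalize_image(sample)
```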
In one embodiment of the present application, when step S300 is performed, an initial vehicle image retrieval model is constructed based on the local attention mechanism, and an optimizer and a loss function of the initial vehicle image retrieval model are configured. Specifically, an encoder-decoder of an initial vehicle image retrieval model is constructed based on a local attention mechanism. The model retrieval efficiency can be improved on the premise of meeting accuracy by using a local attention mechanism with a fixed window size.
Referring to fig. 3, fig. 3 is a flow chart illustrating an embodiment of step S300 in fig. 1. In one embodiment of the present application, when step S300 is performed, steps S310 to S320 may be included, and the following is described in detail:
step S310, setting the optimizer of the initial vehicle image retrieval model as a stochastic gradient descent optimizer;
step S320, calculating a loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
In one embodiment of the present application, steps S310 to S320 may be performed as follows. Specifically, the optimizer of the initial vehicle image retrieval model may be set as a stochastic gradient descent (SGD) optimizer, which controls the update step size and iteratively updates the network parameters so that the network predictions move closer to the true values, thereby training the network. Meanwhile, the final loss function may be calculated from the cross-entropy classification loss. In this embodiment, the loss function may be obtained from the weight coefficients of the local regions generated by the local attention module. The loss function Loss_final may satisfy the following formula:
Loss_final = L_CE( Σ_{R∈Ω} Φ(k, J(V_I)) · M(R) )
where L_CE denotes the cross-entropy classification loss, k represents a local feature vector, J(V_I) represents the global feature vector, Φ(k, J(V_I)) represents the local feature weight coefficients generated from the local and global feature vectors, M(R) represents the feature vector after local-region pooling, Ω represents all local regions after multi-scale sampling, and R represents one of those regions.
Further, after setting the optimizer and the loss function of the initial vehicle image retrieval model, the built initial vehicle image retrieval model may be pre-trained using a preset data set to increase the model convergence speed. In this embodiment, the default dataset may be an ImageNet dataset.
In one embodiment of the present application, when step S400 is performed, the training data set is used as the input variable of the initial vehicle image retrieval model, and the model is trained and optimized to obtain the target vehicle image retrieval model. It should be noted that a model training strategy is formulated before training. In one embodiment of the application, the strategy may be based on the 4:1 preset ratio between the training image set and the verification image set. Specifically, starting from the end of the 100th epoch, the current model accuracy may be computed on the training image set every 10 epochs and the model saved, where an epoch denotes one pass of training over all samples in the training image set. The current model accuracy is then compared with that of the previously saved model every 10 epochs; if the new model's accuracy exceeds the previous model's accuracy, the previous model is replaced, otherwise it is retained. The training data set is used as the input variable of the initial vehicle image retrieval model, which is trained and optimized according to this strategy so that the model is suited to processing vehicle images. In this embodiment, model training is performed on the training image set and accuracy verification on the verification image set; meanwhile, the model saves the optimal weights and records the accuracy on both the training set and the verification set, which facilitates parameter tuning, finally yielding a target vehicle image retrieval model that meets the preset accuracy requirement.
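The keep-best checkpointing strategy described above (evaluate every 10 epochs from the 100th epoch onward and retain the more accurate model) can be sketched as follows; `evaluate` is a hypothetical stand-in for the accuracy check, and the actual per-epoch training is elided:

```python
def train_with_checkpointing(evaluate, total_epochs=200, start=100, every=10):
    """Keep-best strategy: from epoch `start`, evaluate every `every`
    epochs; replace the saved model only when accuracy improves."""
    best_epoch, best_acc = None, -1.0
    for epoch in range(1, total_epochs + 1):
        # ... one epoch of SGD training would happen here ...
        if epoch >= start and epoch % every == 0:
            acc = evaluate(epoch)
            if acc > best_acc:          # new model beats the previous best
                best_epoch, best_acc = epoch, acc
    return best_epoch, best_acc

# Toy accuracy curve that peaks at epoch 150
best_epoch, best_acc = train_with_checkpointing(
    lambda e: 1.0 - abs(e - 150) / 200.0)
```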
Referring to fig. 4, fig. 4 is a flow chart illustrating an embodiment of step S400 in fig. 1. In one embodiment of the present application, when step S400 is performed, step S400 may include steps S410 to S430, which are described in detail as follows:
step S410, carrying out feature extraction processing on the training data set to generate a local feature vector of each local image;
step S420, carrying out hash processing on the local feature vectors to generate hash codes of all the local images;
step S430, generating hash codes of the whole image based on the hash codes of the partial images.
Referring to fig. 5, fig. 5 is a flow chart illustrating an embodiment of step S410 in fig. 4. In one embodiment of the present application, when step S410 is performed, steps S411 to S413 may be included, and the following is described in detail:
step S411, carrying out overall feature extraction processing on the training data set to generate overall features of the vehicle image;
step S412, performing multi-scale sampling and co-scale transformation processing on the vehicle image to generate multi-scale features;
and step S413, carrying out maximum pooling processing and post-processing on the multi-scale features to generate feature vectors corresponding to each local image.
In one embodiment of the present application, when step S411 is performed, the training data set is subjected to the global feature extraction process to generate global features of the vehicle image. Specifically, the training data set may be subjected to overall feature extraction processing through a deep convolutional network to extract overall features of each vehicle image.
Referring to fig. 6, fig. 6 is a schematic diagram of a multi-scale sampling structure according to an embodiment of the application. In one embodiment of the present application, when step S412 is performed, the vehicle image is subjected to multi-scale sampling and co-scale transformation to generate multi-scale features. Specifically, based on the initial vehicle image retrieval model, the vehicle image 601 is multi-scale sampled using R-MAC (Regional Maximum Activation of Convolutions) to extract multi-scale features 602, which are co-scale transformed and mapped onto the depth features extracted by the convolutional neural network. In this embodiment, R-MAC may be used to sample, for example, 55 times. The sampling overlap rate may be set to 0.4, and the sampling region size may satisfy the following formula:
R_s = 2·min(W, H)/(s+1)
where W and H represent the width and height of the feature map, respectively, and s represents the sampling scale.
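The region-size formula can be checked numerically; for example, on a 25×25 feature map (the global feature size used in this embodiment):

```python
def region_size(width, height, s):
    """Sampling-region side length: R_s = 2 * min(W, H) / (s + 1)."""
    return 2 * min(width, height) / (s + 1)

# Side lengths at scales s = 1, 2, 3 on a 25x25 feature map
sizes = [region_size(25, 25, s) for s in (1, 2, 3)]
```

At scale 1 the region covers the whole 25×25 map; higher scales yield progressively smaller regions.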
Further, the coordinates of the R-MAC samples are co-scale transformed, and the features of the sampled local regions can be obtained from the transformed coordinates. In this embodiment, the co-scale transformation coefficient may be set to 0.03125. The global feature map of a picture extracted by the convolutional neural network may be set to 25×25, and scale normalization is performed over the image data set; the set of sampled regions may satisfy the following formula:
Ω = C_s(V_I)
where V_I represents the convolution feature map of a vehicle image sample and C_s is the local-region sampler. After the co-scale transformation, multi-scale features of the vehicle image sample are obtained; regions at the same scale can have different receptive fields in the feature map, and the fused feature map 603, combining multiple receptive fields after superposition, is more representative.
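A sketch of R-MAC-style multi-scale region sampling over a feature map follows. The sliding-window stride derived from the 0.4 overlap rate is an assumption; the patented sampler C_s may differ:

```python
def sample_regions(width, height, scales=(1, 2, 3), overlap=0.4):
    """Sample square regions of side 2*min(W, H)/(s + 1) at each scale,
    sliding them with roughly `overlap` fractional overlap (assumed)."""
    regions = []
    for s in scales:
        side = 2 * min(width, height) / (s + 1)
        step = max(1, int(side * (1 - overlap)))  # stride between windows
        x = 0
        while x + side <= width + 1e-6:
            y = 0
            while y + side <= height + 1e-6:
                regions.append((x, y, x + side, y + side))
                y += step
            x += step
    return regions

# Regions sampled from a 25x25 convolution feature map
regions = sample_regions(25, 25)
```

Each tuple is (x0, y0, x1, y1) in feature-map coordinates; all regions stay within the map bounds.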
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a partial feature hashing structure according to an embodiment of the application. In one embodiment of the present application, when step S413 is performed, the multi-scale features are subjected to a max pooling process and a post-process to generate feature vectors corresponding to each partial image. Specifically, the multi-scale features are pooled and post-processed by the pooling layer 701. In this embodiment, after the multi-scale features of the vehicle image sample are obtained, the multi-scale sampled features are pooled first, so that the local features of different scales are consistent in size, so as to facilitate subsequent processing. In this embodiment, data processing is performed by maximum pooling.
Further, after the maximum pooling, the max-pooled data must be post-processed to facilitate the subsequent generation of the corresponding hash codes. The local feature vector R_I generated by post-processing the local features may satisfy the following formula:
R_I = P(M(R))
where M(R) represents the maximum pooling of the local features and P(M(R)) represents the post-processing of the max-pooled data; the purpose of the post-processing is to obtain the local feature vectors.
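A minimal sketch of M(R) and P(·) follows. The text does not specify what the post-processing P consists of; L2 normalization, a common choice in R-MAC pipelines, is used here as an assumption:

```python
import numpy as np

def region_max_pool(feature_map, region):
    """M(R): channel-wise max pooling over one sampled region."""
    x0, y0, x1, y1 = region
    crop = feature_map[y0:y1, x0:x1, :]                   # H x W x C region
    return crop.reshape(-1, crop.shape[-1]).max(axis=0)   # one value per channel

def postprocess(vec):
    """P(.): hypothetical post-processing -- here L2 normalisation."""
    return vec / (np.linalg.norm(vec) + 1e-12)

# V_I stand-in: a 25x25 conv feature map with 512 channels
fmap = np.random.rand(25, 25, 512).astype(np.float32)
r_i = postprocess(region_max_pool(fmap, (0, 0, 12, 12)))
```

The result r_i is a fixed-length local feature vector regardless of the region's size, which is what makes the different scales consistent for the hash layer.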
Referring to fig. 7, in one embodiment of the present application, when step S420 is performed, the local feature vectors are hashed to generate the hash code of each local image. Specifically, after a local feature vector is obtained through post-processing, it is hashed by the hash layer 702 to obtain the hash code corresponding to the local feature. The local feature vector hashing may satisfy the following formula:
R_H = H(R_I)
where H(R_I) denotes the hashing of the post-processed local feature vector R_I by the hash layer 702.
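A minimal sketch of a hash layer H(·) follows. The actual hash layer 702 is trained end to end with the model; the fixed random projection and sign binarization below are stand-ins for its learned weights:

```python
import numpy as np

def hash_layer(r_i, projection):
    """H(R_I): project the local feature vector and binarise by sign.
    `projection` is a stand-in for the trained hash-layer weights."""
    return np.where(projection @ r_i >= 0, 1, 0).astype(np.int8)

rng = np.random.default_rng(0)
proj = rng.standard_normal((64, 512))        # 64-bit code from a 512-d vector
code = hash_layer(rng.standard_normal(512), proj)
```

Each local feature vector thus maps to a compact binary code, which is what makes Hamming-distance retrieval over large galleries fast.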
Referring to fig. 8, fig. 8 is a flow chart illustrating an embodiment of step S430 in fig. 4. In one embodiment of the present application, when step S430 is performed, steps S431 to S432 may be included, which are described in detail below:
step S431, calculating the weight coefficient of each local image;
step S432, based on the weight coefficient and the hash code of the local image, the hash code of the whole image is generated.
In one embodiment of the present application, when performing steps S431 to S432, the context-aware module may be used to calculate the weight coefficient of each partial image, and the hash code of the whole image is obtained from the weight coefficients and the hash codes of the partial images. Specifically, the context-aware module evaluates each local region to obtain its weight coefficient; the hash code of the whole image is obtained from the weight coefficients and the local-image hash codes; and finally, accurate image retrieval is performed with the hash code.
in one embodiment of the present application, when step S431 is performed, i.e., the weight coefficient of each partial image is calculated. Specifically, through a context sensing module, each local area is estimated and analyzed, and the weight coefficient of each local area is obtained. First, a context-aware global feature vector may be calculated, while a regional attention weighting coefficient is conditionally calculated, wherein the regional attention weighting coefficient may satisfy the following formula:
where k represents a local feature vector, J (V I ) Representing the global feature vector, the local feature and the global feature may be considered jointly as input to Φ () in order to conditionally consider the regional attention weight.
In one embodiment of the present application, when step S432 is performed, the hash code of the entire image is generated based on the weight coefficients and the hash codes of the partial images. Specifically, the context-aware module generates the hash code of the whole image from the hash codes of the local images, i.e., a globally context-aware hash code, which may satisfy the following formula:
H_G = (1/|Ω|) Σ_{R∈Ω} Φ(k, J(V_I)) · H(P(M(R)))
where H_G denotes the hash code of the whole image, Φ(k, J(V_I)) represents the regional attention weight coefficient generated from the local and global features, H(P(M(R))) represents the hash code generated from the local features, Ω represents all local regions after R-MAC multi-scale sampling, and |Ω| represents the number of multi-scale sampled regions.
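The weighted aggregation of local hash codes into a whole-image code can be sketched as follows; re-binarizing the weighted average at 0.5 is an assumption, since the text does not specify how the aggregate is mapped back to a binary code:

```python
def global_hash(local_codes, weights):
    """Context-aware global code: weighted average of the |Omega| local
    hash codes, re-binarised at 0.5 (assumed threshold). The weights
    stand in for Phi(k, J(V_I)) from the text."""
    n_bits = len(local_codes[0])
    avg = [sum(w * code[b] for w, code in zip(weights, local_codes)) / len(local_codes)
           for b in range(n_bits)]
    return [1 if a >= 0.5 else 0 for a in avg]

# Three 4-bit local codes with equal attention weights
local = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]]
code = global_hash(local, weights=[1.0, 1.0, 1.0])
```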
Further, after the hash code of the entire image is generated, retrieval can be performed against the vehicle target search library. First, the generated hash code is compared with the hash codes of the vehicle images in the vehicle target search library; then, the relevant vehicle images are ranked by Hamming distance and listed in order.
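The Hamming-distance ranking step can be sketched as follows; the gallery entries are illustrative:

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))

def rank_gallery(query_code, gallery):
    """Rank gallery images by Hamming distance to the query's hash code,
    smallest distance (most similar) first."""
    return sorted(gallery, key=lambda item: hamming(query_code, item[1]))

# Hypothetical gallery of (image id, 4-bit hash code) pairs
gallery = [("truck_a", [1, 1, 0, 0]),
           ("truck_b", [1, 0, 1, 1]),
           ("truck_c", [0, 0, 0, 0])]
ranked = rank_gallery([1, 0, 1, 0], gallery)
```

Here "truck_b" ranks first because its code differs from the query in only one bit.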
Referring to fig. 9, the application further provides a training device for the vehicle image retrieval model, where the training device corresponds to the training method in the above embodiment one by one. The training apparatus may include an image acquisition module 901, a data processing module 902, a model building module 903, and a model training module 904. The functional modules are described in detail as follows:
the image acquisition module 901 may be configured to acquire an image dataset of a vehicle, wherein the image dataset includes a training image set, a verification image set, and a test image set. Further, the image acquisition module 901 may be specifically configured to acquire vehicle images of a plurality of scenes; respectively carrying out label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data; and dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
The data processing module 902 may be used to normalize the image data set to generate a training data set. Specifically, the data processing module 902 may normalize all the image data in the image data set, so that each vehicle image and its corresponding label image are unified to a preset size.
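The normalization performed by the data processing module might be sketched as below (NumPy, nearest-neighbour resize; the 224x224 target is an assumption, as the patent does not fix the preset size):

```python
import numpy as np

TARGET_SIZE = (224, 224)  # assumed preset size; the patent does not fix one

def normalize_image(img, size=TARGET_SIZE):
    """Resize an HxWxC uint8 image to `size` (nearest neighbour) and scale
    pixel values to [0, 1], so every sample enters the model at one
    unified resolution."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each target row
    cols = np.arange(size[1]) * w // size[1]   # source column for each target column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

sample = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = normalize_image(sample)
print(out.shape)  # (224, 224, 3)
```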
The model building module 903 may be used to build an initial vehicle image retrieval model based on the local attention mechanism and configure an optimizer and a loss function of the initial vehicle image retrieval model. Further, the model building module 903 may be specifically configured to set an optimizer of the initial vehicle image retrieval model to be a random gradient descent optimizer; and calculating a loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
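The configured cross-entropy classification loss, and a plain update of the kind a stochastic-gradient-descent optimizer performs, can be written out directly (a NumPy sketch; a real implementation would use a deep-learning framework's built-ins):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy classification loss: -log softmax(logits)[label]."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def sgd_step(params, grads, lr=0.01):
    """One plain stochastic-gradient-descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
labels = np.array([0, 1])  # both predictions already favour the true class
print(float(cross_entropy(logits, labels)))
```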
The model training module 904 is operable to train and optimize an initial vehicle image retrieval model using the training dataset as an input variable to the initial vehicle image retrieval model to obtain a target vehicle image retrieval model. Further, the model training module 904 may be specifically configured to perform feature extraction processing on the training data set to generate a local feature vector of each local image; hashing the local feature vectors to generate hash codes of each local image; and generating the hash code of the whole image based on the hash codes of the partial images.
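The hashing step performed by the model training module — turning real-valued local feature vectors into binary codes — might look like this minimal sketch (NumPy; the random projection is a hypothetical stand-in for the model's learned hash layer):

```python
import numpy as np

def hash_local_features(features, projection):
    """Map real-valued local feature vectors to binary hash codes via a
    projection followed by sign binarization (the projection here is a
    random, hypothetical stand-in for the model's learned hash layer)."""
    return np.where(features @ projection >= 0.0, 1, 0)

rng = np.random.default_rng(0)
local_feats = rng.standard_normal((8, 512))   # 8 local regions, 512-d features
projection = rng.standard_normal((512, 64))   # projects down to 64-bit codes
local_codes = hash_local_features(local_feats, projection)
print(local_codes.shape)  # (8, 64)
```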
For specific limitations of the training device, reference may be made to the limitations of the training method described above, which will not be repeated here. The modules in the training device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, a processor of the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
Referring to fig. 10, the present application further provides a computer device, which may be a server. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is for communicating with an external client via a network connection. The computer program is executed by a processor to perform the functions or steps of a training method for a vehicle image retrieval model.
Referring to fig. 11, the present application also provides another computer device, which may be a client. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is for communicating with an external server via a network connection. The computer program is executed by a processor to perform the functions or steps of a training method for a vehicle image retrieval model.
In one embodiment of the application, a computer device is provided comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring an image dataset of a vehicle, wherein the image dataset comprises a training image set, a verification image set and a test image set;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model;
and training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
In one embodiment of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image dataset of a vehicle, wherein the image dataset comprises a training image set, a verification image set and a test image set;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model;
and training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
It should be noted that, the functions or steps that can be implemented by the computer readable storage medium or the computer device may correspond to those described in the foregoing method embodiments, and are not described herein for avoiding repetition.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may include the steps of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
In summary, the application provides a training method, device, equipment, and medium for a vehicle image retrieval model, applicable to the technical field of image processing. By combining technical means such as semantic hashing, feature fusion, and attention mechanisms, the application improves retrieval speed and accuracy for vehicle images and addresses the poor performance of conventional vehicle image retrieval algorithms on large-scale vehicle image libraries. Based on a deep convolutional neural network, regions of different scales are automatically sampled from the input vehicle images of multiple scenes; the context-awareness module then fuses the multi-scale region features to obtain more representative vehicle image features. Using the cross-entropy classification loss function makes model training more stable and effective. Meanwhile, a query-expansion technique enhances queries at retrieval time, enabling more accurate retrieval and higher retrieval precision.
In the description of the present specification, the descriptions of the terms "present embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The embodiments of the application disclosed above are intended only to help illustrate the application. The examples are not intended to be exhaustive or to limit the application to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A training method for a vehicle image retrieval model, comprising:
acquiring an image dataset of a vehicle;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
and taking the training data set as an input variable of the initial vehicle image retrieval model, training and optimizing the initial vehicle image retrieval model, and obtaining a target vehicle image retrieval model.
2. The training method of a vehicle image retrieval model according to claim 1, wherein the step of acquiring an image dataset of a vehicle comprises:
acquiring vehicle images of a plurality of scenes;
respectively carrying out label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data; and
dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
3. The training method of a vehicle image retrieval model according to claim 1, characterized in that the step of configuring an optimizer and a loss function of the initial vehicle image retrieval model comprises:
setting an optimizer of the initial vehicle image retrieval model as a random gradient descent optimizer; and
and calculating the loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
4. The training method of a vehicle image retrieval model according to claim 3, wherein the loss function Loss_final of the initial vehicle image retrieval model may satisfy the following formula:
wherein k represents a local feature vector, J(I) represents the global feature vector, α represents the local-feature weight coefficient generated based on the local and global feature vectors, M(R) represents the feature vector obtained by pooling a local region, Ω represents all local regions obtained by multi-scale sampling, and R represents one region among all the local regions.
5. The training method of the vehicle image retrieval model according to claim 2, characterized in that the step of training and optimizing the initial vehicle image retrieval model using the training data set as an input variable of the initial vehicle image retrieval model includes:
performing feature extraction processing on the training data set to generate a local feature vector of each local image;
carrying out hash processing on the local feature vectors to generate hash codes of the local images; and
and generating the hash codes of the whole image based on the hash codes of the local images.
6. The training method of a vehicle image retrieval model according to claim 5, wherein the step of performing feature extraction processing on the training data set to generate a local feature vector for each local image includes:
carrying out overall feature extraction processing on the training data set to generate overall features of the vehicle image;
performing multi-scale sampling and co-scale transformation on the vehicle image to generate multi-scale features; and
and carrying out maximum pooling treatment and post-treatment on the multi-scale features to generate feature vectors corresponding to each local image.
7. The training method of the vehicle image retrieval model according to claim 5, wherein the step of generating a hash code of the entire image based on the hash codes of the respective partial images includes:
calculating the weight coefficient of each local image; and
and generating the hash code of the whole image based on the weight coefficient and the hash code of the local image.
8. A training device for a vehicle image retrieval model, comprising:
the system comprises an image acquisition module, a test image acquisition module and a display module, wherein the image acquisition module is used for acquiring an image data set of a vehicle, and the image data set comprises a training image set, a verification image set and a test image set;
the data processing module is used for carrying out normalization processing on the image data set so as to generate a training data set;
the model construction module is used for constructing an initial vehicle image retrieval model based on a local attention mechanism and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
the model training module is used for training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the training method of the vehicle image retrieval model according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the training method of the vehicle image retrieval model according to any one of claims 1 to 7.
CN202310863394.2A 2023-07-13 2023-07-13 Training method, device, equipment and medium for vehicle image retrieval model Pending CN116798010A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310863394.2A CN116798010A (en) 2023-07-13 2023-07-13 Training method, device, equipment and medium for vehicle image retrieval model


Publications (1)

Publication Number Publication Date
CN116798010A true CN116798010A (en) 2023-09-22

Family

ID=88043723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310863394.2A Pending CN116798010A (en) 2023-07-13 2023-07-13 Training method, device, equipment and medium for vehicle image retrieval model

Country Status (1)

Country Link
CN (1) CN116798010A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination