CN116798010A - Training method, device, equipment and medium for vehicle image retrieval model - Google Patents


Info

Publication number
CN116798010A
Authority
CN
China
Prior art keywords: image, retrieval model, training, vehicle image, image retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310863394.2A
Other languages
Chinese (zh)
Inventor
陈思宝
张明周
芮箕祥
罗斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202310863394.2A priority Critical patent/CN116798010A/en
Publication of CN116798010A publication Critical patent/CN116798010A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method, apparatus, device and medium for a vehicle image retrieval model. The training method comprises: acquiring an image data set of a vehicle; normalizing the image data set to generate a training data set; constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function for the initial model; and training and optimizing the initial vehicle image retrieval model with the training data set as its input variable, to obtain a target vehicle image retrieval model. The application supports data retrieval over large-scale vehicle images while ensuring high retrieval efficiency.

Description

Training method, device, equipment and medium for vehicle image retrieval model
Technical Field
The application relates to the technical field of image processing, in particular to a training method, device, equipment and medium of a vehicle image retrieval model.
Background
As the scale of road cargo transportation continues to expand, vehicle image retrieval has gradually moved from conventional settings to big-data scenarios. As the image retrieval library grows, prior-art approaches to large-scale vehicle image retrieval suffer from low retrieval speed and low retrieval efficiency. There is therefore a need for improvement.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present application provides a training method, apparatus, device and medium for a vehicle image retrieval model, so as to solve the above technical problems.
The application provides a training method of a vehicle image retrieval model, which comprises the following steps:
acquiring an image dataset of a vehicle;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
training and optimizing the initial vehicle image retrieval model with the training data set as its input variable, to obtain a target vehicle image retrieval model.
In one embodiment of the application, the step of acquiring an image dataset of the vehicle comprises:
acquiring vehicle images of a plurality of scenes;
respectively carrying out label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data; and
dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
In one embodiment of the application, the step of configuring the optimizer and loss function of the initial vehicle image retrieval model comprises:
setting the optimizer of the initial vehicle image retrieval model as a stochastic gradient descent optimizer; and
and calculating the loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
In one embodiment of the application, the loss function Loss_final of the initial vehicle image retrieval model may satisfy the following formula:
Loss_final = L_CE( Σ_{R∈Ω} Φ(k, J(V_I)) · M(R) )
where L_CE denotes the cross-entropy classification loss, k represents a local feature vector, J(V_I) represents the global feature vector, Φ(k, J(V_I)) represents the local feature weight coefficients generated from the local and global feature vectors, M(R) represents the feature vector obtained by pooling a local region, Ω represents all local regions obtained by multi-scale sampling, and R represents one region among them.
In one embodiment of the present application, the step of training and optimizing the initial vehicle image retrieval model using the training data set as an input variable of the initial vehicle image retrieval model includes:
performing feature extraction processing on the training data set to generate a local feature vector of each local image;
carrying out hash processing on the local feature vectors to generate hash codes of the local images; and
and generating the hash codes of the whole image based on the hash codes of the local images.
In one embodiment of the present application, the step of performing feature extraction processing on the training data set to generate a local feature vector of each local image includes:
carrying out overall feature extraction processing on the training data set to generate overall features of the vehicle image;
performing multi-scale sampling and co-scale transformation on the vehicle image to generate multi-scale features; and
and carrying out maximum pooling treatment and post-treatment on the multi-scale features to generate feature vectors corresponding to each local image.
In one embodiment of the present application, the step of generating a hash code of the whole image based on the hash codes of the partial images includes:
calculating the weight coefficient of each local image; and
and generating the hash code of the whole image based on the weight coefficient and the hash code of the local image.
The application also provides a training device of the vehicle image retrieval model, which comprises the following steps:
the system comprises an image acquisition module, a test image acquisition module and a display module, wherein the image acquisition module is used for acquiring an image data set of a vehicle, and the image data set comprises a training image set, a verification image set and a test image set;
the data processing module is used for carrying out normalization processing on the image data set so as to generate a training data set;
the model construction module is used for constructing an initial vehicle image retrieval model based on a local attention mechanism and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
the model training module is used for training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
The application also provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the training method of the vehicle image retrieval model as described in any one of the above.
The present application also provides a computer-readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the training method of the vehicle image retrieval model as set forth in any one of the above.
In summary, the training method, device, equipment and medium of the vehicle image retrieval model have the following beneficial effects: the application can search similar vehicle pictures through analyzing and comparing the searched vehicle pictures and the pictures in the vehicle gallery, thereby realizing the search of the vehicle images and ensuring higher search efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain its principles. The drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art can obtain a further understanding of the application from these drawings without undue effort. In the drawings:
fig. 1 is a schematic flow chart of a training method of a vehicle image retrieval model provided by the application.
Fig. 2 is a flow chart of an embodiment of step S100 in fig. 1.
Fig. 3 is a flowchart illustrating an embodiment of step S300 in fig. 1.
Fig. 4 is a flowchart illustrating an embodiment of step S400 in fig. 1.
Fig. 5 is a flowchart illustrating an embodiment of step S410 in fig. 4.
Fig. 6 is a schematic diagram of a multi-scale sampling structure according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a local feature hashing architecture in accordance with an embodiment of the present application.
Fig. 8 is a flowchart illustrating an embodiment of step S430 in fig. 4.
Fig. 9 is a schematic diagram of a training device for a vehicle image retrieval model according to the present application.
Fig. 10 is a schematic diagram of a computer device of the present application.
FIG. 11 is a schematic diagram of another computer device of the present application.
Detailed Description
Further advantages and effects of the present application will become readily apparent to those skilled in the art from the description herein and the accompanying drawings. The application may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present application. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.
The drawings provided in the following embodiments merely illustrate the basic idea of the present application. Only the components related to the present application are shown, and they are not drawn according to the number, shape and size of components in an actual implementation; the form, number and proportion of each component in an actual implementation may vary arbitrarily, and the component layout may be more complicated.
In the following description, numerous details are set forth to provide a more thorough explanation of the embodiments of the present application. However, it will be apparent to one skilled in the art that the embodiments may be practiced without these specific details. In other instances, well-known structures and devices are shown in block-diagram form rather than in detail, to avoid obscuring the embodiments of the present application.
"Search by image" mainly relies on content-based image retrieval (CBIR), in which image feature extraction is the key step. To better express the rich semantic information of images, global high-dimensional features are generally used for representation, such as GIST or SIFT features, or a fusion of multiple features. When the data volume is very large, content-based image retrieval by sequential search is inefficient. One solution is to build an index structure, such as a K-D tree, B-tree or R-tree, to improve retrieval efficiency. However, the search efficiency of these tree index structures degrades rapidly as the feature dimension grows, and can even fall below that of linear search. The application therefore introduces a semantic-hashing algorithm, i.e., hashing-based image retrieval. Built on the notion of approximate matching, it prioritizes retrieval efficiency in large-scale data retrieval over very high retrieval accuracy, which satisfies the needs of most users.
Referring to fig. 1, fig. 1 is a flow chart illustrating a training method of a vehicle image retrieval model according to the present application. The application provides a training method of a vehicle image retrieval model, which retrieves similar vehicle images from a video-surveillance vehicle image library by analyzing and comparing the query vehicle image with the images in the library, thereby realizing vehicle image retrieval. The application can satisfy large-scale data retrieval while ensuring high retrieval efficiency. The training method may include the following steps:
step S100, acquiring an image data set of a vehicle;
step S200, normalizing the image data set to generate a training data set;
step S300, constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model;
step S400, training and optimizing the initial vehicle image retrieval model with the training data set as its input variable, to obtain a target vehicle image retrieval model.
Referring to fig. 2, fig. 2 is a flow chart illustrating an embodiment of step S100 in fig. 1. In one embodiment of the present application, when step S100 is performed, steps S110 to S130 may be included, which are described in detail as follows:
step S110, acquiring vehicle images of a plurality of scenes;
step S120, respectively performing label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data;
and S130, dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
In one embodiment of the present application, steps S110 to S130 may be performed as follows. Specifically, a plurality of intelligent traffic imaging devices may be used to acquire vehicle images; for example, passing vehicles may be photographed by cameras installed on roads and bridges, at traffic lights, or on mobile monitoring vehicles, generating vehicle images of a plurality of scenes. The acquired vehicle images of the plurality of scenes may then be labeled with a picture annotation tool to generate the corresponding image data. In this embodiment, the annotation tool may be the open-source image labeling tool Labelme. Each image datum may include a vehicle image and its corresponding label image. The plurality of image data are integrated to generate an image data set.
Further, since there are multiple image data, they need to be distributed into a training image set, a verification image set and a test image set according to a preset ratio. For example, the preset ratio may be 4:1:1, i.e., the ratio of the number of image data in the training image set, the verification image set and the test image set may be 4:1:1. The specific value of the preset ratio is not limited, as long as it supports training of the vehicle image retrieval model. The m vehicle images in the training image set may be expressed as T = {T_1, T_2, …, T_i, …, T_m}, and the corresponding m label images as TL = {TL_1, TL_2, …, TL_i, …, TL_m}, where T_i represents the i-th vehicle image in the training image set, TL_i represents the label image of the i-th vehicle image in the training image set, and i < m. The n vehicle images in the verification image set may be represented as V = {V_1, V_2, …, V_i, …, V_n}, and the corresponding n label images as VL = {VL_1, VL_2, …, VL_i, …, VL_n}, where V_i represents the i-th vehicle image in the verification image set, VL_i represents the label image of the i-th vehicle image in the verification image set, and i < n. The test image set may include several vehicle images for testing.
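The 4:1:1 split described above can be sketched as follows; the function and variable names are illustrative and not part of the application:

```python
import random

def split_dataset(image_data, ratios=(4, 1, 1), seed=0):
    """Split labeled image data into training/verification/test sets
    by a preset ratio (4:1:1 in the text)."""
    data = list(image_data)
    random.Random(seed).shuffle(data)   # shuffle reproducibly before splitting
    total = sum(ratios)
    n_train = len(data) * ratios[0] // total
    n_val = len(data) * ratios[1] // total
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

# Example: 600 (vehicle image, label image) pairs -> a 400/100/100 split
pairs = [(f"img_{i}.jpg", f"label_{i}.png") for i in range(600)]
train, val, test = split_dataset(pairs)
```

With 600 items and the 4:1:1 ratio this yields 400 training, 100 verification and 100 test pairs.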
In one embodiment of the present application, when step S200 is performed, the image data set is normalized to generate a training data set. Specifically, all image data in the image data set may be normalized so that each vehicle image and its corresponding label image are unified to a preset size. In this embodiment, the preset size may be set to 800×800, which facilitates model convergence.
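A minimal sketch of this normalization step follows; the text does not specify the resampling method or value scaling, so nearest-neighbour sampling and [0, 1] scaling are assumptions:

```python
import numpy as np

def normalize_image(img, size=800):
    """Resize an H x W x C image array to size x size (nearest-neighbour,
    an assumed choice) and scale pixel values to [0, 1].
    800 is the preset size from the text."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

# A 600x900 sample image becomes an 800x800 float array in [0, 1]
sample = np.random.randint(0, 256, (600, 900, 3), dtype=np.uint8)
out = normalize_image(sample)
```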
In one embodiment of the present application, when step S300 is performed, an initial vehicle image retrieval model is constructed based on the local attention mechanism, and an optimizer and a loss function of the initial vehicle image retrieval model are configured. Specifically, an encoder-decoder of an initial vehicle image retrieval model is constructed based on a local attention mechanism. The model retrieval efficiency can be improved on the premise of meeting accuracy by using a local attention mechanism with a fixed window size.
Referring to fig. 3, fig. 3 is a flow chart illustrating an embodiment of step S300 in fig. 1. In one embodiment of the present application, when step S300 is performed, steps S310 to S320 may be included, and the following is described in detail:
step S310, setting the optimizer of the initial vehicle image retrieval model as a stochastic gradient descent optimizer;
step S320, calculating a loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
In one embodiment of the present application, steps S310 to S320 may be performed as follows. Specifically, the optimizer of the initial vehicle image retrieval model may be set as a stochastic gradient descent (SGD) optimizer, which controls the update step size and iteratively updates the network parameters so that the network predictions move closer to the true values, thereby training the network. Meanwhile, the final loss function may be calculated from the cross-entropy classification loss. In this embodiment, the loss function may be obtained from the weight coefficients of the local regions generated by the local attention module. The loss function Loss_final may satisfy the following formula:
Loss_final = L_CE( Σ_{R∈Ω} Φ(k, J(V_I)) · M(R) )
where L_CE denotes the cross-entropy classification loss, k represents a local feature vector, J(V_I) represents the global feature vector, Φ(k, J(V_I)) represents the local feature weight coefficients generated from the local and global feature vectors, M(R) represents the feature vector after local-region pooling, Ω represents all local regions after multi-scale sampling, and R represents one of those regions.
Further, after setting the optimizer and the loss function of the initial vehicle image retrieval model, the built initial vehicle image retrieval model may be pre-trained using a preset data set to increase the model convergence speed. In this embodiment, the default dataset may be an ImageNet dataset.
In one embodiment of the present application, when step S400 is performed, the training data set is used as the input variable of the initial vehicle image retrieval model, and the model is trained and optimized to obtain the target vehicle image retrieval model. It should be noted that a model training strategy is formulated before training. In one embodiment of the application, the strategy may be based on the 4:1 preset ratio between the training image set and the verification image set. Specifically, starting from the end of the 100th epoch, the current model accuracy may be computed on the training image set every 10 epochs and the model saved, where an epoch denotes one pass of training over all samples in the training image set. The current model accuracy is then compared with that of the previously saved model every 10 epochs; if the new model's accuracy exceeds the previous model's accuracy, the previous model is replaced, otherwise it is retained. The training data set is used as the input variable of the initial vehicle image retrieval model, which is trained and optimized according to this strategy so that the model is suited to processing vehicle images. In this embodiment, model training is performed on the training image set and accuracy verification on the verification image set; meanwhile, the model saves the optimal weights and records the accuracy on both the training set and the verification set, which facilitates parameter tuning, finally yielding a target vehicle image retrieval model that meets the preset accuracy requirement.
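The keep-best checkpointing strategy described above (evaluate every 10 epochs from the 100th epoch onward and retain the more accurate model) can be sketched as follows; `evaluate` is a hypothetical stand-in for the accuracy check, and the actual per-epoch training is elided:

```python
def train_with_checkpointing(evaluate, total_epochs=200, start=100, every=10):
    """Keep-best strategy: from epoch `start`, evaluate every `every`
    epochs; replace the saved model only when accuracy improves."""
    best_epoch, best_acc = None, -1.0
    for epoch in range(1, total_epochs + 1):
        # ... one epoch of SGD training would happen here ...
        if epoch >= start and epoch % every == 0:
            acc = evaluate(epoch)
            if acc > best_acc:          # new model beats the previous best
                best_epoch, best_acc = epoch, acc
    return best_epoch, best_acc

# Toy accuracy curve that peaks at epoch 150
best_epoch, best_acc = train_with_checkpointing(
    lambda e: 1.0 - abs(e - 150) / 200.0)
```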
Referring to fig. 4, fig. 4 is a flow chart illustrating an embodiment of step S400 in fig. 1. In one embodiment of the present application, when step S400 is performed, step S400 may include steps S410 to S430, which are described in detail as follows:
step S410, carrying out feature extraction processing on the training data set to generate a local feature vector of each local image;
step S420, carrying out hash processing on the local feature vectors to generate hash codes of all the local images;
step S430, generating hash codes of the whole image based on the hash codes of the partial images.
Referring to fig. 5, fig. 5 is a flow chart illustrating an embodiment of step S410 in fig. 4. In one embodiment of the present application, when step S410 is performed, steps S411 to S413 may be included, and the following is described in detail:
step S411, carrying out overall feature extraction processing on the training data set to generate overall features of the vehicle image;
step S412, performing multi-scale sampling and co-scale transformation processing on the vehicle image to generate multi-scale features;
and step S413, carrying out maximum pooling processing and post-processing on the multi-scale features to generate feature vectors corresponding to each local image.
In one embodiment of the present application, when step S411 is performed, the training data set is subjected to the global feature extraction process to generate global features of the vehicle image. Specifically, the training data set may be subjected to overall feature extraction processing through a deep convolutional network to extract overall features of each vehicle image.
Referring to fig. 6, fig. 6 is a schematic diagram of a multi-scale sampling structure according to an embodiment of the application. In one embodiment of the present application, when step S412 is performed, the vehicle image is subjected to multi-scale sampling and co-scale transformation to generate multi-scale features. Specifically, based on the initial vehicle image retrieval model, the vehicle image 601 is multi-scale sampled using R-MAC (Regional Maximum Activation of Convolutions) to extract multi-scale features 602, which are co-scale transformed and mapped onto the depth features extracted by the convolutional neural network. In this embodiment, R-MAC may be used to sample, for example, 55 times. The sampling overlap rate may be set to 0.4, and the sampling region size may satisfy the following formula:
R_s = 2·min(W, H)/(s+1)
where W and H represent the width and height of the feature map, respectively, and s represents the sampling scale.
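The region-size formula can be checked numerically; for example, on a 25×25 feature map (the global feature size used in this embodiment):

```python
def region_size(width, height, s):
    """Sampling-region side length: R_s = 2 * min(W, H) / (s + 1)."""
    return 2 * min(width, height) / (s + 1)

# Side lengths at scales s = 1, 2, 3 on a 25x25 feature map
sizes = [region_size(25, 25, s) for s in (1, 2, 3)]
```

At scale 1 the region covers the whole 25×25 map; higher scales yield progressively smaller regions.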
Further, the coordinates of the R-MAC samples are co-scale transformed, and the features of the sampled local regions can be obtained from the transformed coordinates. In this embodiment, the co-scale transformation coefficient may be set to 0.03125. The global feature map of a picture extracted by the convolutional neural network may be set to 25×25, and scale normalization is performed over the image data set; the set of sampled regions may satisfy the following formula:
Ω = C_s(V_I)
where V_I represents the convolution feature map of a vehicle image sample and C_s is the local-region sampler. After the co-scale transformation, multi-scale features of the vehicle image sample are obtained; regions at the same scale can have different receptive fields in the feature map, and the fused feature map 603, combining multiple receptive fields after superposition, is more representative.
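A sketch of R-MAC-style multi-scale region sampling over a feature map follows. The sliding-window stride derived from the 0.4 overlap rate is an assumption; the patented sampler C_s may differ:

```python
def sample_regions(width, height, scales=(1, 2, 3), overlap=0.4):
    """Sample square regions of side 2*min(W, H)/(s + 1) at each scale,
    sliding them with roughly `overlap` fractional overlap (assumed)."""
    regions = []
    for s in scales:
        side = 2 * min(width, height) / (s + 1)
        step = max(1, int(side * (1 - overlap)))  # stride between windows
        x = 0
        while x + side <= width + 1e-6:
            y = 0
            while y + side <= height + 1e-6:
                regions.append((x, y, x + side, y + side))
                y += step
            x += step
    return regions

# Regions sampled from a 25x25 convolution feature map
regions = sample_regions(25, 25)
```

Each tuple is (x0, y0, x1, y1) in feature-map coordinates; all regions stay within the map bounds.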
Referring to fig. 7, fig. 7 is a schematic diagram illustrating a partial feature hashing structure according to an embodiment of the application. In one embodiment of the present application, when step S413 is performed, the multi-scale features are subjected to a max pooling process and a post-process to generate feature vectors corresponding to each partial image. Specifically, the multi-scale features are pooled and post-processed by the pooling layer 701. In this embodiment, after the multi-scale features of the vehicle image sample are obtained, the multi-scale sampled features are pooled first, so that the local features of different scales are consistent in size, so as to facilitate subsequent processing. In this embodiment, data processing is performed by maximum pooling.
Further, after the maximum pooling, the max-pooled data must be post-processed to facilitate the subsequent generation of the corresponding hash codes. The local feature vector R_I generated by post-processing the local features may satisfy the following formula:
R_I = P(M(R))
where M(R) represents the maximum pooling of the local features and P(M(R)) represents the post-processing of the max-pooled data; the purpose of the post-processing is to obtain the local feature vectors.
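A minimal sketch of M(R) and P(·) follows. The text does not specify what the post-processing P consists of; L2 normalization, a common choice in R-MAC pipelines, is used here as an assumption:

```python
import numpy as np

def region_max_pool(feature_map, region):
    """M(R): channel-wise max pooling over one sampled region."""
    x0, y0, x1, y1 = region
    crop = feature_map[y0:y1, x0:x1, :]                   # H x W x C region
    return crop.reshape(-1, crop.shape[-1]).max(axis=0)   # one value per channel

def postprocess(vec):
    """P(.): hypothetical post-processing -- here L2 normalisation."""
    return vec / (np.linalg.norm(vec) + 1e-12)

# V_I stand-in: a 25x25 conv feature map with 512 channels
fmap = np.random.rand(25, 25, 512).astype(np.float32)
r_i = postprocess(region_max_pool(fmap, (0, 0, 12, 12)))
```

The result r_i is a fixed-length local feature vector regardless of the region's size, which is what makes the different scales consistent for the hash layer.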
Referring to fig. 7, in one embodiment of the present application, when step S420 is performed, the local feature vectors are hashed to generate the hash code of each local image. Specifically, after a local feature vector is obtained through post-processing, it is hashed by the hash layer 702 to obtain the hash code corresponding to the local feature. The local feature vector hashing may satisfy the following formula:
R_H = H(R_I)
where H(R_I) denotes the hashing of the post-processed local feature vector R_I by the hash layer 702.
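A minimal sketch of a hash layer H(·) follows. The actual hash layer 702 is trained end to end with the model; the fixed random projection and sign binarization below are stand-ins for its learned weights:

```python
import numpy as np

def hash_layer(r_i, projection):
    """H(R_I): project the local feature vector and binarise by sign.
    `projection` is a stand-in for the trained hash-layer weights."""
    return np.where(projection @ r_i >= 0, 1, 0).astype(np.int8)

rng = np.random.default_rng(0)
proj = rng.standard_normal((64, 512))        # 64-bit code from a 512-d vector
code = hash_layer(rng.standard_normal(512), proj)
```

Each local feature vector thus maps to a compact binary code, which is what makes Hamming-distance retrieval over large galleries fast.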
Referring to fig. 8, fig. 8 is a flow chart illustrating an embodiment of step S430 in fig. 4. In one embodiment of the present application, when step S430 is performed, steps S431 to S432 may be included, which are described in detail below:
step S431, calculating the weight coefficient of each local image;
step S432, based on the weight coefficient and the hash code of the local image, the hash code of the whole image is generated.
In one embodiment of the present application, when performing steps S431 to S432, the context-aware module may be used to calculate the weight coefficient of each partial image, and the hash code of the whole image is obtained from the weight coefficients and the hash codes of the partial images. Specifically, the context-aware module evaluates each local region to obtain its weight coefficient; the hash code of the whole image is obtained from the weight coefficients and the local-image hash codes; and finally, accurate image retrieval is performed with the hash code.
in one embodiment of the present application, when step S431 is performed, i.e., the weight coefficient of each partial image is calculated. Specifically, through a context sensing module, each local area is estimated and analyzed, and the weight coefficient of each local area is obtained. First, a context-aware global feature vector may be calculated, while a regional attention weighting coefficient is conditionally calculated, wherein the regional attention weighting coefficient may satisfy the following formula:
where k represents a local feature vector, J (V I ) Representing the global feature vector, the local feature and the global feature may be considered jointly as input to Φ () in order to conditionally consider the regional attention weight.
In one embodiment of the present application, when step S432 is performed, the hash code of the entire image is generated based on the weight coefficients and the hash codes of the partial images. Specifically, the context-aware module generates the hash code of the whole image from the hash codes of the local images, i.e., a globally context-aware hash code, which may satisfy the following formula:
H_G = (1/|Ω|) Σ_{R∈Ω} Φ(k, J(V_I)) · H(P(M(R)))
where H_G denotes the hash code of the whole image, Φ(k, J(V_I)) represents the regional attention weight coefficient generated from the local and global features, H(P(M(R))) represents the hash code generated from the local features, Ω represents all local regions after R-MAC multi-scale sampling, and |Ω| represents the number of multi-scale sampled regions.
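The weighted aggregation of local hash codes into a whole-image code can be sketched as follows; re-binarizing the weighted average at 0.5 is an assumption, since the text does not specify how the aggregate is mapped back to a binary code:

```python
def global_hash(local_codes, weights):
    """Context-aware global code: weighted average of the |Omega| local
    hash codes, re-binarised at 0.5 (assumed threshold). The weights
    stand in for Phi(k, J(V_I)) from the text."""
    n_bits = len(local_codes[0])
    avg = [sum(w * code[b] for w, code in zip(weights, local_codes)) / len(local_codes)
           for b in range(n_bits)]
    return [1 if a >= 0.5 else 0 for a in avg]

# Three 4-bit local codes with equal attention weights
local = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]]
code = global_hash(local, weights=[1.0, 1.0, 1.0])
```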
Further, after the hash code of the entire image is generated, retrieval can be performed against the vehicle target search library. First, the generated hash code is compared with the hash codes of the vehicle images in the vehicle target search library; then, the relevant vehicle images are ranked by Hamming distance and listed in order.
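The Hamming-distance ranking step can be sketched as follows; the gallery entries are illustrative:

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))

def rank_gallery(query_code, gallery):
    """Rank gallery images by Hamming distance to the query's hash code,
    smallest distance (most similar) first."""
    return sorted(gallery, key=lambda item: hamming(query_code, item[1]))

# Hypothetical gallery of (image id, 4-bit hash code) pairs
gallery = [("truck_a", [1, 1, 0, 0]),
           ("truck_b", [1, 0, 1, 1]),
           ("truck_c", [0, 0, 0, 0])]
ranked = rank_gallery([1, 0, 1, 0], gallery)
```

Here "truck_b" ranks first because its code differs from the query in only one bit.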
Referring to fig. 9, the application further provides a training device for the vehicle image retrieval model, where the training device corresponds to the training method in the above embodiment one by one. The training apparatus may include an image acquisition module 901, a data processing module 902, a model building module 903, and a model training module 904. The functional modules are described in detail as follows:
the image acquisition module 901 may be configured to acquire an image dataset of a vehicle, wherein the image dataset includes a training image set, a verification image set, and a test image set. Further, the image acquisition module 901 may be specifically configured to acquire vehicle images of a plurality of scenes; respectively carrying out label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data; and dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
The data processing module 902 may be used to normalize the image data set to generate a training data set. Specifically, the data processing module 902 may normalize all the image data in the image data set, so that each vehicle image and its corresponding label image are unified to a preset size.
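The normalization performed by the data processing module might be sketched as below (NumPy, nearest-neighbour resize; the 224x224 target is an assumption, as the patent does not fix the preset size):

```python
import numpy as np

TARGET_SIZE = (224, 224)  # assumed preset size; the patent does not fix one

def normalize_image(img, size=TARGET_SIZE):
    """Resize an HxWxC uint8 image to `size` (nearest neighbour) and scale
    pixel values to [0, 1], so every sample enters the model at one
    unified resolution."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each target row
    cols = np.arange(size[1]) * w // size[1]   # source column for each target column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0

sample = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
out = normalize_image(sample)
print(out.shape)  # (224, 224, 3)
```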
The model building module 903 may be used to build an initial vehicle image retrieval model based on the local attention mechanism and configure an optimizer and a loss function of the initial vehicle image retrieval model. Further, the model building module 903 may be specifically configured to set an optimizer of the initial vehicle image retrieval model to be a random gradient descent optimizer; and calculating a loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
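The configured cross-entropy classification loss, and a plain update of the kind a stochastic-gradient-descent optimizer performs, can be written out directly (a NumPy sketch; a real implementation would use a deep-learning framework's built-ins):

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy classification loss: -log softmax(logits)[label]."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def sgd_step(params, grads, lr=0.01):
    """One plain stochastic-gradient-descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
labels = np.array([0, 1])  # both predictions already favour the true class
print(float(cross_entropy(logits, labels)))
```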
The model training module 904 is operable to train and optimize an initial vehicle image retrieval model using the training dataset as an input variable to the initial vehicle image retrieval model to obtain a target vehicle image retrieval model. Further, the model training module 904 may be specifically configured to perform feature extraction processing on the training data set to generate a local feature vector of each local image; hashing the local feature vectors to generate hash codes of each local image; and generating the hash code of the whole image based on the hash codes of the partial images.
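The hashing step performed by the model training module — turning real-valued local feature vectors into binary codes — might look like this minimal sketch (NumPy; the random projection is a hypothetical stand-in for the model's learned hash layer):

```python
import numpy as np

def hash_local_features(features, projection):
    """Map real-valued local feature vectors to binary hash codes via a
    projection followed by sign binarization (the projection here is a
    random, hypothetical stand-in for the model's learned hash layer)."""
    return np.where(features @ projection >= 0.0, 1, 0)

rng = np.random.default_rng(0)
local_feats = rng.standard_normal((8, 512))   # 8 local regions, 512-d features
projection = rng.standard_normal((512, 64))   # projects down to 64-bit codes
local_codes = hash_local_features(local_feats, projection)
print(local_codes.shape)  # (8, 64)
```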
For specific limitations of the training device, reference may be made to the limitations of the training method described above, which will not be repeated here. The modules in the training device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, a processor of the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
Referring to fig. 10, the present application further provides a computer device, which may be a server. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is for communicating with an external client via a network connection. The computer program is executed by a processor to perform the functions or steps of a training method for a vehicle image retrieval model.
Referring to fig. 11, the present application also provides another computer device, which may be a client. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is for communicating with an external server via a network connection. The computer program is executed by a processor to perform the functions or steps of a training method for a vehicle image retrieval model.
In one embodiment of the application, a computer device is provided comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring an image dataset of a vehicle, wherein the image dataset comprises a training image set, a verification image set and a test image set;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model;
and training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
In one embodiment of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image dataset of a vehicle, wherein the image dataset comprises a training image set, a verification image set and a test image set;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model;
and training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
It should be noted that, the functions or steps that can be implemented by the computer readable storage medium or the computer device may correspond to those described in the foregoing method embodiments, and are not described herein for avoiding repetition.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may include the steps of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
In summary, the application provides a training method, device, equipment, and medium for a vehicle image retrieval model, applicable to the technical field of image processing. By combining technical means such as semantic hashing, feature fusion, and attention mechanisms, the application improves retrieval speed and accuracy for vehicle images and addresses the poor performance of conventional vehicle image retrieval algorithms on large-scale vehicle image libraries. Based on a deep convolutional neural network, regions of different scales are automatically sampled from the input vehicle images of multiple scenes; the context-awareness module then fuses the multi-scale region features to obtain more representative vehicle image features. Using the cross-entropy classification loss function makes model training more stable and effective. Meanwhile, a query-expansion technique enhances queries at retrieval time, enabling more accurate retrieval and higher retrieval precision.
In the description of the present specification, the descriptions of the terms "present embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The embodiments of the application disclosed above are intended only to help illustrate the application. The examples are not intended to be exhaustive or to limit the application to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A training method for a vehicle image retrieval model, comprising:
acquiring an image dataset of a vehicle;
normalizing the image dataset to generate a training dataset;
constructing an initial vehicle image retrieval model based on a local attention mechanism, and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
and taking the training data set as an input variable of the initial vehicle image retrieval model, training and optimizing the initial vehicle image retrieval model, and obtaining a target vehicle image retrieval model.
2. The training method of a vehicle image retrieval model according to claim 1, wherein the step of acquiring an image dataset of a vehicle comprises:
acquiring vehicle images of a plurality of scenes;
respectively carrying out label labeling processing on the vehicle images to generate corresponding image data sets, wherein the image data sets comprise a plurality of image data; and
dividing all the image data into a training image set, a verification image set and a test image set according to a preset proportion.
3. The training method of a vehicle image retrieval model according to claim 1, characterized in that the step of configuring an optimizer and a loss function of the initial vehicle image retrieval model comprises:
setting an optimizer of the initial vehicle image retrieval model as a random gradient descent optimizer; and
and calculating the loss function of the initial vehicle image retrieval model according to the cross entropy classification loss function.
4. The training method of a vehicle image retrieval model according to claim 3, wherein the loss function Loss_final of the initial vehicle image retrieval model may satisfy the following formula:
wherein k represents a local feature vector, J(I) represents the global feature vector, α represents the local-feature weight coefficient generated based on the local and global feature vectors, M(R) represents the feature vector obtained by pooling a local region, Ω represents all local regions obtained by multi-scale sampling, and R represents one region among all the local regions.
5. The training method of the vehicle image retrieval model according to claim 2, characterized in that the step of training and optimizing the initial vehicle image retrieval model using the training data set as an input variable of the initial vehicle image retrieval model includes:
performing feature extraction processing on the training data set to generate a local feature vector of each local image;
carrying out hash processing on the local feature vectors to generate hash codes of the local images; and
and generating the hash codes of the whole image based on the hash codes of the local images.
6. The training method of a vehicle image retrieval model according to claim 5, wherein the step of performing feature extraction processing on the training data set to generate a local feature vector for each local image includes:
carrying out overall feature extraction processing on the training data set to generate overall features of the vehicle image;
performing multi-scale sampling and co-scale transformation on the vehicle image to generate multi-scale features; and
and carrying out maximum pooling treatment and post-treatment on the multi-scale features to generate feature vectors corresponding to each local image.
7. The training method of the vehicle image retrieval model according to claim 5, wherein the step of generating a hash code of the entire image based on the hash codes of the respective partial images includes:
calculating the weight coefficient of each local image; and
and generating the hash code of the whole image based on the weight coefficient and the hash code of the local image.
8. A training device for a vehicle image retrieval model, comprising:
the system comprises an image acquisition module, a test image acquisition module and a display module, wherein the image acquisition module is used for acquiring an image data set of a vehicle, and the image data set comprises a training image set, a verification image set and a test image set;
the data processing module is used for carrying out normalization processing on the image data set so as to generate a training data set;
the model construction module is used for constructing an initial vehicle image retrieval model based on a local attention mechanism and configuring an optimizer and a loss function of the initial vehicle image retrieval model; and
the model training module is used for training and optimizing the initial vehicle image retrieval model by taking the training data set as an input variable of the initial vehicle image retrieval model to obtain a target vehicle image retrieval model.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the training method of the vehicle image retrieval model according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the training method of the vehicle image retrieval model according to any one of claims 1 to 7.
CN202310863394.2A 2023-07-13 2023-07-13 Training method, device, equipment and medium for vehicle image retrieval model Pending CN116798010A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310863394.2A CN116798010A (en) 2023-07-13 2023-07-13 Training method, device, equipment and medium for vehicle image retrieval model


Publications (1)

Publication Number Publication Date
CN116798010A true CN116798010A (en) 2023-09-22

Family

ID=88043723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310863394.2A Pending CN116798010A (en) 2023-07-13 2023-07-13 Training method, device, equipment and medium for vehicle image retrieval model

Country Status (1)

Country Link
CN (1) CN116798010A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination