CN115049895A - Image attribute identification method, attribute identification model training method and device - Google Patents

Image attribute identification method, attribute identification model training method and device

Info

Publication number
CN115049895A
Authority
CN
China
Prior art keywords
image
training
attribute
features
attribute information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210675231.7A
Other languages
Chinese (zh)
Other versions
CN115049895B (en)
Inventor
蒋旻悦
于越
杨喜鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210675231.7A
Publication of CN115049895A
Application granted
Publication of CN115049895B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/40 Extraction of image or video features
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Recognition using neural networks
    • G06V 20/00 Scenes; scene-specific elements
    • G06V 20/40 Scenes; scene-specific elements in video content
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/625 License plates

Abstract

The present disclosure provides an image attribute identification method, an attribute identification model training method, and a corresponding apparatus, device, and storage medium. It relates to the technical field of artificial intelligence, in particular to image processing, computer vision, and related fields, and can be applied to scenarios such as intelligent transportation and smart cities. The specific implementation scheme is as follows: a first image is acquired, the first image including a target object. A second image is then determined from the first image, the second image being an image of a partial region of the first image that contains the target object. The first image and the second image are input into a single attribute identification model to obtain a target image attribute, which comprises first attribute information of the first image and second attribute information corresponding to the target object. Because one model performs attribute identification on different images at the same time, identification efficiency is improved and the slowdown caused by running multiple models together is avoided.

Description

Image attribute identification method, attribute identification model training method and device
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of image processing and computer vision; it can be applied to scenarios such as intelligent transportation and smart cities, and specifically relates to an image attribute identification method and an attribute identification model training method and apparatus.
Background
Attribute recognition on images has become increasingly common. In some scenarios, a video may be recorded or an image captured by an acquisition device. The video or image acquired by the device can then be analyzed to determine the corresponding attribute information. For example, an image may be recognized as containing attribute information such as flowers, pedestrians, vehicles, and guideboards.
Disclosure of Invention
The present disclosure provides an image attribute identification method, an attribute identification model training method, and a corresponding apparatus, device, and storage medium.
According to a first aspect of the present disclosure, there is provided an image attribute identification method, which may include: acquiring a first image, the first image being an image that includes a target object; determining a second image from the first image, the second image being an image of a partial region of the first image that contains the target object; and inputting the first image and the second image into a single attribute identification model to obtain a target image attribute. The target image attribute comprises first attribute information of the first image and second attribute information corresponding to the target object. Because one model performs attribute identification on different images at the same time, identification efficiency is improved and the slowdown caused by running multiple models together is avoided.
According to a second aspect of the present disclosure, there is provided an attribute recognition model training method, which may include: obtaining a training sample set comprising at least one training sample group, where each training sample group comprises a first training image carrying a first label, a second training image carrying a second label, and region position information. The first training image is an image that includes a target object, the second training image is a partial-region image of the first training image that contains the target object, and the region position information represents the position of the second training image's region within the first training image. Then, for each training sample group, the group may be input into a single initial recognition model to determine first training attribute information of the first training image and second training attribute information of the second training image. The single initial recognition model is then adjusted according to the first label, the first training attribute information, the second label, and the second training attribute information to obtain the single first attribute recognition model. Because a single model is obtained through training, it can perform attribute recognition on different images simultaneously, which improves recognition efficiency and avoids the slowdown caused by running multiple models together.
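The adjustment step above can be sketched as minimizing a joint objective over both supervision signals. This is a minimal illustration only — the disclosure does not name a loss function, so cross-entropy is an assumption here:

```python
import numpy as np

def cross_entropy(probs, label):
    # negative log-likelihood of the ground-truth class
    return -np.log(probs[label])

def joint_loss(first_probs, first_label, second_probs, second_label):
    # single objective combining the first-training-image (first label)
    # and second-training-image (second label) supervision, so one
    # optimizer step adjusts the single shared model from both signals
    return (cross_entropy(first_probs, first_label)
            + cross_entropy(second_probs, second_label))
```

A single gradient step on this combined loss would update both feature extraction branches and the shared layers at once, which is what allows one model to serve both images.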
According to a third aspect of the present disclosure, there is provided an image attribute identification apparatus, comprising: an acquisition module configured to acquire a first image, the first image being an image that includes a target object; a determining module configured to determine a second image from the first image, the second image being a partial-region image of the first image that contains the target object; and an identification module configured to input the first image and the second image into a single attribute identification model to obtain a target image attribute, where the target image attribute comprises first attribute information of the first image and second attribute information corresponding to the target object. Because one model performs attribute identification on different images at the same time, identification efficiency is improved and the slowdown caused by running multiple models together is avoided.
According to a fourth aspect of the present disclosure, there is provided an attribute recognition model training apparatus, comprising: an acquisition module configured to obtain a training sample set comprising at least one training sample group, where each training sample group comprises a first training image carrying a first label, a second training image carrying a second label, and region position information; the first training image is an image that includes a target object, the second training image is a partial-region image of the first training image that contains the target object, and the region position information represents the position of the second training image's region within the first training image; and a training module configured to, for each training sample group, input the group into a single initial recognition model, determine first training attribute information of the first training image and second training attribute information of the second training image, and adjust the single initial recognition model according to the first label, the first training attribute information, the second label, and the second training attribute information to obtain the single first attribute recognition model. Because a single model is obtained through training, it can perform attribute recognition on different images simultaneously, which improves recognition efficiency and avoids the slowdown caused by running multiple models together.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the first or second aspects.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of the first or second aspects.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first or second aspects described above.
According to the image attribute identification method, the attribute recognition model training method, the apparatus, the device, and the storage medium provided by the present disclosure, attribute identification is performed on different images simultaneously by one model, which improves model identification efficiency and avoids the slowdown caused by running multiple models together.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the present disclosure;
FIG. 2 is a flow chart of an image attribute identification method of an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an attribute identification model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of another image attribute identification method of embodiments of the present disclosure;
FIG. 5 is a schematic diagram of a feature extraction branch structure according to an embodiment of the disclosure;
FIG. 6 is a schematic diagram of another attribute identification model configuration according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of yet another image attribute identification method of an embodiment of the present disclosure;
FIG. 8 is a flowchart of an attribute recognition model training method according to an embodiment of the present disclosure;
FIG. 9 is a flow chart of another method for training an attribute recognition model in accordance with an embodiment of the present disclosure;
FIG. 10 is a flowchart of yet another method for training an attribute recognition model according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of an image attribute identification device according to an embodiment of the disclosure;
FIG. 12 is a schematic diagram of an attribute recognition model training apparatus according to an embodiment of the present disclosure;
fig. 13 is a schematic diagram of an image attribute recognition device and an attribute recognition model training device according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure mainly applies to scenarios such as attribute recognition of a vehicle, as shown in Fig. 1. The scene includes a vehicle 110 being photographed, a capture device 120 for capturing images, and a network device 130. When the vehicle 110 is driving normally or parked, the capture device 120 can take a picture to obtain an image containing the vehicle 110. Of course, in some examples, the capture device 120 may also continuously capture images over a preset period to obtain a video that includes the vehicle 110. The capture device 120 can then send the captured image or video to the network device 130 by wired or wireless means, so that the network device 130 performs attribute recognition on it. For example, attribute recognition may be performed directly on an image captured by the capture device 120, or on the corresponding frames of a video it captured. It will be appreciated that any frame of a video can be treated as an image.
In some cases, task requirements may call for attribute recognition on an image to obtain the corresponding attributes. For example, an image containing a vehicle may be analyzed to determine attributes associated with that vehicle. Taking a vehicle image as an example, if attribute recognition needs to be performed on the license plate within the image, the license-plate-related attributes usually cannot be identified effectively, because the license plate occupies too small a portion of the vehicle image.
In some examples, an additional separate model may be used to identify license-plate-related attributes by inputting an associated license plate image, and for other vehicle-related attributes it may be necessary to configure further models separately. However, when multiple models are deployed and run to identify different attributes, running them in parallel severely limits each model's speed, so the overall recognition becomes too slow and seriously degrades the user experience.
Therefore, the present disclosure provides an image attribute recognition method and a corresponding attribute recognition model training method, in which attribute recognition is performed on different images by one model, improving recognition efficiency and avoiding the slowdown caused by running multiple models together.
The present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating an image attribute identification method according to an exemplary embodiment of the present disclosure. The method may use a pre-trained attribute recognition model for attribute recognition, and the method may be applied to a network device, such as the network device 130.
In some examples, the network device may be, for example, a server or a cluster of servers. Of course, the present disclosure is not limited to this, and may also be a server or a server cluster disposed on a virtual machine.
Of course, in some examples, the method may also be applied to a terminal device. For example, the terminal device may include, but is not limited to, a mobile phone, wearable device, tablet computer, handheld computer, notebook computer, ultra-mobile personal computer (UMPC), netbook, personal digital assistant (PDA), laptop computer, mobile computer, augmented reality (AR) device, virtual reality (VR) device, artificial intelligence (AI) device, and/or in-vehicle device, or any other portable terminal device.
The method to which the present disclosure relates may comprise the steps of:
s201, a first image is acquired.
In some examples, the first image may be pre-stored in the network device 130 or a database, or may be acquired from the acquisition device 120 in a wired or wireless manner, which is not limited in this disclosure. The first image is an image including a target object.
S202, determining a second image according to the first image.
In some examples, the second image may be extracted from the first image, for example, the second image may be a partial region image including the target object in the first image.
Of course, in some examples, the second image may be determined from the first image in advance. For example, the capture device 120 may determine the second image based on the first image after capturing the first image. The network device 130 may also directly acquire the generated second image from the capture device 120. Of course, in some examples, the second image may also be pre-stored in the network device 130 or a database so that the network device 130 may directly obtain the second image when needed.
S203, inputting the first image and the second image into a single attribute recognition model to obtain the target image attribute.
In one example, the first image and the second image may be input into a single attribute recognition model trained in advance, so that the single attribute recognition model performs attribute recognition on the first image and the second image to obtain the target image attribute. The target image attribute may include first attribute information corresponding to the first image and second attribute information corresponding to the second image.
In this way, attribute recognition can be performed on different images simultaneously by one model, which improves recognition efficiency and avoids the slowdown caused by running multiple models together.
The solution described in fig. 2 will be described in more detail below in connection with other figures.
As an exemplary embodiment, FIG. 3 is a schematic diagram of an attribute recognition model structure shown in an exemplary embodiment according to the present disclosure. As can be seen from fig. 3, the attribute recognition model 300 may include a first feature extraction branch 310, a second feature extraction branch 320, a feature fusion layer 330, a fully-connected layer 340, and a normalization layer 350.
Fig. 4 is a flowchart illustrating another image attribute identification method according to an exemplary embodiment of the present disclosure. The obtaining of the target image attribute in S203 may be based on the model structure shown in fig. 3, and may further include:
s401, respectively extracting the features of the first image and the second image through different channels of the single attribute identification model to obtain a first image feature and a second image feature.
In some examples, the first image may be input to the first feature extraction branch 310 of the single attribute recognition model for feature extraction, so as to obtain a first image feature corresponding to the first image.
Of course, in some examples, the first feature extraction branch 310 may be a Convolutional Neural Network (CNN), which may include an input layer 510 and at least one intermediate layer 520, as shown in fig. 5. It is understood that the input layer 510 serves to extract various information from the input image, which is then passed into the at least one intermediate layer 520 for corresponding convolution operations to obtain the first image feature.
In some examples, each intermediate layer may be comprised of a convolutional layer, a pooling layer, and/or a downsampling layer. The intermediate layer is not particularly limited by this disclosure.
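As a rough illustration of such a branch, the following sketch stacks an input stage and intermediate convolution/pooling stages in plain NumPy. It is a toy single-channel version for clarity (kernel values, ReLU activation, and layer count are all assumptions), not the actual network of the disclosure:

```python
import numpy as np

def conv2d(image, kernel):
    # valid-mode 2-D cross-correlation (no padding, stride 1),
    # as is conventional in CNN layers
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    # non-overlapping max pooling
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def extract_features(image, kernels):
    # input layer followed by intermediate conv/pool layers, as in Fig. 5
    feature = image
    for k in kernels:
        feature = np.maximum(conv2d(feature, k), 0.0)  # conv + ReLU
        feature = max_pool2d(feature)
    return feature
```

A real branch would operate on multi-channel feature maps with learned kernels; the structure (input layer, then repeated intermediate layers) is what matters here.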
Continuing with fig. 3, in other examples, the second image may be input to the second feature extraction branch 320 of the single attribute identification model for feature extraction to determine the second image feature corresponding to the second image.
It is understood that the structure of the second feature extraction branch 320 may be similar to that of the first feature extraction branch 310, and specifically, the structure shown in fig. 5 may be referred to, and the detailed description of the disclosure is omitted here.
S402, identifying and obtaining the target image attribute based on the first image characteristic and the second image characteristic.
In some examples, after the first image feature and the second image feature are determined in S401, attribute recognition may be performed on them to obtain the target image attribute, that is, the first attribute information and the second attribute information.
By using different channels within a single model, attribute recognition of different images is achieved without deploying multiple models, which improves recognition efficiency and avoids the slowdown caused by running multiple models together.
In some embodiments, when the second image is determined at S202, the method may further include: region location information is determined. The region position information is used to indicate a region position of the second image in the first image.
It is to be understood that, in some examples, when the second image is determined from the first image in S202, the region position of the second image in the first image may also be determined. For example, the network device or the terminal device may detect the first image by using a pre-configured detection model to determine the area location information.
Therefore, identifying the target image attribute based on the first image feature and the second image feature in S402 may further include: determining a region sub-feature from the first image feature based on the region location information; fusing the region sub-feature with the second image feature to obtain a fused feature; and determining the second attribute information based on the fused feature.
In some examples, region sub-features of the respective region may be determined from the first image feature based on the determined region location information. The region sub-features may then be fused with the second image features to obtain fused features. Then, attribute identification may be performed based on the fused features to determine second attribute information.
For example, the first image feature and the second image feature may be input into the feature fusion layer 330 of a single attribute recognition model. Meanwhile, the region location information is also input into the feature fusion layer 330 of the attribute identification model. The feature fusion layer 330 of the attribute recognition model determines the region sub-features from the first image features based on the region location information. And then fusing the regional sub-features and the second image features to determine fused features. It should be understood that the region sub-feature is a partial feature corresponding to the region position information in the first image feature.
In some examples, the feature fusion layer 330 of the attribute recognition model may locate, based on the region location information, the partial feature of the first image feature at the indicated position, i.e., the region sub-feature. The region sub-feature may then be fused with the second image feature to obtain a fused feature. Taking two-dimensional coordinates as an example, assume the region position information comprises four coordinates, such as A (x1, y1), B (x2, y2), C (x3, y3), and D (x4, y4). Based on points A, B, C, and D, a partial region of the first image can be determined; this region represents the position of the second image within the first image. The region sub-feature corresponding to this partial region may then be fused with the second image feature to obtain the fused feature. The fused feature evidently carries more feature information, which helps in determining the second attribute information of the second image from it.
It is to be understood that the respective coordinates in the above-mentioned area location information are merely exemplary descriptions and are not limiting to the present disclosure.
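Under the two-dimensional coordinate example above, determining the region sub-feature amounts to cropping the first image feature at the box spanned by the four points. A minimal sketch, assuming for simplicity that the feature map shares the image's coordinate grid (a real model would scale the box to the feature-map resolution):

```python
import numpy as np

def region_sub_feature(first_feature, corners):
    # crop the part of the first image feature that lies inside the
    # axis-aligned box spanned by corner points A, B, C and D
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return first_feature[min(ys):max(ys), min(xs):max(xs)]
```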
In some examples, the features may be superimposed directly, superimposed based on preset weights, fused in some other preset manner, or fused in any equivalent way. It is to be understood that the present disclosure does not limit the specific manner of fusion.
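The fusion alternatives just listed can each be written in a line or two. These are generic sketches of the named options (direct superposition, weighted superposition, and concatenation as one equivalent scheme), not a formula mandated by the disclosure:

```python
import numpy as np

def fuse_direct(region_feature, second_feature):
    # direct element-wise superposition
    return region_feature + second_feature

def fuse_weighted(region_feature, second_feature, w=0.5):
    # superposition with a preset weight w on the region sub-feature
    return w * region_feature + (1.0 - w) * second_feature

def fuse_concat(region_feature, second_feature):
    # one equivalent scheme: concatenation along the feature axis
    return np.concatenate([region_feature.ravel(), second_feature.ravel()])
```

The superposition variants assume the two features have been brought to the same shape first; concatenation does not.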
In some examples, for the first attribute information, attribute recognition may be performed directly based on the first image feature to obtain the first attribute information. The specific mode can refer to the existing mode, and the details of the disclosure are not repeated herein.
By fusing the region sub-feature with the second image feature based on the region position information, the fused feature carries more information, so that more accurate second attribute information can subsequently be identified.
In some embodiments, for S402, identifying the target image attribute based on the first image feature and the second image feature may include: determining the first attribute information according to the first image feature, and determining the second attribute information according to the fused feature. Alternatively, the first image feature and the fused feature may be combined, and the first attribute information and the second attribute information determined from the combined features.
In one example, in the attribute recognition model, attribute recognition may be performed on the first image feature and the fused feature separately to determine the corresponding attribute information. That is, each of them is input in turn to the fully-connected layer 340 of the attribute recognition model, whose output is passed to the normalization layer 350. For example, the first image feature may be input to the fully-connected layer 340, and the output of the fully-connected layer 340 then input to the normalization layer 350 to determine the first attribute information, which represents the corresponding attribute information of the first image. Likewise, the fused feature may be input to the fully-connected layer 340, and the output then passed to the normalization layer 350 to determine the second attribute information, which represents the corresponding attribute information of the second image. It will be appreciated that each neuron in the fully-connected layer may correspond to one possible attribute, so that all possible attributes of the first image and all possible attributes of the second image are represented in the fully-connected layer 340.
Of course, in some examples, since the fully-connected layer 340 contains all possible attributes of the first image and all possible attributes of the second image, the first image feature and the fused feature may also be superimposed and input together to the fully-connected layer 340 of the attribute recognition model. The output of the fully-connected layer 340 is then input to the normalization layer 350 of the attribute identification model to determine the target image attribute. The determined target image attribute may include the first attribute information and the second attribute information.
Of course, in still other examples, such as the model structure shown in fig. 6, the model 600 shown in fig. 6 includes a first feature extraction branch 610, a second feature extraction branch 620, a feature fusion layer 630, a first fully-connected layer 640, a first normalization layer 650, a second fully-connected layer 660, and a second normalization layer 670. The first feature extraction branch 610 is similar to the first feature extraction branch 310, the second feature extraction branch 620 is similar to the second feature extraction branch 320, and the feature fusion layer 630 is similar to the feature fusion layer 330, which may be specifically described with reference to fig. 3, and is not described herein again.
As shown in fig. 6, the first image feature determined by the first feature extraction branch 610 may be directly input into the first fully-connected layer 640. The output of the first fully connected layer 640 is then input to the first normalization layer 650, thereby determining first attribute information. And, the fused features determined by the feature fusion layer 630 may be input directly to the second fully-connected layer 660. The output of the second fully-connected layer 660 is then input to the second normalization layer 670, thereby determining second attribute information.
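For illustration only, the fig. 6 structure described above can be sketched as a small PyTorch module. The branch architectures, feature dimensions, and attribute counts below are assumptions chosen for the sketch; the disclosure does not fix any of them.

```python
import torch
import torch.nn as nn

class DualBranchAttributeModel(nn.Module):
    """Illustrative sketch of fig. 6: two feature extraction branches,
    a feature fusion layer, and two separate fully-connected + softmax
    (normalization) heads. All layer sizes are assumed, not claimed."""
    def __init__(self, feat_dim=64, n_first_attrs=10, n_second_attrs=35):
        super().__init__()
        # first / second feature extraction branches (610 / 620), sketched as tiny CNNs
        self.branch1 = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1),
                                     nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.branch2 = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1),
                                     nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        # feature fusion layer (630): here a linear map over concatenated features
        self.fusion = nn.Linear(2 * feat_dim, feat_dim)
        # first fully-connected layer (640) feeding the first normalization layer (650)
        self.head1 = nn.Linear(feat_dim, n_first_attrs)
        # second fully-connected layer (660) feeding the second normalization layer (670)
        self.head2 = nn.Linear(feat_dim, n_second_attrs)

    def forward(self, first_image, second_image):
        f1 = self.branch1(first_image).flatten(1)        # first image feature
        f2 = self.branch2(second_image).flatten(1)       # second image feature
        fused = self.fusion(torch.cat([f1, f2], dim=1))  # fused feature
        first_attrs = torch.softmax(self.head1(f1), dim=1)      # first attribute info
        second_attrs = torch.softmax(self.head2(fused), dim=1)  # second attribute info
        return first_attrs, second_attrs

model = DualBranchAttributeModel()
a1, a2 = model(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32))
```

Note that, as in fig. 6, the first head reads the first image feature directly, while the second head reads the fused feature.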
The method and the device can select different model structures based on different requirements, and can be suitable for more possible scenes.
As an exemplary embodiment, fig. 7 is a flowchart illustrating still another image attribute identification method according to an exemplary embodiment of the present disclosure. The method may comprise the steps of:
S701, acquiring a first image.
It is understood that the implementation process of S701 is similar to that of S201, and for convenience of description, the present disclosure is not repeated herein.
S702, the first image is cropped and filled to determine a second image.
The network device or the terminal device may detect the first image by using a pre-configured detection model to determine the position of the target object in the first image. The first image is then cropped based on that position to obtain a local image of the corresponding region. The local image may then be filled (padded) to determine the second image. It can be understood that the second image obtained after cropping and filling has the same pixel dimensions as the first image, i.e. the two images are the same size. The position of the target object in the first image is the region position information described above.
In some examples, the filling may be performed by any existing method. For example, the second image may be made the same size, i.e. the same pixel dimensions, as the first image. Of course, in other examples, the size of the second image may instead be made to satisfy a preset condition, such as a preset fixed pixel size, or L times the size of the first image, where L may be any positive number.
It can be understood that the purpose of filling the local image cropped from the corresponding region of the first image is to ensure that the second image is not identical to that local image, so as to avoid the extracted features of the two being identical.
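For illustration only, the crop-and-fill of S702 can be sketched as follows. The `(top, left, height, width)` box layout, the zero fill value, and placing the crop at the top-left corner are all assumptions of the sketch; the disclosure only requires that the second image end up the same size as the first.

```python
import numpy as np

def crop_and_pad(first_image, box, pad_value=0):
    """Sketch of S702: crop the detected target-object region out of the
    first image, then fill the crop back out to the first image's size."""
    top, left, h, w = box
    local = first_image[top:top + h, left:left + w]  # local image of the region
    second_image = np.full_like(first_image, pad_value)
    # place the crop at the top-left corner; the fill occupies the rest, so the
    # second image has the same pixel dimensions as the first image
    second_image[:h, :w] = local
    return second_image

first = np.arange(6 * 8 * 3).reshape(6, 8, 3).astype(np.uint8)
second = crop_and_pad(first, (1, 2, 2, 3))
```

The fill keeps the second image from being identical to the bare crop, consistent with the purpose stated above.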
Of course, in some embodiments, the scheme described above with respect to fig. 7 may also be implemented on an acquisition device, such as acquisition device 120. After the acquisition device 120 determines the second image by the method shown in fig. 7, the first image, the second image and the area location information may be sent to a network device, such as the network device 130.
It will of course be appreciated that fig. 7 only depicts the case where the second image can be determined from the first image. In other examples, the second image may be preconfigured, and the disclosure is not limited.
S703, inputting the first image and the second image into a single attribute identification model to obtain the target image attribute.
It is understood that the implementation process of S703 is similar to S203, and for convenience of description, the disclosure is not repeated herein.
The second image can be determined through the first image, so that the fact that multiple types of different attributes can be accurately identified based on one image in the actual application process is guaranteed, and meanwhile the fact that the model can have certain identification precision is guaranteed.
In some embodiments, in the image attribute identification method described in fig. 2 to 7, the first image may be a vehicle image, the second image may be a license plate image, and the target object may be a license plate.
The vehicle image and the license plate image are taken as examples, so that when the relevant attributes of the vehicle are identified, the difficulty and complexity of model deployment can be reduced, meanwhile, the model identification efficiency can be ensured, and the model identification precision is improved.
As an exemplary embodiment, fig. 8 is a flowchart illustrating an attribute recognition model training method according to an exemplary embodiment of the present disclosure. The method may be applied to the network device 130, which may train a single initial recognition model to obtain a trained single attribute recognition model. That is, before the methods described above in fig. 2 to 7 are performed, the single attribute recognition model can be trained by this method. The method may comprise the following steps:
S801, acquiring a training sample set.
In some examples, the set of training samples may include at least one training sample set. Each training sample set may include a first training image carrying a first label, a second training image carrying a second label, and region location information. It will be appreciated that the region position information during the training phase is used to represent the region position of the second training image in the first training image. The first training image is an image including a target object, and the second training image is a partial region image including the target object in the first training image. The first label is used for representing real attribute information of the first training image, and the second label is used for representing real attribute information of the second training image.
It should be noted that the region position information described in the training phase is to be understood as the region position of the second training image in the first training image. Whereas the region position information in the application phase should be understood as the region position of the second image in the first image.
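For illustration only, one training sample group of S801 might be organized as below. The dict layout and all field values are assumptions of the sketch; the disclosure specifies only the contents of a group (two labeled training images plus region position information), not a container format.

```python
# One hypothetical training sample group: a first training image with its
# first label, a second training image with its second label, and the region
# position of the second training image inside the first training image.
sample_group = {
    "first_training_image": "vehicle_0001.jpg",   # image containing the target object
    "first_label": "color=white;type=sedan",      # real attribute info of the first image
    "second_training_image": "plate_0001.jpg",    # partial-region image of the target object
    "second_label": "plate_color=blue",           # real attribute info of the second image
    "region_location": {"top": 120, "left": 340, "height": 40, "width": 140},
}
training_sample_set = [sample_group]  # the set holds at least one such group
```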
S802, for each training sample group, inputting the training sample group into the single initial recognition model, and determining first training attribute information of the first training image and second training attribute information of the second training image.
In some examples, in the training phase, iterative training may be performed on a per-training-sample-group basis. For example, each training sample group may be input into the single initial recognition model in turn. For each training sample group, first training attribute information of the first training image and second training attribute information of the second training image may be determined at each training iteration.
It can be understood that the model structure of a single initial recognition model may be as shown in fig. 3, fig. 5, and fig. 6, and the data processing process in the specific model is similar to the process of obtaining the target image attribute by recognition in fig. 2, fig. 4, and fig. 7, and may specifically refer to corresponding descriptions in fig. 2, fig. 4, and fig. 7, and the disclosure is not repeated herein.
S803, adjusting the single initial recognition model according to the first label, the first training attribute information, the second label and the second training attribute information, and determining a single first attribute recognition model.
In some examples, parameters of the single initial recognition model may be adjusted based on the first label, the first training attribute information, the second label, and the second training attribute information, according to a pre-configured loss function. The loss function may be, for example, a cross-entropy loss function. Of course, in other examples, any other suitable loss function may be used for training; the present disclosure is not limited in this respect.
In some examples, after training on a large number of training sample groups, once the model converges, the single first attribute recognition model is determined.
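For illustration only, the S802–S803 loop can be sketched as below with a toy two-head model and one fabricated sample group; the model shapes, optimizer, learning rate, and iteration count are all assumptions. The cross-entropy loss matches the example loss function named above.

```python
import torch
import torch.nn as nn

# Toy stand-in for the single initial recognition model: a shared backbone
# with one head per kind of training attribute information.
torch.manual_seed(0)
model = nn.ModuleDict({
    "backbone": nn.Linear(16, 8),
    "head1": nn.Linear(8, 4),   # predicts first training attribute information
    "head2": nn.Linear(8, 6),   # predicts second training attribute information
})
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()  # the pre-configured loss function

# One fabricated training sample group: feature-vector stand-ins plus labels.
x = torch.randn(5, 16)
first_label = torch.randint(0, 4, (5,))
second_label = torch.randint(0, 6, (5,))

losses = []
for _ in range(20):  # iterate toward (toy) convergence
    feats = torch.relu(model["backbone"](x))
    # S803: adjust the model from both labels and both predictions at once
    loss = loss_fn(model["head1"](feats), first_label) \
         + loss_fn(model["head2"](feats), second_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Summing the two cross-entropy terms is one simple way to let a single backward pass adjust the whole single model, which is the point of training one model rather than two.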
It will be appreciated that a single first attribute identification model, i.e. the single attribute identification model referred to in figures 2 to 7, is provided.
The single model is obtained through training, so that the single model can be used for carrying out attribute recognition on different images simultaneously, the model recognition efficiency is improved, and the running speed reduction caused by the joint running of a plurality of models is avoided.
It can be understood that, the corresponding parameters in each network layer in the model are adjusted in the process of training the model, so that after the training of the single initial recognition model is completed, each parameter in the model is also optimized, and thus the single first attribute recognition model is obtained. That is, a single initial recognition model differs from a single first attribute recognition model only in that the corresponding parameters in the models differ.
In one embodiment, fig. 9 is a flowchart of a method for training an attribute recognition model according to an embodiment of the present disclosure. For example, determining the first training attribute information of the first training image and the second training attribute information of the second training image in S802 may include the following steps:
S901, respectively extracting the features of the first training image and the second training image through different channels of a single initial recognition model to obtain the first training image feature and the second training image feature.
In some examples, feature extraction may be performed separately on the first training image and the second training image through different channels of a single initial recognition model. Thereby obtaining a first training image characteristic corresponding to the first training image and a second training image characteristic corresponding to the second training image.
Taking the model structures shown in fig. 3 and fig. 6 as an example, the first training image may be input into the first feature extraction branch 310 or the first feature extraction branch 610 of the single initial recognition model for feature extraction, so as to determine the first training image feature corresponding to the first training image. And inputting the second training image into the second feature extraction branch 320 or the second feature extraction branch 620 of the single initial recognition model for feature extraction, so as to determine a second training image feature corresponding to the second training image.
S902, obtaining first training attribute information based on the first training image feature recognition, and obtaining second training attribute information based on the second training image feature recognition.
In some examples, after the first training image feature and the second training image feature are determined through S901, attribute recognition may be performed according to the first training image feature to obtain first training attribute information through recognition; and performing attribute recognition according to the second training image characteristics to recognize and obtain second training attribute information.
According to the method and the device, through different channels in the single model, the attribute identification of different images is realized, the attribute identification of different images by using a plurality of models is avoided, the identification efficiency of the models is improved, and the reduction of the running speed caused by the common running of the plurality of models is avoided.
In one embodiment, the identifying and obtaining second training attribute information based on the second training image feature in S902 may include: determining a region training sub-feature from the first training image features based on the region location information; fusing the regional training sub-features with the second training image features to obtain training fused features; second training attribute information is determined based on the training fusion features.
In some examples, region training sub-features may be determined from the first training image features based on the region location information. And then, fusing the region training sub-features and the second training image features to obtain training fusion features. Second training attribute information may then be determined based on the training fusion features.
For example, after determining the first training image feature and determining the second training image feature, the first training image feature and the second training image feature may be input into the feature fusion layer 330 or the feature fusion layer 630 of the single initial recognition model. In some cases, the region location information is also input into the feature fusion layer 330 or the feature fusion layer 630 of a single initial recognition model. The training fused features are obtained through the feature fusion layer 330 or the feature fusion layer 630.
It can be understood that the process of obtaining the training fusion feature is similar to the process of obtaining the fusion feature in the application stage, and specific reference may be made to the corresponding description, which is not repeated herein.
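For illustration only, the slicing-and-fusing step of the feature fusion layer can be sketched as below. The `(top, left, height, width)` region layout in feature-map coordinates and the element-wise weighted sum are assumptions; the disclosure does not fix a particular fusion operator.

```python
import numpy as np

def fuse_region_subfeature(first_feat, second_feat, region, alpha=0.5):
    """Sketch of the feature fusion layer (330/630): slice the region
    training sub-feature out of the first training image's feature map
    using the region position information, then fuse it with the second
    training image's feature. Weighted addition is an assumed rule."""
    top, left, h, w = region  # region position information, feature-map coordinates
    region_sub = first_feat[:, top:top + h, left:left + w]  # region training sub-feature
    # training fusion feature: weighted sum of sub-feature and second image feature
    return alpha * region_sub + (1 - alpha) * second_feat

first_feat = np.ones((4, 8, 8))        # C x H x W feature map of the first training image
second_feat = np.full((4, 2, 3), 3.0)  # matching-size feature of the second training image
fused = fuse_region_subfeature(first_feat, second_feat, (1, 2, 2, 3))
```

Because the sub-feature is cut from the full feature map, the fused result carries both local (second image) and surrounding-context (first image) information, which is what the paragraph above attributes to the training fusion feature.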
The method and the device have the advantages that the regional training sub-features and the second training image features are fused based on the regional position information, so that the fused training fusion features can carry more information, and more accurate second training attribute information can be identified in a follow-up mode.
In some embodiments, S803 may combine the first training image feature and the training fusion feature based on the model structure shown in fig. 3, and perform attribute recognition on the combined feature to determine the first training attribute information and the second training attribute information. Of course, in some examples, based on the model structure shown in fig. 3 or fig. 6, the attribute recognition may be performed on the first training image feature to determine corresponding first training attribute information, and the attribute recognition may be performed on the training fusion feature to determine second training attribute information, respectively.
The specific implementation process may refer to corresponding description in the application of the model, and this disclosure is not repeated herein.
It is understood that, in the training process, after the first attribute information and the second attribute information are determined, parameters in corresponding layers in a single initial recognition model may be adjusted by using a loss function in combination with the first label and the second label. In one example, the adjustment may be made using, for example, a cross entropy loss function.
For example, parameters of the corresponding layers in the first feature extraction branch 310 or the first feature extraction branch 610 may be adjusted using a cross-entropy loss function according to the first label and the first attribute information, over a large number of training sample groups in the training sample set, until the single initial recognition model converges. Similarly, parameters of the corresponding layers in the second feature extraction branch 320 or the second feature extraction branch 620 may be adjusted using a cross-entropy loss function according to the second label and the second attribute information, again over a large number of training sample groups, until the single initial recognition model converges. When the training of the first feature extraction branch 310 or 610 and of the second feature extraction branch 320 or 620 is completed, training of the initial recognition model may be considered complete. At this point, the parameters of the corresponding layers in the single initial recognition model are the trained parameters, so the trained single initial recognition model is the single first attribute recognition model.
The method and the device can select different model structures based on different requirements, and can be suitable for more possible scenes.
As an exemplary embodiment, fig. 10 is a flowchart illustrating a further method for training an attribute recognition model according to an exemplary embodiment of the present disclosure. The method may obtain a single second attribute recognition model by performing secondary training on the single first attribute recognition model obtained by training in fig. 8 to 9. Therefore, after S803, the method may further include the steps of:
S1001, based on the region training sub-features and the second training image features, the single first attribute recognition model is adjusted, and the single second attribute recognition model is determined.
In some examples, after training to obtain the single first attribute identification model, further training may be performed. For example, a loss function may be calculated based on the region training sub-features and the second training image features obtained during training; a cross-entropy loss function may be employed, for example. Iterative training is then continued until the single first attribute recognition model converges, yielding a single second attribute recognition model.
In the application stage, the single second attribute identification model may retain only the first feature extraction branch 310 or the first feature extraction branch 610 of the model.
It will be appreciated that the training process of fig. 10 described above is intended to make the region sub-features corresponding to the features extracted by the first feature extraction branch of the attribute recognition model closer to the features extracted by the second feature extraction branch of the single first attribute recognition model. As a result, the features extracted by the first feature extraction branch of the single second attribute identification model can still effectively identify the additional attribute information, such as the second attribute information.
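For illustration only, this secondary training can be sketched as a feature-alignment loop: the first branch is pulled toward the frozen second branch. An MSE alignment loss is an assumption of the sketch (the disclosure names cross-entropy as one example without fixing the loss), as are the linear branches, learning rate, and iteration count.

```python
import torch
import torch.nn as nn

# Sketch of the fig. 10 secondary training: make the region sub-features
# from the first feature extraction branch approach the features from the
# second branch, so the first branch alone later suffices.
torch.manual_seed(0)
branch1 = nn.Linear(16, 8)      # first feature extraction branch (being adjusted)
branch2 = nn.Linear(16, 8)      # second branch, frozen as the alignment target
for p in branch2.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(branch1.parameters(), lr=0.05)
x_region = torch.randn(10, 16)  # stand-in for the target-object region of the first image
x_second = x_region.clone()     # same content, as seen by the second branch

gaps = []
for _ in range(50):
    region_sub = branch1(x_region)  # region training sub-feature
    target = branch2(x_second)      # second training image feature
    gap = ((region_sub - target) ** 2).mean()  # assumed MSE alignment loss
    optimizer.zero_grad()
    gap.backward()
    optimizer.step()
    gaps.append(gap.item())
```

As the gap shrinks, the first branch's region sub-features become usable wherever the second branch's features were, which is exactly why the second branch can be dropped at application time.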
In some examples, when training to obtain a single second attribute recognition model, a new training sample set may also be used for training. For example, the second training sample set may include at least one second training sample group, and each second training sample group may include a third training image, a fourth training image, and second region location information. The second region position information is used for representing the region position of a fourth training image in a third training image, the third training image comprises an image of a target object, and the fourth training image is a partial region image of the third training image comprising the target object.
It may be understood that, in this example, the training sample set acquired in S801 may be referred to as a first training sample set, and the area location information in the first training sample set may be referred to as first area location information.
Then, each training sample group in the second training sample set is input into the single first attribute recognition model for training. For example, for each second training sample group, a third training image feature corresponding to the third training image and a fourth training image feature corresponding to the fourth training image are determined. Second region training sub-features are then determined from the third training image features based on the second region location information. It is to be understood that, in this example, the region training sub-features involved in training the single first attribute recognition model may be referred to as first region training sub-features. The single first attribute recognition model may then be adjusted using a loss function based on the second region training sub-features and the fourth training image features, to determine the single second attribute recognition model. Of course, in other examples, any equivalent loss function may be used to adjust the first attribute recognition model, and the disclosure is not limited thereto.
In some examples, the second region training sub-features may be determined by the feature fusion layer 330 or the feature fusion layer 630 of a single first attribute recognition model. For example, the feature fusion layer 330 or 630 of the single first attribute recognition model may determine, based on the second region position information and the third training image feature, a partial feature corresponding to the second region position information in the third training image feature, where the partial feature is a second region training sub-feature.
The present disclosure determines a single second attribute identification model in the above manner, so that the schemes described in fig. 2 to 7 can also use the single second attribute identification model for image attribute identification. It is to be understood that since the single second attribute identification model only retains the first feature extraction branch, identification can be performed directly based on the first image, thereby identifying the first attribute information and the second attribute information. The scheme can further reduce the deployment difficulty of the model, reduce the space occupied by the model, simultaneously ensure the recognition efficiency and improve the recognition precision.
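For illustration only, applying the single second attribute identification model can be sketched as below: only the first feature extraction branch runs, and both attribute heads read from its output. The layer sizes and attribute counts are assumptions of the sketch.

```python
import torch
import torch.nn as nn

# After the fig. 10 training, only the first feature extraction branch is kept,
# so a single forward pass over the first image yields both the first and the
# second attribute information.
torch.manual_seed(0)
branch1 = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
first_head = nn.Linear(8, 5)    # e.g. vehicle-related attributes (assumed count)
second_head = nn.Linear(8, 7)   # e.g. license-plate-related attributes (assumed count)

first_image = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    feat = branch1(first_image)  # only the retained branch is executed
    first_attrs = torch.softmax(first_head(feat), dim=1)
    second_attrs = torch.softmax(second_head(feat), dim=1)
```

Dropping the second branch is what shrinks the deployed model and removes the need for a separate second image at inference time.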
In some embodiments, the first training image may be an image of a vehicle, the second training image may be an image of a license plate, and the target object is the license plate.
Of course, in other embodiments, the third training image may be a vehicle image, and the fourth training image may be a license plate image.
The vehicle image and the license plate image are taken as examples, so that when the relevant attributes of the vehicle are identified, the difficulty and complexity of model deployment can be reduced, meanwhile, the model identification efficiency can be ensured, and the model identification precision is improved.
By taking the example that the first feature extraction branch extracts the vehicle image features for the vehicle image and the second feature extraction branch extracts the license plate image features for the license plate image, it can be understood that the single first attribute identification model ensures that more accurate vehicle attribute information and license plate attribute information can be identified simultaneously by fusing the vehicle image features and the license plate image features. The single second attribute recognition model can ensure that the vehicle image features are extracted only by using the vehicle image, and more accurate vehicle attribute information and license plate attribute information can be finally recognized.
The vehicle image and the license plate image are taken as examples, so that when the relevant attributes of the vehicle are identified, the difficulty and complexity of model deployment can be reduced, meanwhile, the model identification efficiency can be ensured, and the model identification precision is improved.
As an exemplary embodiment, the present disclosure will be set forth below taking a vehicle image and a license plate image as an example, described in conjunction with fig. 1. Of course, the network device 130 in fig. 1 may be replaced by a terminal device; the present embodiment will be described taking a network device as an example.
For example, a vehicle image containing the vehicle 110 may first be acquired by the acquisition device 120. The vehicle image is then detected by a pre-detector to determine the license plate position. It will be appreciated that the pre-detector may be a pre-configured detection model, located either on the acquisition device 120 or on the network device 130. For example, if the pre-configured detection model is located on the acquisition device 120, the acquisition device may detect the vehicle image, determine the license plate position, crop and fill the license plate area in the vehicle image to obtain a license plate image, and then send the vehicle image, the license plate image, and the license plate position to the network device 130. As another example, if the pre-configured detection model is located on the network device 130, the acquisition device directly transmits the acquired vehicle image to the network device 130, which detects the vehicle image with the pre-configured detection model to determine the license plate position, and then crops and fills the license plate area in the vehicle image to obtain the license plate image. It can be understood that the vehicle image is the first image, the license plate image is the second image, the license plate position is the region position information, and the license plate is the target object.
Then, the network device may use the trained single attribute recognition model (i.e., the single first attribute recognition model) to perform corresponding feature extraction on the vehicle image and the license plate image, respectively. And determining license plate fusion characteristics based on the license plate position, the vehicle characteristics and the license plate characteristics. This process is similar to the process of determining the fusion characteristics in fig. 2-7, and the present disclosure is not repeated herein. And then, performing attribute identification on the vehicle features and the license plate fusion features to determine vehicle related attributes and license plate related attributes. The process of determining the vehicle-related attribute and the license plate-related attribute is similar to the process of determining the first attribute information and the second attribute information in fig. 2 to 7, and details of the disclosure are not repeated herein.
As an exemplary embodiment, when performing image attribute recognition, the present disclosure may further use a single second attribute recognition model determined by the method described in fig. 10. For example, the first image may be directly subjected to attribute recognition using a single second attribute recognition model to simultaneously determine the first attribute information and the second attribute information. Still, the first image is taken as a vehicle image, the first attribute information is taken as a vehicle related attribute, and the second attribute information is taken as a license plate related attribute for example. In this example, a single second attribute recognition model may be used to directly perform attribute recognition on the vehicle image to determine the vehicle-related attribute and the license plate-related attribute. The reason for this is that, in the course of training of the single second attribute recognition model, the distance between the features extracted by the first feature extraction branch and the features extracted by the second feature extraction branch is reduced. Therefore, more accurate features can be extracted by only utilizing the first feature extraction branch. For example, in the training process, the license plate part features in the vehicle features are gradually close to the license plate features determined by the second feature extraction branch through adjustment of the model parameters. Therefore, in the application stage, the feature extraction can be directly carried out on the vehicle image, and the license plate part features in the vehicle features can be well used for carrying out license plate related attribute recognition.
Based on the same conception, the embodiment of the disclosure also provides an image attribute recognition device and an attribute recognition model training device.
It is understood that, in order to implement the above functions, the image attribute identification apparatus and the attribute identification model training apparatus provided in the embodiments of the present disclosure include hardware structures and/or software modules corresponding to the respective functions. The disclosed embodiments can be implemented in hardware or a combination of hardware and computer software, in combination with the exemplary elements and algorithm steps disclosed in the disclosed embodiments. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
As an exemplary embodiment, fig. 11 is a schematic diagram of an image attribute recognition apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 11, an image attribute identification apparatus 1100 is provided, and the apparatus 1100 may implement any one of the methods mentioned in fig. 2 to 7. The apparatus 1100 may include: an obtaining module 1101, configured to obtain a first image, where the first image is an image that includes a target object; a determining module 1102, configured to determine a second image according to the first image, where the second image is a partial area image including the target object in the first image; the identifying module 1103 is configured to input the first image and the second image into a single attribute identification model to obtain a target image attribute, where the target image attribute includes first attribute information of the first image and second attribute information corresponding to the target object.
According to the method and the device, the attribute recognition can be simultaneously carried out on different images through one model, the model recognition efficiency is improved, and the running speed reduction caused by the common running of a plurality of models is avoided.
In one possible implementation, the identifying module 1103 is further configured to: respectively extracting the features of the first image and the second image through different channels of a single attribute identification model to obtain a first image feature and a second image feature; and identifying and obtaining the target image attribute based on the first image characteristic and the second image characteristic.
By recognizing attributes of different images through different channels within a single model, the need to use a plurality of models for different images is avoided, which improves recognition efficiency and avoids the reduction in running speed caused by running a plurality of models together.
In one possible implementation, the determining module 1102 is further configured to: determine region position information, where the region position information is used to represent the region position of the second image in the first image; and the identifying module 1103 is further configured to: determine region sub-features from the first image features based on the region position information; fuse the region sub-features with the second image features to obtain fused features; and determine the second attribute information based on the fused features.
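The fusion step described above may be pictured with the following purely illustrative sketch. The feature-map size, the (row, column) region encoding, and concatenation as the fusion operation are assumptions of this example; a real model might instead add, stack channel-wise, or apply attention.

```python
# Sketch: use region position information to slice a sub-feature out of
# the first image's feature map, then fuse it with the second image's
# features so the head sees the region both in context and in detail.

def region_sub_features(feature_map, region):
    # region = (row0, col0, row1, col1) in feature-map coordinates.
    r0, c0, r1, c1 = region
    return [row[c0:c1] for row in feature_map[r0:r1]]

def fuse(region_feats, second_feats):
    # Simple fusion by concatenating the flattened region sub-features
    # with the second image features.
    flat_region = [v for row in region_feats for v in row]
    return flat_region + second_feats

first_feature_map = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
]
second_features = [1.0, 2.0]
sub = region_sub_features(first_feature_map, (1, 1, 3, 3))
fused = fuse(sub, second_features)
# `fused` now carries both in-context and dedicated features, from which
# the second attribute information can be determined.
```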
Because the region sub-features and the second image features are fused based on the region position information, the fused features carry more information, so that more accurate second attribute information can be identified subsequently.
In one possible implementation, the determining module 1102 is further configured to: crop and pad the first image to determine the second image, where the second image has the same pixel dimensions as the first image.
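A minimal sketch of this crop-and-pad step follows. The zero fill value, the top-left placement of the crop, and the (row, column) box encoding are assumptions of this illustration, not requirements of the embodiments; the point is only that both model inputs end up with the same pixel dimensions.

```python
# Sketch: crop the target-object region out of the first image, then pad
# the crop back to the first image's pixel dimensions so that the first
# and second images share a single input shape.

def crop_and_pad(image, box, fill=0):
    r0, c0, r1, c1 = box           # region of the target object
    h, w = len(image), len(image[0])
    crop = [row[c0:c1] for row in image[r0:r1]]
    padded = [[fill] * w for _ in range(h)]  # canvas at original size
    for i, row in enumerate(crop):
        for j, v in enumerate(row):
            padded[i][j] = v       # place the crop at the top-left
    return padded

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # stand-in first image
second = crop_and_pad(img, (0, 1, 2, 3))  # target occupies a sub-box
assert len(second) == len(img) and len(second[0]) == len(img[0])
```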
Because the second image can be determined from the first image, multiple different types of attributes can be accurately identified from a single input image in practical applications, while the model retains a certain identification accuracy.
In one possible embodiment, the first image is a vehicle image, the second image is a license plate image, and the target object is a license plate.
Taking the vehicle image and the license plate image as an example, when vehicle-related attributes are identified, the difficulty and complexity of model deployment can be reduced while recognition efficiency is maintained and recognition accuracy is improved.
As an exemplary embodiment, fig. 12 is a schematic diagram of an attribute recognition model training apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 12, an attribute recognition model training apparatus 1200 is provided, where the apparatus 1200 may implement any one of the methods mentioned above in fig. 8 to 10. The apparatus 1200 may include: an obtaining module 1201, configured to obtain a training sample set, where the training sample set includes at least one training sample group, and each training sample group includes a first training image carrying a first label, a second training image carrying a second label, and region position information, where the first training image is an image including a target object, the second training image is a partial region image including the target object in the first training image, and the region position information is used to indicate the region position of the second training image in the first training image; and a training module 1202, configured to: for each training sample group, input the training sample group to a single initial recognition model, and determine first training attribute information of the first training image and second training attribute information of the second training image; and adjust the single initial recognition model according to the first label, the first training attribute information, the second label, and the second training attribute information to determine a single first attribute recognition model.
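The joint adjustment step can be pictured with the following illustrative sketch. The squared-error loss and the equal weighting of the two terms are assumptions of this example only; the embodiments do not specify a particular loss function.

```python
# Sketch of one training step: the single initial model predicts both
# attribute groups, and one combined loss over (first label, first
# prediction) and (second label, second prediction) drives the adjustment.

def squared_error(label, prediction):
    return (label - prediction) ** 2

def joint_loss(first_label, first_pred, second_label, second_pred):
    # Adjusting the single model against both labels at once is what
    # lets one network learn to recognize both images' attributes.
    return (squared_error(first_label, first_pred)
            + squared_error(second_label, second_pred))

loss = joint_loss(first_label=1.0, first_pred=0.6,
                  second_label=0.0, second_pred=0.2)
```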
By training a single model in this way, attribute recognition can be performed on different images simultaneously with that one model, which improves model recognition efficiency and avoids the reduction in running speed caused by running a plurality of models together.
In one possible implementation, the training module 1202 is further configured to: respectively extracting the features of the first training image and the second training image through different channels of a single initial recognition model to obtain the features of the first training image and the features of the second training image; and obtaining first training attribute information based on the first training image feature recognition, and obtaining second training attribute information based on the second training image feature recognition.
By recognizing attributes of different images through different channels within a single model, the need to use a plurality of models for different images is avoided, which improves recognition efficiency and avoids the reduction in running speed caused by running a plurality of models together.
In one possible implementation, the training module 1202 is further configured to: determine region training sub-features from the first training image features based on the region position information; fuse the region training sub-features with the second training image features to obtain training fusion features; and determine the second training attribute information based on the training fusion features.
Because the region training sub-features and the second training image features are fused based on the region position information, the resulting training fusion features carry more information, so that more accurate second training attribute information can be identified subsequently.
In one possible implementation, the training module 1202 is further configured to: adjust the single first attribute recognition model based on the region training sub-features and the second training image features to determine a single second attribute recognition model.
This further reduces the deployment difficulty of the model and the space the model occupies, while maintaining recognition efficiency and improving recognition accuracy.
In one possible embodiment, the first training image is a vehicle image, the second training image is a license plate image, and the target object is a license plate.
Taking the vehicle image and the license plate image as an example, when vehicle-related attributes are identified, the difficulty and complexity of model deployment can be reduced while recognition efficiency is maintained and recognition accuracy is improved.
The specific manner in which the various modules perform operations has been described in detail above in relation to the corresponding method embodiments of the present disclosure and will not be elaborated upon here.
In the technical solutions of the present disclosure, the acquisition, storage, and application of the personal information of related users all comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an image attribute recognition device, an attribute recognition model training device, a readable storage medium, and a computer program product.
Fig. 13 shows a schematic block diagram of a device 1300 that may be used to implement embodiments of the present disclosure. It is to be appreciated that the apparatus 1300 may be an image attribute recognition apparatus or an attribute recognition model training apparatus. The apparatus 1300 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, server clusters, and other appropriate computers. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 13, the apparatus 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1302 or a computer program loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 can also store various programs and data necessary for the operation of the device 1300. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other via a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.
A number of components in the device 1300 are connected to the I/O interface 1305, including: an input unit 1306, such as a keyboard, a mouse, or the like; an output unit 1307, such as various types of displays, speakers, and the like; a storage unit 1308, such as a magnetic disk, an optical disk, or the like; and a communication unit 1309, such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1309 allows the device 1300 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
Computing unit 1301 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of computing unit 1301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1301 performs the methods and processes described above, such as any one of the image attribute recognition methods described in fig. 2 to 7, and/or any one of the attribute recognition model training methods described in fig. 8 to 10. For example, in some embodiments, any of the image attribute recognition methods described in fig. 2 to 7, and/or any of the attribute recognition model training methods described in fig. 8 to 10, may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1308. In some embodiments, some or all of the computer program may be loaded onto and/or installed onto the device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program is loaded into the RAM 1303 and executed by the computing unit 1301, it may perform one or more steps of the image attribute recognition method and/or the attribute recognition model training method described above. Alternatively, in other embodiments, the computing unit 1301 may be configured in any other suitable manner (e.g., by means of firmware) to perform the image attribute recognition method described above with respect to fig. 2 to 7, and/or to perform the attribute recognition model training method described above with respect to fig. 8 to 10.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server that incorporates a blockchain. Of course, in some examples, a server may also refer to a cluster of servers.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. An image attribute identification method, comprising:
acquiring a first image, wherein the first image is an image comprising a target object;
determining a second image according to the first image, wherein the second image is a partial area image including the target object in the first image;
and inputting the first image and the second image into a single attribute recognition model to obtain target image attributes, wherein the target image attributes comprise first attribute information of the first image and second attribute information corresponding to the target object.
2. The method of claim 1, wherein the deriving target image attributes comprises:
respectively extracting the features of the first image and the second image through different channels of the single attribute identification model to obtain a first image feature and a second image feature;
and identifying and obtaining the target image attribute based on the first image characteristic and the second image characteristic.
3. The method of claim 2, wherein the method further comprises:
determining region position information, wherein the region position information is used for representing the region position of the second image in the first image;
identifying a target image attribute based on the first image feature and the second image feature, comprising:
determining region sub-features from the first image features based on the region location information;
fusing the region sub-features with the second image features to obtain fused features;
determining the second attribute information based on the fused feature.
4. The method of any of claims 1-3, wherein the determining a second image from the first image comprises:
cropping and padding the first image to determine the second image, wherein the second image has the same pixel dimensions as the first image.
5. The method of any one of claims 1-4, wherein the first image is a vehicle image, the second image is a license plate image, and the target object is a license plate.
6. An attribute recognition model training method comprises the following steps:
acquiring a training sample set, wherein the training sample set comprises at least one training sample group, each training sample group comprises a first training image carrying a first label, a second training image carrying a second label and region position information, the first training image is an image comprising a target object, the second training image is a partial region image comprising the target object in the first training image, and the region position information is used for representing the region position of the second training image in the first training image;
for each training sample set, inputting the training sample set into a single initial recognition model, and determining first training attribute information of the first training image and second training attribute information of the second training image;
and adjusting the single initial recognition model according to the first label, the first training attribute information, the second label and the second training attribute information, and determining the single first attribute recognition model.
7. The method of claim 6, wherein the determining first training attribute information for the first training image and second training attribute information for the second training image comprises:
respectively extracting features of the first training image and the second training image through different channels of the single initial recognition model to obtain a first training image feature and a second training image feature;
and obtaining the first training attribute information based on the first training image feature recognition, and obtaining the second training attribute information based on the second training image feature recognition.
8. The method of claim 7, wherein the deriving the second training attribute information based on the second training image feature recognition comprises:
determining region training sub-features from the first training image features based on the region location information;
fusing the region training sub-features with the second training image features to obtain training fused features;
determining the second training attribute information based on the training fusion features.
9. The method of claim 8, wherein the method further comprises:
and adjusting the single first attribute recognition model based on the region training sub-features and the second training image features to determine a single second attribute recognition model.
10. The method of any of claims 6-9, wherein the first training image is a vehicle image, the second training image is a license plate image, and the target object is a license plate.
11. An image attribute identification apparatus comprising:
an acquisition module, configured to acquire a first image, wherein the first image is an image comprising a target object;
a determining module, configured to determine a second image according to the first image, where the second image is a partial region image of the first image that includes the target object;
and the identification module is used for inputting the first image and the second image into a single attribute identification model to obtain the target image attribute, wherein the target image attribute comprises first attribute information of the first image and second attribute information corresponding to the target object.
12. The apparatus of claim 11, wherein the identification module is further configured to:
respectively extracting the features of the first image and the second image through different channels of the single attribute identification model to obtain a first image feature and a second image feature;
and identifying and obtaining the target image attribute based on the first image characteristic and the second image characteristic.
13. The apparatus of claim 12, wherein the means for determining is further configured to:
determining region position information, wherein the region position information is used for representing the region position of the second image in the first image;
the identification module is further configured to:
determining region sub-features from the first image features based on the region location information;
fusing the region sub-features with the second image features to obtain fused features;
determining the second attribute information based on the fused feature.
14. The apparatus of any of claims 11-13, wherein the means for determining is further configured to:
cropping and padding the first image to determine the second image, wherein the second image has the same pixel dimensions as the first image.
15. The apparatus of any one of claims 11-14, wherein the first image is a vehicle image, the second image is a license plate image, and the target object is a license plate.
16. An attribute recognition model training device, comprising:
an obtaining module, configured to obtain a training sample set, where the training sample set includes at least one training sample group, and each training sample group includes a first training image carrying a first label, a second training image carrying a second label, and area position information, where the first training image is an image including a target object, the second training image is a partial area image including the target object in the first training image, and the area position information is used to indicate an area position of the second training image in the first training image;
a training module, configured to input the training sample set to a single initial recognition model for each training sample set, and determine first training attribute information of the first training image and second training attribute information of the second training image; and adjusting the single initial recognition model according to the first label, the first training attribute information, the second label and the second training attribute information to determine a single first attribute recognition model.
17. The apparatus of claim 16, wherein the training module is further to:
respectively extracting the features of the first training image and the second training image through different channels of the single initial recognition model to obtain a first training image feature and a second training image feature;
and obtaining the first training attribute information based on the first training image feature recognition, and obtaining the second training attribute information based on the second training image feature recognition.
18. The apparatus of claim 17, wherein the training module is further configured to:
determining region training sub-features from the first training image features based on the region location information;
fusing the region training sub-features with the second training image features to obtain training fused features;
determining the second training attribute information based on the training fusion features.
19. The apparatus of claim 18, wherein the training module is further configured to:
and adjusting the single first attribute recognition model based on the region training sub-features and the second training image features to determine a single second attribute recognition model.
20. The apparatus of any one of claims 16-19, wherein the first training image is a vehicle image, the second training image is a license plate image, and the target object is a license plate.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202210675231.7A 2022-06-15 2022-06-15 Image attribute identification method, attribute identification model training method and device Active CN115049895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210675231.7A CN115049895B (en) 2022-06-15 2022-06-15 Image attribute identification method, attribute identification model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210675231.7A CN115049895B (en) 2022-06-15 2022-06-15 Image attribute identification method, attribute identification model training method and device

Publications (2)

Publication Number Publication Date
CN115049895A true CN115049895A (en) 2022-09-13
CN115049895B CN115049895B (en) 2024-01-05

Family

ID=83162011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210675231.7A Active CN115049895B (en) 2022-06-15 2022-06-15 Image attribute identification method, attribute identification model training method and device

Country Status (1)

Country Link
CN (1) CN115049895B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200090506A1 (en) * 2018-09-19 2020-03-19 National Chung-Shan Institute Of Science And Technology License plate recognition system and license plate recognition method
CN111914834A (en) * 2020-06-18 2020-11-10 绍兴埃瓦科技有限公司 Image recognition method and device, computer equipment and storage medium
CN112800978A (en) * 2021-01-29 2021-05-14 北京金山云网络技术有限公司 Attribute recognition method, and training method and device for part attribute extraction network
WO2022048209A1 (en) * 2020-09-03 2022-03-10 平安科技(深圳)有限公司 License plate recognition method and apparatus, electronic device, and storage medium
CN114424258A (en) * 2019-12-23 2022-04-29 深圳市欢太科技有限公司 Attribute identification method and device, storage medium and electronic equipment


Also Published As

Publication number Publication date
CN115049895B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
CN113902897B (en) Training of target detection model, target detection method, device, equipment and medium
CN113920307A (en) Model training method, device, equipment, storage medium and image detection method
CN113936256A (en) Image target detection method, device, equipment and storage medium
CN112967315B (en) Target tracking method and device and electronic equipment
CN114550177A (en) Image processing method, text recognition method and text recognition device
CN113901909B (en) Video-based target detection method and device, electronic equipment and storage medium
CN113837305B (en) Target detection and model training method, device, equipment and storage medium
CN112560684A (en) Lane line detection method, lane line detection device, electronic apparatus, storage medium, and vehicle
CN113139543A (en) Training method of target object detection model, target object detection method and device
CN114120253A (en) Image processing method, image processing device, electronic equipment and storage medium
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN113205041A (en) Structured information extraction method, device, equipment and storage medium
CN114581794B (en) Geographic digital twin information acquisition method and device, electronic equipment and storage medium
CN115861400A (en) Target object detection method, training method and device and electronic equipment
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN114022865A (en) Image processing method, apparatus, device and medium based on lane line recognition model
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN114549961B (en) Target object detection method, device, equipment and storage medium
CN114332509B (en) Image processing method, model training method, electronic device and automatic driving vehicle
CN115861755A (en) Feature fusion method and device, electronic equipment and automatic driving vehicle
CN115049895B (en) Image attribute identification method, attribute identification model training method and device
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN114972465A (en) Image target depth detection method and device, electronic equipment and storage medium
CN113591569A (en) Obstacle detection method, obstacle detection device, electronic apparatus, and storage medium
CN113869147A (en) Target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant