CN115049895B

CN115049895B - Image attribute identification method, attribute identification model training method and device

Info

Publication number: CN115049895B
Application number: CN202210675231.7A
Authority: CN
Inventors: 蒋旻悦; 于越; 杨喜鹏
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-06-15
Filing date: 2022-06-15
Publication date: 2024-01-05
Anticipated expiration: 2042-06-15
Also published as: CN115049895A

Abstract

The disclosure provides an image attribute identification method, an attribute identification model training method, an apparatus, a device and a storage medium. It relates to the technical field of artificial intelligence, in particular to image processing and computer vision, and can be applied to scenarios such as intelligent traffic and smart cities. The specific implementation scheme is as follows: a first image is acquired, the first image being an image that includes a target object. A second image may then be determined from the first image, the second image being a partial area image of the first image that includes the target object. The first image and the second image are input into a single attribute identification model to obtain a target image attribute. The target image attribute comprises first attribute information of the first image and second attribute information corresponding to the target object. Because attribute identification is performed on different images through one model, model identification efficiency is improved and the loss of running speed caused by running a plurality of models together is avoided.

Description

Image attribute identification method, attribute identification model training method and device

Technical Field

The disclosure relates to the field of artificial intelligence, in particular to the technical fields of image processing and computer vision, and can be applied to scenarios such as intelligent traffic and smart cities. More specifically, it relates to an image attribute identification method, an attribute identification model training method and a device.

Background

Attribute identification on images is becoming increasingly common. In some scenarios, an acquisition device captures a video or an image. The video or image may then be identified to determine corresponding attribute information. For example, an image may be identified to determine that it has attribute information such as flowers, pedestrians, vehicles, and signboards.

Disclosure of Invention

The disclosure provides an image attribute identification method, an attribute identification model training method, an apparatus, a device and a storage medium.

According to a first aspect of the present disclosure, there is provided an image attribute identification method, which may include: acquiring a first image, the first image being an image that includes a target object; determining a second image from the first image, the second image being a partial area image of the first image that includes the target object; and inputting the first image and the second image into a single attribute identification model to obtain a target image attribute. The target image attribute comprises first attribute information of the first image and second attribute information corresponding to the target object. Because attribute identification is performed on different images through one model, model identification efficiency is improved and the loss of running speed caused by running a plurality of models together is avoided.

According to a second aspect of the present disclosure, there is provided an attribute identification model training method, which may include: obtaining a training sample set, wherein the training sample set comprises at least one training sample group, and each training sample group comprises a first training image carrying a first label, a second training image carrying a second label, and region position information. The first training image is an image that includes a target object, the second training image is a partial area image of the first training image that includes the target object, and the region position information represents the region position of the second training image in the first training image. Then, each training sample group may be input into a single initial recognition model to determine first training attribute information of the first training image and second training attribute information of the second training image. The single initial recognition model is adjusted according to the first label, the first training attribute information, the second label and the second training attribute information to determine a single first attribute recognition model. Because a single model is obtained through training, that model can perform attribute identification on different images simultaneously, which improves model identification efficiency and avoids the loss of running speed caused by running a plurality of models together.

According to a third aspect of the present disclosure, there is provided an image attribute identifying apparatus including: the acquisition module is used for acquiring a first image, wherein the first image is an image comprising a target object; the determining module is used for determining a second image according to the first image, wherein the second image is a partial area image of the first image, which comprises the target object; the identification module is used for inputting the first image and the second image into a single attribute identification model to obtain a target image attribute, wherein the target image attribute comprises first attribute information of the first image and second attribute information corresponding to a target object. According to the method and the device, the attribute identification can be carried out on different images through one model, so that the model identification efficiency is improved, and the reduction of the operation rate caused by the common operation of a plurality of models is avoided.

According to a fourth aspect of the present disclosure, there is provided an attribute identification model training apparatus including: an acquisition module, used for acquiring a training sample set, wherein the training sample set comprises at least one training sample group, each training sample group comprises a first training image carrying a first label, a second training image carrying a second label, and region position information, the first training image is an image that includes a target object, the second training image is a partial region image of the first training image that includes the target object, and the region position information represents the region position of the second training image in the first training image; and a training module, used for inputting each training sample group into a single initial recognition model, determining first training attribute information of the first training image and second training attribute information of the second training image, and adjusting the single initial recognition model according to the first label, the first training attribute information, the second label and the second training attribute information to determine a single first attribute recognition model. Because a single model is obtained through training, that model can perform attribute identification on different images simultaneously, which improves model identification efficiency and avoids the loss of running speed caused by running a plurality of models together.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the methods of the first or second aspects described above.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform any one of the methods of the first or second aspects described above.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first or second aspects described above.

According to the image attribute identification method, the attribute identification model training method, the device, the equipment and the storage medium, which are provided by the disclosure, the attribute identification can be performed on different images through one model, the model identification efficiency is improved, and the reduction of the operation rate caused by the common operation of a plurality of models is avoided.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic view of an application scenario of an embodiment of the present disclosure;

FIG. 2 is a flow chart of an image attribute identification method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of an attribute identification model structure in accordance with an embodiment of the present disclosure;

FIG. 4 is a flowchart of another image attribute identification method according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a feature extraction branch structure of an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of another attribute identification model structure in accordance with an embodiment of the present disclosure;

FIG. 7 is a flowchart of yet another image attribute identification method of an embodiment of the present disclosure;

FIG. 8 is a flowchart of a method for training an attribute identification model in accordance with an embodiment of the present disclosure;

FIG. 9 is a flowchart of another attribute identification model training method in accordance with an embodiment of the present disclosure;

FIG. 10 is a flowchart of yet another attribute identification model training method in accordance with an embodiment of the present disclosure;

FIG. 11 is a schematic diagram of an image attribute identification apparatus according to an embodiment of the present disclosure;

FIG. 12 is a schematic diagram of an attribute identification model training apparatus in accordance with an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of an image attribute identification apparatus and an attribute identification model training apparatus according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The present disclosure may mainly be applied, for example, to a scenario in which attribute recognition is performed on a vehicle, as shown in FIG. 1. The scenario includes a photographed vehicle 110, an acquisition device 120, and a network device 130. The acquisition device 120 may take a picture to obtain an image containing the vehicle 110 while the vehicle 110 is traveling normally or is parked. Of course, in some examples, the acquisition device 120 may also capture images continuously during a preset period to obtain a video including the vehicle 110. The acquisition device 120 may then send the captured image or video to the network device 130 in a wired or wireless manner, so that the network device 130 performs attribute recognition on the image or video captured by the acquisition device 120. For example, attribute identification may be performed directly on the image captured by the acquisition device 120, or on a corresponding frame in the video captured by the acquisition device 120. It is understood that any frame in a video can be considered an image.

In some cases, task requirements make it necessary to perform attribute identification on an image to obtain the corresponding attributes. For example, an image containing a vehicle may be identified to determine attributes associated with the vehicle. Taking a vehicle image as an example, if attributes of the license plate of the vehicle need to be identified, the license plate occupies too small a part of the vehicle image, so that attributes related to the license plate cannot be effectively identified in most cases.

In some examples, an additional separate model may be used to identify attributes associated with the license plate from an input license plate image. For other vehicle-related attributes, corresponding models may also need to be configured separately. However, when multiple models are deployed to identify different attributes, running them in parallel severely limits the running speed of each model, which makes the overall identification too slow and seriously degrades the user experience.

Accordingly, the present disclosure provides an image attribute recognition method and a corresponding attribute recognition model training method. Attribute identification is performed on different images simultaneously through one model, which improves model identification efficiency and avoids the reduction of running speed caused by running a plurality of models together.

The present disclosure will be described in detail below with reference to the attached drawings.

Fig. 2 is a flowchart of an image attribute identification method according to an exemplary embodiment of the present disclosure. The method can adopt a pre-trained attribute identification model to carry out attribute identification, and can be applied to network equipment, such as the network equipment 130.

In some examples, the network device may be, for example, a server or a cluster of servers. Of course, the server may be a server or a server cluster deployed on a virtual machine, which is not limited in this disclosure.

Of course, in some examples, the method may also be applied to a terminal device. The terminal device may include, but is not limited to, a cell phone, a wearable device, a tablet, a handheld computer, a notebook, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a laptop, a mobile computer, an augmented reality (AR) device, a virtual reality (VR) device, an artificial intelligence (AI) device, and/or an in-vehicle device, or any other terminal device or portable terminal device.

The method according to the present disclosure may comprise the steps of:

S201, acquiring a first image.

In some examples, the first image may be pre-stored in the network device 130 or a database, or may be obtained from the acquisition device 120 by a wired or wireless method, which is not limited in this disclosure. Wherein the first image is an image including the target object.

S202, determining a second image according to the first image.

In some examples, the second image may be extracted from the first image, e.g., the second image may be a partial region image of the first image that includes the target object.

Of course, in some examples, the second image may also be determined from the first image in advance. For example, the acquisition device 120 may determine the second image based on the first image after acquiring the first image, and the network device 130 may then obtain the generated second image directly from the acquisition device 120. In some examples, the second image may also be pre-stored in the network device 130 or a database so that the network device 130 can acquire it directly when needed.

S203, inputting the first image and the second image into a single attribute identification model to obtain the target image attribute.

In one example, the first image and the second image may be input into a single attribute identification model trained in advance, so that the single attribute identification model performs attribute identification on the first image and the second image to obtain the target image attribute. The target image attribute may include first attribute information corresponding to the first image and second attribute information corresponding to the second image.

According to the method and the device, the attribute identification can be carried out on different images through one model, so that the model identification efficiency is improved, and the reduction of the operation rate caused by the common operation of a plurality of models is avoided.
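The flow of S201 to S203 can be sketched as follows. This is only an illustrative outline, not the patented implementation: the names `identify`, `fake_locate` and `fake_model`, the `(top, left, bottom, right)` box convention and the toy attribute strings are all assumptions introduced here, and the "model" is a stand-in rather than a trained network.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), end-exclusive (assumed convention)


@dataclass
class TargetImageAttribute:
    first_attribute: str    # attribute information of the whole first image
    second_attribute: str   # attribute information of the target object


def identify(first_image: Image,
             locate: Callable[[Image], Box],
             model: Callable[[Image, Image, Box], TargetImageAttribute],
             ) -> TargetImageAttribute:
    # S201: the first image has already been acquired by the caller.
    # S202: determine the second image as the region containing the target object.
    top, left, bottom, right = locate(first_image)
    second_image = [row[left:right] for row in first_image[top:bottom]]
    # S203: a single model receives both images and returns both attributes.
    return model(first_image, second_image, (top, left, bottom, right))


# Hypothetical stand-ins for a detector and a trained attribute model.
def fake_locate(image: Image) -> Box:
    return (1, 1, 3, 3)


def fake_model(first: Image, second: Image, box: Box) -> TargetImageAttribute:
    brightness = "bright" if sum(map(sum, first)) > 8 else "dark"
    return TargetImageAttribute(brightness, f"{len(second)}x{len(second[0])} region")


image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
result = identify(image, fake_locate, fake_model)
```

The point of the sketch is the call structure: one model invocation yields both pieces of attribute information, instead of one invocation per model.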

The scheme depicted in fig. 2 will be described in more detail below in connection with other figures.

As an exemplary embodiment, fig. 3 is a schematic diagram of an attribute identification model structure according to one exemplary embodiment of the present disclosure. As can be seen in fig. 3, the attribute identification model 300 may include a first feature extraction branch 310, a second feature extraction branch 320, a feature fusion layer 330, a full connection layer 340, and a normalization layer 350.

Fig. 4 is a flowchart of another image attribute identification method according to an exemplary embodiment of the present disclosure. Based on the model structure shown in FIG. 3, obtaining the target image attribute in S203 may further include:

S401, respectively extracting features of the first image and the second image through different channels of the single attribute identification model to obtain a first image feature and a second image feature.

In some examples, the first image may be input into the first feature extraction branch 310 of the single attribute identification model for feature extraction to obtain the first image feature corresponding to the first image.

Of course, in some examples, the first feature extraction branch 310 may be a convolutional neural network (CNN), which may include an input layer 510 and at least one intermediate layer 520, as shown in FIG. 5. It is understood that the input layer 510 serves to extract various information from an input image. The extracted information is then input into the at least one intermediate layer 520 for corresponding convolution operations to obtain the first image feature.

In some examples, each intermediate layer may be composed of a convolutional layer, a pooling layer, and/or a downsampling layer. The present disclosure does not specifically limit the intermediate layer.
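As a rough illustration of what one feature extraction branch does, the sketch below chains a convolution and a pooling step per intermediate layer. It is a toy, single-channel version under assumed settings (no padding, stride 1, 2x2 max pooling, hand-written kernels); the disclosure does not fix any of these choices, and a real branch would use learned multi-channel kernels.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2x2(feature: Matrix) -> Matrix:
    """Downsample by taking the maximum of each non-overlapping 2x2 window."""
    return [[max(feature[r][c], feature[r][c + 1],
                 feature[r + 1][c], feature[r + 1][c + 1])
             for c in range(0, len(feature[0]) - 1, 2)]
            for r in range(0, len(feature) - 1, 2)]


def extraction_branch(image: Matrix, kernels: List[Matrix]) -> Matrix:
    """Apply one convolution + pooling stage per kernel (one per intermediate layer)."""
    feature = image
    for kernel in kernels:
        feature = max_pool2x2(conv2d_valid(feature, kernel))
    return feature


# A 6x6 image of ones and a single 3x3 kernel of ones (illustrative values only).
first_image = [[1.0] * 6 for _ in range(6)]
kernel = [[1.0] * 3 for _ in range(3)]
first_image_feature = extraction_branch(first_image, [kernel])
```

Here the 6x6 input becomes a 4x4 map after the convolution and a 2x2 feature after pooling. The second feature extraction branch could reuse the same function with its own kernels.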

Continuing back to FIG. 3, in other examples, the second image may be input into a second feature extraction branch 320 of the single attribute identification model for feature extraction to determine a second image feature corresponding to the second image.

It is to be understood that the second feature extraction branch 320 may have a similar structure to the first feature extraction branch 310, and specifically, reference may be made to the structure shown in fig. 5, which is not described herein.

S402, identifying and obtaining the target image attribute based on the first image characteristic and the second image characteristic.

In some examples, after the first image feature and the second image feature are determined in S401, attribute identification may be performed according to the first image feature and the second image feature to obtain the target image attribute, that is, the first attribute information and the second attribute information.

According to the method and the device, attribute identification of different images is achieved through different channels in a single model, and the use of a plurality of models to identify the attribute of different images is avoided, so that the identification efficiency of the models is improved, and meanwhile, the reduction of the running speed caused by the common running of the plurality of models is avoided.

In some embodiments, when determining the second image at S202, the method may further include: area location information is determined. The region position information is used to represent the region position of the second image in the first image.

It will be appreciated that in some examples, when determining the second image from the first image at S202, the location of the region of the second image in the first image may also be determined. For example, the network device or the terminal device may detect the first image using a pre-configured detection model to determine the region location information.

Thus, identifying the target image attribute based on the first image feature and the second image feature in S402 may further include: determining a region sub-feature from the first image feature based on the region location information; fusing the region sub-feature with the second image feature to obtain a fusion feature; and determining the second attribute information based on the fusion feature.

In some examples, region sub-features of the respective region may be determined from the first image feature based on the determined region location information. The region sub-features may then be fused with the second image features to obtain fused features. Attribute identification may then be performed based on the fusion features to determine second attribute information.

For example, the first image feature and the second image feature may be input into the feature fusion layer 330 of the single attribute identification model. Meanwhile, the region position information needs to be input into the feature fusion layer 330 of the attribute identification model. The feature fusion layer 330 of the attribute identification model determines region sub-features from the first image features based on the region location information. And then fusing the region sub-features with the second image features to determine fusion features. It should be appreciated that the region sub-feature is a partial feature corresponding to the region position information in the first image feature.

In some examples, the feature fusion layer 330 of the attribute identification model may determine, based on the region location information, the partial feature at the location indicated by the region location information in the first image feature, i.e., the region sub-feature. The region sub-feature may then be fused with the second image feature to obtain a fusion feature. Taking two-dimensional coordinates as an example, assume that the region position information includes four coordinates: A (x1, y1), B (x2, y2), C (x3, y3), and D (x4, y4). The four points A, B, C and D determine a partial region in the first image, and this region represents the region position of the second image in the first image. The region sub-feature corresponding to this partial region can then be fused with the second image feature to obtain the fusion feature. The fusion feature carries more feature information, which helps in subsequently determining the second attribute information of the second image from the fusion feature.

It will be appreciated that the above-described respective coordinates in the region position information are merely exemplary descriptions, and are not intended to limit the present disclosure.
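One way to picture how the four points A, B, C and D select a region sub-feature is sketched below. The function name `region_sub_feature` and the `stride` parameter (the assumed ratio between image pixels and feature-map cells) are hypothetical; the disclosure only states that the region sub-feature is the part of the first image feature corresponding to the region position information, and does not specify how image coordinates map onto the feature map.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]  # (x, y) in first-image pixel coordinates


def region_sub_feature(first_feature: Matrix,
                       corners: Sequence[Point],
                       stride: int) -> Matrix:
    """Slice the part of first_feature covered by the axis-aligned box
    spanned by the corner points. `stride` is an assumed downsampling
    ratio between the image and the feature map."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    left, right = math.floor(min(xs) / stride), math.ceil(max(xs) / stride)
    top, bottom = math.floor(min(ys) / stride), math.ceil(max(ys) / stride)
    return [row[left:right] for row in first_feature[top:bottom]]


# A 4x4 feature map for an 8x8 image (stride 2); cell value = row * 4 + column.
first_feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
corners = [(2, 2), (6, 2), (6, 6), (2, 6)]  # A, B, C, D
sub_feature = region_sub_feature(first_feature, corners, stride=2)
```

With these values the box covers feature cells in rows 1 to 2 and columns 1 to 2, so `sub_feature` is the central 2x2 block of the feature map.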

In some examples, the features may be fused by direct superposition, by superposition based on preset weights, in another preset manner, or in any equivalent manner. It is to be understood that the present disclosure is not limited to a particular manner of fusion.
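Two of the fusion options mentioned above, direct superposition and superposition with preset weights, can be sketched as an element-wise weighted sum. The function name `fuse` and the requirement that both features already have the same shape are assumptions made for this example; the disclosure leaves the fusion manner open.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fuse(region_sub_feature: Matrix,
         second_feature: Matrix,
         weights: Tuple[float, float] = (1.0, 1.0)) -> Matrix:
    """Element-wise weighted sum of two equally shaped feature maps.
    The default weights (1.0, 1.0) correspond to direct superposition."""
    if (len(region_sub_feature) != len(second_feature)
            or any(len(a) != len(b)
                   for a, b in zip(region_sub_feature, second_feature))):
        raise ValueError("features must have the same shape to be superimposed")
    w_region, w_second = weights
    return [[w_region * a + w_second * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(region_sub_feature, second_feature)]


region = [[5.0, 6.0], [9.0, 10.0]]   # illustrative region sub-feature
second = [[1.0, 1.0], [1.0, 1.0]]    # illustrative second image feature

direct = fuse(region, second)                        # direct superposition
weighted = fuse(region, second, weights=(0.5, 0.5))  # preset weights
```

If the two features differ in shape, one of them would first have to be resized or pooled to match; that step is omitted here.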

In some examples, for the first attribute information, attribute identification may be performed directly based on the first image feature to obtain the first attribute information. The specific manner may refer to the existing manner, and this disclosure is not repeated here.

By fusing the region sub-feature with the second image feature based on the region position information, the present disclosure obtains a fusion feature that carries more information, which helps in subsequently identifying more accurate second attribute information.

In some embodiments, for S402, identifying the target image attribute based on the first image feature and the second image feature may include: determining the first attribute information from the first image feature and determining the second attribute information from the fusion feature; or combining the first image feature and the fusion feature, and determining the first attribute information and the second attribute information from the combined feature.

In one example, in the attribute identification model, attribute identification may be performed for the first image feature and the fusion feature, respectively, to determine corresponding attribute information. Namely, the first image feature and the fusion feature are sequentially input to the full connection layer 340 of the attribute identification model, and the output of the full connection layer 340 is input to the normalization layer 350 of the attribute identification model. For example, the first image feature may be input to the full-connection layer 340 of the attribute identification model, after which the output of the full-connection layer 340 is input to the normalization layer 350 of the attribute identification model, and the first attribute information is determined. The first attribute information indicates corresponding attribute information in the first image. And, the fusion feature may be input to the full connection layer 340 of the attribute identification model, and then, the output of the full connection layer 340 is input to the normalization layer 350 of the attribute identification model, so as to determine the second attribute information. The second attribute information indicates corresponding attribute information in the second image. It will be appreciated that each neuron in the fully connected layer may correspond to a respective possible attribute. Thus, all possible attributes of the first image and all possible attributes of the second image may be present in the fully connected layer 340 at the same time.
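The sequential use of one shared fully connected layer can be sketched as below. Several details are assumptions rather than statements of the disclosure: the normalization layer is taken to be a softmax, the output neurons are split so that the first two belong to first-image attributes and the last two to target-object attributes, and the attribute names and weights are made up for illustration.

```python
import math
from typing import List, Sequence

FIRST_ATTRS = ["sedan", "truck"]               # hypothetical first-image attributes
SECOND_ATTRS = ["blue plate", "green plate"]   # hypothetical target-object attributes

# One shared layer: 4 output neurons (2 per attribute group), 2 inputs each.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
BIAS = [0.0, 0.0, 0.0, 0.0]


def fully_connected(x: Sequence[float]) -> List[float]:
    """Shared fully connected layer: one logit per possible attribute."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(WEIGHTS, BIAS)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Normalization layer (assumed to be a softmax)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shared_head(feature: Sequence[float], attrs: List[str], offset: int) -> str:
    """Run the shared layer, then normalize only the neurons of one attribute group."""
    logits = fully_connected(feature)
    probs = softmax(logits[offset:offset + len(attrs)])
    return attrs[probs.index(max(probs))]


first_image_feature = [2.0, 0.5]   # flattened toy feature vectors
fusion_feature = [0.1, 3.0]

first_attribute = shared_head(first_image_feature, FIRST_ATTRS, offset=0)
second_attribute = shared_head(fusion_feature, SECOND_ATTRS, offset=len(FIRST_ATTRS))
```

Both features pass through the same weights; only the slice of output neurons that is read differs between the two passes.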

Of course, in some examples, for the fully connected layer 340 containing all possible attributes of the first image, and all possible attributes of the second image, the first image features and the fusion features may also be superimposed and input to the fully connected layer 340 of the attribute identification model. The output of the fully connected layer 340 is then input to the normalization layer 350 of the attribute identification model to determine the target image attributes. The determined target image attribute may include first attribute information and second attribute information.

Of course, still other examples may use the model structure shown in FIG. 6. The model 600 shown in FIG. 6 includes a first feature extraction branch 610, a second feature extraction branch 620, a feature fusion layer 630, a first full connection layer 640, a first normalization layer 650, a second full connection layer 660, and a second normalization layer 670. The first feature extraction branch 610 is similar to the first feature extraction branch 310, the second feature extraction branch 620 is similar to the second feature extraction branch 320, and the feature fusion layer 630 is similar to the feature fusion layer 330, so their description is omitted here.

As shown in fig. 6, the first image feature determined by the first feature extraction branch 610 may be directly input into the first full connection layer 640. Thereafter, the output of the first full connection layer 640 is input to the first normalization layer 650, thereby determining first attribute information. And, the fusion features determined by the feature fusion layer 630 may be directly input to the second full connection layer 660. Thereafter, the output of the second full connection layer 660 is input to the second normalization layer 670, thereby determining second attribute information.
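The two-head arrangement of FIG. 6 can be sketched with one small class per head. As before, this is only an illustration: the `Head` class, the softmax used as the normalization layer, the weights and the attribute names are all assumptions, not details given in the disclosure.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Head:
    """A fully connected layer followed by a normalization layer (assumed softmax)."""
    weights: List[List[float]]
    bias: List[float]
    attributes: List[str]

    def predict(self, feature: Sequence[float]) -> Tuple[str, float]:
        logits = [sum(w * v for w, v in zip(row, feature)) + b
                  for row, b in zip(self.weights, self.bias)]
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = probs.index(max(probs))
        return self.attributes[best], probs[best]


# First head (layers 640/650) reads the first image feature;
# second head (layers 660/670) reads the fusion feature.
first_head = Head([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0], ["sedan", "truck"])
second_head = Head([[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0], ["blue plate", "green plate"])

first_label, first_prob = first_head.predict([0.2, 1.0])
second_label, second_prob = second_head.predict([4.0, 1.0])
```

Compared with a shared layer, each head here has its own weights, so the two attribute groups can be sized and tuned independently at the cost of extra parameters.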

The present disclosure may select different model structures based on different requirements, which may be applicable to more possible scenarios.

As an exemplary embodiment, fig. 7 is a flowchart of still another image attribute recognition method according to an exemplary embodiment of the present disclosure. The method may comprise the steps of:

S701, acquiring a first image.

It is to be understood that the implementation process of S701 is similar to S201, and for convenience of description, this disclosure is not repeated here.

S702, cutting and filling the first image, and determining a second image.

The network device or the terminal device may detect the first image by using a pre-configured detection model to determine the position of the target object in the first image. The first image is then cropped based on that position to obtain a partial image of the corresponding region, and the partial image may then be filled (padded) to determine the second image. It can be understood that the second image obtained after cropping and filling has the same pixel dimensions as the first image, i.e. the two images are the same size. The position of the target object in the first image is the above-mentioned region position information.

In some examples, the filling may be performed in any existing way. For example, the second image may be made to have the same size, i.e. the same pixel dimensions, as the first image. Of course, in other examples, the size of the second image may instead be made to satisfy a preset condition, such as a preset fixed pixel size or L times the size of the first image, where L may be any positive number.

It can be understood that the purpose of filling the partial image cropped from the first image is to ensure that the corresponding region of the second image is not identical to the corresponding region of the first image, so that the features obtained after feature extraction are not identical.
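As a minimal sketch of the cropping and filling in S702, assume the image is a list of pixel rows, the detected position is a box (top, left, bottom, right) with exclusive bottom and right bounds, and the crop is centered on a constant-valued canvas of the original size. The function name, box layout, fill value and centering are all assumptions made for illustration; the disclosure allows any existing filling method.

```python
def crop_and_pad(image, box, fill=0):
    """Crop `box` from `image`, then pad the crop with `fill` to the original size."""
    height, width = len(image), len(image[0])
    top, left, bottom, right = box
    crop = [row[left:right] for row in image[top:bottom]]
    crop_h, crop_w = len(crop), len(crop[0])
    # Center the crop on the canvas (an assumed placement).
    off_y = (height - crop_h) // 2
    off_x = (width - crop_w) // 2
    padded = [[fill] * width for _ in range(height)]
    for y, row in enumerate(crop):
        padded[off_y + y][off_x:off_x + crop_w] = row
    return padded

# A 4 x 6 toy "first image" whose pixel value encodes its row and column.
first_image = [[10 * r + c for c in range(6)] for r in range(4)]
region = (1, 2, 3, 5)  # hypothetical detector output
second_image = crop_and_pad(first_image, region)
```

The resulting second image has the same height and width as the first image, with the cropped region surrounded by fill pixels.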

Of course, in some embodiments, the scheme described above with respect to FIG. 7 may also be implemented on an acquisition device, such as acquisition device 120. After the acquisition device 120 determines the second image by the corresponding method in fig. 7, the first image, the second image, and the area location information may be sent to a network device, such as the network device 130.

It will of course be appreciated that fig. 7 only depicts the case where the second image may be determined from the first image. In other examples, the second image may also be pre-configured, and the disclosure is not limited.

S703, inputting the first image and the second image into a single attribute identification model to obtain the target image attribute.

It is to be understood that the implementation of S703 is similar to that of S203; for brevity, the description is not repeated here.

The second image can be determined through the first image, so that in the practical application process, multiple different types of attributes can be accurately identified based on one image, and meanwhile, the model can be guaranteed to have certain identification precision.

In some embodiments, in the image attribute identifying method described in fig. 2 to 7, the first image may be a vehicle image, and the second image may be a license plate image, and the target object may be a license plate.

The method and the device take the vehicle image and the license plate image as examples, so that the difficulty and the complexity of model deployment can be reduced when the relevant attributes of the vehicle are identified, the model identification efficiency can be ensured, and the model identification precision is improved.

As an exemplary embodiment, fig. 8 is a flowchart of an attribute identification model training method according to one exemplary embodiment of the present disclosure. The method may be applied to the network device 130, which may train a single initial recognition model to obtain a trained single attribute recognition model. That is, the single attribute identification model used in the methods described above with respect to figs. 2 to 7 may be trained by this method before those methods are performed. The method may comprise the following steps:

S801, a training sample set is acquired.

In some examples, the training sample set may include at least one training sample group. Each training sample group may include a first training image carrying a first label, a second training image carrying a second label, and region position information. It will be appreciated that the region position information in the training phase is used to represent the region position of the second training image in the first training image. The first training image is an image comprising a target object, and the second training image is a partial region image comprising the target object in the first training image. The first label is used for representing real attribute information of the first training image, and the second label is used for representing real attribute information of the second training image.

It should be noted that the region position information described in the training phase is to be understood as the region position of the second training image in the first training image. Whereas the region position information in the application phase should be understood as the region position of the second image in the first image.
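For illustration, one training sample group could be represented as below. The field names, the integer class labels and the (top, left, bottom, right) box layout are assumptions introduced for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values (toy representation)

@dataclass
class TrainingSampleGroup:
    first_training_image: Image
    first_label: int                 # real attribute of the first training image
    second_training_image: Image
    second_label: int                # real attribute of the second training image
    region_position: Tuple[int, int, int, int]  # assumed (top, left, bottom, right)

def region_is_inside(group: TrainingSampleGroup) -> bool:
    """Check that the region position lies within the first training image."""
    height = len(group.first_training_image)
    width = len(group.first_training_image[0])
    top, left, bottom, right = group.region_position
    return 0 <= top < bottom <= height and 0 <= left < right <= width

blank = [[0] * 6 for _ in range(4)]
good_group = TrainingSampleGroup(blank, 2, blank, 1, (1, 2, 3, 5))
bad_group = TrainingSampleGroup(blank, 2, blank, 1, (1, 2, 3, 9))
```

A training sample set is then simply a list of such groups; a check like `region_is_inside` can reject groups whose region position does not fit the first training image.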

S802, for each training sample group, inputting the training sample group into a single initial recognition model, and determining first training attribute information of the first training image and second training attribute information of the second training image.

In some examples, during the training phase, iterative training may be performed on a per training sample group basis. For example, the training sample groups may be input into the single initial recognition model one after another. For each training sample group, at each training iteration, first training attribute information of the first training image and second training attribute information of the second training image may be determined.

It can be appreciated that the model structure of the single initial recognition model may be shown with reference to fig. 3, 5 and 6, and the data processing procedure in the specific model is similar to that of the recognition process of fig. 2, 4 and 7, and the specific description of fig. 2, 4 and 7 may be referred to herein for brevity.

S803, according to the first label, the first training attribute information, the second label and the second training attribute information, the single initial recognition model is adjusted, and the single first attribute recognition model is determined.

In some examples, parameters in the single initial recognition model may be adjusted using a pre-configured loss function, based on the first label, the first training attribute information, the second label, and the second training attribute information. The loss function may be, for example, a cross entropy loss function. Of course, in other examples, any possible loss function may be used for training. The present disclosure is not limited in this respect.
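A minimal sketch of such a loss, assuming each label is a class index, each training attribute information is a probability vector, and the two cross entropy terms are simply added with equal weight (the weighting is an assumption; the disclosure does not state how the terms are combined):

```python
import math

def cross_entropy(probabilities, label_index):
    """Negative log-probability assigned to the true class."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(probabilities[label_index], eps))

def joint_loss(first_pred, first_label, second_pred, second_label):
    """Sum of the cross entropy terms for the first and second attributes."""
    return cross_entropy(first_pred, first_label) + cross_entropy(second_pred, second_label)

# Hypothetical predictions for one training sample group.
loss = joint_loss([0.7, 0.2, 0.1], 0, [0.25, 0.75], 1)
```

The loss decreases as the predicted probability of each true class approaches 1, which is the direction in which the model parameters are adjusted.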

In some examples, the single first attribute identification model is determined by training on a large number of training sample groups until the model converges.

It will be appreciated that the single attribute identification model referred to in figs. 2 to 7 is the single first attribute identification model.

According to the method and the device, the single model is obtained through training, so that attribute identification can be carried out on different images simultaneously by using the single model, the model identification efficiency is improved, and the reduction of the operation rate caused by the common operation of a plurality of models is avoided.

It will be appreciated that the model training process adjusts the corresponding parameters in each network layer in the model, so that when training a single initial recognition model is completed, each parameter in the model is optimized, thereby obtaining a single first attribute recognition model. That is, a single initial recognition model differs from a single first attribute recognition model only in the corresponding parameters in the model.

In one embodiment, fig. 9 is a flowchart of yet another attribute identification model training method in accordance with an embodiment of the present disclosure. For example, determining the first training attribute information of the first training image and the second training attribute information of the second training image in S802 may include the steps of:

S901, respectively extracting features of the first training image and the second training image through different channels of the single initial recognition model to obtain first training image features and second training image features.

In some examples, feature extraction may be performed on the first training image and the second training image separately through different channels of the single initial recognition model, to obtain the first training image features corresponding to the first training image and the second training image features corresponding to the second training image.

Taking the model structure shown in fig. 3 and fig. 6 as an example, the first training image may be input into the first feature extraction branch 310 or the first feature extraction branch 610 of the single initial recognition model to perform feature extraction, so as to determine the first training image feature corresponding to the first training image. And inputting the second training image into the second feature extraction branch 320 or the second feature extraction branch 620 of the single initial recognition model to perform feature extraction so as to determine the second training image features corresponding to the second training image.

S902, obtaining first training attribute information based on the first training image feature recognition and obtaining second training attribute information based on the second training image feature recognition.

In some examples, after determining the first training image feature and the second training image feature in S901, attribute recognition may be performed according to the first training image feature to obtain first training attribute information through recognition; and performing attribute identification according to the second training image features to identify second training attribute information.

In one embodiment, the obtaining the second training attribute information based on the second training image feature recognition in S902 may include: determining a region training sub-feature from the first training image feature based on the region position information; fusing the region training sub-features with the second training image features to obtain training fusion features; based on the training fusion characteristics, second training attribute information is determined.

In some examples, the region training sub-feature may be determined from the first training image feature based on the region location information. And then fusing the region training sub-features with the second training image features to obtain training fusion features. Thereafter, second training attribute information may be determined based on the training fusion feature.

For example, after determining the first training image features and determining the second training image features, the first training image features and the second training image features may be input into the feature fusion layer 330 or the feature fusion layer 630 of the single initial recognition model. In some cases, the region position information is also input into the feature fusion layer 330 or the feature fusion layer 630 of the single initial recognition model. Training fusion features are obtained through feature fusion layer 330 or feature fusion layer 630.

It will be appreciated that the process of obtaining the training fusion feature is similar to the process of obtaining the fusion feature at the application stage, and specific reference may be made to the corresponding description, which is not repeated herein.

According to the method and the device, based on the region position information, the region training sub-features are fused with the second training image features, so that the fused training fusion features can carry more information, and the method and the device are favorable for subsequent identification of more accurate second training attribute information.
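The following sketch illustrates one possible way to select the region training sub-feature and fuse it with the second training image feature. It assumes the first training image feature is a grid of per-cell channel vectors with a uniform stride, that the region is average-pooled to a single vector, and that fusion is element-wise addition. These choices and all names are assumptions for illustration; this passage of the disclosure does not fix the fusion operation.

```python
def region_sub_feature(feature_map, box, image_size):
    """Slice the feature-map cells covered by an image-space box (top, left, bottom, right)."""
    feat_h, feat_w = len(feature_map), len(feature_map[0])
    img_h, img_w = image_size
    top, left, bottom, right = box
    t = top * feat_h // img_h                      # floor for the start cell
    l = left * feat_w // img_w
    b = max(t + 1, -(-bottom * feat_h // img_h))   # ceiling for the end cell
    r = max(l + 1, -(-right * feat_w // img_w))
    return [row[l:r] for row in feature_map[t:b]]

def average_pool(cells):
    """Average the channel vectors of all cells into one vector."""
    flat = [cell for row in cells for cell in row]
    channels = len(flat[0])
    return [sum(cell[c] for cell in flat) / len(flat) for c in range(channels)]

def fuse(region_vector, second_feature):
    """Element-wise addition (assumed fusion operation)."""
    return [a + b for a, b in zip(region_vector, second_feature)]

# 2 x 4 feature map with 2 channels for an 8 x 16 image (stride 4).
first_feature = [[[float(y), float(x)] for x in range(4)] for y in range(2)]
sub = region_sub_feature(first_feature, box=(4, 8, 8, 16), image_size=(8, 16))
fusion_feature = fuse(average_pool(sub), [0.5, 0.5])
```

With these toy values the box covers the two right-most cells of the second feature row, so the fused vector combines their average with the second feature.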

In some embodiments, S803 may combine the first training image feature and the training fusion feature based on the model structure shown in fig. 3, and then perform attribute recognition on the combined feature to determine the first training attribute information and the second training attribute information. Of course, in some examples, based on the model structure shown in fig. 3 or fig. 6, attribute recognition may be performed on the first training image feature to determine corresponding first training attribute information, and attribute recognition may be performed on the training fusion feature to determine second training attribute information.

The specific implementation process may refer to corresponding description when the model is applied, and this disclosure is not repeated here.

It will be appreciated that, in the training process, after the first training attribute information and the second training attribute information are determined, the loss function may be used to adjust parameters of the corresponding layers in the single initial recognition model in combination with the first label and the second label. In one example, the adjustment may be performed using a cross entropy loss function.

For example, the parameters of the corresponding layers in the first feature extraction branch 310 or the first feature extraction branch 610 may be adjusted using a cross entropy loss function according to the first label and the first training attribute information. This is repeated over a large number of training sample groups in the training sample set until the single initial recognition model converges. Similarly, the parameters of the corresponding layers in the second feature extraction branch 320 or the second feature extraction branch 620 may be adjusted using the cross entropy loss function according to the second label and the second training attribute information, again over a plurality of training sample groups in the training sample set until the single initial recognition model converges. When the training of the first feature extraction branch 310 or the first feature extraction branch 610 and of the second feature extraction branch 320 or the second feature extraction branch 620 is completed, training of the single initial recognition model may be considered complete. The parameters of the corresponding layers in the single initial recognition model are then the trained parameters, and the trained single initial recognition model is the single first attribute recognition model.

As an exemplary embodiment, fig. 10 is a flowchart of yet another attribute identification model training method, shown in an exemplary embodiment of the present disclosure. The method can obtain a single second attribute identification model by performing secondary training on the single first attribute identification model obtained by training in the above-mentioned figures 8 to 9. Thus, after S803, the method may further comprise the steps of:

S1001, adjusting the single first attribute identification model based on the region training sub-feature and the second training image feature to determine a single second attribute identification model.

In some examples, after training to obtain a single first attribute identification model, further training may be performed. For example, the loss function may be calculated based on the region training sub-feature and the second training image feature obtained at the time of training. For example, a cross entropy loss function may be employed. And training is continuously iterated until the single first attribute identification model converges to obtain a single second attribute identification model.

Wherein a single second attribute identification model may only retain the first feature extraction branch 310 or the first feature extraction branch 610 in the model during the application phase.

It will be appreciated that the objective of the training process of fig. 10 described above is to make the region sub-features within the features extracted by the first feature extraction branch closer to the features extracted by the second feature extraction branch in the single first attribute identification model. The features extracted by the first feature extraction branch in the single second attribute identification model can then still effectively identify more attribute information, such as the second attribute information.

In some examples, a new training sample set may also be used when training to obtain the single second attribute identification model. For example, a second training sample set may include at least one second training sample group, each of which may include a third training image, a fourth training image, and second region position information. The second region position information is used for representing the region position of the fourth training image in the third training image, the third training image is an image comprising the target object, and the fourth training image is a partial region image of the third training image that comprises the target object.

It is to be understood that in this example, the training sample set acquired in S801 may be referred to as a first training sample set, and the region position information in the first training sample set may be referred to as first region position information.

Then, each second training sample group in the second training sample set is input into the single first attribute identification model for training. For example, for each second training sample group, a third training image feature corresponding to the third training image and a fourth training image feature corresponding to the fourth training image are determined. Then, based on the second region position information, a second region training sub-feature is determined from the third training image feature. It will be appreciated that in this example, the region training sub-features involved in training to obtain the single first attribute identification model may be referred to as first region training sub-features. The single first attribute identification model may then be adjusted using the loss function based on the second region training sub-feature and the fourth training image feature to determine the single second attribute identification model. Of course, in other examples, any equivalent loss function may be used to adjust the single first attribute identification model, and the disclosure is not limited in this respect.
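As a hedged illustration of the secondary training objective, the sketch below measures how far a region training sub-feature is from the corresponding second-branch feature. The disclosure names cross entropy as one option and allows other loss functions; a mean squared difference is swapped in here purely as an assumed example, and both features are assumed to be pooled to vectors of equal length.

```python
def feature_alignment_loss(region_sub_feature, second_feature):
    """Mean squared difference between two equal-length feature vectors."""
    if len(region_sub_feature) != len(second_feature):
        raise ValueError("feature vectors must have the same length")
    n = len(region_sub_feature)
    return sum((a - b) ** 2 for a, b in zip(region_sub_feature, second_feature)) / n

# Hypothetical features before and after some secondary training.
loss_far = feature_alignment_loss([0.0, 0.0], [1.0, 3.0])
loss_near = feature_alignment_loss([0.9, 2.9], [1.0, 3.0])
```

Minimizing such a loss pulls the region sub-features of the first branch toward the second-branch features, which is what allows the second branch to be dropped in the application phase.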

In some examples, the second region training sub-feature may be determined by the feature fusion layer 330 or the feature fusion layer 630 of the single first attribute identification model. For example, the feature fusion layer 330 or the feature fusion layer 630 of the single first attribute identification model may determine, based on the second region position information and the third training image feature, a partial feature corresponding to the second region position information in the third training image feature, where the partial feature is a second region training sub-feature.

The present disclosure determines a single second attribute identification model in the above manner, so that the schemes described in fig. 2 to 7 may also employ the single second attribute identification model for image attribute identification. It will be appreciated that since the single second attribute identification model only retains the first feature extraction branch, identification can be made directly based on the first image, thereby identifying the first attribute information and the second attribute information. The scheme can further reduce the deployment difficulty of the model, reduce the space occupied by the model, ensure the recognition efficiency and improve the recognition accuracy.

In some embodiments, the first training image may be a vehicle image, the second training image may be a license plate image, and the target object is a license plate.

Of course, in other embodiments, the third training image referred to above may be a vehicle image and the fourth training image may be a license plate image.

Taking as an example the first feature extraction branch extracting vehicle image features from the vehicle image and the second feature extraction branch extracting license plate image features from the license plate image, it can be understood that the single first attribute identification model ensures that more accurate vehicle attribute information and license plate attribute information can be identified simultaneously by fusing the vehicle image features and the license plate image features. For the single second attribute identification model, the vehicle image features need only be extracted from the vehicle image, and more accurate vehicle attribute information and license plate attribute information can still be identified.

As an exemplary embodiment, the present disclosure will be described with reference to a vehicle image, a license plate image, and the scenario described in connection with fig. 1. Of course, the network device 130 in fig. 1 may be replaced by a terminal device. The present embodiment will be described taking a network device as an example.

For example, a vehicle image including the vehicle 110 may first be acquired by the acquisition device 120. The vehicle image may then be detected by a preceding detector to determine the license plate position. It will be appreciated that the preceding detector may be a pre-configured detection model, which may be located on the acquisition device 120 or on the network device 130. For example, if the pre-configured detection model is located on the acquisition device 120, the acquisition device 120 may detect the vehicle image and determine the license plate position. The license plate region in the vehicle image can then be cropped and filled to obtain the license plate image, and the vehicle image, the license plate image and the license plate position are sent to the network device 130. For another example, if the pre-configured detection model is located on the network device 130, the acquisition device 120 directly transmits the acquired vehicle image to the network device 130. The network device 130 detects the vehicle image using the pre-configured detection model and determines the license plate position. The license plate region in the vehicle image can then be cropped and filled to obtain the license plate image. It can be understood that the vehicle image is the first image, the license plate image is the second image, the license plate position is the region position information, and the license plate is the target object.

Then, the network device may use the trained single attribute recognition model (i.e., the single first attribute recognition model) to perform corresponding feature extraction on the vehicle image and the license plate image respectively. And determining license plate fusion characteristics based on license plate positions, vehicle characteristics and license plate characteristics. This process is similar to the process of determining fusion characteristics in fig. 2-7, and this disclosure is not repeated here. And then, carrying out attribute identification on the vehicle characteristics and the license plate fusion characteristics so as to determine the vehicle related attributes and the license plate related attributes. The process of determining the vehicle-related attribute and the license plate-related attribute is similar to the process of determining the first attribute information and the second attribute information in fig. 2 to 7, and the disclosure is not repeated here.

As an exemplary embodiment, the present disclosure may also utilize a single second attribute identification model determined by the method described above with respect to fig. 10 in performing image attribute identification. For example, the attribute identification may be directly performed on the first image using a single second attribute identification model to determine the first attribute information and the second attribute information at the same time. The description will be made taking the first image as a vehicle image, the first attribute information as a vehicle-related attribute, and the second attribute information as a license plate-related attribute as an example. In this example, a single second attribute recognition model may be used to directly perform attribute recognition on the vehicle image, and determine the vehicle-related attribute and the license plate-related attribute. The reason for this is that, in the course of the training of the single second attribute identification model, the distance between the feature extracted by the first feature extraction branch and the feature extracted by the second feature extraction branch is shortened. So that more accurate features can be extracted by using only the first feature extraction branch. For example, in the training process, the license plate part features in the vehicle features are gradually close to the license plate features determined by the second feature extraction branch through adjustment of the model parameters. Therefore, in the application stage, the feature extraction can be directly carried out on the vehicle image, and the license plate part features in the vehicle features can be well used for carrying out license plate related attribute identification.

Based on the same conception, the embodiment of the disclosure also provides an image attribute identification device and an attribute identification model training device.

It may be appreciated that, in order to implement the above functions, the image attribute identifying apparatus and the attribute identifying model training apparatus provided in the embodiments of the present disclosure include a hardware structure and/or a software module that perform each function. The disclosed embodiments may be implemented in hardware or a combination of hardware and computer software, in combination with the various example elements and algorithm steps disclosed in the embodiments of the disclosure. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not to be considered as beyond the scope of the embodiments of the present disclosure.

Fig. 11 is a schematic view of an image attribute recognition apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 11, an image attribute recognition apparatus 1100 is provided, and the apparatus 1100 may implement any of the methods described above with reference to fig. 2 to 7. The apparatus 1100 may include: an acquisition module 1101, configured to acquire a first image, where the first image is an image including a target object; a determining module 1102, configured to determine a second image according to the first image, where the second image is a partial area image including the target object in the first image; the identifying module 1103 is configured to input the first image and the second image into a single attribute identifying model, so as to obtain a target image attribute, where the target image attribute includes first attribute information of the first image and second attribute information corresponding to the target object.

In one possible implementation, the identification module 1103 is further configured to: respectively extracting features of the first image and the second image through different channels of a single attribute identification model to obtain a first image feature and a second image feature; and identifying and obtaining the target image attribute based on the first image characteristic and the second image characteristic.

In one possible implementation, the determining module 1102 is further configured to: determining region position information for representing a region position of the second image in the first image; the identification module 1103 is further configured to: determining a region sub-feature from the first image feature based on the region location information; fusing the region sub-features with the second image features to obtain fused features; based on the fusion characteristics, second attribute information is determined.

In one possible implementation, the determining module 1102 is further configured to: crop and fill the first image to determine a second image, wherein the second image has the same pixel size as the first image.

In one possible embodiment, the first image is a vehicle image, the second image is a license plate image, and the target object is a license plate.

As an exemplary embodiment, fig. 12 is a schematic diagram of an attribute identification model training apparatus according to one exemplary embodiment of the present disclosure. Referring to fig. 12, an attribute recognition model training apparatus 1200 is provided, and the apparatus 1200 may implement any of the methods described above with respect to figs. 8 to 10. The apparatus 1200 may include: an obtaining module 1201, configured to obtain a training sample set, where the training sample set includes at least one training sample group, and each training sample group includes a first training image carrying a first label, a second training image carrying a second label, and region position information, where the first training image is an image including a target object, the second training image is a partial region image including the target object in the first training image, and the region position information is used to represent a region position of the second training image in the first training image; and a training module 1202, configured to, for each training sample group, input the training sample group into a single initial recognition model and determine first training attribute information of the first training image and second training attribute information of the second training image; and to adjust the single initial recognition model according to the first label, the first training attribute information, the second label and the second training attribute information to determine a single first attribute recognition model.

In one possible implementation, the training module 1202 is further configured to: respectively extract features of the first training image and the second training image through different channels of the single initial recognition model to obtain first training image features and second training image features; and obtain first training attribute information through recognition based on the first training image features, and second training attribute information through recognition based on the second training image features.

In one possible implementation, the training module 1202 is further to: determining a region training sub-feature from the first training image feature based on the region position information; fusing the region training sub-features with the second training image features to obtain training fusion features; based on the training fusion characteristics, second training attribute information is determined.

In one possible implementation, the training module 1202 is further to: and adjusting the single first attribute identification model based on the region training sub-feature and the second training image feature to determine a single second attribute identification model.

The method and the device can further reduce the deployment difficulty of the model, reduce the space occupied by the model, ensure the recognition efficiency and improve the recognition accuracy.

In one possible embodiment, the first training image is a vehicle image, the second training image is a license plate image, and the target object is a license plate.

The specific manner in which the various modules perform the operations in relation to the apparatus of the present disclosure referred to above has been described in detail in relation to embodiments of the method and will not be described in detail herein.

In the technical solution of the present disclosure, the acquisition, storage, application and the like of any user personal information involved comply with the relevant laws and regulations, and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure also provides an image attribute recognition apparatus, an attribute recognition model training apparatus, a readable storage medium, and a computer program product.

Fig. 13 shows a schematic block diagram of an apparatus 1300 that may be used to implement embodiments of the present disclosure. It will be appreciated that the device 1300 may be an image attribute recognition device or an attribute recognition model training device. The apparatus 1300 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, server clusters, and other suitable computers. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in Fig. 13, the device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1302 or a computer program loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 can also store various programs and data required for the operation of the device 1300. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.

Various components in the device 1300 are connected to the I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; a storage unit 1308 such as a magnetic disk, an optical disk, or the like; and a communication unit 1309 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1309 allows the device 1300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1301 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1301 performs the methods and processes described above, such as any one of the image attribute recognition methods described in figs. 2 to 7 and/or any one of the attribute recognition model training methods described in figs. 8 to 10. For example, in some embodiments, any of these methods may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program is loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the image attribute recognition method and/or the attribute recognition model training method described above may be performed. Alternatively, in other embodiments, the computing unit 1301 may be configured in any other suitable manner (e.g., by means of firmware) to perform the image attribute recognition method described in figs. 2 to 7 and/or the attribute recognition model training method described in figs. 8 to 10.

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

According to the method and the device, the attribute identification is carried out on different images through one model, so that the model identification efficiency is improved, and the reduction of the operation rate caused by the common operation of a plurality of models is avoided.

Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain. Of course, in some examples, a server may also refer to a cluster of servers.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
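As a small illustration of steps that can run either sequentially or in parallel with the same result, the sketch below runs two independent feature extraction steps both ways. The functions `extract_first` and `extract_second` are hypothetical stand-ins for the two feature extraction branches, and the thread pool is only one assumed way of running them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def extract_first(image: List[int]) -> List[int]:
    # Stand-in for the first feature extraction branch.
    return [2 * p for p in image]


def extract_second(image: List[int]) -> List[int]:
    # Stand-in for the second feature extraction branch.
    return [p + 1 for p in image]


def run_sequential(first_image: List[int],
                   second_image: List[int]) -> Tuple[List[int], List[int]]:
    """Run the two extraction steps one after the other."""
    return extract_first(first_image), extract_second(second_image)


def run_parallel(first_image: List[int],
                 second_image: List[int]) -> Tuple[List[int], List[int]]:
    """Run the two extraction steps concurrently in a thread pool."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(extract_first, first_image)
        second_future = pool.submit(extract_second, second_image)
        return first_future.result(), second_future.result()


sequential_result = run_sequential([1, 2, 3], [4, 5])
parallel_result = run_parallel([1, 2, 3], [4, 5])
```

Because neither extraction step reads the other's output, the execution order does not affect the result; steps that consume both features, such as fusion, still have to wait for both.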

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. An image attribute identification method, comprising:

acquiring a first image, wherein the first image is an image comprising a target object;

determining a second image according to the first image, wherein the second image is a partial area image of the first image, which comprises the target object;

inputting the first image into a first feature extraction branch in a single attribute identification model, and extracting features of the first image to obtain first image features; inputting the second image into a second feature extraction branch in a single attribute identification model, and extracting features of the second image to obtain second image features;

determining a region sub-feature corresponding to a partial region of the target object in the first image from the first image feature;

fusing the region sub-features with the second image features to obtain fusion features;

and identifying and obtaining a target image attribute based on the first image feature and the fusion feature, wherein the target image attribute comprises first attribute information of the first image and second attribute information corresponding to the target object, the first attribute information is determined based on the first image feature, and the second attribute information is determined based on the fusion feature.

2. The method of claim 1, wherein the method further comprises:

determining region position information for representing a region position of the second image including the target object in the first image;

the determining the region sub-feature corresponding to the partial region of the target object in the first image from the first image feature includes:

and determining a region sub-feature from the first image feature based on the region position information.

3. The method of claim 1 or 2, wherein the determining a second image from the first image comprises:

cropping and padding the first image to determine the second image, wherein the pixel value of the second image is the same as that of the first image.

4. The method of claim 1 or 2, wherein the first image is a vehicle image, the second image is a license plate image, and the target object is a license plate.

5. A method for training an attribute identification model, comprising:

acquiring a training sample set, wherein the training sample set comprises at least one training sample group, each training sample group comprises a first training image carrying a first label, a second training image carrying a second label and region position information, the first training image is an image comprising a target object, the second training image is a partial region image comprising the target object in the first training image, and the region position information is used for representing the region position of the second training image in the first training image;

for each training sample group, inputting the training sample group into a single initial recognition model, and extracting features of the first training image through a first feature extraction branch in the single initial recognition model to obtain a first training image feature; and extracting features of the second training image through a second feature extraction branch in the single initial recognition model to obtain a second training image feature;

obtaining first training attribute information by recognition based on the first training image feature; and

determining a region training sub-feature from the first training image feature based on the region position information; fusing the region training sub-feature with the second training image feature to obtain a training fusion feature; determining second training attribute information based on the training fusion feature;

and adjusting the single initial recognition model according to the first label, the first training attribute information, the second label and the second training attribute information to determine a single first attribute recognition model.

6. The method of claim 5, wherein the method further comprises:

and adjusting the single first attribute identification model based on the region training sub-feature and the second training image feature to determine a single second attribute identification model.

7. The method of claim 5 or 6, wherein the first training image is a vehicle image, the second training image is a license plate image, and the target object is a license plate.

8. An image attribute identifying apparatus comprising:

the acquisition module is used for acquiring a first image, wherein the first image is an image comprising a target object;

the determining module is used for determining a second image according to the first image, wherein the second image is a partial area image of the first image, which comprises the target object;

the identification module is used for inputting the first image into a first feature extraction branch in a single attribute identification model, and extracting features of the first image to obtain first image features; inputting the second image into a second feature extraction branch in a single attribute identification model, and extracting features of the second image to obtain second image features; determining a region sub-feature corresponding to a partial region of the target object in the first image from the first image feature; fusing the region sub-features with the second image features to obtain fusion features; and identifying and obtaining a target image attribute based on the first image feature and the fusion feature, wherein the target image attribute comprises first attribute information of the first image and second attribute information corresponding to the target object, the first attribute information is determined based on the first image feature, and the second attribute information is determined based on the fusion feature.

9. The apparatus of claim 8, wherein the determining module is further configured to:

the identification module is also used for:

10. The apparatus of claim 8 or 9, wherein the determining module is further configured to:

11. The apparatus of claim 8 or 9, wherein the first image is a vehicle image, the second image is a license plate image, and the target object is a license plate.

12. An attribute identification model training device, comprising:

the acquisition module is used for acquiring a training sample set, wherein the training sample set comprises at least one training sample group, each training sample group comprises a first training image carrying a first label, a second training image carrying a second label and region position information, the first training image is an image comprising a target object, the second training image is a partial region image comprising the target object in the first training image, and the region position information is used for representing the region position of the second training image in the first training image;

the training module is used for, for each training sample group, inputting the training sample group into a single initial recognition model, and extracting features of the first training image through a first feature extraction branch in the single initial recognition model to obtain a first training image feature; extracting features of the second training image through a second feature extraction branch in the single initial recognition model to obtain a second training image feature; obtaining first training attribute information by recognition based on the first training image feature; determining a region training sub-feature from the first training image feature based on the region position information; fusing the region training sub-feature with the second training image feature to obtain a training fusion feature; determining second training attribute information based on the training fusion feature; and adjusting the single initial recognition model according to the first label, the first training attribute information, the second label and the second training attribute information to determine a single first attribute recognition model.

13. The apparatus of claim 12, wherein the training module is further configured to:

14. The apparatus of claim 12 or 13, wherein the first training image is a vehicle image, the second training image is a license plate image, and the target object is a license plate.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.