CN114255449A - Image processing method, image processing device, electronic equipment and storage medium - Google Patents

Image processing method, image processing device, electronic equipment and storage medium

Info

Publication number
CN114255449A
CN114255449A (application CN202111562224.8A)
Authority
CN
China
Prior art keywords
image
local
feature
determining
contrast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111562224.8A
Other languages
Chinese (zh)
Inventor
王学占
孔德超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111562224.8A
Publication of CN114255449A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the technical field of artificial intelligence, and further relates to the technical fields of image processing and edge computing. The implementation scheme is as follows: acquiring an image to be recognized including a first object and a contrast image including a second object; determining a first local distance between a first image and a second image corresponding to the first feature; determining a second local distance between a third image and a fourth image corresponding to the second feature; determining a first attention parameter for the first local distance based on the first weight, the first area of the first image, and the second area of the second image; determining a second attention parameter for the second local distance based on the second weight, a third area of the third image, and a fourth area of the fourth image; and determining an object similarity between the first object and the second object based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and further relates to the field of image processing and edge computing technologies, and in particular, to an image processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), involving technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides an image processing method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided an image processing method including: acquiring an image to be recognized including a first object and a comparison image including a second object; determining a first local distance between a first image corresponding to a first feature of the first object and a second image corresponding to the first feature of the second object; determining a second local distance between a third image corresponding to a second feature of the first object and a fourth image corresponding to the second feature of the second object, wherein the image to be recognized comprises the first image and the third image, and the comparison image comprises the second image and the fourth image; determining a first attention parameter for the first local distance based on a first weight, a first area of the first image, and a second area of the second image; determining a second attention parameter for the second local distance based on a second weight, a third area of the third image, and a fourth area of the fourth image; and determining an object similarity between the first object and the second object based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: an acquisition unit configured to acquire an image to be recognized including a first object and a comparison image including a second object; a first local distance determination unit configured to determine a first local distance between a first image corresponding to a first feature of the first object and a second image corresponding to the first feature of the second object; a second local distance determination unit configured to determine a second local distance between a third image corresponding to a second feature of the first object and a fourth image corresponding to the second feature of the second object, wherein the image to be recognized includes the first image and the third image, and the comparison image includes the second image and the fourth image; a first attention parameter determination unit configured to determine a first attention parameter of the first local distance based on a first weight, a first area of the first image, and a second area of the second image; a second attention parameter determination unit configured to determine a second attention parameter of the second local distance based on a second weight, a third area of the third image, and a fourth area of the fourth image; a similarity determination unit configured to determine an object similarity between the first object and the second object based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as previously described.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method as described above.
According to another aspect of the disclosure, a computer program product is provided, comprising a computer program, wherein the computer program realizes the method as described before when executed by a processor.
According to one or more embodiments of the present disclosure, different weights may be set for different features of an object present in an image, so that the image information of different features contributes at different rates to the similarity result. This increases the contribution of effective information in the image to the similarity result and thereby improves the accuracy of the similarity determined for the objects in two images.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 shows an exemplary flow diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 3 illustrates an exemplary flow of an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows an exemplary block diagram of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the image processing methods provided by the present disclosure to be performed.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may use the client device 101, 102, 103, 104, 105, and/or 106 to obtain images and preset parameters used in the methods of the present disclosure. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptops), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors, monitors, or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems, such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, WiFi), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, midrange servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architectures involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The databases 130 may reside in various locations. For example, a database used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with the server 120 via a network-based or dedicated connection. The databases 130 may be of different types. In some embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Fig. 2 illustrates an exemplary flowchart of an image processing method according to an embodiment of the present disclosure. The method 200 shown in fig. 2 may be performed by a client or server shown in fig. 1.
As shown in fig. 2, in step S202, an image to be recognized including a first object and a contrast image including a second object may be acquired.
In step S204, a first local distance between a first image corresponding to a first feature of a first object and a second image corresponding to a first feature of a second object may be determined.
In step S206, a second local distance between the third image corresponding to the second feature of the first object and the fourth image corresponding to the second feature of the second object may be determined.
In step S208, a first attention parameter for the first local distance may be determined based on the first weight, the first area of the first image, and the second area of the second image.
In step S210, a second attention parameter for a second local distance may be determined based on the second weight, a third area of the third image, and a fourth area of the fourth image.
In step S212, an object similarity between the first object and the second object may be determined based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.
By using one or more embodiments of the present disclosure, different weights may be set for different features of an object present in an image, so that the image information of different features contributes at different rates to the similarity result. This increases the contribution of effective information in the image to the similarity result and thereby improves the accuracy of the similarity determined for the objects in two images.
The principles of the present disclosure will be described in detail below.
In step S202, an image to be recognized including a first object and a contrast image including a second object may be acquired.
The image to be recognized and the contrast image can be images acquired by image acquisition devices at different positions. For example, the image to be recognized and the comparison image may be images acquired by monitoring devices installed at different positions. At least one object, such as a vehicle, a person, or an animal, may be included in each image. In some embodiments, the first object and the second object may be vehicles or any other type of object. It is understood that, due to differences in installation positions and image capturing times, even if the first object and the second object are the same object, characteristics such as the shapes and sizes of the first object and the second object may differ across images captured by different image capturing devices. Taking a road monitoring system as an example, when the same vehicle travels on different roads, different monitoring devices can acquire different images that include the same vehicle. In some cases, a clear license plate number may not be captured in the image, which makes it difficult to directly determine whether the vehicles captured in different images are the same vehicle. Therefore, in some cases, an image recognition method is needed to determine whether the vehicles present in two images are the same vehicle. In the present disclosure, the principles will be described taking a vehicle as an example of the object present in an image; however, it is understood that the methods provided by the present disclosure may also be used to identify other types of objects, depending on the actual situation, without departing from the principles of the present disclosure.
In step S204, a first local distance between a first image corresponding to a first feature of a first object and a second image corresponding to a first feature of a second object may be determined.
In step S206, a second local distance between the third image corresponding to the second feature of the first object and the fourth image corresponding to the second feature of the second object may be determined.
In some embodiments, the image to be recognized may be processed based on image segmentation to obtain image regions of a plurality of features of the first object in the image to be recognized. Similarly, the contrast image may be processed based on image segmentation to obtain image regions of a plurality of corresponding features of the second object in the contrast image. In the case where the first object and the second object are vehicles, the characteristic of the object may include at least one of a vehicle head, a vehicle side, a vehicle tail, and a vehicle roof. The present disclosure is not limited to a specific form of image segmentation algorithm, and any image segmentation method (e.g., a pre-trained U-net network or any other available image segmentation algorithm) may be used by those skilled in the art to achieve the image segmentation.
In some implementations, the first local distance and the second local distance may be determined based on feature maps of the image to be recognized and the comparison image. The image to be recognized and the comparison image can be processed respectively based on any image feature extraction mode (such as a pre-trained convolutional neural network) to determine feature maps of the image to be recognized and the comparison image. Hereinafter, the feature map of the image to be recognized is referred to as a feature map to be recognized, and the feature map of the comparison image is referred to as a comparison feature map.
The first local distance may be determined based on a portion of the feature map within the image region corresponding to the first feature in the image to be recognized and a portion of the feature map within the image region corresponding to the first feature in the comparison image. For example, the image to be recognized may be subjected to image segmentation to obtain a first local mask to be recognized corresponding to a first feature of the first object. The first to-be-recognized local mask may be a binary map indicating a position of the first feature of the first object in the to-be-recognized image. For example, a pixel value of "1" may indicate that the location to which the pixel corresponds belongs to an image region of the first feature of the first object, while a pixel value of "0" may indicate that the location to which the pixel corresponds does not belong to an image region of the first feature of the first object. In some examples, the size of the first to-be-identified local mask and the size of the feature map of the to-be-identified image may be the same. The feature map to be recognized may be pooled based on the first local mask to be recognized to obtain a first local feature to be recognized of the first image. For example, the first to-be-identified local mask of the same size may be multiplied by the corresponding pixel in the feature map of the to-be-identified image, thereby obtaining information of the feature map within the image area of the first feature of the first object. Further, feature maps within an image area of a first feature of the first object may be pooled to obtain a first to-be-identified local feature of the first image. Similarly, the contrast image may be image segmented to obtain a first contrast local mask corresponding to the first feature of the second object. Wherein the first comparison local mask may be a binary map indicating a location of the first feature of the second object in the comparison image. For example, a pixel value of "1" may indicate that the location to which the pixel corresponds belongs to an image region of the first feature of the second object, while a pixel value of "0" may indicate that the location to which the pixel corresponds does not belong to an image region of the first feature of the second object. In some examples, the size of the first contrast local mask and the size of the feature map of the contrast image may be the same. The contrast feature map may be pooled based on the first contrast local mask to obtain first contrast local features of the second image. For example, the first comparison local mask of the same size and the corresponding pixel in the feature map of the comparison image may be multiplied to obtain information of the feature map within the image area of the first feature of the second object. Further, the feature map within the image region of the first feature of the second object may be pooled to obtain a first contrast local feature of the second image. The first local distance may be determined on the basis of the first local feature to be identified and the first comparison local feature. For example, a euclidean distance between the first to-be-identified local feature and the first comparison local feature may be calculated as the first local distance.
Based on a similar method, the second local distance may be determined based on a portion of the feature map within the image region of the second feature in the image to be recognized and a portion of the feature map within the image region of the second feature in the comparison image. For example, the image to be recognized may be subjected to image segmentation to obtain a second local mask to be recognized corresponding to the second feature of the first object. Wherein the second to-be-recognized local mask may be a binary map indicating a position of the image region of the second feature of the first object in the image to be recognized. For example, a pixel value of "1" may indicate that the location to which the pixel corresponds belongs to an image region of the second feature of the first object, while a pixel value of "0" may indicate that the location to which the pixel corresponds does not belong to an image region of the second feature of the first object. In some examples, the size of the second to-be-identified local mask and the size of the feature map of the to-be-identified image may be the same. The feature map to be recognized may be pooled based on the second local mask to be recognized to obtain a second local feature to be recognized of the third image. For example, a second local mask to be identified of the same size may be multiplied by a corresponding pixel in the feature map of the image to be identified, thereby obtaining information of the feature map within the image region of the second feature of the first object. Further, the feature maps within the image area of the second feature of the first object may be pooled to obtain a second local feature to be identified of the third image. Similarly, the comparison image may be image-segmented to obtain a second comparison local mask corresponding to a second feature of the second object. Wherein the second comparison local mask may be a binary map indicating a location of an image region of the second feature of the second object in the comparison image. For example, a pixel value of "1" may indicate that the location to which the pixel corresponds belongs to an image region of a second feature of the second object, and a pixel value of "0" may indicate that the location to which the pixel corresponds does not belong to an image region of the second feature of the second object. In some examples, the size of the second contrast local mask and the size of the feature map of the contrast image may be the same. The comparison feature map may be pooled based on the second comparison local mask to obtain second comparison local features of the fourth image. For example, a second comparison local mask of the same size may be multiplied by the corresponding pixel in the feature map of the comparison image to obtain information of the feature map within the image region of the second feature of the second object. Further, the feature maps within the image region of the second feature of the second object may be pooled to obtain a second contrast local feature of the fourth image. The second local distance may be determined based on the second local feature to be identified and the second comparison local feature. For example, a euclidean distance between the second local feature to be identified and the second comparison local feature may be calculated as the second local distance.
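As an illustration of the mask-based pooling and distance computation described in the two paragraphs above, the following Python sketch averages a feature map over the pixels selected by a binary mask and then takes the Euclidean distance between the two resulting local features. This is a minimal sketch rather than the patented implementation: the array shapes, the choice of average pooling, and the random placeholder data are assumptions.

```python
import numpy as np

def masked_avg_pool(feature_map: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average a C x H x W feature map over the pixels where the H x W mask is 1."""
    area = mask.sum()
    if area == 0:
        # The feature is not visible in this image: fall back to a zero vector.
        return np.zeros(feature_map.shape[0])
    # Broadcast the mask over the channel axis, then average over the masked region.
    return (feature_map * mask).sum(axis=(1, 2)) / area

# Feature maps of the image to be recognized (p) and the comparison image (q),
# e.g. produced by a pretrained convolutional backbone.
C, H, W = 256, 16, 16
rng = np.random.default_rng(0)
fmap_p, fmap_q = rng.random((C, H, W)), rng.random((C, H, W))

# Binary masks locating the first feature (e.g. the roof) in each image,
# as produced by an image segmentation step.
mask_p = np.zeros((H, W)); mask_p[:4, :] = 1.0
mask_q = np.zeros((H, W)); mask_q[:5, :] = 1.0

# First to-be-recognized local feature and first comparison local feature.
local_p = masked_avg_pool(fmap_p, mask_p)
local_q = masked_avg_pool(fmap_q, mask_q)

# First local distance (steps S204/S206): Euclidean distance between them.
first_local_distance = np.linalg.norm(local_p - local_q)
```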
In step S208, a first attention parameter for the first local distance may be determined based on the first weight, the first area of the first image, and the second area of the second image.
In step S210, a second attention parameter for a second local distance may be determined based on the second weight, a third area of the third image, and a fourth area of the fourth image.
By setting different weights for the first and second features, corresponding attention parameters may be determined for the first and second local distances, respectively, thereby determining a degree of contribution of the first and second local distances to the object similarity between the first and second objects.
In some embodiments, the first feature may be a roof and the second feature may be one of a nose, a side, and a tail. In this case, the first weight may be smaller than the second weight. For example, the first weight may be one-half of the second weight. If the value of the second weight is set to 1, the value of the first weight is set to 0.5. With this setting method, different weights can be given to the image information of the first feature and the second feature.
In some embodiments, the first attention parameter may be determined based on the first weight and the first area. For example, the first attention parameter may be determined based on a product of a first weight and a first area of the first image, and a product of the first weight and a second area of the second image. In some examples, the first area may be obtained by summing pixel values of the first to-be-identified local mask. Similarly, the second area may be obtained by summing pixel values of the first comparison local mask.
In a similar manner, the second attention parameter may be determined based on the second weight and an area of the image region to which the second feature corresponds. For example, the second attention parameter may be determined based on a product of the second weight and a third area of the third image, and a product of the second weight and a fourth area of the fourth image. In some examples, the third area may be obtained by summing pixel values of the second to-be-identified local mask. Similarly, the fourth area may be obtained by summing pixel values of the second contrast local mask.
In some implementations, the values of the first attention parameter and the second attention parameter may be further determined based on the following equation (1):

$$a_i^{pq} = \frac{k_i\left(S_i^p + S_i^q\right)}{\sum_{j=1}^{N} k_j\left(S_j^p + S_j^q\right)} \qquad (1)$$

where $i$ is an index coefficient, with $i = 1$ for the first feature and $i = 2$ for the second feature; $a_1^{pq}$ may represent the first attention parameter and $a_2^{pq}$ the second attention parameter; $p$ is the index of the image to be recognized and $q$ is the index of the comparison image; $S_1^p$ may represent the first area of the first image, $S_1^q$ the second area of the second image, $S_2^p$ the third area of the third image, and $S_2^q$ the fourth area of the fourth image; $k_1$ may represent the first weight, $k_2$ the second weight; and $N$ may represent the total number of defined features.
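The computation of equation (1) can be sketched in a few lines of Python. Note that the normalized form of equation (1) shown above is itself a reconstruction from the surrounding variable definitions, so the exact formula here is an assumption; the per-feature weights and mask areas are placeholders.

```python
import numpy as np

def attention_parameters(weights, areas_p, areas_q):
    """Equation (1): a_i = k_i * (S_i^p + S_i^q) / sum_j k_j * (S_j^p + S_j^q)."""
    w = np.asarray(weights, dtype=float)
    numerators = w * (np.asarray(areas_p, dtype=float) + np.asarray(areas_q, dtype=float))
    return numerators / numerators.sum()

# Roof, head, side, tail: the roof weighted at one-half of the other features,
# matching the example where the first weight is 0.5 and the second weight is 1.
k = [0.5, 1.0, 1.0, 1.0]
areas_p = [64, 120, 300, 10]   # pixel sums of the local masks in image p
areas_q = [80, 100, 280, 40]   # pixel sums of the local masks in image q
a = attention_parameters(k, areas_p, areas_q)   # a_1 .. a_4, summing to 1
```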
In step S212, an object similarity between the first object and the second object may be determined based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.
In some embodiments, the first local similarity may be determined based on a product of the first attention parameter and the first local distance, and the second local similarity may be determined based on a product of the second attention parameter and the second local distance. The first local similarity may indicate a similarity between the first feature in the image to be recognized and the first feature in the comparison image, and the second local similarity may indicate a similarity between the second feature in the image to be recognized and the second feature in the comparison image.
An object similarity between the first object and the second object may be determined based on the first local similarity and the second local similarity. For example, the first local similarity and the second local similarity may be summed, and the sum of the first local similarity and the second local similarity is determined as the object similarity between the first object and the second object. In the case where more than two features are present in the first object and the second object, local similarities of more features may be determined based on similar methods, and all the local similarities are summed to determine the object similarity between the first object and the second object.
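A minimal sketch of this combination step, under the reading that each local similarity is the product of an attention parameter and the corresponding local distance and that the object similarity is their sum (all numeric values are placeholders):

```python
import numpy as np

# Attention parameters and local distances for the defined features
# (placeholder values; in practice these come from the preceding steps).
attention = np.array([0.10, 0.28, 0.55, 0.07])
local_distances = np.array([0.8, 1.3, 0.6, 2.1])

# Local similarities: element-wise product of attention parameter and distance.
local_similarities = attention * local_distances

# Object similarity between the first object and the second object.
object_similarity = local_similarities.sum()
```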
Further, a global similarity between the image to be recognized and the comparison image may also be determined, and the object similarity between the first object and the second object may be determined based on both the global similarity and the above-described local similarities. For example, the object similarity between the first object and the second object may be determined based on the sum of the global similarity, the first local similarity, and the second local similarity (where more features are present, on the sum of the global similarity and all local similarities).
In some implementations, a global feature to be recognized of the image to be recognized may be determined based on the feature map to be recognized, a contrast global feature of the contrast image may be determined based on the contrast feature map, and a global similarity between the image to be recognized and the contrast image may be determined based on the global feature to be recognized and the contrast global feature. The global features to be identified can be determined by performing global pooling on the feature map to be identified. Similarly, the comparison global features may be determined by global pooling of the comparison feature map. Further, the global feature to be recognized can be processed by using a fully connected network to obtain the identity recognition feature of the image to be recognized. Similarly, the comparison global features may be processed using a fully connected network to obtain identification features of the comparison image. The global similarity may be determined based on euclidean distances between the identification features of the image to be identified and the identification features of the comparison image.
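The global branch just described can be sketched as follows: global average pooling of a feature map, a fully connected projection to an identity recognition feature, and a Euclidean distance between the two identity features. The feature and projection dimensions and the random weights are assumptions; in practice the fully connected layer would be trained.

```python
import numpy as np

def identity_feature(feature_map, fc_weight, fc_bias):
    """Global average pooling followed by a fully connected projection."""
    global_feature = feature_map.mean(axis=(1, 2))   # C-dimensional vector
    return fc_weight @ global_feature + fc_bias      # identity recognition feature

C, H, W, D = 256, 16, 16, 128
rng = np.random.default_rng(0)
fmap_p, fmap_q = rng.random((C, H, W)), rng.random((C, H, W))

# A shared fully connected layer (placeholder weights standing in for trained ones).
fc_w = rng.standard_normal((D, C)) * 0.01
fc_b = np.zeros(D)

id_p = identity_feature(fmap_p, fc_w, fc_b)
id_q = identity_feature(fmap_q, fc_w, fc_b)

# Global similarity: Euclidean distance between the two identity features.
global_similarity = np.linalg.norm(id_p - id_q)
```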
Whether the first object and the second object belong to the same object may be determined based on the object similarity between them. For example, based on a predetermined object similarity threshold, the first object and the second object may be considered the same object when the object similarity exceeds that threshold. As another example, the method 200 may be used to process the image to be recognized against a plurality of candidate comparison images and determine, from among them, at least one second object that is most similar to the first object in the image to be recognized.
By the method provided by the disclosure, information from different features of the object present in an image can have different influences on the object similarity. Taking a vehicle as an example, the image information provided by the roof region is of limited use for distinguishing different vehicles. In this case, appropriately reducing the weight of the roof region reduces the influence of roof information on the object similarity, emphasizes the image information of the other features, and improves the accuracy of the object similarity. In a comparison experiment, setting the roof-region weight to 0.5 and the other region weights to 1 improved the accuracy of the object similarity by about 1%.
Fig. 3 illustrates an exemplary flow of an image processing method according to an embodiment of the present disclosure. The processing flow shown in fig. 3 may be applied to either the image to be recognized or the comparison image, each of which includes a vehicle.
In operation 301, feature extraction may be performed on the input image 320 (i.e., the image to be recognized or the contrast image) to obtain a feature map 321. For example, the input image 320 may be processed using various trained convolutional neural networks for extracting semantic features of the image to obtain a feature map 321.
In operation 302, image segmentation may be performed on the input image 320 to obtain masks indicating the image regions of four features of the object present in the input image. The input image 320 may be processed using any trained image segmentation network to obtain the image masks 322. The image masks 322 may include a first mask for the roof region, a second mask for the head region, a third mask for the side region, and a fourth mask for the tail region.
In operation 303, local features 323 may be obtained by mask average pooling of the feature map 321 with the masks of the respective feature image regions, where the local features 323 may include a roof feature of the roof region, a head feature of the head region, a side feature of the side region, and a tail feature of the tail region.
In operation 304, local similarities 324 may be determined based on the local features 323 and the attention parameters a1, a2, a3, a4, where the local similarities 324 include a first local similarity of the roof region, a second local similarity of the head region, a third local similarity of the side region, and a fourth local similarity of the tail region. For example, a first local distance between the roof feature of the image to be recognized and the roof feature of the comparison image can be determined, and the first local similarity of the roof region can be determined from it based on the attention parameter a1. The local similarities of the other features may be determined in a similar manner. The attention parameter a1 is determined, using the method described in step S208, based on the roof mask of the image to be recognized, the roof mask of the comparison image, and the first weight of the roof region. Similarly, the attention parameter a2 may be determined based on the head mask of the image to be recognized, the head mask of the comparison image, and the second weight of the head region; the attention parameter a3 based on the vehicle-side mask of the image to be recognized, the vehicle-side mask of the comparison image, and the third weight of the side region; and the attention parameter a4 based on the tail mask of the image to be recognized, the tail mask of the comparison image, and the fourth weight of the tail region.
In operation 308, global features 325 of the input image 320 may be determined based on the feature map 321. For example, the feature map may be processed by global mean pooling, normalization, etc. to obtain the global features 325.
In operation 309, the global features 325 may be processed to obtain the identification features 326 of the input image 320.
In operation 310, a global similarity 327 may be determined based on the identification features 326. For example, the global similarity 327 may be determined based on a euclidean distance between the identification features of the image to be identified and the identification features of the comparison image.
In operation 311, an object similarity 328 between the first object in the image to be recognized and the second object in the compared image may be determined based on the global similarity 327 and the local similarity 324. For example, the sum of the global similarity 327 and the local similarity 324 may be determined as the object similarity 328.
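Putting the pieces of fig. 3 together, the following sketch strings operations 303 through 311 into a single comparison function. It is a simplified composition of the earlier snippets under the same assumptions (average pooling, the reconstructed equation (1), distance-based local terms), not the patented implementation:

```python
import numpy as np

def compare_objects(fmap_p, masks_p, fmap_q, masks_q, weights, fc_w, fc_b):
    """Operations 303-311: attention-weighted local terms plus a global term."""
    weights = np.asarray(weights, dtype=float)

    # Operation 303: mask average pooling of each feature region.
    def pool(fmap, mask):
        area = mask.sum()
        return (fmap * mask).sum(axis=(1, 2)) / area if area else np.zeros(fmap.shape[0])

    # Operation 304: attention parameters (equation (1)) and local similarities.
    areas_p = np.array([m.sum() for m in masks_p], dtype=float)
    areas_q = np.array([m.sum() for m in masks_q], dtype=float)
    attention = weights * (areas_p + areas_q)
    attention /= attention.sum()
    local_term = sum(
        attention[i] * np.linalg.norm(pool(fmap_p, masks_p[i]) - pool(fmap_q, masks_q[i]))
        for i in range(len(weights))
    )

    # Operations 308-310: global pooling, identity features, global similarity.
    id_p = fc_w @ fmap_p.mean(axis=(1, 2)) + fc_b
    id_q = fc_w @ fmap_q.mean(axis=(1, 2)) + fc_b
    global_term = np.linalg.norm(id_p - id_q)

    # Operation 311: object similarity as the sum of the global and local terms.
    return global_term + local_term
```

Given feature maps and per-feature masks for the two images, the returned value can then be compared against a predetermined object similarity threshold, as described below.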
Fig. 4 illustrates an exemplary block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the image processing apparatus 400 may include an acquisition unit 410, a first local distance determination unit 420, a second local distance determination unit 430, a first attention parameter determination unit 440, a second attention parameter determination unit 450, and a similarity determination unit 460.
Wherein the acquiring unit 410 may be configured to acquire an image to be identified comprising a first object and a contrast image comprising a second object. The first local distance determination unit 420 may be configured to determine a first local distance between a first image corresponding to a first feature of the first object and a second image corresponding to the first feature of the second object. The second local distance determination unit 430 may be configured to determine a second local distance between a third image corresponding to a second feature of the first object and a fourth image corresponding to the second feature of the second object, wherein the image to be identified comprises the first image and the third image, and the comparison image comprises the second image and the fourth image. The first attention parameter determining unit 440 may be configured to determine the first attention parameter for the first local distance based on a first weight, a first area of the first image and a second area of the second image. The second attention parameter determination unit 450 may be configured to determine a second attention parameter for the second local distance based on a second weight, a third area of the third image, and a fourth area of the fourth image. The similarity determination unit 460 may be configured to determine an object similarity between the first object and the second object based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.
In some embodiments, the first local distance determination unit 420 may be configured to: performing image segmentation on the image to be recognized to obtain a first local mask to be recognized corresponding to the first feature of the first object; determining a feature map to be recognized of the image to be recognized; pooling the feature map to be identified based on the first local mask to be identified to obtain a first local feature to be identified of the first image; performing image segmentation on the comparison image to obtain a first comparison local mask corresponding to the first feature of the second object; determining a contrast feature map of the contrast image; pooling the contrast feature map based on the first contrast local mask to obtain first contrast local features of the second image; determining the first local distance based on the first to-be-identified local feature and the first comparison local feature.
In some embodiments, the second local distance determination unit 430 may be configured to: performing image segmentation on the image to be recognized to obtain a second local mask to be recognized corresponding to the second feature of the first object; determining a feature map to be recognized of the image to be recognized; pooling the feature map to be identified based on the second local mask to be identified to obtain a second local feature to be identified of the third image; performing image segmentation on the comparison image to obtain a second comparison local mask corresponding to the second feature of the second object; determining a contrast feature map of the contrast image; pooling the contrast feature map based on the second contrast local mask to obtain second contrast local features of the fourth image; determining the second local distance based on the second local feature to be identified and the second comparison local feature.
In some embodiments, the first attention parameter determination unit 440 may be configured to: determining the first attention parameter based on a product of the first weight and the first area, and a product of the first weight and the second area.
In some embodiments, the second attention parameter determination unit 450 may be configured to: determining the second attention parameter based on a product of the second weight and a third area, and a product of the second weight and the fourth area.
In some embodiments, the similarity determination unit 460 may be configured to: determining a first local similarity based on a product of the first attention parameter and the first local distance; determining a second local similarity based on a product of the second attention parameter and the second local distance; determining global features to be identified of the image to be identified based on the feature map to be identified; determining a contrast global feature of the contrast image based on the contrast feature map; determining a global similarity between the image to be identified and the comparison image based on the global feature to be identified and the comparison global feature; and determining the object similarity based on a sum of the global similarity, the first local similarity, and the second local similarity.
In some embodiments, the first object and the second object may be vehicles.
In some embodiments, the first feature may be a roof and the second feature may be one of a nose, a side, and a tail.
In some embodiments, the first weight may be less than the second weight.
In some embodiments, the first weight may be one-half of the second weight.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the users involved comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
According to an embodiment of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to one or more embodiments of the present disclosure.
There is also provided, in accordance with an embodiment of the present disclosure, a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to perform a method in accordance with one or more embodiments of the present disclosure.
There is also provided, in accordance with an embodiment of the present disclosure, a computer program product, including a computer program, wherein the computer program, when executed by a processor, implements a method in accordance with one or more embodiments of the present disclosure.
Referring to fig. 5, a block diagram of an electronic device 500, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. It may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the electronic device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 may be any type of device capable of inputting information to the electronic device 500; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a trackball, a joystick, a microphone, and/or a remote controller. The output unit 507 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 508 may include, but is not limited to, magnetic disks and optical disks. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the various methods and processes described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into RAM 503 and executed by the computing unit 501, one or more steps of the method 200 described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method 200 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the claims as issued and their equivalents. Various elements in the embodiments or examples may be omitted or replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. It is important to note that, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims (23)

1. An image processing method comprising:
acquiring an image to be recognized including a first object and a contrast image including a second object;
determining a first local distance between a first image corresponding to a first feature of the first object and a second image corresponding to the first feature of the second object;
determining a second local distance between a third image corresponding to a second feature of the first object and a fourth image corresponding to the second feature of the second object, wherein the image to be recognized comprises the first image and the third image, and the contrast image comprises the second image and the fourth image;
determining a first attention parameter for the first local distance based on a first weight, a first area of the first image, and a second area of the second image;
determining a second attention parameter for the second local distance based on a second weight, a third area of the third image, and a fourth area of the fourth image;
determining an object similarity between the first object and the second object based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.
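For readers tracing the claim language, the following is a minimal sketch, in Python, of the flow recited in claim 1. Everything concrete in it is an assumption for illustration: the claim does not fix how the weight-area products are combined into an attention parameter or how the scaled local distances are aggregated, so the sketch assumes a sum of the two weight-area products (consistent with claims 5 and 6) and an attention-weighted summation of the local distances (consistent with claim 4). All names are hypothetical.

def attention_parameter(weight: float, area_a: float, area_b: float) -> float:
    # Assumed combination: sum of the two weight-area products named in
    # claims 5 and 6 (the claims do not fix the combination rule).
    return weight * area_a + weight * area_b

def object_similarity(local_dist_1: float, local_dist_2: float,
                      weight_1: float, area_1: float, area_2: float,
                      weight_2: float, area_3: float, area_4: float) -> float:
    # First attention parameter from the first weight and the areas of
    # the first and second images.
    att_1 = attention_parameter(weight_1, area_1, area_2)
    # Second attention parameter from the second weight and the areas of
    # the third and fourth images.
    att_2 = attention_parameter(weight_2, area_3, area_4)
    # Assumed aggregation: the attention-scaled local distances are
    # summed, mirroring the local terms of claim 4.
    return att_1 * local_dist_1 + att_2 * local_dist_2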
2. The image processing method of claim 1, wherein determining the first local distance between the first image and the second image comprises:
performing image segmentation on the image to be recognized to obtain a first local mask to be recognized corresponding to the first feature of the first object;
determining a feature map to be recognized of the image to be recognized;
pooling the feature map to be recognized based on the first local mask to be recognized to obtain a first local feature to be recognized of the first image;
performing image segmentation on the contrast image to obtain a first contrast local mask corresponding to the first feature of the second object;
determining a contrast feature map of the contrast image;
pooling the contrast feature map based on the first contrast local mask to obtain a first contrast local feature of the second image;
determining the first local distance based on the first local feature to be recognized and the first contrast local feature.
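A hedged sketch of claim 2's mask-guided pooling follows, using NumPy. The segmentation network and the pooling rule are not specified by the claim; the sketch assumes binary part masks are already available, uses mask-weighted average pooling over an (H, W, C) feature map, and assumes a Euclidean distance between the pooled vectors. All names and shapes are illustrative.

import numpy as np

def mask_pool(feature_map: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Pool an (H, W, C) feature map over an (H, W) binary part mask.

    Average pooling restricted to the masked region is an assumption;
    the claim only requires pooling "based on" the local mask.
    """
    weights = mask.astype(np.float32)[..., None]             # (H, W, 1)
    denom = max(float(weights.sum()), 1e-6)                  # guard empty masks
    return (feature_map * weights).sum(axis=(0, 1)) / denom  # (C,)

def local_distance(feature_a: np.ndarray, feature_b: np.ndarray) -> float:
    # Assumed metric: Euclidean distance between pooled local features.
    return float(np.linalg.norm(feature_a - feature_b))

# Illustrative shapes: 32x32 feature maps with 256 channels per image.
fmap_query = np.random.rand(32, 32, 256)     # feature map to be recognized
fmap_contrast = np.random.rand(32, 32, 256)  # contrast feature map
mask_query = np.zeros((32, 32)); mask_query[4:12, 8:24] = 1        # first local mask
mask_contrast = np.zeros((32, 32)); mask_contrast[5:13, 7:23] = 1  # first contrast mask

d1 = local_distance(mask_pool(fmap_query, mask_query),
                    mask_pool(fmap_contrast, mask_contrast))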
3. The image processing method of claim 1, wherein determining the second local distance between the third image and the fourth image comprises:
performing image segmentation on the image to be recognized to obtain a second local mask to be recognized corresponding to the second feature of the first object;
determining a feature map to be recognized of the image to be recognized;
pooling the feature map to be recognized based on the second local mask to be recognized to obtain a second local feature to be recognized of the third image;
performing image segmentation on the contrast image to obtain a second contrast local mask corresponding to the second feature of the second object;
determining a contrast feature map of the contrast image;
pooling the contrast feature map based on the second contrast local mask to obtain a second contrast local feature of the fourth image;
determining the second local distance based on the second local feature to be recognized and the second contrast local feature.
4. The image processing method of claim 2 or 3, wherein determining an object similarity between the first object and the second object based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter comprises:
determining a first local similarity based on a product of the first attention parameter and the first local distance;
determining a second local similarity based on a product of the second attention parameter and the second local distance;
determining a global feature to be recognized of the image to be recognized based on the feature map to be recognized;
determining a contrast global feature of the contrast image based on the contrast feature map;
determining a global similarity between the image to be recognized and the contrast image based on the global feature to be recognized and the contrast global feature; and
determining the object similarity based on a sum of the global similarity, the first local similarity, and the second local similarity.
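Claim 4 combines a global similarity with the two attention-scaled local terms. As a sketch, assume cosine similarity between globally pooled features and a plain (unweighted) sum of the three terms; the claim fixes only that the object similarity is based on that sum, so both choices are assumptions.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Assumed global metric; the claim does not name one.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-6))

def combined_similarity(global_query: np.ndarray, global_contrast: np.ndarray,
                        att_1: float, local_dist_1: float,
                        att_2: float, local_dist_2: float) -> float:
    global_sim = cosine_similarity(global_query, global_contrast)
    local_sim_1 = att_1 * local_dist_1   # first local similarity (claim 4)
    local_sim_2 = att_2 * local_dist_2   # second local similarity (claim 4)
    return global_sim + local_sim_1 + local_sim_2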
5. The image processing method of claim 1, wherein determining the first attention parameter for the first local distance based on the first weight, the first area of the first image, and the second area of the second image comprises:
determining the first attention parameter based on a product of the first weight and the first area, and a product of the first weight and the second area.
6. The image processing method of claim 1, wherein determining a second attention parameter for the second local distance based on a second weight, a third area of the third image, and a fourth area of the fourth image comprises:
determining the second attention parameter based on a product of the second weight and the third area, and a product of the second weight and the fourth area.
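Claims 5 and 6 define each attention parameter from two weight-area products. A short worked example follows; the pixel areas are hypothetical, and combining the two products by summation is an assumption, since the claims name the products but not the combination rule.

# Hypothetical pixel areas for the four part images.
w1, area_1, area_2 = 0.5, 120.0, 110.0   # first weight and the two roof areas
w2, area_3, area_4 = 1.0, 300.0, 280.0   # second weight and the other part areas

att_1 = w1 * area_1 + w1 * area_2   # 0.5 * 120 + 0.5 * 110 = 115.0
att_2 = w2 * area_3 + w2 * area_4   # 1.0 * 300 + 1.0 * 280 = 580.0

With these values the roof term is down-weighted relative to the other part term, matching claims 9 and 10 below (the first weight is smaller than, and here exactly one-half of, the second weight).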
7. The image processing method of any of claims 1 to 6, wherein the first object and the second object are vehicles.
8. The image processing method of claim 7, wherein the first feature is a roof and the second feature is one of a nose, a side, and a tail.
9. The image processing method of claim 8, wherein the first weight is smaller than the second weight.
10. The image processing method of claim 9, wherein the first weight is one-half of the second weight.
11. An image processing apparatus comprising:
an acquisition unit configured to acquire an image to be recognized including a first object and a contrast image including a second object;
a first local distance determination unit configured to determine a first local distance between a first image corresponding to a first feature of the first object and a second image corresponding to the first feature of the second object;
a second local distance determination unit configured to determine a second local distance between a third image corresponding to a second feature of the first object and a fourth image corresponding to the second feature of the second object, wherein the image to be recognized includes the first image and the third image, and the contrast image includes the second image and the fourth image;
a first attention parameter determination unit configured to determine a first attention parameter of the first local distance based on a first weight, a first area of the first image, and a second area of the second image;
a second attention parameter determination unit configured to determine a second attention parameter of the second local distance based on a second weight, a third area of the third image, and a fourth area of the fourth image;
a similarity determination unit configured to determine an object similarity between the first object and the second object based on the first local distance, the first attention parameter, the second local distance, and the second attention parameter.
12. The image processing apparatus according to claim 11, wherein the first local distance determination unit is configured to:
performing image segmentation on the image to be recognized to obtain a first local mask to be recognized corresponding to the first feature of the first object;
determining a feature map to be recognized of the image to be recognized;
pooling the feature map to be recognized based on the first local mask to be recognized to obtain a first local feature to be recognized of the first image;
performing image segmentation on the contrast image to obtain a first contrast local mask corresponding to the first feature of the second object;
determining a contrast feature map of the contrast image;
pooling the contrast feature map based on the first contrast local mask to obtain a first contrast local feature of the second image;
determining the first local distance based on the first local feature to be recognized and the first contrast local feature.
13. The image processing apparatus according to claim 11, wherein the second local distance determination unit is configured to:
performing image segmentation on the image to be recognized to obtain a second local mask to be recognized corresponding to the second feature of the first object;
determining a feature map to be recognized of the image to be recognized;
pooling the feature map to be recognized based on the second local mask to be recognized to obtain a second local feature to be recognized of the third image;
performing image segmentation on the contrast image to obtain a second contrast local mask corresponding to the second feature of the second object;
determining a contrast feature map of the contrast image;
pooling the contrast feature map based on the second contrast local mask to obtain a second contrast local feature of the fourth image;
determining the second local distance based on the second local feature to be recognized and the second contrast local feature.
14. The image processing apparatus according to claim 12 or 13, wherein the similarity determination unit is configured to:
determining a first local similarity based on a product of the first attention parameter and the first local distance;
determining a second local similarity based on a product of the second attention parameter and the second local distance;
determining a global feature to be recognized of the image to be recognized based on the feature map to be recognized;
determining a contrast global feature of the contrast image based on the contrast feature map;
determining a global similarity between the image to be recognized and the contrast image based on the global feature to be recognized and the contrast global feature; and
determining the object similarity based on a sum of the global similarity, the first local similarity, and the second local similarity.
15. The image processing apparatus according to claim 11, wherein the first attention parameter determination unit is configured to:
determining the first attention parameter based on a product of the first weight and the first area, and a product of the first weight and the second area.
16. The image processing apparatus according to claim 11, wherein the second attention parameter determination unit is configured to:
determining the second attention parameter based on a product of the second weight and the third area, and a product of the second weight and the fourth area.
17. The image processing apparatus according to any one of claims 11 to 16, wherein the first object and the second object are vehicles.
18. The image processing apparatus of claim 17, wherein the first feature is a roof and the second feature is one of a nose, a side, and a tail.
19. The image processing apparatus according to claim 18, wherein the first weight is smaller than the second weight.
20. The image processing apparatus according to claim 19, wherein the first weight is one-half of the second weight.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-10.
CN202111562224.8A (priority date 2021-12-17; filing date 2021-12-17): Image processing method, image processing device, electronic equipment and storage medium. Status: Pending. Publication: CN114255449A (en).

Priority Applications (1)

Application Number: CN202111562224.8A; Priority Date: 2021-12-17; Filing Date: 2021-12-17; Title: Image processing method, image processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202111562224.8A; Priority Date: 2021-12-17; Filing Date: 2021-12-17; Title: Image processing method, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number: CN114255449A; Publication Date: 2022-03-29

Family

ID=80795953

Family Applications (1)

Application Number: CN202111562224.8A; Status: Pending (CN114255449A (en)); Priority Date: 2021-12-17; Filing Date: 2021-12-17; Title: Image processing method, image processing device, electronic equipment and storage medium

Country Status (1)

Country: CN; Publication: CN114255449A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination