CN113190701A - Image retrieval method, device, equipment, storage medium and computer program product - Google Patents


Info

Publication number
CN113190701A
CN113190701A (application CN202110493228.9A)
Authority
CN
China
Prior art keywords: target, attribute information, feature vector, information, image
Legal status (assumption; not a legal conclusion)
Pending
Application number
CN202110493228.9A
Other languages
Chinese (zh)
Inventor
彭玉龙
甘露
陈亮辉
Current Assignee (the listed assignee may be inaccurate)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (assumption; not a legal conclusion)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110493228.9A
Publication of CN113190701A
Legal status: Pending

Classifications

    • G06F16/532: Information retrieval of still image data; query formulation, e.g. graphical querying
    • G06F16/583: Retrieval characterised by metadata automatically derived from the content
    • G06F18/253: Pattern recognition; fusion techniques of extracted features
    • G06N3/045: Neural network architectures; combinations of networks
    • G06V40/172: Recognition of human faces; classification, e.g. identification
    • G06V2201/08: Detecting or categorising vehicles


Abstract

The present disclosure provides an image retrieval method, apparatus, device, storage medium and computer program product, relating to the technical field of artificial intelligence, and in particular to computer vision and deep learning. One embodiment of the method comprises: acquiring a target image containing a target object; extracting an object feature vector and attribute information of the target object based on the target image; fusing the object feature vector and the attribute information to obtain a target feature vector; and retrieving based on the target feature vector to obtain a retrieval result. By fusing the object feature vector of the target object with its attribute information and retrieving with the fused target feature vector, this embodiment improves retrieval accuracy and efficiency while preserving the retrieval recall rate.

Description

Image retrieval method, device, equipment, storage medium and computer program product
Technical Field
The embodiments of the disclosure relate to the field of computers, in particular to artificial intelligence technologies such as computer vision and deep learning, and more specifically to an image retrieval method, apparatus, device, storage medium and computer program product.
Background
With the continuous development of deep learning, the technology has been widely applied in many fields. One such field is image retrieval, where face retrieval is an important technique. Face retrieval is an emerging biometric technology that combines computer image processing with biostatistics, and it has broad application prospects. Face retrieval technology is applied in places such as parks, factories, squares, convention centers, stadiums, schools, hospitals, commercial streets, hotels, catering and entertainment venues, office buildings, and elevators.
Disclosure of Invention
The embodiment of the disclosure provides an image retrieval method, an image retrieval device, image retrieval equipment, a storage medium and a computer program product.
In a first aspect, an embodiment of the present disclosure provides an image retrieval method, including: acquiring a target image containing a target object; extracting an object feature vector and attribute information of the target object based on the target image; fusing the object feature vector and the attribute information to obtain a target feature vector; and retrieving based on the target feature vector to obtain a retrieval result.
In a second aspect, an embodiment of the present disclosure provides an image retrieval apparatus, including: an acquisition module configured to acquire a target image containing a target object; an extraction module configured to extract an object feature vector and attribute information of a target object based on a target image; the fusion module is configured to fuse the object feature vector and the attribute information to obtain a target feature vector; and the retrieval module is configured to perform retrieval based on the target feature vector to obtain a retrieval result.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method described in any implementation of the first aspect.
In a fifth aspect, the present disclosure provides a computer program product including a computer program, which when executed by a processor implements the method as described in any implementation manner of the first aspect.
The image retrieval method provided by the embodiments of the disclosure improves retrieval accuracy and efficiency while preserving the retrieval recall rate.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings. The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of an image retrieval method according to the present disclosure;
FIG. 3 is a flow diagram of another embodiment of an image retrieval method according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of an image retrieval method according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of an image retrieval device according to the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing an image retrieval method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the image retrieval method or image retrieval apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or transmit a target image or the like. Various client applications, such as a camera application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may provide various services. For example, the server 105 may analyze and process the target images acquired from the terminal apparatuses 101, 102, 103, and generate a processing result (e.g., a retrieval result).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the image retrieval method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the image retrieval apparatus is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of an image retrieval method according to the present disclosure is shown. The image retrieval method comprises the following steps:
step 201, a target image containing a target object is acquired.
In the present embodiment, an execution subject of the image retrieval method (e.g., the server 105 shown in FIG. 1) may acquire a target image containing a target object. The target image may be captured by an image sensor, i.e., a sensor capable of acquiring images. In this embodiment the image sensor is a camera sensor (hereinafter referred to as a camera); other image sensors may also be used according to the actual situation, which is not limited in this disclosure.
The target image acquired by the image sensor contains a target object, and the target object is an object needing to be retrieved.
In practical applications, the image retrieval method provided by the present disclosure can be applied to a smart city. The target image can be collected by fixed cameras, which are cameras deployed at locations throughout the smart city, for example on roads or in residential communities. After the fixed cameras collect images in real time, they can upload the collected images to an image database, which stores the images collected by all fixed cameras.
It should be noted that in the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order or good customs.
Step 202, extracting object feature vectors and attribute information of the target object based on the target image.
In this embodiment, the execution subject may extract the object feature vector and the attribute information of the target object based on the target image. Since the target image contains the target object, the object feature vector and the attribute information of the target object can be extracted from it, where the object feature vector may be a face feature vector of the target object, and the attribute information is other information related to the target object in the target image, such as the target object's clothing color or the place where the target object is located.
Optionally, the face feature vector of the target object may be obtained through a pre-trained Convolutional Neural Network (CNN) model: the target image containing the face of the target object is input into the pre-trained CNN model, which outputs the face feature vector of the target object.
Optionally, the CNN model may be trained as follows: obtain a training sample set in which each training sample comprises a sample image containing the face of a target object and the corresponding face feature vector of that target object; then train the CNN model with the sample image containing the face as input and the face feature vector as output.
And step 203, fusing the object characteristic vector and the attribute information to obtain a target characteristic vector.
In this embodiment, the execution subject may fuse the object feature vector and the attribute information to obtain a target feature vector. In some cases, the quality, definition, or integrity of the target image captured by the image sensor may be low due to insufficient illumination, a poor shooting angle, occlusion of the target object, and so on, and the face feature vector extracted from such a target image may not accurately represent the facial features of the target object. Therefore, in this embodiment, the object feature vector of the target object is fused with its attribute information to obtain a target feature vector that represents the features of the target object more comprehensively.
And step 204, retrieving based on the target characteristic vector to obtain a retrieval result.
In this embodiment, the execution subject may perform a retrieval based on the target feature vector to obtain a retrieval result. Because the target feature vector fuses the face feature vector of the target object with its related attribute information, retrieving with the target feature vector makes the retrieval result more accurate, improving both the accuracy and the efficiency of retrieval.
Optionally, the execution subject performs a similarity search between the target feature vector and the face feature vectors of the images in the image database to obtain images with high similarity, and then locates the target through those highly similar images.
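The disclosure does not specify the similarity metric; as a minimal sketch of this retrieval step, the following assumes cosine similarity over an in-memory database of face feature vectors (the function names and vector dimensionality are illustrative):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length, non-zero feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(target_vector, image_database, top_k=3):
    # Rank database images by similarity to the target feature vector
    # and return the identifiers of the top_k most similar ones.
    scored = [(cosine_similarity(target_vector, vec), image_id)
              for image_id, vec in image_database.items()]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]
```

In practice the database side would be precomputed and indexed (for example with an approximate nearest-neighbour structure) rather than scanned linearly as above.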
The image retrieval method provided by the embodiment of the disclosure first acquires a target image containing a target object; then extracts an object feature vector and attribute information of the target object based on the target image; next fuses the object feature vector and the attribute information to obtain a target feature vector; and finally retrieves based on the target feature vector to obtain a retrieval result. By fusing the object feature vector and attribute information of the target object into a target feature vector and using that vector for retrieval, the method improves the accuracy of the retrieval results while preserving the recall rate.
With continued reference to fig. 3, fig. 3 illustrates a flow 300 of another embodiment of an image retrieval method according to the present disclosure. The image retrieval method comprises the following steps:
step 301, a target image containing a target object is acquired.
In the present embodiment, an execution subject of the image retrieval method (e.g., the server 105 shown in fig. 1) may acquire a target image including a target object. Step 301 is substantially the same as step 201 in the foregoing embodiment, and the specific implementation manner may refer to the foregoing description of step 201, which is not described herein again.
Step 302, extracting object feature vectors and attribute information of the target object based on the target image.
In this embodiment, the execution subject may extract an object feature vector and attribute information of the target object based on the target image. Step 302 is substantially the same as step 202 in the foregoing embodiment, and the detailed implementation manner may refer to the foregoing description of step 202, which is not described herein again.
In some optional implementations of this embodiment, the attribute information includes, but is not limited to, at least one of: the spatio-temporal attribute information of the target image, the personalized attribute information of the target object, and the vehicle attribute information of the vehicle in which the target object is located. The spatio-temporal attribute information of the target image is information related to the camera that collected the target image. The personalized attribute information of the target object is information about the target object itself, such as clothing color, age, gender, and whether glasses, a hat, or a mask is worn. The vehicle attribute information is information about the vehicle in which the target object is riding, when the target object in the target image is in a vehicle. In some cases the camera may not be able to fully capture the facial features of the target object, so this related attribute information helps represent the target object more comprehensively.
In some optional implementations of this embodiment, the spatio-temporal attribute information includes, but is not limited to, at least one of: shooting time information, identification information of the image sensor that captured the image, and latitude and longitude information of the image sensor. The target image is collected by an image sensor (camera). Cameras today are generally IP (Internet Protocol) network cameras, which combine an ordinary camera, a video server, a network card, application software, and so on; some additionally include a pan-tilt head and a zoom lens. When an IP camera collects an image, it can record the capture time, its own identification information, and its latitude and longitude. From the shooting time, the identification of the capturing image sensor, and its latitude and longitude, the spatio-temporal attributes of the target object can be derived, that is, the geographical position information, movement track information, and so on related to the target object.
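As an illustration of how these three fields can yield movement-track information, the sketch below orders camera sightings chronologically; the record fields and timestamp format are assumptions for illustration, not mandated by the disclosure:

```python
from datetime import datetime

# Each sighting pairs a capture time with the capturing camera's
# identity and location (field names here are illustrative).
sightings = [
    {"time": "2021-05-06 09:15:00", "camera_id": "cam_42", "lat": 39.91, "lon": 116.40},
    {"time": "2021-05-06 08:30:00", "camera_id": "cam_07", "lat": 39.90, "lon": 116.39},
]

def action_track(records):
    # Order sightings chronologically to recover the object's movement trace.
    fmt = "%Y-%m-%d %H:%M:%S"
    ordered = sorted(records, key=lambda r: datetime.strptime(r["time"], fmt))
    return [(r["time"], r["camera_id"]) for r in ordered]
```

For the two sightings above, the track places cam_07 before cam_42, reconstructing the direction of movement from camera metadata alone.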
In some optional implementations of this embodiment, the vehicle attribute information includes, but is not limited to, at least one of: license plate information, vehicle color information, and the position of the target object within the vehicle. When the target object in the target image is riding in a vehicle, vehicle attribute information of that vehicle can be obtained, such as the license plate number, the color of the vehicle, and the position of the target object within it, for example whether the target object is in the driver's seat or the front passenger seat. In this way, the vehicle attribute information related to the target object is obtained.
And 303, fusing the object feature vector and the attribute information based on a pre-trained feature fusion model to obtain a target feature vector.
In this embodiment, the execution subject may fuse the object feature vector and the attribute information based on a pre-trained feature fusion model to obtain a target feature vector. The feature fusion model is the Deep part of a Wide & Deep model. A Wide & Deep model comprises a Wide layer/model and a Deep layer/model: the Wide model is a generalized linear model, and the Deep model is a Deep Neural Network (DNN) model. The DNN model can fuse the object feature vector and the attribute information to obtain the target feature vector. In this step, fusing the object feature vector and the attribute information through the feature fusion model makes the fused target feature vector include each piece of feature information of the target image and the target object, so that the target feature vector can represent the target object comprehensively.
As an example, the DNN model may fuse the object feature vector with the spatiotemporal attribute information of the target image to obtain the target feature vector.
As another example, the DNN model may fuse the object feature vector with personalized attribute information of the target object to obtain a target feature vector.
As another example, the DNN model may fuse the object feature vector, the temporal-spatial attribute information of the target image, and the personalized attribute information of the target object to obtain a fused target feature vector.
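The disclosure specifies only that the Deep (DNN) part of the model performs the fusion. The sketch below stands in for that step with a single deterministic dense projection over the concatenated vectors, purely to illustrate the data flow; the layer size and weights are placeholders, not the patent's actual network:

```python
import random

def fuse(object_vector, attribute_vectors, out_dim=4, seed=0):
    # Concatenate the face feature vector with the embedded attribute
    # vectors, then project through one dense layer. This is a
    # stand-in for the Deep (DNN) part of the Wide & Deep model.
    x = list(object_vector)
    for vec in attribute_vectors:
        x.extend(vec)
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    weights = [[rng.uniform(-1, 1) for _ in x] for _ in range(out_dim)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
```

A real implementation would learn the projection weights jointly with the retrieval objective instead of drawing them at random.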
And 304, searching based on the target characteristic vector to obtain a search result.
In this embodiment, the execution subject may perform a search based on the target feature vector to obtain a search result. Step 304 is substantially the same as step 204 in the foregoing embodiment, and the specific implementation manner may refer to the foregoing description of step 204, which is not described herein again.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the image retrieval method in this embodiment obtains attribute information such as the spatio-temporal attribute information of the target image, the personalized attribute information of the target object, and/or the vehicle attribute information of the vehicle in which the target object is located, fuses the object feature vector with this attribute information using a pre-trained feature fusion model, and retrieves based on the fused target feature vector to obtain a retrieval result. By introducing the attribute information related to the target image and the target object, the method can alleviate the low accuracy of retrieval results caused by insufficient illumination, poor shooting angles, occlusion, and the like, thereby improving the accuracy of the retrieval results.
With continued reference to fig. 4, fig. 4 illustrates a flow 400 of yet another embodiment of an image retrieval method according to the present disclosure. The image retrieval method comprises the following steps:
step 401, a target image containing a target object is acquired.
In the present embodiment, an execution subject of the image retrieval method (e.g., the server 105 shown in fig. 1) may acquire a target image including a target object. Step 401 is substantially the same as step 301 in the foregoing embodiment, and the specific implementation manner may refer to the foregoing description of step 301, which is not described herein again.
Step 402, extracting the face feature vector, the personalized attribute information, the time-space attribute information of the target image and the vehicle attribute information of the vehicle where the target object is located.
In this embodiment, the execution subject may extract, from the target image, the face feature vector and the personalized attribute information of the target object, the spatiotemporal attribute information of the target image, and the vehicle attribute information of the vehicle in which the target object is located.
The face feature vector of the target object can be obtained through a pre-trained CNN model. The personalized attribute information of the target object includes information such as clothing color, age, and gender. The spatio-temporal attribute information of the target image consists of the time, latitude and longitude, and camera name recorded by the IP camera when the target image was collected. The vehicle attribute information of the vehicle in which the target object is located includes information such as the color of the vehicle, the license plate number, and the position of the target object within the vehicle.
And 403, coding each personalized attribute information, each space-time attribute information and each vehicle attribute information by using a feature fusion model, and performing dimension reduction compression to obtain corresponding feature vectors.
In this embodiment, the execution subject may use a pre-trained feature fusion model to encode each personalized attribute information, each spatiotemporal attribute information, and each vehicle attribute information, and perform dimension reduction compression, so as to obtain corresponding feature vectors.
The personalized attribute information includes the clothing color, age, and gender of the target object, whether glasses are worn, whether a safety helmet is worn, and so on. The processing of the personalized attribute information includes: analyzing the personalized attributes of the target object with a CNN model to obtain the corresponding personalized feature vectors. Because attributes such as clothing color, age, gender, whether glasses are worn, and whether a safety helmet is worn are output as discrete One-Hot codes, a dense personalized feature vector can be obtained simply by building an Embedding vector space for each attribute separately.
One-Hot encoding represents a categorical variable as a binary vector: each category value is mapped to an integer, and each integer is then represented as a binary vector that is all zeros except at the position of that integer, which is set to 1.
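The mapping just described can be sketched in a few lines (the vocabulary shown is illustrative):

```python
def one_hot(value, vocabulary):
    # Map a categorical value to its integer index, then to a binary
    # vector that is all zeros except at that index.
    index = vocabulary.index(value)
    return [1 if i == index else 0 for i in range(len(vocabulary))]
```

For example, with the vocabulary ["red", "blue", "black"], the value "red" maps to index 0 and hence to the vector [1, 0, 0].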
Embedding is one way to convert discrete variables into a continuous vector representation. In neural networks, Embedding can reduce the spatial dimension of a discrete variable, while also representing the variable meaningfully.
When using Embedding, the compressed Embedding dimension needs to be calculated first. In this embodiment, the compressed Embedding dimension can be calculated based on formula (1):

[Formula (1) is rendered as an image in the original document.]

where dim represents the compressed Embedding dimension, and vocab represents the length of the value after One-Hot encoding.
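Formula (1) itself appears only as an image in the source text, so its exact form is not recoverable here. A widely used rule of thumb with the same inputs and output is dim = ceil(6 · vocab^(1/4)); the sketch below uses that heuristic purely as an assumed stand-in, not as the patent's actual formula:

```python
import math

def embedding_dim(vocab):
    # Assumed stand-in for formula (1): a common heuristic that grows
    # the embedding dimension with the fourth root of the One-Hot
    # vocabulary length. The patent's exact formula is shown only as
    # an image and may differ.
    return math.ceil(6 * vocab ** 0.25)
```

Under this heuristic, small vocabularies (a few category values) compress to single-digit dimensions, while vocabularies of tens of values still stay well under twenty dimensions.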
The spatio-temporal attribute information includes the shooting time, the latitude and longitude of the IP camera, and the identification information of the IP camera, so its processing includes:
firstly, extracting the time information recorded by the IP camera when a target image is collected, carrying out year, month and day in the time information, respectively carrying out One-Hot coding (One-Hot coding) on the year, month and day, and compressing the value after the One-Hot coding by using Embedding after coding, thereby obtaining an intensive time vector. The compressed Embedding dimension can be calculated by formula (1).
And secondly, for the longitude and latitude of the IP camera, a GeoHash coding method can be used. GeoHash is an address coding method that encodes two-dimensional longitude and latitude data into a character string. After the longitude and latitude are encoded, a corresponding 12-character code value, a string, is obtained. One-Hot coding is then performed on the string, and feature compression is carried out with word2vec (a model for generating word vectors), thereby obtaining a dense vector representing the spatial position of the target image.
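GeoHash works by interleaving longitude and latitude bits and emitting one base-32 character per five bits; a self-contained encoder can be sketched as follows (the example coordinates are arbitrary, taken from the commonly cited Jutland illustration point):

```python
# Minimal GeoHash encoder: interleave longitude/latitude bits, emit base-32.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    code, ch, bit_count = [], 0, 0
    even = True  # even-numbered bits encode longitude, odd bits latitude
    while len(code) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:               # point is in the upper half-interval
            ch = (ch << 1) | 1
            rng[0] = mid
        else:                        # point is in the lower half-interval
            ch = ch << 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:           # every 5 bits form one base-32 character
            code.append(BASE32[ch])
            ch, bit_count = 0, 0
    return "".join(code)

print(geohash_encode(57.64911, 10.40744, precision=11))  # u4pruydqqvj
```

Nearby points share a common prefix, which is what makes the resulting string useful as a discrete spatial token before One-Hot coding and compression.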
For the name information of the IP camera, irrelevant information (such as manufacturer identification) is removed from the name, the name is segmented into words, One-Hot coding is performed on the segmented words, and finally word2vec is used to compress the coded result into a dense feature vector.
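The cleaning and word-segmentation step might look like the sketch below; the camera names and the list of irrelevant tokens are invented for illustration, and the subsequent word2vec compression would be trained separately (e.g. with a library such as gensim):

```python
import re

# Hypothetical tokens to strip, e.g. manufacturer identifiers.
IRRELEVANT = {"acme", "model", "x100"}

def tokenize_camera_name(name: str):
    """Lower-case, split on non-alphanumerics, drop irrelevant tokens."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return [t for t in tokens if t and t not in IRRELEVANT]

def build_vocab(token_lists):
    """Map each distinct token to the integer index used for One-Hot coding."""
    vocab = {}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    return vocab

names = ["ACME-Model-X100 North Gate Entrance", "ACME-Model-X100 Parking Lot 2"]
token_lists = [tokenize_camera_name(n) for n in names]
vocab = build_vocab(token_lists)
print(token_lists[0])  # ['north', 'gate', 'entrance']
print(vocab)
```

The retained tokens ("north gate entrance") are exactly the location-bearing words, which is why stripping the manufacturer boilerplate first improves the resulting feature.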
The vehicle attribute information comprises information such as a license plate number, a color of the vehicle, a position of a target object in the vehicle and the like, and the processing process of the vehicle attribute information comprises the following steps:
firstly, a license plate number of a vehicle where a target object is located can be obtained through a license plate recognition engine, One-Hot coding is carried out on the license plate number according to characters, and after coding, word2vec is adopted for feature dimension reduction, so that a feature vector of the license plate number is obtained.
Secondly, color information of the vehicle can be obtained through the target image, One-Hot coding is carried out on the color, and word2vec is adopted to carry out feature dimension reduction after coding, so that a feature vector of the color is obtained.
And thirdly, the position of the target object in the vehicle (the driver's seat or the front passenger seat) is judged from the collected face position of the target object in the target image together with the position detected for the vehicle, and One-Hot coding is performed on this position information to obtain the feature vector of the target object's position.
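The per-character One-Hot coding applied to the license plate in the first step above can be sketched as follows; the character set and the sample plate are invented for illustration, and a real system would use the full regional plate alphabet:

```python
# Hypothetical plate character set: digits plus upper-case Latin letters.
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHAR_INDEX = {c: i for i, c in enumerate(CHARSET)}

def plate_one_hot(plate: str):
    """One binary vector per plate character; each vector is all zeros
    except for a single 1 at that character's index in CHARSET."""
    vectors = []
    for ch in plate:
        vec = [0] * len(CHARSET)
        vec[CHAR_INDEX[ch]] = 1
        vectors.append(vec)
    return vectors

vecs = plate_one_hot("B12345")
print(len(vecs), len(vecs[0]))  # 6 36
```

Each character vector would then be compressed with word2vec as described above, giving the dense license-plate feature.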
Through the steps, the attribute information of the target image and the target object can be respectively coded and dimension-reduced, so that the feature vector corresponding to each attribute information can be obtained. By converting each attribute information into a corresponding feature vector, fusion can be performed in subsequent steps.
And step 404, fusing the face feature vector with the feature vector to obtain a target feature vector.
In this embodiment, the executing agent may fuse the face feature vector with all the feature vectors obtained in step 403 to obtain a target feature vector.
In this embodiment, all feature vectors are fused based on the following formula:
y_v = ReLU(x_v · W_v + b_v)
y_u = ReLU(x_u · W_u + b_u)
y_concat = ReLU([y_v : y_u] · W + b)
where x_v is the face feature vector; x_u is the feature vector obtained by One-Hot coding and dimension-reduction compression of the personalized attribute information, the spatio-temporal attribute information, and the vehicle attribute information; W_v, W_u, W and b_v, b_u, b are the corresponding parameters; y_v and y_u are the outputs corresponding to x_v and x_u; ReLU is the activation function; and y_concat is the output of the feature fusion model, namely the target feature vector.
According to these formulas, the feature fusion model fuses the face feature vector with the feature vector obtained by One-Hot coding and dimension-reduction compression of the personalized attribute information, the spatio-temporal attribute information, and the vehicle attribute information, thereby obtaining the target feature vector.
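The three fusion formulas can be executed directly with matrix arithmetic; in the sketch below the dimensions are arbitrary and the parameters are randomly initialized stand-ins for the trained W_v, W_u, W, b_v, b_u, b:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Assumed dimensions for illustration only.
dim_face, dim_attr, dim_hidden, dim_out = 128, 32, 64, 96

x_v = rng.normal(size=dim_face)  # face feature vector
x_u = rng.normal(size=dim_attr)  # compressed attribute feature vector

# Random parameters stand in for the trained fusion-model weights.
W_v, b_v = rng.normal(size=(dim_face, dim_hidden)), np.zeros(dim_hidden)
W_u, b_u = rng.normal(size=(dim_attr, dim_hidden)), np.zeros(dim_hidden)
W, b = rng.normal(size=(2 * dim_hidden, dim_out)), np.zeros(dim_out)

y_v = relu(x_v @ W_v + b_v)  # project the face branch
y_u = relu(x_u @ W_u + b_u)  # project the attribute branch
y_concat = relu(np.concatenate([y_v, y_u]) @ W + b)  # fused target vector

print(y_concat.shape)  # (96,)
```

The design point is that each branch is first projected to a common hidden size before concatenation, so neither the face vector nor the attribute vector dominates the fused representation purely by its raw dimensionality.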
And step 405, searching based on the target feature vector to obtain a search result.
In this embodiment, the executing entity performs a search based on the target feature vector obtained in step 404 to obtain a search result. Step 405 is substantially the same as step 304 in the foregoing embodiment, and the specific implementation manner may refer to the foregoing description of step 304, which is not described herein again.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 3, the image retrieval method in this embodiment extracts the face feature vector and the personalized, spatio-temporal, and vehicle attribute information of the target object; encodes and dimension-reduces each piece of attribute information with the feature fusion model to obtain its corresponding feature vector; and finally fuses the face feature vector with the feature vectors corresponding to the attribute information to obtain the target feature vector. Because the target feature vector comprehensively and accurately represents the relevant features of the target object, retrieving with it can improve the accuracy of the retrieval result while maintaining the retrieval recall rate.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image retrieval apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the image retrieval apparatus 500 of the present embodiment may include: an acquisition module 501, an extraction module 502, a fusion module 503, and a retrieval module 504. The acquiring module 501 is configured to acquire a target image containing a target object; an extraction module 502 configured to extract an object feature vector and attribute information of a target object based on a target image; a fusion module 503 configured to fuse the object feature vector with the attribute information to obtain a target feature vector; and the retrieval module 504 is configured to perform retrieval based on the target feature vector to obtain a retrieval result.
In the present embodiment, in the image retrieval apparatus 500: the specific processing and the technical effects of the obtaining module 501, the extracting module 502, the fusing module 503 and the retrieving module 504 can refer to the related descriptions of step 201 and step 204 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of this embodiment, the attribute information includes at least one of: the space-time attribute information of the target image, the personalized attribute information of the target object and the vehicle attribute information of the vehicle in which the target object is positioned.
In some optional implementations of this embodiment, the spatio-temporal attribute information includes at least one of: shooting time information, identification information of a shot image sensor, and longitude and latitude information of the image sensor.
In some optional implementations of the embodiment, the vehicle attribute information includes at least one of: license plate information, vehicle color information and position information of a target object in the vehicle.
In some optional implementations of this embodiment, the fusion module includes: and the fusion sub-module is configured to fuse the object feature vector and the attribute information based on a pre-trained feature fusion model to obtain a target feature vector.
In some optional implementations of this embodiment, the fusion submodule is further configured to: encoding the attribute information by using a feature fusion model, and performing dimension reduction compression to obtain an attribute feature vector of the attribute information; and fusing the object feature vector and the attribute feature vector to obtain a target feature vector.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The calculation unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 601 executes the respective methods and processes described above, such as the image retrieval method. For example, in some embodiments, the image retrieval method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the image retrieval method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the image retrieval method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the difficult management and weak service extensibility of conventional physical hosts and Virtual Private Server (VPS) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. An image retrieval method, comprising:
acquiring a target image containing a target object;
extracting object feature vectors and attribute information of the target object based on the target image;
fusing the object feature vector and the attribute information to obtain a target feature vector;
and searching based on the target characteristic vector to obtain a search result.
2. The method of claim 1, wherein the attribute information comprises at least one of: the space-time attribute information of the target image, the personalized attribute information of the target object and the vehicle attribute information of the vehicle in which the target object is positioned.
3. The method of claim 2, wherein the spatio-temporal attribute information comprises at least one of: shooting time information, identification information of a shot image sensor, and longitude and latitude information of the image sensor.
4. The method of claim 2, wherein the vehicle attribute information comprises at least one of: license plate information, vehicle color information, and position information of the target object in the vehicle.
5. The method according to any one of claims 1-4, wherein the fusing the object feature vector with the attribute information to obtain a target feature vector comprises:
and fusing the object feature vector and the attribute information based on a pre-trained feature fusion model to obtain the target feature vector.
6. The method of claim 5, wherein the fusing the object feature vector and the attribute information based on a pre-trained feature fusion model to obtain the target feature vector comprises:
encoding the attribute information by using the feature fusion model, and performing dimension reduction compression to obtain an attribute feature vector of the attribute information;
and fusing the object feature vector and the attribute feature vector to obtain the target feature vector.
7. An image retrieval apparatus comprising:
an acquisition module configured to acquire a target image containing a target object;
an extraction module configured to extract an object feature vector and attribute information of the target object based on the target image;
the fusion module is configured to fuse the object feature vector and the attribute information to obtain a target feature vector;
and the retrieval module is configured to perform retrieval based on the target feature vector to obtain a retrieval result.
8. The apparatus of claim 7, wherein the attribute information comprises at least one of: the space-time attribute information of the target image, the personalized attribute information of the target object and the vehicle attribute information of the vehicle in which the target object is positioned.
9. The apparatus of claim 8, wherein the spatio-temporal attribute information comprises at least one of: shooting time information, identification information of a shot image sensor, and longitude and latitude information of the image sensor.
10. The apparatus of claim 8, wherein the vehicle attribute information comprises at least one of: license plate information, vehicle color information, and position information of the target object in the vehicle.
11. The apparatus of any one of claims 7-10, wherein the fusion module comprises:
and the fusion sub-module is configured to fuse the object feature vector and the attribute information based on a pre-trained feature fusion model to obtain the target feature vector.
12. The apparatus of claim 11, wherein the fusion sub-module is further configured to:
encoding the attribute information by using the feature fusion model, and performing dimension reduction compression to obtain an attribute feature vector of the attribute information;
and fusing the object feature vector and the attribute feature vector to obtain the target feature vector.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202110493228.9A 2021-05-07 2021-05-07 Image retrieval method, device, equipment, storage medium and computer program product Pending CN113190701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110493228.9A CN113190701A (en) 2021-05-07 2021-05-07 Image retrieval method, device, equipment, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110493228.9A CN113190701A (en) 2021-05-07 2021-05-07 Image retrieval method, device, equipment, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN113190701A true CN113190701A (en) 2021-07-30

Family

ID=76983966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110493228.9A Pending CN113190701A (en) 2021-05-07 2021-05-07 Image retrieval method, device, equipment, storage medium and computer program product

Country Status (1)

Country Link
CN (1) CN113190701A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868453A (en) * 2021-09-28 2021-12-31 北京百度网讯科技有限公司 Object recommendation method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110135165A1 (en) * 2009-06-02 2011-06-09 Harry Wechsler Robust Human Authentication Using Holistic Anthropometric and Appearance-Based Features and Boosting
CN105808732A (en) * 2016-03-10 2016-07-27 北京大学 Integration target attribute identification and precise retrieval method based on depth measurement learning
CN105825176A (en) * 2016-03-11 2016-08-03 东华大学 Identification method based on multi-mode non-contact identity characteristics
CN110825904A (en) * 2019-10-24 2020-02-21 腾讯科技(深圳)有限公司 Image matching method and device, electronic equipment and storage medium
CN111178252A (en) * 2019-12-27 2020-05-19 东北大学 Multi-feature fusion identity recognition method
CN111291678A (en) * 2020-02-06 2020-06-16 北京爱笔科技有限公司 Face image clustering method and device based on multi-feature fusion
CN111339849A (en) * 2020-02-14 2020-06-26 北京工业大学 Pedestrian re-identification method integrating pedestrian attributes
CN112232162A (en) * 2020-10-06 2021-01-15 武汉烽火凯卓科技有限公司 Pedestrian detection method and device based on multi-feature fusion cascade classifier
CN112651380A (en) * 2021-01-13 2021-04-13 深圳市一心视觉科技有限公司 Face recognition method, face recognition device, terminal equipment and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110135165A1 (en) * 2009-06-02 2011-06-09 Harry Wechsler Robust Human Authentication Using Holistic Anthropometric and Appearance-Based Features and Boosting
CN105808732A (en) * 2016-03-10 2016-07-27 北京大学 Integration target attribute identification and precise retrieval method based on depth measurement learning
CN105825176A (en) * 2016-03-11 2016-08-03 东华大学 Identification method based on multi-mode non-contact identity characteristics
CN110825904A (en) * 2019-10-24 2020-02-21 腾讯科技(深圳)有限公司 Image matching method and device, electronic equipment and storage medium
CN111178252A (en) * 2019-12-27 2020-05-19 东北大学 Multi-feature fusion identity recognition method
CN111291678A (en) * 2020-02-06 2020-06-16 北京爱笔科技有限公司 Face image clustering method and device based on multi-feature fusion
CN111339849A (en) * 2020-02-14 2020-06-26 北京工业大学 Pedestrian re-identification method integrating pedestrian attributes
CN112232162A (en) * 2020-10-06 2021-01-15 武汉烽火凯卓科技有限公司 Pedestrian detection method and device based on multi-feature fusion cascade classifier
CN112651380A (en) * 2021-01-13 2021-04-13 深圳市一心视觉科技有限公司 Face recognition method, face recognition device, terminal equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SU XINNING: "Information Retrieval Theory and Technology", 30 September 2004, Science and Technology Press, pages: 130 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868453A (en) * 2021-09-28 2021-12-31 北京百度网讯科技有限公司 Object recommendation method and device
CN113868453B (en) * 2021-09-28 2024-02-27 北京百度网讯科技有限公司 Object recommendation method and device

Similar Documents

Publication Publication Date Title
EP4137991A1 (en) Pedestrian re-identification method and device
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN113191495A (en) Training method and device for hyper-resolution model and face recognition method and device, medium and electronic equipment
CN113537254B (en) Image feature extraction method and device, electronic equipment and readable storage medium
CN112270246B (en) Video behavior recognition method and device, storage medium and electronic equipment
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
CN113343981A (en) Visual feature enhanced character recognition method, device and equipment
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
CN116978011A (en) Image semantic communication method and system for intelligent target recognition
CN113190701A (en) Image retrieval method, device, equipment, storage medium and computer program product
CN113656629B (en) Visual positioning method and device, electronic equipment and storage medium
CN115909357A (en) Target identification method based on artificial intelligence, model training method and device
CN116030390A (en) Intelligent detection method, device, equipment and storage medium for abnormal behavior in video
CN114863182A (en) Image classification method, and training method and device of image classification model
CN113177483B (en) Video object segmentation method, device, equipment and storage medium
CN114863450A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114387651A (en) Face recognition method, device, equipment and storage medium
CN113989720A (en) Target detection method, training method, device, electronic equipment and storage medium
CN113221920B (en) Image recognition method, apparatus, device, storage medium, and computer program product
CN113360712B (en) Video representation generation method and device and electronic equipment
JP7372487B2 (en) Object segmentation method, object segmentation device and electronic equipment
CN116091984B (en) Video object segmentation method, device, electronic equipment and storage medium
CN117173731B (en) Model training method, image processing method and related device
CN115240123B (en) Intelligent monitoring system-oriented method for detecting violent behaviors in dark place
CN114299539A (en) Model training method, pedestrian re-identification method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination