CN113094538A - Image retrieval method, device and computer-readable storage medium - Google Patents

Publication number
CN113094538A
CN113094538A (application CN201911334579.4A)
Authority
CN
China
Prior art keywords
target image
determining
retrieval
image
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911334579.4A
Other languages
Chinese (zh)
Inventor
张超颖
项超
刘珮
贾丹
赵啸宇
孟维业
王建秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN201911334579.4A
Publication of CN113094538A

Classifications

    • G06F16/5846 — Information retrieval of still image data; retrieval characterised by metadata automatically derived from the content, using extracted text
    • G06F16/5866 — Information retrieval of still image data; retrieval characterised by metadata using manually generated information, e.g. tags, keywords, comments
    • G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Neural networks; combinations of networks
    • G06N3/08 — Neural networks; learning methods

Abstract

The disclosure relates to an image retrieval method and device and a computer-readable storage medium, and relates to the field of computer technology. The method includes: determining a description text of a target image, using a first machine learning model, from the extracted feature vectors of the target image; performing face recognition on the target image using a second machine learning model to determine identity information of the target image; and matching a retrieval keyword provided by a user against the description texts and identity information of a plurality of target images to determine the corresponding retrieval results among the target images.

Description

Image retrieval method, device and computer-readable storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for retrieving an image, and a computer-readable storage medium.
Background
With the rapid development of technologies such as digital imaging and computer storage, a large amount of image data is generated every day by various intelligent terminals. As the volume of image data grows rapidly, large-scale visual image libraries have formed, and traditional file-name-based image retrieval can no longer meet users' needs. Faced with hundreds of millions of images, accurately and rapidly retrieving the images a user needs has become an urgent problem.
In the related art, image retrieval methods mainly include: TBIR (Text-Based Image Retrieval), which completes the retrieval task by keyword matching against labeled image keywords indexed in a database; and CBIR (Content-Based Image Retrieval), which searches on the basis of low-level visual features contained in the image itself, such as color, texture, and object outline.
Disclosure of Invention
The inventors of the present disclosure found that the related art described above has the following problems: these methods are strongly influenced by human factors and cannot mine the deep information of an image, so the image retrieval results are inaccurate.
In view of this, the present disclosure provides an image retrieval technique that can improve the accuracy of image retrieval results.
According to some embodiments of the present disclosure, there is provided an image retrieval method including: determining a description text of the target image by utilizing a first machine learning model according to the extracted feature vector of the target image; performing face recognition on the target image by using a second machine learning model to determine identity information of the target image; and matching the retrieval keywords provided by the user according to the description texts and the identity information of the target images, and determining corresponding retrieval results in the target images.
In some embodiments, determining the description text of the target image comprises: determining the weight of each feature vector by using an attention mechanism module of a first machine learning model according to the extracted feature vectors of different regions in the target image; and determining the description text of the target image by utilizing the first machine learning model according to the feature vectors and the weights thereof.
In some embodiments, the first machine learning model is a two-layer LSTM (Long Short-Term Memory) model that includes a first LSTM module and a second LSTM module, and the attention mechanism module is disposed between the first LSTM module and the second LSTM module.
In some embodiments, determining the weight for each feature vector comprises: the weight of the feature vector is determined based on the output of the first LSTM module and the feature vector.
In some embodiments, determining the description text of the target image comprises: determining the output of a second LSTM module according to the output of the first LSTM module and the weighted sum of the feature vectors; and determining the description text of the target image according to the output of the second LSTM module.
In some embodiments, determining the description text of the target image comprises: determining the probability of each candidate word by using a first machine learning model; determining a plurality of candidate description texts of the target image according to the probability of each candidate word; determining the probability of each candidate description text according to the probability of each candidate word in each candidate description text; and determining one or more candidate description texts with the highest probability as the description texts.
In some embodiments, the method further comprises: and in response to the target image being deleted, deleting the corresponding description text and the corresponding identity information of the target image.
According to other embodiments of the present disclosure, there is provided an image retrieval apparatus including: the text determining unit is used for determining a description text of the target image by utilizing the first machine learning model according to the extracted feature vector of the target image; the information determining unit is used for carrying out face recognition on the target image by utilizing the second machine learning model and determining the identity information of the target image; and the retrieval unit is used for matching the retrieval keywords provided by the user according to the description texts and the identity information of the target images and determining corresponding retrieval results in the target images.
In some embodiments, the text determination unit determines a weight of each feature vector by using an attention mechanism module of the first machine learning model according to the extracted feature vectors of different regions in the target image, and determines the description text of the target image by using the first machine learning model according to each feature vector and the weight thereof.
In some embodiments, the first machine learning model is a two-layer LSTM model, the LSTM model including a first LSTM module and a second LSTM module, the attention mechanism module disposed between the first LSTM module and the second LSTM module; the text determination unit determines a weight of the feature vector based on the output of the first LSTM module and the feature vector.
In some embodiments, the text determination unit determines an output of a second LSTM module based on the output of the first LSTM module and a weighted sum of the feature vectors, and determines a descriptive text of the target image based on the output of the second LSTM module.
In some embodiments, the text determination unit determines a probability of each candidate word using the first machine learning model, determines a plurality of candidate description texts of the target image according to the probability of each candidate word, determines a probability of each candidate description text according to the probability of each candidate word in each candidate description text, and determines one or more candidate description texts with the highest probability as the description text.
In some embodiments, the apparatus further comprises: and the deleting unit is used for deleting the corresponding description text and the corresponding identity information of the target image in response to the target image being deleted.
According to still further embodiments of the present disclosure, there is provided an image retrieval apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the image retrieval method in any of the above embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the retrieval method of an image in any of the above embodiments.
In the above embodiment, the image description and the face recognition result determined by the machine learning method are combined as the basis for image retrieval. Therefore, the deep information of the image can be mined, and the accuracy of image retrieval is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure can be more clearly understood from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of some embodiments of an image retrieval method of the present disclosure;
FIG. 2 illustrates a schematic diagram of some embodiments of an image retrieval method of the present disclosure;
FIG. 3 shows a schematic diagram of further embodiments of an image retrieval method of the present disclosure;
FIG. 4 illustrates a schematic diagram of some embodiments of an image retrieval device of the present disclosure;
FIG. 5 illustrates a flow diagram of further embodiments of an image retrieval method of the present disclosure;
FIG. 6 illustrates a flow diagram of further embodiments of the image retrieval method of the present disclosure;
FIG. 7 illustrates a block diagram of some embodiments of an image retrieval device of the present disclosure;
FIG. 8 shows a block diagram of further embodiments of an image retrieval device of the present disclosure;
fig. 9 shows a block diagram of further embodiments of an image retrieval device of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
As mentioned above, TBIR requires manual labeling in advance, which is costly and inefficient, and the manual labeling is affected by personal subjective factors; CBIR uses the low-level visual features of an image, and a semantic gap exists between these low-level features and the high-level semantic features by which people understand images, so the retrieval effect is poor.
Moreover, current search methods for personal photos are limited to single modes based on file name, date, and category. Most photo libraries store personal life photos, yet a retrieval mode based on people's names is lacking. For a photo library with thousands of images or more, searching with these single modes is therefore inefficient for the user.
To solve these problems, starting from the content information of the image itself, the low-level visual features of the image are automatically mapped by a computer into high-level semantic information understandable to people (i.e., an image semantic description algorithm). In this way, an accurate search for a target image can be realized by searching the semantic information contained in the image.
By optimizing the neural network model, the method solves the problem that the emphasis and details of the generated sentences are not prominent. The computer automatically generates Chinese and English description sentences for each image, improving the accuracy of the image descriptions. A face recognition algorithm is also introduced, so that Chinese and English description sentences and person identity information can be automatically generated for the photos in a photo library. A database of the pictures is constructed from this semantic information, through which accurate retrieval based on Chinese and English keywords, sentences, and people's names is realized. For example, this can be achieved by the following embodiments.
Fig. 1 illustrates a flow diagram of some embodiments of an image retrieval method of the present disclosure.
As shown in fig. 1, the method includes: step 110, determining a description text; step 120, determining identity information; step 130, determining the retrieval result.
In step 110, a description text of the target image is determined, using the first machine learning model, from the extracted feature vectors of the target image. For example, the feature vectors of the target image may be extracted by a CNN (Convolutional Neural Network) module. The first machine learning model may be a deep learning model.
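As a rough illustration of the per-region feature extraction, the sketch below splits an image into a grid of regions and mean-pools each region's pixels into one vector. This is only a stand-in for the CNN region features the embodiments actually describe; the function name and grid size are illustrative.

```python
import numpy as np

def region_features(image, grid=3):
    """Stand-in for CNN region features: split the image into grid x grid
    regions and mean-pool each region's pixels into a single vector.
    (A real system would use a CNN backbone here instead.)"""
    h, w, c = image.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            patch = image[i * h // grid:(i + 1) * h // grid,
                          j * w // grid:(j + 1) * w // grid]
            feats.append(patch.reshape(-1, c).mean(axis=0))
    return np.stack(feats)  # shape (grid*grid, c): one vector v_i per region
```

The output plays the role of the region feature vectors v_1, ..., v_k consumed by the attention mechanism below.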
In some embodiments, a weight of each feature vector is determined using an attention mechanism (Attention) module of the first machine learning model, from the extracted feature vectors of different regions in the target image; and the description text of the target image is determined using the first machine learning model from the feature vectors and their weights.
For example, the first machine learning model is a two-layer LSTM model that includes a first LSTM module and a second LSTM module. An attention mechanism module is disposed between the first LSTM module and the second LSTM module. The weight of the feature vector is determined based on the output of the first LSTM module and the feature vector.
In some embodiments, the output of the second LSTM module is determined from the output of the first LSTM module and the weighted sum of the feature vectors; and the description text of the target image is determined from the output of the second LSTM module.
In some embodiments, a probability of each candidate word is determined using the first machine learning model; a plurality of candidate description texts of the target image are determined from the probabilities of the candidate words; the probability of each candidate description text is determined from the probabilities of the candidate words it contains; and the one or more candidate description texts with the highest probability are determined as the description text.
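The candidate-text selection just described is, in essence, a beam search over per-step word probabilities. A minimal sketch follows; the `next_word_probs` callback is a hypothetical stand-in for the first machine learning model's word distribution at each step.

```python
import math

def beam_search(next_word_probs, beam_width=3, max_len=5, eos="<eos>"):
    """Keep the beam_width most probable partial sentences at each step;
    a sentence's log-probability is the sum of its words' log-probabilities."""
    beams = [([], 0.0)]  # (words so far, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for words, logp in beams:
            if words and words[-1] == eos:
                candidates.append((words, logp))  # finished sentence: keep as-is
                continue
            for word, p in next_word_probs(words).items():
                candidates.append((words + [word], logp + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams  # highest-probability candidate description texts first
```

The top one or several beams correspond to the "one or more candidate description texts with the highest probability" retained as the description text.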
In step 120, the second machine learning model is used to perform face recognition on the target image and determine the identity information of the target image. The second machine learning model may be a deep learning model.
In step 130, the search keywords provided by the user are matched according to the description texts and the identity information of the target images, and corresponding search results are determined in the target images.
In some embodiments, in response to a target image being deleted, the corresponding descriptive text and identity information for the target image is deleted.
In the above embodiment, the image description and the face recognition result determined by the machine learning method are combined as the basis for image retrieval. Therefore, the deep information of the image can be mined, and the accuracy of image retrieval is improved.
In the embodiment, starting from the content information of the image, the computer can automatically convert the visual feature mapping of the lower layer of the image into high-layer semantic information known by people by improving the image description algorithm and adding the face recognition algorithm. Thus, the retrieval function is realized by searching semantic information contained in the image itself. This may be achieved, for example, by the embodiment of fig. 2.
Fig. 2 shows a schematic diagram of some embodiments of an image retrieval method of the present disclosure.
As shown in fig. 2, in response to an image being stored in the image library, for generating the Chinese and English description sentences, a CNN (e.g., Inception_V4) model is used to extract the feature vectors of the image; a two-layer LSTM model generates the descriptive sentences (description texts) of the image; and an improved attention mechanism (Attention) is added so that the generated descriptive sentence assigns corresponding weights to different regions of the image. In this way, the primary and secondary relationships of the image content can be highlighted, solving the problem that the emphasis and details of generated sentences are not prominent.
In response to an image being stored in the image library, for generating person identity information, a face recognition algorithm (Face_recognition) is introduced to build a face database and automatically generate the identity information of the people in the photograph. This enriches the semantic information of the photos and helps the user better locate target photos.
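Face recognition libraries of this kind typically represent each face as an embedding vector and match by Euclidean distance to enrolled references. The sketch below illustrates only that matching idea; the database layout, the embedding dimensionality, and the 0.6 threshold are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def identify(face_db, embedding, threshold=0.6):
    """face_db: {person name: reference embedding}. Return the closest
    enrolled name if its Euclidean distance is under the threshold,
    else None (unknown person). Real face embeddings are typically
    128-dimensional; small vectors are used here for illustration."""
    best_name, best_dist = None, threshold
    for name, ref in face_db.items():
        dist = float(np.linalg.norm(ref - embedding))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```

The returned name is what would be stored as the photo's person identity information.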
The generated description sentences and identity information are associated with the corresponding images and stored in a database, serving as the matching target for keyword retrieval. Keyword matching against this database then searches the image library.
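Storing description sentences and person names alongside each image, and matching a user keyword against them, can be sketched with an in-memory SQLite table. The schema and the `LIKE`-based matching are illustrative assumptions about one possible database design.

```python
import sqlite3

def build_index(rows):
    """rows: (photo_name, description, person_names) tuples -- an in-memory
    sketch of the image information database used as the matching target."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE photos (name TEXT, description TEXT, persons TEXT)")
    db.executemany("INSERT INTO photos VALUES (?, ?, ?)", rows)
    return db

def search(db, keyword):
    """Match the keyword against description sentences and person names."""
    pattern = f"%{keyword}%"
    cur = db.execute(
        "SELECT name FROM photos WHERE description LIKE ? OR persons LIKE ?",
        (pattern, pattern))
    return [r[0] for r in cur]  # matched photo names
```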
Fig. 3 shows a schematic diagram of further embodiments of the image retrieval method of the present disclosure.
As shown in fig. 3, the feature extraction module for the input image may adopt an Inception_V4 network model to obtain high-level semantic information of the image; the descriptive sentence generation module may include a two-layer LSTM network model and an improved attention mechanism (Attention) module for determining the feature vector weight values of different regions of the image.
In some embodiments, a recurrent attention mechanism based on the two-layer LSTM weights the feature vector of each image region, so that each feature vector has a different weight value. When generating a Chinese output sequence, the two-layer LSTM recurrent attention model can select the input sequence information most relevant to the word being output at the current moment. For example, the model may be trained by iteratively optimizing the network parameters against a loss function over the generated words.
In some embodiments, the two-layer LSTM model may include two layers: an LSTM1 module and an LSTM2 module.
$h_t^1$ and $h_t^2$ respectively denote the outputs of the LSTM1 module and the LSTM2 module at time $t$, and similarly at times $t-1$ and $t+1$. $v_i$ is the feature vector of the $i$-th region of the image, where $i$ is an integer greater than or equal to 1 and less than or equal to $k$. $\hat{v}_t$ is the feature vector weighted by the attention mechanism module. $X_t$ and $X_{t+1}$ are the inputs to the LSTM1 module at time $t$ and time $t+1$, respectively.
In some embodiments, the attention mechanism module calculates the weight $a_{i,t}$ of the $i$-th feature vector at time $t$ as follows:

$$a_{i,t} = w_a^T \, \phi(W_{va} v_i + W_{ha} h_t^1)$$

$$\alpha_t = \mathrm{Softmax}(a_t)$$

$W_{va}$, $W_{ha}$ and $w_a$ are model parameters determined by training, and $\phi(\cdot)$ denotes the hyperbolic tangent function.
In some embodiments, the attention mechanism module computes the weighted feature vector $\hat{v}_t$ as follows:

$$\hat{v}_t = \sum_{i=1}^{k} \alpha_{i,t} \, v_i$$
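The attention computation above can be sketched in NumPy as follows; the shapes chosen for the trained parameters W_va, W_ha and w_a are illustrative.

```python
import numpy as np

def attention_weights(V, h, W_va, W_ha, w_a):
    """V: (k, d) region feature vectors v_i; h: (m,) output h^1_t of LSTM1.
    Computes a_{i,t} = w_a^T tanh(W_va v_i + W_ha h^1_t), softmax-normalizes
    over regions, and returns the weighted feature vector v_hat."""
    scores = np.tanh(V @ W_va.T + h @ W_ha.T) @ w_a  # a_{i,t}, shape (k,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                             # Softmax over regions
    v_hat = alpha @ V                                # sum_i alpha_{i,t} v_i
    return alpha, v_hat
```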
in some embodiments, the model predicts the generated word S at time ttThe probability of (c) is:
Figure BDA0002330603840000084
S1to St-1Words generated from time 1 to time t-1. WpAnd bpAre model parameters obtained through training. In this case, for T moments, each word is represented byThe probabilities of the composed description text are:
Figure BDA0002330603840000085
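A minimal NumPy sketch of the two probability formulas above: the softmax over candidate words at each step, and the product over T steps that scores a candidate description text.

```python
import numpy as np

def word_probs(h2, W_p, b_p):
    """Softmax(W_p h^2_t + b_p): the distribution over candidate words
    at time t, given the LSTM2 output h^2_t."""
    z = W_p @ h2 + b_p
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def sentence_prob(step_probs, word_ids):
    """Probability of a description text: the product over t of the
    probability assigned to its word S_t at each step."""
    return float(np.prod([p[w] for p, w in zip(step_probs, word_ids)]))
```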
in the above-described embodiments, the image description method is improved. An Attention mechanism is added, so that the generated descriptive statement has certain weight. Therefore, the primary and secondary relations of the image content are highlighted, and the function of automatically generating Chinese and English description sentences for the photos is realized. And a face recognition algorithm is added, and the multilayer semantic information of the photos is integrated. Thus, a semantic database of the personal photo system is constructed and information is automatically updated in real time according to the photo database.
In the above embodiments, the generated personal-photo semantic database enables accurate photo retrieval based on Chinese and English key phrases. The information of the personal photo library can be presented in a visual form, and photos can also be deleted by category based on key phrases. An effective and fast method and system are thus provided for personal photo retrieval.
In the above embodiments, a technical scheme for retrieving personal photos based on Chinese and English descriptions of images is designed, so that people can retrieve the photos they need by searching the semantic information of the photos, or by the identity information of the people in them. In addition, the technical scheme of the present disclosure includes convenient functions such as deleting and displaying the retrieved pictures. The method therefore has wide application prospects in the field of photo retrieval.
Fig. 4 shows a schematic diagram of some embodiments of an image retrieval apparatus of the present disclosure.
As shown in fig. 4, the image retrieval device may be a personal photo retrieval system based on the chinese and english descriptions of the images. For example, the image retrieval device may include an image English-Chinese description subsystem and a personal photo retrieval subsystem.
In some embodiments, the image Chinese and English description subsystem may include an image description sentence generation module, a face recognition module and an image information database module.
For example, the image description sentence generation module (e.g., the text determination unit) automatically generates, for each photo in the personal computer photo library, the three Chinese and English description sentences with the highest probability. When photos in the library are added, deleted, or replaced, the corresponding photo information in the image information database module is updated accordingly.
For example, the face recognition module (e.g., the information determination unit) automatically recognizes the name information of the people in the personal photo library. The photo information database module manages, in real time, the photo names, the Chinese and English description sentences, and the information of the people appearing in the photos of the personal computer photo library.
In some embodiments, the personal photo retrieval subsystem may include a photo retrieval module, a photo deletion module, and a photo display module.
For example, the photo retrieval module (e.g., the retrieval unit) supports inputting and deleting keywords and retrieving related photos from the image information database according to the keywords; the photo deletion module (e.g., the deletion unit) deletes the retrieved photos; and the photo display module visualizes the retrieved photos and their names.
Fig. 5 illustrates a flow diagram of further embodiments of the image retrieval method of the present disclosure.
As shown in fig. 5, in the step of setting the photo library path, when the photo information database is generated, the folder path of the personal computer photo library needs to be set in order to obtain the related photo information.
In the step of initializing the database, the database is initialized, and the data storage format of the database is set.
Whether new photos have been added to the photo library is then judged. When new photos are detected in the personal photo library, in the step of invoking the command terminal, the system automatically invokes the terminal command, generates the Chinese and English description sentences and the person identity information, and stores them in the database; otherwise, whether photos have been deleted from the photo library is judged.
When a photo is detected to have been deleted from the personal photo library, the corresponding photo and its semantic information are deleted from the image information database, realizing dynamic management of the database. The work of generating the photo information database is thus completed.
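The dynamic-management step — generating records for newly added photos and deleting records for removed ones — reduces to a set comparison between the photo library and the database. A minimal sketch (function name illustrative):

```python
def sync_plan(library_photos, indexed_photos):
    """Compare the photo library with the image information database:
    newly added photos need description sentences and identity information
    generated; removed photos need their records deleted."""
    to_add = sorted(set(library_photos) - set(indexed_photos))
    to_delete = sorted(set(indexed_photos) - set(library_photos))
    return to_add, to_delete
```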
Fig. 6 illustrates a flow diagram of further embodiments of the image retrieval method of the present disclosure.
As shown in fig. 6, in the step of inputting Chinese and English keywords, a photo file name, Chinese and English keyword information, or person identity information may be input to retrieve photos from the database.
In the search matching step, photo character information is retrieved from the image data information base according to the input keyword.
In the step of judging whether the matching is successful, the degree of correspondence between the input information and the information in the database is evaluated during retrieval.
When the matching succeeds, all matched photo names may be returned, and the corresponding photos may be extracted from the personal computer photo library according to the photo names and displayed.
When the matching fails, a prompt window may pop up, such as "desired picture not found", to inform the user that the personal computer photo library has no photo corresponding to the keyword.
Whether to stop the retrieval may then be determined. If the retrieval is not stopped, the user may input keywords again; otherwise, the retrieval task may end directly.
Fig. 7 illustrates a block diagram of some embodiments of an image retrieval device of the present disclosure.
As shown in fig. 7, the retrieval means 7 of the image includes a text determination unit 71, an information determination unit 72, and a retrieval unit 73.
The text determination unit 71 determines the description text of the target image using the first machine learning model based on the extracted feature vector of the target image.
In some embodiments, the text determination unit 71 determines the weight of each feature vector, using the attention mechanism module of the first machine learning model, from the extracted feature vectors of different regions in the target image; the text determination unit 71 then determines the description text of the target image, using the first machine learning model, from the feature vectors and their weights.
In some embodiments, the first machine learning model is a two-layer LSTM model, the LSTM model including a first LSTM module and a second LSTM module; the attention mechanism module is arranged between the first LSTM module and the second LSTM module; the text determination unit 71 determines the weight of the feature vector based on the output of the first LSTM module and the feature vector.
In some embodiments, the text determination unit 71 determines the output of the second LSTM module from the weighted sum of the output of the first LSTM module and the respective feature vectors; the text determination unit 71 determines the description text of the target image based on the output of the second LSTM module.
In some embodiments, the text determination unit 71 determines the probability of each candidate word using a first machine learning model; the text determining unit 71 determines a plurality of candidate description texts of the target image according to the probability of each candidate word; the text determining unit 71 determines the probability of each candidate description text according to the probability of each candidate word in each candidate description text; the text determination unit 71 determines one or more candidate description texts having the highest probability as the description text.
The information determination unit 72 performs face recognition on the target image using the second machine learning model, and determines identity information of the target image.
The retrieval unit 73 matches the retrieval keywords provided by the user according to the description texts and the identity information of the plurality of target images, and determines corresponding retrieval results in the plurality of target images.
In some embodiments, the retrieving device 7 further includes a deleting unit 74, which is used for deleting the corresponding description text and the identity information of the target image in response to the target image being deleted.
In the above embodiment, the image description and the face recognition result determined by the machine learning method are combined as the basis for image retrieval. Therefore, the deep information of the image can be mined, and the accuracy of image retrieval is improved.
Fig. 8 shows a block diagram of further embodiments of an image retrieval apparatus of the present disclosure.
As shown in fig. 8, the image retrieval device 8 of this embodiment includes: a memory 81 and a processor 82 coupled to the memory 81, the processor 82 being configured to execute a retrieval method of an image in any one of the embodiments of the present disclosure based on instructions stored in the memory 81.
The memory 81 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Fig. 9 shows a block diagram of further embodiments of an image retrieval device of the present disclosure.
As shown in fig. 9, the image retrieval device 9 of this embodiment includes: a memory 910 and a processor 920 coupled to the memory 910, wherein the processor 920 is configured to execute the image retrieval method in any of the embodiments based on instructions stored in the memory 910.
The memory 910 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a boot loader, and other programs.
The retrieval means 9 of the image may further include an input-output interface 930, a network interface 940, a storage interface 950, and the like. These interfaces 930, 940, 950 and the memory 910 and the processor 920 may be connected, for example, by a bus 960. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 940 provides a connection interface for various networking devices. The storage interface 950 provides a connection interface for external storage devices such as an SD card and a usb disk.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media having computer-usable program code embodied therein.
So far, a retrieval method, an apparatus, and a computer-readable storage medium of an image according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (14)

1. An image retrieval method, comprising:
determining a description text of the target image by utilizing a first machine learning model according to the extracted feature vector of the target image;
performing face recognition on the target image by using a second machine learning model, and determining identity information of the target image;
and matching the retrieval keywords provided by the user according to the description texts and the identity information of the target images, and determining corresponding retrieval results in the target images.
2. The retrieval method of claim 1, wherein the determining the description text of the target image comprises:
determining the weight of each feature vector by using an attention mechanism module of the first machine learning model according to the extracted feature vectors of different regions in the target image;
and determining the description text of the target image by utilizing the first machine learning model according to the feature vectors and the weights thereof.
3. The retrieval method according to claim 2,
the first machine learning model is a double-layer long-short term memory (LSTM) model, the LSTM model comprises a first LSTM module and a second LSTM module, and the attention mechanism module is arranged between the first LSTM module and the second LSTM module;
the determining the weight of each feature vector comprises:
and determining the weight of the feature vector according to the output of the first LSTM module and the feature vector.
4. The retrieval method of claim 3, wherein the determining the description text of the target image comprises:
determining the output of the second LSTM module according to the output of the first LSTM module and the weighted sum of the feature vectors;
and determining the description text of the target image according to the output of the second LSTM module.
5. The retrieval method of claim 1, wherein the determining the description text of the target image comprises:
determining the probability of each candidate word by using the first machine learning model;
determining a plurality of candidate description texts of the target image according to the probability of each candidate word;
determining the probability of each candidate description text according to the probability of each candidate word in each candidate description text;
and determining one or more candidate description texts with the highest probability as the description texts.
6. The retrieval method of any one of claims 1-5, further comprising:
and in response to the target image being deleted, deleting the corresponding description text and the corresponding identity information of the target image.
7. An image retrieval apparatus comprising:
the text determining unit is used for determining a description text of the target image by utilizing a first machine learning model according to the extracted feature vector of the target image;
the information determining unit is used for carrying out face recognition on the target image by utilizing a second machine learning model and determining the identity information of the target image;
and the retrieval unit is used for matching the retrieval keywords provided by the user according to the description texts and the identity information of the target images and determining corresponding retrieval results in the target images.
8. The retrieval device of claim 7,
the text determining unit determines the weight of each feature vector by using an attention mechanism module of the first machine learning model according to the extracted feature vectors of different areas in the target image, and determines the description text of the target image by using the first machine learning model according to each feature vector and the weight thereof.
9. The retrieval device of claim 8,
the first machine learning model is a double-layer long-short term memory (LSTM) model, the LSTM model comprises a first LSTM module and a second LSTM module, and the attention mechanism module is arranged between the first LSTM module and the second LSTM module;
the text determination unit determines the weight of the feature vector according to the output of the first LSTM module and the feature vector.
10. The retrieval device of claim 9,
the text determining unit determines the output of the second LSTM module according to the output of the first LSTM module and the weighted sum of the feature vectors, and determines the description text of the target image according to the output of the second LSTM module.
11. The retrieval device of claim 7,
the text determination unit determines the probability of each candidate word by using the first machine learning model, determines a plurality of candidate description texts of the target image according to the probability of each candidate word, determines the probability of each candidate description text according to the probability of each candidate word in each candidate description text, and determines one or more candidate description texts with the highest probability as the description texts.
12. The retrieval device of any of claims 7-11, further comprising:
and the deleting unit is used for deleting the corresponding description text and the corresponding identity information of the target image in response to the target image being deleted.
13. An image retrieval apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of retrieving an image of any of claims 1-6 based on instructions stored in the memory.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the image retrieval method according to any one of claims 1 to 6.
CN201911334579.4A 2019-12-23 2019-12-23 Image retrieval method, device and computer-readable storage medium Pending CN113094538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911334579.4A CN113094538A (en) 2019-12-23 2019-12-23 Image retrieval method, device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911334579.4A CN113094538A (en) 2019-12-23 2019-12-23 Image retrieval method, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN113094538A true CN113094538A (en) 2021-07-09

Family

ID=76662827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911334579.4A Pending CN113094538A (en) 2019-12-23 2019-12-23 Image retrieval method, device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113094538A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779297A (en) * 2021-09-01 2021-12-10 北京橙色云科技有限公司 Information searching method and device based on picture and storage medium
CN117173163A (en) * 2023-11-01 2023-12-05 浙江同花顺智能科技有限公司 Portrait quality assessment method, system, device and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793697A (en) * 2014-02-17 2014-05-14 北京旷视科技有限公司 Identity labeling method of face images and face identity recognition method of face images
CN104462212A (en) * 2014-11-04 2015-03-25 百度在线网络技术(北京)有限公司 Information exhibiting method and device
CN105808732A (en) * 2016-03-10 2016-07-27 北京大学 Integration target attribute identification and precise retrieval method based on depth measurement learning
US20190005069A1 (en) * 2017-06-28 2019-01-03 Google Inc. Image Retrieval with Deep Local Feature Descriptors and Attention-Based Keypoint Descriptors
CN110599557A (en) * 2017-08-30 2019-12-20 深圳市腾讯计算机系统有限公司 Image description generation method, model training method, device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793697A (en) * 2014-02-17 2014-05-14 北京旷视科技有限公司 Identity labeling method of face images and face identity recognition method of face images
CN104462212A (en) * 2014-11-04 2015-03-25 百度在线网络技术(北京)有限公司 Information exhibiting method and device
CN105808732A (en) * 2016-03-10 2016-07-27 北京大学 Integration target attribute identification and precise retrieval method based on depth measurement learning
US20190005069A1 (en) * 2017-06-28 2019-01-03 Google Inc. Image Retrieval with Deep Local Feature Descriptors and Attention-Based Keypoint Descriptors
CN110599557A (en) * 2017-08-30 2019-12-20 深圳市腾讯计算机系统有限公司 Image description generation method, model training method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FEN XIAO等: "DAA: Dual LSTMs with adaptive attention for image captioning", 《NEUROCOMPUTING》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113779297A (en) * 2021-09-01 2021-12-10 北京橙色云科技有限公司 Information searching method and device based on picture and storage medium
CN117173163A (en) * 2023-11-01 2023-12-05 浙江同花顺智能科技有限公司 Portrait quality assessment method, system, device and readable storage medium

Similar Documents

Publication Publication Date Title
CN111753060B (en) Information retrieval method, apparatus, device and computer readable storage medium
CN108959246B (en) Answer selection method and device based on improved attention mechanism and electronic equipment
EP2812883B1 (en) System and method for semantically annotating images
US10740545B2 (en) Information extraction from open-ended schema-less tables
US11954139B2 (en) Deep document processing with self-supervised learning
CN112084337A (en) Training method of text classification model, and text classification method and equipment
WO2010119615A1 (en) Learning-data generating device and named-entity-extraction system
CN107085583B (en) Electronic document management method and device based on content
CN109471944A (en) Training method, device and the readable storage medium storing program for executing of textual classification model
CN111190997A (en) Question-answering system implementation method using neural network and machine learning sequencing algorithm
EP2806336A1 (en) Text prediction in a text input associated with an image
Mueen et al. Automatic multilevel medical image annotation and retrieval
CN114461839A (en) Multi-mode pre-training-based similar picture retrieval method and device and electronic equipment
JP2011248596A (en) Searching system and searching method for picture-containing documents
CN113094538A (en) Image retrieval method, device and computer-readable storage medium
CN113434636A (en) Semantic-based approximate text search method and device, computer equipment and medium
CN111133429A (en) Extracting expressions for natural language processing
CN112632258A (en) Text data processing method and device, computer equipment and storage medium
Yurtsever et al. Figure search by text in large scale digital document collections
US20230138491A1 (en) Continuous learning for document processing and analysis
KR102215580B1 (en) Electronic device for selecting important keywords for documents based on style attributes and operating method thereof
US10628632B2 (en) Generating a structured document based on a machine readable document and artificial intelligence-generated annotations
US20200311059A1 (en) Multi-layer word search option
CN112149389A (en) Resume information structured processing method and device, computer equipment and storage medium
CN117216217B (en) Intelligent classification and retrieval method for files

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210709