WO2020108234A1 - Image index generation method, image search method and apparatus, terminal and medium

Image index generation method, image search method and apparatus, terminal and medium

Info

Publication number
WO2020108234A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
index
sentence
keyword
search
Prior art date
Application number
PCT/CN2019/115411
Other languages
English (en)
Chinese (zh)
Inventor
侯允
刘耀勇
陈岩
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2020108234A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • This application relates to the field of search technology, and in particular to an image index generation method, an image search method and device, a terminal, and a medium.
  • A photo album application is usually installed in a terminal, and the photo album application is generally used to store captured images, images saved from the network, and the like.
  • Embodiments of the present application provide an image index generation method, image search method, device, terminal, and medium.
  • The technical solution is as follows:
  • An image index generation method includes: acquiring a first image; performing image recognition on the first image to obtain a recognition result corresponding to the first image; generating a description sentence according to the recognition result, the description sentence being used to describe the first image; and determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
  • An image search method includes: displaying a search box; receiving a first keyword input in the search box; searching a photo album for a second image matching the first keyword, where an index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to a recognition result of the second image; and displaying a search result, the search result including the second image.
  • An image index generation device includes:
  • an image acquisition module, configured to acquire a first image;
  • an image recognition module, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image;
  • a sentence generation module, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image; and
  • an index generation module, configured to determine the description sentence as an index of the first image, and store the index in correspondence with the first image.
  • An image search device includes:
  • a search box display module, configured to display a search box;
  • a keyword receiving module, configured to receive a first keyword input in the search box;
  • an image search module, configured to search a photo album for a second image matching the first keyword, where an index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to a recognition result of the second image; and
  • a result display module, configured to display a search result, the search result including the second image.
  • An embodiment of the present application provides a terminal. The terminal includes a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the above image index generation method or the above image search method.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, and the computer program is loaded and executed by a processor to implement the above image index generation method or the above image search method.
  • FIG. 3 is a flowchart of an image search method provided by an embodiment of this application.
  • FIG. 5 is a block diagram of an image index generation device provided by an embodiment of the present application.
  • FIG. 6 is a block diagram of an image search device provided by an embodiment of the present application.
  • FIG. 7 is a block diagram of a terminal provided by an embodiment of the present application.
  • Embodiments of the present application provide an image index generation method, device, terminal, and storage medium.
  • The above description sentence is determined as the index of the image. When the user subsequently needs to search for the image, the user can input a word included in the index, or a word whose meaning is similar to that of a word included in the index, and the terminal can accurately find the image according to the word entered by the user, which improves the efficiency of searching for images in the album.
  • In the embodiments of the present application, the execution subject of each step is a terminal.
  • A photo album application is installed in the terminal; the photo album application refers to an application for storing images.
  • An image may be an image (including photos and videos) taken by the user, or an image (including photos and videos) saved by the user from other applications.
  • The terminal may be a mobile phone, a tablet computer, a personal computer, a smart wearable device, a camera, a smart playback device, and so on.
  • An embodiment of the present application provides an image index generation method.
  • The method includes: acquiring a first image; performing image recognition on the first image to obtain a recognition result corresponding to the first image; generating a description sentence according to the recognition result, the description sentence being used to describe the first image; and determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
  • Optionally, generating the description sentence according to the recognition result includes: converting the recognition result into a first word vector; and processing the first word vector through a language description model to obtain the description sentence.
  • Optionally, the method further includes: acquiring associated information of the first image, the associated information including at least one of the following: location information, time information, and scene information.
  • In this case, generating the description sentence according to the recognition result includes: converting the recognition result into a first word vector; converting the associated information into a second word vector; and processing the first word vector and the second word vector through a language description model to obtain the description sentence.
  • Optionally, before determining the description sentence as the index of the first image and storing the index in correspondence with the first image, the method further includes: displaying inquiry information, the inquiry information being used to inquire whether to determine the description sentence as the index; and, when a confirmation instruction corresponding to the inquiry information is received, performing the step of determining the description sentence as the index of the first image and storing the index in correspondence with the first image.
  • Optionally, the method further includes: when the confirmation instruction is not received, displaying an input box; receiving a sentence input in the input box; and determining the input sentence as the index of the first image, and storing the index in correspondence with the first image.
  • Optionally, performing image recognition on the first image to obtain the recognition result corresponding to the first image includes: performing image recognition on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image, where the image recognition model is a neural network model trained using multiple sample images, and the object in each of the multiple sample images corresponds to a classification label.
  • Optionally, before generating the description sentence according to the recognition result, the method further includes: acquiring a training sample set, the training sample set including a plurality of sample images, each sample image corresponding to a recognition result and an expected description sentence; for each sample image, processing the recognition result through the language description model and outputting an actual description sentence; calculating the error between the actual description sentence and the expected description sentence; when the error is greater than a preset threshold, adjusting the parameters of the language description model and returning to the step of processing the recognition result through the language description model and outputting the actual description sentence; and, when the error is less than or equal to the preset threshold, stopping the training to obtain the trained language description model, the language description model being used to generate the description sentence according to the recognition result.
  • An embodiment of the present application also provides an image search method.
  • The method includes: displaying a search box; receiving a first keyword input in the search box; searching a photo album for a second image matching the first keyword, where an index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to a recognition result of the second image; and displaying a search result, the search result including the second image.
  • Optionally, the method further includes: when the number of second images is greater than a preset number, displaying prompt information, the prompt information being used to prompt the input of a second keyword; acquiring the second keyword; and searching, among the second images, for a third image matching the second keyword, where an index corresponding to the third image includes a second target keyword, and the second target keyword matches the second keyword.
  • In this case, the search result includes the third image.
  • FIG. 1 shows a flowchart of an image index generation method provided by an embodiment of the present application.
  • The method may include the following steps:
  • Step 101: Acquire a first image.
  • Optionally, the first image may be an image collected by a camera on the terminal.
  • A camera is provided on the terminal, and a shooting application is installed.
  • The shooting application refers to an application used to capture images, for example, a camera application, a beauty application, or another application.
  • The terminal receives a trigger signal of a shooting control acting on the current shooting interface, and acquires the image collected by the camera as the first image.
  • Optionally, the first image may instead be an image saved by the user from another application rather than an image collected by a camera on the terminal.
  • For example, the first image is an image obtained from the network or a screenshot.
  • Optionally, when the terminal receives a save instruction corresponding to an image, the image is acquired from the network as the first image according to the save instruction.
  • The embodiments of the present application do not limit the acquisition method and timing of the first image.
  • Step 102: Perform image recognition on the first image to obtain a recognition result corresponding to the first image.
  • The recognition result corresponding to the first image is used to indicate the objects included in the first image.
  • The first image may include one or more objects, such as people, animals, buildings, landscapes, and so on.
  • The terminal determines, through the following step, the recognition result corresponding to each object; the recognition result is used to indicate the category to which the object belongs, for example, whether the object is a cat, a dog, grass, a person, or another category:
  • image recognition is performed on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image.
  • The image recognition model is a neural network model trained using multiple sample images.
  • Optionally, the image recognition model may be obtained by training a deep learning network using multiple sample images.
  • The object in each of the multiple sample images corresponds to a classification label, and the classification label is used to characterize the category to which the object belongs.
  • Optionally, the image recognition model includes an input layer, at least one convolutional layer (for example, three convolutional layers in total: a first convolutional layer, a second convolutional layer, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first fully connected layer and a second fully connected layer), and an output layer.
  • The input data of the input layer is the first image.
  • The output result of the output layer is the category to which each of the at least one object included in the first image belongs.
  • The image recognition process is as follows: the first image is input to the input layer of the image recognition model; the convolutional layers of the image recognition model extract features of the first image; the fully connected layers of the image recognition model combine and abstract these features to obtain data suitable for classification by the output layer; and finally the output layer outputs the recognition results respectively corresponding to the at least one object included in the first image.
  • The specific structures of the convolutional layers and the fully connected layers of the image recognition model are not limited here.
  • The image recognition model shown in the above embodiment is only exemplary and explanatory, and is not intended to limit the present application.
  • In general, the more layers a convolutional neural network has, the better the effect, but the longer the computation time; in practical applications, a convolutional neural network with an appropriate number of layers can be designed in view of the requirements on recognition accuracy and efficiency, as sketched below.
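  • The following is a minimal, illustrative sketch in Python (PyTorch) of such a structure, with three convolutional layers, two fully connected layers, and an output layer. The layer widths, input resolution, and number of classification labels are assumptions for illustration only, not parameters disclosed by this application.

```python
# A minimal sketch (not the patent's actual implementation) of the image
# recognition model structure described above.
import torch
import torch.nn as nn

NUM_CLASSES = 100  # assumed number of classification labels

class ImageRecognitionModel(nn.Module):
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        # First, second, and third convolutional layers extract image features.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # First and second fully connected layers combine and abstract the features.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 28 * 28, 512), nn.ReLU(),
            nn.Linear(512, num_classes),  # output layer: category logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = ImageRecognitionModel()
first_image = torch.randn(1, 3, 224, 224)  # a dummy first image
logits = model(first_image)
print(logits.argmax(dim=1))  # index of the predicted category
```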
  • A sample image refers to an image selected in advance for training the image recognition model.
  • Each sample image has a classification label.
  • The classification label of a sample image is usually determined manually and is used to describe the scene, item, person, and the like corresponding to the sample image.
  • Optionally, the neural network may be a deep learning network.
  • The deep learning network may use an AlexNet network, a VGG-16 network, a GoogLeNet network, a deep residual learning network, and so on, which is not limited in the embodiments of the present application.
  • The algorithms used to train the deep learning network may be BP (back-propagation), the Faster R-CNN (Regions with Convolutional Neural Networks) algorithm, and so on, which is not limited in the embodiments of the present application.
  • After training is completed, the trained deep learning network, namely the image recognition model, is obtained.
  • Step 103: Generate a description sentence according to the recognition result.
  • The description sentence is used to describe the first image.
  • Optionally, the description sentence includes the recognition results respectively corresponding to the at least one object.
  • Optionally, the description sentence also includes other words, which can be used to describe at least one of the following: the positional relationship between at least two objects, the action being performed by an object, the state of an object, and so on.
  • For example, the first image is recognized, the objects in the first image are found to include a dog and grass, and the dog's posture on the grass is running; the above recognition result is input into a language description model to obtain the description sentence corresponding to the first image: "dog running on the grass".
  • Optionally, the language description model includes an input layer, at least one convolutional layer (for example, three convolutional layers in total: a first convolutional layer, a second convolutional layer, and a third convolutional layer), at least one fully connected layer (for example, two fully connected layers: a first fully connected layer and a second fully connected layer), and an output layer.
  • The input data of the input layer is the first image and the recognition results of the objects in the first image.
  • The output result of the output layer is the description sentence corresponding to the first image.
  • The generation process of the description sentence is as follows: the first image and the recognition results of the objects in the first image are input to the input layer of the language description model; the convolutional layers of the language description model extract features of the above input; the fully connected layers of the language description model combine and abstract these features; and finally the output layer outputs the description sentence corresponding to the first image, as sketched below.
  • The specific structures of the convolutional layers and the fully connected layers of the language description model are not limited here.
  • The language description model shown in the above embodiment is only exemplary and explanatory, and is not intended to limit the present application.
  • As above, the more layers a convolutional neural network has, the better the effect, but the longer the computation time.
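  • The following is a minimal, illustrative sketch of the structure described above: the first image and the recognition results enter the model, convolutional and fully connected layers combine them, and the output layer emits a fixed-length description sentence. The vocabulary size, sentence length, layer sizes, and the greedy word-by-word decoding are all assumptions for illustration, not the actual design disclosed by this application.

```python
# A minimal sketch, under stated assumptions, of the language description
# model outlined above.
import torch
import torch.nn as nn

VOCAB_SIZE, MAX_LEN, NUM_CLASSES = 1000, 8, 100  # assumed sizes

class LanguageDescriptionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(  # convolutional layers over the image
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.label_embed = nn.Embedding(NUM_CLASSES, 32)  # recognition results
        self.fc = nn.Sequential(  # fully connected layers combine both inputs
            nn.Linear(16 * 4 * 4 + 32, 256), nn.ReLU(),
            nn.Linear(256, MAX_LEN * VOCAB_SIZE),  # output layer
        )

    def forward(self, image, label_ids):
        img_feat = self.conv(image)
        lbl_feat = self.label_embed(label_ids).mean(dim=1)  # pool object labels
        logits = self.fc(torch.cat([img_feat, lbl_feat], dim=1))
        return logits.view(-1, MAX_LEN, VOCAB_SIZE)  # one slot per word

model = LanguageDescriptionModel()
image = torch.randn(1, 3, 224, 224)
labels = torch.tensor([[3, 17]])  # e.g. label IDs for "dog" and "grass"
word_ids = model(image, labels).argmax(dim=-1)  # greedy decode into word IDs
```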
  • Optionally, step 103 may include the following sub-steps:
  • Step 103a: Convert the recognition result into a first word vector.
  • Step 103b: Process the first word vector through the language description model to obtain the description sentence.
  • The terminal converts the recognition result into the corresponding word vector through a word vector model.
  • A word vector is a vector representing a word, and the word vector model refers to a model that converts words into word vectors.
  • The terminal inputs the word vector into the language description model, and the language description model outputs the description sentence.
  • Optionally, the above word vector model may be a word2vec model; a minimal sketch follows.
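  • A minimal sketch of step 103a using the gensim word2vec implementation follows; the toy training corpus is an assumption, and in practice the word vector model would be trained, or pre-trained, on a large corpus.

```python
# A minimal sketch of converting recognition results into first word vectors.
from gensim.models import Word2Vec

corpus = [["dog", "running", "on", "the", "grass"],
          ["cat", "sleeping", "on", "the", "sofa"]]  # assumed toy corpus
wv_model = Word2Vec(corpus, vector_size=100, min_count=1)

recognition_result = ["dog", "grass"]  # recognition results from step 102
first_word_vectors = [wv_model.wv[word] for word in recognition_result]
# The first word vectors are then input into the language description model
# (step 103b), which outputs the description sentence.
```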
  • Optionally, step 103 can also be implemented as follows: the associated information of the first image is acquired; the recognition result is converted into a first word vector and the associated information is converted into a second word vector; and the first word vector and the second word vector are processed through the language description model to obtain the description sentence.
  • The associated information includes at least one of the following: location information, time information, and scene information.
  • The location information is used to indicate the geographic location where the first image was taken, for example, Shanghai, Beijing, Canada, and so on.
  • The time information is used to indicate the time when the first image was acquired, for example, spring, summer, autumn, winter, early morning, evening, and so on.
  • The scene information is used to indicate the scene corresponding to the first image, for example, a park, a beach, a shopping mall, a school, and so on.
  • The terminal can convert the associated information into the corresponding word vector through the word vector model.
  • The terminal inputs the first word vector and the second word vector into the language description model, so that the final description sentence is richer.
  • The following takes the associated information being location information as an example for description.
  • The first word vector and the second word vector are processed through the language description model to obtain the description sentence.
  • The location information is used to indicate the geographic location where the first image was taken.
  • Optionally, the location information can be obtained by a positioning component in the terminal, for example, a GPS (Global Positioning System) component.
  • Optionally, the terminal may also obtain the location information of the first image by performing image recognition on the first image.
  • For the method of converting the location information into a word vector, reference may be made to step 103a; details are not repeated here.
  • The description sentence corresponding to the first image is generated in combination with the geographic location where the first image was taken, so that the first image can be described more completely, and the user can subsequently search for the first image through multiple different keywords, which enhances the convenience of searching.
  • For example, the first image is recognized, the objects in the first image are found to include a dog and grass, and the dog's posture on the grass is running; in addition, the geographic location where the first image was taken is XX Park. The description sentence corresponding to the first image is then "dog running on the grass in xx park", as in the sketch below.
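  • Extending the word2vec sketch above, the following illustrates how the associated information (here, a location word) could be converted into a second word vector alongside the first word vectors; the corpus and vocabulary are again toy assumptions.

```python
# A minimal sketch of combining first word vectors (recognition results)
# with a second word vector (associated location information).
from gensim.models import Word2Vec

corpus = [["dog", "running", "on", "the", "grass"],
          ["dog", "running", "in", "the", "park"]]  # assumed toy corpus
wv_model = Word2Vec(corpus, vector_size=100, min_count=1)

first_word_vectors = [wv_model.wv[w] for w in ["dog", "grass"]]  # recognition
second_word_vector = wv_model.wv["park"]  # associated (location) information
model_input = first_word_vectors + [second_word_vector]
# model_input is processed by the language description model to obtain a
# richer description sentence such as "dog running on the grass in xx park".
```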
  • Step 104: Determine the description sentence as the index of the first image, and store the index in correspondence with the first image.
  • The terminal determines the description sentence as the index of the first image, and stores the index in correspondence with the first image. Subsequently, if the user needs to search for the first image, the user only needs to input at least one word included in the description sentence, or a word matching a word in the description sentence (for example, a word whose similarity with a word in the description sentence is greater than a preset threshold); the terminal can then find the first image according to the word input by the user and display the first image to the user.
  • The embodiments of the present application do not limit the path for storing the description sentence and the first image, which may be preset by the terminal or set by the user.
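  • As one hypothetical storage scheme (the application does not prescribe one), the index could be stored in correspondence with the image path in a local SQLite table:

```python
# A minimal sketch, assuming a local SQLite database, of storing the
# description sentence as the index in correspondence with the first image.
# The table name, path, and file names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect("album_index.db")
conn.execute("""CREATE TABLE IF NOT EXISTS image_index (
                    image_path TEXT PRIMARY KEY,
                    description TEXT NOT NULL)""")
conn.execute("INSERT OR REPLACE INTO image_index VALUES (?, ?)",
             ("/album/IMG_0001.jpg", "dog running on the grass in xx park"))
conn.commit()
conn.close()
```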
  • In the technical solution provided by the embodiments of the present application, the recognition results corresponding to the objects included in an image are obtained, a description sentence describing the image is generated according to the recognition results, and the description sentence is determined as the index of the image. When the user subsequently needs to search for the image, the user can input a word included in the index, or a word with a meaning similar to that of a word included in the index, and the terminal can accurately find the image according to the word entered by the user, which improves the efficiency of searching for images in the album.
  • In addition, the generated index is accurate.
  • FIG. 2 shows a flowchart of an image index generation method provided by another embodiment of the present application.
  • The method may include the following steps:
  • Step 201: Acquire a first image.
  • Step 202: Perform image recognition on the first image to obtain a recognition result corresponding to the first image.
  • Step 203: Generate a description sentence according to the recognition result.
  • Step 204: Display inquiry information.
  • The inquiry information is used to inquire whether to determine the description sentence as the index.
  • For example, the inquiry message is: "The description sentence corresponding to the image is 'watching a concert in the Bird's Nest'. Confirm?"
  • In this way, the user can preview the description sentence generated by the language description model, and decide whether to determine the generated description sentence as the index of the first image.
  • Step 205: When a confirmation instruction corresponding to the inquiry information is received, determine the description sentence as the index of the first image, and store the index in correspondence with the first image.
  • The user can issue a confirmation instruction for the inquiry information.
  • The confirmation instruction corresponding to the inquiry information is used to confirm that the generated description sentence is to be determined as the index of the image.
  • Optionally, a confirmation control is displayed beside the inquiry information; when the terminal receives a trigger signal acting on the confirmation control, the terminal receives the confirmation instruction corresponding to the inquiry information.
  • Step 206: When the confirmation instruction is not received, display an input box.
  • The input box is used to receive a description sentence corresponding to the first image input by the user.
  • Optionally, if the terminal does not receive the trigger signal acting on the confirmation control within a preset time, the terminal does not receive the confirmation instruction.
  • Optionally, a denial control is also displayed beside the inquiry information; if the terminal receives a trigger signal corresponding to the denial control, the confirmation instruction is not received, and the terminal may display the input box at this time.
  • Step 207: Receive the sentence input in the input box.
  • Step 208: Determine the input sentence as the index of the first image, and store the index in correspondence with the first image.
  • In the technical solution provided by this embodiment, the user judges whether to confirm the generated description sentence as the index of the image; if the user is not satisfied with the description sentence generated by the terminal, the user can input the description sentence corresponding to the image, so that the user can subsequently search for the image according to the self-input description sentence, which improves the accuracy of the index and further improves the final image search efficiency.
  • After the index of the first image is generated, the user can search for the first image in the album according to the index.
  • An embodiment of the present application further provides an image search method. The image search method may include the following steps:
  • Step 301: Display a search box.
  • The search box is used for the user to input a search keyword, so that the terminal can find an image matching the search keyword.
  • Optionally, the search box is displayed on the main interface of the album application.
  • Optionally, the main interface of the album application displays a search control.
  • The terminal receives a trigger signal corresponding to the search control, and displays the search box according to the trigger signal.
  • The embodiments of the present application do not limit the display manner of the search box.
  • Step 302: Receive a first keyword entered in the search box.
  • The first keyword is input by the user, and may be "Forbidden City", "cat", "rose", and so on, which is not limited in the embodiments of the present application.
  • Step 303: Search the album for a second image matching the first keyword.
  • The number of second images may be one or more.
  • The index corresponding to the second image is used to describe the second image.
  • The index corresponding to the second image is a description sentence generated according to the recognition result of the second image.
  • The index corresponding to the second image includes a first target keyword.
  • The first target keyword may be a recognition result corresponding to an object included in the second image, or may be another word in the description sentence other than the recognition result, which is not limited in the embodiments of the present application. In this way, users can search for the same image with different keywords, reducing the difficulty of searching for images.
  • The first target keyword matches the first keyword; for example, the similarity between the first target keyword and the first keyword meets a preset condition.
  • Optionally, the preset condition may be that the similarity between the first target keyword and the first keyword is greater than a preset threshold; the preset threshold may be set according to actual requirements, which is not limited in the embodiments of the present application.
  • Optionally, the terminal first calculates the similarity between each word included in each description sentence stored in the terminal and the first keyword, then determines the words whose similarity with the first keyword meets the preset condition as first target keywords, and finally takes the images corresponding to the description sentences containing a first target keyword as the second images matching the first keyword.
  • The similarity between the first keyword and a word included in a description sentence can be calculated as follows: the terminal expresses the first keyword as a first vector through the word vector model, expresses the word included in the description sentence as a second vector, and then calculates the similarity by computing the cosine distance between the first vector and the second vector. The greater the cosine distance, the lower the similarity between the first keyword and the word included in the description sentence; conversely, the smaller the cosine distance, the higher the similarity.
  • The terminal may determine the words whose cosine distance satisfies the preset condition as first target keywords.
  • Step 304: Display the search results.
  • Optionally, the terminal displays the search results on a search result page, and the search results include the above second images.
  • Optionally, the terminal may sort the second images according to the similarity between the first target keyword and the first keyword, as sketched below.
  • The higher the similarity between the first target keyword and the first keyword, the higher the second image corresponding to the description sentence containing the first target keyword is ranked on the search result page; conversely, the lower the similarity, the lower the corresponding second image is ranked on the search result page.
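  • A minimal, self-contained sketch of steps 303 and 304 follows: each word of each stored description sentence is compared with the first keyword via cosine similarity between word vectors, matching images are collected as second images, and the results are ordered by decreasing similarity. The word vector model, threshold, and album contents are toy assumptions.

```python
# A minimal sketch of matching and ranking second images by keyword similarity.
import numpy as np
from gensim.models import Word2Vec

# Toy word vector model so the sketch is self-contained; in practice this is
# the word vector model used when the indexes were generated.
wv_model = Word2Vec([["dog", "running", "on", "the", "grass", "in", "xx", "park"]],
                    vector_size=100, min_count=1)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_album(keyword, index_db, threshold=0.6):
    """index_db maps each image path to its description-sentence index."""
    kw_vec = wv_model.wv[keyword]
    hits = []
    for image_path, description in index_db.items():
        sims = [cosine_similarity(kw_vec, wv_model.wv[w])
                for w in description.split() if w in wv_model.wv]
        best = max(sims, default=0.0)
        if best > threshold:  # the description contains a target keyword
            hits.append((best, image_path))
    hits.sort(reverse=True)  # higher similarity ranks higher on the page
    return [path for _, path in hits]

index_db = {"/album/IMG_0001.jpg": "dog running on the grass in xx park"}
print(search_album("dog", index_db))  # -> ["/album/IMG_0001.jpg"]
```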
  • In the technical solution provided by the embodiments of the present application, an image search is performed through the image index generated according to the above embodiment; the user only needs to input a word included in the index, or a word with a similar meaning, and the terminal can accurately find the image according to the word entered by the user, which improves the efficiency of searching for images in the album.
  • In some cases, the terminal finds many second images based on the first keyword.
  • The user then needs to pick out the desired image from these many second images, and the search efficiency is still relatively low.
  • FIG. 4 shows a flowchart of an image search method provided by another embodiment of the present application.
  • The image search method can be used to solve the problem of low search efficiency when many second images are found according to the first keyword.
  • The method includes the following steps:
  • Step 401: Display a search box.
  • Step 402: Receive a first keyword entered in the search box.
  • Step 403: Search the album for a second image matching the first keyword.
  • Step 404: When the number of second images is greater than a preset number, display prompt information.
  • The preset number can be set according to actual needs, which is not limited in the embodiments of the present application.
  • For example, the preset number is 10.
  • The prompt information is used to prompt the input of a second keyword.
  • The second keyword is different from the first keyword.
  • Optionally, when finding the second images matching the first keyword, the terminal first detects whether the number of second images is greater than the preset number. If the number of second images is less than or equal to the preset number, the second images are directly displayed. If the number of second images is greater than the preset number, the user is prompted to enter more keywords, so that the terminal continues to filter, from the second images matching the first keyword, a third image matching both the first keyword and the second keyword.
  • Step 405: Obtain the second keyword.
  • The second keyword is also input by the user and is different from the first keyword.
  • Optionally, the above prompt information includes an input box for the user to input the second keyword; the user can input the second keyword in the input box, so that the terminal obtains the second keyword.
  • Step 406: Search, among the second images, for a third image matching the second keyword.
  • The index corresponding to the third image includes a second target keyword.
  • The second target keyword matches the second keyword.
  • Optionally, the similarity between the second target keyword and the second keyword meets a second preset condition.
  • Optionally, the second preset condition may be that the similarity between the second target keyword and the second keyword is greater than a preset threshold; the preset threshold may be set according to actual requirements, which is not limited in the embodiments of the present application.
  • Optionally, the terminal first calculates the similarity between each word included in each description sentence stored in the terminal and the first keyword, and the similarity between each such word and the second keyword; then determines the words whose similarity with the first keyword meets the first preset condition as first target keywords, and the words whose similarity with the second keyword meets the second preset condition as second target keywords; and finally takes the images corresponding to the description sentences containing both a first target keyword and a second target keyword as the third images matching both the first keyword and the second keyword.
  • For the method of calculating the similarity between the second keyword and the words included in a description sentence, reference may be made to step 303; details are not repeated here.
  • Optionally, the terminal calculates the similarity between the words included in the indexes of the second images and the second keyword, determines the words whose similarity with the second keyword meets the second preset condition as second target keywords, and determines the images among the second images whose indexes include a second target keyword as the third images, as sketched below.
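  • A minimal sketch of step 406 follows, reusing search_album() and index_db from the sketch after step 304; the second preset condition is modeled with the same illustrative threshold.

```python
# A minimal sketch of refining second images with a second keyword.
def refine_search(first_keyword, second_keyword, index_db, threshold=0.6):
    second_images = set(search_album(first_keyword, index_db, threshold))
    # A third image matches both keywords: its index contains a first target
    # keyword and a second target keyword.
    return [path for path in search_album(second_keyword, index_db, threshold)
            if path in second_images]

print(refine_search("dog", "park", index_db))  # -> ["/album/IMG_0001.jpg"]
```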
  • Step 407: Display the search results.
  • The search results include the above third image.
  • In the technical solution provided by the embodiments of the present application, when there are too many search results, the user can be prompted to input more keywords, so that the terminal can perform the image search based on the two rounds of entered keywords, thereby improving the accuracy of the image search.
  • The language description model is pre-trained; it is a model for encoding at least two words into a complete sentence.
  • The following describes the training process of the language description model.
  • Step 501: Obtain a training sample set.
  • The training sample set includes multiple sample images, and each sample image corresponds to a recognition result and an expected description sentence.
  • The recognition result corresponding to a sample image can be annotated manually or obtained through the image recognition model; the expected description sentence may be annotated manually.
  • Step 502: For each sample image, process the recognition result through the language description model, and output an actual description sentence.
  • Optionally, the language description model may be a deep learning network, such as an AlexNet network, a VGG-16 network, a GoogLeNet network, or a deep residual learning network.
  • Before training, the parameters of the language description model are initialized.
  • Optionally, the parameters of the language description model may be set randomly, or may be set by relevant technical personnel based on experience.
  • The recognition result of each sample image is input into the language description model, and the language description model outputs an actual description sentence.
  • Step 503: Calculate the error between the actual description sentence and the expected description sentence.
  • Optionally, the terminal determines the distance between the actual description sentence and the expected description sentence as the error.
  • After calculating the error between the actual description sentence and the expected description sentence, the terminal detects whether the error is greater than a preset threshold. If the error is greater than the preset threshold, the parameters of the language description model are adjusted, and the process returns to the step of processing each sample image through the language description model and outputting the actual description sentence, that is, steps 502 and 503 are repeated. When the error is less than or equal to the preset threshold, the training is stopped, and the trained language description model is obtained. A minimal sketch of this loop follows.
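  • A minimal, illustrative training-loop sketch follows; token-level cross-entropy stands in for the 'distance' between the actual and expected description sentences, and the model, sample set, and threshold are toy assumptions.

```python
# A minimal sketch, under stated assumptions, of steps 501-503: process each
# sample's recognition result, compute the error against the expected
# description sentence, and adjust parameters until the error is within a
# preset threshold.
import torch
import torch.nn as nn

VOCAB_SIZE, MAX_LEN, ERROR_THRESHOLD = 1000, 8, 0.5  # assumed

model = nn.Sequential(  # stand-in language description model
    nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, MAX_LEN * VOCAB_SIZE))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy training sample set: (recognition-result vector, expected word IDs).
samples = [(torch.randn(100), torch.randint(0, VOCAB_SIZE, (MAX_LEN,)))]

while True:
    total_error = 0.0
    for recognition_vec, expected_ids in samples:
        logits = model(recognition_vec).view(MAX_LEN, VOCAB_SIZE)  # step 502
        error = loss_fn(logits, expected_ids)                      # step 503
        optimizer.zero_grad()
        error.backward()   # adjust the parameters of the model
        optimizer.step()
        total_error += error.item()
    if total_error / len(samples) <= ERROR_THRESHOLD:
        break  # error within threshold: training is complete
```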
  • FIG. 5 shows a block diagram of an image index generation device provided by an embodiment of the present application.
  • The device has the function of implementing the above method; the function can be realized by hardware, or by hardware executing corresponding software.
  • The device may be a terminal, or may be provided on a terminal.
  • The device includes:
  • an image acquisition module 601, configured to acquire a first image;
  • an image recognition module 602, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image;
  • a sentence generation module 603, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image; and
  • an index generation module 604, configured to determine the description sentence as an index of the first image, and store the index in correspondence with the first image.
  • In the technical solution provided by the embodiments of the present application, the recognition results corresponding to the objects included in an image are obtained, a description sentence describing the image is generated according to the recognition results, and the description sentence is determined as the index of the image. When the user subsequently needs to search for the image, the user can input a word included in the index, or a word with a meaning similar to that of a word included in the index, and the terminal can accurately find the image according to the word entered by the user, which improves the efficiency of searching for images in the album.
  • Optionally, the sentence generation module 603 is configured to: convert the recognition result into a first word vector; and process the first word vector through a language description model to obtain the description sentence.
  • Optionally, the device further includes an information acquisition module (not shown in the figure).
  • The information acquisition module is configured to acquire associated information of the first image, the associated information including at least one of the following: location information, time information, and scene information.
  • In this case, the sentence generation module 603 is configured to: convert the recognition result into a first word vector; convert the associated information into a second word vector; and process the first word vector and the second word vector through the language description model to obtain the description sentence.
  • Optionally, the device further includes an information display module (not shown in the figure).
  • The information display module is configured to display inquiry information, and the inquiry information is used to inquire whether to determine the description sentence as the index.
  • The index generation module 604 is further configured to, when a confirmation instruction corresponding to the inquiry information is received, perform the step of determining the description sentence as the index of the first image and storing the index in correspondence with the first image.
  • Optionally, the device further includes an input box display module and a sentence receiving module (not shown in the figure):
  • the input box display module is configured to display an input box when the confirmation instruction is not received; and
  • the sentence receiving module is configured to receive a sentence input in the input box.
  • The index generation module 604 is further configured to determine the input sentence as the index of the first image, and store the index in correspondence with the first image.
  • Optionally, the image recognition module 602 is configured to: perform image recognition on the first image through an image recognition model to obtain recognition results respectively corresponding to at least one object in the first image, where the image recognition model is a neural network model trained using multiple sample images, and the object in each of the multiple sample images corresponds to a classification label.
  • Optionally, the device further includes a sample set acquisition module, a sentence output module, an error calculation module, and a model training module (not shown in the figure):
  • the sample set acquisition module is configured to acquire a training sample set, the training sample set including a plurality of sample images, each sample image corresponding to a recognition result and an expected description sentence;
  • the sentence output module is configured to, for each sample image, process the recognition result through the language description model and output an actual description sentence;
  • the error calculation module is configured to calculate the error between the actual description sentence and the expected description sentence; and
  • the model training module is configured to adjust the parameters of the language description model when the error is greater than a preset threshold and return to the step of processing each sample image through the language description model and outputting the actual description sentence, and to stop the training when the error is less than or equal to the preset threshold, obtaining the trained language description model, the language description model being used to generate the description sentence according to the recognition result.
  • FIG. 6 shows a block diagram of an image search apparatus provided by an embodiment of the present application.
  • The device has the function of implementing the above method; the function can be realized by hardware, or by hardware executing corresponding software.
  • The device may be a terminal, or may be provided on a terminal.
  • The device includes:
  • a search box display module 710, configured to display a search box;
  • a keyword receiving module 720, configured to receive a first keyword input in the search box;
  • an image search module 730, configured to search a photo album for a second image matching the first keyword, where the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image; and
  • a result display module 740, configured to display search results, the search results including the second image.
  • In the technical solution provided by the embodiments of the present application, an image search is performed through the generated image index; the user only needs to input a word included in the index, or a word with a similar meaning, and the terminal can accurately find the image according to the word entered by the user, which improves the efficiency of searching for images in the album.
  • Optionally, the device further includes an information display module and a keyword acquisition module (not shown in the figure):
  • the information display module is configured to display prompt information when the number of second images is greater than a preset number, the prompt information being used to prompt the input of a second keyword; and
  • the keyword acquisition module is configured to acquire the second keyword.
  • The image search module 730 is further configured to search, among the second images, for a third image matching the second keyword, where an index corresponding to the third image includes a second target keyword, and the second target keyword matches the second keyword.
  • In this case, the search results include the third image.
  • It should be noted that when the device provided in the above embodiments implements its functions, the division into the above functional modules is merely an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • In addition, the device and method embodiments provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not repeated here.
  • FIG. 7 shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • The terminal in this application may include one or more of the following components: a processor 610 and a memory 620.
  • The processor 610 may include one or more processing cores.
  • The processor 610 connects various parts of the entire terminal by using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 620 and by calling the data stored in the memory 620.
  • Optionally, the processor 610 may be implemented in at least one of the following hardware forms: digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA).
  • The processor 610 may integrate one of, or a combination of, a central processing unit (CPU) and a modem, where the CPU mainly handles the operating system, application programs, and the like, and the modem handles wireless communication. It can be understood that the modem may not be integrated into the processor 610 and may instead be implemented by a separate chip.
  • When the processor 610 executes the program instructions in the memory 620, the image index generation method or the image search method provided by the foregoing method embodiments is implemented.
  • The memory 620 may include random access memory (RAM) or read-only memory (ROM).
  • Optionally, the memory 620 includes a non-transitory computer-readable storage medium.
  • The memory 620 may be used to store instructions, programs, code, code sets, or instruction sets.
  • The memory 620 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the various method embodiments described above, and so on, and the data storage area may store data created according to the use of the terminal.
  • A person skilled in the art can understand that the structure of the above terminal is only illustrative; in actual implementation, the terminal may include more or fewer components, such as a display screen, which is not limited in this embodiment.
  • The structure shown in FIG. 7 does not constitute a limitation on the terminal 600, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
  • An exemplary embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is loaded and executed by a processor, the image index generation method or the image search method provided by the above method embodiments is implemented.
  • An exemplary embodiment of the present application also provides a computer program product containing instructions which, when executed on a computer, cause the computer to execute the image index generation method or the image search method described in the above embodiments.
  • The above program may be stored in a computer-readable storage medium.
  • The mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are an image index generation method, an image search method and apparatus, a terminal, and a medium. The method comprises: acquiring a first image (101); performing image recognition on the first image to obtain a recognition result corresponding to the first image (102); generating a description sentence according to the recognition result (103); and determining the description sentence as an index of the first image, and storing the index and the first image in correspondence (104). The method obtains the recognition results corresponding to the various objects included in an image, generates a description sentence describing the image according to the recognition results, and determines the description sentence as the index of the image; subsequently, when needing to search for the image, a user can input a word included in the index, or a word whose meaning is close to that of a word included in the index, and a terminal can accurately find the image according to the word input by the user, which improves the efficiency of searching for images in a photo album.
PCT/CN2019/115411 2018-11-30 2019-11-04 Image index generation method, image search method and apparatus, terminal and medium WO2020108234A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811457455.0 2018-11-30
CN201811457455.0A CN109635135A (zh) 2018-11-30 2018-11-30 Image index generation method and apparatus, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2020108234A1 (fr)

Family

ID=66070700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/115411 WO2020108234A1 (fr) 2018-11-30 2019-11-04 Image index generation method, image search method and apparatus, terminal and medium

Country Status (2)

Country Link
CN (1) CN109635135A (fr)
WO (1) WO2020108234A1 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635135A (zh) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method and apparatus, terminal and storage medium
CN110083729B (zh) * 2019-04-26 2023-10-27 北京金山数字娱乐科技有限公司 Image search method and system
CN110362698A (zh) * 2019-07-08 2019-10-22 北京字节跳动网络技术有限公司 Picture information generation method and apparatus, mobile terminal and storage medium
CN112541091A (zh) * 2019-09-23 2021-03-23 杭州海康威视数字技术股份有限公司 Image search method and apparatus, server and storage medium
CN110704654A (zh) * 2019-09-27 2020-01-17 三星电子(中国)研发中心 Picture search method and apparatus
CN112925939A (zh) * 2019-12-05 2021-06-08 阿里巴巴集团控股有限公司 Picture search method, description information generation method, device and storage medium
CN111046203A (zh) * 2019-12-10 2020-04-21 Oppo广东移动通信有限公司 Image retrieval method and apparatus, storage medium and electronic device
CN111797765B (zh) * 2020-07-03 2024-04-16 北京达佳互联信息技术有限公司 Image processing method and apparatus, server and storage medium
CN112711998A (zh) * 2020-12-24 2021-04-27 珠海新天地科技有限公司 3D model annotation system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838724A (zh) * 2012-11-20 2014-06-04 百度在线网络技术(北京)有限公司 Image search method and apparatus
CN106446782A (zh) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image recognition method and apparatus
CN106708940A (zh) * 2016-11-11 2017-05-24 百度在线网络技术(北京)有限公司 Method and apparatus for processing pictures
CN107766853A (zh) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 Method for generating and displaying text information of an image, and electronic device
WO2018134964A1 (fr) * 2017-01-20 2018-07-26 楽天株式会社 Image search system, image search method, and program
CN109635135A (zh) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method and apparatus, terminal and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136228A (zh) * 2011-11-25 2013-06-05 阿里巴巴集团控股有限公司 Picture search method and picture search apparatus
CN107908770A (zh) * 2017-11-30 2018-04-13 维沃移动通信有限公司 Photo search method and mobile terminal
CN108021654A (zh) * 2017-12-01 2018-05-11 北京奇安信科技有限公司 Album image processing method and apparatus
CN108509521B (zh) * 2018-03-12 2020-02-18 华南理工大学 Image retrieval method for automatically generating a text index

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838724A (zh) * 2012-11-20 2014-06-04 百度在线网络技术(北京)有限公司 Image search method and apparatus
CN107766853A (zh) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 Method for generating and displaying text information of an image, and electronic device
CN106446782A (zh) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image recognition method and apparatus
CN106708940A (zh) * 2016-11-11 2017-05-24 百度在线网络技术(北京)有限公司 Method and apparatus for processing pictures
WO2018134964A1 (fr) * 2017-01-20 2018-07-26 楽天株式会社 Image search system, image search method, and program
CN109635135A (zh) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method and apparatus, terminal and storage medium

Also Published As

Publication number Publication date
CN109635135A (zh) 2019-04-16

Similar Documents

Publication Publication Date Title
WO2020108234A1 (fr) Procédé de génération d'index d'image, procédé et appareil de recherche d'image, terminal et support
JP7091504B2 (ja) 顔認識アプリケーションにおけるフォールスポジティブの最小化のための方法および装置
Gu et al. An empirical study of language cnn for image captioning
WO2019154262A1 (fr) Procédé de classification d'image, serveur, terminal d'utilisateur, et support de stockage
CN111062871B (zh) 一种图像处理方法、装置、计算机设备及可读存储介质
CA2804230C (fr) Procede mis en oeuvre par ordinateur, produit programme informatique et systeme informatique pour traiter une image
US20210271707A1 (en) Joint Visual-Semantic Embedding and Grounding via Multi-Task Training for Image Searching
WO2019214453A1 (fr) Système de partage de contenu, procédé, procédé de marquage, serveur et équipement terminal
CN106897372B (zh) 语音查询方法和装置
US10685236B2 (en) Multi-model techniques to generate video metadata
KR102124466B1 (ko) 웹툰 제작을 위한 콘티를 생성하는 장치 및 방법
WO2020044099A1 (fr) Procédé et appareil de traitement de service basés sur une reconnaissance d'objets
CN116797684B (zh) 图像生成方法、装置、电子设备及存储介质
WO2023101679A1 (fr) Récupération inter-modale d'image de texte sur la base d'une expansion de mots virtuels
JP2021535508A (ja) 顔認識において偽陽性を低減するための方法および装置
US20170171471A1 (en) Method and device for generating multimedia picture and an electronic device
JP6046501B2 (ja) 特徴点出力装置、特徴点出力プログラム、特徴点出力方法、検索装置、検索プログラムおよび検索方法
WO2022012205A1 (fr) Procédé et appareil d'achèvement de mots
KR20230025917A (ko) 여행과 연관된 증강 현실 기반 음성 번역
KR20200083159A (ko) 사용자 단말에서의 사진 검색 방법 및 시스템
US8994834B2 (en) Capturing photos
JP7483532B2 (ja) キーワード抽出装置、キーワード抽出方法及びキーワード抽出プログラム
WO2014186392A2 (fr) Synthèse d'un album photo
CN117854156B (zh) 一种特征提取模型的训练方法和相关装置
KR102167628B1 (ko) 인공 지능 데이터 셋을 위한 영상 수집 장치 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889402

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889402

Country of ref document: EP

Kind code of ref document: A1