WO2020108234A1 - Image index generation method, image search method and apparatus, and terminal, and medium - Google Patents

Image index generation method, image search method and apparatus, and terminal, and medium

Info

Publication number
WO2020108234A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
index
sentence
keyword
search
Application number
PCT/CN2019/115411
Other languages
French (fr)
Chinese (zh)
Inventor
侯允
刘耀勇
陈岩
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Publication of WO2020108234A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are an image index generation method, an image search method and apparatus, a terminal, and a medium. The index generation method comprises: acquiring a first image (101); performing image recognition on the first image to obtain a recognition result corresponding to the first image (102); generating a description sentence according to the recognition result (103); and determining the description sentence as an index of the first image and storing the index in correspondence with the first image (104). The method obtains recognition results for the objects contained in an image, generates a sentence describing the image from those results, and uses that sentence as the index of the image. When a user later needs to find the image, the user can enter a word contained in the index, or a word close in meaning to such a word, and the terminal can locate the image accurately from that input, which improves the efficiency of searching for images in a photo album.

Description

Image index generation method, image search method, apparatus, terminal, and medium
This application claims priority to Chinese patent application No. 201811457455.0, entitled "Image Index Generation Method, Apparatus, Terminal, and Storage Medium" and filed on November 30, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of search technology, and in particular to an image index generation method, an image search method, an apparatus, a terminal, and a medium.
Background
At present, a terminal usually has a photo album application installed. The photo album application is generally used to store captured images, images saved from the network, and the like.
When the album holds many images, a user who wants a particular image has to browse each album directory on the terminal and locate the image in the corresponding directory.
Summary
Embodiments of this application provide an image index generation method, an image search method, an apparatus, a terminal, and a medium. The technical solutions are as follows:
In one aspect, an image index generation method is provided, the method including:
acquiring a first image;
performing image recognition on the first image to obtain a recognition result corresponding to the first image;
generating a description sentence according to the recognition result, the description sentence being used to describe the first image;
determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
In another aspect, an image search method is provided, the method including:
displaying a search box;
receiving a first keyword entered in the search box;
searching an album for a second image that matches the first keyword, where the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image;
displaying a search result, the search result including the second image.
In another aspect, an image index generation apparatus is provided, the apparatus including:
an image acquisition module, configured to acquire a first image;
an image recognition module, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image;
a sentence generation module, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image;
an index generation module, configured to determine the description sentence as an index of the first image and store the index in correspondence with the first image.
In yet another aspect, an image search apparatus is provided, the apparatus including:
a search box display module, configured to display a search box;
a keyword receiving module, configured to receive a first keyword entered in the search box;
an image search module, configured to search an album for a second image that matches the first keyword, where the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image;
a result display module, configured to display a search result, the search result including the second image.
In yet another aspect, an embodiment of this application provides a terminal. The terminal includes a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the above image index generation method or the above image search method.
In yet another aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the above image index generation method or the above image search method.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an image index generation method provided by an embodiment of this application;
FIG. 2 is a flowchart of an image index generation method provided by another embodiment of this application;
FIG. 3 is a flowchart of an image search method provided by an embodiment of this application;
FIG. 4 is a flowchart of an image search method provided by another embodiment of this application;
FIG. 5 is a block diagram of an image index generation apparatus provided by an embodiment of this application;
FIG. 6 is a block diagram of an image search apparatus provided by an embodiment of this application;
FIG. 7 is a block diagram of a terminal provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the drawings.
Embodiments of this application provide an image index generation method, an apparatus, a terminal, and a storage medium. The recognition result corresponding to each object in an image is obtained, and a language description model generates a description sentence that contains those recognition results and describes the image. The description sentence is determined as the index of the image. When the user later needs to search for the image, the user can enter a word contained in the index, or a word close in meaning to such a word, and the terminal can locate the image accurately from that input, which improves the efficiency of searching for images in the album.
In the technical solutions provided by the embodiments of this application, each step is executed by a terminal. Optionally, a photo album application, meaning an application used to store images, is installed on the terminal. An image may be one captured by the user (including photos and videos) or one saved by the user from another application (including photos and videos). The terminal may be a mobile phone, a tablet computer, a personal computer, a smart wearable device, a camera, a smart playback device, and so on.
An embodiment of this application provides an image index generation method, the method including:
acquiring a first image;
performing image recognition on the first image to obtain a recognition result corresponding to the first image;
generating a description sentence according to the recognition result, the description sentence being used to describe the first image;
determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
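The four steps above can be sketched as a small pipeline. This is only an illustrative sketch: `Album`, `recognize`, `describe`, and `generate_index` are hypothetical names, and the two model functions are trivial stand-ins for the image recognition model and the language description model, which the application does not specify at code level.

```python
from dataclasses import dataclass, field


@dataclass
class Album:
    """Toy in-memory album: maps an image identifier to its index sentence."""
    index_by_image: dict = field(default_factory=dict)

    def store(self, image_id: str, index: str) -> None:
        self.index_by_image[image_id] = index


def recognize(image_id: str) -> list:
    # Stand-in for the image recognition model (hypothetical fixed output).
    fake_results = {"img_001.jpg": ["dog", "grass"]}
    return fake_results.get(image_id, [])


def describe(labels: list) -> str:
    # Stand-in for the language description model: joins labels into a sentence.
    if not labels:
        return ""
    return " and ".join(labels) + " in the picture"


def generate_index(album: Album, image_id: str) -> str:
    labels = recognize(image_id)      # recognition result
    sentence = describe(labels)       # description sentence
    album.store(image_id, sentence)   # index stored in correspondence with the image
    return sentence


album = Album()
print(generate_index(album, "img_001.jpg"))
```

Keeping the index in a mapping keyed by image identifier is one simple way to store the index "in correspondence with" the image; a real album application would more likely persist it in a database.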
Optionally, generating the description sentence according to the recognition result includes:
converting the recognition result into a first word vector;
processing the first word vector with a language description model to obtain the description sentence.
Optionally, after acquiring the first image, the method further includes:
acquiring associated information of the first image, the associated information including at least one of the following: location information, time information, and scene information;
and generating the description sentence according to the recognition result includes:
converting the recognition result into a first word vector;
converting the associated information into a second word vector;
processing the first word vector and the second word vector with a language description model to obtain the description sentence.
Optionally, before determining the description sentence as the index of the first image and storing the index in correspondence with the first image, the method further includes:
displaying inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index;
upon receiving a confirmation instruction corresponding to the inquiry information, performing the step of determining the description sentence as the index of the first image and storing the index in correspondence with the first image.
Optionally, after displaying the inquiry information, the method further includes:
displaying an input box when the confirmation instruction is not received;
receiving a sentence entered in the input box;
determining the entered sentence as the index of the first image, and storing the index in correspondence with the first image.
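The confirmation flow above amounts to a choice between the generated sentence and a user-entered one. A minimal sketch follows, assuming a hypothetical helper `choose_index`; the behaviour when the user neither confirms nor types anything is not described in the application, so returning `None` here is an assumption.

```python
from typing import Optional


def choose_index(generated: str, confirmed: bool,
                 user_sentence: Optional[str] = None) -> Optional[str]:
    """Return the sentence to store as the index, or None if nothing is available."""
    if confirmed:
        # Confirmation instruction received: use the generated description sentence.
        return generated
    # No confirmation: fall back to the sentence typed into the input box.
    if user_sentence and user_sentence.strip():
        return user_sentence.strip()
    return None  # assumption: nothing is stored when there is no usable input


print(choose_index("dog running on the grass", confirmed=True))
print(choose_index("dog running on the grass", confirmed=False,
                   user_sentence="Max at the park"))
```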
Optionally, performing image recognition on the first image to obtain the recognition result corresponding to the first image includes:
performing image recognition on the first image with an image recognition model to obtain a recognition result for each of at least one object in the first image;
where the image recognition model is a neural network model trained on multiple sample images, and the object in each of the sample images has a corresponding classification label.
Optionally, before generating the description sentence according to the recognition result, the method further includes:
acquiring a training sample set, the training sample set including multiple sample images, each sample image having a recognition result and an expected description sentence corresponding to that recognition result;
for each sample image, processing the recognition result with the language description model and outputting an actual description sentence;
calculating the error between the actual description sentence and the expected description sentence;
when the error is greater than a preset threshold, adjusting the parameters of the language description model and repeating from the step of processing each sample image with the language description model and outputting an actual description sentence; when the error is less than or equal to the preset threshold, stopping training to obtain the trained language description model, which is used to generate the description sentence according to the recognition result.
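The application does not say how the error between the actual and the expected description sentence is computed. Purely as an illustration, the sketch below uses a Jaccard distance over the two word sets (an assumption, not the application's metric) and shows the threshold test that decides whether training stops; `sentence_error` and `training_finished` are hypothetical names.

```python
def sentence_error(actual: str, expected: str) -> float:
    """Jaccard distance between word sets: 0.0 means identical vocabularies."""
    actual_words = set(actual.lower().split())
    expected_words = set(expected.lower().split())
    if not actual_words and not expected_words:
        return 0.0
    shared = actual_words & expected_words
    combined = actual_words | expected_words
    return 1.0 - len(shared) / len(combined)


def training_finished(error: float, preset_threshold: float) -> bool:
    # Training stops once the error is less than or equal to the threshold.
    return error <= preset_threshold


error = sentence_error("dog on grass", "dog running on the grass")
print(error, training_finished(error, preset_threshold=0.1))
```

A word-set distance ignores word order, so a production system would more plausibly use a token-level loss from the model itself; the point here is only the compare-then-stop structure.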
An embodiment of this application also provides an image search method, the method including:
displaying a search box;
receiving a first keyword entered in the search box;
searching an album for a second image that matches the first keyword, where the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image;
displaying a search result, the search result including the second image.
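The search step can be illustrated with a plain keyword match over stored index sentences. In this sketch the album is a dictionary from image name to index sentence, and `SYNONYMS` is a hypothetical hand-written table standing in for whatever mechanism the terminal uses to match words of similar meaning; neither is taken from the application.

```python
# Hypothetical near-meaning table: entered keyword -> words it may also match.
SYNONYMS = {"puppy": {"dog"}, "lawn": {"grass"}}


def matches(keyword: str, index: str) -> bool:
    """True if the index contains the keyword or a word listed as close in meaning."""
    index_words = set(index.lower().split())
    candidates = {keyword.lower()} | SYNONYMS.get(keyword.lower(), set())
    return bool(index_words & candidates)


def search_album(album: dict, keyword: str) -> list:
    # Return every image whose index sentence matches the keyword.
    return [image for image, index in album.items() if matches(keyword, index)]


album = {
    "a.jpg": "dog running on the grass",
    "b.jpg": "cat sleeping on the sofa",
    "c.jpg": "child playing on the grass",
}
print(search_album(album, "puppy"))
print(search_album(album, "grass"))
```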
Optionally, before displaying the search result, the method further includes:
when the number of second images is greater than a preset number, displaying prompt information, the prompt information being used to prompt input of a second keyword;
acquiring the second keyword;
searching among the second images for a third image that matches the second keyword, where the index corresponding to the third image includes a second target keyword and the second target keyword matches the second keyword;
where the search result includes the third image.
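The refinement described above narrows an oversized result set with a second keyword. A minimal sketch under stated assumptions: `refine` is a hypothetical helper, matching is exact word matching on the index sentence, and the preset number is an arbitrary example value.

```python
def refine(album: dict, first_hits: list, second_keyword: str,
           preset_number: int) -> list:
    """Narrow first_hits with a second keyword only when there are too many hits."""
    if len(first_hits) <= preset_number:
        return first_hits  # few enough results: no second keyword is needed
    keyword = second_keyword.lower()
    # Search only within the images already matched by the first keyword.
    return [image for image in first_hits
            if keyword in album[image].lower().split()]


album = {
    "a.jpg": "dog running on the grass",
    "c.jpg": "child playing on the grass",
    "d.jpg": "cow eating grass",
}
first_hits = ["a.jpg", "c.jpg", "d.jpg"]  # images matching the first keyword "grass"
print(refine(album, first_hits, "dog", preset_number=2))
```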
Refer to FIG. 1, which shows a flowchart of an image index generation method provided by an embodiment of this application. The method may include the following steps:
Step 101: Acquire a first image.
In one possible implementation, the first image is an image captured by a camera on the terminal. Optionally, the terminal has a camera and a shooting application installed; a shooting application is an application used to capture images, such as a camera application or a beauty-camera application. While the shooting application is running, when the terminal receives a trigger signal acting on the shooting control of the current shooting interface, it takes the image captured by the camera as the first image.
In another possible implementation, the first image is not captured by the camera on the terminal but is saved by the user from another application. Optionally, the first image is an image obtained from the network or a screenshot. Optionally, when an image is shown on the display interface of the terminal and the terminal receives a save instruction corresponding to that image, the terminal obtains the image from the network according to the save instruction and takes it as the first image.
The embodiments of this application do not limit the manner or the timing of acquiring the first image.
Step 102: Perform image recognition on the first image to obtain a recognition result corresponding to the first image.
The recognition result indicates the objects included in the first image. The first image may include one or more objects, such as people, animals, buildings, or scenery. In the embodiments of this application, the terminal determines the category to which each object belongs (for example, cat, dog, grass, or person) as follows: the first image is recognized by an image recognition model to obtain a recognition result for each of at least one object in the first image.
The image recognition model is a neural network model trained on multiple sample images; for example, it may be obtained by training a deep learning network on those images. The object in each sample image has a classification label that identifies the category to which the object belongs. In some embodiments of this application, the image recognition model includes one input layer, at least one convolutional layer (for example, three: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two: a first and a second fully connected layer), and one output layer. The input to the input layer is the first image, and the output of the output layer is the category to which each of the at least one object in the first image belongs.
The image recognition process is as follows: the first image is fed to the input layer; the convolutional layers extract features of the first image; the fully connected layers combine and abstract those features into data suitable for classification by the output layer; and the output layer outputs the recognition result for each of the at least one object in the first image.
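To make the layer sequence concrete, the following pure-Python sketch runs a tiny single-channel image through three convolutional layers, two fully connected layers, and a softmax output. Everything numeric here is an assumption for illustration: the 5x5 input, the 2x2 kernels, the hand-picked weights, the ReLU activations, and the two labels. It also classifies the image as a whole rather than detecting several objects, which is a simplification of what the text describes.

```python
import math


def conv2d(image, kernel):
    """Valid 2-D cross-correlation over a square image, followed by ReLU."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [[max(0.0, sum(image[r + i][c + j] * kernel[i][j]
                          for i in range(k) for j in range(k)))
             for c in range(out_size)]
            for r in range(out_size)]


def dense(vector, weights, biases):
    # One fully connected layer: a weighted sum per output unit plus a bias.
    return [sum(v * w for v, w in zip(vector, row)) + b
            for row, b in zip(weights, biases)]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image, kernels, fc1, fc2, labels):
    features = image
    for kernel in kernels:                                # convolutional layers
        features = conv2d(features, kernel)
    flat = [value for row in features for value in row]   # flatten the feature map
    hidden = [max(0.0, h) for h in dense(flat, *fc1)]     # first fully connected layer
    probs = softmax(dense(hidden, *fc2))                  # second layer and output
    return labels[probs.index(max(probs))], probs


image = [[float((r + c) % 3) for c in range(5)] for r in range(5)]
kernels = [[[0.5, 0.5], [0.5, 0.5]]] * 3                  # 5x5 -> 4x4 -> 3x3 -> 2x2
fc1 = ([[0.1] * 4, [0.2] * 4, [-0.1] * 4], [0.0, 0.0, 0.0])
fc2 = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
labels = ["cat", "dog"]
label, probs = classify(image, kernels, fc1, fc2, labels)
print(label, probs)
```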
The embodiments of this application do not limit the specific structure of the convolutional layers and fully connected layers of the image recognition model. The image recognition model described above is only exemplary and explanatory and does not limit this application. In general, a convolutional neural network with more layers performs better but takes longer to compute; in practice, a network with a suitable number of layers can be designed according to the required recognition accuracy and efficiency.
A sample image is an image selected in advance for training the image recognition model. Each sample image has a classification label, usually assigned manually, that describes the scene, item, person, and so on in the sample image.
Optionally, the neural network may be a deep learning network, such as AlexNet, VGG-16, GoogLeNet, or a deep residual learning network (ResNet); the embodiments of this application do not limit this. The algorithm used to train the deep learning network may be BP (Back-Propagation), Faster R-CNN (Regions with Convolutional Neural Network), or the like; the embodiments of this application do not limit this either.
Taking the BP algorithm as an example, the image recognition model is trained as follows. First, the parameters of each layer in the deep learning network are initialized. Second, a sample image is input into the deep learning network to obtain the recognition result for that sample image. Third, the recognition result is compared with the classification label to obtain the error between them. Finally, the parameters of each layer are adjusted based on that error. These steps are repeated until the error between the recognition result and the classification label is less than a preset value, at which point the trained deep learning network, that is, the image recognition model, is obtained.
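The initialize, forward, compare, adjust loop can be shown on the smallest possible network. The sketch below trains a single sigmoid neuron by gradient descent on a mean squared error, which is the one-layer special case of back-propagation rather than a full multi-layer implementation; the toy data, learning rate, and preset error value are all assumptions.

```python
import math


def train(samples, labels, preset_error=0.05, learning_rate=0.5, max_epochs=10000):
    weights = [0.0] * len(samples[0])       # step 1: initialise the parameters
    bias = 0.0
    error = float("inf")
    for _ in range(max_epochs):
        error = 0.0
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))            # step 2: forward pass
            error += (p - y) ** 2 / len(samples)      # step 3: compare with the label
            # Chain rule: d(error)/dz for the squared error through the sigmoid.
            delta = 2.0 * (p - y) * p * (1.0 - p) / len(samples)
            grad_w = [g + delta * xi for g, xi in zip(grad_w, x)]
            grad_b += delta
        if error < preset_error:                      # stop once the error is small
            break
        weights = [w - learning_rate * g              # step 4: adjust the parameters
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias, error


samples = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias, error = train(samples, labels)
print(weights, bias, error)
```

The `max_epochs` cap is a safeguard added for the sketch so the loop terminates even if the preset error is never reached; the text itself only describes the error-based stopping condition.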
Step 103: Generate a description sentence according to the recognition result.
The description sentence describes the first image and includes the recognition result for each of the at least one object. Optionally, the description sentence also includes other words, which may describe at least one of the following: the positional relationship between at least two objects, the action an object is performing, the state an object is in, and so on. For example, the first image is recognized and found to contain a dog and grass, with the dog running on the grass; this recognition result is input into the language description model, and the description sentence obtained for the first image is "dog running on the grass".
In some embodiments of this application, the language description model includes one input layer, at least one convolutional layer (for example, three: a first, a second, and a third convolutional layer), at least one fully connected layer (for example, two: a first and a second fully connected layer), and one output layer. The input to the input layer is the first image together with the recognition results of the objects in the first image, and the output of the output layer is the description sentence corresponding to the first image. The description sentence is generated as follows: the first image and the recognition results of its objects are fed to the input layer; the convolutional layers extract features of this input; the fully connected layers combine and abstract those features; and the output layer outputs the description sentence corresponding to the first image.
The embodiments of this application do not limit the specific structure of the convolutional layers and fully connected layers of the language description model. The language description model described above is only exemplary and explanatory and does not limit this application. In general, a convolutional neural network with more layers performs better but takes longer to compute; in practice, a network with a suitable number of layers can be designed according to the required accuracy and efficiency.
In one example, step 103 may include the following sub-steps:
Step 103a: Convert the recognition result into a first word vector.
Step 103b: Process the first word vector with the language description model to obtain the description sentence.
In the embodiments of this application, the terminal converts the recognition result into the corresponding word vector using a word vector model, then inputs the word vector into the language description model, which outputs the description sentence. A word vector is a vector that represents a word, and a word vector model is a model that converts words into word vectors. The word vector model may be a word2vec model.
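Converting recognition results into word vectors is, at its simplest, a table lookup. The sketch below uses a tiny hand-written embedding table as a stand-in for a trained word vector model; it is not word2vec itself, and the three-dimensional vectors and the zero vector for unknown words are assumptions for illustration.

```python
# Hypothetical embedding table; a trained word vector model would supply these.
EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "grass": [0.1, 0.8, 0.2],
    "cat": [0.8, 0.2, 0.1],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumption: unknown words map to a zero vector


def to_word_vectors(words):
    """Look up one vector per word, case-insensitively."""
    return [EMBEDDINGS.get(word.lower(), UNKNOWN) for word in words]


recognition_result = ["Dog", "grass"]
first_word_vectors = to_word_vectors(recognition_result)
print(first_word_vectors)
```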
In another example, the terminal may also acquire associated information of the first image. In this case, step 103 may be implemented as follows:
1. Convert the recognition result into a first word vector.
2. Convert the associated information into a second word vector.
In the embodiments of this application, the associated information includes at least one of the following: location information, time information, and scene information. Location information indicates the geographic location where the first image was taken, for example, Shanghai, Beijing, or Canada. Time information indicates when the first image was acquired, for example, spring, summer, autumn, winter, early morning, or evening. Scene information indicates the scene corresponding to the first image, for example, a park, a beach, a shopping mall, or a school. The terminal can convert the associated information into the corresponding word vector using the word vector model.
3. Process the first word vector and the second word vector with the language description model to obtain the description sentence.
The terminal inputs both the first word vector and the second word vector into the language description model, so that the generated description sentence is richer.
By way of example, the following description takes location information as the associated information.
First, obtain the location information of the first image;
Second, convert the location information into a second word vector;
Third, process the first word vector and the second word vector through the language description model to obtain the description sentence.
The location information indicates the geographic location at which the first image was captured. When the first image is captured by the terminal through its camera, the location information may be obtained through a positioning component in the terminal, for example, a GPS (Global Positioning System) component. In other possible implementations, the terminal may instead obtain the location information of the first image by performing image recognition on the first image. For the way in which the location information is converted into a word vector, refer to step 103a; details are not repeated here. In this embodiment of the present application, generating the description sentence of the first image in combination with the geographic location at which it was captured describes the first image more completely, and the user can later search for the first image with several different keywords, which makes searching more convenient.
By way of example, the first image is recognized, and the objects in the first image are found to include a dog and grass, with the dog running on the grass. In addition, the geographic location at which the first image was captured is XX Park. The description sentence corresponding to the first image is then "dog running on the grass in XX Park".
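The three steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: `TOY_EMBEDDINGS` is a hypothetical hand-written lookup table standing in for a trained word vector model such as word2vec, and the language description model itself is not shown. The sketch only assembles the first and second word vectors into the sequence that such a model would receive.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embedding table; a real system would use a trained
# word vector model (for example word2vec) instead of fixed values.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "dog": [0.9, 0.1, 0.0],
    "grass": [0.1, 0.8, 0.1],
    "running": [0.7, 0.2, 0.3],
    "XX Park": [0.0, 0.3, 0.9],
}


def to_word_vectors(words: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Look up one vector per word; an unknown word raises KeyError."""
    return [table[word] for word in words]


def build_model_input(
    recognition_result: List[str],
    associated_info: List[str],
    table: Dict[str, Vector],
) -> List[Vector]:
    """Return the first word vectors followed by the second word vectors."""
    first_word_vectors = to_word_vectors(recognition_result, table)
    second_word_vectors = to_word_vectors(associated_info, table)
    return first_word_vectors + second_word_vectors


# Recognition result (objects and posture) plus location information.
model_input = build_model_input(
    ["dog", "grass", "running"], ["XX Park"], TOY_EMBEDDINGS
)
```

Concatenating the two lists is one simple way to present both kinds of information to a sequence model; the source does not specify how the two sets of vectors are combined.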
Step 104: determine the description sentence as the index of the first image, and store the index in correspondence with the first image.
The terminal determines the description sentence as the index of the first image and stores the index in correspondence with the first image. Later, to find the first image, the user only needs to input at least one word included in the description sentence, or a word that matches a word in the description sentence, for example, a word whose similarity to a word in the description sentence is greater than a preset threshold. The terminal can then find the first image according to the word input by the user and display it to the user.
In addition, this embodiment of the present application does not limit the path at which the description sentence and the first image are stored; the path may be preset by the terminal or customized by the user.
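A minimal sketch of step 104, assuming the index is kept as an in-memory dictionary from image path to description sentence. The function names and file paths are hypothetical, and a real terminal would persist the mapping; the lookup shown here covers only the exact-word case, not the similarity-based match.

```python
from typing import Dict, List


def store_index(index_store: Dict[str, str], image_path: str, description: str) -> None:
    """Store the description sentence as the index of the given image."""
    index_store[image_path] = description


def find_by_word(index_store: Dict[str, str], word: str) -> List[str]:
    """Return the paths of images whose index contains the word (case-insensitive)."""
    target = word.lower()
    return sorted(
        path
        for path, sentence in index_store.items()
        if target in sentence.lower().split()
    )


index_store: Dict[str, str] = {}
store_index(index_store, "album/001.jpg", "dog running on the grass in XX Park")
store_index(index_store, "album/002.jpg", "cat sleeping on the sofa")
grass_images = find_by_word(index_store, "grass")
```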
In summary, in the technical solution provided by this embodiment of the present application, the recognition result corresponding to each object included in an image is obtained, a description sentence describing the image is generated according to the recognition results, and the description sentence is determined as the index of the image. When the user later needs to search for the image, the user can input a word included in the index, or a word with a meaning similar to that of a word included in the index, and the terminal can accurately find the image according to the input word, which improves the efficiency of searching for images in the album.
In addition, because the description sentence used as the index is generated from the recognition result of the image, the generated index is accurate.
Please refer to FIG. 2, which shows a flowchart of an image index generation method provided by another embodiment of the present application. The method may include the following steps:
Step 201: acquire a first image.
Step 202: perform image recognition on the first image to obtain a recognition result corresponding to the first image.
Step 203: generate a description sentence according to the recognition result.
Step 204: display inquiry information.
In this embodiment of the present application, the inquiry information asks whether to determine the description sentence as the index. By way of example, the inquiry information is: "The description sentence for this image is 'watching a concert at the Bird's Nest'. Confirm?"
In this embodiment of the present application, the user can preview the description sentence generated by the language description model and decide whether to determine it as the index of the first image.
Step 205: when a confirmation instruction corresponding to the inquiry information is received, determine the description sentence as the index of the first image, and store the index in correspondence with the first image.
If the user decides to use the generated description sentence as the index of the image, the user can issue a confirmation instruction for the inquiry information. The confirmation instruction corresponding to the inquiry information indicates confirmation that the generated description sentence is to be determined as the index of the image. Optionally, a confirmation control is displayed beside the inquiry information, and when the terminal receives a trigger signal acting on the confirmation control, the terminal has received the confirmation instruction corresponding to the inquiry information.
Step 206: when the confirmation instruction is not received, display an input box.
The input box is used to receive a description sentence for the first image input by the user. Optionally, if the terminal does not receive a trigger signal acting on the confirmation control within a preset time, the terminal has not received the confirmation instruction. Optionally, a denial control is also displayed beside the inquiry information; when the terminal receives a trigger signal corresponding to the denial control, the terminal has not received the confirmation instruction, and the terminal may then display the input box.
Step 207: receive the sentence input in the input box.
In this embodiment of the present application, when the user is not satisfied with the generated description sentence, the user can input a description sentence for the first image.
Step 208: determine the input sentence as the index of the first image, and store the index in correspondence with the first image.
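The decision made in steps 204 to 208 can be sketched as a small function. This is an illustrative sketch only: the name `choose_index` is hypothetical, the user interface (inquiry information, confirmation control, input box) is reduced to two arguments, and returning `None` when the user neither confirms nor enters a sentence is an assumption, since the source does not describe that case.

```python
from typing import Optional


def choose_index(
    generated_sentence: str,
    confirmed: bool,
    user_sentence: Optional[str] = None,
) -> Optional[str]:
    """Pick the sentence to store as the index of the first image.

    confirmed     -- True if a confirmation instruction was received (step 205).
    user_sentence -- sentence typed into the input box, if any (steps 206-207).
    """
    if confirmed:
        return generated_sentence
    if user_sentence is not None and user_sentence.strip():
        return user_sentence.strip()
    # Assumption: with no confirmation and no input, nothing is stored.
    return None


kept = choose_index("watching a concert at the Bird's Nest", confirmed=True)
replaced = choose_index(
    "watching a concert at the Bird's Nest",
    confirmed=False,
    user_sentence="  concert with friends  ",
)
```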
In summary, in the technical solution provided by this embodiment of the present application, the user decides whether to confirm the generated description sentence as the index of the image, and, if not satisfied with the description sentence generated by the terminal, inputs a description sentence for the image. The user can later search for the image according to the description sentence that the user entered, which improves the accuracy of the index and, in turn, the efficiency of the resulting image index.
After the index of the first image is generated, the user can search for the first image in the album according to the index. The search process is explained below.
In an optional embodiment based on the embodiment shown in FIG. 1 or FIG. 2, after step 104, or after step 208, an embodiment of the present application further provides an image search method, a flowchart of which is shown in FIG. 3. The image search method may include the following steps:
Step 301: display a search box.
The search box is used by the user to input a search keyword, so that the terminal can find images that match the search keyword. In one possible implementation, the search box is displayed in the main interface of the album application. In another possible implementation, the main interface of the album application displays a search control; when the user triggers the search control, the terminal receives a trigger signal corresponding to the search control and displays the search box according to the trigger signal. This embodiment of the present application does not limit the manner in which the search box is displayed.
Step 302: receive a first keyword input in the search box.
The first keyword is input by the user and may be "Forbidden City", "cat", "rose", and so on, which is not limited in this embodiment of the present application.
Step 303: search the album for a second image that matches the first keyword.
There may be one second image or multiple second images. The index corresponding to a second image describes that second image; it is a description sentence generated according to the recognition result of the second image, and it includes a first target keyword. The first target keyword may be a recognition result corresponding to an object included in the second image, or may be another word in the description sentence other than a recognition result, which is not limited in this embodiment of the present application. In this way, the user can search for the same image with different keywords, which makes images easier to find.
By way of example, the first target keyword matches the first keyword, for example, the similarity between the first target keyword and the first keyword meets a preset condition. The preset condition may be that the similarity between the first target keyword and the first keyword is greater than a preset threshold. The preset threshold may be set according to actual requirements, which is not limited in this embodiment of the present application.
Optionally, the terminal first calculates the similarity between the first keyword and each word included in the description sentences stored in the terminal, then determines the words whose similarity to the first keyword meets the preset condition as first target keywords, and finally takes the images corresponding to the description sentences that contain a first target keyword as the second images that match the first keyword.
In addition, the similarity between the first keyword and a word included in a description sentence can be calculated as follows: the terminal uses the word vector model to represent the first keyword as a first vector and the word included in the description sentence as a second vector, and then calculates the cosine distance between the first vector and the second vector. The larger the cosine distance, the lower the similarity between the first keyword and the word; conversely, the smaller the cosine distance, the higher the similarity. The terminal may then determine the words whose cosine distance meets the preset condition as first target keywords.
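The cosine-distance comparison described above can be sketched as follows. The sketch assumes the common definition of cosine distance as one minus cosine similarity; the source does not define the term. The function names, the `max_distance` parameter, and the handling of zero vectors are all illustrative assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed definition: 1 - cosine similarity (smaller means more similar)."""
    return 1.0 - cosine_similarity(a, b)


def is_target_keyword(
    keyword_vector: Sequence[float],
    word_vector: Sequence[float],
    max_distance: float,
) -> bool:
    """A word qualifies when its distance to the keyword is within the preset limit."""
    return cosine_distance(keyword_vector, word_vector) <= max_distance


same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction have distance 0 regardless of length, which is why this measure suits word vectors whose magnitudes are not meaningful on their own.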
Step 304: display the search result.
The terminal displays the search result in a search result page, and the search result includes the second image. When there are multiple second images, the terminal may sort them according to the similarity between the first target keyword and the first keyword. Optionally, the greater the similarity between the first target keyword and the first keyword, the higher the second image whose description sentence contains that first target keyword is placed in the search result page; the smaller the similarity, the lower that second image is placed.
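The ordering rule in step 304 amounts to a descending sort by similarity. A minimal sketch, assuming each match is already available as an (image path, similarity) pair; the function name and file names are hypothetical.

```python
from typing import List, Tuple


def rank_second_images(matches: List[Tuple[str, float]]) -> List[str]:
    """Order matched images so that higher similarity appears earlier."""
    ordered = sorted(matches, key=lambda match: match[1], reverse=True)
    return [image_path for image_path, _similarity in ordered]


ranked = rank_second_images(
    [("a.jpg", 0.70), ("b.jpg", 0.95), ("c.jpg", 0.80)]
)
```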
In summary, in the technical solution provided by this embodiment of the present application, image search is performed using the image index generated according to the foregoing embodiments. The user only needs to input a word included in the index, or a word with a meaning similar to that of a word included in the index, and the terminal can accurately find the image according to the input word, which improves the efficiency of searching for images in the album.
When the user inputs the first keyword and the terminal finds a large number of second images for it, the user still has to pick the desired image out of many second images, so search efficiency remains low.
Please refer to FIG. 4, which shows a flowchart of an image search method provided by another embodiment of the present application. This image search method can be used to address the low search efficiency that results when many second images are found for the first keyword. The method includes the following steps:
Step 401: display a search box.
Step 402: receive a first keyword input in the search box.
Step 403: search the album for second images that match the first keyword.
Step 404: when the number of second images is greater than a preset number, display prompt information.
The preset number may be set according to actual requirements, which is not limited in this embodiment of the present application. By way of example, the preset number is 10. The prompt information prompts the user to input a second keyword. Optionally, the second keyword is different from the first keyword.
In this embodiment of the present application, when the terminal finds second images that match the first keyword, it first checks whether the number of second images is greater than the preset number. If the number of second images is less than or equal to the preset number, the second images are displayed directly. If the number of second images is greater than the preset number, the user is prompted to input another keyword, so that the terminal can further filter the second images that match the first keyword for third images that match both the first keyword and the second keyword.
Step 405: obtain the second keyword.
The second keyword is also input by the user and is different from the first keyword. By way of example, the prompt information includes an input box for the second keyword, and the user can input the second keyword in that input box so that the terminal obtains it.
Step 406: search the second images for a third image that matches the second keyword.
The index corresponding to the third image includes a second target keyword. The second target keyword matches the second keyword; by way of example, the similarity between the second target keyword and the second keyword meets a second preset condition. The second preset condition may be that the similarity between the second target keyword and the second keyword is greater than a preset threshold. The preset threshold may be set according to actual requirements, which is not limited in this embodiment of the present application.
In one example, the terminal first calculates the similarity between the first keyword and each word included in the description sentences stored in the terminal, and the similarity between the second keyword and each of those words. It then determines the words whose similarity to the first keyword meets a first preset condition as first target keywords, and the words whose similarity to the second keyword meets the second preset condition as second target keywords. Finally, it takes the images corresponding to the description sentences that contain both a first target keyword and a second target keyword as the third images that match both the first keyword and the second keyword. For the way in which the similarity between the second keyword and a word included in a description sentence is calculated, refer to step 303; details are not repeated here.
In another example, the terminal calculates the similarity between the second keyword and each word included in the indexes of the second images, determines the words whose similarity to the second keyword meets the second preset condition as second target keywords, and determines the second images whose indexes include a second target keyword as the third images.
Step 407: display the search result.
In this embodiment of the present application, the search result includes the third image.
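Steps 403 to 407 can be sketched as a two-stage filter. This is a simplified illustration: word matching uses exact, case-insensitive comparison as a stand-in for the similarity test, the prompt for a second keyword is modelled as a hypothetical callback `ask_second_keyword`, and the second stage searches only within the second images, as in the second example above.

```python
from typing import Callable, Dict, Iterable, List, Optional


def exact_match(keyword: str, word: str) -> bool:
    """Stand-in for the similarity test: case-insensitive equality."""
    return keyword.lower() == word.lower()


def search(
    index_store: Dict[str, str],
    keyword: str,
    candidates: Optional[Iterable[str]] = None,
    is_match: Callable[[str, str], bool] = exact_match,
) -> List[str]:
    """Return images (optionally restricted to candidates) whose index matches."""
    paths = list(index_store) if candidates is None else list(candidates)
    return [
        path
        for path in paths
        if any(is_match(keyword, word) for word in index_store[path].split())
    ]


def search_with_refinement(
    index_store: Dict[str, str],
    first_keyword: str,
    ask_second_keyword: Callable[[], str],
    preset_number: int,
) -> List[str]:
    """Return second images, or third images when there are too many second images."""
    second_images = search(index_store, first_keyword)
    if len(second_images) <= preset_number:
        return second_images  # few enough results: display them directly
    second_keyword = ask_second_keyword()  # prompt the user (steps 404-405)
    return search(index_store, second_keyword, candidates=second_images)


index_store = {
    "a.jpg": "dog running on the grass",
    "b.jpg": "dog playing on the beach",
    "c.jpg": "dog sleeping on the sofa",
    "d.jpg": "cat on the sofa",
}
third_images = search_with_refinement(index_store, "dog", lambda: "beach", 2)
```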
In summary, in the technical solution provided by this embodiment of the present application, when there are too many search results, the user is prompted to input another keyword, so that the terminal can search for images according to the two separately input keywords, which improves the accuracy of image search.
As mentioned in the embodiment of FIG. 1, the language description model is a pre-trained model for encoding at least two words into a complete sentence. The training process of the language description model is explained below.
Step 501: obtain a training sample set.
The training sample set includes multiple sample images, and each sample image has an expected description sentence corresponding to its recognition result. The recognition result corresponding to a sample image may be annotated manually or obtained through an image recognition model. The expected description sentence may be annotated manually.
Step 502: for each sample image, process the recognition result through the language description model and output an actual description sentence.
The language description model may be a deep learning network, for example, an AlexNet network, a VGG-16 network, a GoogLeNet network, or a Deep Residual Learning network. The parameters of the language description model are initialized; optionally, they may be set randomly or set by relevant technical personnel based on experience. In this embodiment of the present application, the recognition result of each sample image is input into the language description model, and the language description model outputs an actual description sentence.
Step 503: calculate the error between the actual description sentence and the expected description sentence.
Optionally, the terminal determines the distance between the actual description sentence and the expected description sentence as the error.
After calculating the error between the actual description sentence and the expected description sentence, the terminal checks whether the error is greater than a preset threshold. If the error is greater than the preset threshold, the terminal adjusts the parameters of the language description model and resumes from the step of processing each sample image through the language description model to output an actual description sentence, that is, it repeats steps 502 and 503. When the error is less than or equal to the preset threshold, training stops and the trained language description model is obtained.
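The stopping rule in steps 502 and 503 can be illustrated with a deliberately tiny stand-in. In the sketch below the "model" is a single numeric weight fitted by gradient descent, not a sentence generator, and the error is a mean squared error rather than a distance between sentences. Only the loop structure (compute the output, compute the error, adjust the parameters while the error exceeds the threshold) mirrors the text. The learning rate, the `max_rounds` guard, and the sample data are all assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input, expected output)


def mean_squared_error(weight: float, samples: List[Sample]) -> float:
    """Error between actual outputs (weight * x) and expected outputs."""
    return sum((weight * x - y) ** 2 for x, y in samples) / len(samples)


def train(
    samples: List[Sample],
    threshold: float = 1e-6,
    learning_rate: float = 0.05,
    max_rounds: int = 10_000,
) -> Tuple[float, float]:
    """Adjust the weight until the error is at or below the preset threshold."""
    weight = 0.0  # initial parameter value (could also be random)
    error = mean_squared_error(weight, samples)
    rounds = 0
    while error > threshold and rounds < max_rounds:
        # Adjust the parameter in the direction that reduces the error.
        gradient = sum(2 * (weight * x - y) * x for x, y in samples) / len(samples)
        weight -= learning_rate * gradient
        # Repeat the "output" and "error" steps with the adjusted parameter.
        error = mean_squared_error(weight, samples)
        rounds += 1
    return weight, error


weight, error = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

The `max_rounds` guard is not in the source; it is added so that the loop ends even if the threshold is never reached.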
The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.
Please refer to FIG. 5, which shows a block diagram of an image index generation apparatus provided by an embodiment of the present application. The apparatus has the function of implementing the above method, and the function may be implemented by hardware or by hardware executing corresponding software. The apparatus may be a terminal or may be provided on a terminal. The apparatus includes:
An image acquisition module 601, configured to acquire a first image.
An image recognition module 602, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image.
A sentence generation module 603, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image.
An index generation module 604, configured to determine the description sentence as an index of the first image, and store the index in correspondence with the first image.
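One way to picture how these modules cooperate is a small class that wires them together. This is a hypothetical software sketch, not the apparatus itself: the class name, the callables standing in for modules 602 and 603, and the in-memory dictionary standing in for the storage used by module 604 are all assumptions, and image acquisition (module 601) is reduced to passing in an image path.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ImageIndexGenerator:
    # Stand-in for image recognition module 602: image path -> recognition result.
    recognize: Callable[[str], List[str]]
    # Stand-in for sentence generation module 603: recognition result -> sentence.
    describe: Callable[[List[str]], str]
    # Stand-in for the storage used by index generation module 604.
    index_store: Dict[str, str] = field(default_factory=dict)

    def generate_index(self, image_path: str) -> str:
        """Recognize the image, describe it, and store the sentence as its index."""
        recognition_result = self.recognize(image_path)
        description = self.describe(recognition_result)
        self.index_store[image_path] = description
        return description


generator = ImageIndexGenerator(
    recognize=lambda path: ["dog", "grass"],          # fixed toy recognizer
    describe=lambda result: " on the ".join(result),  # fixed toy describer
)
sentence = generator.generate_index("album/001.jpg")
```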
In summary, in the technical solution provided by this embodiment of the present application, the recognition result corresponding to each object included in an image is obtained, a description sentence describing the image is generated according to the recognition results, and the description sentence is determined as the index of the image. When the user later needs to search for the image, the user can input a word included in the index, or a word with a meaning similar to that of a word included in the index, and the terminal can accurately find the image according to the input word, which improves the efficiency of searching for images in the album.
In an optional embodiment based on the embodiment shown in FIG. 5, the sentence generation module 603 is configured to:
convert the recognition result into a first word vector;
process the first word vector through a language description model to obtain the description sentence.
Optionally, the apparatus further includes an information acquisition module (not shown in the figure).
The information acquisition module is configured to acquire associated information of the first image, the associated information including at least one of the following: location information, time information, and scene information;
The sentence generation module 603 is configured to:
convert the recognition result into a first word vector;
convert the associated information into a second word vector;
process the first word vector and the second word vector through a language description model to obtain the description sentence.
In an optional embodiment based on the embodiment shown in FIG. 5, the apparatus further includes an information display module (not shown in the figure).
The information display module is configured to display inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index;
The index generation module 604 is further configured to, when a confirmation instruction corresponding to the inquiry information is received, perform the step of determining the description sentence as the index of the first image and storing the index in correspondence with the first image.
Optionally, the apparatus further includes an input box display module and a sentence receiving module (not shown in the figure).
The input box display module is configured to display an input box when the confirmation instruction is not received;
The sentence receiving module is configured to receive the sentence input in the input box;
The index generation module 604 is further configured to determine the input sentence as the index of the first image, and store the index in correspondence with the first image.
In an optional embodiment based on the embodiment shown in FIG. 5, the image recognition module is configured to:
perform image recognition on the first image through an image recognition model to obtain a recognition result corresponding to each of at least one object in the first image;
wherein the image recognition model is a neural network model trained with multiple sample images, and the object in each of the multiple sample images has a corresponding classification label.
Optionally, the apparatus further includes a sample set acquisition module, a sentence output module, an error calculation module, and a model training module (not shown in the figure).
The sample set acquisition module is configured to obtain a training sample set, the training sample set including multiple sample images, each sample image having an expected description sentence corresponding to its recognition result;
The sentence output module is configured to, for each sample image, process the recognition result through the language description model and output an actual description sentence;
The error calculation module is configured to calculate the error between the actual description sentence and the expected description sentence;
The model training module is configured to, when the error is greater than a preset threshold, adjust the parameters of the language description model and resume from the step of processing each sample image through the language description model to output an actual description sentence, and, when the error is less than or equal to the preset threshold, stop training to obtain the trained language description model, the language description model being used to generate the description sentence according to the recognition result.
请参考图6,其示出了本申请一个实施例提供的图像搜索装置的框图。该装置具有实现上述方法的功能,所述功能可以由硬件实现,也可以由硬件执行相应的软件实现。该装置可以是终端,也可以设置在终端上,该装置包括:Please refer to FIG. 6, which shows a block diagram of an image search apparatus provided by an embodiment of the present application. The device has the function of implementing the above method, and the function can be realized by hardware, or can be realized by hardware executing corresponding software. The device may be a terminal or may be provided on the terminal. The device includes:
A search box display module 710, configured to display a search box.
A keyword receiving module 720, configured to receive a first keyword entered in the search box.
An image search module 730, configured to search an album for a second image that matches the first keyword, where the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image.
A result display module 740, configured to display a search result, the search result including the second image.
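The keyword search performed by the image search module can be illustrated with a minimal sketch. The application does not define how a keyword "matches" a target keyword in the index, so case-insensitive substring matching is assumed here; the function name `search_album`, the dictionary layout, and the file names are hypothetical.

```python
from typing import Dict, List


def search_album(index_by_image: Dict[str, str], keyword: str) -> List[str]:
    """Return the images whose index (description sentence) contains the keyword.

    Matching is assumed to be a case-insensitive substring test.
    """
    needle = keyword.strip().lower()
    if not needle:
        return []
    return [image for image, sentence in index_by_image.items()
            if needle in sentence.lower()]


# Hypothetical album: image name -> description sentence stored as its index.
album = {
    "IMG_001.jpg": "A dog running on the beach",
    "IMG_002.jpg": "Two children playing football in a park",
    "IMG_003.jpg": "A dog sleeping on a sofa",
}
results = search_album(album, "dog")
```

Here `results` holds the two images whose description sentences mention a dog; a real implementation might instead use word segmentation or synonym matching, which the application leaves open.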
In summary, in the technical solution provided by the embodiments of the present application, when there are too many search results, the user is prompted to enter more keywords, so that the terminal can perform the image search based on the keywords entered on the two occasions, which improves the accuracy of the image search.
Optionally, the apparatus further includes an information display module and a keyword acquisition module (not shown in the figure).
The information display module is configured to display prompt information when the number of second images is greater than a preset number, the prompt information being used to prompt the input of a second keyword.
The keyword acquisition module is configured to acquire the second keyword.
The image search module is further configured to search the second images for a third image that matches the second keyword, where the index corresponding to the third image includes a second target keyword, and the second target keyword matches the second keyword;
The search result includes the third image.
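The two-stage search described above (prompting for a second keyword when the first search returns too many images) might look like the following sketch. Substring matching, the callback `ask_second_keyword` standing in for the prompt, and all names and sample data are assumptions made for illustration only.

```python
from typing import Callable, Dict, List


def match(index_by_image: Dict[str, str], images: List[str], keyword: str) -> List[str]:
    """Keep the images whose description sentence contains the keyword."""
    needle = keyword.strip().lower()
    return [img for img in images
            if needle and needle in index_by_image[img].lower()]


def search_with_refinement(index_by_image: Dict[str, str], first_keyword: str,
                           preset_number: int,
                           ask_second_keyword: Callable[[], str]) -> List[str]:
    """Search by the first keyword; refine with a second one if there are too many hits."""
    second_images = match(index_by_image, list(index_by_image), first_keyword)
    if len(second_images) <= preset_number:
        return second_images
    # Too many results: prompt for a second keyword and search only
    # within the images found by the first keyword.
    second_keyword = ask_second_keyword()
    return match(index_by_image, second_images, second_keyword)


album = {
    "a.jpg": "A dog running on the beach",
    "b.jpg": "A dog sleeping on a sofa",
    "c.jpg": "A dog and a cat on the beach",
    "d.jpg": "A cat on a sofa",
}
refined = search_with_refinement(album, "dog", preset_number=2,
                                 ask_second_keyword=lambda: "beach")
```

With a preset number of 2, the three "dog" images trigger the prompt, and the second keyword narrows the result to the images that also mention the beach. With a larger preset number the first result set is returned unchanged and no prompt is shown.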
It should be noted that, when the apparatus provided in the above embodiments implements its functions, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept. For the specific implementation process, see the method embodiments; details are not repeated here.
Referring to FIG. 7, which shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application. The terminal in this application may include one or more of the following components: a processor 610 and a memory 620.
The processor 610 may include one or more processing cores. The processor 610 connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 620 and by calling the data stored in the memory 620. Optionally, the processor 610 may be implemented in at least one of the following hardware forms: digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 610 may integrate one or a combination of a central processing unit (CPU), a modem, and the like. The CPU mainly handles the operating system, applications, and so on; the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 610 and may instead be implemented as a separate chip.
Optionally, when the processor 610 executes the program instructions in the memory 620, it implements the image index generation method or the image search method provided by the foregoing method embodiments.
The memory 620 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 620 includes a non-transitory computer-readable storage medium. The memory 620 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 620 may include a program storage area and a data storage area. The program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the method embodiments described above, and so on; the data storage area may store data created through use of the terminal, and so on.
The structure of the above terminal is only illustrative. In an actual implementation, the terminal may include more or fewer components, such as a display screen; this embodiment does not limit this.
Those skilled in the art will understand that the structure shown in FIG. 6 does not constitute a limitation on the terminal 600; the terminal may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
An exemplary embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when loaded and executed by a processor, implements the image index generation method or the image search method provided by the above method embodiments.
An exemplary embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the image index generation method or the image search method described in the above embodiments.
It should be understood that "a plurality of" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above are only optional embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (20)

  1. An image index generation method, characterized in that the method comprises:
    acquiring a first image;
    performing image recognition on the first image to obtain a recognition result corresponding to the first image;
    generating a description sentence according to the recognition result, the description sentence being used to describe the first image;
    determining the description sentence as an index of the first image, and storing the index in correspondence with the first image.
  2. The method according to claim 1, characterized in that generating a description sentence according to the recognition result comprises:
    converting the recognition result into a first word vector;
    processing the first word vector through a language description model to obtain the description sentence.
  3. The method according to claim 1, characterized in that, after acquiring the first image, the method further comprises:
    acquiring associated information of the first image, the associated information including at least one of the following: location information, time information, scene information;
    wherein generating a description sentence according to the recognition result comprises:
    converting the recognition result into a first word vector;
    converting the associated information into a second word vector;
    processing the first word vector and the second word vector through a language description model to obtain the description sentence.
  4. The method according to claim 1, characterized in that, before determining the description sentence as an index of the first image and storing the index in correspondence with the first image, the method further comprises:
    displaying inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index;
    upon receiving a confirmation instruction corresponding to the inquiry information, performing the step of determining the description sentence as an index of the first image and storing the index in correspondence with the first image.
  5. The method according to claim 4, characterized in that, after displaying the inquiry information, the method further comprises:
    displaying an input box when the confirmation instruction is not received;
    receiving a sentence entered in the input box;
    determining the entered sentence as an index of the first image, and storing the index in correspondence with the first image.
  6. The method according to any one of claims 1 to 5, characterized in that performing image recognition on the first image to obtain a recognition result corresponding to the first image comprises:
    performing image recognition on the first image through an image recognition model to obtain a recognition result corresponding to each of at least one object in the first image;
    wherein the image recognition model is a neural network model trained with a plurality of sample images, and the object in each of the plurality of sample images corresponds to a classification label.
  7. The method according to any one of claims 1 to 5, characterized in that, before generating a description sentence according to the recognition result, the method further comprises:
    acquiring a training sample set, the training sample set including a plurality of sample images, each sample image corresponding to an expected description sentence for its recognition result;
    for each sample image, processing the recognition result through a language description model and outputting an actual description sentence;
    calculating the error between the actual description sentence and the expected description sentence;
    when the error is greater than a preset threshold, adjusting the parameters of the language description model and resuming execution from the step of processing each sample image through the language description model and outputting an actual description sentence; and, once the error is less than or equal to the preset threshold, stopping training to obtain the trained language description model, the language description model being used to generate the description sentence according to the recognition result.
  8. An image search method, characterized in that the method comprises:
    displaying a search box;
    receiving a first keyword entered in the search box;
    searching an album for a second image that matches the first keyword, wherein the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image;
    displaying a search result, the search result including the second image.
  9. The method according to claim 8, characterized in that, before displaying the search result, the method further comprises:
    when the number of second images is greater than a preset number, displaying prompt information, the prompt information being used to prompt the input of a second keyword;
    acquiring the second keyword;
    searching the second images for a third image that matches the second keyword, wherein the index corresponding to the third image includes a second target keyword, and the second target keyword matches the second keyword;
    wherein the search result includes the third image.
  10. An image index generation apparatus, characterized in that the apparatus comprises:
    an image acquisition module, configured to acquire a first image;
    an image recognition module, configured to perform image recognition on the first image to obtain a recognition result corresponding to the first image;
    a sentence generation module, configured to generate a description sentence according to the recognition result, the description sentence being used to describe the first image;
    an index generation module, configured to determine the description sentence as an index of the first image and store the index in correspondence with the first image.
  11. The apparatus according to claim 10, characterized in that the sentence generation module is configured to:
    convert the recognition result into a first word vector;
    process the first word vector through a language description model to obtain the description sentence.
  12. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    an information acquisition module, configured to acquire associated information of the first image, the associated information including at least one of the following: location information, time information, scene information;
    wherein the sentence generation module is configured to:
    convert the recognition result into a first word vector;
    convert the associated information into a second word vector;
    process the first word vector and the second word vector through a language description model to obtain the description sentence.
  13. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    an information display module, configured to display inquiry information, the inquiry information being used to ask whether to determine the description sentence as the index;
    wherein the index generation module is further configured to, upon receiving a confirmation instruction corresponding to the inquiry information, perform the step of determining the description sentence as an index of the first image and storing the index in correspondence with the first image.
  14. The method according to claim 13, characterized in that the apparatus further comprises:
    an input box display module, configured to display an input box when the confirmation instruction is not received;
    a sentence receiving module, configured to receive a sentence entered in the input box;
    wherein the index generation module is further configured to determine the entered sentence as an index of the first image and store the index in correspondence with the first image.
  15. The apparatus according to any one of claims 10 to 14, characterized in that the image recognition module is configured to:
    perform image recognition on the first image through an image recognition model to obtain a recognition result corresponding to each of at least one object in the first image;
    wherein the image recognition model is a neural network model trained with a plurality of sample images, and the object in each of the plurality of sample images corresponds to a classification label.
  16. The method according to any one of claims 10 to 14, characterized in that the apparatus further comprises:
    a sample set acquisition module, configured to acquire a training sample set, the training sample set including a plurality of sample images, each sample image corresponding to an expected description sentence for its recognition result;
    a sentence output module, configured to, for each sample image, process the recognition result through a language description model and output an actual description sentence;
    an error calculation module, configured to calculate the error between the actual description sentence and the expected description sentence;
    a model training module, configured to, when the error is greater than a preset threshold, adjust the parameters of the language description model and resume execution from the step of processing each sample image through the language description model and outputting an actual description sentence; and, once the error is less than or equal to the preset threshold, stop training to obtain the trained language description model, the language description model being used to generate the description sentence according to the recognition result.
  17. An image search apparatus, characterized in that the apparatus comprises:
    a search box display module, configured to display a search box;
    a keyword receiving module, configured to receive a first keyword entered in the search box;
    an image search module, configured to search an album for a second image that matches the first keyword, wherein the index corresponding to the second image includes a first target keyword, the first target keyword matches the first keyword, and the index corresponding to the second image is a description sentence generated according to the recognition result of the second image;
    a result display module, configured to display a search result, the search result including the second image.
  18. The method according to claim 17, characterized in that the apparatus further comprises:
    an information display module, configured to display prompt information when the number of second images is greater than a preset number, the prompt information being used to prompt the input of a second keyword;
    a keyword acquisition module, configured to acquire the second keyword;
    wherein the image search module is further configured to search the second images for a third image that matches the second keyword, the index corresponding to the third image includes a second target keyword, and the second target keyword matches the second keyword;
    wherein the search result includes the third image.
  19. A terminal, characterized in that the terminal comprises a processor and a memory, the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the image index generation method according to any one of claims 1 to 7, or the image search method according to any one of claims 8 to 9.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is loaded and executed by a processor to implement the image index generation method according to any one of claims 1 to 7, or the image search method according to any one of claims 8 to 9.
PCT/CN2019/115411 2018-11-30 2019-11-04 Image index generation method, image search method and apparatus, and terminal, and medium WO2020108234A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811457455.0 2018-11-30
CN201811457455.0A CN109635135A (en) 2018-11-30 2018-11-30 Image index generation method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2020108234A1 true WO2020108234A1 (en) 2020-06-04

Family

ID=66070700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/115411 WO2020108234A1 (en) 2018-11-30 2019-11-04 Image index generation method, image search method and apparatus, and terminal, and medium

Country Status (2)

Country Link
CN (1) CN109635135A (en)
WO (1) WO2020108234A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635135A (en) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method, device, terminal and storage medium
CN110083729B (en) * 2019-04-26 2023-10-27 北京金山数字娱乐科技有限公司 Image searching method and system
CN110362698A (en) * 2019-07-08 2019-10-22 北京字节跳动网络技术有限公司 A kind of pictorial information generation method, device, mobile terminal and storage medium
CN112541091A (en) * 2019-09-23 2021-03-23 杭州海康威视数字技术股份有限公司 Image searching method, device, server and storage medium
CN110704654A (en) * 2019-09-27 2020-01-17 三星电子(中国)研发中心 Picture searching method and device
CN112925939A (en) * 2019-12-05 2021-06-08 阿里巴巴集团控股有限公司 Picture searching method, description information generating method, device and storage medium
CN111046203A (en) * 2019-12-10 2020-04-21 Oppo广东移动通信有限公司 Image retrieval method, image retrieval device, storage medium and electronic equipment
CN111797765B (en) * 2020-07-03 2024-04-16 北京达佳互联信息技术有限公司 Image processing method, device, server and storage medium
CN112711998A (en) * 2020-12-24 2021-04-27 珠海新天地科技有限公司 3D model annotation system and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838724A (en) * 2012-11-20 2014-06-04 百度在线网络技术(北京)有限公司 Image search method and device
CN106446782A (en) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image identification method and device
CN106708940A (en) * 2016-11-11 2017-05-24 百度在线网络技术(北京)有限公司 Method and device used for processing pictures
CN107766853A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 A kind of generation, display methods and the electronic equipment of the text message of image
WO2018134964A1 (en) * 2017-01-20 2018-07-26 楽天株式会社 Image search system, image search method, and program
CN109635135A (en) * 2018-11-30 2019-04-16 Oppo广东移动通信有限公司 Image index generation method, device, terminal and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136228A (en) * 2011-11-25 2013-06-05 阿里巴巴集团控股有限公司 Image search method and image search device
CN107908770A (en) * 2017-11-30 2018-04-13 维沃移动通信有限公司 A kind of photo searching method and mobile terminal
CN108021654A (en) * 2017-12-01 2018-05-11 北京奇安信科技有限公司 A kind of photograph album image processing method and device
CN108509521B (en) * 2018-03-12 2020-02-18 华南理工大学 Image retrieval method for automatically generating text index

Also Published As

Publication number Publication date
CN109635135A (en) 2019-04-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19889402

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19889402

Country of ref document: EP

Kind code of ref document: A1