JP2024519504A

JP2024519504A - Reverse image search based on deep neural network (DNN) model and image feature detection model

Info

Publication number: JP2024519504A
Application number: JP2023567990A
Authority: JP
Inventors: ジョンファイ; プラッギャガーグ
Original assignee: Sony Corp; Sony Group Corp
Current assignee: Sony Corp; Sony Group Corp
Priority date: 2021-05-18
Filing date: 2022-05-18
Publication date: 2024-05-14
Also published as: WO2022243912A1; EP4323892A1; CN116420143A

Abstract

An electronic device and a method for reverse image search are provided. The electronic device receives an image. The electronic device extracts a first set of image features associated with the image by use of a DNN model, and generates a first feature vector based on the first set of image features. The electronic device extracts a second set of image features associated with the image by use of an image feature detection model, and generates a second feature vector based on the second set of image features. The electronic device generates a third feature vector based on a combination of the first and second feature vectors. The electronic device determines a similarity metric between the third feature vector and a fourth feature vector of each image in a set of pre-stored images, and identifies a pre-stored image based on the similarity metric. The electronic device controls a display device to display information associated with the identified pre-stored image. (Selected drawing: FIG. 1)

Description

CROSS-REFERENCE TO RELATED APPLICATIONS/INCORPORATION BY REFERENCE
This application claims priority to U.S. Provisional Patent Application Serial No. 63/189,956, filed May 18, 2021, the entire content of which is incorporated herein by reference.

Various embodiments of the disclosure relate to reverse image search. More specifically, various embodiments of the disclosure relate to an electronic device and a method for reverse image search based on a deep neural network (DNN) model and an image feature detection model.

Advances in information and communication technology have led to a variety of Internet-based image retrieval systems (for example, web search engines). Conventionally, a user may upload an input image to a web search engine as a search query. In such a case, the web search engine may use a reverse image search technique to return a set of output images, retrieved from the Internet, that are similar to the input image. Such a reverse image search technique may employ a machine learning model to determine the set of output images. In some cases, the machine learning model may misclassify one or more objects in the input image, so that the set of output images includes undesired or irrelevant images.

Limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art through comparison of the described systems with some aspects of the present disclosure, as set forth in the remainder of the present application and with reference to the drawings.

An electronic device and a method for reverse image search based on a deep neural network (DNN) model and an image feature detection model are provided, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.

These and other features and advantages of the present disclosure may be appreciated from a review of the following detailed description of the present disclosure, along with the accompanying figures, in which like reference numerals refer to like parts throughout.

FIG. 1 is a block diagram that illustrates an exemplary network environment for reverse image search based on a deep neural network (DNN) model and an image feature detection model, in accordance with an embodiment of the disclosure.
FIG. 2 is a block diagram that illustrates an exemplary electronic device for reverse image search based on a DNN model and an image feature detection model, in accordance with an embodiment of the disclosure.
FIG. 3 is a diagram that illustrates exemplary operations for reverse image search based on a DNN model and an image feature detection model, in accordance with an embodiment of the disclosure.
FIG. 4 is a flowchart that illustrates an exemplary method for reverse image search based on a DNN model and an image feature detection model, in accordance with an embodiment of the disclosure.

The following described implementations may be found in the disclosed electronic device and method for reverse image search based on a deep neural network (DNN) model and an image feature detection model, which may improve the accuracy of reverse image search. Exemplary aspects of the disclosure provide an electronic device that implements a DNN model and an image feature detection model for reverse image search. The electronic device may receive a first image (such as an image for which a user wants to find similar images). The electronic device may extract, by the DNN model, a first set of image features associated with the received first image. It may then generate a first feature vector associated with the received first image based on the extracted first set of image features. The electronic device may further extract, by the image feature detection model, a second set of image features associated with the received first image. It may then generate a second feature vector associated with the received first image based on the extracted second set of image features. Examples of the image feature detection model may include, but are not limited to, a Scale-Invariant Feature Transform (SIFT)-based model, a Speeded-Up Robust Features (SURF)-based model, an Oriented FAST and Rotated BRIEF (ORB)-based model, or a Fast Library for Approximate Nearest Neighbors (FLANN)-based model. The image feature detection model may extract image features that the DNN model 108 may have partially misdetected and/or misclassified.
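The disclosure does not say how the output of the DNN model becomes the first feature vector. One widely used option, assumed here purely for illustration, is global average pooling. The last convolutional feature map (C channels, each H × W) is reduced to a C-dimensional vector by averaging each channel. The function name and the nested-list layout below are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Collapse a C x H x W feature map into a C-dimensional vector.

    Each output element is the mean activation of one channel, so the
    result has a fixed length regardless of the spatial size H x W.
    """
    vector = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        vector.append(sum(values) / len(values))
    return vector


# Hypothetical 2-channel, 2x2 feature map from the last convolutional layer.
fmap = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
first_feature_vector = global_average_pool(fmap)
print(first_feature_vector)  # [4.0, 1.0]
```

Pooling discards where in the image each feature fired. That loss of layout is one reason low-level, keypoint-based features can add complementary information.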

The electronic device may further generate a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector. In an example, the generation of the third feature vector may be further based on application of a principal component analysis (PCA) transformation to the combination of the generated first and second feature vectors. The electronic device may further determine a similarity metric between the generated third feature vector and a fourth feature vector of each image of a pre-stored second set of images (such as images stored in a database). Examples of the similarity metric may include, but are not limited to, a cosine-distance similarity or a Euclidean-distance similarity. Based on the determined similarity metric, the electronic device may further identify a pre-stored third image (such as an image that is identical or similar to the received first image) from the pre-stored second set of images. It may then control a display device to display information associated with the identified pre-stored third image.
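The sketch below illustrates the pipeline summarized above under two assumptions: "combination" means plain concatenation, and cosine similarity is the chosen metric. Under those assumptions, the third feature vector can be formed and compared against each pre-stored fourth feature vector as follows. The gallery contents and helper names are hypothetical.

```python
import math
from typing import Dict, List


def concatenate(first: List[float], second: List[float]) -> List[float]:
    """Form the third feature vector by joining the first and second vectors."""
    return list(first) + list(second)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query: List[float], gallery: Dict[str, List[float]]) -> str:
    """Return the key of the gallery vector most similar to the query."""
    return max(gallery, key=lambda key: cosine_similarity(query, gallery[key]))


third = concatenate([1.0, 0.0], [0.5, 0.5])
gallery = {  # hypothetical fourth feature vectors of pre-stored images
    "cat.jpg": [1.0, 0.1, 0.5, 0.4],
    "car.jpg": [0.0, 1.0, 0.0, 0.2],
}
print(best_match(third, gallery))  # cat.jpg
```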

The disclosed electronic device may automatically generate the third feature vector associated with the received first image based on the combination of the generated first feature vector and the generated second feature vector. As a result, the third feature vector may include both the first set of image features, determined by the DNN model, and the second set of image features, determined by the image feature detection model. The first set of image features includes high-level image features associated with the received first image (for example, facial features such as eyes, nose, ears, and hair). The second set of image features includes low-level image features associated with the received first image (for example, edges, lines, and contours of a face). Because the third feature vector includes both the high-level and the low-level image features, the two kinds of features may complement each other in the identification of similar images. For example, the received first image may not be well represented in the training dataset of the DNN model. In that case, the first set of image features may be insufficient to identify, from the pre-stored second set of images, images that are similar to the received first image.
However, the second set of image features may include low-level image features of the received first image. Including the second set of image features in the third feature vector may therefore improve the accuracy with which such similar images are identified. Conversely, if the quality of the image is poor (for example, a blurry, low-resolution image), the first set of image features (that is, the high-level image features) may be insufficient to identify similar images. In such cases, the second set of image features (that is, the low-level image features) may be more useful and accurate for the identification of similar images.
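For the high-level and low-level features to complement each other, neither part should dominate the similarity score simply because its values are larger. A common safeguard, not described in the disclosure and assumed here, is to L2-normalize each part separately before concatenating them. The helper names are hypothetical.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def fuse_normalized(high_level: List[float], low_level: List[float]) -> List[float]:
    """Concatenate per-part unit vectors so each part contributes equal energy."""
    return l2_normalize(high_level) + l2_normalize(low_level)


# DNN activations are large here; low-level descriptor values are small.
fused = fuse_normalized([30.0, 40.0], [0.3, 0.4])
print(fused)  # approximately [0.6, 0.8, 0.6, 0.8]
```

Without this step, the DNN part in the example would be 100 times larger than the low-level part. It would then dominate a cosine or Euclidean comparison.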

FIG. 1 is a block diagram that illustrates an exemplary network environment for reverse image search based on a deep neural network (DNN) model and an image feature detection model, in accordance with an embodiment of the disclosure. With reference to FIG. 1, there is shown a network environment 100. The network environment 100 may include an electronic device 102, a server 104, and a database 106. There are further shown a DNN model 108 and an image feature detection model 110 implemented on the server 104. As shown in FIG. 1, the database 106 may store a training dataset 112. The electronic device 102, the server 104, and the database 106 may be communicatively coupled to one another via a communication network 114. There is further shown a user 116 associated with the electronic device 102. FIG. 1 shows the electronic device 102 and the server 104 as two separate devices. In some embodiments, however, the entire functionality of the server 104 may be incorporated into the electronic device 102 without departing from the scope of the present disclosure.

The electronic device 102 may include suitable logic, circuitry, interfaces, and/or code that may be configured to identify and display a set of images similar to a first image, based on application of the DNN model 108 and the image feature detection model 110 to the first image. Examples of the electronic device 102 may include, but are not limited to, an image search engine, a server, a personal computer, a laptop, a computer workstation, a mainframe machine, a gaming device, a virtual reality (VR)/augmented reality (AR)/mixed reality (MR) device, a smartphone, a mobile phone, a computing device, a tablet, and/or any consumer electronic (CE) device.

The DNN model 108 may be a deep convolutional neural network model that may be trained on an image feature detection task to detect a first set of image features in a first image. The DNN model 108 may be defined by its hyperparameters, for example, activation function(s), number of weights, cost function, regularization function, input size, and number of layers. The DNN model 108 may be referred to as a computational network or a system of artificial neurons (also referred to as nodes). The nodes of the DNN model 108 may be arranged in a plurality of layers, as defined in a neural network topology of the DNN model 108. The plurality of layers may include an input layer, one or more hidden layers, and an output layer. Each layer of the plurality of layers may include one or more nodes (or artificial neurons). Outputs of all nodes in the input layer may be coupled to at least one node of the hidden layer(s). Similarly, inputs of each hidden layer may be coupled to outputs of at least one node in other layers of the DNN model 108. Outputs of each hidden layer may be coupled to inputs of at least one node in other layers of the DNN model 108. Node(s) in the final layer may receive inputs from at least one hidden layer to output a result.
The number of layers and the number of nodes in each layer may be determined from the hyperparameters of the DNN model 108. Such hyperparameters may be set before or while the DNN model 108 is trained on the training dataset 112.
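The layered topology described above can be sketched as a small feedforward network whose shape is fixed entirely by a hyperparameter list of layer sizes. This simplified illustration makes several assumptions. It uses fully connected layers rather than the convolutional layers of the DNN model 108. Random weights stand in for trained ones, and all names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def build_layers(layer_sizes: List[int], seed: int = 0) -> List[Matrix]:
    """Create one weight matrix per connection between consecutive layers.

    layer_sizes is the hyperparameter list [input, hidden..., output];
    matrix i has shape layer_sizes[i+1] x (layer_sizes[i] + 1), the extra
    column holding the bias of each node.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(layers: List[Matrix], inputs: List[float]) -> List[float]:
    """Propagate inputs layer by layer; hidden layers use ReLU, output is linear."""
    activations = list(inputs)
    for index, weights in enumerate(layers):
        is_last = index == len(layers) - 1
        outputs = []
        for node_weights in weights:
            # zip stops at len(activations), so the trailing bias weight is skipped here.
            z = node_weights[-1] + sum(w * a for w, a in zip(node_weights, activations))
            outputs.append(z if is_last else max(0.0, z))
        activations = outputs
    return activations


layers = build_layers([4, 3, 2])  # input 4, one hidden layer of 3, output 2
print([len(m) for m in layers])  # [3, 2]
print(len(forward(layers, [0.1, 0.2, 0.3, 0.4])))  # 2
```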

Each node of the DNN model 108 may correspond to a mathematical function (for example, a sigmoid function or a rectified linear unit) with a set of parameters that is tunable during training of the network. The set of parameters may include, for example, a weight parameter and a regularization parameter. Each node may use the mathematical function to compute an output based on one or more inputs from nodes in other layer(s) (for example, previous layer(s)) of the DNN model 108. All or some of the nodes of the DNN model 108 may correspond to the same mathematical function or to different mathematical functions.
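The sketch below makes the per-node computation concrete. It treats a node as a weighted sum of its inputs plus a bias, passed through either a sigmoid or a rectified linear unit. The weights and bias play the role of the tunable parameters. The regularization parameter is omitted, and the function names are illustrative only.

```python
import math
from typing import Callable, List


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Rectified linear unit: passes positive values, zeroes negatives."""
    return max(0.0, z)


def node_output(inputs: List[float], weights: List[float], bias: float,
                activation: Callable[[float], float]) -> float:
    """Weighted sum of the previous layer's outputs, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


inputs = [1.0, -2.0]
weights = [0.5, 0.25]
print(node_output(inputs, weights, 0.0, sigmoid))  # 0.5
print(node_output(inputs, weights, -1.0, relu))    # 0.0
```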

In training of the DNN model 108, one or more parameters of each node of the DNN model 108 may be updated based on whether an output of the final layer for a given input (from the training dataset) matches a correct result, as measured by a loss function for the DNN model 108. The above process may be repeated for the same or different inputs until a minimum of the loss function is achieved and the training error is minimized. Several training methods are known in the art, for example, gradient descent, stochastic gradient descent, batch gradient descent, gradient boosting, and meta-heuristics.
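The training loop can be illustrated with stochastic gradient descent on a single sigmoid node and a binary cross-entropy loss. This toy sketch uses a made-up one-dimensional dataset and learning rate. It only shows the update rule: each sample nudges the parameters against the gradient of the loss until the loss approaches a minimum. It is not the training procedure of the DNN model 108.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w: float, b: float, data: List[Tuple[float, int]]) -> float:
    """Mean binary cross-entropy of the node's predictions over the dataset."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train_sgd(data: List[Tuple[float, int]], lr: float = 0.5,
              epochs: int = 200) -> Tuple[float, float]:
    """Update (w, b) one sample at a time along the negative loss gradient."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            error = sigmoid(w * x + b) - y  # d(loss)/dz for sigmoid + cross-entropy
            w -= lr * error * x
            b -= lr * error
    return w, b


data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_sgd(data)
# First value is ln 2 (about 0.693) for the untrained node; the second is far smaller.
print(log_loss(0.0, 0.0, data), log_loss(w, b, data))
```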

In an embodiment, the DNN model 108 may include electronic data, which may be implemented, for example, as a software component of an application executable on the electronic device 102 or the server 104. The DNN model 108 may rely on libraries, external scripts, or other logic/instructions for execution by a processing device, such as the electronic device 102 or the server 104. The DNN model 108 may include computer-executable code or routines that enable a computing device, such as the electronic device 102 or the server 104, to perform one or more operations to detect image features in an input image. Additionally or alternatively, the DNN model 108 may be implemented using hardware, including a processor, a microprocessor (for example, to perform or control performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). For example, the electronic device 102 (or the server 104) may include an inference accelerator chip that accelerates computations of the DNN model 108 for the image feature detection task. In some embodiments, the DNN model 108 may be implemented using a combination of hardware and software.
Examples of the DNN model 108 may include, but are not limited to, an artificial neural network (ANN), a convolutional neural network (CNN), Regions with CNN (R-CNN), Fast R-CNN, Faster R-CNN, a You Only Look Once (YOLO) network, a residual neural network (ResNet), a feature pyramid network (FPN), RetinaNet, a Single Shot Detector (SSD), and/or a combination thereof.

The image feature detection model 110 may be an image processing algorithm configured to extract image features associated with the first image. The image feature detection model 110 may be defined by its hyperparameters, for example, number of image features, edge threshold, number of weights, cost function, input size, and number of layers. The hyperparameters of the image feature detection model 110 may be tuned, and its weights updated, so as to move toward a global minimum of its cost function. The image feature detection model 110 may include electronic data, which may be implemented, for example, as a software component of an application executable on the electronic device 102 or the server 104. The image feature detection model 110 may rely on libraries, external scripts, or other logic/instructions for execution by a processing device, such as the electronic device 102 or the server 104.
The image feature detection model 110 may include code and routines configured to enable a computing device, such as the electronic device 102 or the server 104, to perform one or more operations, such as extraction of a set of image features associated with the first image. Additionally or alternatively, the image feature detection model 110 may be implemented using hardware, including a processor, a microprocessor (for example, to perform or control performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). Alternatively, in some embodiments, the image feature detection model 110 may be implemented using a combination of hardware and software. Examples of the image feature detection model 110 may include, but are not limited to, a Scale-Invariant Feature Transform (SIFT)-based model, a Speeded-Up Robust Features (SURF)-based model, an Oriented FAST and Rotated BRIEF (ORB)-based model, or a Fast Library for Approximate Nearest Neighbors (FLANN)-based model.
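Keypoint-based detectors such as SIFT or ORB return a different number of local descriptors for each image. The similarity comparison, however, needs a second feature vector of fixed length. The disclosure does not say how this is resolved. One simple option, assumed here, is to average the descriptors; bag-of-visual-words histograms are a common alternative. The sketch uses short hypothetical descriptors in place of real detector output, and the function name is illustrative.

```python
from typing import List


def pool_descriptors(descriptors: List[List[float]], dim: int) -> List[float]:
    """Average a variable number of local descriptors into one dim-length vector.

    An image with no detected keypoints yields a zero vector, so every image
    still gets a second feature vector of the same length.
    """
    if not descriptors:
        return [0.0] * dim
    for d in descriptors:
        if len(d) != dim:
            raise ValueError("descriptor length does not match dim")
    count = len(descriptors)
    return [sum(d[i] for d in descriptors) / count for i in range(dim)]


# Two hypothetical 4-dimensional descriptors (real SIFT descriptors have 128).
second_feature_vector = pool_descriptors(
    [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 0.0, 4.0]], dim=4
)
print(second_feature_vector)  # [2.0, 0.0, 1.0, 2.0]
```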

The server 104 may include suitable logic, circuitry, interfaces, and/or code that may be configured to store the DNN model 108 and the image feature detection model 110. The server 104 may use the DNN model 108 to generate a first feature vector associated with a first image, and may use the image feature detection model 110 to generate a second feature vector associated with the first image. The server 104 may further store a machine learning model that is different from the DNN model 108 and the image feature detection model 110. This machine learning model may be configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector. In an exemplary embodiment, the server 104 may be implemented as a cloud server and may execute operations through web applications, cloud applications, HTTP requests, repository operations, file transfer, and the like. Other example implementations of the server 104 may include, but are not limited to, a database server, a file server, a web server, an application server, a mainframe server, or a cloud computing server.
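The disclosure states that a separate machine learning model may determine a first weight for the first feature vector and a second weight for the second feature vector. It does not say how the weights are applied. The sketch below assumes that each vector is scaled by its weight before concatenation. Fixed numbers stand in for the model's output, and the function name is hypothetical.

```python
from typing import List


def weighted_concatenate(first: List[float], second: List[float],
                         first_weight: float, second_weight: float) -> List[float]:
    """Scale each feature vector by its weight, then join them end to end."""
    if first_weight < 0 or second_weight < 0:
        raise ValueError("weights must be non-negative")
    return [first_weight * x for x in first] + [second_weight * x for x in second]


# Hypothetical weights, e.g. favouring low-level features for a blurry query.
third = weighted_concatenate([1.0, 2.0], [4.0, 8.0], first_weight=0.25, second_weight=0.75)
print(third)  # [0.25, 0.5, 3.0, 6.0]
```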

In at least one embodiment, the server 104 may be implemented as a plurality of distributed cloud-based resources by use of several technologies that are well known to those ordinarily skilled in the art. A person with ordinary skill in the art will understand that the scope of the disclosure is not limited to implementation of the server 104 and the electronic device 102 as two separate entities. In certain embodiments, the functionalities of the server 104 may be incorporated, in their entirety or at least partially, into the electronic device 102 without departing from the scope of the disclosure.

The database 106 may include suitable logic, interfaces, and/or code that may be configured to store the training dataset 112 for the DNN model 108. The training dataset 112 may include a pre-stored set of training images and a predetermined tag assigned to each image of the pre-stored set of training images. The predetermined tag assigned to a particular training image may include a label that corresponds to an image feature predetermined for that training image. The DNN model 108 may be pre-trained on the image feature detection task based on the training dataset 112. In an embodiment, the database 106 may be further configured to store the pre-stored second set of images. The database 106 may be a relational or a non-relational database. In some cases, the database 106 may be stored on a server (for example, the server 104), such as a cloud server, or may be cached and stored on the electronic device 102. Additionally or alternatively, the database 106 may be implemented using hardware, including a processor, a microprocessor (for example, to perform or control performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). In some other cases, the database 106 may be implemented using a combination of hardware and software.
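The sketch below shows one possible in-memory shape for the training dataset 112. Each pre-stored training image carries its predetermined tags, and the tags are turned into multi-hot target vectors for training. Treating the task as multi-label is an assumption for illustration, as are the class and field names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrainingSample:
    """One pre-stored training image and the tags assigned to it in advance."""
    image_id: str
    tags: List[str] = field(default_factory=list)


def build_targets(samples: List[TrainingSample]) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """Map tags to indices and encode each sample's tags as a multi-hot vector."""
    vocabulary: Dict[str, int] = {}
    for sample in samples:
        for tag in sample.tags:
            vocabulary.setdefault(tag, len(vocabulary))  # first-seen order
    targets: Dict[str, List[int]] = {}
    for sample in samples:
        row = [0] * len(vocabulary)
        for tag in sample.tags:
            row[vocabulary[tag]] = 1
        targets[sample.image_id] = row
    return vocabulary, targets


samples = [
    TrainingSample("img_001", ["face", "eyes"]),
    TrainingSample("img_002", ["eyes", "hair"]),
]
vocabulary, targets = build_targets(samples)
print(vocabulary)          # {'face': 0, 'eyes': 1, 'hair': 2}
print(targets["img_002"])  # [0, 1, 1]
```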

The communication network 114 may include a communication medium through which the electronic device 102, the server 104, and the database 106 may communicate with one another. Examples of the communication network 114 may include, but are not limited to, the Internet, a cloud network, a Long Term Evolution (LTE) network, a Wireless Local Area Network (WLAN), a Local Area Network (LAN), a plain old telephone service (POTS) line, and/or a Metropolitan Area Network (MAN). Various devices in the network environment 100 may be configured to connect to the communication network 114 in accordance with various wired and wireless communication protocols. Examples of such wired and wireless communication protocols may include, but are not limited to, at least one of Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), ZigBee, EDGE, IEEE 802.11, Light Fidelity (Li-Fi), 802.16, IEEE 802.11s, IEEE 802.11g, multi-hop communication, wireless access point (AP), device-to-device communication, cellular communication protocols, or Bluetooth (BT) communication protocols, or a combination thereof.

In operation, the electronic device 102 may initiate a reverse image search query. In an embodiment, the reverse image search may be initiated based on a user input received via a display device (shown in FIG. 2). The electronic device 102 may be configured to receive a first image as an image search query upon initiation of the reverse image search. For example, the first image may correspond to an image uploaded, based on a user input, through an I/O device (shown in FIG. 2) of the electronic device 102. The first image may be a still image with fixed foreground or background objects, or an image extracted from a video. The electronic device 102 may be configured to extract, by the DNN model 108, a first set of image features associated with the received first image. The electronic device 102 may be further configured to extract, by the image feature detection model 110, a second set of image features associated with the received first image. Details of the first set and the second set of image features are provided, for example, in FIG. 3. Examples of the image feature detection model 110 may include, but are not limited to, a Scale-Invariant Feature Transform (SIFT)-based model, a Speeded-Up Robust Features (SURF)-based model, an Oriented FAST and Rotated BRIEF (ORB)-based model, or a Fast Library for Approximate Nearest Neighbors (FLANN)-based model.

The electronic device 102 may be further configured to generate a first feature vector associated with the received first image based on the extracted first set of image features, and a second feature vector associated with the received first image based on the extracted second set of image features. The first feature vector may include information about the first set of image features, and the second feature vector may include information about the second set of image features. The electronic device 102 may be further configured to generate a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector. The third feature vector may include, but is not limited to, the generated first feature vector and the generated second feature vector. In an embodiment, the generation of the third feature vector may be further based on the application of a principal component analysis (PCA) transform to the combination of the generated first and second feature vectors. Details of the generation of the third feature vector are described, for example, in FIG. 3.
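The PCA step mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the disclosure does not say how the PCA transform is fitted, so here it is assumed to be fitted offline, via NumPy's SVD, on the concatenated vectors of the pre-stored images and then applied to each query. The helper names `fit_pca` and `third_feature_vector` and the choice of 64 components are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on the rows of X (n_samples x n_features) using an SVD.

    Returns (mean, components); components has shape (n_components, n_features)
    and its rows are orthonormal principal axes, largest variance first.
    """
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def third_feature_vector(first_vec, second_vec, mean, components):
    """Concatenate the two per-image vectors, then project onto the PCA basis."""
    combined = np.concatenate([np.ravel(first_vec), np.ravel(second_vec)])
    return (combined - mean) @ components.T

# Usage: fit on 100 stored images (random stand-ins for real 2048 + 2048 vectors).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 4096))
mean, components = fit_pca(gallery, n_components=64)
query = third_feature_vector(rng.normal(size=2048), rng.normal(size=2048), mean, components)
print(query.shape)  # (64,)
```

Because the same mean and components would be applied to the query and to every stored image, the reduced vectors share one coordinate system and can be compared directly.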

The electronic device 102 may be configured to determine a similarity metric between the generated third feature vector associated with the received first image and a fourth feature vector of each image of a pre-stored second set of images (for example, images stored in the database 106). Examples of the similarity metric may include, but are not limited to, a cosine-distance-based similarity or a Euclidean-distance-based similarity. Based on the determined similarity metric, the electronic device 102 may be configured to identify, from the pre-stored second set of images, a pre-stored third image (for example, an image that is identical or similar to the received first image). The electronic device 102 may be further configured to control a display device to display information associated with the identified pre-stored third image. Details of the determination of the similarity metric and the identification of the pre-stored third image are further described, for example, in FIG. 3.
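For illustration, the similarity comparison can be sketched with NumPy as below. The function names and the top-k ranking are assumptions made for this sketch. Cosine similarity is ranked highest first and Euclidean distance lowest first, because a smaller distance means a closer match.

```python
import numpy as np

def cosine_similarity(query, gallery):
    """Cosine similarity between one query vector and each row of gallery."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

def euclidean_distance(query, gallery):
    """Euclidean distance from the query to each row of gallery (lower is closer)."""
    return np.linalg.norm(gallery - query, axis=1)

def search(query, gallery, metric="cosine", top_k=5):
    """Return indices of the top_k most similar stored images, best match first."""
    if metric == "cosine":
        order = np.argsort(-cosine_similarity(query, gallery))
    elif metric == "euclidean":
        order = np.argsort(euclidean_distance(query, gallery))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return order[:top_k]

# Usage: the query is a slightly perturbed copy of stored image 17.
rng = np.random.default_rng(1)
stored = rng.normal(size=(50, 4096))                # fourth feature vectors
query = stored[17] + 0.01 * rng.normal(size=4096)   # third feature vector
print(search(query, stored)[0])  # 17
```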

FIG. 2 is a block diagram that illustrates an exemplary electronic device for reverse image search based on a deep neural network (DNN) model and an image feature detection model, in accordance with an embodiment of the disclosure. FIG. 2 is explained in conjunction with elements from FIG. 1. With reference to FIG. 2, there is shown a block diagram 200 of the electronic device 102. The electronic device 102 may include circuitry 202, a memory 204, an input/output (I/O) device 206, and a network interface 208. The I/O device 206 may further include a display device 210. The network interface 208 may connect the electronic device 102 to the server 104 and the database 106 via the communication network 114.

The circuitry 202 may include suitable logic, circuitry, and/or interfaces that may be configured to execute program instructions associated with different operations to be performed by the electronic device 102. The circuitry 202 may include one or more specialized processing units, which may be implemented as separate processors. In an embodiment, the one or more specialized processing units may be implemented as an integrated processor or a cluster of processors that collectively perform the functions of the one or more specialized processing units. The circuitry 202 may be implemented based on a number of processor technologies known in the art. Examples of implementations of the circuitry 202 may be an x86-based processor, a graphics processing unit (GPU), a reduced instruction set computing (RISC) processor, an application-specific integrated circuit (ASIC) processor, a complex instruction set computing (CISC) processor, a microcontroller, a central processing unit (CPU), and/or other control circuits.

The memory 204 may include suitable logic, circuitry, interfaces, and/or code that may be configured to store the program instructions to be executed by the circuitry 202. In at least one embodiment, the memory 204 may be configured to store the DNN model 108 and the image feature detection model 110. The memory 204 may be further configured to store one or more of, but not limited to, the similarity metric, a machine learning model different from the DNN model 108 and the image feature detection model 110 (for example, the machine learning model 316 of FIG. 3), a first weight associated with the generated first feature vector, and a second weight associated with the generated second feature vector. In an embodiment, the memory 204 may store the first image and the identified pre-stored third image. Examples of implementations of the memory 204 may include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), a hard disk drive (HDD), a solid-state drive (SSD), a CPU cache, and/or a secure digital (SD) card.

The I/O device 206 may include suitable logic, circuitry, interfaces, and/or code that may be configured to receive an input and provide an output based on the received input. The I/O device 206 may include various input and output devices, which may be configured to communicate with the circuitry 202. In an example, the electronic device 102 may receive (via the I/O device 206) a user input that includes a reverse image search query, which may include the first image. In another example, the electronic device 102 may receive (via the I/O device 206) a user input that includes the first weight associated with the generated first feature vector and the second weight associated with the generated second feature vector. The electronic device 102 may control the I/O device 206 to output the identified pre-stored third image. Examples of the I/O device 206 may include, but are not limited to, a touch screen, a keyboard, a mouse, a joystick, a display device (for example, the display device 210), a microphone, or a speaker.

The display device 210 may include suitable logic, circuitry, and interfaces that may be configured to display an output of the electronic device 102. The display device 210 may be utilized to display information associated with the identified pre-stored third image. In some embodiments, the display device 210 may be an external display device coupled to the electronic device 102. The display device 210 may be a touch screen that enables the user 116 to provide a user input via the display device 210. The touch screen may be at least one of a resistive touch screen, a capacitive touch screen, a thermal touch screen, or any other touch screen through which input may be provided to the display device 210. The display device 210 may be realized through several known technologies, such as, but not limited to, at least one of a liquid crystal display (LCD), a light-emitting diode (LED) display, a plasma display, an organic LED (OLED) display, or other display technologies.

The network interface 208 may include suitable logic, circuitry, interfaces, and/or code that may be configured to facilitate communication among the electronic device 102, the server 104, and the database 106 via the communication network 114. The network interface 208 may be implemented by use of various known technologies to support wired or wireless communication of the electronic device 102 with the communication network 114. The network interface 208 may include, but is not limited to, an antenna, a radio frequency (RF) transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a coder-decoder (CODEC) chipset, a subscriber identity module (SIM) card, or local buffer circuitry.

The network interface 208 may be configured to communicate, via wired communication, wireless communication, or a combination thereof, with networks such as the Internet, an intranet, a wireless network, a cellular telephone network, a wireless local area network (WLAN), or a metropolitan area network (MAN). The wireless communication may be configured to use one or more of a plurality of communication standards, protocols, and technologies, such as Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), wideband code division multiple access (W-CDMA), Long Term Evolution (LTE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (such as IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, or IEEE 802.11n), voice over Internet Protocol (VoIP), Light Fidelity (Li-Fi), Worldwide Interoperability for Microwave Access (Wi-MAX), a protocol for email, instant messaging, and Short Message Service (SMS).

The operations of the circuitry 202 are further described, for example, in FIGS. 3 and 4. It should be noted that the electronic device 102 shown in FIG. 2 may include various other components or systems. The description of such other components or systems of the electronic device 102 is omitted from the disclosure for the sake of brevity.

FIG. 3 is a diagram that illustrates exemplary operations for reverse image search based on a deep neural network (DNN) model and an image feature detection model, in accordance with an embodiment of the disclosure. FIG. 3 is explained in conjunction with elements from FIGS. 1 and 2. With reference to FIG. 3, there is shown a block diagram 300 that illustrates exemplary operations 302 to 314 for reverse image search based on the DNN model 108 and the image feature detection model 110. The exemplary operations may be executed by any computing system, for example, by the electronic device 102 of FIG. 1 or by the circuitry 202 of FIG. 2.

At 302, a first image may be received. In an embodiment, the circuitry 202 may be configured to receive the first image, for example, a first image 302A. The first image 302A may be received from a data source, such as a persistent storage (such as the memory 204) on the electronic device 102, an image capture device, a cloud server, or a combination thereof. The first image 302A may include an object of interest for which the user 116 may want to retrieve similar or identical image results through a reverse image search. Alternatively, the first image 302A may correspond to an image from a sequence of images of a first video, in which case the circuitry 202 may be configured to extract the first image 302A from the first video. The first video may include an object of interest for which the user 116 may want to retrieve similar or identical video results through a reverse image search. The first image 302A may represent an image with a fixed foreground or background. For example, as shown in FIG. 3, the first image 302A may represent a scene from a movie (for example, an image of Spider-Man as the object of interest).

After the reception of the first image 302A, the circuitry 202 may input the received first image 302A to each of the DNN model 108 and the image feature detection model 110 for image feature extraction. The circuitry 202 may use the DNN model 108 to extract a first set of image features associated with the first image 302A, as described, for example, at 304. Further, the circuitry 202 may use the image feature detection model 110 to extract a second set of image features associated with the first image 302A, as described, for example, at 306. The operations 304 and 306 may be executed in any order without departing from the scope of the disclosure.

At 304, a first set of image features may be extracted. In an embodiment, the circuitry 202 may be configured to extract, by use of a deep neural network (DNN) model (such as the DNN model 108), the first set of image features associated with the received first image 302A. The extracted first set of image features may correspond to unique features associated with one or more objects in the received first image 302A. The DNN model 108 may be pre-trained for an image feature extraction task based on the training dataset 112, which may include a pre-stored set of training images, each assigned a predetermined tag. The predetermined tag assigned to a particular training image may include labels that correspond to image features predetermined for that training image. The circuitry 202 may feed the first image 302A as an input to the DNN model 108 and may receive the first set of image features (that is, the set of image features associated with the first image 302A) as an output of the DNN model 108, based on an image feature detection task performed by the DNN model 108 on the first image 302A.

In an embodiment, the extracted first set of image features may include information required to classify each object included in the first image 302A into a particular object class. Examples of the extracted first set of image features may include, but are not limited to, shape, texture, color, and other high-level image features. For example, as shown in FIG. 3, the extracted first set of image features 304A associated with the first image 302A may include a color, depicted as a shade of gray, on the object of interest (such as the face of Spider-Man). For example, in case the face of a person (such as Spider-Man or another person or character) is the object of interest in the first image 302A, the first set of image features 304A may include the shape of the eyes, the shape of the ears, the shape of the nose, and the shape or texture of other high-level facial details of the person or character. A detailed implementation of the extraction of the first set of image features by the DNN model 108 may be known to one skilled in the art; therefore, a detailed description of such extraction is omitted from the disclosure for the sake of brevity.

The circuitry 202 may be configured to generate a first feature vector associated with the received first image 302A based on the extracted first set of image features. Such a first feature vector may also be referred to as a unique first set of image features. The generated first feature vector may include a plurality of vector elements, each of which may correspond to an image feature from the extracted first set of image features; that is, each vector element may store a value that corresponds to a particular first image feature. For example, in case the received first image 302A is a high-definition image (for example, an image of "1024x1024" pixels), the first feature vector may be a "1x2048" vector that has 2048 vector elements. The i-th element of the first feature vector may represent the value of the i-th first image feature.
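The disclosure does not name the architecture of the DNN model 108. One plausible way a 1024x1024 image could yield a 1x2048 vector is that ResNet-50-style backbones end in a 2048-channel feature map, and global average pooling reduces each channel to a single value regardless of the spatial size. The sketch below assumes that design; the function name and the 32x32 map size (a 1024x1024 input at an assumed output stride of 32) are hypothetical.

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse a (C, H, W) feature map into a 1 x C vector by averaging
    each channel over all of its spatial positions."""
    channels = feature_map.shape[0]
    return feature_map.reshape(channels, -1).mean(axis=1).reshape(1, channels)

# Usage: a stand-in for the last feature map of a 1024x1024 input
# (2048 channels at an assumed output stride of 32, so 32x32 positions).
rng = np.random.default_rng(2)
feature_map = rng.random((2048, 32, 32))
first_vector = global_average_pool(feature_map)
print(first_vector.shape)  # (1, 2048)
```

Under this assumption, a smaller input would only shrink the spatial map, so the vector length stays at 2048.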

At 306, a second set of image features may be extracted. In an embodiment, the circuitry 202 may be configured to extract, by use of an image feature detection model (such as the image feature detection model 110), the second set of image features associated with the received first image 302A. The extracted second set of image features may correspond to certain unique features associated with one or more objects included in the received first image 302A. In some embodiments, the second set of image features may include image features that the DNN model 108 may have misdetected or left undetected (for example, at 304). In an embodiment, the extracted second set of image features may include information required to optimally classify each object (in the first image 302A) into a particular object class. Examples of the second set of image features may include, but are not limited to, edges, lines, contours, and other low-level image features. Examples of the image feature detection model 110 may include, but are not limited to, a scale-invariant feature transform (SIFT)-based model, a speeded-up robust features (SURF)-based model, an oriented FAST and rotated BRIEF (ORB)-based model, or a fast library for approximate nearest neighbors (FLANN)-based model. Detailed implementations of these exemplary methods may be known to one skilled in the art; therefore, detailed descriptions of such methods are omitted from the disclosure for the sake of brevity. For example, as shown in FIG. 3, a second set of image features 306A associated with the first image 302A represents a second set of image features extracted based on a SIFT-based model, and a second set of image features 306B associated with the first image 302A represents a second set of image features extracted based on a SURF-based model. For example, in case the face of a person (such as Spider-Man or any other person or character) in the first image 302A is the object of interest, the second set of image features 306A (or the second set of image features 306B) may include edges and contours of the eyes, edges and contours of the ears, edges and contours of the nose, and other low-level facial details of the person or character.

The circuitry 202 may be further configured to generate a second feature vector associated with the received first image 302A based on the extracted second set of image features. Such a second feature vector may also be referred to as a unique second set of image features. The generated second feature vector may include a plurality of vector elements, each of which may correspond to an image feature from the extracted second set of image features; that is, each vector element may store a value that corresponds to a particular second image feature. For example, in case the received first image 302A is a high-definition image (for example, an image of "1024x1024" pixels), the second feature vector may be a "1x2048" vector that has 2048 vector elements. The i-th element of the second feature vector may represent the value of the i-th second image feature.
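Local detectors such as SIFT or SURF return a variable number of descriptors per image (128 values each for SIFT), so some aggregation step is needed to obtain a fixed-length vector such as "1x2048"; the disclosure does not specify one. The sketch below assumes a bag-of-visual-words histogram over a hypothetical codebook of 2048 visual words (for example, learned offline with k-means). In practice the descriptors might come from OpenCV's `cv2.SIFT_create().detectAndCompute(...)`; here they are plain NumPy arrays so that the example runs without OpenCV.

```python
import numpy as np

def bovw_vector(descriptors, codebook):
    """Aggregate N local descriptors (N x D) into an L2-normalised 1 x K
    histogram over a codebook of K visual words (K x D)."""
    k = codebook.shape[0]
    if len(descriptors) == 0:
        return np.zeros((1, k))  # no keypoints detected
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = (
        (descriptors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * descriptors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    words = d2.argmin(axis=1)  # index of the nearest visual word
    hist = np.bincount(words, minlength=k).astype(float)
    return (hist / np.linalg.norm(hist)).reshape(1, k)

# Usage with random stand-ins for 300 SIFT descriptors and a 2048-word codebook.
rng = np.random.default_rng(3)
codebook = rng.random((2048, 128))
descriptors = rng.random((300, 128))
second_vector = bovw_vector(descriptors, codebook)
print(second_vector.shape)  # (1, 2048)
```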

At 308, the feature vectors may be combined. In an embodiment, the circuitry 202 may be configured to generate a third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. In an embodiment, the circuitry 202 may be configured to combine the generated first feature vector and the generated second feature vector automatically. For example, in case the received first image 302A is a high-definition image (for example, an image of "1024x1024" pixels), each of the first feature vector and the second feature vector may be a "1x2048" vector that has 2048 vector elements. In such a case, the third feature vector may be a "1x4096" vector that has 4096 (that is, 2048 + 2048) vector elements.
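The 2048 + 2048 = 4096 arithmetic above corresponds to a simple concatenation, sketched below. L2-normalizing each part before concatenating is an added assumption, not something the disclosure states; it is a common way to keep one vector's scale from dominating later distance computations, and it can be switched off with `normalize=False`. The helper names are hypothetical.

```python
import numpy as np

def _l2_normalize(v):
    """Scale v to unit length; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def combine(first_vec, second_vec, normalize=True):
    """Concatenate two feature vectors into a single 1 x (N1 + N2) vector,
    optionally L2-normalising each part first."""
    parts = [np.ravel(first_vec).astype(float), np.ravel(second_vec).astype(float)]
    if normalize:
        parts = [_l2_normalize(p) for p in parts]
    return np.concatenate(parts).reshape(1, -1)

# Usage: two 1x2048 vectors become one 1x4096 vector.
first = np.ones((1, 2048))
second = np.full((1, 2048), 3.0)
third = combine(first, second)
print(third.shape)  # (1, 4096)
```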

In an embodiment, the circuitry 202 may be configured to determine, by use of a machine learning model 316 (that is, a machine learning model different from the DNN model 108 and the image feature detection model 110), a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector. The machine learning model 316 may be a regression model that may be trained on sets of similarly sized feature vectors, each of which may be tagged with a user-defined weight value. The machine learning model 316 may be trained on a feature vector weight assignment task, in which the machine learning model 316 may receive two similarly sized sets of feature vectors as an input and may output a weight for each of the two sets. The machine learning model 316 may be defined by hyperparameters such as the number of weights, a cost function, an input size, and the number of layers. The hyperparameters of the machine learning model 316 may be tuned, and the weights may be updated accordingly, so as to move toward a global minimum of the cost function of the machine learning model 316. After several epochs of training based on feature information in a training dataset, the machine learning model 316 may be trained to output weight values for a set of inputs. The output may indicate a weight value for each input of the set of inputs (for example, the first feature vector and the second feature vector).
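The weight-assignment model can be pictured with the following minimal NumPy sketch. It is a hypothetical stand-in, not the disclosed machine learning model 316: a single linear layer with a two-way softmax, trained by full-batch gradient descent on soft-label cross-entropy against user-defined target weights. The softmax guarantees that the two output weights lie between 0 and 1 and sum to 1, which matches the example weight values given in this description; the class name, learning rate, and epoch count are assumptions.

```python
import numpy as np

class PairWeightModel:
    """Hypothetical stand-in for the weight-assignment model: one linear layer
    plus a two-way softmax that maps a pair of same-sized feature vectors to
    two weights in (0, 1) that sum to 1."""

    def __init__(self, dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(2, 2 * dim))
        self.b = np.zeros(2)
        self.lr = lr

    def _forward(self, X):
        logits = X @ self.W.T + self.b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(logits)
        return e / e.sum(axis=1, keepdims=True)

    def predict(self, first_vec, second_vec):
        """Return [first_weight, second_weight] for one pair of vectors."""
        x = np.concatenate([first_vec, second_vec])[None, :]
        return self._forward(x)[0]

    def fit(self, V1, V2, targets, epochs=200):
        """Full-batch gradient descent on soft-label cross-entropy.
        V1, V2: (n, dim) arrays; targets: (n, 2) user-defined weights (rows sum to 1)."""
        X = np.hstack([V1, V2])
        losses = []
        for _ in range(epochs):
            P = self._forward(X)
            losses.append(-np.mean(np.sum(targets * np.log(P + 1e-12), axis=1)))
            grad = (P - targets) / len(X)  # gradient of the loss w.r.t. the logits
            self.W -= self.lr * grad.T @ X
            self.b -= self.lr * grad.sum(axis=0)
        return losses

# Usage on synthetic data: pairs whose first vector has a positive first
# component are tagged (0.6, 0.4); all others are tagged (0.4, 0.6).
rng = np.random.default_rng(4)
V1, V2 = rng.normal(size=(200, 8)), rng.normal(size=(200, 8))
targets = np.where(V1[:, :1] > 0, [0.6, 0.4], [0.4, 0.6])
model = PairWeightModel(dim=8)
losses = model.fit(V1, V2, targets)
```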

The machine learning model 316 may include electronic data, which may be implemented, for example, as a software component of an application executable on the electronic device 102. The machine learning model 316 may rely on libraries, external scripts, or other logic or instructions for execution by a computing device that includes a processor, such as the circuitry 202. The machine learning model 316 may include code and routines configured to enable a computing device that includes a processor, such as the circuitry 202, to perform one or more operations to determine the first weight associated with the first feature vector and the second weight associated with the second feature vector. Additionally or alternatively, the machine learning model 316 may be implemented by use of hardware that includes a processor, a microprocessor (for example, to perform or control the performance of one or more operations), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC). Alternatively, in some embodiments, the machine learning model 316 may be implemented by use of a combination of hardware and software.

Each of the first weight (associated with the generated first feature vector) and the second weight (associated with the generated second feature vector) may indicate how reliable the respective feature vector is likely to be for identifying an object of interest in an image, and thus for identifying similar images that may contain the object of interest. The first weight and the second weight may each specify a confidence value for this reliability as a probability value between 0 and 1, so a more reliable feature vector may have a higher weight value. For example, if the received first image 302A is a high-resolution image and the extracted first image feature set is more precise than the extracted second image feature set, the generated first feature vector may be more reliable than the generated second feature vector. In such a case, the first weight may have a higher value (e.g., 0.6) than the second weight (e.g., 0.4). In contrast, if the received first image 302A is a low-resolution image and the extracted second image feature set is more precise than the extracted first image feature set, the generated second feature vector may be more reliable than the generated first feature vector. In such a case, the first weight may have a lower value (e.g., 0.4) than the second weight (e.g., 0.6). Furthermore, if the received first image 302A is a medium-resolution image (e.g., a standard-resolution image), the first weight and the second weight may be equal (e.g., 0.5 each). The circuit 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight, and to generate a third feature vector based on the combination.
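The resolution-based weighting rule can be illustrated with a minimal Python sketch using NumPy. The pixel-count cut-offs, the function names, and the use of weighted concatenation as the "combination" are assumptions for illustration; the disclosure does not specify how resolution is measured or how the weighted vectors are joined.

```python
import numpy as np
from typing import Tuple

# Illustrative pixel-count cut-offs (assumptions, not values from the disclosure).
HIGH_RES_PIXELS = 1920 * 1080
LOW_RES_PIXELS = 640 * 480


def resolution_weights(height: int, width: int) -> Tuple[float, float]:
    """Return (w1, w2) for the DNN vector and the feature-detector vector."""
    pixels = height * width
    if pixels >= HIGH_RES_PIXELS:
        return 0.6, 0.4  # high resolution: favour the DNN (high-level) features
    if pixels <= LOW_RES_PIXELS:
        return 0.4, 0.6  # low resolution: favour the low-level detector features
    return 0.5, 0.5      # medium / standard resolution: equal weights


def combine(v1: np.ndarray, v2: np.ndarray, w1: float, w2: float) -> np.ndarray:
    """Weighted concatenation of two L2-normalised feature vectors."""
    v1 = v1 / (np.linalg.norm(v1) + 1e-12)
    v2 = v2 / (np.linalg.norm(v2) + 1e-12)
    return np.concatenate([w1 * v1, w2 * v2])
```

For example, `resolution_weights(2160, 3840)` returns `(0.6, 0.4)` for a 4K frame. Normalising each vector before weighting is a design choice that keeps one vector from dominating the third feature vector merely because its raw magnitude is larger.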

In an embodiment, the circuit 202 may be configured to receive a user input that includes the first weight associated with the generated first feature vector and the second weight associated with the generated second feature vector. In one example, the received user input may indicate the first weight as "0.4", in which case the circuit 202 may be configured to determine the second weight as "0.6". In another example, the received user input may indicate both the first weight and the second weight as "0.5". The circuit 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the received user input, and to generate the third feature vector based on the combination.
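A user-supplied weight can be validated, and its complement inferred, as in the following sketch. The function name and the rule that a missing second weight defaults to `1 - w1` are assumptions consistent with the 0.4/0.6 example, not a documented interface.

```python
from typing import Optional, Tuple


def weights_from_user(w1: float, w2: Optional[float] = None) -> Tuple[float, float]:
    """Validate user-supplied weights; if only w1 is given, infer w2 = 1 - w1."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    if w2 is None:
        w2 = 1.0 - w1  # complementary weight, as in the 0.4 -> 0.6 example
    if not 0.0 <= w2 <= 1.0:
        raise ValueError("w2 must lie in [0, 1]")
    return w1, w2
```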

In an embodiment, the circuit 202 may be configured to classify, by use of the DNN model 108, the received first image 302A into a first image tag from a set of image tags associated with the DNN model 108. The first image tag may specify the tag to which the received first image 302A belongs; for example, the received first image 302A may have an image tag such as a Spider-Man character. The circuit 202 may be further configured to determine a first image count associated with the first image tag in a training dataset (such as the training dataset 112) associated with the DNN model 108. For example, the circuit 202 may compare the first image tag with the image tags of the pre-stored training images in the training dataset 112. Each time the image tag of a pre-stored training image matches the first image tag, the circuit 202 increments the first image count by one; repeating this comparison for every pre-stored training image in the training dataset 112 yields the first image count. The circuit 202 may be further configured to determine the first weight associated with the generated first feature vector and the second weight associated with the generated second feature vector based on the determined first image count. For example, if the first image count in the training dataset 112 is higher than a certain threshold (e.g., a threshold count or a threshold percentage of all images in the training dataset 112), the first weight may have a higher value than the second weight. In contrast, if the first image count is lower than the threshold or is only a nominal value (e.g., an easily negligible value, such as a few hundred images in a training dataset 112 of one million images), the first weight may have a lower value than the second weight. The circuit 202 may be configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight, and to generate the third feature vector based on the combination.
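The tag-count rule can be sketched as follows. The 0.1% threshold and the fixed 0.6/0.4 split are illustrative assumptions (the disclosure only requires a threshold count or percentage); with one million training images, a tag with a few hundred examples falls below it.

```python
from typing import Iterable, Tuple


def tag_count(first_tag: str, training_tags: Iterable[str]) -> int:
    """Count training images whose tag matches the tag predicted for the query."""
    count = 0
    for tag in training_tags:
        if tag == first_tag:
            count += 1  # increment by one for each matching training image
    return count


def weights_from_tag_count(count: int, total: int,
                           min_fraction: float = 0.001) -> Tuple[float, float]:
    """Favour the DNN vector only when the predicted tag is well represented."""
    if total <= 0:
        raise ValueError("total must be positive")
    if count / total > min_fraction:
        return 0.6, 0.4  # well-represented tag: trust the DNN features more
    return 0.4, 0.6      # rare tag: lean on the low-level detector features
```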

In some embodiments, the circuit 202 may be configured to determine an image quality score associated with the received first image 302A based on at least one of the extracted first image feature set or the extracted second image feature set. The image quality score may indicate a qualitative value associated with the fidelity of the received first image 302A; a higher image quality score may indicate higher fidelity. The image quality score may correspond to, but is not limited to, sharpness, noise, dynamic range, tone reproduction, contrast, saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moiré, or artifacts associated with the received first image 302A.

Sharpness may correspond to the detail of the image features in the received first image 302A; for example, if the pixel count or focus of the received first image 302A is high, its sharpness may be high. Noise may correspond to disturbances in the received first image 302A, such as undesired pixel-level variations. Dynamic range may correspond to the tonal difference between the lightest and darkest shades of light captured in the received first image 302A. Tone reproduction may correspond to the correlation between the amount of light captured in the received first image 302A and the amount of light to which the received first image 302A was exposed. Contrast may correspond to the amount of color variation in the received first image 302A. Saturation may correspond to the intensity of colors in the received first image 302A. Distortion may correspond to undesired pixel variations in the received first image 302A. Vignetting may correspond to darkening, reduced sharpness, or reduced saturation toward the corners of the received first image 302A compared with its center. Exposure accuracy may correspond to capturing the received first image 302A at an optimal brightness. Chromatic aberration may correspond to color distortions in the received first image 302A. Lens flare may correspond to the response of the image capture device to bright light. Color moiré may correspond to repetitive color stripes that appear in the received first image 302A. Artifacts may correspond to any virtual (spurious) objects that may be present in the received first image 302A.
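The disclosure derives the quality score from the extracted feature sets without giving a formula. As a stand-in, the sketch below computes only one of the listed factors, sharpness, as the variance of a discrete Laplacian in the pixel domain. This is a common heuristic swapped in for illustration, not the disclosed method, and the squashing constant `scale` is an arbitrary assumption.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness proxy: variance of a 4-neighbour discrete Laplacian."""
    g = gray.astype(np.float64)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]    # vertical neighbours
           + g[1:-1, :-2] + g[1:-1, 2:])   # horizontal neighbours
    return float(lap.var())


def quality_score(gray: np.ndarray, scale: float = 100.0) -> float:
    """Squash the sharpness proxy into [0, 1); blurrier images score lower."""
    v = laplacian_variance(gray)
    return v / (v + scale)
```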

The circuit 202 may be further configured to determine the first weight associated with the generated first feature vector and the second weight associated with the generated second feature vector based on the determined image quality score. For example, if the determined image quality score is high, the first weight may be assigned a higher value than the second weight. In contrast, if the determined image quality score is low or nominal, the first weight may be assigned a lower value than the second weight. In an embodiment, if the image quality score is above a threshold, the determined first weight may be higher than the determined second weight. The threshold may be an image quality score such as "0.4", "0.6", or "0.8". In an embodiment, the circuit 202 may be configured to receive a user input that sets the image quality score threshold; in another embodiment, the circuit 202 may be configured to set the threshold automatically. The circuit 202 may be configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight, and may then generate the third feature vector based on that combination.
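Mapping the quality score to the two weights with an adjustable threshold might look like this. The default threshold of 0.6 is one of the example values in the text; the 0.6/0.4 split and the function name are assumptions.

```python
from typing import Tuple


def weights_from_quality(score: float, threshold: float = 0.6,
                         favoured: float = 0.6) -> Tuple[float, float]:
    """Give the DNN vector the larger weight only when quality exceeds the threshold."""
    if not 0.5 <= favoured <= 1.0:
        raise ValueError("favoured weight must lie in [0.5, 1]")
    if score > threshold:
        return favoured, 1.0 - favoured  # good quality: trust high-level features
    return 1.0 - favoured, favoured      # low or nominal quality: trust low-level features
```

The threshold could come from a user input or be set automatically, matching the two embodiments above.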

At 310, dimensionality may be reduced. In an embodiment, the circuit 202 may be configured to reduce the dimensionality of the generated third feature vector. In some embodiments, the circuit 202 may resize (or compress) the generated third feature vector to match the size of the input layer of the feature extractor, and pass the resized third feature vector to that input layer. This may remove undesirable or repetitive information from the generated third feature vector. The generation of the third feature vector may further be based on applying a principal component analysis (PCA) transform to the combination of the generated first feature vector and the generated second feature vector. For example, if the generated third feature vector is a "1x4096" vector with 4096 elements, applying the PCA transform may reduce it to a "1x256" vector with 256 elements. The detailed implementation of the PCA transform is well known to those skilled in the art, and a detailed description is therefore omitted from this disclosure for brevity.
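The 4096-to-256 reduction can be sketched with NumPy's SVD, fitted on a gallery of combined vectors and then applied to a query. The class name and the requirement to fit on at least 256 gallery vectors are assumptions of this sketch; a production system would more likely use an existing PCA implementation.

```python
import numpy as np


class PCAReducer:
    """Minimal PCA via SVD: fit on gallery vectors, then project queries."""

    def __init__(self, n_components: int = 256):
        self.n_components = n_components
        self.mean_ = None
        self.components_ = None

    def fit(self, X: np.ndarray) -> "PCAReducer":
        if X.shape[0] < self.n_components:
            raise ValueError("need at least n_components samples to fit")
        self.mean_ = X.mean(axis=0)
        # Rows of vt are orthonormal principal directions, sorted by decreasing variance.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        """Project centred vectors onto the retained principal directions."""
        return (X - self.mean_) @ self.components_.T
```

The same fitted reducer must be applied to both the query's third feature vector and every gallery vector, otherwise the reduced vectors are not comparable.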

At 312, a similarity metric may be determined. In an embodiment, the circuit 202 may be configured to determine a similarity metric between the generated third feature vector associated with the received first image 302A and a fourth feature vector of each image of the pre-stored second set of images, which may be stored in the database 106. In an embodiment, the circuit 202 may be configured to generate the fourth feature vector of each image of the pre-stored second set of images. For example, the circuit 202 may apply the DNN model 108, the image feature detection model 110, or a combination of both to each pre-stored second image to generate its fourth feature vector. In another example, the fourth feature vector of each pre-stored second image may be determined in advance and stored in the database 106 together with that image. The similarity metric may correspond to a similarity measure used to determine, from the pre-stored second set of images, images that are similar to the received first image 302A. To identify such similar images, the generated third feature vector may be compared with the fourth feature vector of each image of the pre-stored second set of images based on the determined similarity metric. Examples of the similarity metric include, but are not limited to, cosine distance similarity and Euclidean distance similarity. For cosine distance similarity, the cosine distance between the generated third feature vector and the fourth feature vector of each pre-stored second image may be determined. For example, if the fourth feature vector of a particular pre-stored second image has a small cosine distance to the generated third feature vector, that pre-stored second image may be identified as one of the images similar to the received first image 302A.
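Both example metrics can be computed for a whole gallery at once. This sketch assumes the fourth feature vectors are stacked as rows of a NumPy array; the function names are hypothetical.

```python
import numpy as np


def cosine_distance(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """1 - cosine similarity between one query vector and each gallery row."""
    q = query / (np.linalg.norm(query) + 1e-12)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-12)
    return 1.0 - g @ q


def euclidean_distance(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """L2 distance between one query vector and each gallery row."""
    return np.linalg.norm(gallery - query, axis=1)
```

Cosine distance ignores vector magnitude, so a gallery row that is a scaled copy of the query (row 2 in the test) counts as identical, while Euclidean distance does not.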

At 314, similar images may be identified. In an embodiment, the circuit 202 may be configured to identify, based on the determined similarity metric, a pre-stored third image as a similar image from the pre-stored second set of images. For example, the circuit 202 may compare, based on the similarity metric, the generated third feature vector associated with the received first image 302A with the fourth feature vector of each image of the pre-stored second set of images. If, based on the similarity metric, the fourth feature vector of a particular pre-stored second image is determined to match the generated third feature vector, the circuit 202 may identify that particular pre-stored second image as the pre-stored third image (i.e., the similar image).
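The text says a pre-stored image is identified when its fourth feature vector "matches" the third feature vector, but it does not define a match. One plausible reading, sketched below as an assumption, is to rank gallery images by distance and optionally discard candidates above a distance threshold.

```python
import numpy as np
from typing import List, Optional


def identify_similar(distances: np.ndarray, k: int = 1,
                     max_distance: Optional[float] = None) -> List[int]:
    """Indices of the k closest gallery images, optionally dropping weak matches."""
    order = np.argsort(distances, kind="stable")[:k]  # smallest distance first
    if max_distance is not None:
        order = [i for i in order if distances[i] <= max_distance]
    return [int(i) for i in order]
```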

The circuit 202 may be further configured to control a display device (such as the display device 210) to display information associated with the identified pre-stored third image. This information may include, but is not limited to, the pre-stored third image itself, metadata associated with the pre-stored third image, a feature map between the third feature vector and the fourth feature vector, the file size of the pre-stored third image, a storage location associated with the pre-stored third image, or a file download path associated with the pre-stored third image. In an embodiment, the identified pre-stored third image may correspond to a pre-stored second video that is associated with a first video. For example, the pre-stored third image may be one of a set of image frames in the pre-stored second video, and the first image 302A may be extracted from the first video. Because the pre-stored third image may be related or similar to the received first image 302A, the pre-stored second video may be related or similar to the first video.
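When a matched image is a frame of a pre-stored video, a lookup from frame to source video is needed to report the related video. The record layout, its field names, and the function below are hypothetical; the disclosure does not describe how frames are indexed.

```python
from typing import Dict, List, NamedTuple


class FrameRecord(NamedTuple):
    """Hypothetical index entry linking a pre-stored frame to its source video."""
    video_id: str
    frame_number: int
    storage_path: str


def related_videos(matched_ids: List[str],
                   frame_index: Dict[str, FrameRecord]) -> List[str]:
    """Map ranked frame matches back to distinct source videos, keeping rank order."""
    seen = set()
    videos: List[str] = []
    for image_id in matched_ids:
        record = frame_index.get(image_id)
        if record is None:
            continue  # matched image is not indexed as a video frame
        if record.video_id not in seen:
            seen.add(record.video_id)
            videos.append(record.video_id)
    return videos
```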

As an example, FIG. 3 shows information 314A associated with an identified pre-stored third image that may be identified based on the DNN model 108 and the image feature detection model 110 (such as a SIFT-based model). The information 314A may include a feature map between the generated third feature vector associated with the received first image 302A and the fourth feature vector associated with the identified pre-stored third image. As a further example, FIG. 3 shows information 314B associated with an identified pre-stored third image that may be identified based on the DNN model 108 and the image feature detection model 110 (such as a SURF-based model). The information 314B may likewise include a feature map between the generated third feature vector associated with the received first image 302A and the fourth feature vector associated with the identified pre-stored third image.

As described above, the disclosed electronic device 102 may automatically generate the third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. As a result, the third feature vector may include both the first image feature set, which may be determined by the DNN model 108, and the second image feature set, which may be determined by the image feature detection model 110. The first image feature set may include high-level image features (e.g., facial features such as the eyes, nose, ears, and hair) associated with the received first image 302A (e.g., an image of a person's face), and the second image feature set may include low-level image features (e.g., points, edges, lines, contours, or basic objects and shapes of the face). Because the third feature vector includes both high-level and low-level image features, the two feature types may complement each other when identifying similar images. In some scenarios, the first image feature set may not capture all the features present in the received first image 302A; for example, some features may be misdetected or left undetected by the DNN model 108. If, for instance, the received first image 302A is not well represented in the training dataset 112 of the DNN model 108, the first image feature set alone may not be sufficient to identify images similar to the received first image 302A from the pre-stored second set of images. However, because the second image feature set (i.e., the one determined by the image feature detection model 110) may include low-level image features of the received first image 302A, including it in the third feature vector may further improve the accuracy of identifying such similar images. Similarly, if the image quality is poor (e.g., a blurry, low-resolution image), the first image feature set (i.e., the high-level image features) may not be sufficient to identify similar images, and the second image feature set (i.e., the low-level image features) may be more useful and accurate for doing so.

FIG. 4 is a flowchart that illustrates an exemplary method for reverse image search based on a deep neural network (DNN) model and an image feature detection model, in accordance with an embodiment of the disclosure. FIG. 4 is described in conjunction with elements from FIGS. 1, 2, and 3. With reference to FIG. 4, there is shown a flowchart 400. The method illustrated in the flowchart 400 may be executed by any computing system, such as the electronic device 102 or the circuit 202. The method may start at 402 and proceed to 404.

At 404, a first image (such as the first image 302A) may be received. In one or more embodiments, the circuit 202 may be configured to receive the first image 302A. The reception of the first image 302A is further described, for example, in FIG. 3 (at 302).

At 406, a first image feature set (such as the first image feature set 304A) associated with the received first image 302A may be extracted by a deep neural network (DNN) model (such as the DNN model 108). In one or more embodiments, the circuit 202 may be configured to extract, by the DNN model 108, the first image feature set 304A associated with the received first image 302A. The extraction of the first image feature set 304A is further described, for example, in FIG. 3 (at 304).

At 408, a first feature vector associated with the received first image 302A may be generated based on the extracted first image feature set 304A. The circuit 202 may be configured to generate the first feature vector associated with the received first image 302A based on the extracted first image feature set 304A. The generation of the first feature vector is further described, for example, in FIG. 3 (at 304).

At 410, a second image feature set (such as the second image feature set 306A) associated with the received first image 302A may be extracted by an image feature detection model (such as the image feature detection model 110). In one or more embodiments, the circuit 202 may be configured to extract, by the image feature detection model 110, the second image feature set 306A associated with the received first image 302A. The image feature detection model 110 includes at least one of a scale-invariant feature transform (SIFT)-based model, a speeded-up robust features (SURF)-based model, an oriented FAST and rotated BRIEF (ORB)-based model, or a fast library for approximate nearest neighbors (FLANN)-based model. The extraction of the second image feature set 306A is further described, for example, in FIG. 3 (at 306).

At 412, a second feature vector associated with the received first image 302A may be generated based on the extracted second image feature set 306A. In one or more embodiments, the circuit 202 may be configured to generate the second feature vector associated with the received first image 302A based on the extracted second image feature set 306A. The generation of the second feature vector is further described, for example, in FIG. 3 (at 306).
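The detectors listed at 410 return a variable number of local descriptors per image, whereas a similarity comparison needs a fixed-length vector. The disclosure does not say how the second feature vector is formed from them; the sketch below assumes simple mean and max pooling over float descriptors (128-dimensional, as SIFT produces). OpenCV's `detectAndCompute` typically returns `None` for the descriptors when no keypoints are found, which the function treats as an empty set.

```python
import numpy as np


def pool_descriptors(descriptors, dim: int = 128) -> np.ndarray:
    """Turn an (N, dim) array of local descriptors into one fixed 2*dim vector.

    Mean- and max-pooling are illustrative choices, not the disclosed method.
    """
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(2 * dim)  # no keypoints detected
    d = np.asarray(descriptors, dtype=np.float64)
    if d.ndim != 2 or d.shape[1] != dim:
        raise ValueError(f"expected (N, {dim}) descriptors")
    # First half: average descriptor; second half: strongest response per dimension.
    return np.concatenate([d.mean(axis=0), d.max(axis=0)])
```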

At 414, a third feature vector associated with the received first image 302A may be generated based on a combination of the generated first feature vector and the generated second feature vector. In one or more embodiments, the circuit 202 may be configured to generate the third feature vector associated with the received first image 302A based on this combination. The generation of the third feature vector may be further based on the application of a principal component analysis (PCA) transform to the combination of the generated first feature vector and the generated second feature vector. The generation of the third feature vector is further described, for example, in FIG. 3 (at 308).

At 416, a similarity metric may be determined between the generated third feature vector associated with the received first image 302A and the fourth feature vector of each image of the pre-stored second set of images. In one or more embodiments, the circuit 202 may be configured to determine this similarity metric. In an example, the similarity metric may include, but is not limited to, at least one of cosine distance similarity or Euclidean distance similarity. The determination of the similarity metric is further described, for example, in FIG. 3 (at 312).

At 418, a pre-stored third image may be identified from the pre-stored second set of images based on the determined similarity metric. In one or more embodiments, the circuit 202 may be configured to identify the pre-stored third image from the pre-stored second set of images based on the determined similarity metric. The identification of the pre-stored third image is further described, for example, in FIG. 3 (at 314).

At 420, a display device (such as the display device 210) may be controlled to display information associated with the identified pre-stored third image. The circuit 202 may be configured to control the display device 210 to display this information. The control of the display device 210 is further described, for example, in FIG. 3 (at 314). Control may pass to end.
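Operations 404 to 418 can be strung together into one function, shown here as a sketch. The two extractor callables are hypothetical stand-ins for the DNN model 108 and the image feature detection model 110; PCA and the display step at 420 are omitted, and the gallery's fourth feature vectors are assumed to have been built with the same normalisation and weights as the query.

```python
import numpy as np
from typing import Callable, Tuple

Extractor = Callable[[np.ndarray], np.ndarray]


def reverse_image_search(image: np.ndarray,
                         dnn_vector: Extractor,
                         detector_vector: Extractor,
                         gallery_vectors: np.ndarray,
                         weights: Tuple[float, float] = (0.5, 0.5)) -> int:
    """Return the index of the most similar pre-stored image (operations 404-418)."""
    v1 = dnn_vector(image)       # 406-408: first feature vector
    v2 = detector_vector(image)  # 410-412: second feature vector
    v1 = v1 / (np.linalg.norm(v1) + 1e-12)
    v2 = v2 / (np.linalg.norm(v2) + 1e-12)
    v3 = np.concatenate([weights[0] * v1, weights[1] * v2])  # 414: third vector
    q = v3 / (np.linalg.norm(v3) + 1e-12)
    g = gallery_vectors / (np.linalg.norm(gallery_vectors, axis=1, keepdims=True) + 1e-12)
    distances = 1.0 - g @ q          # 416: cosine distance to each fourth vector
    return int(np.argmin(distances))  # 418: closest pre-stored image
```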

Although the flowchart 400 is illustrated as discrete operations, such as 404, 406, 408, 410, 412, 414, 416, 418, and 420, the disclosure is not so limited. Accordingly, in certain embodiments, such discrete operations may be further divided into additional operations, combined into fewer operations, or eliminated, depending on the particular implementation, without detracting from the essence of the disclosed embodiments.

Various embodiments of the disclosure may provide a non-transitory computer-readable medium and/or storage medium having stored thereon instructions executable by a machine and/or a computer (such as the electronic device 102). The instructions may cause the machine and/or computer to perform operations that include receiving a first image (such as the first image 302A). The operations may further include extracting, by a deep neural network (DNN) model (such as the DNN model 108), a first image feature set (such as the first image feature set 304A) associated with the received first image 302A. The operations may further include generating a first feature vector associated with the received first image 302A based on the extracted first image feature set 304A. The operations may further include extracting, by an image feature detection model (such as the image feature detection model 110), a second image feature set (such as the second image feature set 306A) associated with the received first image 302A. The operations may further include generating a second feature vector associated with the received first image 302A based on the extracted second image feature set 306A. The operations may further include generating a third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. The operations may further include determining a similarity metric between the generated third feature vector associated with the received first image 302A and a fourth feature vector of each image of a pre-stored second set of images. The operations may further include identifying a pre-stored third image from the pre-stored second set of images based on the determined similarity metric. The operations may further include controlling a display device (such as the display device 210) to display information associated with the identified pre-stored third image.

An exemplary aspect of the disclosure may provide an electronic device (such as the electronic device 102 of FIG. 1) that includes circuitry (such as the circuit 202). The circuit 202 may be configured to receive the first image 302A. The circuit 202 may be configured to extract, by the deep neural network (DNN) model 108, a first set of image features (such as the first image feature set 304A) associated with the received first image 302A. The circuit 202 may be configured to generate a first feature vector associated with the received first image 302A based on the extracted first image feature set 304A. The circuit 202 may be configured to extract, by the image feature detection model 110, a second set of image features (such as the second image feature set 306A) associated with the received first image 302A. The circuit 202 may be configured to generate a second feature vector associated with the received first image 302A based on the extracted second image feature set 306A. The circuit 202 may be configured to generate a third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. The circuit 202 may be configured to determine a similarity metric between the generated third feature vector and a fourth feature vector of each image of a pre-stored second set of images. The circuit 202 may be configured to identify a pre-stored third image from the pre-stored second set of images based on the determined similarity metric. The circuit 202 may be configured to control the display device 210 to display information associated with the identified pre-stored third image.
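The overall flow above can be illustrated with a minimal Python sketch. It assumes that the DNN model 108 and the image feature detection model 110 have already produced plain lists of floats (the first and second feature vectors), that the "combination" is an L2-normalized concatenation, and that cosine similarity is the similarity metric. The helper names (`fuse`, `identify`) and the sample vectors are hypothetical and are not taken from the disclosure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Hypothetical third feature vector: concatenation of the normalized first and second vectors."""
    return l2_normalize(first) + l2_normalize(second)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is a zero vector)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def identify(query: Sequence[float],
             gallery: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (image_id, score) of the pre-stored image whose fourth vector is most similar."""
    scores = ((image_id, cosine_similarity(query, vec)) for image_id, vec in gallery.items())
    return max(scores, key=lambda item: item[1])


# Hypothetical example: one query image and two pre-stored images.
query = fuse([0.9, 0.1, 0.0], [3.0, 4.0])
gallery = {
    "img_a": fuse([1.0, 0.0, 0.0], [0.6, 0.8]),
    "img_b": fuse([0.0, 1.0, 0.0], [1.0, 0.0]),
}
best_id, best_score = identify(query, gallery)
```

With these sample vectors, `img_a` is selected. Normalizing each vector before concatenation is one way to keep either source from dominating the similarity merely because of its numeric scale.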

According to an embodiment, the image feature detection model 110 may include, but is not limited to, at least one of a scale-invariant feature transform (SIFT)-based model, a speeded-up robust features (SURF)-based model, an oriented FAST and rotated BRIEF (ORB)-based model, or a fast library for approximate nearest neighbors (FLANN)-based model.
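This passage does not specify how a variable number of local descriptors (SIFT, SURF, and ORB each return one descriptor per detected keypoint) becomes a fixed-length second feature vector. One common option, shown here purely as an assumption, is a bag-of-visual-words histogram: each descriptor is assigned to its nearest codeword in a precomputed codebook, and the normalized counts form the vector. The codebook and descriptors are toy values; real SIFT descriptors are 128-dimensional, and binary ORB descriptors would normally be compared with Hamming rather than Euclidean distance.

```python
import math
from typing import List, Sequence


def nearest_codeword(descriptor: Sequence[float],
                     codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bovw_vector(descriptors: Sequence[Sequence[float]],
                codebook: Sequence[Sequence[float]]) -> List[float]:
    """Fixed-length histogram of codeword assignments, normalized to sum to 1."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Toy 2-D codebook and three local descriptors from one hypothetical image.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 1.0], [0.0, 2.0], [9.0, 11.0]]
second_feature_vector = bovw_vector(descriptors, codebook)
```

Because the histogram length equals the codebook size, images with different numbers of keypoints yield vectors of the same length, which is what allows them to be concatenated with the DNN vector.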

According to an embodiment, the third feature vector may be generated further based on the application of a principal component analysis (PCA) transform to the combination of the generated first feature vector and the generated second feature vector. According to an embodiment, the similarity metric may include, but is not limited to, at least one of a cosine distance similarity or a Euclidean distance similarity.
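A PCA transform of the combined vector can be sketched as follows. The sketch assumes that the principal components are fitted on the combined vectors of the pre-stored images and then applied to the query; it uses power iteration with deflation in plain Python rather than a numerical library (in practice, something like scikit-learn's `sklearn.decomposition.PCA` would typically be used) and compares the projected vectors with Euclidean distance. The data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean_and_covariance(rows: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Mean vector and population covariance matrix of the rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def top_eigenpair(cov: Matrix, iters: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Leading eigenvector and eigenvalue of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    eigenvalue = 0.0
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in the remaining subspace
            return [0.0] * d, 0.0
        v, eigenvalue = [x / norm for x in w], norm
    return v, eigenvalue


def fit_pca(rows: Sequence[Sequence[float]], k: int) -> Tuple[List[float], List[List[float]]]:
    """Return the mean and up to k principal components (unit vectors)."""
    mean, cov = mean_and_covariance(rows)
    d = len(cov)
    components: List[List[float]] = []
    for _ in range(k):
        v, lam = top_eigenpair(cov)
        if lam == 0.0:
            break
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components


def project(x: Sequence[float], mean: Sequence[float],
            components: Sequence[Sequence[float]]) -> List[float]:
    """Center x and express it in the principal-component basis."""
    centered = [a - m for a, m in zip(x, mean)]
    return [sum(c * p for c, p in zip(centered, comp)) for comp in components]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical combined (first + second) vectors of four pre-stored images.
stored = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0], [4.0, 4.0, 0.0]]
mean, components = fit_pca(stored, k=1)
query_reduced = project([4.0, 4.0, 0.0], mean, components)
distances = [euclidean_distance(query_reduced, project(s, mean, components)) for s in stored]
```

Eigenvector signs from power iteration are arbitrary, but Euclidean distances between projected points do not depend on that sign.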

According to an embodiment, the circuit 202 may be further configured to determine, by a machine learning model (such as the machine learning model 316) that is different from the DNN model 108 and the image feature detection model 110, a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector. The circuit 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight. The circuit 202 may be configured to generate the third feature vector based on this combination of the generated first feature vector and the generated second feature vector.
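The disclosure leaves the machine learning model 316 unspecified. As a purely illustrative assumption, the sketch below uses a tiny linear model that maps a hypothetical two-element descriptor of the query (for example, the DNN's top-1 confidence and a normalized keypoint count) to two logits, converts them into weights with a softmax so that they sum to 1, and then forms a weighted concatenation. The class name and coefficients are placeholders; in a real system the coefficients would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax2(z1: float, z2: float) -> Tuple[float, float]:
    """Numerically stable softmax over two logits."""
    m = max(z1, z2)
    e1, e2 = math.exp(z1 - m), math.exp(z2 - m)
    return e1 / (e1 + e2), e2 / (e1 + e2)


class FusionWeightModel:
    """Hypothetical linear model producing (first_weight, second_weight)."""

    def __init__(self, coef_first: Sequence[float], coef_second: Sequence[float],
                 bias_first: float = 0.0, bias_second: float = 0.0) -> None:
        self.coef_first = list(coef_first)
        self.coef_second = list(coef_second)
        self.bias_first = bias_first
        self.bias_second = bias_second

    def predict(self, meta: Sequence[float]) -> Tuple[float, float]:
        z1 = self.bias_first + sum(c * x for c, x in zip(self.coef_first, meta))
        z2 = self.bias_second + sum(c * x for c, x in zip(self.coef_second, meta))
        return softmax2(z1, z2)


def weighted_concat(first: Sequence[float], second: Sequence[float],
                    w1: float, w2: float) -> List[float]:
    """Scale each feature vector by its weight and concatenate them."""
    return [w1 * x for x in first] + [w2 * x for x in second]


# Placeholder coefficients: a confident DNN (meta[0]) pushes weight toward the first
# vector; many keypoints (meta[1]) push weight toward the second vector.
model = FusionWeightModel(coef_first=[2.0, 0.0], coef_second=[0.0, 2.0])
w1, w2 = model.predict([0.9, 0.2])
third = weighted_concat([1.0, 0.0], [0.0, 1.0], w1, w2)
```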

According to an embodiment, the circuit 202 may be further configured to receive a user input that includes a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector. The circuit 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the received user input. The circuit 202 may be configured to generate the third feature vector based on this combination of the generated first feature vector and the generated second feature vector.
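For the user-input variant, the weights might arrive as free text from a UI field. The sketch below assumes a hypothetical "w1, w2" text format, rejects negative or non-finite values, normalizes the pair so that it sums to 1, and applies it to the two feature vectors. The input format and the normalization step are assumptions, not requirements stated in the disclosure.

```python
import math
from typing import List, Sequence, Tuple


def parse_user_weights(text: str) -> Tuple[float, float]:
    """Parse 'w1, w2', validate the values, and normalize them to sum to 1."""
    parts = [p.strip() for p in text.split(",")]
    if len(parts) != 2:
        raise ValueError("expected exactly two comma-separated weights")
    w1, w2 = float(parts[0]), float(parts[1])
    if not (math.isfinite(w1) and math.isfinite(w2)) or w1 < 0 or w2 < 0:
        raise ValueError("weights must be finite and non-negative")
    total = w1 + w2
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return w1 / total, w2 / total


def combine(first: Sequence[float], second: Sequence[float],
            w1: float, w2: float) -> List[float]:
    """Weighted concatenation of the first and second feature vectors."""
    return [w1 * x for x in first] + [w2 * x for x in second]


w1, w2 = parse_user_weights("3, 1")
third = combine([1.0, 2.0], [4.0], w1, w2)
```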

According to an embodiment, the circuit 202 may be further configured to classify, by the DNN model 108, the received first image 302A into a first image tag from a set of image tags associated with the DNN model 108. The circuit 202 may be further configured to determine a first image count associated with the first image tag in a training dataset (such as the training dataset 112) associated with the DNN model 108. The circuit 202 may be further configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined first image count associated with the first image tag. The circuit 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight. The circuit 202 may be configured to generate the third feature vector based on this combination of the generated first feature vector and the generated second feature vector.
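The mapping from the training-set image count to the two weights is not given in this passage. One plausible reading, assumed here, is that a tag backed by many training images makes the DNN features more trustworthy, so the first weight grows with the count through the saturating function count / (count + half_point). `half_point` (the count at which both weights equal 0.5), the class scores, and the tag counts are all hypothetical.

```python
from typing import Dict, Tuple


def predict_tag(class_scores: Dict[str, float]) -> str:
    """Pick the image tag with the highest DNN score (stand-in for the DNN classifier)."""
    return max(class_scores, key=lambda tag: class_scores[tag])


def weights_from_tag_count(tag: str, tag_counts: Dict[str, int],
                           half_point: int = 1000) -> Tuple[float, float]:
    """Map the training image count of `tag` to (first_weight, second_weight)."""
    if half_point <= 0:
        raise ValueError("half_point must be positive")
    count = max(0, tag_counts.get(tag, 0))
    w1 = count / (count + half_point)
    return w1, 1.0 - w1


tag_counts = {"dog": 3000, "okapi": 0}  # hypothetical counts in the training dataset
tag = predict_tag({"dog": 0.8, "okapi": 0.2})
w1, w2 = weights_from_tag_count(tag, tag_counts)
```

A rarely seen tag therefore shifts the combination toward the local-feature vector, which does not depend on the DNN's training distribution.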

According to an embodiment, the circuit 202 may be configured to determine an image quality score associated with the received first image 302A based on at least one of the extracted first image feature set 304A or the extracted second image feature set 306A. The circuit 202 may be further configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined image quality score. The circuit 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight. The circuit 202 may be configured to generate the third feature vector based on this combination of the generated first feature vector and the generated second feature vector. According to an embodiment, the image quality score may correspond to, but is not limited to, at least one of sharpness, noise, dynamic range, tone reproduction, contrast, color saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moiré, or artifacts associated with the received first image 302A.
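The quality score is described as being derived from the extracted feature sets, but no formula is stated. The sketch below assumes, for illustration only, that the number of keypoints in the second image feature set serves as a proxy for sharpness (blurred or low-contrast images tend to yield fewer keypoints), and that a higher score shifts weight toward the local-feature vector within a bounded range. `target_keypoints`, `min_local`, and `max_local` are hypothetical tuning parameters.

```python
from typing import Sequence, Tuple


def quality_score(keypoints: Sequence[object], target_keypoints: int = 500) -> float:
    """Hypothetical quality proxy in [0, 1]: keypoint count relative to a target."""
    if target_keypoints <= 0:
        raise ValueError("target_keypoints must be positive")
    return min(1.0, len(keypoints) / target_keypoints)


def weights_from_quality(score: float, min_local: float = 0.2,
                         max_local: float = 0.8) -> Tuple[float, float]:
    """Interpolate the second (local-feature) weight between min_local and max_local."""
    score = max(0.0, min(1.0, score))
    w2 = min_local + (max_local - min_local) * score
    return 1.0 - w2, w2


keypoints = list(range(250))  # stand-in for 250 detected keypoints
w1, w2 = weights_from_quality(quality_score(keypoints))
```

Keeping both weights strictly between 0 and 1 means neither feature source is discarded entirely, even for a very poor or very clean image.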

According to an embodiment, the circuit 202 may be further configured to extract the first image 302A from a first video. In such a case, the identified pre-stored third image may correspond to a pre-stored second video that is related to the first video.
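For the video embodiment, a minimal sketch is shown below. It assumes that frames have already been decoded (for example, with OpenCV's `cv2.VideoCapture`) and converted to feature vectors, samples every `step`-th frame of the first video, and returns the pre-stored video and frame index with the highest cosine similarity. The video identifiers and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sample_frames(frames: Sequence[Sequence[float]], step: int) -> List[Sequence[float]]:
    """Keep every `step`-th frame feature vector of the first (query) video."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(frames[::step])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)


def match_video(query_frames: Sequence[Sequence[float]],
                stored_videos: Dict[str, Sequence[Sequence[float]]]) -> Tuple[str, int, float]:
    """Return (video_id, frame_index, similarity) of the best-matching pre-stored frame."""
    best: Tuple[str, int, float] = ("", -1, -math.inf)
    for q in query_frames:
        for video_id, frames in stored_videos.items():
            for idx, f in enumerate(frames):
                score = cosine(q, f)
                if score > best[2]:
                    best = (video_id, idx, score)
    return best


first_video = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
stored_videos = {"clip_a": [[0.5, 0.5]], "clip_b": [[0.0, 1.0], [1.0, 0.0]]}
video_id, frame_idx, score = match_video(sample_frames(first_video, 2), stored_videos)
```

Here the sampled query frame `[1.0, 0.0]` matches frame 1 of `clip_b` exactly, so `clip_b` is reported as the related pre-stored video.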

The present disclosure may be realized in hardware, or in a combination of hardware and software. The present disclosure may be realized in a centralized fashion, in at least one computer system, or in a distributed fashion, where different elements may be spread across several interconnected computer systems. A computer system or other apparatus adapted to carry out the methods described herein may be suitable. A combination of hardware and software may be a general-purpose computer system with a computer program that, when loaded and executed, may control the computer system such that it carries out the methods described herein. The present disclosure may be realized in hardware that comprises a portion of an integrated circuit that also performs other functions.

The present disclosure may also be embedded in a computer program product, which comprises all the features that enable the implementation of the methods described herein, and which, when loaded in a computer system, is able to carry out these methods. A computer program, in the present context, means any expression, in any language, code, or notation, of a set of instructions intended to cause a system with information processing capability to perform a particular function either directly, or after either or both of the following: a) conversion to another language, code, or notation; b) reproduction in a different material form.

While the present disclosure has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present disclosure without departing from its scope. Therefore, it is intended that the present disclosure not be limited to the particular embodiments disclosed, but that the present disclosure will include all embodiments that fall within the scope of the appended claims.

102 Electronic device
104 Server
106 Database
108 Deep neural network (DNN) model
110 Image feature detection model
112 Training dataset
114 Communication network
116 User

Claims

1. An electronic device, comprising:
circuitry configured to:
receive a first image;
extract, by a deep neural network (DNN) model, a first set of image features associated with the received first image;
generate a first feature vector associated with the received first image based on the extracted first set of image features;
extract, by an image feature detection model, a second set of image features associated with the received first image;
generate a second feature vector associated with the received first image based on the extracted second set of image features;
generate a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector;
determine a similarity metric between the generated third feature vector associated with the received first image and a fourth feature vector of each image of a pre-stored second set of images;
identify a pre-stored third image from the pre-stored second set of images based on the determined similarity metric; and
control a display device to display information associated with the identified pre-stored third image.

2. The electronic device according to claim 1, wherein the image feature detection model includes at least one of a scale-invariant feature transform (SIFT)-based model, a speeded-up robust features (SURF)-based model, an oriented FAST and rotated BRIEF (ORB)-based model, or a fast library for approximate nearest neighbors (FLANN)-based model.

3. The electronic device according to claim 1, wherein the generation of the third feature vector is further based on application of a principal component analysis (PCA) transform to the combination of the generated first feature vector and the generated second feature vector.

4. The electronic device according to claim 1, wherein the similarity metric includes at least one of a cosine distance similarity or a Euclidean distance similarity.

5. The electronic device according to claim 1, wherein the circuitry is further configured to:
determine, by a machine learning model different from the DNN model and the image feature detection model, a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generate the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

6. The electronic device according to claim 1, wherein the circuitry is further configured to:
receive a user input that includes a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combine the generated first feature vector and the generated second feature vector based on the received user input; and
generate the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

7. The electronic device according to claim 1, wherein the circuitry is further configured to:
classify, by the DNN model, the received first image into a first image tag from a set of image tags associated with the DNN model;
determine a first image count associated with the first image tag in a training dataset associated with the DNN model;
determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined first image count associated with the first image tag;
combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generate the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

8. The electronic device according to claim 1, wherein the circuitry is further configured to:
determine an image quality score associated with the received first image based on at least one of the extracted first set of image features or the extracted second set of image features;
determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined image quality score;
combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generate the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

9. The electronic device according to claim 8, wherein the image quality score corresponds to at least one of sharpness, noise, dynamic range, tone reproduction, contrast, color saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moiré, or artifacts associated with the received first image.

10. The electronic device according to claim 1, wherein the circuitry is further configured to extract the first image from a first video, and the identified pre-stored third image corresponds to a pre-stored second video related to the first video.

11. A method, comprising:
in an electronic device:
receiving a first image;
extracting, by a deep neural network (DNN) model, a first set of image features associated with the received first image;
generating a first feature vector associated with the received first image based on the extracted first set of image features;
extracting, by an image feature detection model, a second set of image features associated with the received first image;
generating a second feature vector associated with the received first image based on the extracted second set of image features;
generating a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector;
determining a similarity metric between the generated third feature vector associated with the received first image and a fourth feature vector of each image of a pre-stored second set of images;
identifying a pre-stored third image from the pre-stored second set of images based on the determined similarity metric; and
controlling a display device to display information associated with the identified pre-stored third image.

12. The method according to claim 11, wherein the image feature detection model includes at least one of a scale-invariant feature transform (SIFT)-based model, a speeded-up robust features (SURF)-based model, an oriented FAST and rotated BRIEF (ORB)-based model, or a fast library for approximate nearest neighbors (FLANN)-based model.

13. The method according to claim 11, wherein the generation of the third feature vector is further based on application of a principal component analysis (PCA) transform to the combination of the generated first feature vector and the generated second feature vector.

14. The method according to claim 11, wherein the similarity metric includes at least one of a cosine distance similarity or a Euclidean distance similarity.

15. The method according to claim 11, further comprising:
determining, by a machine learning model different from the DNN model and the image feature detection model, a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

16. The method according to claim 11, further comprising:
receiving a user input that includes a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combining the generated first feature vector and the generated second feature vector based on the received user input; and
generating the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

17. The method according to claim 11, further comprising:
classifying, by the DNN model, the received first image into a first image tag from a set of image tags associated with the DNN model;
determining a first image count associated with the first image tag in a training dataset associated with the DNN model;
determining a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined first image count associated with the first image tag;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

18. The method according to claim 11, further comprising:
determining an image quality score associated with the received first image based on at least one of the extracted first set of image features or the extracted second set of image features;
determining a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined image quality score;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating the third feature vector based on the combination of the generated first feature vector and the generated second feature vector.

19. The method according to claim 18, wherein the image quality score corresponds to at least one of sharpness, noise, dynamic range, tone reproduction, contrast, color saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moiré, or artifacts associated with the received first image.

20. A non-transitory computer-readable medium having stored thereon computer-executable instructions that, when executed by an electronic device, cause the electronic device to execute operations, the operations comprising:
receiving a first image;
extracting, by a deep neural network (DNN) model, a first set of image features associated with the received first image;
generating a first feature vector associated with the received first image based on the extracted first set of image features;
extracting, by an image feature detection model, a second set of image features associated with the received first image;
generating a second feature vector associated with the received first image based on the extracted second set of image features;
generating a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector;
determining a similarity metric between the generated third feature vector associated with the received first image and a fourth feature vector of each image of a pre-stored second set of images;
identifying a pre-stored third image from the pre-stored second set of images based on the determined similarity metric; and
controlling a display device to display information associated with the identified pre-stored third image.