CN116420143A - Reverse image search based on Deep Neural Network (DNN) model and image feature detection model - Google Patents

Info

Publication number
CN116420143A
Authority
CN
China
Prior art keywords
image
feature vector
generated
feature
received
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280007041.6A
Other languages
Chinese (zh)
Inventor
李钟和
P·加格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 17/482,290 (US 11,947,631 B2)
Application filed by Sony Group Corp
Publication of CN116420143A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Electronic devices and methods for reverse image search are provided. The electronic device receives an image. The electronic device extracts a first set of image features associated with the image through the DNN model and generates a first feature vector based on the first set of image features. The electronic device extracts a second set of image features associated with the image through the image feature detection model and generates a second feature vector based on the second set of image features. The electronic device generates a third feature vector based on a combination of the first feature vector and the second feature vector. The electronic device determines a similarity measure between the third feature vector and a fourth feature vector of each image in the set of pre-stored images and identifies the pre-stored images based on the similarity measure. The electronic device controls the display device to display information associated with the pre-stored image.

Description

Reverse image search based on Deep Neural Network (DNN) model and image feature detection model
Cross-Reference to Related Applications
The present application claims priority to U.S. patent application Ser. No. 17/482,290, filed on September 22, 2021, which claims priority to U.S. provisional patent application Ser. No. 63/189,956, filed on May 18, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Various embodiments of the present disclosure relate to reverse image searching. More specifically, various embodiments of the present disclosure relate to electronic devices and methods for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model.
Background
Advances in information and communication technology have led to the development of various internet-based image search systems (e.g., web search engines). Conventionally, a user may upload an input image onto a web search engine as a search query. In this case, the web search engine (using a reverse image search method) may provide a collection of output images from the internet. The set of output images may be similar to the input image. Such reverse image search methods may employ a machine learning model to determine a set of output images that are similar to the input image. In some cases, the machine learning model may misclassify one or more objects in the input image, which may result in the set of output images including unwanted or irrelevant images.
Limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present disclosure as set forth in the remainder of the present application with reference to the drawings.
Disclosure of Invention
An electronic device and method for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model are provided, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
These and other features and advantages of the present disclosure can be understood by review of the following detailed description of the disclosure, along with the accompanying figures in which like reference numerals refer to like parts throughout.
Drawings
FIG. 1 is a block diagram illustrating an exemplary network environment for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure.
Fig. 2 is a block diagram illustrating an exemplary electronic device for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure.
Fig. 3 is a diagram illustrating exemplary operations for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating an exemplary method for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure.
Detailed Description
The embodiments described below may be found in the disclosed electronic device and method for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model to enhance the accuracy of the reverse image search. Exemplary aspects of the present disclosure provide an electronic device implementing a Deep Neural Network (DNN) model and an image feature detection model for reverse image searching. The electronic device may receive a first image (such as an image for which the user needs to search for similar images). The electronic device may extract a first set of image features associated with the received first image through a Deep Neural Network (DNN) model and generate a first feature vector associated with the received first image based on the extracted first set of image features. The electronic device may extract a second set of image features associated with the received first image through the image feature detection model and generate a second feature vector associated with the received first image based on the extracted second set of image features. Examples of the image feature detection model may include, but are not limited to, Scale-Invariant Feature Transform (SIFT) based models, Speeded-Up Robust Features (SURF) based models, Oriented FAST and Rotated BRIEF (ORB) based models, or Fast Library for Approximate Nearest Neighbors (FLANN) based models. The image feature detection model may be capable of extracting image features that the DNN model may falsely detect or misclassify.
The electronic device may also generate a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector. In an example, the generation of the third feature vector may also be based on application of a Principal Component Analysis (PCA) transformation to a combination of the generated first feature vector and the generated second feature vector. The electronic device may also determine a similarity measure between the generated third feature vector associated with the received first image and a fourth feature vector for each second image in a set of pre-stored second images (such as images stored in a database). Examples of similarity metrics may include, but are not limited to, cosine distance similarity or Euclidean distance similarity. The electronic device may also identify a pre-stored third image (such as an image that is the same as or similar to the received first image) from the set of pre-stored second images based on the determined similarity measure and control the display device to display information associated with the identified pre-stored third image.
The disclosed electronic device may automatically generate a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector. The third feature vector may thus comprise a first set of image features that may be determined by the DNN model and a second set of image features that may be determined by the image feature detection model. The first set of image features may include higher-level image features (e.g., facial features such as eyes, nose, ears, and hair) and the second set of image features may include lower-level image features (e.g., edges, lines, and contours of the face) associated with the received first image. The higher-level image features and the lower-level image features included in the third feature vector may complement each other for the identification of similar images. For example, if the received first image is an image that is not sufficiently represented in the training dataset of the DNN model, the first set of image features may be insufficient to identify an image from the pre-stored set of second images that is similar to the received first image. However, since the second set of image features may include lower-level image features associated with the received first image, including the second set of image features in the third feature vector may improve the accuracy of identifying images from the pre-stored set of second images that are similar to the received first image. On the other hand, if the quality of the image is poor (e.g., a low-resolution or blurred image), the first set of image features (i.e., the higher-level image features) may not be sufficient to identify similar images. In this case, the second set of image features (i.e., the lower-level image features) may be more useful and accurate for the identification of similar images.
FIG. 1 is a block diagram illustrating an exemplary network environment for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure. Referring to fig. 1, a network environment 100 is shown. The network environment 100 may include an electronic device 102, a server 104, and a database 106. Also shown are a Deep Neural Network (DNN) model 108 and an image feature detection model 110 implemented on server 104. As shown in fig. 1, training data set 112 may be stored on database 106. The electronic device 102, the server 104, and the database 106 may be communicatively coupled to one another via a communication network 114. A user 116 associated with the electronic device 102 is also shown. In fig. 1, the electronic device 102 and the server 104 are shown as two separate devices; however, in some embodiments, the entire functionality of the server 104 may be incorporated into the electronic device 102 without departing from the scope of this disclosure.
The electronic device 102 may comprise suitable logic, circuitry, interfaces, and/or code that may be configured to identify and display a set of images that are similar to the first image based on the implementation of the DNN model 108 and the image feature detection model 110 on the first image. Examples of electronic device 102 may include, but are not limited to, an image search engine, a server, a personal computer, a laptop computer, a computer workstation, a mainframe, a gaming device, a Virtual Reality (VR)/Augmented Reality (AR)/Mixed Reality (MR) device, a smart phone, a mobile phone, a computing device, a tablet computer, and/or any Consumer Electronics (CE) device.
DNN model 108 may be a deep convolutional neural network model that may be trained on an image feature detection task to detect a first set of image features in a first image. DNN model 108 may be defined by its hyperparameters (e.g., activation function(s), number of weights, cost function, regularization function, input size, number of layers, etc.). DNN model 108 may be referred to as a system or computing network of artificial neurons (also referred to as nodes). The nodes of the DNN model 108 may be arranged in multiple layers, as defined in the neural network topology of the DNN model 108. The multiple layers of DNN model 108 may include an input layer, one or more hidden layers, and an output layer. Each of the plurality of layers may include one or more nodes (or artificial neurons). The outputs of all nodes in the input layer may be coupled to at least one node of the hidden layer(s). Similarly, the input of each hidden layer may be coupled to the output of at least one node in other layers of DNN model 108. The output of each hidden layer may be coupled to an input of at least one node in other layers of DNN model 108. The node(s) in the last layer may receive input from at least one hidden layer to output a result. The number of layers and the number of nodes in each layer may be determined by the hyperparameters of the DNN model 108. Such hyperparameters may be set prior to or concurrent with training the DNN model 108 on the training dataset 112.
Each node of DNN model 108 may correspond to a mathematical function (e.g., a sigmoid function or a rectified linear unit (ReLU)) having a set of parameters that are adjustable during training of the network. The parameter set may include, for example, weight parameters, regularization parameters, and the like. Each node may use a mathematical function to compute an output based on one or more inputs from nodes in other layer(s) (e.g., the previous layer(s)) of DNN model 108. All or some of the nodes of DNN model 108 may correspond to the same mathematical function or to different mathematical functions.
In training of DNN model 108, one or more parameters of each node of DNN model 108 may be updated based on whether the output of the last layer for a given input (from the training dataset) matches the correct result, based on the loss function for DNN model 108. The above process may be repeated for the same or different inputs until a minimum of the loss function is achieved and the training error is minimized. Several methods for training are known in the art, such as gradient descent, stochastic gradient descent, batch gradient descent, gradient boosting, meta-heuristics, and the like.
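The training procedure above is standard supervised learning. As a rough, non-limiting sketch (assuming a PyTorch classification model, a labeled data loader, and cross-entropy loss, none of which are prescribed by this disclosure), a single training pass might look like the following.
```python
import torch
import torch.nn as nn

def train_one_epoch(model, train_loader, optimizer, device="cpu"):
    """One pass of gradient-descent training over a labeled dataset.

    `model`, `train_loader`, and the cross-entropy loss are illustrative
    placeholders; this disclosure does not prescribe a specific framework.
    """
    criterion = nn.CrossEntropyLoss()          # loss function for the last layer
    model.train()
    for images, labels in train_loader:        # batches from the training dataset
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)                # forward pass through all layers
        loss = criterion(outputs, labels)      # compare output with correct result
        loss.backward()                        # back-propagate the training error
        optimizer.step()                       # update node parameters (weights)
```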
In an embodiment, DNN model 108 may include electronic data, which may be implemented as a software component of an application executable on electronic device 102 or server 104, for example. DNN model 108 may rely on libraries, external scripts, or other logic/instructions for execution by a processing device (such as electronic device 102 or server 104). DNN model 108 may include computer executable code or routines to enable a computing device, such as electronic device 102 or server 104, to perform one or more operations to detect image features in an input image. Additionally, or alternatively, DNN model 108 may be implemented using hardware including a processor, a microprocessor (e.g., to perform or control the performance of one or more operations), a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC). For example, an inference accelerator chip may be included in the electronic device 102 (or server 104) to accelerate the computation of the DNN model 108 for image feature detection tasks. In some embodiments, DNN model 108 may be implemented using a combination of both hardware and software. Examples of DNN model 108 may include, but are not limited to, an Artificial Neural Network (ANN), a Convolutional Neural Network (CNN), a Region-based CNN (R-CNN), Fast R-CNN, Faster R-CNN, a You Only Look Once (YOLO) network, a Residual Neural Network (ResNet), a Feature Pyramid Network (FPN), RetinaNet, a Single Shot Detector (SSD), and/or combinations thereof.
The image feature detection model 110 may be an image processing algorithm configured to extract image features associated with the first image. The image feature detection model 110 may be defined by its hyperparameters (e.g., number of image features, edge threshold, number of weights, cost function, input size, number of layers, etc.). The hyperparameters of the image feature detection model 110 may be tuned and the weights may be updated to move toward the global minimum of the cost function of the image feature detection model 110. The image feature detection model 110 may include electronic data, which may be implemented as a software component of an application executable on the electronic device 102 or the server 104, for example. The image feature detection model 110 may rely on libraries, external scripts, or other logic/instructions to be executed by a processing device, such as the electronic device 102 or the server 104. The image feature detection model 110 may include code and routines configured to enable a computing device, such as the electronic device 102 or the server 104, to perform one or more operations, such as to extract a set of image features associated with a first image. Additionally, or alternatively, the image feature detection model 110 may be implemented using hardware including a processor, a microprocessor (e.g., to perform or control the performance of one or more operations), a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC). Alternatively, in some embodiments, the image feature detection model 110 may be implemented using a combination of hardware and software. Examples of image feature detection models 110 may include, but are not limited to, Scale-Invariant Feature Transform (SIFT) based models, Speeded-Up Robust Features (SURF) based models, Oriented FAST and Rotated BRIEF (ORB) based models, or Fast Library for Approximate Nearest Neighbors (FLANN) based models.
The server 104 may comprise suitable logic, circuitry, interfaces and/or code that may be configured to store the DNN model 108 and the image feature detection model 110. Server 104 may use DNN model 108 to generate a first feature vector associated with the first image and may use image feature detection model 110 to generate a second feature vector associated with the first image. Server 104 may also store a machine learning model that is different from DNN model 108 and image feature detection model 110. The stored machine learning model may be configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector. In an exemplary embodiment, the server 104 may be implemented as a cloud server, and operations may be performed by web applications, cloud applications, HTTP requests, repository operations, file transfers, and the like. Other example implementations of server 104 may include, but are not limited to, a database server, a file server, a web server, an application server, a mainframe server, or a cloud computing server.
In at least one embodiment, server 104 may be implemented as a plurality of distributed cloud-based resources using several techniques well known to those of ordinary skill in the art. Those of ordinary skill in the art will appreciate that the scope of the present disclosure may not be limited to implementing the server 104 and the electronic device 102 as two separate entities. In some embodiments, the functionality of the server 104 may be incorporated, in whole or at least in part, into the electronic device 102 without departing from the scope of the present disclosure.
Database 106 may include suitable logic, interfaces, and/or code that may be configured to store training data set 112 for DNN model 108. The training data set 112 may include a set of pre-stored training images and a predetermined label assigned to each training image in the set of pre-stored training images. The predetermined label assigned to a certain training image may comprise a label corresponding to an image feature that may be predetermined for the particular training image. DNN model 108 may be pre-trained for image feature detection tasks based on training dataset 112. In an embodiment, the database 106 may also be configured to store a set of pre-stored second images. Database 106 may be a relational database or a non-relational database. Moreover, in some cases, database 106 may be stored on a server (such as server 104) such as a cloud server, or may be cached and stored on electronic device 102. In addition, or alternatively, database 106 may be implemented using hardware including a processor, a microprocessor (e.g., to perform or control the performance of one or more operations), a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC). In some other cases, database 106 may be implemented using a combination of hardware and software.
The communication network 114 may include a communication medium through which the electronic device 102, the server 104, and the database 106 may communicate with one another. Examples of communication network 114 may include, but are not limited to, the internet, a cloud network, a Long Term Evolution (LTE) network, a Wireless Local Area Network (WLAN), a Local Area Network (LAN), a Plain Old Telephone Service (POTS) line, and/or a Metropolitan Area Network (MAN). Various devices in network environment 100 may be configured to connect to communication network 114 according to various wired and wireless communication protocols. Examples of such wired and wireless communication protocols may include, but are not limited to, at least one of Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), ZigBee, EDGE, IEEE 802.11, Light Fidelity (Li-Fi), 802.16, IEEE 802.11s, IEEE 802.11g, multi-hop communication, wireless Access Point (AP), device-to-device communication, cellular communication protocols, or Bluetooth (BT) communication protocols, or a combination thereof.
In operation, the electronic device 102 may initiate a reverse image search query. In an embodiment, the reverse image search may be initiated based on user input received via a display device (shown in fig. 2). After initiating the reverse image search, the electronic device 102 may be configured to receive the first image as an image search query. For example, the first image may correspond to an image uploaded through an I/O device (shown in fig. 2) of the electronic device 102 based on user input. The first image may be associated with a still image having a fixed foreground or background object or an image extracted from a video. The electronic device 102 may be configured to extract a first set of image features associated with the received first image through the DNN model 108. The electronic device 102 may be configured to extract a second set of image features associated with the received first image through the image feature detection model 110. Details of the first and second image feature sets are provided, for example, in fig. 3. Examples of image feature detection models 110 may include, but are not limited to, Scale-Invariant Feature Transform (SIFT) based models, Speeded-Up Robust Features (SURF) based models, Oriented FAST and Rotated BRIEF (ORB) based models, or Fast Library for Approximate Nearest Neighbors (FLANN) based models.
The electronic device 102 may be further configured to generate a first feature vector associated with the received first image based on the extracted first set of image features. The electronic device 102 may be further configured to generate a second feature vector associated with the received first image based on the extracted second set of image features. The first feature vector associated with the received first image may be a vector that may include information about the first set of image features, and the second feature vector associated with the received first image may be a vector that may include information about the second set of image features. The electronic device 102 may be further configured to generate a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector. The third feature vector may include, but is not limited to, the generated first feature vector and the generated second feature vector. In an embodiment, the generation of the third feature vector may also be based on application of a Principal Component Analysis (PCA) transformation to a combination of the generated first feature vector and the generated second feature vector. The details of the generation of the third feature vector are further described, for example, in fig. 3.
The electronic device 102 may be configured to determine a similarity measure between the generated third feature vector associated with the received first image and a fourth feature vector of each second image in a set of pre-stored second images (such as images stored in the database 106). Examples of similarity metrics may include, but are not limited to, cosine distance similarity or Euclidean distance similarity. The electronic device 102 may be configured to identify a pre-stored third image (such as an image that is the same as or similar to the received first image) from a set of pre-stored second images based on the determined similarity measure. The electronic device 102 may also be configured to control the display device to display information associated with the identified pre-stored third image. Details of the determination of the similarity measure and the identification of the pre-stored third image are further described in fig. 3, for example.
Fig. 2 is a block diagram illustrating an exemplary electronic device for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure. Fig. 2 is explained in connection with elements from fig. 1. Referring to fig. 2, a block diagram 200 of the electronic device 102 is shown. Electronic device 102 may include circuitry 202, memory 204, input/output (I/O) devices 206, and network interface 208. The I/O device 206 may also include a display device 210. The network interface 208 may connect the electronic device 102 with the server 104 and the database 106 via the communication network 114.
Circuitry 202 may comprise suitable logic, circuitry, and/or interfaces that may be configured to execute program instructions associated with different operations to be performed by electronic device 102. Circuitry 202 may include one or more dedicated processing units, which may be implemented as separate processors. In an embodiment, one or more special purpose processing units may be implemented as an integrated processor or cluster of processors, which may be configured to collectively perform the functions of the one or more special purpose processing units. Circuitry 202 may be implemented based on a variety of processor technologies known in the art. Examples of implementations of circuitry 202 may be an X86-based processor, a Graphics Processing Unit (GPU), a Reduced Instruction Set Computing (RISC) processor, an Application Specific Integrated Circuit (ASIC) processor, a Complex Instruction Set Computing (CISC) processor, a microcontroller, a Central Processing Unit (CPU), and/or other control circuitry.
The memory 204 may comprise suitable logic, circuitry, interfaces and/or code that may be configured to store program instructions to be executed by the circuitry 202. In at least one embodiment, the memory 204 may be configured to store the DNN model 108 and the image feature detection model 110. The memory 204 may be configured to store one or more of a similarity metric, a machine learning model (e.g., machine learning model 316 of fig. 3) that is different from the DNN model 108 and the image feature detection model 110, a first weight associated with the generated first feature vector, and a second weight associated with the generated second feature vector, but is not limited to these. In an embodiment, the memory 204 may store the first image and the identified pre-stored third image. Examples of implementations of memory 204 may include, but are not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Hard Disk Drive (HDD), Solid State Drive (SSD), CPU cache, and/or Secure Digital (SD) card.
The I/O device 206 may comprise suitable logic, circuitry, interfaces and/or code that may be configured to receive input and provide output based on the received input. The I/O devices 206 may include various input and output devices that may be configured to communicate with the circuitry 202. In an example, the electronic device 102 may receive (via the I/O device 206) user input including a reverse image search query. The reverse image search query may include a first image. In another example, the electronic device 102 may receive (via the I/O device 206) a user input including a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector. The electronic device 102 may control the I/O device 206 to output the identified pre-stored third image. Examples of I/O devices 206 may include, but are not limited to, a touch screen, a keyboard, a mouse, a joystick, a display device (e.g., display device 210), a microphone, or a speaker.
The display device 210 may comprise suitable logic, circuitry, and interfaces that may be configured to display the output of the electronic device 102. The display device 210 may be used to display information associated with the identified pre-stored third image. In some embodiments, display device 210 may be an externally coupled display device associated with electronic device 102. The display device 210 may be a touch screen that may enable the user 116 to provide user input via the display device 210. The touch screen may be at least one of a resistive touch screen, a capacitive touch screen, a thermal touch screen, or any other touch screen that may be used to provide input to the display device 210. The display device 210 may be implemented by several known technologies, such as, but not limited to, at least one of a Liquid Crystal Display (LCD) display, a Light Emitting Diode (LED) display, a plasma display, or an Organic LED (OLED) display, or other display technologies.
The network interface 208 may comprise suitable logic, circuitry, interfaces, and/or code that may be configured to facilitate communications between the electronic device 102, the server 104, and the database 106 via the communication network 114. The network interface 208 may be implemented using a variety of known techniques to support wired or wireless communication of the electronic device 102 with the communication network 114. The network interface 208 may include, but is not limited to, an antenna, a Radio Frequency (RF) transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, a Subscriber Identity Module (SIM) card, or local buffer circuitry.
The network interface 208 may be configured to communicate via wired or wireless communication, or a combination thereof, using a network such as the internet, an intranet, a wireless network, a cellular telephone network, a wireless Local Area Network (LAN), or a Metropolitan Area Network (MAN). The wireless communication may be configured to use one or more of a variety of communication standards, protocols, and technologies, such as Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Long Term Evolution (LTE), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (such as IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, or IEEE 802.11n), Voice over Internet Protocol (VoIP), Light Fidelity (Li-Fi), Worldwide Interoperability for Microwave Access (Wi-MAX), and protocols for email, instant messaging, and Short Message Service (SMS).
The operation of circuitry 202 is further described, for example, in fig. 3 and 4. It may be noted that the electronic device 102 shown in fig. 2 may include various other components or systems. For brevity, descriptions of other components or systems of the electronic device 102 are omitted from this disclosure.
Fig. 3 is a diagram illustrating exemplary operations for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure. Fig. 3 is explained in connection with the elements in fig. 1 and 2. Referring to fig. 3, a block diagram 300 illustrating exemplary operations 302 through 314 for reverse image search based on the DNN model 108 and the image feature detection model 110 is shown. The exemplary operations may be performed by any computing system, for example, by the electronic device 102 of fig. 1 or by the circuitry 202 of fig. 2.
At 302, a first image may be received. In an embodiment, circuitry 202 may be configured to receive a first image. For example, the first image 302A may be received. The first image 302A may be received from a data source (e.g., persistent storage (such as memory 204) on the electronic device 102, an image capture device, a cloud server, or a combination thereof). The first image 302A may include an object of interest for which the user 116 may want similar or identical image results using a reverse image search. Alternatively, the first image 302A may correspond to an image from a sequence of images of the first video. Circuitry 202 may be configured to extract first image 302A from the first video. The first video may correspond to a video that includes an object of interest for which the user 116 wants similar or identical video results using a reverse image search. The first image 302A may represent an image with a fixed foreground or background. As shown, for example, the first image 302A may depict a scene in a movie (e.g., an image of Spider-Man as the object of interest, as shown in fig. 3).
After receiving the first image 302A, the circuitry 202 may input the received first image 302A to the DNN model 108 and the image feature detection model 110 for image feature extraction. Circuitry 202 may employ DNN model 108 to extract a first set of image features associated with first image 302A, such as described in 304. Circuitry 202 may also employ image feature detection model 110 to extract a second set of image features associated with first image 302A, such as described in 306. Operations 304 and 306 may be performed in any order without departing from the scope of the present disclosure.
At 304, a first set of image features may be extracted. In an embodiment, circuitry 202 may be configured to extract a first set of image features associated with a received first image 302A through a Deep Neural Network (DNN) model, such as DNN model 108. The extracted first set of image features may correspond to unique features associated with one or more objects in the received first image 302A. The DNN model 108 may be pre-trained for image feature extraction tasks based on a training dataset 112 of a set of pre-stored training images assigned predetermined labels. The predetermined labels assigned to certain training images may include labels corresponding to image features that may be predetermined for a particular training image. Circuitry 202 may feed first image 302A to DNN model 108 as input and may receive as output a first set of image features (i.e., associated with first image 302A) from DNN model 108 based on image feature detection tasks performed on first image 302A by DNN model 108.
In an embodiment, the extracted first set of image features may include information that may be needed to classify each object contained in the received first image 302A into a particular object class. Examples of the extracted first set of image features may include, but are not limited to, shape, texture, color, and other high-level image features. As shown in fig. 3, for example, the extracted first set of image features 304A associated with the first image 302A may include colors depicted as shades of gray on an object of interest (such as Spider-Man's face). For example, if the object of interest is a face of a person (such as Spider-Man or any other person/character) in the first image 302A, the first set of image features 304A may include the shape of eyes, the shape of ears, the shape of nose, and the shape/texture of other high-level facial details of the person/character. A specific implementation of extracting the first set of image features by the DNN model 108 may be known to a person skilled in the art, and thus, a detailed description of such extraction of the first set of image features is omitted from this disclosure for brevity.
Circuitry 202 may be configured to generate a first feature vector associated with the received first image 302A based on the extracted first set of image features. Such a first feature vector may also be referred to as a unique set of first image features. The generated first feature vector may include a plurality of vector elements, wherein each vector element may correspond to an image feature from the extracted first set of image features. Each vector element of the first feature vector may store a value that may correspond to a certain first image feature from the first set of image features. For example, if the received first image 302A is a high definition image (e.g., an image of "1024x1024" pixels), the first feature vector may be a "1x2048" vector having 2048 vector elements. The i-th element of the first feature vector may represent the value of the i-th first image feature.
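As a rough illustration only (this disclosure does not name a specific DNN), the sketch below truncates a pre-trained ResNet-50 so that its 2048-dimensional pooled output serves as the 1x2048 first feature vector described above; it assumes torchvision 0.13 or later.
```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Truncate a pre-trained ResNet-50 before its classification layer so that the
# 2048-dimensional pooled output acts as the first feature vector (1x2048).
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classifier head
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def first_feature_vector(image_path: str) -> torch.Tensor:
    """Return a 1x2048 feature vector for the received first image."""
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        return backbone(preprocess(image).unsqueeze(0))  # shape (1, 2048)
```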
At 306, a second set of image features may be extracted. In an embodiment, circuitry 202 may be configured to extract a second set of image features associated with the received first image 302A by an image feature detection model (such as image feature detection model 110). The extracted second set of image features may also correspond to certain unique features associated with one or more objects included in the received first image 302A. In some embodiments, the second set of image features may be image features that may be falsely detected by the DNN model 108 or may remain undetected (e.g., at 304). In an embodiment, the extracted second set of image features may include information that may be needed to optimally classify each object (in the first image 302A) into a particular object class. Examples of the second set of image features may include, but are not limited to, edges, lines, contours, and other low-level image features. Examples of image feature detection models 110 may include, but are not limited to, Scale-Invariant Feature Transform (SIFT) based models, Speeded-Up Robust Features (SURF) based models, Oriented FAST and Rotated BRIEF (ORB) based models, or Fast Library for Approximate Nearest Neighbors (FLANN) based models. Detailed implementations of these example methods may be known to those skilled in the art, and thus, a detailed description of such methods is omitted from this disclosure for brevity. As shown in fig. 3, for example, the second image feature set 306A associated with the first image 302A depicts a second image feature set extracted by the SIFT-based model, and the second image feature set 306B associated with the first image 302A depicts a second image feature set extracted by the SURF-based model. For example, if the object of interest is a face of a person (such as Spider-Man or any other person/character in the first image 302A), the second image feature set 306A (or the second image feature set 306B) may include edges and contours of eyes, edges and contours of ears, edges and contours of nose, and other low-level facial details of the person/character.
Circuitry 202 may also be configured to generate a second feature vector associated with the received first image 302A based on the extracted second set of image features. Such a second feature vector may also be referred to as a unique set of second image features. The generated second feature vector may include a plurality of vector elements, wherein each vector element may correspond to an image feature in the extracted second set of image features. Each vector element of the second feature vector may store a value that may correspond to a certain second image feature of the second set of image features. For example, if the received first image 302A is a high definition image (e.g., an image of "1024x1024" pixels), the second feature vector may be a "1x2048" vector having 2048 vector elements. The ith element of the second feature vector may represent the value of the ith second image feature.
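A hedged sketch of one way such a second feature vector might be built with OpenCV is shown below; SIFT is used as one of the listed options, and flattening the variable number of keypoint descriptors into a fixed-length vector is only a simple aggregation choice made here for illustration, not something this disclosure prescribes.
```python
import cv2
import numpy as np

def second_feature_vector(image_path: str, dim: int = 2048) -> np.ndarray:
    """Extract low-level keypoint descriptors (SIFT here) and fold them into
    a fixed-length 1 x dim vector so it can be combined with the DNN vector."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detector = cv2.SIFT_create()                  # or cv2.ORB_create()
    _keypoints, descriptors = detector.detectAndCompute(gray, None)
    if descriptors is None:                       # no keypoints were found
        return np.zeros((1, dim), dtype=np.float32)
    flat = descriptors.astype(np.float32).ravel()
    flat = np.resize(flat, dim)                   # pad/truncate to a fixed size
    return flat.reshape(1, dim)
```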
At 308, the feature vectors may be combined. In an embodiment, circuitry 202 may be configured to generate a third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. In an embodiment, circuitry 202 may be configured to automatically combine the generated first feature vector and the generated second feature vector. For example, if the received first image 302A is a high definition image (e.g., an image of "1024x1024" pixels), the first feature vector may be a "1x2048" vector having 2048 vector elements, and the second feature vector may also be a "1x2048" vector having 2048 vector elements. In this case, the third feature vector may be a "1x4096" vector having 4096 vector elements.
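A minimal sketch of this combination step, assuming plain concatenation of two 1x2048 vectors into a single 1x4096 third feature vector:
```python
import numpy as np

def combine_feature_vectors(first_vec: np.ndarray,
                            second_vec: np.ndarray) -> np.ndarray:
    """Concatenate the DNN feature vector and the image-feature-detection
    vector, e.g. two 1x2048 vectors, into one 1x4096 third feature vector."""
    return np.concatenate([first_vec, second_vec], axis=1)

# Example: two 1x2048 vectors yield a 1x4096 third feature vector.
third_vec = combine_feature_vectors(np.ones((1, 2048)), np.zeros((1, 2048)))
assert third_vec.shape == (1, 4096)
```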
In an embodiment, circuitry 202 may be configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector by machine learning model 316 (i.e., different from DNN model 108 and image feature detection model 110). The machine learning model 316 may be a regression model that may be trained on a set of similarly sized feature vectors, where each vector may be labeled with a user-defined weight value. The machine learning model 316 may be trained on a feature vector weight assignment task, where the machine learning model 316 may receive as input a set of two similarly sized feature vectors and may output weights for each of the set of two similarly sized feature vectors. The machine learning model 316 may be defined by its hyperparameters (e.g., number of weights, cost function, input size, number of layers, etc.). The hyperparameters of the machine learning model 316 may be tuned and the weights may be updated to move toward the global minimum of the cost function of the machine learning model 316. After training on the feature information in the training dataset for a number of epochs, the machine learning model 316 may be trained to output a set of weight values for a set of inputs. The output may indicate a weight value for each input in the set of inputs (e.g., the first feature vector and the second feature vector).
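One plausible realization of such a regression model is sketched below with scikit-learn; the training matrices are random placeholders, and the choice of a ridge regressor is an assumption made only for illustration (the actual machine learning model 316, its training data, and its labels are not specified here).
```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training data: each row is a pair of similarly sized feature
# vectors (concatenated), labeled with user-defined (first, second) weights.
X_train = np.random.rand(100, 2 * 2048)       # illustrative placeholder only
y_train = np.random.rand(100, 2)              # user-defined weight pairs

weight_model = Ridge().fit(X_train, y_train)  # stand-in for "model 316"

def predict_weights(first_vec: np.ndarray, second_vec: np.ndarray):
    """Predict (first_weight, second_weight) for a pair of feature vectors."""
    x = np.concatenate([first_vec, second_vec], axis=1)
    w1, w2 = weight_model.predict(x)[0]
    return float(w1), float(w2)
```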
The machine learning model 316 may include electronic data, which may be implemented as software components of an application executable on the electronic device 102, for example. The machine learning model 316 may rely on libraries, external scripts, or other logic/instructions for execution by a computing device (such as the circuitry 202) that includes a processor. The machine learning model 316 may include code and routines configured to enable a computing device (such as circuitry 202) including a processor to perform one or more operations for determining a first weight associated with a first feature vector and determining a second weight associated with a second feature vector. Additionally or alternatively, the machine learning model 316 may be implemented using hardware including a processor, a microprocessor (e.g., to perform or control the performance of one or more operations), a Field Programmable Gate Array (FPGA), or an Application Specific Integrated Circuit (ASIC). Alternatively, in some embodiments, the machine learning model 316 may be implemented using a combination of hardware and software.
Each of the first weight associated with the generated first feature vector and the second weight associated with the generated second feature vector may indicate the reliability of the respective feature vector for identifying an object of interest in an image and, thereby, for identifying similar images that may include the object of interest. The first weight and the second weight may specify confidence values for this reliability (based on probability values between 0 and 1). Thus, a more reliable feature vector may have a higher weight value. For example, if the received first image 302A is a high-resolution image and the extracted first image feature set is more detailed than the extracted second image feature set, the generated first feature vector may be more reliable than the generated second feature vector. In such an example, the first weight associated with the generated first feature vector may have a higher weight value (such as 0.6) than the second weight (such as a value of 0.4) associated with the generated second feature vector. In contrast, if the received first image 302A is a low-resolution image and the extracted second image feature set is more detailed than the extracted first image feature set, the generated second feature vector may be more reliable than the generated first feature vector. In such an example, the first weight associated with the generated first feature vector may have a smaller weight value (such as 0.4) than the second weight (such as a value of 0.6) associated with the generated second feature vector. In addition, if the received first image 302A is a medium-resolution image (e.g., a standard-resolution image), the first weight associated with the generated first feature vector may have a weight value (such as 0.5) equal to the second weight (such as a value of 0.5) associated with the generated second feature vector. Based on the determined first weight and the determined second weight, circuitry 202 may be further configured to combine the generated first feature vector and the generated second feature vector to further generate a third feature vector based on the combination.
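This disclosure does not spell out exactly how the two weights enter the combination; one plausible sketch is to scale each feature vector by its weight before concatenation.
```python
import numpy as np

def weighted_combination(first_vec: np.ndarray, second_vec: np.ndarray,
                         first_weight: float, second_weight: float) -> np.ndarray:
    """Scale each feature vector by its reliability weight, then concatenate
    the scaled vectors into the third feature vector."""
    return np.concatenate([first_weight * first_vec,
                           second_weight * second_vec], axis=1)

# A high-resolution first image might favor the DNN features, e.g. 0.6 vs 0.4.
third_vec = weighted_combination(np.ones((1, 2048)), np.ones((1, 2048)), 0.6, 0.4)
```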
In an embodiment, circuitry 202 may be configured to receive a user input including a first weight associated with a generated first feature vector and including a second weight associated with a generated second feature vector. In an example, the received user input may indicate a first weight associated with the generated first feature vector as "0.4", and circuitry 202 may then be configured to determine a second weight associated with the generated second feature vector as "0.6". In another example, the received user input may indicate a first weight associated with the generated first feature vector as "0.5" and a second weight associated with the generated second feature vector as "0.5". Based on the received user input, circuitry 202 may be further configured to combine the generated first feature vector and the generated second feature vector to further generate a third feature vector based on the combination.
In an embodiment, circuitry 202 may be configured to classify, by DNN model 108, the received first image 302A as a first image tag from a set of image tags associated with DNN model 108. The first image tag may specify an image tag to which the received first image 302A may belong. For example, the received first image 302A may have an image tag such as the Spider-Man character. Circuitry 202 may also be configured to determine a first count of images associated with the first image tag in a training dataset associated with DNN model 108, such as training dataset 112. For example, circuitry 202 may compare the first image tag to the image tag of each pre-stored training image in the set of pre-stored training images in training dataset 112. If the image tag of a pre-stored training image in the training data set 112 matches the first image tag, the circuitry 202 may increment the first count of images associated with the first image tag by one. In this manner, based on a comparison of the image tag of each training image in the set of pre-stored training images in the training data set 112 with the first image tag, the circuitry 202 may determine the first count of images. Circuitry 202 may be further configured to determine the first weight associated with the generated first feature vector and the second weight associated with the generated second feature vector based on the determined first count of images associated with the first image tag. For example, if the first count of images in the training data set 112 is above a certain threshold (e.g., a certain threshold count or percentage of the total images in the training data set 112), then the first weight associated with the generated first feature vector may have a higher weight value than the second weight associated with the generated second feature vector. In contrast, if the first count of images in the training data set 112 is less than a threshold or nominal value (e.g., a very negligible value, such as a few hundred images in a training data set 112 of one million images), then the first weight associated with the generated first feature vector may have a smaller weight value than the second weight associated with the generated second feature vector. Based on the determined first weight and the determined second weight, circuitry 202 may be configured to combine the generated first feature vector and the generated second feature vector to further generate the third feature vector based on the combination.
In an embodiment, circuitry 202 may be configured to determine an image quality score associated with received first image 302A based on at least one of the extracted first image feature set or the extracted second image feature set. The image quality score may indicate a qualitative value associated with the fidelity level of the received first image 302A. A higher image quality score may indicate a higher level of fidelity of the received first image 302A. The image quality score may correspond to, but is not limited to, sharpness, noise, dynamic range, tone reproduction, contrast, color saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moiré, or artifacts associated with the received first image 302A. The sharpness may correspond to the level of detail of image features associated with the received first image 302A. For example, if the pixel count or focus of the received first image 302A is high, then the sharpness of the received first image 302A may be high. The noise may correspond to disturbances in the received first image 302A, such as unwanted changes at the pixel level in the received first image 302A. The dynamic range may correspond to the amount of tonal difference between the brightest and darkest shades of light that may be captured in the received first image 302A. Tone reproduction may correspond to a correlation between the amount of light that may be captured in the received first image 302A and the amount of light to which the received first image 302A may be exposed. The contrast may correspond to the amount of color variation in the received first image 302A. The color saturation may correspond to the color intensity in the received first image 302A. The distortion may correspond to unwanted pixel changes in the received first image 302A. Vignetting may correspond to darkening, reduced sharpness, or reduced saturation of the received first image 302A toward the corners compared to its center. The exposure accuracy may correspond to capturing the received first image 302A at an optimal brightness. The chromatic aberration may correspond to color distortion in the received first image 302A. The lens flare may correspond to the response of the image capturing device to bright light. The color moiré may correspond to repeating color stripes that may appear in the received first image 302A. The artifacts associated with the received first image 302A may correspond to any spurious objects that may be present in the received first image 302A.
Circuitry 202 may be further configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined image quality score. For example, if the determined image quality score is high, a first weight associated with the generated first feature vector may be assigned a higher weight value than a second weight associated with the generated second feature vector. In contrast, if the determined image quality score is smaller or nominal, the first weight associated with the generated first feature vector may be assigned a smaller weight value than the second weight associated with the generated second feature vector. In an embodiment, the determined first weight may be higher than the determined second weight when the image quality score is higher than the threshold. The threshold may include, for example, an image quality score of "0.4", "0.6", "0.8", and the like. In an embodiment, circuitry 202 may be configured to receive user input to set a threshold for image quality score. In another embodiment, circuitry 202 may be configured to automatically set a threshold for image quality score. Based on the determined first weight and the determined second weight, circuitry 202 may be configured to combine the generated first feature vector and the generated second feature vector. Thereafter, circuitry 202 may be further configured to generate a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
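As a hedged sketch of one possible proxy for such a quality score, the variance of the Laplacian can serve as a simple sharpness measure and be mapped to the weight pairs discussed above; the numeric threshold below is purely illustrative and is not the normalized score threshold (e.g., 0.4, 0.6, or 0.8) mentioned in the text.
```python
import cv2

def image_quality_score(image_path: str) -> float:
    """Rough sharpness-only quality proxy: variance of the Laplacian.

    The disclosure lists many other factors (noise, dynamic range, contrast,
    etc.); this sketch covers only sharpness for illustration.
    """
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def weights_from_quality(score: float, threshold: float = 100.0):
    """Favor the DNN feature vector when the quality score exceeds a threshold."""
    return (0.6, 0.4) if score > threshold else (0.4, 0.6)
```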
At 310, the dimensions may be reduced. In an embodiment, circuitry 202 may be configured to reduce the dimension of the generated third feature vector. In some embodiments, circuitry 202 may adjust the size of the generated third feature vector (or compress the generated third feature vector) to match the size of the input layer of the feature extractor and then pass the resized generated third feature vector to the input layer of the feature extractor. This may reduce unwanted or repeated information from the generated third feature vector. The generation of the third feature vector may also be based on application of a Principal Component Analysis (PCA) transformation to a combination of the generated first feature vector and the generated second feature vector. For example, if the generated third feature vector is a "1x4096" vector having 4096 vector elements, then after the PCA transform is applied, the generated third feature vector may be reduced to a "1x256" vector having 256 vector elements. Detailed implementations of PCA transformations may be known to those skilled in the art, and thus, a detailed description of such transformations is omitted from this disclosure for brevity.
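A minimal sketch of the PCA reduction, assuming scikit-learn and a placeholder matrix standing in for the feature vectors of the pre-stored second images:
```python
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA on the (placeholder) feature vectors of the pre-stored second
# images, then project a 1x4096 third feature vector down to 1x256.
prestored_vectors = np.random.rand(10000, 4096)   # illustrative placeholder
pca = PCA(n_components=256).fit(prestored_vectors)

def reduce_dimensions(third_vec: np.ndarray) -> np.ndarray:
    """Reduce a 1x4096 third feature vector to a 1x256 vector."""
    return pca.transform(third_vec)               # shape (1, 256)
```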
At 312, a similarity metric may be determined. In an embodiment, circuitry 202 may be configured to determine a similarity measure between the generated third feature vector associated with the received first image 302A and a fourth feature vector of each second image in the set of pre-stored second images. The set of pre-stored second images may be stored in database 106. In an embodiment, circuitry 202 may be configured to generate a fourth feature vector for each second image in the set of pre-stored second images. For example, circuitry 202 may apply the DNN model 108, the image feature detection model 110, or a combination of both to each second image in the set of pre-stored second images to generate the fourth feature vector of the respective pre-stored second image. In another example, the fourth feature vector of each respective pre-stored second image may be predetermined and pre-stored in database 106 along with the respective pre-stored second image. The similarity measure may be used to determine, from the set of pre-stored second images, an image that is similar to the received first image 302A. In this case, based on the determined similarity measure, the generated third feature vector associated with the received first image 302A may be compared to the fourth feature vector of each second image in the set of pre-stored second images to identify similar images. Examples of the similarity metric may include, but are not limited to, cosine distance similarity or euclidean distance similarity. In cosine distance similarity, a cosine distance between the generated third feature vector associated with the received first image 302A and the fourth feature vector of each second image in the set of pre-stored second images may be determined. For example, if the fourth feature vector of a particular pre-stored second image has a small cosine distance relative to the generated third feature vector, that particular pre-stored second image may be identified as one of the images that is similar to the received first image 302A.
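As a non-limiting illustration, the two similarity metrics named above may be computed as follows; the helper names are introduced only for the example.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two feature vectors (0 means identical direction)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```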
At 314, similar images may be identified. In an embodiment, circuitry 202 may be configured to identify the pre-stored third image as a similar image from the set of pre-stored second images based on the determined similarity measure. For example, circuitry 202 may compare the generated third feature vector associated with the received first image 302A to a fourth feature vector of each of the set of pre-stored second images based on the similarity metric. Based on the similarity metric, if it is determined that the fourth feature vector of a particular pre-stored second image matches the generated third feature vector associated with the received first image 302A, the circuitry 202 may identify the particular pre-stored second image as a pre-stored third image (i.e., a similar image) from the set of pre-stored second images.
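A minimal sketch of the identification step, reusing the hypothetical cosine_distance helper from the previous example, is shown below; returning only the single closest match is an assumption of the example, and a ranked list of candidates could be produced in the same way.

```python
import numpy as np

def find_most_similar(query_vec, stored_vectors):
    """Return the index of the pre-stored fourth feature vector closest to the query."""
    distances = [cosine_distance(query_vec, v) for v in stored_vectors]
    return int(np.argmin(distances))
```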
Circuitry 202 may also be configured to control a display device (such as display device 210) to display information associated with the identified pre-stored third image. The information associated with the identified pre-stored third image may include, but is not limited to, the pre-stored third image itself, metadata associated with the pre-stored third image, a feature map between the third feature vector and the fourth feature vector, a file size of the pre-stored third image, a storage location associated with the pre-stored third image, or a file download path associated with the pre-stored third image. In an embodiment, the identified pre-stored third image may correspond to a pre-stored second video. The pre-stored second video may be associated with a first video. For example, the pre-stored third image may be one of a set of image frames in the pre-stored second video. In an embodiment, the first image 302A may be extracted from the first video. Since the pre-stored third image may be associated with, or similar to, the received first image 302A, the pre-stored second video may be associated with, or similar to, the first video.
As shown in the example of fig. 3, information 314A associated with the identified pre-stored third image is depicted; in this case the pre-stored third image may be identified based on the DNN model 108 and the image feature detection model 110 (such as a SIFT-based model). The information 314A may include a feature map between the generated third feature vector associated with the received first image 302A and the fourth feature vector associated with the identified pre-stored third image. Also shown in the example of fig. 3 is information 314B associated with the identified pre-stored third image, which may be identified based on the DNN model 108 and the image feature detection model 110 (such as a SURF-based model). The information 314B may also include a feature map between the generated third feature vector associated with the received first image 302A and the fourth feature vector associated with the identified pre-stored third image.
As discussed above, the disclosed electronic device 102 may automatically generate a third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. The third feature vector may thus comprise a first set of image features that may be determined by the DNN model 108 and a second set of image features that may be determined by the image feature detection model 110. The first set of image features may include higher-level image features (e.g., facial features such as eyes, nose, ears, and hair) and the second set of image features may include lower-level image features (e.g., points, edges, lines, contours, or basic objects and shapes of the face) associated with the received first image 302A (e.g., an image of a human face). The higher-level image features and the lower-level image features included in the third feature vector may complement each other for the identification of similar images. In some scenarios, the DNN model 108 may not be able to detect and extract all features that may be present in the received first image 302A; some features may be falsely detected, or missed entirely, by the DNN model 108. For example, if the received first image 302A is an image that is not sufficiently represented in the training dataset 112 of the DNN model 108, the first set of image features may be insufficient to identify an image similar to the received first image 302A from the set of pre-stored second images. However, since the second set of image features (i.e., determined by the image feature detection model 110) may include lower-level image features associated with the received first image 302A, including the second set of image features in the third feature vector may further improve the accuracy of identifying images similar to the received first image 302A from the set of pre-stored second images. For example, in situations where image quality is poor (e.g., in the case of low-resolution and blurred images), the first set of image features (i.e., higher-level image features) may be insufficient to identify similar images. In this case, the second set of image features (i.e., lower-level image features) may be more useful and accurate for the identification of similar images.
Fig. 4 is a flowchart illustrating an exemplary method for reverse image search based on a Deep Neural Network (DNN) model and an image feature detection model, according to an embodiment of the present disclosure. Fig. 4 is explained in connection with elements from fig. 1, 2 and 3. Referring to fig. 4, a flow chart 400 is shown. The method illustrated in flowchart 400 may be performed by any computing system, such as by electronic device 102 or circuitry 202. The method may begin at 402 and proceed to 404.
At 404, a first image (such as first image 302A) may be received. In one or more embodiments, circuitry 202 may be configured to receive first image 302A. The receipt of the first image 302A is further described (at 302), for example, in fig. 3.
At 406, a first set of image features (such as first set of image features 304A) associated with the received first image (e.g., first image 302A) may be extracted by a Deep Neural Network (DNN) model (e.g., DNN model 108). In one or more embodiments, circuitry 202 may be configured to extract, by DNN model 108, a first set of image features 304A associated with a received first image 302A. The extraction of the first set of image features 304A is further described (at 304), for example, in fig. 3.
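As a non-limiting illustration of the extraction at 406, a pre-trained convolutional backbone with its classification head removed can serve as a feature extractor. ResNet-50 is only an assumed stand-in; the disclosure does not specify the architecture of the DNN model 108.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Assumed stand-in for DNN model 108: ResNet-50 with the classifier removed,
# so the forward pass yields a feature embedding instead of class scores.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_dnn_features(image_path):
    """Return a 1-D embedding (the 'first feature vector') for an image file."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        return backbone(preprocess(img).unsqueeze(0)).squeeze(0).numpy()
```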
At 408, a first feature vector associated with the received first image 302A may be generated based on the extracted first image feature set 304A. In one or more embodiments, circuitry 202 may be configured to generate a first feature vector associated with the received first image 302A based on the extracted first image feature set 304A. The generation of the first feature vector is further described (at 304), for example, in fig. 3.
At 410, a second set of image features associated with the received first image 302A, such as the second set of image features 306A, may be extracted by an image feature detection model (e.g., the image feature detection model 110). In one or more embodiments, circuitry 202 may be configured to extract, by image feature detection model 110, a second set of image features 306A associated with the received first image 302A. In an example, the image feature detection model 110 may include at least one of a scale-invariant feature transform (SIFT) based model, a speeded-up robust features (SURF) based model, an oriented FAST and rotated BRIEF (ORB) based model, or a fast library for approximate nearest neighbors (FLANN) based model. The extraction of the second set of image features 306A is further described (at 306), for example, in fig. 3.
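A non-limiting sketch of this step using the SIFT option is given below. Averaging the keypoint descriptors into a single fixed-length vector is an assumed pooling step, since the disclosure does not specify how the detected descriptors are aggregated into the second feature vector.

```python
import cv2
import numpy as np

def extract_detector_features(image_path, max_keypoints=500):
    """Pool SIFT keypoint descriptors into one lower-level feature vector."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create(nfeatures=max_keypoints)
    _, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:                      # no keypoints were found
        return np.zeros(128, dtype=np.float32)   # SIFT descriptors are 128-D
    return descriptors.mean(axis=0)              # simple average pooling (assumed)
```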
At 412, a second feature vector associated with the received first image 302A may be generated based on the extracted second image feature set 306A. In one or more embodiments, circuitry 202 may be configured to generate a second feature vector associated with the received first image 302A based on the extracted second image feature set 306A. The generation of the second feature vector is further described (at 306), for example, in fig. 3.
At 414, a third feature vector associated with the received first image 302A may be generated based on a combination of the generated first feature vector and the generated second feature vector. In one or more embodiments, circuitry 202 may be configured to generate a third feature vector associated with received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. In an example, the generation of the third feature vector may also be based on application of a Principal Component Analysis (PCA) transformation to a combination of the generated first feature vector and the generated second feature vector. The generation of the third feature vector is further described (at 308), for example, in fig. 3.
At 416, a similarity measure between the generated third feature vector associated with the received first image 302A and a fourth feature vector of each second image in the set of pre-stored second images may be determined. In one or more embodiments, the circuitry 202 may be configured to determine a similarity measure between the generated third feature vector associated with the received first image 302A and a fourth feature vector of each of the set of pre-stored second images. In an example, the similarity measure may include at least one of cosine distance similarity or euclidean distance similarity, but is not limited to these. The determination of the similarity measure is further described (at 312), for example, in fig. 3.
At 418, a pre-stored third image may be identified from the set of pre-stored second images based on the determined similarity measure. In one or more embodiments, the circuitry 202 may be configured to identify the pre-stored third image from the set of pre-stored second images based on the determined similarity measure. The identification of the pre-stored third image is further described (at 314), for example, in fig. 3.
At 420, a display device (such as display device 210) may be controlled to display information associated with the identified pre-stored third image. In one or more embodiments, circuitry 202 may be configured to control display device 210 to display information associated with the identified pre-stored third image. Control of the display device 210 is further described (at 314), for example, in fig. 3. Control may pass to the end.
Although flowchart 400 is illustrated as discrete operations, such as 404, 406, 408, 410, 412, 414, 416, 418, and 420, the disclosure is not so limited. Thus, in certain embodiments, these discrete operations may be further divided into additional operations, combined into fewer operations, or eliminated, depending on the particular implementation, without departing from the essence of the disclosed embodiments.
Various embodiments of the present disclosure may provide a non-transitory computer-readable medium and/or storage medium having stored thereon instructions executable by a machine and/or a computer (such as electronic device 102). The instructions may cause the machine and/or computer to perform operations including receiving a first image, such as first image 302A. The operations may also include extracting, by a Deep Neural Network (DNN) model (e.g., DNN model 108), a first set of image features (e.g., first set of image features 304A) associated with the received first image 302A. The operations may also include generating a first feature vector associated with the received first image 302A based on the extracted first image feature set 304A. The operations may also include extracting, by an image feature detection model, such as image feature detection model 110, a second set of image features (e.g., second set of image features 306A) associated with the received first image 302A. The operations may also include generating a second feature vector associated with the received first image 302A based on the extracted second image feature set 306A. The operations may also include generating a third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. The operations may further include determining a similarity measure between the generated third feature vector associated with the received first image and a fourth feature vector of each second image in the set of pre-stored second images. The operations may further include identifying a pre-stored third image from the set of pre-stored second images based on the determined similarity measure. The operations may also include controlling a display device (such as display device 210) to display information associated with the identified pre-stored third image.
Exemplary aspects of the present disclosure may provide an electronic device (such as electronic device 102 of fig. 1) that includes circuitry (such as circuitry 202). Circuitry 202 may be configured to receive first image 302A. Circuitry 202 may be configured to extract a first set of image features associated with a received first image 302A, such as first set of image features 304A, through a Deep Neural Network (DNN) model 108. Circuitry 202 may be configured to generate a first feature vector associated with the received first image 302A based on the extracted first image feature set 304A. Circuitry 202 may be configured to extract a second set of image features (such as second set of image features 306A) associated with the received first image 302A via image feature detection model 110. Circuitry 202 may be configured to generate a second feature vector associated with the received first image 302A based on the extracted second image feature set 306A. Circuitry 202 may be configured to generate a third feature vector associated with the received first image 302A based on a combination of the generated first feature vector and the generated second feature vector. Circuitry 202 may be configured to determine a similarity measure between the generated third feature vector associated with the received first image 302A and a fourth feature vector of each second image in the set of pre-stored second images. Circuitry 202 may be configured to identify a pre-stored third image from the set of pre-stored second images based on the determined similarity measure. Circuitry 202 may be configured to control display device 210 to display information associated with the identified pre-stored third image.
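Chaining the hypothetical helpers sketched earlier in this description (extract_dnn_features, extract_detector_features, combine_feature_vectors, cosine_distance, and find_most_similar, all names introduced only for illustration) gives a rough end-to-end picture of this exemplary flow; the file paths, the quality score, and the omission of the optional PCA step are likewise assumptions of the example.

```python
# Rough end-to-end flow using the hypothetical helpers sketched above.
database_paths = ["stored_0.jpg", "stored_1.jpg", "stored_2.jpg"]

def third_vector(path, quality_score=0.7):
    """Combined (third/fourth) feature vector for one image file."""
    return combine_feature_vectors(
        extract_dnn_features(path),
        extract_detector_features(path),
        quality_score,
    )

# Fourth feature vectors for the set of pre-stored second images.
stored_vectors = [third_vector(p) for p in database_paths]

# Third feature vector for the received first image, followed by retrieval.
query_vector = third_vector("received_first_image.jpg")
best_index = find_most_similar(query_vector, stored_vectors)
print("Most similar pre-stored image:", database_paths[best_index])
```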
According to an embodiment, the image feature detection model 110 may include at least one of a scale-invariant feature transform (SIFT) based model, an acceleration robust feature (SURF) based model, a directional FAST and rotational BRIEF (ORB) based model, or a FAST approximate nearest neighbor (FLANN) based model, but is not limited to these.
According to an embodiment, the generation of the third feature vector may also be based on an application of a Principal Component Analysis (PCA) transformation to a combination of the generated first feature vector and the generated second feature vector. According to an embodiment, the similarity measure may include at least one of cosine distance similarity or euclidean distance similarity, but is not limited to these.
According to an embodiment, circuitry 202 may be further configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector by a machine learning model (such as machine learning model 316) different from DNN model 108 and image feature detection model 110. Circuitry 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight. Circuitry 202 may be configured to generate a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
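A minimal sketch of how a separate machine learning model might output the two weights is shown below; the linear-plus-softmax form, the parameters W and b, and the input quality_features are assumptions, since the disclosure does not describe the architecture of machine learning model 316.

```python
import numpy as np

def predict_weights(quality_features, W, b):
    """Softmax over two scores yields the first and second weights (they sum to 1)."""
    scores = W @ np.asarray(quality_features) + b   # W: (2, d) assumed trained parameters
    exp = np.exp(scores - scores.max())             # numerically stable softmax
    w_first, w_second = exp / exp.sum()
    return float(w_first), float(w_second)
```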
According to an embodiment, circuitry 202 may be further configured to receive a user input comprising a first weight associated with the generated first feature vector and comprising a second weight associated with the generated second feature vector. Circuitry 202 may also be configured to combine the generated first feature vector and the generated second feature vector based on the received user input. Circuitry 202 may be configured to generate a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
According to an embodiment, circuitry 202 may be further configured to classify, by DNN model 108, the received first image 302A as a first image tag from a set of image tags associated with DNN model 108. Circuitry 202 may be configured to determine a first count of images associated with the first image tag in a training dataset associated with DNN model 108, such as training dataset 112. Circuitry 202 may be further configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined first count of images associated with the first image tag. Circuitry 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight. Circuitry 202 may be configured to generate a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
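As a non-limiting illustration of weighting based on how well the predicted image tag is represented in the training dataset, the example below uses a cut-off of 1000 training images and a linear ramp; both are assumptions, since the disclosure only states that the weights are based on the determined first count.

```python
def weights_from_tag_count(first_count, well_represented=1000):
    """Trust the DNN features more when the predicted tag is common in training data."""
    w_first = min(first_count / well_represented, 1.0)  # illustrative linear ramp
    return w_first, 1.0 - w_first
```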
According to an embodiment, the circuitry 202 may be configured to determine an image quality score associated with the received first image 302A based on at least one of the extracted first image feature set 304A or the extracted second image feature set 306A. Circuitry 202 may be further configured to determine a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined image quality score. Circuitry 202 may be further configured to combine the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight. Circuitry 202 may be configured to generate a third feature vector based on a combination of the generated first feature vector and the generated second feature vector. According to an embodiment, the image quality score may correspond to at least one of sharpness, noise, dynamic range, tone reproduction, contrast, color saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moire, or artifacts associated with the received first image 302A, but is not limited to these.
According to an embodiment, circuitry 202 may be further configured to extract the first image 302A from a first video, and the identified pre-stored third image may correspond to a pre-stored second video. The pre-stored second video may be associated with the first video.
The present disclosure may be implemented in hardware or a combination of hardware and software. The present disclosure may be realized in a centralized fashion in at least one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any computer system or other apparatus adapted to carry out the methods described herein may be suited. The combination of hardware and software may be a general purpose computer system with a computer program that, when loaded and executed, may control the computer system such that it carries out the methods described herein. The present disclosure may be implemented in hardware comprising a portion of an integrated circuit that also performs other functions.
The present disclosure may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. In the present context, a computer program refers to any expression, in any language, code, or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code, or notation; b) reproduction in a different material form.
While the disclosure has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the disclosure. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiments disclosed, but that the disclosure will include all embodiments falling within the scope of the appended claims.

Claims (20)

1. An electronic device, comprising:
circuitry configured to:
receiving a first image;
extracting, by a Deep Neural Network (DNN) model, a first set of image features associated with the received first image;
generating a first feature vector associated with the received first image based on the extracted first image feature set;
extracting, by the image feature detection model, a second set of image features associated with the received first image;
generating a second feature vector associated with the received first image based on the extracted second image feature set;
generating a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector;
determining a similarity measure between the generated third feature vector associated with the received first image and a fourth feature vector of each second image in the set of pre-stored second images;
identifying a pre-stored third image from the set of pre-stored second images based on the determined similarity measure; and
controlling a display device to display information associated with the identified pre-stored third image.
2. The electronic device of claim 1, wherein the image feature detection model comprises at least one of a scale-invariant feature transform (SIFT) based model, a speeded-up robust features (SURF) based model, an oriented FAST and rotated BRIEF (ORB) based model, or a fast library for approximate nearest neighbors (FLANN) based model.
3. The electronic device of claim 1, wherein the generation of the third feature vector is further based on application of a Principal Component Analysis (PCA) transform to a combination of the generated first feature vector and the generated second feature vector.
4. The electronic device of claim 1, wherein the similarity measure comprises at least one of cosine distance similarity or euclidean distance similarity.
5. The electronic device of claim 1, wherein the circuitry is further configured to:
determining, by a machine learning model different from the DNN model and the image feature detection model, a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
6. The electronic device of claim 1, wherein the circuitry is further configured to:
receiving a user input comprising a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combining the generated first feature vector and the generated second feature vector based on the received user input; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
7. The electronic device of claim 1, wherein the circuitry is further configured to:
classifying, by the DNN model, the received first image as a first image tag in a set of image tags associated with the DNN model;
determining a first count of images associated with the first image tag in a training dataset associated with the DNN model;
determining a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined first count of images associated with the first image tag;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
8. The electronic device of claim 1, wherein the circuitry is further configured to:
determining an image quality score associated with the received first image based on at least one of the extracted first image feature set or the extracted second image feature set;
determining a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined image quality score;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
9. The electronic device of claim 8, wherein the image quality score corresponds to at least one of sharpness, noise, dynamic range, tone reproduction, contrast, color saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moire, or artifacts associated with the received first image.
10. The electronic device of claim 1, wherein the circuitry is further configured to extract the first image from a first video, wherein the identified pre-stored third image corresponds to a pre-stored second video, and wherein the pre-stored second video is associated with the first video.
11. A method, comprising:
in an electronic device:
receiving a first image;
extracting, by a Deep Neural Network (DNN) model, a first set of image features associated with the received first image;
generating a first feature vector associated with the received first image based on the extracted first image feature set;
extracting, by the image feature detection model, a second set of image features associated with the received first image;
generating a second feature vector associated with the received first image based on the extracted second image feature set;
generating a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector;
determining a similarity measure between the generated third feature vector associated with the received first image and a fourth feature vector of each second image in the set of pre-stored second images;
identifying a pre-stored third image from the set of pre-stored second images based on the determined similarity measure; and
controlling a display device to display information associated with the identified pre-stored third image.
12. The method of claim 11, wherein the image feature detection model comprises at least one of a scale-invariant feature transform (SIFT) based model, a speeded-up robust features (SURF) based model, an oriented FAST and rotated BRIEF (ORB) based model, or a fast library for approximate nearest neighbors (FLANN) based model.
13. The method of claim 11, wherein the generation of the third feature vector is further based on application of a Principal Component Analysis (PCA) transform to a combination of the generated first feature vector and the generated second feature vector.
14. The method of claim 11, wherein the similarity measure comprises at least one of cosine distance similarity or euclidean distance similarity.
15. The method of claim 11, further comprising:
determining, by a machine learning model different from the DNN model and the image feature detection model, a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
16. The method of claim 11, further comprising:
receiving a user input comprising a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector;
combining the generated first feature vector and the generated second feature vector based on the received user input; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
17. The method of claim 11, further comprising:
classifying, by the DNN model, the received first image as a first image tag in a set of image tags associated with the DNN model;
determining a first count of images associated with the first image tag in a training dataset associated with the DNN model;
determining a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined first count of images associated with the first image tag;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
18. The method of claim 11, further comprising:
determining an image quality score associated with the received first image based on at least one of the extracted first image feature set or the extracted second image feature set;
determining a first weight associated with the generated first feature vector and a second weight associated with the generated second feature vector based on the determined image quality score;
combining the generated first feature vector and the generated second feature vector based on the determined first weight and the determined second weight; and
generating a third feature vector based on a combination of the generated first feature vector and the generated second feature vector.
19. The method of claim 18, wherein the image quality score corresponds to at least one of sharpness, noise, dynamic range, tone reproduction, contrast, color saturation, distortion, vignetting, exposure accuracy, chromatic aberration, lens flare, color moire, or artifacts associated with the received first image.
20. A non-transitory computer-readable medium storing computer-executable instructions that, when executed by an electronic device, cause the electronic device to perform operations comprising:
receiving a first image;
extracting, by a Deep Neural Network (DNN) model, a first set of image features associated with the received first image;
generating a first feature vector associated with the received first image based on the extracted first image feature set;
extracting, by the image feature detection model, a second set of image features associated with the received first image;
generating a second feature vector associated with the received first image based on the extracted second image feature set;
generating a third feature vector associated with the received first image based on a combination of the generated first feature vector and the generated second feature vector;
determining a similarity measure between the generated third feature vector associated with the received first image and a fourth feature vector of each second image in the set of pre-stored second images;
identifying a pre-stored third image from the set of pre-stored second images based on the determined similarity measure; and
controlling a display device to display information associated with the identified pre-stored third image.
CN202280007041.6A 2021-05-18 2022-05-18 Reverse image search based on Deep Neural Network (DNN) model and image feature detection model Pending CN116420143A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202163189956P 2021-05-18 2021-05-18
US63/189,956 2021-05-18
US17/482,290 US11947631B2 (en) 2021-05-18 2021-09-22 Reverse image search based on deep neural network (DNN) model and image-feature detection model
US17/482,290 2021-09-22
PCT/IB2022/054647 WO2022243912A1 (en) 2021-05-18 2022-05-18 Reverse image search based on deep neural network (dnn) model and image-feature detection model

Publications (1)

Publication Number Publication Date
CN116420143A true CN116420143A (en) 2023-07-11

Family

ID=81927630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280007041.6A Pending CN116420143A (en) 2021-05-18 2022-05-18 Reverse image search based on Deep Neural Network (DNN) model and image feature detection model

Country Status (4)

Country Link
EP (1) EP4323892A1 (en)
JP (1) JP2024519504A (en)
CN (1) CN116420143A (en)
WO (1) WO2022243912A1 (en)

Also Published As

Publication number Publication date
EP4323892A1 (en) 2024-02-21
JP2024519504A (en) 2024-05-14
WO2022243912A1 (en) 2022-11-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination