WO2021213339A1 - Method and system for extracting and storing image metadata - Google Patents

Method and system for extracting and storing image metadata

Info

Publication number
WO2021213339A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
coordinates
present disclosure
communication device
processing unit
Prior art date
Application number
PCT/CN2021/088186
Other languages
English (en)
Inventor
Tiwari VIPIN
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Publication of WO2021213339A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques

Definitions

  • the present invention generally relates to the field of image data processing and, more particularly, to a method and system for extracting and storing image metadata.
  • the rapid enhancement of communication technology has enabled users to extract relevant information from an image or a picture.
  • the image may be a captured image, a downloaded image, or a screenshot taken on a communication device.
  • these technologies (e.g., machine learning, artificial intelligence, and image processing) have enabled users to perform a number of tasks based on the information extracted from an image.
  • a user may, for example, use the extracted data of an image for searching through images, exploring places, getting a translation of text in the image, and much more.
  • the conventional methods and systems process an image multiple times, i.e., each time a query is received, such as a search for an image, the existing solutions process all images to identify information associated with them so that the search can be performed on the extracted information. This makes the process inefficient and time-consuming.
  • the system or method will process the image to detect the object only; further, the same image will be processed again if the user wants a translation of any text in the image.
  • therefore, the conventional methods and systems used for image processing are not efficient.
  • an object of the present disclosure is to provide a novel method and system for extracting and storing image metadata. It is another object of the present disclosure to identify at least one object in the received image. It is yet another object of the present disclosure to identify and store the coordinates of the at least one identified object. It is yet another object of the present disclosure to prevent reprocessing of the image when performing various tasks associated with the image, such as cropping/editing of different features of the image.
  • the present disclosure provides a method and system for extracting and storing image metadata.
  • One aspect of the present invention relates to a method of extracting and storing image metadata.
  • the method comprises receiving, at a transceiver unit, at least one image.
  • the method thereafter comprises identifying, by a processing unit, at least one object in each of the at least one image.
  • the method then encompasses determining, by the processing unit, a set of coordinates associated with each of the at least one object.
  • the method further comprises storing, at a storage unit, the at least one object and the set of coordinates associated with each of the at least one object.
  • the system comprises a transceiver unit, a processing unit, and a storage unit.
  • the transceiver unit is configured to receive at least one image.
  • the processing unit is configured to identify at least one object in each of the at least one image and to determine a set of coordinates associated with each of the at least one object.
  • the storage unit is configured to store the at least one object and the set of coordinates associated with each of the at least one object.
  • the communication device comprises a system.
  • the system is configured to receive at least one image. Thereafter, the system is configured to identify at least one object in each of the at least one image and to determine a set of coordinates associated with each of the at least one object. Also, the system is configured to store the at least one object and the set of coordinates associated with each of the at least one object.
  • FIG. 1 illustrates a block diagram of the system [100] for extracting and storing image metadata, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 2 illustrates an exemplary method [200] of extracting and storing image metadata, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 3 illustrates an exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 4 illustrates another exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 5 illustrates yet another exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 6 illustrates yet another exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • the present disclosure provides a solution for analysing, extracting and storing image metadata. More specifically, the present disclosure provides a method and system for storing the coordinates or location of identified objects present within an image. The present disclosure also provides a more reliable and secure solution for analysing the extracted metadata to perform various tasks, such as finding the accurate region for cropping or editing detected objects without recalculating or processing the image again and again.
  • the present disclosure first allows receiving of at least one image from various sources and then performs identification of at least one object present in the received image. Further, the present disclosure encompasses identification of the location of the identified objects; based on the location of an identified object, the present disclosure allows determining a set of coordinates associated with it. Next, the present disclosure allows storing of the determined set of coordinates associated with the identified object for future use, such as cropping objects from the image, editing the image, translating identified text into a different language, content-based shopping, and the like.
  • “user device” and/or “communication device” may be any electrical, electronic, electromechanical or computing device or equipment having one or more transceiver units installed on it.
  • the communication device may include, but is not limited to, a mobile phone, smart phone, laptop, a general-purpose computer, desktop, personal digital assistant, tablet computer, wearable device or any other computing device which is capable of implementing the features of the present disclosure and is obvious to a person skilled in the art.
  • a “processing unit” or “processor” includes one or more processors, wherein processor refers to any logic circuitry for processing instructions.
  • a processor may be a general-purpose processor, a special purpose processor, a conventional processor, a digital signal processor, a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits, Field Programmable Gate Array circuits, any other type of integrated circuits, etc.
  • the processor may perform signal coding, data processing, input/output processing, and/or any other functionality that enables the working of the system according to the present disclosure. More specifically, the processor or processing unit is a hardware processor.
  • the “transceiver unit” may include, but is not limited to, a transmitter to transmit a set of data or a receiver to receive a set of data. Further, the transceiver unit may include any other similar unit, obvious to a person skilled in the art, to implement the features of the present disclosure. In the context of the present disclosure, the transceiver unit acts as a receiver to receive at least one image.
  • storage unit refers to a machine or computer-readable medium including any mechanism for storing information in a form readable by a computer or similar machine.
  • a computer-readable medium includes read-only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory devices or other types of machine-accessible storage media.
  • Metadata refers to “data about data”.
  • metadata can describe the structure, content, history and intended usage of the data with which it is associated.
  • the metadata allows for a communication device to at least partially understand the context and meaning of the image.
  • the examples of image metadata include, but are not limited to, the objects in an image, the location coordinates of the objects, etc.
  • FIG. 1 illustrates a system [100] for extracting and storing image metadata, in accordance with exemplary embodiment of the present disclosure.
  • the system [100] comprises at least one transceiver unit [102], at least one processing unit [104] and at least one storage unit [106], wherein all the components are assumed to be connected to each other unless otherwise indicated below.
  • the system [100] may comprise multiple such units and modules, or any number of said units and modules obvious to a person skilled in the art to implement the features of the present disclosure.
  • the transceiver unit [102] of the present disclosure may include, but is not limited to, one or more transmitters for the transmission of data, one or more receivers for receiving data, and any other similar units obvious to a person skilled in the art, to implement the features of the present disclosure.
  • the transceiver unit [102] acts as a receiver to receive data (at least one image) from one or more sources.
  • the transceiver unit [102] is configured to receive one or more media files.
  • the media files include, but are not limited to, an image file or jpg file, a gif file, a video file, an audio file and the like.
  • the transceiver unit [102] is configured to receive at least one image.
  • the transceiver unit [102] may receive at least one image through the camera of the communication device.
  • the camera of the communication device facilitates capturing the at least one image.
  • the transceiver unit [102] may receive at least one image through the internet access associated with the communication device.
  • the internet access unit associated with the communication device allows downloading of the at least one image from the Internet.
  • the transceiver unit [102] may receive at least one image through a screenshot generating unit associated with the communication device.
  • the screenshot generating unit allows a user to take a screenshot of liked items or image files on the communication device.
  • a screenshot or screen capture is defined as a way of taking a photo of the screen of the communication device.
  • the system [100] includes the processing unit [104] .
  • the processing unit [104] of the present disclosure is connected to the transceiver unit [102] .
  • the processing unit [104] is configured to process the received at least one image.
  • the processing unit [104] is configured to analyse the received image and to extract the features or metadata associated with the received image.
  • the processing unit [104] analyses the received image using machine learning models, artificial intelligence techniques, image processing techniques, computer vision techniques and the like.
  • the processing of the received image includes extracting features associated with the received image, detecting objects present in the received image, detecting a set of coordinates of the identified objects, identifying the location of the coordinates of the identified objects and the like.
  • the processing unit [104] identifies at least one object in the received image based on one or more pre-defined objects.
  • the one or more pre-defined objects may include stored parameters/categories such as particular shape, structure, vehicle, human, building, vegetable, fruits, locations and the like.
  • the processing unit [104] is further configured to classify the identified at least one object into one or more pre-defined categories.
  • the one or more pre-defined categories may include category of vehicles, fruits, vegetables, furniture, humans, animals, walls, etc.
  • the processing unit [104] is further configured to determine a set of coordinates associated with each of the identified at least one object in the received image.
  • the set of coordinates associated with each of the identified at least one object in the received image are determined based on the identification of a location of the object in the received image.
  • the processing unit determines a set of coordinates (x1, y1) & (x2, y2) of an apple identified as an object in the received image.
  • the processing unit [104] determines a set of coordinates (x1, y1) & (x2, y2) of a car identified as a first object in the received image and a set of coordinates (x3, y3) & (x4, y4) of a text banner identified as a second object in the received image.
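As an illustrative sketch of how such per-object coordinate records might be represented, the following Python fragment stores each identified object together with the two corner coordinates described above. The class and field names are assumptions for illustration only; the disclosure does not prescribe a data layout.

```python
from dataclasses import dataclass, asdict

# Hypothetical record type for an identified object and its bounding
# coordinates; the names are illustrative, not from the disclosure.
@dataclass
class DetectedObject:
    label: str            # pre-defined category, e.g. "car"
    top_left: tuple       # (x1, y1)
    bottom_right: tuple   # (x2, y2)

# Two objects identified in one received image, as in the example above.
detections = [
    DetectedObject("car", (120.0, 80.0), (340.0, 260.0)),
    DetectedObject("text_banner", (20.0, 300.0), (600.0, 380.0)),
]

# Converted to plain dictionaries, the records can be stored alongside
# the image as metadata.
records = [asdict(d) for d in detections]
```

Each record carries enough information to locate the object again without re-running detection.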
  • the processing unit [104] is further configured to modify the set of coordinates associated with the identified at least one object based on a resolution of the received image.
  • the system further includes the storage unit [106] .
  • the storage unit [106] of the present disclosure is connected to the processing unit [104] and the transceiver unit [102] .
  • the storage unit [106] is configured to store the identified at least one object and the determined set of coordinates associated with each of the identified at least one object.
  • the storage unit [106] stores the identified object and the set of coordinates associated with each identified object to prevent the received image from being processed again and again when performing certain activities.
  • the stored set of coordinates of the identified object facilitates at least one of processing the image, extracting objects from the image, searching for an identified object, cropping an identified object from the image, generating a translation and the like.
  • the stored set of coordinates of the identified objects and the resolution of the image help in scaling up or scaling down to identify the region of the identified objects on different communication devices or devices with different resolutions.
  • in FIG. 2, an exemplary method flow diagram [200] depicting the method of extracting and storing image metadata, in accordance with an exemplary embodiment of the present disclosure, is shown. As shown in FIG. 2, the method begins at step [202].
  • the method comprises receiving, at a transceiver unit [102] , at least one image.
  • the at least one image may include an image taken by the camera of the communication device.
  • the at least one image may include a screenshot taken by the user using communication device.
  • the at least one image may include an image downloaded from various sources like web browser/social media applications (e.g., Google, Whatsapp, Facebook, Wechat, Twitter) .
  • the method comprises identifying, by a processing unit [104] , at least one object in each of the at least one image.
  • the identification of the at least one object in each of the at least one image is done by first analyzing the image for object detection using machine learning algorithms and image processing techniques.
  • the identification of the at least one object in each of the at least one image is done based on one or more pre-defined objects saved in the object detection library.
  • the processing unit [104] is configured to classify the at least one object into one or more pre-defined categories.
  • the identification of objects and the classification of the identified objects are done using machine learning algorithms by utilizing the object detection library.
  • the method encompasses determining, by the processing unit [104] , a set of coordinates associated with each of the at least one object.
  • the determined set of coordinates facilitates identifying a location of the at least one object in each of the at least one image.
  • the processing unit [104] is configured to modify the set of coordinates based on a resolution of the at least one image, so that the accurate region of the identified object can be accessed even on a different communication device by using scaling up/scaling down techniques.
  • the processing unit [104] stores the coordinates of the identified at least one object, resolution of the image and screen dimension of the source communication device.
  • the processing unit [104] stores the coordinates of an object [ (xi1, yi1) , (xi2, yi2) ] , height (h1) and width (w2) of the screen dimension of the source communication device. Further, the processing unit [104] makes use of the stored coordinates of the identified object, height, width of the screen of the source and destination communication device to perform transformation of the coordinates for the mapping of the coordinates on the destination communication device.
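The coordinate transformation described above can be sketched as a simple linear scaling between the source and destination screen dimensions. The function name and the (width, height) convention are illustrative assumptions, not part of the disclosure:

```python
def map_coordinates(coords, src_size, dst_size):
    """Scale a stored pair of object coordinates from the source
    device's screen dimensions (width, height) to the destination
    device's dimensions, so the object region can be located without
    re-processing the image."""
    (x1, y1), (x2, y2) = coords
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    return (x1 * sx, y1 * sy), (x2 * sx, y2 * sy)

# Coordinates stored on a 1080x1920 source device, mapped onto a
# 540x960 destination device (a uniform 0.5x scale).
mapped = map_coordinates(((100, 200), (500, 900)), (1080, 1920), (540, 960))
# mapped == ((50.0, 100.0), (250.0, 450.0))
```

A real implementation might scale width and height by different factors when the aspect ratios differ, exactly as the independent sx/sy factors allow here.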
  • the method at step [210] leads to storing, at a storage unit [106], the at least one object and the set of coordinates associated with each of the at least one object. After successfully extracting and storing the image metadata, the method terminates at step [212].
  • the received image is an image captured using camera of the communication device.
  • the captured image is first analyzed to detect the various objects present in it using machine learning models, artificial intelligence techniques, image processing techniques, computer vision, object detection libraries and the like.
  • a set of coordinates is determined for each of the identified at least one object in the captured image.
  • the set of coordinates for the identified objects are determined as per the screen resolution of the communication device used for capturing the image.
  • the identified set of coordinates is saved as meta values in a defined structure for future use.
  • the object detection library includes different colors, shapes, sizes, dimensions, and names of fruits.
  • the method facilitates detection of apples in the image based on the mapping between the objects, i.e., the parameters of apples disclosed in the object detection library, and the picture of apples present in the image captured by the user.
  • the method determines the location or coordinates of the apples detected in the captured image and then stores the determined coordinates as meta values in a defined structure, which may be used in the future for various purposes such as shopping for fruit or identifying the name of the fruit.
  • the insertion of the metadata includes a first step of receiving the source image and a second step of analyzing the image for object detection.
  • the source image is a captured image taken by a camera.
  • the source image is a screenshot taken with the help of the communication device.
  • the source image is a downloaded image. If an object is detected in the received source image, the corresponding coordinates of the object are determined and saved in the metadata of the image; otherwise, null values are saved in the metadata of the image when no object is identified in the received source image.
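The insertion flow above (detect objects once, save their labels and coordinates in the metadata, or save null values when nothing is detected) can be sketched as follows. The detector here is a stand-in for a real machine-learning object-detection model, and the dictionary layout of the metadata is an assumption for illustration:

```python
def build_image_metadata(detector, image):
    """Run object detection once and build the metadata entry: labels
    and coordinates when objects are found, null values otherwise."""
    objects = detector(image)
    if not objects:
        return {"objects": None, "coordinates": None}
    return {
        "objects": [label for label, _ in objects],
        "coordinates": [coords for _, coords in objects],
    }

# Stand-in detectors; a real system would invoke an ML object-detection
# model here rather than a hard-coded lambda.
apple_detector = lambda img: [("apple", ((10, 10), (60, 55)))]
empty_detector = lambda img: []

meta = build_image_metadata(apple_detector, image=None)
no_meta = build_image_metadata(empty_detector, image=None)
```

Because detection runs exactly once per image, later tasks (search, cropping, translation) only read the stored entry.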
  • the stored image metadata such as the set of coordinates of the identified or detected objects are used for a number of tasks such as for the purpose of editing/cropping of image part, for the purpose of editing/cropping of text part, for the purpose of search content based on category, for the purpose of translation of text and the like.
  • FIG. 3 illustrates an exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 3 displays extraction of the image part from the picture based on the stored metadata associated with the image part.
  • the first picture displayed in FIG. 3 includes an image part and a text part.
  • if the user wants to crop/edit only the image part of the picture, the present disclosure fetches the corresponding metadata associated with the image part and provides the user with the image part extracted from the picture, to perform further operations on it.
  • the picture can be a banner having a photo of a person and some text written at one side of the picture.
  • the user can provide inputs for the extraction of the image part by choosing an image extraction option, i.e., I(n), on the user equipment, wherein pressing or choosing the I(n) button will display the number of image parts present in the picture and provide the extracted images from the picture one by one (shown in the second picture of FIG. 3).
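Cropping an image part from the stored coordinates, as in this use case, reduces to slicing the pixel region out of the image without re-running detection. A minimal pure-Python sketch, with the image represented as a row-major list of pixel rows (a real implementation would use an image library; the function name is illustrative):

```python
def crop_region(pixels, coords):
    """Crop the rectangular region given by the stored metadata
    coordinates out of an image, represented here as a row-major list
    of pixel rows; no object detection is re-run."""
    (x1, y1), (x2, y2) = coords
    return [row[x1:x2] for row in pixels[y1:y2]]

# A toy 4x4 "image" whose pixels record their own (row, col) position;
# the stored metadata says the image part spans (1, 1) to (3, 3).
img = [[(r, c) for c in range(4)] for r in range(4)]
part = crop_region(img, ((1, 1), (3, 3)))
# part == [[(1, 1), (1, 2)], [(2, 1), (2, 2)]]
```

The same slicing applies to the text-part extraction of FIG. 4, just with the text block's stored coordinates.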
  • FIG. 4 illustrates another exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 4 displays extraction of the text part from the picture based on the stored metadata associated with the text part.
  • the first picture displayed in FIG. 4 includes an image part and a text part.
  • if the user wants to crop/edit only the text part of the picture, the present disclosure fetches the corresponding metadata associated with the text part and provides the user with the text part extracted from the picture, to perform further operations on it.
  • the picture can be a banner having a photo of a person and some text written at one side of the picture.
  • the user can provide inputs for the extraction of the text part by choosing a text extraction option, i.e., t(n), on the user equipment, wherein pressing or choosing the t(n) button will display the number of text parts/blocks present in the picture and provide the extracted text parts from the picture one by one (shown in the second picture of FIG. 4).
  • FIG. 5 illustrates yet another exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 5 displays an example of performing a search based on user-specific content or a user request.
  • the first picture displayed in the FIG. 5 includes an image part and a text part.
  • if the user wants to perform an image-based search, then, based on the metadata associated with the image part of the picture, the present disclosure enables the user to perform a search specific only to the image part of the picture and provides similar images or objects on searching/shopping platforms (displayed in the second picture of FIG. 5).
  • the picture has two image parts, one of which is an image of a Colgate product and the other an image of a perfume.
  • if the user wants to buy the same perfume displayed in the picture, the user, with the help of the present disclosure, extracts the image part containing the perfume and then performs a search on one or more shopping platforms to purchase that specific perfume.
  • FIG. 6 illustrates yet another exemplary use case of the present disclosure, in accordance with exemplary embodiment of the present disclosure.
  • FIG. 6 displays translation of the text part based on the metadata associated with the text part.
  • the first picture displayed in FIG. 6 includes an image part and a text part.
  • if the user wants a translation of the text part, the present disclosure first fetches the corresponding metadata associated with the text part and provides the user with the text part extracted from the picture, to perform the further operation of translating it.
  • a user X captures an image of a banner.
  • the image of the banner includes a picture of a person and text written in Chinese language.
  • picture of a person is detected as a first object and the text in Chinese language is detected as a second object.
  • the coordinates of the first object and the second object are saved in the image metadata in a database.
  • the user wants English translation of the text written in Chinese language.
  • the present method and system facilitate fetching only the saved coordinates of the second object, i.e., the text portion, without processing the whole image again, and then provide an English translation of the Chinese text portion using any translation engine. Therefore, the method and system for extracting and storing image metadata facilitate identifying the text region for translation, saving the translation engine the time of identifying the text area again, because the text coordinates are present in the image metadata.
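The translation flow above can be sketched as follows. The `ocr` and `translate` callables are hypothetical stand-ins for a real OCR engine and translation engine, and the metadata key name is an assumption for illustration:

```python
def translate_text_region(metadata, ocr, translate):
    """Fetch only the stored coordinates of the text object from the
    image metadata (no full re-processing of the image), read the text
    in that region, and return its translation."""
    coords = metadata["text_coordinates"]
    text = ocr(coords)      # read only the stored text region
    return translate(text)  # hand the text to a translation engine

# Stored metadata for the banner image: the Chinese text block was the
# second detected object.
meta = {"text_coordinates": ((10, 20), (200, 60))}

# Stand-ins for a real OCR engine and translation engine.
result = translate_text_region(
    meta,
    ocr=lambda coords: "你好",
    translate=lambda s: "hello" if s == "你好" else s,
)
# result == "hello"
```

The point of the sketch is the data flow: the translation engine never sees the full image, only the text recovered from the stored region.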
  • one or more aspects of the present disclosure relate to a communication device for extracting and storing image metadata.
  • the communication device includes the system [100] , wherein the system is configured to receive at least one image.
  • the system is further configured to identify at least one object in each of the at least one image.
  • the system is configured to determine a set of coordinates associated with each of the at least one object.
  • the system is configured to store the at least one object and the set of coordinates associated with each of the at least one object.

Abstract

The present invention relates to methods and a system for extracting and storing image metadata. The method receives at least one image. The method then identifies at least one object in the image or in each of the images. Further, the method determines a set of coordinates associated with the object or with each of the objects. The method then stores the object or objects and the set of coordinates associated with the object or with each of the objects.
PCT/CN2021/088186 2020-04-22 2021-04-19 Method and system for extracting and storing image metadata WO2021213339A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202041017213 2020-04-22
IN202041017213 2020-04-22

Publications (1)

Publication Number Publication Date
WO2021213339A1 (fr) 2021-10-28

Family

ID=78270248

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088186 WO2021213339A1 (fr) 2020-04-22 2021-04-19 Method and system for extracting and storing image metadata

Country Status (1)

Country Link
WO (1) WO2021213339A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022988A (zh) * 2015-06-24 2015-11-04 福州瑞芯微电子有限公司 Method and device for recognizing electronic manuscripts
US20170024791A1 (en) * 2007-11-20 2017-01-26 Theresa Klinger System and method for interactive metadata and intelligent propagation for electronic multimedia
CN107710280A (zh) * 2015-05-26 2018-02-16 “实验室24”股份有限公司 Object visualization method
US20180308271A1 (en) * 2015-04-13 2018-10-25 International Business Machines Corporation Synchronized display of street view map and video stream
WO2019036309A1 (fr) * 2017-08-14 2019-02-21 Amazon Technologies, Inc. Selective identity recognition utilizing object tracking

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21792305

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21792305

Country of ref document: EP

Kind code of ref document: A1