WO2022149153A1 - A system for informative scanning - Google Patents

A system for informative scanning

Info

Publication number
WO2022149153A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
captured
computing device
visual information
sound
Application number
PCT/IN2021/050603
Other languages
French (fr)
Inventor
Lipika DAS SINHA
Original Assignee
Das Sinha Lipika
Application filed by Das Sinha Lipika filed Critical Das Sinha Lipika
Publication of WO2022149153A1 publication Critical patent/WO2022149153A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/00127 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture
    • H04N1/00204 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a digital computer or a digital computer system, e.g. an internet server
    • H04N1/00244 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a digital computer or a digital computer system, e.g. an internet server with a server, e.g. an internet server
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32101 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N1/32106 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title separate from the image data, e.g. in a different computer file
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/0077 Types of the still picture apparatus
    • H04N2201/0081 Image reader
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00 Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N2201/3201 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
    • H04N2201/3261 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of multimedia information, e.g. a sound signal
    • H04N2201/3264 Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title of multimedia information, e.g. a sound signal of sound signals

Abstract

The present invention relates to a system for providing information from the captured visual information. The system includes a computing device and a cloud server computer. The computing device includes an optical scanner, a microphone, a computing device memory, and a device processing unit. The optical scanner captures visual information from a rendered object. The microphone captures sound information from the rendered object. The system enables users to capture large amounts of information while not connected to a separate computing device, eliminating the need for a constant wired connection to the separate device. Additionally, the invention provides users with a convenient handheld device capable of obtaining, storing, and processing large amounts of different types of data.

Description

A SYSTEM FOR INFORMATIVE SCANNING
FIELD OF THE INVENTION:
The present disclosure relates generally to a system for informative scanning, and more particularly, to a system and method for providing information from captured visual and sound information and using that information to organize and process actions that require the captured visual and sound information.
BACKGROUND OF THE INVENTION:
A lot of the work that we do today is about information. We research to gather information, we experiment to create information, and we communicate to share this information. As such, information comes in many forms. Unfortunately, these forms of information are not always accessible to each other. Information on paper is often seen as portable and easy to interface with. Computers, on the other hand, are able to store much larger amounts of information and, in many cases, to search it faster than a person with paper.
Many computers cannot access the vast quantities of printed information, however. One solution is to have a paperless workplace, the idea being that all information is electronic, so computers always have access to it. Paper, though, remains a very useful way of storing and communicating data for people, and as such it thrives in modern workplaces.
Furthermore, in the current environment, a need exists to verify information captured via digital images and to rapidly transmit this information from one location to another. However, no system or hardware currently exists that enables a user to capture and store such images. Scanners generally do not store a large amount of data, so a user may need to transfer data to another storage device, such as a computer, before obtaining everything they want or need. In this regard, the user may not be near their computer and may not be able to transfer data out of the scanner to clear room in the scanner's storage for subsequently scanned data. Some prior art does describe basic features; for example, U.S. Pat. No. 6,678,075 discloses a slide securing device, used in a flatbed scanner, that includes a frame and a securing cell. The securing cell comprises an aperture, a first securing clip, a second securing clip, a first securing groove, a second securing groove, a picking portion, a first support member, and a second support member. The slide securing device utilizes proper clips for holding slides. Thus, the scanner can directly scan the secured slides with predetermined parameters and a reset mode, in order to save scanning time. Further, the securing grooves maintain the slides at a fixed height to improve scanning quality.
None of the existing prior art overcomes the problems associated with large scanners and their limited capacity to store information. Therefore, a need exists for techniques for easily associating information about an image with the image and using that information to control and retrieve the image.
OBJECTIVE OF THE INVENTION:
The main objective of the present invention is to develop a portable and convenient scanner in order to acquire scanned images of precise quality.
Yet another objective of the present invention is to provide information from captured visual information.
Yet another objective of the present invention is to provide an apparatus that enables users to capture large amounts of information while not connected to a separate computing device, eliminating the need for a constant wired connection to the separate device.
Further objectives, advantages, and features of the present invention will become apparent from the detailed description provided hereinbelow, in which various embodiments of the disclosed invention are illustrated by way of example.
SUMMARY OF THE INVENTION:
The present invention relates to a system for providing information from the captured visual information. The system includes a computing device and a cloud server computer. The computing device includes an optical scanner, a microphone, a computing device memory, and a device processing unit. The optical scanner captures visual information from a rendered object. The microphone captures sound information from the rendered object. The computing device memory stores all the data, the computer-readable instructions, and a data capturing module. The device processing unit is connected to the computing device memory and executes the data capturing module to scan the visual information using the optical scanner. The device processing unit also executes the data capturing module to capture sound information using the microphone. Herein, a triggering mechanism of the data capturing module automatically causes the optical scanner and the microphone to capture visual information and sound information from the rendered object. The cloud server computer is connected to the computing device and has a database unit and a system processing unit. The database unit stores the visual information and sound information captured from the rendered objects and computer-readable instructions. The system processing unit executes computer-readable instructions to receive the captured visual information and sound information. Herein, the system processing unit executes computer-readable instructions to search for contextual information related to the captured visual information and sound information and stores the captured visual information and sound information, along with the corresponding contextual information, in the database unit. Herein, the system processing unit executes computer-readable instructions to send the captured visual information and sound information, along with the corresponding contextual information, to be stored in the computing device memory.
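By way of illustration only, the following Python sketch shows one plausible realization of this device-to-cloud flow; the patent specifies no transport or payload format, so the endpoint URL, the JSON field names, and the Capture container are hypothetical.

```python
# Hypothetical device-side flow: store the capture, then forward it to the
# cloud server computer. Endpoint URL and JSON field names are assumptions.
import base64
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Capture:
    visual_bytes: bytes  # raw image data from the optical scanner
    sound_bytes: bytes   # raw audio data from the microphone
    location: str        # user-selected location for the captured information


def send_to_cloud(capture: Capture, server_url: str) -> dict:
    """Upload one capture and return the server's contextual-information reply."""
    payload = json.dumps({
        "location": capture.location,
        "visual": base64.b64encode(capture.visual_bytes).decode("ascii"),
        "sound": base64.b64encode(capture.sound_bytes).decode("ascii"),
    }).encode("utf-8")
    request = urllib.request.Request(
        server_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In practice the device would buffer captures in the computing device memory and send them to the cloud server computer when a connection is available, consistent with the store-then-send sequence described above.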
In an embodiment, the present invention relates to a method for providing information from the captured visual information. The method includes a method for capturing the visual information from the rendered object, having the steps of: receiving the user input for selecting a location of the captured visual information; the computing device displaying instructions to capture the visual information; a data capturing module, with the help of the triggering mechanism, automatically causing the optical scanner to capture the selected visual information; the optical scanner capturing the visual information; the device processing unit storing the visual information in the computing device memory and sending the captured visual information to the cloud server computer; and the device processing unit signaling the user, after determining that sufficient information has been obtained from the rendered object, to execute an action associated with the scanned visual information.

The method further includes a method for capturing sound information from the rendered object, having the steps of: receiving the user input for selecting a location of the captured sound information; the computing device displaying instructions to capture the sound information; the data capturing module, with the help of the triggering mechanism, automatically causing the microphone to capture the selected sound information; the microphone capturing the sound information; the device processing unit storing the sound information in the computing device memory and sending the captured sound information to the cloud server computer; and the device processing unit signaling the user, after determining that sufficient information has been obtained from the rendered object, to execute an action associated with the scanned sound information.

The method further includes a method for processing the visual information and the sound information captured from the rendered object, having the steps of: the computing device sending the visual information and the sound information to the cloud server computer; the system processing unit executing computer-readable instructions to extract data from the visual information and sound information being captured; the system processing unit executing computer-readable instructions to search for contextual information related to the captured visual information and sound information; the system processing unit storing the captured visual information and sound information, along with the corresponding contextual information, in the database unit; the system processing unit executing computer-readable instructions to send the captured visual information and sound information, along with the corresponding contextual information, to be stored in the computing device memory; and the system processing unit executing computer-readable instructions to perform an action that requires the extracted information from the visual information and sound information.

In another embodiment, the computing device communicates the digital visual information to the system processing unit in order to determine the portion of the set of readable text in the digital visual information. The computing device communicates the digital sound information to the system processing unit in order to convert the portion of the set of sound records in the digital sound information into readable text.
In yet another embodiment, the system processing unit of the computing device uses natural language processing to extract information from the visual information and sound information being captured, searches for contextual information, and performs the action that requires the extracted information from the visual information and sound information.
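As a hedged illustration of this natural-language-processing step, the sketch below assumes spaCy as the NLP library; the patent does not name a toolkit, and the choice of entities and noun phrases as search terms is illustrative.

```python
# Illustrative NLP step, assuming spaCy (the patent names no toolkit):
# derive search terms from the OCR text and the sound transcript.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline


def extract_search_terms(ocr_text: str, transcript: str) -> list[str]:
    """Collect named entities and noun phrases to drive the contextual search."""
    doc = nlp(ocr_text + "\n" + transcript)
    candidates = [ent.text for ent in doc.ents]
    candidates += [chunk.text for chunk in doc.noun_chunks]
    seen: set[str] = set()
    terms: list[str] = []
    for term in candidates:  # deduplicate while preserving order
        if term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return terms
```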
The main advantage of the present invention is that it provides a portable and convenient scanner that acquires scanned images of precise quality.
Yet another advantage of the present invention is to provide information from the captured visual information.
Yet another advantage of the present invention is to provide an apparatus that enables users to capture large amounts of information while not connected to a separate computing device, eliminating the need for a constant wired connection to the separate device.
According to this invention, there is provided a system for informative scanning, the system comprises:
- a computing processor, with instructions, causing:
- initializing a defined environment having objects by causing said objects to be tagged with visual information and sound information, thereby causing each of such said objects to be rendered object/s;
- storing associations of initialized tags in an object information repository;
- a computing device having:
  o an optical scanner, the optical scanner captures visual information from an object which is the rendered object,
  o a microphone, the microphone captures sound information from the rendered object,
  o a computing device memory, the computing device memory stores all data, computer-readable instructions, and a data capturing module, and
  o a device processing unit, the device processing unit is connected to the computing device memory and executes the data capturing module to scan the captured visual information using the optical scanner, and the device processing unit executes the data capturing module to scan the captured sound information using the microphone, wherein a triggering mechanism of the data capturing module automatically causes the optical scanner and the microphone to capture visual information and sound information from the rendered object; and
- at least one cloud server computer, the at least one cloud server computer is connected to the computing device, the at least one cloud server computer having:
  o at least one database unit, the at least one database unit stores visual information and sound information captured from the rendered objects and computer-readable instructions, and
  o a system processing unit, the system processing unit executes computer-readable instructions to receive the captured visual information and the captured sound information, wherein the system processing unit executes computer-readable instructions to search, upon querying by a user, by means of a multimodal search engine, for contextual information related to the captured visual information and the captured sound information and stores the captured visual information and the captured sound information, along with corresponding contextual information, in the at least one database unit; wherein the system processing unit executes computer-readable instructions to send the captured visual information and the captured sound information, along with corresponding contextual information, to be stored in the computing device memory and to a user querying the multimodal search engine.
In at least an embodiment, said multimodal search engine is configured with pixel mapping techniques in order to fetch pixel data from said captured visual information, in order to search and to output associated tagged information relevant to such captured visual information; the output for an input of captured visual information may be text information, sound data, image data, multimedia data, user-input data, and/or combinations thereof.
In at least an embodiment, said multimodal search engine is configured with audio wave mapping techniques in order to fetch audio data from said captured sound information, in order to search and to output associated tagged information relevant to such captured sound information; the output for an input of captured audio information may be text information, sound data, image data, multimedia data, user-input data, and/or combinations thereof.
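The patent describes pixel mapping and audio wave mapping only at a functional level. The sketch below, assuming Pillow and NumPy, shows one plausible reading in which images reduce to average-hash keys and waveforms reduce to spectral band energies; these reductions are illustrative stand-ins, not the claimed technique itself.

```python
# Illustrative stand-ins for the two mappings, assuming Pillow and NumPy:
# an average-hash for "pixel mapping" and coarse spectral band energies for
# "audio wave mapping". Tagged information would be retrieved by nearest
# match on these keys (Hamming distance for images, Euclidean for audio).
import numpy as np
from PIL import Image


def pixel_map(image_path: str, size: int = 8) -> int:
    """Reduce an image to a 64-bit average-hash key."""
    grey = Image.open(image_path).convert("L").resize((size, size))
    pixels = np.asarray(grey, dtype=np.float32)
    bits = (pixels > pixels.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)


def audio_wave_map(samples: np.ndarray, bands: int = 16) -> np.ndarray:
    """Reduce a mono waveform to normalized spectral band energies."""
    spectrum = np.abs(np.fft.rfft(samples))
    energy = np.array([band.sum() for band in np.array_split(spectrum, bands)])
    total = energy.sum()
    return energy / total if total > 0 else energy
```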
In at least an embodiment, said multimodal search engine is communicably coupled to a central server, said central server being communicably coupled to a plurality of computing devices (102), each computing device being a client device. Said central server is configured to store an active cache of information in a continuously updateable server cache, in that the communicable coupling between the computing devices and the central server is an internet-independent wired or wireless communicable coupling.
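A minimal sketch of such a continuously updateable server cache follows, under the assumption that the internet-independent coupling simply means clients reach the central server over a local wired or wireless link; the record layout and keying are hypothetical.

```python
# Hypothetical continuously updateable "active cache" on the central server;
# clients on the local (non-internet) link query it by key.
import threading
import time
from typing import Optional


class ActiveCache:
    """In-memory cache of text, image, sound, and multimedia records."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}
        self._lock = threading.Lock()

    def update(self, key: str, record: dict) -> None:
        """Insert or overwrite a record, stamping it with the update time."""
        with self._lock:
            record["updated_at"] = time.time()
            self._records[key] = record

    def lookup(self, key: str) -> Optional[dict]:
        """Return the cached record for a client query, or None if absent."""
        with self._lock:
            return self._records.get(key)
```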
In at least an embodiment, the computing device is a wireless reception device, wherein the wireless reception device comprises a battery to provide power and the computing device memory comprises the computer-readable instructions stored for execution on the device processing unit.
In at least an embodiment, the computing device is selected from a tablet, a smartphone, a mobile phone, a computer, and a laptop.
In at least an embodiment, the visual information captured from the rendered object comprises text from the visual information and the audio information.
In at least an embodiment, the at least one cloud server computer is selected from a desktop computer, a server, and a mainframe computer.

In at least an embodiment, the computing device is also able to connect with at least one external interface device that executes the stored computer-readable instructions to communicate with the system processing unit of the computing device to retrieve data.
According to this invention, there is provided a method for providing information from the captured visual information: a method for informative scanning through a multimodal search engine, the method comprising:
- initializing a defined environment having objects by causing said objects to be tagged with visual information and sound information, thereby causing each of such said objects to be rendered object/s;
- storing associations of initialized tags in an object information repository;
- performing, by means of a computing device, the performing steps including:
  o capturing visual information from an object which is the rendered object,
  o capturing sound information from the rendered object,
  o storing all data, and
  o scanning the captured visual information and the captured sound information, wherein, upon triggering, automatically causing capturing of visual information and sound information from the rendered object; and
- performing, by means of at least one cloud server computer connected to the computing device, the performing steps including:
  o storing visual information and sound information captured from the rendered objects and computer-readable instructions,
  o executing computer-readable instructions to receive the captured visual information and the captured sound information, and
  o searching, upon querying by a user, for contextual information related to the captured visual information and the captured sound information and storing the captured visual information and the captured sound information, along with corresponding contextual information;
wherein the method is configured to send the captured visual information and the captured sound information, along with corresponding contextual information, to a user querying the multimodal search engine of this method.
In at least an embodiment, the at least one computing device communicates the digital visual information to the system processing unit in order to determine the portion of the set of readable text in the digital visual information.
In at least an embodiment, the at least one computing device communicates the digital sound information to the system processing unit in order to convert the portion of the set of sound records in the digital sound information into readable text.
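For illustration, the sketch below pairs pytesseract for the image-to-text embodiment with the SpeechRecognition package for the sound-to-text embodiment; both library choices are assumptions, as the patent names neither component.

```python
# Illustrative conversions, assuming pytesseract for the readable text in the
# digital visual information and the SpeechRecognition package for turning
# the sound record into readable text; neither library is named in the patent.
import pytesseract
import speech_recognition as sr
from PIL import Image


def readable_text_from_image(image_path: str) -> str:
    """Determine the portion of readable text in the digital visual information."""
    return pytesseract.image_to_string(Image.open(image_path))


def readable_text_from_sound(wav_path: str) -> str:
    """Convert the sound record in the digital sound information into text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)  # cloud-backed recognizer
```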
In at least an embodiment, the system processing unit of the at least one computing device uses natural language processing to extract information from the visual information and sound information being captured, searches for contextual information, and performs an action that requires the extracted information from the visual information and sound information.
Further objectives, advantages, and features of the present invention will become apparent from the detailed description provided herein below, in which various embodiments of the disclosed invention are illustrated by way of example.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS:
The accompanying drawings are incorporated in and constitute a part of this specification to provide a further understanding of the invention. The drawings illustrate one embodiment of the invention and together with the description, serve to explain the principles of the invention.
Fig.1 illustrates a system for informative scanning.
Fig.2 illustrates a method for capturing visual information.
Fig.3 illustrates a method for capturing the sound information.
Fig.4 illustrates a method for processing visual information and sound information.
DETAILED DESCRIPTION OF THE ACCOMPANYING DRAWINGS:
The terms “a” or “an”, as used herein, are defined as one or as more than one. The term “plurality”, as used herein, is defined as two or more than two. The term “another”, as used herein, is defined as at least a second or more. The terms “including” and/or “having”, as used herein, are defined as comprising (i.e., open language). The term “coupled”, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically.
The term “comprising” is not intended to limit inventions to only claiming the present invention with such comprising language. Any invention using the term “comprising” could be separated into one or more claims using “consisting” or “consisting of” claim language and is so intended. The term “comprising” is used interchangeably with the terms “having” or “containing”.
Reference throughout this document to “one embodiment”, “certain embodiments”, “an embodiment”, “another embodiment”, and “yet another embodiment” or similar terms means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments without limitation.
The term “or” as used herein is to be interpreted as an inclusive or meaning any one or any combination. Therefore, “A, B or C” means any of the following: “A; B; C; A and B; A and C; B and C; A, B and C”. An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive.
As used herein, the term "one or more" generally refers to, but not limited to, singular as well as the plural form of the term.
The drawings featured in the figures are to illustrate certain convenient embodiments of the present invention and are not to be considered as a limitation thereto. The term "means" preceding a present participle of an operation indicates the desired function for which there is one or more embodiments, i.e., one or more methods, devices, or apparatuses for achieving the desired function, and one skilled in the art could select from these or their equivalent in view of the disclosure herein; use of the term "means" is not intended to be limiting.
Fig.1 illustrates a system (100) for providing information from the captured visual information and sound information. The system (100) includes a computing device (102) and a cloud server computer (110). The computing device (102) includes an optical scanner (104), a computing device memory (106), a microphone (130), and a device processing unit (108). The optical scanner (104) captures visual information from a rendered object. The microphone (130) captures sound information from the rendered object. The computing device memory (106) stores all the data, the computer-readable instructions, and a data capturing module. The device processing unit (108) is connected to the computing device memory (106) and executes the data capturing module to scan the visual information using the optical scanner (104); the device processing unit (108) also executes the data capturing module to capture sound information using the microphone (130). The cloud server computer (110) is connected to the computing device (102). The cloud server computer (110) includes a database unit (120) and a system processing unit (122).
In at least an embodiment, the computing device (102) is communicably coupled to a search engine. The computing device (102), with the search engine, is communicably coupled to the optical scanner (104) or a camera for receiving a photo as an input. The computing device (102), with the search engine, is communicably coupled to the microphone (130) for receiving sound as an input. The search engine is configured to work in a multimodal format, in that, in some embodiments, the search engine is configured with pixel mapping techniques in order to fetch pixel data from an input image, in order to search and to output pertinent information relevant to such image; the output for an input search image may be text information, sound data, image data, multimedia data, user-input data, and/or combinations thereof. The search engine is likewise configured, in some embodiments, with audio wave mapping techniques in order to fetch audio data from an input sound, in order to search and to output pertinent information relevant to such sound; the output for an input sound may be text information, sound data, image data, multimedia data, user-input data, and/or combinations thereof. In at least an embodiment, the computing devices (102) are configured with a user-defined multimodal entry module such that a user can take a photo and enter relevant text, data, images, sound, multimedia, and/or combinations thereof with respect to the photo and upload it to a server.
In at least an embodiment, the computing devices (102) are configured with a user-defined multimodal entry module such that a user can record a sound and enter relevant text, data, images, sound, multimedia, and/or combinations thereof with respect to the sound and upload it to a server.
The search engine is multimodal because input to the search engine can be in the form of text, image, sound, and/or their combinations, and output from the search engine can be in the form of text, image, sound, and/or their combinations.
In at least an embodiment, the system is communicably coupled to a central server which, in turn, is communicably coupled to a plurality of computing devices (102), each computing device being a client device. This central server is configured to store an active cache of information (text, images, sound data, multimedia data, and/or their combinations). The server's cache can be physically uploaded from time to time. The communicable coupling between the computing devices (102) and the central server may be wired or wireless, but it is not internet-dependent. These computing devices (102) can interface with the server for accessing information.
Fig.2 illustrates, with respect to the system (100), a method for capturing the visual information from the rendered object. In step (132), the method discloses receiving the user input for selecting a location of the captured visual information. In step (134), the method discloses the computing device (102) displaying instructions to capture the visual information. In step (136), the method discloses the data capturing module, with the help of the triggering mechanism, automatically causing the optical scanner (104) to capture the selected visual information. In step (138), the method discloses the optical scanner (104) capturing the visual information, and the device processing unit (108) storing the visual information in the computing device memory (106) and sending the captured visual information to the cloud server computer (110). In step (140), the method discloses the device processing unit (108) signaling the user, after determining that sufficient information has been obtained from the rendered object, for executing an action associated with the scanned visual information.
Fig.3 illustrates a method for capturing the sound information from the rendered object. In step (142), the method discloses receiving the user input for selecting a location of the captured sound information. In step (144), the method discloses the computing device (102) displaying instructions to capture the sound information. In step (146), the method discloses the data capturing module, with the help of the triggering mechanism, automatically causing the microphone (130) to capture the selected sound information. In step (148), the method discloses the microphone (130) capturing the sound information, and the device processing unit (108) storing the sound information in the computing device memory (106) and sending the captured sound information to the cloud server computer (110). In step (150), the method discloses the device processing unit (108) signaling the user, after determining that sufficient information has been obtained from the rendered object, for executing an action associated with the scanned sound information.
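The Fig.2 and Fig.3 flows are structurally identical, differing only in the sensor used. A minimal sketch of the shared flow, with illustrative (assumed) helper names on the device object, is given below:

```python
def capture_flow(device, modality):
    """Shared sketch of the Fig.2 (visual) and Fig.3 (sound) flows.
    All attributes of `device` are assumed helper names."""
    device.receive_user_location()                 # select capture location
    device.display_capture_instructions(modality)  # on-screen guidance
    sensor = device.scanner if modality == "visual" else device.microphone
    data = sensor.capture()          # triggering mechanism fires the sensor
    device.memory.store(data)        # store in computing device memory (106)
    device.cloud_client.send(data)   # forward to cloud server computer (110)
    if device.has_sufficient_information(data):
        device.signal_user()         # ready to execute the associated action
    return data
```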
Fig.4 illustrates a method for processing the visual information and the sound information captured from the rendered object. In step (152), the method discloses the computing device (102) sending the visual information and the sound information to the cloud server computer (110). In step (154), the method discloses the system processing unit (122) executing computer-readable instructions to extract data from the visual information and sound information that is being captured. In step (156), the method discloses the system processing unit (122) executing computer-readable instructions to search for contextual information related to the captured visual information and sound information. In step (158), the method discloses the system processing unit (122) storing the captured visual information and sound information along with the corresponding contextual information in the database unit (120). In step (160), the method discloses the system processing unit (122) executing computer-readable instructions to send the captured visual information and sound information along with the corresponding contextual information to be stored in the computing device memory (106). In step (162), the method discloses the system processing unit (122) executing computer-readable instructions to perform an action that requires extracted information from the visual information and sound information.
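A compact sketch of this Fig.4 server-side pipeline on the system processing unit (122), again with illustrative helper names that are assumptions rather than part of the disclosure:

```python
def process_captured(server, visual, sound):
    """Sketch of the Fig.4 pipeline on the system processing unit (122);
    the helper names on `server` are assumptions."""
    extracted = server.extract(visual, sound)      # step (154): extract data
    context = server.search_context(extracted)     # step (156): contextual search
    server.database.store(visual, sound, context)  # step (158): persist in (120)
    server.send_to_device(visual, sound, context)  # step (160): back to memory (106)
    return server.perform_action(extracted)        # step (162): act on extraction
```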
The present invention relates to a system for providing information from the captured visual information. The system includes a computing device and a cloud server computer. The computing device includes an optical scanner, a microphone, a computing device memory, and a device processing unit. The optical scanner captures visual information from a rendered object. The microphone captures sound information from the rendered object. The computing device memory stores all the data, the computer-readable instructions, and the data capturing module. The device processing unit is connected to the computing device memory and executes the data capturing module to scan the visual information using the optical scanner and to capture sound information using the microphone. Herein, a triggering mechanism of the data capturing module automatically causes the optical scanner and the microphone to capture visual information and sound information from the rendered object. The cloud server computer is connected to the computing device and has a database unit and a system processing unit. The database unit stores the visual information and sound information captured from the rendered objects and the computer-readable instructions. The system processing unit executes computer-readable instructions to receive the captured visual information and sound information. Herein, the system processing unit executes computer-readable instructions to search for contextual information related to the captured visual information and sound information and stores the captured visual information and sound information along with the corresponding contextual information in the database unit. Herein, the system processing unit executes computer-readable instructions to send the captured visual information and sound information along with the corresponding contextual information to be stored in the computing device memory.
In an embodiment, the computing device is a wireless reception device; the wireless reception device comprises a battery to provide power, and the computing device memory comprises the computer-readable instructions stored for execution on the device processing unit.
In another embodiment, the computing device includes, but is not limited to, a tablet, a smartphone, a mobile phone, a computer, and a laptop.
In yet another embodiment, the visual information captured from the rendered object comprises text from a rendered document, visual information, and audio information. The cloud server computer includes, but is not limited to, a desktop computer, a server, and a mainframe computer. In the preferred embodiment of the present invention, the computing device is also able to connect with an external interface device that executes the stored computer-readable instructions to communicate with the system processing unit of the computing device to retrieve data.
In an embodiment, the present invention relates to a method for providing information from the captured visual information. The method includes a method for capturing the visual information from the rendered object, the method having: receiving the user input for selecting a location of the captured visual information; the computing device displaying instructions to capture the visual information; the data capturing module, with the help of the triggering mechanism, automatically causing the optical scanner to capture the selected visual information; the optical scanner capturing the visual information, and the device processing unit storing the visual information in the computing device memory and sending the captured visual information to the cloud server computer; and the device processing unit signaling the user, after determining that sufficient information has been obtained from the rendered object, for executing an action associated with the scanned visual information.
The method further includes a method for capturing the sound information from the rendered object, the method having: receiving the user input for selecting a location of the captured sound information; the computing device displaying instructions to capture the sound information; the data capturing module, with the help of the triggering mechanism, automatically causing the microphone to capture the selected sound information; the microphone capturing the sound information, and the device processing unit storing the sound information in the computing device memory and sending the captured sound information to the cloud server computer; and the device processing unit signaling the user, after determining that sufficient information has been obtained from the rendered object, for executing an action associated with the scanned sound information.
The method further includes a method for processing the visual information and the sound information captured from the rendered object, the method having: the computing device sending the visual information and the sound information to the cloud server computer; the system processing unit executing computer-readable instructions to extract data from the visual information and sound information that is being captured; the system processing unit executing computer-readable instructions to search for contextual information related to the captured visual information and sound information; the system processing unit storing the captured visual information and sound information along with the corresponding contextual information in the database unit; the system processing unit executing computer-readable instructions to send the captured visual information and sound information along with the corresponding contextual information to be stored in the computing device memory; and the system processing unit executing computer-readable instructions to perform an action that requires extracted information from the visual information and sound information.
In another embodiment, the computing device communicates the digital visual information to the system processing unit in order to determine the portion of the set of readable text in the digital visual information. The computing device communicates the digital sound information to the system processing unit in order to convert the portion of the set of sound records in the digital sound information into readable text.
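By way of example only, widely available open-source libraries can stand in for these two conversions; the disclosure does not mandate any particular OCR or speech-to-text engine, so pytesseract and SpeechRecognition below are assumptions:

```python
import pytesseract                # OCR wrapper around the Tesseract engine
import speech_recognition as sr   # wrapper around several STT engines
from PIL import Image

def visual_to_text(image_path: str) -> str:
    """Determine the readable-text portion of digital visual information."""
    return pytesseract.image_to_string(Image.open(image_path))

def sound_to_text(wav_path: str) -> str:
    """Convert sound records in digital sound information into readable text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    # recognize_sphinx() would keep this fully offline;
    # recognize_google() trades that for accuracy.
    return recognizer.recognize_google(audio)
```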
In yet another embodiment, the system processing unit of the computing device uses natural language processing to extract information from the visual information and sound information that is being captured, to search contextual information, and to perform the action that requires the extracted information from the visual information and sound information.
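As an illustrative sketch of such natural language processing, the following uses spaCy (an assumption; any NLP toolkit would serve) to extract named entities that the contextual search and subsequent action could key on:

```python
import spacy

# The small English model must be installed first:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_entities(text: str):
    """Return (entity text, entity label) pairs that contextual search
    and downstream actions can key on (names, places, dates, ...)."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```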
Further objectives, advantages, and features of the present invention will become apparent from the detailed description provided herein, in which various embodiments of the disclosed present invention are illustrated by way of example and appropriate reference to accompanying drawings. Those skilled in the art to which the present invention pertains may make modifications resulting in other embodiments employing principles of the present invention without departing from its spirit or characteristics, particularly upon considering the foregoing teachings. Accordingly, the described embodiments are to be considered in all respects only as illustrative, and not restrictive, and the scope of the present invention is, therefore, indicated by the appended claims rather than by the foregoing description or drawings. Consequently, while the present invention has been described with reference to particular embodiments, modifications of structure, sequence, materials and the like apparent to those skilled in the art still fall within the scope of the invention as claimed by the applicant.

Claims

1. A system (100) for informative scanning, the system (100) comprises:
- a computing processor, with instructions, causing:
- initializing a defined environment having objects by causing said objects to be tagged with visual information and sound information, thereby causing each of such said objects to be rendered object/s;
- storing associations of initialized tags in an object information repository;
- a computing device (102) having:
o an optical scanner (104), the optical scanner (104) captures visual information from an object which is the rendered object,
o a microphone (130), the microphone (130) captures sound information from the rendered object,
o a computing device memory (106), the computing device memory (106) stores all data, computer-readable instructions, and data capturing module, and
o a device processing unit (108), the device processing unit (108) is connected to the computing device memory (106) and executes the data capturing module to scan the captured visual information using the optical scanner (104), and the device processing unit (108) executes the data capturing module to scan the captured sound information using the microphone (130),
wherein a triggering mechanism of the data capturing module automatically causes the optical scanner (104) and the microphone (130) to capture visual information and sound information from the rendered object; and
- an at least one cloud server computer (110), the at least one cloud server computer (110) is connected to the computing device (102), the at least one cloud server computer (110) having:
o an at least one database unit (120), the at least one database unit (120) stores visual information and sound information captured from the rendered objects and computer-readable instructions, and
o a system processing unit (122), the system processing unit (122) executes computer-readable instructions to receive the captured visual information and the captured sound information,
wherein, the system processing unit (122) executes computer-readable instructions to search, upon querying by a user, by means of a multimodal search engine, for contextual information related to the captured visual information and the captured sound information and stores the captured visual information and the captured sound information, along with corresponding contextual information, in the at least one database unit (120);
wherein, the system processing unit (122) executes computer-readable instructions to send the captured visual information and the captured sound information, along with corresponding contextual information, to be stored in the computing device memory (106) and to a user querying the multimodal search engine.
2. The system as claimed in claim 1, wherein said multimodal search engine being configured with pixel mapping techniques in order to fetch pixel data from said captured visual information in order to search and to output associated tagged information relevant to such captured visual information; output of an input captured visual information being text information, sound data, image data, multimedia data, user-input data, and / or combinations thereof.
3. The system as claimed in claim 1, wherein said multimodal search engine being configured with audio wave mapping techniques in order to fetch audio data from said captured sound information in order to search and to output associated tagged information relevant to such captured sound information; output of an input captured audio information being text information, sound data, image data, multimedia data, user-input data, and / or combinations thereof.
4. The system as claimed in claim 1, wherein said multimodal search engine being communicably coupled to a central server, said central server being communicably coupled to a plurality of computing devices (102), each computing device being a client device, said central server is configured to store an active cache of information in a continuously updateable server cache, in that, communicable coupling between the computing devices (102) and the central server being internet independent wired or wireless communicable coupling.
5. The system as claimed in claim 1, wherein the computing device (102) is a wireless reception device, wherein, the wireless reception device comprises a battery to provide power and the computing device memory (106) comprises the computer-readable instructions stored for the execution on the device processing unit (108).
6. The system (100) as claimed in claim 1, wherein the computing device (102) is selected from a tablet, a smartphone, a mobile phone, a computer, and a laptop.
7. The system (100) as claimed in claim 1, wherein, the visual information captured from the rendered object comprises a text from visual information and audio information.
8. The system as claimed in claim 1, wherein the at least one cloud server computer (110) is selected from a desktop computer, a server, and a mainframe computer.
9. The system (100) as claimed in claim 1, wherein, the computing device (102) is also able to connect with at least one external interface device (124) that executes the stored computer-readable instructions to communicate with the system processing unit (122) of the computing device (102) to retrieve data.
10. A method (100) for informative scanning, through a multimodal search engine, the method (100) comprises:
- initializing a defined environment having objects by causing said objects to be tagged with visual information and sound information, thereby causing each of such said objects to be rendered object/s;
- storing associations of initialized tags in an object information repository;
- performing, by means of a computing device (102), the performing steps including:
o capturing visual information from an object which is the rendered object,
o capturing sound information from the rendered object,
o storing all data,
o scanning the captured visual information and the captured sound information, wherein, upon triggering, automatically causing capturing of visual information and sound information from the rendered object; and
- performing, by means of an at least one cloud server computer (110) connected to the computing device (102), the at least one cloud server computer (110) having:
o storing visual information and sound information captured from the rendered objects and computer-readable instructions, and
o executing computer-readable instructions to receive the captured visual information and the captured sound information;
o searching, upon querying by a user, for contextual information related to the captured visual information and the captured sound information and storing the captured visual information and the captured sound information, along with corresponding contextual information;
wherein, the method being configured to send the captured visual information and the captured sound information, along with corresponding contextual information, to a user querying the multimodal search engine of this method.
11. The method claimed in claim 10, wherein the at least one computing device (102) communicates the digital visual information to the system processing unit (122) in order to determine the portion of the set of readable text in the digital visual information.
12. The method claimed in claim 10, wherein the at least one computing device (102) communicates the digital sound information to the system processing unit (122) in order to convert the portion of the set of sound records in the digital sound information into readable text.
13. The method claimed in claim 10, wherein, the system processing unit (122) of the at least one computing device (102) uses natural language processing to extract information from visual information and sound information that is being captured and to search contextual information and performs an action that requires extracted information from visual information and sound information.
PCT/IN2021/050603 2021-01-09 2021-06-22 A system for informative scanning WO2022149153A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202131001082 2021-01-09
IN202131001082 2021-01-09

Publications (1)

Publication Number Publication Date
WO2022149153A1 true WO2022149153A1 (en) 2022-07-14

Family

ID=82358076

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2021/050603 WO2022149153A1 (en) 2021-01-09 2021-06-22 A system for informative scanning

Country Status (1)

Country Link
WO (1) WO2022149153A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130132367A1 (en) * 2004-02-15 2013-05-23 Google Inc. Search Engines and Systems with Handheld Document Data Capture Devices
US9104915B2 (en) * 2008-08-19 2015-08-11 Digimarc Corporation Methods and systems for content processing


Similar Documents

Publication Publication Date Title
US11431588B2 (en) Method and apparatus for interoperably performing services and system supporting the same
US11616820B2 (en) Processing files from a mobile device
US20050192808A1 (en) Use of speech recognition for identification and classification of images in a camera-equipped mobile handset
US9530050B1 (en) Document annotation sharing
US8989431B1 (en) Ad hoc paper-based networking with mixed media reality
JP4851763B2 (en) Document retrieval technology using image capture device
US20130019176A1 (en) Information processing apparatus, information processing method, and program
US20090271380A1 (en) System and method for enabling search and retrieval operations to be performed for data items and records using data obtained from associated voice files
US9531999B2 (en) Real-time smart display detection system
US8467613B2 (en) Automatic retrieval of object interaction relationships
US20140293069A1 (en) Real-time image classification and automated image content curation
US20130148891A1 (en) Usage of visual reader as an input provider in portals
US20130144610A1 (en) Action generation based on voice data
JP4941513B2 (en) SEARCH METHOD, SEARCH PROGRAM, AND SEARCH SYSTEM
US20100245597A1 (en) Automatic image capturing system
US20210320953A1 (en) Content Recognition while Screen Sharing
US20130173754A1 (en) Method and apparatus to exchange digital content between two devices without requiring the exchange of credentials
US20140122513A1 (en) System and method for enabling search and retrieval operations to be performed for data items and records using data obtained from associated voice files
CN102360353A (en) Storage method and device of scanned file as well as scanning equipment
JP2007018166A (en) Information search device, information search system, information search method, and information search program
WO2022149153A1 (en) A system for informative scanning
CN109697242B (en) Photographing question searching method and device, storage medium and computing equipment
CN109949887B (en) Consultation recording method, consultation recording device and computer readable storage medium
US20100100531A1 (en) Electronic device and method for managing medias
KR20150135042A (en) Method for Searching and Device Thereof

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21917401

Country of ref document: EP

Kind code of ref document: A1