WO2019175896A1 - System and method for interacting with digital screens using voice input and image processing technique - Google Patents

System and method for interacting with digital screens using voice input and image processing technique

Info

Publication number
WO2019175896A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
user
digital screen
input
voice input
Prior art date
Application number
PCT/IN2019/050200
Other languages
French (fr)
Inventor
Renuka Bodla
Original Assignee
Renuka Bodla
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renuka Bodla filed Critical Renuka Bodla
Publication of WO2019175896A1 publication Critical patent/WO2019175896A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • the embodiments herein generally relate to a system for interacting with digital screens and natural language processing, and more particularly, to a system and method that assists a user in interacting with digital screens using voice input.
  • an embodiment herein provides a method for interacting with a digital screen of a voice based digital screen interaction device using a voice input of a user.
  • the method includes (a) generating a database that stores information comprising a natural language of a user; (b) identifying a type of a program that runs on a digital screen of the voice based digital screen interaction device; (c) parsing the digital screen as an image in real time using an image processing technique to identify an overall layout of the digital screen, wherein the image processing technique removes unnecessary images and graphics that are embedded in the digital screen; (d) determining attributes by analysing the overall layout of the digital screen; (e) transforming the attributes of the digital screen from a machine-readable language to the natural language of the user using a natural language processing technique, wherein the natural language of the user is a human spoken language; (f) providing the attributes in the natural language of the user on the voice based digital screen interaction device for enabling the user to input a voice input, wherein each attribute comprises a label, an input field and their semantics.
  • the method includes accessing functions selected from a group comprising Home, Tools, View, Message, Submit, Save, Backspace, Pagedown, Pageup based on the voice input to operate the voice based digital screen interaction device.
  • the method includes (a) analysing a behaviour of the user, using a machine learning algorithm, based on the user’s interactions with different digital screens of the voice based digital screen interaction device; (b) generating data analytics on which digital screens or attributes the user spends the maximum amount of time interacting with; and (c) automatically prepopulating input fields associated with the attributes with standard values based on the user behaviour analysis and data analytics.
  • the method includes implementing a voice authentication process to authenticate authorized users to access digital screens that are secured.
  • the voice input of the user is captured using a microphone associated with the voice based interaction device.
  • the method includes eliminating grammatical errors from the voice input using the natural language processing technique to determine input data corresponding to different input fields.
  • the method includes implementing a machine learning algorithm to interpret abbreviations, synonyms, and other forms of pronouncing the labels and to analyse standard values provided by the user for prepopulating the respective input fields.
  • the attributes include a static text, a menu bar option, a tool bar option, an audio, a video, a label, an input field and/or a semantics on the digital screen and the input fields include a text box, a radio button, a drop-down list box, a check box, or a multiple choice to select input data.
  • a system for interacting with a digital screen of a voice based digital screen interaction device using a voice input of a user comprises a memory and a processor.
  • the memory stores a set of instructions.
  • the memory comprises a database that stores information associated with a natural language of a user.
  • the processor executes the set of instructions to perform: (a) identifying a type of a program that runs on a digital screen of the voice based digital screen interaction device; (b) parsing the digital screen as an image in real time using an image processing technique to identify an overall layout of the digital screen, wherein the image processing technique removes unnecessary images and graphics that are embedded in the digital screen; (c) determining attributes by analysing the overall layout of the digital screen; (d) transforming the attributes of the digital screen from a machine-readable language to the natural language of the user using a natural language processing technique, wherein the natural language of the user is a human spoken language; (e) providing the attributes in the natural language of the user on the voice based digital screen interaction device for enabling the user to input a voice input, wherein each attribute comprises a label, an input field and their semantics; (f) receiving the voice input of the user in the natural language of the user from the voice based digital screen interaction device, wherein the voice input of the user is processed to eliminate unnecessary background noises from the voice input.
  • the system enhances the performance and productivity of the user through voice-input-based interaction.
  • the system does not require any external hardware to allow the user to interact with the digital screen or to provide the input data for various input fields.
  • the system is quick and accurate enough to parse, populate and identify the input fields in a newly opened digital screen, so that the user can interact with the newly opened digital screen in real time.
  • the system allows users to interact with the digital screens in data-entry jobs using voice input.
  • the system helps users who are not proficient in (i) the machine-readable language and (ii) operating the voice based digital screen interaction device to interact effectively with the digital screen.
  • FIG. 1 illustrates a system view of a user interacting with a digital screen using a voice based digital screen interaction device that communicates with a voice interaction enabling server through a network according to an embodiment herein;
  • FIG. 2 illustrates an exploded view of the voice based digital screen interaction device of FIG. 1 according to an embodiment herein;
  • FIG. 3 illustrates an exploded view of the voice interaction enabling server of FIG. 1 according to an embodiment herein;
  • FIG. 4 illustrates an exemplary view of a digital screen with a menu bar having sub-options for messaging that the user interacts with to select a sub-option in real time by providing voice input according to an embodiment herein;
  • FIG. 5 illustrates an exemplary view of a user interacting with digital screens in real time according to an embodiment herein;
  • FIGS. 6A-6B are flow diagrams illustrating a method for interacting with the digital screen in the voice based digital screen interaction device using voice input according to an embodiment herein;
  • FIGS. 7A and 7B are flow diagrams illustrating a method for generating commands using the voice interaction enabling server based on voice input according to an embodiment herein; and
  • FIG. 8 shows a diagrammatic representation of a computer system within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein may be executed in accordance with an embodiment herein.
  • FIGS. 1 through 8 where similar reference characters denote corresponding features consistently throughout the figures, preferred embodiments are shown.
  • FIG. 1 illustrates a system view of a user 102 interacting with a digital screen using a voice based digital screen interaction device 104 that communicates with a voice interaction enabling server 108 through a network 106 according to an embodiment herein.
  • the voice based digital screen interaction device 104 identifies a type of a program that runs on its digital screen.
  • the voice based digital screen interaction device 104 may be any of a mobile phone, a tablet, a laptop, a desktop computer etc.
  • the program may be an ERP or CRM application, an operating system, a mobile application, a website etc.
  • the voice based digital screen interaction device 104 parses the digital screen as an image in real time using image processing techniques (e.g. image preprocessing, image restoration, image compression, image filtering and/or image analysis) to identify an overall layout of the digital screen.
  • the digital screen may be (a) a web portal page, (b) an e-filing form, (c) a mobile application, (d) a user interface of an operating system, etc.
  • the digital screen is internally processed using the image processing techniques, such as image preprocessing, image filtering and image compression, for the removal of noise, for accurate image visibility and to identify an overall layout of the digital screen.
  • the voice based digital screen interaction device 104 determines attributes of the digital screen by analyzing it.
  • the attributes may include static text, menu bar options, tool bar options, audio, video, labels, input fields and/or semantics on the digital screen.
  • the input fields may include text boxes, radio buttons, drop down list boxes, check boxes, etc.
  • the menu bar options may include FILE, VIEW, EDIT, TOOLS, MESSAGE, etc.
  • the voice based digital screen interaction device 104 uses the image processing technique to remove unnecessary images and graphics that are embedded in the digital screen.
  • the voice based digital screen interaction device 104 identifies a proximity type between the labels and the input fields.
  • the proximity type may include (a) adjacent proximity, (b) up and down proximity and/or (c) near-by location proximity.
  • the up and down proximity may typically occur when the digital screen is displayed on a computing device that has a smaller screen.
  • the voice based digital screen interaction device 104 analyzes the semantics of the labels and associates the labels with corresponding input fields based on the identified proximity type.
  • the input fields may include multiple choices to select input data.
  • the voice based digital screen interaction device 104 associates the multiple choices with corresponding input fields in real time.
  • For example, if an input field associated with the label “HOBBIES” includes multiple choices like “PLAYING”, “DRAWING”, “READING”, etc., the voice based digital screen interaction device 104 associates the above-mentioned multiple choices with the label “HOBBIES” in real time.
  • the voice based digital screen interaction device 104 receives voice input from the user 102 using a microphone or any other mechanism to receive the voice input/commands.
  • the voice based digital screen interaction device 104 may receive the voice input in any order based on a preferred communication style or format of the user 102.
  • the voice based digital screen interaction device 104 communicates the voice input and attributes of the digital screen to a voice interaction enabling server 108.
  • the voice interaction enabling server 108 may eliminate grammatical errors from the voice input using a natural language processing technique to determine data corresponding to different fields.
  • Natural language processing is a technique used in artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to fruitfully process large amounts of natural language data.
  • the natural language processing technique employs machine learning algorithms, implemented in languages such as Python, to eliminate grammatical errors from the voice input in order to determine data corresponding to different fields.
  • there are multiple open source libraries (e.g. Apache OpenNLP, Natural Language Toolkit (NLTK), Stanford NLP, MALLET) that are available to provide algorithmic building blocks in real time.
  • the embodiment herein can implement the natural language processing technique using an open source library such as the Natural Language Toolkit (NLTK) for Python.
  • the voice interaction enabling server 108 may receive the voice input as “MY FIRST NAME is JOHN” or “FIRST NAME” and “JOHN” to determine that the data corresponding to the field “FIRST NAME” is “JOHN”.
  • the voice interaction enabling server 108 may also receive the voice input as exactly the input data or a spelled format of the input data. For example, the voice interaction enabling server 108 receives the voice input in the format of “JOHN” or “J-O-H-N”. The voice interaction enabling server 108 identifies the attributes irrespective of the order of the attributes in the digital screen and generates commands to populate the input data in the associated input fields.
  • For example, for the labels “FIRST NAME”, “MIDDLE NAME” and “LAST NAME”, the voice interaction enabling server 108 may receive the input data first for the label “LAST NAME”, second for the label “FIRST NAME” and last for the label “MIDDLE NAME”, and generate commands to populate the associated input fields with the respective input data irrespective of the order in which the inputs were received.
  • the voice interaction enabling server 108 may learn how to understand and interpret abbreviations, synonyms, and other forms of pronouncing the labels. For example, inside an Oracle ERP system, there could be a label “PO” and the user 102 may provide input corresponding to the label either as Purchase Order or PO.
  • the voice interaction enabling server 108 further analyzes the voice input and the attributes of the digital screen.
  • the voice input may be in a natural language (e.g. native tongue) of the user 102.
  • the voice interaction enabling server 108 processes the voice input of the user 102 to eliminate unnecessary background noises from the voice input.
  • the attributes of the digital screen may be in a machine readable language.
  • the machine-readable language may be any language, for example Extensible Markup Language (XML), in which the digital screen is intended to be filled according to legal requirements, jurisdictional requirements, etc.
  • the voice interaction enabling server 108 translates (i) the voice input from a natural language to a machine readable language and (ii) the attributes of the digital screen from the machine readable language to the natural language of the user 102 (e.g. native tongue of the user 102) using the natural language processing technique.
  • the voice interaction enabling server 108 communicates the translated attributes to the voice based digital screen interaction device 104.
  • the voice based digital screen interaction device 104 may display or play the attributes using a display or a speaker respectively, when the user 102 requests the voice interaction enabling server 108.
  • the voice interaction enabling server 108 may assist the user 102 using the natural language processing technique (a) to understand the semantics of the attributes and (b) to provide the voice input in the natural language.
  • the natural language may be any human spoken language or the native tongue of the user 102.
  • the natural language is not a programming language.
  • the user 102 may interact with the digital screens several times.
  • the voice interaction enabling server 108 may learn and analyze behavior (e.g. which may include an accent or an individual style or pattern of providing the voice inputs) of the user 102 from the digital screens that are interacted by the user 102, using a machine learning algorithm.
  • the voice interaction enabling server 108, using the machine learning algorithm, may (i) learn and analyze standard values (e.g. JOHN, SMITH, INDIA etc.) provided by the user 102 for the respective input fields (e.g. first name, last name, country etc.) and (ii) generate commands to automatically pre-populate those standard values in the respective input fields to reduce input time for the user 102 during subsequent interactions with the digital screens.
  • the voice based digital screen interaction device 104 includes a voice based interaction database that stores a repository of the digital screens, the standard values and the automatically pre-populated input fields, etc. For example, if the user 102, who may be from the country “INDIA”, repeatedly fills the label “COUNTRY” with the standard value of “INDIA”, then the voice interaction enabling server 108, using the machine learning algorithm, learns the standard value “INDIA” for the label “COUNTRY” for that user 102 and generates commands to automatically pre-populate the standard value of “INDIA” for an input field of the label “COUNTRY” in the digital screen during subsequent interactions of that user 102 with that digital screen.
  • the voice interaction enabling server 108 may analyze the semantics of the input data in the label “FIRST NAME” and generate commands to automatically pre-populate an input field of the label “GENDER” with appropriate data.
  • For example, the voice interaction enabling server 108 analyses the semantics of the input data “JOHN” associated with the label “FIRST NAME” and generates commands to automatically pre-populate an input field of the label “GENDER” with the data “MALE”.
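To make the learning behaviour described in the preceding bullets concrete, here is a minimal sketch, assuming a simple frequency counter per (user, label) pair and a toy first-name-to-gender lookup; the repeat threshold, the class name and the lookup table are illustrative assumptions, not the patent's algorithm.

```python
from collections import Counter, defaultdict

FIRST_NAME_GENDER = {"JOHN": "MALE", "MARY": "FEMALE"}  # toy semantic lookup

class StandardValueLearner:
    """Learns per-user default values for labels from repeated entries."""

    def __init__(self, min_repeats=3):
        self.history = defaultdict(Counter)   # (user, label) -> value counts
        self.min_repeats = min_repeats

    def record(self, user, label, value):
        self.history[(user, label)][value.upper()] += 1

    def prepopulate(self, user, label, form):
        counts = self.history[(user, label)].most_common(1)
        if counts and counts[0][1] >= self.min_repeats:
            form[label] = counts[0][0]        # e.g. COUNTRY -> INDIA
        # Crude semantic inference, as in the GENDER example above.
        if label == "FIRST NAME" and form.get(label) in FIRST_NAME_GENDER:
            form.setdefault("GENDER", FIRST_NAME_GENDER[form[label]])

# learner = StandardValueLearner()
# record "INDIA" for COUNTRY three times, then prepopulate("u1", "COUNTRY", form)
# sets form["COUNTRY"] = "INDIA" on the next interaction.
```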
  • the voice based digital screen interaction device 104 may delete, edit or change the automatically pre-populated standard values based on voice input of the user 102.
  • the voice interaction enabling server 108 automatically populates the input field of the label “COUNTRY”.
  • For example, the voice interaction enabling server 108 automatically populates the input field of the label “COUNTRY” as “INDIA” in the digital screen.
  • the voice interaction enabling server 108 determines how a word has to be spelled on the display/speaker of the voice based digital screen interaction device 104. For example, for the word “COLOUR”, the spelling is “COLOR” in the US whereas the spelling is “COLOUR” in India. If the location of the voice based digital screen interaction device 104 is identified as the US, then the voice interaction enabling server 108 populates the input field with the spelling used in the US.
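As a toy illustration of the locale-aware spelling choice just described, assuming a small hand-maintained mapping table (an editorial assumption, not part of the patent):

```python
# (canonical word, region) -> regional spelling; falls back to the input word.
SPELLINGS = {("COLOUR", "US"): "COLOR"}

def localize(word: str, region: str) -> str:
    return SPELLINGS.get((word.upper(), region), word.upper())

# localize("colour", "US") -> "COLOR"
# localize("colour", "IN") -> "COLOUR"
```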
  • the voice interaction enabling server 108 may provide data analytics on which digital screens the user 102 spends the maximum amount of time interacting with, or which input fields are not being populated by the user 102.
  • the voice interaction enabling server 108 may provide options to the user 102 to stop or start interacting with the digital screen using voice input.
  • the voice interaction enabling server 108 may allow the user 102 to start or stop interacting with the digital screen using the voice input “START” or “STOP” respectively.
  • the voice based digital screen interaction device 104 may allow the user 102 to enter the inputs through a keyboard or a mouse.
  • the voice interaction enabling server 108 may analyze an accent of the user 102 and an individual style or pattern of the user 102 in providing the voice input.
  • the voice interaction enabling server 108 may employ a voice authentication process which authenticates authorized users to fill in or interact with secured digital screens.
  • the voice interaction enabling server 108 may use the user’s 102 voice as an authentication feature to allow the authorized users to fill the secured digital screens.
  • the authentication feature may be (i) a voice pitch, (ii) a voice tone or (iii) a voice accent.
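The following is a hedged sketch of such voice authentication, using MFCC features as a crude stand-in for the pitch/tone/accent features named above; librosa is an assumed library choice, and a fixed cosine-similarity threshold stands in for a real speaker-verification model.

```python
import numpy as np
import librosa

def voiceprint(wav_path: str) -> np.ndarray:
    """Reduce a recording to one fixed-length feature vector per speaker."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def authenticate(enrolled_wav: str, attempt_wav: str, threshold=0.9) -> bool:
    """Accept the attempt if its voiceprint is close to the enrolled one."""
    a, b = voiceprint(enrolled_wav), voiceprint(attempt_wav)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos >= threshold
```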
  • the voice interaction enabling server 108 may generate commands based on the analysis of the voice input of the user 102 and the attributes of the digital screen.
  • the voice interaction enabling server 108 may communicate the generated commands to the voice based digital screen interaction device 104.
  • the voice based digital screen interaction device 104 receives and executes the commands using a command execution technique.
  • the command execution technique includes well-known techniques such as command injection.
  • command injection is the insertion of code, e.g. UNIX commands, into dynamically generated output, and it helps with managing the data.
  • the command execution may include (i) populating the input fields of the labels with the respective input data based on the voice input of the user 102, (ii) automatically pre-populating the input fields with the standard values based on the behavior analysis and data analytics associated with the user 102, (iii) verifying the voice input or the input data with the user 102, (iv) accessing the functions like HOME, TOOLS, VIEW, MESSAGE, SUBMIT, SAVE, BACKSPACE, PAGEDOWN, PAGEUP etc. to effectively operate the voice based digital screen interaction device 104, (v) providing an authentication to the user 102 to access the secured digital screen or (vi) displaying or playing the attributes of the digital screen or input data to the user 102 in the natural language of the user 102.
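A minimal sketch of a command object and its executor is given below; a dispatch table is used here rather than injecting raw code strings, and the Command fields and action names are illustrative assumptions rather than the patent's specified structure.

```python
from dataclasses import dataclass

@dataclass
class Command:
    action: str          # "POPULATE", "FUNCTION", "VERIFY", ...
    target: str = ""     # a label ("FIRST NAME") or function name ("SAVE")
    value: str = ""      # input data, e.g. "JOHN"

def execute(cmd: Command, form: dict, functions: dict) -> None:
    """Dispatch on the command type instead of executing raw code strings."""
    if cmd.action == "POPULATE":
        form[cmd.target] = cmd.value
    elif cmd.action == "FUNCTION":
        functions[cmd.target]()            # e.g. SAVE, SUBMIT, PAGEDOWN
    elif cmd.action == "VERIFY":
        print(f"Please confirm {cmd.target} = {cmd.value}")
    else:
        raise ValueError(f"unknown command: {cmd.action}")

# execute(Command("POPULATE", "FIRST NAME", "JOHN"), form={}, functions={})
```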
  • the voice based digital screen interaction device 104 may play or display the attributes of the digital screen or input data to the user 102 in the natural language, using the speaker/display, when the user 102 requests to play/display the input data entered in the input field.
  • the voice based digital screen interaction device 104 may play or display the attribute of the digital screen irrespective of the font type, color, or size of the attribute, etc.
  • the voice based digital screen interaction device 104 may verify the voice input or the input data with the user 102 when a format of received input data for an input field is different from an expected input data format. For example, for a given label, the input data format expected by the voice interaction enabling server 108 may be a number format.
  • the voice interaction enabling server 108 verifies and notifies the user 102 through the voice based digital screen interaction device 104 to provide the input data in the number format.
  • the digital screen may include mandatory fields to be filled in by the user 102. If the user 102 clicks the “SUBMIT” option without filling the mandatory fields, then the voice interaction enabling server 108 verifies and notifies the user 102 through the voice based digital screen interaction device 104 to fill the mandatory fields.
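The two verification rules above (format checking, and mandatory fields on SUBMIT) could be sketched as follows; the label names and the pattern registry are hypothetical, introduced only for illustration.

```python
import re

EXPECTED = {"AGE": r"^\d+$", "FIRST NAME": r"^[A-Za-z]+$"}  # label -> pattern

def verify(label: str, value: str):
    """Return an error message if the value does not match the expected format."""
    pattern = EXPECTED.get(label)
    if pattern and not re.match(pattern, value):
        return f"'{value}' does not match the expected format for {label}"
    return None

def missing_mandatory(form: dict, mandatory: list) -> list:
    """Labels that must be filled before SUBMIT but are still empty."""
    return [label for label in mandatory if not form.get(label)]

# verify("AGE", "JOHN") -> "'JOHN' does not match the expected format for AGE"
# missing_mandatory({"FIRST NAME": "JOHN"}, ["FIRST NAME", "AGE"]) -> ["AGE"]
```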
  • one or more voice based digital screen interaction devices are communicatively coupled with the voice interaction enabling server 108.
  • the one or more voice based digital screen interaction devices receive the voice input from the one or more users and communicate the voice inputs of the one or more users to the voice interaction enabling server 108 for further processing as described above.
  • the one or more voice based digital screen interaction devices may communicate with a voice based digital screen interactive cloud model instead of the voice interaction enabling server 108.
  • the voice based digital screen interaction device 104 may perform the functions of the voice interaction enabling server 108 as described above, including processing the voice input obtained from the user 102 and the attributes of the digital screen, generating commands based on analysis of the voice input and the attributes of the digital screen, etc.
  • the voice interaction enabling server 108 may include a default setting if the user 102 uses the voice interaction enabling server 108 for the first time. The voice interaction enabling server 108 then starts learning the attributes of the digital screens that the user 102 is interacting with, using the machine learning algorithm, and saves those attributes with his/her profile. If the voice interaction enabling server 108 is being used by multiple users, then the voice interaction enabling server 108 creates a profile for each of the users using an identification of each user.
  • FIG. 2 illustrates an exploded view of the voice based digital screen interaction device 104 of FIG. 1 according to an embodiment herein.
  • the voice based digital screen interaction device 104 includes a voice based interaction database 202, a program identification module 204, a screen parsing module 206, a label and input field association module 208, a voice input obtaining module 210, a voice input verification module 212, an attributes and voice input communication module 214, a command receiving module 216 and a command execution module 218.
  • the voice based digital screen interaction device 104 may include a voice input swapping module.
  • the program identification module 204 identifies a type of the program that runs on a digital screen of the voice based digital screen interaction device 104.
  • the screen parsing module 206 parses the digital screen as an image in real time using an image processing technique to identify the overall layout of the digital screen.
  • the screen parsing module 206 may determine the attributes of the digital screen by analyzing the layout of the digital screen.
  • the screen parsing module 206 may remove unnecessary images and graphics that are embedded in the digital screen for better interaction.
  • the label and input field association module 208 identifies a proximity type between the labels and the input fields.
  • the label and input field association module 208 associates the labels with the respective input fields based on (i) the identified proximity type and (ii) the analysis of the semantics of the labels and corresponding input fields.
  • the voice input obtaining module 210 obtains the voice input from the user 102 in natural language (e.g. native tongue) using the microphone or any other mechanism to receive the voice input.
  • the voice input may include any one of (i) utterance of the name of the label followed by the input data to be entered in the corresponding input field and (ii) utterance of the phrase “CLICK ON” followed by at least one of (a) a menu bar option, (b) a tool bar option or (c) functions like EDIT, BACKSPACE, DELETE, SUBMIT, SAVE, etc.
  • the voice input may also include utterance of the phrase “TRANSLATE” followed by at least one of (a) a screen name, (b) a label name and (c) an input field, followed by a name of the natural language, as sketched below.
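These utterance shapes suggest a small command grammar. The following is a minimal, illustrative sketch of how such utterances could be parsed; the regular expressions, the assumption that the language name is the last word of a TRANSLATE utterance, and the longest-label-first matching are editorial assumptions rather than the patent's specified method.

```python
import re

def parse_utterance(text: str, known_labels: list):
    """Classify a voice utterance into one of the command shapes above."""
    text = text.strip().upper()
    m = re.match(r"^CLICK ON (.+)$", text)
    if m:
        return ("CLICK", m.group(1))               # menu/tool bar option or function
    m = re.match(r"^TRANSLATE (.+) (\S+)$", text)  # assume last word = language name
    if m:
        return ("TRANSLATE", m.group(1), m.group(2))
    for label in sorted(known_labels, key=len, reverse=True):
        if text.startswith(label):                 # "<label> <input data>"
            return ("POPULATE", label, text[len(label):].strip())
    return ("UNKNOWN", text)

# parse_utterance("CLICK ON SUBMIT", [])             -> ("CLICK", "SUBMIT")
# parse_utterance("TRANSLATE SCREEN 1 HINDI", [])    -> ("TRANSLATE", "SCREEN 1", "HINDI")
# parse_utterance("FIRST NAME JOHN", ["FIRST NAME"]) -> ("POPULATE", "FIRST NAME", "JOHN")
```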
  • the voice input obtaining module 210 eliminates the unnecessary background noises from the voice input of the user 102.
  • the voice input verification module 212 verifies the voice input of the user 102 when a format of received input data for an input field is different from an expected input data format.
  • the voice input verification module 212 further verifies (i) a voice pitch, (ii) a voice tone and (iii) a voice accent of the user 102 (e.g. authorized user) with the received voice input for authentication.
  • the attributes and voice input communication module 214 communicates the voice input and the attributes of the digital screen to the voice interaction enabling server 108 to generate commands based on the analysis of the voice input of the user 102 and the attributes of the digital screen.
  • the command receiving module 216 receives a command that is generated based on the voice input and the attributes of the digital screen from the voice interaction enabling server 108.
  • the command execution module 218 executes the command that is generated based on the voice input using a command execution technique.
  • the command execution module 218 may delete, edit or change the input data in an input field that is automatically pre-populated with standard values based on voice input of the user 102.
  • the voice based interaction database 202 stores a repository of the digital screens, the standard values filled in different digital screens by the user 102 and the automatically pre-populated input fields, etc.
  • the voice input swapping module provides options to the user 102 to swap between the voice input based interaction and a traditional way of interaction (e.g. through keyboard or mouse) with the digital screen.
  • FIG. 3 illustrates an exploded view of the voice interaction enabling server 108 of FIG. 1 according to an embodiment herein.
  • the voice interaction enabling server 108 includes a voice based command execution database 302, an attribute and voice input receiving module 304, a natural language module 306, an attribute identification module 308, a behavior analysis module 310, a data analytics generation module 312, a command generation module 314 and a command communication module 316.
  • the attribute and voice input receiving module 304 receives the voice input and the attributes of the digital screen from the voice based digital screen interaction device 104.
  • the natural language module 306 analyzes the received voice input and translates the voice input to a machine readable language from a natural language (e.g. native tongue of the user 102).
  • the attribute identification module 308 identifies, from the received attributes of the digital screen and based on the analyzed voice input, the attribute into which the user 102 intends to enter input data or execute a command.
  • the attribute identification module 308 may identify the attributes into which the user 102 intends to enter input data or execute the command irrespective of the position of the attributes in the digital screen.
  • the behavior analysis module 310 analyzes the behavior of the user 102 (e.g. which may include an accent or an individual style or pattern of providing the voice input) from the digital screens that are filled in or interacted with by the user 102, using a machine learning algorithm (e.g. implemented in MATLAB, C++, etc.).
  • the data analytics generation module 312 generates data analytics on which digital screens or attributes the user 102 spends the maximum amount of time interacting with or entering input data into.
  • the command generation module 314 generates commands based on the voice input, the data analytics of user’s interaction with the different digital screens, the voice input verification and the behavior analysis of the user 102.
  • the command communication module 316 communicates the generated commands to the voice based digital screen interaction device 104 through a wireless or any other network.
  • the voice based command execution database 302 stores (i) a voice pitch of the user 102, (ii) a voice tone of the user 102, (iii) a voice accent of the user 102, (iv) standard values, (v) data analytics of the user’s interaction with the different digital screens, etc.
  • the wireless network may be (a) a WIFI network, (b) a Bluetooth network or (c) any other network.
  • FIG. 4 illustrates an exemplary view of a digital screen with a menu bar having sub-options for messaging that the user interacts with to select a sub-option in real time by providing voice input according to an embodiment herein.
  • the menu bar comprises an option “MESSAGE” 402, which in turn provides/displays sub-options such as (i) NEW MESSAGE, (ii) NEW MESSAGE USING, (iii) REPLY TO SENDER, (iv) REPLY TO ALL, (v) REPLY TO GROUP, and (vi) FORWARD when the user 102 accesses/clicks the option “MESSAGE” 402.
  • the voice interaction enabling server 108 analyzes, parses and associates the sub-options with the option “MESSAGE” 402 in real time and allows the user 102 to select a sub-option in real time using his voice input.
  • FIG. 5 illustrates an exemplary view of a user interacting with one or more digital screens in real time according to an embodiment herein.
  • the voice based digital screen interaction device 104 may provide options to the user 102 to interact simultaneously with more than one digital screen and switch between a first digital screen and a second digital screen.
  • the voice based digital screen interaction device 104 provides options to the user 102 to give each digital screen a name.
  • the voice based digital screen interaction device 104 receives a voice input that includes a name of the digital screen as input data followed by a label from the user 102 and names that digital screen based on the voice input.
  • For example, when the user 102 is interacting simultaneously with two digital screens, say “SCREEN 1” and “SCREEN 2”, and wants to direct input to “SCREEN 1”, the user 102 can provide voice input as “SCREEN 1” followed by a name of the attribute/label and then the input data to be entered/executed in the associated input field, as sketched below.
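A minimal sketch of routing such a multi-screen utterance follows, under the assumption that the device keeps a registry of screen names and their labels; the function and registry names are hypothetical.

```python
def route(text: str, screens: dict):
    """Resolve '<screen name> <label> <input data>' against a screen registry."""
    text = text.strip().upper()
    for screen_name, labels in screens.items():        # e.g. "SCREEN 1"
        if text.startswith(screen_name):
            rest = text[len(screen_name):].strip()
            for label in sorted(labels, key=len, reverse=True):
                if rest.startswith(label):
                    return screen_name, label, rest[len(label):].strip()
    return None

# route("SCREEN 1 FIRST NAME JOHN", {"SCREEN 1": ["FIRST NAME"]})
# -> ("SCREEN 1", "FIRST NAME", "JOHN")
```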
  • FIGS. 6A-6B are flow diagrams illustrating a method for interacting with a digital screen in the voice based digital screen interaction device 104 using voice input according to an embodiment herein.
  • attributes of a digital screen are received from a voice based digital screen interaction device 104.
  • the attributes of the digital screen are transformed from a machine-readable language to the natural language of the user 102 using a natural language processing technique.
  • the natural language of the user 102 is a human spoken language.
  • the attributes in the natural language of the user 102 are provided on the voice based digital screen interaction device 104 for enabling the user 102 to input a voice input.
  • Each attribute comprises a label, an input field and their semantics.
  • the voice input of the user 102 is received from the voice based digital screen interaction device 104.
  • the voice input is transformed from the natural language of the user 102 to the machine-readable language using the natural language processing technique.
  • a first attribute to be populated is determined based on the voice input, irrespective of the position of the first attribute in the digital screen.
  • a proximity type between a label and an input field associated with the first attribute is identified to associate the label with the input field based on the identified proximity type.
  • semantics of the label and its corresponding input field are analyzed based on the identified proximity type.
  • a command is generated based on the voice input and the first attribute, to populate the voice input in the input field associated with the first attribute.
  • the voice input is populated in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device 104 using a command execution technique.
  • FIGS. 7A and 7B are flow diagrams illustrating a method for generating commands using the voice interaction enabling server 108 based on voice input according to an embodiment herein.
  • a type of a program that runs on a digital screen of the voice based digital screen interaction device 104 is identified.
  • the digital screen is parsed as an image in real time using an image processing technique to identify an overall layout of the digital screen.
  • the image processing technique removes unnecessary images and graphics that are embedded in the digital screen.
  • attributes are determined by analysing the overall layout of the digital screen.
  • the attributes of the digital screen are transformed from a machine-readable language to the natural language of the user using a natural language processing technique.
  • the attributes comprise a static text, a menu bar option, a tool bar option, an audio, a video, a label, an input field and/or a semantics on the digital screen and the input fields comprise a text box, a radio button, a drop down list box, a check box, or a multiple choice to select input data.
  • the natural language of the user is a human spoken language.
  • the attributes are provided in the natural language of the user 102 on the voice based digital screen interaction device 104 for enabling the user 102 to input a voice input.
  • Each attribute comprises a label, an input field and their semantics.
  • the voice input of the user 102 is received in the natural language of the user 102 from the voice based digital screen interaction device 104.
  • the voice input of the user 102 is processed to eliminate unnecessary background noises from the voice input.
  • the voice input is transformed from the natural language of the user 102 to the machine-readable language using the natural language processing technique.
  • a first attribute to be populated is determined based on the voice input, irrespective of the position of the first attribute in the digital screen.
  • a proximity type between a label and an input field associated with the first attribute is identified to associate the label with the input field based on the identified proximity type.
  • semantics of the label and its corresponding input field are analysed based on the identified proximity type.
  • a command is generated based on the voice input and the first attribute, to populate the voice input in the input field associated with the first attribute.
  • the voice input is populated in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device 104 using a command execution technique.
  • FIG. 8 shows a diagrammatic representation of a computer system within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein may be executed in accordance with an embodiment herein.
  • the machine operates as a standalone device or may be connected (e.g., networked) to other machines.
  • the machine may operate in the capacity of a server or a client machine in server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
  • the machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the example computer system includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 804 and a static memory 806, which communicate with each other via a bus 808.
  • the computer system may further include a video display unit 810 (e.g., a liquid crystal display (LCD), a light emitting diode (LED) or a cathode ray tube (CRT)).
  • the computer system also includes an alphanumeric input device 812 (e.g., a keyboard or touch screen), a disk drive unit 814 and a network interface device 816.
  • the disk drive unit 814 includes a machine-readable medium 818 on which is stored one or more sets of instructions 820 (e.g., software) embodying any one or more of the methodologies or functions described herein.
  • the instructions 820 may also reside, completely or at least partially, within the main memory 804 and/or within the processor 802 during execution thereof by the computer system, the main memory 804 and the processor 802 also constituting machine-readable media.
  • the instructions 820 may further be transmitted or received over a network 822 via the network interface device 816.
  • while the machine-readable medium 818 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
  • the term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure.
  • the term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.

Abstract

A system for interacting with a digital screen using voice input is provided. The voice based digital screen interaction device (104) identifies a type of program that runs on its digital screen and parses the digital screen as an image in real time. The voice based digital screen interaction device (104) identifies a layout of the digital screen to determine attributes and to associate labels with corresponding input fields. The voice based digital screen interaction device (104) receives voice input from the user (102) in a human spoken language and communicates the attributes and the voice input to a voice interaction enabling server (108). The voice interaction enabling server (108) translates the attributes and the voice input and verifies the voice input to generate commands based on the voice input. The voice based digital screen interaction device (104) receives and executes the commands to perform functions as defined in the commands.

Description

SYSTEM AND METHOD FOR INTERACTING WITH DIGITAL SCREENS USING VOICE INPUT AND IMAGE PROCESSING TECHNIQUE
BACKGROUND
Technical Field
[0001] The embodiments herein generally relate to a system for interacting with digital screens and natural language processing, and more particularly, to a system and method that assists a user in interacting with digital screens using voice input.
Description of the Related Art
[0002] Users of software products, websites, mobile applications, etc. typically interact with digital screens to provide inputs using input devices like a keyboard or a mouse. Excessive usage of the above-mentioned input devices may result in ergonomic injuries such as neck pain, back pain, and carpal tunnel syndrome, as well as reduced productivity. Further, users who suffer from physical disabilities and are unable to operate these input devices cannot use their personal digital devices without seeking help from others. Since digital screens are typically displayed in popular languages like English, users who are not proficient in English also face hurdles in operating their personal digital devices.
[0003] With the advent of speech recognition, virtual assistants have become available for use with personal devices such as smartphones, tablets, etc., which use natural language processing (NLP) to match user text or voice input to executable commands. They are used for simple applications such as providing information on the weather, playing music and videos, etc. However, they are limited in their capabilities to interact with digital screens to implement processes, require substantial manual input using input devices, and are slow and inaccurate. Existing systems also do not support multilingual capability to assist users in interacting with the digital screens in their natural language or desired language.
[0004] Accordingly, there remains a need for a system and a method for enabling users to effectively interact with the digital screens using their voice inputs in real time.
SUMMARY
[0005] In view of the foregoing, an embodiment herein provides a method for interacting with a digital screen of a voice based digital screen interaction device using a voice input of a user. The method includes (a) generating a database that stores information comprising a natural language of a user; (b) identifying a type of a program that runs on a digital screen of the voice based digital screen interaction device; (c) parsing the digital screen as an image in real time using an image processing technique to identify an overall layout of the digital screen, wherein the image processing technique removes unnecessary images and graphics that are embedded in the digital screen; (d) determining attributes by analysing the overall layout of the digital screen; (e) transforming the attributes of the digital screen from a machine-readable language to the natural language of the user using a natural language processing technique, wherein the natural language of the user is a human spoken language; (f) providing the attributes in the natural language of the user on the voice based digital screen interaction device for enabling the user to input a voice input, wherein each attribute comprises a label, an input field and their semantics; (g) receiving the voice input of the user in the natural language of the user from the voice based digital screen interaction device, wherein the voice input of the user is processed to eliminate unnecessary background noises from the voice input; (h) transforming the voice input from the natural language of the user to the machine-readable language using the natural language processing technique; (i) determining a first attribute to be populated based on the voice input irrespective of the position of the first attribute in the digital screen; (j) identifying a proximity type between a label and an input field associated with the first attribute to associate the label with the input field based on the identified proximity type; (k) analysing semantics of the label and their corresponding input field based on the identified proximity type; (l) generating a command based on the voice input and the first attribute to populate the voice input in the input field associated with the first attribute; and (m) populating the voice input in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device using a command execution technique.
[0006] In one embodiment, the method includes accessing functions selected from a group comprising Home, Tools, View, Message, Submit, Save, Backspace, Pagedown, Pageup based on the voice input to operate the voice based digital screen interaction device.
[0007] In another embodiment, the method includes (a) analysing a behaviour of the user, using a machine learning algorithm, based on the user’s interactions with different digital screens of the voice based digital screen interaction device; (b) generating data analytics on which digital screens or attributes the user spends the maximum amount of time interacting with; and (c) automatically prepopulating input fields associated with the attributes with standard values based on the user behaviour analysis and data analytics.
[0008] In yet another embodiment, the method includes implementing a voice authentication process to authenticate authorized users to access digital screens that are secured.
[0009] In yet another embodiment, the method includes (a) verifying the voice input of the user when a format of received input data for an input field is different from an expected input data format; and (b) displaying or playing the attributes of the digital screen to the user in the natural language of the user.
[0010] In yet another embodiment, the voice input of the user is captured using a microphone associated with the voice based interaction device.
[0011] In yet another embodiment, the method includes eliminating grammatical errors from the voice input using the natural language processing technique to determine input data corresponding to different input fields.
[0012] In yet another embodiment, the method includes implementing a machine learning algorithm to interpret abbreviations, synonyms, and other forms of pronouncing the labels and to analyse standard values provided by the user for prepopulating the respective input fields.
[0013] In yet another embodiment, the attributes include a static text, a menu bar option, a tool bar option, an audio, a video, a label, an input field and/or a semantics on the digital screen and the input fields include a text box, a radio button, a drop-down list box, a check box, or a multiple choice to select input data.
[0014] In one aspect, a system for interacting with a digital screen of a voice based digital screen interaction device using a voice input of a user is provided. The system comprises a memory and a processor. The memory stores a set of instructions. The memory comprises a database that stores information associated with a natural language of a user. The processor executes the set of instructions to perform: (a) identifying a type of a program that runs on a digital screen of the voice based digital screen interaction device; (b) parsing the digital screen as an image in real time using an image processing technique to identify an overall layout of the digital screen, wherein the image processing technique removes unnecessary images and graphics that are embedded in the digital screen; (c) determining attributes by analysing the overall layout of the digital screen; (d) transforming the attributes of the digital screen from a machine-readable language to the natural language of the user using a natural language processing technique, wherein the natural language of the user is a human spoken language; (e) providing the attributes in the natural language of the user on the voice based digital screen interaction device for enabling the user to input a voice input, wherein each attribute comprises a label, an input field and their semantics; (f) receiving the voice input of the user in the natural language of the user from the voice based digital screen interaction device, wherein the voice input of the user is processed to eliminate unnecessary background noises from the voice input; (g) transforming the voice input from the natural language of the user to the machine-readable language using the natural language processing technique; (h) determining a first attribute to be populated based on the voice input irrespective of the position of the first attribute in the digital screen; (i) identifying a proximity type between a label and an input field associated with the first attribute to associate the label with the input field based on the identified proximity type; (j) analysing semantics of the label and their corresponding input field based on the identified proximity type; (k) generating a command based on the voice input and the first attribute to populate the voice input in the input field associated with the first attribute; and (l) populating the voice input in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device using a command execution technique.
[0015] The system enhances the performance and productivity of the user through voice-input-based interaction. The system does not require any external hardware to allow the user to interact with the digital screen or to provide the input data for various input fields. The system is quick and accurate enough to parse, populate and identify the input fields in a newly opened digital screen, so that the user can interact with the newly opened digital screen in real time. The system allows users to interact with the digital screens in data-entry jobs using voice input. The system helps users who are not proficient in (i) the machine-readable language and (ii) operating the voice based digital screen interaction device to interact effectively with the digital screen.
[0016] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] The embodiments herein will be better understood from the following detailed description with reference to the drawings, in which:
[0018] FIG. 1 illustrates a system view of a user interacting with a digital screen using a voice based digital screen interaction device that communicates with a voice interaction enabling server through a network according to an embodiment herein;
[0019] FIG. 2 illustrates an exploded view of the voice based digital screen interaction device of FIG. 1 according to an embodiment herein;
[0020] FIG. 3 illustrates an exploded view of the voice interaction enabling server of FIG. 1 according to an embodiment herein;
[0021] FIG. 4 illustrates an exemplary view of a digital screen with a menu bar having sub-options for messaging that the user interacts with to select a sub-option in real time by providing voice input according to an embodiment herein;
[0022] FIG. 5 illustrates an exemplary view of a user interacting with digital screens in real time according to an embodiment herein;
[0023] FIGS. 6A-6B are flow diagrams illustrating a method for interacting with the digital screen in the voice based digital screen interaction device using voice input according to an embodiment herein;
[0024] FIGS. 7A and 7B are flow diagrams illustrating a method for generating commands using the voice interaction enabling server based on voice input according to an embodiment herein; and
[0025] FIG. 8 shows a diagrammatic representation of a computer system within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein may be executed in accordance with an embodiment herein.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0026] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0027] As mentioned, there remains a need for a system for enhancing the performance/ productivity of the user. The embodiments herein achieve this by providing a voice based digital screen interaction device and a voice interaction enabling server that enables a user to effectively interact with a digital screen using voice input. Referring now to the drawings, and more particularly to FIGS. 1 through 8, where similar reference characters denote corresponding features consistently throughout the figures, preferred embodiments are shown.
[0028] FIG. 1 illustrates a system view of a user 102 interacting with a digital screen using a voice based digital screen interaction device 104 that communicates with a voice interaction enabling server 108 through a network 106 according to an embodiment herein. The voice based digital screen interaction device 104 identifies a type of a program that runs on its digital screen. The voice based digital screen interaction device 104 may be any of a mobile phone, a tablet, a laptop, a desktop computer etc. The program may be an ERP or CRM application, an operating system, a mobile application, a website etc. The voice based digital screen interaction device 104 parses the digital screen as an image in real time using image processing techniques (e.g. image preprocessing, image restoration, image compression, image filtering and/or image analysis) to identify an overall layout of the digital screen. The digital screen may be (a) a web portal page, (b) an e-filing form, (c) a mobile application, (d) a user interface of an operating system, etc. The digital screen is internally processed using the image processing techniques, such as image preprocessing, image filtering and image compression, for the removal of noise, for accurate image visibility and to identify the overall layout of the digital screen. From the overall layout of the digital screen, the voice based digital screen interaction device 104 determines attributes of the digital screen by analyzing it. The attributes may include static text, menu bar options, tool bar options, audio, video, labels, input fields and/or semantics on the digital screen. The input fields may include text boxes, radio buttons, drop down list boxes, check boxes, etc. The menu bar options may include FILE, VIEW, EDIT, TOOLS, MESSAGE, etc. In one embodiment, the voice based digital screen interaction device 104 uses the image processing technique to remove unnecessary images and graphics that are embedded in the digital screen.
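As an illustration of the screen-parsing step in [0028], the following sketch captures a rendered screen image, recovers label text with OCR, and treats wide, shallow rectangles as candidate input fields. OpenCV and Tesseract stand in for the patent's unspecified image processing techniques, and the shape heuristic is an editorial assumption.

```python
import cv2
import pytesseract

def parse_screen(path: str):
    """Return (labels, fields) recovered from a screenshot of a digital screen."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # OCR pass: recover label text together with its position on the screen.
    ocr = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
    labels = [
        {"text": t, "box": (x, y, w, h)}
        for t, x, y, w, h in zip(ocr["text"], ocr["left"], ocr["top"],
                                 ocr["width"], ocr["height"])
        if t.strip()
    ]

    # Contour pass: wide, shallow rectangles are treated as candidate input
    # fields; other shapes (decorative images, graphics) are simply dropped.
    _, thresh = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    fields = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w > 3 * h and h > 10:          # heuristic: text-box-like shape
            fields.append({"box": (x, y, w, h)})
    return labels, fields
```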
[0029] The voice based digital screen interaction device 104 identifies a proximity type between the labels and the input fields. The proximity type may include (a) adjacent proximity, (b) up and down proximity and/or (c) near-by location proximity. The up and down proximity may typically occur when the digital screen is displayed on a computing device that has a smaller screen. The voice based digital screen interaction device 104 analyzes the semantics of the labels and associates the labels with corresponding input fields based on the identified proximity type. The input fields may include multiple choices to select input data. The voice based digital screen interaction device 104 associates the multiple choices with corresponding input fields in real time. For example, if an input field associated with the label "HOBBIES" includes multiple choices like "PLAYING", "DRAWING", "READING", etc., the voice based digital screen interaction device 104 associates the above-mentioned multiple choices with the label "HOBBIES" in real time.
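A minimal sketch of such proximity-based association, assuming each label and input field carries the bounding box recovered during screen parsing (the distance thresholds are illustrative assumptions), may read:

    def associate_labels(labels, fields, max_gap=50):
        """Pair each label with the nearest field by adjacent, up-and-down,
        or near-by location proximity."""
        associations = []
        for label in labels:
            lx, ly, lw, lh = label["box"]
            best, best_dist, best_type = None, float("inf"), None
            for field in fields:
                fx, fy, fw, fh = field["box"]
                if abs(fy - ly) < lh and 0 <= fx - (lx + lw) <= max_gap:
                    dist, ptype = fx - (lx + lw), "adjacent"
                elif abs(fx - lx) < lw and 0 <= fy - (ly + lh) <= max_gap:
                    dist, ptype = fy - (ly + lh), "up-and-down"
                else:
                    dist = abs(fx - lx) + abs(fy - ly)  # near-by location
                    ptype = "near-by"
                if dist < best_dist:
                    best, best_dist, best_type = field, dist, ptype
            associations.append((label["text"], best, best_type))
        return associations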
[0030] The voice based digital screen interaction device 104 receives voice input from the user 102 using a microphone or any other mechanism to receive the voice input/commands. The voice based digital screen interaction device 104 may receive the voice input in any order based on a preferred communication style or format of the user 102. The voice based digital screen interaction device 104 communicates the voice input and attributes of the digital screen to a voice interaction enabling server 108. The voice interaction enabling server 108 may eliminate grammatical errors from the voice input using a natural language processing technique to determine data corresponding to different fields. Natural language processing is a field of artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to fruitfully process large amounts of natural language data. The natural language processing technique employs machine learning algorithms (implemented, for example, in Python) to eliminate grammatical errors from the voice input in order to determine data corresponding to different fields. There are multiple open source libraries (e.g. Apache OpenNLP, Natural Language Toolkit (NLTK), Stanford NLP, MALLET) available to provide algorithmic building blocks in real time. The embodiments herein may implement the natural language processing technique using, for example, the open source Natural Language Toolkit (NLTK) for Python. For example, the voice interaction enabling server 108 may receive the voice input as "MY FIRST NAME IS JOHN" or "FIRST NAME" and "JOHN" to determine that the data corresponding to the field "FIRST NAME" is "JOHN". The voice interaction enabling server 108 may also receive the voice input exactly as the input data or in a spelled format of the input data. For example, the voice interaction enabling server 108 receives the voice input in the format of "JOHN" or "J-O-H-N". The voice interaction enabling server 108 identifies the attributes irrespective of the order of the attributes in the digital screen and generates commands to populate the input data in the associated input fields. For example, for the labels "FIRST NAME", "MIDDLE NAME" and "LAST NAME", the voice interaction enabling server 108 may receive the input data first for the label "LAST NAME", second for the label "FIRST NAME" and last for the label "MIDDLE NAME", and generate commands to populate the associated input fields with the respective input data irrespective of the order in which the inputs were received. In an embodiment, using machine learning algorithms, the voice interaction enabling server 108 may learn how to understand and interpret abbreviations, synonyms, and other forms of pronouncing the labels. For example, inside an Oracle ERP system, there could be a label "PO" and the user 102 may provide input corresponding to the label either as "PURCHASE ORDER" or "PO".
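As a minimal sketch of how a transcribed utterance might be mapped to a (label, value) pair, assuming the label names have already been recovered from the screen and that NLTK's tokenizer is available; the matching rule is an illustrative assumption, not the claimed natural language processing technique:

    import nltk  # nltk.download("punkt") is needed once for word_tokenize

    def extract_field_and_value(utterance, known_labels):
        tokens = [t.upper() for t in nltk.word_tokenize(utterance)]
        for label in known_labels:              # e.g. "FIRST NAME"
            label_tokens = label.upper().split()
            for i in range(len(tokens) - len(label_tokens) + 1):
                if tokens[i:i + len(label_tokens)] == label_tokens:
                    rest = tokens[i + len(label_tokens):]
                    if rest and rest[0] == "IS":  # "MY FIRST NAME IS JOHN"
                        rest = rest[1:]
                    # spelled input such as "J-O-H-N" collapses to "JOHN"
                    value = " ".join(rest).replace("-", "")
                    return label, value
        return None, None

    print(extract_field_and_value("MY FIRST NAME is JOHN",
                                  ["FIRST NAME", "LAST NAME"]))
    # -> ('FIRST NAME', 'JOHN')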
[0031] The voice interaction enabling server 108 further analyzes the voice input and the attributes of the digital screen. The voice input may be in a natural language (e.g. native tongue) of the user 102. The voice interaction enabling server 108 processes the voice input of the user 102 to eliminate unnecessary background noises from the voice input. The attributes of the digital screen may be in a machine readable language. The machine-readable language may be any language, for example Extensible Markup Language (XML), in which the digital screen is intended to be filled according to legal requirements, jurisdictional requirements, etc. The voice interaction enabling server 108 translates (i) the voice input from a natural language to a machine readable language and (ii) the attributes of the digital screen from the machine readable language to the natural language of the user 102 (e.g. native tongue of the user 102) using the natural language processing technique. The voice interaction enabling server 108 communicates the translated attributes to the voice based digital screen interaction device 104. The voice based digital screen interaction device 104 may display or play the attributes using a display or a speaker respectively, when the user 102 requests the voice interaction enabling server 108. The voice interaction enabling server 108 may assist the user 102 using the natural language processing technique (a) to understand the semantics of the attributes and (b) to provide the voice input in the natural language. The natural language may be any human spoken language or native tongue of the user 102. The natural language is not a programming language.
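A minimal sketch of surfacing machine-readable attributes as natural language prompts, assuming the attributes of the digital screen arrive as XML (the tag names and prompt wording are illustrative assumptions):

    import xml.etree.ElementTree as ET

    FORM_XML = """
    <screen name="registration">
      <field label="FIRST NAME" type="text"/>
      <field label="COUNTRY" type="dropdown"/>
    </screen>
    """

    def attributes_to_prompts(xml_text):
        root = ET.fromstring(xml_text)
        prompts = []
        for field in root.iter("field"):
            # Rendered as text or synthesized speech in the user's natural language.
            prompts.append(f'Please provide your {field.get("label").title()}.')
        return prompts

    print(attributes_to_prompts(FORM_XML))
    # -> ['Please provide your First Name.', 'Please provide your Country.']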
[0032] The user 102 may interact with the digital screens several times. The voice interaction enabling server 108 may learn and analyze behavior (e.g. which may include an accent or an individual style or pattern of providing the voice inputs) of the user 102 from the digital screens that are interacted with by the user 102, using a machine learning algorithm. The voice interaction enabling server 108, using the machine learning algorithm, may (i) learn and analyze standard values (e.g. JOHN, SMITH, INDIA, etc.) provided by the user 102 for the respective input fields (e.g. first name, last name, country, etc.) and (ii) generate commands to automatically pre-populate those standard values in the respective input fields to reduce input time for the user 102 during subsequent interactions with the digital screens. The voice based digital screen interaction device 104 includes a voice based interaction database that stores a repository of the digital screens, the standard values and the automatically pre-populated input fields, etc. For example, if the user 102, who may be from the country "INDIA", repeatedly fills the label "COUNTRY" with the standard value "INDIA", then the voice interaction enabling server 108, using the machine learning algorithm, learns the standard value "INDIA" for the label "COUNTRY" for that user 102 and generates commands to automatically pre-populate the standard value "INDIA" in the input field of the label "COUNTRY" in the digital screen during subsequent interactions of that user 102 with that digital screen. In another example, if the user 102 of a "FEMALE" gender repeatedly fills the label "GENDER" with the standard value "FEMALE", then the voice interaction enabling server 108 learns the standard value "FEMALE" for the label "GENDER" and generates commands to automatically pre-populate an input field of the label "GENDER" with the standard value "FEMALE". Similarly, the voice interaction enabling server 108 may analyze the semantics of the input data in the label "FIRST NAME" and generate commands to automatically pre-populate an input field of the label "GENDER" with appropriate data. For example, the voice interaction enabling server 108 analyses the semantics of the input data "JOHN" associated with the label "FIRST NAME" and generates commands to automatically pre-populate an input field of the label "GENDER" with the data "MALE". The voice based digital screen interaction device 104 may delete, edit or change the automatically pre-populated standard values based on voice input of the user 102. In an embodiment, based on a location of the voice based digital screen interaction device 104, the voice interaction enabling server 108 automatically populates the input field of the label "COUNTRY". For example, if a location of the voice based digital screen interaction device 104 is identified as "INDIA" using the Global Positioning System, the voice interaction enabling server 108 automatically populates the input field of the label "COUNTRY" as "INDIA" in the digital screen. Similarly, based on the location of the voice based digital screen interaction device 104, the voice interaction enabling server 108 determines how a word has to be spelled in the display/speaker of the voice based digital screen interaction device 104. For example, the spelling is "COLOR" in the US whereas the spelling is "COLOUR" in India. If the location of the voice based digital screen interaction device 104 is identified as the US, then the voice interaction enabling server 108 populates the input field with the spelling used in the US.
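A minimal sketch of learning such per-user standard values, assuming a history of (label, value) pairs from earlier interactions; the majority-vote rule below stands in for the machine learning algorithm and is an assumption:

    from collections import Counter, defaultdict

    class StandardValueLearner:
        def __init__(self):
            self.history = defaultdict(Counter)  # label -> value frequencies

        def observe(self, label, value):
            self.history[label][value] += 1

        def prepopulate(self, labels, min_count=3):
            """Return values to pre-populate for fields the user fills the
            same way repeatedly; the user can still edit them by voice."""
            filled = {}
            for label in labels:
                if self.history[label]:
                    value, count = self.history[label].most_common(1)[0]
                    if count >= min_count:
                        filled[label] = value
            return filled

    learner = StandardValueLearner()
    for _ in range(3):
        learner.observe("COUNTRY", "INDIA")
    print(learner.prepopulate(["COUNTRY", "FIRST NAME"]))  # -> {'COUNTRY': 'INDIA'}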
[0033] The voice interaction enabling server 108 may provide data analytics on which digital screens the user 102 is spending the maximum amount of time interacting with, or on which input fields are not being populated by the user 102. The voice interaction enabling server 108 may provide options to the user 102 to stop or start interacting with the digital screen using voice input. In an example embodiment, the voice interaction enabling server 108 may allow the user 102 to start or stop interacting with the digital screen using the voice input "START" or "STOP" respectively. The voice based digital screen interaction device 104 may also allow the user 102 to enter the inputs through a keyboard or a mouse.
[0034] The voice interaction enabling server 108 may analyze an accent of the user 102 and an individual style or pattern of the user 102 in providing the voice input. The voice interaction enabling server 108 may employ a voice authentication process which authenticates authorized users to fill/interact with secured digital screens. The voice interaction enabling server 108 may use the voice of the user 102 as an authentication feature to allow the authorized users to fill the secured digital screens. The authentication feature may be (i) a voice pitch, (ii) a voice tone or (iii) a voice accent.
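A minimal sketch of such voice authentication, assuming an enrolled reference recording per authorized user and the open source librosa library; comparing averaged MFCC vectors is an illustrative stand-in for pitch/tone/accent verification, not a production speaker-verification system:

    import numpy as np
    import librosa

    def voiceprint(path, sr=16000):
        audio, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
        return mfcc.mean(axis=1)  # one averaged feature vector per speaker

    def is_authorized(enrolled_path, attempt_path, threshold=0.9):
        a, b = voiceprint(enrolled_path), voiceprint(attempt_path)
        # Cosine similarity between the enrolled and attempted voiceprints.
        similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return similarity >= threshold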
[0035] The voice interaction enabling server 108 may generate commands based on the analysis of the voice input of the user 102 and the attributes of the digital screen. The voice interaction enabling server 108 may communicate the generated commands to the voice based digital screen interaction device 104. The voice based digital screen interaction device 104 receives and executes the commands using a command execution technique. The command execution technique includes well known techniques such as command injection, i.e. the insertion of code (e.g. UNIX commands) into dynamically generated output, which helps in managing the data. The command execution may include (i) populating the input fields of the labels with the respective input data based on the voice input of the user 102, (ii) automatically pre-populating the input fields with the standard values based on the behavior analysis and data analytics associated with the user 102, (iii) verifying the voice input or the input data with the user 102, (iv) accessing functions like HOME, TOOLS, VIEW, MESSAGE, SUBMIT, SAVE, BACKSPACE, PAGEDOWN, PAGEUP, etc. to effectively operate the voice based digital screen interaction device 104, (v) providing an authentication to the user 102 to access the secured digital screen or (vi) displaying or playing the attributes of the digital screen or input data to the user 102 in the natural language of the user 102.
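A minimal sketch of the device-side command execution, assuming the server sends structured commands and that the screen object exposes the handlers shown (both assumptions for illustration):

    def execute_command(command, screen):
        action = command["action"]
        if action == "POPULATE":
            screen.set_field(command["label"], command["value"])
        elif action == "PREPOPULATE":
            for label, value in command["values"].items():
                screen.set_field(label, value)
        elif action in ("HOME", "TOOLS", "VIEW", "MESSAGE", "SUBMIT",
                        "SAVE", "BACKSPACE", "PAGEDOWN", "PAGEUP"):
            screen.invoke(action)        # built-in device/application function
        elif action == "SPEAK":
            screen.play(command["text"])  # read an attribute back to the user
        else:
            raise ValueError(f"unknown command: {action}")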
[0036] The voice based digital screen interaction device 104 may play or display the attributes of the digital screen or input data to the user 102 in the natural language, using the speaker/display, when the user 102 requests to play/display the input data entered in the input field. The voice based digital screen interaction device 104 may play or display the attribute of the digital screen irrespective of a type of font, a color of the attribute, a size of the attribute, etc. The voice based digital screen interaction device 104 may verify the voice input or the input data with the user 102 when a format of received input data for an input field is different from an expected input data format. For example, for the label "MOBILE", the input data format expected by the voice interaction enabling server 108 is a number format. If the user 102 provides the input data in a text format for the label "MOBILE", the voice interaction enabling server 108 verifies and notifies the user 102 through the voice based digital screen interaction device 104 to provide the input data in the number format. In another example, the digital screen may include mandatory fields to be filled in by the user 102. If the user 102 clicks the "SUBMIT" option without filling the mandatory fields, then the voice interaction enabling server 108 verifies and notifies the user 102 through the voice based digital screen interaction device 104 to fill the mandatory fields.
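A minimal sketch of such verification, assuming each field declares an expected format; the regular expressions and mandatory-field set are illustrative assumptions:

    import re

    EXPECTED_FORMATS = {
        "MOBILE": r"^\d{10}$",        # number format expected
        "FIRST NAME": r"^[A-Za-z]+$",
    }
    MANDATORY = {"FIRST NAME", "MOBILE"}

    def verify(fields):
        """Return spoken notifications for format errors and unfilled
        mandatory fields, in the style described above."""
        notices = []
        for label, value in fields.items():
            pattern = EXPECTED_FORMATS.get(label)
            if pattern and value and not re.match(pattern, value):
                notices.append(f"Please provide {label} in the expected format.")
        for label in MANDATORY:
            if not fields.get(label):
                notices.append(f"The mandatory field {label} is not filled.")
        return notices

    print(verify({"FIRST NAME": "JOHN", "MOBILE": "ninety eight"}))
    # -> ['Please provide MOBILE in the expected format.']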
[0037] In an embodiment, one or more voice based digital screen interaction devices are communicatively coupled with the voice interaction enabling server 108. The one or more voice based digital screen interaction devices receive the voice input from one or more users and communicate the voice inputs of the one or more users to the voice interaction enabling server 108 for further processing as described above. In another embodiment, the one or more voice based digital screen interaction devices communicate with a voice based digital screen interactive cloud model instead of the voice interaction enabling server 108. In yet another embodiment, the voice based digital screen interaction device 104 performs the functions of the voice interaction enabling server 108 as described above, including processing the voice input obtained from the user 102 and the attributes of the digital screen, generating commands based on analysis of the voice input and the attributes of the digital screen, etc. In an embodiment, the voice interaction enabling server 108 may include a default setting if the user 102 uses the voice interaction enabling server 108 for the first time. The voice interaction enabling server 108 then starts learning the attributes of the digital screens that the user 102 is interacting with, using the machine learning algorithm, and saves those attributes with his/her profile. If the voice interaction enabling server 108 is being used by multiple users, then the voice interaction enabling server 108 creates a profile for each of the users using an identification of each user.
[0038] FIG. 2 illustrates an exploded view of the voice based digital screen interaction device 104 of FIG. 1 according to an embodiment herein. The voice based digital screen interaction device 104 includes a voice based interaction database 202, a program identification module 204, a screen parsing module 206, a label and input field association module 208, a voice input obtaining module 210, a voice input verification module 212, an attributes and voice input communication module 214, a command receiving module 216 and a command execution module 218. The voice based digital screen interaction device 104 may include a voice input swapping module.
[0039] The program identification module 204 identifies a type of the program that runs on a digital screen of the voice based digital screen interaction device 104. The screen parsing module 206 parses the digital screen as an image in real time using an image processing technique to identify the overall layout of the digital screen. The screen parsing module 206 may determine the attributes of the digital screen by analyzing the layout of the digital screen. The screen parsing module 206 may remove unnecessary images and graphics that are embedded in the digital screen for better interaction. The label and input field association module 208 identifies a proximity type between the labels and the input fields. The label and input field association module 208 associates the labels with the respective input fields based on (i) the identified proximity type and (ii) the analysis of the semantics of the labels and corresponding input fields. The voice input obtaining module 210 obtains the voice input from the user 102 in natural language (e.g. native tongue) using the microphone or any other mechanism to receive the voice input. The voice input may include any one of (i) utterance of the name of the label followed by the input data to be entered in the corresponding input field and (ii) utterance of the phrase "CLICK ON" followed by at least one of (a) a menu bar option, (b) a tool bar option or (c) functions like EDIT, BACKSPACE, DELETE, SUBMIT, SAVE, etc. The voice input may further include utterance of the phrase "TRANSLATE" followed by at least one of (a) a screen name, (b) a label name and (c) an input field name, followed by a name of the natural language. In an embodiment, the voice input obtaining module 210 eliminates the unnecessary background noises from the voice input of the user 102. The voice input verification module 212 verifies the voice input of the user 102 when a format of received input data for an input field is different from an expected input data format. The voice input verification module 212 further verifies (i) a voice pitch, (ii) a voice tone and (iii) a voice accent of the user 102 (e.g. authorized user) against the received voice input for authentication. The attributes and voice input communication module 214 communicates the voice input and the attributes of the digital screen to the voice interaction enabling server 108 to generate commands based on the analysis of the voice input of the user 102 and the attributes of the digital screen.
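A minimal sketch of recognizing the utterance patterns listed above; the three-way split and the "TO <language>" convention for TRANSLATE are illustrative assumptions about the grammar:

    def classify_utterance(utterance, labels):
        words = utterance.strip().upper()
        if words.startswith("CLICK ON "):
            return ("CLICK", words[len("CLICK ON "):])     # menu/tool bar option or function
        if words.startswith("TRANSLATE "):
            rest = words[len("TRANSLATE "):].rsplit(" TO ", 1)
            target = rest[1] if len(rest) == 2 else None   # e.g. "TRANSLATE SCREEN 1 TO HINDI"
            return ("TRANSLATE", rest[0], target)
        for label in sorted(labels, key=len, reverse=True):
            if words.startswith(label.upper() + " "):      # label followed by input data
                return ("FILL", label, words[len(label) + 1:])
        return ("UNKNOWN", words)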
[0040] The command receiving module 216 receives a command that is generated based on the voice input and the attributes of the digital screen from the voice interaction enabling server 108. The command execution module 218 executes the command that is generated based on the voice input using a command execution technique.
[0041] The command execution module 218 may delete, edit or change the input data in an input field that is automatically pre-populated with standard values, based on voice input of the user 102. The voice based interaction database 202 stores a repository of the digital screens, the standard values filled in different digital screens by the user 102, the automatically pre-populated input fields, etc. The voice input swapping module provides options to the user 102 to swap between the voice input based interaction and a traditional way of interaction (e.g. through keyboard or mouse) with the digital screen.
[0042] FIG. 3 illustrates an exploded view of the voice interaction enabling server 108 of FIG. 1 according to an embodiment herein. The voice interaction enabling server 108 includes a voice based command execution database 302, an attribute and voice input receiving module 304, a natural language module 306, an attribute identification module 308, a behavior analysis module 310, a data analytics generation module 312, a command generation module 314 and a command communication module 316. The attribute and voice input receiving module 304 receives the voice input and the attributes of the digital screen from the voice based digital screen interaction device 104. The natural language module 306 analyzes the received voice input and translates the voice input from a natural language (e.g. native tongue of the user 102) to a machine readable language. The attribute identification module 308 identifies, from the received attributes of the digital screen and based on the analyzed voice input, an attribute in which the user 102 intends to enter input data or execute a command. The attribute identification module 308 may identify such attributes irrespective of the position of the attributes in the digital screen. The behavior analysis module 310 analyzes the behavior of the user 102 (e.g. which may include an accent or an individual style or pattern of providing the voice input) from the digital screens that are filled in/interacted with by the user 102, using a machine learning algorithm (implemented, for example, in MATLAB or C++). The data analytics generation module 312 generates data analytics on which digital screens or attributes the user 102 is spending the maximum amount of time interacting with/entering the input data. The command generation module 314 generates commands based on the voice input, the data analytics of the user's interaction with the different digital screens, the voice input verification and the behavior analysis of the user 102. The command communication module 316 communicates the generated commands to the voice based digital screen interaction device 104 through a wireless or any other network. The voice based command execution database 302 stores (i) a voice pitch of the user 102, (ii) a voice tone of the user 102, (iii) a voice accent of the user 102, (iv) standard values, (v) data analytics of the user's interaction with the different digital screens, etc. In an embodiment, the wireless network may be (a) a WIFI network, (b) a Bluetooth network or (c) any other network.
[0043] FIG. 4 illustrates an exemplary view of a digital screen with a menu bar having sub-options for messaging that the user interacts with to select a sub-option in real time by providing voice input according to an embodiment herein. The menu bar comprises an option "MESSAGE" 402, which in turn provides/displays sub-options such as (i) NEW MESSAGE, (ii) NEW MESSAGE USING, (iii) REPLY TO SENDER, (iv) REPLY TO ALL, (v) REPLY TO GROUP, and (vi) FORWARD when the user 102 accesses/clicks the option "MESSAGE" 402. The voice interaction enabling server 108 analyzes, parses and associates the sub-options with the option "MESSAGE" 402 in real time and allows the user 102 to select a sub-option in real time using his voice input.
[0044] FIG. 5 illustrates an exemplary view of a user interacting with one or more digital screens in real time according to an embodiment herein. The voice based digital screen interaction device 104 may provide options to the user 102 to interact simultaneously with more than one digital screen and switch between a first digital screen and a second digital screen. The voice based digital screen interaction device 104 provides options to the user 102 to give each digital screen a name. The voice based digital screen interaction device 104 receives a voice input that includes a name of the digital screen followed by a label from the user 102 and addresses that digital screen based on the voice input. For example, when the user 102 is interacting simultaneously with two digital screens, for example "SCREEN 1" and "SCREEN 2", and wants to fill a field on "SCREEN 1", the user 102 can provide his voice input as "SCREEN 1" followed by a name of the attribute/label and followed by the input data to be entered/executed in the input field associated with the attribute/label.
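A minimal sketch of naming and routing between simultaneously open digital screens, assuming a simple registry keyed by the spoken screen name (an illustrative assumption):

    class ScreenRegistry:
        def __init__(self):
            self.screens = {}      # spoken name -> screen object
            self.active = None

        def name_screen(self, spoken_name, screen):
            self.screens[spoken_name.upper()] = screen

        def route(self, utterance):
            """'SCREEN 1 FIRST NAME JOHN' switches to SCREEN 1 and forwards
            the rest of the utterance for label/value extraction."""
            for name, screen in self.screens.items():
                if utterance.upper().startswith(name + " "):
                    self.active = screen
                    return screen, utterance[len(name) + 1:]
            return self.active, utterance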
[0045] FIGS. 6A-6B are flow diagrams illustrating a method for interacting with a digital screen in the voice based digital screen interaction device 104 using voice input according to an embodiment herein. At step 602, attributes of a digital screen are received from a voice based digital screen interaction device 104. At step 604, the attributes of the digital screen are transformed from a machine-readable language to the natural language of the user 102 using a natural language processing technique. The natural language of the user 102 is a human spoken language. At step 606, the attributes in the natural language of the user 102 are provided on the voice based digital screen interaction device 104 for enabling the user 102 to input a voice input. Each attribute comprises a label, an input field and their semantics. At step 608, the voice input of the user 102 is received from the voice based digital screen interaction device 104. At step 610, the voice input is transformed from the natural language of the user 102 to the machine-readable language using the natural language processing technique. At step 612, a first attribute to be populated is determined based on the voice input irrespective of an order of a position of the first attribute in the digital screen. At step 614, a proximity type between a label and an input field associated with the first attribute is identified to associate the label with the input field based on the identified proximity type. At step 616, the semantics of the label and its corresponding input field are analyzed based on the identified proximity type. At step 618, a command is generated based on the voice input and the first attribute, to populate the voice input in the input field associated with the first attribute. At step 620, the voice input is populated in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device 104 using a command execution technique.
[0046] FIGS. 7A and 7B are flow diagrams illustrating a method for generating commands using the voice interaction enabling server 108 based on voice input according to an embodiment herein. At step 702, a type of a program that runs on a digital screen of the voice based digital screen interaction device 104 is identified. At step 704, the digital screen is parsed as an image in real time using an image processing technique to identify an overall layout of the digital screen. The image processing technique removes unnecessary images and graphics that are embedded in the digital screen. At step 706, attributes are determined by analysing the overall layout of the digital screen. At step 708, the attributes of the digital screen are transformed from a machine-readable language to the natural language of the user using a natural language processing technique. The attributes comprise a static text, a menu bar option, a tool bar option, an audio, a video, a label, an input field and/or a semantics on the digital screen, and the input fields comprise a text box, a radio button, a drop down list box, a check box, or a multiple choice to select input data. The natural language of the user is a human spoken language. At step 710, the attributes are provided in the natural language of the user 102 on the voice based digital screen interaction device 104 for enabling the user 102 to input a voice input. Each attribute comprises a label, an input field and their semantics. At step 712, the voice input of the user 102 is received in the natural language of the user 102 from the voice based digital screen interaction device 104. The voice input of the user 102 is processed to eliminate unnecessary background noises from the voice input. At step 714, the voice input is transformed from the natural language of the user 102 to the machine-readable language using the natural language processing technique. At step 716, a first attribute to be populated is determined based on the voice input irrespective of an order of a position of the first attribute in the digital screen. At step 718, a proximity type between a label and an input field associated with the first attribute is identified to associate the label with the input field based on the identified proximity type. At step 720, the semantics of the label and its corresponding input field are analysed based on the identified proximity type. At step 722, a command is generated based on the voice input and the first attribute, to populate the voice input in the input field associated with the first attribute. At step 724, the voice input is populated in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device 104 using a command execution technique.
[0047] FIG. 8 shows a diagrammatic representation of a computer system within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein may be executed in accordance with an embodiment herein. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
[0048] The example computer system includes a processor 802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 804 and a static memory 806, which communicate with each other via a bus 808. The computer system may further include a video display unit 810 (e.g., a liquid crystal display (LCD), a light emitting diode (LED) or a cathode ray tube (CRT)). The computer system also includes an alphanumeric input device 812 (e.g., a keyboard or touch screen), a disk drive unit 814 and a network interface device 816.
[0049] The disk drive unit 814 includes a machine-readable medium 818 on which is stored one or more sets of instructions 820 (e.g. software) embodying any one or more of the methodologies or functions described herein. The instructions 820 may also reside, completely or at least partially, within the main memory 804 and/or within the processor 802 during execution thereof by the computer system, the main memory 804 and the processor 802 also constituting machine-readable media. The instructions 820 may further be transmitted or received over a network 822 via the network interface device 816.
[0050] While the machine-readable medium 818 is shown in an example embodiment to be a single medium, the term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term "machine-readable medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.
[0051] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope.

Claims

We claim:
1. A method for interacting with a digital screen of a voice based digital screen interaction device (104) using a voice input of a user (102), comprising:
generating a database that stores information comprising a natural language of a user
(102);
identifying a type of a program that runs on a digital screen of the voice based digital screen interaction device (104);
characterized in that,
parsing the digital screen as an image in real time using an image processing technique to identify an overall layout of the digital screen, wherein the image processing technique removes unnecessary images and graphics that are embedded in the digital screen; and
determining attributes by analysing the overall layout of the digital screen;
transforming the attributes of the digital screen from a machine-readable language to the natural language of the user (102) using a natural language processing technique, wherein the natural language of the user (102) is a human spoken language;
providing the attributes in the natural language of the user (102) on the voice based digital screen interaction device (104) for enabling the user (102) to input a voice input, wherein each attribute comprises a label, an input field and their semantics;
receiving the voice input of the user (102) in the natural language of the user (102) from the voice based digital screen interaction device (104), wherein the voice input of the user (102) is processed to eliminate unnecessary background noises from the voice input;
transforming the voice input from the natural language of the user (102) to the machine- readable language using the natural language processing technique;
determining a first attribute to be populated based on the voice input irrespective of an order of a position of the first attribute in the digital screen;
identifying a proximity type between a label and an input field associated with the first attribute to associate the label with the input field based on the identified proximity type;
analysing semantics of the label and their corresponding input field based on the identified proximity type;
generating a command based on the voice input and the first attribute, to populate the voice input in the input field associated with the first attribute; and
populating the voice input in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device (104) using a command execution technique.
2. The method as claimed in claim 1, wherein the method comprises
accessing functions selected from a group comprising Home, Tools, View, Message, Submit, Save, Backspace, Pagedown, Pageup based on the voice input to operate the voice based digital screen interaction device (104).
3. The method as claimed in claim 1, wherein the method comprises
analysing a behaviour of the user (102), using a machine learning algorithm, based on the user’s interactions with different digital screens of the voice based digital screen interaction device (104);
generating data analytics on which digital screens or attributes that the user (102) is spending maximum amount of time in interacting with; and
automatically prepopulating input fields associated with the attributes with the standard values based on the user behaviour analysis and data analytics.
4. The method as claimed in claim 1, wherein the method comprises implementing a voice authentication process to authenticate authorized users to access the digital screens that are secured.
5. The method as claimed in claim 1, wherein the method comprises
verifying the voice input of the user (102) when a format of received input data for an input field is different from an expected input data format; and
displaying or playing the attributes of the digital screen to the user (102) in the natural language of the user (102).
6. The method as claimed in claim 1, wherein the voice input of the user (102) is captured using a microphone associated with the voice based digital screen interaction device (104).
7. The method as claimed in claim 1, wherein the method comprises eliminating grammatical errors from the voice input using the natural language processing technique to determine input data corresponding to different input fields.
8. The method as claimed in claim 1, wherein the method comprises implementing a machine learning algorithm to interpret abbreviations, synonyms, and other forms of pronouncing the labels and to analyse standard values provided by the user (102) for prepopulating in the respective input fields.
9. The method as claimed in claim 1, wherein the attributes comprise at least one of a static text, a menu bar option, a tool bar option, an audio, a video, a label, an input field and/or a semantics on the digital screen and the input fields comprise a text box, a radio button, a drop down list box, a check box, or a multiple choice to select input data.
10. A system for interacting with a digital screen of a voice based digital screen interaction device (104) using a voice input of a user (102), comprising:
a memory that stores a set of instructions, wherein said memory comprises a database that stores information associated with a natural language of a user (102); and
a processor that executes said set of instructions to perform:
identifying a type of a program that runs on a digital screen of the voice based digital screen interaction device (104);
characterized in that,
parsing the digital screen as an image in real time using an image processing technique to identify an overall layout of the digital screen, wherein the image processing technique removes unnecessary images and graphics that are embedded in the digital screen; and
determining attributes by analysing the overall layout of the digital screen;
transforming the attributes of the digital screen from a machine-readable language to the natural language of the user (102) using a natural language processing technique, wherein the natural language of the user (102) is a human spoken language;
providing the attributes in the natural language of the user (102) on the voice based digital screen interaction device (104) for enabling the user (102) to input a voice input, wherein each attribute comprises a label, an input field and their semantics;
receiving the voice input of the user (102) in the natural language of the user (102) from the voice based digital screen interaction device (104), wherein the voice input of the user (102) is processed to eliminate unnecessary background noises from the voice input;
transforming the voice input from the natural language of the user (102) to the machine-readable language using the natural language processing technique;
determining a first attribute to be populated based on the voice input irrespective of an order of a position of the first attribute in the digital screen;
identifying a proximity type between a label and an input field associated with the first attribute to associate the label with the input field based on the identified proximity type; and
analysing semantics of the label and their corresponding input field based on the identified proximity type;
generating a command based on the voice input and the first attribute, to populate the voice input in the input field associated with the first attribute; and
populating the voice input in the input field associated with the first attribute by executing the generated command on the voice based digital screen interaction device (104) using a command execution technique.
PCT/IN2019/050200 2018-03-13 2019-03-12 System and method for interacting with digitalscreensusing voice input and image processing technique WO2019175896A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841009230 2018-03-13
IN201841009230 2018-03-13

Publications (1)

Publication Number Publication Date
WO2019175896A1 true WO2019175896A1 (en) 2019-09-19

Family

ID=67906540

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2019/050200 WO2019175896A1 (en) 2018-03-13 2019-03-12 System and method for interacting with digitalscreensusing voice input and image processing technique

Country Status (1)

Country Link
WO (1) WO2019175896A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104516709A (en) * 2014-11-12 2015-04-15 科大讯飞股份有限公司 Software operation scene and voice assistant based voice aiding method and system
US20170178626A1 (en) * 2010-01-18 2017-06-22 Apple Inc. Intelligent automated assistant


Similar Documents

Publication Publication Date Title
US10417347B2 (en) Computer messaging bot creation
KR102401942B1 (en) Method and apparatus for evaluating translation quality
US11217239B2 (en) Computer proxy messaging bot
US10733197B2 (en) Method and apparatus for providing information based on artificial intelligence
US10255265B2 (en) Process flow diagramming based on natural language processing
JP7296419B2 (en) Method and device, electronic device, storage medium and computer program for building quality evaluation model
US20140258892A1 (en) Resource locator suggestions from input character sequence
US11775254B2 (en) Analyzing graphical user interfaces to facilitate automatic interaction
US20190087780A1 (en) System and method to extract and enrich slide presentations from multimodal content through cognitive computing
US11126794B2 (en) Targeted rewrites
US10891430B2 (en) Semi-automated methods for translating structured document content to chat-based interaction
US11763074B2 (en) Systems and methods for tool integration using cross channel digital forms
KR20180136116A (en) A digital signage system in IoT environment using personality analysis
CN111860000A (en) Text translation editing method and device, electronic equipment and storage medium
EP3149729A1 (en) Method and system for processing a voice-based user-input
WO2019175896A1 (en) System and method for interacting with digitalscreensusing voice input and image processing technique
Bisson et al. Azure AI Services at Scale for Cloud, Mobile, and Edge
US11741302B1 (en) Automated artificial intelligence driven readability scoring techniques
US11941345B2 (en) Voice instructed machine authoring of electronic documents
US20240126412A1 (en) Cross channel digital data structures integration and controls
KR20170057074A (en) Intelligent auto-completion method and apparatus sentence
CN113419711A (en) Page guiding method and device, electronic equipment and storage medium
Rodrigues Human-Powered Smartphone Assistance for Blind People
CN117873619A (en) Method and device for generating product description document, storage medium and terminal equipment
CN116910399A (en) Page construction method, page construction device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19768527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19768527

Country of ref document: EP

Kind code of ref document: A1