WO2021144339A1 - System and method to capture and analyze audio samples - Google Patents


Info

Publication number
WO2021144339A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
interest
article
audio
articles
Application number
PCT/EP2021/050642
Other languages
French (fr)
Inventor
Alok Bhagwat Joshi
Original Assignee
Unilever Ip Holdings B.V.
Unilever Global Ip Limited
Conopco, Inc., D/B/A Unilever
Application filed by Unilever Ip Holdings B.V., Unilever Global Ip Limited, Conopco, Inc., D/B/A Unilever filed Critical Unilever Ip Holdings B.V.
Publication of WO2021144339A1 publication Critical patent/WO2021144339A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements

Definitions

  • the present disclosure relates to the field of capturing audio samples. More specifically, the present disclosure relates to a system and method for processing and analyzing audio samples.
  • a conventional product purchase approach facilitates an entity (such as a guest entity, for example, a buyer) to procure an article of interest (the article of interest herein refers to an item or a service).
  • attributes relating to procurement preferences, time and frequency of procurement/purchase by the guest entity can be determined by a host entity (such as a retailer) for suggesting and promoting the article of interest in the future.
  • although the guest entity's buying preference is determined, the intent and decision-making steps for procuring the article of interest with specific attributes are not clearly established.
  • Available audio-controlled computing devices can capture spoken words and other audio inputs through a microphone and perform audio recognition to identify audio commands from received audio samples. The audio-controlled computing devices may then use the audio commands to perform various tasks such as purchasing the article of interest over electronic networks with aid of an on-line service provider.
  • general entity interactions with the audio-controlled computing devices are limited to receiving certain audio commands related to a particular task, and are not associated with determining the desire, preference, trend or decision making of the guest entity to procure the article of interest, thus depriving both entities (guest and host) of useful information related to the article of interest, which is crucial in performing competitive and demand analysis.
  • First aspect of the present invention provides a method to analyze an audio interaction, said method comprising: receiving a stream of audio samples captured using an audio capturing unit, from a first entity and a second entity; extracting, at a processor operatively coupled with the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples; extracting, at the processor, a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes correspond to the one or more articles of interest; determining, at the processor, for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest, one or more article parameters; determining, at the processor, a correlation of the one or more article parameters for each of the article of interest and the corresponding first set of article attributes over a predefined period of time to form a pattern; and analysing, at the processor, said pattern regarding the one or more articles of interest, using the pattern to provide recommendations for business management.
  • Second aspect of the present invention provides a system to analyse an audio interaction, said system comprising: an audio capturing unit adapted to receive a stream of audio samples from a first entity and a second entity; a processing unit operatively coupled to the audio capturing unit, the processing unit comprising a processor communicatively coupled to a memory, the memory storing a set of instructions executable by the processor, wherein, when the system is in operation, the processor is configured to execute the set of instructions to enable the processing unit to: extract from the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples; extract a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes pertain to the one or more articles of interest; determine for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest, one or more article parameters; determine a correlation of the one or more article parameters for each of the article of interest of the one or more articles of interest over a predefined period of time.
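The claimed steps (receive samples, extract articles of interest, extract attributes, derive parameters) can be sketched as a short pipeline. Everything below is an illustrative assumption, not the patent's implementation: the function name `analyze_audio_interaction`, the `"text"` field on each transcribed sample, and the frequency-based `"demand"` proxy are all invented for clarity, since the patent leaves the concrete mechanism open.

```python
from collections import defaultdict

def analyze_audio_interaction(transcribed_samples, article_vocabulary):
    """Hypothetical walk through the claimed method steps.

    transcribed_samples: list of dicts like {"text": "..."} (assumed shape).
    article_vocabulary:  known articles of interest to look for.
    """
    # Steps 1-2: keep only samples that mention a known article of interest.
    relevant = [s for s in transcribed_samples
                if any(article in s["text"] for article in article_vocabulary)]

    # Step 3: extract a first set of article attributes (here, simply the
    # transcribed phrases in which each article was mentioned).
    attributes = defaultdict(list)
    for sample in relevant:
        for article in article_vocabulary:
            if article in sample["text"]:
                attributes[article].append(sample["text"])

    # Step 4: derive article parameters; a naive mention-count "demand"
    # proxy stands in for the unspecified parameter determination.
    return {article: {"demand": len(mentions)}
            for article, mentions in attributes.items()}
```

For example, feeding three transcribed utterances of which two mention "soap" would yield a demand count of 2 for that article.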
  • FIG. 1 illustrates an exemplary network implementation of the proposed audio interaction system, which facilitates capturing audio samples from multiple entities, in accordance with an aspect of the present disclosure.
  • FIG. 2 illustrates exemplary functional components of the proposed audio interaction system, in accordance with an aspect of the present disclosure.
  • FIG. 3 illustrates an exemplary block diagram of the proposed audio interaction system with various components, in accordance with an aspect of the present disclosure.
  • FIG. 4 illustrates an exemplary method for analysis of a stream of audio samples in accordance with embodiments of the present disclosure.
  • FIG. 5 illustrates an exemplary computer system to implement the proposed audio interaction system, in accordance with aspects of the present disclosure.
  • aspects of the present invention include various steps, which will be described below.
  • the steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps.
  • steps may be performed by a combination of hardware, software, firmware and/or by human operators.
  • the present invention may be provided as a computer program product, which may include a machine-readable storage medium tangibly embodying thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process.
  • the machine-readable medium may include, but is not limited to, fixed (hard) drives, magnetic tape, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), and magneto-optical disks, semiconductor memories, such as ROMs, PROMs, random access memories (RAMs), programmable read-only memories (PROMs), erasable PROMs (EPROMs), electrically erasable PROMs (EEPROMs), flash memory, magnetic or optical cards, or other type of media/machine-readable medium suitable for storing electronic instructions (e.g., computer programming code, such as software or firmware).
  • Various methods described herein may be practiced by combining one or more machine-readable storage media containing the code according to the present invention with appropriate standard computer hardware to execute the code contained therein.
  • An apparatus for practicing various aspects of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the invention could be accomplished by modules, routines, subroutines, or subparts of a computer program product. If the specification states a component or feature “may”, “can”, “could”, or “might” be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.
  • the present disclosure relates to the field of capturing audio samples. More specifically, the present disclosure relates to a system and method for processing and analyzing audio samples.
  • the present invention provides a method to analyze an audio interaction, said method comprising: receiving a stream of audio samples captured using an audio capturing unit, from a first entity and a second entity; extracting, at a processor operatively coupled with the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples; extracting, at the processor, a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes correspond to the one or more articles of interest; determining, at the processor, for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest, one or more article parameters; and determining, at the processor, a correlation of the one or more article parameters for each of the article of interest and the corresponding first set of article attributes over a predefined period of time to form a pattern; and analysing at the processor said pattern regarding the one or more articles of interest, using the pattern to provide recommendations for business management.
  • business management includes, but is not limited to, retail management, logistics management, supply chain management, inventory management, market analysis and forecast, devising marketing strategy, devising advertising strategy, management of business administration, management of business operation, finance management, marketing management, business stakeholder management, competitor management, and business decision making.
  • business management includes market forecasts, strategizing future sales, decisions relating to the strategic positioning of the article of interest in the market, making informed market decisions, providing recommendations for other entities, and predicting stock usage and demand; it also covers competitive and demand analysis, and predictive analysis such as of stock usage and demand.
  • the extracted first set of article attributes is any or a combination of a plurality of phrases and one or more words determined, at the processor, from the received stream of audio samples by matching the received stream of audio samples with a second dataset comprising a predefined set of a plurality of phrases and one or more words for a set of audio samples. It is preferred that, upon an unsuccessful matching of the received stream of audio samples with the second dataset, the received stream of audio samples is discarded.
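The match-or-discard behaviour above can be sketched in a few lines; this is a minimal assumption (simple substring matching against the predefined phrase dataset), since the patent does not specify a matching algorithm, and the name `match_attributes` is invented here.

```python
def match_attributes(transcript, phrase_dataset):
    """Return the predefined phrases/words found in a transcribed sample,
    or None to signal that the sample should be discarded (unsuccessful
    match with the second dataset)."""
    found = [phrase for phrase in phrase_dataset
             if phrase in transcript.lower()]
    return found or None
```

A caller would simply skip any sample for which this returns `None`, mirroring the preferred discard step.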
  • the article parameters pertain to any or a combination of category, demand, availability, cost and quality of each of the one or more articles of interest and the corresponding first set of article attributes.
  • the audio capturing unit initiates capturing of the stream of audio samples in real time without any intervention from the first entity and the second entity. It is preferred that the method comprises: generating, at the processor, a request for registration of the first entity with an entity account; and receiving, at the processor and from the first entity, entity attributes, wherein the entity attributes are associated with recognition of the first entity by the processor. It is preferred that the first entity is authenticated based on receipt and positive identification, at the processor, of the entity attributes.
  • the entity attributes are selected from a group comprising a voice sample, an audio password and a unique identifier, and wherein said entity attributes are stored in a storage device operatively coupled with the processor.
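The registration and authentication flow described above (entity attributes received, stored, and later positively identified) can be sketched as follows. The patent fixes no concrete scheme, so the `EntityRegistry` class, the use of SHA-256 digests for the stored attribute, and the byte-string audio password are all assumptions made for illustration.

```python
import hashlib

class EntityRegistry:
    """Illustrative store of entity attributes keyed by unique identifier."""

    def __init__(self):
        self._accounts = {}

    def register(self, unique_id, audio_password):
        # Store only a digest of the audio password (an assumed precaution;
        # the patent merely says attributes are stored in a storage device).
        self._accounts[unique_id] = hashlib.sha256(audio_password).hexdigest()

    def authenticate(self, unique_id, audio_password):
        # Positive identification: the presented attribute must match the
        # stored one for the claimed entity.
        digest = hashlib.sha256(audio_password).hexdigest()
        return self._accounts.get(unique_id) == digest
```

In practice a voice sample would be compared with a speaker-verification model rather than an exact digest; the digest merely stands in for "positive identification".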
  • the first entity and the second entity are present within a predefined area of the audio capturing unit. It is preferred that the received stream of audio samples pertains to an interaction between the first entity and the second entity, and amongst the second entities.
  • the first entity is a host entity and the second entity is a guest entity.
  • the present invention provides that in an embodiment, a stream of audio samples are captured using an audio capturing unit.
  • the stream of audio samples is captured from a first entity and a second entity.
  • the first entity may be a host entity (e.g., a vendor, a retailer, a supplier, a shopkeeper, a dealer, a trader, a merchant and the like) which offers, exhibits and demonstrates articles of interest for sale, exhibition and/or demonstration.
  • the second entity may be a guest entity (e.g., a procurer, an agent, a purchaser, a shopper, a customer, a consumer, an audience and the like).
  • the present invention provides that in an embodiment, audio samples that relate to one or more articles of interest are extracted from the received stream of audio samples.
  • a first set of article attributes are extracted from the received stream of audio samples. Further, the first set of article attributes may correspond to the one or more articles of interest.
  • the first set of article attributes may be any or a combination of a plurality of phrases and one or more words that are determined from the audio samples. The phrases and words are determined based on a matching of the received stream of audio samples with a second dataset comprising a predefined set of plurality of phrases and words for a set of audio samples.
  • a set of words and phrases from the host entity may be captured and used to determine what are the articles of interest that are most in demand - based on frequency at which the articles of interest are being provided by the host entity, pricing preference by the guest entity, quality determination of the articles of interest, feedback generated for the articles of interest based on the article attributes by the guest entity to the host entity and the like.
  • the audio samples may be gathered from the guest entity based on their interaction amongst the guest entity and/or their interaction with the host entity. Frequent and positive interactions for the article of interest may signify a positive product demand, favorable pricing and so forth.
  • the system may determine and extract features such as demand, availability, cost, pricing, competitive demand, preference and the like for the article of interest.
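The frequency-based demand determination described in the preceding paragraphs can be illustrated with a tiny ranking helper; the function name and the "mentions per interaction log" input format are assumptions, since the patent does not define a demand formula.

```python
from collections import Counter

def rank_by_demand(mention_log):
    """Rank articles of interest by how often they appear in captured
    interactions -- a naive frequency proxy for demand.

    mention_log: list of article names, one entry per captured mention.
    """
    counts = Counter(mention_log)
    return [article for article, _ in counts.most_common()]
```

Counts here could equally be weighted by positive/negative sentiment of each interaction, which the patent hints at ("frequent and positive interactions") but does not specify.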
  • the present invention provides that in an embodiment, one or more article parameters are determined for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest.
  • the article of interest may be various products, food items, or services.
  • the article parameters may be any or a combination of category, demand, availability, cost, quantity and quality for each of the article of interest.
  • the one or more article parameters may be determined for each of the article of interest and the corresponding first set of article attributes. Further, a correlation of the article parameters is determined for each of the article of interest and corresponding first set of article attributes over a predefined period of time.
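The correlation "over a predefined period of time" can be read as bucketing parameter observations into time windows so a pattern emerges. The sketch below assumes integer timestamps and a fixed window length; both the function name and the bucketing approach are illustrative, not from the patent.

```python
def demand_pattern(timestamped_mentions, window):
    """Bucket article mentions into fixed time windows so a demand trend
    ('pattern') over a predefined period can be read off.

    timestamped_mentions: list of (timestamp, article) pairs.
    window: window length in the same time unit as the timestamps.
    """
    buckets = {}
    for ts, article in timestamped_mentions:
        key = (ts // window, article)  # (window index, article)
        buckets[key] = buckets.get(key, 0) + 1
    return buckets
```

The resulting per-window counts are the kind of time-indexed series from which the trends discussed later in the description could be derived.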
  • a request is received from the first entity for registration with an entity account.
  • the request may comprise entity attributes associated with recognition of the first entity and may include a group comprising a voice sample, an audio password, and a unique identifier associated with the first entity.
  • the entity attributes may be stored in a storage device. Further, the first entity is authenticated based on positive identification of the received entity attributes.
  • the present invention provides that in an embodiment, the first entity and the second entity may be present within a predefined area of the audio capturing unit.
  • the audio samples may be captured from the first entity and the second entity via a telephone or via an e-commerce application.
  • the present invention provides a system to analyse an audio interaction, said system comprising: an audio capturing unit adapted to receive a stream of audio samples from a first entity and a second entity; a processing unit operatively coupled to the audio capturing unit, the processing unit comprising a processor communicatively coupled to a memory, the memory storing a set of instructions executable by the processor, wherein, when the system is in operation, the processor is configured to execute the set of instructions to enable the processing unit to: extract from the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples; extract a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes pertain to the one or more articles of interest; determine for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest, one or more article parameters; and determine a correlation of the one or more article parameters for each of the article of interest of the one or more articles of interest over a predefined period of time.
  • the audio capturing unit comprises an array of microphones for capturing the stream of audio samples.
  • An article of interest for the purpose of the present invention may be an item or service that can be represented and offered for sale.
  • the article of interest may be a physical good (e.g., clothing, personal care products, hardware, electronics, etc.), a digital item (e.g., audio, video, image, etc.) that is exhibited virtually, or a service (e.g., landscaping, banking, house painting, cleaning, etc.) that is offered for sale singly or together with a plurality of other articles of interest.
  • An article of interest may correspond to a single article or a group of articles of interest.
  • the entities, for the purpose of the present invention, may include a first entity (e.g., a host entity) and one or more second entities (e.g., guest entities).
  • the first entity may be, for example, a vendor, a retailer, a supplier, a shopkeeper, a dealer, a trader, a merchant and the like, which offers, exhibits and demonstrates the articles of interest for sale, exhibition and/or demonstration.
  • the first entity may host sale, exhibition and demonstration of the articles of interest either physically (e.g., in a wholesale store) or on an e-commerce site.
  • the one or more second entities may be, for example, a procurer, an agent, a viewer, or an audience. It is highly preferable that when the articles of interest are made available in a commercial set-up, the one or more second entities are prospective buyers, and when the articles of interest are made available in an exhibition or demonstration type of set-up, the one or more second entities are viewers or an audience.
  • the present invention provides an audio interaction system that may facilitate receiving a stream of audio samples from a first entity and a second entity.
  • the received audio samples may be captured using an audio capturing unit.
  • the received stream of audio samples may pertain to the audio interaction occurring between the first entity, the second entity and amongst the second entities.
  • the first entity and the one or more second entities may be present within a predefined area of the audio recognition unit.
  • audio samples pertaining to the one or more articles of interest may be extracted from the received stream of audio samples. Further, a first set of article attributes may be extracted from the received stream of audio samples.
  • the first set of article attributes may pertain to any or a combination of a plurality of phrases and one or more words determined from the received stream of audio samples.
  • the first set of article attributes may be determined by matching the received stream of audio samples with a second dataset, where the second dataset may comprise a predefined set of a plurality of phrases and one or more words for a set of audio samples. Additionally, a first set of article attributes are extracted from the received stream of audio samples. The first set of article attributes may correspond to the one or more articles of interest. Multiple article parameters are determined for each of the articles of interest. Furthermore, a correlation of the one or more article parameters for each of the articles of interest and the corresponding first set of article attributes over a predefined period of time may be determined to form a pattern; the pattern regarding the one or more articles of interest may then be analysed at the processor and used to provide recommendations for business management.
  • the article parameters may be any or a combination of category, demand, availability, cost and quality of each of the one or more articles of interest and the corresponding first set of article attributes. Trends and/or patterns may be derived indicative of the article parameters pertaining to the article of interest.
  • the trends and/or patterns may be determined for each of the articles of interest being procured or being in demand.
  • the articles of interest may be determined by capturing the audio interaction amongst the host entity and the guest entity.
  • the trends and/or patterns may be utilized for market forecast of the articles of interest, for decision making and performing competitive and demand analysis and also for registering a real-time feedback of the first entity and the one or more second entities with respect to the articles of interest or a group of the articles of interest.
  • the real time feedback may be categorized as unbiased feedback.
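One simple way to turn the per-period demand series above into a trend usable for market forecasting is a least-squares slope over the period counts; this formula is a common baseline chosen here for illustration, as the patent does not prescribe any particular trend computation, and `trend_slope` is an invented name.

```python
def trend_slope(series):
    """Least-squares slope of a per-period demand series.

    A positive slope suggests growing demand for the article of interest;
    a negative slope suggests declining demand.
    """
    n = len(series)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

A host entity could threshold this slope to trigger restocking or promotional decisions, which is the kind of "informed market decision" the description mentions.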
  • the present invention facilitates generating a request for registration of the first entity with an entity account.
  • Entity attributes are received from the first entity and are associated with recognition of the first entity.
  • the first entity may be authenticated based on receipt and positive identification of the entity attributes.
  • the entity attributes may be selected from a group comprising a voice sample, an audio password and a unique identifier.
  • the entity attributes may be stored in a storage device.
  • the present invention provides the audio capturing unit that is operatively coupled with the audio interaction system to capture audio samples.
  • the audio capturing unit may include inbuilt microphones to capture the audio samples, wherein, in an instance, the microphone of the audio recognition unit may capture the audio samples automatically without any manual intervention.
  • the audio capturing unit of the present invention may be stationary or can be maneuvering in the predefined area or region to gather the audio samples.
  • the present invention provides the audio interaction system where the extracted first set of article attributes are determined from the received stream of audio samples by matching the received stream of audio samples with a second dataset comprising a predefined set of plurality of phrases and one or more words for a set of audio samples.
  • the extracted first set of article attributes is any or a combination of a plurality of phrases and one or more words.
  • the received stream of audio samples is discarded.
  • the present invention provides the audio interaction system where a request for registration of the first entity with an entity account is generated from the first entity.
  • entity attributes are received from the first entity.
  • the entity attributes may be associated with recognition of the first entity by the one or more processors operatively coupled with the system.
  • the first entity may be authenticated based on receipt and positive identification of the received entity attributes, where the entity attributes are selected from a group comprising a voice sample, an audio password and a unique identifier.
  • the entity attributes are stored in a storage device.
  • the present invention provides the audio interaction system where the first entity and the one or more second entities are present within a predefined area of the audio capturing unit.
  • the predefined area can be an aisle or a particular section of a building providing the articles of interest for procurement.
  • the predefined area may be an e-commerce site, or a site remotely located to the audio capturing unit.
  • the present invention may facilitate interaction of the first entity (e.g., host entity) with the one or more second entities (e.g., guest entity) over an electronic network (e.g., internet, e-commerce website), where the audio recognition happens over the electronic network.
  • the present invention may facilitate capturing the audio samples using the audio capturing unit over an audio or a video call.
  • the audio interaction system of the present invention may be operatively coupled with one or more processors.
  • the one or more processors may be configured to process the captured audio samples to determine and remove noise for efficient analysis of the audio samples.
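The noise-removal step is left unspecified, so the sketch below uses one of the simplest possible approaches, an energy gate that drops low-energy frames (silence and quiet background noise) before further analysis; the function name, frame format, and threshold semantics are all assumptions.

```python
def gate_noise(frames, threshold):
    """Drop low-energy frames before analysis.

    frames: list of frames, each a list of PCM sample values.
    threshold: minimum mean-square energy for a frame to be kept.
    """
    def energy(frame):
        # Mean-square energy of the frame's samples.
        return sum(s * s for s in frame) / len(frame)

    return [f for f in frames if energy(f) >= threshold]
```

Real systems would typically use spectral subtraction or a voice-activity detector instead, but an energy gate captures the idea of removing noise "for efficient analysis of the audio samples".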
  • FIG. 1 illustrates an exemplary network implementation 100 of the proposed audio interaction system, which facilitates capturing audio samples from multiple entities, in accordance with an aspect of the present disclosure.
  • an audio interaction system 102 (individually referred to as the system 102, hereinafter) is presented.
  • the system 102 may be implemented in any computing device and may be configured/operatively connected with a server 106.
  • the system 102 may be communicatively coupled with one or more entities 104-1, 104-2, ..., 104-N (individually referred to as the entity 104 and collectively referred to as the entities 104, hereinafter) (the entities 104 may include the first entity and the one or more second entities, hereinafter) through a voice-user interface (VUI) present on an audio capturing unit (not shown) operatively coupled with the system 102.
  • the system 102 may capture the audio samples from the one or more entities 104 present in a predefined area of the system 102.
  • an array of microphones may be present within the VUI which may facilitate to capture the audio samples with a high degree of accuracy.
  • the captured audio samples may be used to generate one or more audio signals. Further, the generated audio signals may be processed to evaluate and determine procurement parameters for the article of interest.
  • the system 102 may also receive, as part of the audio samples, background noise, silence, background speech, spoken noise from the predefined area, and so forth.
  • the network 108 can be a wireless network, a wired network or a combination thereof that can be implemented as one of the different types of networks, such as Intranet, Local Area Network (LAN), Wide Area Network (WAN), Internet, and the like. Further, the network 108 can either be a dedicated network or a shared network.
  • the shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like.
  • the entities 104 may connect to the server 106 via the network 108.
  • the server 106 may be a management server, a web server, or any other electronic device or a computing system capable of receiving and sending data.
  • the server 106 may be a laptop computer, a notebook computer, a tablet computer, a personal computer (PC), a desktop computer, a smartphone, a personal digital assistant (PDA), or any programmable device capable of communication with the entities 104 over the network 108.
  • the server 106 may represent a server computing system utilizing multiple computers as a server system, such as in a cloud computing environment.
  • the entities 104 may be the host entity and the one or more guest entities.
  • the host entity of the entities 104 may generate a request for registration with an entity account by providing entity attributes that are stored in a database.
  • the entity attributes may be associated with recognition of the first entity.
  • the host entity may be authenticated based on receipt and positive identification of the entity attributes.
  • the entity attributes may be selected from a group comprising a voice sample, an audio password and a unique identifier.
  • FIG. 2 illustrates exemplary functional components 200 of the proposed audio interaction system 102, in accordance with an aspect of the present disclosure.
  • the audio interaction system 102 may include one or more processor(s) 202.
  • the one or more processor(s) 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions.
  • the one or more processor(s) 202 may be configured to fetch and execute computer-readable instructions stored in a memory 206 of the audio interaction system 102.
  • the memory 206 may store one or more computer-readable instructions or routines, which may be fetched and executed to create or share the data units over a network service.
  • the memory 206 may comprise any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
  • the system 102 may also comprise an interface(s) 204.
  • the interface(s) 204 may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like.
  • the interface(s) 204 may facilitate communication of the audio interaction system 102 with various devices coupled to the audio interaction system 102 such as an input unit and an output unit.
  • the interface(s) 204 may also provide a communication pathway for one or more components of the audio interaction system 102. Examples of such components include, but are not limited to, processing engine(s) 208 and database 210.
  • the processing engine(s) 208 may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) 208.
  • programming for the processing engine(s) 208 may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) 208 may comprise a processing resource (for example, one or more processors), to execute such instructions.
  • the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) 208.
  • the system 102 may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the audio interaction system 102 and the processing resource.
  • the processing engine(s) 208 may be implemented by electronic circuitry.
  • the database 210 may comprise data that is either stored or generated as a result of functionalities implemented by any of the components of the processing engine(s) 208.
  • the processing engine(s) 208 may comprise an audio samples extracting unit 212, an article attributes extracting unit 214, an article parameters determination unit 216, and other supplementary unit(s) 218.
  • the units listed above are only exemplary, and any other unit or sub-unit may be included as part of the system 102. These units may also be merged or divided into super-units or sub-units as may be configured.

Audio Samples Extracting Unit 212
  • a stream of audio samples may be captured from a first entity and one or more second entities using an audio capturing unit. Audio samples that are related to one or more articles of interest are extracted from the received stream of audio samples.
  • the stream of audio samples obtained from the first entity and the one or more second entities present in a store may be captured via the microphones configured with the VUI that is operatively coupled to the audio interaction system 102.
  • the microphones may include an array of microphones that are configured to capture the audio samples from the predefined area of the audio interaction system 102 via the VUI. Based on the captured audio samples, the audio interaction system 102 may generate corresponding audio signals for further processing. A direction of the first entity 104 and the one or more second entities 104 speaking to the audio interaction system 102 may be determined using the array of microphones. Further, the array of microphones may be configured to perform various noise cancellation techniques to remove background noise and isolate entity speech from the generated audio signal.
  • the microphones may be configured to capture the audio samples while the first entity and the one or more second entities are moving.
  • the entities 104 may be walking through, facing towards the audio capturing unit at one instance and moving away from it at another instance. Further, the entities 104 may be walking through and speaking simultaneously, or in an instance, the audio capturing unit may be maneuvering for capturing the audio samples.
  • the audio interaction system 102 may detect the occurrence of the audio samples from the first entity and the one or more second entities that interact with the system 102 through a voice trigger, for example by uttering a phrase that may prompt the system 102 to begin capturing the audio samples. Further, the system 102 may capture any of the audio samples being produced in the predefined area (e.g., surrounding locations) of the system 102 without receiving any prompt or input from the first entity 104 and the one or more second entities 104. For example, an event or a pre-determined setting may trigger the audio interaction system 102 to capture the audio sample produced in the surrounding locations.
  • the quality of the captured audio sample may be affected by factors such as background noise and movement of the entities.
  • the background noise may include sounds from, for example, surrounding appliances, footsteps, music, etc.
  • the audio samples may be analyzed locally within the system 102 using a Natural language processing (NLP) technique or by sending the audio signals to the server 106.
  • the NLP technique may process and analyze natural language data to understand and derive meaning from a language spoken by the entity.
  • the audio samples may be organized and structured to perform tasks such as automatic summarization, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.
  • the present invention facilitates extracting a first set of article attributes from the received stream of audio samples.
  • the first set of article attributes may correspond to one or more articles of interest.
  • the extracted first set of article attributes may be any or a combination of a plurality of phrases and one or more words determined from the received stream of audio samples. Subsequently, the plurality of phrases and the one or more words determined from the received stream of audio samples may be matched with a second dataset comprising a predefined set of plurality of phrases and one or more words for a set of audio samples.
  • brand names for the products are determined. For example, for a product, say ABC, the brand names (such as XYZ, DEF, JLK and so forth) for the products being ordered or in demand from the entities may be determined. These brand names may appear as the article attributes that correspond to the articles of interest. Further, to avoid ambiguity in the determination of the article attributes (e.g., the brand name), the article attributes may be normalized using a list that associates the brand names with replacement brand names. For example, the list may include an entry for every variation of a company name along with an associated replacement name for the company.
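By way of illustration only (this sketch is not part of the specification itself), the normalization step above may be realized with a replacement list mapping transcribed brand-name variants to canonical names. The variant spellings and brand names below are invented placeholders:

```python
# Hypothetical sketch of brand-name normalization against a replacement
# list, as described above. All brand names and variants are placeholders.

BRAND_REPLACEMENTS = {
    "xyz": "XYZ",
    "x.y.z.": "XYZ",
    "xyz inc": "XYZ",
    "def": "DEF",
    "d e f": "DEF",
}

def normalize_brand(mention: str) -> str:
    """Map a transcribed brand mention to its canonical name, if known."""
    key = mention.strip().lower()
    return BRAND_REPLACEMENTS.get(key, mention.strip())

def extract_brand_attributes(transcript_tokens, replacements=BRAND_REPLACEMENTS):
    """Collect canonical brand names appearing in a token stream."""
    found = []
    for token in transcript_tokens:
        canonical = replacements.get(token.strip().lower())
        if canonical and canonical not in found:
            found.append(canonical)
    return found
```

An entry per variation, as the list-based approach above suggests, keeps the matcher simple at the cost of maintaining the list.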
  • the present invention facilitates determining one or more article parameters for each of the multiple articles of interest and the corresponding first set of article attributes.
  • the article parameters may pertain to any or a combination of category, demand, availability, cost and quality of each of the one or more articles of interest and the corresponding first set of article attributes. Thereafter, a correlation of the one or more article parameters for each article of interest and the corresponding first set of article attributes is determined over a predefined period of time.
  • the present invention provides that the determined article parameters may be used to generate trends/patterns based on determination of a correlation of the article parameters for each article of interest and the corresponding first set of article attributes over a predefined period of time.
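As a purely illustrative sketch (not taken from the specification), the correlation over a predefined period may be computed over time-bucketed counts; here the Pearson coefficient stands in for whatever correlation measure an implementation might choose, and the weekly mention counts are invented sample data:

```python
# Illustrative correlation of an article parameter (weekly demand mentions)
# with an article attribute (weekly brand mentions). Sample data is invented.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Weekly demand mentions for an article of interest vs. mentions of one brand.
demand_mentions = [12, 15, 20, 26, 31, 35]
brand_mentions  = [ 3,  4,  6,  7,  9, 10]
trend_strength = pearson(demand_mentions, brand_mentions)
```

A coefficient near 1 over the period would suggest the brand attribute tracks demand for the article, which is the kind of trend the patterns described above are meant to surface.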
  • the generated trends/patterns may facilitate deep insight into the entities' requirements by revealing what, when and where the guest entities are looking for articles. Based on these requirements, the store may be stocked with articles that entice the entities into easier and more convenient procurement. Further, intelligence and procurement demand of the entities may be determined based on the generated patterns to improve promotional efficiency, build persuasive promotions and facilitate the entities with desired articles of interest for future procurement.
  • the present invention provides that the trends/patterns may be created based on the correlation that exists between the article of interest and the one or more article attributes of the articles of interest.
  • the article of interest is a sports shoe
  • the guest entity may be matched with each of the host entities for procurement of the sports shoe based on availability, pricing and quality of the article of interest.
  • the sports shoe from the host entity may be selected by the guest entity based on price, the guest entity’s past purchase history, and/or any other factors.
  • the article parameters may be based on various factors, such as life styles, customs, common habits, and change in fashion, standard of living, age, and gender of the guest entities.
  • the article parameters may be based on advertisements for catching attention of the guest entities, informing them about the availability of a product, demonstrating the features of the product to potential guest entities, and persuading them to purchase the product.
  • demand for the product, and hence the article parameters, may be affected by climatic conditions. For example, demand for ice-creams and cold drinks increases in summer, while tea and coffee are preferred in winter. Some products have a stronger demand in hilly areas than in plains.
  • the patterns/trends may further be created based on sentiments of the entities preferably with respect to a particular article of interest.
  • the sentiments may enable determining the sensitivity of the content of the audio sample, and in response a threshold confidence level of the entity with regard to the article of interest is determined.
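One minimal way to sketch the sentiment and confidence-threshold step above is a lexicon-based scorer; the word lists and the threshold value are assumptions for illustration only, not the method the specification prescribes:

```python
# Minimal lexicon-based sentiment sketch; word lists and the threshold
# are illustrative assumptions.

POSITIVE = {"good", "great", "love", "excellent", "cheap", "comfortable"}
NEGATIVE = {"bad", "poor", "hate", "expensive", "uncomfortable"}

def sentiment_score(utterance: str) -> float:
    """Return a score in [-1, 1] from counts of positive/negative words."""
    words = utterance.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def exceeds_confidence(utterance: str, threshold: float = 0.5) -> bool:
    """Check whether the sentiment magnitude crosses a confidence threshold."""
    return abs(sentiment_score(utterance)) >= threshold
```

An utterance with balanced positive and negative words scores near zero and would fall below the confidence threshold, so it would not drive a pattern on its own.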
  • the patterns may be based on trending topics as determined from the audio samples. The patterns may be used to make informed decisions, provide recommendations preferably for other entities, and to predict the stock usage and demand.
  • the present invention provides that the correlation of the article parameters for each article of interest and the corresponding first set of article attributes may be determined.
  • the determined correlation may be represented as the patterns/trends and may present and categorize a percentage of the procurements by the entities related to specific articles of interest such as sports, household improvement, video games, skin and body care, cooking and baking essentials, etc.
  • a group of inter-related entities visiting the store may influence a household purchase rather than an individual entity who visits the store for procuring the articles of interest of individual interest.
  • the patterns may thus depict the number of guest entities making an individual purchase or a household purchase based on the captured audio samples.
  • the captured patterns may signify the articles of interest that need to be stocked regularly based on the audio samples that determine and evaluate whether the procurers are consistent in procurement of the articles of interest.
  • the present invention provides that the trends/patterns may be used to evaluate decision making of the entity, perform demand analysis, and competitive analysis for the item or services.
  • the trends/patterns may further be used to determine trends in dissatisfaction and negative language so as to more easily mitigate and reduce concerns of the entities.
  • the trends/patterns based on the demographics of the entity can be effectively used to provide suitable item or services in order to evolve and fall more in line with what the entity's demographics expect.
  • the present invention facilitates generating a request for registration of the first entity with an entity account by receiving, from the first entity, one or more entity attributes that are associated with recognition of the first entity.
  • the first entity is authenticated based on receipt and positive identification, of the entity attributes.
  • the entity attributes are selected from a group comprising a voice sample, an audio password and a unique identifier.
  • the first entity’s voice sample may be processed to produce a voiceprint which may be stored in a database coupled to the system 102.
  • the first entity may be asked to answer a shared secret question or a standard phrase (such as "At EGH bank, my voice is my password"). These phrases may be used to strengthen the entity attributes by providing additional authentication samples on which to base the authentication decision.
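A hedged sketch of the authentication decision above: the stored voiceprint and the live sample are represented here as pre-extracted feature vectors (an assumption; a real system would derive these from the captured audio), and cosine similarity against a threshold stands in for the matcher:

```python
# Sketch of voiceprint matching. Feature vectors and the threshold are
# illustrative assumptions, not the specification's actual matcher.
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def authenticate(stored_voiceprint, live_sample, threshold=0.95):
    """Positively identify the entity if the live sample matches closely."""
    return cosine_similarity(stored_voiceprint, live_sample) >= threshold

enrolled = [0.2, 0.8, 0.5, 0.1]          # voiceprint stored at registration
matching = [0.21, 0.79, 0.52, 0.11]      # same speaker, slight variation
impostor = [0.9, 0.1, 0.05, 0.7]         # different speaker
```

The shared-secret phrases mentioned above would simply contribute additional samples, tightening the comparison before the threshold decision.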
  • the present invention provides that the audio samples may be captured during a verbal interaction between the one or more second entities and their mobile devices, between second entities, between the second entities and the first entity, among multiple first entities, and amongst the plurality of second entities and the first entity.
  • the supplementary units 218 implement functionalities that supplement applications or functions performed by the audio interaction system 102, processor(s) 202 or the processing engine(s) 208.
  • the present invention provides in an exemplary embodiment that audio samples corresponding to the first entity and the plurality of second entities may be received in real-time. The correlation between the two audio samples is determined to understand the pattern of audio samples being exchanged during a discussion between the two entities so as to, for instance, understand the kind/type/mode of feedback pertaining to a given product that one entity is giving to another entity while the procurement is being made or contemplated.
  • audio sample sets such as AS_1 and AS_2 may be received by the proposed system from two related or unrelated entities so as to analyze feedback pertaining to the articles of interest, and how the articles of interest procurement decisions are being made between the two entities.
  • the present invention provides that analysis of the received/processed audio samples may also be sent to a server/cloud for storage of said analysis along with further decision making by the entities as to how to place/position/market the product and enable reception of product feedback and opinion from the entities in real time.
  • the audio samples received from the audio interaction system for pattern analysis may be in an encoded form, and may be decoded into, for instance, discrete time/audio signals and subsequently into the frequency domain.
  • a discrete Fourier transform (DFT), a fast Fourier transform (FFT), or another discrete mathematical transform may be incorporated to transform the discrete time signal into the frequency domain. Representing the time signal in the frequency domain may facilitate a more efficient and accurate comparison of the distinguishing vocal characteristics that are in common between an audio input sample and stored audio patterns.
  • the discrete signal may be manipulated or calculated in various ways in a processor (configured either locally at the audio interaction system /reception devices itself, or at the Cloud) to facilitate an accurate comparison of the audio input and stored audio patterns (of different entities whose audio signals have been processed and stored).
  • no audio samples of any entity may be stored at all (or only upon explicit consent), and only the analysis of the received audio patterns may be stored (without associating any identity, information or audio samples of any specific user).
  • the stored audio patterns may be represented in the time-frequency domain as a spectrogram, which contains data of the spectral density of a signal varied with time.
  • spectral density may be calculated by squaring the magnitude of the frequency-domain signal. A processor in the server or local audio reception device, which can be a digital signal processor, may perform various signal conditioning, and may filter out noise that exists outside of the fundamental frequency range of the human voice, between about 40-600 Hertz.
  • the frequency domain representation of the audio patterns is stored in a database and may further be conditioned at the processor through any or a combination of logarithmic calculations, moving average filtration, re-sampling, statistical modeling, or various other types of signal conditioning.
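The frequency-domain processing described above may be sketched as follows, with assumed parameters (sample rate, test tones) chosen purely for illustration: a discrete transform of the signal, spectral density via the squared magnitude, and suppression of energy outside the roughly 40-600 Hz fundamental range of the human voice:

```python
# Sketch of spectral-density computation and voice-band filtering.
# Sample rate and test tones are assumptions for demonstration.
import numpy as np

FS = 8000                    # sample rate in Hz (assumed)
N = 8000                     # one second of audio
t = np.arange(N) / FS

# Synthetic "voice" at 200 Hz plus out-of-band interference at 2000 Hz.
signal = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(N, d=1 / FS)
spectral_density = np.abs(spectrum) ** 2      # squared magnitude

# Zero out bins outside the assumed 40-600 Hz voice band, then reconstruct.
voice_band = (freqs >= 40) & (freqs <= 600)
filtered = np.where(voice_band, spectrum, 0)
clean = np.fft.irfft(filtered, n=N)
clean_density = np.abs(np.fft.rfft(clean)) ** 2
```

After filtering, the 2000 Hz interference is gone while the 200 Hz voice component survives, which is what enables the cleaner comparison against stored patterns described above.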
  • the present invention provides that the audio interaction system 102 of the present disclosure may use an acoustic model and a lingual model to recognize input voice, and associate the recognized voice with attributes of the entity to which it pertains.
  • the attributes may include for instance, the approximate age/range of the entity, gender, among other attributes.
  • the system may further include a parsing unit that may use a parser to comprehend, from the recognized text, the meaning of the speech, and associate with a control engine that may utilize a database (on cloud or stored locally within the system 102) to determine a demographic and psychographic profile of the entities, which can assist in making more informed and accurate decisions.
  • the proposed system may further employ a feedback control unit to receive and interpret incorrect audio samples (based on feedback received from host entity) so as to learn from feedback and make further subsequent analysis more concrete and accurate.
  • FIG. 3 illustrates an exemplary block diagram 300 of the proposed audio interaction system with various components, in accordance with an aspect of the present disclosure.
  • the audio interaction system 102 includes an array of microphones 302 and an audio pattern generator engine 304. A single microphone or an array of microphones may be used to capture the audio samples from the entities. The audio samples may be captured based on verbal communication between the entities, between the entities and their respective devices, noise from surroundings and so forth.
  • the array of microphones 302 may be enclosed under the VUI (not shown) present on the audio interaction system 102. The array of microphones 302 may enable the system 102 to capture the audio samples with a high degree of accuracy for performing analysis and generating the audio signals.
  • the array of microphones 302 may be composed of a number of individual microphones linked together using a technique known as beamforming.
  • complex trigonometric functions may be used to combine the individual microphones to create a highly directional beam, which may focus on the one or more entities.
  • the directional beam may be governed to track the moving entities in the surrounding locations so that the audio samples from the entities may be captured efficiently.
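The beamforming described above can be sketched, under simplifying assumptions, as delay-and-sum: a real array would derive each microphone's steering delay from the array geometry using trigonometric relations, whereas here the per-microphone sample delays toward the speaker are simply given:

```python
# Illustrative delay-and-sum beamforming. The delays below would normally
# come from microphone geometry; here they are assumed for demonstration.
import numpy as np

def delay_and_sum(mic_signals, sample_delays):
    """Align each microphone channel by its steering delay and average."""
    aligned = [np.roll(sig, -d) for sig, d in zip(mic_signals, sample_delays)]
    return np.mean(aligned, axis=0)

# Simulate a tone arriving at three microphones with known sample delays.
n = 1024
source = np.sin(2 * np.pi * np.arange(n) * 0.01)
delays = [0, 5, 10]                      # arrival delays in samples (assumed)
mics = [np.roll(source, d) for d in delays]

beam = delay_and_sum(mics, delays)       # steered toward the source
unsteered = np.mean(mics, axis=0)        # simple average, no steering
```

Steering toward the source recovers it almost exactly, while the unsteered average smears the channels, which is why the directional beam can track a moving speaker as described.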
  • the audio pattern generator engine 304 may facilitate generating patterns from the received audio signals.
  • the patterns may be based on correlation of article parameters for each of the article of interest and corresponding first set of article attributes.
  • the entities 104 may be categorized based on their age group, procuring habits, procuring frequency and so forth.
  • the patterns may be generated based on the articles of interest (e.g., items or services) being procured by the entities 104 of a certain age group, gender, occupations, income bracket, etc. Additionally, the generated patterns may be determined by capturing certain words from the audio samples related to demand, cost, quality, availability, discounts, and promotions for the articles of interest.
  • the patterns may depict the entities of a certain age group and of a particular gender that visit the store during weekends, weekdays, sale timings, etc.
  • the generated patterns may be related to the articles of interest (e.g., items or services) that are in demand and present at the store, the articles of interest in demand present in short supply at the store and the articles of interest in demand and not present at the store and those that need to be refilled at the store.
  • the generated patterns may depict the entities 104 that buy premium articles of interest or procure the articles of interest based on their interaction with other entities, such as based on a product recommendation from other host entity or guest entities.
  • the patterns/trends generated based on the determined correlation may indicate that the entities may choose to procure a new article of interest over the articles of interest procured by the entity in the past, where the procurement of the new article of interest is based on a promotion, a scheme or a sale being offered on the new article of interest.
  • the pattern may depict whether the procurer is rigid or flexible in procuring the new articles of interest, based on the captured audio samples. For example, when the procurer is open to receiving inputs from other entities present in the surrounding locations, as determined from the audio samples, the entity may be considered flexible; otherwise, rigid in procuring the new articles of interest.
  • the patterns may be generated related to the amount of time the entity spends in the store, and how much time in each section of the store.
  • the trends and/or patterns may include the one or more words or the phrases that are frequently used by the first entity and the one or more second entities over a period of time.
  • the trends and/or patterns may include top N queries issued by the one or more second entities during a time period.
  • the trends and/or patterns may facilitate determining best-rated articles, newly pinned articles, best seller articles, recently reviewed articles, just sold articles, and articles mentioned in the audio samples.
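The top-N trend computation mentioned above reduces, in the simplest reading, to a frequency count over queries captured during a time period; the query log below is invented for illustration:

```python
# Sketch of the top-N queries trend using a plain frequency count.
# The query log is invented sample data.
from collections import Counter

def top_n_queries(query_log, n):
    """Return the n most frequently issued queries, most frequent first."""
    return [q for q, _ in Counter(query_log).most_common(n)]

query_log = [
    "sports shoe price", "baby shampoo", "sports shoe price",
    "ice cream offers", "sports shoe price", "baby shampoo",
]
```

The same counting approach could rank frequently used words or phrases over a period, the other trend listed above.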
  • the present invention provides, in an exemplary embodiment, that when the audio samples are analyzed, the trends and/or topics for the associated audio signals are determined.
  • the trends may include popularity information associated with the articles of interest present in the audio signals.
  • the popularity information may indicate how popular the articles of interest have been among the host entity and the guest entities.
  • the popularity information may include, for example, a number of times the articles of interest have been viewed by the guest entities, the number of times the guest entities have commented regarding the articles of interest, the number of times the guest entities have indicated that they like the articles of interest, and the number of times the guest entities have indicated that they dislike the articles of interest.
  • the present invention provides, in an exemplary embodiment, that the audio samples are analyzed so that the host entity may determine in-store availability and demand related to the quantity of the articles of interest.
  • three disparate articles of interest can be classified; however, there may be a much greater demand for one of the three, leading to a determination of how much quantity of the said articles of interest should be present in the store during peak sale hours or days.
  • the present invention provides, in an exemplary embodiment, that the audio samples are analyzed so that the host entity may determine, from the one or more guest entities, the quality of the articles of interest offered for sale, such as, for example, during a feature, seasonal, or discount display.
  • the first entity may walk to the second entities to visually inspect and interact about parameters related to the articles of interest such as quality, pricing, costing and the like. The first entity may then ascertain whether the article of interest that is in demand is present in the store or out of stock.
  • the audio samples may facilitate determining new procurers and generating a pattern based on the articles of interest the new procurers look for. For example, when an entity procures many baby products from the store on their weekly visits, such as baby powder, baby shampoo, etc. of a particular brand, and then buys formula milk at the same store from a pharmacy, the pattern may be generated based on what kind of articles of interest a family with a new addition is procuring.
  • the audio interaction system 102 may differentiate or classify the audio samples that include a speech from the audio samples that do not include speech. While a specific example of the audio interaction system 102 is illustrated, one of skill in the art in possession of the present disclosure will recognize that a wide variety of audio capturing devices having various configurations of components may operate without departing from the scope of the present disclosure.
  • the audio interaction system 102 may be connected to the network 108.
  • the entities 104 may connect to the server 106 via the network 108.
  • FIG. 4 illustrates an exemplary method 400 for analysis of a stream of audio samples in accordance with embodiments of the present disclosure.
  • a stream of audio samples are captured using an audio capturing unit.
  • the stream of audio samples is captured from a first entity and a second entity.
  • the first entity may be a host entity (e.g., a vendor, a retailer, a supplier, a shopkeeper, a dealer, a trader, a merchant and the like) which offers, exhibits and demonstrates articles of interest for sale, exhibition and/or demonstration.
  • the second entity may be a guest entity (e.g., a procurer, an agent, a purchaser, a shopper, a customer, a consumer, an audience and the like).
  • the present invention provides that in an embodiment, at step 404, audio samples that relate to one or more articles of interest are extracted from the received stream of audio samples.
  • a first set of article attributes are extracted from the received stream of audio samples.
  • the first set of article attributes may correspond to the one or more articles of interest.
  • the first set of article attributes may be any or a combination of a plurality of phrases and one or more words that are determined from the audio samples. The phrases and words are determined based on a matching of the received stream of audio samples with a second data set comprising a predefined set of plurality of phrases and words for a set of audio samples.
  • a set of words and phrases from the host entity may be captured and used to determine what are the articles of interest that are most in demand - based on frequency at which the articles of interest are being provided by the host entity, pricing preference by the guest entity, quality determination of the articles of interest, feedback generated for the articles of interest based on the article attributes by the guest entity to the host entity and the like.
  • the audio samples may be gathered from the guest entity based on their interaction amongst the guest entity and/or their interaction with the host entity. Frequent and positive interactions for the article of interest may signify a positive product demand, favorable pricing and so forth.
  • the system may determine and extract features such as demand, availability, cost, pricing, competitive demand, preference and the like for the article of interest.
  • the present invention provides that in an embodiment, at step 408, one or more article parameters are determined for each article of interest out of the one or more articles of interest, along with the corresponding first set of article attributes for each article of interest.
  • the article of interest may be various products, food items, or services.
  • the article parameters may be any or a combination of category, demand, availability, cost, quantity and quality for each article of interest.
  • the one or more article parameters may be determined for each article of interest and the corresponding first set of article attributes.
  • a correlation of the article parameters is determined for each article of interest and the corresponding first set of article attributes over a predefined period of time.
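Steps 404 through 408 above can be sketched end to end under strong simplifying assumptions: the stream of audio samples is taken as already transcribed to text, and the article names, predefined attribute phrases, and parameter keywords below are illustrative placeholders rather than the specification's actual datasets:

```python
# Simplified end-to-end sketch of the extraction steps. All articles,
# phrases, and keyword-to-parameter mappings are invented placeholders.
from collections import Counter

ARTICLES = {"sports shoe", "baby shampoo"}                 # articles of interest
ATTRIBUTE_PHRASES = {"cheap", "good quality", "in stock"}  # predefined set
PARAMETER_WORDS = {"cost": "cost", "price": "cost",
                   "demand": "demand", "available": "availability"}

def extract(transcripts):
    """Keep article-related utterances; tally attributes and parameters."""
    hits, attributes, parameters = [], Counter(), Counter()
    for text in transcripts:
        lowered = text.lower()
        if any(article in lowered for article in ARTICLES):
            hits.append(text)
            for phrase in ATTRIBUTE_PHRASES:
                if phrase in lowered:
                    attributes[phrase] += 1
            for word, parameter in PARAMETER_WORDS.items():
                if word in lowered:
                    parameters[parameter] += 1
    return hits, attributes, parameters

transcripts = [
    "is the sports shoe cheap and in stock",
    "what is the price of baby shampoo",
    "lovely weather today",
]
hits, attributes, parameters = extract(transcripts)
```

Utterances unrelated to any article of interest are discarded, matching the extraction step; the tallies then feed the correlation over the predefined period.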
  • a request is received from the first entity for registration with an entity account.
  • the request may comprise entity attributes associated with recognition of the first entity and may include any of a voice sample, an audio password, and a unique identifier associated with the first entity.
  • the entity attributes may be stored in a storage device. Further, the first entity is authenticated based on positive identification of the received entity attributes.
  • the present invention provides that in an embodiment, the first entity and the second entity may be present within a predefined area of the audio capturing unit.
  • the audio samples may be captured from the first entity and the second entity via a telephone or via an e-commerce application.
  • FIG. 5 illustrates an exemplary computer system 500 to implement the proposed audio recognition device, in accordance with aspects of the present disclosure.
  • computer system can include an external storage device 510, a bus 520, a main memory 530, a read only memory 540, a mass storage device 550, communication port 560, and a processor 570.
  • examples of processor 570 include, but are not limited to, an Intel® Itanium® or Itanium 2 processor(s), or AMD® Opteron® or Athlon MP® processor(s), Motorola® lines of processors, FortiSOC™ system-on-a-chip processors or other future processors.
  • Processor 570 may include various modules associated with aspects of the present invention.
  • Communication port 560 can be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, or the like.
  • Communication port 560 may be chosen depending on a network, such a Local Area Network (LAN), Wide Area Network (WAN), or any network to which computer system connects.
  • Memory 530 can be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art.
  • Read only memory 540 can be any static storage device(s) e.g., but not limited to, a Programmable Read Only Memory (PROM) chips for storing static information e.g., start-up or BIOS instructions for processor 570.
  • Mass storage 550 may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), e.g., those available from Seagate (e.g., the Seagate Barracuda 7102 family) or Hitachi (e.g., the Hitachi Deskstar 7K1000), one or more optical discs, Redundant Array of Independent Disks (RAID) storage, e.g. an array of disks (e.g., SATA arrays), available from various vendors including Dot Hill Systems Corp., LaCie, Nexsan Technologies, Inc. and Enhance Technology, Inc.
  • Bus 520 communicatively couples processor(s) 570 with the other memory, storage and communication blocks.
  • Bus 520 can be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), USB or the like, for connecting expansion cards, drives and other subsystems, as well as other buses, such as a front side bus (FSB), which connects processor 570 to the software system.
  • Optionally, operator and administrative interfaces, e.g. a display, keyboard, and a cursor control device, may also be coupled to bus 520 to support direct operator interaction with the computer system.
  • Other operator and administrative interfaces can be provided through network connections connected through communication port 560.
  • External storage device 510 can be any kind of external hard drive, floppy drive, IOMEGA® Zip Drive, Compact Disc - Read Only Memory (CD-ROM), Compact Disc - Re-Writable (CD-RW) or Digital Video Disk - Read Only Memory (DVD-ROM).
  • Aspects and embodiments of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of software and hardware, which may all generally be referred to herein as a “circuit,” “module,” “component,” or “system.”
  • aspects of the present disclosure may take the form of a computer program product comprising one or more computer readable media having computer readable program code embodied thereon.
  • The term “coupled to” is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms “coupled to” and “coupled with” are used synonymously. Within the context of this document, the terms “coupled to” and “coupled with” are also used euphemistically to mean “communicatively coupled with” over a network, where two or more devices are able to exchange data with each other over the network, possibly via one or more intermediary devices.

Abstract

Systems and methods are described for providing an audio interaction system. A stream of audio samples relating to articles of interest is received from an audio capturing unit. A set of article attributes relating to the articles of interest is extracted from the received stream of audio samples. One or more article parameters are determined for each of the articles of interest and the corresponding first set of article attributes. Further, a correlation of the one or more article parameters for each article of interest over a predefined period of time is determined to form a pattern, and the pattern regarding the one or more articles of interest is analysed at the processor and used to provide recommendations for business management.

Description

SYSTEM AND METHOD TO CAPTURE AND ANALYZE AUDIO SAMPLES
Field of the invention
The present disclosure relates to the field of capturing audio samples. More specifically, the present disclosure relates to a system and method for processing and analyzing audio samples.
Background of the invention
A conventional product purchase approach facilitates an entity (such as a guest entity, for example a buyer) to procure an article of interest (the article of interest is herein referred to as an item or a service). Based on the procured article of interest, attributes relating to procurement preferences and the time and frequency of procurement/purchase by the guest entity, for instance, can be determined by a host entity (such as a retailer) for suggesting and promoting the article of interest in the future. While in a conventional buying approach the guest entity’s buying preference is determined, the intent and decision-making steps for procuring the article of interest with specific attributes are not clearly established.
Available audio-controlled computing devices can capture spoken words and other audio inputs through a microphone and perform audio recognition to identify audio commands from received audio samples. The audio-controlled computing devices may then use the audio commands to perform various tasks, such as purchasing the article of interest over electronic networks with the aid of an on-line service provider. However, general entity interactions with the audio-controlled computing devices are limited to receiving certain audio commands related to a particular task, and are not associated with determination of the desire, preference, trend or decision making of the guest entity to procure the article of interest. This deprives both entities (guest and host) of useful information related to the article of interest, which is crucial in performing competitive and demand analysis. In view of the foregoing, there is a need in the art for determining the decision making of the guest entity before procurement of the article of interest, by capturing interactions of the entity with other entities, along with forecasting the requirement of the articles of interest by determining various quantitative and qualitative attributes of the articles of interest.
Summary of the invention
A first aspect of the present invention provides a method to analyze an audio interaction, said method comprising: receiving a stream of audio samples, captured using an audio capturing unit, from a first entity and a second entity; extracting, at a processor operatively coupled with the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples; extracting, at the processor, a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes corresponds to the one or more articles of interest; determining, at the processor, for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the articles of interest, one or more article parameters; determining, at the processor, a correlation of the one or more article parameters for each of the articles of interest and the corresponding first set of article attributes over a predefined period of time to form a pattern; and analysing, at the processor, said pattern regarding the one or more articles of interest, using the pattern to provide recommendations for business management.
A second aspect of the present invention provides a system to analyse an audio interaction, said system comprising: an audio capturing unit adapted to receive a stream of audio samples from a first entity and a second entity; a processing unit operatively coupled to the audio capturing unit, the processing unit comprising a processor communicatively coupled to a memory, the memory storing a set of instructions executable by the processor, wherein, when the system is in operation, the processor is configured to execute the set of instructions to enable the processing unit to: extract from the audio capturing unit audio samples pertaining to one or more articles of interest from the received stream of audio samples; extract a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes pertains to the one or more articles of interest; determine, for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the articles of interest, one or more article parameters; determine a correlation of the one or more article parameters for each article of interest of the one or more articles of interest over a predefined period of time to form a pattern; and analyse, at the processor, said pattern regarding the one or more articles of interest, using the pattern to provide recommendations for business management. These and other aspects, features and advantages will become apparent to those of ordinary skill in the art from a reading of the following detailed description and the appended claims. For the avoidance of doubt, any feature of one aspect of the present invention may be utilized in any other aspect of the invention. The word “comprising” is intended to mean “including” but not necessarily “consisting of” or “composed of.” In other words, the listed steps or options need not be exhaustive.
It is noted that the examples and drawings given in the description below are intended to clarify the invention and are not intended to limit the invention to those examples and drawings per se. Similarly, all percentages are weight/weight percentages unless otherwise indicated. Except in the operating and comparative examples, or where otherwise explicitly indicated, all numbers in this description indicating amounts of material or conditions of reaction, physical properties of materials and/or use are to be understood as modified by the word “about”. Numerical ranges expressed in the format "from x to y" are understood to include x and y. When for a specific feature multiple preferred ranges are described in the format "from x to y", it is understood that all ranges combining the different endpoints are also contemplated.
Brief description of the drawings
The accompanying drawings are included to provide a further understanding of the present disclosure and are incorporated in and constitute a part of this specification.
FIG. 1 illustrates an exemplary network implementation of the proposed audio interaction system, which facilitates capturing audio samples from multiple entities, in accordance with an aspect of the present disclosure.
FIG. 2 illustrates exemplary functional components of the proposed audio interaction system, in accordance with an aspect of the present disclosure.
FIG. 3 illustrates an exemplary block diagram of the proposed audio interaction system with various components, in accordance with an aspect of the present disclosure.
FIG. 4 illustrates an exemplary method for analysis of a stream of audio samples in accordance with embodiments of the present disclosure.
FIG. 5 illustrates an exemplary computer system to implement the proposed audio interaction system, in accordance with aspects of the present disclosure.
Detailed description of the invention
In the following description, numerous specific details are set forth to provide a thorough understanding of aspects of the present invention. It will be apparent to one skilled in the art that aspects of the present invention may be practiced without some of these specific details.
Aspects of the present invention include various steps, which will be described below. The steps may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, steps may be performed by a combination of hardware, software, firmware and/or by human operators.
The present invention may be provided as a computer program product, which may include a machine-readable storage medium tangibly embodying thereon instructions, which may be used to program a computer (or other electronic devices) to perform a process. The machine-readable medium may include, but is not limited to, fixed (hard) drives, magnetic tape, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), and magneto-optical disks, semiconductor memories, such as ROMs, PROMs, random access memories (RAMs), programmable read-only memories (PROMs), erasable PROMs (EPROMs), electrically erasable PROMs (EEPROMs), flash memory, magnetic or optical cards, or other type of media/machine-readable medium suitable for storing electronic instructions (e.g., computer programming code, such as software or firmware).
Various methods described herein may be practiced by combining one or more machine-readable storage media containing the code according to the present invention with appropriate standard computer hardware to execute the code contained therein. An apparatus for practicing various aspects of the present invention may involve one or more computers (or one or more processors within a single computer) and storage systems containing or having network access to computer program(s) coded in accordance with various methods described herein, and the method steps of the invention could be accomplished by modules, routines, subroutines, or subparts of a computer program product. If the specification states a component or feature “may”, “can”, “could”, or “might” be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic. As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary aspects are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the aspects set forth herein. These aspects are provided so that this invention will be thorough and complete and will fully convey the scope of the invention to those of ordinary skill in the art. Moreover, all statements herein reciting aspects of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future (i.e., any elements developed that perform the same function, regardless of structure).
While the present invention has been illustrated and described, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the invention.
The present disclosure relates to the field of capturing audio samples. More specifically, the present disclosure relates to a system and method for processing and analyzing audio samples.
Method
The present invention provides a method to analyze an audio interaction, said method comprising: receiving a stream of audio samples, captured using an audio capturing unit, from a first entity and a second entity; extracting, at a processor operatively coupled with the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples; extracting, at the processor, a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes corresponds to the one or more articles of interest; determining, at the processor, for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the articles of interest, one or more article parameters; determining, at the processor, a correlation of the one or more article parameters for each of the articles of interest and the corresponding first set of article attributes over a predefined period of time to form a pattern; and analysing, at the processor, said pattern regarding the one or more articles of interest, using the pattern to provide recommendations for business management.
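By way of non-limiting illustration only, the extraction steps recited above may be sketched in Python; every name, phrase and value below (e.g., `AudioSample`, `ARTICLE_PHRASES`, `extract_article_attributes`) is a hypothetical assumption for illustration and not part of the claimed method, which assumes the audio stream has already been transcribed to text:

```python
from dataclasses import dataclass

@dataclass
class AudioSample:
    text: str         # transcribed speech from the audio capturing unit
    timestamp: float  # capture time in seconds

# Hypothetical "second dataset": phrases/words mapped to articles of interest
ARTICLE_PHRASES = {"shampoo": "shampoo", "hair wash": "shampoo", "soap": "soap"}

def extract_article_attributes(samples):
    """Extract (article, matched phrase, time) tuples from transcribed samples."""
    attributes = []
    for sample in samples:
        for phrase, article in ARTICLE_PHRASES.items():
            if phrase in sample.text.lower():
                attributes.append((article, phrase, sample.timestamp))
    return attributes

samples = [AudioSample("Do you have shampoo in stock?", 0.0),
           AudioSample("The soap is too expensive", 5.0)]
attrs = extract_article_attributes(samples)
```

In this sketch, `attrs` holds one attribute tuple per matched phrase, which downstream steps could then aggregate into article parameters.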
Preferably, business management includes, but is not limited to, retail management, logistics management, supply chain management, inventory management, market analysis and forecasting, devising marketing strategy, devising advertising strategy, management of business administration, management of business operations, finance management, marketing management, business stakeholder management, competitor management and business decision making.
It is preferable that business management includes market forecasts, strategizing future sales, decisions relating to strategic positioning of the article of interest in the market, making informed market decisions, providing recommendations for other entities, predicting stock usage and demand, and performing competitive, demand and predictive analysis.
It is preferred that the extracted first set of article attributes is any or a combination of a plurality of phrases and one or more words determined, at the processor, from the received stream of audio samples by matching the received stream of audio samples with a second dataset comprising a predefined set of a plurality of phrases and one or more words for a set of audio samples. It is preferred that, upon an unsuccessful matching of the received stream of audio samples with the second dataset, the received stream of audio samples is discarded.
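The matching against the second dataset, with discarding upon an unsuccessful match, may be illustrated by the following non-limiting sketch; the dataset contents and the function `match_or_discard` are hypothetical assumptions, and a transcribed text stream is assumed:

```python
# Hypothetical "second dataset" of predefined words for a set of audio samples
SECOND_DATASET = {"buy", "price", "discount", "toothpaste", "detergent"}

def match_or_discard(transcript):
    """Return the matched words from a transcribed stream, or None to signal
    that the stream should be discarded (unsuccessful match)."""
    words = {w.strip(".,?!") for w in transcript.lower().split()}
    matched = words & SECOND_DATASET
    return sorted(matched) if matched else None
```

For example, `match_or_discard("What is the price of toothpaste?")` yields the matched words, while unrelated speech such as `match_or_discard("nice weather today")` returns `None` and the stream would be discarded.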
It is preferred, the article parameters pertain to any or a combination of category, demand, availability, cost and quality of each of the one or more articles of interest and the corresponding first set of article attributes.
It is preferred that the audio capturing unit initiates capturing of the stream of audio samples in real time without any intervention from the first entity and the second entity. It is preferred that the method comprises: generating, at the processor, a request for registration of the first entity with an entity account; and receiving, at the processor and from the first entity, entity attributes, wherein the entity attributes are associated with recognition of the first entity by the processor. It is preferred that the first entity is authenticated based on receipt and positive identification, at the processor, of the entity attributes.
It is preferred that the entity attributes are selected from a group comprising a voice sample, an audio password and a unique identifier, and wherein said entity attributes are stored in a storage device operatively coupled with the processor.
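The registration and authentication of the first entity from stored entity attributes may be illustrated by the following non-limiting sketch; the in-memory store, the hashing of the audio password, and both function names are assumptions for illustration only (a deployed system would use a proper storage device and voiceprint matching):

```python
import hashlib

# In-memory stand-in for the storage device holding entity attributes
ENTITY_STORE = {}

def register_entity(unique_id, audio_password, voice_sample):
    """Store hashed entity attributes for later recognition of the first entity."""
    ENTITY_STORE[unique_id] = {
        "password_hash": hashlib.sha256(audio_password.encode()).hexdigest(),
        "voice_sample": voice_sample,  # in practice, e.g., a voiceprint embedding
    }

def authenticate_entity(unique_id, audio_password):
    """Positively identify the first entity against its stored attributes."""
    record = ENTITY_STORE.get(unique_id)
    if record is None:
        return False
    return record["password_hash"] == hashlib.sha256(audio_password.encode()).hexdigest()
```

Authentication succeeds only on receipt and positive identification of the previously registered attributes, mirroring the preference stated above.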
It is preferred that the first entity and the second entity are present within a predefined area of the audio capturing unit. It is preferred that the received stream of audio samples pertains to an interaction between the first entity, the second entity and amongst the second entity.
It is preferred that the first entity is a host entity and the second entity is a guest entity. The present invention provides that, in an embodiment, a stream of audio samples is captured using an audio capturing unit. The stream of audio samples is captured from a first entity and a second entity. The first entity may be a host entity (e.g., a vendor, a retailer, a supplier, a shopkeeper, a dealer, a trader, a merchant and the like) which offers, exhibits and demonstrates articles of interest for sale, exhibition and/or demonstration. The second entity may be a guest entity (e.g., a procurer, an agent, a purchaser, a shopper, a customer, a consumer, an audience and the like).
The present invention provides that, in an embodiment, audio samples that relate to one or more articles of interest are extracted from the received stream of audio samples. A first set of article attributes is extracted from the received stream of audio samples. Further, the first set of article attributes may correspond to the one or more articles of interest. The first set of article attributes may be any or a combination of a plurality of phrases and one or more words that are determined from the audio samples. The phrases and words are determined based on a matching of the received stream of audio samples with a second dataset comprising a predefined set of a plurality of phrases and words for a set of audio samples.
For example, a set of words and phrases from the host entity may be captured and used to determine which articles of interest are most in demand, based on the frequency at which the articles of interest are being provided by the host entity, the pricing preference of the guest entity, quality determination of the articles of interest, feedback generated for the articles of interest based on the article attributes by the guest entity to the host entity, and the like. Similarly, the audio samples may be gathered from the guest entity based on their interaction amongst guest entities and/or their interaction with the host entity. Frequent and positive interactions regarding the article of interest may signify a positive product demand, favorable pricing and so forth. As an illustration, when an article of interest is discussed in comparison to another article of interest by the guest entity, the system may determine and extract features such as demand, availability, cost, pricing, competitive demand, preference and the like for the article of interest. The present invention provides that, in an embodiment, one or more article parameters are determined for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the articles of interest.
For example, the article of interest may be various products, food items, or services. The article parameters may be any or a combination of category, demand, availability, cost, quantity and quality for each of the article of interest. The one or more article parameters may be determined for each of the article of interest and the corresponding first set of article attributes. Further, a correlation of the article parameters is determined for each of the article of interest and corresponding first set of article attributes over a predefined period of time.
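The correlation of article parameters over a predefined period of time may be illustrated by the following non-limiting sketch, which simply counts parameter mentions per article within fixed time windows; the function name, the tuple layout and all data are hypothetical assumptions:

```python
from collections import Counter, defaultdict

def correlate_parameters(mentions, period):
    """Group (timestamp, article, parameter) mentions into windows of `period`
    seconds and count parameter occurrences per article, a simple stand-in
    for correlating article parameters over a predefined period of time."""
    pattern = defaultdict(Counter)
    for timestamp, article, parameter in mentions:
        window = int(timestamp // period)  # index of the time window
        pattern[(article, window)][parameter] += 1
    return pattern

mentions = [(10, "soap", "demand"), (20, "soap", "cost"),
            (95, "soap", "demand"), (100, "shampoo", "demand")]
pattern = correlate_parameters(mentions, period=60)
```

Each `(article, window)` entry of `pattern` then summarizes how often each parameter was discussed for that article during that window.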
The present invention provides that in an embodiment, a request is received from the first entity for registration with an entity account. The request may comprise entity attributes associated with recognition of the first entity and may include a group comprising a voice sample, an audio password, and a unique identifier associated with the first entity. The entity attributes may be stored in a storage device. Further, the first entity is authenticated based on positive identification of the received entity attributes.
The present invention provides that in an embodiment, the first entity and the second entity may be present within a predefined area of the audio capturing unit. In an example, the audio samples may be captured from the first entity and the second entity via a telephone or via an e-commerce application.
System
The present invention provides a system to analyse an audio interaction, said system comprising: an audio capturing unit adapted to receive a stream of audio samples from a first entity and a second entity; a processing unit operatively coupled to the audio capturing unit, the processing unit comprising a processor communicatively coupled to a memory, the memory storing a set of instructions executable by the processor, wherein, when the system is in operation, the processor is configured to execute the set of instructions to enable the processing unit to: extract from the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples; extract a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes pertain to the one or more articles of interest; determine for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest, one or more article parameters; and determine a correlation of the one or more article parameters for each of the article of interest of the one or more articles of interest over a predefined period of time.
It is preferred that the audio capturing unit comprises an array of microphones for capturing the stream of audio samples. An article of interest (also referred to herein interchangeably as articles of interest) for the purpose of the present invention may be an item or service that can be represented and offered for sale. For example, the article of interest may be a physical good (e.g., clothing, personal care products, hardware, electronics, etc.), a digital item (e.g., audio, video, image, etc.) that is exhibited virtually, or a service (e.g., landscaping, banking, house painting, cleaning, etc.) that is offered for sale singly or with a plurality of the other articles of interest. An article of interest may correspond to a single article or a group of articles of interest.
The entities for the purpose of the present invention may be a first entity (e.g., a host entity) and one or more second entities (e.g., guest entities). The first entity may be, for example, a vendor, a retailer, a supplier, a shopkeeper, a dealer, a trader, a merchant and the like, which offers, exhibits and demonstrates the articles of interest for sale, exhibition and/or demonstration. The first entity may host the sale, exhibition and demonstration of the articles of interest either physically (e.g., in a wholesale store) or on an e-commerce site. The one or more second entities may be, for example, a procurer, an agent, a viewer or an audience. It is highly preferable that when the articles of interest are made available in a commercial set-up, the one or more second entities are prospective buyers, and when the articles of interest are made available in an exhibition or demonstration type of set-up, the one or more second entities are viewers or audience.
The present invention provides an audio interaction system that may facilitate receiving a stream of audio samples from a first entity and a second entity. The received audio samples may be captured using an audio capturing unit. The received stream of audio samples may pertain to the audio interaction occurring between the first entity and the second entity, and amongst the second entities. The first entity and the one or more second entities may be present within a predefined area of the audio capturing unit. Audio samples pertaining to the one or more articles of interest may be extracted from the received stream. Further, a first set of article attributes may be extracted from the received stream of audio samples. The first set of article attributes may pertain to any or a combination of a plurality of phrases and one or more words determined from the received stream of audio samples, and may correspond to the one or more articles of interest. The first set of article attributes may be determined by matching the received stream of audio samples with a second dataset, where the second dataset may comprise a predefined set of a plurality of phrases and one or more words for a set of audio samples. Multiple article parameters are determined for each article of interest. Furthermore, a correlation of the one or more article parameters for each article of interest and the corresponding first set of article attributes over a predefined period of time may be determined to form a pattern, and said pattern regarding the one or more articles of interest may be analysed at the processor and used to provide recommendations for business management.
The article parameters may be any or a combination of category, demand, availability, cost and quality of each of the one or more articles of interest and the corresponding first set of article attributes, and trends and/or patterns may be derived that are indicative of the article parameters pertaining to the article of interest. The trends and/or patterns may be determined for each of the articles of interest being procured or being in demand. The articles of interest may be determined by capturing the audio interaction amongst the host entity and the guest entity. The trends and/or patterns may be utilized for market forecasts of the articles of interest, for decision making and performing competitive and demand analysis, and also for registering a real-time feedback of the first entity and the one or more second entities with respect to the articles of interest or a group of the articles of interest. The real-time feedback may be categorized as unbiased feedback.
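The use of such a pattern to provide a recommendation, here a restocking candidate list derived from demand-mention counts, may be illustrated by the following non-limiting sketch; the function name, input shape and threshold are illustrative assumptions only:

```python
def recommend_restock(demand_counts, threshold=3):
    """Flag articles whose demand-mention count over the period meets a
    threshold, as a minimal stand-in for pattern-based recommendations."""
    return [article for article, count in sorted(demand_counts.items())
            if count >= threshold]

# Hypothetical demand-mention counts aggregated over one predefined period
counts = {"soap": 5, "shampoo": 2, "detergent": 3}
candidates = recommend_restock(counts)
```

A practical system would of course combine several article parameters (cost, availability, quality and so forth) rather than a single count, but the thresholding idea carries over.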
The present invention facilitates generating a request for registration of the first entity with an entity account. Entity attributes are received from the first entity and are associated with recognition of the first entity. The first entity may be authenticated based on receipt and positive identification of the entity attributes. Further, the entity attributes may be selected from a group comprising a voice sample, an audio password and a unique identifier. The entity attributes may be stored in a storage device.
The present invention provides the audio capturing unit that is operatively coupled with the audio interaction system to capture audio samples. The audio capturing unit may include inbuilt microphones to capture the audio samples, wherein, in an instance, the microphone of the audio capturing unit may capture the audio samples automatically without any manual intervention. The audio capturing unit of the present invention may be stationary or may maneuver within the predefined area or region to gather the audio samples.
The present invention provides the audio interaction system where the extracted first set of article attributes is determined from the received stream of audio samples by matching the received stream of audio samples with a second dataset comprising a predefined set of a plurality of phrases and one or more words for a set of audio samples. The extracted first set of article attributes is any or a combination of a plurality of phrases and one or more words. Upon determination of an unsuccessful match of the received stream of audio samples with the second dataset, the received stream of audio samples is discarded.
The present invention provides the audio interaction system where a request for registration of the first entity with an entity account is received from the first entity. Along with the request for registration, entity attributes are received from the first entity. The entity attributes may be associated with recognition of the first entity by the one or more processors operatively coupled with the system. The first entity may be authenticated based on receipt and positive identification of the received entity attributes, where the entity attributes are selected from a group comprising a voice sample, an audio password and a unique identifier. The entity attributes are stored in a storage device.
The present invention provides the audio interaction system where the first entity and the one or more second entities are present within a predefined area of the audio capturing unit. The predefined area can be an aisle or a particular section of a building providing the articles of interest for procurement. The predefined area may be an e-commerce site, or a site remotely located from the audio capturing unit. In an exemplary embodiment, the present invention may facilitate interaction of the first entity (e.g., host entity) with the one or more second entities (e.g., guest entity) over an electronic network (e.g., internet, e-commerce website), where the audio recognition happens over an electronic network means. The present invention may facilitate capturing the audio samples using the audio capturing unit over an audio or a video call.
The audio interaction system of the present invention may be operatively coupled with one or more processors. The one or more processors may be configured to process the captured audio samples to determine and remove noise for efficient analysis of the audio samples.
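The noise-removal step described above may be illustrated, in a highly simplified form, by an amplitude-threshold noise gate; this is an assumption for illustration only, as a practical system would apply more sophisticated signal processing (e.g., spectral subtraction) before analysis:

```python
def noise_gate(samples, threshold=0.1):
    """Zero out low-amplitude samples, a crude stand-in for the noise removal
    the processors may apply before analysing the captured audio."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

# Hypothetical normalized audio samples in [-1.0, 1.0]
cleaned = noise_gate([0.05, 0.5, -0.02, -0.7])
```

Samples below the threshold (e.g., background hiss) are suppressed, while louder speech-level samples pass through unchanged.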
FIG. 1 illustrates an exemplary network implementation 100 of the proposed audio interaction system, which facilitates capturing audio samples from multiple entities, in accordance with an aspect of the present disclosure. In the context of the network architecture, an audio interaction system 102 (individually referred to as the system 102, hereinafter) is presented. The system 102 may be implemented in any computing device and may be configured/operatively connected with a server 106. The system 102 may be communicatively coupled with one or more entities 104-1, 104-2, ..., 104-N (individually referred to as the entity 104 and collectively referred to as the entities 104, hereinafter) (the entities 104 may include the first entity and the one or more second entities, hereinafter) through a voice-user interface (VUI) present on an audio capturing unit (not shown) operatively coupled with the system 102. Through the VUI, the system 102 may capture the audio samples from the one or more entities 104 present in a predefined area of the system 102. As can be appreciated by one skilled in the art, an array of microphones may be present within the VUI, which may facilitate capturing the audio samples with a high degree of accuracy. The captured audio samples may be used to generate one or more audio signals. Further, the generated audio samples may be processed to evaluate and determine procurement parameters for the article of interest. As will be appreciated, the system 102 may also receive, as the audio samples, background noise, silence, background speech, spoken noise from the predefined area and so forth.
Those skilled in the art would appreciate that the network 108 can be a wireless network, a wired network or a combination thereof that can be implemented as one of the different types of networks, such as an Intranet, a Local Area Network (LAN), a Wide Area Network (WAN), the Internet, and the like. Further, the network 108 can either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like.
The entities 104 may connect to the server 106 via the network 108. The server 106 may be a management server, a web server, or any other electronic device or a computing system capable of receiving and sending data. In some aspects, the server 106 may be a laptop computer, a notebook computer, a tablet computer, a personal computer (PC), a desktop computer, a smartphone, a personal digital assistant (PDA), or any programmable device capable of communication with the entities 104 over the network 108. In other aspects, the server 106 may represent a server computing system utilizing multiple computers as a server system, such as in a cloud computing environment. Further, the entities 104 may be the host entity and the one or more guest entities.
The host entity of the entities 104 may generate a request for registration with an entity account by providing entity attributes that are stored in a database. The entity attributes may be associated with recognition of the first entity. The host entity may be authenticated based on receipt and positive identification of the entity attributes. The entity attributes may be selected from a group comprising a voice sample, an audio password and a unique identifier.
FIG. 2 illustrates exemplary functional components 200 of the proposed audio interaction system 102, in accordance with an aspect of the present disclosure.
The audio interaction system 102, of the present invention may include one or more processor(s) 202. The one or more processor(s) 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions. Among other capabilities, the one or more processor(s) 202 may be configured to fetch and execute computer-readable instructions stored in a memory 206 of the audio interaction system 102. The memory 206 may store one or more computer-readable instructions or routines, which may be fetched and executed to create or share the data units over a network service. The memory 206 may comprise any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
The system 102 may also comprise an interface(s) 204. The interface(s) 204 may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 204 may facilitate communication of the audio interaction system 102 with various devices coupled to the audio interaction system 102 such as an input unit and an output unit. The interface(s) 204 may also provide a communication pathway for one or more components of the audio interaction system 102. Examples of such components include, but are not limited to, processing engine(s) 208 and database 210. The processing engine(s) 208 may be implemented as a combination of hardware and programming (for example, programmable instructions) to implement one or more functionalities of the processing engine(s) 208. In examples described herein, such combinations of hardware and programming may be implemented in several different ways. For example, the programming for the processing engine(s) 208 may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the processing engine(s) 208 may comprise a processing resource (for example, one or more processors), to execute such instructions. In the present examples, the machine-readable storage medium may store instructions that, when executed by the processing resource, implement the processing engine(s) 208. In such examples, the system 102 may comprise the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separate but accessible to the audio interaction system 102 and the processing resource. In other examples, the processing engine(s) 208 may be implemented by electronic circuitry.
The database 210 may comprise data that is either stored or generated as a result of functionalities implemented by any of the components of the processing engine(s) 208.
The processing engine(s) 208 may comprise an audio samples extracting unit 212, an article attributes extracting unit 214, an article parameters determination unit 216, and other supplementary unit(s) 218.
It would be appreciated that the units described are only exemplary units and any other unit or sub-unit may be included as part of the system 102. These units may be merged or divided into super-units or sub-units as may be configured.

Audio Samples Extracting Unit 212
A stream of audio samples may be captured from a first entity and one or more second entities using an audio capturing unit. Audio samples that are related to one or more articles of interest are extracted from the received stream of audio samples.
The stream of audio samples obtained from the first entity and the one or more second entities present in a store (e.g., a shopping complex, a shopping mall, a market, an outlet and the like) may be captured via the microphones configured with the VUI that is operatively coupled to the audio interaction system 102. The microphones may include an array of microphones that are configured to capture the audio samples from the predefined area of the audio interaction system 102 via the VUI. Based on the captured audio samples, the audio interaction system 102 may generate corresponding audio signals to be processed thereof. A direction of the first entity 104 and the one or more second entities 104 speaking to the audio interaction system 102 may be determined using the array of microphones. Further, the array of microphones may be configured to perform various noise cancellation techniques to remove background noise and isolate entity speech from the generated audio signal.
The microphones may be configured to capture the audio samples while the first entity and the one or more second entities are moving. As an example, the entities 104 may be walking through, facing towards the audio capturing unit at one instance and move away from the audio capturing unit at another instance. Further, the entities 104 may be walking through and speaking simultaneously, or in an instance, the audio capturing unit may be maneuvering for capturing the audio samples.
The audio interaction system 102 may detect the occurrence of the audio samples from the first entity and the one or more second entities that interact with the system 102 through a voice trigger, for example by uttering a phrase that may prompt the system 102 to begin capturing the audio samples. Further, the system 102 may capture any of the audio samples being produced in the predefined area (e.g., surrounding locations) of the system 102 without receiving any prompt or input from the first entity 104 and the one or more second entities 104. For example, an event or a pre-determined setting may trigger the audio interaction system 102 to capture the audio sample produced in the surrounding locations.
The quality of the captured audio sample may be affected by factors such as background noise and movement of the entities. The background noise may include sounds from, for example, surrounding appliances, footsteps, music, etc.
The audio samples may be analyzed locally within the system 102 using a natural language processing (NLP) technique or by sending the audio signals to the server 106. The NLP technique may process and analyze natural language data to understand and derive meaning from a language spoken by the entity. Through NLP, the audio samples may be organized and structured to perform tasks such as automatic summarization, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.
Article Attributes Extracting Unit 214
The present invention facilitates extracting a first set of article attributes from the received stream of audio samples. The first set of article attributes may correspond to one or more articles of interest. The extracted first set of article attributes may be any or a combination of a plurality of phrases and one or more words determined from the received stream of audio samples. Subsequently, the plurality of phrases and the one or more words determined from the received stream of audio samples may be matched with a second dataset comprising a predefined set of plurality of phrases and one or more words for a set of audio samples.
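By way of a non-limiting illustration, the matching of transcribed words and phrases against a predefined second dataset may be sketched as follows. This is a minimal sketch only: the phrase set `ARTICLE_PHRASES` and the function name are hypothetical, and a practical system would operate on the NLP pipeline described above rather than plain substring search.

```python
# Hypothetical second dataset: predefined phrases that mark an utterance
# as relating to an article of interest.
ARTICLE_PHRASES = {"do you have", "how much is", "out of stock", "on sale"}

def extract_article_attributes(transcript: str) -> list:
    """Return every predefined phrase found in a transcribed utterance,
    standing in for the first set of article attributes."""
    text = transcript.lower()
    return sorted(p for p in ARTICLE_PHRASES if p in text)
```

For example, `extract_article_attributes("Do you have this soap on sale?")` would yield `["do you have", "on sale"]`.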
For example, based on an article of interest, such as a product being in demand, brand names for the product are determined. For a product, say ABC, the brand names (say XYZ, DEF, JLK and so forth) for the products being ordered or being in demand from the entities may be determined. These brand names may appear as the article attributes that correspond to the articles of interest. Further, to avoid ambiguity in determination of the article attributes (e.g., the brand name), the article attributes may be normalized using a list that associates the brand names with replacement brand names. For example, the list may include an entry for every variation of the company name along with an associated replacement name for the company.
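As a non-limiting sketch, such a normalization list may be realized as a simple lookup table. The variant spellings and canonical names below are purely illustrative assumptions, not part of the disclosure.

```python
# Hypothetical normalization list: each known variation of a brand name
# maps to its replacement (canonical) brand name.
BRAND_REPLACEMENTS = {
    "xyz": "XYZ",
    "x.y.z.": "XYZ",
    "xyz corp": "XYZ",
    "def": "DEF",
}

def normalize_brand(mention: str) -> str:
    """Look up a raw brand mention in the replacement list; fall back to
    the trimmed mention itself when no entry exists."""
    key = mention.strip().lower()
    return BRAND_REPLACEMENTS.get(key, mention.strip())
```

So both "XYZ Corp" and "x.y.z." resolve to the single replacement name "XYZ", while an unknown mention passes through unchanged.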
Article Parameters Determination Unit 216

The present invention facilitates determining one or more article parameters for each of the multiple articles of interest and the corresponding first set of article attributes. The article parameters may pertain to any or a combination of category, demand, availability, cost and quality of each of the one or more articles of interest and the corresponding first set of article attributes. Thereafter, a correlation of the one or more article parameters for each of the articles of interest and the corresponding first set of article attributes is determined over a predefined period of time.
The present invention provides that the determined article parameters may be used to generate trends/patterns based on determination of a correlation of the article parameters for each of the articles of interest and corresponding first set of article attributes over a predefined period of time. The generated trends/patterns may facilitate deep insight into the entities' requirements by revealing what, when and where the guest entities are looking for articles of interest. Based on these requirements the store may be stocked with articles that entice the entities into easier and more convenient procuring. Further, intelligence and procurement demand of the entities may be determined based on the generated patterns to improve promotional efficiency, build persuasive promotions and facilitate the entities with desired articles of interest for future procurement.
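By way of a non-limiting sketch, the correlation over a predefined period of time may be computed as a plain Pearson correlation between two periodic series. The weekly values below are illustrative assumptions only; the disclosure does not prescribe a particular correlation measure.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    std_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (std_x * std_y)

# Illustrative weekly series over a predefined period of time:
# mentions of an article of interest in captured audio vs. units procured.
weekly_mentions = [4, 7, 2, 9, 5]
weekly_procured = [8, 14, 4, 18, 10]
demand_correlation = pearson(weekly_mentions, weekly_procured)
```

A coefficient near 1 would indicate that spoken mentions track procurement closely, supporting the trend/pattern generation described above.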
The present invention provides that the trends/patterns may be created based on the correlation that exists between the article of interest and the one or more article attributes of the articles of interest. For example, if the article of interest is a sports shoe that is available from multiple different host entities, then in such an implementation the guest entity may be associated for procuring of the sports shoe with each of the host entities based on availability, pricing and quality of the article of interest. The sports shoe from the host entity may be selected by the guest entity based on price, the guest entity's past purchase history, and/or any other factors.
As an example, the article parameters may be based on various factors, such as lifestyles, customs, common habits, changes in fashion, standard of living, age, and gender of the guest entities. As can be appreciated by one skilled in the art, the article parameters may be based on advertisements for catching the attention of the guest entities, informing them about the availability of a product, demonstrating the features of the product to potential guest entities, and persuading them to purchase the product. In another example, demand for the product, and hence the article parameters, may be affected by climatic conditions. For example, demand for ice-creams and cold drinks increases in summer, while tea and coffee are preferred in winter. Some products have a stronger demand in hilly areas than in plains. The present invention provides that the patterns/trends may further be created based on sentiments of the entities, preferably with respect to a particular article of interest. The sentiments may enable determining the sensitivity of the content of the audio sample, and hence in response a threshold confidence level of the entity with regard to the article of interest is determined. Further, the patterns may be based on trending topics as determined from the audio samples. The patterns may be used to make informed decisions, provide recommendations preferably for other entities, and to predict stock usage and demand.
The present invention provides that the correlation of the article parameters for each of the articles of interest and the corresponding first set of article attributes may be determined. The determined correlation may be represented as the patterns/trends and may present and categorize a percentage of the procurements by the entities related to specific articles of interest such as sports, household improvement, video games, skin and body care, cooking and baking essentials, etc. Also, based on the captured audio samples it may be determined that a group of inter-related entities visiting the store may influence a household purchase, rather than an individual entity who visits the store for procuring articles of individual interest. The patterns may thus depict the number of guest entities making an individual purchase or a household purchase based on the captured audio samples. Additionally, in an aspect, the captured patterns may signify the articles of interest that need to be stocked regularly, based on the audio samples that determine and evaluate whether the procurers are consistent in procurement of the articles of interest.
The present invention provides that the trends/patterns may be used to evaluate decision making of the entity, perform demand analysis, and competitive analysis for the items or services. The trends/patterns may further be used to determine trends in dissatisfaction and negative language so as to more easily mitigate and reduce concerns of the entities. Further, the trends/patterns based on the demographics of the entity can be effectively used to provide suitable items or services in order to evolve and fall more in line with what the entity's demographics expect.
The present invention facilitates generating a request for registration of the first entity with an entity account by receiving, from the first entity, one or more entity attributes that are associated with recognition of the first entity. The first entity is authenticated based on receipt and positive identification of the entity attributes. The entity attributes are selected from a group comprising a voice sample, an audio password and a unique identifier. For example, the first entity's voice sample may be processed to produce a voiceprint which may be stored in a database coupled to the system 102. The first entity may be asked to answer a shared secret question or a standard phrase (such as "At EGH bank, my voice is my password"). These phrases may be used to strengthen the entity attributes by providing additional authentication samples on which to base the authentication decision.
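A minimal, non-limiting sketch of this registration and authentication flow follows. The class name, the similarity threshold, and the representation of a voiceprint as a short list of floats are all assumptions for illustration; real voiceprint comparison would rely on the spectral techniques described later in this disclosure.

```python
import hashlib

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length voiceprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

class EntityRegistry:
    """Stores entity attributes and authenticates a first entity."""

    def __init__(self):
        self._accounts = {}

    def register(self, unique_id, audio_password, voiceprint):
        # Store only a hash of the audio password, plus the voiceprint.
        self._accounts[unique_id] = (
            hashlib.sha256(audio_password.encode()).hexdigest(),
            voiceprint,
        )

    def authenticate(self, unique_id, audio_password, voiceprint,
                     threshold=0.95):
        # Positive identification requires both the audio password and a
        # sufficiently similar voiceprint.
        account = self._accounts.get(unique_id)
        if account is None:
            return False
        pw_hash, stored_print = account
        password_ok = pw_hash == hashlib.sha256(
            audio_password.encode()).hexdigest()
        return password_ok and cosine_similarity(
            stored_print, voiceprint) >= threshold
```

In this sketch a matching standard phrase together with a close voiceprint authenticates the entity; either attribute failing rejects the request.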
The present invention provides that the audio samples may be captured during a verbal interaction between one of the second entities and his/her mobile device, between one second entity and another second entity, between the second entities and the first entity, between multiple first entities, and amongst the plurality of second entities and the first entity.
The supplementary units 218 implement functionalities that supplement applications or functions performed by the audio interaction system 102, the processor(s) 202 or the processing engine(s) 208. The present invention provides in an exemplary embodiment that audio samples corresponding to the first entity and the plurality of second entities may be received in real-time. The correlation between the two audio samples is determined to understand the pattern of audio samples being exchanged during a discussion between the two entities so as to, for instance, understand the kind/type/mode of feedback pertaining to a given product that one entity is giving to another entity while the procurement is being made or contemplated. Similarly, audio sample sets such as AS_1 and AS_2 may be received by the proposed system from two related or unrelated entities so as to analyze feedback pertaining to the articles of interest, and how procurement decisions for the articles of interest are being made between the two entities.
The present invention provides that analysis of the received/processed audio samples may also be sent to a server/cloud for storage of said analysis along with further decision making by the entities as to how to place/position/market the product and enable reception of product feedback and opinion from the entities in real time.
The present invention provides that the audio samples received from the audio interaction system for pattern analysis may be in an encoded form, and may be decoded into, for instance, discrete time/audio signals and subsequently into the frequency domain. For example, a discrete Fourier transform (DFT), fast Fourier transform (FFT), or other discrete mathematical transform may be used to transform the discrete time signal into the frequency domain. Representing the time signal in the frequency domain may facilitate a more efficient and accurate comparison of the distinguishing vocal characteristics that are in common between an audio input sample and stored audio patterns. Once the discrete signal is transformed to the frequency domain, it may be manipulated or calculated in various ways in a processor (configured either locally at the audio interaction system/reception devices itself, or at the cloud) to facilitate an accurate comparison of the audio input and stored audio patterns (of different entities whose audio signals have been processed and stored). In an aspect, no audio samples of any entity may be stored at all (or only upon explicit consent), and only the analysis of the received audio patterns may be stored (without associating any identity, information or audio samples of any specific user). In some aspects the stored audio patterns may be represented in the time-frequency domain as a spectrogram, which contains data of the spectral density of a signal varied with time. In another aspect, spectral density may be calculated by squaring the magnitude of the frequency domain signal, wherein a processor in the server/local audio reception device can be a digital signal processor where various signal conditioning may take place, and the processor may filter out noise that exists outside of the fundamental frequency range of the human voice, between about 40-600 Hertz.
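The transform-and-filter step described above may be sketched, purely for illustration, with a naive discrete Fourier transform. This is a toy implementation for short signals only; a practical system would use an FFT routine, and the 1 kHz sample rate and 100 Hz tone below are assumptions, not part of the disclosure.

```python
import cmath
import math

def dft_power(signal, sample_rate):
    """Naive DFT of a real signal: returns (frequency_hz, spectral_density)
    pairs, where spectral density is the squared magnitude of each bin."""
    n = len(signal)
    out = []
    for k in range(n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append((k * sample_rate / n, abs(coeff) ** 2))
    return out

def voice_band(spectrum, low_hz=40.0, high_hz=600.0):
    """Keep only bins inside the fundamental range of the human voice."""
    return [(f, p) for f, p in spectrum if low_hz <= f <= high_hz]

# Demo: a pure 100 Hz tone sampled at 1 kHz peaks inside the voice band.
tone = [math.sin(2 * math.pi * 100 * t / 1000.0) for t in range(200)]
band = voice_band(dft_power(tone, sample_rate=1000.0))
peak_hz = max(band, key=lambda bin_: bin_[1])[0]
```

Squaring the bin magnitudes corresponds to the spectral-density calculation mentioned above, and the band filter discards energy outside roughly 40-600 Hz.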
The frequency domain representation of the audio patterns is stored in a database and may further be conditioned at the processor through any or a combination of logarithmic calculations, moving average filtration, re-sampling, statistical modeling, or various other types of signal conditioning.
The present invention provides that the audio interaction system 102 of the present disclosure may use an acoustic model and a lingual model to recognize input voice, and associate the recognized voice with attributes of the entity to which it pertains. The attributes may include for instance, the approximate age/range of the entity, gender, among other attributes. The system may further include a parsing unit that may use a parser to comprehend, from the recognized text, the meaning of the speech, and associate with a control engine that may utilize a database (on cloud or stored locally within the system 102) to determine a demographic and psychographic profile of the entities, which can assist in making more informed and accurate decisions.
The proposed system may further employ a feedback control unit to receive and interpret incorrect audio samples (based on feedback received from host entity) so as to learn from feedback and make further subsequent analysis more concrete and accurate.
FIG. 3 illustrates an exemplary block diagram 300 of the proposed audio interaction system with various components, in accordance with an aspect of the present disclosure.
The audio interaction system 102 includes an array of microphones 302 and an audio pattern generator engine 304. A single microphone or an array of microphones may be used to capture the audio samples from the entities. The audio samples may be captured based on verbal communication between the entities, between the entities and their respective devices, noise from surroundings and so forth. The array of microphones 302 may be enclosed under the VUI (not shown) present on the audio interaction system 102. The array of microphones 302 may enable the system 102 to capture the audio samples with a high degree of accuracy for performing analysis and generating the audio signals. The array of microphones 302 may be composed of a number of individual microphones linked together using a technique known as beamforming. In beamforming, complex trigonometric functions may be used to combine the individual microphones to create a highly directional beam, which may focus on the one or more entities. The directional beam may be governed to track the moving entities in the surrounding locations so that the audio samples from the entities may be captured efficiently.
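Delay-and-sum beamforming, the simplest form of the technique described above, may be sketched as follows. This is a non-limiting illustration using integer-sample delays only; a real array would derive fractional delays from the microphone geometry and steering angle.

```python
def delay_and_sum(channels, delays):
    """Average microphone channels after shifting each one by its
    per-channel delay (in samples), reinforcing sound arriving from the
    steered direction and attenuating sound from elsewhere."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(length)
    ]

# Demo: the same waveform reaches the second microphone one sample late;
# steering delays [0, 1] realign the two copies exactly.
wave = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
mic_a = wave
mic_b = [0.0] + wave  # one-sample propagation delay
aligned = delay_and_sum([mic_a, mic_b], delays=[0, 1])
```

Sound from the steered direction adds coherently in the average, while sound from other directions is misaligned and partially cancels.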
The audio pattern generator engine 304 may facilitate generating patterns from the received audio signals. For example, the patterns may be based on correlation of article parameters for each of the articles of interest and the corresponding first set of article attributes. The entities 104 may be categorized based on their age group, procuring habits, procuring frequency and so forth. The patterns may be generated based on the articles of interest (e.g., items or services) being procured by the entities 104 of a certain age group, gender, occupation, income bracket, etc. Additionally, the generated patterns may be determined by capturing certain words from the audio samples related to, for example, demand, cost, quality, availability, discounts, and promotions for the articles of interest. Furthermore, the patterns may depict the entities of a certain age group and of a particular gender that visit the store during weekends, weekdays, during sale timings, etc.
The present invention provides that in an exemplary embodiment, the generated patterns may be related to the articles of interest (e.g., items or services) that are in demand and present at the store, the articles of interest in demand but in short supply at the store, and the articles of interest in demand and not present at the store and those that need to be refilled at the store. The generated patterns may depict the entities 104 that buy premium articles of interest or procure the articles of interest based on their interaction with other entities, such as based on a product recommendation from another host entity or guest entities. The present invention provides that in an exemplary embodiment, the patterns/trends generated based on the determined correlation may indicate that the entities may choose to procure a new article of interest over the articles of interest procured by the entity in the past, where the procurement of the new article of interest is based on a promotion or a scheme or a sale being offered on the new article of interest. Additionally, the pattern may depict whether the procurer is rigid or flexible in procuring the new articles of interest, based on the captured audio samples. For example, when the procurer is open to receiving inputs from other entities present in the surrounding locations, as determined from the audio samples, the entity may be considered flexible, otherwise rigid, in procuring the new articles of interest. Further, based on the audio signals, patterns may be generated related to the amount of time the entity spends in the store, and how much time in which section of the store.
The present invention provides that in an exemplary embodiment the trends and/or patterns may include the one or more words or the phrases that are frequently used by the first entity and the one or more second entities over a period of time. For example, in one implementation, the trends and/or patterns may include top N queries issued by the one or more second entities during a time period. For example, in another implementation the trends and/or patterns may facilitate determining best-rated articles, newly pinned articles, best seller articles, recently reviewed articles, just sold articles, and articles mentioned in the audio samples.
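The top-N computation mentioned above may be sketched, as a non-limiting illustration, with a plain frequency count. The query strings in the example are assumptions for demonstration only.

```python
from collections import Counter

def top_n_queries(queries, n=3):
    """Return the n most frequently issued queries over a time period as
    (query, count) pairs, most frequent first."""
    counts = Counter(q.strip().lower() for q in queries)
    return counts.most_common(n)
```

For instance, given six queries where "soap" appears three times and "sports shoe" twice, `top_n_queries(..., 2)` returns those two phrases with their counts.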
The present invention provides that in an exemplary embodiment, when the audio samples are analyzed, the trends and/or topics for the associated audio signals are determined. The trends may include popularity information associated with the articles of interest present in the audio signals. The popularity information may indicate how popular the articles of interest have been among the host entity and the guest entities. The popularity information may include, for example, the number of times the articles of interest have been viewed by the guest entities, the number of times the guest entities have commented regarding the articles of interest, the number of times the guest entities have indicated that they like the articles of interest, and the number of times the guest entities have indicated that they dislike the articles of interest.
The present invention provides that in an exemplary embodiment, the audio samples are analyzed by the host entity to determine in-store availability and demand related to the quantity of the articles of interest. In a simplistic example, three disparate articles of interest can be classified; however, there may be a much greater demand for one of the three, leading to a determination of how much quantity of the said article of interest should be present in the store during peak sale hours or days.
The present invention provides that in an exemplary embodiment, the audio samples are analyzed by the host entity to determine, from the one or more guest entities, the quality of the articles of interest offered for sale, such as, for example, during a feature, seasonal, or discount display. The first entity may walk to the second entities to visually inspect and interact about parameters related to the articles of interest such as quality, pricing, costing and the like. The first entity may then make sure that the article of interest that is in demand is present in the store or out of stock.
The present invention provides that in an exemplary embodiment, the audio samples may facilitate determining new procurers and generating a pattern based on the articles of interest the new procurers look for. For example, when an entity procures a lot of baby products from the store on their weekly visits, such as baby powder, baby shampoo, etc. of a particular brand, and then buys formula milk at the same store from a pharmacy, the pattern may be generated based on what kind of articles of interest a family with a new addition is procuring.
Further, the audio interaction system 102 may differentiate or classify the audio samples that include a speech from the audio samples that do not include speech. While a specific example of the audio interaction system 102 is illustrated, one of skill in the art in possession of the present disclosure will recognize that a wide variety of audio capturing devices having various configurations of components may operate without departing from the scope of the present disclosure. The audio interaction system 102 may be connected to the network 108. The entities 104 may connect to the server 106 via the network 108.
FIG. 4 illustrates an exemplary method 400 for analysis of a stream of audio samples in accordance with embodiments of the present disclosure.
The present invention provides that in an embodiment, a stream of audio samples is captured using an audio capturing unit. At step 402, the stream of audio samples is captured from a first entity and a second entity. The first entity may be a host entity (e.g., a vendor, a retailer, a supplier, a shopkeeper, a dealer, a trader, a merchant and the like) which offers, exhibits and demonstrates articles of interest for sale, exhibition and/or demonstration. The second entity may be a guest entity (e.g., a procurer, an agent, a purchaser, a shopper, a customer, a consumer, an audience and the like). The present invention provides that in an embodiment, at step 404, audio samples that relate to one or more articles of interest are extracted from the received stream of audio samples. At step 406, a first set of article attributes is extracted from the received stream of audio samples. Further, the first set of article attributes may correspond to the one or more articles of interest. The first set of article attributes may be any or a combination of a plurality of phrases and one or more words that are determined from the audio samples. The phrases and words are determined based on a matching of the received stream of audio samples with a second dataset comprising a predefined set of plurality of phrases and words for a set of audio samples. For example, a set of words and phrases from the host entity may be captured and used to determine which articles of interest are most in demand, based on the frequency at which the articles of interest are being provided by the host entity, pricing preference by the guest entity, quality determination of the articles of interest, feedback generated for the articles of interest based on the article attributes by the guest entity to the host entity, and the like.
Similarly, the audio samples may be gathered from the guest entities based on their interaction amongst themselves and/or their interaction with the host entity. Frequent and positive interactions for the article of interest may signify a positive product demand, favorable pricing and so forth. As an illustration, when an article of interest is discussed in comparison to other articles of interest by the guest entity, the system may determine and extract features such as demand, availability, cost, pricing, competitive demand, preference and the like for the article of interest.
The present invention provides that in an embodiment, at step 408, one or more article parameters are determined for each of an article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest.
For example, the article of interest may be various products, food items, or services. The article parameters may be any or a combination of category, demand, availability, cost, quantity and quality for each article of interest. The one or more article parameters may be determined for each article of interest and the corresponding first set of article attributes. At step 410, a correlation of the article parameters is determined for each article of interest and the corresponding first set of article attributes over a predefined period of time.
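One way the correlation over a predefined period of time could be sketched: accumulate attribute mentions per article within a time window and treat the counts as a crude demand pattern. The `Mention` record, the `demand_pattern` function and the count-based metric are assumptions for illustration; the disclosure does not specify a particular correlation method.

```python
# Hedged sketch: correlate extracted article attributes over a time window.
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass
class Mention:
    article: str      # article of interest, e.g. "mango"
    attribute: str    # extracted attribute, e.g. "pricing"
    day: date         # when the audio sample was captured

def demand_pattern(mentions: list[Mention], start: date, end: date) -> dict[str, int]:
    """Count attribute mentions per article within [start, end] as a demand signal."""
    counts: dict[str, int] = defaultdict(int)
    for m in mentions:
        if start <= m.day <= end:
            counts[m.article] += 1
    return dict(counts)

log = [
    Mention("mango", "pricing", date(2021, 1, 10)),
    Mention("mango", "quality", date(2021, 1, 12)),
    Mention("rice", "availability", date(2021, 1, 11)),
    Mention("mango", "pricing", date(2020, 12, 1)),  # outside the window
]
pattern = demand_pattern(log, date(2021, 1, 1), date(2021, 1, 14))
```

A richer implementation might weight mentions by sentiment or co-occurrence of attributes, and the resulting pattern would feed the business-management recommendations of step 410.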
The present invention provides that in an embodiment, a request is received from the first entity for registration with an entity account. The request may comprise entity attributes associated with recognition of the first entity, selected from a group comprising a voice sample, an audio password, and a unique identifier associated with the first entity. The entity attributes may be stored in a storage device. Further, the first entity is authenticated based on positive identification of the received entity attributes.
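The registration and authentication flow above can be sketched as follows. The `EntityRegistry` class and the exact-match check are illustrative assumptions; the disclosure only states that attributes such as a voice sample, an audio password and a unique identifier may be stored and later positively identified (in practice, voice samples would be compared with a speaker-verification model, not byte equality).

```python
# Hedged sketch: registering a host entity and authenticating it
# against stored entity attributes.

class EntityRegistry:
    def __init__(self) -> None:
        self._accounts: dict[str, dict] = {}

    def register(self, unique_id: str, voice_sample: bytes, audio_password: str) -> None:
        # Store the entity attributes (in practice, in a secure storage device).
        self._accounts[unique_id] = {
            "voice_sample": voice_sample,
            "audio_password": audio_password,
        }

    def authenticate(self, unique_id: str, voice_sample: bytes, audio_password: str) -> bool:
        # Positive identification: all received attributes must match the stored ones.
        acct = self._accounts.get(unique_id)
        return (
            acct is not None
            and acct["voice_sample"] == voice_sample
            and acct["audio_password"] == audio_password
        )

registry = EntityRegistry()
registry.register("shop-42", b"\x01\x02", "open sesame")
ok = registry.authenticate("shop-42", b"\x01\x02", "open sesame")
bad = registry.authenticate("shop-42", b"\x01\x02", "wrong password")
```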
The present invention provides that in an embodiment, the first entity and the second entity may be present within a predefined area of the audio capturing unit. In an example, the audio samples may be captured from the first entity and the second entity via a telephone or via an e-commerce application.
FIG. 5 illustrates an exemplary computer system 500 to implement the proposed audio recognition device, in accordance with aspects of the present disclosure. As shown in FIG. 5, the computer system can include an external storage device 510, a bus 520, a main memory 530, a read only memory 540, a mass storage device 550, a communication port 560, and a processor 570. A person skilled in the art will appreciate that the computer system may include more than one processor and communication port. Examples of processor 570 include, but are not limited to, an Intel® Itanium® or Itanium 2 processor(s), or AMD® Opteron® or Athlon MP® processor(s), Motorola® lines of processors, FortiSOC™ system-on-a-chip processors or other future processors. Processor 570 may include various modules associated with aspects of the present invention. Communication port 560 can be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fiber, a serial port, a parallel port, or other existing or future ports. Communication port 560 may be chosen depending on a network, such as a Local Area Network (LAN), a Wide Area Network (WAN), or any network to which the computer system connects.
Memory 530 can be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. Read only memory 540 can be any static storage device(s), e.g., but not limited to, Programmable Read Only Memory (PROM) chips for storing static information, e.g., start-up or BIOS instructions for processor 570. Mass storage 550 may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), e.g. those available from Seagate (e.g., the Seagate Barracuda 7200 family) or Hitachi (e.g., the Hitachi Deskstar 7K1000), one or more optical discs, or Redundant Array of Independent Disks (RAID) storage, e.g. an array of disks (e.g., SATA arrays), available from various vendors including Dot Hill Systems Corp., LaCie, Nexsan Technologies, Inc. and Enhance Technology, Inc.
Bus 520 communicatively couples processor(s) 570 with the other memory, storage and communication blocks. Bus 520 can be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), USB or the like, for connecting expansion cards, drives and other subsystems, as well as other buses, such as a front side bus (FSB), which connects processor 570 to the software system.
Optionally, operator and administrative interfaces, e.g. a display, keyboard, and a cursor control device, may also be coupled to bus 520 to support direct operator interaction with the computer system. Other operator and administrative interfaces can be provided through network connections connected through communication port 560. External storage device 510 can be any kind of external hard drive, floppy drive, IOMEGA® Zip Drive, Compact Disc - Read Only Memory (CD-ROM), Compact Disc - Re-Writable (CD-RW), or Digital Video Disk - Read Only Memory (DVD-ROM). The components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system limit the scope of the present disclosure. Aspects and embodiments of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in an implementation combining software and hardware, all of which may generally be referred to herein as a “circuit,” “module,” “component,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product comprising one or more computer readable media having computer readable program code embodied thereon.
Thus, it will be appreciated by those of ordinary skill in the art that the diagrams, schematics, illustrations, and the like represent conceptual views or processes illustrating systems and methods embodying this invention. The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing associated software. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the entity implementing this invention. Those of ordinary skill in the art further understand that the exemplary hardware, software, processes, methods, and/or operating systems described herein are for illustrative purposes and, thus, are not intended to be limited to any particular named manufacturer.
As used herein, and unless the context dictates otherwise, the term "coupled to" is intended to include both direct coupling (in which two elements that are coupled to each other contact each other) and indirect coupling (in which at least one additional element is located between the two elements). Therefore, the terms "coupled to" and "coupled with" are used synonymously. Within the context of this document, the terms "coupled to" and "coupled with" are also used euphemistically to mean “communicatively coupled with” over a network, where two or more devices are able to exchange data with each other over the network, possibly via one or more intermediary devices.
It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group comprising A, B, C, ..., and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
While the foregoing describes various aspects and embodiments of the present invention, other and further aspects of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described aspects, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.

Claims

1. A method to analyse an audio interaction, said method comprising:
receiving a stream of audio samples captured using an audio capturing unit, from a first entity and a second entity;
extracting, at a processor operatively coupled with the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples;
extracting, at the processor, a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes correspond to the one or more articles of interest;
determining, at the processor, for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest, one or more article parameters;
determining, at the processor, a correlation of the one or more article parameters for each of the article of interest and the corresponding first set of article attributes over a predefined period of time to form a pattern; and
analysing, at the processor, said pattern regarding the one or more articles of interest, using the pattern to provide recommendations for business management.
2. The method according to claim 1, wherein the extracted first set of article attributes is any or a combination of a plurality of phrases and one or more words determined, at the processor, from the received stream of audio samples by matching the received stream of audio samples with a second dataset comprising a predefined set of plurality of phrases and one or more words for a set of audio samples.
3. The method according to claim 2, wherein, upon an unsuccessful matching of the received stream of audio samples with the second dataset, the received stream of audio samples is discarded.
4. The method according to claim 1, wherein the article parameters pertain to any or a combination of category, demand, availability, cost and quality of each of the one or more articles of interest and the corresponding first set of article attributes.
5. The method according to claim 1, wherein the method comprises: generating, at the processor, a request for registration of the first entity with an entity account; and receiving, at the processor and from the first entity, entity attributes, wherein the entity attributes are associated with recognition of the first entity by the processor.
6. The method according to claim 5, wherein the first entity is authenticated based on receipt and positive identification, at the processor, of the entity attributes.
7. The method according to claim 5, wherein the entity attributes are selected from a group comprising a voice sample, an audio password and a unique identifier, and wherein said entity attributes are stored in a storage device operatively coupled with the processor.
8. The method according to claim 1, wherein the first entity and the second entity are present within a predefined area of the audio capturing unit.
9. The method according to claim 1, wherein the received stream of audio samples pertains to an interaction between the first entity and the second entity, and amongst second entities.
10. The method according to claim 1, wherein the first entity is a host entity and the second entity is a guest entity.
11. A system to analyse an audio interaction, said system comprising:
an audio capturing unit adapted to receive a stream of audio samples from a first entity and a second entity;
a processing unit operatively coupled to the audio capturing unit, the processing unit comprising a processor communicatively coupled to a memory, the memory storing a set of instructions executable by the processor, wherein, when the system is in operation, the processor is configured to execute the set of instructions to enable the processing unit to:
extract from the audio capturing unit, audio samples pertaining to one or more articles of interest from the received stream of audio samples;
extract a first set of article attributes from the received stream of audio samples, wherein the first set of article attributes pertain to the one or more articles of interest;
determine for each article of interest out of the one or more articles of interest and the corresponding first set of article attributes for each of the article of interest, one or more article parameters;
determine a correlation of the one or more article parameters for each of the article of interest of the one or more articles of interest over a predefined period of time to form a pattern; and
analyse, at the processor, said pattern regarding the one or more articles of interest, using the pattern to provide recommendations for business management.
12. The system according to claim 11, wherein the audio capturing unit comprises an array of microphones for capturing the stream of audio samples.
13. The system according to claim 11, wherein the first entity and the second entity are present within a predefined area of the audio capturing unit.
14. The system according to claim 11, wherein the audio capturing unit initiates capturing of the stream of audio samples in real time without any intervention from the first entity and the second entity.
15. The system according to claim 11, wherein the article parameters pertain to any or a combination of category, demand, availability, cost and quality of each of the articles of interest.
PCT/EP2021/050642 2020-01-14 2021-01-14 System and method to capture and analyze audio samples WO2021144339A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP20151666 2020-01-14
EP20151666.3 2020-01-14

Publications (1)

Publication Number Publication Date
WO2021144339A1 (en) 2021-07-22

Family

ID=69190604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/050642 WO2021144339A1 (en) 2020-01-14 2021-01-14 System and method to capture and analyze audio samples

Country Status (1)

Country Link
WO (1) WO2021144339A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040172252A1 (en) * 2003-02-28 2004-09-02 Palo Alto Research Center Incorporated Methods, apparatus, and products for identifying a conversation
WO2007087523A2 (en) * 2006-01-23 2007-08-02 Icall, Inc. System, method and computer program product for extracting user profiles and habits based on speech recognition and calling history for telephone system advertising
WO2014185883A1 (en) * 2013-05-13 2014-11-20 Thomson Licensing Method, apparatus and system for isolating microphone audio
US20180122404A1 (en) * 2016-10-27 2018-05-03 International Business Machines Corporation Determining a behavior of a user utilizing audio data



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 21700307; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 21700307; Country of ref document: EP; Kind code of ref document: A1