WO2020039282A1 - System and method for providing a binary output based on input packets of data - Google Patents

System and method for providing a binary output based on input packets of data

Info

Publication number
WO2020039282A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
dataset
event
pertaining
attributes
Prior art date
Application number
PCT/IB2019/056357
Other languages
French (fr)
Inventor
Sudipta BISWAS
Shivam SAXENA
Original Assignee
Biswas Sudipta
Saxena Shivam
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Biswas Sudipta, Saxena Shivam filed Critical Biswas Sudipta
Publication of WO2020039282A1 publication Critical patent/WO2020039282A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63B - APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B2102/00 - Application of clubs, bats, rackets or the like to the sporting activity; particular sports involving the use of balls and clubs, bats, rackets, or the like
    • A63B2102/20 - Cricket
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 - Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/16 - Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using electromagnetic waves other than radio waves
    • G01S5/18 - Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/64 - Three-dimensional objects
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present disclosure relates to an approach for providing a binary output based on input one or more packets of data.
  • the present disclosure relates to a method and system for providing a binary output based on input one or more packets of data obtained from one or more sensors.
  • An exemplary implementation of such a determination system is to assess umpiring decisions in a cricket game that are taken manually and are prone to errors, thereby affecting the outcome of the cricket games.
  • the numbers expressing quantities or dimensions of items, and so forth, used to describe and claim certain embodiments of the invention are to be understood as being modified in some instances by the term “about.” Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the invention may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements.
  • a general object of the present disclosure is to provide a system and method for providing a binary output based on input packets of information pertaining to an event.
  • Another object of the present disclosure is to provide a method and a system to make accurate decisions in real time using information obtained from one or more packets.
  • Another object of the present disclosure is to provide a method and system where decision making is done using an automated computational system such that the machine operates programmatically and provides instant correct decisions instead of a human making manual incorrect decision.
  • the present disclosure relates to an approach for providing a binary output based on input one or more packets of data.
  • the present disclosure relates to a method and system for providing a binary output based on input one or more packets of data obtained from one or more sensors.
  • the present disclosure provides a system for providing a binary output based on input packets of data pertaining to an event, said system comprising: a memory operatively coupled to one or more processors, the memory storing instructions executable by the one or more processors to: receive, from one or more first sensors operatively coupled with one or more elements involved in the event, a first dataset of data pertaining to one or more first attributes associated with the event; determine presence of the one or more first attributes from the received first dataset of data based on a processed first data, said processed first data obtained from processing, by the one or more processors, the received first dataset of data based on a first parameter; receive, from one or more second sensors operatively coupled with the one or more elements involved in the event, a second dataset of data pertaining to one or more second attributes associated with the event; and determine presence of the one or more second attributes from the received second dataset of data based on a processed second data, said processed second data obtained from processing, by the one or more processors, the received second dataset of data based on a second parameter, wherein, based on presence of any or a combination of the one or more first attributes and the one or more second attributes, a binary output is provided pertaining to the event.
  • a log of data pertaining to any or a combination of the first dataset of data and the second dataset of data is provided as output.
  • one or more third dataset of data is additionally provided from an external source, and wherein the one or more third dataset is additionally processed to provide the binary output.
  • the one or more first sensors are any or a combination of image sensors and depth sensors configured to detect presence and movement of the one or more elements involved in the event, and wherein the first dataset comprises data from either or both of image sensors and depth sensors.
  • the one or more second sensors are audio sensors configured to detect sounds produced due to engagement of any or all of the one or more elements involved in the event.
  • a neural network is configured to process any or a combination of the received first dataset of data and the received second dataset of data.
  • the neural network is trained based on any or a combination of a plurality of first training datasets and a plurality of second training datasets pertaining to the event, said first training datasets and second training datasets being stored in a database operatively coupled with the system.
  • the neural network is configured to predict engagement of the one or more elements involved in the event based on historical data pertaining to location of the one or more elements at a plurality of instances preceding the event.
  • the present disclosure provides a method for providing a binary output based on input packets of data pertaining to an event, said method comprising the steps of: receiving, at a computing device from one or more first sensors operatively coupled with one or more elements involved in the event, a first dataset of data pertaining to one or more first attributes associated with the event; determining, at the computing device, presence of the one or more first attributes from the received first dataset of data based on a processed first data, said processed first data obtained from processing, by the one or more processors, the received first dataset of data based on a first parameter; receiving, at the computing device from one or more second sensors operatively coupled with one or more elements involved in the event, a second dataset of data pertaining to one or more second attributes associated with the event; and determining, at the computing device, presence of the one or more second attributes from the received second dataset of data based on a processed second data, said processed second data obtained from processing, by the one or more processors, the received second dataset of data based on a second parameter, wherein, based on presence of any or a combination of the one or more first attributes and the one or more second attributes, a binary output is provided pertaining to the event.
  • a log of data pertaining to any or a combination of the first dataset of data and the second dataset of data is provided as output.
  • one or more third dataset of data is additionally provided from an external source, and wherein the one or more third dataset is additionally processed to provide the binary output.
  • a neural network is configured to process any or a combination of the received first dataset of data and the received second dataset of data.
  • the neural network is trained based on any or a combination of a plurality of first training datasets and a plurality of second training datasets pertaining to the event, said first training datasets and second training datasets being stored in a database operatively coupled with the system.
  • the neural network is configured to predict engagement of the one or more elements involved in the event based on historical data pertaining to location of the one or more elements at a plurality of instances preceding the event.
  • FIG. 1 illustrates exemplary implementation architecture of a system for providing a binary output based on input packets of data pertaining to an event, in accordance with an embodiment of the present disclosure.
  • FIG. 2 illustrates an exemplary module diagram of the system for providing a binary output based on input packets of data pertaining to an event, in accordance with an embodiment of the present disclosure.
  • FIG. 3A illustrates an input image from a video and FIG. 3B illustrates the network output as a result of 2D image segmentation in accordance with an embodiment of the present disclosure.
  • FIG. 4A illustrates a simplified representation of the CNN and the FCN in accordance with an embodiment of the present disclosure.
  • FIGS. 4B-4D show the visual transformation of the image frames across the network, with the output of each network applied to the frames, in accordance with an embodiment of the present disclosure.
  • FIG. 5A illustrates the network architecture representation of an encoder- decoder network in accordance with an embodiment of the present disclosure.
  • FIGS. 5B-5C illustrate the transformation of a depth map across the network in accordance with an embodiment of the present disclosure.
  • FIG. 6 illustrates a flow diagram of a sound detection system 600, in accordance with an embodiment of the present disclosure.
  • FIG. 7 illustrates a flow diagram 700 of a DSP block within the sound detection system, in accordance with an embodiment of the present disclosure.
  • FIG. 8 illustrates a flow diagram 800 of the disclosed decision-making process, in accordance with an embodiment of the present disclosure.
  • FIG. 9 illustrates an exemplary computer system in which or with which embodiments of the present disclosure can be utilized, in accordance with embodiments of the present disclosure.
  • the present disclosure relates to capturing of packets of data-based information pertaining to decision making.
  • the disclosure relates to the packets of the information obtained from one or more data acquisition unit for decision making.
  • FIG. 1 illustrates exemplary implementation architecture of a system for providing a binary output, in accordance with an embodiment of the present disclosure.
  • the proposed system 110 is a system for providing a binary output based on input packets of data, the data being obtained from a 3D segmentation unit and a sound detection unit that can have a bearing on providing a binary output.
  • the system 110 is implemented as an application on a server 102; it would be appreciated that the system 110 can also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a server, a network server, a cloud-based environment and the like. It would be appreciated that the determination system 110 may be accessed by multiple users 106-1, 106-2...
  • the proposed determination system 110 can be operatively coupled to a website and is operable from any Internet-enabled computing device 108.
  • Examples of the computing devices 108 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation.
  • the computing devices 108 are communicatively coupled to the proposed determination system 110 through a network 104. It may be also understood that the proposed determination system 110 is a system for decision making by capturing packets of data-based information from multiple data acquisition units.
  • the network 104 can be a wireless network, a wired network or a combination thereof.
  • the network 104 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like.
  • the network 104 may either be a dedicated network or a shared network.
  • the shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another.
  • the network 104 can include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.
  • the computing device 108 may include at least one of the following: a mobile wireless device, a smartphone, a mobile computing device, a wireless device, a hard-wired device, a network device, a docking device, a personal computer, a laptop computer, a pad computer, a personal digital assistant, a wearable device, a remote computing device, a server, a functional computing device, or any combination thereof.
  • while the primary computing device 108 is a smartphone (which may include the appropriate hardware and software components to implement the various described functions), it is also envisioned that the computing device 108 can be any suitable computing device configured, programmed, or adapted to perform one or more of the functions of the described system.
  • FIG. 2 illustrates an exemplary module diagram of the system for providing a binary output based on input packets of data pertaining to an event, in accordance with an embodiment of the present disclosure.
  • the system 110 can comprise one or more processor(s) 202.
  • the one or more processor(s) 202 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions.
  • the one or more processor(s) 202 are configured to fetch and execute computer-readable instructions stored in a memory 204 of the system 110.
  • the memory 204 may store one or more computer-readable instructions or routines, which may be fetched and executed to create or share the data units over a network service.
  • the memory 204 may comprise any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
  • the system 110 can also comprise an interface(s) 206.
  • the interface(s) 206 may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like.
  • the interface(s) 206 may facilitate communication of system 110 with various devices coupled to the system 110 such as the input unit 102 and the output unit 104.
  • the interface(s) 206 may also provide a communication pathway for one or more components of the system 110. Examples of such components include, but are not limited to, a visual data receiving unit 208, a visual data processing unit 210, an audio data receiving unit 212, an audio data processing unit 214, a determination unit 216, an output unit 218 and a database 220.
  • the database 220 of the system 110 can be configured at a remote location, say a cloud or a server.
  • the visual data receiving unit 208 is configured to receive a first dataset pertaining to a first attribute associated with an event to be analysed.
  • the first dataset can be received from one or more sensors which can include any or a combination of image sensors and depth sensors.
  • the visual data processing unit 210 is configured to analyse the first dataset based on a first parameter.
  • the audio data receiving unit 212 is configured to receive a second dataset pertaining to a second attribute associated with the event to be analysed.
  • the audio data processing unit 214 is configured to analyse the second dataset based on a second parameter.
  • the determination unit 216 is configured to receive processed data from the visual data processing unit 210 and the audio data processing unit 214 and determine from them the presence of the first attribute and the second attribute respectively.
  • the output unit 218, based on the presence of the first attribute and the second attribute as determined by the determination unit 216, provides a binary output pertaining to the event being analysed.
  • the output unit 218 is configured to provide a log of relevant information pertaining to the first attribute and the second attribute, while providing the binary output.
  • system 200 can be configured to receive additional input pertaining to the event from an external source, wherein the additional input can be considered to provide the binary output.
  • system 200 can be configured to receive additional input pertaining to the event from an external source after the system 200 has provided the binary output.
  • the binary output provided can be re-computed to provide a new binary output considering the additional input.
  • the following sections demonstrate working of the proposed system for providing a binary output by illustrating an exemplary implementation of the proposed system.
  • the implementation relates to a cricket game, and in particular, to providing an “out” or “not out” decision for an event where a ball bowled hits one or both pads of the batsman when the batsman is standing in front of the wickets.
  • the proposed system is implemented to determine if the ball is engaged by the bat before striking one or both pads of the batsman, so as to adjudicate a leg-before-wicket (LBW) decision for the batsman. If yes, the result would be “not out”; if no, the result would be “out”.
  • LBW leg-before-wicket
  • the system can be applied for enabling decision making in real time (~2-3 sec).
  • the proposed system can be operatively coupled to one or more sensors to detect different aspects pertaining to the event such as a real time position of the batsman, the ball, the bat and position of the one or more fielders.
  • the sensors can be any or a combination of audio sensors, video sensors and depth sensors.
  • the audio sensors can be configured to detect different sounds as the batsman engages to play the ball with the bat.
  • RGBD frame - a frame consisting of m x n pixels, with each pixel having four values designated RGBD.
  • Image frame - a frame consisting of m x n pixels, with each pixel having three values designated RGB.
  • Depth frame / Depth Map - a frame consisting of m x n pixels, with each pixel having a depth value designated D, where the depth value d(a, b) corresponding to a pixel location (a, b) is the horizontal distance of the point on the object plane that is projected at (a, b) in the image plane.
  • Video / Image sequence - a collection of image frames which are in a chronological, sequential order.
  • Depth map sequence - a collection of depth maps which are in a chronological, sequential order.
  • RGBD sequence - a collection of RGBD frames which are in a chronological, sequential order.
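  • As a minimal sketch of these frame conventions (assuming Python with NumPy; the class and field names below are illustrative, not from the source):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class RGBDFrame:
    """One RGBD frame: an m x n colour image plus an aligned m x n depth map."""
    rgb: np.ndarray    # shape (m, n, 3), per-pixel R, G, B values
    depth: np.ndarray  # shape (m, n), per-pixel depth value D

    def depth_at(self, a: int, b: int) -> float:
        """d(a, b): depth of the object point projected at image location (a, b)."""
        return float(self.depth[a, b])

# An RGBD sequence (likewise an image or depth map sequence) is then simply
# a chronologically ordered list of such frames.
```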
  • the proposed system can use deep learning software to detect, identify and precisely locate a bat, a ball and a batsman's body (including various protection that he/she wears) in their three-dimensional configuration from an RGBD frame using a 3D segmentation technique.
  • the 3D segmentation of the bat and the batsman’s body in real time are key contributions for decision making.
  • the 3D segmentation of the bat, the ball and the batsman’s body are performed primarily for two major purposes: a) identifying components in contact that produce the sound caught by an ultra-edge, and b) locating the point of contact of the ball with the batsman’s body, which is essential for finding the projected trajectory of the ball.
  • the system either automatically displays the trajectory of the ball (in case an AR headset is used) or directly conveys the final decision (i.e. OUT or NOT OUT) using other decision conveying devices instantly.
  • the AR headset automatically and instantly displays a few image frames from the captured image sequence at the moment the contact takes place, so there is no need of a manual check.
  • a plurality of display devices can be used to convey the final determined decision.
  • the proposed method and system can be used for 3D segmentation of a bat.
  • the method and system can be used to segment off the cricket bat from the background of an RGBD frame using a deep learning network.
  • the part of the bat covered by a ball or a batsman's body is then reconstructed using a deep learning network to generate a 3D model.
  • the method and system facilitates taking decisions spontaneously and automatically.
  • the decisions can be an LBW decision and a caught decision in a game of cricket.
  • taking the LBW and the caught decisions requires knowing if the bat has touched/edged the ball.
  • the method and the device take the decisions based on whether the bat has touched the ball, by finding the position and orientation of the bat and the ball precisely in 3D.
  • the system identifies cricket bats of varying shape and size, and in various orientations of pose, tilt and zoom, in an image frame.
  • a part of the bat that is hidden by either the ball or a part of the batsman's body is reconstructed to constitute a full 3D model.
  • the information that the system has access to is an RGBD sequence, which contains the sequence of the images and corresponding depth maps while the bowler bowls and the batsman plays a shot.
  • a neural network is provided that identifies the footprint of the uncovered part of the bat.
  • the neural network segments a 2D footprint of the bat from an image frame after being trained with thousands of images of bats with varying size, shape, pose, tilt and zoom.
  • the neural network performs supervised learning, in the process of which the network learns the key features of the cricket bat from many different-looking samples of bats. Mathematically, the network predicts a probability value for every pixel being present within the footprint of the bat. All those pixels for which the probability value crosses a certain predesigned threshold constitute the 2D footprint of the bat, as sketched below.
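  • A minimal sketch of this thresholding step (assuming Python with NumPy; the 0.5 default is illustrative, the source speaks only of a "predesigned threshold"):

```python
import numpy as np

def bat_footprint(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize the network's per-pixel probabilities: every pixel whose
    probability of lying within the bat crosses the threshold becomes part
    of the 2D footprint (1 = bat, 0 = background)."""
    return (prob_map > threshold).astype(np.uint8)
```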
  • FIG. 3A illustrates an input image from a video and FIG. 3B illustrates the network output as a result of 2D image segmentation in accordance with an embodiment of the present disclosure.
  • the neural network architecture can be an amalgamation of multiple fundamental neural networks such as but not limited to a convolutional neural network, commonly referred to as CNN and a fully convolutional network or FCN.
  • CNN convolutional neural network
  • FCN fully convolutional network
  • the output of the CNN can be a matrix with B rows, where B is the number of bounding boxes predicted per image, each having 3 attributes, namely the probability of the box containing the bat, the left-top coordinate of the box and the right-bottom coordinate of the box.
  • the output of the FCN is a binary matrix of dimension m x n, where a value of 1 signifies a computed probability that is more than a threshold and 0 signifies a value that is less than the threshold.
  • FIG. 4A illustrates a simplified representation of the CNN and the FCN in accordance with an embodiment of the present disclosure.
  • FIGS. 4B-4D show the visual transformation of the image frames across the network, with the output of each network applied to the frames, in accordance with an embodiment of the present disclosure.
  • a reconstruction engine can be provided that is fed with the output of the previous network along with a depth frame.
  • the depth frame consists of the depth values corresponding to every pixel of the RGB frame. Upon discovering the 2D footprint of the uncovered region of the bat, the depth values corresponding to those pixels are easily obtained from the depth frame, thus achieving the first part of the 3D reconstruction.
  • the covered region of the bat can then be reconstructed, i.e. the horizontal distances of the covered region of the bat are estimated, equal to the depth values that region would have had the ball not covered the bat.
  • the reconstruction engine can be made of a deep neural network, commonly referred to as an encoder-decoder network.
  • the network transforms a depth frame to another depth frame, where the depth value of the obstructed region is replaced with estimated depth values of the bat.
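  • A minimal PyTorch sketch of such an encoder-decoder (the layer counts and channel sizes are assumptions; the source does not specify the architecture). Input and output are depth frames of shape (batch, 1, m, n), with m and n assumed divisible by 4:

```python
import torch
import torch.nn as nn

class DepthInpainter(nn.Module):
    """Encoder-decoder mapping an occluded depth frame to a completed depth
    frame in which the obstructed region carries estimated bat depths."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (batch, 1, m, n) -> same shape, with the occluded depths
        # replaced by the network's estimates.
        return self.decoder(self.encoder(depth))
```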
  • FIG. 5A illustrates the network architecture representation of an encoder- decoder network in accordance with an embodiment of the present disclosure.
  • FIGS. 5B-5C illustrate the transformation of a depth frame across the network in accordance with an embodiment of the present disclosure.
  • the output of the network is an m x n matrix (M) whose expected values are as follows:
  • the CNN, FCN, and encoder-decoder network, like many other machine learning methods, work by training on thousands of images such as those of FIGS. 5B-5C.
  • the network parameters are tuned in iterative steps such that the network’s output images are similar, in some measurement, to the annotated images of the same input RGBD frames.
  • the accuracy of the machine learning methods depends heavily on the number of training RGBD frames that are used to tune the parameters.
  • the cricket pitch refers to the hard, bouncy planar surface over which the game is played.
  • the pitch can be a bit over 22 yards in length and rectangular in shape.
  • the identification of the pitch can be done to a) enable the system to infer origin of the sound as the sound can be from the ball hitting the pitch, b) enable the identification of the batsman’s body (as explained in later embodiments).
  • an identification of the pitch can be performed.
  • a neural network is built that takes in the image sequence as an input (as a bowler bowls the delivery).
  • Each of the images contains m x n pixels, and every pixel has a corresponding RGB and a depth value.
  • the neural network takes only the RGB values as input for every pixel.
  • the neural network outputs the coordinates of the rectangle which encloses the pitch.
  • the neural network can be trained over tens of thousands of the images which contain the cricket pitches, while it outputs 4 points (or pixel locations).
  • since the neural network identifies the 4 points which enclose the pitch, it in effect identifies the rectangular pitch in the image in 2D.
  • the pitch is identified in 3D by finding the depth values of a few pixels which lie in the rectangular area found by the neural network. Using the depth values and locations of these pixels, a 3D plane containing the pitch is reconstructed, as sketched below.
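  • A sketch of this plane reconstruction (assuming Python with NumPy and a simple least-squares fit; the source does not name the fitting method). Each row of points_3d holds a sampled pixel's x and y location and its depth value z:

```python
import numpy as np

def fit_pitch_plane(points_3d: np.ndarray) -> np.ndarray:
    """Fit a plane z = a*x + b*y + c to 3D points sampled inside the
    pitch rectangle found by the neural network; returns (a, b, c)."""
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs
```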
  • the proposed method and the system can be used to find the location of the ball hitting the human body, which is crucial for determining the LBW decisions.
  • the batsman’s body is segmented off the background of an RGBD frame using a deep learning model.
  • the LBW decision making can be done by effectively determining the point in the 3D space, where the ball hits the batsman’s body (or the protection the batsman wears) for the first time after the ball is bowled.
  • the projected trajectory of the ball can be predicted from the actual trajectory of the ball before it hits the batsman's body.
  • the (x, y, z) position on the batsman’s body can be found in a 3D space.
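  • One way to sketch this projection (an assumption: the source does not state the curve model; a per-axis quadratic is a simple ballistic approximation) is to fit the ball's observed pre-impact positions over time and extrapolate:

```python
import numpy as np

def project_trajectory(times: np.ndarray, positions: np.ndarray,
                       t_future: float) -> np.ndarray:
    """Fit a quadratic to each of x(t), y(t), z(t) from the observed
    trajectory before impact, then extrapolate the ball's 3D position
    to a later time, e.g. when it reaches the plane of the stumps."""
    coeffs = [np.polyfit(times, positions[:, k], deg=2) for k in range(3)]
    return np.array([np.polyval(c, t_future) for c in coeffs])
```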
  • the location and depth (z) information of the points on the batsman's body that are visible from a front-on view can be determined by identifying the pixels which:
  • the pixels that satisfy the above-mentioned criteria belong to set A (as explained in later embodiments).
  • regarding set A, a case can arise wherein a part of the body close to where the ball hits is hidden by the bat or the ball, and hence the minimum distance between the body and the ball cannot be found correctly, since the depth values of that hidden part of the body cannot be obtained from the front-on view.
  • the depth map is drawn in a 3D space, where the depth value corresponding to every pixel is on the z-axis, while the pixel locations correspond to points on the x and y axes.
  • a depth-gradient is found for the constructed depth map for the pixels belonging to the set A (as explained in later embodiments) which are in the region surrounding the ball.
  • a small value of the gradient suggests a smooth curve and is reflective of the fact that there are no hidden body parts at these pixels.
  • if the gradient is larger, it reflects that there are covered body parts at these pixels. The set of pixels whose gradient exceeds a certain value is sent (along with the depth values at these pixels) through a reconstruction network.
  • the reconstruction network outputs the depth values of the hidden body parts.
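  • A sketch of the depth-gradient test (assuming Python with NumPy; the gradient threshold is illustrative). Set-A pixels where the gradient magnitude is large are flagged as candidates for hidden body parts and handed to the reconstruction network:

```python
import numpy as np

def occluded_candidates(depth_map: np.ndarray, set_a_mask: np.ndarray,
                        grad_threshold: float) -> np.ndarray:
    """Return a boolean mask of set-A pixels where the depth map changes
    abruptly; a large gradient suggests a body part hidden behind the bat
    or the ball at those pixels."""
    gy, gx = np.gradient(depth_map)
    grad_mag = np.hypot(gx, gy)
    return (grad_mag > grad_threshold) & set_a_mask.astype(bool)
```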
  • the reconstruction network is another neural network, which is trained over tens of thousands of depth map sequences, each containing body movements of the batsman for various shots he/she plays.
  • the depth map sequences comprise many depth maps, and each depth map is made up of m x n pixels, with every pixel having a corresponding depth value. A few intermediate frames in every depth map sequence are selected, and parts of the body are occluded using the bat or the ball in these depth maps.
  • the occluded parts carry depth values different from those of the body part which is occluded.
  • the depth values of the body parts which are occluded are also known for these selected depth maps.
  • the reconstruction neural network takes in the depth maps which were not occluded (which were not selected as the intermediate depth maps) as the input and predicts the depth values of the body parts which are occluded in the intermediate frame. This is done by taking a “weighted average” of the depth maps before and after the intermediate depth maps to predict the depth value of the occluded body part in the intermediate depth maps, such that the predicted depth values “closely match” the actual depth values of the occluded body parts in the selected depth maps.
  • training the neural network involves finding the values of the weights which are convolved with the non-selected depth maps to predict the depth values of the selected depth maps.
  • the pixels in the 3rd depth map can be predicted using the following formula:
  • f1, f2, f3, f4, f5 are the depth maps which contain the depth value at every pixel of the parts of the body.
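  • As a plausible form of this weighted average (an assumption; the weights w_i are learned during training), predicting the 3rd depth map from its neighbours:

$$\hat{f}_3(a, b) = w_1 f_1(a, b) + w_2 f_2(a, b) + w_4 f_4(a, b) + w_5 f_5(a, b)$$

where f_i(a, b) is the depth value at pixel (a, b) of the i-th depth map, and training tunes the weights so that the predicted \hat{f}_3 closely matches the actual depth values of the occluded body parts.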
  • the proposed method and the system can be used to detect and determine, by an automated process, whether a waveform coming from a microphone contains a cricketing sound.
  • the automated process determines whether the sound is a cricketing sound completely without any human intervention. This is in contrast to the existing solutions that require implanting the microphone in the cricketing stumps to record the various sounds.
  • a third umpire then gets to see a filtered waveform of the recorded signal. The third umpire looks at the waveform and uses his manual intuition (via observing some disturbance in the waveform) to decide if there is the cricketing sound or not.
  • the audio captured by the stump microphone present in a cricket ground is laden with ambient noise, crowd noise and any other undesirable noises (all such noises can be collectively referred to as “stadium noise”) in addition to “the cricketing sound”.
  • the stadium noise can have a higher magnitude than the cricketing sound and can simultaneously have similar frequency components to those of the cricketing sound. When these conditions occur simultaneously, it becomes difficult to separate the sounds and thereby detect the actual occurrence of the cricketing sound.
  • the proposed method and the system can effectively eliminate stadium noise of up to five times the magnitude of the cricketing sound, primarily using a technique of spectrogram analysis followed by high-pass filtering, as sketched below.
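  • A minimal sketch of this two-stage pipeline (assuming Python with SciPy; the cutoff frequency and spectrogram window are illustrative parameters, not values from the source):

```python
import numpy as np
from scipy import signal

def analyse_and_filter(waveform: np.ndarray, fs: float,
                       cutoff_hz: float = 2000.0):
    """Spectrogram analysis followed by high-pass filtering, to suppress
    stadium noise relative to the cricketing sound."""
    # Short-time spectrogram used to inspect where the signal energy sits.
    freqs, times, Sxx = signal.spectrogram(waveform, fs=fs, nperseg=256)
    # High-pass filter the waveform to attenuate low-frequency stadium noise.
    sos = signal.butter(4, cutoff_hz, btype="highpass", fs=fs, output="sos")
    filtered = signal.sosfilt(sos, waveform)
    return freqs, times, Sxx, filtered
```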
  • FIG. 6 illustrates a flow diagram of a sound detection system 600, in accordance with an embodiment of the present disclosure.
  • a summation of the cricketing sound and the stadium noise (comprising the crowd's noise at the stadium and the ambient noise) is captured.
  • the captured sound is fed into the single microphone at block 604.
  • the digital signal processing block at block 606 processes the captured sound, and the instants of the cricketing sound are evaluated for multiple time frames, such as but not limited to 100-120 ms or 200-205 ms, at block 608.
  • FIG. 7 illustrates a flow diagram 700 of a DSP block within the sound detection system, in accordance with an embodiment of the present disclosure.
  • an input waveform is received at block 702, which is passed through a spectrogram at block 704.
  • the output waveform received from the spectrogram is inputted to a high pass filter at block 706.
  • the waveform is normalized at block 708 by applying signal normalization.
  • the normalized signal received at block 708 is sent as input to a rolling window counter at block 710, where L is the rolling window length.
  • L is the rolling window length.
  • it is checked whether the rolling counter at any instant has a value greater than 0.5. If the value of the rolling counter is greater than 0.5, it is concluded at block 714 that the detected sound is a cricketing sound at that instant of time; else, at block 716, it is concluded that no cricketing sound is detected at that time instant.
  • the cricketing sound detected at various time instants is pooled so as to make disjoint periods.
  • the multiple periods of the cricketing sounds can be, but are not limited to, 100-120 ms and 200-205 ms, as shown at block 720.
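  • A sketch of the rolling window counter and the pooling of detections into disjoint periods (assuming Python with NumPy, per-sample binary detections, and the 0.5 rule described above):

```python
import numpy as np

def rolling_counter(detections: np.ndarray, L: int) -> np.ndarray:
    """Fraction of positive detections inside a rolling window of length L;
    instants where this exceeds 0.5 are treated as cricketing sound."""
    kernel = np.ones(L) / L
    return np.convolve(detections.astype(float), kernel, mode="same")

def pool_periods(is_cricket: np.ndarray, fs: float):
    """Pool consecutive positive instants into disjoint (start, end)
    periods, expressed in seconds."""
    periods, start = [], None
    for i, flag in enumerate(is_cricket):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            periods.append((start / fs, i / fs))
            start = None
    if start is not None:
        periods.append((start / fs, len(is_cricket) / fs))
    return periods
```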
  • the proposed method and the system upon detecting the sound can determine the origin of the sound based on the various inputs such as the position of the bat, the ball and the batsman’s body (including the protection he/she wears) and the microphone input.
  • the microphone (or the snick-o-meter) records the waveform of the sound that may be produced.
  • the multiple sounds which are produced such as the cricketing sounds and the time instants at which they are produced are determined.
  • These cricketing sounds can include sounds due to:
  • the sounds determined at points 1 and 2 are referred to as sound A and the ones determined at the points 3-5 are referred to as B.
  • the sound determined at point 6 is referred to as C and at 7 is referred to as D.
  • to classify the cricketing sound, the minimum distance between the ball and the bat, and between the ball and the batsman's body, is found at the instant the cricketing sound is detected.
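  • A sketch of this distance test (assuming Python with SciPy; the contact tolerance is an illustrative parameter). The 3D point sets of the ball, the bat and the body come from the 3D segmentation described earlier:

```python
import numpy as np
from scipy.spatial.distance import cdist

def min_distance(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Minimum Euclidean distance between two 3D point sets (N x 3 arrays)."""
    return float(cdist(points_a, points_b).min())

def classify_sound(ball: np.ndarray, bat: np.ndarray, body: np.ndarray,
                   contact_eps: float = 0.02) -> str:
    """Attribute a detected cricketing sound to whichever element the ball
    is effectively touching at that instant."""
    d_bat, d_body = min_distance(ball, bat), min_distance(ball, body)
    if d_bat <= contact_eps and d_bat <= d_body:
        return "bat"
    if d_body <= contact_eps:
        return "body"
    return "other"
```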
  • the proposed method and the system are provided for decision making. This requires finding out in what sequence the above sounds have occurred, because identifying the sequence in which various processes (like the ball hitting the bat or the ball hitting the pad) have occurred is at the heart of the decision making. There could primarily be multiple combinations (or sequences) of sounds that could occur, to determine whether the batsman is a potential candidate for an LBW or caught decision.
  • sound D can also occur in any order in any of the sequences above. However, the presence of the sound D doesn't affect the cricketing decision of the disclosed method and system, or the one that the umpire would give.
  • the projected trajectory of the ball is calculated based on the actual trajectory of the ball before the ball hits the batsman’s body.
  • the point at which the ball hits the batsman’s body is referred to as the point of impact.
  • the projected trajectory is drawn from the point of impact up to the plane of the stumps. If the projection of the point of impact onto the plane of the stumps lies inside the stumps, the impact is said to be in-line. If the impact is in-line, the projected trajectory of the ball is hitting the stumps, and the ball has not pitched outside leg, the batsman is out LBW, and we refer to it as scenario 2A or 3A.
  • in case the ball is not hitting the stumps, or the impact is not in line, or the ball has pitched outside leg, we refer to it as scenario 2B or 3B. Based on the scenario that fits the situation as per the determined captured information, the necessary information is transmitted to the on-field umpire spontaneously using a decision conveying device.
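  • The scenario logic above reduces to a simple conjunction; a sketch (the function and flag names are illustrative):

```python
def lbw_decision(impact_in_line: bool, trajectory_hits_stumps: bool,
                 pitched_outside_leg: bool) -> str:
    """Scenario 2A/3A: impact in line, projected trajectory hitting the
    stumps, and ball not pitched outside leg -> OUT (LBW).
    Any other combination is scenario 2B/3B -> NOT OUT."""
    if impact_in_line and trajectory_hits_stumps and not pitched_outside_leg:
        return "OUT"      # scenario 2A or 3A
    return "NOT OUT"      # scenario 2B or 3B
```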
  • the decision conveying device can be such as but not limited to an augmented reality headset or tablet, an earphone, a smart watch etc.
  • an oversize screen such as those present in most cricket grounds can be used to convey a possible contact between the bat and the ball.
  • the oversize screen displays that the batsman is OUT-LBW, and in other scenarios, the screen displays that the batsman is NOT-OUT.
  • single or multiple cameras can be used to obtain high resolution RGB frames from the instance the bowler bowls a delivery till the time the batsman completes playing the delivery (either plays or misses the ball).
  • an active or passive depth sensor or the camera can be used.
  • the RGB cameras capture colour and shade information of an object.
  • the depth camera captures the depth information.
  • multiple types of depth sensors can be used: 1) active depth sensors based on time-of-flight or structured-light imaging; based on the wavelength of the light used, the sensor could be a LIDAR (motion-based or solid state), an infrared depth sensor, etc.; 2) passive depth sensors based on stereo or hyper-stereo vision, which give the value of the depth or the relative depth of each point in the depth map.
  • single or multiple GPUs or CPUs take input from the multiple sensors and process the information to convey the decision using a decision conveying device to the umpire/players.
  • the GPUs/CPUs can be either wired into the umpire’s pocket or are present in the cloud with an allocated and dedicated bandwidth through which the inputs from the sensors are transferred to the cloud.
  • MSU motion sensing unit
  • the MSU is attached to an augmented reality (AR) headset. It consists of an accelerometer and a gyroscope.
  • the accelerometer is MEMS-based circuitry that enables tracking the linear acceleration of the AR lens.
  • the gyroscope, on the other hand, is adept at tracking the rotational motion of the headset.
  • the purpose of the MSU is to provide stability to the rendering on the AR Lens.
  • a Wi-Fi antenna is provided.
  • the Wi-Fi antenna is attached to the camera module and the MSU.
  • the data is collected from the RGB and depth cameras in the form of pixels, and from the MSU in the form of linear and angular acceleration, and can be transmitted via the Wi-Fi antenna over a dedicated bandwidth through the HTTP protocol to a GPU server hosted in a nearby location.
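  • A sketch of the sensor-to-server push (assuming Python with the requests library; the endpoint URL and JSON payload layout are assumptions, and a production system would likely use a binary encoding rather than JSON for pixel data):

```python
import numpy as np
import requests

def send_frame(rgb: np.ndarray, depth: np.ndarray,
               accel: tuple, gyro: tuple,
               server_url: str = "http://gpu-server.local/ingest"):
    """Push one frame of camera pixels plus MSU readings to the GPU server
    over HTTP, as described for the Wi-Fi antenna above."""
    payload = {
        "rgb": rgb.tolist(),      # (m, n, 3) pixel values
        "depth": depth.tolist(),  # (m, n) depth values
        "accel": list(accel),     # linear acceleration from the MSU
        "gyro": list(gyro),       # angular rates from the MSU
    }
    requests.post(server_url, json=payload, timeout=1.0)
```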
  • FIG. 8 illustrates a flow diagram 800 of the disclosed decision-making process, in accordance with an embodiment of the present disclosure.
  • visual information is captured by using the RGB camera and the depth sensors.
  • the captured information is passed through a neural network at block 804 to determine 3D position of the bat, the ball and the cricket pitch even for occluded parts at block 806.
  • a segmentation method is used that facilitates determining the 3D location of the points on the batsman's body that are visible from the front-on vision at block 810.
  • the neural network takes the depth map sequence of the non-occluded body parts as input at block 812 and determines the 3D location of the points on parts of the body occluded by either the bat or the ball in every depth map, at block 814.
  • the 3D positions of the bat, the ball and the pitch from block 806 are used, along with the 3D location of the points on the parts of the body occluded by either the bat or the ball in every depth map, to determine the 3D mapping of points on the bat, the ball, the pitch and the batsman's body including the occluded parts, as shown at block 816.
  • the audio data is captured, and the sound is detected at block 820.
  • This captured audio and sound data is used along with the 3D mapping of the points at block 816, to determine a closest distance among connected subsets (d1, d2, d3) at block 822.
  • a determination of the recognition of the origin of the sound is done, and at block 826 an analysis of the sequence of origin of the sounds is done.
  • the multiple discussed scenarios are considered to determine the decision for the batsman.
  • the determined decision is conveyed to the public using the oversized display device at block 830.
  • the decision is executed for the batsman at block 832.
  • the present invention solves the above recited and other available technical problems by providing a determination system that uses an automated computational machine which operates programmatically and provides instant correct decisions instead of a human umpire making manual incorrect decisions.
  • the invention of the present disclosure can be implemented to determine events such as handball, foul, offside, goals etc. pertaining to Football.
  • the system and method followed here is similar to the embodiments described above taking Cricket as the working example.
  • the invention of the present disclosure can be implemented to determine events such as line calls, double bounce, body touch etc. pertaining to Tennis.
  • the system and method followed here is similar to the embodiments described above taking Cricket as the working example.
  • the invention of the present disclosure can be implemented to determine events such as ball/strike, foul ball, forced out, tagged out, fly out etc. pertaining to Baseball.
  • the system and method followed here is similar to the embodiments described above taking Cricket as the working example.
  • the invention of the present disclosure can be implemented to determine events such as down, touchdowns etc. pertaining to American Football.
  • the system and method followed here is similar to the embodiments described above taking Cricket as the working example.
  • the invention of the present disclosure can be implemented to determine events such as shot clock violation, double dribble, foul, travelling, three in the key, charging, five second violation, three second violation etc. pertaining to Basketball.
  • the system and method followed here is similar to the embodiments described above taking Cricket as the working example.
  • computer system includes an external storage device 910, a bus 920, a main memory 930, a read only memory 940, a mass storage device 950, communication port 960, and a processor 970.
  • Examples of processor 970 include, but are not limited to, an Intel® Itanium® or Itanium 2 processor(s), AMD® Opteron® or Athlon MP® processor(s), Motorola® lines of processors, FortiSOC™ system-on-a-chip processors or other future processors.
  • Processor 970 may include various modules associated with embodiments of the present invention.
  • Communication port 960 can be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fibre, a serial port, a parallel port, or other existing or future ports. Communication port 960 may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which the computer system connects.
  • LAN Local Area Network
  • WAN Wide Area Network
  • Memory 930 can be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art.
  • Read only memory 940 can be any static storage device(s), e.g., but not limited to, Programmable Read Only Memory (PROM) chips for storing static information, e.g., start-up or BIOS instructions for processor 970.
  • Mass storage 950 may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), e.g. those available from Seagate (e.g., the Seagate Barracuda 7200 family) or Hitachi (e.g., the Hitachi Deskstar 7K1000); one or more optical discs; or Redundant Array of Independent Disks (RAID) storage, e.g. an array of disks (e.g., SATA arrays), available from various vendors including Dot Hill Systems Corp., LaCie, Nexsan Technologies, Inc. and Enhance Technology, Inc.
  • PATA Parallel Advanced Technology Attachment
  • SATA Serial Advanced Technology Attachment
  • USB Universal Serial Bus
  • RAID Redundant Array of Independent Disks
  • Bus 920 communicatively couples processor(s) 970 with the other memory, storage and communication blocks.
  • Bus 920 can be, e.g., a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), USB or the like, for connecting expansion cards, drives and other subsystems, as well as other buses, such as a front side bus (FSB), which connects processor 970 to the software system.
  • PCI Peripheral Component Interconnect
  • PCI-X PCI Extended
  • SCSI Small Computer System Interface
  • FSB front side bus
  • operator and administrative interfaces, e.g. a display, keyboard, and a cursor control device, may also be coupled to bus 920 to support direct operator interaction with the computer system.
  • Other operator and administrative interfaces can be provided through network connections connected through communication port 960.
  • External storage device 910 can be any kind of external hard drive, floppy drive, IOMEGA® Zip Drive, Compact Disc - Read Only Memory (CD-ROM), Compact Disc - Re-Writable (CD-RW), or Digital Video Disk - Read Only Memory (DVD-ROM).
  • CD-ROM Compact Disc - Read Only Memory
  • CD-RW Compact Disc - Re-Writable
  • DVD-ROM Digital Video Disk - Read Only Memory
  • the present disclosure provides a system and method for providing a binary output based on input packets of information pertaining to an event.
  • the present disclosure provides a method and a system to make accurate decisions in real time using information obtained from one or more packets.
  • the present disclosure provides a method and system where decision making is done using an automated computational system such that the machine operates programmatically and provides instant correct decisions instead of a human making manual incorrect decision.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a system and method for providing a binary output based on input packets of data pertaining to an event. The system is configured to: receive a first dataset of visual data pertaining to one or more first attributes associated with the event; determine presence of the one or more first attributes from the received first dataset of visual data based on a processed visual data; receive a second dataset of audio data pertaining to one or more second attributes associated with the event; and determine presence of the one or more second attributes from the received second dataset of audio data based on a processed audio data, wherein, based on presence of the one or more first attributes and the one or more second attributes, a binary output is provided pertaining to the event.

Description

SYSTEM AND METHOD FOR PROVIDING A BINARY OUTPUT BASED ON INPUT PACKETS OF DATA
TECHNICAL FIELD
[1] The present disclosure relates to an approach for providing a binary output based on input one or more packets of data. In particular, the present disclosure relates to a method and system for providing a binary output based on input one or more packets of data obtained from one or more sensors.
BACKGROUND
[2] Background description includes information that may be useful in understanding the present invention. It is not an admission that any of the information provided herein is prior art or relevant to the presently claimed invention, or that any publication specifically or implicitly referenced is prior art.
[3] Taking decisions based on packets of information in certain scenarios is considered highly valuable. However, to have relevance, it is imperative that a determination system takes relevant and correct decisions in real time.
[4] An exemplary implementation of such a determination system is to assess umpiring decisions in a cricket game that are taken manually and are prone to errors, thereby affecting the outcome of the cricket games.
[5] In the domain of umpiring, the introduced DRS (Decision Review System) scheme has been under a lot of controversy with regard to affecting the pace of the cricket game, the limit on the number of reviews which makes the DRS an unfair system, and the lack of consistency when it comes to the umpire's call. The DRS breaks the spontaneity and excitement which the shortest format of cricket brings. Moreover, there have been several instances where the right decision was not given due to the limit on the number of reviews, and adding to it are the inconsistencies involved in the umpire's call, which makes the system quite inefficient.
[6] Thus, there is a need to provide an automated system and method which facilitates in real time accurate decision making based on the information present in the packets.
[7] All publications herein are incorporated by reference to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
[8] In some embodiments, the numbers expressing quantities or dimensions of items, and so forth, used to describe and claim certain embodiments of the invention are to be understood as being modified in some instances by the term “about.” Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the invention may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements.
[9] As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
[10] The recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.
[11] Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all groups used in the appended claims.
OBJECTS
[12] A general object of the present disclosure is to provide a system and method for providing a binary output based on input packets of information pertaining to an event.
[13] Another object of the present disclosure is to provide a method and a system to make accurate decisions in real time using information obtained from one or more packets.
[14] Another object of the present disclosure is to provide a method and system where decision making is done using an automated computational system, such that the machine operates programmatically and provides instant, correct decisions instead of a human making manual, incorrect decisions.
SUMMARY
[15] The present disclosure relates to an approach for providing a binary output based on one or more input packets of data. In particular, the present disclosure relates to a method and system for providing a binary output based on one or more input packets of data obtained from one or more sensors.
[16] In an aspect, the present disclosure provides a system for providing a binary output based on input packets of data pertaining to an event, said system comprising: a memory operatively coupled to one or more processors, the memory storing instructions executable by the one or more processors to: receive, from one or more first sensors operatively coupled with one or more elements involved in the event, a first dataset of data pertaining to one or more first attributes associated with the event; determine presence of the one or more first attributes from the received first dataset of data based on a processed first data, said processed first data obtained from processing, by the one or more processors, the received first dataset of data based on a first parameter; receive, from one or more second sensors operatively coupled with the one or more elements involved in the event, a second dataset of data pertaining to one or more second attributes associated with the event; and determine presence of the one or more second attributes from the received second dataset of data based on a processed second data, said processed second data obtained from processing, by the one or more processors, the received second dataset of data based on a second parameter, wherein, based on presence of any or a combination of the one or more first attributes and the one or more second attributes, a binary output is provided pertaining to the event.
[17] In an embodiment, a log of data pertaining to any or a combination of the first dataset of data and the second dataset of data is provided as output.
[18] In another embodiment, one or more third datasets of data are additionally provided from an external source, and wherein the one or more third datasets are additionally processed to provide the binary output.
[19] In another embodiment, the one or more first sensors are any or a combination of image sensors and depth sensors configured to detect presence and movement of the one or more elements involved in the event, and wherein the first dataset comprises data from either or both of image sensors and depth sensors.
[20] In another embodiment, the one or more second sensors are audio sensors configured to detect sounds produced due to engagement of any or all of the one or more elements involved in the event.
[21] In another embodiment, a neural network is configured to process any or a combination of the received first dataset of data and the received second dataset of data.
[22] In another embodiment, the neural network is trained based on any or a combination of a plurality of first training datasets and a plurality of second training datasets pertaining to the event, said first training datasets and second training datasets being stored in a database operatively coupled with the system.
[23] In another embodiment, the neural network is configured to predict engagement of the one or more elements involved in the event based on historical data pertaining to location of the one or more elements at a plurality of instances preceding the event.
[24] In an aspect, the present disclosure provides a method for providing a binary output based on input packets of data pertaining to an event, said method comprising the steps of: receiving, at a computing device from one or more first sensors operatively coupled with one or more elements involved in the event, a first dataset of data pertaining to one or more first attributes associated with the event; determining, at the computing device, presence of the one or more first attributes from the received first dataset of data based on a processed first data, said processed first data obtained from processing, by the one or more processors, the received first dataset of data based on a first parameter; receiving, at the computing device from one or more second sensors operatively coupled with one or more elements involved in the event, a second dataset of data pertaining to one or more second attributes associated with the event; and determining, at the computing device, presence of the one or more second attributes from the received second dataset of data based on a processed second data, said processed second data obtained from processing, by the one or more processors, the received second dataset of data based on a second parameter, wherein, based on presence of any or a combination of the one or more first attributes and the one or more second attributes, a binary output is provided pertaining to the event.
[25] In an embodiment, a log of data pertaining to any or a combination of the first dataset of data and the second dataset of data is provided as output.
[26] In another embodiment, one or more third datasets of data are additionally provided from an external source, and wherein the one or more third datasets are additionally processed to provide the binary output.
[27] In an embodiment, a neural network is configured to process any or a combination of the received first dataset of data and the received second dataset of data.
[28] In another embodiment, the neural network is trained based on any or a combination of a plurality of first training datasets and a plurality of second training datasets pertaining to the event, said first training datasets and second training datasets being stored in a database operatively coupled with the system.
[29] In another embodiment, the neural network is configured to predict engagement of the one or more elements involved in the event based on historical data pertaining to location of the one or more elements at a plurality of instances preceding the event.
[30] Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
BRIEF DESCRIPTION OF DRAWINGS
[31] The accompanying drawings are included to provide a further understanding of the present disclosure and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. The diagrams are for illustration only and thus do not limit the present disclosure.
[32] FIG. 1 illustrates exemplary implementation architecture of a system for providing a binary output based on input packets of data pertaining to an event, in accordance with an embodiment of the present disclosure.

[33] FIG. 2 illustrates an exemplary module diagram of the system for providing a binary output based on input packets of data pertaining to an event, in accordance with an embodiment of the present disclosure.
[34] FIG. 3A illustrates an input image from a video and FIG. 3B illustrates the network output as a result of 2D image segmentation in accordance with an embodiment of the present disclosure.
[35] FIG. 4A illustrates a simplified representation of the CNN and the FCN in accordance with an embodiment of the present disclosure.
[36] FIGS. 4B-4D show a visual transformation of the image frames across the network, together with the output of each network, in accordance with an embodiment of the present disclosure.
[37] FIG. 5A illustrates the network architecture representation of an encoder- decoder network in accordance with an embodiment of the present disclosure.
[38] FIGS. 5B-5C illustrate the transformation of a depth map across the network in accordance with an embodiment of the present disclosure.
[39] FIG. 6 illustrates a flow diagram of a sound detection system 600, in accordance with an embodiment of the present disclosure.

[40] FIG. 7 illustrates a flow diagram of a DSP block 700 within the sound detection system, in accordance with an embodiment of the present disclosure.

[41] FIG. 8 illustrates a flow diagram of the disclosed decision-making process 800, in accordance with an embodiment of the present disclosure.
[42] FIG. 9 illustrates an exemplary computer system in which or with which embodiments of the present disclosure can be utilized, in accordance with embodiments of the present disclosure.
DETAILED DESCRIPTION
[43] The following is a detailed description of embodiments of the disclosure depicted in the accompanying drawings. The embodiments are in such detail as to clearly communicate the disclosure. However, the amount of detail offered is not intended to limit the anticipated variations of embodiments; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present disclosure as defined by the appended claims.

[44] If the specification states a component or feature “may”, “can”, “could”, or “might” be included or have a characteristic, that particular component or feature is not required to be included or have the characteristic.
[45] As used in the description herein and throughout the claims that follow, the meaning of “a,” “an,” and “the” includes plural reference unless the context clearly dictates otherwise. Also, as used in the description herein, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
[46] Exemplary embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments are shown. These exemplary embodiments are provided only for illustrative purposes and so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those of ordinary skill in the art. The invention disclosed may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Various modifications will be readily apparent to persons skilled in the art. The general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Moreover, all statements herein reciting embodiments of the invention, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future (i.e., any elements developed that perform the same function, regardless of structure). Also, the terminology and phraseology used is for the purpose of describing exemplary embodiments and should not be considered limiting. Thus, the present invention is to be accorded the widest scope encompassing numerous alternatives, modifications and equivalents consistent with the principles and features disclosed. For purpose of clarity, details relating to technical material that is known in the technical fields related to the invention have not been described in detail so as not to unnecessarily obscure the present invention.
[47] The use of any and all examples, or exemplary language (e.g., “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the invention.

[48] The present disclosure relates to capturing packets of data-based information pertaining to decision making. In particular, the disclosure relates to packets of information obtained from one or more data acquisition units for decision making.
[49] FIG. 1 illustrates exemplary implementation architecture of a system for providing a binary output, in accordance with an embodiment of the present disclosure.
[50] In an embodiment, the proposed system 110 is a system for providing a binary output based on input packets of data, the data being obtained from a 3D segmentation unit and a sound detection unit that can have a bearing on providing a binary output. Although the present subject matter is explained considering that the system 110 is implemented as an application on a server 102, it would be appreciated that the system 110 can also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a server, a network server, a cloud-based environment and the like. It would be appreciated that the determination system 110 may be accessed by multiple users 106-1, 106-2... 106-N (collectively referred to as users 106 and individually referred to as the user 106 hereinafter), through one or more computing devices 108-1, 108-2... 108-N (collectively referred to as computing devices 108 hereinafter), or applications residing on the computing devices 108. In an aspect, the proposed determination system 110 can be operatively coupled to a website and is operable from any Internet-enabled computing device 108. Examples of the computing devices 108 may include, but are not limited to, a portable computer, a personal digital assistant, a handheld device, and a workstation. The computing devices 108 are communicatively coupled to the proposed determination system 110 through a network 104. It may also be understood that the proposed determination system 110 is a system for decision making by capturing packets of data-based information from multiple data acquisition units.
[51] In one implementation, the network 104 can be a wireless network, a wired network or a combination thereof. The network 104 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. Further, the network 104 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further, the network 104 can include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like.

[52] As discussed, the computing device 108 (which may include multiple devices in communication in a hard-wired or wireless format) may include at least one of the following: a mobile wireless device, a smartphone, a mobile computing device, a wireless device, a hard-wired device, a network device, a docking device, a personal computer, a laptop computer, a pad computer, a personal digital assistant, a wearable device, a remote computing device, a server, a functional computing device, or any combination thereof. While, in one preferred and non-limiting embodiment, the primary computing device 108 is a smartphone (which may include the appropriate hardware and software components to implement the various described functions), it is also envisioned that the computing device 108 be any suitable computing device configured, programmed, or adapted to perform one or more of the functions of the described system.
[53] FIG. 2 illustrates an exemplary module diagram of the system for providing a binary output based on input packets of data pertaining to an event, in accordance with an embodiment of the present disclosure.
[54] In an aspect, the system 110 can comprise one or more processor(s) 202. The one or more processor(s) 202 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, logic circuitries, and/or any devices that manipulate data based on operational instructions. Among other capabilities, the one or more processor(s) 202 are configured to fetch and execute computer-readable instructions stored in a memory 204 of the system 110. The memory 204 may store one or more computer-readable instructions or routines, which may be fetched and executed to create or share the data units over a network service. The memory 204 may comprise any non-transitory storage device including, for example, volatile memory such as RAM, or non-volatile memory such as EPROM, flash memory, and the like.
[55] The system 110 can also comprise an interface(s) 206. The interface(s) 206 may comprise a variety of interfaces, for example, interfaces for data input and output devices, referred to as I/O devices, storage devices, and the like. The interface(s) 206 may facilitate communication of system 110 with various devices coupled to the system 110 such as the input unit 102 and the output unit 104. The interface(s) 206 may also provide a communication pathway for one or more components of the system 110. Examples of such components include, but are not limited to, a visual data receiving unit 208, a visual data processing unit 210, an audio data receiving unit 212, an audio data processing unit 214, a determination unit 216, and an output unit 218 and a database 220. It would be appreciated that, the database 220 of the system 110 can be configured at a remote location say a cloud or a server.
[56] In an embodiment, the visual data receiving unit 208 is configured to receive a first dataset pertaining to a first attribute associated with an event to be analysed. The first dataset can be received from one or more sensors which can include any or a combination of image sensors and depth sensors.
[57] In another embodiment, the visual data processing unit 210 is configured to analyse the first dataset based on a first parameter.
[58] In another embodiment, the audio data receiving unit 212 is configured to receive a second dataset pertaining to a second attribute associated with the event to be analysed.
[59] In another embodiment, the audio data processing unit 214 is configured to analyse the second dataset based on a second parameter.
[60] In another embodiment, the determination unit 216 is configured to receive processed data from the visual data processing unit 210 and the audio data processing unit 214 and determine from them the presence of the first attribute and the second attribute respectively.
[61] In another embodiment, the output unit 218, based on the presence of the first attribute and the second attribute as determined by the determination unit 216, provides a binary output pertaining to the event being analysed.
[62] In another embodiment, the output unit 218 is configured to provide a log of relevant information pertaining to the first attribute and the second attribute, while providing the binary output.
[63] In another embodiment, the system 200 can be configured to receive additional input pertaining to the event from an external source, wherein the additional input can be considered to provide the binary output.
[64] In another embodiment, the system 200 can be configured to receive additional input pertaining to the event from an external source after the system 200 has provided the binary output. In this case, the binary output provided can be re-computed to provide a new binary output considering the additional input.
Working Example
[65] The following sections demonstrate working of the proposed system for providing a binary output by illustrating an exemplary implementation of the proposed system. The implementation relates to a cricket game, and in particular, to providing an “out” or “not out” decision for an event where a ball bowled hits one or both pads of the batsman when the batsman is standing in front of the wickets. The proposed system is implemented to determine if the ball is engaged by the bat before striking the one or both pads of the batsman, so as to adjudicate a leg-before-wicket (LBW) decision for the batsman. If yes, the result would be “not out”, and if no, the result would be “out”.
[66] It would be appreciated by those skilled in the art that the following illustration is solely to demonstrate the working of the proposed system and that the illustration should not be construed as a limitation to other potential implementations of the proposed system.
[67] In an exemplary implementation, the system can be applied for enabling decision making in real time (~2-3 sec).
[68] In another exemplary embodiment, the proposed system can be operatively coupled to one or more sensors to detect different aspects pertaining to the event such as a real time position of the batsman, the ball, the bat and position of the one or more fielders. The sensors can be any or a combination of audio sensors, video sensors and depth sensors. The audio sensors can be configured to detect different sounds as the batsman engages to play the ball with the bat.
[69] The following sections contain references to the following terms, defined hereunder:
• RGBD frame - a frame consisting of m x n pixels, with each pixel having four values designated RGBD.
• Image frame - a frame consisting of m x n pixels, with each pixel having three values designated RGB.
• Depth frame / Depth Map - a frame consisting of m x n pixels, with each pixel having a depth value designated D; the depth value corresponding to a pixel location (a, b) is the horizontal distance of the point on the object plane that is projected at (a, b) in the image plane, denoted d(a, b).
• Video / Image sequence - a collection of image frames which are in a chronological, sequential order.
• Depth map sequence - a collection of depth maps which are in a chronological, sequential order.
• RGBD sequence - a collection of RGBD frames which are in a chronological, sequential order.

[70] In an exemplary embodiment, the proposed system can use deep learning software to detect, identify and precisely locate a bat, a ball and a batsman’s body (including the various protection that he/she wears) in their three-dimensional configuration from an RGBD frame using a 3D segmentation technique. The 3D segmentation of the bat and the batsman’s body in real time are key contributions for decision making. The 3D segmentation of the bat, the ball and the batsman’s body is performed primarily for two major purposes: a) identifying the components in contact that produce the sound caught by an ultra-edge, and b) locating the point of contact of the ball with the batsman’s body, which is essential for finding the projected trajectory of the ball.
[71] In an exemplary embodiment, the ultra-edge audio and the 3D reconstructed bat, ball and batsman’s body are taken as input and fed to a computer program to identify any contact present between the bat/the gloves/the batsman’s body and the ball. Upon the system finding no contact between the bat/the gloves and the ball, the case for an LBW decision arises.
[72] In an exemplary embodiment, the system either automatically displays the trajectory of the ball (in case an AR headset is used) or directly conveys the final decision (i.e. OUT or NOT OUT) using other decision conveying devices instantly.
[73] In another exemplary embodiment, if a contact is found between the bat/the gloves and the ball, the AR headset automatically and instantly displays a few image frames from the captured image sequence around the moment the contact takes place, and there is no need for a manual check. A plurality of display devices could be used to convey the final determined decision.
3D Segmentation of Bat
[74] In an embodiment, the proposed method and system can be used for 3D segmentation of a bat.
[75] In another embodiment, the method and system can be used to segment off the cricket bat from the background of an RGBD frame using a deep learning network. The part of the bat covered with a ball or a batsman’s body is then reconstructed using a deep learning network to generate a 3D model.
[76] In another embodiment, the method and system facilitates taking decisions spontaneously and automatically. The decisions can be an LBW decision and a caught decision in a game of cricket.
[77] In another embodiment, taking the LBW and the caught decisions requires knowing if the bat has touched/edged the ball. The method and the device take the decisions based on whether the bat has touched the ball, by finding the position and orientation of the bat and the ball precisely in 3D.
[78] In another embodiment, the system identifies cricket bats of varying shape and size, and in various orientations of pose, tilt and zoom, in an image frame.
[79] In another embodiment, a part of the bat that is hidden by either the ball or a part of the batsman’s body is reconstructed to constitute a full 3D model.
[80] In another embodiment, the information that the system has access to (from a camera and a depth sensor) is an RGBD sequence, which contains the sequence of the images and corresponding depth maps while the bowler bowls and the batsman plays a shot.
[81] In another embodiment, a neural network is provided that identifies the footprint of the uncovered part of the bat. The neural network segments a 2D footprint of the bat from an image frame after being trained with thousands of images of bats with varying size, shape, pose, tilt and zoom.
[82] In another embodiment, the neural network performs supervised learning in the process of which, the network learns about key features of the cricket bat from many different looking samples of bats. Mathematically, the network predicts a probability value for every pixel being present within the footprint of the bat. All those pixels for which the probability value crosses a certain predesigned threshold constitute a 2D footprint of a bat.
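Purely by way of illustration, the per-pixel thresholding described above can be sketched as follows (a minimal example assuming the network's probabilities are available as a NumPy array; the names and the 0.5 threshold are illustrative assumptions, not part of the disclosure):

    import numpy as np

    def footprint_from_probabilities(prob_map, threshold=0.5):
        # Convert an m x n map of per-pixel bat probabilities into a
        # binary 2D footprint: 1 where the probability crosses the
        # threshold, 0 elsewhere.
        return (prob_map > threshold).astype(np.uint8)

    # Toy 3 x 4 probability map; the mask marks the likely bat footprint.
    probs = np.array([[0.1, 0.2, 0.7, 0.9],
                      [0.0, 0.6, 0.8, 0.4],
                      [0.1, 0.1, 0.3, 0.2]])
    mask = footprint_from_probabilities(probs)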
[83] FIG. 3A illustrates an input image from a video and FIG. 3B illustrates the network output as a result of 2D image segmentation in accordance with an embodiment of the present disclosure.
[84] In another embodiment, the neural network architecture can be an amalgamation of multiple fundamental neural networks such as but not limited to a convolutional neural network, commonly referred to as CNN and a fully convolutional network or FCN. The CNN predicts a rectangular region where the bat is most likely to be present. The FCN follows up by identifying the pixels from the rectangular region that would constitute the bat.
[85] In another embodiment, the output of the CNN can be a matrix of dimension B x 3, where B is the number of bounding boxes predicted per image, each having 3 attributes, namely the probability of the box containing the bat, the left-top coordinate of the box and the right-bottom coordinate of the box.
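As a hedged sketch only, such a B x 3 output might be consumed as below (the tuple layout, variable names and the 0.5 confidence cut-off are assumptions for illustration, not the disclosed format):

    # Each predicted box: (probability, (left, top), (right, bottom)).
    boxes = [(0.92, (120, 40), (260, 310)),
             (0.18, (10, 10), (50, 80))]

    def select_boxes(boxes, min_prob=0.5):
        # Keep only bounding boxes whose bat probability crosses the
        # confidence threshold; the FCN then segments within these.
        return [b for b in boxes if b[0] >= min_prob]

    likely_bat_regions = select_boxes(boxes)  # -> [(0.92, (120, 40), (260, 310))]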
[86] In another embodiment, the output of the FCN is a binary matrix of dimension m x n, where a value of 1 signifies a computed probability that is more than a threshold and 0 signifies a value that is less than the threshold.

[87] FIG. 4A illustrates a simplified representation of the CNN and the FCN in accordance with an embodiment of the present disclosure.
[88] FIGS. 4B-4D show a visual transformation of the image frames across the network, together with the output of each network, in accordance with an embodiment of the present disclosure.
[89] In another embodiment, a reconstruction engine can be provided that is fed with the output of the previous network along with a depth frame.
[90] In another embodiment, the depth frame consists of the depth values corresponding to every pixel of the RGB frame. Upon discovering the 2D footprint of the uncovered region of the bat, the depth values corresponding to those pixels are easily obtained from the depth frame, thus achieving the first part of the 3D reconstruction.
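A minimal sketch of this lookup, assuming the footprint mask and the depth frame are aligned m x n NumPy arrays (names illustrative):

    import numpy as np

    def visible_bat_points(mask, depth):
        # For every pixel inside the 2D footprint, read its depth value,
        # yielding (row, col, depth) triples: the visible part of the
        # bat in 3D.
        rows, cols = np.nonzero(mask)
        return np.stack([rows, cols, depth[rows, cols]], axis=1)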
[91] In another embodiment, the covered region of the bat can be reconstructed, i.e., the horizontal distances of the covered region of the bat are to be estimated; these are equal to the depth values the region would have had, had the ball not covered the bat.
[92] In another embodiment, the reconstruction engine can be made of a deep neural network, commonly referred to as an encoder-decoder network. The network transforms a depth frame to another depth frame, where the depth value of the obstructed region is replaced with estimated depth values of the bat.
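As a sketch of the encoder-decoder idea only (this toy PyTorch module is an assumed architecture for illustration, not the network actually disclosed):

    import torch.nn as nn

    class DepthInpainter(nn.Module):
        # Toy encoder-decoder: compresses a 1-channel depth frame and
        # decodes it back, learning to replace occluded depth values
        # with estimates of the bat's depth.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))

        def forward(self, depth_frame):  # shape (N, 1, H, W)
            return self.decoder(self.encoder(depth_frame))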
[93] In an embodiment, to illustrate the transformation of the depth frame, a heat map is shown at each of the beginning and the end of the network.
[94] FIG. 5A illustrates the network architecture representation of an encoder- decoder network in accordance with an embodiment of the present disclosure.
[95] FIGS. 5B-5C illustrate the transformation of a depth frame across the network in accordance with an embodiment of the present disclosure.
[96] In another embodiment, the output of the network is an m x n matrix (M) whose expected values are as follows:
[Piecewise definition of the matrix M, rendered as images in the original publication.]
[97] In another embodiment, the CNN, FCN, and encoder-decoder network, like many other machine learning methods, work by training with thousands of images like those of FIGS. 5B-5C.
[98] In another embodiment, the network parameters are tuned in iterative steps such that the network’s output images are similar, in some measurement, to the annotated images of the same input RGBD frames. The accuracy of the machine learning methods depends heavily on the number of training RGBD frames that are used to tune the parameters.
[99] In another embodiment, the cricket pitch can refer to the hard, bouncy planar surface over which the game is played. The pitch can be a bit over 22 yards in length and rectangular in shape.
[100] In an embodiment, the identification of the pitch can be done to a) enable the system to infer origin of the sound as the sound can be from the ball hitting the pitch, b) enable the identification of the batsman’s body (as explained in later embodiments).
[101] In an embodiment, an identification of the pitch can be performed. For this a neural network is built that takes in the image sequence as an input (as a bowler bowls the delivery). Each of the images contains m x n pixels, and every pixel has a corresponding RGB and a depth value.
[102] In another embodiment, for identification of the pitch, the neural network takes only the RGB values as input for every pixel. The neural network outputs the coordinates of the rectangle which encloses the pitch.
[103] In another embodiment, the neural network can be trained over tens of thousands of the images which contain the cricket pitches, while it outputs 4 points (or pixel locations).

[104] Assume the points to be Ai’s, where i is 1, 2, 3 and 4. Let the actual points (the ground truth used while training the neural network) which enclose the rectangular pitch be Bi’s. Let Ai(x) and Ai(y) be the x and y coordinates of the i-th point in set A respectively. Let Bi(x) and Bi(y) be the x and y coordinates of the i-th point in set B respectively. The training of the neural network is done so as to minimise the objective function:
Σ (i = 1 to 4) [ (Ai(x) - Bi(x))^2 + (Ai(y) - Bi(y))^2 ]
Since the neural network identifies the 4 points which enclose the pitch, it in effect identifies the rectangular pitch in the image in 2D. The pitch is identified in 3D by finding the depth values of a few pixels which lie in the rectangular area found by the neural network. Using the depth values and locations of these pixels, a 3D plane containing the pitch is reconstructed.
[105] In another embodiment, only the depth values and the locations of 3 pixels are required to construct a 3D plane. Let the set of points which are a part of the plane of the pitch belong to the set S.
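A short sketch of constructing a plane from three such pixels (assuming each pixel is expressed as an (x, y, depth) point; the function and variable names are illustrative):

    import numpy as np

    def plane_from_pixels(p1, p2, p3):
        # Each argument is (x, y, depth) for one pitch pixel. Returns
        # plane coefficients (a, b, c, d) with a*x + b*y + c*z + d = 0,
        # using the cross product of two in-plane vectors as the normal.
        p1, p2, p3 = map(np.asarray, (p1, p2, p3))
        normal = np.cross(p2 - p1, p3 - p1)
        d = -float(normal.dot(p1))
        return (*normal, d)

    a, b, c, d = plane_from_pixels((0, 0, 5.0), (1, 0, 5.0), (0, 1, 5.1))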
3D Segmentation of Batsman’s Body
[106] In an embodiment the proposed method and the system can be used to find the location of the ball hitting the human body, which is crucial for determining the LBW decisions. The batsman’s body is segmented off the background of an RGBD frame using a deep learning model.
[107] In an embodiment, the LBW decision making can be done by effectively determining the point in the 3D space, where the ball hits the batsman’s body (or the protection the batsman wears) for the first time after the ball is bowled.
[108] In another embodiment, the projected trajectory of the ball can be predicted from the actual trajectory of the ball before it hits the batsman’s body. The (x, y, z) position on the batsman’s body can be found in a 3D space. The location and depth (z) information of the points on the batsman’s body visible from a front-on vision can be determined by identifying the pixels which (a sketch of this filtering follows the list):
a. do not belong to the set of pixels where the bat and ball have been identified,
b. satisfy | z - depth value at the pixel | > threshold, where z is the depth of the stumps, and
c. do not belong to set S.
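The sketch below applies the three criteria as boolean masks (a minimal illustration assuming aligned m x n arrays; the names and exact comparisons are assumptions):

    import numpy as np

    def body_pixel_mask(depth, bat_ball_mask, pitch_mask, z_stumps, threshold):
        not_bat_ball = bat_ball_mask == 0                     # criterion (a)
        depth_differs = np.abs(z_stumps - depth) > threshold  # criterion (b)
        not_pitch = pitch_mask == 0                           # criterion (c): set S
        return not_bat_ball & depth_differs & not_pitch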
[109] In another embodiment, the pixels that satisfy the above-mentioned criteria belong to set A (as explained in later embodiments). However, a case can arise wherein a part of the body where the ball hits is close to the ball and is hidden by the bat/the ball, and hence the minimum distance between the body and the ball cannot be found out correctly, as we cannot get the depth values of that hidden part of the body from the front-on vision. To overcome this problem, we apply a method to find out the location and the depth of the hidden part of the batsman’s body, which may be potentially close to the ball. The depth map is drawn in a 3D space, where the depth value corresponding to every pixel is on the z-axis, while the pixel locations correspond to points on the x and y axes.
[110] In another embodiment, a depth gradient is found for the constructed depth map for the pixels belonging to the set A (as explained in later embodiments) and which are in the region surrounding the ball. A small value of the gradient suggests a smooth curve and is reflective of the fact that there are no hidden body parts at these pixels. However, when the gradient is large, it reflects that there are covered body parts at these pixels. The set of pixels whose gradient exceeds a certain value is sent (along with the depth values at these pixels) through a reconstruction network. The reconstruction network outputs the depth values of the hidden body parts.
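A hedged sketch of this gradient test (np.gradient and the threshold stand in for whatever gradient operator the disclosure contemplates):

    import numpy as np

    def occlusion_candidates(depth_map, body_mask, grad_threshold):
        # Flag body pixels whose depth gradient is large: a large
        # gradient suggests a body part hidden behind the bat/ball,
        # and these pixels are sent to the reconstruction network.
        gy, gx = np.gradient(depth_map.astype(float))
        grad_mag = np.hypot(gx, gy)
        return (grad_mag > grad_threshold) & (body_mask > 0)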
[111] In another embodiment, the reconstruction network is another neural network, which is trained over tens of thousands of depth map sequences, each containing body movements of the batsman for various shots he/she plays. The depth map sequences comprise many depth maps, and each depth map is made up of m x n pixels, with every pixel having a corresponding depth value. A few intermediate frames in every depth map sequence are selected, and parts of the body are occluded using the bat or the ball in these depth maps.
[112] In another embodiment, it is observed that the occluded parts show depth values different from those of the body part which is occluded. However, the depth values of the body parts which are occluded are also known for these selected depth maps. The reconstruction neural network takes in the depth maps which were not occluded (which were not selected as the intermediate depth maps) as the input and predicts the depth values of the body parts which are occluded in the intermediate frame. This is done by taking a “weighted average” of the depth maps before and after the intermediate depth maps to predict the depth value of the occluded body part in the intermediate depth maps, such that the predicted depth values “closely match” the actual depth values of the occluded body parts in the selected depth maps.
[113] In another embodiment, training the neural network involves finding the values of the weights which are convolved with the non-selected depth maps to predict the depth values of the selected depth maps. As an example, when there are 5 depth maps in a depth map sequence, and in the 3rd depth map the batsman’s belly is covered by the bat, then the pixels in the 3rd depth map can be predicted using the following formula:
predicted f3 = w1*f1 + w2*f2 + w4*f4 + w5*f5
where f1, f2, f3, f4, f5 are the depth maps which contain the depth value at every pixel of the parts of the body. While training, we minimise the following objective function:
Σ over all pixels [ (predicted f3) - (actual f3) ]^2
Once the training is done to minimise the above objective function, we have a set of w1, w2, w3, etc. which can be used to predict the depth map where the body parts are occluded in a new depth map sequence, closely matching the actual depth values of the body parts.
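A minimal numeric sketch of this weighted-average prediction and its training objective (the weight indexing and the squared-error form are assumptions consistent with the example above):

    import numpy as np

    def predict_occluded(non_occluded_maps, weights):
        # e.g. predicted f3 = w1*f1 + w2*f2 + w4*f4 + w5*f5, a weighted
        # average of the depth maps before and after the occluded frame.
        return sum(w * f for w, f in zip(weights, non_occluded_maps))

    def training_loss(predicted, actual):
        # Squared error between predicted and actual depth values of
        # the occluded frame -- the objective minimised while training.
        return float(np.sum((predicted - actual) ** 2))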
Sound Detection
[114] In an embodiment, the proposed method and the system can be used to detect and determine whether a waveform coming from a microphone contains a cricketing sound, automatically and without any human intervention. This is in contrast to the existing solutions that require implanting the microphone in cricketing stumps to record the various sounds. A third umpire then gets to see a filtered waveform of the recorded signal. The third umpire looks at the waveform and uses his manual intuition (via observing some disturbance in the waveform) to decide if there is the cricketing sound or not. In addition, the audio captured by the stump microphone present in a cricket ground is laden with ambient noise, crowd noise and any other undesirable noises (all such noises can be collectively referred to as ‘stadium noise’) in addition to ‘the cricketing sound’. At certain instances, the stadium noise can have a higher magnitude than the cricketing sound and can simultaneously have similar frequency components to those of the cricketing sound. When these conditions occur simultaneously, it becomes difficult to separate the sounds and thereby detect the actual occurrence of the cricketing sound.

[115] In an embodiment, the proposed method and the system can effectively eliminate stadium noise of up to five times the magnitude of the cricketing sound, primarily using a technique of spectrogram analysis, followed by high-pass filtering.
[116] FIG. 6 illustrates a flow diagram of a sound detection system 600, in accordance with an embodiment of the present disclosure. As shown, at block 602 a summation of the cricketing sound and the stadium noise (comprising the crowd’s noise at the stadium and the ambient noise) is captured. The captured sound is fed into the single microphone at block 604. The digital signal processing (DSP) block at block 606 processes the captured sound, and the instants of the cricketing sound are evaluated for multiple time frames, such as but not limited to 100-120 ms or 200-205 ms, at block 608.
[117] FIG. 7 illustrates a flow diagram of a DSP block 700 within the sound detection system, in accordance with an embodiment of the present disclosure.
[118] As illustrated in FIG. 7, an input waveform is received at block 702, which is passed through a spectrogram at block 704. The output waveform received from the spectrogram is inputted to a high-pass filter at block 706. The waveform is normalized at block 708 by applying signal normalization. The normalized signal received at block 708 is sent as input to a rolling window counter at block 710, where L is the rolling window length. At block 712 it is evaluated whether the rolling counter at any instance has a value greater than 0.5. If the value of the rolling counter is greater than 0.5, it is concluded at block 714 that a cricketing sound is detected at that instant of time; else, at block 716 it is concluded that no cricketing sound is detected at that instant.
[119] In another embodiment, at block 718 the cricketing sound detected at various time instants is pooled so as to form disjoint periods. The multiple periods of the cricketing sounds can be, but are not limited to, 100-120 ms and 200-205 ms, as shown at block 720.
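The chain of blocks 702-720 can be sketched as below (every numeric parameter here -- filter order, cut-off, window length and the 0.5 levels -- is an illustrative assumption, and the spectrogram-analysis step is abstracted into the filtering):

    import numpy as np
    from scipy import signal

    def detect_cricket_sound(waveform, fs, counter_len=50):
        # High-pass filter the input (blocks 704-706).
        b, a = signal.butter(4, 2000 / (fs / 2), btype='high')
        filtered = signal.lfilter(b, a, waveform)
        # Normalise to unit peak amplitude (block 708).
        normalised = filtered / (np.max(np.abs(filtered)) + 1e-12)
        # Rolling-window counter (block 710): fraction of recent
        # samples above a level, compared against 0.5 (blocks 712-716).
        above = (np.abs(normalised) > 0.5).astype(float)
        counter = np.convolve(above, np.ones(counter_len) / counter_len, 'same')
        return counter > 0.5  # True where a cricketing sound is flagged

    def detection_periods(flags, fs):
        # Pool consecutive flagged samples into disjoint periods in
        # seconds (blocks 718-720).
        padded = np.concatenate(([0], flags.astype(int), [0]))
        edges = np.flatnonzero(np.diff(padded))
        return [(s / fs, e / fs) for s, e in zip(edges[::2], edges[1::2])]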
Sound Recognition
[120] In an embodiment, the proposed method and the system, upon detecting the sound, can determine the origin of the sound based on various inputs such as the position of the bat, the ball and the batsman’s body (including the protection he/she wears) and the microphone input.
[121] In an embodiment, when the bowler bowls a delivery and the batsman plays a shot, the microphone (or the snick-o-meter) records the waveform of the sound that may be produced. The multiple sounds which are produced such as the cricketing sounds and the time instants at which they are produced are determined. These cricketing sounds can include sounds due to:
[Enumerated list of seven cricketing sound sources (referred to as points 1-7 below), rendered as an image in the original publication.]
[122] In an embodiment, the sounds determined at points 1 and 2 are referred to as sound A and the ones determined at points 3-5 are referred to as B. The sound determined at point 6 is referred to as C, and at point 7 as D. To classify the cricketing sound, the minimum distances between the ball and the bat, and between the ball and the batsman’s body, are found at the instant the cricketing sound is detected.
[123] In an embodiment, the below-mentioned factors are considered for determining the cricketing sound based on the distances between the bat and the ball and between the ball and the batsman:
[Table of distance-based factors for classifying the cricketing sound, rendered as an image in the original publication.]
Decision Making Process
[124] In an embodiment, the proposed method and the system are provided for decision making. This requires finding out in what sequence the above sounds have occurred, because identifying the sequence in which various processes (like the ball hitting the bat or the ball hitting the pad) have occurred is at the heart of the decision making. There could primarily be multiple combinations (or sequences) of sounds that could occur, to determine whether the batsman is a potential candidate for an LBW or caught decision.
[125] In an embodiment, the following combinations or sequence of the sounds can occur:
[List of the possible sound combinations (sequences) a-p, rendered as an image in the original publication.]
[126] In another embodiment, sound D can also occur in any order in any of the sequences above. However, the presence of the sound D doesn’t affect the cricketing decision of the disclosed method and system, or the decision that the umpire would give.
[127] In another embodiment, whenever the batsman is a candidate for a “caught decision”, we refer to it as scenario 1. Whenever the batsman is a candidate for a “Leg Before Wicket decision”, we refer to it as scenario 2. Whenever the batsman is a candidate for both a “caught and LBW decision”, we refer to it as scenario 3. Whenever the batsman is neither a candidate for a “caught decision”, nor a candidate for an “LBW decision”, we refer to it as scenario 4. The multiple mentioned scenarios are discussed as:
Scenario 1:
Case when the sounds a, d, f or o have occurred. In these cases, there is a contact between the bat and the ball and no contact between the pitch and the ball immediately after the ball hits the bat, and hence the combinations of sounds in a, d, f or o make the batsman a candidate for a “caught decision”.
Scenario 2:
Case when the sounds b, i, j, l or m have occurred. In these cases, there is a contact between the ball and the batsman’s body and no contact between the ball and the bat before the ball hits the batsman’s body, and hence a combination of the sounds in b, i, j, l or m makes the batsman a candidate for the “LBW decision”.
Scenario 3:
Case when the sounds e, n or p have occurred. In these cases, there is a contact between the ball and the batsman’s body (before the ball hits the bat), which makes the batsman a candidate for the “LBW decision”. There is also a contact between the bat and the ball (after the ball hits the batsman’s body), which makes the batsman a candidate for a “caught decision”.
Scenario 4:
Case when the sounds c, g, h or k have occurred. In these cases, either there is no contact between the bat and the ball, or the ball hits the pitch immediately after the ball hits the bat. This makes the batsman out of contention for the caught decision. Also, either there is no contact between the ball and the batsman’s body, or the contact occurs after the ball has hit the bat. This makes the batsman out of contention for the “LBW decision”.
[128] In another embodiment, in Scenario 2 and Scenario 3, wherein the batsman is a candidate for the LBW decision, the projected trajectory of the ball is calculated based on the actual trajectory of the ball before the ball hits the batsman’s body. The point at which the ball hits the batsman’s body is referred to as the point of impact. The projected trajectory is drawn from the point of impact up to the plane of the stumps. If the projection of the point of impact onto the plane of the stumps lies inside the stumps, the impact is said to be in-line. If the impact is in-line, the projected trajectory of the ball is hitting the stumps and the ball has not pitched outside leg, the batsman is out LBW and we refer to it as scenario 2A or 3A. In case the ball is not hitting the stumps, or the impact is not in line, or the ball has pitched outside leg, we refer to it as scenario 2B or 3B. Based on the scenario that fits the situation as per the determined captured information, the necessary information is transmitted to the on-field umpire spontaneously using a decision conveying device.
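Purely as an illustrative sketch, a linear extension of the trajectory from the point of impact to the stumps' plane and the in-line test could look as follows (the straight-line projection and all names are simplifying assumptions; the disclosure's trajectory model may differ):

    import numpy as np

    def project_to_stumps(impact, direction, z_stumps):
        # Extend the trajectory from the point of impact (x, y, z)
        # along `direction` until it reaches the stumps' plane
        # (direction[2] is assumed non-zero).
        impact = np.asarray(impact, float)
        direction = np.asarray(direction, float)
        t = (z_stumps - impact[2]) / direction[2]
        return impact + t * direction

    def is_hitting_stumps(point, x_range, y_top):
        # In-line / hitting test: the projected point must lie within
        # the stumps' lateral extent and below the top of the bails.
        x, y, _ = point
        return x_range[0] <= x <= x_range[1] and y <= y_top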
[129] In an embodiment, the decision conveying device can be such as but not limited to an augmented reality headset or tablet, an earphone, a smart watch etc.
[130] In another embodiment, in case of scenario 1 and scenario 3B, an oversize screen such as those present in most cricket grounds can be used to convey a possible contact between the bat and the ball. In case of scenario 2A and 3A, the oversize screen displays that the batsman is OUT-LBW, and in the other scenarios, the screen displays that the batsman is NOT-OUT.
[131] In another embodiment, single or multiple cameras can be used to obtain high resolution RGB frames from the instance the bowler bowls a delivery till the time the batsman completes playing the ball (either plays or misses the ball).
[132] In another embodiment, an active or passive depth sensor or the camera can be used. The RGB cameras capture colour and shade information of an object. The depth camera captures the depth information. Multiple types of the depth sensors that can be used are: 1) active depth sensors that are based on time-of-flight or structured light imaging. Based on the wavelength of the light used, the sensor could be a LIDAR (motion-based or solid state), infrared depth sensor, etc. 2) passive depth sensors based on stereo or hyper stereo vision, and they give the value of depth or the relative depth of each point in the depth map.
[133] In another embodiment, single or multiple GPUs or CPUs take input from the multiple sensors and process the information to convey the decision using a decision conveying device to the umpire/players. The GPUs/CPUs can be either wired into the umpire’s pocket or are present in the cloud with an allocated and dedicated bandwidth through which the inputs from the sensors are transferred to the cloud.
[134] In another embodiment, an MSU (motion sensing unit) is provided. The MSU is attached to an augmented reality (AR) headset. It consists of an accelerometer and a gyroscope. The accelerometer is MEMS-based circuitry that enables tracking the linear acceleration of the AR Lens. The gyroscope, on the other hand, is adept at tracking the rotational motion of the headset. The purpose of the MSU is to provide stability to the rendering on the AR Lens.
[135] In another embodiment, a Wi-Fi antenna is provided. The Wi-Fi antenna is attached to the camera module and the MSU. The data is collected from the RGB and the depth cameras in the form of pixels, and from the MSU in the form of linear and angular acceleration, and can be transmitted via the Wi-Fi antenna over a dedicated bandwidth through the HTTP protocol to a GPU server hosted in a nearby location.
[136] FIG. 8 illustrates a flow diagram of the disclosed decision-making process 800, in accordance with an embodiment of the present disclosure. At block 802, visual information is captured by using the RGB camera and the depth sensors. The captured information is passed through a neural network at block 804 to determine the 3D position of the bat, the ball and the cricket pitch, even for occluded parts, at block 806. Additionally, at block 808 a segmentation method is used that facilitates determining the 3D location of the points on the batsman’s body that are visible from the front-on vision at block 810.
[137] In another embodiment, the neural network takes the depth map sequence of the non-occluded body parts as input at block 812 and determines the 3D location of the points on parts of the body occluded by either the bat or the ball in every depth map, at block 814.
[138] In another embodiment, the 3D positions of the bat, the ball and the pitch from block 806 are used, along with the 3D location of the points on the part of the body occluded by either the bat or the ball in every depth map, to determine the 3D mapping of points on the bat, the ball, the pitch and the batsman’s body including the occluded parts, as shown at block 816.
[139] In another embodiment, at block 818 the audio data is captured, and the sound is detected at block 820. This captured audio and sound data is used along with the 3D mapping of the points at block 816, to determine a closest distance among connected subsets (d1, d2, d3) at block 822. At block 824, the origin of the sound is recognised, and at block 826 an analysis of the sequence of origin of the sounds is done. At block 828, the multiple discussed scenarios are considered to determine the decision for the batsman. The determined decision is conveyed to the public using the oversized display device at block 830. The decision is executed for the batsman at block 832.
[140] The technical benefits achieved by the implementation of various embodiments of the present disclosure are as recited below:
[141] The present invention solves the above recited and other available technical problems by providing a determination system that uses an automated computational machine which operates programmatically and provides instant correct decisions instead of a human umpire making manual incorrect decisions.
[142] The above described aspects of the invention of the present disclosure can be potentially implemented in other cases, such as in other sports like football, rugby, American football, tennis, baseball, basketball, badminton etc. The following sections briefly describe some implementations of the invention of the present disclosure. It would be appreciated that the proposed invention of the present disclosure can be applied to other applications as well, and the embodiments and implementations described in the applications are exemplary ones and should not be construed as limitations.
Additional Working Examples
Football
The invention of the present disclosure can be implemented to determine events such as handball, foul, offside, goals etc. pertaining to Football. The system and method followed here is similar to the embodiments described above taking Cricket as the working example.
• 3D Segmentation of players and the ball.
• Binary Output provided based on the rules of football.
• Binary Output provided to a referee.
Tennis
The invention of the present disclosure can be implemented to determine events such as line calls, double bounce, body touch etc. pertaining to Tennis. The system and method followed here is similar to the embodiments described above taking Cricket as the working example.
• 3D Segmentation of players, racquet and the ball.
• Binary Output provided based on the rules of Tennis.
• Binary Output provided to a referee.
Baseball
The invention of the present disclosure can be implemented to determine events such as ball/strike, foul ball, forced out, tagged out, fly out etc. pertaining to Baseball. The system and method followed here is similar to the embodiments described above taking Cricket as the working example.
• 3D Segmentation of players, bat and the ball.
• Binary Output provided based on the rules of Baseball.
• Binary Output provided to a referee.

American Football
The invention of the present disclosure can be implemented to determine events such as down, touchdowns etc. pertaining to American Football. The system and method followed here is similar to the embodiments described above taking Cricket as the working example.
• 3D Segmentation of players and the football.
• Binary Output provided based on the rules of American Football.
• Binary Output provided to a referee.
Basketball
The invention of the present disclosure can be implemented to determine events such as shot clock violation, double dribble, foul, travelling, three in the key, charging, five second violation, three second violation etc. pertaining to Basketball. The system and method followed here is similar to the embodiments described above taking Cricket as the working example.
• 3D Segmentation of players and the basketball.
• Binary Output provided based on the rules of Basketball.
• Binary Output provided to a referee.
[143] As shown in FIG. 9, computer system includes an external storage device 910, a bus 920, a main memory 930, a read only memory 940, a mass storage device 950, communication port 960, and a processor 970. A person skilled in the art will appreciate that computer system may include more than one processor and communication ports. Examples of processor 970 include, but are not limited to, an Intel® Itanium® or Itanium 2 processor(s), or AMD® Opteron® or Athlon MP® processor(s), Motorola® lines of processors, FortiSOC™ system on a chip processors or other future processors. Processor 970 may include various modules associated with embodiments of the present invention. Communication port 960 can be any of an RS-232 port for use with a modem-based dialup connection, a 10/100 Ethernet port, a Gigabit or 10 Gigabit port using copper or fibre, a serial port, a parallel port, or other existing or future ports. Communication port 960 may be chosen depending on a network, such as a Local Area Network (LAN), Wide Area Network (WAN), or any network to which computer system connects.
[144] Memory 930 can be Random Access Memory (RAM), or any other dynamic storage device commonly known in the art. Read only memory 940 can be any static storage device(s) e.g., but not limited to, a Programmable Read Only Memory (PROM) chip for storing static information e.g., start-up or BIOS instructions for processor 970. Mass storage 950 may be any current or future mass storage solution, which can be used to store information and/or instructions. Exemplary mass storage solutions include, but are not limited to, Parallel Advanced Technology Attachment (PATA) or Serial Advanced Technology Attachment (SATA) hard disk drives or solid-state drives (internal or external, e.g., having Universal Serial Bus (USB) and/or Firewire interfaces), e.g. those available from Seagate (e.g., the Seagate Barracuda 7200 family) or Hitachi (e.g., the Hitachi Deskstar 7K1000), one or more optical discs, Redundant Array of Independent Disks (RAID) storage, e.g. an array of disks (e.g., SATA arrays), available from various vendors including Dot Hill Systems Corp., LaCie, Nexsan Technologies, Inc. and Enhance Technology, Inc.
[145] Bus 920 communicatively couples processor(s) 970 with the other memory, storage and communication blocks. Bus 920 can be, e.g. a Peripheral Component Interconnect (PCI) / PCI Extended (PCI-X) bus, Small Computer System Interface (SCSI), USB or the like, for connecting expansion cards, drives and other subsystems as well as other buses, such as a front side bus (FSB), which connects processor 970 to the software system.
[146] Optionally, operator and administrative interfaces, e.g. a display, keyboard, and a cursor control device, may also be coupled to bus 920 to support direct operator interaction with computer system. Other operator and administrative interfaces can be provided through network connections connected through communication port 960. External storage device 910 can be any kind of external hard-drives, floppy drives, IOMEGA® Zip Drives, Compact Disc - Read Only Memory (CD-ROM), Compact Disc - Re-Writable (CD-RW), Digital Video Disk - Read Only Memory (DVD-ROM). Components described above are meant only to exemplify various possibilities. In no way should the aforementioned exemplary computer system limit the scope of the present disclosure.
[147] It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms "includes" and "including" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C ... and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc. The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such specific embodiments for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practised with modification within the spirit and scope of the appended claims.
[148] While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.
ADVANTAGES
[149] The present disclosure provides a system and method for providing a binary output based on input packets of data pertaining to an event.
[150] The present disclosure provides a method and a system to make accurate decisions in real time using information obtained from one or more packets.
[151] The present disclosure provides a method and system in which decision making is performed by an automated computational system, such that the machine operates programmatically and provides instant, accurate decisions in place of manual, error-prone human decisions.

Claims

We Claim:
1. A system for providing a binary output based on input packets of data pertaining to an event, said system comprising:
a memory operatively coupled to one or more processors, the memory storing instructions executable by the one or more processors to:
receive, from one or more first sensors operatively coupled with one or more elements involved in the event, a first dataset of data pertaining to one or more first attributes associated with the event;
determine presence of the one or more first attributes from the received first dataset of data based on a processed first data, said processed first data obtained from processing, by the one or more processors, the received first dataset of data based on a first parameter;
receive, from one or more second sensors operatively coupled with the one or more elements involved in the event, a second dataset of data pertaining to one or more second attributes associated with the event; and
determine presence of the one or more second attributes from the received second dataset of data based on a processed second data, said processed second data obtained from processing, by the one or more processors, the received second dataset of data based on a second parameter,
wherein, based on presence of any or a combination of the one or more first attributes and the one or more second attributes, a binary output is provided pertaining to the event.
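By way of a non-limiting illustration, the sketch below reduces the decision flow recited in claim 1 to runnable form. Each dataset is treated as a plain numeric stream and each parameter as a simple threshold; these simplifications are assumptions introduced for illustration and not the claimed processing.

```python
# A minimal, hypothetical sketch of the decision flow recited in claim 1.
# Datasets are reduced to numeric streams and parameters to thresholds;
# these simplifications are assumptions, not the claimed processing.
from typing import Sequence

def attribute_present(dataset: Sequence[float], parameter: float) -> bool:
    """Determine presence of an attribute: here, whether any processed
    sample in the dataset reaches the parameter (a threshold)."""
    return any(sample >= parameter for sample in dataset)

def binary_output(first_dataset: Sequence[float], first_parameter: float,
                  second_dataset: Sequence[float],
                  second_parameter: float) -> bool:
    """Provide a binary output based on presence of any or a combination
    of the first and second attributes."""
    first = attribute_present(first_dataset, first_parameter)     # e.g. visual
    second = attribute_present(second_dataset, second_parameter)  # e.g. audio
    return first or second  # OR is one instance of "any or a combination"

# Example: detection scores from visual frames and audio energies.
print(binary_output([0.2, 0.9, 0.4], 0.8, [0.1, 0.3], 0.7))  # -> True
```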
2. The system as claimed in claim 1, wherein a log of data pertaining to any or a combination of the first dataset of data and the second dataset of data is provided as output.
3. The system as claimed in claim 1, wherein one or more third datasets of data are additionally provided from an external source, and wherein the one or more third datasets are additionally processed to provide the binary output.
4. The system as claimed in claim 1, wherein the one or more first sensors are any or a combination of image sensors and depth sensors configured to detect presence and movement of the one or more elements involved in the event, and wherein the first dataset comprises data from either or both of image sensors and depth sensors.
5. The system as claimed in claim 1, wherein the one or more second sensors are audio sensors configured to detect sounds produced due to engagement of any or all of the one or more elements involved in the event.
6. The system as claimed in claim 1, wherein a neural network is configured to process any or a combination of the received first dataset of data and the received second dataset of data.
7. The system as claimed in claim 6, wherein the neural network is trained based on any or a combination of a plurality of first training datasets and a plurality of second training datasets pertaining to the event, said first training datasets and second training datasets being stored in a database operatively coupled with the system.
8. The system as claimed in claim 6, wherein the neural network is configured to predict engagement of the one or more elements involved in the event based on historical data pertaining to location of the one or more elements at a plurality of instances preceding the event.
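By way of a non-limiting illustration of claims 6 to 8, the sketch below assumes a small feed-forward network that predicts engagement of two elements (e.g., the ball and the bat) from their locations at a plurality of instances preceding the event. The PyTorch framework, the architecture and all sizes are assumptions introduced for illustration, and the random tensors merely stand in for the training datasets that, per claim 7, would be retrieved from the coupled database.

```python
# A minimal, hypothetical sketch of claims 6 to 8: a neural network trained to
# predict engagement (e.g., ball-bat contact) from the locations of two
# elements at instances preceding the event. Framework and sizes are assumed.
import torch
import torch.nn as nn

class EngagementPredictor(nn.Module):
    def __init__(self, history_len: int = 8):
        super().__init__()
        # Input: (x, y, z) of two elements at each of `history_len` instances.
        self.net = nn.Sequential(
            nn.Linear(history_len * 2 * 3, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Sigmoid(),  # probability that engagement occurs
        )

    def forward(self, locations: torch.Tensor) -> torch.Tensor:
        # locations: (batch, history_len, elements=2, coords=3)
        return self.net(locations.flatten(start_dim=1))

# Training sketch: random tensors stand in for the stored training datasets.
model = EngagementPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()
histories = torch.randn(32, 8, 2, 3)           # location histories
labels = torch.randint(0, 2, (32, 1)).float()  # 1 = engagement occurred
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(histories), labels)
    loss.backward()
    optimizer.step()
```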
9. A method for providing a binary output based on input packets of data pertaining to an event, said method comprising the steps of:
receiving, at a computing device from one or more first sensors operatively coupled with one or more elements involved in the event, a first dataset of visual data pertaining to one or more first attributes associated with the event;
determining, at the computing device, presence of the one or more first attributes from the received first dataset of visual data based on a processed visual data, said processed visual data obtained from processing, by one or more processors of the computing device, the received first dataset of visual data based on a first parameter;
receiving, at the computing device from one or more second sensors operatively coupled with the one or more elements involved in the event, a second dataset of audio data pertaining to one or more second attributes associated with the event; and
determining, at the computing device, presence of the one or more second attributes from the received second dataset of audio data based on a processed audio data, said processed audio data obtained from processing, by the one or more processors, the received second dataset of audio data based on a second parameter,
wherein, based on presence of any or a combination of the one or more first attributes and the one or more second attributes, a binary output is provided pertaining to the event.
10. The method as claimed in claim 9, wherein a log of data pertaining to any or a combination of the first dataset of data and the second dataset of data is provided as output.
11. The method as claimed in claim 9, wherein one or more third datasets of data are additionally provided from an external source, and wherein the one or more third datasets are additionally processed to provide the binary output.
12. The method as claimed in claim 9, wherein a neural network is configured to process any or a combination of the received first dataset of data and the received second dataset of data.
13. The method as claimed in claim 12, wherein the neural network is trained based on any or a combination of a plurality of first training datasets and a plurality of second training datasets pertaining to the event, said first training datasets and second training datasets being stored in a database operatively coupled with the computing device.
14. The method as claimed in claim 12, wherein the neural network is configured to predict engagement of the one or more elements involved in the event based on historical data pertaining to location of the one or more elements at a plurality of instances preceding the event.
PCT/IB2019/056357 2018-07-26 2019-07-25 System and method for providing a binary output based on input packets of data WO2020039282A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201831028212 2018-07-26

Publications (1)

Publication Number Publication Date
WO2020039282A1 (en) 2020-02-27

Family

ID=69592477

Country Status (1)

Country Link
WO (1) WO2020039282A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217325A1 (en) * 2010-08-26 2016-07-28 Blast Motion Inc. Multi-sensor event analysis and tagging system
US9610476B1 (en) * 2016-05-02 2017-04-04 Bao Tran Smart sport device
US10010778B2 (en) * 2016-06-03 2018-07-03 Pillar Vision, Inc. Systems and methods for tracking dribbling and passing performance in sporting environments

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19853084

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19853084

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.02.2022)