WO2022195628A1

WO2022195628A1 - An artificial neural network based virtual air monitoring network system

Info

Publication number: WO2022195628A1
Application number: PCT/IN2022/050255
Authority: WO
Inventors: N Nithin SRIVATSAV; Pareekshith Us KATTI; Madhusudhan ANAND B
Original assignee: Datair Technology Private Limited
Priority date: 2021-03-16
Filing date: 2022-03-16
Publication date: 2022-09-22

Abstract

This invention discloses an artificial neural network based virtual air monitoring network system and method, for providing air quality data, comprising: receiving, a request signal, from a user's device, for monitoring air quality, at a geo-location data (x, y); forming a virtual network of stations (IS, TH, St); forming a virtual network of sensors (S1, S2, S3, …, Sn); polling, one or more stations to provide station-related data items; polling, sensors to provide sensor identity data; determining, and polling, parameters from a set of first parameters and a set of second parameters; training a neural network using said station-related data items (Dst, Sst), said sensor-related data items (Dsn, Ssn), said first parameter-related data (P1), said correlative first parameter distance (D1), said second parameter-related data (P2), said correlative second parameter distance (D2), in correlation with said geo-location data (x, y) to process air quality data (AQ) for said geo-location data.

Description

AN ARTIFICIAL NEURAL NETWORK BASED VIRTUAL AIR MONITORING NETWORK SYSTEM

FIELD OF THE INVENTION:

This invention relates to the field of environmental engineering and sensors.

Particularly, this invention relates to an artificial neural network based virtual air monitoring network system.

BACKGROUND OF THE INVENTION:

Ambient air monitoring is the systematic, long-term assessment of pollutant levels by measuring the quantity and types of certain pollutants in the surrounding, outdoor air.

As per the Greenpeace report, 22 of the world’s 30 most polluted cities are in India; however, there are only ~160 air monitoring stations in India, while the W.H.O. recommends 4000 stations. This lack of adequate monitoring means a lack of adequate data or insight into air pollution in the country. China, for example, has more than 8000 air monitoring stations (as per aqicn.org). This inadequacy is often seen as a hindrance to policy making and decision making, and hence many solutions are not fully built. Base data is needed for everything from city planning to traffic management to new drug development to clinical trials. Air quality data forms a base layer for many things, including human health, as 8 out of 10 deaths can be attributed to air pollution. India lost 8% of its GDP to air pollution. As Peter Drucker said, “if you cannot measure you cannot solve it”. Hence there is a pressing need to solve this lack of data and to make the data available to everyone. Globally, 7 million deaths are linked to air pollution (7 million deaths annually linked to air pollution., 2014). Air pollution increases the premature mortality rate by a significant margin (Lelieveld et al., 2015).

But each air monitoring station costs USD 150,000 to USD 180,000, and it is not economically viable to install 4000 stations. However, the existing air monitoring infrastructure can be combined with satellite data, with factors for human activities such as traffic, garbage burning, industrial source apportionment (Guttikunda et al., 2019) and population density, and with interpolation methodologies along with a feature-engineered neural network (and additional machine learning algorithms). This combination increases accuracy by including a virtual air monitoring station that uses data science to provide high spatio-temporal resolution and near real-time data, which solves this data problem.

Traditional air monitors are large and costly; the cost can go up to $200,000 per air monitoring station. They monitor PM2.5, PM1, PM10, SO2, NO2, CO and O3. They consume a lot of power and incur high maintenance and manpower costs. They often underperform, and their sensor accuracy degrades gradually with use. Sensor replacement and calibration are also time-consuming processes (taking up to 2 to 3 months).

Often only 4 or 5 such sensors are used; because of their cost. Therefore, they do not cover an entire city. They only cover a 150 sqm radius, while the size of a city is at least 1000 times that.

In India, we do have an understanding of sources of emission. In Delhi, as per (Guttikunda and Calori, 2013; Guttikunda et al., 2019), the contributions to PM2.5 and CO emissions were estimated to be 17% and 18% from vehicle exhaust, 16% and 31% from power plants, 15% and 12% from brick kilns, 14% and 15% from industries, and 12% and 14% from domestic sources, respectively. However, air monitoring does not have the coverage to predict air pollution across a city like Delhi. Direct measurements of air pollution and of sensitive population groups’ exposure are scarce, and therefore methods of interpolation to point and areal air quality estimations, based on region-specific physical factors, are a pre-requisite. For example, air pollution dispersion modelling that works in Bangalore may not work the same way in Delhi. Delhi has different weather conditions and meteorological factors, along with garbage burning, stubble burning and a clogged Indo-Gangetic plain. This highlights the importance of generating accurate fields of air pollution for estimating present and future health related risks.

There is a need for a system / network which is larger and provides greater accuracy and is fool proof.

PRIOR ART:

Until now, in the field of modelling for air pollution, two approaches have been adopted by the scientific community, differentiated by their geography and applied fundamental principles. A study for India that considers Indian conditions like population density, traffic, industrial pollution and meteorological factors does not exist. In one such research (Guttikunda et al., 2019), an Atmospheric Transport Modeling System (ATMoS) forward trajectory Lagrangian Puff-transport dispersion model was tried, to model PM concentrations (Calori and Carmichael, 1999), as was a modification of it that adds meteorological data (temperature, wind speed, wind direction, surface heat flux, precipitation etc.). However, accounting for human activities and traffic could improve the accuracy of this approach further, and it is hence an objective of this research to validate this using the methods detailed below. There are several studies on emission apportioning in Delhi (Guttikunda et al., 2019) and in Hyderabad (Guttikunda et al., 2013). In Delhi especially, for both PM2.5 and CO, diesel and biomass combustion account for major shares in the respective sectors. In the transport sector, freight movement via heavy duty and light duty trucks is the largest contributor. All the heavy-duty trucks are diesel operated and most of the light duty trucks are CNG operated. For NOx emissions, vehicle exhaust remains the dominant source (53%). For SO2, power plants (55%) and industries (23%), the largest coal users in the region, are the dominant sources. Diesel consumption for in-situ power generation from diesel generator sets in mobile phone towers, hotels, hospitals, large institutions, markets, malls, and apartment complexes contributes about 6% of PM2.5. Although the overall percentage of these emissions is small, when spatially segregated, these low-lying emissions are substantial, especially since they are located in densely populated areas.

PROBLEM STATEMENT:

Use of Spatial Interpolation Methodologies: Spatial interpolation is a methodical procedure used to estimate the values of the variable under study at unsampled locations, using observations at points in the same region. Such statistical methodologies of interpolation are applied in air pollution modelling for estimating the distribution of pollutants, based on ground station data from an existing air quality monitoring network. Air monitoring stations, as we know, are unevenly distributed, and hence the issue of coverage of air pollution measurement is essentially a problem in the field of scattered data approximation. There are a variety of spatial algorithms available in the literature as well as in many GIS or statistics software packages (Contreras and Ferri, 2016). The air monitors in Delhi, managed by the Central and State pollution control boards, are located to detect high concentrations; while their configuration is very appropriate for detecting high concentrations, they fail to describe the spatial variability of air pollution. Recent developments in the design and the modification methods of modelling address this issue (Pummakarnchana et al., 2005) (Oliver and Webster, 1990) (Wong et al., 2004; Marshall et al., 2008; Kethireddy et al., 2014; Noorpoor and Feiz, 2014) and go on to suggest improved methodologies for more reliable and cost effective spatial predictions.

However, such a study does not exist (for India) and, further, such studies do not validate physical and meteorological parameters.

Further, there is a need for a virtual air quality monitoring station, configured, at a point of request, using static data and variable data pertaining to the point of request.

OBJECTS OF THE INVENTION:

An object of the invention is to provide an air monitoring network system which covers a relatively larger area.

Another object of the invention is to provide an accurate air monitoring network system.

Yet another object of the invention is to provide a relatively inexpensive accurate air monitoring network system.

Still another object of the invention is to provide a robust and fool proof accurate air monitoring network system.

An additional object of the invention is to provide an air monitoring network system with a very high spatial resolution (of 10m x 10m).

Yet an additional object of the invention is to provide an air monitoring network system which provides better results than a statistical interpolation method.

Still an additional object of the invention is to provide an air monitoring network system which achieves error less than 10.

Another additional object of the invention is to provide an air monitoring network system which is hybrid in nature, in that, it uses columnar data from satellites when integrated with on-ground station data.

Yet another additional object of the invention is to provide an air monitoring network system which measures air quality in near real time, covering 1,50,000 postcodes in a country.

SUMMARY OF THE INVENTION:

The present disclosure may be a system, a method, and / or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.

Aspects of the disclosed embodiments may include tangible computer readable media that store software instructions that, when executed by one or more processors, are configured for and capable of performing and executing one or more of the methods, operations, and the like consistent with the disclosed embodiments. Also, aspects of the disclosed embodiments may be performed by one or more processors that are configured as special-purpose processor(s) based on software instructions that are programmed with logic and instructions that perform, when executed, one or more operations consistent with the disclosed embodiments.

According to this invention, there is provided an artificial neural network based virtual air monitoring network system, for providing air quality data, on a user’s device, said system comprising a computer processor configured with a non-transitory memory to store instructions configured to be executed on said computer processor, said instructions comprising the steps of:

- receiving, and recording, a request signal, from a user’s device, for monitoring air quality, at a geo-location, said geo-location being defined in terms of geo-location data (x, y);

- forming a virtual network of stations (IS, TH, St), within a pre-defined perimeter, correlative to said geo-location (x, y), said virtual network of stations comprising at least one target station and one or more variable stations;

- forming a virtual network of sensors (S1, S2, S3, ..., Sn), within a pre-defined perimeter, correlative to said geo-location (x, y);

- polling, one or more stations, from said formed virtual network of stations, closest to said geo-location data (x, y) from where said request signal was transmitted, in order to provide station-related data items comprising: station identity data, station distance data (Dst) of said station from said geo-location data (x, y), station air quality data (Sst) resident at said station;

- polling, one or more sensors, from said formed virtual network of sensors, closest to said geo-location data (x, y) from where said request signal was transmitted, in order to provide sensor identity data, sensor distance data (Dsn) of said sensor from said geo-location data (x, y), sensor air quality data (Ssn) sensed by said sensor;

- determining, and polling, one or more parameters from a set of first parameters, within a pre-defined perimeter, of said geo-location data (x, y) to obtain first parameter-related data (P1) and correlative first parameter distance (D1) with respect to said geo-location data (x, y);

- determining, and polling, one or more parameters from a set of second parameters, within a pre-defined perimeter, of said geo-location data (x, y) to obtain second parameter-related data (P2) and correlative second parameter distance (D2) with respect to said geo-location data (x, y); and

- training a neural network using said station distance data (Dst), said sensor distance data (Dsn), said station air quality data (Sst), said sensor air quality data (Ssn), said first parameter-related data (P1), said correlative first parameter distance (D1), said second parameter-related data (P2), said correlative second parameter distance (D2), all in correlation with said geo-location data (x, y) in order to process an output signal, delivered to said user’s device, of air quality data (AQ) for said geo-location data in response to said request signal.

In at least an embodiment, said step of training a neural network comprises the steps of:

- selecting at least a target station, from said virtual network of stations;

- selecting one or more variable stations, from said virtual network of stations;

- using Spatiotemporal Prediction Framework based on ANN along with Geospatial interpolation in order to provide air quality data (AQ) in correlation with said selected target station;

- computing a first station distance data from said target station to said geo-location data (x, y);

- computing a second station distance data from said sensor, from said virtual network of sensors, to said geo-location data (x, y);

- de-selecting said previously selected target station and re-classifying said previously selected target station to one or more variable stations;

- further selecting at least another target station, remaining from said virtual network of stations;

- using Spatiotemporal Prediction Framework based on ANN along with Geospatial interpolation in order to provide air quality data (AQ) in correlation with said selected target station;

- computing, iteratively, a next station distance data from said another selected target station to said geo-location data (x, y), till all stations in said virtual network of stations have been exhausted;

- computing a second station distance data from said sensor, from said virtual network of sensors, to said geo-location data (x, y); and

- computing a weighted mean from a plurality of said air quality data (AQ) to obtain a final air quality data (AQ) for said geo-location data (x, y).

In at least an embodiment, said step of training a neural network comprises the steps of:

- polling air quality data from said one or more pre-existing stations, from said virtual network of stations, said air quality data comprising PM2.5 values and PM10 values;

- polling air quality data from said one or more sensors, from said virtual network of sensors, said air quality data comprising PM2.5 values and PM10 values;

- building a linear regression model correlating said PM2.5 values with said PM10 values;

- filling missing values, of said air quality data, by:

  o using built linear regression model, populating missing PM2.5 values based on existing PM10 values;

  o using built linear regression model, populating missing PM10 values based on existing PM2.5 values;

  o using a temporal interpolation model, populating missing PM2.5 values and missing PM10 values;

  o using a backward filling model, populating missing PM2.5 values and missing PM10 values;

- determining said target station, from said virtual network of stations;

- determining said variable station, from said virtual network of stations, as features in said neural network;

- computing, for each target station, a distance value between each of said variable stations and said target station;

- creating a feature-engineered dataframe comprising said fields:

  o air quality data from said variable stations;

  o distance from each of said variable stations to said target station;

  o air quality data from said target station;

- training said neural network basis said created feature-engineered dataframe;

- obtaining time-stamped predicted values of air quality data, for said target station, as an output of said trained neural network;

- grouping said predicted values of air quality data basis said time-stamp; and

- computing a mean of said grouped predicted values of air quality data to obtain at least a final air quality data for said geo-location data.
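The missing-value filling steps of this embodiment can be illustrated with a minimal sketch. It assumes plain Python lists of equally spaced readings in which `None` marks a missing value; the function names (`fit_line`, `fill_from_other`, `interpolate_time`, `backfill`, `fill_missing`) are hypothetical and are not taken from the specification. The regression fit needs at least two rows with distinct values in which both PM2.5 and PM10 are present.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fill_from_other(target, source):
    """Fill None entries of `target` from `source` using a line fitted
    on the rows where both series have a value."""
    pairs = [(s, t) for s, t in zip(source, target)
             if s is not None and t is not None]
    a, b = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return [t if t is not None else (a * s + b if s is not None else None)
            for s, t in zip(source, target)]


def interpolate_time(values):
    """Linearly interpolate None gaps lying between two known readings."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            out[i] = out[lo] + (out[hi] - out[lo]) * (i - lo) / (hi - lo)
    return out


def backfill(values):
    """Replace each remaining None with the next known reading."""
    out = list(values)
    next_known = None
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = next_known
        else:
            next_known = out[i]
    return out


def fill_missing(pm25, pm10):
    """Apply regression fill, then temporal interpolation, then backfill."""
    filled25 = fill_from_other(pm25, pm10)  # PM2.5 from existing PM10
    filled10 = fill_from_other(pm10, pm25)  # PM10 from existing PM2.5
    return (backfill(interpolate_time(filled25)),
            backfill(interpolate_time(filled10)))
```

In this sketch a trailing gap (after the last known reading) stays `None`, because the embodiment names only backward filling.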

In at least an embodiment, said step of training a neural network comprises the steps of:

- for said virtual network of stations (n), determining:

  o at least one station (y) as said target station;

  o remainder stations (n-1) as said variable stations;

- determining number of neurons correlative to number of stations (n), from said virtual network of stations, in that, said correlation being equal to 2(n-1) stations, thereby providing 2(n-1) neurons, each target station having:

  o at least air quality readings;

  o distance values correlative to one or more variable stations from said virtual network of stations; and

- eliminating outliers determined by percentile values based on distance values to provide values correlative to said provided 2(n-1) neurons.
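As an illustration of how n stations yield 2(n-1) input values, the sketch below builds one feature row for a chosen target station: n-1 readings from the variable stations followed by n-1 distances to the target. The station dictionary layout, the field names and the function names are assumptions made for this example only, and outlier elimination is omitted.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(a))


def feature_row(stations, target_id):
    """Return (features, label) for one target station.

    `stations` maps a station id to {"lat", "lon", "pm25"} (assumed layout).
    `features` holds the n-1 variable-station readings followed by their
    n-1 distances to the target, i.e. 2(n-1) values in total.
    """
    target = stations[target_id]
    readings, distances = [], []
    for station_id in sorted(stations):
        if station_id == target_id:
            continue
        station = stations[station_id]
        readings.append(station["pm25"])
        distances.append(haversine_km(station["lat"], station["lon"],
                                      target["lat"], target["lon"]))
    return readings + distances, target["pm25"]
```

With four stations, for example, each row carries six input values and the target station's own reading as the label.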

In at least an embodiment, said step of training a neural network comprises the steps of:

- for said virtual network of stations (n), determining:

  o at least one station (y) as said target station;

  o remainder stations (n-1) as said variable stations;

- determining, for said target station, using at least two neurons having features, said features being:

  o PM2.5 value feature for said target station; and

  o station distance data (Dst) feature for said target station.

In at least an embodiment, said step of training a neural network comprises the steps of:

- for said virtual network of stations (n), determining:

  o at least one station (y) as said target station;

  o remainder stations (n-1) as said variable stations;

- determining, for said target station, using at least two neurons having features, said features being:

  o PM10 value feature for said target station; and

  o station distance data (Dst) feature for said target station.
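For the two-feature embodiments (a pollutant value and a station distance), the description of Figure 9 names a network with 2 inputs, hidden layers of 15, 20 and 15 neurons, and 1 output. The following is a minimal forward-pass sketch of a network with those layer sizes. The ReLU activation, the random initialisation and all function names are assumptions; training is not shown, so the output of an untrained network is not a meaningful air quality value.

```python
import random

# Layer sizes as named for Figure 9: 2 inputs, 15, 20, 15 hidden, 1 output.
LAYER_SIZES = [2, 15, 20, 15, 1]


def init_params(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, inputs):
    """Run one forward pass; ReLU on hidden layers, linear output."""
    activations = list(inputs)
    last = len(params) - 1
    for index, (weights, biases) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        activations = z if index == last else [max(0.0, v) for v in z]
    return activations[0]
```

A call such as `forward(init_params(LAYER_SIZES), [pm25_value, distance_km])` returns a single number; `pm25_value` and `distance_km` are placeholder names for the two features.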

In at least an embodiment, said set of first parameters comprises data corresponding to pre-identified traffic hotspots in said perimeter.

In at least an embodiment, said set of second parameters comprises data corresponding to pre-identified industrial sources in said perimeter.

In at least an embodiment, said air quality data for said geo-location data comprises PM2.5 values and / or PM10 values.

In at least an embodiment, said distance computation is haversine distance computation method.
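The haversine method gives the great-circle distance between two latitude/longitude points. A minimal sketch follows; the function name, the kilometre unit and the mean Earth radius constant are choices made for this example rather than details from the specification.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

The same function can serve for station-to-request, sensor-to-request and station-to-target distances, since each is a distance between two coordinate pairs.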

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS:

Figure 1 illustrates a typical Continuous Air Monitoring Station (CAMS) of the prior art;

Figure 2 illustrates size of a Continuous Air Monitoring Station, of the prior art;

Figure 3 shows, according to a non-limiting exemplary embodiment, an image of an Indian city, Bangalore; and

Figure 4 illustrates a typical data processing flow using CAMS, in accordance with prior art.

The invention will now be described in relation to the accompanying drawings, in which:

Figure 5 illustrates a typical data processing flow using virtual, artificially intelligent, network of this invention;

Figure 6 illustrates a typical data processing flow using virtual, artificially intelligent, network of this invention;

Figure 7 illustrates a flowchart for first neural network methodology;

Figure 8 illustrates a flowchart for data pre-processing for the first neural network methodology;

Figure 9 illustrates the first Neural Network methodology’s Architecture with 2 inputs, 15 neurons, 20 neurons, 15 neurons and 1 output;

Figure 10 illustrates Comparison of actual data with predicted data for Bangalore, the standard deviation is not resembling the actual distribution;

Figure 11 illustrates a first set of Descriptive Statistics correlative to the first methodology;

Figure 12 illustrates Comparison of actual data with predicted data for Delhi, results are much more similar to IDW but IDW is slightly better;

Figure 13 illustrates a second set of Descriptive Statistics correlative to the first methodology;

Figure 14 illustrates a flowchart for second neural network methodology;

Figure 15 illustrates a flowchart for Data Pre-processing of the second neural network methodology;

Figure 16 illustrates the second Neural Network methodology’s Architecture with 2(n-1) inputs, 15 neurons, 20 neurons, 15 neurons and 1 output;

Figure 17 illustrates Comparison of actual data with predicted data for Bangalore, Results are significantly better than previous approach and IDW;

Figure 18 illustrates a first set of Descriptive Statistics, for Bangalore, correlative to the second methodology;

Figure 19 illustrates Comparison of actual data with predicted data for Delhi, Results are better than previous approach and IDW; and

Figure 20 illustrates a second set of Descriptive Statistics, for Delhi, correlative to the second methodology.

DETAILED DESCRIPTION OF THE ACCOMPANYING DRAWINGS:

Figure 1 illustrates a typical Continuous Air Monitoring Station (CAMS) of the prior art.

Typically, a Continuous Air Monitoring Station (CAMS), of the prior art, employs physical sensors on the ground. A CAMS is a single instance monitoring station that monitors air quality at a single location. Its data is sent to a database, within a private network or via the internet, that is accessible on a webpage dashboard or a web application as per the structure shown in Figure 1. The size of a Continuous Air Monitoring Station is large, as seen schematically in Figure 2 of the accompanying drawings. Additionally, placement of these sensors is ad hoc and does not cover a defined geography efficiently. Moreover, the cost of operating a CAMS is about US$ 5,000 to US$ 8,000 per month, while the one-time cost of purchasing a CAMS is US$ 200,000.

Figure 3 shows, according to a non-limiting exemplary embodiment, an image of an Indian city, Bangalore, which is India’s 3rd most populous city with a population of 1.5 crore people; in this map, only 7 CAMS are installed, and their coverage is also shown in Figure 3. They mainly monitor PM2.5, PM10, PM1, O3, SOx, NOx, and CO.

Figure 4 illustrates a typical data processing flow using CAMS, in accordance with prior art. Accordingly,

1. Request is made for geo-location [latitude longitude (x, y)] for the time ‘t’

a. A correlated database is then scanned for the Air Quality parameters (AQ) for the requested location:

i. If found, it returns the output;

ii. Else, it returns the output of the nearest station from the requested geo-location [latitude longitude]

According to the afore-mentioned prior art, OUTPUT Function can be characterized as:

AQ(x, y, t) = SensorValues(S, d, t - Δt)

where,

x = latitude

y = longitude

t = time at which the request is placed

d = distance of the sensor from requested lat and lng

S = Sensor nearest to the requested lat lng

Δt = time gap between the requested time and the value available for that sensor.
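The prior-art output function can be read as a nearest-station lookup. The sketch below is one hedged interpretation: it returns the latest reading of the nearest station at or before the requested time, together with d and Δt. The record layout (`id`, `lat`, `lon`, `readings` as `(timestamp, value)` pairs), the numeric timestamps and the function names are all assumptions for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(a))


def prior_art_lookup(stations, x, y, t):
    """Return the nearest station's latest reading at or before time t.

    x is latitude and y is longitude, following the definitions above.
    Returns None when the nearest station has no reading up to t.
    """
    nearest = min(stations,
                  key=lambda s: haversine_km(x, y, s["lat"], s["lon"]))
    d = haversine_km(x, y, nearest["lat"], nearest["lon"])
    past = [(ts, value) for ts, value in nearest["readings"] if ts <= t]
    if not past:
        return None
    ts, value = max(past, key=lambda reading: reading[0])
    return {"station": nearest["id"], "aq": value,
            "d_km": d, "delta_t": t - ts}
```

The sketch makes the limitation visible: the returned value belongs to the station's location, not to the requested point, and d and Δt only report how far off it is in space and time.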

According to this invention, there is provided an artificial neural network based virtual air monitoring network system.

Figure 5 illustrates a schematic block diagram of a computing environment, used by the system and method, of this invention, comprising a networked server (150) and multiple stations (112, 114, 116, 118) and multiple sensors (212, 214, 216, 218) interfacing with the networked server (150), by means of a network, according to one embodiment of this invention.

A ‘station’ may include pre-existing stations, Industrial Sources of Air pollution (first set of parameters), Traffic Hotspots (second set of parameters). One or more stations, in a pre-defined perimeter, form a virtual network of stations.

A ‘sensor’ may include air quality sensors. One or more sensors, in a pre-defined perimeter, form a virtual network of sensors.

In at least an embodiment, the system (100) comprises an array of stations and an array of sensors.

The system may further include: a device for allowing an administrator (user) to read, write, store, retrieve, edit, modify, add to, update, delete, insert, upload, data mine, download, transfer, email, schedule, notify, alert, text message, or instant message the system, method, and data of this invention.

In one aspect of the invention, the processor may include logic which could be at least one of fuzzy logic, artificial intelligence (AI), a knowledge base (KB), a neural network, a decision support system (DSS), agent, software agent, or an expert system.

In at least an embodiment, the network may be any type of network (including infrastructure) that provides communications, exchanges information, and / or facilitates the exchange of information between the components of system (100). The network may be or be part of the Internet, a Local Area Network, wireless network (e.g., a Wi-Fi/802.11 network), or other suitable connections. In other embodiments, one or more components of system (100) may communicate directly through dedicated communication links, such as, for example, a telephone network, an extranet, an intranet, the Internet, satellite communications, off-line communications, wireless communications, transponder communications, a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), and so forth.

In at least an embodiment, a stations database comprises identities of stations per defined geography, each station identity being correlated with geo-location data, statistical data, and air quality data.

In at least an embodiment, a sensors database comprises identities of sensors per defined geography, each sensor identity being correlated with geo-location data, statistical data, and air quality data.

In at least an embodiment, a user, through their device, requests air quality data. In processing such a request, a request signal is generated from the user’s device along with geo-location data (x, y) of such request signal, which is transmitted to the networked server for further processing.

In at least an embodiment, virtual air monitoring stations, which form the virtual air monitoring network system, according to this invention, are a hybrid model of Spatiotemporal Prediction Framework based on ANN along with Geospatial interpolation. Therefore, deployment of these sensors solves all major problems concerning a large air monitoring station.

In a preferred embodiment, a defined sensor network (virtual network of sensors), comprising preferably about a hundred defined sensors, is deployed across a defined geography (such as a city) to cover the entire geography effectively. Data from such defined sensors is polled by a sensor module, communicably coupled to the defined sensor network, to provide sensor data. The one or more sensors are polled, for data, by the networked server.

In a preferred embodiment, a defined station network (virtual network of stations) comprises, preferably, one or more pre-existing stations, already / previously deployed across a defined geography (such as a city) to represent the entire geography. Data from such defined station/s is polled by a station module, communicably coupled to the defined station network, to provide station data. The one or more stations are polled, for data, by the networked server.

In at least an embodiment, a satellite module is configured to poll satellite data, in respect of the defined geography (such as a city) in order to obtain historical air quality data, weather data, and the like data; this set of data being a third set of data. In some embodiments, this data may also be resident at the defined station network correlative to corresponding station identity.

In at least an embodiment, a demographic module is configured to poll data such as socio-economic data of the given geography, data of human activities (construction, garbage burning, and the like) in the given geography, and the like data; this set of data being a fourth set of data.

In at least an embodiment, a computational module comprising a processor forms one or more virtual air sensor networks, the computation module comprising:

1. ANN Geospatial interpolation

2. ANN - Artificial Neural Network

3. RNN - Recurrent Neural Network

4. NN-GSI - Neural Network based Geospatial Interpolation

Figure 6 illustrates a typical data processing flow using virtual, artificially intelligent, network of this invention.

The computational module, which forms one or more virtual air sensor networks, receives a request made for geo-location data at a time ‘t’, in response to which the computational module is configured towards:

a. Creating at least a Virtual Air Monitoring Sensor Network (VAMS) at the requested geo-location data;

b. Identifying Traffic hotspots (set of first parameters) and Industrial sources of air pollution (set of second parameters);

c. Pulling data (sensor data) from the sensors nearby to the geo-location data from where the request is made;

d. Pulling data (third set of data) from the satellite module;

e. Pulling data (fourth set of data) from the demographic module;

f. Feeding the received variables into a Neural Network formed with the computational module for further processing;

g. Returning output of the Air Quality data (AQ) for the requested geo-location data.

In at least an embodiment, an OUTPUT Function, used by the neural network, is characterized as:

AQ(x, y, t) = Interpolation(S₁, S₂, S₃, ..., S_n, IS, TH, d, t)

where,

x = latitude (geo-location data)

y = longitude (geo-location data)

t = time at which the request is placed

d = distance of the sensor from requested geo-location data = 0

S_x = Sensor number nearest to the requested geo-location data

IS = Industrial Sources of Air pollution (first set of parameters)

TH = Traffic Hotspots (second set of parameters)

In at least an embodiment, air quality output data, now, according to this invention, is a function of:

- Geo-location data (x, y), from where request is made;

- Time data (t), at which request is made;

- Distance data (d), of a sensor closest to the geo-location data (x, y) from where request is made;

- Identity data, of Sensor (Sx), closest to the geo-location data (x, y) from where request is made;

- Sensor data, of Sensor (Sx), closest to the geo-location data (x, y) from where request is made;

- Difference in time (Δt), i.e. time gap, between Time data (t), at which request is made and latest Time Stamp data available in consonance with the sensor data, of sensor (Sx);

- First set of (constant) parameters, correlative, according to pre-defined parameters, to geo-location data (x, y), from where request is made, the first set of (constant) parameters providing a first set of data items;

- Second set of (variable) parameters, correlative, according to pre-defined parameters, to geo-location data (x, y), from where request is made, the second set of (variable) parameters providing a second set of data items.

In at least an embodiment, inputs, sensed and / or measured, by the virtual air quality monitoring system, of this invention are:

• Data from stations and distance from stations to the point of request (a pair of (Sst, Dst) for each station where Sst is the value of pollutant from station and Dst is the distance from the station to the point of request);

• Distance from nearest traffic hotspots (Dt);

• Distance from nearest government recognized industrial estates/areas (Di);

• Data from the sensor network and distance from sensors to the point of request (a pair of (Ssn, Dsn) for each sensor where Ssn is the value of pollutant from sensor and Dsn is the distance from the sensor to point of request)

Input to the machine learning algorithm (Neural Network): { (Sst, Dst), Dt, Di, (Ssn, Dsn) }.

In at least an embodiment, processing, by a computer processor, in response to the inputs, being:

• Data gathered is first processed by filling in missing data, removing anomalies, etc.

• If there are n stations in a city, one of them is chosen as the target, and the rest of the stations are chosen as the variables that are fed into the machine learning algorithm. Distance is calculated, using the haversine distance formula, from stations to the target and from sensors to the target. This yields the set { (Sst, Dst), Dt, Di, (Ssn, Dsn) } that can be fed to the model;

• Multiple models are trained by alternating the target station;

• The pollution value is predicted by taking a weighted mean from all the models.
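The alternating-target procedure and the weighted mean described in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this invention: the function names `leave_one_out_splits` and `weighted_mean` are hypothetical, and since the description does not state how the weights are chosen, they are left as a caller-supplied input.

```python
def leave_one_out_splits(station_ids):
    """Yield (target, features) pairs, alternating the target station."""
    for i, target in enumerate(station_ids):
        features = station_ids[:i] + station_ids[i + 1:]
        yield target, features


def weighted_mean(predictions, weights):
    """Combine per-model predictions into a single pollutant value."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(predictions, weights)) / total


stations = ["A", "B", "C"]  # hypothetical station identifiers
# Each station serves once as the target; the others act as features.
splits = list(leave_one_out_splits(stations))
# Hypothetical per-model predictions and weights (assumed values).
combined = weighted_mean([40.0, 50.0, 60.0], [1.0, 1.0, 2.0])
```

One model would be trained per split, and the predictions of those models would then be passed to `weighted_mean`.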

In at least an embodiment, output, by the computer processor, in response to the processing, being:

A Virtual Air Quality Monitoring Station is made at the point of request by using the above process; the output is the set of pollutant values for the point of request, derived by feeding the set { (Sst, Dst), Dt, Di, (Ssn, Dsn) } to the machine learning algorithms.

In at least an embodiment, a predictive neural network is envisaged for the computational module comprising a processor. A neural network based approach is preferred, as it takes distance into consideration and predicts particulate matter (PM2.5) values with a better mean absolute error than inverse distance weighting (IDW), a geospatial interpolation technique. Hence, the system and method can be envisaged in terms of a first neural network methodology and a second neural network methodology.
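For reference, the IDW baseline against which the neural network is compared can be sketched as below. This is a generic inverse distance weighting estimate, not code from this invention; the function name `idw_estimate` and the default exponent `power=2.0` are assumptions, as the description does not state which exponent was used.

```python
def idw_estimate(readings, distances, power=2.0):
    """Inverse distance weighted estimate from paired readings and distances."""
    if len(readings) != len(distances) or not readings:
        raise ValueError("readings and distances must be non-empty and paired")
    weights = []
    for value, dist in zip(readings, distances):
        if dist == 0:
            # The query point coincides with a station: use its reading.
            return value
        weights.append(1.0 / dist ** power)
    weighted_sum = sum(w * v for w, v in zip(weights, readings))
    return weighted_sum / sum(weights)
```

Closer stations receive larger weights, so a station at distance 1 counts four times as much as a station at distance 2 when the exponent is 2.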

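In at least an embodiment of the first neural network methodology, a neural network is trained using two features: the PM2.5 value of a station in the same city and its corresponding distance from a target station. Before training, the data is formatted so that it is suitable for training. Distance is calculated using the haversine distance formula. Feed-forward networks, a NALU (Neural Arithmetic Logic Unit) network, and a mixture of both were evaluated in order to select the best architecture; based on the results, a feed-forward network was chosen. The haversine distance d between two points is:

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

where φ₁ and φ₂ are the latitudes of the two points, λ₁ and λ₂ are their longitudes (all in radians), and r is the radius of the Earth.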
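The haversine distance named above can be computed as in the following minimal sketch. The function name `haversine_km` and the mean Earth radius of 6371.0 km are assumptions made for illustration; the description does not specify the radius or the unit used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

The same function would serve for station-to-target and sensor-to-target distances, since both are pairs of geo-locations.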

Outliers are treated before training, and each neural network is trained for 300 epochs. After 300 epochs, the results are averaged to get the predicted value. This neural network architecture had performance close to IDW but took longer to train.

The second approach differs from the first in how the data is formatted. Outliers are again treated before training. For training, the system took 2(n-1) features for n stations: 1 station was chosen as the target and the remaining n-1 stations were chosen as features.

Each station has 2 features:

1) PM2.5 value, and

2) Distance; therefore, making 2(n-1) features.

The neural network was trained for 100 epochs and there were significantly better results than IDW and the previous approach. The training time was also comparatively shorter than the previous approach.

Figure 7 illustrates a flowchart for first neural network methodology.

Figure 8 illustrates a flowchart for data pre-processing for the first neural network methodology.

In at least an embodiment of the first neural network methodology, the following steps are traversed:

STEP 1: Gather data from multiple existing stations using web scraping scripts and spiders and merge data by city;

STEP 2: For filling missing values,

a. Build a model to calculate PM2.5 values from existing PM10 values;

b. Build a model to calculate PM10 values from existing PM2.5 values;

c. For the rest of the missing values, use temporal interpolation to fill the missing values;

d. For those values which are still missing, use backward filling to fill the missing values;

STEP 3: Given n stations, choose one station as target denoted by ‘y’ feature and rest as features denoted by ‘x’

STEP 4: For each station i in x, calculate the distance between i and y using the haversine distance formula;

STEP 5: Remove all outlier values above the 99th percentile and below the 1st percentile;

STEP 6: Create a data frame containing 3 fields: reading from station x, distance from station x to y, and reading from station y. The first two fields will be the features for the neural network and the third field will be the target;

STEP 7: Train the neural network with the features that were engineered in the previous step;

Figure 9 illustrates the first Neural Network methodology’s Architecture with 2 inputs, 15 neurons, 20 neurons, 15 neurons and 1 output.

STEP 8: For prediction, for n stations, take n records containing reading and distance, and predict;

STEP 9: Calculate the mean of the n predictions to get the predicted reading value.
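Parts of the steps above can be sketched as follows: the gap filling of STEP 2 (c, d), the record layout of STEP 6, and the averaging of STEPS 8 and 9. This is a hedged illustration only. The function names are hypothetical, linear interpolation is assumed for the temporal interpolation, the regression models of STEP 2 (a, b) are omitted, and `model` stands in for the trained neural network as any callable taking a reading and a distance.

```python
from statistics import mean


def fill_missing(series):
    """Fill None gaps: linear temporal interpolation, then backward fill."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    # STEP 2c: interpolate linearly between consecutive known readings.
    for left, right in zip(known, known[1:]):
        span = filled[right] - filled[left]
        for i in range(left + 1, right):
            filled[i] = filled[left] + span * (i - left) / (right - left)
    # STEP 2d: backward fill copies the next known value into leading gaps.
    if known:
        for i in range(known[0]):
            filled[i] = filled[known[0]]
    return filled  # trailing gaps stay None; backward fill cannot reach them


def build_records(feature_readings, distances, target_reading):
    """STEP 6: one (reading, distance, target) record per feature station."""
    return [(r, d, target_reading) for r, d in zip(feature_readings, distances)]


def predict_mean(model, readings, distances):
    """STEPS 8-9: predict once per station record, then average."""
    return mean(model(r, d) for r, d in zip(readings, distances))
```

Each record carries the two network inputs (reading, distance) followed by the training target, matching the 2-input architecture of Figure 9.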

Using this, the system and method achieved a mean absolute error value of 10.1796 for the city of Bangalore with a correlation of 65%, and 19.8340 for the city of Delhi with a correlation of 89.3%. For Bangalore, this approach performed significantly better than IDW, with a 13% increase in correlation and a lower mean squared error; although IDW was slightly better for Delhi, this approach came close and was only slightly worse.
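The two evaluation measures quoted above can be computed as in the sketch below. The description does not state which correlation coefficient was used; the Pearson coefficient is assumed here, and both function names are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted readings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient (assumed measure of correlation)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

A correlation reported as 65% would correspond to a coefficient of 0.65 under this assumption.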

Figure 10 illustrates a comparison of actual data with predicted data for Bangalore; the standard deviation of the predictions does not resemble that of the actual distribution.

Figure 11 illustrates a first set of Descriptive Statistics correlative to the first methodology.

Figure 12 illustrates a comparison of actual data with predicted data for Delhi; the results are much more similar to IDW, but IDW is slightly better.

Figure 13 illustrates a second set of Descriptive Statistics correlative to the first methodology.

Figure 14 illustrates a flowchart for the second neural network methodology.

Figure 15 illustrates a flowchart for Data Pre-processing of the second neural network methodology.

In at least an embodiment of the second neural network methodology, the following steps are traversed:

STEP 1: Gather data from multiple existing stations using web scraping scripts and spiders and merge the data by city;

STEP 2: For filling missing values,

a. Build a model to calculate PM2.5 values from existing PM10 values;

b. Build a model to calculate PM10 values from existing PM2.5 values;

c. For the rest of the missing values, use temporal interpolation to fill the missing values;

d. For those values which are still missing, use backward filling to fill the missing values;

STEP 3: Given n stations, select 1 station as target ‘y’ and the remaining n-1 stations as features denoted by ‘x’;

STEP 4: For each station i in x, calculate the distance from the station to the target using haversine distance;

STEP 5: Create 2(n-1) features, where n is the number of stations, containing readings and distance values;

STEP 6: Remove all outlier values above the 99th percentile and below the 1st percentile;

STEP 7: Train the neural network with 2(n-1) features;

Figure 16 illustrates the second Neural Network methodology’s Architecture with 2(n-1) inputs (features), 15 neurons, 20 neurons, 15 neurons and 1 output.

STEP 8: For predicting an unknown value, take the n-1 nearest stations out of the n stations;

STEP 9: Calculate the distance from each station to the point using haversine distance;

STEP 10: Predict the values and evaluate.
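STEP 5 and STEP 6 above can be sketched as follows. This is an illustration under assumptions rather than the implementation of this invention: the function names are hypothetical, the interleaved ordering of readings and distances in the feature row is a choice made here, and the inclusive percentile method of Python's `statistics.quantiles` is assumed because the description does not say how percentiles are computed.

```python
from statistics import quantiles


def build_feature_row(readings, distances):
    """STEP 5: flatten n-1 (reading, distance) pairs into 2(n-1) features."""
    if len(readings) != len(distances):
        raise ValueError("each reading needs a matching distance")
    row = []
    for reading, distance in zip(readings, distances):
        row.extend([reading, distance])
    return row


def trim_outliers(values):
    """STEP 6: drop values below the 1st or above the 99th percentile."""
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return [v for v in values if low <= v <= high]
```

With n = 4 stations, three feature stations yield a row of 2(n-1) = 6 inputs, which matches the input width of the architecture in Figure 16.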

Using this, the system and method achieved a mean absolute error value of 8.023 for the city of Bangalore with a correlation of 77.6%, and 14.995 for the city of Delhi with a correlation of 93.7%. This approach was significantly better than IDW and the previous approach using neural networks for both the cities of Delhi and Bangalore.

Figure 17 illustrates a comparison of actual data with predicted data for Bangalore; the results are significantly better than the previous approach and IDW.

Figure 18 illustrates a first set of Descriptive Statistics, for Bangalore, correlative to the second methodology.

Figure 19 illustrates a comparison of actual data with predicted data for Delhi; the results are better than the previous approach and IDW.

Technical Advantages of this invention:

a. Reduces cost to one-thousandth of that of a current air monitoring station, which costs $200k;

b. No manpower required;

c. Covers an entire city with increased resolution and gives door-to-door, real-time air quality data;

d. Cloud calibration;

e. Same accuracy;

f. Easy to maintain, low cost, and scalable.

The TECHNICAL ADVANCEMENT of this invention lies in providing a predicting neural network based system and method wherein the system can predict air quality data across India and extend this method to every village, every remote part of a country, and provide this information for the betterment of the people.

The TECHNICAL ADVANCEMENT of this invention lies in providing a predicting neural network based system and method which combines existing air monitoring infrastructure and a proprietary sensor network, by factoring in human activities such as traffic and industrial sources, and combining them with interpolation techniques along with a feature-engineered neural network (and additional machine learning algorithms) in order to be able to, accurately, predict pollution at any point of request in time and space.

According to a non-limiting exemplary embodiment, the system and method is used in order to interpolate PM2.5 values using multiple interpolation methods for multiple resolutions. The system and method was able to get up to 11 m x 11 m resolution for Delhi and up to 3 m x 3 m resolution for Bangalore. The system and method compared various interpolation methods and concluded that IDW was the most consistent algorithm for both the cities. The system and method then compared IDW results with an IDW model trained with weather data using distance weighted k-Nearest Neighbours and also trained a neural network architecture to predict PM2.5 values using weather data. From the results, it was observed that weather data did not have a significant impact on pollutant values. The system and method used two neural network based interpolation techniques and compared them with the results of IDW. It was found that the second method with neural networks had a significant improvement in error as well as correlation compared to IDW.

The ECONOMIC SIGNIFICANCE of this invention lies in the following: in order to install 4000 air monitoring stations as per the recommendation from the World Health Organization (WHO), it would cost India $800,000,000 and about $2,000,000 in maintenance every year. However, using the system and method of this invention, this cost could be less than $10,000 a year, saving millions of lives and helping the country achieve the goals of the NCAP program.

Air pollution is a global problem. From France to the United States of America, from the Middle East to Africa and all of Asia, as a planet we are suffering from this problem so much so that over 4.2 million premature deaths could be avoided apart from productivity loss, economy loss and contribution to global warming. This invention can be scaled globally to extend monitoring of air quality down to every 3 square metres on the planet by combining Satellite data and existing on-ground air monitoring stations. For places where there are no air monitors, adoption of low-cost air monitoring sensors could bring a huge respite and a sensor network can thus be created. These low-cost air monitors cost under $300 and are also low on maintenance. Using this invention, answers can be achieved towards controlling emissions, supplying data for standards of the Environmental, Social and Governance (ESG), for mitigating climate risk, for better use of Carbon funds, carbon sequestration, emission assessment, and a lot more.

While this detailed description has disclosed certain specific embodiments for illustrative purposes, various modifications will be apparent to those skilled in the art which do not constitute departures from the spirit and scope of the invention as defined in the following claims, and it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

Claims

1. An artificial neural network based virtual air monitoring network system, for providing air quality data, on a user’s device, said system comprising a computer processor configured with a non-transitory memory to store instructions configured to be executed on said computer processor, said instructions comprising the steps of:

- receiving, and recording, a request signal, from a user’s device, for monitoring air quality, at a geo-location, said geo-location being defined in terms of geo-location data (x, y);

- forming a virtual network of stations (IS, TH, St), within a pre-defined perimeter, correlative to said geo-location (x, y), said virtual network of stations comprising at least one target station and one or more variable stations;

- forming a virtual network of sensors (S₁, S₂, S₃, …, S_n), within a pre-defined perimeter, correlative to said geo-location (x, y);

- polling, one or more stations, from said formed virtual network of stations, closest to said geo-location data (x, y) from where said request signal was transmitted, in order to provide station-related data items comprising: station identity data, station distance data (Dst) of said station from said geo-location data (x, y), station air quality data (Sst) resident at said station;

- polling, one or more sensors, from said formed virtual network of sensors, closest to said geo-location data (x, y) from where said request signal was transmitted, in order to provide sensor identity data, sensor distance data (Dsn) of said sensor from said geo-location data (x, y), sensor air quality data (Ssn) sensed by said sensor;

- determining, and polling, one or more parameters from a set of first parameters, within a pre-defined perimeter, of said geo-location data (x, y) to obtain first parameter-related data (P1) and correlative first parameter distance (D1) with respect to said geo-location data (x, y);

- determining, and polling, one or more parameters from a set of second parameters, within a pre-defined perimeter, of said geo-location data (x, y) to obtain second parameter-related data (P2) and correlative second parameter distance (D2) with respect to said geo-location data (x, y); and

- training a neural network using said station distance data (Dst), said sensor distance data (Dsn), said station air quality data (Sst), said sensor air quality data (Ssn), said first parameter-related data (P1), said correlative first parameter distance (D1), said second parameter-related data (P2), said correlative second parameter distance (D2), all in correlation with said geo-location data (x, y) in order to process an output signal, delivered to said user’s device, of air quality data (AQ) for said geo-location data in response to said request signal.

2. The system as claimed in claim 1 wherein, said step of training a neural network comprising the steps of:

o selecting at least a target station, from said virtual network of stations;

o selecting one or more variable stations, from said virtual network of stations;

o using Spatiotemporal Prediction Framework based on ANN along with Geospatial interpolation in order to provide air quality data (AQ) in correlation with said selected target station;

o computing a first station distance data from said target station to said geo-location data (x, y);

o computing a second station distance data from said sensor, from said virtual network of sensors, to said geo-location data (x, y);

o de-selecting said previously selected target station and re-classifying said previously selected target station to one or more variable stations;

o further selecting at least another target station, remaining from said virtual network of stations;

o using Spatiotemporal Prediction Framework based on ANN along with Geospatial interpolation in order to provide air quality data (AQ) in correlation with said selected target station;

o computing, iteratively, a next station distance data from said another selected target station to said geo-location data (x, y), till all stations in said virtual network of stations have been exhausted;

o computing a second station distance data from said sensor, from said virtual network of sensors, to said geo-location data (x, y); and

o computing a weighted mean from a plurality of said air quality data (AQ) to obtain a final air quality data (AQ) for said geo-location data (x, y).

3. The system as claimed in claim 2 wherein, said step of training a neural network comprising the steps of:

- polling air quality data from said one or more pre-existing stations, from said virtual network of stations, said air quality data comprising PM2.5 values and PM10 values;

- polling air quality data from said one or more sensors, from said virtual network of sensors, said air quality data comprising PM2.5 values and PM10 values;

- building a linear regression model correlating said PM2.5 values with said PM10 values;

- filling missing values, of said air quality data, by:

o using built linear regression model, populating missing PM2.5 values based on existing PM10 values;

o using built linear regression model, populating missing PM10 values based on existing PM2.5 values;

o using a temporal interpolation model, populating missing PM2.5 values and missing PM10 values;

o using a backward filling model, populating missing PM2.5 values and missing PM10 values;

- determining said target station, from said virtual network of stations;

- determining said variable station, from said virtual network of stations, as features in said neural network;

- computing, for each target station, a distance value between each of said variable stations and said target station;

- creating a feature-engineered dataframe comprising said fields:

o air quality data from said variable stations;

o distance from each of said variable stations to said target station;

o air quality data from said target station;

- training said neural network basis said created feature-engineered dataframe;

- obtaining time-stamped predicted values of air quality data, for said target station, as an output of said trained neural network;

- grouping said predicted values of air quality data basis said time-stamp; and

- computing a mean of said grouped predicted values of air quality data to obtain at least a final air quality data for said geo-location data.

4. The system as claimed in claim 2 wherein, said step of training a neural network comprising the steps of:

- for said virtual network of stations (n), determining:

o at least one station (y) as said target station;

o remainder stations (n-1) as said variable stations;

- determining number of neurons correlative to number of stations (n), from said virtual network of stations, in that, said correlation being equal to 2(n-1) stations, thereby providing 2(n-1) neurons, each target station having:

o at least air quality readings;

o distance values correlative to one or more variable stations from said virtual network of stations; and

- eliminating outliers determined by percentile values based on distance values to provide values correlative to said provided 2(n-1) neurons.

5. The system as claimed in claim 2 wherein, said step of training a neural network comprising the steps of:

- for said virtual network of stations (n), determining:

o at least one station (y) as said target station;

o remainder stations (n-1) as said variable stations;

- determining, for said target station, using at least two neurons having features, said features being:

o PM2.5 value feature for said target station; and

o station distance data (Dst) feature for said target station.

6. The system as claimed in claim 2 wherein, said step of training a neural network comprising the steps of:

- determining, for said target station, using at least two neurons having features, said features being:

o PM10 value feature for said target station; and

o station distance data (Dst) feature for said target station.

7. The system as claimed in claim 1 wherein, said set of first parameters comprising data corresponding to pre-identified traffic hotspots in said perimeter.

8. The system as claimed in claim 1 wherein, said set of second parameters comprising data corresponding to pre-identified industrial sources in said perimeter.

9. The system as claimed in claim 1 wherein, said air quality data for said geo-location data comprising PM2.5 values and / or PM10 values.

10. The system as claimed in claim 1 wherein, said distance computation being haversine distance computation method.