CN117354330A - Improved edge computing IoT big data analysis architecture - Google Patents
- Publication number
- CN117354330A CN117354330A CN202311335469.6A CN202311335469A CN117354330A CN 117354330 A CN117354330 A CN 117354330A CN 202311335469 A CN202311335469 A CN 202311335469A CN 117354330 A CN117354330 A CN 117354330A
- Authority
- CN
- China
- Prior art keywords
- data
- processing
- internet
- things
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G06F18/15—Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
Abstract
The patent relates to an improved internet of things (IoT) big data analysis system comprising an internet of things edge layer and a cloud processing layer, aimed at the challenge of processing and managing large-scale IoT sensor data. The edge layer collects and preliminarily processes data from various sensors and devices, then transmits it to the cloud processing layer for distributed storage and preprocessing. Preprocessing includes normalization, filtering, queuing, and data aggregation, which improve data quality and provide a better basis for subsequent processing and model training. A Map-Only algorithm and a MapReduce parallel processing mechanism are adopted to improve data processing speed and efficiency, and training and inference of the machine learning model are performed with an optimized BP neural network algorithm. The architecture provides strong support for internet of things applications, can be used for real-time decision making, prediction, and resource optimization, offers an efficient solution for processing and analyzing internet of things data, and has wide application potential.
Description
Technical Field
The invention relates to the field of internet of things big data analysis, and in particular to an improved edge computing architecture for internet of things (IoT) big data analysis.
Background
With the popularity of the internet of things (IoT), the large-scale data generated by various sensor devices is growing explosively, and data processing and management therefore face unprecedented challenges. The rise of edge computing offers new possibilities for addressing these challenges by pushing computation and data processing toward the network edge to reduce latency and improve scalability. In addition, techniques such as machine learning and federated learning have made data analysis more intelligent, but have also introduced new challenges. Traditional data processing frameworks and methods have become inefficient, so new research is needed to optimize data ingestion, processing, and storage in order to achieve more intelligent, efficient, and scalable IoT applications.
Disclosure of Invention
The present invention aims to propose an improved edge computing IoT big data analysis architecture that copes with the big data challenges generated by IoT using edge computing and machine learning technologies. The proposed framework and algorithms address data ingestion, processing, and storage to achieve more efficient big data analysis. This provides new methods and tools for the IoT and edge computing fields, drives their development, creates opportunities for various application domains, and helps solve the problem of large-scale data processing. The final goal is to introduce intelligence into objects in the physical world, promote the fusion of internet of things applications and machine learning technology, and provide support for future intelligent systems.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
In one aspect, an embodiment of the invention comprises an internet of things edge data processing method that performs preliminary processing and integration of raw sensor data and provides a basis for subsequent big data analysis and management. Data is collected from various internet of things devices and sensors, preliminarily processed, and then transmitted to nearby edge devices or servers, which cache and process it.
In another aspect, an embodiment of the invention further comprises a cloud processing layer responsible for data loading and parallel processing. After the cloud processing layer receives cached data from the edge, the data must be stored in a distributed manner. Distributed storage typically employs multiple storage nodes, which may be located at different physical locations. This architecture ensures redundant backup of the data, improving reliability and fault tolerance.
Further, the cloud processing layer performs preliminary processing on the raw data; preprocessing mechanisms such as normalization, filtering, and queuing prepare the data for effective processing and training, improving its quality and accuracy and making it suitable for subsequent processing, analysis, and modeling.
Preferably, the normalization operation is used to eliminate deviations in the data, ensure consistency of the data, and improve accuracy of data processing.
Preferably, filtering is used to speed up actual processing. High-quality information is selectively retained while bad or noisy data is filtered out, thereby improving data quality.
In order to accelerate data processing and make efficient use of big data, the M/M/1 queuing model is optimized and a hybrid M/M/1 queuing model is adopted for queuing.
Preferably, a message queue is used to speed up data processing. The message queue runs in a specific operating mode: a message M acquired at time t is forwarded to the designated component under the control of a specific handler H. This improves the efficient use of big data, ensuring that messages are processed and delivered in a predetermined manner when needed.
Further, data aggregation of the preprocessed data integrates data from multiple IoT sensor sources into a central location, providing accurate grouped data for further analysis.
Further, the data divided into blocks is loaded and mapped in parallel using a parallel algorithm. Data blocks are loaded on multiple nodes simultaneously, improving overall computation efficiency. The large data set is divided into small fixed-size blocks that are processed in parallel by each node, and the block size of the processing unit is optimized to keep the number of parallel channels balanced.
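As an illustration, the block splitting and parallel loading described above can be sketched in Python; `load_block` is a hypothetical stand-in for loading one block onto a processing node:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_blocks(records, block_size):
    """Divide a large data set into fixed-size blocks for parallel loading."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def load_block(block):
    """Hypothetical stand-in: 'loading' a block just reports its size."""
    return len(block)

def parallel_load(records, block_size, workers=4):
    """Load all blocks concurrently, one task per block."""
    blocks = split_into_blocks(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        loaded = list(pool.map(load_block, blocks))
    return blocks, loaded

blocks, loaded = parallel_load(list(range(10)), block_size=4)
```

Keeping the block size fixed balances the number of parallel channels: ten records with a block size of four yield three blocks that can be loaded on three nodes at once.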
In another aspect, parallel processing of a BP neural network machine learning algorithm is provided, including training and inference of a BP neural network machine learning model. The BP network can learn and store a large number of mappings between inputs and expected results, automatically fine-tuning network weights and thresholds through error approximation and error back-propagation. BP model construction is parallelizable: the optimized BP model is trained and validated in parallel on multiple processing nodes, and the score results produced by each node are combined with ensemble learning to improve the results.
The beneficial effects of the invention are as follows. First, the edge computing IoT big data analysis architecture effectively addresses the large-scale heterogeneous data processing and storage challenges generated by IoT applications and provides an efficient data processing solution. Second, by optimizing data loading, cluster resource management, and machine learning applications, the performance and speed of data processing are improved, enabling IoT applications to obtain valuable information more quickly. In addition, distributed storage and parallel processing improve the efficiency and availability of data processing. The invention has a wide application range, can be applied in various IoT fields, and provides better data analysis and modeling tools for business and decision support. It is therefore expected to yield significant practical benefits in the fields of edge computing and IoT big data processing.
Drawings
FIG. 1 is a system model of data analysis of the Internet of things;
FIG. 2 is a schematic diagram of a modified edge computing IoT big data architecture data processing flow;
FIG. 3 is a schematic diagram of data processing and transmission for an M/M/1 queue;
FIG. 4 is a schematic diagram of parallel processing by MapReduce operations;
FIG. 5 is a block diagram of a hybrid BP neural model.
Detailed Description
The invention will now be described in further detail with reference to the drawings and examples.
Referring to fig. 1, the invention provides a system model for internet of things big data analysis, consisting of an internet of things edge layer and a cloud processing layer. The first layer, the internet of things edge layer, includes various IoT sensors and embedded devices covering areas such as environmental monitoring, security monitoring, facility monitoring, traffic monitoring, power monitoring, and transportation monitoring, integrated with edge servers.
The second layer, the cloud processing layer, is a core component of the big data analysis framework and plays a crucial role. Its main responsibilities include receiving, processing, and storing data from the internet of things edge layer. The layer consists of a cloud server and various processing units, which collect the complex data generated by the edge servers and edge devices and execute the data processing tasks needed to extract valuable information from it. At the same time, redundant backup and high availability of the data must be ensured so that the data is accessible when needed. The cloud processing layer enables the system to efficiently process large, heterogeneous data, providing support for various IoT applications.
In implementation, the edge computing IoT big data processing architecture covers the whole process of big data processing across the internet of things edge layer and the cloud computing environment, aiming to efficiently manage and analyze large-scale internet of things data. Referring to fig. 2, big data processing includes the following steps:
(S1) the edge devices and servers receive a large amount of data generated by various internet of things sensors and embedded devices.
(S2) The edge data is collected in an edge cache and transferred to the cloud processing layer, where it is stored in a distributed manner.
(S3) Loading of big data is optimized using a Map-Only algorithm; parallelizing data loading and mapping reduces communication overhead and improves data processing speed and efficiency.
(S4) The collected data is preprocessed (normalization, filtering, queuing, and data aggregation) to prepare it for effective processing and training.
(S5) Parallel processing in the edge environment is realized through an optimized MapReduce mechanism. The MapReduce programming paradigm offers better scalability, flexibility, cost-effectiveness, speed, simplicity, and resilience.
(S6) The machine learning model is trained and used for inference with an optimized back-propagation (BP) neural network algorithm. The model learns the mapping between input and output data by continuously adjusting the network's weights and thresholds to minimize its loss function.
In step S1, a large amount of data generated by sensors and embedded devices is received. In the edge environment, various sensors, such as environmental, security, facility, traffic, power, and transportation sensors, are deployed at different locations and on different devices. These sensors monitor and collect data such as temperature, humidity, location, and events, and, depending on their design and use, constantly generate large amounts of data. The generated data is received, buffered, and processed by nearby edge devices or servers.
In step S2, the edge cache performs preliminary processing, aggregation, and temporary storage of the data. This helps reduce redundancy and the bandwidth consumed by transmission to the cloud. Data is grouped and compressed by characteristics such as type and timestamp so that it can be processed more efficiently.
After the data has been initially processed in the edge cache, it is uploaded to cloud storage. This process is periodic or triggered by conditions, ensuring consistency and timeliness of the data. In the cloud, the data is stored in a distributed storage system responsible for persistent storage, backup, management, and scalability. The data is stored under a partitioned, replicated policy to ensure its reliability and availability.
In step S3, a Map-Only algorithm is introduced to parallelize data loading and mapping. Data loading depends on the type of processing available inside the parallel and distributed platforms, and data must be loaded onto the parallel processing platform before processing. The Sqoop utility is integrated with the Map job of the MapReduce paradigm, and the Map-Only algorithm then adjusts the split size and replication factor of the traditional method for parallel data ingestion. First, a new directory is created under the root directory. Next, a file is verified and added to the directory. Then, its replication factor is changed by command, and likewise the replication of all files in the directory is changed. Throughout, specific and generic parameters are used to control the operation of the Sqoop tool.
This process configures the generic Hadoop command-line parameters via the Sqoop tool, then selects the source tables to import from the relational database management system (RDBMS) and specifies the storage format of the data. Next, the particular column subset to import is selected with the --columns parameter as needed, while an SQL WHERE clause is used to filter the data to import. Finally, incremental import is realized with Sqoop's --incremental parameter, so that only new or updated records in the RDBMS source table are imported, ensuring that the data in Hadoop stays up to date. This makes the import of data from the RDBMS into the Hadoop Distributed File System (HDFS) highly configurable and flexible.
In step S4, the data is preprocessed. Preprocessing comprises normalization, filtering, queuing, and data aggregation, which improve data quality, accelerate processing, and provide a better basis for subsequent processing and model training.
Data normalization scales the value of the data to a range of 0 to 1 using a min-max normalization method. Data of different scales, ranges or units are converted into uniform standard scales, and the values of the data points are mapped to a range of 0 to 1 so that the minimum value becomes 0, the maximum value becomes 1, and other values are located between the two. The dimensional difference of the data is eliminated, and different characteristics or variables are ensured to have similar dimensions, so that the data analysis and modeling are easier to perform.
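A minimal sketch of the min-max normalization described above, assuming a plain list of numeric values:

```python
def min_max_normalize(values):
    """Scale values to [0, 1]: the minimum maps to 0, the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # degenerate case: all values identical, no scale to preserve
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10.0, 20.0, 40.0])
```

After scaling, features with different units or ranges share the same [0, 1] scale, which is the property the section relies on for easier analysis and modeling.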
The core idea of filtering with optimized Kalman filtering (KF) is to combine the previous state estimate with new observation data to obtain a more accurate estimate of the system state. The starting point of the algorithm is initialization, which includes defining the dynamics of the system (transition model T), the observation mode (observation model O), and the estimates of system uncertainty (process-noise covariance CN and observation covariance CO). State estimation then proceeds through a series of steps: initial data is obtained, the previous state estimate is retrieved, and new observations are acquired. Next, the transition model and previous state estimate are used to predict the current system state and its uncertainty. The new observations are then combined with the predicted state, and the state estimate is updated by calculating the Kalman gain. The prediction and update steps are repeated to gradually refine the state estimate. Once all time steps have been processed, the filtering ends, providing a series of state estimates that account for observation noise and system dynamics and can deliver accurate state estimates in various applications.
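A scalar sketch of this predict/update cycle, reusing the section's names (transition model T, observation model O, process-noise covariance CN, observation covariance CO); the default parameter values are illustrative assumptions:

```python
def kalman_1d(observations, T=1.0, O=1.0, CN=1e-3, CO=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter. T: transition model, O: observation model,
    CN: process-noise covariance, CO: observation-noise covariance."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        # predict the current state and its uncertainty from the previous estimate
        x = T * x
        p = T * p * T + CN
        # update: combine the prediction with the observation via the Kalman gain
        k = p * O / (O * p * O + CO)
        x = x + k * (z - O * x)
        p = (1.0 - k * O) * p
        estimates.append(x)
    return estimates

estimates = kalman_1d([1.0] * 50)  # constant signal for illustration
```

Fed a constant signal, the estimate moves most of the way toward the observation on the first step (the initial uncertainty p0 is large) and then converges as the gain settles.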
A hybrid M/M/1 queuing model is adopted for queuing, accelerating data processing and making efficient use of big data. The model performs various operations when it receives a data segment D at time t, at which point the system is considered to be in steady state. FIG. 3 illustrates data processing and transmission for the M/M/1 queue. Let S_k denote the steady-state probability of k tasks in the system, Λ the arrival rate, and μ the service rate. Balancing the flows between adjacent states S_0, S_1, ..., S_{k-1}, S_k, S_{k+1} gives:
ΛS_0 = μS_1, ΛS_1 = μS_2, ..., ΛS_k = μS_{k+1}
Thus S_k = (Λ/μ)S_{k-1} = (Λ/μ)^k S_0. Since the probabilities must sum to 1:
S_0 (1 + (Λ/μ) + (Λ/μ)^2 + ...) = 1
Summing the geometric series (with utilization ρ = Λ/μ < 1) yields S_0 = 1 − Λ/μ, and hence S_k = ρ^k (1 − ρ). The average number of tasks in the system is therefore
N = Σ_k k·S_k = ρ/(1 − ρ) = Λ/(μ − Λ)
the average queue length is N_q = N − ρ, and by Little's law the average time a task spends in the system is
W = N/Λ = 1/(μ − Λ).
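The steady-state quantities of the M/M/1 queue can be computed directly; this sketch assumes a stable queue with arrival rate Λ below service rate μ:

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 metrics; requires arrival_rate < service_rate."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate        # utilization, rho = arrival/service
    n = rho / (1.0 - rho)                    # mean number of tasks in the system
    nq = n - rho                             # mean queue length
    w = 1.0 / (service_rate - arrival_rate)  # mean time in system (Little's law)
    return rho, n, nq, w

rho, n, nq, w = mm1_metrics(2.0, 4.0)
```

With two arrivals and four services per unit time, utilization is 0.5, one task is in the system on average, and each task spends half a time unit in the system.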
Finally, the preferred technique for data aggregation divides the vast data set into smaller blocks, reducing the complexity of each block and making it easier to manage and process; similar data is grouped within blocks, which are then processed simultaneously on different processing units. Inside each data block, various aggregation operations may be performed, such as summing, averaging, and counting. These operations generate summary data, reduce the size of the data set, and provide a higher level of data summarization.
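A minimal sketch of this block-wise aggregation, with hypothetical sum/average/count summaries produced per block:

```python
def aggregate_blocks(readings, block_size):
    """Split readings into blocks and summarize each block in place of raw data."""
    summaries = []
    for i in range(0, len(readings), block_size):
        block = readings[i:i + block_size]
        summaries.append({
            "count": len(block),
            "sum": sum(block),
            "avg": sum(block) / len(block),
        })
    return summaries

summaries = aggregate_blocks([1, 2, 3, 4, 5, 6], block_size=3)
```

Each summary is far smaller than the block it replaces, which is exactly the data-reduction effect the aggregation step aims for.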
In step S5, parallel processing in the edge environment is realized through an optimized MapReduce mechanism. Big data sets are extremely large and must be divided into blocks or segments for distributed storage and parallel processing; the MapReduce paradigm is therefore preferred.
MapReduce is a programming model and processing technique for processing and generating large-scale data sets; parallel processing comprises mapping (Map) and reduction (Reduce), with the flow shown in fig. 4. First, the large-scale data set is divided into small data blocks that can be processed on different nodes of a parallel computing cluster, each block containing multiple records or data items. In the mapping phase, each data block is passed to a set of mapping tasks, whose goal is to convert each record or data item in an input block into a set of key-value pairs, where the key identifies certain attributes of the data and the value contains the actual data. Mapping tasks run in parallel, each independently processing its assigned block. Next, the MapReduce framework sorts and groups the key-value pairs output by the mapping tasks: pairs with the same key are grouped together and assigned to different reduction tasks, and the keys are partitioned to balance the load across reduction tasks. Each reduction task then takes a set of key-value pairs sharing the same key and performs a user-defined reduction operation, typically aggregating values, computing statistics, or performing other data processing. Reduction tasks also run in parallel, each independently processing its assigned data. Each reduction task generates a partial result, and these are finally combined into a complete result set containing the final processing results for the input data set.
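The map, shuffle/sort, and reduce phases can be sketched in miniature; counting records per sensor type is an illustrative stand-in for a user-defined reduction:

```python
from itertools import groupby

def map_phase(records):
    """Map: emit one (key, value) pair per record; here (sensor_type, 1)."""
    return [(r["type"], 1) for r in records]

def shuffle(pairs):
    """Sort by key and group values sharing the same key together."""
    pairs = sorted(pairs, key=lambda kv: kv[0])
    return {k: [v for _, v in g] for k, g in groupby(pairs, key=lambda kv: kv[0])}

def reduce_phase(grouped):
    """Reduce: apply the user-defined aggregation (a sum) per key."""
    return {k: sum(vs) for k, vs in grouped.items()}

records = [{"type": "temp"}, {"type": "traffic"}, {"type": "temp"}]
counts = reduce_phase(shuffle(map_phase(records)))
```

In a real cluster the three phases run on different nodes in parallel; here they run sequentially to make the data flow between phases explicit.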
In step S6, a machine learning model is trained and used for inference with an optimized back-propagation (BP) neural network algorithm. The BP neural network is a neural network model for machine learning, used for training and inference. BP model construction is parallelizable: multiple processing nodes train and validate the optimized BP model in parallel, and the score results produced by each node are combined using ensemble learning to improve the results. The BP model consists of an input layer, a hidden layer, and an output layer, as shown in fig. 5.
The preferred variant of the BP network adopts an additional momentum technique: a momentum coefficient μ is introduced into the gradient descent algorithm to construct the parallel model. The weight increment as a function of the weights is:
Δω(n+1) = μ·Δω(n) + (1 − μ)·Δ·(gx/gw)
where Δω(n+1) and Δω(n) represent the weight increments after the (n+1)-th and n-th iterations, the value of μ must be between 0 and 1, gx/gw represents the negative of the gradient, and Δ is the learning rate. A variable learning-rate method is also adopted, with adaptive adjustment according to the error change:
Δ(n+1) = m+ · Δ(n), if X(n+1) < X(n)
Δ(n+1) = m− · Δ(n), if X(n+1) ≥ X(n)
where the increment factor m+ is greater than 1, the decrement factor m− is between 0 and 1, and X(n+1) and X(n) represent the sum of squared total errors after the (n+1)-th and n-th iterations, respectively. The learning-direction principle of BP is that the weights and thresholds of the network are adjusted along the negative gradient direction.
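The momentum and variable-learning-rate updates can be sketched as follows; the loop, factors, and parameter values are illustrative assumptions, not the patented configuration:

```python
def momentum_step(w, grad, prev_delta, lr=0.1, mu=0.9):
    """One gradient-descent step with additional momentum:
    delta = mu * prev_delta + (1 - mu) * lr * (-grad)."""
    delta = mu * prev_delta - (1.0 - mu) * lr * grad
    return w + delta, delta

def adapt_lr(lr, err_new, err_old, m_inc=1.05, m_dec=0.7):
    """Variable learning rate: grow it when error falls, shrink it when error rises."""
    return lr * m_inc if err_new < err_old else lr * m_dec

# minimize E(w) = w^2 / 2, whose gradient is w, starting from w = 2
w, delta = 2.0, 0.0
for _ in range(300):
    w, delta = momentum_step(w, grad=w, prev_delta=delta)
```

The momentum term smooths successive updates, so the weight spirals toward the minimum instead of oscillating with the raw gradient.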
According to the formula S_{i+1} = S_i − Δ_i·g_i, where S_i represents the matrix of existing weights and thresholds, g_i the gradient of the current operation, and Δ_i the learning rate. Assume a three-layer BP model with input nodes y_k, hidden-layer nodes x_j, and output-layer nodes z_i. The hidden-layer output is
x_j = f(Σ_k ω_jk·y_k − θ_j)
the calculated output of the output node is
z_i = f(Σ_j ν_ij·x_j − θ_i)
the error of the output node is e_i = d_i − z_i, where d_i is the expected output, and the final network output is produced once the error falls below the threshold.
the invention provides a comprehensive data processing and analyzing system of the Internet of things, which integrates an Internet of things edge layer and a cloud processing layer, and efficiently manages and analyzes large-scale and diversified data of the Internet of things through the steps of data acquisition, preprocessing, distributed processing, machine learning model training reasoning and the like. The method provides key data support for various IoT applications, facilitates real-time decision making, prediction and resource utilization optimization, improves data processing efficiency and quality of the internet of things system, facilitates development of the internet of things technology, and provides a solid foundation for innovation in the fields of intelligent cities, intelligent transportation, environmental protection and the like.
The preferred embodiments disclosed above are merely to help illustrate the present invention, and it is obvious to those skilled in the art that the scope of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
Claims (7)
1. A system model for big data analysis of Internet of Things data, the system model comprising two layers: an Internet of Things edge layer and a cloud processing layer.
The Internet of Things edge layer comprises various IoT sensors and embedded devices and is used in fields such as environmental monitoring, security monitoring, facility monitoring, traffic monitoring, power monitoring, and transportation monitoring. The Internet of Things edge layer also integrates an edge server for receiving, processing, and buffering data from the sensors and embedded devices.
The cloud processing layer comprises a cloud server and various processing units, and is used for receiving, processing and storing data from the edge layer of the Internet of things, and comprises the following data processing flows:
the optimization of data loading is realized through a Map-Only algorithm, communication overhead is reduced through parallelization of data loading and mapping, and data processing speed and efficiency are improved.
The data is pre-processed, e.g., normalized, filtered, queued, and aggregated, to prepare the data for further processing and training.
Parallel processing in the edge environment is implemented using a MapReduce mechanism, providing better scalability, flexibility, cost-effectiveness, speed, simplicity, and elasticity.
Machine learning model training and inference are performed using an optimized back-propagation (BP) neural network algorithm to learn the mapping from input data to output data.
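The MapReduce-style flow in the data processing steps above can be sketched as a minimal in-process pipeline: a parallel map phase, a shuffle that groups values by key, and a reduce phase that aggregates each group. The sensor records, pool size, and mean aggregation are illustrative assumptions, not details specified by the invention.

```python
from collections import defaultdict
from multiprocessing.dummy import Pool  # thread pool standing in for distributed workers

# Hypothetical sensor records from the edge layer: (sensor_type, reading).
records = [
    ("temperature", 21.5), ("traffic", 340.0), ("temperature", 22.1),
    ("power", 5.2), ("traffic", 298.0), ("power", 4.8),
]

def map_phase(record):
    """Map: emit a (key, value) pair; calls run in parallel across workers."""
    sensor_type, reading = record
    return (sensor_type, reading)

def reduce_phase(grouped):
    """Reduce: aggregate all values sharing a key (here, the mean reading)."""
    return {k: sum(v) / len(v) for k, v in grouped.items()}

with Pool(4) as pool:
    pairs = pool.map(map_phase, records)  # parallel map over the input records

shuffled = defaultdict(list)              # shuffle: group mapped values by key
for key, value in pairs:
    shuffled[key].append(value)

means = reduce_phase(shuffled)            # e.g. mean reading per sensor type
```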
2. The system model of claim 1, wherein the sensors of the internet of things edge layer include environmental observation sensors, security monitoring sensors, facility monitoring sensors, traffic monitoring sensors, power monitoring sensors, and transportation observation sensors.
3. The system model of claim 1, wherein the data loading process of the cloud processing layer includes integrating mapping jobs with the MapReduce paradigm using the Sqoop tool, achieving parallel data ingestion by adjusting the partition size and replication factors.
4. The system model of claim 1, wherein the data preprocessing process of the cloud processing layer includes filtering using Kalman filtering, queuing using a hybrid M/1 queuing model, and data aggregation using a divide-and-conquer approach.
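The filtering step of claim 4 names Kalman filtering. A minimal one-dimensional sketch, assuming a roughly constant signal level and illustrative noise variances q and r (not values specified by the invention), could look like:

```python
def kalman_1d(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    """One-dimensional Kalman filter for a roughly constant signal level.
    q is the process-noise variance, r the measurement-noise variance."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: uncertainty grows by the process noise
        k = p / (p + r)        # Kalman gain: how much to trust the measurement
        x = x + k * (z - x)    # update the estimate toward the measurement
        p = (1.0 - k) * p      # shrink the uncertainty after the update
        estimates.append(x)
    return estimates

# Example: smooth noisy readings scattered around a true level of 10.
noisy = [10.4, 9.7, 10.2, 9.9, 10.1, 9.8, 10.3]
smooth = kalman_1d(noisy, x0=noisy[0])
```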
5. The system model of claim 1, wherein the machine model training and reasoning process of the cloud processing layer includes model training using a BP neural network algorithm, with weights and thresholds of the neural network being continuously adjusted to minimize a loss function of the model, enabling the model to learn a mapping relationship between input data and output data.
6. The system model of claim 1, wherein the system model is used for processing large-scale and heterogeneous Internet of Things data, supports various IoT applications, provides capabilities for real-time decision making, prediction, and resource optimization, and improves the data processing efficiency and quality of the Internet of Things system.
7. The system model of claim 1, wherein the system model is suitable for innovation in fields such as smart cities, intelligent transportation, and environmental protection, and provides a solid foundation for development in these fields.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311335469.6A CN117354330A (en) | 2023-10-13 | 2023-10-13 | Improved edge computing IoT big data analysis architecture |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311335469.6A CN117354330A (en) | 2023-10-13 | 2023-10-13 | Improved edge computing IoT big data analysis architecture |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117354330A true CN117354330A (en) | 2024-01-05 |
Family
ID=89359014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311335469.6A Pending CN117354330A (en) | 2023-10-13 | 2023-10-13 | Improved edge computing IoT big data analysis architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117354330A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117873402A (en) * | 2024-03-07 | 2024-04-12 | 南京邮电大学 | Collaborative edge cache optimization method based on asynchronous federal learning and perceptual clustering |
CN117873402B (en) * | 2024-03-07 | 2024-05-07 | 南京邮电大学 | Collaborative edge cache optimization method based on asynchronous federal learning and perceptual clustering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103345514B (en) | Streaming data processing method under big data environment | |
Jiang et al. | Dimboost: Boosting gradient boosting decision tree to higher dimensions | |
CN102737126B (en) | Classification rule mining method under cloud computing environment | |
CN103701635B (en) | Method and device for configuring Hadoop parameters on line | |
CN105550374A (en) | Random forest parallelization machine studying method for big data in Spark cloud service environment | |
CN102567312A (en) | Machine translation method based on distributive parallel computation framework | |
CN117354330A (en) | Improved edge computing IoT big data analysis architecture | |
CN107247799A (en) | Data processing method, system and its modeling method of compatible a variety of big data storages | |
CN114418129B (en) | Deep learning model training method and related device | |
CN108885641A (en) | High Performance Data Query processing and data analysis | |
CN115858675A (en) | Non-independent same-distribution data processing method based on federal learning framework | |
CN103281374A (en) | Method for rapid data scheduling in cloud storage | |
CN112199154B (en) | Reinforced learning training system and method based on distributed collaborative sampling center type optimization | |
CN117875454B (en) | Multistage intelligent linkage-based data heterogeneous federation learning method and storage medium | |
CN113672684A (en) | Layered user training management system and method for non-independent same-distribution data | |
CN109754638B (en) | Parking space allocation method based on distributed technology | |
Zhang et al. | Txallo: Dynamic transaction allocation in sharded blockchain systems | |
CN105550351B (en) | The extemporaneous inquiry system of passenger's run-length data and method | |
Wei et al. | Participant selection for hierarchical federated learning in edge clouds | |
Liang et al. | Collaborative Edge Service Placement for Maximizing QoS with Distributed Data Cleaning | |
CN114691327A (en) | Multi-objective group intelligent optimization method and system for two-stage task scheduling | |
CN117391858A (en) | Inductive blockchain account distribution method and device based on graphic neural network | |
Esfahanizadeh et al. | Stream iterative distributed coded computing for learning applications in heterogeneous systems | |
Fan et al. | Self-adaptive gradient quantization for geo-distributed machine learning over heterogeneous and dynamic networks | |
Ge et al. | Compressed collective sparse-sketch for distributed data-parallel training of deep learning models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||