CN113222109A

CN113222109A - Internet of things edge algorithm based on multi-source heterogeneous data aggregation technology

Info

Publication number: CN113222109A
Application number: CN202110343274.0A
Authority: CN
Inventors: 王暾; 田禹
Original assignee: Xinruixin Intelligent Iot Research Institute Nanjing Co ltd
Current assignee: Xinruixin Intelligent Iot Research Institute Nanjing Co ltd
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2021-08-06

Abstract

The invention discloses an Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology, comprising the following steps: S1, obtaining multi-source heterogeneous Internet of Things data and performing standardized aggregation on it; S2, analyzing and processing the aggregated data to obtain a data set; S3, extracting event elements to obtain summary content; S4, obtaining training samples; S5, calculating a flow session feature vector and a regression model prediction vector for each split training sample; S6, obtaining a trained target data extraction model; S7, sending the data to a cloud database; and S8, sending the tag field to the cloud database for retrieval and extracting the target data once retrieval is complete. The method optimizes the analysis of data of a given type or containing a given field, processes multi-source business data jointly, and extracts data according to a configured mode. It thereby greatly reduces the amount of data transmitted to the cloud, saves substantial equipment and operation-and-maintenance staffing costs, and effectively improves the efficiency and accuracy of retrieving multi-source heterogeneous data.

Description

Internet of things edge algorithm based on multi-source heterogeneous data aggregation technology

Technical Field

The invention relates to the technical field of data processing, in particular to an internet of things edge algorithm based on a multi-source heterogeneous data aggregation technology.

Background

The Internet of Things is an information carrier, built on the Internet and traditional telecommunication networks, that enables all ordinary objects capable of performing independent functions to form an interconnected network.

In the Internet of Things, electronic tags can be used to link physical objects to the Internet, so that the location of any tagged object can be found on the network. The Internet of Things allows a central computer to manage and control machines, equipment and personnel centrally; it also supports remote control of household appliances and cars, location search and anti-theft protection, much like an automatic control system. The data collected from these objects can ultimately be aggregated into big data that supports major social changes such as redesigning roads to reduce traffic accidents, urban renewal, disaster prediction and epidemic control, thereby connecting objects with one another.

Processing Internet of Things data generates massive amounts of data; gathered together, the volume is astronomically large. Real-time analysis, storage and extraction of this data in the cloud place enormous demands on computing capacity and network bandwidth. Edge computing is particularly suited to application scenarios with special business requirements such as low latency, high bandwidth, high reliability, massive connectivity, heterogeneous convergence, local security and privacy protection. An Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology is therefore urgently needed.

Disclosure of Invention

The invention aims to provide an Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology that optimizes the analysis of data of a given type or containing a given field, processes multi-source business data jointly, and extracts data according to a configured mode. The algorithm greatly reduces data transmission to the cloud, saves substantial equipment and operation-and-maintenance staffing costs, effectively improves the efficiency and accuracy of retrieving multi-source heterogeneous data, and improves the integrity of the extracted data, thereby solving the problems described in the Background.

In order to achieve the purpose, the invention provides the following technical scheme:

An Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology comprises the following steps:

S1, obtaining multi-source heterogeneous Internet of Things data, performing standardized aggregation on it, and storing it locally;

S2, analyzing and processing the standardized, aggregated data to obtain a data set;

S3, constructing a convolutional neural network to extract event elements from the processed data set and obtain summary content;

S4, further processing the summary content with a text similarity algorithm and a maximal marginal relevance (MMR) model to obtain, as training samples, multiple items of data of a given type or containing a given field;

S5, splitting the training samples by label type, and calculating the flow session feature vector and the regression model prediction vector of each split training sample;

S6, inputting the flow session feature vector, the regression model prediction vector and the label of each training sample into a data extraction model for training, to obtain a trained target data extraction model;

S7, sending the data and the tag group to a cloud database according to the data type;

and S8, receiving input search data of a specific type or containing a specific field, analyzing it, extracting the tag field, sending the tag field to the cloud database for retrieval, and extracting the target data once retrieval is complete.
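S4 names a text similarity algorithm and a "maximum edge correlation model" without giving formulas. That model name reads like the usual translation of maximal marginal relevance (MMR), so the Python sketch below assumes cosine similarity over whitespace-tokenized bag-of-words vectors and the standard greedy MMR criterion: score(d) = lam * sim(d, query) - (1 - lam) * max over already-selected s of sim(d, s). The query string, the `lam` trade-off and the function names are illustrative assumptions, not the patent's own method.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(docs, query, k, lam=0.7):
    """Greedy maximal marginal relevance: repeatedly pick the document that
    best trades relevance to the query against redundancy with documents
    already picked. Returns indices into `docs`."""
    vectors = [Counter(doc.lower().split()) for doc in docs]
    query_vec = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(vectors[i], query_vec)
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    docs = [
        "temperature sensor reading high",
        "temperature sensor reading high",  # exact duplicate of the first
        "humidity sensor reading low",
        "door opened",
    ]
    print(mmr_select(docs, "sensor temperature humidity", k=2))  # [0, 2]
```

The duplicate second document is skipped in favour of the humidity reading because its redundancy penalty outweighs its relevance, which is the behaviour that makes MMR useful for picking diverse training samples.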

Preferably, in S2, the standardized, aggregated multi-source heterogeneous Internet of Things data is analyzed and processed by filtering, similar-record merging and de-duplication.
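The patent names the three S2 operations but not their criteria. A minimal Python sketch, assuming S1 has already normalized every reading into a hypothetical `Record(device_id, timestamp, value)` and that "similar" means readings from the same device within a few seconds of each other; the range limits and the `window` are illustrative. Exact duplicates are removed before merging here so that they do not skew the averages, which is an ordering choice of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    device_id: str          # hypothetical field names produced by S1
    timestamp: int          # Unix time, seconds
    value: Optional[float]  # None marks a missing reading


def _merge(group: List[Record]) -> Record:
    """Collapse a group of similar readings into one averaged record."""
    mean = sum(r.value for r in group) / len(group)
    return Record(group[0].device_id, group[0].timestamp, mean)


def clean(records: List[Record], lo: float = -1e6, hi: float = 1e6, window: int = 5) -> List[Record]:
    # 1) Filtering: drop missing or out-of-range readings.
    kept = [r for r in records if r.value is not None and lo <= r.value <= hi]
    # 2) De-duplication: identical records collapse to one.
    unique = sorted(set(kept), key=lambda r: (r.device_id, r.timestamp))
    # 3) Similar merging: readings from the same device taken within
    #    `window` seconds of the first reading in a group are averaged.
    merged: List[Record] = []
    group: List[Record] = []
    for r in unique:
        if group and (r.device_id != group[0].device_id or r.timestamp - group[0].timestamp > window):
            merged.append(_merge(group))
            group = []
        group.append(r)
    if group:
        merged.append(_merge(group))
    return merged
```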

Preferably, S3 specifically comprises the following steps:

S31, traversing all samples in the data set and performing single-sentence segmentation and manual labeling on them to obtain a model data set $D$:

$$D = \{(c_j, l_j)\}, \quad j = 1, 2, \ldots$$

where $l_j$ is the label of the text sentence $c_j$ obtained by segmenting a sample, and $l_j \in \{\text{data of a certain type or containing a certain field}\}$;

S32, extracting the feature vector of each text sentence in the model data set $D$ to obtain a data set feature matrix $F$;

S33, constructing a convolutional neural network, denoted TextCNN, whose structure comprises a convolutional layer, a max-pooling layer, two fully connected layers and a softmax layer;

S34, randomly dividing the feature matrix $F$ into a training set, a test set and a validation set in the ratio 4:2:1;

S35, training the TextCNN network constructed in S33 with the training and validation sets obtained in S34 to obtain a trained network Model;

and S36, using the Model obtained in S35 to extract summaries from the test set obtained in S34, yielding a data set that contains only data of a certain type or containing a certain field, which is recorded as the summary content.
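The data-preparation part of S3 (S31, S32 and S34) can be sketched in Python as below. Several pieces are assumptions: the patent labels sentences manually, so the keyword rule in `build_dataset` is only a hypothetical placeholder for that step; S32 does not say which features are extracted, so bag-of-words counts over whitespace-separated tokens stand in for them (Chinese text would need a word segmenter first); and the fixed `seed` is added only to make the 4:2:1 split reproducible. The TextCNN of S33 is not reproduced.

```python
import random
import re


def build_dataset(samples, target_keyword):
    """S31 stand-in: split each sample into single sentences c_j and attach a
    label l_j (1 = target data, 0 = other). The keyword rule is a hypothetical
    placeholder for the manual labeling described in the patent."""
    dataset = []
    for text in samples:
        for sentence in re.split(r"[。！？.!?]", text):
            sentence = sentence.strip()
            if sentence:
                dataset.append((sentence, 1 if target_keyword in sentence else 0))
    return dataset


def feature_matrix(dataset):
    """S32 stand-in: bag-of-words counts, so row j of F is the feature vector of c_j."""
    vocab = sorted({w for sentence, _ in dataset for w in sentence.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    F = []
    for sentence, _ in dataset:
        row = [0] * len(vocab)
        for w in sentence.lower().split():
            row[index[w]] += 1
        F.append(row)
    return F, vocab


def split_421(rows, seed=0):
    """S34: shuffle, then split 4:2:1 into training, test and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 4 // 7
    n_test = len(rows) * 2 // 7
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]
```

With 14 rows, for example, `split_421` yields 8 training, 4 test and 2 validation rows; integer division means that any remainder falls into the validation set.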

Preferably, calculating the flow session feature vector and the regression model prediction vector of each split training sample in S5 comprises the following steps:

S51, calculating the state transition list of each split training sample, compressing the state transition list of each training sample, dividing it into several mutually disjoint subsets, and re-encoding each subset with a different alphabet to obtain the coding feature information of each subset;

S52, merging similar coding feature information among the coding feature information of the subsets via the state transition edges corresponding to the label type, to obtain the flow session feature vector of each split training sample;

and S53, performing regression model analysis on the flow session feature vector of each split training sample to obtain its regression model prediction vector.
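The patent does not define the session states, the compression scheme or the partitioning rule used in S5 and S51. The Python sketch below is one possible reading, offered only under these assumptions: a training sample is a (state sequence, label) pair; the state transition list is the list of consecutive (from, to) pairs; "space compression" stores each distinct transition once with its count; the disjoint subsets are keyed by source state; and "recoding with a different alphabet" gives the destination states of each subset their own local symbols 0, 1, 2, .... All function names are hypothetical.

```python
from collections import Counter, defaultdict


def split_by_label(samples):
    """S5: route (state sequence, label) training samples by label type."""
    groups = defaultdict(list)
    for states, label in samples:
        groups[label].append(states)
    return dict(groups)


def transition_list(states):
    """State transition list of one session: consecutive (from, to) pairs."""
    return list(zip(states, states[1:]))


def compress(transitions):
    """Space compression: keep each distinct transition once, with its count."""
    return Counter(transitions)


def partition_and_encode(compressed):
    """Partition transitions into disjoint subsets keyed by source state, then
    re-encode each subset with its own alphabet: destination states receive
    local symbols 0, 1, 2, ... that are only meaningful inside that subset.
    Returns {source: {destination: (local_symbol, count)}}."""
    encoded = defaultdict(dict)
    for (src, dst), count in sorted(compressed.items()):
        alphabet = encoded[src]
        alphabet[dst] = (len(alphabet), count)
    return dict(encoded)
```

For the session `idle, connect, send, send, idle, connect`, the transition `idle -> connect` occurs twice and is stored once with count 2, and the `send` subset gets the two local symbols 0 (`idle`) and 1 (`send`).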

Preferably, in S52, merging similar coding feature information among the coding feature information of the subsets via the state transition edges corresponding to the label type, to obtain the flow session feature vector of each split training sample, comprises the following steps:

S521, identifying the state transition matrix in the coding feature information of each subset via the state transition edges corresponding to the label type;

S522, acquiring target state transition matrices that have identical state transition parameters, and determining the coding feature information corresponding to those target matrices as similar coding feature information;

and S523, merging the similar coding feature information among the coding feature information of the subsets to obtain the flow session feature vector of each split training sample.
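One way to read S521 to S523 is sketched below in Python. It assumes that each encoded subset has the shape `{destination: (local_symbol, count)}`; that its "state transition matrix" is the row of outgoing-transition probabilities ordered by local symbol; that "identical state transition parameters" means identical rows after rounding; and that the merged feature vector records the total transition count of each merged group. These are illustrative assumptions, and the rounding precision `ndigits` is arbitrary.

```python
from collections import defaultdict


def transition_params(subset, ndigits=3):
    """S521 stand-in: the state transition matrix row of one encoded subset,
    i.e. its outgoing-transition probabilities ordered by local symbol.
    `subset` maps destination -> (local_symbol, count)."""
    pairs = sorted(subset.values())
    total = sum(count for _, count in pairs)
    return tuple(round(count / total, ndigits) for _, count in pairs)


def flow_session_vector(encoded):
    """S522-S523 stand-in: subsets with identical transition parameters are
    treated as similar and merged; each vector entry is the total transition
    count of one merged group, ordered by the group's parameters."""
    merged = defaultdict(int)
    for subset in encoded.values():
        merged[transition_params(subset)] += sum(count for _, count in subset.values())
    return [merged[key] for key in sorted(merged)]


if __name__ == "__main__":
    encoded = {
        "connect": {"send": (0, 1)},
        "idle": {"connect": (0, 2)},
        "send": {"idle": (0, 1), "send": (1, 1)},
    }
    print(flow_session_vector(encoded))  # [2, 3]
```

In the example, `connect` and `idle` both have the deterministic row `(1.0,)` and are merged (1 + 2 = 3 transitions), while `send` has the row `(0.5, 0.5)` and stays on its own.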

Preferably, the cloud database in S7 further associates the received data with the tags in the tag group, and the associated tags form a cluster.

Preferably, the cloud database in S7 further extracts and classifies the clusters to form a classification directory, and sends out the classification directory.
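The clustering and directory behaviour of the S7 cloud database can be pictured with the in-memory Python sketch below. The patent does not specify how tags become associated, so the sketch assumes that tags arriving in the same tag group are associated, and that a cluster is a connected component of that association (computed with union-find). `TagStore` and its methods are hypothetical stand-ins, not a real cloud database API.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical in-memory stand-in for the cloud database in S7."""

    def __init__(self):
        self.records = {}  # record id -> (data type, payload, tag group)
        self.parent = {}   # union-find parent pointers over tags

    def _find(self, tag):
        self.parent.setdefault(tag, tag)
        while self.parent[tag] != tag:
            self.parent[tag] = self.parent[self.parent[tag]]  # path halving
            tag = self.parent[tag]
        return tag

    def receive(self, record_id, data_type, payload, tags):
        """Store a record and associate every tag in its tag group."""
        self.records[record_id] = (data_type, payload, frozenset(tags))
        tags = list(tags)
        for tag in tags:
            self._find(tag)
        for other in tags[1:]:
            self.parent[self._find(other)] = self._find(tags[0])

    def directory(self):
        """Classification directory: each tag cluster -> ids of its records."""
        clusters = defaultdict(set)
        for tag in self.parent:
            clusters[self._find(tag)].add(tag)
        result = {}
        for members in clusters.values():
            key = tuple(sorted(members))
            result[key] = sorted(
                rid for rid, (_, _, tags) in self.records.items() if tags & members
            )
        return result
```

For instance, records tagged `{room-1, hvac}` and `{hvac, floor-2}` share `hvac`, so the three tags end up in one cluster listing both records, while a record tagged only `entrance` forms a cluster of its own.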

Preferably, when the tag field is sent to the cloud database for retrieval in S8, the feature vectors extracted from the tag field are assembled into a vector set, each feature vector comprising keywords and feature weights.
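A keyword-plus-weight vector set lends itself to weighted similarity search. The Python sketch below assumes that both the query built from the tag field and each stored record are represented as `{keyword: feature weight}` dictionaries, and it ranks records by cosine similarity. The patent does not state its matching function, so cosine ranking, the `index` layout and the `top_k` cut-off are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {keyword: weight} vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, index, top_k=3):
    """Rank records in a hypothetical cloud-side index by cosine similarity
    to the query vector; records with no shared keyword are dropped and ties
    are broken by record id."""
    scored = [(cosine(query, vec), rid) for rid, vec in index.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [rid for _, rid in scored[:top_k]]
```

With the query `{"temperature": 0.8, "room-1": 0.2}`, a record weighted only on `temperature` ranks above one weighted equally on both keywords, because the query itself is dominated by `temperature`.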

Preferably, sending the tag field to the cloud database for retrieval in S8 further comprises:

calculating the word frequency of each keyword with a preset formula, the preset formula being as follows:

where $L_i$ is the word frequency, $TF$ is the term frequency, and $C_{total}$ is the total number of words;

calculating an update weight from the word frequency and a preset part-of-speech weight;

and adjusting the feature weight with the update weight to obtain the optimized feature vector.
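The preset formula itself is not reproduced in this text. The Python sketch below assumes the usual normalized term frequency, L_i = TF_i / C_total (the keyword's count divided by the total number of words), which is consistent with the three symbols defined. It further assumes, as hypothetical choices, that the update weight is L_i multiplied by a part-of-speech weight taken from an illustrative `POS_WEIGHTS` table, and that the adjusted feature weight is the original weight scaled by the update weight.

```python
from collections import Counter

# Hypothetical part-of-speech weights; the patent only says they are preset.
POS_WEIGHTS = {"noun": 1.0, "verb": 0.6, "adj": 0.4}


def word_frequency(tokens):
    """Assumed form of the preset formula: L_i = TF_i / C_total, the count of
    keyword i divided by the total number of words."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: tf / total for word, tf in counts.items()}


def optimize_vector(features, tokens, pos_tags, pos_weights=POS_WEIGHTS):
    """features: keyword -> feature weight; pos_tags: keyword -> POS label.
    Update weight = L_i * POS weight (an assumption); the adjusted feature
    weight is the original weight scaled by that update weight."""
    freq = word_frequency(tokens)
    optimized = {}
    for word, weight in features.items():
        update = freq.get(word, 0.0) * pos_weights.get(pos_tags.get(word), 0.0)
        optimized[word] = weight * update
    return optimized
```

For the tokens `temperature sensor temperature reading`, `temperature` has L_i = 2 / 4 = 0.5, so a noun keyword with feature weight 2.0 is adjusted to 2.0 * 0.5 * 1.0 = 1.0; keywords that are absent from the text or have an unlisted part of speech drop to zero weight under this sketch.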

Compared with the prior art, the invention has the beneficial effects that:

The method is built on an Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology. It optimizes the analysis of data of a given type or containing a given field, processes multi-source business data jointly, and extracts data according to a configured mode, which greatly reduces data transmission to the cloud, saves substantial equipment and operation-and-maintenance staffing costs, effectively improves the efficiency and accuracy of retrieving multi-source heterogeneous data, and improves the integrity of the extracted data.

Drawings

FIG. 1 is a schematic flow diagram of the Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to the present invention;

FIG. 2 is a schematic flow diagram of calculating the flow session feature vector and the regression model prediction vector of each split training sample in the Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to the present invention;

FIG. 3 is a schematic flow diagram of merging, in the Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to the present invention, similar coding feature information among the coding feature information of the subsets via the state transition edges corresponding to the label type, to obtain the flow session feature vector of each split training sample.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.

Accordingly, the following detailed description of the embodiments shown in the figures is not intended to limit the claimed scope of the invention but merely represents selected embodiments of the invention.

Examples

Referring to FIGS. 1-3, the present invention provides the following technical solution:

As a technical optimization of the invention, in S2, the standardized, aggregated multi-source heterogeneous Internet of Things data is analyzed and processed by filtering, similar-record merging and de-duplication.

As a technical optimization of the invention, S3 specifically comprises steps S31 to S36 described above.

As a technical optimization of the invention, calculating the flow session feature vector and the regression model prediction vector of each split training sample in S5 comprises steps S51 to S53 described above.

As a technical optimization of the invention, in S52, similar coding feature information among the coding feature information of the subsets is merged via the state transition edges corresponding to the label type through steps S521 to S523 described above.

As a technical optimization of the invention, the cloud database in S7 further associates the received data with the tags in the tag group, and the associated tags form a cluster.

As a technical optimization of the invention, the cloud database in S7 further extracts and classifies the clusters to form a classification directory, and sends out the classification directory.

As a technical optimization of the invention, when the tag field is sent to the cloud database for retrieval in S8, the feature vectors extracted from the tag field are assembled into a vector set, each feature vector comprising keywords and feature weights.

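As a technical optimization of the invention, sending the tag field to the cloud database for retrieval in S8 further comprises calculating the keyword word frequency with the preset formula, calculating the update weight and adjusting the feature weight as described above, where $L_i$ is the word frequency, $TF$ is the term frequency, and $C_{total}$ is the total number of words.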

In summary, the invention is built on an Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology. It optimizes the analysis of data of a given type or containing a given field, processes multi-source business data jointly, and extracts data according to a configured mode, thereby greatly reducing data transmission to the cloud, saving substantial equipment and operation-and-maintenance staffing costs, effectively improving the efficiency and accuracy of retrieving multi-source heterogeneous data, and improving the integrity of the extracted data.

Although embodiments of the present invention have been shown and described, those skilled in the art will appreciate that changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. An internet of things edge algorithm based on a multi-source heterogeneous data aggregation technology is characterized by comprising the following steps:

2. The Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to claim 1, wherein in S2 the standardized, aggregated multi-source heterogeneous Internet of Things data is analyzed and processed by filtering, similar-record merging and de-duplication.

3. The internet of things edge algorithm based on the multi-source heterogeneous data aggregation technology according to claim 1, wherein the S3 specifically includes the following steps:

4. The Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to claim 1, wherein calculating the flow session feature vector and the regression model prediction vector of each split training sample in S5 comprises the following steps:

5. The Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to claim 4, wherein, in S52, merging similar coding feature information among the coding feature information of the subsets via the state transition edges corresponding to the label type, to obtain the flow session feature vector of each split training sample, comprises the following steps:

6. The Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to claim 1, wherein the cloud database in S7 further associates the received data with the tags in the tag group, and the associated tags form a cluster.

7. The internet of things edge algorithm based on the multi-source heterogeneous data aggregation technology according to claim 1, wherein the cloud database in S7 further extracts and classifies the clusters to form a classification directory, and sends out the classification directory.

8. The Internet of Things edge algorithm based on multi-source heterogeneous data aggregation technology according to claim 1, wherein in S8 the tag field is sent to the cloud database for retrieval by assembling the feature vectors extracted from the tag field into a vector set, the feature vectors comprising keywords and feature weights.

9. The internet of things edge algorithm based on the multi-source heterogeneous data aggregation technology according to claim 1, wherein the manner of sending the tag field to the cloud database for retrieval in S8 further includes:

where $L_i$ is the word frequency, $TF$ is the term frequency, and $C_{total}$ is the total number of words;