CN116226755A - Real-time data identification method based on big data - Google Patents
- Publication number
- CN116226755A (application number CN202310520706.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- real
- identification
- data identification
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Complex Calculations (AREA)
Abstract
The invention relates to the technical field of real-time data identification, and in particular to a real-time data identification method based on big data. A region to be detected is set, and the data to be detected generated during interaction between a terminal and a server is acquired. A convolutional neural network extracts the features of the data to be detected, and the output probability of a Softmax function is used to obtain the real-time data. Data identifiers constructed with a deep learning neural network architecture are set to perform data identification; the real-time data is classified using data identification pools, and a plurality of data identifiers are combined, through the neural network architecture and an identification model, to identify the real-time data in the data identification pools. A specific data identification value is obtained from the data identifier formula of each category of data identification pool and compared with the threshold range, so that the category of the real-time data is judged and the accuracy of identification is improved.
Description
Technical Field
The invention relates to the field of real-time data identification, in particular to a real-time data identification method based on big data.
Background
With the development of technology and the arrival of the big data era, people generate interactive data at all times in daily life and work, producing large amounts of text, image and audio data every day. Much of this data cannot be identified in real time during operation and may contain malicious or unknown data. The data therefore needs to be identified in real time and its type judged, so as to handle the large volume of interactive data generated in daily life.
Disclosure of Invention
The invention aims to provide a real-time data identification method based on big data.
The aim of the invention can be achieved by the following technical scheme: a real-time data identification method based on big data, the method performing the steps of:
step one: acquiring data to be detected generated in the interaction process of the terminal and the server;
step two: extracting the characteristics of the data to be detected by using a convolutional neural network and obtaining real-time data through a Softmax function;
step three: setting a data identifier constructed by a deep learning neural network architecture, and placing the data identifier in a data identification pool;
step four: and identifying the real-time data by using the data identification pool, and judging the type of the real-time data.
Further, the process of obtaining the data to be detected generated in the process of interaction between the terminal and the server comprises the following steps:
acquiring interactive data generated between a terminal and a server through a region to be detected;
the database stores the acquired interactive data through a data table, each row of the data table stores a group of data, and the data stored in the database is used as the data to be detected.
Further, the process of judging the data to be detected as real-time data includes:
classifying the data to be detected, and respectively marking each type of data to be detected to form a training sample set;
the training sample set is sent to a convolutional neural network for training, a Softmax function is utilized to regress and output the corresponding actual output value of the marking data, and a cross entropy loss function is adopted to judge the error of the actual output value and the expected output value;
judging whether the data to be detected is real-time data or not according to the error between the actual output value and the expected output value;
the Softmax function regression processes the output as $y' = \mathrm{softmax}(y_x)$, and the cross entropy loss function is $H(y, y') = -\sum_x y_x \log y'_x$, wherein y is the desired output value, y' is the actual output value, and x is the mark corresponding to each kind of data;
and setting a threshold range for the actual output value according to the cross entropy data, wherein the threshold range of the expected output value is (0, 1); when the actual output value Q is smaller than 0.5, the data to be detected is judged to be real-time data, and when the actual output value Q is larger than 0.5, the data to be detected is judged to be non-real-time data.
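The Softmax output, the cross entropy error and the 0.5 threshold rule described above can be sketched roughly as follows. This is only an illustration in plain Python: the function names, the raw scores and the choice of natural logarithm are assumptions, and the text does not pin down exactly which quantity the actual output value Q denotes, so `is_real_time` simply takes it as a parameter.

```python
import math

def softmax(scores):
    """Map raw network scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(expected, actual, eps=1e-12):
    """Cross entropy -sum(y * ln(y')) between desired y and actual y'."""
    return -sum(y * math.log(max(q, eps)) for y, q in zip(expected, actual))

def is_real_time(actual_output_value, threshold=0.5):
    """Threshold rule from the text: a value below the threshold means real-time data."""
    return actual_output_value < threshold

# Hypothetical raw scores for one sample with two classes.
probs = softmax([2.0, 0.5])
loss = cross_entropy([1.0, 0.0], probs)
```

A smaller `loss` means the actual output is closer to the expected output, which is how the error between the two is judged.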
Further, the process for realizing the identification of the real-time data comprises the following steps:
setting a data identifier constructed by a deep learning neural network architecture;
different kinds of data identification pools are arranged in the data identifier, real-time data are identified by using the different kinds of data identification pools, corresponding data identification values are output, and the types of the real-time data are judged according to the obtained data identification values;
the deep learning neural network architecture consists of multiple layers of stochastic hidden variables, wherein the top two layers have undirected symmetric connections and each lower layer receives top-down directed connections from the layer above; the state of the bottommost units is the visible input data vector. The architecture is built from stacked structural units, each of which is a restricted Boltzmann machine (RBM); the number of visible-layer neurons of each RBM unit in the stack is equal to the number of hidden-layer neurons of the previous RBM unit. According to the deep learning mechanism, the first-layer RBM unit is trained on the input samples, and the second-layer RBM unit is trained on the output of the first-layer RBM unit; stacking RBM units improves model performance by adding layers. In the unsupervised pre-training process, the deep belief network (DBN) encodes the input up to the top-layer RBM and then decodes the state of the top layer back down to the bottommost units, thereby reconstructing the input; the RBM, as the structural unit of the DBN, shares parameters with each layer of the DBN.
Further, the types of the data identification pool include: an image data identification pool, a text data identification pool and a voice data identification pool;
marking the data identification value output by the image data identification pool as an image data identification value;
marking the data identification value output by the text data identification pool as a text data identification value;
marking the data identification value output by the voice data identification pool as a voice data identification value;
the image data identifier uses the formula: [formula not reproduced in the source text]; wherein A is the data header mark of the data to be identified, with value range (1, 3); N is the number of items of data to be identified; Q is the calculated image data identification value; P is the probability of occurrence of a given item in the data to be identified; L is the number of bits of a given item in the data to be identified; $y_{ij}$ is the ordinate value of a point of the data matrix corresponding to a given item in the data to be identified; $x_{ij}$ is the abscissa value of that point; and $\varphi$ is a gradient function;
the text data identifier uses the formula: [formula not reproduced in the source text]; wherein B is an adjustment coefficient with value range (20, 50), and R is the calculated text data identification value;
the voice data identifier uses the formula: [formula not reproduced in the source text]; wherein C is an adjustment coefficient with value range (1, 5), and W is the calculated voice data identification value.
Further, the process of judging the real-time data category includes:
when the obtained data identification value is within the image threshold range, the real-time data is image data;
when the obtained data identification value is within the text threshold range, the real-time data is text data;
when the obtained data identification value is within the sound threshold range, the real-time data is sound data.
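The category judgment above amounts to checking which threshold range an identification value falls into. The sketch below illustrates this in Python; the numeric ranges are purely hypothetical placeholders (the source does not state the actual thresholds), and treating each range as half-open is likewise an assumption.

```python
# Hypothetical threshold ranges; the source does not state the actual values.
THRESHOLD_RANGES = {
    "image": (0.0, 1.0),
    "text": (1.0, 2.0),
    "sound": (2.0, 3.0),
}

def judge_category(identification_value, ranges=THRESHOLD_RANGES):
    """Return the first category whose half-open range [low, high) contains the value."""
    for category, (low, high) in ranges.items():
        if low <= identification_value < high:
            return category
    return None  # the value falls outside every configured range
```

For example, with these placeholder ranges `judge_category(1.5)` returns `"text"`, and a value outside all ranges returns `None` so the caller can treat it as unclassified.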
Compared with the prior art, the invention has the following beneficial effects: data identification is automated through a neural network model. A convolutional neural network extracts the features of the data to be detected, and the output probability of a Softmax function is used to obtain the real-time data; data identifiers constructed with a deep learning neural network architecture perform the data identification. A plurality of data identifiers are provided and combined for identification; a specific value is obtained from the data identifier formula of each category and compared with the corresponding value range to judge the category of the real-time data, which ensures data identification efficiency and improves identification accuracy.
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
As shown in fig. 1, a real-time data identification method based on big data includes the following steps:
step one: acquiring data to be detected generated in the interaction process of the terminal and the server;
step two: extracting the characteristics of the data to be detected by using a convolutional neural network and obtaining real-time data through a Softmax function;
step three: setting a data identifier constructed by a deep learning neural network architecture, and placing the data identifier in a data identification pool;
step four: identifying the real-time data by using a data identification pool, and judging the type of the real-time data;
setting the region to be detected as the interactive data generated by an online lottery activity, in which a user performs one online lottery draw through a terminal; the lottery activity is a data interaction service in which the user first logs in to the lottery page using the terminal, then performs the lottery operation, and obtains the corresponding reward after the lottery is completed;
during one complete service interaction, all generated data is stored in a database in real time; the database stores this big data in a data table according to a certain rule, for example the login time sent by the terminal, the lottery operation time, and the time interval between login and the lottery operation; each row of the data table stores one group of data according to that rule, and one row of the data table can be used as one item of data to be detected.
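As a rough illustration of the storage step, the sketch below keeps one interaction per table row using Python's built-in `sqlite3` module. The table name, the column names and the use of numeric timestamps are assumptions made for the example; the source only says that login time, lottery operation time and the interval between them are stored.

```python
import sqlite3

def store_interactions(rows):
    """Store (login_time, draw_time) pairs in an in-memory table, one group per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interactions ("
        "login_time REAL, draw_time REAL, interval_seconds REAL)"
    )
    conn.executemany(
        "INSERT INTO interactions VALUES (?, ?, ?)",
        [(login, draw, draw - login) for login, draw in rows],
    )
    conn.commit()
    return conn

def data_to_be_detected(conn):
    """Read each stored row back as one item of data to be detected."""
    return conn.execute(
        "SELECT login_time, draw_time, interval_seconds "
        "FROM interactions ORDER BY rowid"
    ).fetchall()

# Two hypothetical interactions: (login timestamp, lottery operation timestamp).
conn = store_interactions([(100.0, 112.5), (200.0, 203.0)])
samples = data_to_be_detected(conn)
```

Each tuple in `samples` corresponds to one table row and can be passed on as a single sample to the later steps.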
The process of extracting the features of the data to be detected with the convolutional neural network and obtaining the real-time data from the output probability of the Softmax function comprises the following steps:
firstly, the different data acquired as data to be detected is classified and marked to produce a training sample set: the acquired login request time, the IP address of the terminal, the terminal's login to the lottery page and the lottery operation time period are taken as the data to be detected; the data to be detected is divided into real-time data and non-real-time data, and each type is marked respectively to form the training sample set;
inputting the training sample set into a convolutional neural network for training to construct a convolutional neural network model, using a Softmax function to regress and output the actual output values of the various kinds of interactive data, and using the cross entropy loss function $H(y, y') = -\sum_x y_x \log y'_x$ to judge the error between the actual output value and the expected output value;
wherein y is the desired output value, y' is the actual output value, and x is the mark corresponding to each kind of data to be detected;
finally, extracting the characteristics of the data set of the acquired data to be detected by using a convolutional neural network, and dividing the data to be detected into real-time data and non-real-time data according to the actual output value;
through Softmax regression, the actual output values of the various kinds of data to be detected sum to 1; the smaller the error between the actual output value and the expected output value, the more accurate the identification result;
for example:
setting the number of data identifiers n = 2, the expected output value p = (1, 0, 0), and the actual output values $Q_1$ = (0.5, 0.2, 0.3) and $Q_2$ = (0.7, 0.1, 0.2), where $Q_1$ and $Q_2$ respectively represent the actual output values of different data identifiers; the data identifiers are optimized by calculating the corresponding cross entropy, and the magnitude of the result corresponds to the accuracy of the output for the data to be detected;
the obtained accuracy is: [formula not reproduced in the source text]
the error between the actual output value and the expected output value is obtained from the accuracy and is used to judge whether the data to be detected is real-time data: [formula not reproduced in the source text]
An actual output value threshold is set according to the cross entropy data and denoted K; in a specific implementation, K = 0.5;
when the actual output value is less than or equal to K, the data to be detected is judged to be real-time data, and when the actual output value is greater than K, the data to be detected is judged to be non-real-time data.
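The cross entropy comparison in the example can be worked through numerically. The sketch below assumes the expected output is the one-hot vector (1, 0, 0), so that its length matches the three-component actual outputs, and assumes natural logarithms; the source states neither the logarithm base nor the resulting figures, so no numeric result from the patent is reproduced here.

```python
import math

def cross_entropy(expected, actual):
    """Compute -sum(p * ln(q)), skipping terms where p is 0."""
    return -sum(p * math.log(q) for p, q in zip(expected, actual) if p > 0)

p = (1.0, 0.0, 0.0)   # assumed one-hot expected output
q1 = (0.5, 0.2, 0.3)  # actual output of the first data identifier
q2 = (0.7, 0.1, 0.2)  # actual output of the second data identifier

h1 = cross_entropy(p, q1)  # equals -ln(0.5)
h2 = cross_entropy(p, q2)  # equals -ln(0.7)

# The identifier with the smaller cross entropy is closer to the expected output.
better = "Q2" if h2 < h1 else "Q1"
```

Under these assumptions the second identifier has the smaller cross entropy, which matches the statement that a smaller error means a more accurate identification result.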
It should be further noted that, in the implementation process, the training process for the neural network architecture includes:
the neural network architecture consists of multiple layers of stochastic hidden variables, a bottom-layer unit and structural units, wherein the hidden variables comprise upper-layer hidden variables and lower-layer hidden variables; the upper-layer hidden variables have undirected symmetric connections, and the lower-layer hidden variables have directed connections;
the bottom layer unit is composed of a plurality of stacks of 2F structural units;
the structure units are RBM units, and the number of visible layer neurons of each RBM unit in the structure unit stack is equal to the number of hidden layer neurons of the previous RBM unit;
training a first-layer RBM unit through a training sample set according to a deep learning mechanism, and training a second-layer RBM unit by utilizing a training result of the first-layer RBM;
RBM units are stacked to improve model performance by adding layers; in the unsupervised training process, after the training result of each layer of RBM units is input to the top-layer RBM unit, the top-layer RBM unit is decoded and its state is passed down to the bottommost RBM unit, thereby reconstructing the input and sharing parameters with each layer of the DBN.
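The greedy layer-wise training of stacked RBM units and the encode-then-decode reconstruction can be sketched as below. This is a minimal illustration in plain Python, not the patented implementation: binary inputs, one-step contrastive divergence (CD-1), the learning rate, the layer sizes and the toy samples are all assumptions that the source does not specify.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Tiny RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.n_visible, self.n_hidden, self.rng = n_visible, n_hidden, rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hidden[j] +
                        sum(v[i] * self.w[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        return [sigmoid(self.b_visible[i] +
                        sum(h[j] * self.w[i][j] for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def train(self, data, epochs=50, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)  # reconstruction of the input
                h1 = self.hidden_probs(v1)
                for i in range(self.n_visible):
                    for j in range(self.n_hidden):
                        self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b_visible[i] += lr * (v0[i] - v1[i])
                for j in range(self.n_hidden):
                    self.b_hidden[j] += lr * (h0[j] - h1[j])

def train_stack(data, layer_sizes, seed=0):
    """Greedy layer-wise training: each RBM is trained on the previous RBM's output."""
    rng = random.Random(seed)
    stack, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        rbm.train(layer_input)
        stack.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return stack

def reconstruct(stack, v):
    """Encode bottom-up to the top RBM, then decode top-down to the visible layer."""
    for rbm in stack:
        v = rbm.hidden_probs(v)
    for rbm in reversed(stack):
        v = rbm.visible_probs(v)
    return v

# Hypothetical toy samples with four binary features each.
samples = [[1, 1, 0, 0], [0, 0, 1, 1]] * 5
stack = train_stack(samples, layer_sizes=[3, 2])
rebuilt = reconstruct(stack, [1, 1, 0, 0])
```

Because each RBM is constructed from the width of the previous layer's output, the visible size of every unit automatically equals the hidden size of the unit below it, which is the stacking constraint described in the text.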
When identifying the real-time data, the types of the data identification pool include: an image data identification pool, a text data identification pool and a voice data identification pool;
it should be further noted that, in the implementation process, the same content and the same type of real-time data are collected together to form a corresponding data matrix.
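Collecting real-time data of the same type into one data matrix could look like the following Python sketch. The record layout, a `(data_type, feature_vector)` pair, and the sample values are hypothetical; the source does not describe how the rows of the matrix are encoded.

```python
from collections import defaultdict

def build_data_matrices(records):
    """Group (data_type, feature_vector) records into one matrix per data type."""
    matrices = defaultdict(list)
    for data_type, features in records:
        matrices[data_type].append(list(features))  # one matrix row per record
    return dict(matrices)

# Hypothetical records; each feature vector becomes a row of its type's matrix.
records = [
    ("image", [0.1, 0.9]),
    ("text", [3.0, 4.0]),
    ("image", [0.2, 0.8]),
]
matrices = build_data_matrices(records)
```

Each value in `matrices` is a list of rows, so a point of the data matrix can be addressed by its row and column indices.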
It should be further noted that, in the implementation process, the formula used by the data identifier for identifying image data is: [formula not reproduced in the source text];
wherein A is a data identification coefficient of the data to be identified, with value range (1, 3); N is the number of items of data to be identified; Q is the calculated image data identification value; P is the probability of occurrence of a given item in the data to be identified; L is the data length of the data to be identified; $y_{ij}$ is the ordinate value of a point of the data matrix corresponding to a given item in the data to be identified, and $x_{ij}$ is the abscissa value of that point; when the calculated image data identification value is within the set image identification threshold range, the data identified by the image data identifier is judged to be image data.
It should be further noted that, in the implementation process, the formula used by the data identifier for identifying text data is: [formula not reproduced in the source text];
wherein B is an adjustment coefficient with value range (20, 50), and R is the calculated text data identification value; when the calculated text data identification value is within the set text identification threshold range, the data identified by the text data identifier is judged to be text data.
It should be further noted that, in the implementation process, the formula used by the data identifier for identifying voice data is: [formula not reproduced in the source text];
wherein C is an adjustment coefficient with value range (1, 5), and W is the calculated voice data identification value; when the calculated voice data identification value is within the set voice identification threshold range, the data identified by the voice data identifier is judged to be voice data.
When interactive data is generated in the online lottery activity, the convolutional neural network extracts the features of the data to be detected from the interactive data, the output probability of the Softmax function is used to obtain the real-time data, and the real-time data is then categorized:
when the obtained data identification value is within the image threshold range, the real-time data is image data;
when the obtained data identification value is within the text threshold range, the real-time data is text data;
when the obtained data identification value is within the sound threshold range, the real-time data is sound data.
Working principle: a region to be detected is set, and the data to be detected generated during interaction between a terminal and a server is acquired; a convolutional neural network extracts the features of the data to be detected, and the output probability of a Softmax function is used to obtain the real-time data; data identifiers constructed with a deep learning neural network architecture are set to perform data identification; the real-time data is classified by category into image data, text data and sound data, identification values are obtained from the data identifier formulas of the different categories, and the category of the real-time data is judged according to the value range of each category;
a plurality of data identifiers are set and combined for identification; a specific value is obtained from the data identifier formula of each category and compared with the corresponding value range to judge the category of the real-time data, which ensures data identification efficiency and improves identification accuracy.
The above embodiments are only for illustrating the technical method of the present invention and not for limiting the same, and it should be understood by those skilled in the art that the technical method of the present invention may be modified or substituted without departing from the spirit and scope of the technical method of the present invention.
Claims (6)
1. A real-time data identification method based on big data, characterized in that the method performs the following steps:
step one: acquiring data to be detected generated in the interaction process of the terminal and the server;
step two: extracting the characteristics of the data to be detected by using a convolutional neural network and obtaining real-time data through a Softmax function;
step three: setting a data identifier constructed by a deep learning neural network architecture, and placing the data identifier in a data identification pool;
step four: and identifying the real-time data by using the data identification pool, and judging the type of the real-time data.
2. The real-time data identification method based on big data according to claim 1, wherein the obtaining process of the data to be detected generated in the interaction process of the terminal and the server comprises the following steps:
acquiring interactive data generated between a terminal and a server through a region to be detected;
the database stores the acquired interactive data through a data table, each row of the data table stores a group of data, and the interactive data stored in the database is used as data to be detected.
3. The real-time data identification method based on big data according to claim 2, wherein the process of judging the data to be detected as real-time data comprises:
classifying the data to be detected, and respectively marking each type of data to be detected to form a training sample set;
the training sample set is sent to a convolutional neural network for training, the Softmax function is used for regressing the output mark data to obtain a corresponding actual output value, and the cross entropy loss function is used for judging the error between the actual output value and the expected output value;
and judging whether the data to be detected is real-time data or not according to the error between the actual output value and the expected output value.
4. A real-time data identification method based on big data according to claim 3, wherein the identification process of real-time data is realized by:
setting a data identifier constructed by a deep learning neural network architecture;
different kinds of data identification pools are arranged in the data identifier, real-time data are identified by using the different kinds of data identification pools, corresponding data identification values are output, and the type of the real-time data is judged according to the obtained data identification values.
5. The real-time data identification method based on big data according to claim 4, wherein the kinds of the data identification pool include: an image data identification pool, a text data identification pool and a voice data identification pool;
marking the data identification value output by the image data identification pool as an image data identification value;
marking the data identification value output by the text data identification pool as a text data identification value;
and marking the data identification value output by the voice data identification pool as a voice data identification value.
6. The method for real-time data identification based on big data according to claim 5, wherein the process of judging the type of the real-time data comprises:
when the obtained data identification value is within the image threshold range, the real-time data is image data;
when the obtained data identification value is within the text threshold range, the real-time data is text data;
when the obtained data identification value is within the sound threshold range, the real-time data is sound data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310520706.XA CN116226755A (en) | 2023-05-10 | 2023-05-10 | Real-time data identification method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310520706.XA CN116226755A (en) | 2023-05-10 | 2023-05-10 | Real-time data identification method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116226755A true CN116226755A (en) | 2023-06-06 |
Family
ID=86570084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310520706.XA Pending CN116226755A (en) | 2023-05-10 | 2023-05-10 | Real-time data identification method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116226755A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111934437A (en) * | 2020-09-22 | 2020-11-13 | 中科全维科技(苏州)有限公司 | Active power distribution network big data transmission method based on behavior mark and lightweight encryption |
CN113642679A (en) * | 2021-10-13 | 2021-11-12 | 山东凤和凰城市科技有限公司 | Multi-type data identification method |
CN115879514A (en) * | 2022-12-06 | 2023-03-31 | 深圳大学 | Method and device for improving class correlation prediction, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20230606 |