CN116226755A - Real-time data identification method based on big data - Google Patents


Publication number
CN116226755A
Authority
CN
China
Prior art keywords
data
real
identification
data identification
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310520706.XA
Other languages
Chinese (zh)
Inventor
杨吉伟
许柳飞
杜伟豪
陈健斌
梁伟锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Weixin Zhilian Technology Co ltd
Original Assignee
Guangdong Weixin Zhilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-05-10
Filing date
2023-05-10
Publication date
2023-06-06
Application filed by Guangdong Weixin Zhilian Technology Co ltd
Priority to CN202310520706.XA
Publication of CN116226755A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of real-time data identification, and in particular to a real-time data identification method based on big data. A region to be detected is set, and the data to be detected generated during interaction between a terminal and a server is acquired. A convolutional neural network extracts the features of the data to be detected, and the output probability of a Softmax function is used to obtain real-time data. Data identifiers constructed from a deep learning neural network architecture are set up to perform data identification; the real-time data is classified using data identification pools, and a plurality of data identifiers are combined to identify the real-time data in each pool through the neural network architecture and an identification model. A specific data identification value is obtained from the data identifier formula of each category of data identification pool and compared with a threshold range to judge the category of the real-time data, which improves identification accuracy.

Description

Real-time data identification method based on big data
Technical Field
The invention relates to the field of real-time data identification, in particular to a real-time data identification method based on big data.
Background
With the development of technology and the arrival of the big data era, people generate interactive data continuously in daily life and work, producing large amounts of text, image and audio data every day. Much of this data cannot be identified in real time while systems are running, and it may contain malicious or unknown data. The data therefore needs to be identified in real time and its type judged, in order to handle the large volume of interactive data generated in daily life.
Disclosure of Invention
The invention aims to provide a real-time data identification method based on big data.
The aim of the invention can be achieved by the following technical scheme: a real-time data identification method based on big data, the method performing the steps of:
step one: acquiring data to be detected generated in the interaction process of the terminal and the server;
step two: extracting the characteristics of the data to be detected by using a convolutional neural network and obtaining real-time data through a Softmax function;
step three: setting a data identifier constructed by a deep learning neural network architecture, and placing the data identifier in a data identification pool;
step four: identifying the real-time data by using the data identification pool, and judging the type of the real-time data.
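The four steps above can be read as a simple filter-then-classify loop. The following is a minimal sketch under stated assumptions, not the patented implementation: `identify_stream`, the string records, and the lambda stand-ins for the trained models are all hypothetical names introduced only for illustration.

```python
def identify_stream(records, is_real_time, pool):
    """Run steps two to four over records already acquired in step one."""
    results = []
    for record in records:
        # Step two: keep only records judged to be real-time data.
        if not is_real_time(record):
            continue
        # Steps three and four: try each identifier held in the pool.
        category = None
        for name, identifier in pool.items():
            if identifier(record):
                category = name
                break
        results.append((record, category))
    return results


# Stand-in callables; a real system would plug trained models in here.
records = ["img:cat", "txt:hello", "stale"]
pool = {
    "image": lambda r: r.startswith("img:"),
    "text": lambda r: r.startswith("txt:"),
    "sound": lambda r: r.startswith("snd:"),
}
results = identify_stream(records, lambda r: ":" in r, pool)
```

Passing the real-time test and the pool in as arguments keeps the control flow separate from whichever models are eventually used for each step.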
Further, the process of obtaining the data to be detected generated in the process of interaction between the terminal and the server comprises the following steps:
acquiring interactive data generated between a terminal and a server through a region to be detected;
the database stores the acquired interactive data through a data table, each row of the data table stores a group of data, and the data stored in the database is used as the data to be detected.
Further, the process of judging the data to be detected as real-time data includes:
classifying the data to be detected, and respectively marking each type of data to be detected to form a training sample set;
the training sample set is sent to a convolutional neural network for training; Softmax regression outputs the actual output value corresponding to the labelled data, and a cross-entropy loss function measures the error between the actual output value and the expected output value;
judging whether the data to be detected is real-time data or not according to the error between the actual output value and the expected output value;
the Softmax regression output is $y' = \mathrm{softmax}(y_x)$, and the cross-entropy loss function is
$$H(y, y') = -\sum_{x} y_x \log y'_x$$
where y is the expected output value, y' is the actual output value, and x is the label corresponding to each type of data;
an actual output value threshold is set according to the cross-entropy data, the expected output value lying in the range (0, 1); when the actual output value Q is smaller than 0.5, the data to be detected is judged to be real-time data, and when the actual output value Q is larger than 0.5, the data to be detected is judged to be non-real-time data.
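The Softmax output, the cross-entropy error, and the 0.5 threshold rule described above can be sketched as follows. This is an illustrative sketch only: the logits are made-up values standing in for the output of a trained convolutional neural network, the natural logarithm is assumed, and applying the threshold to the cross-entropy error (rather than to a raw probability) is one possible reading of the text.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(expected, actual, eps=1e-12):
    """H(y, y') = -sum_x y_x * log(y'_x), using the natural logarithm."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(expected, actual))


def is_real_time(value, threshold=0.5):
    """Rule from the text: a value below the threshold means real-time data."""
    return value < threshold


logits = [2.0, 0.5]      # hypothetical scores for (real-time, non-real-time)
expected = [1.0, 0.0]    # one-hot label: real-time
actual = softmax(logits)
loss = cross_entropy(expected, actual)
decision = is_real_time(loss)
```

With a one-hot expected output, the cross entropy reduces to the negative log of the probability assigned to the labelled class, so a confident correct prediction yields a small error.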
Further, the process for realizing the identification of the real-time data comprises the following steps:
setting a data identifier constructed by a deep learning neural network architecture;
different kinds of data identification pools are arranged in the data identifier, real-time data are identified by using the different kinds of data identification pools, corresponding data identification values are output, and the types of the real-time data are judged according to the obtained data identification values;
the deep learning neural network architecture is a deep belief network (DBN) composed of multiple layers of stochastic hidden variables: the top two layers are joined by undirected symmetric connections, and each lower layer receives top-down directed connections from the layer above; the state of the bottom-layer units is the visible input data vector. The architecture is built from a stack of structural units, each of which is a restricted Boltzmann machine (RBM); the number of visible-layer neurons of each RBM unit in the stack equals the number of hidden-layer neurons of the preceding RBM unit. Following the deep learning mechanism, the first-layer RBM unit is trained on the input samples, and the second-layer RBM unit is trained on the output of the first-layer RBM unit; stacking RBM units in this way improves model performance by adding layers. During unsupervised pre-training, the DBN encodes the input up to the top-layer RBM and then decodes the top-layer state back down to the bottom-layer units to reconstruct the input; as the structural units of the DBN, the RBMs share parameters with the corresponding layers of the DBN.
Further, the types of the data identification pool include: an image data identification pool, a text data identification pool, and a voice data identification pool;
marking the data identification value output by the image data identification pool as an image data identification value;
marking the data identification value output by the text data identification pool as a text data identification value;
marking the data identification value output by the voice data identification pool as a voice data identification value;
the image data identifier uses the formula:
[Formula image SMS_2 is not reproduced in the source text]
where A is the data head mark of the data to be identified, with value range (1, 3); N is the number of data items to be identified; Q is the calculated image data identification value; P is the probability of occurrence of a given data item in the data to be identified; L is the number of bits of a given data item in the data to be identified; y_ij is the ordinate of a point of the data matrix corresponding to a given data item in the data to be identified; x_ij is the abscissa of that point; and φ is a gradient function;
the text data identifier uses the formula:
[Formula image SMS_3 is not reproduced in the source text]
where B is an adjustment coefficient with value range (20, 50), and R is the calculated text data identification value;
the voice data identifier uses the formula:
[Formula image SMS_4 is not reproduced in the source text]
where C is an adjustment coefficient with value range (1, 5), and W is the calculated voice data identification value.
Further, the process of judging the real-time data category includes:
when the obtained data identification value falls within the image threshold range, the real-time data is image data;
when the obtained data identification value falls within the text threshold range, the real-time data is text data;
when the obtained data identification value falls within the sound threshold range, the real-time data is sound data.
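The category decision above amounts to checking each pool's identification value against that category's threshold range. The sketch below assumes hypothetical ranges and half-open intervals, since the text publishes neither concrete threshold values nor how boundary values are treated; `THRESHOLD_RANGES` and `judge_category` are illustrative names.

```python
# Hypothetical threshold ranges; the source gives no concrete values.
THRESHOLD_RANGES = {
    "image": (0.0, 1.0),
    "text": (1.0, 2.0),
    "sound": (2.0, 3.0),
}


def judge_category(values, ranges=THRESHOLD_RANGES):
    """Return the first category whose identification value lies in its range.

    `values` maps a pool name to the identification value that pool produced.
    Intervals are treated as half-open, low <= value < high (an assumption).
    """
    for category, value in values.items():
        low, high = ranges[category]
        if low <= value < high:
            return category
    return None


category = judge_category({"image": 5.0, "text": 1.5, "sound": 9.0})
unknown = judge_category({"image": -1.0})
```

Returning `None` when no range matches makes unrecognised data explicit instead of forcing it into one of the three categories.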
Compared with the prior art, the invention has the following beneficial effects: data identification is automated through neural network models. A convolutional neural network extracts the features of the data to be detected, and the output probability of a Softmax function is used to obtain real-time data; data identifiers constructed from a deep learning neural network architecture perform the identification. A plurality of data identifiers are set up and combined for identification; a specific value is obtained from the data identifier formula of each category and compared with that category's value range to judge the category of the real-time data, which ensures identification efficiency and improves identification accuracy.
Drawings
Fig. 1 is a schematic diagram of the present invention.
Detailed Description
As shown in fig. 1, a real-time data identification method based on big data includes the following steps:
step one: acquiring data to be detected generated in the interaction process of the terminal and the server;
step two: extracting the characteristics of the data to be detected by using a convolutional neural network and obtaining real-time data through a Softmax function;
step three: setting a data identifier constructed by a deep learning neural network architecture, and placing the data identifier in a data identification pool;
step four: identifying the real-time data by using a data identification pool, and judging the type of the real-time data;
the region to be detected is set to the interactive data generated by an online lottery activity; each time a user takes part in the online lottery through a terminal, the lottery activity constitutes one data interaction service: the user first logs in to the lottery page from the terminal, then performs the lottery operation, and obtains the corresponding reward once the lottery is completed;
during one complete service interaction, all generated data is stored in a database in real time. The database stores this big data in a data table according to a certain rule, for example by the login time sent by the terminal, the lottery operation time, and the time interval between login and lottery operation; each row of the data table stores one group of data organised by that rule, and one row of the data table can be used as one piece of data to be detected.
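The row-per-interaction storage described above can be sketched with Python's built-in `sqlite3` module. The table name, column names, and sample values are assumptions chosen for illustration; the source only names login time, lottery operation time, and the interval between them as example fields.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the patent.
SCHEMA = """
CREATE TABLE interaction (
    id INTEGER PRIMARY KEY,
    terminal_ip TEXT NOT NULL,
    login_time REAL NOT NULL,
    lottery_time REAL NOT NULL
)
"""


def store_interaction(conn, terminal_ip, login_time, lottery_time):
    """Insert one group of interaction data as a single table row."""
    conn.execute(
        "INSERT INTO interaction (terminal_ip, login_time, lottery_time) "
        "VALUES (?, ?, ?)",
        (terminal_ip, login_time, lottery_time),
    )


def load_samples(conn):
    """Return each row as one piece of data to be detected, plus the interval."""
    rows = conn.execute(
        "SELECT terminal_ip, login_time, lottery_time "
        "FROM interaction ORDER BY id"
    ).fetchall()
    return [
        {"terminal_ip": ip, "login_time": t0, "lottery_time": t1,
         "interval": t1 - t0}
        for ip, t0, t1 in rows
    ]


conn = sqlite3.connect(":memory:")   # in-memory database for the example
conn.execute(SCHEMA)
store_interaction(conn, "192.0.2.10", 1000.0, 1012.5)
store_interaction(conn, "192.0.2.11", 2000.0, 2001.0)
samples = load_samples(conn)
```

Deriving the interval at read time keeps the stored row minimal; storing it as its own column would be an equally reasonable choice.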
The process of using a convolutional neural network to extract the features of the data to be detected and the output probability of the Softmax function to obtain real-time data comprises the following steps:
first, the different data collected as data to be detected is classified and labelled to produce a training sample set; the collected login request time, terminal IP address, terminal login to the lottery page, and lottery operation time period are taken as the data to be detected; the data to be detected is divided into real-time data and non-real-time data, and each type is labelled separately to form the training sample set;
the training sample set is input into a convolutional neural network for training to build a convolutional neural network model; Softmax regression outputs the actual output values of the various interactive data, and a cross-entropy loss function measures the error between the actual output value and the expected output value;
where the actual output value is $y' = \mathrm{softmax}(y_x)$, and the cross-entropy loss function is
$$H(y, y') = -\sum_{x} y_x \log y'_x$$
where y is the expected output value, y' is the actual output value, and x is the label corresponding to each type of data to be detected;
finally, extracting the characteristics of the data set of the acquired data to be detected by using a convolutional neural network, and dividing the data to be detected into real-time data and non-real-time data according to the actual output value;
Softmax regression makes the actual output values over all types of data to be detected sum to 1; the smaller the error between the actual output value and the expected output value, the more accurate the identification result;
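For reference, the sum-to-one property follows directly from the standard definition of the Softmax function. The index k below is introduced only for the normalising sum, and c denotes the labelled class of a one-hot expected output; both symbols are assumptions added for this derivation.

```latex
\[
y'_x = \frac{e^{y_x}}{\sum_{k} e^{y_k}},
\qquad
\sum_{x} y'_x = \frac{\sum_{x} e^{y_x}}{\sum_{k} e^{y_k}} = 1 .
\]
% With a one-hot expected output (y_c = 1 and y_x = 0 for x \neq c),
% the cross entropy reduces to a single term:
\[
H(y, y') = -\sum_{x} y_x \log y'_x = -\log y'_c .
\]
```

The second line shows why a smaller error means a more accurate result: the error shrinks towards 0 exactly as the probability assigned to the labelled class approaches 1.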
for example:
with the number of data identifiers N = 2, the expected output value P = (1, 0, 0), and the actual output values Q_1 = (0.5, 0.2, 0.3) and Q_2 = (0.7, 0.1, 0.2), where Q_1 and Q_2 are the actual output values of the two data identifiers, the data identifiers are optimised by calculating the corresponding cross entropy, whose magnitude reflects the accuracy of the output for the data to be detected;
the obtained accuracy is:
[Formula images SMS_6 and SMS_7 are not reproduced in the source text]
the error between the actual output value and the expected output value is obtained from the accuracy and used to judge whether the data to be detected is real-time data, namely:
an actual output value threshold is set according to the cross-entropy data and denoted K; in a specific implementation, K = 0.5;
when the actual output value is less than or equal to K, the data to be detected is judged to be real-time data, and when the actual output value is greater than K, the data to be detected is judged to be non-real-time data.
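The cross-entropy comparison in the example can be worked through numerically. This sketch rests on two assumptions that the text does not settle: the expected output is taken to be the one-hot vector (1, 0, 0), since both actual outputs have three components, and the natural logarithm is used. The resulting numbers are therefore illustrative and are not the values shown in the missing formula images.

```python
import math


def cross_entropy(expected, actual):
    """H(P, Q) = -sum_i P_i * log(Q_i); terms with P_i = 0 contribute nothing."""
    return -sum(p * math.log(q) for p, q in zip(expected, actual) if p > 0)


P = (1.0, 0.0, 0.0)    # assumed one-hot expected output
Q1 = (0.5, 0.2, 0.3)   # actual output of the first data identifier
Q2 = (0.7, 0.1, 0.2)   # actual output of the second data identifier

h1 = cross_entropy(P, Q1)   # equals -log(0.5)
h2 = cross_entropy(P, Q2)   # equals -log(0.7)
better = "Q2" if h2 < h1 else "Q1"
```

Whatever the logarithm base, the second identifier has the smaller cross entropy because it assigns more probability (0.7 versus 0.5) to the expected class, so it is the more accurate of the two.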
It should be further noted that, in the implementation process, the training process for the neural network architecture includes:
the neural network architecture consists of multiple layers of stochastic hidden variables, a bottom-layer unit and structural units, where the hidden variables comprise upper-layer hidden variables and lower-layer hidden variables; the upper-layer hidden variables are joined by undirected symmetric connections, and the lower-layer hidden variables are joined by directed connections;
the bottom layer unit is composed of a plurality of stacks of 2F structural units;
the structure units are RBM units, and the number of visible layer neurons of each RBM unit in the structure unit stack is equal to the number of hidden layer neurons of the previous RBM unit;
following the deep learning mechanism, the first-layer RBM unit is trained on the training sample set, and the second-layer RBM unit is trained on the training result of the first-layer RBM unit;
RBM units are stacked to improve model performance by adding layers; during unsupervised training, after the training results of each layer's RBM units are passed up to the top-layer RBM unit, the top-layer RBM unit is decoded and its state is transmitted down to the bottom-layer RBM unit, which reconstructs the input; the RBM units share parameters with the corresponding layers of the DBN.
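The greedy layer-wise training and top-down reconstruction described above can be sketched in plain Python. This is a toy sketch under stated assumptions, not the patented model: it uses one-step contrastive divergence (CD-1), a common RBM training method that the source does not name, and the layer sizes, learning rate, epoch count and sample data are arbitrary illustrative choices.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.w[i][j]
                                          for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.w[i][j]
                                          for j in range(len(h))))
                for i in range(len(self.b_v))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def train(self, data, epochs=50, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                v1 = self.visible_probs(self.sample(h0))  # reconstruction
                h1 = self.hidden_probs(v1)
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                for i in range(len(v0)):
                    self.b_v[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_h[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, layer_sizes, seed=0):
    """Train each RBM on the hidden activations of the RBM below it."""
    rng = random.Random(seed)
    stack, layer_input, n_visible = [], data, len(data[0])
    for n_hidden in layer_sizes:
        rbm = RBM(n_visible, n_hidden, rng)
        rbm.train(layer_input)
        # Hidden probabilities (not samples) feed the next layer.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
        stack.append(rbm)
        n_visible = n_hidden   # next visible size = this hidden size
    return stack


def reconstruct(stack, v):
    """Encode up to the top layer, then decode back down to the input layer."""
    h = v
    for rbm in stack:
        h = rbm.hidden_probs(h)
    for rbm in reversed(stack):
        h = rbm.visible_probs(h)
    return h


data = [[1, 1, 0, 0], [0, 0, 1, 1]] * 4
stack = pretrain_stack(data, layer_sizes=[3, 2])
recon = reconstruct(stack, data[0])
```

The line `n_visible = n_hidden` enforces the stacking constraint stated in the text: each RBM's visible layer has as many neurons as the hidden layer of the RBM beneath it.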
The real-time data is then identified, where the types of the data identification pool include: an image data identification pool, a text data identification pool, and a voice data identification pool;
it should be further noted that, in the implementation process, the same content and the same type of real-time data are collected together to form a corresponding data matrix.
It should be further noted that, in the implementation process, the formula used by the data identifier for identifying image data is:
[Formula image SMS_8 is not reproduced in the source text]
where A is the data identification coefficient of the data to be identified, with value range (1, 3); N is the number of data items to be identified; Q is the calculated image data identification value; P is the probability of occurrence of a given data item in the data to be identified; L is the data length of the data to be identified; y_ij is the ordinate of a point of the data matrix corresponding to a given data item in the data to be identified; and x_ij is the abscissa of that point. When the calculated image data identification value is within the set image identification threshold range, the data identified by the image data identifier is judged to be image data.
It should be further noted that, in the implementation process, the formula used by the data identifier for identifying text data is:
[Formula image SMS_9 is not reproduced in the source text]
where B is an adjustment coefficient with value range (20, 50), and R is the calculated text data identification value. When the calculated text data identification value is within the set text identification threshold range, the data identified by the text data identifier is judged to be text data.
It should be further noted that, in the implementation process, the formula used by the data identifier for identifying voice data is:
[Formula image SMS_10 is not reproduced in the source text]
where C is an adjustment coefficient with value range (1, 5), and W is the calculated voice data identification value. When the calculated voice data identification value is within the set voice identification threshold range, the data identified by the voice data identifier is judged to be voice data.
When interactive data is generated in the online lottery activities, extracting the characteristics of the data to be detected and the output probability of a Softmax function from the interactive data by using a convolutional neural network to obtain real-time data, and performing category processing on the real-time data:
when the obtained data identification value falls within the image threshold range, the real-time data is image data;
when the obtained data identification value falls within the text threshold range, the real-time data is text data;
when the obtained data identification value falls within the sound threshold range, the real-time data is sound data.
Working principle: a region to be detected is set, and the data to be detected generated during interaction between a terminal and a server is acquired; a convolutional neural network extracts the features of the data to be detected, and the output probability of a Softmax function is used to obtain real-time data; data identifiers constructed from a deep learning neural network architecture are set up to perform data identification; the real-time data is classified and identified, being divided by category into image data, text data and sound data; identification values are obtained from the data identifier formula of each category, and the category of the real-time data is judged according to the value range of each category;
a plurality of data identifiers are set up and combined for identification; specific values are obtained from the data identifier formulas of the different categories and compared with the corresponding value ranges to judge the category of the real-time data, which ensures identification efficiency and improves identification accuracy.
The above embodiments are only for illustrating the technical method of the present invention and not for limiting the same, and it should be understood by those skilled in the art that the technical method of the present invention may be modified or substituted without departing from the spirit and scope of the technical method of the present invention.

Claims (6)

1. A real-time data identification method based on big data, characterized in that the method performs the following steps:
step one: acquiring data to be detected generated in the interaction process of the terminal and the server;
step two: extracting the characteristics of the data to be detected by using a convolutional neural network and obtaining real-time data through a Softmax function;
step three: setting a data identifier constructed by a deep learning neural network architecture, and placing the data identifier in a data identification pool;
step four: and identifying the real-time data by using the data identification pool, and judging the type of the real-time data.
2. The real-time data identification method based on big data according to claim 1, wherein the obtaining process of the data to be detected generated in the interaction process of the terminal and the server comprises the following steps:
acquiring interactive data generated between a terminal and a server through a region to be detected;
the database stores the acquired interactive data through a data table, each row of the data table stores a group of data, and the interactive data stored in the database is used as data to be detected.
3. The real-time data identification method based on big data according to claim 2, wherein the process of judging the data to be detected as real-time data comprises:
classifying the data to be detected, and respectively marking each type of data to be detected to form a training sample set;
the training sample set is sent to a convolutional neural network for training, the Softmax function is used for regressing the output mark data to obtain a corresponding actual output value, and the cross entropy loss function is used for judging the error between the actual output value and the expected output value;
and judging whether the data to be detected is real-time data or not according to the error between the actual output value and the expected output value.
4. A real-time data identification method based on big data according to claim 3, wherein the identification process of real-time data is realized by:
setting a data identifier constructed by a deep learning neural network architecture;
different kinds of data identification pools are arranged in the data identifier, real-time data are identified by using the different kinds of data identification pools, corresponding data identification values are output, and the type of the real-time data is judged according to the obtained data identification values.
5. The real-time data identification method based on big data according to claim 4, wherein the kinds of the data identification pool include: an image data identification pool, a text data identification pool, and a voice data identification pool;
marking the data identification value output by the image data identification pool as an image data identification value;
marking the data identification value output by the text data identification pool as a text data identification value;
and marking the data identification value output by the voice data identification pool as a voice data identification value.
6. The method for real-time data identification based on big data according to claim 5, wherein the process of judging the type of the real-time data comprises:
when the obtained data identification value falls within the image threshold range, the real-time data is image data;
when the obtained data identification value falls within the text threshold range, the real-time data is text data;
when the obtained data identification value falls within the sound threshold range, the real-time data is sound data.
CN202310520706.XA 2023-05-10 2023-05-10 Real-time data identification method based on big data Pending CN116226755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310520706.XA CN116226755A (en) 2023-05-10 2023-05-10 Real-time data identification method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310520706.XA CN116226755A (en) 2023-05-10 2023-05-10 Real-time data identification method based on big data

Publications (1)

Publication Number Publication Date
CN116226755A true CN116226755A (en) 2023-06-06

Family

ID=86570084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310520706.XA Pending CN116226755A (en) 2023-05-10 2023-05-10 Real-time data identification method based on big data

Country Status (1)

Country Link
CN (1) CN116226755A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111934437A (en) * 2020-09-22 2020-11-13 中科全维科技(苏州)有限公司 Active power distribution network big data transmission method based on behavior mark and lightweight encryption
CN113642679A (en) * 2021-10-13 2021-11-12 山东凤和凰城市科技有限公司 Multi-type data identification method
CN115879514A (en) * 2022-12-06 2023-03-31 深圳大学 Method and device for improving class correlation prediction, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107506823B (en) Construction method of hybrid neural network model for dialog generation
CN112417099B (en) Method for constructing fraud user detection model based on graph attention network
CN110532379B (en) Electronic information recommendation method based on LSTM user comment sentiment analysis
CN112633010A (en) Multi-head attention and graph convolution network-based aspect-level emotion analysis method and system
CN109993102A (en) Similar face retrieval method, apparatus and storage medium
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN107506350B (en) Method and equipment for identifying information
CN111709244A (en) Deep learning method for identifying causal relationship of contradictory dispute events
CN112686376A (en) Node representation method based on timing diagram neural network and incremental learning method
CN113191445A (en) Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113591978A (en) Image classification method, device and storage medium based on confidence penalty regularization self-knowledge distillation
CN111079930B (en) Data set quality parameter determining method and device and electronic equipment
CN109886206B (en) Three-dimensional object identification method and equipment
CN114548106A (en) Method for recognizing science collaborative activity named entity based on ALBERT
CN110414586A (en) Antifalsification label based on deep learning tests fake method, device, equipment and medium
CN114220179A (en) On-line handwritten signature handwriting retrieval method and system based on faiss
CN113420291A (en) Intrusion detection feature selection method based on weight integration
CN116226755A (en) Real-time data identification method based on big data
CN110334080B (en) Knowledge base construction method for realizing autonomous learning
CN112766134A (en) Expression recognition method for enhancing class distinction
CN117112749A (en) RNN-driven intelligent customer service dialogue intention recognition method and system for electronic commerce
CN113642679B (en) Multi-type data identification method
CN115422945A (en) Rumor detection method and system integrating emotion mining
CN113963235A (en) Cross-category image recognition model reusing method and system
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230606