CN114861834B - Method for continuously updating data information of big data storage system - Google Patents

Info

Publication number
CN114861834B
Authority
CN
China
Prior art keywords
data
module
communication protocol
data information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210776234.XA
Other languages
Chinese (zh)
Other versions
CN114861834A (en)
Inventor
魏俊杰
蓝岸
何翼
熊黄
庄辉
黄松杰
郑裕豪
黄金田
梁焯源
黄莹涛
覃俊华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen News Network Media Co ltd
Original Assignee
Shenzhen News Network Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen News Network Media Co ltd filed Critical Shenzhen News Network Media Co ltd
Priority to CN202210776234.XA priority Critical patent/CN114861834B/en
Publication of CN114861834A publication Critical patent/CN114861834A/en
Application granted granted Critical
Publication of CN114861834B publication Critical patent/CN114861834B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses a method for continuously updating data information of a big data storage system. It relates to the technical field of measurement and control and addresses the technical problem of updating, monitoring, and evaluating data information while the big data storage system is storing data. In the adopted technical scheme, data information is received at a speed of at least 24bps/s through a compatible data interface and identified by an information identification module; the updated data information is isolated by a data isolation module and calculated by an improved Bayesian algorithm model; and early warning of data information is realized by an update early warning function. The invention can dynamically evaluate updated data information in real time while the big data storage system stores data, thereby improving the system's ability to monitor stored-data updates and dynamic data.

Description

Method for continuously updating data information of big data storage system
Technical Field
The invention relates to the technical field of measurement and control, in particular to a method for continuously updating data information by a big data storage system.
Background
"Big data" generally refers to data sets that are large in size and difficult to collect, process, and analyze; it also refers to data stored in a traditional infrastructure for a long period of time. Big data storage is the persistence of these data sets in a computer. With the explosive growth of big data applications, big data storage systems store ever more data information. In the prior art, a big data storage system can continuously take in data information, but it cannot perform update calculation on that data information, nor abnormality early warning or communication update evaluation during data updating, so dynamic monitoring of the data information in the big data storage system cannot be realized.
Disclosure of Invention
Aiming at the technical defects, the invention discloses a method for continuously updating data information by a big data storage system, which can dynamically evaluate and update the data information in real time in the data storage process of the big data storage system, and improves the storage data updating monitoring capability and the dynamic data monitoring capability of the big data storage system.
In order to achieve the technical effects, the invention adopts the following technical scheme:
a method for continuously updating data information by a big data storage system comprises the following steps:
step one, receiving data information at a speed of at least 24bps/s through a compatible data interface, and identifying the data information through an information identification module;
in this step, the compatible data interface is an interface compatible with TCP/IP communication protocol, RS485 communication protocol, RS232 communication protocol, Modbus communication protocol, HTTP communication protocol, XMPP communication protocol, WIA-PA communication protocol, PLC communication protocol and serial communication protocol,
the information identification module comprises a communication interface, a communication protocol decoding module, a communication protocol matching module and a data output terminal, wherein the output end of the communication interface is connected with the input end of the communication protocol decoding module, the output end of the communication protocol decoding module is connected with the input end of the communication protocol matching module, and the output end of the communication protocol matching module is connected with the input end of the data output terminal;
step two, isolating the updated data information through a data isolation module, and calculating the updated data information through an improved Bayesian algorithm model;
in this step, the data isolation module comprises a main control module, a memory connected with the main control module, a parallel computing module, a checking module, a shielding module and a communication network interface;
in this step, the improved Bayesian algorithm model comprises a data input module, a network node module, a classification module, a search module and a data output module, wherein an output end of the data input module is connected with an input end of the network node module, an output end of the network node module is connected with an input end of the classification module, an output end of the classification module is connected with an input end of the search module, and an output end of the search module is connected with an input end of the data output module; the input end of the data input module receives the update data information to be calculated, the input end of the data input module is connected with the information identification module, and the output end of the data output terminal is connected with the communication interface;
step three, realizing early warning of data information through an update early warning function;
in this step, the update early warning function is an improved orthogonalization function that realizes early warning of update data information.
In step one, the information identification module identifies data as follows:
updated data information is received through the communication interface and the communication protocol decoding module is started; the decoding module decodes the information as a TCP/IP, RS485, RS232, Modbus, HTTP, XMPP, WIA-PA, PLC or serial communication protocol; the communication protocol matching module then matches the data information by protocol frame length, protocol frame header, protocol frame tail and data cache speed, locks the communication mode, and outputs the data information through the data output terminal.
In step two, the main control module is a main control chip based on the EP4CE115F29C7N.
In step two, the data isolation module isolates data as follows:
under the control of the EP4CE115F29C7N main control chip, the check module checks the data information input into the memory using a cyclic redundancy check code: the information code polynomial is shifted left by k bits, where k is the degree of the generator polynomial, and divided by the generator polynomial using bitwise modulo-2 addition and subtraction, and the resulting remainder is the check code; the parallel computing module calculates the data information through Python code; the shielding module shields data information that the calculation finds abnormal; and data information found normal is output through the communication network interface.
In the second step, the working method of the improved Bayesian algorithm model is as follows:
Step (1), constructing a classifier model through the classification module, wherein the model function is as follows:
[Formula (1): equation image not reproduced in this text]
In formula (1), $P$ represents the Bayesian network model; $C_k$ represents a big data type; $k$ indexes the kinds of big data type; $i$ is the big data sequence number; [symbol not reproduced] denotes the big data set of all input data information; $N$ represents the sample classification in the big data set; and $P(x_i \mid C_k)$ denotes the maximum probability value for the big data category.
Step (2), calculating the fault data in the update data information through the constructed classifier model, using the following formula:
[Formula (2): equation image not reproduced in this text]
In formula (2), $k = 1, 2, \cdots, N$, and $N_{c_k}$ represents the number of samples not updated among all big data training samples;
The seagull algorithm is adopted to optimize the iterative training process of the Bayesian network model. Assuming that a global search is performed on the big data samples that have not been updated, and avoiding path conflicts between samples, the search module function is as follows:
[Formula (3): equation image not reproduced in this text]
In formula (3), [symbol not reproduced] denotes the sample position being searched for updated data information; the [symbol not reproduced] within [symbol not reproduced] denotes the big data set of input data information; $A$ represents the range of movement of a sample in the given search space; $H_x$ represents the current position of the seagull sample; and $t$ represents the current iteration number;
Step (3), the function by which the search module searches the data information is as follows:
[Formula (4): equation image not reproduced in this text]
In formula (4), $M$ represents the position at which the seagull searches the data information; $H_{bx}$ represents the position of the sample relative to the best neighbor search during the data update process, where the subscript $b$ in $H_{bx}$ marks the best-neighbor search sample position; and $B$ represents a constant parameter that influences data information updating during the data update process;
The big data update position function $W_x$ is expressed as:
[Formula (5): equation image not reproduced in this text]
In formula (5), the arrows above [symbol not reproduced] and [symbol not reproduced] denote movement direction vectors in the data updating process, with the big data taken relative to its best position before updating during iterative training;
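Because formulas (3) to (5) survive here only as image placeholders, the following is a minimal Python sketch of the migration phase as it is commonly published for the seagull optimization algorithm, which the symbols A, B, H_x, H_bx, M and W_x above appear to follow. It is not the patent's exact formulation. The function name `migration_step`, the control parameter `fc`, the factor `rd`, and the linear decay of A are assumptions taken from that published form.

```python
from typing import List, Sequence


def migration_step(position: Sequence[float], best_position: Sequence[float],
                   t: int, max_iter: int,
                   fc: float = 2.0, rd: float = 0.5) -> List[float]:
    """One migration step for a single sample (hypothetical helper).

    position      -- current sample position, H_x(t)
    best_position -- best-neighbor position, H_bx(t)
    fc, rd        -- assumed control parameter and random factor in [0, 1]
    """
    a = fc - t * (fc / max_iter)   # A: movement range, decays linearly to 0
    b = 2.0 * a * a * rd           # B: weight of the pull toward the best sample
    updated = []
    for h_x, h_bx in zip(position, best_position):
        c = a * h_x                # collision-avoiding position (formula (3) analogue)
        m = b * (h_bx - h_x)       # movement toward best neighbor (formula (4) analogue)
        updated.append(abs(c + m)) # update position W_x (formula (5) analogue)
    return updated


if __name__ == "__main__":
    print(migration_step([1.0, 2.0], [0.0, 0.0], t=0, max_iter=10))
```

In a full optimizer, `rd` would be redrawn at random for each step and the result would feed the attack (spiral) phase, which is omitted here.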
Step (4), diagnosing fault data information in the data updating process;
The big data items $x_1, x_2, \cdots, x_n$ are conditionally independent of each other. Assuming that the fault data information in the updating process is $C$, the fault diagnosis function output is:
[Formula (6): equation image not reproduced in this text]
In formula (6), [symbol not reproduced] represents the probability that the fault data information [symbol not reproduced] in the update process occurs in the big data set [symbol not reproduced], and $P(x_i \mid C_k)$ denotes the maximum probability value for the big data category, where:
[Formula (7): equation image not reproduced in this text]
In formula (7), [symbol not reproduced] indicates that, in the iterative computation, a fault data probability set is output each time a big data update is computed; [symbol not reproduced] represents the training samples of $P(C_k)$ that satisfy big data input but whose attribute $x_i$ has not been updated. If [symbol not reproduced] does not exist, all big data in the big data storage system has been updated; if it exists, some data in the big data storage system has not been updated, and the training sample data update fault function evolves into:
[Formula (8): equation image not reproduced in this text]
In step three, the update early warning function realizes early warning of update data information through an orthogonalization function with a positioning function.
In the above embodiment, the update early warning function works as follows:
big data update input positioning is realized through an MTG positioning function, and the fault data positioning function in the data information updating process is as follows:
[Formula (9): equation image not reproduced in this text]
In formula (9), [symbol not reproduced] denotes the location of data communication during the updating of the data information; [symbol not reproduced] denotes the operation period of the data update; [symbol not reproduced] denotes a data communication protocol in the data update operation; [symbol not reproduced] denotes the influence coefficient of the fault factor in the data update; and [symbol not reproduced] denotes the data update operation period in the i-th communication protocol;
An orthogonalization function is constructed, and fault information in the data information updating process is expressed in matrix form. The orthogonalization function is expressed as:
[Formula (10): equation image not reproduced in this text]
In formula (10), [symbol not reproduced] denotes the orthogonalization function; [symbol not reproduced] denotes the function in which the data update indexes of all communication protocols are superimposed on one another; and [symbol not reproduced] denotes the average value of the fault data after iterative computation in the data information updating process, fault diagnosis being performed on the update data diagnosis result obtained after iterative computation;
The orthogonalization function outputs data information to an early warning output function, which is as follows:
[Formula (11): equation image not reproduced in this text]
In formula (11), [symbol not reproduced] denotes the early warning output function; [symbol not reproduced] denotes the category among the big data types; $i$ is the communication protocol ordering; $N$ represents the sample classification in the big data set; [symbol not reproduced] denotes the average value after iterative computation in the i-th communication protocol during data updating; and [symbol not reproduced] denotes the stability of the orthogonalization function during operation.
The beneficial and positive effects of the invention are as follows:
Unlike the conventional technology, the invention receives data information at a speed of at least 24bps/s through a compatible data interface and identifies it through an information identification module; the updated data information is isolated by a data isolation module and calculated by an improved Bayesian algorithm model; and early warning of data information is realized by an update early warning function. The invention can dynamically evaluate updated data information in real time while the big data storage system stores data, thereby improving the system's ability to monitor stored-data updates and dynamic data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive exercise, wherein:
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of an information recognition module according to the present invention;
FIG. 3 is a schematic diagram of an improved Bayesian model architecture in the present invention;
FIG. 4 is a schematic diagram of a data isolation module architecture according to the present invention;
FIG. 5 is a schematic diagram of an embodiment of the improved Bayesian algorithm model in the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
As shown in fig. 1, a method for continuously updating data information in a big data storage system includes:
step one, receiving data information at a speed of at least 24bps/s through a compatible data interface, and identifying the data information through an information identification module;
in this step, the compatible data interface is an interface compatible with TCP/IP communication protocol, RS485 communication protocol, RS232 communication protocol, Modbus communication protocol, HTTP communication protocol, XMPP communication protocol, WIA-PA communication protocol, PLC communication protocol and serial communication protocol,
the information identification module comprises a communication interface, a communication protocol decoding module, a communication protocol matching module and a data output terminal, wherein the output end of the communication interface is connected with the input end of the communication protocol decoding module, the output end of the communication protocol decoding module is connected with the input end of the communication protocol matching module, and the output end of the communication protocol matching module is connected with the input end of the data output terminal;
step two, isolating the updated data information through a data isolation module, and calculating the updated data information through an improved Bayesian algorithm model;
in this step, the data isolation module comprises a main control module, a memory connected with the main control module, a parallel computing module, a checking module, a shielding module and a communication network interface;
in this step, the improved Bayesian algorithm model comprises a data input module, a network node module, a classification module, a search module and a data output module, wherein an output end of the data input module is connected with an input end of the network node module, an output end of the network node module is connected with an input end of the classification module, an output end of the classification module is connected with an input end of the search module, and an output end of the search module is connected with an input end of the data output module; the input end of the data input module receives the update data information to be calculated, the input end of the data input module is connected with the information identification module, and the output end of the data output terminal is connected with the communication interface;
step three, realizing early warning of data information through an update early warning function;
in this step, the update early warning function is an improved orthogonalization function that realizes early warning of update data information.
In step one, the information identification module identifies data as follows:
updated data information is received through the communication interface and the communication protocol decoding module is started; the decoding module decodes the information as a TCP/IP, RS485, RS232, Modbus, HTTP, XMPP, WIA-PA, PLC or serial communication protocol; the communication protocol matching module then matches the data information by protocol frame length, protocol frame header, protocol frame tail and data cache speed, locks the communication mode, and outputs the data information through the data output terminal.
In a specific embodiment, whichever communication mode is adopted in the data communication process, such as common Wi-Fi, RFID, NFC, ZigBee, Bluetooth, LoRa, NB-IoT, GSM, GPRS, 3G/4G/5G networks, Ethernet, RS232, RS485, or USB, there are corresponding communication protocols and codes. A decoder corresponding to each of these communication protocols is required to decode the different protocols, and the decoding process is in effect a process of selecting a communication protocol. In a further embodiment, each communication mode has a corresponding communication code. The communication protocol decoding module can therefore decode different communication protocols under the control of a programmable controller.
In the specific embodiment, different communication protocols transfer data information at different speeds. For example, the data transfer speed of an ATA100 interface hard disk is 100 MB/s, that of a SATA150 interface hard disk is 150 MB/s, and that of an Ultra320 SCSI interface hard disk is 320 MB/s.
In the specific embodiment, after the protocol frame length has been matched successfully, the transmitted data is verified again using the frame tail content. The starting position of the communication protocol's frame tail within the protocol data is determined; the next frame tail data item of the communication protocol is taken out and compared with the data at the frame tail position in the buffer. If the two are the same, the matched data information is cached; if they differ, the communication protocol is deleted from the protocol queue.
In the specific embodiment, take RS485 data reception as an example, once all frame tail data of the communication protocol has been compared. In the data receiving process, full-content frame header matching removes the interference of error frames in communication; frame length matching ensures the integrity of the communication protocol data; and frame tail matching improves the correctness of the communication protocol data. The communication protocol matching template is a model established in software according to the communication protocol information defined in an ICD; it is written by hand as an XML file according to the ICD format commonly used for the RS485 bus, and data is received through an RS485 communication board card. Before RS485 data is received, each transmission channel of the communication board card must be initialized according to the communication parameters of each component or subsystem; the initialization parameters include baud rate, data bits, stop bits, and check bits. Because a complex information system contains many internal components and must transmit information over multiple transmission channels, a data buffer is established for each transmission channel. After received RS485 data has been cached for each channel, it is matched against the transmission channel field in the communication protocol matching template; if the transmission channel field of a protocol in the template is the same as the channel on which the data was received, the transmission channel match is considered successful.
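The channel, header, length and tail checks described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the `ProtocolTemplate` class, the `match_frame` function, and the example header and tail bytes are hypothetical, and the template is built in code instead of being loaded from an XML file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProtocolTemplate:
    """Hypothetical matching template, one per candidate protocol."""
    name: str
    header: bytes       # expected frame header
    tail: bytes         # expected frame tail
    frame_length: int   # expected total frame length in bytes
    channel: int        # transmission channel the protocol is bound to


def match_frame(buffer: bytes, channel: int,
                templates: List[ProtocolTemplate]) -> Optional[ProtocolTemplate]:
    """Return the first template whose channel, header, length and tail all match."""
    for tpl in templates:
        if tpl.channel != channel:
            continue                      # transmission channel field must match
        if not buffer.startswith(tpl.header):
            continue                      # full-content header match rejects error frames
        if len(buffer) < tpl.frame_length:
            continue                      # frame length match guards integrity
        frame = buffer[:tpl.frame_length]
        if not frame.endswith(tpl.tail):
            continue                      # frame tail match confirms correctness
        return tpl
    return None


# Example templates and frames (values invented for the sketch).
templates = [
    ProtocolTemplate("sensor_a", b"\xAA\x55", b"\x0D\x0A", 8, channel=1),
    ProtocolTemplate("sensor_b", b"\x7E", b"\x7E", 6, channel=2),
]
good = b"\xAA\x55\x01\x02\x03\x04\x0D\x0A"
bad_tail = b"\xAA\x55\x01\x02\x03\x04\x0D\x0B"

if __name__ == "__main__":
    print(match_frame(good, 1, templates))
    print(match_frame(bad_tail, 1, templates))
```

A template that fails any check is simply skipped here; the text above instead removes a failing protocol from the protocol queue, which would be the natural extension when frames arrive continuously.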
In step two, the main control module is a main control chip based on the EP4CE115F29C7N.
In step two, the data isolation module isolates data as follows:
under the control of the EP4CE115F29C7N main control chip, the check module checks the data information input into the memory using a cyclic redundancy check code: the information code polynomial is shifted left by k bits, where k is the degree of the generator polynomial, and divided by the generator polynomial using bitwise modulo-2 addition and subtraction, and the resulting remainder is the check code; the parallel computing module calculates the data information through Python code; the shielding module shields data information that the calculation finds abnormal; and data information found normal is output through the communication network interface.
In a specific embodiment, a forward isolation device and a reverse isolation device are used. The system must meet the requirement of fast, real-time communication while supporting various network communication protocols, ensure that big data information is sent and received accurately, realize one-way communication between the system intranet and the external network, and support monitoring and isolation of data transmission message instructions in the big data storage system. This embodiment uses an FPGA as the development and design platform of the isolation device, which allows a higher operation speed. The EP4CE115F29C7N serves as the main control chip of the isolation device; it has multiple embedded memories, carries 2 communication network interfaces, and has a main control frequency as high as 200 MHz. Different logic blocks execute in parallel, giving better parallel processing capability when handling network communication.
In a specific embodiment, the check module uses cyclic redundancy check code detection, which has strong error detection capability. The transmitted user data bit sequence is taken as the coefficients of a polynomial, and when a transmission error occurs, division by the generator polynomial yields a different remainder. For the big data information sent by the transmitting terminal, the information code polynomial is shifted left by k bits, where k is the degree of the generator polynomial, and bitwise modulo-2 addition and subtraction are performed to obtain the remainder, which is the check code. The output interface of the sending module serves as the receiving interface of the check module, and the check code, once generated, is output to the data sending module. The isolation module plays an important role in the one-way isolation channel: it judges the communication data and queries whether the received IP address is within the trusted safe communication range. The message type and danger level are judged from the received communication message. If the system communication network is attacked, the message type is a high-risk instruction; the big data storage system replaces the user communication data with the error code Err_code[7.0], which is output after passing through the check module, so that computer operation and maintenance staff can receive the information in time and carry out network security maintenance.
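The cyclic redundancy check described above can be illustrated with a short Python sketch. It performs the modulo-2 long division with integers standing in for bit polynomials. The function names and the small generator polynomial x^3 + x + 1 are assumptions chosen for the example; the patent does not state which generator polynomial is used.

```python
def mod2_remainder(value: int, generator: int) -> int:
    """Remainder of value divided by generator using modulo-2 (XOR) arithmetic."""
    k = generator.bit_length() - 1            # degree of the generator polynomial
    while value.bit_length() > k:
        # Align the generator with the highest set bit and cancel it.
        value ^= generator << (value.bit_length() - generator.bit_length())
    return value


def crc_encode(message: int, generator: int) -> int:
    """Shift the message left by k bits and append the k-bit check code."""
    k = generator.bit_length() - 1
    shifted = message << k
    return shifted | mod2_remainder(shifted, generator)


def crc_is_valid(codeword: int, generator: int) -> bool:
    """A codeword is accepted when its remainder is zero."""
    return mod2_remainder(codeword, generator) == 0


if __name__ == "__main__":
    generator = 0b1011                        # x^3 + x + 1 (illustrative choice)
    message = 0b11010011101100
    codeword = crc_encode(message, generator)
    print(bin(codeword), crc_is_valid(codeword, generator))
    print(crc_is_valid(codeword ^ (1 << 5), generator))   # one flipped bit
```

With this generator the check code for the example message is `100`, and flipping any single bit of the codeword makes the validity check fail.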
In a specific embodiment, the PRAM (Parallel Random Access Machine) model, also referred to as the shared-memory SIMD model, is an abstract parallel computing model developed directly from the serial RAM model. The model assumes a shared memory of infinite capacity and a finite or infinite number of processors with identical functions, each capable of simple arithmetic operations and logic judgments; the processors can exchange data with one another through shared memory units at any time. PRAM models are distinguished by the restrictions they place on processors reading and writing a shared memory unit simultaneously. The PRAM model is synchronous, meaning that all instructions operate in lockstep; although the user does not perceive the synchronization, it takes time and cannot reflect the asynchrony of many real systems.
In a specific embodiment, the shielding module isolates data information by time threshold, frequency domain, or communication band. Data masking differs from restricting data access: access restrictions make the data invisible, whereas data masking replaces vulnerable or sensitive data with information that appears authentic. When data is masked, it is modified in such a way that its basic format remains unchanged.
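As a minimal illustration of masking that keeps the basic format unchanged, the following Python sketch replaces each digit with a random digit and each ASCII letter with a random letter of the same case, leaving separators in place. The function name `mask_value`, the `seed` parameter, and the choice of character classes are assumptions for illustration; the patent does not specify how the shielding module generates replacement values.

```python
import random
import string


def mask_value(value: str, seed: int = 0) -> str:
    """Replace sensitive characters while keeping length, case, and separators."""
    rng = random.Random(seed)                # seeded so the sketch is repeatable
    masked = []
    for ch in value:
        if ch in string.digits:
            masked.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            masked.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            masked.append(rng.choice(string.ascii_lowercase))
        else:
            masked.append(ch)                # punctuation and spacing are preserved
    return "".join(masked)


if __name__ == "__main__":
    print(mask_value("138-0013-8000"))       # same layout, different digits
```

A production system would normally draw replacement values from a cryptographically secure source instead of a seeded generator; the seed is used here only to make the example deterministic.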
In the second step, the working method of the improved Bayesian algorithm model is as follows:
In a specific embodiment, the seagull algorithm is used to optimize a naive Bayes algorithm (Seagull Optimization Naive Bayes, SONB). The SONB algorithm can comprehensively analyze big data and early warning information and can be updated rapidly and continuously, thereby providing the real-time user service function of the big data storage system. The SONB classifier assumes that the attribute variables are conditionally independent, so that each attribute node is associated only with the class node C. Because the SONB algorithm has fewer network layers, the complexity of building the Bayesian network model is reduced exponentially.
Step (1), constructing a classifier model through a classification module, wherein the model function is as follows:
[Formula (1): equation image not reproduced in the source text]
In formula (1), P represents the Bayesian network model; C_k represents a big data type; k indexes the kinds of big data type; i is the big data sequence number; [symbol image] represents the big data set of all input data information; N represents the sample classification in the big data set; and P(x_i | C_k) is the maximum probability value for the big data category.
Step (2): calculate the updated fault data in the data information using the constructed classifier model, by the following formula:
[Formula (2): equation image not reproduced in the source text]
In formula (2), k = 1, 2, ···, N, and Nc_k represents the number of samples that have not been updated among all big data training samples. When the number of attributes is very large, computing the maximum posterior probability P(X | C_k) becomes costly. To reduce this effect, the seagull algorithm is used to optimize the iterative training of the Bayesian network model. During iterative training, when a global search is performed over the big data samples that have not been updated, path conflicts between samples must be avoided; the search module function is:
[Formula (3): equation image not reproduced in the source text]
In formula (3), [symbol image] represents the sample position at which the updated data information is being searched; the [symbol image] appearing in [symbol image] represents the big data set of the input data information; A represents the range of movement of the sample in the given search space; H_x represents the current position of the seagull sample; and t represents the current iteration number;
Step (3): after conflicts between adjacent big data samples have been avoided, the search module moves toward the center of the best neighboring sample; the function by which the search module searches the data information is:
[Formula (4): equation image not reproduced in the source text]
In formula (4), M represents the position at which the seagull data information is searched; H_bx represents the position of the sample relative to the best neighboring search sample during the data update; the b in H_bx denotes the best neighboring search sample position; and B represents a constant parameter that influences data information updating during the data update. All big data samples are updated continuously so that they move toward their corresponding best neighboring sample centers, and the sample positions are updated accordingly. The big data update position function W_x is expressed as:
[Formula (5): equation image not reproduced in the source text]
In formula (5), the arrows above [symbol image] and [symbol image] denote direction-of-movement vectors during the data update; the big data position is taken relative to the best position found before the update in the iterative training process;
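Formulas (3) to (5) survive here only as image placeholders. As a hedged sketch, and assuming they take the usual form of the seagull optimization algorithm (collision avoidance C = A · H_x(t) with A decreasing linearly over the iterations, movement toward the best neighbor M = B · (H_bx(t) − H_x(t)), and updated position W_x = |C + M|), one update step could be written as below. The names `seagull_step`, `f_c`, and `max_iter` are illustrative and do not come from the patent; B is passed in as a constant, as the text describes.

```python
from typing import List, Sequence


def seagull_step(h_x: Sequence[float], h_best: Sequence[float],
                 t: int, max_iter: int,
                 f_c: float = 2.0, b: float = 1.0) -> List[float]:
    """One assumed seagull-style position update for a single sample.

    h_x     current sample position H_x(t)
    h_best  best neighboring sample position H_bx(t)
    t       current iteration number
    f_c     assumed control constant; A falls linearly from f_c to 0
    b       constant B that scales the move toward the best neighbor
    """
    a = f_c - t * (f_c / max_iter)                      # movement range A
    c = [a * x for x in h_x]                            # assumed formula (3): collision avoidance
    m = [b * (hb - x) for hb, x in zip(h_best, h_x)]    # assumed formula (4): move toward best
    return [abs(ci + mi) for ci, mi in zip(c, m)]       # assumed formula (5): W_x = |C + M|


if __name__ == "__main__":
    print(seagull_step([1.0, -2.0], [3.0, 0.0], t=0, max_iter=10))
```

Early iterations (large A) favor exploration of the search space, while later iterations (A near zero) are dominated by the pull toward the best neighboring sample.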
Step (4): diagnose fault data information in the data updating process;
The big data x_1, x_2, ···, x_n are conditionally independent of one another. Assuming that the fault data information in the updating process is C, the fault diagnosis function output is:
[Formula (6): equation image not reproduced in the source text]
In formula (6), [symbol image] represents the probability that the fault data information [symbol image] occurs in the big data set [symbol image] during the update, and P(x_i | C_k) is the maximum probability value for the big data category, where:
[Formula (7): equation image not reproduced in the source text]
In formula (7), [symbol image] indicates that, in the iterative computation, a fault data probability set is output each time a big data update is computed; [symbol image] represents the training samples in P(C_k) that satisfy the big data input but whose attribute x_i has not been updated. If no [symbol image] exists, all big data in the big data storage system has been updated; if [symbol image] exists, some data in the big data storage system has not been updated, and the training sample data update fault function becomes:
[Formula (8): equation image not reproduced in the source text]
According to this calculation principle, the SONB algorithm network computes P(X | C_k) × P(C_k) for each big data category; the category C_k with the largest probability value is output as the updated sample class of the corresponding big data, and the continuous updating process is carried out through iterative training of the algorithm.
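The decision rule in the preceding paragraph, choosing the category C_k that maximizes P(X | C_k) × P(C_k) under the conditional-independence assumption, can be sketched as a plain naive Bayes classifier. This is an illustrative sketch only: it omits the seagull optimization, the function names `train` and `classify` are hypothetical, and the add-one (Laplace) smoothing used for unseen attribute values is an assumption, since formulas (7) and (8) are not reproduced in the text.

```python
import math
from collections import Counter, defaultdict


def train(samples, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)    # (class, attribute index) -> value counts
    values = defaultdict(set)                # attribute index -> distinct values seen
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[(c, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def classify(x, model):
    """Return the class C_k maximizing P(C_k) * prod_i P(x_i | C_k)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)        # log P(C_k)
        for i, v in enumerate(x):
            count = feature_counts[(c, i)][v]
            # Add-one smoothing (assumed) keeps unseen values from zeroing the product.
            score += math.log((count + 1) / (n_c + len(values[i])))
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    data = [("tcp", "ok"), ("tcp", "ok"), ("rs485", "late"), ("rs485", "late")]
    labels = ["normal", "normal", "fault", "fault"]
    print(classify(("rs485", "late"), train(data, labels)))
```

Working with log-probabilities avoids numerical underflow when the number of attributes is large, which is the situation the text identifies as costly.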
In the third step, the updating early warning function realizes the early warning of the updating data information through the orthogonalization function with the positioning function.
In the above embodiment, the working method of the update early warning function is as follows:
Big data update input positioning is realized through an MTG positioning function; the fault data positioning function in the data information update process is:
[Formula (9): equation image not reproduced in the source text]
In formula (9), [symbol image] represents the data communication position during the data information update; [symbol image] represents the data update operation period; [symbol image] represents the data communication protocol in the data update operation; [symbol image] represents the influence coefficient of fault factors in the data update; and [symbol image] represents the data update operation period in the i-th communication protocol;
An orthogonalization function is constructed, and the fault information in the data information update process is expressed in matrix form; the orthogonalization function is expressed as:
[Formula (10): equation image not reproduced in the source text]
In formula (10), [symbol image] represents the orthogonalization function; [symbol image] represents the function in which the data update indexes of all communication protocols overlap one another; and [symbol image] represents the average value of the fault data after iterative computation during the data information update. Fault diagnosis is performed on the update diagnosis result obtained after iterative computation;
The orthogonalization function outputs data information to the early warning output function, which is:
[Formula (11): equation image not reproduced in the source text]
In formula (11), [symbol image] represents the early warning output function; [symbol image] represents the kind within the big data type; i is the communication protocol index; N represents the sample classification in the big data set; [symbol image] represents the average value after iterative computation in the i-th communication protocol during the data update; and [symbol image] represents the stability of the orthogonalization function during operation.
In a specific embodiment, MTG stands for Multiple-Trigger Generator. With the error range set to ±0.02, a communication error of 20 to 40 meters can be calculated.
In a specific embodiment, Schmidt orthogonalization is a method of finding an orthogonal basis of a Euclidean space. Starting from a linearly independent set of vectors α_1, α_2, ……, α_m in the Euclidean space, an orthogonal set of vectors β_1, β_2, ……, β_m is obtained such that the set α_1, α_2, ……, α_m and the set β_1, β_2, ……, β_m are equivalent; unitizing each vector in the orthogonal set then gives an orthonormal set. This procedure is called Schmidt orthogonalization. By this method, big data update early warning can be realized: after the data information has been diagnosed, early warning output is produced through the early warning output function.
In a particular embodiment, the MATLAB simulation function for Schmidt orthogonalization of a matrix may be:
[Code listing: image not reproduced in the source text]
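Because the MATLAB listing is not reproduced, the following Python sketch illustrates the Schmidt orthogonalization procedure described in this embodiment. It is not the patent's own code: the function name `gram_schmidt` and the tolerance `eps` are assumptions, and the numerically more stable "modified" ordering of the projections is used.

```python
import math
from typing import List, Sequence


def gram_schmidt(vectors: Sequence[Sequence[float]], eps: float = 1e-12) -> List[List[float]]:
    """Return an orthonormal set spanning the same space as `vectors`."""
    basis: List[List[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along each unit vector already in the basis.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < eps:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / norm for wi in w])   # unitize to get an orthonormal vector
    return basis


if __name__ == "__main__":
    for row in gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]):
        print([round(x, 4) for x in row])
```

The linear-independence check mirrors the precondition stated above: the input set α_1, ……, α_m must be linearly independent for an equivalent orthogonal set to exist.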
In a specific embodiment, the update early warning function rests on several kinds of data information arising during the data information update: the data communication position, the data update operation period, the data communication protocol, the fault factor influence coefficient, the data update operation period within each communication protocol, and the stability of the orthogonalization function. Where the data update indexes of the communication protocols overlap, the average value of the fault data after iterative computation, taken over the different data information, can also reflect the early warning parameter.
In a specific embodiment, the early warning output function depends on the communication state, the communication protocol, and the stability of the big data algorithm model during the process. When the early warning output function outputs fault data information, the user can be alerted by a sound warning, a signal warning, a data warning, or a display warning, which improves the early warning capability.
Although specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are merely illustrative and that various omissions, substitutions and changes in the form of the detail of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the above-described methods to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims (4)

1. A method for continuously updating data information of a big data storage system, characterized by comprising the following steps:
step one, receiving data information at a speed of at least 24bps/s through a compatible data interface, and identifying the data information through an information identification module;
in this step, the compatible data interface is an interface compatible with TCP/IP communication protocol, RS485 communication protocol, RS232 communication protocol, Modbus communication protocol, HTTP communication protocol, XMPP communication protocol, WIA-PA communication protocol, PLC communication protocol and serial communication protocol,
the information identification module comprises a communication interface, a communication protocol decoding module, a communication protocol matching module and a data output terminal, wherein the output end of the communication interface is connected with the input end of the communication protocol decoding module, the output end of the communication protocol decoding module is connected with the input end of the communication protocol matching module, and the output end of the communication protocol matching module is connected with the input end of the data output terminal;
secondly, realizing the isolation of the updated data information through a data isolation module, and calculating the updated data information through an improved Bayesian algorithm model;
in this step, the data isolation module comprises a main control module, a memory connected with the main control module, a parallel computing module, a checking module, a shielding module and a communication network interface;
in this step, the improved Bayesian algorithm model comprises a data input module, a network node module, a classification module, a search module and a data output module, wherein an output end of the data input module is connected with an input end of the network node module, an output end of the network node module is connected with an input end of the classification module, an output end of the classification module is connected with an input end of the search module, and an output end of the search module is connected with an input end of the data output module; the input end of the data input module receives the calculation updating data information, the input end of the data input module is connected with the information identification module, and the output end of the data output terminal is connected with the communication interface;
thirdly, data information early warning is achieved by updating an early warning function;
in this step, the update early warning function is an improved orthogonalization function to realize early warning of update data information;
the working method of the improved Bayesian algorithm model comprises the following steps:
step (1), constructing a classifier model through a classification module, wherein the model function is as follows:
[Formula (1): equation image not reproduced in the source text]
in formula (1), P represents the Bayesian network model; C_k represents a big data type; k indexes the kinds of big data type; i is the big data sequence number; [symbol image] represents the big data set of all input data information; N represents the sample classification in the big data set; and P(x_i | C_k) is the maximum probability value for the big data category;
step (2), calculating and updating fault data in data information by constructing a classifier model; calculated by the following formula:
[Formula (2): equation image not reproduced in the source text]
in formula (2), k = 1, 2, ···, N, and Nc_k represents the number of samples that have not been updated among all big data training samples;
optimizing the Bayesian network model iterative training process by adopting a seagull algorithm, supposing that a big data sample which is not updated is subjected to global search, avoiding path conflict between samples, and having a search module function as follows:
[Formula (3): equation image not reproduced in the source text]
in formula (3), [symbol image] represents the sample position at which the updated data information is being searched; the [symbol image] appearing in [symbol image] represents the big data set of the input data information; A represents the range of movement of the sample in the given search space; H_x represents the current position of the seagull sample; and t represents the current iteration number;
step (3), the function by which the search module searches the data information is:
[Formula (4): equation image not reproduced in the source text]
in formula (4), M represents the position at which the seagull data information is searched; H_bx represents the position of the sample relative to the best neighboring search sample during the data update; the b in H_bx denotes the best neighboring search sample position; and B represents a constant parameter that influences data information updating during the data update;
the big data update position function W_x is expressed as:
[Formula (5): equation image not reproduced in the source text]
in formula (5), the arrows above [symbol image] and [symbol image] denote direction-of-movement vectors during the data update; the big data position is taken relative to the best position found before the update in the iterative training process;
step (4), diagnosing fault data information in the data updating process;
the big data x_1, x_2, ···, x_n are conditionally independent of one another; assuming that the fault data information in the updating process is C, the fault diagnosis function output is:
[Formula (6): equation image not reproduced in the source text]
in formula (6), [symbol image] represents the probability that the fault data information [symbol image] occurs in the big data set [symbol image] during the update, and P(x_i | C_k) is the maximum probability value for the big data category, wherein:
[Formula (7): equation image not reproduced in the source text]
in formula (7), [symbol image] indicates that, in the iterative computation, a fault data probability set is output each time a big data update is computed; [symbol image] represents the training samples in P(C_k) that satisfy the big data input but whose attribute x_i has not been updated; if no [symbol image] exists, all big data in the big data storage system has been updated; if [symbol image] exists, some data in the big data storage system has not been updated, and the training sample data update fault function becomes:
[Formula (8): equation image not reproduced in the source text]
in the third step, the updating early warning function realizes the early warning of the updating data information through an orthogonalization function with a positioning function; the working method for updating the early warning function comprises the following steps:
big data update input positioning is realized through an MTG positioning function, and the fault data positioning function in the data information update process is:
[Formula (9): equation image not reproduced in the source text]
in formula (9), [symbol image] represents the data communication position during the data information update; [symbol image] represents the data update operation period; [symbol image] represents the data communication protocol in the data update operation; [symbol image] represents the influence coefficient of fault factors in the data update; and [symbol image] represents the data update operation period in the i-th communication protocol;
an orthogonalization function is constructed, and the fault information in the data information update process is expressed in matrix form, the orthogonalization function being expressed as:
[Formula (10): equation image not reproduced in the source text]
in formula (10), [symbol image] represents the orthogonalization function; [symbol image] represents the function in which the data update indexes of all communication protocols overlap one another; [symbol image] represents the average value of the fault data after iterative computation during the data information update; and fault diagnosis is performed on the update diagnosis result obtained after iterative computation;
the orthogonalization function outputs data information to the early warning output function, the early warning output function being:
[Formula (11): equation image not reproduced in the source text]
in formula (11), [symbol image] represents the early warning output function; [symbol image] represents the kind within the big data type; i is the communication protocol index; N represents the sample classification in the big data set; [symbol image] represents the average value after iterative computation in the i-th communication protocol during the data update; and [symbol image] represents the stability of the orthogonalization function during operation.
2. The method for continuously updating data information of a big data storage system according to claim 1, wherein: the method for realizing data identification by the information identification module comprises the following steps:
the method comprises the steps of receiving updated data information through a communication interface, starting a communication protocol decoding module, enabling the communication protocol decoding module to decode the information to be a TCP/IP communication protocol, an RS485 communication protocol, an RS232 communication protocol, a Modbus communication protocol, an HTTP communication protocol, an XMPP communication protocol, a WIA-PA communication protocol, a PLC communication protocol or a serial communication protocol, enabling data information matching to be achieved through matching of the length of a protocol frame, the header of the protocol frame, the tail of the protocol frame and the data caching speed through a communication protocol matching module, locking a communication mode, and further enabling data information output through a data output terminal.
3. The method for continuously updating data information of a big data storage system according to claim 1, wherein: the main control module is a main control chip based on EP4CE115F29C 7N.
4. The method for continuously updating data information of a big data storage system according to claim 1, wherein: the method for isolating the data by the data isolation module comprises the following steps:
under the control of an EP4CE115F29C7N main control chip, the check module performs data checking on the data information input into the memory by means of a cyclic redundancy check code, wherein the information code polynomial is shifted left by k bits according to the generator polynomial and bitwise modulo-2 addition and subtraction is performed, the remainder obtained being the check code; the parallel computing module performs data information calculation through Python code; the abnormal data information found by the calculation is shielded by the shielding module, and the normal data information found by the calculation is output through the communication network interface.
CN202210776234.XA 2022-07-04 2022-07-04 Method for continuously updating data information of big data storage system Active CN114861834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210776234.XA CN114861834B (en) 2022-07-04 2022-07-04 Method for continuously updating data information of big data storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210776234.XA CN114861834B (en) 2022-07-04 2022-07-04 Method for continuously updating data information of big data storage system

Publications (2)

Publication Number Publication Date
CN114861834A CN114861834A (en) 2022-08-05
CN114861834B true CN114861834B (en) 2022-09-30

Family

ID=82626495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210776234.XA Active CN114861834B (en) 2022-07-04 2022-07-04 Method for continuously updating data information of big data storage system

Country Status (1)

Country Link
CN (1) CN114861834B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101783752A (en) * 2010-02-10 2010-07-21 哈尔滨工业大学 Network security quantitative estimation method based on network topology characteristic
CN103888312A (en) * 2014-03-04 2014-06-25 京信通信系统(广州)有限公司 Alarm method and device of pre-distortion system
CN111612160A (en) * 2020-05-26 2020-09-01 吉林大学 Incremental Bayesian network learning method based on particle swarm optimization algorithm
CN113836374A (en) * 2020-06-08 2021-12-24 上海政谱科技有限公司 Real-time government affair data processing system based on big data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7672240B2 (en) * 2006-12-14 2010-03-02 Sun Microsystems, Inc. Method and system for using Bayesian network inference for selection of transport protocol algorithm
CN106348119B (en) * 2016-09-20 2020-03-20 广州特种机电设备检测研究院 Isolated elevator operation safety monitoring system and method based on Internet of things

Also Published As

Publication number Publication date
CN114861834A (en) 2022-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant