Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to overcome the above problems in the prior art, an embodiment of the present invention provides a method for storing monitoring data, and the inventive concept is as follows:
1. To solve the problem of efficiently retrieving data acquired in real time in a complex mariculture environment, a Redis + HBase database scheme is adopted: Redis serves as an in-memory database that stores real-time data and answers real-time queries, and the HBase database stores historical information. As an in-memory database, Redis offers fast reads and writes, high query efficiency, and support for concurrent access; it enables real-time information queries and handles semi-structured data well. The HBase database is particularly suited to storing and querying massive amounts of simply structured data. Migrating data with low search heat into the HBase database therefore satisfies both mass data storage and real-time information queries.
2. Because mariculture environment data is time-sensitive, historical data in Redis is usually migrated to HBase periodically for maximum benefit. Considering the large volume and complex structure of Internet of Things data, and in order to avoid manual migration and improve migration efficiency, an adaptive concurrent data migration algorithm is designed. It realizes adaptive data migration from the Redis database and balances the load of the Redis database, yielding a load-balanced adaptive data migration method.
3. A Bayesian network model is built to predict the search heat of the monitoring data. Based on the prediction result, monitoring data with low future search heat is selected and migrated to the HBase database, which effectively guarantees normal searching of the monitoring data.
Fig. 1 is a schematic flow chart of a method for storing monitoring data according to an embodiment of the present invention, as shown in fig. 1, including:
S101, receiving monitoring data acquired by a monitoring terminal in real time and taking the monitoring data as target data, and then judging whether the migration flag bit is in a locked state at the current moment.
It can be understood that Redis is a key-value (non-relational) database. Its queries are faster than those of a relational database, and because it is memory-based it is little affected by I/O efficiency. In a mariculture environment, monitoring terminals are of many types and are deployed in far greater numbers than in ordinary land-based or pond culture, so the volume of data they generate in real time is huge; the Redis database is therefore particularly suitable for storing the monitoring data in real time.
In the embodiment of the invention, the monitoring data generated in real time generally needs to be stored in the Redis database. However, because the storage capacity of the Redis database is relatively small, in order to store massive mariculture monitoring data, the embodiment of the invention creates a migration flag bit that indicates whether the monitoring data is placed directly in the Redis database or in a temporary database. Specifically, after the monitoring data acquired by the monitoring terminal in real time is received, it is necessary to judge whether the migration flag bit is in a locked state at the current moment.
S102, if the migration flag bit is determined to be in a locked state, storing the target data to a temporary database, and generating a Bayesian network model according to the search frequency of each monitoring data in a redis database in a preset time period; and the Bayesian network model is used for representing the search probability of each monitoring data in the redis database in the preset time period.
It should be noted that, when the migration flag bit is locked, it means that the memory of the redis database is already occupied too much and is not suitable for continuously writing data, and at this time, the target data needs to be written into the temporary database. It is to be understood that the temporary database is another database that can store data, and the type of the temporary database is not limited by the embodiment of the present invention. If the migration flag bit is determined to be in the non-locked state, it is indicated that the memory of the redis database currently has a sufficient storage space, and the memory is suitable for continuously writing data, so that the target data is directly stored in the redis database.
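The routing decision described in S101 and S102 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the class `MonitoringStore`, its field names, and the use of plain dictionaries in place of the Redis database and the temporary database are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringStore:
    """In-memory stand-ins (assumed) for the Redis database and the temporary database."""
    redis_db: dict = field(default_factory=dict)  # stand-in for the Redis database
    temp_db: dict = field(default_factory=dict)   # stand-in for the temporary database
    migration_locked: bool = False                # the migration flag bit

    def receive(self, key: str, value: dict) -> str:
        """Store one piece of target data according to the migration flag bit."""
        if self.migration_locked:
            # Locked: Redis memory is considered too full, so buffer the data.
            self.temp_db[key] = value
            return "temp"
        # Not locked: write directly to the Redis stand-in.
        self.redis_db[key] = value
        return "redis"


store = MonitoringStore()
first = store.receive("terminal-1:0001", {"temp_c": 18.2})
store.migration_locked = True  # simulate the flag being locked
second = store.receive("terminal-1:0002", {"temp_c": 18.4})
```

In this sketch the first record lands in the Redis stand-in and the second, received while the flag is locked, lands in the temporary store.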
S103, calculating the search probability of each monitoring data in the redis database in a future time period according to the Bayesian network model, selecting a certain amount of monitoring data with lower search probability in the future time period to form a migration data set, and migrating the migration data set to the Hbase database, so that the available storage capacity of the redis database is greater than a first preset threshold value.
It should be noted that after the migration flag bit is determined to be in the locked state, part of the monitoring data in the Redis database needs to be transferred to the HBase database. It can be understood that the HBase (Hadoop Database) database is used to read and write large-scale data sets in real time. It is particularly suitable for simple data writes (such as "message"-type applications) and for querying massive amounts of simply structured data (such as "detail"-type applications), and it is particularly suitable for sparse tables. The HBase database is a highly reliable, high-performance, column-oriented, scalable distributed storage system, and HBase technology can be used to build a large-scale structured storage cluster on inexpensive PC servers.
The embodiment of the invention stores the monitoring data in a Redis + HBase database scheme. As an in-memory database, Redis offers fast reads and writes, high query efficiency, and support for concurrent access; it enables real-time information queries and handles semi-structured data well. The HBase database is particularly suited to storing and querying massive amounts of simply structured data. Migrating data with low search heat into the HBase database therefore satisfies both mass data storage and real-time information queries.
In the embodiment of the invention, the redis database is used for storing real-time monitoring data, and the purpose is to facilitate searching and analyzing the real-time monitoring data by a worker, so that determining which monitoring data in the redis database needs to be transferred to the Hbase database is one of important problems to be solved by the embodiment of the invention.
The bayesian network model reflects a probability relation among data in a data set, and includes two important elements, namely a node set and a directed edge, wherein each node represents a state variable, the directed edge represents a dependency relation among the variables, and the association strength or the confidence degree among the variables is described through a Conditional Probability Table (CPT).
In mariculture, the water area is wide and the cultured fish species are varied. A user may check the real-time culture data of a given water area for various reasons, so queries are highly random. However, the water area contains numerous data nodes, and an ordinary search over so many data nodes takes too long. A heat value is therefore set according to the number of times each data node is queried, and searching based on heat saves time. Because the water-area nodes are numerous and irregular, the heat is set using a Bayesian network. The core of a Bayesian network is conditional probability, which in essence uses prior knowledge to establish association constraints among nodes. The Bayesian network is used to infer the query heat of a given node in the water area: prior knowledge determines the query probability of subsequent nodes, so that the pattern is described quantitatively and in accordance with objective regularities.
The method utilizes the Bayesian network model to deduce the possibility of searching the monitoring data acquired by each monitoring terminal, and the possibility is used as the basis for guiding the screening of the monitoring data to the Hbase database, so that the method has the advantage of high accuracy.
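The selection step of S103 can be illustrated with a short sketch. It assumes that a search probability and a size are already known for every record, and that "a certain amount" means taking the least likely records one by one until the available capacity exceeds the first preset threshold; the function name `select_migration_set` and the sample numbers are hypothetical.

```python
def select_migration_set(search_prob, sizes, available, first_threshold):
    """Pick the records least likely to be searched until enough space is freed.

    search_prob: record key -> predicted search probability in the future period
    sizes:       record key -> space the record occupies (arbitrary units)
    available:   currently available capacity of the Redis stand-in
    """
    chosen = []
    # Visit records from lowest to highest predicted search probability.
    for key in sorted(search_prob, key=search_prob.get):
        if available > first_threshold:
            break  # enough space has been freed
        chosen.append(key)
        available += sizes[key]
    return chosen, available


search_prob = {"s1": 0.80, "s2": 0.05, "s3": 0.30, "s4": 0.10}
sizes = {"s1": 40, "s2": 30, "s3": 20, "s4": 25}
migration_set, available_after = select_migration_set(
    search_prob, sizes, available=10, first_threshold=60
)
```

With these sample values the two coldest records, `s2` and `s4`, are selected and the available capacity rises from 10 to 65.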
S104, updating the migration flag bit to be in a non-locking state, then transferring the monitoring data stored in the temporary database to a redis database, judging whether the available storage capacity of the redis database is lower than a second preset threshold, and if the available storage capacity of the redis database is lower than the second preset threshold, updating the migration flag bit to be in a locking state;
It should be noted that after the monitoring data is migrated from the Redis database to the HBase database, some storage space is necessarily freed in the Redis database. At this point the migration flag bit needs to be updated, and the monitoring data stored in the temporary database is then transferred to the Redis database. After the transfer, the available storage capacity of the Redis database needs to be judged again. If it is lower than the second preset threshold, the storage space in the Redis database is not enough to store the monitoring data acquired in real time at the next moment, and the migration flag bit needs to be updated to the locked state again; if it is higher than the second preset threshold, the storage space in the Redis database is enough to store the monitoring data acquired in real time at the next moment, and the migration flag bit remains in the non-locked state.
On the basis of the foregoing embodiments, as an optional embodiment, the monitoring data used for constructing the Bayesian network model in the present invention is the monitoring data stored in the Redis database between the most recent time the migration flag bit was updated to the non-locked state and the most recent time it was updated to the locked state.
It can be understood that the migration flag bit alternates between the non-locked state and the locked state, and that in the initial state it is in the non-locked state. The Bayesian network model of the embodiment of the invention is continuously updated: each time target data is received and the migration flag bit is found to be in the locked state, the Bayesian network model needs to be updated (or reconstructed). The monitoring data required by the Bayesian network model is the monitoring data in the Redis database from the most recent time the migration flag bit was updated to the non-locked state to the most recent time it was updated to the locked state. In this way, each prediction of search heat refers to the search activity on the Redis database over the latest period of time.
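A minimal sketch of the S104 sequence is given below. It assumes dictionaries in place of the two databases and measures available capacity as a simple record count; the function `finish_migration` and its parameters are hypothetical names used only for illustration.

```python
def finish_migration(redis_db, temp_db, capacity, second_threshold):
    """Unlock, drain the temporary store into the Redis stand-in, then re-check capacity.

    Returns the new state of the migration flag bit (True = locked) and the
    available capacity, counted here in records as a simplifying assumption.
    """
    locked = False                # update the flag bit to the non-locked state
    redis_db.update(temp_db)      # transfer buffered monitoring data
    temp_db.clear()
    available = capacity - len(redis_db)
    if available < second_threshold:
        locked = True             # not enough room for the next batch: lock again
    return locked, available


redis_db = {f"r{i}": i for i in range(5)}
temp_db = {f"t{i}": i for i in range(3)}
locked, available = finish_migration(redis_db, temp_db, capacity=10, second_threshold=3)
```

Here eight records remain in the Redis stand-in after the transfer, leaving two free slots, which is below the second threshold of three, so the flag is locked again.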
On the basis of the above embodiments, as an alternative, the Bayesian network model is represented as $BQG = (N, E, W, T)$, where:
$N$ is the node set, $N = S \cup A \cup O$. Here $S = \{s_i \mid i = 1, \dots, M\}$, where the monitoring data node $s_i$ characterizes whether the monitoring data collected by the $i$-th monitoring terminal is searched as planned in the preset time period, and its value may be 0 or 1; $A = \{a_i \mid i = 1, \dots, M\}$, where the search node $a_i$ characterizes whether the monitoring data node $s_i$ is searched as planned, and its value may be 0 or 1; $O = \{o_i \mid i = 1, \dots, M\}$, where the evidence node $o_i$ characterizes whether the search node $a_i$ is verified, and its value may be 0 or 1;
$E$ is the set of directed edges, $E = E_1 \cup E_2 \cup E_3$;
$E_1 \subseteq S \times A$ represents that the monitoring data collected by the $i$-th monitoring terminal is searched as planned in the preset time period;
$E_2 \subseteq A \times S$ represents whether the monitoring data node $s_i$ is searched as a planned search;
$E_3 \subseteq A \times O$ indicates that the occurrence of a data query can be verified a posteriori by some evidence nodes;
$W$ is the set of weights on $E$, $W = (w_1, w_2, w_3)$. $w_1$ is attached to the directed edges $E_1$ and indicates the probability that a subsequent search node $a$ occurs given a certain monitoring data node $s$. The probability that a search occurs is usually related to the relevance of the node: the greater the relevance, the more likely the node is to be searched. It is also related to the frequency $f$ with which the search occurs in the search history. $w_2$ is attached to the directed edges $E_2$ and indicates the probability of successfully reaching the next monitoring data node $s$ after a certain search node $a$ occurs; it is set empirically to a value in $(0, 1]$. $w_3 = \{(t_i, f_i) \mid i = 1, \dots, M\}$ is attached to the directed edges $E_3$, where $t_i$ is the probability $P(o_i \mid a_i)$ that a search that occurred is verified, and $f_i$ is the probability $P(o_i \mid \neg a_i)$ that a search that did not occur is nevertheless verified; both are obtained from historical data statistics.
$T = \{\rho_i\}$ is the local conditional probability distribution table, which associates each node in the Bayesian network model with its direct parent nodes. For any node $k_i$ that has parent nodes, the corresponding local conditional probability distribution is denoted $\rho_i = P(k_i \mid Pre(k_i))$, where $Pre(k_i)$ represents all parent nodes of $k_i$.
The construction of the Bayesian query graph is divided into three parts: establishing the structure, determining the edge weights, and generating the local conditional probability distribution table.
The local conditional probability distribution table associates nodes with their parent nodes and is the basis for Bayesian inference. There are two cases:
the search nodes $a_i$ pointing to the same monitoring data node $s_i$ are in an OR relationship, i.e. the occurrence of any one of the search actions can set $s_i$ to 1;
the data nodes $s_i$ pointing to the same search node $a_i$ are in an AND relationship, i.e. $a_i$ can occur only if all preconditions of the search node are fulfilled.
Since there are three types of nodes in the Bayesian network model of the embodiment of the present invention, three types of local conditional probability distribution tables need to be determined. According to the above specification, for a data node $s_j$, let $Pre(s_j)$ denote all parent nodes of $s_j$, $a_i \in Pre(s_j)$, and let $w_{ij}$ denote the weight on edge $(a_i, s_j)$; the local conditional probability distribution $P(s_j \mid Pre(s_j))$ of $s_j$ is then determined by the OR relationship and the weights $w_{ij}$.
For a search node $a_j$, let $Pre(a_j)$ denote all parent nodes of $a_j$ (one or more data nodes), $s_i \in Pre(a_j)$, and let $w_{ij}$ denote the weight on edge $(s_i, a_j)$; the local conditional probability distribution $P(a_j \mid Pre(a_j))$ of $a_j$ is then determined by the AND relationship and the weights $w_{ij}$.
For an evidence node $o_i$, the parent node is the single search node $a_i$. Let the detection rate and the false alarm rate for $a_i$ be $t_i$ and $f_i$ respectively; the local conditional probability distribution of $o_i$ is then:
$$P(o_i = 1 \mid a_i) = \begin{cases} t_i, & a_i = 1 \\ f_i, & a_i = 0 \end{cases}$$
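The three local conditional probability distributions can be sketched in code. The exact formulas for the data node and the search node are not reproduced in this text, so the sketch assumes the forms commonly used for OR and AND relationships in Bayesian graphs: a noisy-OR over active parents for $s_j$, and a product of edge weights when all parents are active for $a_j$. The function names are hypothetical.

```python
from math import prod


def p_data_node(parent_states, weights):
    """P(s_j = 1 | parent search nodes), assuming a noisy-OR over active parents."""
    active = [w for a, w in zip(parent_states, weights) if a == 1]
    if not active:
        return 0.0  # no parent search occurred
    return 1.0 - prod(1.0 - w for w in active)


def p_search_node(parent_states, weights):
    """P(a_j = 1 | parent data nodes), assuming AND: every precondition must hold."""
    if any(s == 0 for s in parent_states):
        return 0.0  # at least one precondition is not fulfilled
    return prod(weights)


def p_evidence(a_state, t, f):
    """P(o_i = 1 | a_i): detection rate t if the search occurred, false alarm rate f otherwise."""
    return t if a_state == 1 else f
```

For example, under these assumptions two active parents with weight 0.5 each give `p_data_node([1, 1], [0.5, 0.5])` equal to 0.75, while a single inactive parent makes `p_search_node` return 0.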
Because query detection suffers from missed detections and false detections, part of the search evidence is inaccurate. Studies have found that isolated queries with low confidence are not queried again, and the presence of such evidence affects the accuracy of query identification. Therefore, the confidence and the association strength of the search evidence are defined through the Bayesian network structure of the embodiment of the invention, and the confidence level of the search evidence and the association relations among the evidence are considered together, so that valid search evidence is obtained; Bayesian posterior inference is then carried out on this basis, and the probability that each data node in the network is searched is calculated.
The confidence of an evidence node $o_i$ is defined as the probability that the corresponding search $a_i$ occurred given that the search evidence is observed, i.e. $P(a_i \mid o_i)$. By Bayes' formula, the confidence of the search evidence $o_i$ can be expressed as:
$$P(a_i \mid o_i) = \frac{P(o_i \mid a_i)\,P(a_i)}{P(o_i \mid a_i)\,P(a_i) + P(o_i \mid \neg a_i)\,P(\neg a_i)} \quad (4)$$
According to the properties of Bayesian networks, let the node set of the Bayesian query graph be $N = \{n_1, n_2, \dots, n_m\}$; the joint probability distribution of the nodes $n_1, n_2, \dots, n_m$ is:
$$P(n_1, n_2, \dots, n_m) = \prod_{i=1}^{m} P(n_i \mid Pre(n_i)) \quad (5)$$
where $Pre(n_i)$ is the set of all parent nodes of $n_i$. The probability $P(n_k)$ that a node $n_k$ in $N$ occurs is the marginal distribution obtained from equation (5), i.e.:
$$P(n_k) = \sum_{N \setminus \{n_k\}} P(n_1, n_2, \dots, n_k, \dots, n_m) \quad (6)$$
From equations (5) and (6), the prior probability $P(a_i)$ of node $a_i$ can be obtained, and from it the confidence $P(a_i \mid o_i)$ of $o_i$.
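The chain from joint distribution to marginal to confidence can be illustrated on a two-node example. The sketch below assumes a tiny graph with one data node $s$ and one search node $a$, with made-up values for $P(s)$, the edge weight, the detection rate, and the false alarm rate; `marginal` and `evidence_confidence` are hypothetical helper names.

```python
from itertools import product

P_S1 = 0.4  # assumed prior P(s = 1)
W1 = 0.5    # assumed weight on the edge s -> a, i.e. P(a = 1 | s = 1)


def joint(states):
    """Joint probability P(s, a) as a product of local conditionals (formula (5) style)."""
    s, a = states
    p_s = P_S1 if s == 1 else 1.0 - P_S1
    p_a1 = W1 if s == 1 else 0.0          # a cannot occur unless s holds
    p_a = p_a1 if a == 1 else 1.0 - p_a1
    return p_s * p_a


def marginal(joint_fn, index, n_vars):
    """P(n_index = 1): sum the joint over all other variables (formula (6) style)."""
    return sum(
        joint_fn(states)
        for states in product((0, 1), repeat=n_vars)
        if states[index] == 1
    )


def evidence_confidence(prior_a, t, f):
    """P(a | o) by Bayes' formula from the prior, detection rate t, and false alarm rate f."""
    numerator = t * prior_a
    denominator = numerator + f * (1.0 - prior_a)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under the model")
    return numerator / denominator


prior_a = marginal(joint, index=1, n_vars=2)               # P(a = 1)
confidence = evidence_confidence(prior_a, t=0.9, f=0.1)    # P(a | o)
```

Enumerating all states is only practical for very small graphs; it is used here because it mirrors formulas (5) and (6) directly.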
There is usually some connection between pieces of search evidence, and this connection can be expressed through the relationship between the search nodes $a_i$ in the search graph. For example, if search nodes $a_i$ and $a_j$ are reachable from one another, the corresponding search evidence $o_i$, $o_j$ reflects the multi-step nature of the search; if $a_i$ and $a_j$ are direct successors of the same data node $s_k$, the corresponding search evidence $o_i$, $o_j$ reflects the search process after the user has searched $s_k$. According to the definition of the Bayesian query graph, $P(a_j \mid a_i)$ is used to reflect the closeness of the connection between search nodes $a_i$ and $a_j$, and thus the association strength of the search evidence.
The association strength $Cor(o_i, o_j)$ of search evidence $o_i$, $o_j$ is defined as the probability that search $a_j$ occurs given that search $a_i$ has occurred, i.e. $P(a_j \mid a_i)$.
When the evidence nodes are processed, the order among them is not considered, i.e. $Cor(o_i, o_j) = Cor(o_j, o_i)$ is required. However, $P(a_j \mid a_i)$ is not necessarily equal to $P(a_i \mid a_j)$, so in the actual calculation:
$$Cor(o_i, o_j) = \min\{P(a_i \mid a_j),\ P(a_j \mid a_i)\} \quad (7)$$
where $P(a_i \mid a_j) = P(a_i, a_j) / P(a_j)$, and $P(a_i, a_j)$ and $P(a_j)$ are calculated according to formulas (5) and (6).
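As a small worked example of formula (7), the function below computes the association strength from a joint probability and two marginals that are assumed to have been obtained already; the name `association_strength` and the sample values are hypothetical.

```python
def association_strength(p_joint, p_ai, p_aj):
    """Cor(o_i, o_j) = min(P(a_i | a_j), P(a_j | a_i)), symmetric by construction.

    p_joint: P(a_i, a_j); p_ai: P(a_i); p_aj: P(a_j).
    """
    if p_ai <= 0.0 or p_aj <= 0.0:
        raise ValueError("marginal probabilities must be positive")
    p_ai_given_aj = p_joint / p_aj
    p_aj_given_ai = p_joint / p_ai
    return min(p_ai_given_aj, p_aj_given_ai)


cor = association_strength(p_joint=0.1, p_ai=0.2, p_aj=0.5)
```

Taking the minimum of the two conditional probabilities makes the result independent of the order of the two evidence nodes, which is the symmetry the text requires.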
On the basis of the foregoing embodiments, as an optional embodiment, calculating, according to the bayesian network model, a search probability of each monitoring data in the redis database in a future time period, specifically:
computing the confidence $P(a_i \mid o_i)$ of the query evidence from the Bayesian network model, the confidence $P(a_i \mid o_i)$ characterizing the probability that the search node $a_i$ occurred given that the search evidence is observed;
forming a valid evidence node set $E = \{o_1, o_2, \dots, o_n\}$ from the evidence nodes whose confidence is greater than a confidence threshold;
according to the formula
$$P(s_i \mid E) = \frac{P(E \mid s_i)\,P(s_i)}{P(E)}$$
calculating the probability $P(s_i \mid E)$ that the monitoring data collected by the $i$-th monitoring terminal is searched in the preset time period given that the valid evidence set $E$ occurs, where $P(E \mid s_i)$ represents the probability that the valid evidence set $E$ verifies that the monitoring data collected by the $i$-th monitoring terminal is searched in the preset time period, and $P(E)$ represents the probability that the valid evidence set $E$ occurs.
Specifically, $P(a_i \mid o_i)$ is first calculated according to formula (4), and invalid evidence with $P(a_i \mid o_i)$ less than or equal to 50% is removed. This is a preliminary screening of the validity of the evidence: the confidence $P(a_i \mid o_i)$ of $o_i$ should be greater than 50%; otherwise $P(\neg a_i \mid o_i)$ would be 50% or more, i.e. the presence of the evidence would instead increase the likelihood that the query did not occur, which biases the posterior inference of the query.
The preliminarily screened evidence is then screened again according to query confidence and association strength. According to formula (7), all evidence $o_i$ whose association strength with the other evidence is below the threshold $\alpha$ is traversed, and $P(a_i \mid o_i)$ is calculated according to formula (4); if the confidence of $o_i$ is less than the confidence threshold $\beta$, $o_i$ is declared isolated query evidence with a low confidence level and is removed from the valid query evidence set $E$. Finally, the valid query evidence set $E$ is output.
The parameters $\alpha$ and $\beta$ are set according to the practical application requirements of mariculture. $\alpha$ is a threshold that measures the relevance of the query evidence, and its value is set based on the average weight of the directed edge set $E_1$ in the Bayesian query graph; $\beta$ is a threshold that measures the confidence level of the query evidence and is taken as 70%.
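The two-stage screening can be sketched as follows. The sketch assumes that confidences and pairwise association strengths have already been computed, that a missing pair has strength 0, and that "isolated" means the strength with every other surviving piece of evidence is below $\alpha$; the function name `screen_evidence` and the sample data are hypothetical.

```python
def screen_evidence(confidence, cor, alpha, beta=0.70):
    """Return the valid evidence set after preliminary and secondary screening.

    confidence: evidence id -> P(a_i | o_i)
    cor:        frozenset({i, j}) -> Cor(o_i, o_j); missing pairs count as 0.0
    """
    # Stage 1: keep only evidence whose confidence exceeds 50%.
    kept = {o for o, c in confidence.items() if c > 0.5}

    # Stage 2: drop evidence that is both isolated and below the threshold beta.
    valid = set()
    for o in kept:
        isolated = all(
            cor.get(frozenset((o, other)), 0.0) < alpha
            for other in kept
            if other != o
        )
        if isolated and confidence[o] < beta:
            continue
        valid.add(o)
    return valid


confidence = {"o1": 0.90, "o2": 0.60, "o3": 0.40, "o4": 0.65}
cor = {frozenset(("o1", "o4")): 0.5}
valid_set = screen_evidence(confidence, cor, alpha=0.3)
```

In this example `o3` fails the 50% check, `o2` is isolated with confidence below 70% and is removed, and `o4` survives despite its lower confidence because it is associated with `o1`.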
Given the valid evidence set $E = \{o_1, o_2, \dots, o_n\}$ generated by algorithm 1 and the set of information nodes $S = \{s_i \mid i = 1, \dots, N\}$ in the network, inferring the query behavior means solving $P(s_i \mid E)$ for each $s_i \in S$, which gives the probability that each node is queried.
On the basis of the foregoing embodiments, as an optional embodiment, the migrating the monitoring data in the migration data set to the Hbase database specifically includes:
dividing the migration data set into n data sub-tables according to the thread capacity in the thread pool;
and starting n threads, binding each thread to one data sub-table, and migrating the monitoring data in that data sub-table to the HBase database.
It should be noted that, in the embodiment of the present invention, the migration data set is split according to the thread capacity of the thread pool to obtain a plurality of data sub-tables, so that the multithreaded task, using the asynchronous processing supported by the Redis database, migrates the monitoring data in a shorter time.
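The split-and-bind scheme can be sketched with a standard thread pool. This is an illustration under assumptions: a dictionary stands in for the HBase table, the split is round-robin, and `split_into_subtables`, `migrate_subtable`, and `migrate` are hypothetical names rather than part of any database client.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_subtables(records, n):
    """Split the migration data set into n nearly equal sub-tables (round-robin)."""
    return [records[i::n] for i in range(n)]


def migrate_subtable(subtable, sink):
    """Write one sub-table to the sink; the sink is a stand-in for an HBase table."""
    for key, value in subtable:
        sink[key] = value
    return len(subtable)


def migrate(records, n_threads, sink):
    """Bind each sub-table to one worker thread and migrate them concurrently."""
    subtables = split_into_subtables(records, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        counts = list(pool.map(lambda t: migrate_subtable(t, sink), subtables))
    return counts


hbase_stub = {}
records = [(f"k{i}", i) for i in range(10)]
counts = migrate(records, n_threads=3, sink=hbase_stub)
```

With ten records and three threads the sub-tables hold four, three, and three records. A real deployment would replace the dictionary write with batched writes through an HBase client.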
On the basis of the foregoing embodiments, as an optional embodiment, the receiving of the monitoring data acquired by the monitoring terminal in real time according to the embodiments of the present invention specifically includes: and acquiring monitoring data acquired by the monitoring terminal in real time through pre-constructed MINA message middleware.
Mariculture monitoring devices are varied and their protocols are complex. MINA supports asynchronous communication, allows multiple encoding modes to be configured, and filters different protocols, which improves message processing efficiency. It also provides a uniform data interface, which makes the data convenient to operate on. Designing the middleware on MINA therefore improves message processing efficiency and provides a uniform data interface for upper-layer operations such as storage.
Specifically, the step of receiving the mass data based on the MINA message middleware technology is as follows:
S1, the monitoring terminal first sends a device registration request to the MINA message middleware to actively establish a connection.
S2, the monitoring terminal sends monitoring data after registering with the MINA message middleware.
S3, the Filter chain of the MINA message middleware defines the different data formats, and the monitoring data packets of the monitoring terminal are parsed through the Filter chain.
S4, the MINA message middleware receives the monitoring data and unifies monitoring data of different formats into data of a uniform format.
S5, the monitoring data is packaged into a data format that is easy to store in Redis through a decode method of the MINA message middleware.
S6, the packaged monitoring data is taken as the target data.
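The idea behind steps S3 to S6, a chain of decoders that turns packets in different formats into one uniform record, can be sketched as below. This is not the MINA API, which is a Java framework; it is a language-neutral illustration in Python. The two packet formats (JSON and comma-separated text), the field names, and the function names are all assumptions.

```python
import json


def decode_json(packet: bytes):
    """Decode a JSON packet such as {"id": 7, "metric": "ph", "value": 8.1}."""
    try:
        obj = json.loads(packet.decode("utf-8"))
        return {
            "terminal_id": str(obj["id"]),
            "metric": str(obj["metric"]),
            "value": float(obj["value"]),
        }
    except (ValueError, KeyError, TypeError):
        return None  # not this format; let the next decoder try


def decode_csv(packet: bytes):
    """Decode a text packet of the form 'terminal_id,metric,value'."""
    try:
        terminal_id, metric, value = packet.decode("utf-8").strip().split(",")
        return {"terminal_id": terminal_id, "metric": metric, "value": float(value)}
    except ValueError:
        return None


FILTER_CHAIN = [decode_json, decode_csv]


def to_target_data(packet: bytes) -> dict:
    """Run the packet through the chain and return the first uniform record produced."""
    for decoder in FILTER_CHAIN:
        record = decoder(packet)
        if record is not None:
            return record
    raise ValueError("unsupported packet format")


from_json = to_target_data(b'{"id": 7, "metric": "ph", "value": 8.1}')
from_csv = to_target_data(b"t9,salinity,31.5\n")
```

Both packets end up as flat records with the same three fields, which maps naturally onto a key-value or hash layout in the real-time store.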
Fig. 2 is a schematic structural diagram of a storage device for monitoring data according to an embodiment of the present invention, and as shown in fig. 2, the storage device for monitoring data includes: the target data obtaining module 201, the temporary storage module 202, the migration data module 203, and the transfer module 204 specifically:
a target data obtaining module 201, configured to receive monitoring data acquired by the monitoring terminal in real time and use the monitoring data as target data, and then determine whether the migration flag is in a locked state at the current time;
the temporary storage module 202 is configured to store the target data to a temporary database if it is determined that the migration flag bit is in a locked state, and generate a bayesian network model according to a search frequency of each monitoring data in a redis database in a preset time period;
the migration data module 203 is configured to calculate, according to the bayesian network model, a search probability of each monitoring data in the redis database in a future time period, select a certain number of monitoring data with a lower search probability in the future time period to form a migration data set, and migrate the migration data set to the Hbase database, so that an available storage capacity of the redis database is greater than a first preset threshold;
a transferring module 204, configured to update the migration flag bit to a non-locked state, then transfer the monitoring data stored in the temporary database to a redis database, determine whether an available storage capacity of the redis database is lower than a second preset threshold, and if the available storage capacity of the redis database is lower than the second preset threshold, update the migration flag bit to a locked state;
the Bayesian network model is used for representing the search probability of each monitoring data in the redis database in the preset time period.
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call a computer program stored on the memory 330 and operable on the processor 310 to perform the storage method of the monitoring data provided by the above embodiments, for example, including: receiving monitoring data acquired by a monitoring terminal in real time and taking the monitoring data as target data, and then judging whether a migration flag bit is in a locked state at the current moment; if the migration flag bit is determined to be in a locked state, storing the target data to a temporary database, and generating a Bayesian network model according to the search frequency of each monitoring data in a Redis database in a preset time period; calculating the search probability of each monitoring data in the Redis database in a future time period according to the Bayesian network model, selecting a certain amount of monitoring data with lower search probability in the future time period to form a migration data set, and migrating the migration data set to an HBase database so that the available storage capacity of the Redis database is larger than a first preset threshold; updating the migration flag bit to a non-locked state, then transferring the monitoring data stored in the temporary database to the Redis database, judging whether the available storage capacity of the Redis database is lower than a second preset threshold, and if so, updating the migration flag bit to a locked state; the Bayesian network model is used for representing the search probability of each monitoring data in the Redis database in the preset time period.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the storage method of monitoring data provided in the foregoing embodiments, for example, including: receiving monitoring data acquired by a monitoring terminal in real time and taking the monitoring data as target data, and then judging whether a migration flag bit is in a locked state at the current moment; if the migration flag bit is determined to be in a locked state, storing the target data to a temporary database, and generating a Bayesian network model according to the search frequency of each monitoring data in a Redis database in a preset time period; calculating the search probability of each monitoring data in the Redis database in a future time period according to the Bayesian network model, selecting a certain amount of monitoring data with lower search probability in the future time period to form a migration data set, and migrating the migration data set to an HBase database so that the available storage capacity of the Redis database is larger than a first preset threshold; updating the migration flag bit to a non-locked state, then transferring the monitoring data stored in the temporary database to the Redis database, judging whether the available storage capacity of the Redis database is lower than a second preset threshold, and if so, updating the migration flag bit to a locked state; the Bayesian network model is used for representing the search probability of each monitoring data in the Redis database in the preset time period.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.