Disclosure of Invention
In order to overcome the problems in the related art, the application provides a smart city data sharing system.
According to an embodiment of the present application, a smart city data sharing system is provided, which includes:
the query library module is used for building a query library from historically accumulated queries on the smart city big data;
the analysis module is used for performing statistical analysis on the query library to determine the attribute columns of the smart city big data whose access frequency exceeds a preset value;
the dividing module is used for dividing the smart city big data into data blocks according to the attribute columns;
and the sharing module is used for receiving a query on the smart city big data, matching the attribute columns in the query to the data blocks, and retrieving the data that satisfies the query from the matched data blocks to answer the current query.
Preferably, the dividing module sets the granularity of dividing the data blocks according to the uniformity of the attribute columns.
Preferably, the larger the uniformity, the larger the granularity of division is set.
Preferably, the smart city data sharing system further includes:
a correlation module, used for analyzing the correlation of the queries corresponding to the data blocks;
a placement module, used for placing the data blocks onto respective computing nodes according to the correlation.
Preferably, the correlation module operates as follows:
R_ij refers to the correlation between data block B_i and data block B_j; t is the sequence number of a query in the query library; n is the number of queries in the query library; f_t is the number of times the t-th query has been issued; p_t takes 1 when the attribute column of data block B_i and the attribute column of data block B_j both appear in the t-th query, and 0 otherwise; and R_ij = 0 when i = j.
Preferably, the placement module sets the sum of the correlations of the data blocks placed in each computing node to be greater than a first preset value MIN and less than a second preset value MAX.
Preferably, the following is set:
;
where S is the sum of the correlations of all m data blocks.
Preferably, the following is set:
.
Preferably, the correlation R_i of data block B_i is calculated using the following formula:
Preferably, S is calculated using the following formula:
The technical scheme provided by the embodiments of the application can have the following beneficial effects: the data blocks are divided by analyzing historical query data, so that the division of the data blocks is highly related to the data content; query efficiency is thereby significantly improved, and data sharing services can be provided efficiently even for smart cities whose data is large in scale, fast-growing, and structurally diverse.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The following disclosure provides many different embodiments, or examples, for implementing different features of the application. In order to simplify the disclosure of the present application, specific example components and arrangements are described below. Of course, they are merely examples and are not intended to limit the present application. Further, the present application may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. Further, examples of various specific processes and materials are provided herein, but one of ordinary skill in the art may recognize the applicability of other processes and/or the use of other materials. In addition, the structure of a first feature described below as "on" a second feature may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features are formed between the first and second features, such that the first and second features may not be in direct contact.
In the description of the present application, it should be noted that, unless otherwise specified and limited, the terms "mounted," "coupled," and "connected" are to be interpreted broadly; for example, a connection may be mechanical or electrical, may be a communication between two elements, and may be direct or indirect via an intermediate medium. The specific meanings of these terms can be understood by those skilled in the art according to the specific situation.
Fig. 1 is a block diagram illustrating a smart city data sharing system according to an exemplary embodiment.
Referring to fig. 1, the system includes:
the query library module 10, used for building a query library from historically accumulated queries on the smart city big data;
the analysis module 20, configured to perform statistical analysis on the query library to determine the attribute columns of the smart city big data whose access frequency exceeds a preset value;
the dividing module 30, used for dividing the smart city big data into data blocks according to the attribute columns;
and the sharing module 40, used for receiving a query on the smart city big data, matching the attribute columns in the query to the data blocks, and retrieving the data that satisfies the query from the matched data blocks to answer the current query.
With the rapid development of smart cities, a large amount of data is generated. This data is large in scale, fast-growing, and structurally diverse, so providing data sharing services for a smart city with conventional methods is inefficient and queries are too slow.
The embodiment of the invention divides the data blocks by analyzing historical query data, so that the division of the data blocks is highly related to the data content. That is, statistical analysis of the query load yields the most frequently accessed attribute columns, which are used as the current division columns. This significantly improves query efficiency, so that data sharing services can be provided efficiently even for a smart city whose data is large in scale, fast-growing, and structurally diverse.
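The flow through the analysis, dividing, and sharing modules can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the query library is modeled as a list of (attribute columns, issue count) pairs, a data block is simply the group of rows sharing one value of a division column, and the function names `frequent_columns`, `divide`, `build_blocks`, and `share` are hypothetical. The preset value is passed in as a plain parameter rather than derived from any formula.

```python
from collections import Counter, defaultdict

def frequent_columns(query_library, preset):
    """Return the attribute columns whose access count exceeds `preset`.

    query_library: list of (columns, count) pairs, where `columns` is the set
    of attribute columns a historical query touches and `count` is how many
    times that query was issued (a hypothetical representation).
    """
    freq = Counter()
    for columns, count in query_library:
        for col in columns:
            freq[col] += count
    return {col for col, n in freq.items() if n > preset}

def divide(rows, column):
    """Group rows (dicts) into data blocks keyed by their value of `column`."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[column]].append(row)
    return dict(blocks)

def build_blocks(rows, columns):
    """Build one set of data blocks per division column."""
    return {col: divide(rows, col) for col in columns}

def share(blocks_by_column, rows, query):
    """Answer an equality query given as a dict of column -> required value.

    If the query names a division column, only the matching block is read;
    otherwise the sketch falls back to scanning all rows.
    """
    for col, value in query.items():
        if col in blocks_by_column:
            candidates = blocks_by_column[col].get(value, [])
            break
    else:
        candidates = rows
    return [r for r in candidates if all(r[c] == v for c, v in query.items())]
```

In this sketch a query that names a frequently accessed column touches only one block, which is the intended source of the speed-up; range-based or hashed partitioning would follow the same pattern.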
Preferably, as shown in fig. 2, the smart city data sharing system further includes an updating module 50, configured to add the received query to the query library. This allows the partitioning of the data block to be continually updated, so that the partitioning of the data block can be consistent with the updating of the query.
Preferably, the query library also records, for each query, the number of times the query has been issued; that is, the count is incremented by one each time the query is repeated. This count can later be used to further optimize the storage of the data blocks.
Preferably, f_t in the query library can also be traversed periodically, where f_t is the number of times the t-th query has been issued, t = 1, ..., n is the sequence number of the query in the query library, and n is the number of queries in the query library. A query whose f_t has not increased for longer than a predetermined time is deleted from the query library. In this way, queries that have aged and are no longer used are removed, which further optimizes the structure of the query library and improves the efficiency of providing the sharing service for the smart city.
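The update and aging behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `QueryLibrary`, its methods, and the use of wall-clock seconds for the predetermined time are all hypothetical, and queries are identified by their text.

```python
import time

class QueryLibrary:
    """Keeps, per query, its count f_t and the time f_t last increased."""

    def __init__(self, max_idle_seconds):
        # Assumed form of the predetermined time: seconds since the last increase.
        self.max_idle = max_idle_seconds
        self.entries = {}  # query text -> [f_t, time of last increase]

    def record(self, query, now=None):
        """Add a received query, or increment f_t if it is a repeat."""
        now = time.time() if now is None else now
        entry = self.entries.setdefault(query, [0, now])
        entry[0] += 1
        entry[1] = now

    def prune(self, now=None):
        """Delete queries whose f_t has not increased within max_idle seconds."""
        now = time.time() if now is None else now
        stale = [q for q, (_, last) in self.entries.items()
                 if now - last > self.max_idle]
        for q in stale:
            del self.entries[q]
        return stale
```

Calling `prune` from a periodic job corresponds to the periodic traversal; passing `now` explicitly makes the behavior easy to test.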
Preferably, the preset values for the access frequency are set as follows:
where c is the number of attribute columns in the query library, and a_t is an adjustment parameter.
If the preset value for the access frequency is set too low, every attribute column that is accessed only occasionally is taken into account in the data block division, and the management overhead becomes too large. If it is set too high, the data blocks become too coarse and contain too much redundant data, so that subsequent data retrieval consumes too many computing resources. Based on a large number of experiments, the preset value of the preferred embodiment is designed to keep the management overhead low while ensuring a reasonable granularity of data block division, thereby providing an efficient smart city data sharing system.
Preferably, if attribute column i does not appear in the t-th query, then a_t = 0.
In the preferred embodiment, the adjustment also takes into account how often an attribute column appears in the queries: an attribute column that appears more frequently receives a larger adjustment parameter (that is, a higher weight). This produces a Matthew effect, further improving the efficiency of smart city data sharing.
Preferably, the dividing module sets the granularity of dividing the data blocks according to the uniformity of the attribute columns.
According to the preferred embodiment, the smart city big data can be divided at different granularities according to the data distribution characteristics of the different attribute columns, which yields a more efficient division.
Preferably, the larger the uniformity, the larger the granularity of division is set.
Preferably, the granularity of division is set by the following formula:
where C is the sum of the uniformities of all c attribute columns, and C_i is the uniformity of the i-th attribute column.
Preferably, C_i is set using the following formula:
For attribute columns that are distributed relatively uniformly, the division granularity can be set larger, so that the number of data blocks is reduced appropriately; this preserves the parallelism of the operation while reducing the overhead of data management.
For attribute columns whose distribution is more skewed, the division granularity can be set smaller, so that the number of data blocks is increased appropriately in regions where the attribute values are dense; this improves the selectivity of sampling operations on the data blocks and prunes as much irrelevant data as possible.
The above described preferred embodiment of the invention achieves the above described object of granularity division.
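The relationship between uniformity and granularity can be illustrated with a short Python sketch. It does not reproduce the formulas of the preferred embodiment. As an explicitly assumed stand-in, uniformity is measured here by the normalized Shannon entropy of a column's value distribution, and the number of blocks is interpolated linearly between two hypothetical bounds, `min_blocks` and `max_blocks`, so that a more uniform column receives fewer, larger blocks.

```python
import math
from collections import Counter

def uniformity(values):
    """Normalized Shannon entropy of the value distribution, in [0, 1].

    1.0 means every distinct value is equally frequent; values near 0 mean
    the column is heavily skewed. This measure is an assumption of the
    sketch, not the uniformity formula of the embodiment.
    """
    counts = Counter(values)
    if len(counts) < 2:
        return 1.0  # a single distinct value is treated as perfectly uniform
    total = len(values)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(counts))

def block_count(values, min_blocks=2, max_blocks=16):
    """Fewer (larger) blocks for uniform columns, more (smaller) for skewed ones."""
    u = uniformity(values)
    return round(max_blocks - u * (max_blocks - min_blocks))
```

With these assumed bounds, a column whose values are evenly spread gets close to `min_blocks` blocks, while a column dominated by one value gets close to `max_blocks`.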
Preferably, the smart city data sharing system further includes:
a correlation module, used for analyzing the correlation of the queries corresponding to the data blocks;
a placement module, used for placing the data blocks onto respective computing nodes according to the correlation.
Even if the data blocks are reasonably divided, unreasonably storing the data blocks, for example, storing a large number of high-frequency accessed data blocks in the same computing node and storing a large number of low-frequency accessed data blocks in another computing node, may result in unbalanced loads on the computing nodes.
The preferred embodiment analyzes the correlation of the queries corresponding to the data blocks, and thereby determines which data blocks are frequently accessed together by historical queries. The data blocks can then be stored so that the load on the computing nodes is balanced as far as possible.
Preferably, the correlation module operates as follows:
R_ij refers to the correlation between data block B_i and data block B_j; t is the sequence number of a query in the query library; n is the number of queries in the query library; p_t takes 1 when the attribute column of data block B_i and the attribute column of data block B_j both appear in the t-th query, and 0 otherwise; and R_ij = 0 when i = j.
The preferred embodiment judges whether two data blocks are strongly correlated by whether their attribute columns appear in the same query. Data blocks with high correlation can then be scattered across different computing nodes as far as possible, which greatly improves the parallelism of query computation. Because the correlation between data blocks is computed by a very efficient algorithm, the efficiency of the smart city sharing service is significantly improved.
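A co-occurrence computation of this kind can be sketched in Python. The exact expression for R_ij is an assumption of this sketch: it takes R_ij to be the sum over all queries t of f_t multiplied by p_t, with f_t the number of times the t-th query was issued, which is consistent with the symbol definitions but not confirmed by them. The function name and the data layout (one attribute column per block, a list of (columns, f_t) pairs for the query library) are hypothetical.

```python
def correlation_matrix(block_columns, query_library):
    """Compute R[i][j] for all pairs of data blocks.

    block_columns: list where block_columns[i] is the attribute column of B_i.
    query_library: list of (columns, f_t) pairs, one per historical query.
    Assumed formula: R_ij = sum over t of f_t * p_t, and R_ii = 0.
    """
    m = len(block_columns)
    R = [[0] * m for _ in range(m)]
    for columns, f_t in query_library:
        for i in range(m):
            for j in range(m):
                # p_t = 1 only when both blocks' columns occur in query t
                if (i != j and block_columns[i] in columns
                        and block_columns[j] in columns):
                    R[i][j] += f_t
    return R
```

The resulting matrix is symmetric with a zero diagonal, so only the upper triangle needs to be stored in practice.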
Preferably, the placement module sets the sum of the correlations of the data blocks placed in each computing node to be greater than a first preset value MIN and less than a second preset value MAX.
Preferably, the following is set:
;
where S is the sum of the correlations of all m data blocks.
Preferably, the following is set:
.
Through a large number of simulation experiments, the inventor arrived at this adjustment value. Using it to set the range of the correlation allows a reasonable distribution interval for the correlation to be computed efficiently, which reduces the management overhead while largely preserving query parallelism.
Preferably, the correlation R_i of data block B_i is calculated using the following formula:
Preferably, S is calculated using the following formula:
The above preferred embodiment of the present invention provides an efficient algorithm for calculating the correlation between data blocks.
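One way to place data blocks so that each computing node's correlation sum falls between MIN and MAX is sketched below. Several things are assumed for illustration: the per-block correlation is taken as the row sum of the pairwise matrix, S as the sum of those row sums, and the placement itself uses a simple greedy heuristic (largest block first, onto the currently lightest node), which is a swapped-in technique and not the placement procedure of the embodiment. MIN and MAX are passed in as plain parameters (`min_sum`, `max_sum`), and all names are hypothetical.

```python
def place_blocks(R, num_nodes, min_sum, max_sum):
    """Greedily assign blocks to nodes and check the (MIN, MAX) bounds.

    R: pairwise correlation matrix, R[i][j] for blocks B_i and B_j.
    Returns (nodes, loads, within), where nodes[k] lists the block indices
    on node k, loads[k] is that node's correlation sum, and `within` tells
    whether every load is strictly between min_sum and max_sum.
    """
    block_corr = [sum(row) for row in R]  # assumed per-block correlation
    nodes = [[] for _ in range(num_nodes)]
    loads = [0] * num_nodes
    order = sorted(range(len(R)), key=lambda b: block_corr[b], reverse=True)
    for i in order:
        k = min(range(num_nodes), key=lambda n: loads[n])  # lightest node
        nodes[k].append(i)
        loads[k] += block_corr[i]
    within = all(min_sum < load < max_sum for load in loads)
    return nodes, loads, within
```

If `within` comes back false, a caller could retry with a different number of nodes or relax the bounds; the sketch only reports the outcome.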
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.