CN110555037B

CN110555037B - Smart city data sharing system

Info

Publication number: CN110555037B
Application number: CN201910862896.7A
Authority: CN
Inventors: 张振
Original assignee: Suzhou New Hope Technology Co Ltd
Current assignee: Jiangsu new hope Technology Co.,Ltd.
Priority date: 2019-09-12
Filing date: 2019-09-12
Publication date: 2020-10-23
Anticipated expiration: 2039-09-12
Also published as: CN110555037A

Abstract

The application relates to a smart city data sharing system, characterized by comprising: a query library module, used for building a query library from historically accumulated queries on the big data of the smart city; an analysis module, used for performing statistical analysis on the query library to determine the attribute columns of the smart city big data whose access frequency exceeds a preset value; a dividing module, used for dividing the smart city big data into data blocks by those attribute columns; and a sharing module, used for receiving a query on the smart city big data, matching the attribute columns in the query to the data blocks, and retrieving the data that satisfies the query from the data blocks to answer the current query.

Description

Smart city data sharing system

Technical Field

The application relates to the technical field of the next generation information network industry, in particular to a smart city data sharing system.

Background

A smart city aims to maximize and optimize city functions and promote economic growth while improving the quality of life of urban residents by means of intelligent technology and data analysis. Currently, the technologies most closely related to smart cities include unmanned driving, machine learning, and the Internet of Things (IoT). These technologies exploit big data from different aspects of the city to make the life of urban residents more convenient. Among them, the Internet of Things has already been deployed in a number of countries and regions. For example, 3200 detectors have been installed in the city of San Diego in the United States to collect traffic information; if any traffic problem arises, the city government can notify citizens promptly and implement a timely solution.

For example, monitoring and recording relevant air pollutants is one task of a smart city. However, questions such as what pollution risk index applies to pedestrians, cyclists, drivers, and local residents, how to measure the influence of pollution on road traffic, traffic lights, buses, and subway stations, and how, and by how much, promoting electric buses can reduce environmental pollution for the urban population in the long run can only be answered through accurate, long-term, multi-site monitoring, the collection of large amounts of data, and various analyses.

With the rapid development of smart cities, a large amount of data is generated. This data is large in scale, grows rapidly, and varies in structure, so when conventional methods are used to provide data sharing services for smart cities, efficiency is too low and queries are too slow.

Disclosure of Invention

In order to overcome the problems in the related art, the application provides a smart city data sharing system.

According to an embodiment of the present application, a smart city data sharing system is provided, which includes:

a query library module, used for building a query library from historically accumulated queries on the big data of the smart city;

an analysis module, used for performing statistical analysis on the query library to determine the attribute columns of the smart city big data whose access frequency exceeds a preset value;

a dividing module, used for dividing the smart city big data into data blocks by those attribute columns;

and a sharing module, used for receiving a query on the smart city big data, matching the attribute columns in the query to the data blocks, and retrieving the data that satisfies the query from the data blocks to answer the current query.

Preferably, the dividing module sets the granularity of dividing the data blocks according to the uniformity of the attribute columns.

Preferably, the larger the uniformity, the smaller the granularity of division is set.

Preferably, the smart city data sharing system further includes:

a correlation module, used for analyzing the correlation of the queries corresponding to the data blocks;

a placement module, used for placing the data blocks onto respective computing nodes according to the correlations.

Preferably, the correlation module operates as follows:

[formula for R_ij not reproduced in the source text]

where R_ij refers to the correlation between data block B_i and data block B_j; t is the sequence number of a query in the query library; n is the number of queries in the query library; f_t is the query count of the t-th query; p_t takes 1 when an attribute column of data block B_i and an attribute column of data block B_j appear in the t-th query at the same time, and takes 0 otherwise; and when i = j, R_ij = 0.

Preferably, the placement module sets the sum of the correlations of the data blocks placed in each computing node to be greater than a first preset value MIN and less than a second preset value MAX.

Preferably, the following is set:

[formula not reproduced in the source text]

where S is the sum of the correlations of all m data blocks.

Preferably, the following is also set:

[formula not reproduced in the source text]

Preferably, the correlation of data block B_i is calculated using the following formula:

[formula not reproduced in the source text]

Preferably, S is calculated using the following formula:

[formula not reproduced in the source text]

The technical solution provided by the embodiments of the application can have the following beneficial effects: because the data blocks are divided by analyzing historical query data, the division of the data blocks is closely tied to the data content, query efficiency is markedly improved, and data sharing services can be provided efficiently even for smart city data that is large in scale, grows rapidly, and varies in structure.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a block diagram illustrating a smart city data sharing system according to an exemplary embodiment.

Fig. 2 is a block diagram illustrating a smart city data sharing system according to another exemplary embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

The following disclosure provides many different embodiments, or examples, for implementing different features of the application. In order to simplify the disclosure of the present application, specific example components and arrangements are described below. Of course, they are merely examples and are not intended to limit the present application. Further, the present application may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. Further, examples of various specific processes and materials are provided herein, but one of ordinary skill in the art may recognize the applicability of other processes and/or the use of other materials. In addition, the structure of a first feature described below as "on" a second feature may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features are formed between the first and second features, such that the first and second features may not be in direct contact.

In the description of the present application, it should be noted that, unless otherwise specified and limited, the terms "mounted," "coupled," and "connected" are to be interpreted broadly. A connection may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and the specific meanings of the terms may be understood by those skilled in the art according to the specific situation.

Referring to Fig. 1, the system includes:

a query library module 10, used for building a query library from historically accumulated queries on the big data of the smart city;

an analysis module 20, configured to perform statistical analysis on the query library to determine the attribute columns of the smart city big data whose access frequency exceeds a preset value;

a dividing module 30, used for dividing the smart city big data into data blocks according to those attribute columns;

and a sharing module 40, used for receiving a query on the smart city big data, matching the attribute columns in the query to the data blocks, and retrieving the data that satisfies the query from the data blocks to answer the current query.

The embodiment of the invention divides the data blocks by analyzing historical query data, so that the division of the data blocks is closely tied to the data content. That is, the most frequently accessed attribute column is obtained as the current division column by performing statistical analysis on the query load. This markedly improves query efficiency, so that data sharing services can be provided efficiently even for smart city data that is large in scale, grows rapidly, and varies in structure.
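The interaction of the analysis, dividing, and sharing modules can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the representation of a query as a set of attribute columns with a count, the equality-only query form, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def frequent_columns(query_library, preset_value):
    """Return attribute columns whose accumulated access count exceeds preset_value.

    query_library is a list of (columns, count) pairs: the set of attribute
    columns a historical query touched and how many times it was issued.
    """
    access = Counter()
    for columns, count in query_library:
        for col in columns:
            access[col] += count
    # most_common() orders columns from most to least frequently accessed
    return [col for col, n in access.most_common() if n > preset_value]

def divide(rows, column):
    """Group rows (dicts) into data blocks keyed by the value of `column`."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[column]].append(row)
    return dict(blocks)

def share(blocks_by_column, query):
    """Answer an equality query {column: value} from the first matching block."""
    for column, value in query.items():
        if column in blocks_by_column:
            candidates = blocks_by_column[column].get(value, [])
            return [r for r in candidates
                    if all(r.get(k) == v for k, v in query.items())]
    raise LookupError("no divided attribute column matches this query")

# Hypothetical historical queries and sensor records
query_library = [({"district", "hour"}, 5), ({"district"}, 3), ({"sensor_id"}, 1)]
rows = [
    {"district": "A", "hour": 8, "sensor_id": 1, "pm25": 40},
    {"district": "A", "hour": 9, "sensor_id": 2, "pm25": 55},
    {"district": "B", "hour": 8, "sensor_id": 3, "pm25": 30},
]
hot = frequent_columns(query_library, preset_value=4)
blocks_by_column = {col: divide(rows, col) for col in hot}
result = share(blocks_by_column, {"district": "A", "hour": 9})
```

In this sketch only the block for district "A" is scanned to answer the query, rather than the full data set, which is the efficiency gain the embodiment describes.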

Preferably, as shown in fig. 2, the smart city data sharing system further includes an updating module 50, configured to add the received query to the query library. This allows the partitioning of the data block to be continually updated, so that the partitioning of the data block can be consistent with the updating of the query.

Preferably, the query library also records, for each query, the number of times it has been issued; that is, the query count is incremented by one each time the query is repeated. This can be used to further optimize the storage of the data blocks later.

Preferably, f_t in the query library can also be traversed periodically, where f_t is the query count of the t-th query, t = 1, ..., n is the sequence number of the query in the query library, and n is the number of queries in the query library. Any query whose count has not increased for longer than a predetermined time is deleted from the query library. In this way, queries that have gradually aged and fallen out of use are removed, which further optimizes the structure of the query library and improves the efficiency of providing the sharing service for the smart city.
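The counting and aging behaviour of the query library can be sketched as follows. The class name, method names, and the use of explicit timestamps are assumptions for illustration; the source only states that counts are incremented on repetition and that queries whose count has not increased within a predetermined time are deleted.

```python
import time

class QueryLibrary:
    """Hypothetical query library that tracks f_t and the time it last increased."""

    def __init__(self):
        self._entries = {}  # query text -> [count f_t, time of last increase]

    def add(self, query, now=None):
        """Record one occurrence of a query (the role of the updating module)."""
        now = time.time() if now is None else now
        entry = self._entries.setdefault(query, [0, now])
        entry[0] += 1
        entry[1] = now

    def prune(self, max_idle, now=None):
        """Delete queries whose count has not increased for more than max_idle seconds."""
        now = time.time() if now is None else now
        stale = [q for q, (_, last) in self._entries.items() if now - last > max_idle]
        for q in stale:
            del self._entries[q]
        return stale

    def counts(self):
        """Return a mapping from query text to its count f_t."""
        return {q: c for q, (c, _) in self._entries.items()}

lib = QueryLibrary()
lib.add("SELECT pm25 WHERE district = 'A'", now=0)
lib.add("SELECT pm25 WHERE district = 'A'", now=10)
lib.add("SELECT flow WHERE road = 'R1'", now=1)
removed = lib.prune(max_idle=5, now=12)
```

Passing `now` explicitly keeps the example deterministic; a periodic job would normally call `prune` with the current time.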

Preferably, the preset value for the access frequency is set as follows:

[formula not reproduced in the source text]

where c is the number of attribute columns in the query library and a_t is an adjustment parameter.

If the preset value of the access frequency is set too low, every attribute column that is accessed only occasionally is taken into account for data block division, and the management overhead becomes too large. If it is set too high, the data blocks become too large and contain too much redundant data, so that subsequent data retrieval consumes too many computing resources. Based on a large number of experiments, the invention designs the access frequency preset value of the preferred embodiment so that management overhead stays low while a reasonable data block granularity is ensured, thereby providing an efficient smart city data sharing system.

Preferably, the adjustment parameter a_t is set as follows:

[formula not reproduced in the source text]

If the attribute column i does not appear in the t-th query, then a_t = 0.

In the preferred embodiment, the preset value is further adjusted according to how often each attribute column occurs in the queries: an attribute column that occurs more frequently receives a larger adjustment parameter (that is, a higher weight). This produces a stronger Matthew effect and further improves the efficiency of smart city data sharing.

According to the preferred embodiment, in which the dividing module sets the granularity of data block division according to the uniformity of the attribute columns, the smart city big data can be divided at different granularities according to the data distribution characteristics of the different attribute columns, yielding a more efficient division result.

Preferably, the larger the uniformity, the larger the granularity of division is set.

Preferably, the granularity of division is set by the following formula:

[formula not reproduced in the source text]

where C is the sum of the uniformity of all c attribute columns, and C_i is the uniformity of the i-th attribute column.

Preferably, C_i is set using the following formula:

[formula not reproduced in the source text]

When an attribute column is distributed uniformly, the division granularity can be set larger, so that the number of data blocks is reduced appropriately; this keeps the operations parallel while reducing the overhead of data management.

When an attribute column is distributed more unevenly, the division granularity can be set smaller, so that the number of data blocks is increased appropriately in regions where the attribute values are dense; this improves the selectivity of sampling operations on the data blocks and prunes as much irrelevant data as possible.
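The relationship between distribution and granularity described in the two paragraphs above can be sketched as follows. The source formulas for C_i and the granularity are not available, so this sketch substitutes its own assumptions: uniformity is measured as normalized Shannon entropy, and the granularity of column i is taken proportional to C_i / C, scaled by a hypothetical `base_rows_per_block` parameter.

```python
import math
from collections import Counter

def uniformity(values):
    """Normalized Shannon entropy in [0, 1]; 1 means a perfectly even distribution.

    This measure is an assumption of the sketch, not the formula of the source.
    """
    counts = Counter(values)
    if len(counts) < 2:
        return 1.0  # a single distinct value is treated as trivially even
    total = len(values)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))

def granularities(columns, base_rows_per_block):
    """Assign each column a block size proportional to its share C_i / C."""
    scores = {name: uniformity(vals) for name, vals in columns.items()}
    total = sum(scores.values())  # plays the role of C, the sum of all C_i
    return {name: max(1, round(base_rows_per_block * s / total))
            for name, s in scores.items()}

columns = {
    "even": ["a", "b", "c", "d"] * 25,          # uniformly distributed values
    "skewed": ["a"] * 97 + ["b", "c", "d"],     # heavily concentrated values
}
g = granularities(columns, base_rows_per_block=1000)
```

Under these assumptions the evenly distributed column receives the larger block size (fewer blocks), while the skewed column receives the smaller one (more blocks).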

The above described preferred embodiment of the invention achieves the above described object of granularity division.

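Preferably, the smart city data sharing system further includes:

a correlation module, used for analyzing the correlation of the queries corresponding to the data blocks;

a placement module, used for placing the data blocks onto respective computing nodes according to the correlations.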

Even if the data blocks are reasonably divided, unreasonably storing the data blocks, for example, storing a large number of high-frequency accessed data blocks in the same computing node and storing a large number of low-frequency accessed data blocks in another computing node, may result in unbalanced loads on the computing nodes.

The preferred embodiment analyzes the correlation of the queries corresponding to the data blocks, which amounts to a frequency analysis of the data blocks that historical queries call most often. The data blocks can then be stored sensibly so that the load on each computing node is balanced as far as possible.

Preferably, the correlation module operates as follows:

[formula for R_ij not reproduced in the source text]

where R_ij refers to the correlation between data block B_i and data block B_j; t is the sequence number of a query in the query library; n is the number of queries in the query library; p_t takes 1 when an attribute column of data block B_i and an attribute column of data block B_j appear in the t-th query at the same time, and takes 0 otherwise; and when i = j, R_ij = 0.

The preferred embodiment judges whether two data blocks are strongly correlated by checking whether their attribute columns appear in the same query. Highly correlated data blocks can then be scattered across different computing nodes as far as possible, which greatly improves the parallelism of query computation. Because the correlation between data blocks is computed by a very efficient algorithm, the efficiency of the smart city sharing service is markedly improved.
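A co-occurrence computation of this kind can be sketched as follows. Because the source formula is not available, the sketch assumes R_ij is the sum over all queries of f_t * p_t, which is one reading consistent with the symbol definitions given; the function name and data layout are likewise hypothetical.

```python
def correlation_matrix(block_columns, query_library):
    """Compute R[i][j] under the assumption R_ij = sum over t of f_t * p_t.

    block_columns[i] is the set of attribute columns of data block B_i.
    query_library is a list of (columns, f_t) pairs for each historical query.
    """
    m = len(block_columns)
    R = [[0] * m for _ in range(m)]
    for columns, f_t in query_library:
        # hit[i] is True when an attribute column of B_i appears in this query
        hit = [bool(cols & columns) for cols in block_columns]
        for i in range(m):
            for j in range(m):
                # p_t = 1 only when both blocks are touched; R_ii stays 0
                if i != j and hit[i] and hit[j]:
                    R[i][j] += f_t
    return R

block_columns = [{"district"}, {"hour"}, {"sensor_id"}]
query_library = [
    ({"district", "hour"}, 5),
    ({"district"}, 3),
    ({"hour", "sensor_id"}, 2),
]
R = correlation_matrix(block_columns, query_library)
```

The resulting matrix is symmetric with a zero diagonal, matching the condition R_ij = 0 when i = j.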

Preferably, the following is set:

[formula not reproduced in the source text]

where S is the sum of the correlations of all m data blocks.

Preferably, the following is also set:

[formula not reproduced in the source text]

The inventor creatively provides the adjusting value through a large number of simulation experiments, and the adjusting value is adopted to set the range of the correlation, so that the reasonable distribution interval of the correlation can be efficiently calculated, the management overhead is reduced, and the query parallelism can be basically guaranteed.

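Preferably, the correlation of data block B_i is calculated using the following formula:

[formula not reproduced in the source text]

Preferably, S is calculated using the following formula:

[formula not reproduced in the source text]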

The above preferred embodiment of the present invention provides an efficient algorithm for calculating the correlation between data blocks.
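One way to place data blocks so that each computing node's correlation sum stays within bounds is a greedy assignment, sketched below. This is an illustrative substitute, not the method of the source: it assumes the correlation of block B_i is the row sum of R, that S is the sum of those values, and that MIN and MAX are supplied as the hypothetical parameters `min_load` and `max_load`, since the source formulas are not available.

```python
def place(R, num_nodes, min_load, max_load):
    """Greedily assign blocks to nodes, heaviest block first, least-loaded node first.

    Returns (nodes, loads, ok): block indices per node, the correlation sum per
    node, and whether every sum lies strictly between min_load and max_load.
    """
    m = len(R)
    r = [sum(row) for row in R]  # assumed per-block correlation: row sum of R
    nodes = [[] for _ in range(num_nodes)]
    loads = [0] * num_nodes
    for i in sorted(range(m), key=lambda i: -r[i]):
        k = min(range(num_nodes), key=lambda k: loads[k])  # least-loaded node
        nodes[k].append(i)
        loads[k] += r[i]
    ok = all(min_load < load < max_load for load in loads)
    return nodes, loads, ok

# Hypothetical symmetric correlation matrix for three data blocks
R = [[0, 5, 0],
     [5, 0, 2],
     [0, 2, 0]]
S = sum(sum(row) for row in R)  # assumed total correlation of all blocks
nodes, loads, ok = place(R, num_nodes=2, min_load=5, max_load=9)
```

In this example the two most strongly correlated blocks (indices 0 and 1) end up on different nodes and both nodes carry the same correlation sum, which reflects the load-balancing goal stated for the placement module.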

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A smart city data sharing system, comprising:

obtaining the attribute column which is accessed most frequently as the current division column by carrying out statistical analysis on the query load;

the sharing module is used for receiving query of big data of the smart city, matching the attribute columns in the query to the data blocks, and acquiring data meeting the query from the data blocks to provide for the current query;

in the query library, recording the query times of each query;

periodically traversing f_t in the query library, where f_t is the query count of the t-th query and t = 1, ..., n, and deleting from the query library any query whose query count has not increased for longer than a preset time, t being the sequence number of the query in the query library and n being the number of queries in the query library;

the preset value for the access frequency is set as follows:

[formula not reproduced in the source text]

wherein c is the number of attribute columns in the query library, and a_t is an adjustment parameter;

if the attribute column i does not appear in the t-th query, then a_t = 0;

The dividing module sets the dividing granularity of the data blocks according to the uniformity of the attribute columns;

the larger the uniformity is, the smaller the set granularity is;

a placement module, used for placing the data blocks onto respective computing nodes according to the correlations;

the division granularity is set by the following formula:

[formula not reproduced in the source text]

where C is the sum of the uniformity of all c attribute columns, and C_i is the uniformity of the i-th attribute column;

C_i is set by the following formula:

[formula not reproduced in the source text]

when the attribute columns are distributed uniformly, the division granularity can be set larger, so that the number of data blocks is reduced appropriately and the overhead of data management is reduced while the parallelism of the operations is ensured;

the correlation module works as follows:

[formula for R_ij not reproduced in the source text]

R_ij refers to the correlation between data block B_i and data block B_j, t is the sequence number of the query in the query library, n is the number of queries in the query library, p_t takes 1 when an attribute column of data block B_i and an attribute column of data block B_j appear in the t-th query at the same time and takes 0 otherwise, and R_ij = 0 when i = j;

The placement module sets the sum of the correlations of the data blocks placed in each computing node to be greater than a first preset value MIN and smaller than a second preset value MAX;

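the following is set:

[formula not reproduced in the source text]

wherein S is the sum of the correlations of all m data blocks;

the following is also set:

[formula not reproduced in the source text]

the correlation of data block B_i is calculated using the following formula:

[formula not reproduced in the source text]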

S is calculated using the following formula: