CN111931000A - Large-scale vector field oriented data processing method - Google Patents


Info

Publication number
CN111931000A
Authority
CN
China
Prior art keywords
data
data block
vector field
block
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010807796.7A
Other languages
Chinese (zh)
Other versions
CN111931000B (en)
Inventor
答海玲
张柱
郑坤
冉秀桃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zhaotu Science & Technology Co ltd
Original Assignee
Wuhan Zhaotu Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-12
Filing date
2020-08-12
Publication date
2020-11-13
Application filed by Wuhan Zhaotu Science & Technology Co ltd
Priority to CN202010807796.7A
Publication of CN111931000A
Application granted
Publication of CN111931000B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types

Abstract

The invention provides a large-scale vector field data processing method. The method divides the large-scale vector field data into equal sub-regions and encodes the sub-regions in sequence according to their positions; reads the sub-region data and merges it into data blocks; reads the data blocks and assigns each block to a partition by hash mapping on its code, so that spatially adjacent data blocks are assigned to the same partition; and uses a stream data valve to check, in code order, the uniqueness and integrity of the data blocks. Because spatially adjacent data blocks already reside in the same partition, they do not have to be searched for again, which improves the efficiency of iterative computation; the stream data valve removes data blocks with duplicated or missing information, guaranteeing the uniqueness and integrity of the data stream.

Description

Large-scale vector field oriented data processing method
Technical Field
The invention relates to a large-scale vector field data processing method and belongs to the field of high-performance computing frameworks.
Background
In the information age, sensors and information technologies are developing rapidly, vector field data and the applications that depend on it are growing quickly, and real-time computation on large-scale vector field data faces ever higher performance requirements. Take wind field data in meteorology as an example: the data is collected by widely distributed wind speed and wind direction sensors, aggregated synchronously by the meteorological departments of several regions, analyzed and computed in a unified cloud computing environment, and finally used to analyze wind field structure, issue typhoon warnings, and so on. Faced with the huge data volume, however, meteorological departments still experience minute-level delays when using the data. Ocean current data in marine science, which can be used to analyze how ocean currents influence climate, faces the same data processing performance problem.
Vector field data has a wide range of applications. For example, natural disasters such as typhoons and tsunamis can be predicted by analyzing wind field and ocean current data, so that warnings can be issued in time, citizens can take precautions, lives are protected, and property losses are reduced. If the data is not analyzed and used promptly, its practical value drops accordingly, so research on high-performance computation for large-scale vector field data is of great practical importance.
Existing computation methods for large-scale vector field data divide the data into fine-scale data blocks, send these blocks randomly to different nodes, and compute on them there. Vector field data, however, is spatially correlated: computing on one piece of vector field data requires the vector field data spatially adjacent to it. Because adjacent blocks are scattered across nodes, they must be located and fetched, which greatly increases both the amount of computation and the communication overhead.
Disclosure of Invention
To solve these technical problems in the prior art, the invention provides a high-performance computing framework for large-scale vector field data that reduces the extra computation caused by the spatial correlation of vector field data and improves the uniqueness and integrity of the output data stream.
The technical solution that achieves this aim is a method for processing large-scale vector field data comprising at least the following steps:
(1) divide the large-scale vector field data into equal parts, divide each resulting part into equal parts again, and repeat until the number of divisions reaches the dimension value of the vector field data; the final division yields the sub-regions, which are then encoded in sequence according to their positions;
(2) set a maximum merging number, read all the sub-region data, and merge the sub-region data, in ascending order of code, into data blocks each containing at most the maximum merging number of sub-regions;
(3) encode the data blocks successively in the order in which they are formed, read the data blocks, and assign each block to a partition by hash mapping on its code, so that spatially adjacent data blocks are assigned to the same partition;
the hash mapping formula is
[formula shown as an image in the original publication]
where A is the partition number assigned to the data block, C is the code number of the data block, M is the total number of partitions, and ⌊·⌋ denotes rounding down;
(4) check the uniqueness and integrity of the data blocks with a stream data valve, specifically:
1) set up separate buffers according to the code values of the data blocks;
2) assign the data blocks, in order of code value, to the corresponding buffers and determine whether a data block to be assigned is the same as a data block already in the buffer; if so, replace that block in the buffer with the new one; if not, add the new block to the buffer;
3) determine whether the data in each data block in the buffer is complete; if so, output the data block; if not, do not output it;
(5) perform iterative computation on the data blocks, each computation using the result data of the previous computation, and merge the computed data blocks into a data stream for output.
The technical solution is further refined as follows: the equal division is a cross-shaped division, which splits each region into four equal quadrants.
The codes of the sub-region data in step (1) are assigned as 00, 01, 10 and 11 from left to right and from top to bottom; when a sub-region is divided further, the codes of the resulting sub-regions keep the original code and append a suffix following the same rule.
In sub-step 1) of step (4), each buffer stores only data blocks that share the same code prefix.
In sub-step 2) of step (4), a data block consists of a key name, a data type and a key value.
In sub-step 2) of step (4), whether a data block to be assigned is the same as a data block in the buffer is determined by checking whether the two blocks have the same key value.
In sub-step 3) of step (4), whether the data in a data block in the buffer is complete is determined by checking, one by one, whether every minimum unit area in the block's key value contains a data value; if so, the data in the block is complete; otherwise, it is incomplete.
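The data block structure and the two checks described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the class name `DataBlock`, its field names, and the representation of the key value as a mapping from minimum-unit code to a 2-D vector (or `None` when the value is missing) are all hypothetical, and "the same" is read literally as "equal key values".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumption: each minimum unit area holds one 2-D vector (u, v).
Vector = Tuple[float, float]


@dataclass
class DataBlock:
    key_name: str                           # hypothetical: e.g. the block's code
    data_type: str                          # hypothetical: e.g. "wind" or "current"
    key_value: Dict[str, Optional[Vector]]  # minimum-unit code -> vector, or None if missing

    def same_as(self, other: "DataBlock") -> bool:
        # Literal reading of the text: blocks are "the same" when their key values are equal.
        return self.key_value == other.key_value

    def is_complete(self) -> bool:
        # Check every minimum unit area in turn; complete only if none lacks a value.
        return all(value is not None for value in self.key_value.values())


full = DataBlock("00", "wind", {"0000": (1.0, 2.0), "0001": (0.5, -0.5)})
gappy = DataBlock("00", "wind", {"0000": (1.0, 2.0), "0001": None})
print(full.is_complete(), gappy.is_complete(), full.same_as(gappy))  # True False False
```

Because equality is defined on the key value alone, two blocks carrying identical data are treated as duplicates regardless of their key names; a different reading of "the same" (for example, matching key names) would change only `same_as`.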
With this technical solution, the large-scale vector field data processing method provided by the invention divides the large-scale vector field data into equal parts, divides each resulting part again, and repeats until the number of divisions reaches the dimension value of the vector field data, producing sub-regions that are encoded in sequence according to their positions. This reduces the size of the units in which the large-scale vector field data is handled and lowers the complexity of data communication;
at the same time, all the sub-region data within a designated range is read and merged into a data block, so the entire large-scale vector field does not need to be searched, which improves data transmission efficiency.
The method reads the data blocks and assigns them to partitions by hash mapping on their codes. This assignment places spatially adjacent data blocks in the same partition, so a block's spatial neighbours do not have to be searched for again, which improves the efficiency of iterative computation;
the stream data valve assigns the data blocks, in order of code value, to the corresponding buffers and determines whether a data block to be assigned is the same as a block already in the buffer; if so, the new block replaces that block; if not, it is added to the buffer. It then determines whether the data in each buffered block is complete; if so, the block is output to the data stream; if not, it is held back. By removing data blocks with duplicated or missing information and outputting only unique, complete ones, the stream data valve guarantees the uniqueness and integrity of the data stream.
Drawings
FIG. 1 is a schematic diagram of data partitioning and encoding according to the present invention;
FIG. 2 is a schematic diagram of node allocation according to the present invention;
FIG. 3 is a block output flow diagram according to the present invention;
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and an embodiment; the invention is not limited to this embodiment.
Referring to FIG. 1, the present invention provides a large-scale vector field data processing method comprising the following steps:
(1) divide the large-scale vector field data into equal parts, divide each resulting part into equal parts again, and repeat until the number of divisions reaches the dimension value of the vector field data; the final division yields the sub-regions, which are encoded in sequence according to their positions. The specific division rule in this embodiment is: divide the large-scale vector field data into equal parts with a cross-shaped cut, divide each resulting part further in the same way, and repeat until the number of divisions reaches the data dimension.
The specific encoding rule in this embodiment is: the sub-regions are labeled 00, 01, 10 and 11 from left to right and from top to bottom, and the sub-regions produced by each further division keep the original label and append a suffix following the same rule. For example, the codes after the first division are 00, 01, 10 and 11, and the codes after the second division are 0000, 0001, 0010, 0011, 0100, 0101, ….
This reduces the size of the units in which the large-scale vector field data is handled and lowers the complexity of data communication.
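The cross-shaped division with suffix-based codes amounts to a quadtree labeling in Z-order (Morton) order, with the row digit placed before the column digit at each level. The sketch below is illustrative only: the function names `encode` and `decode`, the treatment of the field as a grid of 2**depth by 2**depth cells, and the downward-growing y axis are assumptions. The text ties the number of divisions to the dimension value of the data, whereas the sketch simply takes `depth` as a parameter.

```python
from typing import Tuple


def encode(row: int, col: int, depth: int) -> str:
    """Code of the sub-region in cell (row, col) after `depth` cross-shaped divisions.

    Each division contributes two digits: the row half (0 = top, 1 = bottom)
    followed by the column half (0 = left, 1 = right). At depth 1 this yields
    00, 01, 10, 11 from left to right and top to bottom; every further division
    appends a two-digit suffix to the parent's code.
    """
    digits = []
    for level in range(depth - 1, -1, -1):  # coarsest division first
        digits.append(str((row >> level) & 1))
        digits.append(str((col >> level) & 1))
    return "".join(digits)


def decode(code: str, width: float, height: float) -> Tuple[float, float, float, float]:
    """Bounding box (x0, y0, x1, y1) of the sub-region named by `code`; y grows downward."""
    x0, y0, w, h = 0.0, 0.0, width, height
    for i in range(0, len(code), 2):
        w, h = w / 2, h / 2
        if code[i] == "1":      # bottom half of the current region
            y0 += h
        if code[i + 1] == "1":  # right half of the current region
            x0 += w
    return (x0, y0, x0 + w, y0 + h)


codes = sorted(encode(r, c, 2) for r in range(4) for c in range(4))
print(codes[:6])               # ['0000', '0001', '0010', '0011', '0100', '0101']
print(decode("01", 8.0, 8.0))  # (4.0, 0.0, 8.0, 4.0)
```

Sorting the depth-2 codes reproduces the sequence 0000, 0001, 0010, 0011, 0100, 0101 given in the embodiment, and the first two digits of each code are the code of its parent sub-region.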
(2) set the maximum merging number, which is smaller than the number of sub-regions, read all the sub-region data, and merge the sub-region data, in ascending order of code, into data blocks of at most the maximum merging number of sub-regions. Reading starts from a designated position: a contiguous run of sub-region data is read from storage according to the codes and merged into one data block, which is then sent out. The entire large-scale vector field therefore does not need to be searched, which increases data transmission efficiency and ensures that the data enters the data stream in the most efficient way.
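Step (2) can be sketched as below, assuming that "from small to large" means ascending code order and that each data block holds at most the maximum merging number of sub-regions. The function name `merge_into_blocks` is hypothetical, and a block's position in the returned list stands in for the sequential block code used in step (3).

```python
from typing import List


def merge_into_blocks(sub_region_codes: List[str], max_merge: int) -> List[List[str]]:
    """Merge sub-regions, in ascending code order, into blocks of at most `max_merge`.

    The index of each block in the returned list serves as its sequential block code.
    """
    if max_merge < 1:
        raise ValueError("max_merge must be at least 1")
    ordered = sorted(sub_region_codes)  # "from small to large" read as ascending code order
    return [ordered[i:i + max_merge] for i in range(0, len(ordered), max_merge)]


quadrants = ("00", "01", "10", "11")
codes = [a + b for a in quadrants for b in quadrants]  # the 16 codes after two divisions
blocks = merge_into_blocks(list(reversed(codes)), max_merge=4)
print(len(blocks), blocks[0])  # 4 ['0000', '0001', '0010', '0011']
```

With these codes and a maximum merging number of 4, each block happens to contain exactly the four children of one parent sub-region (for example 0000 to 0011 under 00), so every block is also spatially compact.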
(3) referring to FIG. 2, encode the data blocks successively in the order in which they are formed, read the data blocks, and assign each block to a partition by hash mapping on its code, so that spatially adjacent data blocks are assigned to the same partition;
the hash mapping formula is
[formula shown as an image in the original publication]
where A is the partition number assigned to the data block, C is the code number of the data block, M is the total number of partitions, and ⌊·⌋ denotes rounding down;
for example, with 10 data blocks coded 0 to 9 and a computing cluster with three partitions numbered 0 to 2, the mapping above assigns data blocks (0, 1, 2, 9) to partition 0, data blocks (3, 4, 5) to partition 1, and data blocks (6, 7, 8) to partition 2.
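The hash mapping formula itself appears only as an image in the source text. One mapping that uses only the symbols defined above (A, C, M and rounding down) and reproduces this worked example is A = ⌊C/M⌋ mod M. That form is an inference from the example, not the published formula, and the function names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def partition_of(code: int, m: int) -> int:
    """ASSUMED hash mapping A = floor(C / M) mod M (inferred from the worked example)."""
    return (code // m) % m


def assign(num_blocks: int, m: int) -> Dict[int, List[int]]:
    """Assign block codes 0 .. num_blocks - 1 to partitions 0 .. m - 1."""
    partitions: Dict[int, List[int]] = defaultdict(list)
    for code in range(num_blocks):
        partitions[partition_of(code, m)].append(code)
    return dict(partitions)


print(assign(10, 3))  # {0: [0, 1, 2, 9], 1: [3, 4, 5], 2: [6, 7, 8]}
```

Under this assumed form, runs of M consecutive codes land in the same partition, and the runs wrap around once all M partitions have been used, which is why block 9 returns to partition 0.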
Because this assignment respects the spatial correlation of vector field data, spatially adjacent data blocks are placed in the same partition and do not have to be searched for again, which improves the efficiency of iterative computation;
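The benefit claimed here can be illustrated by counting how many neighbouring block pairs end up in different partitions, since each such pair is a lookup that would cross nodes. This sketch rests on several assumptions: blocks with consecutive codes are treated as spatial neighbours (in a Z-order layout this holds for many, but not all, consecutive pairs); the contiguous mapping uses A = ⌊C/M⌋ mod M, a form consistent with the worked example above but not confirmed by the text; and round-robin assignment C mod M serves as a deterministic stand-in for the random distribution of the prior art. All function names are hypothetical.

```python
from typing import Callable


def contiguous(code: int, m: int) -> int:
    # Assumed form of the patent's mapping: A = floor(C / M) mod M.
    return (code // m) % m


def scattered(code: int, m: int) -> int:
    # Round-robin stand-in for the prior art's random distribution of blocks.
    return code % m


def cross_partition_pairs(num_blocks: int, m: int, mapping: Callable[[int, int], int]) -> int:
    """Count neighbouring pairs (c, c + 1) placed in different partitions.

    Assumption: consecutive codes are treated as spatial neighbours.
    """
    return sum(1 for c in range(num_blocks - 1) if mapping(c, m) != mapping(c + 1, m))


print(cross_partition_pairs(10, 3, contiguous), cross_partition_pairs(10, 3, scattered))  # 3 9
```

For 10 blocks and 3 partitions, 3 of the 9 neighbouring pairs straddle partitions under the contiguous mapping, against all 9 under round-robin.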
(4) check the uniqueness and integrity of the data blocks with a stream data valve, specifically:
1) set up separate buffers according to the code values of the data blocks, each buffer storing only data blocks that share the same code prefix;
2) referring to FIG. 3, assign the data blocks, in order of code value, to the corresponding buffers and determine whether a data block to be assigned is the same as a data block already in the buffer; if so, replace that block with the new one; if not, add the new block to the buffer. In this embodiment, a data block consists of a key name, a data type and a key value, and two blocks are judged to be the same by checking whether their key values are the same;
3) determine whether the data in each data block in the buffer is complete; if so, output the data block; if not, do not output it. In this embodiment, completeness is determined by checking, one by one, whether every minimum unit area in the block's key value contains a data value; if so, the data in the block is complete; otherwise, it is incomplete.
Sub-step 2) checks the uniqueness of the data blocks, and sub-step 3) checks their integrity.
By removing data blocks with duplicated or missing information and outputting only unique, complete ones, the stream data valve guarantees the uniqueness and integrity of the data stream.
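The three sub-steps of the stream data valve can be sketched as a small buffering class. Everything named here is hypothetical: `StreamDataValve`, `offer`, `output`, the `prefix_len` parameter that decides how many leading code characters select a buffer, and the representation of a block as a (code, key value) pair with scalar values. The sketch processes a finite batch; a long-running stream would also need a rule for evicting blocks after output, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical block representation: (block code, key value), where the key value
# maps each minimum unit area to its data value, or None if that value is missing.
Block = Tuple[str, Dict[str, Optional[float]]]


class StreamDataValve:
    def __init__(self, prefix_len: int = 2) -> None:
        self.prefix_len = prefix_len  # assumption: leading code characters that pick a buffer
        self.buffers: Dict[str, List[Block]] = defaultdict(list)

    def offer(self, block: Block) -> None:
        code, key_value = block
        # Sub-step 1): each buffer holds only blocks sharing the same code prefix.
        buffer = self.buffers[code[: self.prefix_len]]
        # Sub-step 2): a block with the same key value replaces the buffered one...
        for i, (_, existing) in enumerate(buffer):
            if existing == key_value:
                buffer[i] = block
                return
        # ...otherwise the block is added to the buffer.
        buffer.append(block)

    def output(self) -> List[Block]:
        # Sub-step 3): release only complete blocks (every unit area has a value), in code order.
        ready = [
            block
            for buffer in self.buffers.values()
            for block in buffer
            if all(value is not None for value in block[1].values())
        ]
        return sorted(ready, key=lambda block: block[0])


valve = StreamDataValve(prefix_len=2)
valve.offer(("0000", {"u0": 1.0, "u1": 2.0}))
valve.offer(("0000", {"u0": 1.0, "u1": 2.0}))   # repeated block: replaces the first copy
valve.offer(("0100", {"u0": 3.0, "u1": None}))  # missing value: held back
valve.offer(("0001", {"u0": 5.0, "u1": 6.0}))
print([code for code, _ in valve.output()])      # ['0000', '0001']
```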
(5) perform iterative computation on the data blocks, each computation using the result data of the previous computation, and merge the computed data blocks into a data stream for output. Iterative computation keeps data visible across different data blocks, which makes the computation results more accurate.
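The text does not specify what the iterative computation is, so the sketch below uses neighbour averaging as a hypothetical stand-in, chosen because each block's update needs its spatial neighbours. Each pass reads only the results of the previous pass (Jacobi-style), as step (5) describes, and the final blocks are merged into a single output stream ordered by code. Treating codes c - 1 and c + 1 as the neighbours of block c, and storing one 2-D vector per block, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, float]


def iterate(field: Dict[int, Vector], iterations: int) -> List[Tuple[int, Vector]]:
    """Run `iterations` passes; each pass reads only the previous pass's results."""
    current = dict(field)
    for _ in range(iterations):
        previous = current
        current = {}
        for code in previous:
            # Hypothetical update: average the block with its neighbours, assumed
            # here to be the blocks coded code - 1 and code + 1.
            group = [previous[c] for c in (code - 1, code, code + 1) if c in previous]
            current[code] = (
                sum(v[0] for v in group) / len(group),
                sum(v[1] for v in group) / len(group),
            )
    # Merge the computed blocks into a single output stream ordered by code.
    return sorted(current.items())


print(iterate({0: (0.0, 0.0), 1: (3.0, 0.0), 2: (0.0, 0.0)}, iterations=1))
# [(0, (1.5, 0.0)), (1, (1.0, 0.0)), (2, (1.5, 0.0))]
```

Because `previous` is never modified during a pass, the order in which blocks are visited does not affect the result, which is what lets each partition process its blocks independently within one iteration.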

Claims (7)

1. A large-scale vector field data processing method, characterized by comprising at least the following steps:
(1) dividing the large-scale vector field data into equal parts, dividing each resulting part into equal parts again, and repeating until the number of divisions reaches the dimension value of the vector field data, the final division yielding sub-regions, which are encoded in sequence according to their positions;
(2) setting a maximum merging number, reading all the sub-region data, and merging the sub-region data, in ascending order of code, into data blocks each containing at most the maximum merging number of sub-regions;
(3) encoding the data blocks successively in the order in which they are formed, reading the data blocks, and assigning each data block to a partition by hash mapping on its code, so that spatially adjacent data blocks are assigned to the same partition;
the hash mapping formula is
[formula shown as an image in the original publication]
where A is the partition number assigned to the data block, C is the code number of the data block, M is the total number of partitions, and ⌊·⌋ denotes rounding down;
(4) checking the uniqueness and integrity of the data blocks with a stream data valve, specifically:
1) setting up separate buffers according to the code values of the data blocks;
2) assigning the data blocks, in order of code value, to the corresponding buffers and determining whether a data block to be assigned is the same as a data block already in the buffer; if so, replacing that block in the buffer with the new one; if not, adding the new block to the buffer;
3) determining whether the data in each data block in the buffer is complete; if so, outputting the data block; if not, not outputting it;
(5) performing iterative computation on the data blocks, each computation using the result data of the previous computation, and merging the computed data blocks into a data stream for output.
2. The large-scale vector field data processing method according to claim 1, wherein the equal division in step (1) is a cross-shaped equal division.
3. The large-scale vector field data processing method according to claim 1, wherein the codes of the sub-region data in step (1) are assigned as 00, 01, 10 and 11 from left to right and from top to bottom, and when a sub-region is divided further, the codes of the resulting sub-regions keep the original code and append a suffix following the same rule.
4. The large-scale vector field data processing method according to claim 1, wherein in sub-step 1) each buffer stores only data blocks that share the same code prefix.
5. The large-scale vector field data processing method according to claim 1, wherein in sub-step 2) a data block consists of a key name, a data type and a key value.
6. The large-scale vector field data processing method according to claim 1 or 5, wherein whether a data block to be assigned in sub-step 2) is the same as a data block in the buffer is determined by checking whether the key values of the two data blocks are the same.
7. The large-scale vector field data processing method according to claim 1 or 5, wherein whether the data in a data block in the buffer is complete in sub-step 3) is determined by checking, one by one, whether every minimum unit area in the key value of the data block contains a data value; if so, the data in the data block is complete; otherwise, it is incomplete.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010807796.7A CN111931000B (en) 2020-08-12 2020-08-12 Large-scale vector field data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010807796.7A CN111931000B (en) 2020-08-12 2020-08-12 Large-scale vector field data processing method

Publications (2)

Publication Number Publication Date
CN111931000A 2020-11-13
CN111931000B 2023-12-19

Family

ID=73312004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010807796.7A Active CN111931000B (en) 2020-08-12 2020-08-12 Large-scale vector field data processing method

Country Status (1)

Country Link
CN (1) CN111931000B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1655618A (en) * 2005-04-08 2005-08-17 北京中星微电子有限公司 Search method for video encoding based on motion vector prediction
CN103259729A (en) * 2012-12-10 2013-08-21 上海德拓信息技术有限公司 Network data compaction transmission method based on zero collision hash algorithm
CN103678523A (en) * 2013-11-28 2014-03-26 华为技术有限公司 Distributed cache data access method and device
CN103810061A (en) * 2014-01-28 2014-05-21 河南科技大学 High-availability cloud storage method

Also Published As

Publication number Publication date
CN111931000B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
Jin et al. Community structure mining in big data social media networks with MapReduce
CN104809242A (en) Distributed-structure-based big data clustering method and device
CN109740023B (en) Sparse matrix compression storage method based on bidirectional bitmap
CN111104457A (en) Massive space-time data management method based on distributed database
CN111586091A (en) Edge computing gateway system for realizing computing power assembly
CN110928878A (en) HDFS-based point cloud data processing method and device
CN109190052B (en) Spatial indexing method based on social perception in distributed environment
CN114022648A (en) Space analysis method and system based on Beidou grid code and three-dimensional engine
CN110580323A (en) Urban traffic network maximum traffic flow acceleration algorithm based on cut point segmentation mechanism
CN114420215A (en) Large-scale biological data clustering method and system based on spanning tree
CN116010722A (en) Query method of dynamic multi-objective space-time problem based on grid space-time knowledge graph
CN116860905A (en) Space unit coding generation method of city information model
CN111028897B (en) Hadoop-based distributed parallel computing method for genome index construction
CN116775661A (en) Big space data storage and management method based on Beidou grid technology
CN115994197A (en) GeoSOT grid data calculation method
CN107679127A (en) Point cloud information parallel extraction method and its system based on geographical position
CN111931000B (en) Large-scale vector field data processing method
Band et al. Compressed neighbour lists for SPH
CN109492068B (en) Method and device for positioning object in predetermined area and electronic equipment
CN115687517A (en) Method and device for storing spatio-temporal data, database engine and storage medium
CN111506576B (en) Land block coding method and device based on regional quadtree
CN114972669A (en) Map generation method and device
Tarmur et al. Parallel classification of spatial points into geographical regions
Cavojsky et al. Search by pattern in gps trajectories
Chen et al. Disatra: A real-time distributed abstract trajectory clustering

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant