CN114756572A - Parallel computing method and system for vector space data quality inspection - Google Patents
- Publication number
- CN114756572A CN202210419501.8A
- Authority
- CN
- China
- Prior art keywords
- quality inspection
- inspection
- item
- quality
- vector space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- General Factory Administration (AREA)
Abstract
The invention discloses a parallel computing method and system for vector space data quality inspection, comprising the following steps: acquiring vector space data, and inspecting the vector space data according to a quality inspection scheme; for the vector space data that passes the inspection, extracting four types of quality inspection items according to a quality inspection item classification method to obtain four quality inspection item sets, and further decomposing the four quality inspection item sets to obtain a plurality of quality inspection sub-items corresponding to each set; assigning the quality inspection sub-items to quality inspection tasks according to a set fixed time step for the execution of a single quality inspection task and according to the quality inspection item type, thereby generating the quality inspection tasks; and performing parallel quality inspection according to the assigned quality inspection tasks until all quality inspection tasks have been executed. The beneficial effects of the invention are that the quality inspection time is greatly shortened while the accuracy of the quality inspection result is ensured, and the efficiency of vector space data quality inspection is improved.
Description
Technical Field
The invention relates to the field of vector space data quality inspection, in particular to a parallel computing method for vector space data quality inspection.
Background
With the rapid development of information technology, vector space data is increasingly in demand not only in the traditional natural resources sector but also in many other sectors, such as housing and urban-rural development, agriculture and rural affairs, emergency management, transportation, and energy and power. Quality is the lifeblood of vector space data: it directly affects the correctness and reliability of every analysis and decision based on that data. As the update cycle of vector space data grows shorter, the requirements on its quality grow higher, and so do the requirements on the efficiency of quality inspection. Because vector space data is large in volume, rich in data types and subject to many inspection items, its quality inspection is very complicated; moreover, because database construction specifications differ, the quality inspection methods for different vector space data are inconsistent.
The most widely used storage formats for vector space data in the surveying and mapping industry today are the ESRI Shapefile, the ESRI personal geodatabase (.mdb) and the ESRI file geodatabase (.gdb), all three of which were defined by ESRI. For such data, vector space data quality inspection currently relies mainly on desktop stand-alone quality inspection software, which performs element graphic inspection, attribute inspection, topology inspection, edge joining inspection and the like, and is limited by the storage format of the vector space data, its large data volume, and the computing power of a single desktop computer. For example, when the vector space data contains more than 1 million elements, inspecting the attributes of a single field in a single table often takes more than 30 seconds, a topology inspection generally takes tens of minutes to several hours, and the complete quality inspection takes more than several hours in total. Stability is also poor and the software is prone to freezing, so the quality inspection efficiency cannot meet the demands that the current update frequency and growing volume of vector space data place on quality inspection.
The invention provides a parallel computing method for vector space data quality inspection, which fully utilizes parallel computing to solve the performance bottleneck problem when large-data-volume vector space data is subjected to quality inspection on the premise of ensuring the accuracy of a quality inspection result, shortens the time of the quality inspection of the vector space data, and improves the quality inspection efficiency of the vector space data.
Disclosure of Invention
The invention provides a parallel computing method for vector space data quality inspection, which mainly solves the following technical problems:
on the premise of ensuring the accuracy of the quality inspection result, the parallel computation is fully utilized to solve the performance bottleneck problem when the quality inspection is carried out on the vector space data with large data volume, the time for inspecting the quality of the vector space data is shortened, and the quality inspection efficiency of the vector space data is improved.
A parallel computing method for vector space data quality inspection is characterized by comprising the following steps:
acquiring vector space data, and inspecting the vector space data according to a quality inspection scheme;
for the vector space data that passes the inspection, extracting four types of quality inspection items according to a quality inspection item classification method to obtain four quality inspection item sets, and further decomposing the four quality inspection item sets to obtain a plurality of quality inspection sub-items corresponding to the four quality inspection item sets;
assigning the plurality of quality inspection sub-items to quality inspection tasks according to a set fixed time step for the execution of a single quality inspection task and according to the quality inspection item type, thereby generating the quality inspection tasks;
and performing parallel quality inspection according to the assigned quality inspection tasks until all quality inspection tasks have been executed.
In the above parallel computing method for vector space data quality inspection, the inspection of the vector space data according to the quality inspection scheme is a database correctness inspection, whose items include submission directory correctness inspection, file naming correctness inspection, data format correctness inspection, data validity inspection, structure correctness inspection, attribute correctness inspection, spatial reference system inspection, layer integrity inspection, and attribute integrity inspection.
In the above parallel computing method for vector space data quality inspection, the four types of quality inspection items are respectively:
Element graphic inspection, comprising: composite graphic inspection, graphic geometric anomaly inspection, line fold-back inspection, face edge line inspection, tiny short line inspection, tiny face inspection, element node count overrun inspection, element direction correctness inspection, long narrow face inspection and face gap inspection.
Attribute inspection, comprising: attribute null value inspection, attribute association inspection, attribute value domain inspection, attribute unique value inspection, attribute correctness SQL inspection, identity card number verification, code length inspection, decimal digit correctness inspection, coding rule correctness inspection, contour elevation value correctness inspection, graphic-attribute consistency inspection (area) and graphic-attribute consistency inspection (length).
Topology inspection, comprising: same-layer line intersection inspection, different-layer line intersection inspection, same-layer face intersection inspection, different-layer face intersection inspection, line self-intersection inspection, face self-intersection inspection, point overlap inspection, line overlap inspection, same-layer face overlap inspection, different-layer face overlap inspection, pseudo node inspection, same-layer dangling point inspection, different-layer dangling point inspection, point-must-fall-on-line inspection, face-within-another-face inspection, line-and-face-must-not-intersect inspection, point-falls-on-face inspection, point-does-not-fall-on-face inspection, line-falls-on-face inspection, line-does-not-fall-on-face inspection, line intersection break inspection, spatial constraint condition inspection, line and face boundary consistency inspection, and contour line and elevation point contradiction inspection.
Edge joining inspection, comprising: inter-sheet attribute edge joining inspection, inter-region attribute edge joining inspection, inter-sheet graphic edge joining inspection and inter-region graphic edge joining inspection.
In the above parallel computing method for vector space data quality inspection, when decomposing:
for quality inspection of a single layer, each quality inspection sub-item precisely describes one rule over one or more fields of one specific layer;
for quality inspection of associated layers, each quality inspection sub-item precisely describes one rule over one or more fields or conditions of the associated layers;
a quality inspection sub-item does not contain parameters of other quality inspection sub-items. After the decomposition is finished, the quality inspection sub-items are placed into classified sets of quality inspection sub-items according to the classification of their quality inspection items.
In the above parallel computing method for vector space data quality inspection, when quality inspection tasks are assigned, the number of element graphic inspection sub-items is set to n1, the number of topology inspection sub-items is set to n2, the number of edge joining inspection sub-items is set to n3, and the number of attribute inspection sub-items is set to n4. The time consumption parameter of each sub-item in the element graphic inspection is t1, that of each sub-item in the topology inspection is t2, that of each sub-item in the edge joining inspection is t3, and that of each sub-item in the attribute inspection is t4;
the total time consumption of linear (serial) computation is
$$T = n_1 t_1 + n_2 t_2 + n_3 t_3 + n_4 t_4$$
the number C of computing servers is acquired, and the shortest parallel quality inspection duration is calculated as
$$T_{min} = \frac{T}{C}$$
the fixed time step for the execution of a single task is set to one quarter of Tmin, i.e.
$$T_a = \frac{T_{min}}{4}$$
All quality inspection sub-items are assigned to quality inspection tasks by quality inspection item type, according to the fixed time step for the execution of a single quality inspection task; each quality inspection task comprises a plurality of quality inspection sub-items of the same type.
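The time-step calculation above can be sketched in a few lines of Python. This is only an illustration: it assumes the shortest parallel duration is the total linear time divided evenly over the C computing servers (Tmin = T / C), the sub-item counts are taken from the worked example later in the description, and the per-type time parameters are invented placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeLoad:
    """Load contributed by one quality inspection item type."""
    count: int        # number of sub-items of this type (n_i)
    unit_cost: float  # time consumption parameter per sub-item (t_i)


def time_step(loads, servers):
    """Return (T, Tmin, Ta) for the given per-type loads and server count C."""
    if servers < 1:
        raise ValueError("at least one computing server is required")
    total = sum(load.count * load.unit_cost for load in loads)  # T = sum of n_i * t_i
    t_min = total / servers  # assumption: Tmin = T / C
    t_a = t_min / 4          # Ta is one quarter of Tmin
    return total, t_min, t_a


# Counts: 14 element graphic, 15 topology, 2 edge joining, 6 attribute sub-items.
# The unit costs (seconds) are hypothetical values chosen for illustration.
loads = [TypeLoad(14, 2.0), TypeLoad(15, 30.0), TypeLoad(2, 10.0), TypeLoad(6, 0.5)]
total, t_min, t_a = time_step(loads, servers=4)
```

With these placeholder costs the sketch gives T = 501, Tmin = 125.25 and Ta = 31.3125 seconds; real values of t1 to t4 would have to be measured or estimated for the data at hand.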
In the above parallel computing method for vector space data quality inspection, the quality inspection tasks are generated as follows:
if the estimated time consumption of a quality inspection sub-item is greater than or equal to Ta, that sub-item alone forms one quality inspection task;
if the estimated time consumption of a quality inspection sub-item is less than Ta, the sub-items are aggregated by type;
each type of quality inspection sub-item is aggregated separately, and quality inspection sub-items of different types are never placed in the same quality inspection task;
each type of quality inspection sub-item is aggregated into quality inspection tasks as follows: a quality inspection task is created; a quality inspection sub-item is extracted from the sub-item set, added to the task and deleted from the set; the estimated time consumption of all sub-items contained in the task is accumulated to obtain the estimated time consumption of the task; sub-items continue to be extracted until the estimated time consumption of the task is greater than or equal to Ta, whereupon the task is saved; new tasks then continue to be created until the number of sub-items remaining in that type's sub-item set is 0.
In the above parallel computing method for vector space data quality inspection, during quality inspection,
all layers associated with the quality inspection sub-items in the topology inspection sub-item set are extracted, and duplicate layers are removed to obtain the set of layers participating in topology inspection;
the scheduling server then constructs the topological relations of the vector space data, only for the set of layers participating in topology inspection, through the PostGIS engine of the open-source spatial database PostgreSQL.
The scheduling server distributes the vector space data containing the topological relations to the computing servers.
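Collecting the layers that take part in topology inspection amounts to a de-duplicating union over the topology sub-items. A minimal sketch, assuming each sub-item is represented as a dictionary with a hypothetical "layers" key (the source does not prescribe a data structure):

```python
def topology_layers(sub_items):
    """Collect the distinct layers referenced by topology sub-items, in first-seen order."""
    seen = {}
    for item in sub_items:
        for layer in item["layers"]:
            seen.setdefault(layer, None)  # dict keys keep insertion order and drop repeats
    return list(seen)


# Hypothetical representation of three topology sub-items from the worked example
topology_sub_items = [
    {"id": 21, "layers": ["CJDCQJX"]},
    {"id": 24, "layers": ["TTQ", "XZQ"]},
    {"id": 25, "layers": ["ZYXMYD", "XZQ"]},
]
layers = topology_layers(topology_sub_items)
```

Here XZQ is referenced twice but appears once in the result, so the topological relation would be built for four layers rather than five.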
The scheduling server acquires a plurality of idle computing servers and assigns one quality inspection task to each of them for computation; each computing server can execute only one quality inspection task at a time.
Each computing server performs the computation through the PostGIS engine of the open-source spatial database PostgreSQL, using Spatial SQL that conforms to the OGC Simple Feature Access SQL specification. When a computing server completes its computation, it notifies the scheduling server that the computing task is completed; the scheduling server reads the quality inspection result from that computing server and assigns it a quality inspection task that has not yet started.
This is repeated until all quality inspection tasks have been executed, and the quality inspection result is then output.
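The dispatch loop described above (idle server takes one task, reports completion, receives the next pending task) behaves like a fixed-size worker pool. The sketch below models it with Python's standard `concurrent.futures.ThreadPoolExecutor`, where each worker thread stands in for one computing server; `run_task` is a hypothetical placeholder for the Spatial SQL execution, and the task dictionaries are an assumed representation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(task):
    """Stand-in for one computing server executing a quality inspection task."""
    # A real worker would run Spatial SQL; here each sub-item just yields a label.
    return [f"{task['type']}:{sub_item}" for sub_item in task["sub_items"]]


def schedule(tasks, servers):
    """Run tasks on at most `servers` workers, each handling one task at a time."""
    results = []
    with ThreadPoolExecutor(max_workers=servers) as pool:
        # Queued tasks are handed to whichever worker becomes idle next.
        futures = [pool.submit(run_task, task) for task in tasks]
        for future in as_completed(futures):
            results.extend(future.result())  # the scheduler collects each result
    return sorted(results)  # merged result, sorted for a stable order


tasks = [
    {"type": "topology", "sub_items": [21, 22]},
    {"type": "topology", "sub_items": [23]},
    {"type": "attribute", "sub_items": [15, 16, 17]},
]
merged = schedule(tasks, servers=2)
```

A thread pool is used only to keep the sketch self-contained; in the described system the workers are separate computing servers and the hand-off happens over the network.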
A parallel computing system for vector space data quality inspection, using the method of any one of claims 1 to 7, comprising:
a first module configured to acquire vector space data and to inspect the vector space data according to a quality inspection scheme;
a second module configured to extract, for the vector space data that passes the inspection, four types of quality inspection items according to a quality inspection item classification method to obtain four quality inspection item sets, and to further decompose the four quality inspection item sets to obtain a plurality of quality inspection sub-items corresponding to the four quality inspection item sets;
a third module configured to assign the plurality of quality inspection sub-items to quality inspection tasks according to a set fixed time step for the execution of a single quality inspection task and according to the quality inspection item type, thereby generating the quality inspection tasks;
and a fourth module configured to perform parallel quality inspection according to the assigned quality inspection tasks until all quality inspection tasks have been executed.
The beneficial effects provided by the invention are as follows:
(1) the parallel computing method for vector space data quality inspection provided by the invention preferentially executes 'database correctness' inspection before quality inspection is executed, thereby avoiding the problem that an effective quality inspection result cannot be obtained due to incorrect vector space data structure.
(2) According to the parallel computing method for vector space data quality inspection, only the layers participating in the topology inspection are extracted to construct the topological relation when the topology inspection is executed, so that the time for constructing the topological relation is shortened, the data storage space is reduced, and the data distribution time is shortened.
(3) The parallel computing method for vector space data quality inspection provided by the invention greatly shortens the total quality inspection time and improves the efficiency of vector space data quality inspection on the premise of ensuring the accuracy of the quality inspection result.
(4) The parallel computing method for vector space data quality inspection provided by the invention performs the quality inspection computation through the PostGIS engine of the open-source spatial database PostgreSQL, using Spatial SQL that conforms to the OGC Simple Feature Access SQL specification, so that dependence on commercial industry software is reduced and the method is suitable for popularization and application.
Drawings
FIG. 1 is a flow chart of a parallel computing method for vector space data quality inspection according to the present invention;
FIG. 2 is a flow chart of quality inspection task generation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are further described below with reference to the accompanying drawings, taking quality inspection of the Third National Land Survey vector spatial database as an example.
The vector space data is copied to a scheduling server, which submits it to the open-source spatial database PostgreSQL; the database correctness inspection items in the quality inspection scheme are extracted and inspected first; the quality inspection items in the scheme are classified and decomposed at fine granularity to form quality inspection sub-item sets; the quality inspection sub-items are aggregated according to a fixed time step and assigned to quality inspection tasks; the scheduling server constructs topological relations only for the layers participating in topology inspection and distributes the data, once the topological relations are constructed, to the computing servers; the scheduling server assigns the quality inspection tasks to a plurality of computing servers for computation; each computing server performs the computation through the PostGIS engine of the open-source spatial database PostgreSQL, using Spatial SQL that conforms to the OGC Simple Feature Access SQL specification, and notifies the scheduling server when its computing task is completed; the scheduling server reads the quality inspection result from that computing server and assigns it a quality inspection task that has not yet started; the previous step is repeated until all quality inspection tasks have been executed; and the scheduling server merges the quality inspection results.
The above steps are explained one by one.
Referring to fig. 1, the present invention includes the following steps:
S101: The vector space data is copied to a scheduling server, and the scheduling server submits the vector space data to the spatial database PostgreSQL.
S102: The quality inspection scheme is submitted to the scheduling server. The scheduling server extracts from the scheme the inspection items "submission directory correctness, file naming correctness, data format correctness, data validity, structure correctness, attribute correctness, spatial reference system, layer integrity and attribute integrity" as the items of the "database correctness inspection", and performs the database correctness inspection on the vector space data. If the database correctness inspection is not passed, the subsequent quality inspection cannot be carried out, and the scheduling server outputs the quality inspection result; if it is passed, S103 is performed. For example, in the Third National Land Survey vector spatial data quality inspection scheme, the "spatial reference system inspection" needs to check whether the geographic coordinate system of the CJDCQJX, TTQ, XZQ, XZQJX, ZYXMYD, CJDCQ, ZRYCBHQ, JZKZD, KFYQ, LSYD, PDT, PZWJSTD, QTJZKFQ, STBHHX, ZRBHQ, GJGY, SLGY, ZJ and CLKFQ layers is named CGCS2000 and whether its WKID is 4490.
S103: the quality check items in the quality check scheme are classified and decomposed. And extracting four types of quality inspection items from the quality inspection items in the quality inspection scheme according to a quality inspection item classification method to obtain four quality inspection item sets. The quality inspection and classification method comprises the following steps:
(1) element pattern inspection, comprising: the method comprises the following steps of (1) conforming to pattern inspection, pattern geometric abnormity inspection, line folding inspection, face side line inspection, micro short line inspection, micro face inspection, element node number overrun inspection, element direction correctness inspection, long and narrow face inspection and face gap inspection; for example, the 'tiny surface inspection' in the national three-tone vector space data needs to inspect tiny surfaces with the area of layers of less than 200 square meters, such as TTQ, XZQ, ZYXMYD, CJDCQ, ZRYCBHQ, KFYQ, LSYD, PDT, PZWJSDTD, QTJZKFQ, STBHHX, ZRBHQ, GJGY, SLGY, ZJ, CSKFBJ, DZGY, FJMSQ and the like; the 'surface gap inspection' needs to inspect the surfaces with the distance less than 0.0001 meter in the XZQ, CJDCQ and DLTB layers; the 'micro stub inspection' needs to inspect micro stubs with the length less than 0.2 m in CJDCQJX, XZQJX and DGX image layers.
(2) Attribute verification, comprising: attribute control inspection, attribute association inspection, attribute value domain inspection, attribute unique value inspection, attribute correctness SQL inspection, ID card code verification, code length inspection, decimal digit correctness inspection, code rule correctness inspection, contour height value correctness inspection, drawing attribute consistency inspection (area) and drawing attribute consistency inspection (length); for example, "farmland planting attribute code value field inspection" in the homeland three-tone vector space data needs to inspect whether the value of the ZZSXM field of DLTB is a certain value of { "LYFL", "WG", "XG", "LS", "FLS", "LLJZ", "JKHF", "GCHF" }; the "area field value is greater than 0 check" (one case of attribute value field check) requires checking whether or not the values of the xzq.jsmj, xzq.dcmj, lsyd.tbmj, pzwjstd.tbmj, pzwjstd.pzmj, etc. fields are greater than or equal to 0.
(3) Topology verification, comprising: the method comprises the following steps of same-layer line intersection inspection, different-layer line intersection inspection, same-layer plane intersection inspection, different-layer plane intersection inspection, line self-intersection inspection, plane self-intersection inspection, point overlap inspection, line overlap inspection, same-layer plane overlap inspection, different-layer plane overlap inspection, pseudo node inspection, same-layer suspension point inspection, different-layer suspension point inspection, inspection that a point must fall on a line, inspection that a plane is in another plane, inspection that a line and a plane cannot intersect, inspection that a point falls on a plane, inspection that a point does not fall on a plane, inspection that a line falls on a plane, inspection that a line does not fall on a plane, inspection that line intersection breaks, inspection for space constraint conditions, inspection for consistency of line and plane boundaries, and inspection for high line-point line contradiction; for example, the 'same-layer line intersection inspection' in the homeland three-tone vector space data needs to inspect the line intersection problem in CJDCQJX, XZQJX and DGX three image layers; "elements must be checked within administrative regions" (a case of cross-check at different levels) it is necessary to check whether all elements of the TTQ, ZYXMYD, CJDCQ, ZRYCBHQ, KFYQ, LSYD, PDT, PZWJSTD, QTJZKFQ, STBHHX, ZRBHQ layers are within XZQ layers.
(4) Edge joining inspection, comprising: checking the attribute connection among the maps, checking the attribute connection among the regions, checking the graph connection among the maps and checking the graph connection among the regions. For example, "consistency of inter-map class attributes border check" in the homeland three-tone vector space data needs to check whether DLBM and DLMC fields of the patches with the same BSM field value in the adjacent map sheets in the DLTB map layer are consistent.
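Several of the example inspections above map directly onto Spatial SQL. The following sketch builds such statements as strings using PostGIS functions that conform to the Simple Feature Access model (`ST_Area`, `ST_Length`, `ST_Within`). The column names `gid` and `geom` are hypothetical, layer and field names are assumed to come from a trusted inspection scheme (they are interpolated directly into the SQL), and the thresholds are only meaningful if the geometries are stored in a projected coordinate system with meter units.

```python
def tiny_face_sql(layer, min_area=200.0):
    """Faces whose area is below the threshold (tiny face inspection)."""
    return f"SELECT gid FROM {layer} WHERE ST_Area(geom) < {min_area}"


def short_line_sql(layer, min_length=0.2):
    """Lines shorter than the threshold (tiny short line inspection)."""
    return f"SELECT gid FROM {layer} WHERE ST_Length(geom) < {min_length}"


def value_domain_sql(layer, field, allowed):
    """Rows whose field value lies outside the allowed set (value domain inspection)."""
    # Single quotes are doubled so the values form valid SQL string literals.
    values = ", ".join("'" + value.replace("'", "''") + "'" for value in allowed)
    # NULL values are not reported here; a null value inspection covers them.
    return f"SELECT gid FROM {layer} WHERE {field} NOT IN ({values})"


def not_within_sql(layer, container):
    """Elements that lie within no single face of the container layer."""
    return (
        f"SELECT a.gid FROM {layer} a WHERE NOT EXISTS "
        f"(SELECT 1 FROM {container} b WHERE ST_Within(a.geom, b.geom))"
    )


queries = [
    tiny_face_sql("ttq"),
    short_line_sql("dgx"),
    value_domain_sql("dltb", "zzsxm", ["LS", "WG"]),
    not_within_sql("ttq", "xzq"),
]
```

The last template is a simplification: an element that straddles two adjacent XZQ faces would be reported even though it lies inside their union, so a production rule might test against the merged administrative extent instead.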
Each quality inspection item in the quality inspection item sets is further decomposed into quality inspection sub-items, according to the following rules:
(1) for quality inspection of a single layer, each quality inspection sub-item precisely describes one rule over one or more fields of one specific layer;
(2) for quality inspection of associated layers, each quality inspection sub-item precisely describes one rule over one or more fields (or conditions) of the associated layers;
(3) a quality inspection sub-item does not contain parameters of other quality inspection sub-items;
(4) under the first three rules, a quality inspection sub-item cannot be decomposed any further.
After the decomposition is finished, the quality inspection sub-items are placed into classified sets of quality inspection sub-items according to the classification of their quality inspection items. For example, decomposing the inspection items exemplified in step S103 gives the following four classified sets of quality inspection sub-items:
(1) Element graphic inspection: sub-item 1 (find faces with an area of less than 200 square meters in the TTQ layer), sub-item 2 (the same for the XZQ layer), sub-item 3 (the same for the ZYXMYD layer), sub-item 4 (the same for the CJDCQ layer), sub-item 5 (the same for the ZRYCBHQ layer), sub-item 6 (the same for the KFYQ layer), sub-item 7 (the same for the LSYD layer), sub-item 8 (the same for the PZWJSTD layer), sub-item 9 (find faces less than 0.0001 meter apart in the XZQ layer), sub-item 10 (the same for the CJDCQ layer), sub-item 11 (the same for the DLTB layer), sub-item 12 (find lines shorter than 0.2 meter in the CJDCQJX layer), sub-item 13 (the same for the XZQJX layer) and sub-item 14 (the same for the DGX layer).
(2) Attribute inspection: sub-item 15 (check whether the value of the ZZSXM field of DLTB is one of {"LYFL", "WG", "XG", "LS", "FLS", "LLJZ", "JKHF", "GCHF"}), sub-item 16 (check whether XZQ.JSMJ is greater than 0), sub-item 17 (check whether XZQ.DCMJ is greater than 0), sub-item 18 (check whether LSYD.TBMJ is greater than 0), sub-item 19 (check whether PZWJSTD.TBMJ is greater than 0) and sub-item 20 (check whether PZWJSTD.PZMJ is greater than 0).
(3) Topology inspection: sub-item 21 (find intersecting lines within the CJDCQJX layer), sub-item 22 (find intersecting lines within the XZQJX layer), sub-item 23 (find intersecting lines within the DGX layer), sub-item 24 (check whether all patches of the TTQ layer lie within the extent of the XZQ layer), sub-item 25 (the same for the ZYXMYD layer), sub-item 26 (the same for the TTQ layer), sub-item 27 (the same for the CJDCQ layer), sub-item 28 (the same for the ZRYCBHQ layer), sub-item 29 (the same for the KFYQ layer), sub-item 30 (the same for the LSYD layer), sub-item 31 (the same for the PDT layer), sub-item 32 (the same for the PZWJSTD layer), sub-item 33 (the same for the QTJZKFQ layer), sub-item 34 (the same for the STBHHX layer) and sub-item 35 (the same for the ZRBHQ layer).
(4) Edge joining inspection: sub-item 36 (check whether the DLBM values of patches with the same BSM field value in adjacent map sheets of the DLTB layer are consistent) and sub-item 37 (check whether the DLMC values of patches with the same BSM field value in adjacent map sheets of the DLTB layer are consistent).
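The decomposition in this example is mechanical: each quality inspection item that names several layers yields one sub-item per layer, and the sub-items are then grouped by item type. A minimal sketch, in which the `SubItem` structure and the tuple format of the input items are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubItem:
    number: int    # running sub-item number
    category: str  # quality inspection item type
    layer: str     # the one layer this sub-item inspects
    rule: str      # the single rule it describes


def decompose(items, start=1):
    """Split (category, rule, layers) items into one sub-item per layer."""
    sub_items = []
    number = start
    for category, rule, layers in items:
        for layer in layers:
            sub_items.append(SubItem(number, category, layer, rule))
            number += 1
    return sub_items


def group_by_category(sub_items):
    """Place sub-items into classified sets keyed by item type."""
    groups = {}
    for sub_item in sub_items:
        groups.setdefault(sub_item.category, []).append(sub_item)
    return groups


items = [
    ("element graphic", "area < 200 m2", ["TTQ", "XZQ"]),
    ("element graphic", "length < 0.2 m", ["CJDCQJX", "XZQJX", "DGX"]),
    ("attribute", "JSMJ > 0", ["XZQ"]),
]
sub_items = decompose(items)
groups = group_by_category(sub_items)
```

The sizes of the resulting groups are the counts n1 to n4 used in the time-step calculation.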
The numbers of quality inspection sub-items in the decomposed sets are defined as follows:
element graphic inspection: the number of sub-items of this type is set to n1;
topology inspection: the number of sub-items of this type is set to n2;
edge joining inspection: the number of sub-items of this type is set to n3;
attribute inspection: the number of sub-items of this type is set to n4.
S104: Calculate the time step of a quality inspection task that the scheduling server uses when scheduling:
The time consumed by each inspection sub-item depends directly on its type, and the time consumption parameter of each type of inspection sub-item is set as follows:
(1) element graph inspection: the time consumption parameter of each inspection sub-item of this type is set to t1,
(2) topology inspection: the time consumption parameter of each inspection sub-item of this type is set to t2,
(3) edge connection inspection: the time consumption parameter of each inspection sub-item of this type is set to t3,
(4) attribute inspection: the time consumption parameter of each inspection sub-item of this type is set to t4.
From the number of inspection sub-items of each type obtained in step S103, calculate the total time consumption of linear (serial) calculation:
$$T = n_1 t_1 + n_2 t_2 + n_3 t_3 + n_4 t_4$$
The scheduling server acquires the number C of computing servers and calculates the shortest parallel quality inspection duration:
$$T_{min} = \frac{T}{C}$$
The fixed time step Ta for the execution of a single task is set to one quarter of Tmin, i.e.
$$T_a = \frac{T_{min}}{4} = \frac{T}{4C}$$
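As an illustration of the calculation in S104, the following minimal Python sketch computes T, Tmin and Ta. It assumes that the shortest parallel duration is Tmin = T / C. The sub-item counts follow the 37 sub-items of the example (14 element graph, 15 topology, 2 edge connection, 6 attribute); the time consumption parameters and the server count are hypothetical values chosen only for illustration, and the name time_step is not part of the invention.

```python
def time_step(counts, unit_times, servers):
    """Return (T, Tmin, Ta) for the given sub-item counts and time parameters."""
    if servers < 1:
        raise ValueError("at least one computing server is required")
    # T = n1*t1 + n2*t2 + n3*t3 + n4*t4: serial execution of every sub-item
    total = sum(counts[kind] * unit_times[kind] for kind in counts)
    t_min = total / servers  # assumption: Tmin = T / C
    t_a = t_min / 4          # fixed time step: one quarter of Tmin
    return total, t_min, t_a


# Counts taken from the 37-sub-item example; times are hypothetical seconds.
counts = {"element_graph": 14, "topology": 15, "edge": 2, "attribute": 6}
unit_times = {"element_graph": 10, "topology": 20, "edge": 230, "attribute": 10}

T, T_min, T_a = time_step(counts, unit_times, servers=3)
print(T, T_min, T_a)  # prints: 960 320.0 80.0
```

With these assumed values, three computing servers give a time step of 80 seconds, so a single edge connection sub-item already exceeds the step while eight element graph sub-items are needed to reach it.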
According to the fixed time step Ta for the execution of a single quality inspection task, all quality inspection sub-items are assigned to quality inspection tasks by quality inspection item type, and each quality inspection task contains several quality inspection sub-items of the same type. The quality inspection tasks are generated as follows:
(1) if the estimated time consumption of an inspection sub-item is greater than or equal to Ta, that sub-item forms a quality inspection task on its own; if the estimated time consumption of an inspection sub-item is less than Ta, the sub-items are aggregated by type;
(2) each type of quality inspection sub-item is aggregated separately, and quality inspection sub-items of different types are never placed in the same quality inspection task;
(3) referring to fig. 2, the aggregation method is as follows: create a quality inspection task; take a quality inspection sub-item from the quality inspection sub-item set and add it to the task (deleting it from the set at the same time); accumulate the estimated time consumption of all sub-items contained in the task to obtain the estimated time consumption of the task; if this value is greater than or equal to Ta, save the task, otherwise continue adding sub-items; then continue creating new tasks until the sub-item set of that type is empty.
Taking the 37 quality inspection sub-items obtained by decomposition in S103 as an example, the quality inspection tasks that can be created are:
(1) Task1 (element graph inspection task): includes inspection sub-items 1, 2, 3, 5, 6, 7 and 8;
(2) Task2 (element graph inspection task): includes inspection sub-items 9, 10, 11, 12, 13 and 14;
(3) Task3 (attribute inspection task): includes inspection sub-items 15, 16, 17, 18, 19 and 20;
(4) Task4 (topology inspection task): includes inspection sub-items 21, 22, 23 and 24;
(5) Task5 (topology inspection task): includes inspection sub-items 25, 26, 27 and 28;
(6) Task6 (topology inspection task): includes inspection sub-items 29, 30, 31 and 32;
(7) Task7 (topology inspection task): includes inspection sub-items 33, 34 and 35;
(8) Task8 (edge connection inspection task): includes inspection sub-item 36;
(9) Task9 (edge connection inspection task): includes inspection sub-item 37.
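The generation method described above can be sketched in Python as follows. The sketch assumes that every sub-item of a type shares that type's time consumption parameter, and that a last task whose estimated time stays below Ta is still saved once the sub-item set is empty. The function name build_tasks, the time parameters and Ta = 80 are hypothetical. With these values the sketch yields nine tasks, as in the example, although its first task holds sub-items 1 to 8.

```python
from collections import deque


def build_tasks(items_by_type, unit_times, t_a):
    """Aggregate sub-items of each type into tasks of roughly t_a estimated time."""
    tasks = []
    for kind, items in items_by_type.items():  # types are never mixed in a task
        pending = deque(items)                 # working copy of the sub-item set
        while pending:
            task, estimated = [], 0
            # Move sub-items from the set into the task until the task's
            # estimated time reaches t_a or the set is exhausted.
            while pending and estimated < t_a:
                task.append(pending.popleft())
                estimated += unit_times[kind]
            tasks.append((kind, task))         # assumption: a short last task is kept
    return tasks


# Sub-item numbering follows the example; times are hypothetical seconds.
items_by_type = {
    "element_graph": list(range(1, 15)),
    "attribute": list(range(15, 21)),
    "topology": list(range(21, 36)),
    "edge": [36, 37],
}
unit_times = {"element_graph": 10, "attribute": 10, "topology": 20, "edge": 230}

tasks = build_tasks(items_by_type, unit_times, t_a=80)
for number, (kind, sub_items) in enumerate(tasks, start=1):
    print(f"Task{number} ({kind}): {sub_items}")
```

A sub-item whose own time parameter is at least Ta (here the edge connection type) leaves the inner loop after one iteration, which covers rule (1) without a separate branch.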
S105: The scheduling server extracts the layers associated with all quality inspection sub-items in the topology inspection sub-item set and removes duplicate layers to obtain the set of layers participating in topology inspection; the scheduling server then constructs the topological relation of the vector space data through the spatial database only for that layer set. Taking the topology inspection in S103 as an example, the layers for which the topological relation needs to be extracted and constructed are CJDCQJX, XZQJX, DGX, TTQ, ZYXMYD, CJDCQ, ZRYCBHQ, KFYQ, LSYD, PDT, PZWJJSTD, QTJZKFQ, STBHHX and ZRBHQ.
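The layer extraction in S105 amounts to an order-preserving de-duplication. A minimal Python sketch is given below; the mapping from sub-item number to associated layers is an illustrative assumption based on a few of the topology sub-items of S103, and the function name topology_layers is hypothetical.

```python
def topology_layers(layers_by_sub_item):
    """Collect the layers used by topology sub-items, dropping repeats."""
    ordered = {}
    for layers in layers_by_sub_item.values():
        for layer in layers:
            ordered.setdefault(layer, None)  # dict keys keep first-seen order
    return list(ordered)


# Assumed association of sub-items with layers (illustrative subset only).
layers_by_sub_item = {
    21: ["CJDCQJX"],
    22: ["XZQJX"],
    23: ["DGX"],
    24: ["TTQ", "XZQ"],
    25: ["ZYXMYD", "XZQ"],
}

print(topology_layers(layers_by_sub_item))
```

Only the layers in the returned list would then be passed to the spatial database for topology construction.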
S106: The scheduling server distributes the PostgreSQL database containing the topological relation to the computing servers.
S107: The scheduling server obtains several idle computing servers and assigns quality inspection tasks to them for computation; each computing server executes only one quality inspection task at a time. For example, with three idle computing servers, Task1 may be assigned to computing server A, Task2 to computing server B, and Task3 to computing server C.
S108: Each computing server performs the data computation through PostGIS, the spatial data engine of the open-source spatial database PostgreSQL, using Spatial SQL that conforms to the OGC Simple Feature Access SQL specification. When a computing server completes its computation, it notifies the scheduling server that the computing task is complete; the scheduling server reads the quality inspection result from that computing server and assigns it a quality inspection task that has not yet started. For example, when Task1 completes on computing server A and the scheduling server is notified, the scheduling server reads the quality inspection result from computing server A and assigns Task5 to computing server A for execution.
S109: Repeat S108 until all quality inspection tasks have been executed.
S110: The scheduling server merges the quality inspection results.
At this point, the vector space data quality check ends.
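The scheduling behaviour of S107 to S109 can be illustrated with a small simulation in Python. It is only a sketch: completion notifications are replaced by a priority queue of the times at which servers become idle, and the task durations, the server names and the function name simulate_dispatch are hypothetical. The durations were chosen so that Task5 goes to computing server A after Task1, as in the example of S108.

```python
import heapq


def simulate_dispatch(task_durations, servers):
    """Assign tasks in order to whichever server becomes idle first."""
    idle_at = [(0.0, name) for name in servers]  # (time server becomes idle, name)
    heapq.heapify(idle_at)
    schedule = {name: [] for name in servers}
    makespan = 0.0
    for task, duration in task_durations:
        free_time, name = heapq.heappop(idle_at)  # next server to report completion
        finish = free_time + duration             # one task per server at a time
        schedule[name].append(task)
        heapq.heappush(idle_at, (finish, name))
        makespan = max(makespan, finish)
    return schedule, makespan


# Hypothetical estimated durations in seconds.
task_durations = [("Task1", 80), ("Task2", 60), ("Task3", 90),
                  ("Task4", 80), ("Task5", 80)]

schedule, makespan = simulate_dispatch(task_durations, ["A", "B", "C"])
print(schedule, makespan)
```

In a real deployment the scheduling server would react to notifications from the computing servers instead of precomputed durations; the queue here only stands in for that event order.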
The main features of the invention are as follows. The computing servers compute directly through PostGIS, the spatial data engine of the open-source spatial database PostgreSQL, using Spatial SQL that conforms to the OGC Simple Feature Access SQL specification, which greatly improves computing efficiency. Only the layers participating in topology inspection are extracted to construct the topological relation, which reduces the time needed to construct the topological relation, reduces the data storage space, and in turn shortens the data distribution time. The quality inspection sub-items are aggregated by a fixed time step and then assigned to quality inspection tasks, so that the computing servers process quality inspection tasks with similar estimated durations, which improves the scheduling efficiency of the quality inspection tasks and shortens the total quality inspection time.
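To make the Spatial SQL approach concrete, the following Python sketch builds one possible query for a topology sub-item of the form "check whether all patches of one layer are within the range of another layer". It is an assumption about how such a check could be written, not the query used by the invention: the table names are lower-cased layer names, the column names geom and bsm are hypothetical, and the function name within_check_sql is illustrative. ST_Within and ST_Union are existing PostGIS functions.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def within_check_sql(patch_layer, boundary_layer, geom_col="geom", id_col="bsm"):
    """Build a query listing patches that are NOT within the boundary layer."""
    for name in (patch_layer, boundary_layer, geom_col, id_col):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    # Rows returned by this query are the patches that fail the inspection.
    return (
        f"SELECT p.{id_col} FROM {patch_layer} AS p "
        f"WHERE NOT ST_Within(p.{geom_col}, "
        f"(SELECT ST_Union(b.{geom_col}) FROM {boundary_layer} AS b));"
    )


sql = within_check_sql("ttq", "xzq")
print(sql)
```

An empty result set would mean the sub-item passes; each returned identifier marks a patch to report in the quality inspection result.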
The invention has the following beneficial effects:
(1) The parallel computing method for vector space data quality inspection provided by the invention executes the "database correctness" inspection before any other quality inspection, which avoids the situation in which no valid quality inspection result can be obtained because the vector space data structure is incorrect.
(2) In the parallel computing method for vector space data quality inspection provided by the invention, only the layers participating in topology inspection are extracted to construct the topological relation when topology inspection is executed, which shortens the time for constructing the topological relation, reduces the data storage space, and shortens the data distribution time.
(3) The parallel computing method for vector space data quality inspection provided by the invention greatly shortens the overall quality inspection time and improves the efficiency of vector space data quality inspection while ensuring the accuracy of the quality inspection results.
(4) In the parallel computing method for vector space data quality inspection provided by the invention, the quality inspection computation is executed by the PostGIS engine of the open-source spatial database PostgreSQL using Spatial SQL that conforms to the OGC Simple Feature Access SQL specification, which reduces dependence on commercial industry software and makes the method suitable for wide adoption.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. However, those skilled in the art should appreciate that they can make various changes and modifications without departing from the spirit of the invention based on the disclosure, and therefore, the scope of the invention should not be limited by the disclosure of the embodiments, but should be construed to include various changes and modifications without departing from the scope of the invention and covered by the appended claims.
Claims (8)
1. A parallel computing method for vector space data quality inspection is characterized by comprising the following steps:
acquiring vector space data, and checking the vector space data according to a quality checking scheme;
extracting four types of quality inspection items from the vector space data qualified by inspection according to a quality inspection item classification method to obtain four quality inspection item sets, and further decomposing the four quality inspection item sets to obtain a plurality of quality inspection subentries corresponding to the four quality inspection item sets;
assigning the plurality of quality inspection sub-items to quality inspection tasks by quality inspection item type according to a set fixed time step for the execution of a single quality inspection task, thereby generating the quality inspection tasks;
and performing parallel quality inspection according to the distributed quality inspection tasks until all the quality inspection tasks are executed.
2. The parallel computing method for vector space data quality inspection according to claim 1, wherein the database correctness inspection items include an aggregate directory correctness inspection, a file naming correctness inspection, a data format correctness inspection, a data validity inspection, a structure correctness inspection, an attribute correctness inspection, a spatial reference system inspection, a layer integrity inspection, and an attribute integrity inspection.
3. The parallel computing method for vector space data quality inspection according to claim 1, wherein the four quality inspection items are respectively
element graph inspection, comprising: composite graph inspection, graph geometric abnormality inspection, line folding inspection, face side line inspection, micro short line inspection, micro face inspection, element node number overrun inspection, element direction correctness inspection, long and narrow face inspection and face gap inspection;
attribute verification, comprising: attribute null value test, attribute association test, attribute value domain test, attribute unique value test, attribute correctness SQL test, ID card code verification, code length test, decimal digit correctness test, code rule correctness test, contour height value correctness test, drawing attribute consistency test (area) and drawing attribute consistency test (length);
topology inspection, comprising: same-layer line intersection, different-layer line intersection, same-layer plane intersection, different-layer plane intersection, line self-intersection, plane self-intersection, overlapping points, overlapping lines, same-layer plane overlap, different-layer plane overlap, pseudo nodes, same-layer hanging points, different-layer hanging points, points must fall on a line, a plane must lie within another plane, lines and planes must not intersect, points must fall on a plane, points must not fall on a plane, lines must fall on a plane, lines must not fall on a plane, lines must break at intersections, spatial constraint condition inspection, line and plane boundary consistency inspection, and contour line and point line contradiction inspection;
edge connection inspection, comprising: attribute edge connection inspection between frames, attribute edge connection inspection between regions, graphic edge connection inspection between frames and graphic edge connection inspection between regions.
4. The parallel computing method for vector space data quality inspection according to claim 1, wherein, during decomposition,
for quality inspection of a single layer, each inspection sub-item accurately describes one rule over several fields of one specific layer;
for quality inspection of associated layers, each inspection sub-item accurately describes one rule over several fields or conditions of the associated layers;
an inspection sub-item does not contain parameters of other inspection sub-items; and after the decomposition is finished, the quality inspection sub-items are placed into quality inspection sub-item sets according to the classification of the quality inspection items.
5. The parallel computing method for vector space data quality inspection as claimed in claim 1, wherein, when the quality inspection tasks are allocated, the number of element graph inspection sub-items is set to n1, the number of topology inspection sub-items is set to n2, the number of edge connection inspection sub-items is set to n3, and the number of attribute inspection sub-items is set to n4; the time consumption parameter of each inspection sub-item is t1 in the element graph inspection, t2 in the topology inspection, t3 in the edge connection inspection, and t4 in the attribute inspection;
the total time consumption of linear calculation is
$$T = n_1 t_1 + n_2 t_2 + n_3 t_3 + n_4 t_4$$
the number C of computing servers is acquired, and the shortest parallel quality inspection duration is calculated as
$$T_{min} = \frac{T}{C}$$
the fixed time step Ta for the execution of a single task is set to one quarter of Tmin, i.e.
$$T_a = \frac{T_{min}}{4} = \frac{T}{4C}$$
all quality inspection sub-items are allocated to quality inspection tasks by quality inspection item type according to the fixed time step for the execution of a single quality inspection task, and each quality inspection task comprises a plurality of quality inspection sub-items of the same type.
6. The parallel computing method for vector space data quality inspection according to claim 1, wherein, when the quality inspection tasks are generated,
if the estimated time consumption of an inspection sub-item is greater than or equal to Ta, that inspection sub-item forms a quality inspection task on its own;
if the estimated time consumption of an inspection sub-item is less than Ta, the inspection sub-items are aggregated by type;
each type of quality inspection sub-item is aggregated separately, and quality inspection sub-items of different types are not placed in the same quality inspection task;
each type of quality inspection sub-item is aggregated into quality inspection tasks, the aggregation method being: creating a quality inspection task; extracting a quality inspection sub-item from the quality inspection sub-item set and adding it to the quality inspection task while deleting it from the quality inspection sub-item set; accumulating the estimated time consumption of all quality inspection sub-items contained in the quality inspection task to obtain the estimated time consumption of the quality inspection task; saving the quality inspection task if its estimated time consumption is greater than or equal to Ta; and then continuing to create new tasks until the quality inspection sub-item set of that type is empty.
7. The parallel computing method for vector space data quality inspection according to claim 1, wherein, during quality inspection,
the layers associated with all quality inspection sub-items in the topology inspection sub-item set are extracted, and duplicate layers are removed to obtain the set of layers participating in topology inspection;
the scheduling server then constructs the topological relation of the vector space data only for the set of layers participating in topology inspection, through the PostGIS engine of the open-source spatial database PostgreSQL;
the scheduling server distributes the vector space data containing the topological relation to the computing servers;
the scheduling server obtains a plurality of idle computing servers and assigns a plurality of quality inspection tasks to the computing servers for computation, each computing server executing only one quality inspection task at a time;
each computing server executes the computation using the PostGIS engine of the open-source spatial database PostgreSQL in conformance with the OGC Simple Feature Access SQL specification, and when a computing server completes its computation, it notifies the scheduling server that the computing task is complete; the scheduling server reads the quality inspection result from that computing server and assigns it a quality inspection task that has not yet started;
and the quality inspection result is output once all quality inspection tasks have been executed.
8. A parallel computing system for vector space data quality inspection, using the method of any one of claims 1 to 7, comprising:
a first module configured to acquire vector space data and to inspect the vector space data according to a quality inspection scheme;
a second module configured to extract four types of quality inspection items from the vector space data that passed inspection, according to a quality inspection item classification method, to obtain four quality inspection item sets, and to further decompose the four quality inspection item sets to obtain a plurality of quality inspection sub-items corresponding to the four quality inspection item sets;
a third module configured to assign the plurality of quality inspection sub-items to quality inspection tasks by quality inspection item type according to the set fixed time step for the execution of a single quality inspection task, thereby generating the quality inspection tasks;
and a fourth module configured to perform parallel quality inspection according to the assigned quality inspection tasks until all the quality inspection tasks have been executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210419501.8A CN114756572A (en) | 2022-04-20 | 2022-04-20 | Parallel computing method and system for vector space data quality inspection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114756572A (en) | 2022-07-15 |
Family
ID=82331969
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210419501.8A Pending CN114756572A (en) | 2022-04-20 | 2022-04-20 | Parallel computing method and system for vector space data quality inspection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114756572A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115357601A (en) * | 2022-10-19 | 2022-11-18 | 广东电网有限责任公司佛山供电局 | Method and system for acquiring real-time data of power transformation work |
CN116756258A (en) * | 2023-06-06 | 2023-09-15 | 北京捷泰云际信息技术有限公司 | Quality inspection method for space vector data in data lake |
CN116756258B (en) * | 2023-06-06 | 2024-03-15 | 易智瑞信息技术有限公司 | Quality inspection method for space vector data in data lake |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||