CN112925773A - POI (Point of interest) data cleaning and fusing method and device for constructing industry risk exposure database - Google Patents

Info

Publication number
CN112925773A
CN112925773A CN201911261624.8A CN201911261624A CN112925773A CN 112925773 A CN112925773 A CN 112925773A CN 201911261624 A CN201911261624 A CN 201911261624A CN 112925773 A CN112925773 A CN 112925773A
Authority
CN
China
Prior art keywords
poi
data
poi data
parent
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911261624.8A
Other languages
Chinese (zh)
Inventor
熊政辉
史萍
岳溪柳
周俊华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Property Reinsurance Co ltd
Sinore Catastrophe Risk Management Co ltd
China Reinsurance Group Co ltd
Original Assignee
China Property Reinsurance Co ltd
Sinore Catastrophe Risk Management Co ltd
China Reinsurance Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Property Reinsurance Co ltd, Sinore Catastrophe Risk Management Co ltd and China Reinsurance Group Co ltd
Priority to CN201911261624.8A
Publication of CN112925773A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a POI data cleaning and fusing method for constructing an industry risk exposure database. The data acquisition step comprises: acquiring attribute information and spatial position information of each POI data item; obtaining a score for each POI data item, evaluated according to the attribute information; and acquiring the relationship type of each POI data item. The data processing step comprises: cleaning the POI data based at least on the attribute information and/or score and/or relationship type; spatially merging the cleaned POI data based on the position information and/or the relationship type, and assigning a score to the merged POI data; and summarizing the cleaned POI data and the spatially merged POI data to serve as the weights of the final spatial distribution of the POIs. The data output step comprises: outputting the summarized POI data as a summary file and displaying it graphically using GIS technology. Extensive tests of the spatial-distribution rationality of the split commercial and industrial construction costs, together with quantitative tests of the split commercial and industrial building areas, show that the POI data cleaning and fusing method can be used to construct an industry risk exposure database that meets the requirements of earthquake, typhoon and other catastrophe models.

Description

POI (Point of interest) data cleaning and fusing method and device for constructing industry risk exposure database
Technical Field
The invention belongs to the field of POI data processing, and particularly relates to a POI data cleaning and fusing method and device for constructing an industry risk exposure database, which can be applied to earthquake, typhoon, flood and other major disaster models.
Background
The industry risk exposure database (AED, Aggregate Exposure Database) reflects the building area, construction cost and spatial distribution of society's existing commercial, industrial and residential building stock. The industry risk exposure database can therefore be used to spatially split insured amounts that are accumulated at the provincial or prefecture level, which reduces as far as possible the uncertainty of earthquake insurance loss results computed from accumulated insured amounts and improves the practical value of the database in insurance pricing and earthquake risk management. The database can also be used to evaluate the overall economic loss that earthquakes cause to society's commercial, industrial and residential buildings, helping government departments respond to disasters promptly and effectively and carry out earthquake resistance and disaster relief. However, taking construction of the industry risk exposure database (AED) for China (including Hong Kong, Macao, etc.) as an example, owing to missing data there has been no report of a successfully constructed commercial and industrial risk exposure database for China, no report of using building area and unit cost data from prefecture-level city statistical yearbooks together with spatial splitting based on building outlines and POI data, and no report of how to use computer technology to clean and fuse POI data when performing spatial splitting based on building outlines and POI data.
Disclosure of Invention
In order to solve the technical problems, the invention provides a POI data cleaning and fusing method for constructing an industry risk exposure database, which comprises the following steps of data acquisition, data processing and data output:
the data acquisition step comprises:
acquiring attribute information and spatial position information of each POI data;
obtaining a score of each POI data evaluated according to the attribute information;
acquiring the relationship type of each POI data;
the data processing step includes:
cleansing POI data based at least on the attribute information and/or score and/or relationship type;
carrying out spatial combination on the cleaned POI data based on the position information and/or the relationship type, and assigning a score to the combined POI data;
summarizing the cleaned POI data and the spatially merged POI data to serve as the POIs that provide the final value distribution weights and spatial distribution positions;
the data output step includes:
and outputting the summarized POI data in a summarized file form, and carrying out graphic display by combining a GIS technology.
The invention also provides a POI data cleaning and fusing device for constructing the industry risk exposure database, which comprises a data acquisition device, a data processing device and a data output device:
the data acquisition device is configured to:
acquiring attribute information and spatial position information of each POI data;
obtaining a score of each POI data evaluated according to the attribute information;
acquiring the relationship type of each POI data;
the data processing apparatus includes:
a cleansing means configured to cleanse POI data based at least on the attribute information and/or score and/or relationship type;
the spatial merging device is configured to spatially merge the cleaned POI data based on the position information and/or the relationship type and/or the attribute information, and assign a score to the merged POI data;
summarizing the cleaned POI data and the POI data after spatial combination to be used as the weight of the final spatial distribution of the POI;
the data output device is configured to:
and outputting the summarized POI data in a summarized file form, and carrying out graphic display by combining a GIS technology.
The beneficial effect of the invention is that it provides a method for using computer technology to clean and fuse POI data when performing spatial splitting based on building outlines and POI data. Extensive tests of the spatial-distribution rationality of the split commercial and industrial construction costs and quantitative tests of the split commercial and industrial building areas show that the POI data cleaning and fusing method can be used to construct an industry risk exposure database that meets the requirements of earthquake catastrophe models and the like.
Drawings
FIG. 1 shows a comparison before and after POI data cleaning and fusion;
FIG. 2 shows the rationality check of the spatial distribution of commercial construction costs in Beijing;
FIG. 3 shows the spatial distribution of commercial construction costs in Chaoyang District, Beijing;
FIG. 4 shows the spatial distribution of commercial construction costs in Xicheng District, Beijing;
FIG. 5 shows the spatial distribution of commercial construction costs in Haidian District, Beijing;
FIG. 6 shows the rationality check of the spatial distribution of industrial construction costs in Beijing;
FIG. 7 shows the quantitative test of commercial building area in Shanghai;
FIG. 8 shows the quantitative test of industrial building area in Shanghai;
FIG. 9 is the national commercial building area spatial distribution map;
FIG. 10 is the national commercial construction cost spatial distribution map;
FIG. 11 is the national industrial building area spatial distribution map;
FIG. 12 is the national industrial construction cost spatial distribution map.
Detailed Description
The invention relates to a POI data cleaning and fusing method for constructing an industry risk exposure database, which mainly comprises the following steps of data acquisition, data processing and data output:
the data acquisition step comprises:
acquiring attribute information and spatial position information of each POI data;
obtaining a score of each POI data evaluated according to the attribute information;
acquiring the relationship type of each POI data;
the data processing step includes:
cleansing POI data based at least on the attribute information and/or score and/or relationship type;
carrying out spatial combination on the cleaned POI data based on the position information and/or the relationship type, and assigning a score to the combined POI data;
summarizing the cleaned POI data and the spatially merged POI data to serve as the POIs that provide the final value distribution weights and spatial distribution positions;
the data output step includes:
and outputting the summarized POI data in a summarized file form, and carrying out graphic display by combining a GIS technology.
Here, a POI (Point Of Interest) may be a supermarket, a building, a bus station, etc. POI data describe the longitude and latitude, name, address, usage, etc. of a "point of interest" and include commercial POI data and industrial POI data. They typically include attribute information in fields such as the unique identification number (FID), type (KIND), longitude (LON), latitude (LAT), POI number (POI_ID) and relationship type (TYPE). The "score" is assigned according to the KIND field of the POI data and represents the weight of the POI in the final spatial distribution; any known scoring method is within the scope of the invention. The "relationship type" is the TYPE field of the POI data.
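As an illustration of the fields listed above, a POI record could be held in a small data structure such as the following. This is only a sketch: the class name `Poi`, the function `parse_poi`, the comma separator and the column order FID, KIND, LON, LAT, POI_ID are assumptions made for the example and are not specified in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Poi:
    fid: int                        # FID: unique identification number
    kind: str                       # KIND: type code used to look up the score
    lon: float                      # LON: longitude in degrees
    lat: float                      # LAT: latitude in degrees
    poi_id: str                     # POI_ID: POI number
    rel_type: Optional[int] = None  # TYPE: relationship type, if known
    score: float = 0.0              # weight, assigned later from KIND


def parse_poi(line: str, sep: str = ",") -> Poi:
    """Parse one text record (assumed column order: FID,KIND,LON,LAT,POI_ID)."""
    fid, kind, lon, lat, poi_id = [part.strip() for part in line.split(sep)]
    return Poi(int(fid), kind, float(lon), float(lat), poi_id)
```

The relationship type and score are left at their defaults here because, in the described method, they are filled in from separate tables.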
Some embodiments of the cleaning method of the present invention include:
judging whether the relationship type is a first-class label or a second-class label;
judging whether the same target comprises a plurality of POI_IDs;
when the relationship type is a first-class label, i.e., the plurality of POI_IDs are identified as comprising a parent POI_ID and a child POI_ID, deleting the child POI_ID. For example, with the POI relationship field Rel_Type: Rel_Type = 1 means that POI_ID1 is the parent of POI_ID2. The relationship is a physical one, i.e., the same target is made up of several POI_IDs, such as a building (POI_ID1) and its entrance (POI_ID2); the child ID needs to be identified and deleted.
When the relationship type is a second-class label, i.e., the plurality of POI_IDs are identified as the same POI, the POI_IDs are scored according to the weights of their respective types and the POI_ID with the lower score is removed. For example, Rel_Type = 2 indicates that POI_ID1 and POI_ID2 are the same POI, i.e., the same target is represented twice by two POI_IDs, such as a Beijing establishment recorded once as a hotel (POI_ID1) and once as a restaurant (POI_ID2); in this case the two are scored according to the weights of their different types and the POI_ID with the lower score is removed.
The first-class label and the second-class label can be distinguished by different values of the relationship type; for example, type 1 indicates that the two ID columns of a POI relationship record are in a parent-child relationship, and type 2 indicates that the two ID columns represent the same POI twice.
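The two cleaning rules can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `clean_pois`, the in-memory representation (a dict of scores and a list of relation tuples) and the tie-breaking rule are assumptions.

```python
def clean_pois(scores, relations):
    """Apply the type-1 and type-2 cleaning rules.

    scores:    dict mapping POI ID -> score (weight)
    relations: iterable of (id1, id2, rel_type) tuples, where
               rel_type 1 means id1 is the parent of id2 and
               rel_type 2 means id1 and id2 denote the same POI
    """
    to_delete = set()
    for id1, id2, rel_type in relations:
        if rel_type == 1:
            # Parent-child: drop the child ID.
            to_delete.add(id2)
        elif rel_type == 2:
            # Duplicate: drop the lower-scored ID.
            # On a tie the first ID is kept (an assumption).
            if scores.get(id1, 0) >= scores.get(id2, 0):
                to_delete.add(id2)
            else:
                to_delete.add(id1)
    return {pid: s for pid, s in scores.items() if pid not in to_delete}
```

For instance, with a building and its entrance linked by type 1 and a hotel and restaurant linked by type 2, only the building and the higher-scored of the hotel/restaurant pair would remain, along with any unrelated POIs.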
Some embodiments of the spatial merging method of the invention comprise the following step:
when the plurality of POI_IDs are identified as different POIs but the distance between them is judged, based on the position information, to be less than a threshold n, the POIs are merged, and the weight value of the merged POI data is the sum of the weight scores of the individual POIs.
The threshold n is a distance threshold and may be set, for example, to 10 meters. POI_IDs this close together usually belong to the same object, such as the POI_IDs of several stores inside the Wangfujing Department Store building; the relationship type need not be checked, because the stores are all part of Wangfujing. The POIs are merged, and the weight value is the sum of the weight scores of the individual POIs. The system may preset that the sum does not exceed 100; in a more preferred embodiment, a sum exceeding 100 is capped at the upper limit of 100.
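A threshold-based merge with a capped score sum might look like the sketch below. Several choices here are assumptions made for illustration only: the haversine formula for distance, a greedy single pass in which each POI joins the first existing group whose anchor point is closer than n, and keeping the anchor's coordinates for the merged POI. The text does not prescribe a distance formula or a grouping strategy.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def merge_nearby(pois, n_m=10.0, cap=100.0):
    """Merge POIs closer than n_m meters; scores are summed and capped.

    pois: list of (lon, lat, score) tuples
    """
    merged = []  # each entry is [lon, lat, score] of a group anchor
    for lon, lat, score in pois:
        for group in merged:
            if haversine_m(lon, lat, group[0], group[1]) < n_m:
                group[2] = min(cap, group[2] + score)
                break
        else:
            # No existing group is close enough: start a new one.
            merged.append([lon, lat, min(cap, score)])
    return [tuple(group) for group in merged]
```

A greedy pass is order-dependent; a production version might instead use a spatial index or connected-component clustering.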
The following example explains the method concept of the invention in more detail. Specifically:
the data acquisition step includes:
acquiring commercial POI data and industrial POI data of a secondary geographical range within the geographical range of a pre-established industry risk exposure database, a POI final scoring table, a POI relationship table and a parent-child relationship table of the secondary geographical range, and a table of POI_IDs to be deleted;
the data processing step includes:
reading the FID, KIND, LON, LAT and POI_ID fields of the collected POI data;
reading the POI final scoring table and assigning a score to each POI according to the scoring value corresponding to the KIND of the POI data;
reading the parent-child relationship table and deleting the child POI data;
reading the table of POI_IDs to be deleted and deleting the duplicate POI data;
selecting and summarizing the parent POI data and the non-parent POI data;
the data output step includes:
outputting the POIs with assigned scores;
the parent-child relationship table;
the table of POI_IDs to be deleted;
and the summary result file.
Taking a certain province as an example, the commercial POI data and the industrial POI data are each text files; for example, the commercial POI data may be named <pN>com.txt and the industrial POI data <pN>ind.txt, where <pN> is the name of the province (the same applies below). The "POI final scoring table" contains the score intervals of the POI data, and POI scoring is completed according to the scoring value corresponding to the KIND of the POI data. The "POI relationship table of the secondary geographical range" includes at least the TYPE field, and the "parent-child relationship table" may be generated from it: for example, three columns are selected from the "POI relationship table of the secondary geographical range", with the fields [parent ID, child ID, 1], and exported under the file name POI_relationship_<pN>.txt. For the "table of POI_IDs to be deleted", for example, the POI_IDs that are of type 2 and have the lower score are selected into one column [ID to be deleted], and files type1_ deletecom.
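The processing steps of this example (score assignment from KIND, derivation of the two helper tables, and the final summary) can be sketched in memory as follows. The function names, the use of dicts and tuples instead of text files, the default score for an unknown KIND and the tie-breaking rule are all assumptions for illustration.

```python
def assign_scores(pois, kind_scores, default=0.0):
    """Attach a SCORE to each POI dict by looking up its KIND in the scoring table."""
    return [dict(p, SCORE=kind_scores.get(p["KIND"], default)) for p in pois]


def build_tables(relations, score_by_id):
    """Derive the parent-child table and the to-delete table.

    relations: list of (id1, id2, type) rows from the POI relationship table
    """
    # Rows of the form [parent ID, child ID, 1].
    parent_child = [(a, b, 1) for a, b, t in relations if t == 1]
    to_delete = []
    for a, b, t in relations:
        if t == 2:
            # Same POI listed twice: mark the lower-scored ID for deletion.
            # On a tie the first ID is kept (an assumption).
            to_delete.append(b if score_by_id.get(a, 0) >= score_by_id.get(b, 0) else a)
    return parent_child, to_delete


def summarize(scored, parent_child, to_delete):
    """Drop child POIs and duplicate POIs; keep parent and unrelated POIs."""
    drop = {child for _, child, _ in parent_child} | set(to_delete)
    return [p for p in scored if p["POI_ID"] not in drop]
```

In a file-based workflow, each returned list would correspond to one of the output files named in the example.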
In another embodiment of the step of selecting and summarizing the parent POI data and the non-parent POI data, the non-parent POI data are summarized after being spatially merged at close range. For example, POI data whose longitude and latitude agree to four decimal places are merged, and a score is assigned again.
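One simple way to group POIs whose coordinates agree to four decimal places is to use the rounded coordinates as a grouping key, as sketched below. This sketch assumes rounding rather than truncation, and it assumes that the reassigned score is the capped sum of the merged scores; the text does not state either choice.

```python
from collections import defaultdict


def merge_by_rounded_coords(pois, decimals=4, cap=100.0):
    """Group POIs by (lon, lat) rounded to `decimals` places and sum their scores.

    pois: list of (lon, lat, score) tuples
    Returns a dict mapping the rounded (lon, lat) key to the merged score.
    """
    totals = defaultdict(float)
    for lon, lat, score in pois:
        key = (round(lon, decimals), round(lat, decimals))
        totals[key] += score
    # Cap the merged score (assumed to follow the same upper limit of 100).
    return {key: min(cap, total) for key, total in totals.items()}
```

Four decimal places of a degree correspond to roughly 11 meters of latitude, which is of the same order as the 10-meter example threshold.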
Some embodiments of the invention may employ MATLAB to merge POIs.
An embodiment of the invention relates to a POI data cleaning and fusing device for a seismic hazard model, which comprises a data acquisition device, a data processing device and a data output device:
the data acquisition device is configured to:
acquiring attribute information and spatial position information of each POI data;
obtaining a score of each POI data evaluated according to the attribute information;
acquiring the relationship type of each POI data;
the data processing apparatus includes:
a cleansing means configured to cleanse POI data based at least on the attribute information and/or score and/or relationship type;
a spatial merging device configured to spatially merge the cleaned POI data based on the position information and/or the relationship type and assign a score to the merged POI data;
summarizing the cleaned POI data and the POI data after spatial combination to be used as POI of value distribution weight and spatial distribution position;
the data output device is configured to:
and outputting the summarized POI data in a summarized file form.
In some embodiments, the cleaning device is further configured to perform:
judging whether the relationship type is a first-class label or a second-class label;
judging whether the same target comprises a plurality of POI_IDs;
when the relationship type is a first-class label and the plurality of POI_IDs comprise a parent POI_ID and a child POI_ID, deleting the child POI_ID;
and when the relationship type is a second-class label and the plurality of POI_IDs are identified as the same POI, scoring the POI_IDs according to the weights of their respective types and removing the POI_ID with the lower score.
In some embodiments, the spatial merging device is configured to perform:
judging whether the relationship type is a first-class label or a second-class label;
when the relationship type is a second-class label and the plurality of POI_IDs are identified as different POIs but are judged, based on the position information, to be less than a threshold n apart (e.g., within 10 meters), merging the POIs, the weight value of the merged POI data being the sum of the weight scores of the individual POIs.
The following example explains the device concept of the present invention more specifically:
the data acquisition device is configured to:
acquiring commercial POI data and industrial POI data of a secondary geographical range within the geographical range of a pre-established earthquake disaster model, a POI final scoring table, a POI relationship table and a parent-child relationship table of the secondary geographical range, and a table of POI_IDs to be deleted;
the data processing apparatus is configured to:
reading the FID, KIND, LON, LAT and POI_ID fields of the collected POI data;
reading the POI final scoring table and assigning a score to each POI according to the scoring value corresponding to the KIND of the POI data;
reading the parent-child relationship table and deleting the child POI data;
reading the table of POI_IDs to be deleted and deleting the duplicate POI data;
selecting and summarizing the parent POI data and the non-parent POI data;
the data output device is configured to:
outputting the POIs with assigned scores;
the parent-child relationship table;
the table of POI_IDs to be deleted;
and the summary result file.
In some other embodiments of the device, when the selected parent POI data and non-parent POI data are summarized, the non-parent POI data are summarized after being spatially merged at close range.
The following experiments further explain the POI data cleaning and fusion method of the present invention by result output and result validation.
This test takes a local commercial district in the core urban area of Chongqing as an example to compare and verify the output. In FIG. 1, the black circles on the green background represent the original POI data and the red triangles represent the POIs after cleaning and fusion by the device. The figure shows the effect obtained: the original POIs are cleaned, classified and fused by coupling elements such as relationship type, attribute information and spatial information.
The test takes the national commercial and industrial building area and cost as examples to carry out result verification.
1) Verification of splitting process of commercial and industrial buildings
The national commercial/industrial area splitting is controlled by using the statistical yearbook data of each city as the total. To verify the accuracy of the splitting process, the commercial/industrial area values in all grids of each city are summed and compared with the statistical data; R² = 1, which shows that the splitting process is correct.
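The consistency check described here can be illustrated with a short sketch: grid values are summed per city and compared with the yearbook totals through a coefficient of determination. The definition of R² used below (1 minus the ratio of residual to total sum of squares) is an assumption, since the text does not say how R² was computed, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def city_totals(grid_cells):
    """Sum grid values per city. grid_cells: iterable of (city, value)."""
    totals = defaultdict(float)
    for city, value in grid_cells:
        totals[city] += value
    return dict(totals)


def r_squared(observed, predicted):
    """Coefficient of determination of predicted against observed values."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((o - mean) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical example values, not data from the patent.
grid = [("A", 1.5), ("A", 2.5), ("B", 3.0), ("B", 7.0), ("C", 6.0)]
yearbook = {"A": 4.0, "B": 10.0, "C": 6.0}
summed = city_totals(grid)
cities = sorted(yearbook)
r2 = r_squared([yearbook[c] for c in cities], [summed[c] for c in cities])
```

When every city's grid sum reproduces its yearbook total exactly, the residual sum of squares is zero and R² equals 1.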
2) Checking the rationality of the spatial distribution of the split commercial and industrial construction costs: Beijing as an example
The splitting result for commercial construction costs in Beijing is overlaid on the Google Earth base map to check its rationality. The red circles in FIG. 2 mark the regions where human activity and buildings appear relatively dense in Google Earth; the splitting result is overlaid to give the effect shown, with colors from red to blue indicating commercial construction costs from high to low.
The commercial construction cost distributions of Chaoyang District, Xicheng District and Haidian District in Beijing are shown in FIG. 3, FIG. 4 and FIG. 5. As can be seen from the figures, the maximum grid value in Chaoyang District occurs at Sanlitun, where the total commercial construction cost is 303.4 billion yuan; the maximum grid value in Xicheng District occurs at Financial Street, where the total commercial construction cost is 214.4 billion yuan; and the maximum grid value in Haidian District occurs at Zhongguancun, where the total commercial construction cost is 188.6 billion yuan. These results are consistent with common experience.
The industrial construction costs split from the risk exposure database are compared against the thematic map of the industrial spatial layout of Beijing's urban area, as shown in FIG. 6. As can be seen, the areas with high split industrial construction costs are basically consistent with the thematic map of the industrial spatial layout of Beijing's urban area.
3) Quantitative test of commercial and industrial building area splitting: Shanghai as an example
Since the commercial/industrial area splitting is based on prefecture-level city statistical yearbook data, a quantitative test can be performed using data from the next lower administrative level, i.e., county-level/district-level statistical data. Taking Shanghai as an example, the statistical data and the split data of each district are compared, and the results are shown in FIG. 7 and FIG. 8.
4) Verifying the POI scoring system method with building outline data: Shandong Province as an example
Since building outline data are the most direct and accurate data currently available, they can be used to verify the effectiveness of the POI scoring system method. In this study, taking Shandong Province as an example, the total commercial and industrial construction costs obtained from building outline data for the outline-covered areas of Jinan, Qingdao, Yantai, Weihai and Weifang are compared with the total commercial and industrial construction costs obtained by the POI method, to test the rationality of the POI scoring system. The results show that the commercial and industrial construction costs of each city obtained by POI scoring agree well with the totals obtained from the building outline data.
To further verify the rationality of the POI scoring system, the building outline coverage area of Jinan is selected, and the positions of the highest-value grids of commercial and industrial construction cost computed directly from the building outlines are compared with those computed using the POI scoring system. With the building outline method, the top 3 grids by commercial construction cost total 207 billion yuan, and the total commercial construction cost in the outline area is 724 billion yuan. The buildings located in the largest grid include Beiyuan Viaduct, Daming Lake Park, Quancheng Square, Eurasia electronic buildings, Yuanhui commercial buildings, door art markets and the like.
With the POI scoring method, the top 3 grids by commercial construction cost in Jinan total 169 billion yuan, and the total commercial construction cost in the outline area is 754 billion yuan. The buildings located in the largest grid include Shimaotang city, Jinan railway station, Tiancheng Xinjiang, Beiyuan Viaduct, Hexiang commercial building, Yingzhu shopping plaza and the like. Comparing the two methods, the total commercial construction cost obtained by the building outline method in the building outline coverage area of Jinan is 3.9 percent less than that obtained by the POI scoring method, so the two totals are consistent.
With the building outline method, the top 3 grids by industrial construction cost in Jinan total 198 hundred million yuan, and the total industrial construction cost in the outline area is 575 hundred million yuan. With the POI scoring method, the top 3 grids by industrial construction cost in Jinan total 189 hundred million yuan, and the total industrial construction cost in the outline area is 624.5 hundred million yuan.
Comparing the two methods, the total industrial construction cost obtained by the building outline method in the building outline coverage area of Jinan is 7.8 percent less than that obtained by the POI scoring method, so the two totals are consistent.
5) National commercial building area and construction cost spatial distribution
Based on the county yearbook data, the building data for the main urban areas of 68 core cities nationwide and the POI data, the calculated spatial distributions of national commercial building area and construction cost are shown in FIG. 9 and FIG. 10. The national total commercial building area is 79.72 billion square meters, and the national total commercial construction cost is 32.62 trillion yuan.
6) Thematic map of national industrial building area and construction cost spatial distribution
Based on the prefecture-level city statistical yearbook data, the building data for the main urban areas of 68 core cities nationwide and the POI data, the calculated spatial distributions of national industrial building area and construction cost are shown in FIG. 11 and FIG. 12. The national total industrial building area is 85.78 billion square meters, and the national total industrial construction cost is 33.20 trillion yuan.
Implementations and functional operations of the subject matter described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware, including the structures disclosed in this specification and their structural equivalents, or combinations of more than one of the foregoing. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on one or more tangible, non-transitory program carriers, for execution by, or to control the operation of, data processing apparatus.
Alternatively or in addition, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution with a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of the foregoing.
The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or multiple computers. An apparatus can comprise special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can include, in addition to hardware, code that creates an execution environment for the associated computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (which may also be referred to or described as a program, software, software application, module, software module, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the relevant program, or in multiple coordinated files, such as files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs can be based on, for example, general purpose microprocessors, special purpose microprocessors, or both, or any other kind of central processing unit. Typically, the central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such a device. Further, the computer may be embedded in another apparatus, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a removable storage device such as a Universal Serial Bus (USB) flash drive.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example: semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having: a display device, for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user; and a keyboard and a pointing device, such as a mouse or trackball, by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with the user; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including acoustic, speech, or tactile input. In addition, the computer may interact with the user by sending documents to a device used by the user and receiving documents from the device; for example, by sending a web page to a web browser on the user's client device in response to a request received from the web browser.
Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components in the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet. The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features that may embody particular implementations of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in combination and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve desirable results. In certain situations, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter have been described. Other implementations are within the scope of the following claims. For example, the activities recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Claims (10)

1. A POI data cleaning and fusion method for constructing an industry risk exposure database, characterized by comprising a data acquisition step, a data processing step, and a data output step, wherein:
the data acquisition step comprises:
acquiring attribute information and spatial position information of each POI data;
obtaining a score of each POI data evaluated according to the attribute information;
acquiring the relationship type of each POI data;
the data processing step includes:
cleaning the POI data based at least on the attribute information and/or the score and/or the relationship type;
performing spatial combination on the cleaned POI data based on the position information and/or the relationship type, and assigning a score to the combined POI data;
summarizing the cleaned POI data and the spatially combined POI data to serve as the final weights and spatial distribution positions of the POIs;
the data output step includes:
outputting the summarized POI data as a summary file, and displaying it graphically using GIS technology.
2. The method of claim 1, wherein the cleaning method comprises:
judging whether the relationship type is a first-type label or a second-type label;
judging whether the same target comprises a plurality of POI_IDs;
when the relationship type is a first-type label, that is, when the plurality of POI_IDs are identified as comprising a parent POI_ID and a child POI_ID, deleting the child POI_ID;
and when the relationship type is a second-type label, that is, when the plurality of POI_IDs are identified as the same POI, comparing the POI_IDs by the weight scores of their respective types and removing the POI_IDs with lower scores.
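The two cleaning rules of claim 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Poi` record, the `clean_pois` function, and the encoding of the first-type label as (parent, child) pairs and of the second-type label as groups of POI_IDs are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poi:
    poi_id: str
    kind: str
    score: float  # score already assigned from the POI's type (KIND)


def clean_pois(pois, parent_child_pairs, same_poi_groups):
    """Return the POIs that survive cleaning.

    parent_child_pairs: (parent_id, child_id) tuples (assumed form of the first-type label).
    same_poi_groups: lists of POI_IDs judged to be the same POI (assumed form of the second-type label).
    """
    by_id = {p.poi_id: p for p in pois}
    dropped = set()

    # First-type label: the parent represents the target, so the child is deleted.
    for parent_id, child_id in parent_child_pairs:
        if parent_id in by_id and child_id in by_id:
            dropped.add(child_id)

    # Second-type label: among POI_IDs identified as the same POI,
    # keep the highest-scoring one and remove the rest.
    for group in same_poi_groups:
        members = [by_id[i] for i in group if i in by_id and i not in dropped]
        if len(members) < 2:
            continue
        best = max(members, key=lambda p: p.score)  # ties keep the first member
        dropped.update(p.poi_id for p in members if p.poi_id != best.poi_id)

    return [p for p in pois if p.poi_id not in dropped]
```

Here a tie between equally scored duplicates is resolved by keeping the first one listed; the claim does not specify tie handling, so that choice is also an assumption.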
3. The method of claim 2, wherein the spatial combination method comprises:
judging whether the relationship type is a first-type label or a second-type label;
when the relationship type is a second-type label and the plurality of POI_IDs are identified as different POIs, but the POI_IDs are determined from the position information to be less than a threshold n apart (e.g., within 10 meters), merging the POIs, wherein the weight of the merged POI data is the sum of the weighted scores of the merged POIs.
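The distance-threshold merge of claim 3 can be illustrated with the following Python sketch. Several details are assumptions that the claim does not specify: distance is computed with the haversine formula on LON/LAT degrees, POIs are grouped transitively (single linkage via union-find), and the merged position is taken as the mean of the member coordinates. The function and field names (`merge_nearby`, `POI_IDS`, `WEIGHT`, `score`) are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def merge_nearby(pois, threshold_m=10.0):
    """Merge POIs lying less than threshold_m apart; the merged weight is the sum of scores.

    pois: list of dicts with keys POI_ID, LON, LAT, score.
    """
    parent = list(range(len(pois)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison: simple but O(n^2); a spatial index would be needed at scale.
    for i in range(len(pois)):
        for j in range(i + 1, len(pois)):
            d = haversine_m(pois[i]["LON"], pois[i]["LAT"], pois[j]["LON"], pois[j]["LAT"])
            if d < threshold_m:
                parent[find(j)] = find(i)

    groups = {}
    for i, poi in enumerate(pois):
        groups.setdefault(find(i), []).append(poi)

    merged = []
    for members in groups.values():
        merged.append({
            "POI_IDS": [m["POI_ID"] for m in members],
            "LON": sum(m["LON"] for m in members) / len(members),  # assumed: mean position
            "LAT": sum(m["LAT"] for m in members) / len(members),
            "WEIGHT": sum(m["score"] for m in members),  # sum of member scores
        })
    return merged
```

Transitive grouping means that a chain of POIs, each within the threshold of its neighbor, collapses into one merged POI even if its endpoints are farther apart than the threshold; a stricter rule could be substituted if that is not desired.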
4. The method of claim 1,
the data acquisition step comprises:
acquiring business POI data and industrial POI data for a secondary geographical range within the geographical range of the pre-established industry risk exposure database, together with a final POI scoring table, a POI relationship table and a parent-child relationship table for the secondary geographical range, and a table of POI_IDs to be deleted;
the data processing step includes:
reading the FID, KIND, LON, LAT and POI_ID fields of the collected POI data;
reading the final POI scoring table and assigning a score to each POI according to the score value corresponding to the KIND of the POI data;
reading the parent-child relationship table and deleting the child POI data;
reading the table of POI_IDs to be deleted and deleting the duplicate POI data;
selecting and summarizing the parent POI data and the non-parent POI data;
the data output step includes outputting:
the POI data with assigned scores;
the parent-child relationship table;
the table of POI_IDs to be deleted; and
a summary result file.
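The table-driven processing of claim 4 can be sketched in Python as follows. This is a hedged illustration only: the in-memory shapes of the tables (a list of row dicts, a KIND-to-score mapping, (parent, child) POI_ID pairs, and a set of POI_IDs to delete) and the names `process_pois` and `SCORE` are assumptions; only the field names FID, KIND, LON, LAT and POI_ID come from the claim.

```python
FIELDS = ("FID", "KIND", "LON", "LAT", "POI_ID")  # fields named in the claim


def process_pois(poi_rows, kind_scores, parent_child_pairs, ids_to_delete):
    """Score, clean, and summarize POI rows.

    poi_rows: list of dicts containing at least the FIELDS keys.
    kind_scores: mapping KIND -> score (assumed form of the final POI scoring table).
    parent_child_pairs: (parent POI_ID, child POI_ID) tuples (assumed form of the parent-child table).
    ids_to_delete: set of duplicate POI_IDs (assumed form of the to-delete table).
    """
    # Read only the fields the claim names; extra columns are ignored.
    rows = [{f: r[f] for f in FIELDS} for r in poi_rows]

    # Assign each POI the score of its KIND (a missing KIND raises KeyError).
    for r in rows:
        r["SCORE"] = kind_scores[r["KIND"]]

    parent_ids = {p for p, _ in parent_child_pairs}
    child_ids = {c for _, c in parent_child_pairs}

    # Delete child POIs, then POIs listed as duplicates.
    rows = [r for r in rows if r["POI_ID"] not in child_ids]
    rows = [r for r in rows if r["POI_ID"] not in ids_to_delete]

    # Select parent and non-parent POIs and summarize them together.
    parents = [r for r in rows if r["POI_ID"] in parent_ids]
    non_parents = [r for r in rows if r["POI_ID"] not in parent_ids]
    return {"parents": parents, "non_parents": non_parents, "summary": parents + non_parents}
```

In this sketch a POI that is both a parent of one POI and a child of another is removed as a child; the claim does not address nested relationships, so that behavior is an assumption.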
5. The method of claim 4, wherein selecting and summarizing the parent POI data and the non-parent POI data comprises merging non-parent POI data that are spatially close to one another and then summarizing them.
6. A POI data cleaning and fusion device for constructing an industry risk exposure database, characterized by comprising a data acquisition device, a data processing device, and a data output device, wherein:
the data acquisition device is configured to:
acquire attribute information and spatial position information of each POI data;
obtain a score of each POI data evaluated according to the attribute information;
acquire the relationship type of each POI data;
the data processing device comprises:
a cleansing means configured to cleanse the POI data based at least on the attribute information and/or the score and/or the relationship type;
a spatial merging device configured to spatially merge the cleaned POI data based on the position information and/or the relationship type and/or the attribute information, and to assign a score to the merged POI data;
wherein the cleaned POI data and the spatially merged POI data are summarized to serve as the weights of the final spatial distribution of the POIs;
the data output device is configured to:
output the summarized POI data as a summary file and display it graphically using GIS technology.
7. The apparatus of claim 1, wherein the cleansing means is further configured to perform:
judging whether the relationship type is a first-type label or a second-type label;
judging whether the same target comprises a plurality of POI_IDs;
when the relationship type is a first-type label and the plurality of POI_IDs comprise a parent POI_ID and a child POI_ID, deleting the child POI_ID;
and when the relationship type is a second-type label and the plurality of POI_IDs are identified as the same POI, comparing the POI_IDs by the weight scores of their respective types and removing the POI_IDs with lower scores.
8. The apparatus of claim 2, wherein the spatial merging device is configured to perform:
judging whether the relationship type is a first-type label or a second-type label;
when the relationship type is a second-type label and the plurality of POI_IDs are identified as different POIs, but the POI_IDs are determined from the position information to be less than a threshold n apart (e.g., within 10 meters), merging the POIs, wherein the weight of the merged POI data is the sum of the weighted scores of the merged POIs.
9. The apparatus of claim 6, wherein the data acquisition apparatus is configured to:
acquire business POI data and industrial POI data for a secondary geographical range within the geographical range of the pre-established industry risk exposure database, together with a final POI scoring table, a POI relationship table and a parent-child relationship table for the secondary geographical range, and a table of POI_IDs to be deleted;
the data processing apparatus is configured to:
read the FID, KIND, LON, LAT and POI_ID fields of the collected POI data;
read the final POI scoring table and assign a score to each POI according to the score value corresponding to the KIND of the POI data;
read the parent-child relationship table and delete the child POI data;
read the table of POI_IDs to be deleted and delete the duplicate POI data;
select and summarize the parent POI data and the non-parent POI data;
the data output device is configured to output:
the POI data with assigned scores;
the parent-child relationship table;
the table of POI_IDs to be deleted; and
a summary result file.
10. The apparatus of claim 9, wherein, when selecting and summarizing the parent POI data and the non-parent POI data, the data processing apparatus merges non-parent POI data that are spatially close to one another and then summarizes them.
CN201911261624.8A 2019-12-10 2019-12-10 POI (Point of interest) data cleaning and fusing method and device for constructing industry risk exposure database Pending CN112925773A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911261624.8A CN112925773A (en) 2019-12-10 2019-12-10 POI (Point of interest) data cleaning and fusing method and device for constructing industry risk exposure database

Publications (1)

Publication Number Publication Date
CN112925773A true CN112925773A (en) 2021-06-08

Family

ID=76162178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911261624.8A Pending CN112925773A (en) 2019-12-10 2019-12-10 POI (Point of interest) data cleaning and fusing method and device for constructing industry risk exposure database

Country Status (1)

Country Link
CN (1) CN112925773A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210224821A1 (en) * 2020-07-24 2021-07-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Land usage property identification method, apparatus, electronic device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080262710A1 (en) * 2007-04-23 2008-10-23 Jing Li Method and system for a traffic management system based on multiple classes
CN101482864A (en) * 2008-01-08 2009-07-15 国际商业机器公司 Method and apparatus used for checking correctness of GIS data
CN102158802A (en) * 2011-02-15 2011-08-17 广州市动景计算机科技有限公司 Information distribution method and device
CN104699818A (en) * 2015-03-25 2015-06-10 武汉大学 Multi-source heterogeneous multi-attribute POI (point of interest) integration method
CN105608153A (en) * 2015-12-18 2016-05-25 晶赞广告(上海)有限公司 Universal POI information association method
CN109947881A (en) * 2019-02-26 2019-06-28 广州城市规划技术开发服务部 A kind of POI judging method, device, mobile terminal and computer readable storage medium
CN110375763A (en) * 2018-04-12 2019-10-25 上海博泰悦臻电子设备制造有限公司 The method, apparatus and intelligent terminal of searching for point of interest
CN110472559A (en) * 2019-08-13 2019-11-19 苏州中科天启遥感科技有限公司 A kind of remote sensing image construction area land use attribute space moving method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination