CN116911884A - Natural resource public land price management system and method based on big data - Google Patents
- Publication number
- CN116911884A (application CN202310898194.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- land price
- land
- information
- public land
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0278—Product appraisal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/16—Real estate
- G06Q50/165—Land development
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- Finance (AREA)
- Development Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Primary Health Care (AREA)
- Accounting & Taxation (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Technical field
The present invention relates to the technical field of public land price management, and specifically to a big data-based natural resource public land price management system and method.
Background
Land price can be understood as the income a land owner obtains by transferring land use rights to a land user, and it reflects the owner's rights and interests. Public land prices form China's statutory land price system, which includes benchmark land prices and calibrated land prices. As the core of the land price system, they support the steady development of the market economy and promote fair and open market values. The Urban Real Estate Management Law of the People's Republic of China stipulates that benchmark land prices and calibrated land prices shall be determined and published regularly. Both are widely used in day-to-day land valuation. The benchmark land price reflects the average land price of an area, works at the macro scale, and is markedly lower than the market price; municipal and county departments must fully update it every three years. The calibrated land price reflects the market price of a specific parcel, works at the micro scale, is equal to or slightly below the market price, gives the market stronger guidance, and is updated every year. The land price system is complex and varied. Although the scope of each price type is broadly defined, their applications still overlap, and publishing several types of land price easily confuses the public and weakens the guiding effect of land price results on market participants, so the current land price system urgently needs to be integrated and optimized. It is therefore necessary to build an information-based public land price management system that lets users query land prices in each region.
Summary of the invention
To address the shortcomings of the prior art, the present invention provides a big data-based natural resource public land price management system and method. Big data technology is used to collect the published land price system results of every region; big data processing then extracts the public land price information and organizes it into public land price management data with a unified template. A user only needs to enter the information of the object to be valued to obtain an automatic valuation that quickly reflects the object's potential market value and provides an important reference for the user's decisions.
In a first aspect, an embodiment of the present invention provides a big data-based natural resource public land price management system, comprising: a data crawling module, a big data processing module, a land price management module, an input information acquisition module, a query module, and a calculation module;
the data crawling module uses a web crawler to crawl the original data of the public land price system results published online by the relevant departments in each locality;
the big data processing module cleans and screens the original data to obtain processed data;
the land price management module extracts the public land price information from the processed data and fills it into a public land price management table with a unified template;
the input information acquisition module obtains the information of the object to be valued entered by the user;
the query module uses the information of the object to be valued to look up the corresponding public land price information, calculation factors, and correction coefficients in the public land price management table;
the calculation module calculates the parcel price of the object to be valued from the public land price information, the calculation factors, and the correction coefficients.
In a second aspect, an embodiment of the present invention provides a big data-based natural resource public land price management method, comprising:
crawling, with a web crawler, the original data of the public land price system results published online by the relevant departments in each locality;
cleaning and screening the original data to obtain processed data;
extracting the public land price information from the processed data and filling it into the corresponding public land price management table;
obtaining the information of the object to be valued entered by the user;
looking up the corresponding public land price information, calculation factors, and correction coefficients in the public land price management table according to the information of the object to be valued;
calculating the parcel price of the object to be valued from the public land price information, the calculation factors, and the correction coefficients.
Beneficial effects of the present invention:
The big data-based natural resource public land price management system and method provided by the embodiments of the present invention use big data technology to collect the published land price system results of every region, and use big data processing to extract the public land price information into public land price management data with a unified template. A user only needs to enter the information of the object to be valued to obtain an automatic valuation that quickly reflects the object's potential market value and provides an important reference for the user's decisions.
Description of the drawings
To explain the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Throughout the drawings, similar elements or parts are generally identified by similar reference numerals, and elements or parts are not necessarily drawn to scale.
Figure 1 shows a structural block diagram of the big data-based natural resource public land price management system provided by the first embodiment of the present invention;
Figure 2 shows a flow chart of the big data-based natural resource public land price management method provided by another embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification is only for the purpose of describing particular embodiments and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
Note that, unless otherwise stated, the technical and scientific terms used in this application have the ordinary meaning understood by those skilled in the art to which the invention belongs.
Figure 1 shows a structural block diagram of the big data-based natural resource public land price management system provided by the first embodiment of the present invention. The system comprises a data crawling module, a big data processing module, a land price management module, an input information acquisition module, a query module, and a calculation module. The data crawling module uses a web crawler to crawl the original data of the public land price system results published online by the relevant departments in each locality. The big data processing module cleans and screens the original data to obtain processed data. The land price management module extracts the public land price information from the processed data and fills it into a public land price management table with a unified template. The input information acquisition module obtains the information of the object to be valued entered by the user. The query module uses that information to look up the corresponding public land price information, calculation factors, and correction coefficients in the public land price management table. The calculation module calculates the parcel price of the object to be valued from the public land price information, the calculation factors, and the correction coefficients.
Because the relevant departments in each locality publish their land price system results at different times, this application provides a data crawling module that uses a web crawler to crawl the original data of the public land price system results published online. The original data includes benchmark land prices and calibrated land prices. The main categories of benchmark land price are: (1) construction land prices, including the urban benchmark land price, the collective construction land benchmark land price, and the allocated construction land benchmark land price; (2) agricultural land prices, including the state-owned agricultural land benchmark land price and the collective agricultural land benchmark land price; (3) other types, including the underground space benchmark land price and the new industrial land benchmark land price. The big data processing module cleans and screens the original data to obtain processed data. The land price management module extracts the required public land price information from the processed data and fills it into the public land price management table, which in this embodiment uses a unified template. Putting all public land price information into a unified template builds a benchmark land price that operates uniformly across urban and rural areas. The user enters the information of the object to be valued in a search box. The object to be valued is a parcel in a given area, and its information includes the parcel name, level, area, and so on. The query module uses this information to look up the corresponding public land price information, calculation factors, and correction coefficients in the public land price management table, and the calculation module calculates the parcel price of the object to be valued from the relationship between the public land price information, the calculation factors, and the correction coefficients. The calculation factors include the regional factor, the floor area ratio factor, the land development degree factor, and the remaining land use term factor.
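The source does not state how the published price, the calculation factors, and the correction coefficients are combined. The following is a minimal Python sketch of the query and calculation modules, assuming a multiplicative coefficient correction and a hypothetical in-memory table keyed by region and land level. `PriceRecord`, `PRICE_TABLE`, `query`, and `estimate_parcel_price` are illustrative names, and every number is a placeholder rather than a value from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRecord:
    """One row of the unified public land price table (hypothetical layout)."""
    benchmark_price: float    # published price per square metre
    region_coeff: float       # regional factor correction
    far_coeff: float          # floor area ratio correction
    development_coeff: float  # land development degree correction
    term_coeff: float         # remaining land use term correction


# Placeholder table; a real system would fill it from the crawled, cleaned data.
PRICE_TABLE = {
    ("Example District", "Level II"): PriceRecord(3000.0, 1.05, 1.10, 1.00, 0.95),
}


def query(region: str, level: str) -> PriceRecord:
    """Query module: look up the record for the object to be valued."""
    try:
        return PRICE_TABLE[(region, level)]
    except KeyError:
        raise KeyError(f"no public land price record for {region!r}, {level!r}")


def estimate_parcel_price(region: str, level: str, area_m2: float) -> float:
    """Calculation module: assumed form is price x product of coefficients x area."""
    rec = query(region, level)
    unit_price = (rec.benchmark_price * rec.region_coeff * rec.far_coeff
                  * rec.development_coeff * rec.term_coeff)
    return unit_price * area_m2


if __name__ == "__main__":
    print(estimate_parcel_price("Example District", "Level II", 1000.0))
```

A purely multiplicative form keeps the sketch short; an actual benchmark correction scheme may add or combine factors differently.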
The big data-based natural resource public land price management system provided by this embodiment uses big data technology to collect the published land price system results of every region, and uses big data processing to extract the public land price information into public land price management data with a unified template. A user only needs to enter the information of the object to be valued to obtain an automatic valuation that quickly reflects the object's potential market value and provides an important reference for the user's decisions.
Another embodiment of the present invention differs from the first embodiment in that the system also includes a map module. The map module displays maps of the administrative regions and shows the calculated parcel price on the regional map of the object to be valued, so that the user can see the queried parcel's information and price directly on the map.
Another embodiment of the present invention differs from the first embodiment in that the big data processing module includes a data cleaning unit. The data cleaning unit uses text processing techniques to denoise, parse, and segment the original data into text data, then uses data mining techniques to analyze the attributes of the text data and obtain cleaned data. The big data processing module also includes a data screening unit, which separates the information published by municipal-level and county-level departments in the cleaned data, then unifies and corrects the land price connotation published by county-level departments against that published by municipal-level departments, yielding data whose land price connotation elements are the same. The land price management module includes an information extraction unit, which extracts from that data the construction land benchmark land price, the agricultural land benchmark land price, and the calibrated land price. The construction land benchmark land price includes the urban benchmark land price, the collective construction land benchmark land price, and the allocated construction land benchmark land price; the agricultural land benchmark land price includes the state-owned agricultural land benchmark land price and the collective agricultural land benchmark land price.
The data cleaning process comprises data preprocessing, feature selection, data cleaning, and checking of the cleaning results. Data preprocessing applies simple constraint processing to the original data; feature selection extracts data features and removes redundant information; data cleaning removes dirty data according to the actual application; and checking of the cleaning results verifies data quality against the cleaning standard. A Hadoop-based distributed cleaning method can effectively improve the efficiency of big data cleaning. As preparation, the data is divided into a storage layer and a cleaning computation layer: HDFS stores the data, a Hive data warehouse on HDFS transfers it, and the database data is then processed into the specified format. The Hadoop-based distributed cleaning flow is:
(1) Data source loading: Sqoop and Hive load and transfer the collected data.
(2) Data preprocessing: the data is analyzed comprehensively against the data requirements so that it is usable at analysis time.
(3) Feature selection: the main features that affect data quality are selected and redundant features removed, which reduces the dimensionality of the big data.
(4) Anomalous data identification: an improved K-means algorithm is used, under which the larger a sample's distance, the more anomalous it is; MapReduce parallelizes the computation.
(5) Cleaning result check: the quality of the cleaned data is checked to ensure data quality.
Specific method for identifying anomalous data:
From the training samples {x_1, …, x_m}, x_i ∈ R^n, k center points are selected at random.
Step 1: Randomly select k samples from {x_1, …, x_m} as the initial cluster centers μ_1, μ_2, …, μ_k ∈ R^n.
Step 2: Compute the distance from every other sample to each center and assign each sample to a class according to that distance, as given by Equation (1).
Step 3: For each class j, take the mean of its samples as the new center point, as given by Equation (2).
In Equation (2), each sample has d attributes and class j contains m samples; for the samples x_i that belong to class j, the D-th feature is summed and then averaged, which gives the centroid of the D-th feature.
Step 4: If the objective function has converged, terminate; otherwise return to Step 2.
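Equations (1) and (2) are not reproduced in this text. In the standard K-means formulation, which matches the verbal description of Steps 2 and 3, they would take the form below. The symbols c_i (class label of sample x_i) and C_j (set of samples assigned to class j, whose size corresponds to the m of the description) are notation assumed here and do not come from the source.

```latex
% Step 2: assign each sample to its nearest cluster center
c_i = \arg\min_{1 \le j \le k} \lVert x_i - \mu_j \rVert^{2} \tag{1}

% Step 3: the D-th coordinate of the new center is the mean of the
% D-th feature over all samples assigned to class j
\mu_{j,D} = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_{i,D},
\qquad D = 1, \dots, d \tag{2}
```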
The K-means algorithm is optimized with the Canopy algorithm as follows:
(1) Store the original data in data set D.
(2) Randomly choose a center point, add it to the canopy center list, and delete it from D.
(3) Compute the distance from each remaining sample to the canopy center list. Samples with Dist[i] < T1 form one class, and samples that have been classified are deleted from D. In the same way, the remaining samples are assigned to T2, T3, … until every sample in D has been classified.
(4) Classification yields k canopies.
(5) Compute the k cluster center points.
(6) Compute the distance from each sample to each center point and assign the sample to the class with the smallest Dist[i].
(7) Take the mean of each class as its new center point.
(8) Compute the distance between the new center points and the canopy center points and classify as in step (3).
(9) If the convergence condition is met, stop; otherwise repeat steps (6) to (8).
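Steps (1) to (9) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: a single distance threshold `t1` is used for every canopy, the first remaining sample is taken as each canopy seed instead of a random one so that runs are reproducible, and the re-check against the canopy centers in step (8) is omitted. The function names are hypothetical.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def canopy_centers(data, t1):
    """Steps (1)-(5): group samples into canopies, return one center per canopy."""
    remaining = list(data)
    centers = []
    while remaining:
        seed = remaining.pop(0)  # assumption: deterministic seed, not random
        members = [seed] + [p for p in remaining if math.dist(p, seed) < t1]
        remaining = [p for p in remaining if math.dist(p, seed) >= t1]
        centers.append(mean(members))
    return centers


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def kmeans(data, centers, max_iter=100, tol=1e-9):
    """Steps (6), (7), (9): iterate assignment and mean update until stable."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in data:
            clusters[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        updated = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    labels = [nearest(p, centers) for p in data]
    return centers, labels


if __name__ == "__main__":
    samples = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
               (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    seeds = canopy_centers(samples, t1=3.0)
    final_centers, labels = kmeans(samples, seeds)
    print(len(seeds), labels)
```

Seeding K-means with canopy centers fixes k from the data instead of requiring it in advance, which is the practical benefit of combining the two algorithms.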
During data cleaning, anomalous data is removed with the K-means algorithm improved by the Canopy algorithm: the larger a sample's distance, the more anomalous it is considered, and MapReduce is used to parallelize the computation. Compared with the traditional K-means algorithm, the improved algorithm cleans data with higher accuracy and faster processing. During data screening, a Bayesian classification algorithm filters the cleaned data a second time, which improves filtering precision.
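The distance rule can be illustrated with a small map and reduce pair. The sketch below emulates the two phases in a single Python process and does not use Hadoop. The cut-off, a sample lying farther from its center than `factor` times the mean distance within its cluster, is an assumption made for illustration; the source only states that larger distances indicate stronger anomalies. All function names are hypothetical.

```python
import math
from collections import defaultdict


def map_phase(sample, centers):
    """Emit (cluster id, (sample, distance to its nearest center))."""
    distances = [math.dist(sample, c) for c in centers]
    j = min(range(len(centers)), key=distances.__getitem__)
    return j, (sample, distances[j])


def reduce_phase(records, factor):
    """Within one cluster, keep samples farther than factor x the mean distance."""
    mean_distance = sum(d for _, d in records) / len(records)
    return [s for s, d in records if d > factor * mean_distance]


def find_anomalies(samples, centers, factor=2.0):
    """Group mapper output by cluster id (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for key, value in (map_phase(s, centers) for s in samples):
        groups[key].append(value)
    anomalies = []
    for key in sorted(groups):
        anomalies.extend(reduce_phase(groups[key], factor))
    return anomalies


if __name__ == "__main__":
    centers = [(1.0, 1.0), (8.0, 8.0)]
    samples = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 0.9),
               (4.0, 4.0), (8.0, 8.1), (7.9, 8.0)]
    print(find_anomalies(samples, centers))
```

Because each mapper call depends only on one sample and the fixed centers, the map phase can be distributed across workers without coordination, which is what makes the step suitable for MapReduce.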
A Bayesian classification algorithm starts from the prior probability model of an object and uses Bayes' formula to compute its posterior probability, that is, the probability that the object source belongs to each class of topic; the class with the largest posterior probability is chosen as the topic of the object source. By training on the source data set, Bayesian theory gives the probability of each data item in the different classes, and a Bayesian model is built from these probabilities. As described here, naive Bayes has the lowest error rate among Bayesian classification models, needs few estimated parameters, and is simple to implement. The minimum-risk Bayesian classification algorithm builds on Bayes and naive Bayes to address the error-rate problem and is an optimization in the minimum-error-rate sense.
This embodiment uses the minimum-risk Bayesian algorithm to screen the cleaned data. The specific steps are:
Given P(ω_s) and P(X|ω_s), s = 1, 2, …, c, and the X to be identified (the data packet to be filtered), the posterior probability is computed with Bayes' formula:
P(ω_s|X) = P(X|ω_s) P(ω_s) / Σ_{t=1}^{c} P(X|ω_t) P(ω_t)
where P(ω_s) is the prior probability, obtained from analysis of users' past demand for network data; P(ω_s|X) is the posterior probability, that is, the probability revised after the information X has been received; and P(X|ω_s) is the probability, based on past experience of users' demand for network data, used to judge whether the received X is junk network data.
Let the data loss be α, and define the decision rules as follows:
1) When the network data is junk data, judging it to be junk data causes no loss, α = 0;
2) When junk network data is judged to be legitimate data, the loss is α = 0;
3) When network data the user needs is judged to be junk data, the resulting loss is immeasurable, 0 < α < ∞.
From the computed posterior probabilities and the decision rules, the conditional risk R(r_s|X) of taking decision r_s, s = 1, 2, …, a, is calculated with the conditional-risk formula.
Because a misjudgment causes loss, the loss α should be minimized (α → 0). The conditional risk values R(r_s|X) obtained above are therefore compared, and the decision with the smallest conditional risk, denoted r_h, is selected; r_h is the minimum-risk Bayesian classification decision. This big data processing method offers higher filtering accuracy and system robustness; filtering the cleaned sample data with it yields more accurate data.
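The conditional-risk formula is not reproduced in this text. The sketch below uses the standard decision-theoretic form R(r_s|X) = Σ_t λ(r_s, ω_t) P(ω_t|X), where λ(r_s, ω_t) is the loss of taking decision r_s when the true class is ω_t. The loss matrix and all probabilities are illustrative assumptions. In particular, a non-zero loss of 1 is assumed for keeping junk data so that the two decisions can be compared, whereas the rules above list that loss as 0.

```python
def posteriors(priors, likelihoods):
    """Bayes' formula: P(w_s|X) = P(X|w_s) P(w_s) / sum_t P(X|w_t) P(w_t)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)
    return [j / evidence for j in joint]


def min_risk_decision(post, loss):
    """Return (h, risks), where risks[s] = sum_t loss[s][t] * post[t]
    and h is the index of the decision with the smallest conditional risk."""
    risks = [sum(l * p for l, p in zip(row, post)) for row in loss]
    h = min(range(len(risks)), key=risks.__getitem__)
    return h, risks


if __name__ == "__main__":
    # Classes: 0 = junk data, 1 = data the user needs (placeholder numbers).
    priors = [0.6, 0.4]       # P(w_s)
    likelihoods = [0.5, 0.2]  # P(X|w_s) for the observed X
    post = posteriors(priors, likelihoods)

    # Decisions: 0 = discard as junk, 1 = keep as legitimate.
    # loss[s][t] is the assumed loss of decision s when the true class is t.
    loss = [
        [0.0, 10.0],  # discarding needed data is costly
        [1.0, 0.0],   # keeping junk: small loss assumed here for illustration
    ]
    h, risks = min_risk_decision(post, loss)
    print(post, risks, h)
```

With these placeholder numbers the largest posterior points to "junk", yet the minimum-risk decision is to keep the data, because wrongly discarding needed data is weighted far more heavily.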
In the system of this embodiment, the big data processing module cleans and filters the crawled data, yielding high-quality public land price data. Land price systems differ from place to place. For example, in the land price systems built across one province, the benchmark land price for state-owned agricultural land is limited to use within the agricultural reclamation system, its value connotation is defined as the granted land use right, and the maximum term is uniformly 50 years for every use. The benchmark land price for collective agricultural land is defined as the contracted management right, with the maximum term set according to the Land Administration Law, which is substantially inconsistent with the state-owned benchmark land price. Within the benchmark land price for collective construction land, homestead land (rural collective housing) is defined as (restricted to) transfer within the village collective, which differs considerably from state-owned residential land, and other commercial construction land also differs somewhat in how the floor area ratio is set. Formal unification of connotations is the first step toward building a unified land price system.
In this embodiment, the big data processing module processes the crawled data: it screens the cleaned data for information published by municipal-level and county-level departments, then unifies and corrects the land price connotations published by county-level departments according to those published by the municipal departments, producing data whose land price connotation elements are identical. The public land price management table uses a unified template, and the connotation elements in the table are the same. The connotation elements include the right type, the valuation date, and so on. Unifying and correcting the land price connotations of the prefecture-level cities in a province, or of the districts and counties within one prefecture-level city, establishes a horizontally comparable and intuitive public land price system and improves its applicability and practicality.
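The unification step can be sketched as aligning each county-level record with the municipal connotation elements and recording which elements were revised. This is a minimal sketch under stated assumptions: the field names (`right_type`, `valuation_date`, `price`), the function name `unify_connotation`, and the sample values are hypothetical. The patent text here does not say how the price itself is corrected for a revised element, so the sketch leaves the price unchanged and only flags the elements that would need a correction.

```python
def unify_connotation(municipal, county_records,
                      elements=("right_type", "valuation_date")):
    """Align county-level records with the municipal connotation elements.

    municipal: dict holding the reference value of each connotation element.
    county_records: list of dicts published by county-level departments.
    Returns new records; each lists the elements that had to be revised.
    """
    unified = []
    for record in county_records:
        fixed = dict(record)  # copy so the input records stay untouched
        revised = [e for e in elements if record.get(e) != municipal[e]]
        for element in revised:
            fixed[element] = municipal[element]
        # The price correction for each revised element is NOT applied here.
        fixed["revised_elements"] = revised
        unified.append(fixed)
    return unified


municipal = {"right_type": "granted land use right",
             "valuation_date": "2023-01-01"}
county_records = [
    {"county": "County X", "right_type": "granted land use right",
     "valuation_date": "2022-01-01", "price": 1500.0},
]
print(unify_connotation(municipal, county_records))
```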
Another embodiment of the present invention differs from the first embodiment above in that the system further includes a valuation comparison module. The valuation comparison module compares the land price estimated from the benchmark land price with the land price estimated from the calibrated land price to obtain a comparison result, and then compares that result with a set threshold. If the result is below the threshold, both land prices are displayed for the user's reference; if it is above the threshold, both land prices are displayed together with the reasons for the large difference. With the valuation comparison module, the user can quickly and intuitively see how the price of the object to be valued differs between the benchmark-based and calibrated-based estimates, and why a large difference arises. The system further includes an analysis module, which analyzes published actual cases of land transfers and gives the user selection suggestions. With the analysis module, the user can refer to these suggestions to make a well-founded choice.
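The comparison logic can be sketched as below. The patent does not define how the comparison result is computed, so this sketch assumes a relative difference against the larger estimate; it also treats a result exactly equal to the threshold as not exceeding it, and takes the candidate reasons as an input because the text does not say how they are derived. The name `compare_valuations` and all values are hypothetical.

```python
def compare_valuations(benchmark_estimate, calibrated_estimate,
                       threshold, reasons=()):
    """Compare two land price estimates against a set threshold.

    The comparison result is assumed to be the relative difference with
    respect to the larger estimate (an assumption, not from the patent).
    """
    larger = max(abs(benchmark_estimate), abs(calibrated_estimate))
    if larger:
        difference = abs(benchmark_estimate - calibrated_estimate) / larger
    else:
        difference = 0.0
    exceeds = difference > threshold
    return {
        "benchmark_estimate": benchmark_estimate,
        "calibrated_estimate": calibrated_estimate,
        "difference": difference,
        "exceeds_threshold": exceeds,
        # Reasons are only reported when the difference is large.
        "reasons": list(reasons) if exceeds else [],
    }


# Both prices are always returned; reasons appear only above the threshold.
print(compare_valuations(1000.0, 1050.0, threshold=0.10))
print(compare_valuations(1000.0, 1500.0, threshold=0.10,
                         reasons=["different valuation dates"]))
```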
The first embodiment above provides a big-data-based natural resource public land price management system. Correspondingly, this application also provides a big-data-based natural resource public land price management method. Refer to Figure 2, which is a flow chart of a big-data-based natural resource public land price management method provided by another embodiment of the present invention. Because the method embodiment is largely similar to the system embodiment, it is described only briefly; for related details, see the corresponding parts of the system embodiment. The method embodiment described below is merely illustrative.
As shown in Figure 2, the big-data-based natural resource public land price management method provided by another embodiment of the present invention includes the following steps:
Crawl, with a web crawler, the raw data of the public land price system construction results published online by the relevant departments in each region;
Clean and screen the raw data to obtain processed data;
Extract the public land price information from the processed data and fill it into the corresponding public land price management table;
Obtain the information on the object to be valued entered by the user;
Query the public land price management table, according to the information on the object to be valued, for the corresponding public land price information, calculation factors, and correction coefficients;
Calculate the parcel price of the object to be valued from the public land price information, the calculation factors, and the correction coefficients.
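The last three steps (query the table, then calculate the price) can be sketched as follows. This passage does not give the calculation formula, so the sketch assumes a simple multiplicative form: benchmark unit price times the product of the correction coefficients times the parcel area. The table layout, the key `(region, land_use, grade)`, the coefficient names, and every number are hypothetical.

```python
from math import prod

# Hypothetical management table keyed by (region, land use, grade).
PRICE_TABLE = {
    ("District A", "residential", 1): {
        "benchmark_unit_price": 2000.0,  # made-up value per square metre
        "correction_coefficients": {"term": 1.05, "floor_area_ratio": 0.98},
    },
}


def estimate_parcel_price(region, land_use, grade, area_m2, table=PRICE_TABLE):
    """Look up the benchmark entry and apply the correction coefficients.

    Assumed formula (not stated in the patent text):
    price = benchmark unit price * product(coefficients) * area.
    """
    entry = table[(region, land_use, grade)]  # raises KeyError if not found
    coefficients = entry["correction_coefficients"].values()
    unit_price = entry["benchmark_unit_price"] * prod(coefficients)
    return unit_price * area_m2


print(estimate_parcel_price("District A", "residential", 1, area_m2=500.0))
```

A real valuation would use whatever factors and coefficients the relevant region publishes; the multiplicative form here only shows how the looked-up values feed the calculation.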
Through the above steps, the big-data-based natural resource public land price management method provided by this embodiment uses big data technology to crawl the public land price system construction results of each region, processes that data, and extracts the public land price information to form public land price management data in a unified template. The user can then quickly query an automatic valuation of an object to be valued in the corresponding region as an auxiliary reference.
So that the user can intuitively see the information and price of the queried parcel on a map, the method further includes: displaying a map of each administrative region, and displaying the parcel price on the regional map of the object to be valued.
The specific method for cleaning and screening the raw data includes:
Denoise, parse, and segment the raw data into words to obtain text data, then analyze the attributes of the text data with data mining techniques to obtain cleaned data;
Screen the cleaned data for information published by municipal-level and county-level departments, then unify and correct the land price connotations published by county-level departments according to those published by the municipal departments, producing data whose land price connotation elements are identical.
Specifically, the data cleaning process comprises data preprocessing, feature selection, data cleaning, and cleaning-result checking. Data preprocessing applies simple constraint processing to the raw data; feature selection extracts data features and removes redundant information; data cleaning removes dirty data according to the actual application; cleaning-result checking verifies data quality against the cleaning standard. A Hadoop-based distributed data cleaning method can effectively improve the efficiency of big data cleaning. During data cleaning, a K-means algorithm improved with the Canopy algorithm is used to clean abnormal data: the further a data point's distance deviates, the more abnormal it is considered, and the computation is parallelized with MapReduce. Compared with the traditional K-means algorithm, the improved K-means algorithm cleans data with higher accuracy and faster processing speed. During data screening, a Bayesian classification algorithm filters the cleaned data a second time to improve filtering precision. For the detailed data processing procedure, see the description of the system above. Unifying and correcting the land price connotations of the prefecture-level cities in a province, or of the districts and counties within one prefecture-level city, establishes a horizontally comparable and intuitive public land price system and improves its applicability and practicality.
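The Canopy-seeded K-means cleaning can be sketched on a single machine as follows. This is only an illustration of the idea, not the patent's implementation: the Hadoop/MapReduce parallelization is omitted, and the thresholds `t1` and `t2`, the `min_size` rule for discarding sparse canopies, the `max_distance` cutoff, and the sample points are all assumptions.

```python
import math


def canopy_centers(points, t1, t2, min_size=2):
    """Choose initial K-means centers with the Canopy method (t1 > t2)."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(0)
        # Loose threshold t1 decides which points belong to this canopy.
        members = [p for p in points if math.dist(p, center) <= t1]
        if len(members) >= min_size:  # skip canopies that are too sparse
            centers.append(center)
        # Tight threshold t2 removes points already covered by this canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def kmeans(points, centers, iterations=10):
    """Plain K-means refinement starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers


def flag_outliers(points, centers, max_distance):
    """Flag points whose distance to the nearest center is too large."""
    return [p for p in points
            if min(math.dist(p, c) for c in centers) > max_distance]


points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (9.0, 1.0)]
seeds = canopy_centers(points, t1=3.0, t2=2.0)
centers = kmeans(points, seeds)
print(flag_outliers(points, centers, max_distance=3.0))
```

Seeding with Canopy removes the need to pick the number of clusters by hand, which is the usual motivation for combining it with K-means.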
The specific method for extracting the public land price information from the processed data includes: extracting, from the data whose land price connotation elements are identical, the information on the benchmark land price for construction land, the benchmark land price for agricultural land, and the calibrated land price. The benchmark land price for construction land includes the urban benchmark land price, the benchmark land price for collective construction land, and the benchmark land price for allocated construction land. The benchmark land price for agricultural land includes the benchmark land price for state-owned agricultural land and the benchmark land price for collective agricultural land.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, and some or all of their technical features can be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they should all be covered by the claims and description of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310898194.0A CN116911884A (en) | 2023-07-21 | 2023-07-21 | Natural resource public land price management system and method based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310898194.0A CN116911884A (en) | 2023-07-21 | 2023-07-21 | Natural resource public land price management system and method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116911884A (en) | 2023-10-20 |
Family
ID=88357826
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310898194.0A Pending CN116911884A (en) | 2023-07-21 | 2023-07-21 | Natural resource public land price management system and method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116911884A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109523446A (en) * | 2018-10-19 | 2019-03-26 | 北京北大软件工程股份有限公司 | A kind of big data processing analysis system towards price field |
CN110659934A (en) * | 2019-09-06 | 2020-01-07 | 李俊鹏 | Big data benchmark land price and land price automatic evaluation updating system |
CN115151940A (en) * | 2020-05-14 | 2022-10-04 | 韩国不动产院 | Land market estimation system and method with final price calculation determination unit |
Non-Patent Citations (1)
Title |
---|
孟德友 著: "《不动产估价理论与技术新探索》", 31 May 2023, 中国经济出版社, pages: 161 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20231020 |