CN109035078A - A housing listing aggregation method based on multi-dimensional information similarity calculation

Info

Publication number
CN109035078A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811009790.4A
Other languages
Chinese (zh)
Inventor
张文战
杨丽娟
白峻峰
刘子耀
张凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhuge Zhaofang Information Technology Co Ltd
Original Assignee
Beijing Zhuge Zhaofang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-08-31
Filing date
2018-08-31
Publication date
2018-12-18

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/16: Real estate

Abstract

The invention relates to a housing listing aggregation method based on multi-dimensional information similarity calculation, comprising the following steps: step (1), crawl the listings of each platform or brokerage firm, clean the listing information, and filter out listings with severely missing information and suspected fake or duplicate listings; step (2), identify whether listings from multiple platforms belong to the same property; step (3), aggregate the multi-platform listings and select a benchmark; step (4), check the accuracy and coverage of the aggregation; step (5), record each listing's history on each platform (listing events, price increases, and price reductions) in a history table, showing the life cycle of a listing on every platform across the whole network. The advantages of the invention are as follows: it merges data from the whole network, so listing information is more comprehensive; it gathers historical data, so a user can see, along both horizontal and vertical dimensions, every state a listing has passed through on the whole network up to the present, together with its price trend; and the user can contact an interested brokerage firm or broker with one click, which greatly improves house-hunting efficiency.

Description

A housing listing aggregation method based on multi-dimensional information similarity calculation
Technical field
The invention relates to a housing listing aggregation method based on multi-dimensional information similarity calculation.
Background
At present, the official website of each brokerage firm can show only its own listings, and some platforms merely crawl the listings of other platforms or brokerage firms and display them unchanged. As for a property's historical prices and status management, this is basically still a blank on every platform, because brokerage firms do not want users to know whether a property's price has risen or fallen.
With the information that platforms and brokerage firms currently provide, a buyer who wants to know how a property is listed at several brokerage firms has to visit multiple brokerage firms or platforms and compare them, and the buyer does not know how many brokerage firms in a given city may be listing the property.
Summary of the invention
To overcome the shortcomings of the prior art, the invention provides a housing listing aggregation method based on multi-dimensional information similarity calculation. The technical solution of the invention is as follows:
A housing listing aggregation method based on multi-dimensional information similarity calculation, comprising the following steps:
Step (1): crawl the listings of each platform or brokerage firm, clean the listing information, and filter out listings with severely missing information and suspected fake or duplicate listings;
Step (2): identify whether listings from multiple platforms belong to the same property;
Step (3): aggregate the multi-platform listings and select a benchmark;
Step (4): check the accuracy and coverage of the listing aggregation;
Step (5): record each listing's history on each platform (listing events, price increases, and price reductions) in a history table, showing the life cycle of a listing on every platform across the whole network.
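As an illustration of the cleaning in step (1), the following is a minimal sketch and not the patented implementation. The field names in `REQUIRED_FIELDS`, the `max_missing` threshold, and the rule that two records from the same channel with identical key fields count as a suspected duplicate are all assumptions made for the example; the source does not define them.

```python
# Hypothetical listing schema; the patent does not name the fields.
REQUIRED_FIELDS = ("community", "area", "price", "rooms", "floor")


def clean_listings(raw_listings, max_missing=1):
    """Drop listings with too many missing fields and same-channel duplicates."""
    seen = set()
    kept = []
    for item in raw_listings:
        # Count required fields that are absent or empty.
        missing = sum(1 for f in REQUIRED_FIELDS if item.get(f) in (None, ""))
        if missing > max_missing:
            continue  # information is severely missing
        # Assumed duplicate rule: same channel and identical key fields.
        key = (item.get("channel"),) + tuple(item.get(f) for f in REQUIRED_FIELDS)
        if key in seen:
            continue  # suspected repeated listing
        seen.add(key)
        kept.append(item)
    return kept


complete = {"channel": "a", "community": "Sunshine Garden", "area": 89.0,
            "price": 520.0, "rooms": 2, "floor": 6}
raw = [
    complete,
    dict(complete),                                    # exact repeat
    {"channel": "a", "community": "Sunshine Garden"},  # four fields missing
]
cleaned = clean_listings(raw)
```

With the three records above, only the first survives: the second is an exact repeat within the same channel and the third lacks four of the five required fields.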
Step (2) is specifically:
Entry criteria: when a listing arrives, first find all listings in the database that are in the same residential community and have the same total number of floors, the same room layout, and the same floor range. These listings meet the entry criteria and proceed to the similarity weight calculation. If the weight calculation meets the condition and fewer than 2 of the key equal-value attributes differ, the listings are judged to be the same property and are aggregated;
In listings from different channels, the name of the same community may differ, but the community ID is identical, so comparing community IDs shows whether two listings belong to the same community. The correspondence between each channel's community names and the community IDs is established by combining the community's geographic location with name similarity;
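The entry criteria and the final decision rule of step (2) can be sketched as follows. This is an illustration under stated assumptions: the `Listing` fields, the choice of `orientation` and `decoration` as key equal-value attributes, the 0.6/0.4 weights, and the 0.95 threshold are hypothetical, since the source gives neither the weight formula nor the attribute list. Only the blocking on community, total floors, room layout, and floor range, and the "fewer than 2 mismatches" rule, come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    community_id: str
    total_floors: int
    rooms: int
    floor_range: str      # e.g. "low", "mid", "high"
    area: float
    price: float
    orientation: str      # hypothetical key attribute
    decoration: str       # hypothetical key attribute


KEY_ATTRIBUTES = ("orientation", "decoration")  # assumed, not from the source


def meets_entry_criteria(a, b):
    """Same community, total floors, room layout, and floor range."""
    return ((a.community_id, a.total_floors, a.rooms, a.floor_range)
            == (b.community_id, b.total_floors, b.rooms, b.floor_range))


def relative_similarity(x, y):
    """1.0 for equal values, falling toward 0.0 as they diverge."""
    largest = max(abs(x), abs(y))
    return 1.0 if largest == 0 else 1.0 - abs(x - y) / largest


def similarity_weight(a, b):
    # Assumed weighting of area and price similarity.
    return (0.6 * relative_similarity(a.area, b.area)
            + 0.4 * relative_similarity(a.price, b.price))


def is_same_property(a, b, threshold=0.95):
    if not meets_entry_criteria(a, b):
        return False
    mismatches = sum(getattr(a, k) != getattr(b, k) for k in KEY_ATTRIBUTES)
    return similarity_weight(a, b) >= threshold and mismatches < 2


a = Listing("C001", 18, 2, "mid", 89.0, 520.0, "south", "refined")
b = Listing("C001", 18, 2, "mid", 89.5, 518.0, "south", "simple")
c = Listing("C001", 18, 3, "mid", 89.0, 520.0, "south", "refined")
d = Listing("C001", 18, 2, "mid", 89.0, 520.0, "north", "simple")
```

Here `a` and `b` differ in one key attribute and are judged the same property, `c` fails the entry criteria because the room layout differs, and `d` is rejected because two key attributes differ.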
Step (3) is specifically: when listings of a property from two channels are aggregated, the listing from the higher-priority channel serves as the benchmark. When a listing from a third channel meets the aggregation condition with those two listings and the third channel has a higher priority, the benchmark is changed to the third channel's listing. The specific aggregation method is:
The area, price, room layout, and floor of each listing are abstracted into a multi-dimensional feature vector. Input: the sample set $D = (x_1, x_2, \ldots, x_n)$, the method for generating the similarity matrix, the dimension $k_1$ after dimensionality reduction, the clustering method, and the clustering dimension $k_2$. Output: the cluster partition $C = (c_1, c_2, \ldots, c_{k_2})$;
1) Construct the similarity matrix S of the samples according to the input generation method;
2) Construct the adjacency matrix W from the similarity matrix S, and construct the degree matrix D;
3) Compute the Laplacian matrix L;
4) Construct the normalized Laplacian matrix $D^{-1/2} L D^{-1/2}$;
5) Compute the eigenvectors f corresponding to the $k_1$ smallest eigenvalues of $D^{-1/2} L D^{-1/2}$;
6) Normalize by rows the matrix formed by the eigenvectors f, giving the $n \times k_1$ feature matrix F;
7) Treat each row of F as a $k_1$-dimensional sample, n samples in total, and cluster them with the input clustering method, with clustering dimension $k_2$;
8) Obtain the cluster partition $C = (c_1, c_2, \ldots, c_{k_2})$. Through the above algorithm, identical listings from different channels are aggregated.
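Steps 1) to 8) follow the usual shape of normalized spectral clustering, and can be sketched with NumPy as below. Several choices are assumptions that the source leaves open: features are z-score standardized first, the similarity matrix uses a Gaussian kernel with width `sigma`, the adjacency matrix is fully connected, the Laplacian is taken as L = D - W, and the clustering method is a small k-means with a deterministic farthest-point start. The sample listings are invented for the example.

```python
import numpy as np


def spectral_embedding(X, k1, sigma=1.0):
    """Steps 1) to 6): return the row-normalized n x k1 feature matrix F."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)  # assumed scaling
    # 1) similarity matrix S (Gaussian kernel, an assumed generation method)
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # 2) adjacency matrix W (fully connected, no self-loops) and degrees
    W = S.copy()
    np.fill_diagonal(W, 0.0)
    degree = W.sum(axis=1)
    # 3) Laplacian L = D - W (assumed unnormalized form)
    L = np.diag(degree) - W
    # 4) normalized Laplacian D^{-1/2} L D^{-1/2}
    inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    # 5) eigenvectors of the k1 smallest eigenvalues (eigh sorts ascending)
    _, vectors = np.linalg.eigh(L_sym)
    F = vectors[:, :k1]
    # 6) normalize each row to unit length
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    return F / np.where(norms == 0, 1.0, norms)


def kmeans_labels(F, k2, iterations=50):
    """Step 7): plain k-means on the rows of F with farthest-point seeding."""
    centers = [F[0]]
    while len(centers) < k2:
        dist = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iterations):
        d2 = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k2):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels


# Columns: area, price, room count, floor (illustrative values only).
listings = [
    [89.0, 520.0, 2, 6], [89.5, 518.0, 2, 6], [88.8, 522.0, 2, 6],
    [120.0, 800.0, 3, 15], [121.0, 795.0, 3, 15], [119.5, 805.0, 3, 15],
]
F = spectral_embedding(listings, k1=2)
labels = kmeans_labels(F, k2=2)  # step 8): cluster partition as labels
```

The row normalization in step 6) matters here: entries of the eigenvectors scale with the square root of each node's degree, so rows belonging to one cluster only coincide after they are normalized to unit length.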
Step (4) is specifically: spot-check the aggregated listings and judge whether each group is really the same property; if not, adjust the wrongly merged listings;
Check the coverage of the unaggregated listings: screen out listings suspected of needing aggregation and decide whether they should be aggregated; if so, adjust the unaggregated listings.
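The spot check in step (4) might be supported by helpers like the ones below. This is a sketch under assumptions: the source does not define how accuracy or coverage is computed, so both ratios, the function names, and the fixed random seed are illustrative. The review verdicts are assumed to come from manual inspection.

```python
import random


def sample_for_review(groups, sample_size, seed=0):
    """Draw a reproducible random sample of aggregated groups to inspect."""
    rng = random.Random(seed)
    return rng.sample(groups, min(sample_size, len(groups)))


def accuracy(verdicts):
    """Share of sampled groups confirmed to be a single property."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def coverage(aggregated_count, missed_count):
    """Share of listings that should be aggregated and actually were."""
    total = aggregated_count + missed_count
    return aggregated_count / total if total else 0.0


groups = [["a1", "b1"], ["a2", "c2"], ["b3", "c3"], ["a4", "b4", "c4"]]
to_review = sample_for_review(groups, sample_size=3)
# True means the reviewer confirmed the group is one property.
spot_check_accuracy = accuracy([True, True, False, True])
estimated_coverage = coverage(aggregated_count=90, missed_count=10)
```

Groups marked False in review would then be split, and suspected misses counted in `missed_count` would be re-examined for aggregation, as the step describes.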
The advantages of the invention are as follows: it merges data from the whole network, so listing information is more comprehensive; it gathers historical data, so a user can see, along both horizontal and vertical dimensions, every state a listing has passed through on the whole network up to the present, together with its price trend; and the user can contact an interested brokerage firm or broker with one click, which greatly improves house-hunting efficiency.
Specific embodiments
The invention is further described below with reference to specific embodiments, and its advantages and features will become clearer from the description. These embodiments are only examples and do not limit the scope of the invention in any way. Those skilled in the art should understand that the details and form of the technical solution may be modified or replaced without departing from the spirit and scope of the invention, and that such modifications and replacements all fall within the protection scope of the invention.
The invention relates to a housing listing aggregation method based on multi-dimensional information similarity calculation, comprising the following steps:
Step (1): crawl the listings of each platform or brokerage firm, clean the listing information, and filter out listings with severely missing information and suspected fake or duplicate listings;
Step (2): identify whether listings from multiple platforms belong to the same property;
Step (3): aggregate the multi-platform listings and select a benchmark;
Step (4): check the accuracy and coverage of the listing aggregation;
Step (5): record each listing's history on each platform (listing events, price increases, and price reductions) in a history table, showing the life cycle of a listing on every platform across the whole network.
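The history table of step (5) can be pictured as an append-only list of events per property and platform. The sketch below is illustrative only: the row fields, the event names, and the use of a Python list in place of a database table are assumptions, and the sample prices and dates are invented.

```python
def record_snapshot(history, property_id, platform, date, price):
    """Append a row when a listing first appears or its price changes."""
    previous = [r for r in history
                if r["property_id"] == property_id and r["platform"] == platform]
    if not previous:
        event = "listed"
    elif price > previous[-1]["price"]:
        event = "price_increase"
    elif price < previous[-1]["price"]:
        event = "price_reduction"
    else:
        return None  # price unchanged, nothing to record
    row = {"property_id": property_id, "platform": platform,
           "date": date, "price": price, "event": event}
    history.append(row)
    return row


def life_cycle(history, property_id):
    """All recorded events for one property across platforms, by date."""
    rows = [r for r in history if r["property_id"] == property_id]
    return sorted(rows, key=lambda r: (r["date"], r["platform"]))


history = []
record_snapshot(history, "P1", "platform_a", "2018-05-01", 520.0)
record_snapshot(history, "P1", "platform_a", "2018-06-01", 510.0)
record_snapshot(history, "P1", "platform_b", "2018-06-10", 515.0)
unchanged = record_snapshot(history, "P1", "platform_a", "2018-07-01", 510.0)
record_snapshot(history, "P1", "platform_a", "2018-08-01", 525.0)
events = [r["event"] for r in life_cycle(history, "P1")]
```

Keying the rows by the aggregated property rather than by each platform's own listing ID is what lets one query return the life cycle across every platform.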
Step (2) is specifically:
Entry criteria: when a listing arrives, first find all listings in the database that are in the same residential community and have the same total number of floors, the same room layout, and the same floor range. These listings meet the entry criteria and proceed to the similarity weight calculation. If the weight calculation meets the condition and fewer than 2 of the key equal-value attributes differ, the listings are judged to be the same property and are aggregated;
In listings from different channels, the name of the same community may differ, but the community ID is identical, so comparing community IDs shows whether two listings belong to the same community. The correspondence between each channel's community names and the community IDs is established by combining the community's geographic location with name similarity;
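One way to combine geographic location with name similarity when mapping a channel's community name to a community ID is sketched below. The distance limit of 300 m, the similarity floor of 0.6, the use of `difflib.SequenceMatcher` as the name-similarity measure, and the sample communities and coordinates are all assumptions for illustration; the source does not specify them.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def resolve_community_id(name, lat, lon, known, max_dist_m=300.0, min_name_sim=0.6):
    """Return the ID of the nearby known community with the most similar name."""
    best_id, best_score = None, 0.0
    for community_id, (known_name, known_lat, known_lon) in known.items():
        if haversine_m(lat, lon, known_lat, known_lon) > max_dist_m:
            continue  # too far away to be the same community
        score = SequenceMatcher(None, name, known_name).ratio()
        if score >= min_name_sim and score > best_score:
            best_id, best_score = community_id, score
    return best_id


# Hypothetical reference table: ID -> (name, latitude, longitude).
known_communities = {
    "C001": ("Sunshine Garden", 39.9000, 116.4000),
    "C002": ("Riverside Court", 39.9500, 116.4500),
}
match = resolve_community_id("Sunshine Garden Phase 1", 39.9003, 116.4002,
                             known_communities)
no_match = resolve_community_id("Sunshine Garden", 40.5000, 117.0000,
                                known_communities)
```

Filtering by distance first keeps two unrelated communities that happen to share a name from being merged, while the name check guards against neighbouring communities being confused.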
Step (3) is specifically: when listings of a property from two channels are aggregated, the listing from the higher-priority channel serves as the benchmark. When a listing from a third channel meets the aggregation condition with those two listings and the third channel has a higher priority, the benchmark is changed to the third channel's listing. The specific aggregation method is:
The area, price, room layout, and floor of each listing are abstracted into a multi-dimensional feature vector. Input: the sample set $D = (x_1, x_2, \ldots, x_n)$, the method for generating the similarity matrix, the dimension $k_1$ after dimensionality reduction, the clustering method, and the clustering dimension $k_2$. Output: the cluster partition $C = (c_1, c_2, \ldots, c_{k_2})$;
1) Construct the similarity matrix S of the samples according to the input generation method;
2) Construct the adjacency matrix W from the similarity matrix S, and construct the degree matrix D;
3) Compute the Laplacian matrix L;
4) Construct the normalized Laplacian matrix $D^{-1/2} L D^{-1/2}$;
5) Compute the eigenvectors f corresponding to the $k_1$ smallest eigenvalues of $D^{-1/2} L D^{-1/2}$;
6) Normalize by rows the matrix formed by the eigenvectors f, giving the $n \times k_1$ feature matrix F;
7) Treat each row of F as a $k_1$-dimensional sample, n samples in total, and cluster them with the input clustering method, with clustering dimension $k_2$;
8) Obtain the cluster partition $C = (c_1, c_2, \ldots, c_{k_2})$. Through the above algorithm, identical listings from different channels are aggregated.
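The benchmark rule at the start of step (3) amounts to keeping, for each aggregated property, the listing from the highest-priority channel seen so far. A minimal sketch follows; the channel names and priority values in `CHANNEL_PRIORITY` and the `AggregatedProperty` class are hypothetical.

```python
# Hypothetical channel priorities; a larger number means higher priority.
CHANNEL_PRIORITY = {"channel_a": 3, "channel_b": 2, "channel_c": 5}


class AggregatedProperty:
    """Listings judged to be one property, with a benchmark listing."""

    def __init__(self):
        self.listings = []
        self.benchmark = None

    def add(self, listing):
        self.listings.append(listing)
        # Switch the benchmark only when the new channel outranks the current one.
        if (self.benchmark is None
                or CHANNEL_PRIORITY[listing["channel"]]
                > CHANNEL_PRIORITY[self.benchmark["channel"]]):
            self.benchmark = listing


prop = AggregatedProperty()
prop.add({"channel": "channel_a", "price": 520.0})
prop.add({"channel": "channel_b", "price": 518.0})
benchmark_after_two = prop.benchmark["channel"]
prop.add({"channel": "channel_c", "price": 519.0})
benchmark_after_three = prop.benchmark["channel"]
```

After the first two listings the benchmark is the one from `channel_a`; once the higher-priority `channel_c` listing joins, the benchmark moves to it, matching the rule described above.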
Step (4) is specifically: spot-check the aggregated listings and judge whether each group is really the same property; if not, adjust the wrongly merged listings;
Check the coverage of the unaggregated listings: screen out listings suspected of needing aggregation and decide whether they should be aggregated; if so, adjust the unaggregated listings.

Claims (4)

1. A housing listing aggregation method based on multi-dimensional information similarity calculation, characterized by comprising the following steps:
Step (1): crawl the listings of each platform or brokerage firm, clean the listing information, and filter out listings with severely missing information and suspected fake or duplicate listings;
Step (2): identify whether listings from multiple platforms belong to the same property;
Step (3): aggregate the multi-platform listings and select a benchmark;
Step (4): check the accuracy and coverage of the listing aggregation;
Step (5): record each listing's history on each platform (listing events, price increases, and price reductions) in a history table, showing the life cycle of a listing on every platform across the whole network.
2. The housing listing aggregation method based on multi-dimensional information similarity calculation according to claim 1, characterized in that step (2) is specifically:
Entry criteria: when a listing arrives, first find all listings in the database that are in the same residential community and have the same total number of floors, the same room layout, and the same floor range. These listings meet the entry criteria and proceed to the similarity weight calculation. If the weight calculation meets the condition and fewer than 2 of the key equal-value attributes differ, the listings are judged to be the same property and are aggregated;
In listings from different channels, the name of the same community may differ, but the community ID is identical, so comparing community IDs shows whether two listings belong to the same community. The correspondence between each channel's community names and the community IDs is established by combining the community's geographic location with name similarity.
3. The housing listing aggregation method based on multi-dimensional information similarity calculation according to claim 1, characterized in that step (3) is specifically: when listings of a property from two channels are aggregated, the listing from the higher-priority channel serves as the benchmark; when a listing from a third channel meets the aggregation condition with those two listings and the third channel has a higher priority, the benchmark is changed to the third channel's listing. The specific aggregation method is:
The area, price, room layout, and floor of each listing are abstracted into a multi-dimensional feature vector. Input: the sample set $D = (x_1, x_2, \ldots, x_n)$, the method for generating the similarity matrix, the dimension $k_1$ after dimensionality reduction, the clustering method, and the clustering dimension $k_2$. Output: the cluster partition $C = (c_1, c_2, \ldots, c_{k_2})$;
1) Construct the similarity matrix S of the samples according to the input generation method;
2) Construct the adjacency matrix W from the similarity matrix S, and construct the degree matrix D;
3) Compute the Laplacian matrix L;
4) Construct the normalized Laplacian matrix $D^{-1/2} L D^{-1/2}$;
5) Compute the eigenvectors f corresponding to the $k_1$ smallest eigenvalues of $D^{-1/2} L D^{-1/2}$;
6) Normalize by rows the matrix formed by the eigenvectors f, giving the $n \times k_1$ feature matrix F;
7) Treat each row of F as a $k_1$-dimensional sample, n samples in total, and cluster them with the input clustering method, with clustering dimension $k_2$;
8) Obtain the cluster partition $C = (c_1, c_2, \ldots, c_{k_2})$. Through the above algorithm, identical listings from different channels are aggregated.
4. The housing listing aggregation method based on multi-dimensional information similarity calculation according to claim 1, characterized in that step (4) is specifically: spot-check the aggregated listings and judge whether each group is really the same property; if not, adjust the wrongly merged listings;
Check the coverage of the unaggregated listings: screen out listings suspected of needing aggregation and decide whether they should be aggregated; if so, adjust the unaggregated listings.

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20181218)