CN117539920A - Data query method and system based on real estate transaction multidimensional data - Google Patents

Info

Publication number
CN117539920A
Authority
CN
China
Prior art keywords
data
transaction
abnormal
real estate
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410012977.9A
Other languages
Chinese (zh)
Other versions
CN117539920B (en)
Inventor
刘煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tuli Information Technology Co ltd
Original Assignee
Shanghai Tuli Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tuli Information Technology Co ltd
Priority to CN202410012977.9A
Publication of CN117539920A
Application granted
Publication of CN117539920B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/16 Real estate
    • G06Q50/163 Real estate management

Abstract

The invention relates to the technical field of data query, and in particular to a data query method and system based on multidimensional real estate transaction data. The method comprises the following steps: obtaining original real estate transaction data; detecting outliers in the original real estate transaction data to generate an anomaly-flag data set and normalized real estate transaction data; repairing the data in the anomaly-flag data set to generate repaired normalized real estate transaction data; standardizing the repaired normalized real estate transaction data to generate standard real estate transaction data; creating nodes and edges from the standard real estate transaction data to generate real estate transaction nodes; indexing the real estate transaction nodes to generate node index data; and partitioning the graph data of the real estate transaction nodes according to the node index data to generate transaction partition optimization data. Through outlier processing, graph data modeling, cross-domain data utilization and transfer learning, the method improves the accuracy and predictive capability of multidimensional real estate transaction data queries.

Description

Data query method and system based on real estate transaction multidimensional data
Technical Field
The invention relates to the technical field of data query, in particular to a data query method and system based on real estate transaction multidimensional data.
Background
In the past, real estate transaction data was archived on paper. Paper archives make it difficult to manage and retrieve large amounts of information, and working with them is time consuming and error prone. With the development of computer technology, data began to be stored digitally, which simplified storage and retrieval, but queries were still limited to basic text search. Later, as database technology developed, relational database management systems (RDBMS), such as SQL databases, emerged and provided more powerful query functions. However, conventional SQL queries are inefficient when processing multidimensional and large-scale data, which limits flexible exploration of complex data patterns. With the rise of big data technology, NoSQL databases and distributed computing frameworks such as Hadoop and Spark opened up new possibilities for processing large-scale real estate transaction data. These technologies process massive data sets and run queries in parallel through distributed computing, which speeds up data processing and analysis. However, current real estate transaction data involves complex multidimensional relationships, including geographic location, economic factors and so on. Existing techniques struggle to capture and analyze these relationships fully, so query results may be incomplete and query accuracy suffers.
Disclosure of Invention
In view of this, it is necessary to provide a data query method and system based on multidimensional real estate transaction data, so as to solve at least one of the above technical problems.
To achieve the above purpose, the invention provides a data query method and system based on real estate transaction multidimensional data. The method comprises the following steps:
Step S1: obtaining original real estate transaction data; detecting outliers in the original real estate transaction data to generate an anomaly-flag data set and normalized real estate transaction data; repairing the data in the anomaly-flag data set to generate repaired normalized real estate transaction data; and standardizing the repaired normalized real estate transaction data to generate standard real estate transaction data;
Step S2: creating nodes and edges from the standard real estate transaction data to generate real estate transaction nodes; indexing the real estate transaction nodes to generate node index data; partitioning the graph data of the real estate transaction nodes according to the node index data to generate transaction partition optimization data; and connecting the transaction partition optimization data into a network to generate a distributed real estate transaction network graph;
Step S3: building a graph neural network model from the distributed real estate transaction network graph to generate a real estate transaction graph neural network model; performing multidimensional feature vector label processing on the real estate transaction graph neural network model to generate real estate transaction clustering result labels; and evaluating the clustering performance of the clustering result labels to generate real estate transaction clustering performance evaluation data;
Step S4: acquiring cross-domain real estate transaction data and user query demand data; training a prediction model on the cross-domain real estate transaction data and the real estate transaction clustering performance evaluation data to generate a real estate transaction transfer learning model; and importing the user query demand data into the real estate transaction transfer learning model for query prediction to generate real estate transaction prediction result data.
By identifying and handling outliers in step S1, the invention improves data quality and reduces the negative influence of outliers on the model. Outliers may result from data entry errors, system failures or other anomalies; excluding them lets the model learn the patterns in the data more accurately, improves its robustness and generalization, and reduces its sensitivity to abnormal conditions. The anomaly-flag data set provides the information needed for subsequent repair of the outliers, and repairing the flagged data fills in missing or erroneous values and improves data integrity. Normalizing the real estate transaction data ensures consistency and comparability. Standardization removes the effect of differing units and scales between features, which helps model training converge faster and more stably; rescaling each feature to zero mean and unit variance particularly benefits distance-based models.
In step S2, creating nodes and edges for the real estate transactions represents the relationships between transactions in a richer graph data structure. The node index improves the retrieval efficiency of graph data and speeds up graph algorithms. Partitioning the graph data makes distributed computation more efficient, reduces communication overhead and improves computing performance. The transaction partition optimization data optimizes the distribution of the graph data and increases the parallelism of graph processing; connecting the partitioned data into a network yields a distributed real estate transaction network graph that better reflects the global structure and associations of the data and scales to larger data sets.
In step S3, the graph neural network (GNN) can effectively capture complex relations and structures in graph data and is therefore suited to a real estate transaction network graph. Multidimensional feature vectors describe the real estate transaction nodes more comprehensively, and label processing gives the model a supervised learning target. Clustering result labels group similar transaction nodes together, which helps in understanding the different segments of the real estate market. Evaluating clustering performance gives an objective measure of how effective the model is and how well it fits the transaction data, and it guides subsequent model optimization.
In step S4, combining cross-domain real estate transaction data with the clustering performance evaluation data provides more comprehensive information for training the transfer learning model, which uses existing knowledge to speed up learning in a new domain or task and improves the accuracy and adaptability of prediction. Importing the user query demand data into the transfer learning model yields personalized predictions and recommendations, and the resulting prediction data gives decision makers, investors and users targeted information and advice.
Therefore, the accuracy and predictive capability of multidimensional real estate transaction data queries are improved through outlier processing, graph data modeling, cross-domain data utilization and transfer learning.
Drawings
FIG. 1 is a schematic flow chart of the steps of the data query method based on real estate transaction multidimensional data;
FIG. 2 is a flowchart illustrating the detailed implementation of step S2 in FIG. 1;
FIG. 3 is a flowchart illustrating the detailed implementation of step S3 in FIG. 1;
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed Description
The following is a clear and complete description of the technical solution of the present invention in conjunction with the accompanying drawings. The described embodiments are evidently some, but not all, embodiments of the present invention. All other embodiments that those skilled in the art can obtain from the embodiments of the present invention without inventive effort are intended to fall within the scope of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
To achieve the above objective, and with reference to FIGS. 1 to 3, the invention provides a data query method and system based on real estate transaction multidimensional data. The method comprises the following steps:
Step S1: obtaining original real estate transaction data; detecting outliers in the original real estate transaction data to generate an anomaly-flag data set and normalized real estate transaction data; repairing the data in the anomaly-flag data set to generate repaired normalized real estate transaction data; and standardizing the repaired normalized real estate transaction data to generate standard real estate transaction data;
Step S2: creating nodes and edges from the standard real estate transaction data to generate real estate transaction nodes; indexing the real estate transaction nodes to generate node index data; partitioning the graph data of the real estate transaction nodes according to the node index data to generate transaction partition optimization data; and connecting the transaction partition optimization data into a network to generate a distributed real estate transaction network graph;
Step S3: building a graph neural network model from the distributed real estate transaction network graph to generate a real estate transaction graph neural network model; performing multidimensional feature vector label processing on the real estate transaction graph neural network model to generate real estate transaction clustering result labels; and evaluating the clustering performance of the clustering result labels to generate real estate transaction clustering performance evaluation data;
Step S4: acquiring cross-domain real estate transaction data and user query demand data; training a prediction model on the cross-domain real estate transaction data and the real estate transaction clustering performance evaluation data to generate a real estate transaction transfer learning model; and importing the user query demand data into the real estate transaction transfer learning model for query prediction to generate real estate transaction prediction result data.
In an embodiment of the present invention, FIG. 1 shows a flow chart of the steps of the data query method based on real estate transaction multidimensional data. In this example, the method comprises the following steps:
Step S1: obtaining original real estate transaction data; detecting outliers in the original real estate transaction data to generate an anomaly-flag data set and normalized real estate transaction data; repairing the data in the anomaly-flag data set to generate repaired normalized real estate transaction data; and standardizing the repaired normalized real estate transaction data to generate standard real estate transaction data;
In this embodiment, real estate transaction data is obtained from relevant institutions, databases or other data sources. The original real estate transaction data may cover multiple dimensions such as house price, transaction date, geographic location and floor area. Outliers are detected using statistical or machine learning methods such as the Z-score, box plots or isolation forests. The detected outliers are marked to create the anomaly-flag data set, which identifies the outliers in the original data. Normalizing the real estate transaction data may include scaling numerical features to a fixed range (e.g., 0 to 1), handling missing values and handling discrete values; this produces normalized real estate transaction data that is easier to process and analyze in later stages. The flagged data is repaired using methods such as interpolation, regression, or mean or median substitution, giving the repaired normalized real estate transaction data. The repaired data is then standardized by subtracting the mean and dividing by the standard deviation, so that each feature has zero mean and unit variance, which yields the standard real estate transaction data.
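The flag, repair and standardize sequence of step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a single numeric feature, uses the Z-score (one of the detection methods named above) with a hypothetical threshold of 2.0, and repairs by median substitution. The function names and the sample prices are invented for the example.

```python
from statistics import mean, median, pstdev

def flag_outliers(values, threshold=2.0):
    """Return the indices whose absolute Z-score exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

def repair(values, flagged):
    """Replace flagged entries with the median of the unflagged entries."""
    flagged = set(flagged)
    clean = [v for i, v in enumerate(values) if i not in flagged]
    fill = median(clean)
    return [fill if i in flagged else v for i, v in enumerate(values)]

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical prices (in thousands); the sixth entry looks like an entry error.
prices = [310, 295, 305, 300, 290, 3000, 315, 298]
flagged = flag_outliers(prices)        # plays the role of the anomaly-flag data set
repaired = repair(prices, flagged)     # repaired data
standard = standardize(repaired)       # zero mean, unit variance
```

With very small samples a Z-score can never grow large, which is why the threshold here is lower than the commonly quoted value of 3; a box plot (IQR) rule or an isolation forest would be reasonable alternatives.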
Step S2: creating nodes and edges from the standard real estate transaction data to generate real estate transaction nodes; indexing the real estate transaction nodes to generate node index data; partitioning the graph data of the real estate transaction nodes according to the node index data to generate transaction partition optimization data; and connecting the transaction partition optimization data into a network to generate a distributed real estate transaction network graph;
In this embodiment, the standard real estate transaction data is converted into nodes and edges in a graph database. Each real estate transaction is a node that carries the transaction's attributes, such as price, date and geographic location. Edges represent relationships between transactions; for example, if two transactions involve the same property, an edge can be created between the two nodes. To improve node retrieval efficiency and speed up graph algorithms, an index is created for the real estate transaction nodes, generally keyed on a unique identifier for fast lookup; the resulting node index data records each node identifier and the corresponding node location. The whole graph is divided into multiple partitions for parallel processing on a distributed system: a graph partitioning algorithm assigns nodes to partitions while taking the connections between nodes into account, so as to minimize the number of edges that cross partitions. The resulting transaction partition optimization data records the partition to which each node belongs and the connections between partitions. Finally, the partitioned graph data is joined into a complete distributed graph: a network connection algorithm links nodes across partitions to preserve the integrity of the graph, producing a distributed real estate transaction network graph that can be processed and analyzed in parallel on a distributed system.
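As a rough sketch of step S2, the code below builds nodes and edges from transactions, keeps a dictionary as the node index, and assigns nodes to partitions with a simple greedy heuristic that prefers the partition already holding a node's neighbours. The patent does not name a specific partitioning algorithm, so the greedy rule, the field names (`id`, `property_id`) and the sample data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def build_graph(transactions):
    """One node per transaction; an edge links transactions on the same property."""
    nodes = {t["id"]: t for t in transactions}  # node index: identifier -> attributes
    by_property = defaultdict(list)
    for t in transactions:
        by_property[t["property_id"]].append(t["id"])
    edges = set()
    for ids in by_property.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                edges.add((ids[i], ids[j]))
    return nodes, edges

def partition(nodes, edges, k):
    """Greedy assignment: favour the partition with most neighbours, capped in size."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    cap = math.ceil(len(nodes) / k)  # keeps partitions balanced
    load = [0] * k
    assignment = {}
    for node in nodes:
        score = [0] * k
        for n in neighbours[node]:
            if n in assignment:
                score[assignment[n]] += 1
        open_parts = [p for p in range(k) if load[p] < cap]
        best = max(open_parts, key=lambda p: (score[p], -load[p]))
        assignment[node] = best
        load[best] += 1
    return assignment

def cut_edges(edges, assignment):
    """Edges whose endpoints live in different partitions (cross-partition links)."""
    return sorted(e for e in edges if assignment[e[0]] != assignment[e[1]])

# Hypothetical transactions: two sales each of properties A and B.
transactions = [
    {"id": "t1", "property_id": "A", "price": 300},
    {"id": "t2", "property_id": "A", "price": 320},
    {"id": "t3", "property_id": "B", "price": 150},
    {"id": "t4", "property_id": "B", "price": 155},
]
nodes, edges = build_graph(transactions)
assignment = partition(nodes, edges, k=2)
crossing = cut_edges(edges, assignment)
```

The list returned by `cut_edges` corresponds to the connection information between partitions; in a distributed deployment these are the edges that would require communication across machines.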
Step S3: building a graph neural network model from the distributed real estate transaction network graph to generate a real estate transaction graph neural network model; performing multidimensional feature vector label processing on the real estate transaction graph neural network model to generate real estate transaction clustering result labels; and evaluating the clustering performance of the clustering result labels to generate real estate transaction clustering performance evaluation data;
In this embodiment, the distributed real estate transaction network graph is modeled to capture the complex relations among nodes. A suitable graph neural network architecture, such as a graph convolutional network (GCN) or GraphSAGE, is selected; its input is the distributed real estate transaction network graph and its output is a feature vector for each node. Other key features in the real estate transaction data (such as price and geographic location) are merged into the feature vectors generated by the graph neural network to form comprehensive multidimensional feature vectors that represent each node. The multidimensional feature vectors are clustered with an algorithm such as K-means or DBSCAN to generate the real estate transaction clustering result labels, with each node assigned to one cluster. The clustering results are evaluated with clustering performance metrics such as the silhouette coefficient and mutual information; performance under different clustering algorithms or parameter settings is compared and the best clustering model is selected.
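The following sketch illustrates two pieces of step S3 under strong simplifications. `aggregate` performs one round of neighbour mean aggregation, which is the core idea behind a GraphSAGE-style mean aggregator but without the learned weights or nonlinearity of a real graph neural network. `silhouette` computes the mean silhouette coefficient so that two candidate labelings can be compared. The node names, features and labelings are hypothetical, and the labels are supplied by hand rather than produced by K-means or DBSCAN.

```python
import math

def aggregate(features, adjacency):
    """Average each node's feature vector with those of its neighbours."""
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in adjacency.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

def silhouette(vectors, labels):
    """Mean silhouette coefficient; expects at least two clusters."""
    clusters = set(labels.values())
    scores = []
    for i, vi in vectors.items():
        def mean_dist(cluster):
            d = [math.dist(vi, vectors[j]) for j in vectors
                 if j != i and labels[j] == cluster]
            return sum(d) / len(d) if d else None
        a = mean_dist(labels[i])  # cohesion: mean distance within own cluster
        others = [mean_dist(c) for c in clusters if c != labels[i]]
        others = [o for o in others if o is not None]
        if a is None or not others:
            scores.append(0.0)  # convention for singleton clusters
            continue
        b = min(others)  # separation: nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical two-feature nodes; a-b and c-d are linked transactions.
features = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0], "d": [10.0, 11.0]}
adjacency = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
embeddings = aggregate(features, adjacency)

# Two candidate labelings, as if produced by different clustering settings.
candidate_1 = {"a": 0, "b": 0, "c": 1, "d": 1}
candidate_2 = {"a": 0, "c": 0, "b": 1, "d": 1}
best = max([candidate_1, candidate_2], key=lambda lab: silhouette(embeddings, lab))
```

Selecting the labeling with the higher silhouette coefficient mirrors the comparison of clustering algorithms or parameter settings described above; a production system would more likely rely on an established library for both the graph model and the metric.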
Step S4: acquiring cross-domain property transaction data and user query demand data; performing prediction model training on cross-domain property transaction data and property transaction clustering performance evaluation data to generate a property transaction transfer learning model; and importing the user query demand data into a property transaction transfer learning model to perform query prediction, and generating property transaction prediction result data.
In the embodiment of the invention, property transaction data from different domains are collected to ensure the diversity and breadth of the data, and the data are organized, including prices, geographic locations, property characteristics and the like. The user query demand data are collected and may include information such as the user's specific requirements for the property and the budget. The pre-trained graph neural network model and the clustering performance evaluation data are used as the base model; on this basis, the model is fine-tuned on data from the new domain by a transfer learning method so as to adapt to the new data distribution, and a comprehensive transfer learning model is trained that can make predictions on property transaction data from different domains. The user query demand data are converted into a format acceptable to the model, which includes converting them into feature vectors; the feature vectors are input into the trained transfer learning model for prediction, and a prediction result is obtained, which may include information such as predicted property labels and price ranges.
Preferably, step S1 comprises the steps of:
step S11: the method comprises the steps of utilizing a distributed crawler to conduct data extraction on a real estate transaction data source to obtain original real estate transaction data;
step S12: performing data structure conversion processing on the original real estate transaction data according to a preset data standard format to generate real estate transaction structure conversion data;
step S13: detecting abnormal values of the property transaction structure conversion data, and when the abnormal value detection result of the property transaction structure conversion data is true, performing abnormal marking on the corresponding property transaction structure conversion data to generate an abnormal marking data set; when the abnormal value detection result of the real estate transaction structure conversion data is false, carrying out data normalization on the corresponding real estate transaction structure conversion data to generate real estate transaction normalization data;
step S14: carrying out data quality problem identification on the abnormal mark data set to generate abnormal type data; performing data restoration on the abnormal mark data set according to the abnormal type data, so as to generate real estate transaction restoration data;
step S15: carrying out data normalization on the real estate transaction repair data to generate real estate transaction normalization data; and carrying out data standardization on the real estate transaction normalized data based on a Z-score standardization method to generate standard real estate transaction data.
The invention utilizes the distributed crawler to extract the data of the real estate transaction data source, ensures that comprehensive and accurate original real estate transaction data is obtained, and establishes the basis of data analysis and processing. And carrying out structural conversion on the original data according to a preset data standard format, thereby being beneficial to unifying the data formats and improving the consistency and comparability of the data. This facilitates subsequent data analysis and mining work. By detecting abnormal values of the property transaction structure conversion data, possible erroneous or fraudulent transactions can be found in time. This helps to improve the quality and reliability of the data. The normal property transaction structure conversion data is normalized, so that the dimensional influence among the data can be eliminated, and the data is easier to compare and analyze. This helps to improve the training effect and result interpretation of the model. And the quality problem of the abnormal marked data set is identified and repaired, so that the marked data due to the abnormality can be restored, and the integrity of the whole data is improved. This is critical for subsequent analysis and decision making. Normalization of the data based on the Z-score normalization method can convert the data into a score based on a standard normal distribution, facilitating comparison between data of different scales. This improves the interpretability of the data and the stability of the model.
In the embodiment of the invention, a suitable distributed crawler tool such as Scrapy or Apache Nutch is selected to improve efficiency when processing large-scale data. The real estate transaction data source from which data are to be extracted is determined; it may be a real estate transaction website, a government data publishing platform and the like. The crawler is configured by setting parameters such as the crawling frequency and the target data structure, and the distributed crawler is then run to extract the original real estate transaction data. A data standard format is formulated, including field definitions, data types and so on. The original data are cleaned to handle missing values, inconsistently formatted data and the like, and the cleaned data are converted according to the preset standard format to form the real estate transaction structure conversion data. Abnormal values are detected with a statistical method or a machine learning model: if the detection result is true, the corresponding data are marked and an abnormal marking data set is generated; if the detection result is false, the data proceed to data normalization. A data normalization method such as Min-Max scaling or Z-score standardization is used so that the data fall within a fixed range or have zero mean and unit variance. The abnormal marking data set is analyzed to identify the quality problems of the abnormal data and determine the abnormal type data, and a suitable repair strategy, such as filling missing values or deleting abnormal data, is adopted according to the abnormal type data to generate the real estate transaction repair data. The repaired data are normalized again to keep the data processing consistent, and are then standardized with the Z-score method to generate the standard real estate transaction data.
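The outlier check, Min-Max scaling and Z-score standardization described above can be sketched as follows. This assumes a simple Z-score rule as the statistical detection method; the threshold of 2.0 and the sample prices are illustrative assumptions, and the function names are chosen here rather than taken from the patent.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standard scores: (value - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def split_outliers(values, threshold=3.0):
    """Return (normal, flagged), where flagged holds values whose
    absolute Z-score exceeds the threshold."""
    scores = z_scores(values)
    normal = [v for v, z in zip(values, scores) if abs(z) <= threshold]
    flagged = [v for v, z in zip(values, scores) if abs(z) > threshold]
    return normal, flagged

# Hypothetical unit prices; the last one is an obvious entry error.
prices = [100, 102, 98, 101, 99, 1000]
normal, flagged = split_outliers(prices, threshold=2.0)
scaled = min_max_scale(normal)      # values in [0, 1]
standardized = z_scores(normal)     # zero mean, unit variance
```

With very small samples the largest attainable Z-score is bounded, so a threshold of 3 would flag nothing here; in practice the threshold, or a more robust statistic, would be chosen to suit the data volume.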
Preferably, step S14 includes the steps of:
step S141: performing anomaly type classification on the anomaly tagging data set to generate an anomaly type classification data set, wherein the anomaly type classification comprises a data format anomaly type, a data missing anomaly type, a data time sequence anomaly type, a geographic location data anomaly type and a fraudulent transaction anomaly type;
step S142: when the abnormal type classification data set is determined to be of the abnormal type of the data format, performing abnormal format conversion restoration on the corresponding abnormal type classification data to generate format abnormal restoration data;
step S143: when the abnormal type classification data set is determined to be of the data missing abnormal type, carrying out missing rate evaluation on the corresponding abnormal type classification data to generate missing rate evaluation data; comparing the missing rate evaluation data with a preset missing threshold, and performing variable deletion on the abnormal type classification data corresponding to the missing rate evaluation data when the missing rate evaluation data is greater than or equal to the preset missing threshold; when the missing rate evaluation data is smaller than the preset missing threshold, carrying out interpolation filling repair processing on the abnormal type classification data corresponding to the missing rate evaluation data, so that missing abnormal repair data are generated;
Step S144: when the abnormal type classification data set is determined to be of the data time sequence abnormal type, carrying out seasonal component decomposition on the corresponding abnormal type classification data according to a time sequence decomposition technology to generate seasonal component decomposition data; carrying out data trend prediction on seasonal component decomposition data based on a trend time sequence model, and generating data trend prediction data; historical data collection is carried out on seasonal component decomposition data, and historical seasonal decomposition data are generated; performing abnormal time point correction on the abnormal type classification data by utilizing the data trend prediction data and the historical seasonal decomposition data to generate time sequence abnormal repair data;
step S145: when the abnormal type classification data set is determined to be of the abnormal type of the geographic position data, performing abnormal geographic position information verification on the corresponding abnormal type classification data through the geographic coding service to generate abnormal geographic information data; removing abnormal transaction positions of the abnormal geographic information data to generate geographic abnormal repair data;
step S146: when the abnormal type classification data set is determined to be of the fraudulent transaction abnormal type, carrying out amount anomaly identification on the corresponding abnormal type classification data to generate abnormal amount data; carrying out abnormal transaction pattern identification on the abnormal type classification data according to the abnormal amount data to generate abnormal transaction identification data; carrying out abnormal value detection on the abnormal amount data and the abnormal transaction identification data by utilizing a transaction abnormal value quantification formula to generate an abnormal transaction risk assessment value; performing data blocking on the abnormal type classification data according to the abnormal transaction risk assessment value so as to generate identity abnormal repair data;
Step S147: and carrying out data combination on the format abnormal repair data, the missing abnormal repair data, the time sequence abnormal repair data, the geographic abnormal repair data and the identity abnormal repair data to generate real estate transaction repair data.
The invention can better understand the abnormal data property by classifying the abnormal data. This helps to take specific repair strategies for different types of anomalies, improving the accuracy of the repair. Repairing the data format abnormality is helpful to ensure the consistency and normalization of the data and improve the data quality. By evaluating the deletion rate and applying a corresponding repair strategy, the influence of data deletion on analysis and modeling can be reduced, and the integrity of data is ensured. By seasonal component decomposition and trend prediction, it is expected to more accurately repair anomalies in time series data, and the reliability of analysis of time series data is improved. By verifying and excluding the abnormal geographic position information, the accuracy of the geographic position data can be improved, and the authenticity and consistency of the geographic information can be ensured. By identifying and evaluating fraudulent transactions, the security and confidence of the data may be improved. Blocking out abnormal transactions may prevent potential fraud. And combining all the repair data into a complete real estate transaction repair data set, so that subsequent analysis and application are convenient.
In the embodiment of the invention, abnormal type classification is performed on the abnormal marking data set to generate an abnormal type classification data set covering the following abnormal types: data format abnormal type, data missing abnormal type, data time sequence abnormal type, geographic location data abnormal type and fraudulent transaction abnormal type. When the abnormal type classification data set is of the data format abnormal type, abnormal format conversion repair is performed on the abnormal type classification data to generate format abnormal repair data. When the abnormal type classification data set is of the data missing abnormal type, missing rate evaluation is performed on the abnormal type classification data to generate missing rate evaluation data, which are compared with a preset missing threshold: if the missing rate is greater than or equal to the preset missing threshold, variable deletion is performed on the corresponding abnormal type classification data; if it is smaller than the preset missing threshold, interpolation filling repair is performed on the corresponding abnormal type classification data to generate missing abnormal repair data. When the abnormal type classification data set is of the data time sequence abnormal type, seasonal component decomposition is performed on the abnormal type classification data using a time series decomposition technique to generate seasonal component decomposition data; data trend prediction is performed on the seasonal component decomposition data based on a trend time series model to generate data trend prediction data; and historical data collection is performed on the seasonal component decomposition data to generate historical seasonal decomposition data.
The abnormal time points of the abnormal type classification data are then corrected using the data trend prediction data and the historical seasonal decomposition data to generate time sequence abnormal repair data. When the abnormal type classification data set is of the geographic location data abnormal type, abnormal geographic location information verification is performed on the abnormal type classification data through a geocoding service to generate abnormal geographic information data, and abnormal transaction locations are removed from the abnormal geographic information data to generate geographic abnormal repair data. When the abnormal type classification data set is of the fraudulent transaction abnormal type, amount anomaly identification is performed on the corresponding abnormal type classification data to generate abnormal amount data; abnormal transaction pattern identification is performed on the abnormal type classification data according to the abnormal amount data to generate abnormal transaction identification data; abnormal value detection is performed on the abnormal amount data and the abnormal transaction identification data using the transaction abnormal value quantification formula to generate an abnormal transaction risk assessment value; and data blocking is performed on the abnormal type classification data according to the abnormal transaction risk assessment value to generate identity abnormal repair data. Finally, the format abnormal repair data, the missing abnormal repair data, the time sequence abnormal repair data, the geographic abnormal repair data and the identity abnormal repair data are combined to generate the real estate transaction repair data.
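The missing rate branch of step S143 can be sketched as follows. The threshold of 0.5, the column names and the use of mean filling as a simple stand-in for interpolation filling are all assumptions made for illustration.

```python
def repair_missing(rows, threshold=0.5):
    """rows: list of dicts sharing the same keys; None marks a missing value.

    Columns whose missing rate is >= threshold are deleted (variable
    deletion); other columns with gaps are filled with the column mean.
    Returns (repaired_rows, dropped_columns).
    """
    columns = list(rows[0].keys())
    repaired = [dict(r) for r in rows]  # copy so the input is not modified
    dropped = []
    for col in columns:
        values = [r[col] for r in rows]
        missing_rate = sum(v is None for v in values) / len(values)
        if missing_rate >= threshold:
            for r in repaired:
                del r[col]
            dropped.append(col)
        elif missing_rate > 0:
            present = [v for v in values if v is not None]
            fill = sum(present) / len(present)  # mean filling (assumption)
            for r in repaired:
                if r[col] is None:
                    r[col] = fill
    return repaired, dropped

# Hypothetical records: "floor" is mostly missing, "price" has one gap.
records = [
    {"price": 100.0, "floor": None, "area": 80.0},
    {"price": None, "floor": None, "area": 90.0},
    {"price": 120.0, "floor": 3, "area": 100.0},
    {"price": 110.0, "floor": None, "area": 95.0},
]
repaired, dropped = repair_missing(records, threshold=0.5)
```

In this example `floor` has a missing rate of 0.75 and is deleted, while the single gap in `price` (missing rate 0.25) is filled.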
Preferably, step S145 includes the steps of:
step S1451: when the abnormal type classification data set is determined to be of the abnormal type of the geographic position data, geographic information extraction is carried out on the corresponding abnormal type classification data through a geographic coding service, and geographic information coding abnormal data are generated;
step S1452: performing abnormal section locking on the geographic information coding abnormal data to generate abnormal section range data; performing real estate location coordinate calibration on the abnormal region range data to generate abnormal region real estate coordinate data;
step S1453: locking the real geographic section of the geographic information coding abnormal data by using GPS to generate real section range data; performing property location coordinate calibration on the real section range data to generate real section property coordinate data;
step S1454: comparing the abnormal section property coordinate data with the real section property coordinate data in position information to generate property section correction data; and carrying out position information repair on the abnormal type classification data through the property section correction data, thereby generating geographic abnormal repair data.
According to the invention, determining that the abnormal type classification data set is of the geographic location data abnormal type makes explicit that geographic location anomalies exist in the data set being processed; these may be caused by input errors, missing data or other reasons. The data of this abnormal type are processed with a geocoding service to extract geographic information and generate geographic information coding abnormal data, which helps to convert unstructured geographic location data into a more manageable coded form. Abnormal section locking is performed on the geographic information coding abnormal data to generate abnormal section range data, that is, to determine the specific geographic area in the data where a problem may exist. After the abnormal section is determined, the section range data are processed and the location coordinates of each property in the section are calibrated to generate abnormal section property coordinate data, which helps to further analyze and locate potential problems. GPS data are used to verify the geographic information coding abnormal data and lock their real geographic section. After the real section is determined, the section range data are processed and the location coordinates of each property in the section are calibrated to generate real section property coordinate data, which helps to establish more accurate reference data. Correction data are generated by comparing the property coordinate data of the abnormal section and the real section, recording the differences in location information. The abnormal type classification data are repaired with the correction data so that the geographic location information is more accurate, which helps to improve the quality and accuracy of the overall data set.
In the embodiment of the invention, a suitable geocoding service, such as the Google Maps Geocoding API or the Baidu Maps API, is selected to extract geographic information from the abnormal type classification data, and the geographic information coding abnormal data are generated through the geocoding service. Using the geographic information coding abnormal data, the abnormal section is locked with a Geographic Information System (GIS) tool or a related library, and the property locations within the locked abnormal section range are calibrated to coordinates to generate the abnormal section property coordinate data. The real geographic section of the geographic information coding abnormal data is locked using GPS technology, and the real property locations within the real section range are calibrated with GPS coordinates to generate the real section property coordinate data. The position information of the abnormal section property coordinate data is compared with that of the real section property coordinate data, the difference between the coordinates is calculated by an algorithm (such as Euclidean distance calculation), and the property section correction data are generated. The position information of the abnormal type classification data is repaired using the property section correction data; methods such as interpolation or averaging may be adopted, and the geographic location information of the abnormal data is adjusted according to the correction data to generate the geographic abnormal repair data.
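The coordinate comparison and repair of step S1454 can be sketched as follows. The sketch assumes planar (projected) coordinates so that Euclidean distance is meaningful; the tolerance value, the property identifiers and the function names are hypothetical.

```python
import math

def section_correction(abnormal, real):
    """Compare abnormal and reference coordinates per property.

    abnormal, real: dict mapping property id -> (x, y) in a projected
    planar coordinate system (an assumption of this sketch).
    Returns the per-property offset and its Euclidean length.
    """
    corrections = {}
    for pid, (ax, ay) in abnormal.items():
        if pid not in real:
            continue  # no reference coordinate available
        rx, ry = real[pid]
        dx, dy = rx - ax, ry - ay
        corrections[pid] = {"dx": dx, "dy": dy, "distance": math.hypot(dx, dy)}
    return corrections

def apply_correction(abnormal, corrections, tolerance=1.0):
    """Shift coordinates whose deviation exceeds the tolerance."""
    repaired = {}
    for pid, (x, y) in abnormal.items():
        c = corrections.get(pid)
        if c is not None and c["distance"] > tolerance:
            repaired[pid] = (x + c["dx"], y + c["dy"])
        else:
            repaired[pid] = (x, y)
    return repaired

# Hypothetical coordinates in metres.
abnormal_coords = {"P1": (100.0, 200.0), "P2": (50.0, 50.0)}
real_coords = {"P1": (103.0, 204.0), "P2": (50.0, 50.5)}
corrections = section_correction(abnormal_coords, real_coords)
repaired_coords = apply_correction(abnormal_coords, corrections, tolerance=1.0)
```

Here `P1` deviates by 5 units and is moved onto the reference position, while the 0.5 unit deviation of `P2` is within tolerance and is left unchanged. For latitude and longitude pairs a great-circle distance would be the more appropriate measure.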
Preferably, the transaction outlier quantization formula in step S146 is specifically as follows:
where R denotes the abnormal transaction risk assessment value, α denotes the parameter controlling the response speed, β denotes the transaction amount weight parameter, γ denotes the importance coefficient of the historical transaction integral term, δ denotes the parameter controlling the response speed of the integral term, x(s) denotes the transaction amount at the maximum transaction time s, x(u) denotes the maximum transaction amount at time u, du denotes the differential of the time u, ds denotes the differential of the maximum transaction time s, u denotes the transaction time, s denotes the maximum transaction time, and μ denotes the transaction abnormal value quantization adjustment value.
The invention constructs a transaction abnormal value quantification formula. The principle of the formula is to evaluate the abnormal transaction risk at the current time point by performing a response-weighted integral over the transaction amount data and combining it with the integral term of historical transactions. The exponential decay term e^(-αs) in the formula indicates that the response to the transaction amount gradually fades over time. By weighting the integral of the historical transactions, the impact of past transactions on the current risk can be taken into account. The correlation between the parameter controlling the response speed and the other parameters forms a functional relation:
The parameter α controls the response speed: a larger α gives a faster response, makes the assessment more sensitive to the effects of abnormal transactions, and facilitates their rapid detection. The transaction amount weight parameter β multiplies the transaction amount x(s) and thereby adjusts the degree of influence of the transaction amount on the risk assessment; a larger β increases the weight of the transaction amount, so that larger transaction amounts contribute more to the risk assessment. The importance coefficient γ multiplies the historical transaction integral term and thereby controls the degree of influence of historical transactions; a larger γ increases the importance of historical transactions, so that they contribute more to the risk assessment. The parameter δ controls the response speed of the integral term: a larger δ makes the integral term respond faster, that is, the integral over historical transactions is updated more quickly, so that the influence of historical transactions is reflected in time. The transaction abnormal value quantization adjustment value μ is used to correct errors and deviations caused by the complexity and non-ideality of the actual system.
The adjustment value can correct the difference between the theoretical assumptions in the formula and the actual system, improving the accuracy and reliability of the transaction abnormal value quantification so that the abnormal transaction risk assessment value R is generated more accurately. At the same time, parameters in the formula such as the parameter controlling the response speed of the integral term and the importance coefficient of the historical transaction integral term can be adjusted according to actual conditions, so that the formula adapts to different transaction abnormal value quantification scenarios and the applicability and flexibility of the algorithm are improved. An abnormal transaction risk assessment value can also be obtained with a transaction abnormal value quantification formula conventional in the art, but applying the transaction abnormal value quantification formula provided by the invention allows the abnormal transaction risk assessment value to be calculated more accurately.
Preferably, step S2 comprises the steps of:
step S21: performing node-edge data conversion on the standard real estate transaction data to generate node-edge relationship data;
step S22: node creation is carried out on the node-edge relationship data by utilizing a distributed database, and real estate transaction nodes are generated; establishing indexes of the real estate transaction nodes to generate node index data;
step S23: performing edge relationship mapping on the real estate transaction nodes according to the node index data to generate real estate transaction edge relationship mapping data;
step S24: partitioning the graph data of the property transaction edge relationship mapping data according to preset partitioning rules to generate transaction partition graph data; performing query and storage performance tests on the transaction partition graph data, thereby generating transaction partition performance test data; carrying out partition optimization on the transaction partition graph data based on the transaction partition performance test data to generate transaction partition optimization data;
step S25: and connecting the transaction partition optimization data with a network, so as to generate a distributed real estate transaction network diagram.
By converting the standard real estate transaction data into node-edge relationship data, the invention represents the relationships between properties more clearly and provides a basis for subsequent graph data processing and analysis. Creating the nodes and indexes in a distributed database improves the efficiency of data retrieval and query; in particular, a distributed database offers better horizontal scalability when processing large-scale data. Performing edge relationship mapping according to the node index data establishes the associations among the nodes, makes the structure of the graph data clearer, and facilitates subsequent analysis and mining. Partitioning the graph data according to preset partitioning rules improves query performance; in a distributed environment in particular, reasonable data partitioning reduces communication overhead. The query and storage performance tests evaluate the efficiency of the system when processing the property transaction graph data and provide guidance for improving system performance. Partition optimization based on the performance test data further improves the response speed and throughput of the system and adapts the data distribution to the actual query requirements. Connecting the transaction partition optimization data through the network to generate the distributed real estate transaction network graph supports comprehensive understanding and visualization of real estate transaction relationships and provides a more intuitive means of information display and analysis.
As an example of the present invention, referring to fig. 2, the step S2 in this example includes:
step S21: performing node-edge data conversion on the standard real estate transaction data to generate node-edge relationship data;
In the embodiment of the invention, nodes generally represent entities, such as properties, transaction participants (sellers, buyers) and transaction times, while edges represent relationships between entities, such as the transaction relationship between a property and a seller. The quality and consistency of the original data are ensured, missing values and abnormal values are handled, the data format is standardized, and the attributes of the nodes and edges are kept consistent. The attributes of the nodes and edges are designed and selected according to the business requirements, and the types of the nodes and edges are determined; for example, properties and transaction participants are node types, and transaction relationships are edge types. A suitable graph database (e.g., Neo4j) or graph processing framework (e.g., Apache Spark GraphX) is selected to store and process the graph data. The cleaned and preprocessed data are imported into the graph database or graph processing framework, ensuring the correctness and integrity of the data. In a graph database, models of the nodes and edges are created and their attributes and relationships are defined; in a graph processing framework, the relationships of the nodes and edges are defined using the appropriate APIs. The query language or API provided by the graph database or graph processing framework is used to analyze and explore the data, patterns and insights related to property transactions are found by querying the relationships between nodes and edges, and the generated node-edge relationship data are displayed with a visualization tool.
Step S22: node creation is carried out on the node-edge relationship data by utilizing a distributed database, and real estate transaction nodes are generated; establishing indexes of the real estate transaction nodes to generate node index data;
In the embodiment of the invention, a distributed database suited to the requirements, such as Cassandra, MongoDB or HBase, is selected, ensuring that it can handle large-scale data and support the graph data model. According to the business requirements, a node model is designed in the database and the attributes of the real estate transaction nodes are defined, ensuring that the node model is consistent with the previously designed graph model. The node-edge relationship data are imported into the distributed database; this may require adjustment based on the characteristics of the database and the import tools it supports. Using the distributed nature of the database, the real estate transaction nodes are created concurrently, which may be achieved by techniques such as batch insertion and parallel processing. Key attributes of the real estate transaction nodes are indexed to accelerate data queries, and a suitable index type, such as a single-field index or a compound index, is selected to meet the query requirements. The distributed computing capability of the database is used to keep query and index-building performance balanced across the cluster. Necessary performance tuning is performed according to the actual performance requirements and the characteristics of the database; this may include adjusting shard settings, optimizing query statements, adding nodes and the like.
Step S23: performing edge relationship mapping on the real estate transaction nodes according to the node index data to generate real estate transaction edge relationship mapping data;
In the embodiment of the invention, the established node index data are obtained from the distributed database; the index data comprise the identifiers and key attributes of the real estate transaction nodes. A property transaction edge relationship mapping model is defined to specify the types of edges and the attributes that an edge may contain. The node index data are traversed, and the edge relationships among nodes are mapped through association relationships or other means; this may involve matching and associating node identifiers. Property transaction edge relationship mapping data are created from the edge relationships obtained by the mapping; this may be a data structure of edges that contains the start node, the target node and the attribute information of each edge. The generated property transaction edge relationship mapping data are imported into the distributed database, ensuring the consistency and integrity of the data. If the query performance of the edge relationship data is a key issue, appropriate indexes can be established on the attributes of the edges. The generated edge relationship data are verified to ensure that the relationship mapping between nodes and edges is correct, and quality control measures are implemented to deal with errors or inconsistencies that may exist. Necessary performance tuning is performed according to the actual performance requirements, which may include optimizing query statements, adjusting distributed computing parameters and the like.
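The edge relationship mapping can be sketched as a lookup of relation endpoints in the node index, with a simple verification step that sets aside relations whose endpoints are unknown. The identifier scheme (`property:P100`, `seller:S1`), the relation fields and the function names are hypothetical.

```python
def build_node_index(nodes):
    """Map each node identifier to its position in the node list."""
    return {node["id"]: position for position, node in enumerate(nodes)}

def map_edges(raw_relations, node_index):
    """Resolve relation records to edges via the node index.

    Returns (edges, rejected): edges reference node positions, and
    rejected holds relations with an endpoint missing from the index.
    """
    edges, rejected = [], []
    for rel in raw_relations:
        if rel["source"] in node_index and rel["target"] in node_index:
            edges.append({
                "start": node_index[rel["source"]],
                "target": node_index[rel["target"]],
                "type": rel["type"],
                "attrs": rel.get("attrs", {}),
            })
        else:
            rejected.append(rel)
    return edges, rejected

# Hypothetical nodes and relation records.
graph_nodes = [
    {"id": "property:P100", "type": "property"},
    {"id": "seller:S1", "type": "seller"},
    {"id": "buyer:B1", "type": "buyer"},
]
relations = [
    {"source": "seller:S1", "target": "property:P100", "type": "SOLD",
     "attrs": {"price": 3_200_000}},
    {"source": "buyer:B1", "target": "property:P100", "type": "BOUGHT"},
    {"source": "buyer:B9", "target": "property:P100", "type": "BOUGHT"},
]
node_index = build_node_index(graph_nodes)
mapped_edges, rejected = map_edges(relations, node_index)
```

The relation from `buyer:B9` is rejected because that node is not in the index, which corresponds to the verification and quality control measures mentioned above.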
Step S24: partitioning the real estate transaction edge relationship mapping data as graph data according to preset partitioning rules to generate transaction partition graph data; performing query and storage performance tests on the transaction partition graph data to generate transaction partition performance test data; performing partition optimization on the transaction partition graph data based on the transaction partition performance test data to generate transaction partition optimization data;
In the embodiment of the invention, partitioning rules for the graph data are defined, considering how to partition the data according to the attributes, identifiers, and so on of nodes or edges; the rules may be based on strategies such as hash functions, range partitioning, or uniform distribution. The real estate transaction edge relationship mapping data is partitioned according to the preset partitioning rules, ensuring that the data is evenly distributed while the intended rules are satisfied. Performance test cases and indicators are designed, including query response time, data loading time, and data writing speed. Query and storage performance tests are run on the transaction partition graph data: typical queries are executed and their response times measured, and large-scale data loading and writing tests are performed to evaluate performance. The results of the query and storage performance tests are recorded, including indicators such as response time and throughput under various loads, for use in the subsequent partition optimization. Partition optimization is then performed based on the performance test data; according to the test results, the partitioning rules may be adjusted, data redistributed, or other optimization measures taken to improve performance. The optimized transaction partition graph data is tested again to verify that the optimization measures improve query and storage performance. The changes in the performance indicators before and after optimization are recorded and compared to evaluate the degree of improvement, and this data is kept for future reference and decision making.
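The hash-based partitioning rule and the check for uniform distribution can be sketched as follows. This is an assumption-laden illustration: partitioning by the source node identifier, the use of MD5 as a stable hash, the `imbalance` measure (largest partition size divided by mean size), and the sample edges are all hypothetical choices rather than the rules the text fixes.

```python
import hashlib


def partition_of(node_id: str, num_partitions: int) -> int:
    """Stable hash partitioning: the same identifier always maps to the same partition."""
    digest = hashlib.md5(node_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partition_edges(edges, num_partitions):
    """Assign each (source, target) edge to the partition of its source node."""
    parts = {p: [] for p in range(num_partitions)}
    for src, dst in edges:
        parts[partition_of(src, num_partitions)].append((src, dst))
    return parts


def imbalance(parts) -> float:
    """Largest partition size divided by the mean size; 1.0 means perfectly even."""
    sizes = [len(v) for v in parts.values()]
    mean_size = sum(sizes) / len(sizes)
    return max(sizes) / mean_size if mean_size else 0.0


# Hypothetical edge list, for illustration only.
edges = [(f"t{i}", f"t{i + 1}") for i in range(100)]
parts = partition_edges(edges, num_partitions=4)
print({p: len(v) for p, v in parts.items()}, round(imbalance(parts), 2))
```

An imbalance value well above 1.0 would be one signal, alongside the measured response times and throughput, that the partitioning rule should be adjusted or the data redistributed.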
Step S25: and connecting the transaction partition optimization data with a network, so as to generate a distributed real estate transaction network diagram.
In the embodiment of the invention, it is first ensured that all transaction partition optimization data, including the partition-optimized data sets, are ready, and an appropriate network connection mode is then selected. This may involve using a distributed database system, a graph database system, or other suitable techniques. If the data is stored on multiple nodes or servers, a distributed data storage system (e.g., Hadoop, Cassandra, HBase) is used for data connection and management. If the data structure is better suited to graph storage, a graph database (e.g., Neo4j, Amazon Neptune) may be used for network connection and processing of the graph data. The optimized data of each partition is integrated into a single network graph, which may require data merging, linking, or reconstruction of the graph structure. How the partition optimization data are connected to form the network graph is determined according to the business requirements and the data structure, and the distributed real estate transaction data are then connected into a complete network graph according to that connection scheme using the corresponding techniques and tools. The consistency between the connected network graph and the original data is ensured, and the results of partition optimization are correctly integrated into the network graph. Performance tests are run on the generated distributed real estate transaction network graph to verify its performance in query, data loading, data storage, and similar aspects, and necessary optimization adjustments are made according to the test results to improve its performance and scalability.
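Integrating the partitions into one network graph and checking consistency with the original data might look like the following sketch. It assumes, purely for illustration, that edges are undirected and that each partition is a list of (source, target) pairs; the function names and sample partitions are hypothetical.

```python
from collections import defaultdict


def merge_partitions(partitions):
    """Merge per-partition edge lists into one undirected adjacency map."""
    graph = defaultdict(set)
    for edges in partitions.values():
        for src, dst in edges:
            graph[src].add(dst)
            graph[dst].add(src)
    return dict(graph)


def edge_count(graph) -> int:
    """Number of undirected edges (each edge is stored at both endpoints)."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2


# Hypothetical partitioned edge data, for illustration only.
partitions = {0: [("t1", "t2")], 1: [("t2", "t3"), ("t3", "t4")]}
graph = merge_partitions(partitions)

# Consistency check: the merged graph holds exactly the distinct original edges.
original = {frozenset(e) for edges in partitions.values() for e in edges}
print(edge_count(graph) == len(original))
```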
Preferably, step S3 comprises the steps of:
step S31: carrying out graph neural network model modeling according to the distributed property transaction network graph to generate a property transaction graph neural network model; capturing node characteristic relation of the neural network model of the real estate transaction graph to generate real estate transaction node characteristic data;
step S32: based on the characteristic data of the real estate transaction node, extracting multidimensional characteristic vectors from the real estate transaction graph neural network model to obtain real estate transaction dimension characteristic vectors;
step S33: performing data dimension reduction on the real estate transaction dimension feature vector to generate a real estate transaction dimension reduction feature vector; performing clustering label processing on the real estate transaction dimension reduction feature vector by using a deep clustering algorithm to generate a real estate transaction clustering result label; removing chaotic labels from the property transaction clustering result labels, so as to generate property transaction optimization data;
step S34: and carrying out clustering performance evaluation on the real estate transaction optimization data by using a clustering performance evaluation formula to generate real estate transaction clustering performance evaluation data.
According to the invention, the graph neural network captures the complex relationships among the nodes in the real estate transaction network graph and improves understanding of the overall network structure; the resulting real estate transaction graph neural network model helps in understanding the complex relationships among real estate transactions and provides a strong feature foundation for the subsequent steps. Extracting multidimensional feature vectors based on the node feature data allows each node to be represented in a richer feature-vector form; these finer feature vectors capture more information about each node and provide more informative input for the subsequent dimensionality reduction and clustering. The multidimensional feature vectors are reduced in dimension and the data is then clustered using a deep clustering algorithm: dimensionality reduction simplifies the model and improves computational efficiency, while deep clustering helps find latent patterns and groups in the data, producing more meaningful clustering results. The clustering result labels are processed to remove chaotic labels and generate optimization data; removing chaotic labels helps improve clustering accuracy, so that the generated real estate transaction optimization data is more reliable and of practical significance. The optimization data is evaluated using the clustering performance evaluation formula, which quantifies the accuracy and performance of the clustering, provides an objective evaluation of the clustering results, helps confirm the effectiveness of the model, and provides guidance for further improvement.
As an example of the present invention, referring to fig. 3, the step S3 in this example includes:
step S31: carrying out graph neural network model modeling according to the distributed property transaction network graph to generate a property transaction graph neural network model; capturing node characteristic relation of the neural network model of the real estate transaction graph to generate real estate transaction node characteristic data;
In the embodiment of the invention, the data of the distributed real estate transaction network graph, including the node and edge information, is obtained. Each node may represent a real estate transaction record, while edges represent relationships between nodes, such as transaction relationships. An appropriate graph neural network structure is selected, such as a Graph Convolutional Network (GCN), GraphSAGE (Graph SAmple and aggreGatE), or a Gated Graph Neural Network (GGNN). The network layers are constructed according to the selected model structure, taking the inputs and outputs of the network into account. Each node and edge is assigned initial features, which may include the attribute information of the node (e.g., transaction amount, geographic location). Embedding techniques are used to map the features of nodes and edges to a low-dimensional space so that the neural network can better learn the structure and features of the graph. The prepared data is input into the graph neural network and a loss function is defined (typically the loss of a node classification task or a graph-level task). The weights of the model are updated with the back-propagation algorithm and an optimizer (e.g., Adam, SGD) to minimize the loss function. During training, the graph neural network captures the feature relationships between nodes by learning the connections and adjacency relationships between them. Through this process, each node obtains a learned feature vector reflecting its position and importance in the whole network. Node feature data is then extracted from the trained graph neural network model for use in subsequent data processing and analysis.
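To make the idea of learning from connections and adjacency concrete, the sketch below runs a single forward message-passing step: each node averages its own features with those of its neighbours, then applies a linear map and ReLU. It is a simplified mean-aggregation layer in the spirit of GCN and GraphSAGE, not the model of the invention; the weights are fixed rather than trained, and the features, adjacency, and weight values are hypothetical.

```python
def mean_aggregate(features, adjacency):
    """Average each node's feature vector with those of its neighbours."""
    aggregated = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in adjacency.get(node, [])]
        aggregated[node] = [sum(col) / len(group) for col in zip(*group)]
    return aggregated


def linear_relu(vectors, weight):
    """Apply a linear map (rows of `weight` are output units) followed by ReLU."""
    return {
        node: [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in weight]
        for node, vec in vectors.items()
    }


# Hypothetical initial node features and adjacency, for illustration only.
features = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}
adjacency = {"t1": ["t2"], "t2": ["t1", "t3"], "t3": ["t2"]}
weight = [[1.0, -1.0], [0.5, 0.5]]  # fixed here; learned by back-propagation in practice

embeddings = linear_relu(mean_aggregate(features, adjacency), weight)
print(embeddings)
```

In a trained model the weight matrix would be updated by the optimizer to minimize the chosen loss, and several such layers would typically be stacked so that each node's vector reflects a wider neighbourhood.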
Step S32: based on the characteristic data of the real estate transaction node, extracting multidimensional characteristic vectors from the real estate transaction graph neural network model to obtain real estate transaction dimension characteristic vectors;
In the embodiment of the invention, it is first ensured that a trained graph neural network model is available from which node feature data can be extracted. The features of each real estate transaction node are extracted using the trained model; these features may be the node embeddings learned by the model or other high-dimensional features. The features of each node are combined into a multidimensional feature vector, either by directly concatenating the node embeddings or by performing some operation on the node features (e.g., concatenation, summation, averaging) to obtain a richer feature representation. The resulting multidimensional feature vectors are normalized so that the features of each dimension have similar scales, which helps improve the stability and convergence speed of subsequent model training. The obtained real estate transaction dimension feature vectors are then verified and analyzed; the importance and characteristics of each dimension can be examined through visualization, statistical analysis, and similar means.
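Combining node features by concatenation and bringing each dimension to a similar scale can be sketched as below. The choice of zero-mean, unit-variance scaling per dimension and the handling of constant columns are assumptions made for the example; the helper names and input values are hypothetical.

```python
from statistics import mean, pstdev


def concat(*vectors):
    """Concatenate several feature vectors into one multidimensional vector."""
    return [x for vec in vectors for x in vec]


def standardize(rows):
    """Scale every dimension (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero spread; divide by 1.0 so it maps to all zeros.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


# Hypothetical embedding plus one raw attribute per node, for illustration only.
rows = [concat([1.0], [10.0]), concat([3.0], [10.0])]
print(standardize(rows))
```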
Step S33: performing data dimension reduction on the real estate transaction dimension feature vector to generate a real estate transaction dimension reduction feature vector; performing clustering label processing on the real estate transaction dimension reduction feature vector by using a deep clustering algorithm to generate a real estate transaction clustering result label; removing chaotic labels from the property transaction clustering result labels, so as to generate property transaction optimization data;
In the embodiment of the invention, a dimensionality reduction algorithm, such as Principal Component Analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE), is selected to reduce the dimension of the real estate transaction dimension feature vectors, which helps increase computational efficiency while preserving key information. The selected algorithm is applied to convert each real estate transaction dimension feature vector into a lower-dimensional vector, namely the dimension reduction feature vector. A deep clustering algorithm is selected, e.g., autoencoder-based clustering or spectral clustering; such algorithms can find latent cluster structures in the reduced feature space. The reduced feature vectors are clustered with the selected algorithm to generate a cluster label for each real estate transaction. The clustering result is analyzed to detect possible chaotic labels, which may be done by observing the similarity between categories, the compactness within clusters, and so on. For chaotic labels, heuristics or readjustment of the clustering algorithm's parameters may be considered. The original data is reorganized based on the result of chaotic label removal to obtain an optimized real estate transaction data set, which should reflect a more accurate cluster structure and improve the performance of subsequent tasks.
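The text leaves the chaotic-label heuristic open. One possible heuristic, shown here only as an assumption, treats a label as chaotic when the sample lies closer to the centroid of another cluster than to the centroid of its own cluster, and drops such samples. The function names and sample points are hypothetical.

```python
import math
from collections import defaultdict


def centroids(points, labels):
    """Mean position of the samples assigned to each cluster label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def drop_chaotic(points, labels):
    """Keep only samples whose nearest centroid matches their assigned label."""
    cents = centroids(points, labels)
    kept = []
    for point, label in zip(points, labels):
        nearest = min(cents, key=lambda c: math.dist(point, cents[c]))
        if nearest == label:
            kept.append((point, label))
    return kept


# Hypothetical reduced feature vectors; the last sample carries a doubtful label.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [0.05, 0.05]]
labels = [0, 0, 0, 1, 1, 1]
kept = drop_chaotic(points, labels)
print(len(kept))
```

This is a single pass with centroids computed before removal; re-running the clustering with adjusted parameters, as the text also suggests, is an alternative.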
Step S34: and carrying out clustering performance evaluation on the real estate transaction optimization data by using a clustering performance evaluation formula to generate real estate transaction clustering performance evaluation data.
In the embodiment of the invention, the clustering performance is evaluated by selecting appropriate indicators. Common indicators include the silhouette coefficient, the Calinski-Harabasz index, and the Davies-Bouldin index. Each indicator has its advantages and disadvantages, so it is preferable to consider several of them together. For the silhouette coefficient, the coefficient of each data point is calculated and the values are averaged. For the Calinski-Harabasz index and the Davies-Bouldin index, the calculation follows the corresponding formulas. The clustering performance is calculated by applying the formula of the selected indicator. Examples of the calculation methods are as follows. Silhouette coefficient: for each data point, its average distance (a) to the other data points in the same cluster and its average distance (b) to all data points in the nearest different cluster are calculated, and the silhouette coefficient is then (b-a)/max(a, b). Calinski-Harabasz index: the ratio of the between-cluster dispersion to the within-cluster dispersion is calculated; the higher the index, the better the clustering. Davies-Bouldin index: for each cluster, the similarity to its most similar cluster is calculated and these values are averaged; the lower the index, the better the clustering. The calculated clustering performance indicators are applied to the optimized data set to generate the real estate transaction clustering performance evaluation data, which may include the indicator values of each cluster, an overall clustering performance score, and so on.
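The silhouette calculation described above, (b-a)/max(a, b) per data point followed by averaging, can be written directly with the standard library. The sketch assumes Euclidean distance and at least two clusters, and scores singleton clusters as 0.0 by convention; the sample points and labels are hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Per-sample silhouette coefficient (b - a) / max(a, b); needs >= 2 clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [
            math.dist(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own and j != i
        ]
        if not same:
            scores.append(0.0)  # convention for a cluster with a single sample
            continue
        a = sum(same) / len(same)  # mean distance inside the sample's own cluster
        # b: smallest mean distance from the sample to any other cluster
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels)
            if other != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Hypothetical one-dimensional samples in two well-separated clusters.
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
print(sum(scores) / len(scores))
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest samples that may be assigned to the wrong cluster.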
Preferably, the clustering performance evaluation formula in step S34 is specifically as follows:
(formula image not reproduced in this text)
wherein Q is the clustering performance evaluation coefficient, a_i is the average distance between sample i and all other samples in the same cluster, b_i is the average distance between sample i and the nearest-neighbor samples that do not belong to the same cluster, i is the sample index, n is the number of clusters of the real estate transaction optimization data, and ε is the anomaly correction amount of the clustering performance evaluation.
The invention constructs a clustering performance evaluation formula whose principle is to compare the compactness of each sample within its own cluster (measured by a_i) with its degree of separation from other clusters (measured by b_i). By calculating b_i - a_i and normalizing it (dividing by the larger of a_i and b_i), a quality measure Q of the clustering result is obtained. The parameters of the formula play the following roles. The average distance a_i between sample i and all other samples in the same cluster measures the closeness of the sample within its own cluster; a smaller value indicates that the sample is closer to the other samples in the same cluster. The average distance b_i between sample i and the nearest-neighbor samples that do not belong to the same cluster measures the degree of separation of the sample from other clusters; a larger value indicates a greater distance between the sample and the other clusters. The number of clusters n of the real estate transaction optimization data is the number of clusters into which the clustering algorithm divides the data. The anomaly correction amount ε of the clustering performance evaluation corrects errors and deviations caused by the complexity and non-ideality of the actual system. It corrects the difference between the theoretical assumptions of the formula and the actual system, improving the accuracy and reliability of the clustering performance evaluation, so that the clustering performance evaluation coefficient Q is generated more accurately. Parameters in the formula, such as the sample index and the number of clusters of the real estate transaction optimization data, can also be adjusted according to actual conditions, so that the formula suits different clustering performance evaluation scenarios and the applicability and flexibility of the algorithm are improved. A conventional clustering performance evaluation formula in the field also yields a clustering performance evaluation coefficient, but applying the formula provided by the invention allows the coefficient to be calculated more accurately. The formula considers both the compactness within clusters and the separation between clusters, thereby providing a comprehensive evaluation of the quality of the clustering result.
A higher Q value indicates a better clustering result, i.e. samples are closer in their own cluster and farther from other clusters. By introducing the abnormal correction amount epsilon, abnormal conditions can be corrected, and the stability and reliability of the evaluation result can be ensured.
Preferably, step S4 comprises the steps of:
step S41: acquiring cross-domain property transaction data and user query demand data;
step S42: data combination is carried out on cross-domain property transaction data and property transaction clustering performance evaluation data, and a cross-domain property transaction data set is generated; dividing a cross-domain real estate transaction data set into data sets to generate a model training set and a model testing set; performing predictive model training based on the model training set to generate source field model parameters; performing migration learning model construction according to the source field model parameters and the model test set to generate a real estate transaction migration learning model;
step S43: model optimization is carried out on the house property transaction transfer learning model, and a house property transaction optimization transfer learning model is generated; and importing the user query demand data into a property transaction transfer learning model to perform query prediction, and generating property transaction prediction result data.
According to the method, real estate transaction data and user query demand data are collected from different domains; cross-domain data collection helps build a more comprehensive and diversified data set and provides a wider information base, thereby enhancing the generalization capability and adaptability of the model. The cross-domain real estate transaction data and the clustering performance evaluation data are merged to generate a cross-domain real estate transaction data set, which is then divided into a model training set and a model test set, and prediction model training is performed on the model training set. Data merging increases the information density of the data set and provides more features for model training; data set division helps evaluate the generalization performance of the model; and using the clustering performance evaluation data for training introduces more information about the structure and patterns of the data, improving the accuracy of the model. The real estate transaction transfer learning model is optimized, and the user query demand data is then imported into the model for query prediction to generate real estate transaction prediction result data. Model optimization improves the performance of the model and makes it better suited to cross-domain data, and the query prediction result data allows personalized real estate transaction suggestions to be provided to users, improving the user experience. Through transfer learning, the model uses knowledge of the source domain to adapt more effectively to the target domain, improving its performance, and combining cross-domain data with the clustering performance evaluation improves the comprehensiveness of the model so that it adapts better to diversified data.
In the embodiment of the invention, real estate transaction data is acquired from real estate transaction platforms, government databases, or other related data sources in different domains, and user query demand data is collected by means such as user surveys and web crawlers. The cross-domain real estate transaction data and the real estate transaction clustering performance evaluation data are merged, missing values and outliers are handled, and the data is preprocessed by methods such as standardization and normalization. The merged data set is divided into a model training set and a model test set, either randomly or by time. Prediction model training is performed on the model training set using a suitable machine learning algorithm, such as deep learning or a support vector machine, to generate the source domain model parameters. A transfer learning model is constructed from the source domain model parameters and the model test set, for which transfer learning approaches such as domain adaptation or knowledge distillation may be adopted. The real estate transaction transfer learning model is optimized by methods such as parameter tuning and regularization, and its performance is evaluated by techniques such as cross-validation. The user query demand data is imported into the real estate transaction transfer learning model for query prediction to generate real estate transaction prediction result data. The generated prediction results are evaluated using indicators such as accuracy and recall, and necessary model adjustment and iteration are carried out according to the evaluation results to improve the accuracy and generalization capability of the model.
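The two ways of dividing the merged data set mentioned above, random division and division by time, can be sketched as follows. The 80/20 ratio, the fixed seed, the `date` field with ISO-formatted strings, and the sample rows are hypothetical choices for illustration.

```python
import random


def random_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of the rows and hold out `test_ratio` of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def time_split(rows, cutoff, key="date"):
    """Train on records before `cutoff`, test on records at or after it."""
    train = [r for r in rows if r[key] < cutoff]
    test = [r for r in rows if r[key] >= cutoff]
    return train, test


# Hypothetical merged records; ISO date strings compare correctly as text.
rows = [{"id": i, "date": f"2023-{i + 1:02d}-01"} for i in range(10)]
train, test = random_split(rows)
early, late = time_split(rows, cutoff="2023-09-01")
print(len(train), len(test), len(early), len(late))
```

A time-based split avoids training on transactions that occur after those being predicted, which can matter when market conditions drift; a random split is simpler when the order of records carries no information.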
In this specification, a data query system based on property transaction multidimensional data is provided, which is configured to execute the data query method based on property transaction multidimensional data, where the data query system based on property transaction multidimensional data includes:
the abnormality detection module is used for obtaining original real estate transaction data; detecting abnormal values of original property transaction data to generate an abnormal mark data set and property transaction normalization data; performing data restoration on the abnormal mark data set so as to generate real estate transaction normalization data; carrying out data standardization on the real estate transaction normalization data to generate standard real estate transaction data;
the graph network connection module is used for creating edge nodes of standard real estate transaction data and generating real estate transaction nodes; establishing indexes of the real estate transaction nodes to generate node index data; partitioning graph data of the real estate transaction nodes according to the node index data to generate transaction partition optimization data; and connecting the transaction partition optimization data with a network, so as to generate a distributed real estate transaction network diagram.
The clustering label module is used for modeling the graphic neural network model according to the distributed property transaction network diagram and generating a property transaction diagram neural network model; performing multidimensional feature vector label processing on the house property transaction graph neural network model to generate a house property transaction clustering result label; performing clustering performance evaluation on the property transaction clustering result labels to generate property transaction clustering performance evaluation data;
The query prediction module is used for acquiring cross-domain real estate transaction data and user query demand data; performing prediction model training on cross-domain property transaction data and property transaction clustering performance evaluation data to generate a property transaction transfer learning model; and importing the user query demand data into a property transaction transfer learning model to perform query prediction, and generating property transaction prediction result data.
The beneficial effects of the invention are as follows. Through the outlier detection, repair, and data standardization in step S1, the quality of the original real estate transaction data is improved and the interference of noise and anomalies with subsequent analysis is reduced; the resulting standard real estate transaction data is more reliable, which helps build a more accurate model and improves the reliability of query results. The graph data partitioning and the creation of the distributed real estate transaction network graph in step S2 improve the efficiency and scalability of data processing, and the generated network graph better reflects the relationships between transactions, supporting a more comprehensive understanding of the complex structure of the real estate market. The graph neural network model in step S3 learns more complex patterns and relationships from the graph data and improves modeling flexibility; the generated real estate transaction graph neural network model better captures latent features, classifies and clusters real estate transactions more accurately, and provides richer information for subsequent analysis. The transfer learning model in step S4 combines data from different domains and improves the prediction capability for cross-domain real estate transaction data; by importing the user query demand data into the transfer learning model, the generated real estate transaction prediction results are more personalized and better meet the needs of users.
The clustering performance evaluation data in the step S3 can be used for evaluating the effect of the model and guiding the subsequent model optimization, and the fitting condition of the model to the real estate transaction data can be better understood through evaluating the clustering performance, so that a decision maker is guided to make a more intelligent decision. Therefore, the accuracy and the prediction capability of the multi-dimensional data query of the real estate transaction are improved through outlier processing, graph data modeling, cross-domain data utilization and transfer learning.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The data query method based on the real estate transaction multidimensional data is characterized by comprising the following steps:
step S1: obtaining original real estate transaction data; detecting abnormal values of original property transaction data to generate an abnormal mark data set and property transaction normalization data; performing data restoration on the abnormal mark data set so as to generate real estate transaction normalization data; carrying out data standardization on the real estate transaction normalization data to generate standard real estate transaction data;
Step S2: creating edge nodes for standard real estate transaction data to generate real estate transaction nodes; establishing indexes of the real estate transaction nodes to generate node index data; partitioning graph data of the real estate transaction nodes according to the node index data to generate transaction partition optimization data; network connection is carried out on transaction partition optimization data, so that a distributed real estate transaction network diagram is generated;
step S3: carrying out graph neural network model modeling according to the distributed property transaction network graph to generate a property transaction graph neural network model; performing multidimensional feature vector label processing on the house property transaction graph neural network model to generate a house property transaction clustering result label; performing clustering performance evaluation on the property transaction clustering result labels to generate property transaction clustering performance evaluation data;
step S4: acquiring cross-domain property transaction data and user query demand data; performing prediction model training on cross-domain property transaction data and property transaction clustering performance evaluation data to generate a property transaction transfer learning model; and importing the user query demand data into a property transaction transfer learning model to perform query prediction, and generating property transaction prediction result data.
2. The method for querying data based on multi-dimensional data for a property transaction according to claim 1, wherein the step S1 comprises the steps of:
step S11: the method comprises the steps of utilizing a distributed crawler to conduct data extraction on a real estate transaction data source to obtain original real estate transaction data;
step S12: performing data structure conversion processing on the original real estate transaction data according to a preset data standard format to generate real estate transaction structure conversion data;
step S13: detecting abnormal values of the property transaction structure conversion data, and when the abnormal value detection result of the property transaction structure conversion data is true, performing abnormal marking on the corresponding property transaction structure conversion data to generate an abnormal marking data set; when the abnormal value detection result of the real estate transaction structure conversion data is false, carrying out data normalization on the corresponding real estate transaction structure conversion data to generate real estate transaction normalization data;
step S14: carrying out data quality problem identification on the abnormal mark data set to generate abnormal type data; performing data restoration on the abnormal mark data set according to the abnormal type data, so as to generate real estate transaction restoration data;
step S15: carrying out data normalization on the real estate transaction repair data to generate real estate transaction normalization data; and carrying out data standardization on the real estate transaction normalized data based on a Z-score standardization method to generate standard real estate transaction data.
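As a non-limiting illustration of steps S13 to S15, the following minimal sketch flags abnormal values, normalizes the remaining values, and then applies Z-score standardization. The interquartile-range detector, the min-max normalization, the function names and the sample prices are all assumptions made for illustration; the claim itself fixes only the Z-score standardization method. In the claimed method the flagged records would pass to the repair of step S14, whereas here they are simply set aside.

```python
from statistics import mean, pstdev

def flag_outliers(values, k=1.5):
    """Flag each value with an interquartile-range rule (an assumed detector)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = ordered[n // 4]          # rough lower quartile
    q3 = ordered[(3 * n) // 4]    # rough upper quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]

def min_max_normalize(values):
    """Scale values into [0, 1] (an assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score_standardize(values):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical unit prices; 5000.0 stands in for an abnormal record.
prices = [310.0, 295.0, 305.0, 300.0, 5000.0, 290.0, 315.0, 298.0]
flags = flag_outliers(prices)
clean = [p for p, is_abnormal in zip(prices, flags) if not is_abnormal]
standard = z_score_standardize(min_max_normalize(clean))
```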
3. The method of claim 2, wherein step S14 comprises the steps of:
step S141: performing anomaly type classification on the anomaly tagging data set to generate an anomaly type classification data set, wherein the anomaly type classification comprises a data format anomaly type, a data missing anomaly type, a data time sequence anomaly type, a geographic location data anomaly type and a fraudulent transaction anomaly type;
step S142: when the abnormal type classification data set is determined to be of the abnormal type of the data format, performing abnormal format conversion restoration on the corresponding abnormal type classification data to generate format abnormal restoration data;
step S143: when the abnormal type classification data set is determined to be of the data missing abnormal type, carrying out missing rate evaluation on the corresponding abnormal type classification data to generate missing rate evaluation data; comparing the missing rate evaluation data with a preset missing threshold, and performing variable deletion on the abnormal type classification data corresponding to the missing rate evaluation data when the missing rate evaluation data is greater than or equal to the preset missing threshold; when the missing rate evaluation data is smaller than the preset missing threshold, performing interpolation filling repair on the abnormal type classification data corresponding to the missing rate evaluation data, thereby generating missing abnormal repair data;
Step S144: when the abnormal type classification data set is determined to be of the data time sequence abnormal type, carrying out seasonal component decomposition on the corresponding abnormal type classification data according to a time sequence decomposition technology to generate seasonal component decomposition data; carrying out data trend prediction on seasonal component decomposition data based on a trend time sequence model, and generating data trend prediction data; historical data collection is carried out on seasonal component decomposition data, and historical seasonal decomposition data are generated; performing abnormal time point correction on the abnormal type classification data by utilizing the data trend prediction data and the historical seasonal decomposition data to generate time sequence abnormal repair data;
step S145: when the abnormal type classification data set is determined to be of the abnormal type of the geographic position data, performing abnormal geographic position information verification on the corresponding abnormal type classification data through the geographic coding service to generate abnormal geographic information data; removing abnormal transaction positions of the abnormal geographic information data to generate geographic abnormal repair data;
step S146: when the abnormal type classification data set is determined to be of a fraudulent transaction abnormal type, carrying out abnormal amount identification on the corresponding abnormal type classification data to generate abnormal amount data; carrying out abnormal transaction mode identification on the abnormal type classification data according to the abnormal amount data to generate abnormal transaction identification data; abnormal value detection is carried out on abnormal amount data and abnormal transaction identification data by utilizing a transaction abnormal value quantification formula, and an abnormal transaction risk assessment value is generated; performing data blocking on the abnormal type classification data according to the abnormal transaction risk assessment value so as to generate identity abnormal repair data;
Step S147: and carrying out data combination on the format abnormal repair data, the missing abnormal repair data, the time sequence abnormal repair data, the geographic abnormal repair data and the identity abnormal repair data to generate real estate transaction repair data.
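The threshold logic of step S143 may be sketched as follows, purely as an illustration. The function name `repair_missing`, the default threshold of 0.5, the use of `None` to mark a missing value and the choice of linear interpolation are assumptions; the claim requires only that the variable is deleted when the missing rate reaches the preset threshold and that interpolation filling is applied otherwise.

```python
def repair_missing(column, threshold=0.5):
    """Return None when the variable should be deleted, else an interpolated copy.

    `column` is a list in which None marks a missing value (an assumed encoding).
    """
    if not column:
        return []
    missing_rate = sum(v is None for v in column) / len(column)
    known = [(i, v) for i, v in enumerate(column) if v is not None]
    if not known or missing_rate >= threshold:
        return None  # variable deletion branch
    repaired = list(column)
    for i, v in enumerate(column):
        if v is not None:
            continue
        left = [(j, w) for j, w in known if j < i]
        right = [(j, w) for j, w in known if j > i]
        if left and right:
            (j0, w0), (j1, w1) = left[-1], right[0]
            # Linear interpolation between the nearest known neighbours.
            repaired[i] = w0 + (w1 - w0) * (i - j0) / (j1 - j0)
        elif left:
            repaired[i] = left[-1][1]   # trailing gap: carry the last value forward
        else:
            repaired[i] = right[0][1]   # leading gap: carry the first value backward
    return repaired

filled = repair_missing([100.0, None, 120.0, 130.0])   # missing rate 0.25
dropped = repair_missing([None, None, None, 5.0])      # missing rate 0.75
edge = repair_missing([None, 2.0, 4.0])                # leading gap
```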
4. The method of claim 3, wherein step S145 comprises the steps of:
step S1451: when the abnormal type classification data set is determined to be of the abnormal type of the geographic position data, geographic information extraction is carried out on the corresponding abnormal type classification data through a geographic coding service, and geographic information coding abnormal data are generated;
step S1452: performing abnormal section locking on the geographic information coding abnormal data to generate abnormal section range data; performing real estate location coordinate calibration on the abnormal region range data to generate abnormal region real estate coordinate data;
step S1453: locking the real geographic region of the geographic information coding abnormal data by using GPS to generate real region range data; performing real estate location coordinate calibration on the real region range data to generate real region real estate coordinate data;
step S1454: comparing the position information of the abnormal region real estate coordinate data with that of the real region real estate coordinate data to generate real estate region correction data; and performing position information restoration on the abnormal type classification data through the real estate region correction data, thereby generating geographic abnormal repair data.
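The coordinate comparison and position restoration at the end of claim 4 may be illustrated by the following sketch, which measures the great-circle distance between a recorded coordinate and a reference coordinate and replaces the recorded coordinate when the two disagree. The haversine distance, the tolerance of 1 km, the function names and the sample coordinates are assumptions; the claim does not specify how the position information is compared.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def correct_location(recorded, reference, tolerance_km=1.0):
    """Return (coordinates, corrected_flag); tolerance_km is a hypothetical threshold."""
    distance = haversine_km(*recorded, *reference)
    if distance > tolerance_km:
        return reference, True    # restore the position from the reference
    return recorded, False        # recorded position is accepted as-is

shanghai = (31.2304, 121.4737)    # illustrative reference coordinate
beijing = (39.9042, 116.4074)     # illustrative mis-recorded coordinate
fixed, was_corrected = correct_location(beijing, shanghai)
```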
5. The method for querying data based on property trade multidimensional data according to claim 4, wherein the trade outlier quantization formula in step S146 is as follows:
where R is the abnormal transaction risk assessment value, α is a parameter controlling the response speed, β is the transaction amount weight parameter, γ is the importance coefficient of the historical transaction integral, δ is a parameter controlling the response speed of the integral, x(s) is the transaction amount at the maximum transaction time s, x(u) is the maximum transaction amount at the transaction time u, du is the differential of the transaction time u, ds is the differential of the maximum transaction time s, u is the transaction time, s is the maximum transaction time, and μ is the transaction abnormal value quantization adjustment value.
6. The method for querying data based on multi-dimensional data for a property transaction according to claim 2, wherein step S2 comprises the steps of:
step S21: performing node-edge data conversion on the standard real estate transaction data to generate node-edge relation data;
step S22: creating nodes from the node-edge relation data by using a distributed database to generate real estate transaction nodes; establishing indexes for the real estate transaction nodes to generate node index data;
step S23: performing edge relation mapping on the real estate transaction nodes according to the node index data to generate real estate transaction edge relation mapping data;
step S24: partitioning the real estate transaction edge relation mapping data into graph data partitions according to preset partitioning rules to generate transaction partition graph data; performing query and storage performance tests on the transaction partition graph data to generate transaction partition performance test data; carrying out partition optimization on the transaction partition graph data based on the transaction partition performance test data to generate transaction partition optimization data;
step S25: performing network connection on the transaction partition optimization data, thereby generating a distributed real estate transaction network graph.
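By way of illustration only, the conversion of transaction records into nodes and edges and their subsequent partitioning may be sketched as below. The record fields (`buyer`, `seller`, `property`), the node types, the edge labels and the hash-based rule that keeps all edges of one property in the same partition are assumptions; the claim refers only to preset partitioning rules and does not name a distributed database.

```python
import zlib
from collections import defaultdict

def to_nodes_and_edges(transactions):
    """Convert transaction records into typed nodes and labelled edges."""
    nodes, edges = set(), []
    for t in transactions:
        buyer = ("party", t["buyer"])
        seller = ("party", t["seller"])
        prop = ("property", t["property"])
        nodes.update([buyer, seller, prop])
        edges.append((seller, prop, "sold"))
        edges.append((buyer, prop, "bought"))
    return nodes, edges

def partition_of(node, num_partitions):
    """Stable hash partitioning (an assumed rule); crc32 is deterministic across runs."""
    key = f"{node[0]}:{node[1]}".encode("utf-8")
    return zlib.crc32(key) % num_partitions

def partition_edges(edges, num_partitions):
    """Group edges by the partition of their property node."""
    partitions = defaultdict(list)
    for src, dst, label in edges:
        partitions[partition_of(dst, num_partitions)].append((src, dst, label))
    return dict(partitions)

# Hypothetical records: property P1 is sold twice.
transactions = [
    {"buyer": "B1", "seller": "S1", "property": "P1"},
    {"buyer": "B2", "seller": "B1", "property": "P1"},
]
nodes, edges = to_nodes_and_edges(transactions)
partitions = partition_edges(edges, num_partitions=3)
```

Keying the partition on the property node is one possible design choice: every edge touching a given property then resides in a single partition, so a query about that property need not cross partitions.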
7. The method for querying data based on multi-dimensional data for a property transaction according to claim 6, wherein step S3 comprises the steps of:
step S31: carrying out graph neural network model modeling according to the distributed property transaction network graph to generate a property transaction graph neural network model; capturing node characteristic relation of the neural network model of the real estate transaction graph to generate real estate transaction node characteristic data;
step S32: based on the characteristic data of the real estate transaction node, extracting multidimensional characteristic vectors from the real estate transaction graph neural network model to obtain real estate transaction dimension characteristic vectors;
Step S33: performing data dimension reduction on the real estate transaction dimension feature vector to generate a real estate transaction dimension reduction feature vector; performing clustering label processing on the real estate transaction dimension reduction feature vector by using a deep clustering algorithm to generate a real estate transaction clustering result label; removing chaotic labels from the property transaction clustering result labels, so as to generate property transaction optimization data;
step S34: and carrying out clustering performance evaluation on the real estate transaction optimization data by using a clustering performance evaluation formula to generate real estate transaction clustering performance evaluation data.
8. The method for querying data based on multi-dimensional data for a property transaction according to claim 7, wherein the clustering performance assessment formula in step S34 is as follows:
wherein Q is the clustering performance evaluation coefficient, a_i is the average distance between sample i and all other samples in the same cluster, b_i is the average distance between sample i and its nearest neighbor samples that do not belong to the same cluster, i is the sample index, n is the number of clusters of the real estate transaction optimization data, and ε is the abnormal correction quantity of the clustering performance evaluation.
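The quantities a_i and b_i defined above resemble those of the standard silhouette coefficient. The following sketch computes that standard coefficient, s_i = (b_i - a_i) / max(a_i, b_i), as an illustration of how such quantities can be evaluated. It is not the claimed formula: the roles of n and ε in Q are not reproduced, b_i is taken as the mean distance to the nearest other cluster, and the function name and sample points are assumptions.

```python
import math

def silhouette_scores(points, labels):
    """Per-sample standard silhouette values (not the claimed coefficient Q)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        same = [j for j in clusters[lab] if j != i]
        if not same or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters or a single cluster
            continue
        # a_i: mean distance to the other members of the same cluster.
        a_i = sum(math.dist(points[i], points[j]) for j in same) / len(same)
        # b_i: smallest mean distance to the members of any other cluster.
        b_i = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a_i, b_i)
        scores.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return scores

# Two well-separated groups of hypothetical feature vectors.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 0, 1])
```

Values near 1 indicate that a sample sits well inside its own cluster, while negative values indicate that it lies closer to another cluster.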
9. The method for querying data based on multi-dimensional data for a property transaction according to claim 7, wherein step S4 comprises the steps of:
Step S41: acquiring cross-domain property transaction data and user query demand data;
step S42: data combination is carried out on cross-domain property transaction data and property transaction clustering performance evaluation data, and a cross-domain property transaction data set is generated; dividing a cross-domain real estate transaction data set into data sets to generate a model training set and a model testing set; performing predictive model training based on the model training set to generate source field model parameters; performing migration learning model construction according to the source field model parameters and the model test set to generate a real estate transaction migration learning model;
step S43: model optimization is carried out on the house property transaction transfer learning model, and a house property transaction optimization transfer learning model is generated; and importing the user query demand data into a property transaction transfer learning model to perform query prediction, and generating property transaction prediction result data.
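The training flow of steps S42 and S43 may be illustrated, under strong simplifying assumptions, by a one-variable linear model: parameters are fitted on a training split of source-field data and then used as the starting point for fine-tuning on a small set from another field. The linear model, the closed-form least-squares fit, the gradient-descent fine-tuning, all function names and the synthetic data are assumptions; the claim does not specify the model type or the transfer procedure.

```python
import random

def split_dataset(rows, train_fraction=0.8, seed=0):
    """Shuffle and split rows into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_linear(rows):
    """Closed-form least squares for y = w * x + b (source-field parameters)."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    w = sxy / sxx
    return w, my - w * mx

def mse(params, rows):
    """Mean squared error of the linear model on rows."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)

def fine_tune(params, rows, lr=0.1, steps=500):
    """Gradient descent on MSE, starting from the source-field parameters."""
    w, b = params
    n = len(rows)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in rows) / n
        gb = sum(2 * (w * x + b - y) for x, y in rows) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

# Synthetic source field: y = 2x + 1. Synthetic target field: y = 3x + 2.
source_rows = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
target_rows = [(x, 3 * x + 2) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]

train_set, test_set = split_dataset(source_rows)
source_params = fit_linear(train_set)
before = mse(source_params, target_rows)
transfer_params = fine_tune(source_params, target_rows)
after = mse(transfer_params, target_rows)
```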
10. A property transaction multi-dimensional data based data query system for performing the property transaction multi-dimensional data based data query method of claim 1, the property transaction multi-dimensional data based data query system comprising:
the abnormality detection module is used for obtaining original real estate transaction data; detecting abnormal values of original property transaction data to generate an abnormal mark data set and property transaction normalization data; performing data restoration on the abnormal mark data set so as to generate real estate transaction normalization data; carrying out data standardization on the real estate transaction normalization data to generate standard real estate transaction data;
The graph network connection module is used for creating edge nodes of standard real estate transaction data and generating real estate transaction nodes; establishing indexes of the real estate transaction nodes to generate node index data; partitioning graph data of the real estate transaction nodes according to the node index data to generate transaction partition optimization data; network connection is carried out on transaction partition optimization data, so that a distributed real estate transaction network diagram is generated;
the clustering label module is used for modeling the graphic neural network model according to the distributed property transaction network diagram and generating a property transaction diagram neural network model; performing multidimensional feature vector label processing on the house property transaction graph neural network model to generate a house property transaction clustering result label; performing clustering performance evaluation on the property transaction clustering result labels to generate property transaction clustering performance evaluation data;
the query prediction module is used for acquiring cross-domain real estate transaction data and user query demand data; performing prediction model training on cross-domain property transaction data and property transaction clustering performance evaluation data to generate a property transaction transfer learning model; and importing the user query demand data into a property transaction transfer learning model to perform query prediction, and generating property transaction prediction result data.
CN202410012977.9A 2024-01-04 2024-01-04 Data query method and system based on real estate transaction multidimensional data Active CN117539920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410012977.9A CN117539920B (en) 2024-01-04 2024-01-04 Data query method and system based on real estate transaction multidimensional data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410012977.9A CN117539920B (en) 2024-01-04 2024-01-04 Data query method and system based on real estate transaction multidimensional data

Publications (2)

Publication Number Publication Date
CN117539920A true CN117539920A (en) 2024-02-09
CN117539920B CN117539920B (en) 2024-04-05

Family

ID=89794066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410012977.9A Active CN117539920B (en) 2024-01-04 2024-01-04 Data query method and system based on real estate transaction multidimensional data

Country Status (1)

Country Link
CN (1) CN117539920B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150137771A (en) * 2014-05-30 2015-12-09 조성환 Method for analyzing data and apparatus using the method
CN107220911A (en) * 2017-05-10 2017-09-29 深圳市易图资讯股份有限公司 A kind of real estate market and information of real estate management system
US20170359361A1 (en) * 2016-06-09 2017-12-14 Adobe Systems Incorporated Selecting representative metrics datasets for efficient detection of anomalous data
CN108182204A (en) * 2017-12-12 2018-06-19 链家网(北京)科技有限公司 The processing method and processing device of data query based on house prosperity transaction multi-dimensional data
CN110838067A (en) * 2019-11-14 2020-02-25 腾讯科技(深圳)有限公司 Real estate transaction data processing method, device, server and storage medium
CN113627977A (en) * 2021-07-30 2021-11-09 北京航空航天大学 House value prediction method based on heteromorphic graph
KR20220073295A (en) * 2020-11-26 2022-06-03 (주)태평양감정평가법인 Apparatus for estimating market price of real estate and method thereof
KR20220166038A (en) * 2021-06-09 2022-12-16 주식회사 데이터노우즈 Method and system for real estate transaction information prediction based on deep learning graph network
CN116226425A (en) * 2023-03-24 2023-06-06 中国科学技术大学 Graph data storage method, graph data reading method and graph data storage system
CN117010933A (en) * 2023-08-01 2023-11-07 无锡市曦晨测绘有限公司 Real estate market feature evaluation method based on model
CN117217779A (en) * 2023-10-09 2023-12-12 香港科技大学(广州) Training method and device of prediction model and information prediction method and device
CN117217783A (en) * 2023-07-28 2023-12-12 江苏警官学院 Multi-task learning-driven house price prediction method

Also Published As

Publication number Publication date
CN117539920B (en) 2024-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant