CN113392096A - Real estate data quality analysis method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113392096A
Authority
CN
China
Prior art keywords
data
real estate
quality
rate
estate data
Prior art date
Legal status
Pending
Application number
CN202110618734.6A
Other languages
Chinese (zh)
Inventor
李琦
宋卫东
Current Assignee
Chongqing Ruiyun Technology Co ltd
Original Assignee
Chongqing Ruiyun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Ruiyun Technology Co ltd
Priority to CN202110618734.6A
Publication of CN113392096A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/16 Real estate

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a real estate data quality analysis method, device, equipment and storage medium. The method comprises the following steps: extracting real estate data from a database, where the real estate data carries a data source; identifying the real estate data, determining dirty data and missing data in it according to preset rules, and calculating a dirty data rate and a missing rate from the dirty data and the missing data; dividing the real estate data by data source, and performing quality analysis on it according to the dirty data rate and the missing rate to obtain its quality grade; and marking the real estate data with a grade label according to its quality grade. The method and device can quickly judge the quality of real estate data and label it with its quality grade, so that the data quality can be identified at a glance whenever the data is used later.

Description

Real estate data quality analysis method, device, equipment and storage medium
Technical Field
The invention relates to the field of big data technology, and in particular to a real estate data quality analysis method, device, equipment and storage medium.
Background
In the real estate transaction market, clients leave data with real estate enterprises through various channels. Once an enterprise has acquired such data, it can use it to improve subsequent client service and thus serve clients better. However, as data volumes grow rapidly and data takes increasingly varied forms, extracting valuable information from large amounts of real estate data has become a challenge. Data quality analysis is a necessary means of extracting valuable data: massive data can be screened according to the results of the quality analysis to obtain valuable information.
In the prior art, service personnel generally have to compare records one by one, judge data quality, and clean the data by manual modification to ensure quality. Because the quality analysis is performed manually, misjudgments and omissions occur easily, the workload is excessive, and efficiency is low.
Disclosure of Invention
In view of the above, it is necessary to provide a real estate data quality analysis method, apparatus, device and storage medium that solve the above technical problems.
A real estate data quality analysis method comprises the following steps: extracting real estate data from a database, where the real estate data carries a data source; identifying the real estate data, determining dirty data and missing data in it according to preset rules, and calculating a dirty data rate and a missing rate from the dirty data and the missing data; dividing the real estate data by data source, and performing quality analysis on it according to the dirty data rate and the missing rate to obtain its quality grade; and marking the real estate data with a grade label according to its quality grade.
In one embodiment, identifying the real estate data, determining the dirty data and missing data according to preset rules, and calculating the dirty data rate and missing rate specifically includes: the real estate data consists of multiple pieces of sub-data, each carrying its own sub-data source; dirty data is determined from the sub-data sources, and if a sub-data source is illegal, the corresponding sub-data is determined to be dirty data; the proportion of dirty data in the real estate data is calculated to obtain its dirty data rate; the real estate data is checked for missing sub-data, and any missing sub-data is determined to be missing data; and the proportion of missing data in the real estate data is calculated to obtain its missing rate.
In one embodiment, dividing the real estate data by data source, performing quality analysis according to the dirty data rate and the missing rate, and obtaining the quality grade specifically includes: the data sources comprise product usage records and data created by a third party; when the data source is a product usage record, the missing rate and the dirty data rate of the real estate data are identified, and its quality grade is determined from both; and when the data source is third-party data, only the missing rate is identified, and the quality grade is determined from it.
In one embodiment, when the data source is a product usage record, determining the quality grade from the missing rate and the dirty data rate specifically includes: if the missing rate is greater than 50% and the dirty data rate is greater than 20%, the quality grade is low quality; if the missing rate is between 30% and 50% and the dirty data rate is greater than 20%, or the missing rate is greater than 50% and the dirty data rate is between 10% and 20%, the quality grade is medium quality; if the missing rate is between 10% and 30% and the dirty data rate is between 10% and 20%, or the missing rate is between 30% and 50% and the dirty data rate is less than 10%, or the missing rate is greater than 50% and the dirty data rate is less than 10%, the quality grade is high quality; and if the missing rate is less than 10% and the dirty data rate is less than 10%, the quality grade is reliable.
In one embodiment, when the data source is third-party data, determining the quality grade from the missing rate specifically includes: if the missing rate is greater than 50%, the quality grade is low quality; if it is between 30% and 50%, the quality grade is medium quality; if it is between 10% and 30%, the quality grade is high quality; and if it is less than 10%, the quality grade is reliable.
In one embodiment, after the real estate data is marked with a grade label according to its quality grade, the method further comprises: storing the real estate data and the corresponding grade labels in the database, and cleaning the real estate data according to its grade labels.
In one embodiment, after the real estate data is marked with a grade label according to its quality grade, the method further comprises: counting the proportions of low-quality, medium-quality, high-quality and reliable real estate data among all real estate data to obtain a statistical result; and, according to the statistical result, dividing the real estate data into three types, namely low-medium quality data consisting of the low-quality and medium-quality real estate data, medium-high quality data consisting of the medium-quality and high-quality real estate data, and high-reliability data consisting of the high-quality and reliable real estate data, and obtaining a quality normal distribution diagram of the real estate data.
A real estate data quality analysis apparatus comprises: a data extraction module for extracting real estate data from a database, where the real estate data carries a data source; a data judgment module for identifying the real estate data, determining dirty data and missing data in it according to preset rules, and calculating a dirty data rate and a missing rate from the dirty data and the missing data; a quality analysis module for dividing the real estate data by data source, performing quality analysis on it according to the dirty data rate and the missing rate, and obtaining its quality grade; and a grade acquisition module for marking the real estate data with a grade label according to its quality grade.
A device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the real estate data quality analysis method described in the above embodiments.
A storage medium stores a computer program which, when executed by a processor, implements the steps of the real estate data quality analysis method described in the above embodiments.
Compared with the prior art, the invention has the following advantages and beneficial effects: it can quickly judge the quality of real estate data and mark the data with a quality label according to its quality grade, so that data quality can be determined conveniently and at a glance when the data is used later; the data can be cleaned quickly according to the quality label, high-value data can be obtained, and the overall quality of the real estate data is improved.
Drawings
FIG. 1 is a schematic flow chart of a real estate data quality analysis method in one embodiment;
FIG. 2 is a schematic structural diagram of a real estate data quality analysis apparatus in one embodiment;
FIG. 3 is a schematic diagram of the internal structure of the device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings by way of specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In one embodiment, as shown in FIG. 1, there is provided a real estate data quality analysis method comprising the steps of:
Step S101: extract real estate data from a database, where the real estate data carries a data source.
Specifically, the database stores the real estate data of multiple users, and this data is extracted together with its data source. The data source may be, for example, a user's use of a product or a third-party marketing consultant.
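As a rough illustration of step S101, the sketch below reads each record and its data source from a relational database. The table name `real_estate`, its columns (taken from the name/age/requirement examples given later), the source strings `product_usage` and `third_party`, and the `RealEstateRecord` class are all hypothetical; the text does not specify a schema. Python's built-in `sqlite3` module is used only to keep the example self-contained.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class RealEstateRecord:
    """One client's real estate data plus the source it came from (hypothetical model)."""
    record_id: int
    data_source: str                       # assumed values: "product_usage" or "third_party"
    sub_data: dict = field(default_factory=dict)


def extract_records(conn: sqlite3.Connection) -> list[RealEstateRecord]:
    """Step S101 sketch: read every record together with its data source.

    Assumes a hypothetical table real_estate(id, data_source, name, age, requirement).
    """
    cur = conn.execute(
        "SELECT id, data_source, name, age, requirement FROM real_estate ORDER BY id"
    )
    records = []
    for rec_id, source, name, age, requirement in cur:
        sub = {"name": name, "age": age, "requirement": requirement}
        records.append(RealEstateRecord(rec_id, source, sub))
    return records


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE real_estate (id INTEGER PRIMARY KEY, data_source TEXT,"
        " name TEXT, age INTEGER, requirement TEXT)"
    )
    conn.executemany(
        "INSERT INTO real_estate VALUES (?, ?, ?, ?, ?)",
        [(1, "product_usage", "Li", 35, "3-bedroom"),
         (2, "third_party", "Wang", None, None)],
    )
    for r in extract_records(conn):
        print(r)
```

Keeping the data source on every record means the later grading step can choose its rule per record without a second database lookup.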
Step S102: identify the real estate data, determine dirty data and missing data in it according to preset rules, and calculate its dirty data rate and missing rate from the dirty data and the missing data.
Specifically, the real estate data is identified, and its dirty data and missing data are determined according to preset rules. Dirty data is data whose source is illegal, and missing data is sub-data that is absent from the real estate data. The dirty data rate and the missing rate are the proportions of dirty data and missing data, respectively, in the real estate data.
Step S103: divide the real estate data by data source, perform quality analysis on it according to the dirty data rate and the missing rate, and obtain its quality grade.
Specifically, the real estate data is divided by data source, and quality analysis is performed separately for each source. The quality grade is determined from the dirty data rate and the missing rate: if both rates are high, the real estate data is of low quality; conversely, it is of high quality.
Step S104: mark the real estate data with a grade label according to its quality grade.
Specifically, marking the real estate data with a grade label corresponding to its quality grade makes that grade easy to check.
In this embodiment, real estate data carrying a data source is extracted from the database; dirty data and missing data are determined according to preset rules, and the dirty data rate and missing rate are calculated; the data is divided by data source and analysed according to the two rates to obtain its quality grade; and a grade label is marked on the data according to that grade. In this way, the quality of real estate data can be judged quickly, data quality can be seen at a glance when the data is used later, the data can be cleaned quickly according to its grade label, high-value data can be obtained, and the overall quality of the real estate data is improved.
Step S102 specifically includes: the real estate data consists of multiple pieces of sub-data, each carrying its own sub-data source; dirty data is determined from the sub-data sources, and if a sub-data source is illegal, the corresponding sub-data is determined to be dirty data; the proportion of dirty data in the real estate data is calculated to obtain its dirty data rate; the real estate data is checked for missing sub-data, and any missing sub-data is determined to be missing data; and the proportion of missing data in the real estate data is calculated to obtain its missing rate.
Specifically, the real estate data may include all relevant information about a given client, and each piece of sub-data is a specific item of that information, such as name, age or requirements. Each piece of sub-data carries its own sub-data source, which may be, for example, the client's use of a product or information obtained from a consultant.
Specifically, each sub-data source is checked by a legality-checking mechanism. If the source is illegal, the sub-data is determined to be dirty data, and the proportion of dirty data in the real estate data gives the dirty data rate. At the same time, the real estate data is checked for missing sub-data; any missing sub-data is determined to be missing data, and the proportion of missing data in the real estate data gives the missing rate. The quality of the real estate data is then judged from the missing rate and the dirty data rate: if both are high, the quality is low; otherwise, it is high. In addition, the two rates can serve as a reference for cleaning the real estate data, making it easy to filter out low-quality data and extract high-quality data.
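A minimal sketch of the legality check and the two rates described above, assuming each sub-data item is stored as a `(value, sub_source)` pair. The field list `EXPECTED_FIELDS`, the whitelist `LEGAL_SOURCES` standing in for the legality-checking mechanism, and the choice of the expected-field count as the denominator for both rates are assumptions; the text only speaks of a "proportion".

```python
# Hypothetical configuration: fields every record is expected to contain,
# and the sub-data sources that the legality check accepts.
EXPECTED_FIELDS = ("name", "age", "phone", "requirement")
LEGAL_SOURCES = {"product_usage", "consultant"}


def dirty_and_missing_rates(sub_data: dict) -> tuple[float, float]:
    """Return (dirty_rate, missing_rate) for one record.

    sub_data maps field name -> (value, sub_source). A field that is absent or
    whose value is None counts as missing; a present field whose sub_source is
    not in LEGAL_SOURCES counts as dirty. Both rates divide by the number of
    expected fields (an assumption).
    """
    total = len(EXPECTED_FIELDS)
    missing = 0
    dirty = 0
    for name in EXPECTED_FIELDS:
        entry = sub_data.get(name)
        if entry is None or entry[0] is None:
            missing += 1
        elif entry[1] not in LEGAL_SOURCES:
            dirty += 1
    return dirty / total, missing / total


if __name__ == "__main__":
    record = {
        "name": ("Li", "product_usage"),
        "age": (35, "scraped_list"),          # illegal sub-data source -> dirty
        "requirement": ("3-bedroom", "consultant"),
        # "phone" is absent -> missing
    }
    print(dirty_and_missing_rates(record))  # (0.25, 0.25)
```

Counting a field as either missing or dirty, never both, keeps the two rates from double-counting the same sub-data.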
Step S103 specifically includes: the data sources comprise product usage records and third-party data; when the data source is a product usage record, the missing rate and the dirty data rate of the real estate data are identified, and its quality grade is determined from both; and when the data source is third-party data, only the missing rate is identified, and the quality grade is determined from it.
Specifically, the data sources include, but are not limited to, product usage records and third-party data. When the data source is a product usage record, the missing rate and the dirty data rate of the real estate data are identified, and its quality grade is determined from both. Third-party data is usually information that a business consultant has collected from the client, so it is treated as legal by default and contains no dirty data; for real estate data from this source, only the missing rate needs to be identified, and the quality grade is determined from it.
Specifically, when the data source is a product usage record: if the missing rate is greater than 50% and the dirty data rate is greater than 20%, the quality grade is low quality; if the missing rate is between 30% and 50% and the dirty data rate is greater than 20%, or the missing rate is greater than 50% and the dirty data rate is between 10% and 20%, the quality grade is medium quality; if the missing rate is between 10% and 30% and the dirty data rate is between 10% and 20%, or the missing rate is between 30% and 50% and the dirty data rate is less than 10%, or the missing rate is greater than 50% and the dirty data rate is less than 10%, the quality grade is high quality; and if the missing rate is less than 10% and the dirty data rate is less than 10%, the quality grade is reliable.
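The product usage rules can be transcribed as a lookup table keyed on (missing-rate band, dirty-rate band). This is a sketch, not a definitive implementation: the text does not say which band a boundary value such as exactly 30% or 50% belongs to, so the band edges below are an assumption, and combinations the rules do not list (for example, a missing rate under 10% with a dirty data rate over 20%) return `None` rather than a guessed grade. All function names are hypothetical.

```python
from typing import Optional


def missing_band(rate: float) -> str:
    """Assumed band edges: <10%, 10% to under 30%, 30% to 50% inclusive, >50%."""
    if rate < 0.10:
        return "<10"
    if rate < 0.30:
        return "10-30"
    if rate <= 0.50:
        return "30-50"
    return ">50"


def dirty_band(rate: float) -> str:
    """Assumed band edges: <10%, 10% to 20% inclusive, >20%."""
    if rate < 0.10:
        return "<10"
    if rate <= 0.20:
        return "10-20"
    return ">20"


# (missing-rate band, dirty-rate band) -> grade, transcribed from the rules in the text.
PRODUCT_USAGE_RULES = {
    (">50", ">20"): "low quality",
    ("30-50", ">20"): "medium quality",
    (">50", "10-20"): "medium quality",
    ("10-30", "10-20"): "high quality",
    ("30-50", "<10"): "high quality",
    (">50", "<10"): "high quality",
    ("<10", "<10"): "reliable",
}


def grade_product_usage(missing_rate: float, dirty_rate: float) -> Optional[str]:
    """Grade product-usage data; returns None for combinations the text does not cover."""
    return PRODUCT_USAGE_RULES.get((missing_band(missing_rate), dirty_band(dirty_rate)))


if __name__ == "__main__":
    print(grade_product_usage(0.20, 0.15))  # high quality
    print(grade_product_usage(0.05, 0.30))  # None (combination not covered by the rules)
```

A table keeps the transcription auditable line by line against the rule text, and makes uncovered combinations visible instead of silently falling into a default grade.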
Specifically, when the data source is third-party data: if the missing rate is greater than 50%, the quality grade is low quality; if it is between 30% and 50%, the quality grade is medium quality; if it is between 10% and 30%, the quality grade is high quality; and if it is less than 10%, the quality grade is reliable.
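For third-party data only the missing rate matters, so the rule reduces to a chain of thresholds. The text leaves the boundaries open; this sketch assumes that exactly 50% and exactly 30% count as medium quality and exactly 10% as high quality. The function name `grade_third_party` is hypothetical.

```python
def grade_third_party(missing_rate: float) -> str:
    """Grade third-party data, which is assumed legal, from its missing rate alone.

    Boundary handling is an assumption: 0.30 <= rate <= 0.50 is medium quality,
    0.10 <= rate < 0.30 is high quality.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be a fraction between 0 and 1")
    if missing_rate > 0.50:
        return "low quality"
    if missing_rate >= 0.30:
        return "medium quality"
    if missing_rate >= 0.10:
        return "high quality"
    return "reliable"


if __name__ == "__main__":
    print(grade_third_party(0.42))  # medium quality
```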
Once the quality grade has been determined, the real estate data is marked with the corresponding label (low quality, medium quality, high quality or reliable), so that its quality grade can be checked conveniently and at a glance.
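The labelling step can be sketched by dispatching each record to a grading rule chosen by its data source and attaching the resulting label. `label_records`, `QualityLabel`, and the record keys `data_source` and `grade_label` are hypothetical names; the grading callables in the usage example are deliberately simplified stand-ins, not the threshold rules given above.

```python
from enum import Enum
from typing import Callable


class QualityLabel(Enum):
    """The four grade labels named in the text."""
    LOW = "low quality"
    MEDIUM = "medium quality"
    HIGH = "high quality"
    RELIABLE = "reliable"


def label_records(records: list[dict],
                  graders: dict[str, Callable[[dict], str]]) -> list[dict]:
    """Return copies of `records` with a 'grade_label' key added.

    `graders` maps a data source to a callable that grades one record.
    Unknown sources raise KeyError; grades outside the four labels raise ValueError.
    """
    labelled = []
    for rec in records:
        source = rec["data_source"]
        if source not in graders:
            raise KeyError(f"no grading rule for data source {source!r}")
        label = QualityLabel(graders[source](rec))
        labelled.append({**rec, "grade_label": label.value})  # input dict left untouched
    return labelled


if __name__ == "__main__":
    # Stand-in graders for illustration only.
    graders = {
        "third_party": lambda r: "reliable" if r["missing_rate"] < 0.10 else "low quality",
        "product_usage": lambda r: "medium quality",
    }
    rows = [{"id": 1, "data_source": "third_party", "missing_rate": 0.05},
            {"id": 2, "data_source": "product_usage", "missing_rate": 0.40}]
    for row in label_records(rows, graders):
        print(row["id"], row["grade_label"])
```

Converting through the `QualityLabel` enum rejects any grade string outside the four labels, so a grading rule that returns an unexpected value (such as `None` for an uncovered combination) fails loudly instead of producing an unlabelled record.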
After step S104, the method further includes: storing the real estate data and the corresponding grade labels in the database, and cleaning the real estate data according to the grade labels.
Specifically, after the real estate data has been marked with grade labels, the labelled data is stored in the database, and the grade labels serve as a reference for data cleaning; for example, all data labelled low quality may be removed. This increases confidence in the real estate data and allows client requirements to be analysed more accurately.
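One possible way to persist labels and clean by label, again using `sqlite3` so the example is self-contained. The table `labelled_real_estate` and both helper functions are hypothetical; deleting every row labelled low quality follows the cleaning example given in the text.

```python
import sqlite3


def store_labelled(conn: sqlite3.Connection, rows: list[tuple[int, str, str]]) -> None:
    """Store (record_id, data_source, grade_label) rows, replacing any older label."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS labelled_real_estate ("
        " record_id INTEGER PRIMARY KEY,"
        " data_source TEXT NOT NULL,"
        " grade_label TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO labelled_real_estate VALUES (?, ?, ?)", rows
    )
    conn.commit()


def clean_by_label(conn: sqlite3.Connection, label: str = "low quality") -> int:
    """Delete all rows carrying `label` and return the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM labelled_real_estate WHERE grade_label = ?", (label,)
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_labelled(conn, [(1, "product_usage", "low quality"),
                          (2, "third_party", "reliable")])
    print(clean_by_label(conn))  # 1
```

Because the label is stored next to the record, cleaning becomes a single parameterised query rather than a re-analysis of the data.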
After step S104, the method further includes: counting the proportions of low-quality, medium-quality, high-quality and reliable real estate data among all real estate data to obtain a statistical result; and, according to the statistical result, dividing the real estate data into three types, namely low-medium quality data consisting of the low-quality and medium-quality real estate data, medium-high quality data consisting of the medium-quality and high-quality real estate data, and high-reliability data consisting of the high-quality and reliable real estate data, and obtaining a quality normal distribution diagram of the real estate data.
Specifically, the real estate data of each quality grade is counted, the data is divided into the three types of low-medium quality data, medium-high quality data and high-reliability data, and a corresponding quality normal distribution diagram is obtained from these three types. When part of the real estate data in the database is used, its quality can be judged from the quality normal distribution diagram using the three-sigma criterion and the result displayed, which gives the confidence level of the data being used and helps ensure the accuracy of the analysis results.
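The statistics step can be sketched as follows. The per-grade proportions and the three overlapping groups follow the text. The numeric scores 1 to 4 assigned to the grades, and reading the three-sigma criterion as the interval mean ± 3σ of those scores, are assumptions made only so that a mean and standard deviation exist; the text does not define how the normal distribution diagram is built.

```python
from collections import Counter
from statistics import mean, pstdev

GRADES = ("low quality", "medium quality", "high quality", "reliable")

# The three overlapping groups named in the text.
GROUPS = {
    "low-medium quality": ("low quality", "medium quality"),
    "medium-high quality": ("medium quality", "high quality"),
    "high-reliability": ("high quality", "reliable"),
}

# Hypothetical numeric scores; the text does not assign numbers to grades.
SCORES = {"low quality": 1, "medium quality": 2, "high quality": 3, "reliable": 4}


def grade_proportions(labels: list[str]) -> dict[str, float]:
    """Share of each grade among all labelled records."""
    if not labels:
        raise ValueError("no labelled records")
    counts = Counter(labels)
    return {g: counts[g] / len(labels) for g in GRADES}


def group_proportions(props: dict[str, float]) -> dict[str, float]:
    """Share of each group; groups overlap, so the values need not sum to 1."""
    return {name: sum(props[g] for g in members) for name, members in GROUPS.items()}


def three_sigma_bounds(labels: list[str]) -> tuple[float, float]:
    """Interval mean +/- 3 population standard deviations of the assumed scores."""
    scores = [SCORES[label] for label in labels]
    mu, sigma = mean(scores), pstdev(scores)
    return mu - 3 * sigma, mu + 3 * sigma


if __name__ == "__main__":
    labels = ["low quality", "medium quality", "high quality", "high quality", "reliable"]
    props = grade_proportions(labels)
    print(props)
    print(group_proportions(props))
    print(three_sigma_bounds(labels))
```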
As shown in FIG. 2, a real estate data quality analysis apparatus 20 is provided, comprising a data extraction module 21, a data judgment module 22, a grade acquisition module 23 and a label module 24, where:
the data extraction module 21 is configured to extract real estate data from a database, where the real estate data carries a data source;
the data judgment module 22 is configured to identify the real estate data, determine dirty data and missing data in it according to preset rules, and calculate its dirty data rate and missing rate from the dirty data and the missing data;
the grade acquisition module 23 is configured to divide the real estate data by data source, perform quality analysis on it according to the dirty data rate and the missing rate, and obtain its quality grade; and
the label module 24 is configured to mark the real estate data with a quality grade label according to its quality grade.
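The module structure of apparatus 20 can be pictured as four pluggable callables wired in the order of steps S101 to S104. The class name, field names and call signatures are hypothetical, and the lambdas in the usage example are placeholders rather than the rules described above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class RealEstateQualityAnalyzer:
    """Hypothetical wiring of apparatus 20; each field stands for one module."""
    extract: Callable[[], Iterable[dict]]               # data extraction module 21
    judge: Callable[[dict], tuple[float, float]]        # data judgment module 22: (dirty, missing)
    grade: Callable[[str, float, float], str]           # grade acquisition module 23
    label: Callable[[dict, str], dict]                  # label module 24

    def run(self) -> list[dict]:
        """Apply steps S101 to S104 to every extracted record."""
        results = []
        for record in self.extract():
            dirty_rate, missing_rate = self.judge(record)
            grade = self.grade(record["data_source"], dirty_rate, missing_rate)
            results.append(self.label(record, grade))
        return results


if __name__ == "__main__":
    analyzer = RealEstateQualityAnalyzer(
        extract=lambda: [{"id": 1, "data_source": "third_party"}],
        judge=lambda rec: (0.0, 0.05),
        grade=lambda source, dirty, missing: "reliable" if missing < 0.10 else "low quality",
        label=lambda rec, g: {**rec, "grade_label": g},
    )
    print(analyzer.run())
```

Injecting each module as a callable lets one module, for example the grading rules, be swapped or tested in isolation without touching the others.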
In one embodiment, the data judgment module 22 is specifically configured as follows: the real estate data consists of multiple pieces of sub-data, each carrying its own sub-data source; the module determines dirty data from the sub-data sources, treating any sub-data whose source is illegal as dirty data; calculates the proportion of dirty data in the real estate data to obtain the dirty data rate; checks the real estate data for missing sub-data, treating any missing sub-data as missing data; and calculates the proportion of missing data in the real estate data to obtain the missing rate.
In one embodiment, the grade acquisition module 23 is specifically configured as follows: the data sources comprise product usage records and third-party data; when the data source is a product usage record, the module identifies the missing rate and the dirty data rate of the real estate data and determines its quality grade from both; and when the data source is third-party data, it identifies the missing rate and determines the quality grade from it.
In one embodiment, the real estate data quality analysis apparatus 20 is further configured to store the real estate data and the corresponding grade labels in the database, and to clean the real estate data according to its grade labels.
In one embodiment, the real estate data quality analysis apparatus 20 is further configured to: count the proportions of low-quality, medium-quality, high-quality and reliable real estate data among all real estate data to obtain a statistical result; and, according to the statistical result, divide the real estate data into three types, namely low-medium quality data consisting of the low-quality and medium-quality real estate data, medium-high quality data consisting of the medium-quality and high-quality real estate data, and high-reliability data consisting of the high-quality and reliable real estate data, and obtain a quality normal distribution diagram of the real estate data.
In one embodiment, a device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 3. The device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the device is configured to provide computing and control capabilities. The memory of the device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the device is used for storing configuration templates and also can be used for storing target webpage data. The network interface of the device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a real estate data quality analysis method.
Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation on the devices to which the present application may be applied, and that a particular device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a storage medium may also be provided, the storage medium storing a computer program comprising program instructions which, when executed by a computer, which may be part of the aforementioned real estate data quality analysis apparatus, cause the computer to perform the method according to the preceding embodiment.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a computer storage medium (ROM/RAM, magnetic disk, optical disk) and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may each be fabricated into an individual integrated circuit module, or several of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is a more detailed description of the present invention that is presented in conjunction with specific embodiments, and the practice of the invention is not to be considered limited to those descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A real estate data quality analysis method is characterized by comprising the following steps:
extracting real estate data from a database, wherein the real estate data carries a data source;
identifying the real estate data, judging dirty data and missing data in the real estate data according to a preset rule, and calculating a dirty data rate and a missing rate in the real estate data according to the dirty data and the missing data;
classifying the real estate data by data source, and performing quality analysis on the real estate data according to the dirty data rate and the missing rate to obtain the quality grade of the real estate data;
and marking a grade label on the real estate data according to the quality grade of the real estate data.
2. The method as claimed in claim 1, wherein the identifying the real estate data, determining dirty data and missing data in the real estate data according to a preset rule, and calculating a dirty data rate and a missing rate of the real estate data according to the dirty data and the missing data comprises:
the real estate data is composed of a plurality of subdata, and the subdata carries corresponding subdata sources;
judging dirty data in the real estate data according to the subdata sources, and if the subdata sources are illegal, determining the corresponding subdata as the dirty data;
calculating the proportion of the dirty data in the real estate data, and acquiring the dirty data rate of the real estate data;
detecting whether subdata is missing in the real estate data, and if the subdata is missing, determining the corresponding subdata as missing data;
and calculating the proportion of the missing data in the real estate data to obtain the missing rate of the real estate data.
3. The method as claimed in claim 1, wherein the step of classifying the real estate data by data source, performing quality analysis on the real estate data according to the dirty data rate and the missing rate, and obtaining the quality grade of the real estate data comprises:
the data source comprises a product use record and data established based on a third party;
when the data source is a product use record, identifying the missing rate and the dirty data rate of the real estate data, and judging the quality grade of the real estate data according to the missing rate and the dirty data rate;
and when the data source is data established based on a third party, identifying the missing rate of the real estate data, and judging the quality grade of the real estate data according to the missing rate.
4. The method as claimed in claim 3, wherein the identifying the missing rate and the dirty data rate of the real estate data when the data source is the product usage record, and the determining the quality grade of the real estate data according to the missing rate and the dirty data rate specifically comprises:
when the missing rate of the real estate data is more than 50% and the dirty data rate is more than 20%, the quality grade of the real estate data is determined to be low quality;
when the missing rate of the real estate data is between 30% and 50% and the dirty data rate is more than 20%, or the missing rate is more than 50% and the dirty data rate is between 10% and 20%, the quality grade of the real estate data is determined to be medium quality;
when the missing rate of the real estate data is between 10% and 30% and the dirty data rate is between 10% and 20%, or the missing rate is between 30% and 50% and the dirty data rate is less than 10%, or the missing rate is more than 50% and the dirty data rate is less than 10%, the quality grade of the real estate data is determined to be high quality;
and when the missing rate of the real estate data is less than 10% and the dirty data rate is less than 10%, the quality grade of the real estate data is determined to be reliable.
5. The method as claimed in claim 3, wherein identifying the missing rate of the real estate data when the data source is data established based on a third party, and determining the quality grade of the real estate data according to the missing rate, comprises:
when the missing rate of the real estate data is more than 50%, the quality grade of the real estate data is determined to be low quality;
when the missing rate of the real estate data is between 30% and 50%, the quality grade of the real estate data is determined to be medium quality;
when the missing rate of the real estate data is between 10% and 30%, the quality grade of the real estate data is determined to be high quality;
and when the missing rate of the real estate data is less than 10%, the quality grade of the real estate data is determined to be reliable.
6. The method as claimed in claim 1, further comprising, after the grade label is applied to the real estate data according to the quality grade of the real estate data:
and storing the real estate data and the corresponding grade labels into a database, and cleaning the real estate data according to the grade labels of the real estate data.
7. The real estate data quality analysis method according to claim 4 or claim 5, further comprising, after the grade label is applied to the real estate data according to the quality grade of the real estate data:
counting the respective proportions of low-quality, medium-quality, high-quality and reliable real estate data among all real estate data to obtain a statistical result;
and dividing the real estate data, according to the statistical result, into three types, namely low-medium quality data consisting of the low-quality and medium-quality real estate data, medium-high quality data consisting of the medium-quality and high-quality real estate data, and high-reliability data consisting of the high-quality and reliable real estate data, to obtain a normal-distribution diagram of the quality of the real estate data.
8. A real estate data quality analysis apparatus comprising:
the data extraction module is used for extracting real estate data from a database, wherein the real estate data carries a data source;
the data judgment module is used for identifying the real estate data, judging dirty data and missing data in the real estate data according to a preset rule, and calculating a dirty data rate and a missing rate in the real estate data according to the dirty data and the missing data;
the quality analysis module is used for classifying the real estate data by data source, performing quality analysis on the real estate data according to the dirty data rate and the missing rate, and obtaining the quality grade of the real estate data;
and the grade acquisition module is used for marking a grade label on the real estate data according to the quality grade of the real estate data.
9. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, realizing the steps of the method of any one of claims 1 to 7.
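The rate calculation of claim 2, the source-dependent grading of claims 3 to 5, and the three-group statistics of claim 7 can be sketched in Python as below. This is an illustrative sketch, not the applicant's implementation. The `SubData` record shape, the hypothetical `ALLOWED_SOURCES` whitelist used to decide that a subdata source is "illegal", the source labels `"product_use_record"` and `"third_party"`, the choice to check dirtiness only on non-missing items, and the inclusion of band boundaries (for example, whether exactly 30% falls in the 10% to 30% or the 30% to 50% band) are all assumptions. Rate combinations that claim 4 does not assign to any grade return `None`.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical whitelist of legal subdata sources; the claims do not say
# how a subdata source is judged "illegal".
ALLOWED_SOURCES = {"registry", "agent_app"}

# Hypothetical labels for the two record-level data sources of claim 3.
PRODUCT_USE = "product_use_record"
THIRD_PARTY = "third_party"


@dataclass
class SubData:
    value: Optional[str]   # None means this subdata item is missing
    source: Optional[str]  # source label carried by the subdata item


def rates(items: list[SubData]) -> tuple[float, float]:
    """Return (missing_rate, dirty_rate) over all subdata items of one record."""
    total = len(items)
    if total == 0:
        return 1.0, 0.0  # assumption: an empty record counts as fully missing
    missing = sum(1 for s in items if s.value is None)
    # Assumption: only present items are checked for an illegal source.
    dirty = sum(1 for s in items
                if s.value is not None and s.source not in ALLOWED_SOURCES)
    return missing / total, dirty / total


def missing_band(m: float) -> int:
    # Bands <10%, 10-30%, 30-50%, >50%; boundary inclusion is an assumption.
    if m < 0.10:
        return 0
    if m < 0.30:
        return 1
    if m <= 0.50:
        return 2
    return 3


def dirty_band(d: float) -> int:
    # Bands <10%, 10-20%, >20%; boundary inclusion is an assumption.
    if d < 0.10:
        return 0
    if d <= 0.20:
        return 1
    return 2


# Claim 4 as literally worded: (missing band, dirty band) -> grade.
PRODUCT_USE_GRADES = {
    (3, 2): "low",
    (2, 2): "medium", (3, 1): "medium",
    (1, 1): "high", (2, 0): "high", (3, 0): "high",
    (0, 0): "reliable",
}

# Claim 5: third-party data is graded on the missing rate alone.
THIRD_PARTY_GRADES = {3: "low", 2: "medium", 1: "high", 0: "reliable"}


def grade(data_source: str, items: list[SubData]) -> Optional[str]:
    """Return the quality grade, or None for combinations the claims leave open."""
    m, d = rates(items)
    if data_source == PRODUCT_USE:
        return PRODUCT_USE_GRADES.get((missing_band(m), dirty_band(d)))
    if data_source == THIRD_PARTY:
        return THIRD_PARTY_GRADES[missing_band(m)]
    return None


def group_shares(grades: list[str]) -> dict[str, float]:
    """Claim 7 sketch: share of each grade, summed into three overlapping groups."""
    counts = Counter(grades)
    n = len(grades) or 1
    share = {g: counts[g] / n for g in ("low", "medium", "high", "reliable")}
    return {
        "low-medium": share["low"] + share["medium"],
        "medium-high": share["medium"] + share["high"],
        "high-reliable": share["high"] + share["reliable"],
    }
```

Encoding claim 4 as a lookup table over (missing band, dirty band) pairs makes it easy to see which combinations the claim covers. For example, a missing rate of 10% to 30% combined with a dirty data rate above 20% is not assigned any grade by the claim as worded, so the sketch returns `None` for it.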
CN202110618734.6A 2021-06-03 2021-06-03 Real estate data quality analysis method, device, equipment and storage medium Pending CN113392096A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110618734.6A CN113392096A (en) 2021-06-03 2021-06-03 Real estate data quality analysis method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110618734.6A CN113392096A (en) 2021-06-03 2021-06-03 Real estate data quality analysis method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113392096A true CN113392096A (en) 2021-09-14

Family

ID=77618071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110618734.6A Pending CN113392096A (en) 2021-06-03 2021-06-03 Real estate data quality analysis method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113392096A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593352A (en) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Method and device for cleaning mass data
CN108734405A (en) * 2018-05-24 2018-11-02 国信优易数据有限公司 Data value evaluation platform and method
CN108876481A (en) * 2018-07-19 2018-11-23 万翼科技有限公司 Real estate information statistical method, server and computer-readable storage medium
CN109285092A (en) * 2017-07-20 2019-01-29 金东珉 Internet real estate information providing system and real estate trust transaction service providing system
CN110232061A (en) * 2019-06-20 2019-09-13 国网上海市电力公司 Quality control method for multi-source data of a power distribution network
KR102041621B1 (en) * 2019-02-25 2019-11-06 (주)미디어코퍼스 System for providing artificial intelligence based dialogue type corpus analyze service, and building method therefor
CN110472109A (en) * 2019-07-30 2019-11-19 深圳中科保泰科技有限公司 Mobilism data quality analysis method and platform system
CN110727665A (en) * 2019-09-23 2020-01-24 江河瑞通(北京)技术有限公司 Internet of things equipment reported data quality analysis method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
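Yu Dongjin (俞东进): "Research on Service-Based Decision Support Systems", China Excellent Doctoral and Master's Dissertations Full-text Database (Doctoral), Economics and Management Science Series *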

Similar Documents

Publication Publication Date Title
CN106874134B (en) Work order type processing method, device and system
CN112491611B (en) Fault location system, method, apparatus, electronic device, and computer readable medium
CN109801151B (en) Financial falsification risk monitoring method, device, computer equipment and storage medium
US9965841B2 (en) Monitoring system based on image analysis of photos
CN107016298B (en) Webpage tampering monitoring method and device
JP2014132455A (en) Risk assessment and system for security of industrial installation
CN112434178A (en) Image classification method and device, electronic equipment and storage medium
CN110647523B (en) Data quality analysis method and device, storage medium and electronic equipment
CN106301979B (en) Method and system for detecting abnormal channel
CN115205766A (en) Block chain-based network security abnormal video big data detection method and system
CN114840286B (en) Service processing method and server based on big data
CN116346456A (en) Business logic vulnerability attack detection model training method and device
CN110019762B (en) Problem positioning method, storage medium and server
CN112819476A (en) Risk identification method and device, nonvolatile storage medium and processor
CN113392096A (en) Real estate data quality analysis method, device, equipment and storage medium
CN112445687A (en) Blocking detection method of computing equipment and related device
CN114817518B (en) License handling method, system and medium based on big data archive identification
CN112866295B (en) Big data crawler-prevention processing method and cloud platform system
CN112231272B (en) Information processing method based on remote online office and computer equipment
CN114553473A (en) Abnormal login behavior detection system and method based on login IP and login time
CN108235324B (en) Short message template testing method and server
CN108268988B (en) Grain purchasing business management method and system
CN116912603B (en) Pre-labeling screening method, related device, equipment and medium
CN111105263A (en) User identification method and device, electronic equipment and storage medium
CN109151579B (en) Method, device and equipment for testing whether web video traffic is correctly identified

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914