CN116362429B

CN116362429B - Urban house lease prediction method and system based on community grid information acquisition

Info

Publication number: CN116362429B
Application number: CN202310642137.6A
Authority: CN
Inventors: 陈平; 谢江龙; 郭望; 郭劲军; 苏炳辉; 宁永鹏; 肖高云
Original assignee: Xiamen Sunsharing Information Technology Co ltd
Current assignee: Xiamen Sunsharing Information Technology Co ltd
Priority date: 2023-06-01
Filing date: 2023-06-01
Publication date: 2023-08-11
Anticipated expiration: 2043-06-01
Also published as: CN116362429A

Abstract

The invention relates to the technical field of data prediction, and in particular to a metropolitan area house lease prediction method and system based on community grid information acquisition. The prediction method comprises: acquiring a historical data set of house lease information and constructing characteristic identifiers for it to obtain a historical identification data set; training a constructed first logistic regression model and a constructed second logistic regression model on the historical identification data set to obtain a first model and a second model; acquiring a data set of existing house lease information and constructing characteristic identifiers for it to obtain an existing identification data set; and inputting the existing identification data set into the first model and the second model to predict whether an existing house is in a rented state and how long its resident will stay. The method can effectively assist community personnel in verifying house rental information, and can lengthen the visit interval and reduce the visiting workload while improving the efficiency and accuracy of information processing.

Description

Urban house lease prediction method and system based on community grid information acquisition

Technical Field

The invention relates to the technical field of data prediction, in particular to a metropolitan area house lease prediction method and system based on community grid information acquisition.

Background

With economic development and accelerating urbanization, the floating population of cities keeps increasing, which makes the house rental market expand rapidly. In community grid management in particular, communities receive instructions from government departments and carry out assigned tasks, telephone contact, visits and other work to collect relevant information on all residents and households, so that the specific conditions of the community are accurately known. Verifying house rental information is one of the basic tasks of community work and requires community staff to make verification visits periodically over a long time. The information verified is mainly whether the house is currently rented and whether the tenant has changed.

Verifying house rental information through manual, periodic door-to-door visits has the following technical problems: 1. human resources are wasted, and the large number of houses to visit makes the workload heavy and verification difficult, so information processing is inefficient; 2. because tenants are mobile, periodic verification often lags behind reality, so information is updated inefficiently, and the accuracy of updates can only be ensured by shortening the visit interval.

Disclosure of Invention

In order to solve at least one defect of the prior art in the information verification of house renting tasks, the invention provides a metropolitan area house renting prediction method and system based on community grid information acquisition, so that the information verification efficiency is effectively improved.

In a first aspect, an embodiment of the present invention provides a metropolitan area house lease prediction method based on community grid information collection, including:

acquiring a historical data set of house lease information, and constructing a characteristic identifier of the historical data set of the house lease information to obtain a historical identifier data set;

constructing a first logistic regression model and a second logistic regression model, and training the first logistic regression model and the second logistic regression model according to the historical identification data set to obtain a first model and a second model; the first model is used for predicting whether the house is in a renting state or not, and the second model is used for predicting the residence time of the resident of the house;

acquiring a data set of existing house lease information, and constructing a characteristic identifier of the data set of the existing house lease information to obtain an existing identifier data set; the existing identification data set is input into the first model and the second model to predict whether the existing house is in a rented state and predict the residence time of the resident of the existing house.

In an embodiment, the historical data set of house lease information includes all house information in the target area, together with the resident information, move-in time and move-out time corresponding to each house; the house information includes at least a house address, and the resident information includes at least the resident's name, gender and age.

In one embodiment, the construction of the feature identification of the historical data set of house lease information includes the steps of:

according to the historical data set of the house leasing information, numbering and marking all houses to obtain house numbers, and carrying out name marking on all residents to obtain resident names;

generating a resident time chain data set according to the historical data set of house lease information, wherein the resident time chain data set comprises all resident names and the corresponding house numbers, move-in times and move-out times;

based on the resident time chain data set, extracting persons whose residence periods in the same house intersect, to construct a resident and co-resident information data set; the resident and co-resident information data set comprises the resident name, the house number, the corresponding co-resident name, the resident's move-in time and move-out time, the co-resident's move-in time and move-out time, and the number of months in which their residence periods intersect;

determining a relationship identifier of a resident and a co-resident by acquiring and integrating employment information, household registration and family planning information in the target area, wherein the relationship identifier comprises a colleague relationship, a relative relationship and other relationships;

constructing a historical identification data set based on the resident and co-resident information data set and the relationship identifier of the resident and the co-resident;

the historical identification data set comprises a first data set and a second data set, wherein the first data set comprises a house number, historical resident information corresponding to the house and resident information corresponding to a building where the house is located, and is used as a training sample of a first logistic regression model;

the second data set includes resident names, historical resident information of the resident, and corresponding historical co-resident information for use as training samples for a second logistic regression model.

In one embodiment, constructing the historical identification data set based on the resident and co-resident information data set and the relationship identifier of the resident and the co-resident specifically includes the following steps:

constructing a co-resident characteristic detail data set for the residents of the current house;

summarizing the co-resident characteristic detail data set and the resident time chain data set to construct a resident characteristic detail data set;

building a house characteristic detail data set according to the resident characteristic detail data set;

carrying out data extraction on the resident characteristic detail data set, the house characteristic detail data set, resident information and house information to obtain a first data set and a second data set;

the co-resident characteristic detail data set at least comprises: the co-resident name, the corresponding resident name, the residence duration, the number of co-residents, the total co-residence duration, the average co-residence duration, and the relationship identifier of the co-resident and the resident;

the resident characteristic detail data set at least comprises: the resident name, the corresponding house number, the resident's historical total residence duration, the resident's average residence duration, the number of co-residents in a relative relationship with the resident, the number of months the resident has lived together with relatives, the number of co-residents in a colleague relationship with the resident, the number of months the resident has lived together with colleagues, and the number of houses the resident has historically lived in;

the house characteristic detail data set comprises at least: the house number and the number of the resident of the historical resident corresponding to the house, the number of the resident of the historical resident, the average resident number of the historical resident, the relation mark of the historical resident and the resident, the average common resident number of the historical resident and the number of the resident in the same year, and the number of the personnel change of the house in the last year.

In an embodiment, before training the first logistic regression model and the second logistic regression model according to the historical identification dataset, special data culling is performed on the historical identification dataset;

the special data culling includes at least one of:

if the ratio of houses in which the resident and the co-residents have the same gender identifier to all houses in the building exceeds 50%, the age variance of the resident and the co-residents in the same house is lower than a preset variance value, and the ratio of houses in which the move-in times of the resident and the co-residents differ by no more than one month and their move-out times differ by no more than one month to all houses in the building exceeds a preset ratio, removing all house information and resident information of the building and preliminarily judging the building to be a school dormitory;

if the ratio of houses in which the resident and the co-residents are in a colleague relationship to all houses in the building exceeds a preset value, removing all house information and resident information of the building and preliminarily judging the building to be an enterprise dormitory;

if, in the building where the house is located, the age variance of the resident and the co-residents in the same house is higher than a preset variance value and a change of occupants is found at every verification, removing all house information and resident information of the building and preliminarily judging the building to be a hotel.

In an embodiment, the first and second logistic regression models employ sigmoid functions.

In an embodiment, the method further comprises the steps of: and verifying the prediction results of the first model and the second model, and adding corresponding verification data into the historical data set or the historical identification data set of the house lease information to correct and train the first model and the second model.

In a second aspect, an embodiment of the present invention provides a metropolitan area house lease prediction system based on community grid information collection, including:

the data processing module is used for acquiring a historical data set of house lease information and constructing characteristic identifiers of the historical data set of the house lease information to obtain a historical identification data set;

the model training module is used for constructing a first logistic regression model and a second logistic regression model, and training the first logistic regression model and the second logistic regression model according to the historical identification data set to obtain the first model and the second model; the first model is used for predicting whether the house is in a renting state or not, and the second model is used for predicting the moving-away time of the resident of the house;

The data prediction module is used for acquiring a data set of the existing house lease information and constructing characteristic identifiers of the data set of the existing house lease information to obtain an existing identification data set; the existing identification data set is input into the first model and the second model to predict whether the existing house is in a rented state and predict the residence time of the resident of the existing house.

In a third aspect, an embodiment of the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the metropolitan area house lease prediction method based on community grid information collection as described in any one of the embodiments of the first aspect above.

In a fourth aspect, an embodiment of the present invention provides an electronic device, including at least one processor, and a memory communicatively connected to the processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the processor to perform the metropolitan area house lease prediction method based on community grid information collection as in any one of the embodiments of the first aspect above.

Based on the above, compared with the prior art, the metropolitan area house lease prediction method based on community grid information collection provided by the embodiment of the invention mines the house lease information implicit in community visit data and forms a historical identification data set to train the models, thereby effectively assisting community personnel in verifying house rental information and optimizing resource management. The method not only improves the efficiency and accuracy of information processing, but also lengthens the visit interval and reduces the visiting workload.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

For a clearer description of embodiments of the invention or of the solutions of the prior art, the drawings that are needed in the description of the embodiments or of the prior art will be briefly described, it being obvious that the drawings in the description below are some embodiments of the invention, and that other drawings can be obtained from them without inventive effort for a person skilled in the art; the positional relationships described in the drawings in the following description are based on the orientation of the elements shown in the drawings unless otherwise specified.

FIG. 1 is a flow chart of steps of a method for predicting house rentals in a metropolitan area based on community grid information collection provided by an embodiment of the invention;

FIG. 2 is a flowchart of the steps for constructing a historical identification dataset;

FIG. 3 is a statistical chart of the test samples and training samples when the second model of an embodiment of the present invention is applied to nearly one year of data from a certain region;

FIG. 4 is a schematic structural diagram of a metropolitan area house lease prediction system based on community grid information collection according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention; the technical features designed in the different embodiments of the invention described below can be combined with each other as long as they do not conflict with each other; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the description of the present invention, it should be noted that all terms used in the present invention (including technical terms and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs and are not to be construed as limiting the present invention; it will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

In order to effectively solve at least one defect of information verification of house renting tasks in the prior art, the invention provides a metropolitan area house renting prediction method and system based on community grid information acquisition, so that the efficiency of information verification is effectively improved.

The following detailed description is made with reference to specific embodiments and accompanying drawings.

Example 1

Referring to fig. 1, an embodiment of the present invention provides a metropolitan area house lease prediction method based on community grid information collection, including the steps of:

step S10, acquiring a historical data set of house lease information, and constructing characteristic identifiers of the historical data set of the house lease information to obtain a historical identification data set.

Specifically, the historical data set of house lease information comprises all house information in the target area, together with the resident information, move-in time and move-out time corresponding to each house. The house information and the resident information can be obtained from the community's files or from household registration information. The house information at least comprises a house address, namely the detailed address of the house within the target area: community, building/unit and house number. The resident information at least comprises the resident's name, gender and age; the gender and age can be derived from the resident's identity card number.

In addition, the living time and the moving-away time of the house where the resident is located can be directly and effectively obtained through a community file or a visit inquiry mode, and can also be mined from the past community visit data.

For example, each community visit record necessarily contains the visiting staff member, the house visited, the resident visited and the time of the visit. The data may contain several visit records at different points in time for the same resident, and likewise for the same house. After sorting the records chronologically, a move can be identified wherever two consecutive records of the same resident refer to different houses. The time of the resident's first visit record in a house is taken as the move-in time, and the time of the resident's first visit record in the next house is taken as the move-out time, which gives the move-in and move-out times of the resident for that house. Times obtained in this way tend to lag behind the real events and are uncertain. To keep the subsequent models accurate, the move-in and move-out times can first be obtained in this way and then gradually corrected through correction training, which is described in the subsequent steps.
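The derivation of move-in and move-out times from visit records can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout `(resident, house, visit_date)` and the function name `build_time_chain` are assumptions introduced here, and the latest known stay is left open-ended with `None`.

```python
from collections import defaultdict
from datetime import date


def build_time_chain(visits):
    """Turn visit records into (resident, house, move_in, move_out) rows.

    visits: iterable of (resident, house, visit_date) tuples (assumed layout).
    move_out is None for a resident's latest known house.
    """
    by_resident = defaultdict(list)
    for resident, house, when in visits:
        by_resident[resident].append((when, house))

    chain = []
    for resident, records in by_resident.items():
        records.sort()  # chronological order per resident
        stays = []  # each entry: (house, date of first visit record there)
        for when, house in records:
            # A change of house between consecutive records marks a move.
            if not stays or stays[-1][0] != house:
                stays.append((house, when))
        for i, (house, move_in) in enumerate(stays):
            # First record in the next house doubles as the move-out time.
            move_out = stays[i + 1][1] if i + 1 < len(stays) else None
            chain.append((resident, house, move_in, move_out))
    return chain
```

Because the move-out time is only observed at the next visit, the resulting times lag behind the real events, which is why the text calls for later correction.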

Step S20, a first logistic regression model and a second logistic regression model are built, and training is carried out on the first logistic regression model and the second logistic regression model according to the historical identification data set so as to obtain the first model and the second model; the first model is used for predicting whether the house is in a renting state or not, and the second model is used for predicting the residence time of the resident of the house.

Specifically, this embodiment builds the prediction models with logistic regression (LR). The first logistic regression model and the second logistic regression model can adopt a sigmoid function, expressed as $P = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n)}}$, where $x_1$, $x_2$, $x_3$, …, $x_n$ are the n feature values of a training sample (i.e., the n factors affecting the probability), which in this embodiment are the specific parameters of the historical identification data set; and $\theta_0$, $\theta_1$, $\theta_2$, …, $\theta_n$ are the regression parameters of the model, namely the intercept and the weight of each feature value, which are estimated by the maximum likelihood method.

By training the first and second logistic regression models on a large number of samples, each consisting of the specific parameters of the historical identification data set, this embodiment obtains the first model for predicting whether a house is in a rented state and the second model for predicting how long the resident of a house will stay. The training itself can be implemented with the generalized iterative scaling (GIS) algorithm and the improved iterative scaling (IIS) algorithm, which are not described in detail here.
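To make the model concrete, the sketch below fits a logistic regression by maximum likelihood. It deliberately swaps in plain batch gradient ascent on the log-likelihood instead of GIS or IIS, which the text names but does not detail; the function names, learning rate and epoch count are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """P(y = 1 | x) for one sample under the fitted parameters."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(samples, labels, learning_rate=0.1, epochs=2000):
    """Estimate weights and bias by gradient ascent on the log-likelihood.

    samples: list of equal-length feature lists; labels: list of 0/1 values.
    """
    n = len(samples[0])
    m = len(samples)
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = y - predict_probability(x, weights, bias)
            for j in range(n):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step along the average gradient of the log-likelihood.
        weights = [w + learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias += learning_rate * grad_b / m
    return weights, bias
```

In practice the features would be the columns of the first or second data set, and a library implementation would normally replace this hand-written loop.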

Step S30, acquiring a data set of existing house lease information, and constructing a characteristic identifier of the data set of the existing house lease information to obtain an existing identifier data set; the existing identification data set is input into the first model and the second model to predict whether the existing house is in a rented state and predict the residence time of the resident of the existing house.

Specifically, the data set of the existing house lease information includes current house information and corresponding current resident information in the target area. The current house information at least comprises a house address, and the resident information at least comprises a resident name, a resident gender and a resident age. The method for constructing the feature identifier for the data set of the existing house lease information to obtain the existing identifier data set may be constructed by referring to the subsequent method about the history identifier data set, which is not described in detail herein.

This embodiment predicts the probability that a house is in a rented state by inputting the existing identification data set into the first model, and predicts how long the resident of an existing house will stay by inputting the existing identification data set into the second model. The expected move-out time of the resident is then calculated from the resident's move-in time and the predicted residence duration. If the predicted move-out time differs from the actual move-out time by less than one month, the prediction is considered correct.
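The move-out calculation and the one-month correctness check can be illustrated as follows. This is a sketch under stated assumptions: the predicted residence duration is expressed in days, and "one month" is approximated as 30 days; neither unit is fixed by the text.

```python
from datetime import date, timedelta


def expected_move_out(move_in, predicted_days):
    """Expected move-out date = move-in date + predicted residence duration."""
    return move_in + timedelta(days=predicted_days)


def prediction_is_correct(predicted, actual, tolerance_days=30):
    """True when predicted and actual move-out dates differ by less than
    the tolerance (30 days stands in for 'one month' here)."""
    return abs((predicted - actual).days) < tolerance_days
```

For example, a resident who moved in on 2023-01-01 with a predicted stay of 180 days has an expected move-out date of 2023-06-30.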

In a preferred embodiment, referring to fig. 2, the construction of the feature identification of the historical data set of house lease information includes the following steps:

Step S11, numbering all houses according to the historical data set of house lease information to obtain house numbers, and labeling all residents by name to obtain resident names. These identifiers simplify the data and avoid the low processing efficiency and heavy resource usage caused by an excessive data volume.

Step S12, generating a resident time chain data set according to the historical data set of house lease information, wherein the resident time chain data set comprises all resident names and the corresponding house numbers, move-in times and move-out times. For example, one record of the resident time chain data set may be represented as: resident U1, house H1, move-in time X1, move-out time X2.

Step S13, based on the resident time chain data set, extracting persons whose residence periods in the same house intersect, to construct a resident and co-resident information data set; the resident and co-resident information data set comprises the resident name, the house number, the corresponding co-resident name, the resident's move-in time and move-out time, the co-resident's move-in time and move-out time, and the number of months in which their residence periods intersect.

Specifically, the resident and co-resident information data set can be constructed by self-joining the data from step S12 to extract persons whose residence periods in the same house intersect. Taking resident U1 and resident U2 of the same house in the resident time chain data set as an example, if the move-in time of resident U1 is earlier than the move-out time of resident U2 and the move-out time of resident U1 is later than the move-in time of resident U2, resident U1 and resident U2 can be determined to be resident and co-resident, the number of intersecting months (or days) can be calculated, and the resident and co-resident information data set can then be formed. For example, one record may be expressed as: resident U1, house H1, co-resident U2, resident U1 move-in time, resident U1 move-out time, co-resident U2 move-in time, co-resident U2 move-out time, number of intersecting months (or days).
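The self-join of step S13 can be sketched as a pairwise interval-overlap test within each house. The code below is an assumption-laden illustration: it uses the standard overlap condition (each period starts before the other ends), counts the intersection in days, requires every stay to have both dates, and emits one record per unordered pair; the names `overlap_days` and `co_resident_records` are hypothetical.

```python
from datetime import date
from itertools import combinations


def overlap_days(in_a, out_a, in_b, out_b):
    """Length in days of the intersection of two residence periods (0 if none)."""
    start = max(in_a, in_b)
    end = min(out_a, out_b)
    return max((end - start).days, 0)


def co_resident_records(time_chain):
    """time_chain: list of (resident, house, move_in, move_out) rows.

    Returns (resident, house, co_resident, resident_in, resident_out,
    co_resident_in, co_resident_out, intersecting_days) tuples.
    """
    by_house = {}
    for row in time_chain:
        by_house.setdefault(row[1], []).append(row)

    records = []
    for house, rows in by_house.items():
        for a, b in combinations(rows, 2):
            if a[0] == b[0]:
                continue  # same person with two stays in one house
            days = overlap_days(a[2], a[3], b[2], b[3])
            if days > 0:
                records.append((a[0], house, b[0], a[2], a[3], b[2], b[3], days))
    return records
```

Dividing the day count by an average month length would give the number of intersecting months used elsewhere in the text.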

And step S14, determining the relationship identification of the resident and the resident by acquiring and integrating employment information, household registration and family planning information in the target area, wherein the relationship identification comprises colleague relationship, relative relationship and other relationship.

Specifically, for the colleague relationship, an employment chain can be constructed for each resident from the employment information using the hire time and departure time, and the colleague relationship between a resident and a co-resident is determined by the intersection of their employment periods. For the relative relationship, three layers of direct relatives can be constructed for each resident from the household registration and family planning information, obtaining the resident's spouse, children, parents and spouse's parents, to determine the relative relationship between the resident and the co-resident. Further, the relationship identifier can be encoded numerically: for example, a relative relationship is represented as 1, a colleague relationship as 2, and other relationships as 0; or, alternatively, each of the relative relationship and the colleague relationship is represented as 1 if present and 0 otherwise. A specific relationship identifier record can be: resident U1, co-resident U2, relationship identifier 1 (or 2 or 0); or it can be: resident U1, co-resident U2, relative relationship 1 (or 0), colleague relationship 0 (or 1). The encoding can be set according to actual needs and is not limited here.
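A small sketch of the single-identifier encoding (1 for relatives, 2 for colleagues, 0 otherwise) follows. The input structures are assumptions made for illustration: `relatives` maps a resident to the set of that resident's known relatives, and `employment` maps a resident to a list of `(employer, hire_date, departure_date)` tuples. Giving the relative relationship precedence when both apply is also an assumption.

```python
from datetime import date

RELATIVE, COLLEAGUE, OTHER = 1, 2, 0


def are_colleagues(jobs_a, jobs_b):
    """True if two employment chains share an employer over intersecting periods."""
    for employer_a, start_a, end_a in jobs_a:
        for employer_b, start_b, end_b in jobs_b:
            if employer_a == employer_b and start_a < end_b and start_b < end_a:
                return True
    return False


def relationship_id(resident, co_resident, relatives, employment):
    """Encode the relationship between a resident and a co-resident."""
    if co_resident in relatives.get(resident, set()):
        return RELATIVE  # assumed to take precedence over colleague
    if are_colleagues(employment.get(resident, []),
                      employment.get(co_resident, [])):
        return COLLEAGUE
    return OTHER
```

The alternative two-flag encoding described above would simply return a pair of 0/1 values instead of a single code.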

Step S15, constructing a historical identification data set based on the resident and co-resident information data set and the relationship identifier of the resident and the co-resident; the historical identification data set comprises a first data set and a second data set, wherein the first data set comprises the house number, historical resident information corresponding to the house and resident information corresponding to the building where the house is located, and is used as training samples for the first logistic regression model; the second data set comprises the resident name, historical residence information of the resident and the corresponding historical co-resident information, and is used as training samples for the second logistic regression model.

In a specific implementation, constructing the historical identification data set based on the resident and co-resident information data set and the relationship identifier of the resident and the co-resident comprises the following steps:

Step S15a, constructing a co-resident characteristic detail data set for the residents of the current house.

The co-resident characteristic detail data are obtained by joining the resident and co-resident information data set with the relationship identifiers of residents and co-residents. The co-resident characteristic detail data at least comprise: the co-resident name, the corresponding resident name, the residence duration, the number of co-residents, the total co-residence duration, the average co-residence duration, and the relationship identifier of the co-resident and the resident.

Step S15b, summarizing the co-resident characteristic detail data set and the resident time chain data set to construct a resident characteristic detail data set.

The resident characteristic detail data set at least comprises: the resident name, the corresponding house number, the resident's historical total residence duration, the resident's average residence duration, the number of co-residents in a relative relationship with the resident, the number of months the resident has lived together with relatives, the number of co-residents in a colleague relationship with the resident, the number of months the resident has lived together with colleagues, and the number of houses the resident has historically lived in.
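The aggregation in step S15b can be pictured as a group-by over residents. The sketch below assumes simplified inputs that are not specified in the text: `stays` rows of `(resident, house, months_lived)` derived from the time chain, and `co_records` rows of `(resident, co_resident, shared_months, relation_id)` using the 1/2/0 encoding; all field names in the output are hypothetical.

```python
from collections import defaultdict

RELATIVE, COLLEAGUE = 1, 2  # assumed encoding, following the 1/2/0 example


def resident_features(stays, co_records):
    """Aggregate per-resident features from stays and co-residence records."""
    acc = defaultdict(lambda: {
        "total_months": 0, "stay_count": 0, "houses": set(),
        "relatives": set(), "relative_months": 0,
        "colleagues": set(), "colleague_months": 0,
    })
    for resident, house, months in stays:
        a = acc[resident]
        a["total_months"] += months
        a["stay_count"] += 1
        a["houses"].add(house)
    for resident, co_resident, months, relation in co_records:
        a = acc[resident]
        if relation == RELATIVE:
            a["relatives"].add(co_resident)
            a["relative_months"] += months
        elif relation == COLLEAGUE:
            a["colleagues"].add(co_resident)
            a["colleague_months"] += months

    features = {}
    for resident, a in acc.items():
        stay_count = a["stay_count"]
        features[resident] = {
            "total_months": a["total_months"],
            "average_months": a["total_months"] / stay_count if stay_count else 0.0,
            "house_count": len(a["houses"]),
            "relative_count": len(a["relatives"]),
            "relative_months": a["relative_months"],
            "colleague_count": len(a["colleagues"]),
            "colleague_months": a["colleague_months"],
        }
    return features
```

The house characteristic detail data set of step S15c could be built the same way, grouping these per-resident rows by house number instead.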

Step S15c, building a house characteristic detail data set according to the resident characteristic detail data set.

Step S15d, extracting data from the resident characteristic detail data set, the house characteristic detail data set, the resident information and the house information to obtain the first data set and the second data set.

Specifically, the first data set may include data such as a house number, a number of times of personnel variation of a house in a recent year, a employment identifier of a historical resident, an average resident number of the historical resident, a variance of the historical resident number, a variance of the age of the historical resident, a mean of the variance of the age of the resident of a building where the house is located, a same proportion of resident persons of the building where the house is located, a mean of a first living time difference of resident of the building where the house is located, and the like. The historical resident employment identification, the historical resident number variance of the house, the historical resident age variance of the house, the resident age variance mean value of the building where the house is located, the same proportion of the resident sexes of the building where the house is located, and the mean value of the first living time difference of the resident in the same house of the building where the house is located are obtained through statistics of house information and resident information. The second data set may include data of a resident, a house number, a current month number, a current house number of co-resident, a co-resident number of co-resident, a current house number of co-resident intersections, a current house number of historical resident houses, a current historical resident house number of co-resident intersections, a current historical resident number of average resident months, and the like.

Of course, a person skilled in the art may add, remove or change feature values in these data sets according to actual requirements, so as to suit the house lease prediction needs of the target area.

Further, the historical data set of house lease information contains house information for school dormitories, employee dormitories and the like. Such data are highly uniform and tend to have a disproportionate influence on the model, which reduces the accuracy of model training. To solve this problem and improve the accuracy of the model, in the preferred embodiment, special data culling is performed on the historical identification data set before the first logistic regression model and the second logistic regression model are trained on it;

the special data culling includes at least one of:

In the first mode, if, in the building where the house is located, the proportion of houses whose residents and co-residents are all of the same sex, out of all houses in the building, exceeds 50%, the age variance of residents and co-residents in the building is lower than a preset variance value, and the proportion of houses whose residents' move-in times and move-out times each differ by no more than one month is higher than a preset ratio, then all house information and resident information of the building are removed, and the building is preliminarily judged to be a school dormitory.

Specifically, the proportion of houses with same-sex residents and co-residents is theoretically 100% in a school dormitory and theoretically above 50% in an employee dormitory. Moreover, in a school dormitory the age variance of residents and co-residents of the same house is very low, and move-in and move-out times are consistent. This mode can therefore effectively screen out school dormitory data and avoid its influence on the accuracy of model training.

In the second mode, if, in the building where the house is located, the proportion of houses whose residents and co-residents are colleagues, out of all houses in the building, exceeds a preset value, then all house information and resident information of the building are removed; the building is preliminarily judged to be an enterprise dormitory.

Specifically, a building in which a high proportion of houses are occupied by residents and co-residents who are colleagues can be identified as an employee dormitory. The preset value can be chosen according to actual requirements, for example within the range of 70% to 100%, and is not limited herein.

In the third mode, if, in the building where the house is located, the age variance of residents and co-residents of the same house is higher than a preset variance ratio, and occupant turnover is found at every verification, then all house information and resident information of the building are removed, and the building is preliminarily judged to be a hotel.

Specifically, if the building houses people of all age groups and occupants are found to have changed at every verification, the building can be judged to be a hotel or a homestay. The preset variance ratio is set and adjusted according to actual requirements and is not limited herein.

It should be specifically noted that, under the inventive concept of this embodiment, special data culling is not limited to the modes above. House information with a large influence on the model can be removed effectively according to its characteristics. For example, employee dormitories can additionally be identified and culled with the help of employment registration information. School dormitories typically house 4, 6 or 8 persons per room, so if most houses in a building have 4, 6 or 8 co-residents, the building can be treated as a culling candidate. Hotels and homestays are characterized by a high historical resident age variance, short stays, and residents who mostly have no employment registration record; hotel and homestay data that would easily affect the accuracy of the model can be removed according to these characteristics.

Of course, it is preferable to exclude data that is likely to have a large impact on model accuracy directly, so that it never becomes part of the historical data set of house lease information. However, in a large volume of historical data, some such data inevitably cannot be excluded in advance; the culling modes above clean it out effectively and thereby improve the accuracy of model prediction.
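As a hedged illustration of the first culling mode, the following Python sketch flags a building as a probable school dormitory and drops it. The record layout (`sexes`, `ages`, `move_in_days`, `move_out_days`), the function names and the default thresholds `max_age_variance` and `min_sync_ratio` are assumptions made for this example; the description fixes only the 50% same-sex ratio and the one-month window, and leaves the other thresholds as preset values.

```python
from statistics import pvariance


def is_school_dormitory(houses, max_age_variance=4.0, min_sync_ratio=0.8):
    """Apply the three conditions of the first culling mode to one building.

    houses: list of dicts, one per house, each holding per-occupant lists
    'sexes', 'ages', 'move_in_days' and 'move_out_days' (hypothetical layout).
    """
    ages = [age for h in houses for age in h["ages"]]
    if not houses or not ages:
        return False
    n = len(houses)
    # Condition 1: share of houses whose occupants are all of the same sex.
    same_sex = sum(1 for h in houses if len(set(h["sexes"])) == 1)
    # Condition 3: share of houses whose occupants moved in together and
    # moved out together, each within a 30-day spread.
    synced = sum(
        1
        for h in houses
        if max(h["move_in_days"]) - min(h["move_in_days"]) <= 30
        and max(h["move_out_days"]) - min(h["move_out_days"]) <= 30
    )
    return (
        same_sex / n > 0.5
        and pvariance(ages) < max_age_variance  # Condition 2: low age variance
        and synced / n > min_sync_ratio
    )


def cull_buildings(buildings):
    """Keep only the buildings that are not flagged as school dormitories."""
    return {b: hs for b, hs in buildings.items() if not is_school_dormitory(hs)}


# Toy data: B1 looks like a dormitory, B2 like ordinary flats.
dorm = [
    {"sexes": ["F"] * 4, "ages": [18, 19, 19, 20],
     "move_in_days": [0, 1, 2, 0], "move_out_days": [300, 300, 301, 300]},
    {"sexes": ["M"] * 4, "ages": [18, 19, 20, 19],
     "move_in_days": [0, 0, 1, 2], "move_out_days": [300, 301, 300, 300]},
]
flats = [
    {"sexes": ["M", "F"], "ages": [30, 32],
     "move_in_days": [0, 0], "move_out_days": [400, 400]},
    {"sexes": ["F", "M", "M"], "ages": [40, 38, 10],
     "move_in_days": [100, 100, 100], "move_out_days": [900, 900, 900]},
]
kept = cull_buildings({"B1": dorm, "B2": flats})
```

The second and third modes could be sketched the same way by swapping in a colleague-ratio test or a high-variance and turnover test.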

In particular, verified correct data are added as training samples to the historical data set of house lease information or to the historical identification data set, and the first model and the second model are retrained. Through this correction training, positive feedback is formed as the model is continuously trained and optimized, so that the accuracy of the model gradually improves.

Further, in order to evaluate the accuracy of the model, the prediction method further comprises the step of performing model evaluation on the trained first model and the trained second model. In particular, a portion of the historical identification data set may be input into the first model and the second model as test samples to obtain predicted test results. The predicted test results are then verified: a rental state prediction is marked as correct if it matches the actual state of the house, and as an error otherwise; a residence time prediction is marked as correct if it differs from the actual residence time by no more than a preset time period, and as an error otherwise. The preset time period may be defined according to actual requirements, for example 15 days or 30 days. The accuracy of the model is obtained by counting the marked results. For example, Fig. 3 shows the accuracy on test samples and training samples for a region using the second model of an embodiment of the present invention; as can be seen from Fig. 3, the overall accuracy of a model built on only one year of collected data reaches 83%. Because relocation and changes in the nature of a house occur over relatively long periods, the model can, on this basis, be trained and optimized over the long term by continuously adding verified correct data, so that its accuracy keeps improving.
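The marking rule described above can be sketched in Python as follows. This is a minimal illustration only; the function names, the use of days as the unit, and the sample values are assumptions for the example rather than part of the described method.

```python
def rental_accuracy(predicted, actual):
    """Share of houses whose predicted rental state equals the verified state."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def residence_accuracy(predicted_days, actual_days, tolerance_days=30):
    """Share of residents whose predicted residence time is within the
    preset tolerance (for example 15 or 30 days) of the actual time."""
    correct = sum(
        1
        for p, a in zip(predicted_days, actual_days)
        if abs(p - a) <= tolerance_days
    )
    return correct / len(actual_days)


# Toy verification results (illustrative values only).
acc_rent = rental_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
acc_stay = residence_accuracy([100, 200, 365], [110, 260, 350], tolerance_days=30)
```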

In order to better illustrate the implementation of the metropolitan area house lease prediction method based on community grid information collection, the implementation process of the present invention is described in detail below with a specific step-by-step example:

(1) Acquiring historical data set of house lease information

Community visit data are collated. Because information about residents and houses is embedded in the community's various visit records, historical house lease data can be obtained effectively by combing through and mining these records. A specific example: resident U1, house H1, time XX, visitor A1. This record shows that community visitor A1 visited resident U1 at house H1 at time point XX, and thus establishes the relationship between U1 and house H1 at time point XX. Specific house information and resident information can be obtained from the community's files or household registration information.

(2) Generating resident time chain data sets

From the data in step (1), multiple visit records of the same resident at different time points can be obtained, as can multiple visit records of the same house at different time points. The records are processed in time order: if the same resident is found in a different house on two consecutive visits, the change of house is treated as a move by that person. The time point of the resident's first visit record in one house is taken as the move-in time, and the time point of the resident's first visit record in the next house is taken as the move-out time, thereby establishing the resident time chain data set. For example, a record can be expressed as: resident U1, house H1, move-in time X1, move-out time X2.
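A minimal Python sketch of this time chain construction is given below. The tuple layout `(resident, house, visit_date)` and the function name `build_time_chain` are assumptions for illustration; an open-ended current stay is represented here with `None` as the move-out time, which is a choice of this example.

```python
from datetime import date


def build_time_chain(visits):
    """Turn visit records (resident, house, visit_date) into stays
    (resident, house, move_in, move_out).

    move_in is the first visit date in a house; move_out is the first
    visit date in the next house, or None while the stay is still open.
    """
    by_resident = {}
    for resident, house, when in sorted(visits, key=lambda v: (v[0], v[2])):
        by_resident.setdefault(resident, []).append((house, when))

    chain = []
    for resident, records in by_resident.items():
        segments = []  # (house, first visit date) for each consecutive stay
        for house, when in records:
            if not segments or segments[-1][0] != house:
                segments.append((house, when))
        for i, (house, move_in) in enumerate(segments):
            move_out = segments[i + 1][1] if i + 1 < len(segments) else None
            chain.append((resident, house, move_in, move_out))
    return chain


visits = [
    ("U1", "H1", date(2022, 1, 10)),
    ("U1", "H1", date(2022, 4, 12)),
    ("U1", "H2", date(2022, 7, 15)),
    ("U2", "H1", date(2022, 2, 1)),
]
chain = build_time_chain(visits)
```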

(3) Building resident and co-resident information data sets

Starting from a current resident, the resident time chain data set from step (2) is self-joined to extract the group of people who lived in the same house with overlapping residence periods (that is, the move-in time of current resident U1 is earlier than or equal to the move-out time of co-resident U2 of the same house, and the move-out time of resident U1 is later than or equal to the move-in time of co-resident U2), and the number of overlap months (or overlap days) is computed. This yields the resident and co-resident information data set, for example: resident U1, house H1, co-resident U2, U1 move-in time, U1 move-out time, U2 move-in time, U2 move-out time, number of overlap months (or overlap days).
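The self-join and overlap condition can be sketched in Python as below. This is an illustrative sketch that assumes every stay has both a move-in and a move-out date; the tuple layout and the function names are hypothetical.

```python
from datetime import date


def overlap_days(in1, out1, in2, out2):
    """Days shared by two stays, using the condition from the text:
    in1 <= out2 and out1 >= in2. Returns 0 when the stays do not overlap."""
    if in1 <= out2 and out1 >= in2:
        return (min(out1, out2) - max(in1, in2)).days
    return 0


def co_residents(chain):
    """Self-join the time chain: pair different residents of the same
    house whose stays intersect, and attach the overlap in days."""
    rows = []
    for r1, h1, in1, out1 in chain:
        for r2, h2, in2, out2 in chain:
            if r1 != r2 and h1 == h2 and in1 <= out2 and out1 >= in2:
                rows.append((r1, h1, r2, overlap_days(in1, out1, in2, out2)))
    return rows


stays = [
    ("U1", "H1", date(2022, 1, 1), date(2022, 6, 30)),
    ("U2", "H1", date(2022, 3, 1), date(2022, 9, 1)),
    ("U3", "H2", date(2022, 1, 1), date(2022, 12, 31)),
]
pairs = co_residents(stays)
```

A nested loop is used for clarity; on real data volumes the same join would normally be done in a database or a dataframe library.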

(4) Determining resident to co-resident relationship identification

By acquiring and integrating household registration and family planning data, three tiers of immediate-family information are built around each resident, covering the resident's spouse, children, parents and spouse's parents, for example: resident u1, co-resident u2, relationship type code. Employment data are acquired and integrated to build colleague relationship data: an employment chain is built from each resident's job start and end times, and colleague relationship information is generated from overlapping periods of employment, for example: resident u1, colleague u2.

(5) Computing and constructing historical residence features of the current resident

The resident time chain data obtained in step (2) are grouped by resident and aggregated to obtain indicators such as the number of months of the current resident's current stay, the individual's total number of residence months across houses, the individual's maximum number of residence months, the individual's minimum number of residence months, and the individual's average number of residence months.
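The grouped statistics in this step can be sketched as follows. The input layout `(resident, house, months)` and the output keys are assumptions chosen for this example.

```python
from collections import defaultdict
from statistics import mean


def resident_stats(stays):
    """Group stays (resident, house, months) by resident and compute the
    total, maximum, minimum and average number of residence months."""
    months = defaultdict(list)
    for resident, _house, n_months in stays:
        months[resident].append(n_months)
    return {
        resident: {
            "total": sum(values),
            "max": max(values),
            "min": min(values),
            "mean": mean(values),
        }
        for resident, values in months.items()
    }


stats = resident_stats([("U1", "H1", 6), ("U1", "H2", 12), ("U2", "H1", 3)])
```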

(6) Construction of the co-resident feature detail data set and the resident feature detail data set

The resident and co-resident information data set obtained in step (3) is joined with the relationships constructed in step (4) to obtain a co-resident feature detail data set, for example: house H1, resident U1, co-resident U2, U1 move-in time, U1 move-out time, U2 move-in time, U2 move-out time, number of overlap months, relative identifier, colleague identifier. These data are then aggregated from the perspective of resident U1 and house H1 to build the resident feature detail data set of resident U1 for house H1, for example: resident U1, house H1, residence time, number of co-residents, total co-residence time, maximum co-residence time, minimum co-residence time, average co-residence time, and the like. The data are further summarized at resident granularity to obtain the total residence time of resident U1, the average number of co-residents of resident U1, the average number of co-residence months of resident U1, the number of months resident U1 lived with relatives, the number of months resident U1 lived with colleagues, the number of houses the resident has historically lived in, and the like. Further, the resident feature detail data set is self-joined, with the join condition that co-resident U2 in table T1 equals resident U1 in table T2, to obtain the statistics of each resident's co-residents.

(7) Building house feature detail data sets

Aggregating the data obtained in the above steps at the granularity of house H1 yields data such as the number of historical residents and co-residents of house H1, the number of months historical residents lived there, the average number of months historical residents lived there, the relationship identifiers between historical residents and co-residents, the average number of months historical residents lived together, and the number of occupant changes of the house in the last year.

(8) Special data culling

Because school dormitories, employee dormitories, hotels, homestays and the like may all be captured during community collection, and their addresses cannot always be labeled accurately, the data need to be classified so that data with a large influence on the model can be found and culled, as follows. School dormitories and employee dormitories are analyzed at building level. The proportion of same-sex co-resident houses in such buildings is high (theoretically 100% for schools); when employee dormitories are analyzed together with them, buildings in which the proportion of same-sex co-resident houses reaches 50% are culled as dormitories. In addition, school dormitory occupancy typically comes in sizes of 2, 3, 4, 6 or 8 persons, the houses within a building resemble one another closely, the age variance of co-residents is low, and move-in and move-out times are consistent. Employee dormitories can additionally be identified with the help of employment registration, and the sum of their co-resident relative rate and colleague rate is high. Hotel and homestay data are characterized by a high historical resident age variance, short stays, and residents who mostly have no record in employment registration. The relevant special data can therefore be removed by these features to improve the accuracy of model training.

(9) Marking data

The historical identification data set produced by steps (2) to (8) is divided into a first data set and a second data set, which are labeled separately: the first data set is labeled with whether the house was a rental at the corresponding verification time point, and the second data set is labeled with the residence time of the resident in the house.

(10) Model construction and training

A first logistic regression model and a second logistic regression model are constructed, and the first data set and the second data set are fed into the first and second logistic regression models respectively as training samples. The first model is mainly used to predict whether a house is a rental; when the nature of a house may have changed, a visit task can be generated for verification. The second model is mainly used to predict how long a resident is likely to stay, from which the move-out time is calculated and the dispatching of visit tasks is optimized.
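For illustration only, a logistic regression with a sigmoid output of the kind used for the first model can be sketched in plain Python as below. The two input features, the learning rate, the number of epochs and the toy samples are all assumptions for this example; the description does not specify the training procedure, and how the second model encodes residence time is not covered here.

```python
import math


def sigmoid(z):
    """Logistic (sigmoid) function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that a house with feature vector x is a rental."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical, pre-scaled features per house:
# [occupant changes in the last year, average number of historical residents]
X_train = [[0.0, 0.2], [0.1, 0.3], [0.8, 0.6], [0.9, 0.7]]
y_train = [0, 0, 1, 1]  # 1 = verified rental, 0 = not a rental

w, b = train_logistic(X_train, y_train)
p_rental = predict_proba(w, b, [0.85, 0.65])
p_owner = predict_proba(w, b, [0.05, 0.25])
```

In practice a library implementation would normally replace the hand-written loop; the sketch only makes the sigmoid and the gradient update explicit.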

(11) Model evaluation

After visit tasks have been dispatched on the basis of the model, the verification results of the previous month are obtained and used for marking. If a house that the model predicted to be a rental is verified to be a rental, the prediction is counted as correct; if the move-in time plus the predicted number of residence months is later than the current time and the verified resident status of the house is consistent with that prediction, the prediction is likewise counted as correct. The accuracy rate of the model is obtained from these marks.

(12) Predicting new data

Feature identifiers are constructed for the data set of existing house lease information to obtain the existing identification data set, which is then input into the models. The first model yields the probability that a house is a rental, and the second model yields the predicted residence time of each resident. The move-out time is calculated from the residence time and checked for whether it is in a critical state, allowing a tolerance of plus or minus one month on the predicted value; if the predicted move-out time falls within one month, a visit task is dispatched for verification.
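The critical-state check can be sketched as follows. The 30-day approximation of a month, the function names and the example dates are assumptions for this illustration.

```python
from datetime import date, timedelta


def predicted_move_out(move_in, predicted_months):
    """Move-in date plus the predicted stay, approximating a month as 30 days."""
    return move_in + timedelta(days=round(predicted_months * 30))


def needs_visit(move_in, predicted_months, today, window_days=30):
    """True when the predicted move-out date lies within plus or minus
    one month (window_days) of today, i.e. the stay is in a critical state."""
    gap = (predicted_move_out(move_in, predicted_months) - today).days
    return abs(gap) <= window_days


move_out = predicted_move_out(date(2023, 1, 1), 6)
visit_now = needs_visit(date(2023, 1, 1), 6, today=date(2023, 6, 15))
visit_early = needs_visit(date(2023, 1, 1), 6, today=date(2023, 3, 1))
```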

(13) Feedback and correction

When the community dispatches visit tasks according to the prediction results, the visits mainly verify whether a house is a rental and whether its residents have changed. The verification data are added to the training samples, forming positive feedback that gradually improves the accuracy of the model. It should be noted that introducing the model does not mean that periodic full verification is no longer dispatched; rather, the frequency of periodic verification can be reduced, with visit tasks carried out more flexibly between two periodic verifications. Specifically, if the original visit period is once every three months, a large volume of house rental information has to be verified item by item every three months. The prediction method provided by the invention can effectively lengthen the visit period, for example to half a year; within that half year, more targeted verification can be carried out according to the prediction results, without verifying every house rental task.

In summary, the metropolitan area house lease prediction method based on community grid information collection provided by the embodiment of the invention is characterized in that the data of house lease information implied by community visit is mined and a historical identification data set is formed to train a model, so that community personnel are effectively assisted in completing information verification of house lease tasks, and resource management is optimized. The method not only can improve the information processing efficiency and accuracy, but also can prolong the interview period and reduce the interview pressure.

Example two

The embodiment of the invention also provides a metropolitan area house lease prediction system based on community grid information collection. Referring to fig. 4, which is a schematic structural diagram of the metropolitan area house lease prediction system based on community grid information collection, the system at least comprises:

the data processing module is used for acquiring a historical data set of the house leasing information and constructing characteristic identifiers of the historical data set of the house leasing information to obtain a historical identification data set.

The model training module is used for constructing a first logistic regression model and a second logistic regression model, and training the first logistic regression model and the second logistic regression model according to the historical identification data set to obtain the first model and the second model; the first model is used for predicting whether the house is in a renting state or not, and the second model is used for predicting the moving-away time of the house resident.

Optionally, the system further comprises a special data rejection module, which is used for carrying out special data rejection on the historical identification data set. The special data culling includes at least one of:

If, in the building where the house is located, the proportion of houses whose residents and co-residents are all of the same sex, out of all houses in the building, exceeds 50%, the age variance of residents and co-residents in the building is lower than a preset variance value, and the proportion of houses whose residents' move-in times and move-out times each differ by no more than one month exceeds a preset ratio, then all house information and resident information of the building are removed, and the building is preliminarily judged to be a school dormitory.

If, in the building where the house is located, the proportion of houses whose residents and co-residents are colleagues, out of all houses in the building, exceeds a preset value, then all house information and resident information of the building are removed; the building is preliminarily judged to be an enterprise dormitory.

Further, the system also verifies the prediction results of the data prediction module and inputs the corresponding verification data into the data processing module or the model training module for correction training.

The specific roles and functions of each module can refer to the content of the first embodiment, and are not repeated herein.

Example III

An embodiment of the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the metropolitan area house lease prediction method based on community grid information collection as described in any one of the implementations of Example one above.

In specific implementation, the computer readable storage medium is a magnetic Disk, an optical Disk, a Read-only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Flash Memory (Flash Memory), a Hard Disk (HDD) or a Solid State Drive (SSD); the computer readable storage medium may also include a combination of the above types of memory.

Example IV

Referring to fig. 5, an embodiment of the present invention provides an electronic device, including at least one processor, and a memory communicatively connected to the processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the processor to perform a metropolitan area house lease prediction method based on community grid information collection as described in the above method embodiment.

In particular, the number of processors may be one or more, and the processors may be central processing units (Central Processing Unit, CPU). The processor may also be any other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory may be communicatively coupled to the processors via a bus or other means, the memory storing instructions executable by the at least one processor to cause the processor to perform the metropolitan area house lease prediction method based on community grid information collection as described in the method embodiments above.

In summary, compared with the prior art, the metropolitan area house lease prediction method and system based on community grid information acquisition provided by the invention realize effective processing of data through the steps of data mining, feature identification construction, first model prediction, second model prediction and the like, thereby effectively assisting community personnel to complete information verification of house lease tasks. The information processing efficiency and accuracy of house renting tasks can be improved, the visit pressure is reduced, the manpower resource waste is reduced, the visit period can be effectively prolonged, and the community resource management is optimized.

In addition, it should be understood by those skilled in the art that, although many problems exist in the prior art, each embodiment or technical solution of the present invention may make an improvement in only one or several respects, without having to solve all the technical problems listed in the prior art or the background art at the same time. Those skilled in the art will understand that anything not mentioned in a claim should not be taken as a limitation on that claim.

Although terms such as historical data set of house lease information, historical identification data set, first logistic regression model, second logistic regression model, first model, second model, existing identification data set, resident time chain data set, resident and co-resident information data set, co-resident feature detail data set, house feature detail data set, etc. are used frequently herein, the possibility of using other terms is not excluded. These terms are used merely for convenience in describing and explaining the nature of the invention; construing them as any additional limitation would be contrary to the spirit of the present invention. The terms first, second, and the like in the description and in the claims of embodiments of the invention and in the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A metropolitan area house lease prediction method based on community grid information collection is characterized by comprising the following steps:

Acquiring a data set of existing house lease information, and constructing a characteristic identifier of the data set of the existing house lease information to obtain an existing identifier data set; inputting the existing identification data set into a first model and a second model to predict whether the existing house is in a rented state or not and predict the residence time of residents of the existing house;

the historical data set of the house leasing information comprises all house information in a target area, resident information corresponding to each house, living time and moving-away time; the house information at least comprises a house address, and the resident information at least comprises a resident name, a resident gender and a resident age;

the construction of the characteristic identification of the historical data set of the house lease information comprises the following steps:

2. The metropolitan area house lease prediction method based on community grid information collection as claimed in claim 1, characterized in that: based on the resident and same-resident information data set and the relationship identification of the resident and the same-resident, a history identification data set is constructed, and the method specifically comprises the following steps:

3. The metropolitan area house lease prediction method based on community grid information collection as claimed in claim 1, characterized in that: before training the first logistic regression model and the second logistic regression model according to the historical identification data set, special data rejection is carried out on the historical identification data set;

the special data culling includes at least one of:

4. The metropolitan area house lease prediction method based on community grid information collection as claimed in claim 1, characterized in that: the first logistic regression model and the second logistic regression model adopt sigmoid functions.

5. The metropolitan area house lease prediction method based on community grid information collection according to claim 1, further comprising the steps of: and verifying the prediction results of the first model and the second model, and adding corresponding verification data into the historical data set or the historical identification data set of the house lease information to correct and train the first model and the second model.

6. A metropolitan area house lease prediction system based on community grid information collection, comprising:

the data processing module is used for acquiring a historical data set of house lease information and constructing characteristic identifiers of the historical data set of the house lease information to obtain a historical identification data set; the historical data set of the house leasing information comprises all house information in a target area, resident information corresponding to each house, living time and moving-away time; the house information at least comprises a house address, and the resident information at least comprises a resident name, a resident gender and a resident age;

the construction of the characteristic identification of the historical data set of the house lease information comprises the following steps: according to the historical data set of the house leasing information, numbering and marking all houses to obtain house numbers, and carrying out name marking on all residents to obtain resident names; generating a resident time chain data set according to the historical data set of the house lease information, wherein the resident time chain data set comprises all resident names and corresponding house numbers, move-in times and move-out times; based on the resident time chain data set, extracting information on persons whose residence times in the same house intersect, to construct a resident and co-resident information data set; the resident and co-resident information data set comprises resident names, house numbers, the corresponding co-resident names, the resident's move-in time and move-out time, the co-resident's move-in time and move-out time, and the number of months of intersection between the resident and the co-resident; determining a relationship identifier of a resident and a co-resident by acquiring and integrating employment information, household registration and family planning information in the target area, wherein the relationship identifier comprises a colleague relationship, a relative relationship and other relationships; constructing a historical identification data set based on the resident and co-resident information data set and the relationship identifier of the resident and the co-resident;

The historical identification data set comprises a first data set and a second data set, wherein the first data set comprises a house number, historical resident information corresponding to the house and resident information corresponding to a building where the house is located, and is used as a training sample of a first logistic regression model; the second data set comprises resident names, historical resident information of the resident and corresponding historical same resident information, and is used as a training sample of a second logistic regression model;

the model training module is used for constructing a first logistic regression model and a second logistic regression model, and training the first logistic regression model and the second logistic regression model according to the historical identification data set to obtain the first model and the second model; the first model is used for predicting whether the house is in a renting state or not, and the second model is used for predicting the residence time of the resident of the house;