CN110096634B

CN110096634B - House property data vector alignment method based on particle swarm optimization

Info

Publication number: CN110096634B
Application number: CN201910354563.3A
Authority: CN
Inventors: 蔡彪; 谭富文
Original assignee: Chengdu University of Technology
Current assignee: Chengdu University of Technology
Priority date: 2019-04-29
Filing date: 2019-04-29
Publication date: 2023-02-24
Anticipated expiration: 2039-04-29
Also published as: CN110096634A

Abstract

The invention discloses a real estate data vector alignment method based on particle swarm optimization. Second-hand house data are crawled and preprocessed, and the similarity of each second-hand house attribute is computed separately. A model fusing the multi-attribute structural entity similarities is then constructed, and the attribute weights of this model are optimized to obtain the weight of each attribute similarity and a total similarity threshold. These are used to match house properties by similarity and yield an alignment result with better performance.

Description

House property data vector alignment method based on particle swarm optimization

Technical Field

The invention relates to a data vector alignment method, in particular to a real estate data vector alignment method based on particle swarm optimization.

Background

The Property Law clearly stipulates that the state implements a unified registration system for real estate. Integrating real estate data is an important task in building that system. Manual integration based on Excel is slow, large in scale, difficult, error-prone, and delayed in updating, so it cannot meet practical requirements. How to automatically construct a new real estate database therefore has considerable research value and application prospects for the unified registration of real estate.

Existing real estate data fusion technology is based on cloud architecture: data are integrated within a large logical set through dynamic migration of cloud services, and information from multiple departments is integrated using GIS technology and current communication technology.

Disclosure of Invention

To address the defects in the prior art, the invention provides a real estate data vector alignment method based on particle swarm optimization, which solves the problem that real estate data from different real estate transaction service platforms are difficult to align.

To achieve this purpose, the invention adopts the following technical scheme: a real estate data vector alignment method based on particle swarm optimization, characterized by comprising the following steps:

S1, crawling second-hand house source data of a certain city from different second-hand house source webpages;

S2, preprocessing the second-hand house source data;

S3, calculating the similarity of different attributes of the preprocessed second-hand house source data, and constructing a fusion multi-attribute structure model;

S4, taking the array formed by the weights of all attributes in the fusion multi-attribute structure model as the position of one particle in the particle swarm; initializing the number of particles, the number of iterations, the cognitive factor and the social factor; initializing the position of each particle, namely the weights of the attribute similarities; initializing the velocity of each particle; calculating the initial value of each individual extreme value; and setting the initial value of the global extreme value equal to the initial value of the individual extreme value;

S5, calculating the total similarity of all entity pairs according to the fusion multi-attribute structure model, calculating the threshold of the total similarity, and applying the threshold to the training set and comparing against the true classification results to obtain the F1 value of the training set;

S6, taking the F1 value of the training set as the fitness of each particle; when the fitness of a particle is greater than its individual extreme value, updating the individual extreme value to that fitness; calculating the maximum fitness of the current swarm; and when the maximum fitness is greater than the global extreme value, updating the global extreme value to the maximum fitness;

S7, dividing the particle swarm into subgroups of 3 levels according to particle fitness, and calculating the adaptive inertia weight for the objective function values of the different levels;

S8, updating the velocity of the particle swarm according to the individual extreme value, the global extreme value, the inertia weight, the cognitive factor and the social factor, updating the position of the particle swarm according to the velocity, and adding 1 to the number of iterations;

S9, when the number of iterations is smaller than the maximum number of iterations, returning to step S5; otherwise, outputting the position of the particle swarm, namely the attribute weights in the multi-attribute structure model;

S10, calculating the threshold of the total similarity of the test set, and predicting the entity pairs in the test set using the multi-attribute structure model and that threshold, so as to match the second-hand house sources.

Further: the preprocessing in step S2 comprises completing incomplete house source data and normalizing the house source data.

Further: the attributes of the second-hand house source data in step S3 comprise a cell name (residential community name), a title, a floor plan, a price, an area, an orientation and a floor;

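the calculation formula of the cell name similarity sim_name(A, B) is as follows:

$$\mathrm{sim\_name}(A,B)=\begin{cases}1, & \mathrm{name}_A=\mathrm{name}_B\\ 0, & \text{otherwise}\end{cases}$$

in the above formula, $\mathrm{name}_A$ and $\mathrm{name}_B$ are respectively the cell names of house source A and house source B in the two house source webpages;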

the title similarity sim_title(A, B) is calculated by the following method:

inserting spaces between the words of the titles S1 and S2 of an entity pair from the two house source webpages, calculating the TF value and the IDF value of each word, calculating the TFIDF value from the TF value and the IDF value to obtain the word frequency-inverse text frequency matrix, and calculating the cosine similarity sim_title(A, B) of the word frequency-inverse text frequency matrices of the two house source titles;

the calculation formula of the TFIDF value is as follows:

$$\mathrm{TFIDF}_{i,j}=\mathrm{TF}_{i,j}\times\mathrm{IDF}_{i,j}$$

in the above formula, $\mathrm{TFIDF}_{i,j}$ is the word frequency-inverse text frequency matrix, $\mathrm{TF}_{i,j}$ is the word frequency matrix, and $\mathrm{IDF}_{i,j}$ is the inverse text frequency matrix;

the calculation method of the floor plan similarity sim_img(A, B) comprises the following steps:

scaling and graying the two pictures img1 and img2 of an entity pair from the two house source webpages;

establishing a SURF algorithm model, extracting the features des1 and des2 of the two pictures img1 and img2 respectively through the SURF algorithm model, and matching feature points from the features des1 and des2 through the KNN algorithm;

counting the matching feature points whose distance ratio is larger than 0.9, and taking their proportion among all matching feature points as the picture similarity sim_img(A, B);

the calculation formula of the price similarity sim_price(A, B) is as follows:

in the above formula, Price(A, B) is the relative value of the price, and the calculation formula is:

in the above formula, $P_A$ and $P_B$ are respectively the prices of house source A and house source B in the two house source webpages, and $\mathrm{Max}(P_n)$ and $\mathrm{Min}(P_n)$ are respectively the maximum and minimum price differences among all entity pairs in the two house source webpages that are the same house source;

the calculation formula of the area similarity sim_size(A, B) is as follows:

in the above formula, Size(A, B) is the area difference between house source A and house source B in the two house source webpages, and the calculation formula is:

$$\mathrm{Size}(A,B)=|S_A-S_B|$$

in the above formula, $S_A$ and $S_B$ are respectively the areas of house source A and house source B in the two house source webpages;

when the orientations of the house sources in the two house source webpages are the same, the orientation similarity sim_direction(A, B) is 1, otherwise it is 0;

when the floors of the house sources in the two house source webpages are the same, the floor similarity sim_floor(A, B) is 1, otherwise it is 0.

Further: the calculation formula of the fusion multi-attribute structure model Sim(A, B) in step S3 is:

$$\mathrm{Sim}(A,B)=\omega_1\times\mathrm{sim\_name}(A,B)+\omega_2\times\mathrm{sim\_title}(A,B)+\omega_3\times\mathrm{sim\_img}(A,B)+\omega_4\times\mathrm{sim\_price}(A,B)+\omega_5\times\mathrm{sim\_size}(A,B)+\omega_6\times\mathrm{sim\_direction}(A,B)+\omega_7\times\mathrm{sim\_floor}(A,B)$$

in the above formula, $\omega_1$ is the weight of the cell name attribute similarity, $\omega_2$ is the weight of the title attribute similarity, $\omega_3$ is the weight of the floor plan attribute similarity, $\omega_4$ is the weight of the price attribute similarity, $\omega_5$ is the weight of the area attribute similarity, $\omega_6$ is the weight of the orientation attribute similarity, and $\omega_7$ is the weight of the floor attribute similarity.

Further: the calculation formula of the threshold Sim of the total similarity in step S5 is:

$$\mathrm{Sim}=A[i]-\frac{C[i]}{2}$$

in the above formula, $A[i]$ is the i-th item of the set of total similarities of identical house sources arranged in ascending order, $C[i]=A[i]-B[i]$, $B[i]$ is the i-th item of the set of total similarities of different house sources arranged in descending order, and i is the index of the first item in set C that is greater than 0.

Further: the calculation formula of the adaptive inertia weight ω in step S7 is:

in the above formula, $\omega_{max}$ and $\omega_{min}$ are respectively the maximum and minimum values initially set for the adaptive inertia weight ω; $f_i$ is the current objective function value of the i-th particle, $i=1,2,\ldots,m$, where m is the population size; $f_m$ is the optimal particle fitness in the particle swarm; $f_{avg}$ is the average fitness of the subgroup formed by particles whose fitness is greater than the population average fitness; and $f'_{avg}$ is the average fitness of the subgroup formed by particles whose fitness is smaller than the population average fitness.
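The two subgroup averages defined above can be illustrated with a minimal sketch. The function name `split_into_levels`, the level labels, and the rule used to assign particles to the three levels (comparison against $f_{avg}$ and $f'_{avg}$) are assumptions made for illustration; the text only states that three levels are formed according to fitness. The inertia weight formula itself is not reproduced here.

```python
from statistics import mean

def split_into_levels(fitness):
    """Return f_avg, f'_avg and the particle indices of three fitness levels."""
    pop_avg = mean(fitness)
    upper = [f for f in fitness if f > pop_avg]   # above the population average
    lower = [f for f in fitness if f < pop_avg]   # below the population average
    # Assumption: fall back to the population average if a subgroup is empty.
    f_avg = mean(upper) if upper else pop_avg
    f_avg_prime = mean(lower) if lower else pop_avg
    levels = {"high": [], "middle": [], "low": []}
    for i, f in enumerate(fitness):
        if f >= f_avg:
            levels["high"].append(i)
        elif f >= f_avg_prime:
            levels["middle"].append(i)
        else:
            levels["low"].append(i)
    return f_avg, f_avg_prime, levels
```

Under this assumed rule, each level would then receive its own inertia weight between $\omega_{min}$ and $\omega_{max}$.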

Further: in step S8, the velocity of the particle swarm is updated as follows:

$$V_{id}=\omega V'_{id}+C_1\,\mathrm{random}(0,1)\,(P_{id}-X'_{id})+C_2\,\mathrm{random}(0,1)\,(P_{gd}-X'_{id})$$

in the above formula, $V_{id}$ is the updated velocity of the i-th particle, $V'_{id}$ is the velocity of the i-th particle at the current iteration, ω is the adaptive inertia weight, $C_1$ is the cognitive factor, $C_2$ is the social factor, $C_1=C_2\in[0,4]$, random(0, 1) is a random number on the interval [0, 1], $P_{id}$ is the d-th dimension of the individual extreme value of the i-th particle, $P_{gd}$ is the d-th dimension of the global extreme value, and $X'_{id}$ is the position of the i-th particle at the current iteration.

Further: in step S8, the position of the particle swarm is updated as follows:

$$X_{id}=X'_{id}+V_{id}$$

in the above formula, $X_{id}$ is the updated position of the i-th particle, $X'_{id}$ is the position of the i-th particle at the current iteration, and $V_{id}$ is the updated velocity of the i-th particle.

The beneficial effects of the invention are as follows: second-hand house data are crawled and preprocessed, and the similarity of each second-hand house attribute is computed separately. A model fusing the multi-attribute structural entity similarities is then constructed, and the attribute weights of this model are optimized to obtain the weight of each attribute similarity and a total similarity threshold. These are used to match house properties by similarity and yield an alignment result with better performance.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a diagram illustrating weight distribution of different attributes in the present invention;

FIG. 3 shows the maximum fitness of the present invention and a standard PSO algorithm at different numbers of iterations;

FIG. 4 shows the average fitness of the present invention and a standard PSO algorithm at different numbers of iterations;

FIG. 5 shows the F1 values of the present invention and a standard PSO algorithm at different numbers of iterations.

Detailed Description

The following description of embodiments is provided to help those skilled in the art understand the present invention. It should be understood, however, that the invention is not limited to the scope of these embodiments: to those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations making use of the inventive concept are protected.

As shown in FIG. 1, a real estate data vector alignment method based on particle swarm optimization is characterized by comprising the following steps:

S1, crawling second-hand house source data of a certain city from two second-hand house source webpages.

S2, preprocessing the second-hand house source data; the preprocessing comprises completing incomplete house source data and normalizing the house source data.

S3, calculating the similarity of different attributes of the preprocessed second-hand house source data, and constructing a fusion multi-attribute structure model; the attributes of the second-hand house source data comprise a cell name, a title, a floor plan, a price, an area, an orientation and a floor.

Although the cell name is textual information, it cannot be computed with TF-IDF-based methods. Therefore, after normalization the cell names are compared directly as text: if they are identical the similarity is 1, otherwise it is 0. The calculation formula of the cell name similarity sim_name(A, B) is as follows:

$$\mathrm{sim\_name}(A,B)=\begin{cases}1, & \mathrm{name}_A=\mathrm{name}_B\\ 0, & \text{otherwise}\end{cases}$$

in the above formula, $\mathrm{name}_A$ and $\mathrm{name}_B$ are respectively the cell names of house source A and house source B in the two house source webpages; the two house source webpages are Lianjia and Anjuke respectively.
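The exact-match rule for the cell name can be sketched as follows. The function name `exact_match_similarity` is hypothetical, and stripping surrounding whitespace is an added assumption; the inputs are assumed to be already normalized.

```python
def exact_match_similarity(value_a, value_b):
    """Return 1 if two normalized attribute values are identical, else 0."""
    # Assumption: surrounding whitespace is not meaningful.
    return 1 if value_a.strip() == value_b.strip() else 0
```

The same comparison would also serve for attributes that are scored 1 when equal and 0 otherwise, such as the orientation.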

Titles are generally short and contain few stop words, and most of their content is key information, so the title similarity is calculated with a TF-IDF-based method. The title similarity sim_title(A, B) is calculated by the following method:

inserting spaces between the words of the titles S1 and S2 of an entity pair from the two house source webpages; calculating the TF value (term frequency, the number of times each word occurs in the title) and the IDF value (inverse text frequency, which corrects the word feature value expressed by term frequency alone and thereby reflects the importance of the word in the text) of each word; calculating the TFIDF value from the TF value and the IDF value to obtain the word frequency-inverse text frequency matrix; and calculating the cosine similarity sim_title(A, B) of the word frequency-inverse text frequency matrices of the two house source titles;

the calculation formula of the TFIDF value is as follows:

$$\mathrm{TFIDF}_{i,j}=\mathrm{TF}_{i,j}\times\mathrm{IDF}_{i,j}$$
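A minimal sketch of the title similarity, assuming the titles are already segmented into space-separated words. The function name `tfidf_cosine` is hypothetical. The smoothed IDF form ln((1 + N) / (1 + df)) + 1 and the choice to compute IDF over only the two titles are assumptions; the source does not state which IDF variant or corpus is used.

```python
import math
from collections import Counter

def tfidf_cosine(title_a, title_b):
    """Cosine similarity of the TF-IDF vectors of two space-separated titles."""
    docs = [title_a.split(), title_b.split()]
    vocab = sorted(set(docs[0]) | set(docs[1]))
    n_docs = len(docs)
    # Document frequency of each word across the two titles.
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    # Assumed smoothed IDF; a larger corpus could be used instead.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    dot = sum(x * y for x, y in zip(vectors[0], vectors[1]))
    norm_a = math.sqrt(sum(x * x for x in vectors[0]))
    norm_b = math.sqrt(sum(y * y for y in vectors[1]))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical titles score 1 and titles with no words in common score 0; partially overlapping titles fall in between.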

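The floor plan posted by agency staff is generally the one released earlier by the developer when the house was sold, so the pictures are fairly regular; however, because each person uploads pictures from a different source, the pictures differ in resolution and may be rotated. In view of these characteristics, the invention selects the SURF algorithm, an improved version of the SIFT algorithm. Its advantage is that it captures the local features of the picture and is not affected by scaling, rotation or brightness changes. The calculation method of the floor plan similarity sim_img(A, B) comprises the following steps:

scaling and graying the two pictures img1 and img2 of an entity pair from the two house source webpages;

establishing a SURF algorithm model, extracting the features des1 and des2 of the two pictures img1 and img2 respectively through the SURF algorithm model, and matching feature points from the features des1 and des2 through the KNN algorithm;

counting the matching feature points whose distance ratio is larger than 0.9, and taking their proportion among all matching feature points as the picture similarity sim_img(A, B);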

The price information registered for the same house source with different agencies differs. The calculation formula of the price similarity sim_price(A, B) is as follows:

in the above formula, $P_A$ and $P_B$ are respectively the prices of house source A and house source B in the two house source webpages, and $\mathrm{Max}(P_n)$ and $\mathrm{Min}(P_n)$ are respectively the maximum and minimum price differences among all entity pairs in the two house source webpages that are the same house source;

Although the area information registered for the same house source with different agencies may differ, the difference is small, generally less than 1. The calculation formula of the area similarity sim_size(A, B) is as follows:

in the above formula, Size(A, B) is the area difference between house source A and house source B in the two house source webpages, and the calculation formula is:

$$\mathrm{Size}(A,B)=|S_A-S_B|$$

when the orientations of the house sources in the two house source webpages are the same, the orientation similarity sim_direction(A, B) is 1, otherwise it is 0;

when the floors of the house sources in the two house source webpages are the same, the floor similarity sim_floor(A, B) is 1, otherwise it is 0.

The calculation formula of the fusion multi-attribute structure model Sim(A, B) is as follows:

$$\mathrm{Sim}(A,B)=\omega_1\times\mathrm{sim\_name}(A,B)+\omega_2\times\mathrm{sim\_title}(A,B)+\omega_3\times\mathrm{sim\_img}(A,B)+\omega_4\times\mathrm{sim\_price}(A,B)+\omega_5\times\mathrm{sim\_size}(A,B)+\omega_6\times\mathrm{sim\_direction}(A,B)+\omega_7\times\mathrm{sim\_floor}(A,B)$$

in the above formula, $\omega_1$ is the weight of the cell name attribute similarity, $\omega_2$ is the weight of the title attribute similarity, $\omega_3$ is the weight of the floor plan attribute similarity, $\omega_4$ is the weight of the price attribute similarity, $\omega_5$ is the weight of the area attribute similarity, $\omega_6$ is the weight of the orientation attribute similarity, and $\omega_7$ is the weight of the floor attribute similarity.
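The fused model is a weighted sum of the seven attribute similarities, which might be sketched as follows. The names `ATTRIBUTES` and `total_similarity` and the dictionary layout of the similarities are assumptions for illustration.

```python
# Attribute order matches the weights omega_1 .. omega_7 in the formula.
ATTRIBUTES = ("name", "title", "img", "price", "size", "direction", "floor")

def total_similarity(similarities, weights):
    """Weighted sum Sim(A, B) of the seven attribute similarities."""
    if len(weights) != len(ATTRIBUTES):
        raise ValueError("expected one weight per attribute")
    return sum(w * similarities[attr] for w, attr in zip(weights, ATTRIBUTES))
```

Because the weights are not normalized to sum to 1, the total similarity of a pair can range from 0 up to the sum of the weights.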

S4, taking the array formed by the attribute weights $[\omega_1,\omega_2,\ldots,\omega_7]$ of the fusion multi-attribute structure model as the position of one particle in the particle swarm; initializing the number of particles, the number of iterations, the cognitive factor and the social factor; initializing the position of each particle, namely the weights of the attribute similarities; initializing the velocity of each particle; calculating the initial value of each individual extreme value; and setting the initial value of the global extreme value equal to the initial value of the individual extreme value. Random numbers in the range [0, 1] are taken as the initial positions of the particles, and the initial velocity of each particle is likewise set to random numbers in the range [0, 1]. An initial particle population is thus generated.
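The random initialization described in S4 can be sketched as below. The function name `init_swarm` and the `seed` parameter are hypothetical. Python's `random.random()` draws from [0, 1), which is used here as an approximation of the stated range [0, 1]; the initial individual and global extreme values are omitted because they require a fitness evaluation.

```python
import random

def init_swarm(n_particles, n_dims=7, seed=None):
    """Create random initial positions (weights) and velocities in [0, 1)."""
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(n_dims)] for _ in range(n_particles)]
    velocities = [[rng.random() for _ in range(n_dims)] for _ in range(n_particles)]
    return positions, velocities
```

Each row of `positions` is one candidate weight array for the seven attribute similarities.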

S5, calculating the total similarity of all entity pairs according to the fusion multi-attribute structure model, calculating the threshold of the total similarity, and applying the threshold to the training set and comparing against the true classification results to obtain the F1 value of the training set. The initialized or updated weights are applied to each entity pair instance to calculate the total similarity of each entity pair. The resulting total similarities are then divided into two lists A and B according to whether the entity pair is actually the same house source (list A is the set of total similarities of pairs that are actually the same house source, and list B is the set of total similarities of pairs that are actually different house sources). Lists A and B are sorted in ascending and descending order respectively, and their element-wise difference gives list C. The calculation formula of the threshold Sim of the total similarity is as follows:

$$\mathrm{Sim}=A[i]-\frac{C[i]}{2}$$

in the above formula, $A[i]$ is the i-th item of the set of total similarities of identical house sources arranged in ascending order, $C[i]=A[i]-B[i]$, $B[i]$ is the i-th item of the set of total similarities of different house sources arranged in descending order, and i is the index of the first item in set C that is greater than 0.
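The threshold rule can be sketched as follows. The function name `similarity_threshold` is hypothetical; truncating to the shorter list when A and B differ in length, and returning `None` when no item of C is positive, are assumptions the source does not address.

```python
def similarity_threshold(same_sims, diff_sims):
    """Threshold Sim = A[i] - C[i] / 2 at the first i where C[i] > 0."""
    a = sorted(same_sims)                 # list A, ascending
    b = sorted(diff_sims, reverse=True)   # list B, descending
    # Assumption: compare only up to the length of the shorter list.
    for a_i, b_i in zip(a, b):
        c_i = a_i - b_i                   # list C
        if c_i > 0:
            return a_i - c_i / 2
    return None                           # assumption: no crossing found
```

Since C[i] = A[i] - B[i], the returned value equals the midpoint (A[i] + B[i]) / 2 of the first pair at which the ascending same-source list overtakes the descending different-source list.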

S6, taking the F1 value of the training set as the fitness of each particle; when the fitness of a particle is greater than its individual extreme value, updating the individual extreme value to that fitness; calculating the maximum fitness of the current swarm; and when the maximum fitness is greater than the global extreme value, updating the global extreme value to the maximum fitness;
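The update of the individual and global extreme values in S6 might look like the sketch below. The function name `update_extremes` and the argument layout are assumptions; storing the position alongside each extreme value is added so that the velocity update can later refer to it.

```python
def update_extremes(fitness, positions, pbest_fit, pbest_pos, gbest_fit, gbest_pos):
    """Update individual extremes in place and return the new global extreme."""
    for i, f in enumerate(fitness):
        if f > pbest_fit[i]:               # particle improved on its own best
            pbest_fit[i] = f
            pbest_pos[i] = list(positions[i])
    best = max(range(len(fitness)), key=fitness.__getitem__)
    if fitness[best] > gbest_fit:          # swarm improved on the global best
        gbest_fit = fitness[best]
        gbest_pos = list(positions[best])
    return gbest_fit, gbest_pos
```

Here `fitness[i]` would be the training-set F1 value obtained with the weights of particle i.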

S7, dividing the particle swarm into subgroups of 3 levels according to particle fitness, and calculating the adaptive inertia weight for the objective function values of the different levels. In the standard particle swarm optimization algorithm, the inertia weight is one of the important parameters: changing it changes how strongly the current particle velocity influences the updated velocity, and thereby controls the optimization ability and convergence speed of the whole algorithm. Particles with stronger exploration ability have higher velocity, and particles with stronger exploitation ability have lower velocity. On this basis, optimizing the inertia weight of the particle swarm is of great significance. The calculation formula of the adaptive inertia weight ω is as follows:

S8, updating the velocity of the particle swarm according to the individual extreme value, the global extreme value, the inertia weight, the cognitive factor and the social factor, updating the position of the particle swarm according to the velocity, and adding 1 to the number of iterations;

the velocity of the particle swarm is updated as follows:

$$V_{id}=\omega V'_{id}+C_1\,\mathrm{random}(0,1)\,(P_{id}-X'_{id})+C_2\,\mathrm{random}(0,1)\,(P_{gd}-X'_{id})$$

in the above formula, $V_{id}$ is the updated velocity of the i-th particle, $V'_{id}$ is the velocity of the i-th particle at the current iteration, ω is the adaptive inertia weight, $C_1$ is the cognitive factor, $C_2$ is the social factor, $C_1=C_2\in[0,4]$, random(0, 1) is a random number on the interval [0, 1], $P_{id}$ is the d-th dimension of the individual extreme value of the i-th particle, $P_{gd}$ is the d-th dimension of the global extreme value, and $X'_{id}$ is the position of the i-th particle at the current iteration.

The position of the particle swarm is updated as follows:

$$X_{id}=X'_{id}+V_{id}$$

in the above formula, $X_{id}$ is the updated position of the i-th particle, $X'_{id}$ is the position of the i-th particle at the current iteration, and $V_{id}$ is the updated velocity of the i-th particle.
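The velocity and position updates of S8 can be sketched for a single particle as follows. The function name `update_particle` is hypothetical. Drawing a fresh random number for every dimension and applying no velocity or position clamping are assumptions; the source does not specify either. The inertia weight `omega` is passed in rather than computed.

```python
import random

def update_particle(position, velocity, pbest, gbest, omega, c1=2.0, c2=2.0, rng=random):
    """Apply one velocity update and one position update to a particle."""
    new_velocity, new_position = [], []
    for x, v, p, g in zip(position, velocity, pbest, gbest):
        # Inertia term + cognitive term + social term, per dimension d.
        v_new = omega * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        new_velocity.append(v_new)
        new_position.append(x + v_new)   # X_id = X'_id + V_id
    return new_position, new_velocity
```

When a particle already sits at both its individual and the global extreme value, the cognitive and social terms vanish and only the inertia term moves it.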

And S9, when the iteration times are smaller than the maximum iteration times, returning to the step S5, otherwise, outputting the position of the particle swarm, namely the weight of each attribute in the multi-attribute structure model.

The experimental data of the invention are the cell names, titles, floor plans, prices, areas, orientations, floors and other information of second-hand houses crawled from the Lianjia website and the Anjuke website respectively. Different second-hand house service platforms may express the value of the same attribute in different forms; for example, two different notations for "60 square meters" are semantically identical but differ in form. To make the attribute similarities more reliable, the expression form of the attribute values needs to be normalized. Table 1 gives examples of normalized attribute values.

TABLE 1 Attribute value normalization example

Attribute	Attribute value on Lianjia	Attribute value on Anjuke	Normalized attribute value
Cell name	Red Maple Ridge Phase III	Zhongchang Red Maple Ridge Phase III	Red Maple Ridge Phase III
Area	88.5 square meter	88.5 square meters	88.5 square meters
Floor	Middle floor (23 floors in total)	Middle floor/23 floors in total	Middle floor (23 floors in total)
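Normalizing the area attribute might be sketched as below. The function name `normalize_area`, the regular expression, and the canonical output form are assumptions for illustration; the actual normalization rules used in the experiment are not given in the source.

```python
import re

def normalize_area(text):
    """Extract the numeric area and re-emit it in one canonical form."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None                      # assumption: no number means unknown
    return f"{match.group()} square meters"
```

After such a step, differently written values of the same area compare as equal strings.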

The experiment of the invention mainly judges the similarity of second-hand houses on real estate service platforms, which can be regarded as a binary classification model. Following the evaluation indexes of classification algorithms, the precision (P), recall (R) and F1 value commonly used for classification problems are selected as the criteria for evaluating the performance of the algorithm. Table 2 lists the relevant parameters of the evaluation indexes.

TABLE 2 evaluation index parameters

Precision, also called the precision rate, is the proportion of the results predicted as positive samples that are correctly classified. The calculation formula is as follows:

$$P=\frac{TP}{TP+FP}$$

Recall, also called the recall rate, is the proportion of the true positive samples that are correctly classified; it reflects the completeness of the model's classification results. The calculation formula is as follows:

$$R=\frac{TP}{TP+FN}$$

In order to evaluate the model proposed by the invention comprehensively, the $F_1$ value, the harmonic mean of precision and recall, is used to evaluate the classification effect on the whole data set. The calculation formula is as follows:

$$F_1=\frac{2\times P\times R}{P+R}$$

in the above formulas, TP is the number of positive samples predicted as positive, FP is the number of negative samples predicted as positive, and FN is the number of positive samples predicted as negative.
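The three evaluation indexes can be computed from true and predicted labels as in the following sketch, using the standard definitions of precision, recall and their harmonic mean. The function name `precision_recall_f1` and the encoding of labels (1 for the same house source, 0 for different house sources) are assumptions.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = same house source)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In the method, predicted labels would come from comparing each pair's total similarity with the threshold, and the resulting F1 value serves as the particle fitness.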

Through analysis and tuning of the main parameters of the particle swarm algorithm, it was found that the classification effect is best and the particles converge well when the number of particles N is 50, the number of iterations Step is 100, the learning factors C1 = C2 = 2, and the minimum and maximum values of the inertia weight are 0.4 and 0.9 respectively. This yields the optimal classification effect on the test set shown in Table 3 and the particle distribution diagram shown in FIG. 2.

TABLE 3 FAPSO optimal Classification Effect

Algorithm	P	R	F1
FAPSO	0.882	0.845	0.863

The attribute similarity weights in the optimal classification effect are [1,0.236,0.72,1, 0.975,1], and the total similarity threshold is 5.249. The weight distribution shows that the similarities of the picture, area, orientation, cell name and floor have a large influence on the total similarity, while the title has a small influence. This is because the title contains a lot of information but in an inconsistent format, whereas the other attributes are inherent to the house and therefore influence the total similarity more.

Compared with the standard particle swarm algorithm PSO, the particle swarm algorithm based on adaptive inertia weight, FAPSO, adopts a fitness-adaptive inertia weight and reduces the particle dimensionality by calculating the total similarity threshold separately inside the fitness function. For both FAPSO and the standard PSO algorithm, the number of particles N and the maximum number of iterations Step are set to 50 and 100 respectively, and the learning factors are C1 = C2 = 2; the inertia weight of the standard PSO algorithm is 0.6, and the minimum and maximum inertia weights of FAPSO take the empirical values 0.4 and 0.9. The comparison of performance as a function of the number of iterations is shown in FIG. 3, FIG. 4 and FIG. 5.

FIG. 3 shows how the global best fitness of FAPSO and PSO changes as the number of iterations increases: the global best value of PSO rises markedly with the number of iterations, whereas that of the algorithm used in the present invention rises more steadily.

As can be seen from FIG. 4, the average fitness of the current-iteration population of PSO increases markedly as the number of iterations increases, whereas that of the algorithm used in the present invention increases more smoothly. The stability of FAPSO is clearly superior to that of PSO.

FIG. 5 shows the test-set F1 value obtained, as the number of iterations increases, by applying the global best particle of the current iteration to classify the test set; FAPSO is significantly better than PSO.

In conclusion, the FAPSO algorithm is significantly better than the standard PSO algorithm in both convergence rate and particle optimization ability. Moreover, FAPSO obtains high-quality particles even at a small number of iterations, and the overall performance of its particle population is clearly higher.

Claims

1. A real estate data vector alignment method based on particle swarm optimization is characterized by comprising the following steps:

S2, preprocessing the second-hand house source data;

S6, taking the F1 value of the training set as the fitness of each particle; when the fitness of a particle is greater than its individual extreme value, updating the individual extreme value to that fitness; calculating the maximum fitness of the current swarm; and when the maximum fitness is greater than the global extreme value, updating the global extreme value to the maximum fitness;

S10, calculating the threshold of the total similarity of the test set, and predicting the entity pairs in the test set using the multi-attribute structure model and that threshold, so as to match the second-hand house sources;

the preprocessing in step S2 comprises completing incomplete house source data and normalizing the house source data;

the attributes of the second-hand house source data in step S3 comprise a cell name, a title, a floor plan, a price, an area, an orientation and a floor;

the title similarity sim_title(A, B) is calculated by the following method:

inserting spaces between the words of the titles S1 and S2 of an entity pair from the two house source webpages, calculating the TF value and the IDF value of each word, calculating the TFIDF value from the TF value and the IDF value to obtain the word frequency-inverse text frequency matrix, and calculating the cosine similarity sim_title(A, B) of the word frequency-inverse text frequency matrices of the two house source titles;

the calculation formula of the TFIDF value is as follows:

$$\mathrm{TFIDF}_{i,j}=\mathrm{TF}_{i,j}\times\mathrm{IDF}_{i,j}$$

in the above formula, $\mathrm{TFIDF}_{i,j}$ is the word frequency-inverse text frequency matrix, $\mathrm{TF}_{i,j}$ is the word frequency matrix, and $\mathrm{IDF}_{i,j}$ is the inverse text frequency matrix;

the calculation method of the floor plan similarity sim_img(A, B) comprises the following steps:

establishing a SURF algorithm model, extracting the features des1 and des2 of the two pictures img1 and img2 respectively through the SURF algorithm model, and matching feature points from the features des1 and des2 through the KNN algorithm;

in the above formula, $P_A$ and $P_B$ are respectively the prices of house source A and house source B in the two house source webpages, and $\mathrm{Max}(P_n)$ and $\mathrm{Min}(P_n)$ are respectively the maximum and minimum price differences among all entity pairs in the two house source webpages that are the same house source;

the calculation formula of the area similarity sim_size(A, B) is as follows:

$$\mathrm{Size}(A,B)=|S_A-S_B|$$

when the floors of the house sources in the two house source webpages are the same, the floor similarity sim_floor(A, B) is 1, otherwise it is 0;

the calculation formula of the fusion multi-attribute structure model Sim(A, B) in step S3 is:

$$\mathrm{Sim}(A,B)=\omega_1\times\mathrm{sim\_name}(A,B)+\omega_2\times\mathrm{sim\_title}(A,B)+\omega_3\times\mathrm{sim\_img}(A,B)+\omega_4\times\mathrm{sim\_price}(A,B)+\omega_5\times\mathrm{sim\_size}(A,B)+\omega_6\times\mathrm{sim\_direction}(A,B)+\omega_7\times\mathrm{sim\_floor}(A,B)$$

in the above formula, $\omega_1$ is the weight of the cell name attribute similarity, $\omega_2$ is the weight of the title attribute similarity, $\omega_3$ is the weight of the floor plan attribute similarity, $\omega_4$ is the weight of the price attribute similarity, $\omega_5$ is the weight of the area attribute similarity, $\omega_6$ is the weight of the orientation attribute similarity, and $\omega_7$ is the weight of the floor attribute similarity;

the calculation formula of the threshold Sim of the total similarity in step S5 is:

$$\mathrm{Sim}=A[i]-\frac{C[i]}{2}$$

in the above formula, $A[i]$ is the i-th item of the set of total similarities of identical house sources arranged in ascending order, $C[i]=A[i]-B[i]$, $B[i]$ is the i-th item of the set of total similarities of different house sources arranged in descending order, and i is the index of the first item in set C that is greater than 0;

the calculation formula of the adaptive inertia weight ω in step S7 is:

in the above formula, $\omega_{max}$ and $\omega_{min}$ are respectively the maximum and minimum values initially set for the adaptive inertia weight ω; $f_i$ is the current objective function value of the i-th particle, $i=1,2,\ldots,m$, where m is the population size; $f_m$ is the optimal particle fitness in the particle swarm; $f_{avg}$ is the average fitness of the subgroup formed by particles whose fitness is greater than the population average fitness; and $f'_{avg}$ is the average fitness of the subgroup formed by particles whose fitness is smaller than the population average fitness;

in step S8, the velocity of the particle swarm is updated as follows:

$$V_{id}=\omega V'_{id}+C_1\,\mathrm{random}(0,1)\,(P_{id}-X'_{id})+C_2\,\mathrm{random}(0,1)\,(P_{gd}-X'_{id})$$

in the above formula, $V_{id}$ is the updated velocity of the i-th particle, $V'_{id}$ is the velocity of the i-th particle at the current iteration, ω is the adaptive inertia weight, $C_1$ is the cognitive factor, $C_2$ is the social factor, $C_1=C_2\in[0,4]$, random(0, 1) is a random number on the interval [0, 1], $P_{id}$ is the d-th dimension of the individual extreme value of the i-th particle, $P_{gd}$ is the d-th dimension of the global extreme value, and $X'_{id}$ is the position of the i-th particle at the current iteration;

in step S8, the position of the particle swarm is updated as follows:

$$X_{id}=X'_{id}+V_{id}$$

in the above formula, $X_{id}$ is the updated position of the i-th particle, $X'_{id}$ is the position of the i-th particle at the current iteration, and $V_{id}$ is the updated velocity of the i-th particle.