CN114722810A - Real estate customer portrait method and system based on information extraction and multi-attribute decision - Google Patents

Real estate customer portrait method and system based on information extraction and multi-attribute decision

Info

Publication number
CN114722810A
Authority
CN
China
Prior art keywords
customer
client
information
attribute
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210276309.8A
Other languages
Chinese (zh)
Inventor
朱李楠
徐翼飞
许敏皓
朱柘潮
孔祥杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202210276309.8A
Publication of CN114722810A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A real estate customer portrait method based on information extraction and multi-attribute decision-making, comprising: 1) data collection: considering multiple data sources and, after screening and extraction, obtaining the basic information and impression text data of the client groups required for the experiment; 2) information extraction: using a key phrase extraction model that takes as input the word embedding matrix of a customer's impression text and the corresponding part-of-speech (POS) tag embedding matrix, extracting key information and integrating it into the customer's basic information; 3) attribute weight assignment: according to the real estate industry's characterization of each client group, assigning a weight to each attribute for each group; 4) customer portrait identification: characterizing the client from 6 aspects, such as emphasis on education, intention to put down roots, and investment tendency, and selecting the higher-scoring aspects as the client's labels. The invention also includes a real estate customer portrait system based on information extraction and multi-attribute decision-making.

Description

Real estate customer portrait method and system based on information extraction and multi-attribute decision
Technical Field
The invention relates to a real estate customer portrait method and system.
Background
With the continued advance of urbanization, the large influx of people into cities has greatly raised the productivity of urban industries, and the resulting economic and siphon effects have brought more practitioners and potential customers to the real estate industry. Real estate is therefore a field in urgent need of customer portraits. In the past, real estate businesses typically promoted sales through large-scale advertising, questionnaires, and telephone interviews. However, these methods require large investment and yield insignificant results.
In recent years, as the information technology revolution has deepened, a wave of informatization and digitization has swept across industries. With the growing maturity of big data and natural language processing technologies, data-driven customer portraits make it possible to solve these problems. A customer portrait describes the overall behavioral characteristics of users by collecting large amounts of user information, helping enterprises locate target customer groups and market precisely and individually to customers with real demand. Both the enterprise and the customer benefit.
A customer portrait abstracts labels along multiple dimensions (such as basic information, preferences, and social attributes) from users' real-life behavior, aiming to describe their overall behavioral characteristics as fully as possible. In short, a customer portrait can uncover implicit heterogeneous relationships and help provide quality services in many areas.
Disclosure of Invention
The present invention overcomes the above deficiencies of the prior art and provides a real estate customer portrait method and system based on information extraction and multi-attribute decision-making.
The invention applies key phrase extraction from the field of information extraction, multi-attribute decision analysis, and related methods to portray customers in the real estate industry. Its advantage is that users can learn the requirements and preferences of each customer group and grasp the problems in real estate marketing and the strategies for addressing them. This provides a valuable reference for customer development, sales strategy, precision marketing, and other operational planning in the real estate industry, and has practical significance for simplifying industry operations and saving cost.
The invention achieves this aim through the following technical scheme: a real estate customer portrait method based on information extraction and multi-attribute decision, characterized by comprising the following steps:
(1) screening, from the customer data of each building in the real estate field, the data containing basic customer information and the data describing customers;
(2) taking the customer description text as input and extracting the key phrases in its sentences by applying a key phrase extraction model;
(3) matching the key phrases extracted in step (2) against regular matching templates, and further extracting key information to fill and extend the basic information;
(4) guided by expert knowledge of the industry, taking multiple indexes such as emphasis on education, rooting, and nesting as labels for measuring customers, and assigning to each index positive and negative correlation coefficients for the corresponding customer attributes;
(5) calculating a weight for each customer attribute by using an optimized entropy method;
(6) determining the purchase intention of the customer from the customer analysis indexes obtained by the calculations in steps (4) and (5).
Step (2) specifically comprises the following steps:
21) Performing word segmentation, named entity recognition, and semantic annotation on the sentences in the text data to obtain a word matrix $M_w$ of shape $d \times s$ and a semantic annotation matrix $M_p$ of shape $d \times s$, where d is the total number of texts and s is the maximum sentence length.
22) Vectorizing the word matrix $M_w$ and the semantic matrix $M_p$ separately with a GloVe model and concatenating them position by position to obtain a vectorized text representation matrix X of shape $d \times s \times e$, where e is the vector dimension, set to 124 in the subsequent experiments.
23) Treating key phrase extraction as a sequence tagging task and tagging the text sequence with "BIESOU", where B, I, and E denote the beginning, body, and ending words of a key phrase, respectively, S denotes a single word that forms a key phrase by itself, U denotes a stop word inside a key phrase, and O denotes all other words.
24) Training, by deep learning, a neural network with a bidirectional long short-term memory (Bi-LSTM) + conditional random field (CRF) structure as the key phrase extraction model.
25) Inputting the text representation matrix X into the neural network model, which outputs a sequence tagging matrix L of shape $d \times s$, and finally extracting the key phrases of each sentence through a decoding algorithm.
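The final decoding step of 25) can be illustrated with a minimal sketch that turns one BIESOU tag sequence into key phrases. The source does not specify its decoding algorithm, so the function name `decode_biesou`, the rule that "U" words are dropped from the phrase, and the handling of ill-formed tag sequences are all assumptions made for illustration.

```python
from typing import List


def decode_biesou(tokens: List[str], tags: List[str], sep: str = " ") -> List[str]:
    """Collect key phrases from one BIESOU-tagged sentence (illustrative only)."""
    phrases: List[str] = []
    current: List[str] = []  # tokens of the phrase currently being built
    for token, tag in zip(tokens, tags):
        if tag == "S":                  # a single word that is a key phrase
            phrases.append(token)
            current = []
        elif tag == "B":                # first word of a multi-word key phrase
            current = [token]
        elif tag == "I" and current:    # body word: extend the open phrase
            current.append(token)
        elif tag == "U" and current:    # stop word inside the phrase: skipped (assumption)
            continue
        elif tag == "E" and current:    # last word: close and store the phrase
            current.append(token)
            phrases.append(sep.join(current))
            current = []
        else:                           # "O" or an ill-formed tag: discard any open phrase
            current = []
    return phrases
```

For Chinese text, where words are not separated by spaces, `sep=""` would be the natural choice.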
Step (3) specifically comprises the following steps:
31) For each field to be filled, constructing a regular matching score matrix Q of shape $v \times p$, where v is the number of legal values of the field (including the null value) and p is the number of regular expressions defined for the field.
32) For the field to be filled, matching the p regular expressions against the key phrases extracted in step (2) to obtain a p-dimensional matching vector $V_{pt}$ of 0s and 1s.
33) Obtaining the final matching value Value of the field from the regular matching score matrix Q and the matching vector $V_{pt}$, and using Value as the candidate for filling. The calculation is:
$$\mathrm{Value} = \mathrm{Values}[\operatorname{Argmax}(V_{pt} \times Q^{T})] \tag{1}$$
where Values is the list of legal values of the field and T denotes the matrix transpose.
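Steps 31) to 33) can be sketched in a few lines of plain Python. This is an illustration of equation (1) under stated assumptions, not the patent's implementation: the function name `fill_field`, the rule that a regular expression counts as matched if it matches any key phrase, and the tie-breaking rule (the first legal value wins, so placing the null value first makes it the default) are hypothetical.

```python
import re
from typing import List


def fill_field(phrases: List[str], values: List[str],
               patterns: List[str], Q: List[List[float]]) -> str:
    """Return Values[Argmax(V_pt x Q^T)] for one field.

    values   : the v legal values of the field (null value included)
    patterns : the p regular expressions defined for the field
    Q        : v x p score matrix, Q[i][j] = score of value i when pattern j matches
    """
    # V_pt[j] is 1 if regular expression j matches at least one key phrase, else 0
    v_pt = [1 if any(re.search(p, ph) for ph in phrases) else 0 for p in patterns]
    # scores[i] = sum_j V_pt[j] * Q[i][j], the i-th entry of V_pt x Q^T
    scores = [sum(v * q for v, q in zip(v_pt, row)) for row in Q]
    # argmax; on ties the earliest legal value is chosen (assumption)
    best = max(range(len(values)), key=scores.__getitem__)
    return values[best]
```

With the null value listed first and a score matrix that rewards matching pairs and penalizes contradictory ones, a field with no matching phrase falls back to the null value.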
Step (5) specifically comprises the following steps:
51) Classifying the customer information into four categories: basic information, family information, asset condition, and purchasing motivation.
52) Vectorizing the customer information with a GloVe embedding model, using the category boundaries from step 51) as the co-occurrence window, to obtain a customer information vector of dimension $k = m \times g$, where m is the total number of fields and g is the dimension of a single field vector.
53) Clustering the customer information vectors with the K-means clustering algorithm, and then calculating the weight of the customer attributes in each cluster by the entropy method, as follows:
531. Calculating the information entropy of each field X of the overall customer information:
$$E(X) = -\sum_{x \in X} p(x) \log p(x) \tag{2}$$
where X denotes a field, x denotes a legal value of the field, and p(x) denotes the frequency with which the value x appears in field X.
532. For each member i of cluster c, calculating the weight of field j in the corresponding client information:
Figure BDA0003555868400000031
where $X_{,j}$ denotes field j of the whole data set and $X_{c,j}$ denotes field j of the data in cluster c.
533. To prevent attribute weights that are too large in some dimensions from distorting the result, the attribute weight matrix $W_i$ of each client is normalized:
$$W_i = \mathrm{Normalize}(W_i) \tag{4}$$
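The entropy of equation (2) and a normalization in the spirit of equation (4) can be sketched as follows. The sketch deliberately stops short of the cluster weight of step 532, whose formula is not given in the text. The function names are hypothetical, and rescaling the weights to sum to 1 is an assumption, since the source does not define Normalize.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def field_entropy(column: Sequence[Hashable]) -> float:
    """E(X) = -sum_x p(x) log p(x), with p(x) the observed frequency of value x."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def normalize(weights: List[float]) -> List[float]:
    """Rescale the weights so they sum to 1 (assumed form of Normalize)."""
    s = sum(weights)
    return [w / s for w in weights] if s else list(weights)
```

A field in which every customer has the same value has entropy 0, while a field split evenly between two values has entropy log 2.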
Step (6) specifically comprises the following steps:
61) Using the positive and negative correlation matrix $M_{att}$ obtained in step (4) and the customer attribute matrix $W_i$ obtained in step (5) to calculate the distribution of the client over the indexes:
Figure BDA0003555868400000032
where T denotes the transpose.
62) Selecting the indexes whose scores exceed the threshold k as the labels of the client.
63) Analyzing the score composition of the client labels. Let label j correspond in $M_{att}$ to
Figure BDA0003555868400000033
The score composition $C_{i,j}$ of label j for client i can then be obtained from the following equation:
Figure BDA0003555868400000034
where a higher value in $C_{i,j}$ indicates that the corresponding attribute may be a more prominent feature of the customer.
The invention also includes a real estate customer portrait system based on information extraction and multi-attribute decision-making, characterized in that the system comprises a building customer data screening module, a key phrase extraction module, a key information extraction module, a customer index module, a customer attribute weight calculation module, and a customer purchase intention determination module connected to one another, wherein:
the building customer data screening module is used for screening, from the customer data of each building in the real estate field, the data containing basic customer information and the data describing customers;
the key phrase extraction module is used for taking the customer description text as input and extracting the key phrases in its sentences by applying a key phrase extraction model;
the key information extraction module is used for matching the key phrases extracted by the key phrase extraction module against regular matching templates, and further extracting key information to fill and extend the basic information;
the customer index module is used for taking, under the guidance of expert knowledge of the industry, multiple indexes such as emphasis on education, rooting, and nesting as labels for measuring customers, and assigning to each index positive and negative correlation coefficients for the corresponding customer attributes;
the customer attribute weight calculation module is used for calculating a weight for each customer attribute by using an optimized entropy method;
and the customer purchase intention determination module is used for determining the purchase intention of the customer from the customer analysis indexes obtained by the customer index module and the customer attribute weight calculation module.
The invention portrays the client from 6 aspects, such as emphasis on education, intention to put down roots, and investment tendency, and selects the higher-scoring aspects as the client's labels. Customer portrait experiments on the customers of a real estate company in China show that the method performs well on problems of this kind.
The innovations of the invention are:
(1) Data-driven customer portraits are applied to the real estate field for the first time, and customer labels are divided into 6 types according to the characteristics of different groups, drawing on expert knowledge of the real estate field.
(2) A new key phrase extraction model is proposed that combines semantic annotation with vectorization, so that the model achieves good key phrase extraction even with a small amount of data.
The advantages of the invention are:
(1) Key information about customers in the real estate field is extracted automatically by information extraction, and the summarized and processed information is presented in the system as customer portraits, which can effectively improve the information processing efficiency of practitioners in the field.
(2) The method can portray visiting customers in the real estate field, so that practitioners understand customers' requirements and pain points more intuitively and can target advertisements or follow-up calls more accurately.
Drawings
FIG. 1 is the overall flow chart of the invention.
FIG. 2 is the data processing flow chart used by the invention.
FIG. 3 is a flow chart of the text information extraction part of the invention.
FIG. 4 shows the partitioning of client attributes in the invention.
FIG. 5 shows the filling performed by the text information extraction part of the invention.
FIG. 6 is an example of a client portrait result in the invention.
FIGS. 7(a) to 7(j) are schematic diagrams of the attribute distributions of each label population after portrayal by the invention, where FIG. 7(a) is the age distribution of each label population, FIG. 7(b) the household registration distribution, FIG. 7(c) the occupation distribution, FIG. 7(d) the family size distribution, FIG. 7(e) the distribution of co-living situations, FIG. 7(f) the distribution of project points of interest, FIG. 7(g) the distribution of property purchase target budgets, FIG. 7(h) the distribution of property purchase reasons, FIG. 7(i) the distribution of living situations, and FIG. 7(j) a distribution of each label population.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the invention are described in further detail below.
The invention provides a real estate customer portrait method based on information extraction and multi-attribute decision. The main tasks of the method are shown in FIG. 1 and the data processing flow in FIG. 2. The method comprises:
(a) Screening basic information from the customer basic information data set of a real estate company in a city in Zhejiang, and extracting key phrase information from the customer description texts.
(b) After vectorizing the customer description text by semantic annotation and vectorization, extracting its key phrases with a key phrase extraction model based on Bi-LSTM + CRF, specifically comprising the following steps:
b1) Description text vectorization:
Word segmentation and semantic annotation are performed on the Chinese text to obtain a set of words and a corresponding set of semantic annotations; the overall sets of words and semantic annotations are denoted by T and E, respectively. Let the number of words in T be n; an $n \times n$ word co-occurrence matrix $M_t$ is then computed for T, where
Figure BDA0003555868400000051
indicates the number of times words i and j occur together within the same co-occurrence window ($i = 1, 2, 3, \ldots, n$; $j = 1, 2, 3, \ldots, n$); the size of the co-occurrence window is typically set to 2 or 3. Similarly, an $m \times m$ semantic annotation co-occurrence matrix $M_e$ is computed for E, where m is the number of label types that can appear in the semantic annotation. Finally, a GloVe word embedding model is applied to the word co-occurrence matrix $M_t$ and the semantic annotation co-occurrence matrix $M_e$, respectively, to vectorize the words and semantic labels.
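The co-occurrence counting of step b1) can be sketched as below. The exact window definition is not stated in the source; this sketch assumes that two tokens co-occur when they are at most `window` positions apart in the same sentence and that the count is symmetric. The function name `cooccurrence` is hypothetical, and the GloVe training that would consume the matrix is not shown.

```python
from typing import List, Tuple


def cooccurrence(sentences: List[List[str]], window: int = 2) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted vocabulary and the symmetric n x n co-occurrence matrix."""
    vocab = sorted({w for sent in sentences for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    M = [[0] * n for _ in range(n)]
    for sent in sentences:
        for pos, word in enumerate(sent):
            # pair the word with the next `window` tokens; count both directions
            for other in sent[pos + 1: pos + 1 + window]:
                i, j = index[word], index[other]
                M[i][j] += 1
                M[j][i] += 1
    return vocab, M
```

The same function applies unchanged to sequences of semantic annotation labels, which would give the $m \times m$ matrix for E.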
b2) Key phrase extraction:
First, the description texts are tagged with "BIESOU" labels, where "B", "I", and "E" mark the beginning, body, and ending words of a key phrase, "S" marks a single word that forms a key phrase by itself, "U" marks an irrelevant word in the middle of a key phrase, and "O" marks all other words; this yields a training set of size d. The texts and semantic labels in the training set are then vectorized as in step b1) and input into the Bi-LSTM + CRF key phrase extraction model for training. The network structure of the model is shown in FIG. 3, and the main process can be expressed as:
$$X = [T : E] \tag{7}$$
$$H = \mathrm{BiLSTM}(X) \tag{8}$$
$$H' = \mathrm{CRF}(H) \tag{9}$$
$$Y = \sigma(W_k H' + b_k) \tag{10}$$
where $[:]$ denotes the concatenation operation, and T and E denote the word vectors and semantic vectors, respectively. $\sigma$ is an activation function, and $W_k$ and $b_k$ are trainable parameters. Y is the probability distribution over the tag sequence of the sentence output by the model.
Finally, Y is decoded with a decoding algorithm to obtain the key phrases of the sentence.
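The source does not name the decoding algorithm. For sequence models with a CRF component, Viterbi decoding is a common choice, so the sketch below shows it purely as an assumed stand-in: given per-position tag scores and tag-to-tag transition scores, it returns the highest-scoring tag sequence. The function name and the score layout are hypothetical.

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the best tag index sequence.

    emissions[t][k]   : score of tag k at position t
    transitions[a][b] : score of moving from tag a to tag b
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    score = list(emissions[0])           # best score of any path ending in each tag
    back: List[List[int]] = []           # back-pointers for positions 1..T-1
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for b in range(n_tags):
            # best previous tag for reaching tag b at position t
            best_a = max(range(n_tags), key=lambda a: score[a] + transitions[a][b])
            pointers.append(best_a)
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
        score = new_score
        back.append(pointers)
    # follow the back-pointers from the best final tag
    path = [max(range(n_tags), key=score.__getitem__)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

The resulting tag indices would be mapped back to the B, I, E, S, O, U labels before the key phrases are collected.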
(c) Applying a template matching algorithm based on a regular expression set to the key phrases obtained in step (b), and extracting the client's key information for data filling or extension, specifically:
c1) Building a regular expression set for the field:
All legal values of the field, including the null value, are determined first. Then, based on how these legal values appear in the key phrases from step (b), several corresponding regular expressions are constructed for them; for example, in the "child academic" field, several regular expressions are constructed for the legal value "upper school". In this way, p regular expressions are built for the n legal values of a field.
c2) Constructing a matching score matrix:
From the n legal values and p regular expressions of step c1), an $n \times p$ relation matrix Q is constructed, where $Q_{i,j}$ is the score given to legal value i when regular expression j successfully matches a key phrase. $Q_{i,j}$ is larger when legal value i and regular expression j correspond to each other and smaller when they contradict each other.
c3) Matching to obtain the field value:
The key phrases obtained in step (b) are matched against the p regular expressions of step c1) to obtain a 0/1 matching vector $V_{pt}$ of dimension p, where 0 indicates a failed match and 1 a successful match. $V_{pt}$ is multiplied with the relation matrix Q of step c2) to obtain the matching score of each legal value of the field, and the legal value with the highest score is taken as the final matching value of the key phrases:
$$\mathrm{Value} = \mathrm{Values}[\operatorname{Argmax}(V_{pt} \times Q^{T})] \tag{1}$$
(d) Using the K-means clustering algorithm to obtain clusters of the customer information, and using the result to calculate the attribute weights. The specific process is:
d1) The attributes are divided into four broad categories: basic information, family situation, purchase intention, and financial status, as shown in figure x. Family situation is intended to capture the customer's family and its potential needs; purchase intention refers to the customer's housing preferences; financial status is intended to capture the customer's purchasing power.
d2) Taking the boundaries of the four attribute categories of step d1) as the co-occurrence window, the co-occurrence matrix of the customer attributes is calculated, and a feature representation B of the customer attributes is obtained through a GloVe model. Specifically, for each attribute field of a client, its feature representation is constructed according to the attribute category it belongs to, and these representations are then concatenated as the representation B of the client's overall attributes.
d3) Using the customer attribute representations B from step d2) as vectors, clusters of the customer information are obtained with the K-means clustering algorithm.
d4) For a cluster c, the weight of each attribute of the customer information of its members is calculated. Specifically, if client $i \in c$, then $W_i = W_c$, where $W_i$ denotes the attribute weights of i and $W_c$ is obtained from the following equations:
Figure BDA0003555868400000071
$$W_c = \mathrm{Normalize}(W_c) \tag{13}$$
where $p_{glb}(x)$ denotes the frequency of x over the entire data set, $p_c(x)$ denotes the frequency of x in cluster c, and j denotes the data field.
(e) Determining the purchase intention of the customer from the calculated customer analysis indexes:
e1) The attribute vector $W_i$ of client i obtained in step (d) and an $n \times m$ relation matrix $M_{att}$ constructed from expert experience in the real estate field are used together to calculate the client's distribution over the indexes:
Figure BDA0003555868400000072
where n is the number of customer indexes and m is the dimension of the customer attributes.
e2) The indexes whose scores exceed a threshold k are selected as the labels of the customer. To analyze the score composition of a client label, let label j correspond in the relation matrix $M_{att}$ to
Figure BDA0003555868400000073
The score composition $C_{i,j}$ of label j for customer i can then be obtained from the following equation:
Figure BDA0003555868400000074
where $C_{i,j} \in \mathbb{R}^m$, and a higher value in $C_{i,j}$ indicates that the corresponding attribute may be a more prominent feature of the customer.
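Steps e1) and e2) can be illustrated with a small sketch. The scoring formulas themselves are not given in the text, so two assumptions are made here that are consistent with the stated dimensions but not confirmed by the source: the score of index j is the dot product of row j of $M_{att}$ with $W_i$, and the score composition $C_{i,j}$ is the element-wise product of that row with $W_i$. The function name `label_client` and the example index names are hypothetical.

```python
from typing import Dict, List


def label_client(M_att: List[List[float]], W_i: List[float],
                 names: List[str], k: float) -> Dict[str, List[float]]:
    """Return {label: score composition} for every index whose score exceeds k.

    M_att : n x m matrix of positive/negative correlation coefficients
    W_i   : m attribute weights of client i
    names : the n index names, one per row of M_att
    """
    labels: Dict[str, List[float]] = {}
    for name, row in zip(names, M_att):
        # assumed C_ij: contribution of each attribute to the score of this index
        composition = [a * w for a, w in zip(row, W_i)]
        # assumed score: row j of M_att dotted with W_i
        if sum(composition) > k:
            labels[name] = composition
    return labels
```

Under these assumptions, the largest entries of a returned composition point to the attributes that contribute most to the label.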
The invention also comprises a real estate customer portrait system based on information extraction and multi-attribute decision, which comprises a building customer data screening module, a key phrase extraction module, a key information extraction module, a customer index module, a customer attribute weight calculation module, and a customer purchase intention determination module connected to one another, wherein:
the building customer data screening module is used for screening, from the customer data of each building in the real estate field, the data containing basic customer information and the data describing customers;
the key phrase extraction module is used for taking the customer description texts as input and extracting the key phrases in their sentences by applying a key phrase extraction model;
the key information extraction module is used for matching the key phrases extracted by the key phrase extraction module against regular matching templates, and further extracting key information to fill and extend the basic information;
the customer index module is used for taking, under the guidance of expert knowledge of the industry, multiple indexes such as emphasis on education, rooting, and nesting as labels for measuring customers, and assigning to each index positive and negative correlation coefficients for the corresponding customer attributes;
the customer attribute weight calculation module is used for calculating a weight for each customer attribute by using an optimized entropy method;
and the customer purchase intention determination module is used for determining the purchase intention of the customer from the customer analysis indexes obtained by the customer index module and the customer attribute weight calculation module.

Claims (6)

1. A real estate customer portrait method based on information extraction and multi-attribute decision making, characterized in that the method comprises the following steps:
(1) screening, from the customer data of each building in the real estate field, the data containing basic customer information and the data describing customers;
(2) taking the customer description text as input and extracting the key phrases in its sentences by applying a key phrase extraction model;
(3) matching the key phrases extracted in step (2) against regular matching templates, and further extracting key information to fill and extend the basic information;
(4) guided by expert knowledge of the industry, taking multiple indexes such as emphasis on education, rooting, and nesting as labels for measuring customers, and assigning to each index positive and negative correlation coefficients for the corresponding customer attributes;
(5) calculating a weight for each customer attribute by using an optimized entropy method;
(6) determining the purchase intention of the customer from the customer analysis indexes obtained by the calculations in steps (4) and (5).
2. A method for real estate customer representation based on information extraction and multi-attribute decision making as claimed in claim 1 wherein: the step (2) specifically comprises the following steps:
21) Carrying out word segmentation, named entity recognition and semantic annotation on sentences in the text data to obtain a word matrix $M_w$ of shape d × s and a semantic annotation matrix $M_p$ of shape d × s, where d is the total number of texts and s represents the maximum length of a sentence.
22) Vectorizing the word matrix $M_w$ and the semantic matrix $M_p$ respectively through a GloVe model and then splicing them position by position to obtain a vectorized text representation matrix X of shape d × s × e, where e represents the dimension of a vector and is set to 124 in the subsequent experiments.
23) Consider the key phrase extraction task as a sequence tagging task and tag the text sequence with "BIESOU", where B, I, E denotes the beginning, body, and end of the key phrase, respectively, S denotes the individual words that make up the key phrase, U denotes the stop word that is inside the key phrase, and O denotes other words.
24) Training a neural network of a Bi-directional long-short term memory network (Bi-LSTM) + Conditional Random Field (CRF) structure as a model for key phrase extraction using a deep learning method.
25) Inputting the text representation matrix X into the neural network model, which outputs a sequence tagging matrix L of shape d × s, and finally extracting the key phrases of each sentence through a decoding algorithm.
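The decoding algorithm in step 25) is not specified in the claim. The following is a minimal sketch of one plausible decoder for the "BIESOU" scheme of step 23), assuming that stop words tagged U are kept inside the phrase and that tokens are joined with spaces (for Chinese text an empty separator would be the natural choice). The function name `decode_biesou` and the sample sentence are hypothetical.

```python
from typing import List


def decode_biesou(tokens: List[str], tags: List[str], sep: str = " ") -> List[str]:
    """Collect key phrases from one BIESOU-tagged token sequence."""
    phrases: List[str] = []
    current: List[str] = []  # tokens of the phrase currently being built
    for token, tag in zip(tokens, tags):
        if tag == "S":
            # A single word that is a key phrase on its own.
            phrases.append(token)
            current = []
        elif tag == "B":
            # Start a new phrase (any unfinished phrase is discarded).
            current = [token]
        elif tag in ("I", "U") and current:
            # Body word, or stop word inside an open phrase (assumed to be kept).
            current.append(token)
        elif tag == "E" and current:
            # Close the open phrase and emit it.
            current.append(token)
            phrases.append(sep.join(current))
            current = []
        else:
            # "O", or a tag that is inconsistent with the current state.
            current = []
    return phrases


tokens = ["wants", "school", "district", "house", "for", "child"]
tags = ["O", "B", "I", "E", "O", "S"]
print(decode_biesou(tokens, tags))  # ['school district house', 'child']
```

In practice a CRF layer is usually decoded with the Viterbi algorithm to obtain the tag sequence first; the sketch covers only the final step of turning tags into phrases.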
3. A method for real estate customer representation based on information extraction and multi-attribute decision making as claimed in claim 1 wherein: the step (3) specifically comprises the following steps:
31) for each field to be filled, a regular matching score matrix Q of v x p is constructed, where v represents the number of legal values (including null values) of the field and p represents the number of regular expressions set for the field.
32) Matching the p regular expressions against the key phrases extracted in step (2) for the field to be filled, to obtain a p-dimensional matching vector $V_{pt}$ composed of 0s and 1s.
33) Obtaining the final matching value Value of this field from the regular matching score matrix Q and the matching vector $V_{pt}$, and then using Value as a candidate for filling. The specific calculation formula is as follows:
$$\mathrm{Value} = \mathrm{Values}[\operatorname{Argmax}(V_{pt} \times Q^{T})] \quad (1)$$
where Values represents the list of legal values for the field and T represents the matrix transpose operation.
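Steps 31) to 33) can be illustrated with a small sketch, assuming that entry Q[r][k] is the score that regular expression k contributes to legal value r, and that a pattern counts as matched when it matches at least one key phrase. The field, patterns, score values and the function name `fill_field` are all hypothetical.

```python
import re
from typing import List, Optional, Sequence

# Hypothetical field "marital status": v = 3 legal values (including null), p = 3 patterns.
VALUES: List[Optional[str]] = [None, "married", "single"]
PATTERNS = [r"wife|husband|spouse", r"child|kid", r"single|unmarried"]
Q = [
    [0.0, 0.0, 0.0],  # null value: wins only when nothing else scores
    [1.0, 0.5, 0.0],  # "married"
    [0.0, 0.0, 1.0],  # "single"
]


def fill_field(phrases: Sequence[str], patterns: Sequence[str],
               values: Sequence[Optional[str]],
               q: Sequence[Sequence[float]]) -> Optional[str]:
    # V_pt[k] is 1 if pattern k matches any extracted key phrase, else 0.
    v_pt = [1 if any(re.search(p, ph) for ph in phrases) else 0 for p in patterns]
    # scores[r] = sum_k V_pt[k] * Q[r][k], i.e. the r-th entry of V_pt x Q^T.
    scores = [sum(m * s for m, s in zip(v_pt, row)) for row in q]
    # Argmax; on ties the first (lowest-index) value is returned.
    best = max(range(len(values)), key=scores.__getitem__)
    return values[best]


print(fill_field(["with his wife", "two kids"], PATTERNS, VALUES, Q))  # married
```

Placing the null value first means that a field with no matching pattern stays empty, because all scores tie at zero and the first index is chosen.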
4. A method for real estate customer representation based on information extraction and multi-attribute decision making as claimed in claim 1 wherein: the step (5) specifically comprises the following steps:
51) the customer information is classified into four categories of basic information, family information, asset condition, and purchasing motivation.
52) Taking the classification in step 51) as the window boundary of the co-occurrence matrix, vectorizing the client information by applying a GloVe embedding model to obtain a client information vector of dimension m × g = k, where m represents the total number of fields and g represents the dimension of a single vector.
53) Clustering the customer information vectors by using a K-means clustering algorithm, and then calculating the weight of the customer attribute in each cluster by using an entropy method, wherein the specific content comprises the following steps:
531. Calculating the information entropy of each field X of the overall customer information:
$$E(X) = -\sum_{x \in X} p(x) \log p(x) \quad (2)$$
where X represents a field, x represents a legal value of the field, and p(x) represents the frequency with which value x appears in field X.
532. For each member i in the cluster c, calculating the weight of the field j in the corresponding client information:
Figure FDA0003555868390000021
where $X_{,j}$ denotes field j of the whole data and $X_{c,j}$ denotes field j of the data in cluster c.
533. To prevent the attribute weights from being too large in some dimensions and affecting the result, the attribute weight matrix $W_i$ obtained for a single customer is normalized:
$$W_i = \operatorname{Normalize}(W_i) \quad (4)$$
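Equation (2) and the normalization of step 533 can be sketched as follows. The claim does not state the logarithm base or the form of Normalize, so the natural logarithm and a sum-to-one rescaling are assumptions here. Equation (3) is available only as an image and is not reproduced. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def field_entropy(column: Sequence[Hashable]) -> float:
    """Information entropy E(X) of one field, with p(x) the observed frequency of x."""
    n = len(column)
    counts = Counter(column)
    # Natural logarithm is an assumption; the source writes only "log".
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale weights to sum to one (one possible choice of Normalize)."""
    total = sum(weights)
    return [w / total for w in weights] if total else list(weights)


print(field_entropy(["a", "a", "b", "b"]))  # ln 2, about 0.693
print(normalize([2.0, 1.0, 1.0]))           # [0.5, 0.25, 0.25]
```

A field whose values are all identical has zero entropy, so it carries no information for distinguishing customers.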
5. A method for real estate customer representation based on information extraction and multi-attribute decision making as claimed in claim 1 wherein: the step (6) specifically comprises the following steps:
61) Using the positive and negative correlation matrix $M_{att}$ obtained in step (4) and the customer attribute weight matrix $W_i$ obtained in step (5) to calculate the distribution of the client over the plurality of indexes:
Figure FDA0003555868390000031
where T denotes the transpose.
62) Selecting the index with the score exceeding the threshold k as the label of the client.
63) Analyzing the score composition of the client label: let the entry of $M_{att}$ corresponding to label j be
Figure FDA0003555868390000032
The score composition $C_{i,j}$ of label j for customer i can then be obtained from the following equation:
Figure FDA0003555868390000033
where a higher value in $C_{i,j}$ indicates that the corresponding attribute may be a more prominent feature of the customer.
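The formulas of steps 61) and 63) are available only as images, so the following sketch rests on stated assumptions rather than on the patent's exact equations: $M_{att}$ is taken to have one row per label and one column per attribute with entries in {-1, 0, 1}, the score of a label is taken to be the product of its row with $W_i^T$, and the score composition is taken to be the element-wise product of that row with $W_i$. The attribute names, weights and threshold are hypothetical.

```python
from typing import List, Sequence


def label_scores(m_att: Sequence[Sequence[float]], w_i: Sequence[float]) -> List[float]:
    """Assumed form of step 61): one score per label, M_att x W_i^T."""
    return [sum(c * w for c, w in zip(row, w_i)) for row in m_att]


def select_labels(names: Sequence[str], scores: Sequence[float], k: float) -> List[str]:
    """Step 62): keep the labels whose score exceeds the threshold k."""
    return [name for name, s in zip(names, scores) if s > k]


def score_composition(m_att_row: Sequence[float], w_i: Sequence[float]) -> List[float]:
    """Assumed form of step 63): per-attribute contribution to one label's score."""
    return [c * w for c, w in zip(m_att_row, w_i)]


LABELS = ["rooting", "nesting"]
# Hypothetical attributes: local_household, has_children, owns_property.
M_ATT = [
    [1, 0, -1],  # "rooting": positive with attribute 0, negative with attribute 2
    [0, 1, 0],   # "nesting": positive with attribute 1
]
W_I = [0.6, 0.3, 0.1]  # normalized attribute weights of one customer

scores = label_scores(M_ATT, W_I)
chosen = select_labels(LABELS, scores, k=0.4)
print(chosen)                              # ['rooting']
print(score_composition(M_ATT[0], W_I))    # contributions of each attribute to "rooting"
```

Under these assumptions the largest entry of the composition points to the attribute that contributes most to the label, which matches the interpretation given for $C_{i,j}$.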
6. A system for implementing a real estate customer representation method based on information extraction and multi-attribute decision making as recited in claim 1 wherein: the system comprises a building customer data screening module, a key phrase extraction module, a key information extraction module, a customer index module, a customer attribute weight calculation module and a customer purchase intention determination module which are connected with each other, wherein:
the building customer data screening module is used for screening data containing customer basic information and data describing customers from customer data of each building in the real estate field;
the key phrase extraction module is used for extracting key phrases in the text sentences by taking the customer description texts as input and applying a key phrase extraction model;
the key information extraction module is used for combining the phrases extracted by the key phrase extraction module, matching key phrases by applying a regular matching template, and further extracting key information to be used as filling and expansion of basic information;
the client index module is used for taking expert knowledge in the industry as guidance, taking a plurality of indexes such as religion, rooting and nesting as labels for measuring clients, and respectively allocating positive and negative correlation coefficients of corresponding client attributes for the indexes;
the client attribute weight calculation module is used for calculating the weight of each attribute of the client by using an optimized entropy method;
and the customer purchase intention determining module is used for determining the purchase intention of the customer by using a plurality of customer analysis indexes obtained by the calculation of the customer index module and the customer attribute weight calculating module.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210276309.8A CN114722810A (en) 2022-03-21 2022-03-21 Real estate customer portrait method and system based on information extraction and multi-attribute decision

Publications (1)

Publication Number Publication Date
CN114722810A true CN114722810A (en) 2022-07-08

Family

ID=82237223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210276309.8A Pending CN114722810A (en) 2022-03-21 2022-03-21 Real estate customer portrait method and system based on information extraction and multi-attribute decision

Country Status (1)

Country Link
CN (1) CN114722810A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116934468A (en) * 2023-09-15 2023-10-24 成都运荔枝科技有限公司 Trusted client grading method based on semantic recognition
CN116934468B (en) * 2023-09-15 2023-12-22 成都运荔枝科技有限公司 Trusted client grading method based on semantic recognition
CN117035837A (en) * 2023-10-09 2023-11-10 广东电力交易中心有限责任公司 Method for predicting electricity purchasing demand of power consumer and customizing retail contract
CN117035837B (en) * 2023-10-09 2024-01-19 广东电力交易中心有限责任公司 Method for predicting electricity purchasing demand of power consumer and customizing retail contract

Similar Documents

Publication Publication Date Title
CN109493166B (en) Construction method for task type dialogue system aiming at e-commerce shopping guide scene
CN111428053B (en) Construction method of tax field-oriented knowledge graph
US20230195773A1 (en) Text classification method, apparatus and computer-readable storage medium
CN110096575B (en) Psychological portrait method facing microblog user
CN102314417A (en) Method for identifying Web named entity based on statistical model
CN107315738A (en) A kind of innovation degree appraisal procedure of text message
CN114722810A (en) Real estate customer portrait method and system based on information extraction and multi-attribute decision
CN112070543B (en) Method for detecting comment quality in E-commerce website
Huang et al. Expert as a service: Software expert recommendation via knowledge domain embeddings in stack overflow
CN110175857B (en) Method and device for determining optimal service
CN115564393A (en) Recruitment requirement similarity-based job recommendation method
CN115470871B (en) Policy matching method and system based on named entity recognition and relation extraction model
CN116562265B (en) Information intelligent analysis method, system and storage medium
CN114266443A (en) Data evaluation method and device, electronic equipment and storage medium
Li et al. Mining online reviews for ranking products: A novel method based on multiple classifiers and interval-valued intuitionistic fuzzy TOPSIS
CN116821372A (en) Knowledge graph-based data processing method and device, electronic equipment and medium
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
CN111651606A (en) Text processing method and device and electronic equipment
CN117314593B (en) Insurance item pushing method and system based on user behavior analysis
CN117573894A (en) Knowledge graph-based resource recommendation system and method
Mary et al. ASFuL: Aspect based sentiment summarization using fuzzy logic
CN107609921A (en) A kind of data processing method and server
CN117391765A (en) Construction method for pharmacy member group portraits
CN112818215A (en) Product data processing method, device, equipment and storage medium
Bochkaryov et al. Application of the ensemble clustering algorithm in solving the problem of segmentation of users taking into account their loyalty

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination