WO2022126963A1

WO2022126963A1 - Customer profiling method based on customer response corpora, and device related thereto

Info

Publication number: WO2022126963A1
Application number: PCT/CN2021/090166
Authority: WO
Inventors: 孙向欣
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-12-16
Filing date: 2021-04-27
Publication date: 2022-06-23
Also published as: CN112507116A; CN112507116B

Abstract

An embodiment of the present application belongs to the field of big data, and applies to the field of smart communities, and relates to a customer profiling method based on customer response corpora, and a device related thereto, comprising: tokenizing and adjusting the customer response corpora to obtain target keywords; performing vector conversion on the customer response corpora on the basis of a feature dictionary constructed by means of the target keywords to obtain corpora feature vectors; processing true values and intent labels on the basis of a preset policy to obtain true value-derived variables; screening corpora-derived variables and the true value-derived variables on the basis of a single variable analysis method to obtain target variables; adjusting a preset first profile model on the basis of the target variables to obtain a second profile model, and training the second profile model on the basis of variable values corresponding to the target variables to obtain a target profile model; and inputting received values of variables to be recognized into the target profile model to obtain a customer profile. The target profile model can be stored in a blockchain. The present application generates a more accurate customer profile.

Description

Customer portrait method and related equipment based on customer response corpus

This application claims priority to the Chinese patent application filed on December 16, 2020, with application number 202011487411.X and the invention title "Customer portrait method and related equipment based on customer response corpus", the entire content of which is incorporated herein by reference.

Technical Field

The present application relates to the field of big data technology, and in particular, to a customer portrait method and related equipment based on customer response corpus.

Background

With the continuous innovation and development of computer technology, computer technology has been widely used in all walks of life. Among them, big data technology occupies an important position and is widely used, especially in customer behavior analysis, customer prediction and customer portraits. For customer portraits, it is necessary to apply massive customer record data, and the computer learns massive customer record data through the customer portrait model to better understand customers.

The inventor realized that most current customer portrait model training simply extracts a small number of significant labels for the model to learn from. This approach uses only a very limited part of the massive customer record data, so the customer portraits output by the trained model have low accuracy and are difficult to reuse, which causes considerable inconvenience.

Summary of the Invention

The purpose of the embodiments of the present application is to propose a customer portrait method, device, computer equipment and storage medium based on customer response corpus, which can obtain more accurate customer portraits.

In order to solve the above technical problems, the embodiment of the present application provides a customer portrait method based on customer response corpus, and adopts the following technical solutions:

A customer portrait method based on customer response corpus, comprising the following steps:

Receive a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;

Perform a word segmentation operation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords;

Construct a feature dictionary based on the target keyword, perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and use the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;

Perform variable determination operations on the true value and the intent label based on different preset strategies to obtain a true value derived variable;

The corpus-derived variable and the true-value derived variable are used as independent variables, and the independent variables are screened based on a preset univariate analysis method to obtain a target variable;

Adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model;

Receive the value of the variable to be identified, and input the value of the variable to be identified into the target portrait model to obtain a customer portrait.

In order to solve the above technical problems, the embodiment of the present application also provides a customer portrait device based on customer response corpus, which adopts the following technical solutions:

A customer portrait device based on customer response corpus, comprising:

A receiving module, configured to receive the customer response corpus, the intent label, and the true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;

A word segmentation module, configured to perform word segmentation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords;

A building module, configured to construct a feature dictionary based on the target keyword, perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and use the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;

A determination module, configured to perform variable determination operations on the true value and the intent label based on different preset strategies to obtain true value derived variables;

A screening module, configured to use the corpus-derived variable and the true value derived variable as independent variables, and screen the independent variables based on a preset univariate analysis method to obtain a target variable;

A training module, configured to adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model; and

An input module, configured to receive the value of the variable to be identified, and input the value of the variable to be identified into the target portrait model to obtain the customer portrait.

In order to solve the above-mentioned technical problems, the embodiment of the present application also provides a computer device, which adopts the following technical solutions:

A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the processor implements the following steps of the customer portrait method based on the customer response corpus:

Receive the value of the variable to be identified, input the value of the variable to be identified into the target portrait model, and obtain a customer portrait.

In order to solve the above technical problems, the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:

A computer-readable storage medium, on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the steps of the following-described customer portrait method based on customer response corpus are implemented:

Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:

This application makes effective use of a large volume of historical customer response corpora and intent labels. It generates independent variables from the historical customer response corpus and screens them to retain the variables that are highly relevant to customer portraits, so that a more accurate customer portrait can be obtained by entering the values of only a small number of variables into the final target portrait model. The output customer portrait clearly shows the customer's key characteristics and performs better, which in turn allows more reasonable follow-up configuration based on the customer portrait.

Description of Drawings

In order to illustrate the solutions in the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flow chart of an embodiment of the customer portrait method based on customer response corpus according to the present application;

FIG. 3 is a schematic structural diagram of an embodiment of a customer portrait device based on customer response corpus according to the present application;

FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Reference numerals: 200, computer equipment; 201, memory; 202, processor; 203, network interface; 300, customer portrait device based on customer response corpus; 301, receiving module; 302, word segmentation module; 303, building module; 304, determination module; 305, screening module; 306, training module; 307, input module.

Detailed Description

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field of this application. The terms used in the specification of the application are for the purpose of describing specific embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description and claims of this application and in the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description and claims of the present application or in the above drawings are used to distinguish different objects, rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor a separate or alternative embodiment that is mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.

In order to make those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.

The terminal devices 101, 102, and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.

The server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101, 102, and 103.

It should be noted that the customer portrait method based on customer response corpus provided by the embodiments of the present application is generally executed by the server/terminal device, and accordingly, the customer portrait device based on customer response corpus is generally set in the server/terminal device.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

Continuing to refer to FIG. 2 , there is shown a flow chart of an embodiment of a method for customer portraiture based on customer response corpus according to the present application. The described customer portrait method based on customer response corpus includes the following steps:

S1: Receive a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship.

In this embodiment, the customer response corpus is the customer's historical response corpus in question-and-answer dialogues. This application can extract the customer's response corpus over a past period (such as the past six months). Receiving the customer response corpus, intent labels, and true values facilitates subsequent data processing. The intent label is the customer's intent as annotated from the customer response corpus; it can be generated by a pre-trained intent classification model or annotated manually. In the scenario of urging customers to repay a loan, the intent labels can be: willingness to repay the loan, repayment of the loan, and no willingness to repay the loan. The true value refers to the customer's actual action. For example, in the scenario of urging the customer to repay the loan, the true value is whether the customer repays the loan; in the phone call scenario, the true value is whether the customer rejects the call. When the call is rejected, the corresponding customer response corpus is None, and the intent label can be "customer rejects the call" or None.

In this embodiment, the electronic device (for example, the server/terminal device shown in FIG. 1) on which the customer portrait method based on customer response corpus runs can receive the customer response corpus, the intent label, and the true value through a wired or wireless connection. It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra-wideband) connection, and other wireless connection methods currently known or developed in the future.

S2: Perform word segmentation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords.

In this embodiment, word segmentation splits the customer response corpus into individual words, which is convenient for further processing. The target keywords are then obtained by adjusting the words produced by the segmentation.

Specifically, the steps of performing word segmentation on the customer response corpus to obtain target words, adjusting the target words, and obtaining target keywords include:

Adjust a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;

Perform word segmentation on the customer response corpus under each of the intent tags based on the customer response word segmentation dictionary to obtain target words;

Extract keywords from the target words under each of the intent tags based on a preset keyword extraction method to obtain initial keywords;

The initial keywords under each of the intent tags are filtered to obtain the target keywords.

In this embodiment, the initial word segmentation dictionary is the default dictionary of jieba, which can be obtained directly from an open source website. The preset keyword extraction method is the TF-IDF (term frequency-inverse document frequency) method, which assesses the importance of a word to a document set or to one of the documents in a corpus. The preset initial word segmentation dictionary is adjusted based on the customer response corpus to obtain the customer response word segmentation dictionary, so that this dictionary reflects the characteristics of the scenario to which the customer response corpus belongs. Segmenting the customer response corpus with the customer response word segmentation dictionary reduces segmentation errors and gives better segmentation results. The TF-IDF method is then used to extract initial keywords from the target words under each intent label, and the initial keywords are screened.

One specific screening method is to select the top n initial keywords with the highest importance under each intent label as the target keywords, where the importance is output directly by the TF-IDF method. In this application n is set to 50, so when there are 50 intent labels, a total of 2500 target keywords are finally obtained. In the subsequent process of generating the feature dictionary, the feature dictionary is built from the target keywords and the preset word placeholder (nan), so it consists of 2501 words in total.

Another specific screening method is to count the occurrences of each distinct initial keyword, calculate the frequency of each initial keyword from these counts, sort the initial keywords by frequency, and delete the initial keywords whose frequency is below a preset threshold to obtain the target keywords.
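The top-n screening described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `top_keywords_per_label` is hypothetical, the words pooled under each intent label are treated as one document, and a smoothed IDF is used so that words shared by all labels still receive a nonzero score. The application does not specify these details.

```python
import math
from collections import Counter


def top_keywords_per_label(tokens_by_label, n=50):
    """Return the n highest-scoring TF-IDF words for each intent label.

    tokens_by_label maps an intent label to the list of segmented words
    pooled from all customer responses under that label (assumption).
    """
    num_docs = len(tokens_by_label)
    # Document frequency: number of labels whose pooled corpus contains the word.
    doc_freq = Counter()
    for tokens in tokens_by_label.values():
        doc_freq.update(set(tokens))

    result = {}
    for label, tokens in tokens_by_label.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        if total == 0:
            result[label] = []
            continue
        scores = {
            word: (count / total)
            * (math.log((1 + num_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        }
        # Sort by descending score, then by word so that ties are deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = ranked[:n]
    return result
```

With n = 50 and 50 intent labels, the union of the returned lists contains at most 2500 words, which matches the count given in the text when no keyword is shared between labels.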

Wherein, the step of adjusting the preset initial word segmentation dictionary based on the customer response corpus, and obtaining the customer response word segmentation dictionary includes:

Identify the customer response corpus under the same intent label;

Perform word segmentation on the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain the first feature word;

Extracting the first feature word based on the keyword extraction method to obtain a second feature word;

Adjust the second feature word to obtain a unique word;

The unique word is added to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.

In this embodiment, in the AI collection scenario, the unique words are collection-specific words. To extract them, the customer corpus generated by AI collection in production is first organized by intent label, in the format shown in Table 2:

Table 2

After the corpus is sorted into the above format, the jieba default dictionary is first used to segment the customer response corpus (also called the customer vocabulary) under each intent tag, and the TF-IDF method is then used to extract the feature words (also known as key feature words) under each intent tag. For example, the feature words under intent label 1 are: I save, on, enough, above, already, in... Based on the extracted feature words and the collection business meaning of intent label 1, the feature words under the label are manually combined according to the scenario to generate unique words: save on, save enough, save on, save in, have saved... The unique words are added to the jieba default dictionary to generate the customer response word segmentation dictionary. In the subsequent word segmentation of the customer response corpus, text that matches a unique word is segmented as that unique word first; where no unique word matches, segmentation falls back to the other entries of the customer response word segmentation dictionary.

The general-purpose jieba default dictionary mis-segments some words that are unique to particular scenarios. As a result, the features extracted later are neither representative nor applicable, and cannot be used well in subsequent operations. For example, Table 1 below shows word segmentation in the AI collection scenario:

Customer response corpus (客户应答语料)	jieba default dictionary (结巴默认词典)	Customer response word segmentation dictionary (客户应答分词词典)
我存上了。 (I saved.)	我存上了。 (I saved.)	我存上了。 (I saved.)
我存不了了。 (I can't save.)	我存不了了。 (I can't save.)	我存不了了。 (I can't save.)

Table 1

When word segmentation is performed with the general-purpose jieba default dictionary, the features that can be extracted from the segmentation results are: I save, on, no, no. When such features are delivered to downstream services (such as AI collectors), the downstream services cannot intuitively understand them or use them in practice. After word segmentation is performed with the customer response word segmentation dictionary, the features that can be extracted are: I, saved, cannot be saved. Sending words that fit the scenario, such as "saved" and "cannot be saved", to downstream services helps them understand the customer's situation more intuitively and make better policy decisions. To achieve better word segmentation, the present application therefore supplements the existing jieba word segmentation dictionary with the unique words of the corresponding scenario, thereby establishing the customer response word segmentation dictionary.
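The effect of adding a scenario-specific word to a segmentation dictionary can be illustrated with a toy frequency-based segmenter. The sketch below is written in plain Python and deliberately does not call jieba: it chooses the split with the highest product of word probabilities, which is only a simplified stand-in for what a dictionary-based segmenter does. The function `segment`, the dictionary entries, and all frequencies are hypothetical values chosen for illustration; in practice the unique words would be registered through jieba's own user-dictionary mechanism.

```python
import math


def segment(text, freq):
    """Split text into the word sequence with the highest unigram probability.

    freq maps dictionary words to (hypothetical) frequencies. Single
    characters missing from the dictionary fall back to a frequency of 1.
    """
    total = sum(freq.values())
    max_len = max(len(word) for word in freq)
    # best[i] = (best log-probability of text[:i], start index of the last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq or end - start == 1:
                score = best[start][0] + math.log(freq.get(word, 1) / total)
                if score > best[end][0]:
                    best[end] = (score, start)
    # Walk back through the stored start indices to recover the words.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy "default" dictionary (assumed frequencies, not taken from jieba).
default_dict = {"我": 50, "存": 10, "上": 30, "了": 60, "我存": 20}
# Customer response dictionary: the default entries plus one unique word.
customer_dict = dict(default_dict, **{"存上": 100})
```

Under these assumed frequencies, `segment("我存上了", default_dict)` splits the sentence around "我存", while `segment("我存上了", customer_dict)` keeps the collection-specific word "存上" intact.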

S3: Construct a feature dictionary based on the target keyword, perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and use the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension.

In this embodiment, the vector value of each dimension in the corpus feature vector is used as the variable value of the preset corpus-derived variable of the corresponding dimension, which yields a richer set of variables, and the variable values determined through the preceding steps such as word segmentation are also more accurate. The feature dictionary is formed from the target keywords and the preset word placeholder (nan). Words in the customer vocabulary that are not among the target keywords are replaced with the word placeholder (nan). One-hot encoding is then performed on the customer response corpus according to the feature dictionary to create the corpus feature vector (i.e., the customer vocabulary features). In the collection scenario, the form of the feature dictionary is shown in Table 3:

Word ID (词ID)	Word (词)
0	存上 (save)
1	存够 (save enough)
2	存入 (deposit)
3	已经存 (already saved)
4	存好 (save)
5	没钱 (no money)
6	没工资 (no salary)
7	困难 (difficulty)
8	没办法 (no way)
9	还不上 (cannot repay)
……	……
2500	nan

Table 3

According to the above feature dictionary, each customer response corpus is converted into a 2501-dimensional corpus feature vector. The word ID in the feature dictionary determines which word corresponds to each dimension during vector conversion. One-hot encoding makes it easy to restore the text meaning of features that have a significant effect on customer default, which helps establish user portraits that are easier to understand and more practical. Other vector conversion methods can also be chosen according to actual needs. A specific example of the corpus feature vector is as follows: the word ID of "save" is 0, so "save" occupies the first position, that is, the first dimension, of the vector. When the word "save" exists in the word segmentation result of a customer response corpus, the value of the first dimension of the converted corpus feature vector is 1; otherwise it is 0. The second dimension corresponds to "save enough": if the word segmentation result of the customer response corpus contains "save enough", the second dimension of the corpus feature vector is 1; otherwise it is 0, and so on. An example is shown in Table 4 below:

Table 4
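The dictionary construction and presence-based encoding of step S3 can be sketched as follows. This is a minimal sketch under assumptions: the helper names `build_feature_dictionary` and `encode` are hypothetical, and the three-keyword dictionary in the usage note stands in for the 2500 target keywords of the application.

```python
def build_feature_dictionary(target_keywords, placeholder="nan"):
    """Map each target keyword to a word ID and give the placeholder the last ID."""
    words = list(dict.fromkeys(target_keywords))  # drop duplicates, keep order
    words.append(placeholder)
    return {word: word_id for word_id, word in enumerate(words)}


def encode(tokens, feature_dict, placeholder="nan"):
    """Convert one segmented customer response into a 0/1 corpus feature vector."""
    vector = [0] * len(feature_dict)
    for token in tokens:
        # Words outside the target keywords are mapped to the placeholder dimension.
        index = feature_dict.get(token, feature_dict[placeholder])
        vector[index] = 1
    return vector
```

For example, with the keywords 存上, 存够, and 没钱, the dictionary has four entries, and the segmented response ["我", "存上", "了"] sets the dimension of 存上 and the placeholder dimension to 1. Each position of the resulting vector is then read as the value of one corpus-derived variable.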

S4: Perform variable determination operations on the true value and the intent label based on different preset strategies to obtain a true value derived variable.

In this embodiment, the true value derived variables are determined through different preset strategies, which expands the set of variables and makes more variables available for subsequent selection.

Specifically, the true value includes a default true value, and the default true value and the intent label have a one-to-one mapping relationship. The step of performing variable determination operations on the true value and the intent label based on different preset strategies to obtain the true value derived variable includes:

Calculate, for each intent label, the ratio of the number of default true values to the number of customers under that label to obtain the default rate;

Take each default rate greater than the pre-calculated overall default rate as a significant default rate, and take the intent label corresponding to the significant default rate as a significant label;

Derive a repayment refusal count variable based on the significant label, and derive a lying count variable and a rejected call count variable based on the intent label;

Use the repayment refusal count variable, the lying count variable, and the rejected call count variable as the true value derived variables.
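The first two steps above, computing the default rate under each intent label and comparing it with the overall default rate, can be sketched as follows. The function name `significant_labels` and the record layout (one `(intent_label, defaulted)` pair per customer) are assumptions made for this illustration, and the input is assumed to be non-empty.

```python
from collections import defaultdict


def significant_labels(records):
    """Return per-label default rates, the overall rate, and the significant labels.

    records is a non-empty iterable of (intent_label, defaulted) pairs,
    where defaulted is True when a default true value exists for the customer.
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for label, defaulted in records:
        totals[label] += 1
        defaults[label] += int(defaulted)

    overall = sum(defaults.values()) / sum(totals.values())
    rates = {label: defaults[label] / totals[label] for label in totals}
    # A label is significant when its default rate exceeds the overall rate.
    significant = [label for label, rate in rates.items() if rate > overall]
    return rates, overall, significant
```

With 100 customers under a "refuses to repay" label of whom 50 default, the rate for that label is 50%, as in the example given in the description.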

In this embodiment, the default true value means that, in the collection scenario, the customer fails to repay the loan within the agreed time limit, or, in the logistics scenario, the customer fails to deliver the goods on the agreed date or the delivery quality is lower than agreed. All of these situations constitute a default by the customer. If the customer defaults, a default true value is generated to mark that the customer has defaulted. This application extracts the customer's intent labels over a past period. An intent label generated in a single interaction is accidental to some extent, and more comprehensive information can be obtained by using the repayment intent labels generated over a past period. For example, if a customer says in the current collection call that the money has been deposited, but over the past few months has repeatedly questioned whether the caller is an AI, this probably means that the customer has recognized the AI call and is being perfunctory.

To use this information, a correlation analysis is first made between the intent labels generated during collection and the customer default rate, and labels that may have a significant effect on predicting customer default are derived from this analysis. The default rate is calculated as follows: under each intent label, the ratio of the number of customers who actually defaulted to the total number of customers under that label. For example, if the intent label "refuses to repay" is output for 100 customers during collection and 50 of them actually default, the default rate under this intent label is 50%.

After this correlation analysis, the default rate under some intent labels is much higher or lower than the overall customer default rate, so these labels are considered to play a significant role in predicting customer default. A series of variables can be derived from these labels. If the default rate under the "refuses to repay" label is much higher than the overall customer default rate, the following variables can be derived: the number of refusals to repay in the past 1 month, in the past 3 months, in the past 6 months, and so on.

In addition, the customer's intent labels over a past period and the customer's actual repayment performance over the same period are extracted, customers whose statements during collection are inconsistent with their actual repayment performance are flagged, and derived variables are created. For example, if a customer promised to repay during collection but actually defaulted, the customer lied during collection. The variables that can be derived on this basis are: the number of times the customer committed to repay but defaulted in the past 1 month, in the past 3 months, in the past 6 months, and so on. Finally, the customer's record of answering AI collection calls over a past period is extracted, and derived variables are created for it, such as the number of unanswered calls over three consecutive months.
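The windowed count variables described above can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the event layout (`date`, `intent_label`, `defaulted`, `answered`), the label strings, the variable names, and the approximation of 1, 3, and 6 months as 30, 90, and 180 days.

```python
from datetime import date, timedelta

# Assumption: the 1/3/6-month windows are approximated as 30/90/180 days.
WINDOWS_DAYS = {"1m": 30, "3m": 90, "6m": 180}


def derive_true_value_variables(events, today):
    """Count refusals, broken promises, and rejected calls per time window.

    events is a list of dicts with the hypothetical keys 'date',
    'intent_label', 'defaulted', and 'answered' for one customer.
    """
    variables = {}
    for name, days in WINDOWS_DAYS.items():
        start = today - timedelta(days=days)
        recent = [e for e in events if start <= e["date"] <= today]
        variables[f"refusals_{name}"] = sum(
            e["intent_label"] == "refuse_to_repay" for e in recent
        )
        # "Lying": the customer promised to repay but a default occurred.
        variables[f"broken_promises_{name}"] = sum(
            e["intent_label"] == "promise_to_repay" and e["defaulted"]
            for e in recent
        )
        variables[f"rejected_calls_{name}"] = sum(
            not e["answered"] for e in recent
        )
    return variables
```

Each returned entry corresponds to one true value derived variable, so a customer with three windows and three event types contributes nine variable values.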

S5: Use the corpus-derived variable and the true value-derived variable as independent variables, and screen the independent variables based on a preset univariate analysis method to obtain a target variable.

In this embodiment, univariate analysis and screening are applied to the independent variables to obtain the target variables. After the derived variables above are processed, they are used as independent variables, whether the customer defaults is used as the dependent variable, and the lightgbm method is used for modeling. The model is then validated and tracked. Variables that have a significant and stable effect on customer default are screened out, so that user profile variables can be output in a stable and standardized way.

Specifically, the step of screening the independent variable based on a preset univariate analysis method, and obtaining the target variable includes:

Calculate the missing rate of each independent variable, delete the independent variable whose missing rate is greater than the preset missing threshold, and obtain the initial independent variable;

Calculate the correlation coefficient between the initial independent variables, and generate a set of relevant independent variables according to the correlation coefficient;

An initial independent variable is randomly selected as the target variable from each set of relevant independent variables.

In this embodiment, the missing rate refers to the proportion of variable values that are missing for a given variable. Univariate analysis is performed on the independent variables, and the missing rate of each variable is calculated. Independent variables with a missing rate greater than the preset missing threshold are removed. The preset missing threshold in this application is 95%: if the missing rate of an independent variable x_n reaches 95%, the independent variable is deleted. The correlation coefficient between each independent variable and the other independent variables is then calculated, and independent variables that are highly correlated with other independent variables are deleted. For example, if the independent variables x₁, x₂ and x₅ are highly correlated with one another, they form one set of relevant independent variables, from which one variable is retained and the others are removed. Screening in this way helps reduce the number of independent variables.
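The screening steps above can be sketched as follows. This is only an illustrative sketch: the 0.8 correlation threshold, the use of `None` for a missing value, the grouping of correlated variables into connected sets, and all function and variable names are assumptions; the application specifies only the 95% missing threshold and the random selection of one variable per set.

```python
import math
import random


def missing_rate(values):
    """Share of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    # The 1/n factors of the covariance and standard deviations cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (norm_x * norm_y) if norm_x and norm_y else 0.0


def screen(columns, missing_threshold=0.95, corr_threshold=0.8, seed=0):
    """Drop sparse variables, group correlated ones, keep one per group."""
    kept = {name: vals for name, vals in columns.items()
            if missing_rate(vals) <= missing_threshold}
    groups, assigned = [], set()
    for name in kept:
        if name in assigned:
            continue
        group, frontier = [name], [name]
        assigned.add(name)
        while frontier:
            current = frontier.pop()
            for other in kept:
                if other in assigned:
                    continue
                if abs(pearson(kept[current], kept[other])) > corr_threshold:
                    assigned.add(other)
                    group.append(other)
                    frontier.append(other)
        groups.append(group)
    rng = random.Random(seed)
    return [rng.choice(group) for group in groups], groups


# Hypothetical sample data: x2 duplicates x1 up to scale, x4 is entirely missing.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [5, 1, 4, 2, 3],
    "x4": [None, None, None, None, None],
}
selected, groups = screen(columns)
```

Here `x4` is removed by the missing-rate check, `x1` and `x2` fall into one set of relevant independent variables, and one of them is selected at random alongside `x3`.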

Wherein, the step of calculating the correlation coefficient between the initial independent variables includes:

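The correlation coefficient is characterized by:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - u_x)(Y - u_y)]}{\sigma_X \sigma_Y}$$

where $\rho_{X,Y}$ represents the correlation coefficient, X and Y represent different initial independent variables, cov represents the covariance, E represents the expectation, $u_x$ represents the expectation of X, $u_y$ represents the expectation of Y, and $\sigma_X$ and $\sigma_Y$ represent the standard deviations of X and Y.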

In this embodiment, the Pearson correlation coefficient between two variables X and Y is calculated by the above formula: the correlation coefficient equals the covariance of the two variables divided by the product of their standard deviations. Here cov(X, Y) represents the covariance between the two variables X and Y, E represents the expectation, u_x represents the expectation E(X) of X, and u_y represents the expectation E(Y) of Y.
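As a worked illustration with hypothetical values (not taken from the application), let X take the values 1, 2, 3 and Y the values 1, 3, 2, with each observation equally likely. The Pearson correlation coefficient then follows directly from the definition:

```latex
\begin{aligned}
u_x &= 2, \qquad u_y = 2 \\
\operatorname{cov}(X, Y) &= E[(X - u_x)(Y - u_y)]
  = \tfrac{1}{3}\bigl[(-1)(-1) + (0)(1) + (1)(0)\bigr] = \tfrac{1}{3} \\
\sigma_X &= \sqrt{\tfrac{1}{3}(1 + 0 + 1)} = \sqrt{\tfrac{2}{3}}, \qquad
\sigma_Y = \sqrt{\tfrac{1}{3}(1 + 1 + 0)} = \sqrt{\tfrac{2}{3}} \\
\rho_{X,Y} &= \frac{1/3}{\sqrt{2/3}\,\sqrt{2/3}} = \frac{1/3}{2/3} = 0.5
\end{aligned}
```

A value of 0.5 indicates a moderate positive linear relationship, so under a high correlation threshold these two variables would not be placed in the same set of relevant independent variables.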

S6: Adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model.

In this embodiment, the variables screened in step S5, that is, by the univariate analysis method, are put into the preset first portrait model to obtain an intermediate portrait model, wherein the first portrait model is a lightgbm model. According to the variable importance output by the intermediate portrait model, the variables whose importance is lower than a preset importance threshold are deleted, yielding a first target variable set and the second portrait model. The second portrait model is trained based on the true values corresponding to the first target variable set to obtain the target portrait model.
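The importance-based deletion can be sketched as below. The sketch assumes the importance scores have already been read from a fitted model (for a LightGBM scikit-learn-style estimator they would typically come from its `feature_importances_` attribute); the variable names, scores and threshold are hypothetical.

```python
def prune_by_importance(importances, threshold):
    """Split variables into those kept (importance >= threshold) and those deleted."""
    kept = [name for name, score in importances.items() if score >= threshold]
    deleted = [name for name, score in importances.items() if score < threshold]
    return kept, deleted


# Hypothetical importance scores output by the intermediate portrait model.
importances = {
    "refusals_3m": 120,
    "broken_promises_6m": 85,
    "kw_no_money": 40,
    "kw_hello": 2,
}
first_target_set, deleted = prune_by_importance(importances, threshold=10)
```

The second portrait model would then be built on `first_target_set` only and retrained on the corresponding true values.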

Specifically, the step of training the second portrait model based on the true value corresponding to the target variable, and obtaining the target portrait model further includes:

Train the second portrait model based on the true value corresponding to the target variable to obtain an initial portrait model;

Based on the target variable, receive the true value corresponding to the target variable in the next time period as an intertemporal sample;

Calculate the stability of each target variable in the initial portrait model on the intertemporal sample by using the intertemporal sample;

Adjust the target variable based on the stability to obtain the adjusted target variable;

Adjust the initial portrait model based on the adjusted target variable to obtain the adjusted initial portrait model, and train the adjusted initial portrait model based on the true value corresponding to the adjusted target variable to obtain the target portrait model.

In this embodiment, based on the target variable, the true value corresponding to the target variable in the next time period is received as an intertemporal sample, wherein the next time period may be the following month. The intertemporal sample is used to verify the second portrait model, and the stability of each target variable in the second portrait model on the intertemporal sample is calculated, wherein the stability is measured by the population stability index (PSI), calculated as follows:

$$\mathrm{PSI} = \sum_{i} (A_i - E_i) \ln\frac{A_i}{E_i}$$

where $A_i$ represents the actual proportion of the intertemporal samples in all true values and $E_i$ represents the expected proportion of the intertemporal samples in all true values, each taken over the $i$-th value interval of the target variable. After the stability of each target variable is calculated, the target variables with PSI > 0.1 in the second portrait model are deleted to obtain a second target variable set, that is, the adjusted target variables. The present application can also continue to track new intertemporal samples for at least the following two months to determine the stability of the performance of the second portrait model on the new intertemporal samples.
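A minimal sketch of the PSI check follows, using the standard binned form of the population stability index. The bin edges, the small constant `eps` used to avoid taking the logarithm of zero, the sample values and the function names are assumptions for illustration; only the 0.1 cut-off comes from the text.

```python
import math


def proportions(values, edges):
    """Share of values falling into each interval defined by sorted `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v > e for e in edges)] += 1
    return [c / len(values) for c in counts]


def psi(expected_values, actual_values, edges, eps=1e-6):
    """Population stability index between a baseline and an intertemporal sample."""
    expected = proportions(expected_values, edges)
    actual = proportions(actual_values, edges)
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty intervals
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical values of one target variable in the training period and the next month.
training_values = [0] * 50 + [1] * 50
next_month_values = [0] * 80 + [1] * 20

stable = psi(training_values, training_values, edges=[0.5])
shifted = psi(training_values, next_month_values, edges=[0.5])
keep_variable = shifted <= 0.1  # variables with PSI > 0.1 are deleted
```

In this example the distribution shifts from 50/50 to 80/20, giving a PSI of about 0.416, so the variable would be removed from the second portrait model.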

S7: Receive the value of the variable to be identified, input the value of the variable to be identified into the target portrait model, and obtain a customer portrait.

In this embodiment, the variables to be identified are the target variables finally determined for inclusion in the target portrait model. After the above steps, the target portrait model can not only stably predict the default probability of customers, but also generate user portraits from the values of the variables to be identified that are input into the model. These portraits reflect customer risk more comprehensively and intuitively and help collection staff develop collection strategies effectively. For example, for customer A, the values of the variables to be identified are obtained and input into the target portrait model, and the target portrait model outputs the predicted default probability together with the labels most strongly correlated with the default probability, thereby creating the customer portrait. For instance, the target portrait model outputs a default probability of 0.9 and labels such as "complaint", "no money", "annoying", one complaint in the past 3 months, and one promise to repay that was not kept in the past 6 months. By transmitting customer portraits to user terminals, relevant users (such as collection personnel in collection scenarios) can better understand key customer information and formulate follow-up collection strategies. At the same time, the user portraits output by this application enable the company to understand customers more comprehensively and stably and to manage customer risks, making full use of a large amount of valuable natural language text. Establishing customer portraits based on historical customer response corpus and intent labels can accurately display the key risk points of customers and complement the traditional portrait model. This helps the relevant departments of the company manage customers, allocate resources more reasonably, and save operating costs. In addition, by directing resources to the high-risk customers identified by the customer portrait, the rate at which low-risk customers are disturbed is reduced and the customer experience is improved.

It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned target portrait model, the above-mentioned target portrait model can also be stored in a node of a blockchain.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.

The present application can be applied in the field of smart communities, thereby promoting the construction of smart cities.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above-mentioned method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), or the like.

It should be understood that although the various steps in the flowchart of the accompanying drawings are shown sequentially in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times; nor do they have to be performed sequentially, but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Further referring to FIG. 3, as an implementation of the method shown in FIG. 2 above, the present application provides an embodiment of a customer portrait device based on customer response corpus. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be specifically applied to various electronic devices.

As shown in FIG. 3, the customer portrait device 300 based on customer response corpus described in this embodiment includes: a receiving module 301, configured to receive the customer response corpus, the intent label and the true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship; a word segmentation module 302, configured to perform a word segmentation operation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords; a building module 303, configured to construct a feature dictionary based on the target keywords, perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and take the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension; a determination module 304, configured to perform variable determination operations on the true value and the intent label based on different preset strategies to obtain true value-derived variables; a screening module 305, configured to use the corpus-derived variables and the true value-derived variables as independent variables, and screen the independent variables based on a preset univariate analysis method to obtain target variables; a training module 306, configured to adjust the preset first portrait model based on the target variables to obtain a second portrait model, and train the second portrait model based on the variable values corresponding to the target variables to obtain the target portrait model; and an input module 307, configured to receive the values of the variables to be identified, input them into the target portrait model, and obtain a customer portrait.

In this embodiment, the application makes effective use of a large amount of historical customer response corpus and intent labels, generates independent variables based on the historical customer response corpus, and screens the independent variables to select those highly correlated with the customer portrait, so that a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model. The output customer portrait clearly displays the key points of the customer, so that a better-performing customer portrait is obtained and more reasonable follow-up resource allocation can be carried out on the basis of the customer portrait.

The word segmentation module 302 includes an adjustment sub-module, a word segmentation sub-module, an extraction sub-module and a screening sub-module. The adjustment sub-module is used to adjust the preset initial word segmentation dictionary based on the customer response corpus to obtain the customer response word segmentation dictionary; the word segmentation sub-module is used to segment the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain the target words; the extraction sub-module is used to extract keywords from the target words under each intent label based on the preset keyword extraction method to obtain the initial keywords; the screening sub-module is used to screen the initial keywords under each intent label to obtain the target keywords.

The adjustment sub-module includes an identification unit, a word segmentation unit, an extraction unit, an adjustment unit and an acquisition unit. The identification unit is used to identify the customer response corpus under the same intent label; the word segmentation unit is used to segment the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain first feature words; the extraction unit is used to extract from the first feature words based on the keyword extraction method to obtain second feature words; the adjustment unit is used to adjust the second feature words to obtain unique words; the acquisition unit is used to add the unique words to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.

The determination module 304 includes a calculation sub-module, a default rate sub-module, a first derivative sub-module and a second derivative sub-module. The calculation sub-module is used to calculate, for each intent label, the ratio of the number of default true values to the number of customers, to obtain the default ratio; the default rate sub-module is used to take a default ratio greater than the pre-calculated total default rate as a significant default rate, and to take the intent label corresponding to the significant default rate as a significant label; the first derivative sub-module is used to derive the repayment-refusal count variable based on the significant label, and to derive the lie count variable and the rejected-call count variable based on the intent label; the second derivative sub-module is used to take the repayment-refusal count variable, the lie count variable and the rejected-call count variable as the true value-derived variables.
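The calculation performed by the calculation sub-module and the default rate sub-module can be sketched as follows. The sample layout (one `(intent_label, defaulted)` pair per customer), the label strings and the function names are assumptions for illustration only.

```python
from collections import defaultdict


def default_ratios(samples):
    """Default ratio per intent label: defaults under the label / customers under the label."""
    totals, defaults = defaultdict(int), defaultdict(int)
    for label, defaulted in samples:
        totals[label] += 1
        defaults[label] += int(defaulted)
    return {label: defaults[label] / totals[label] for label in totals}


def significant_labels(samples):
    """Labels whose default ratio exceeds the total default rate."""
    total_rate = sum(int(defaulted) for _, defaulted in samples) / len(samples)
    ratios = default_ratios(samples)
    labels = {label for label, ratio in ratios.items() if ratio > total_rate}
    return labels, total_rate, ratios


# Hypothetical sample data: one (intent_label, defaulted) pair per customer.
samples = (
    [("refuse_to_repay", True)] * 3 + [("refuse_to_repay", False)] * 1
    + [("promise_to_repay", True)] * 1 + [("promise_to_repay", False)] * 3
    + [("busy", True)] * 1 + [("busy", False)] * 1
)
labels, total_rate, ratios = significant_labels(samples)
```

With a total default rate of 0.5, only the refusal label (ratio 0.75) is treated as significant here, so the repayment-refusal count variables would be derived from it.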

The screening module 305 includes a missing rate calculation sub-module, a correlation coefficient calculation sub-module and a selection sub-module. The missing rate calculation sub-module is used to calculate the missing rate of each independent variable and delete the independent variables whose missing rate is greater than the preset missing threshold to obtain the initial independent variables; the correlation coefficient calculation sub-module is used to calculate the correlation coefficient between the initial independent variables and generate sets of relevant independent variables according to the correlation coefficient; the selection sub-module is used to randomly select one initial independent variable from each set of relevant independent variables as the target variable.

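In some optional implementations of this embodiment, the above-mentioned correlation coefficient calculation sub-module is further configured to calculate the correlation coefficient, which is characterized by:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - u_x)(Y - u_y)]}{\sigma_X \sigma_Y}$$

where $\rho_{X,Y}$ represents the correlation coefficient, X and Y represent different initial independent variables, cov represents the covariance, E represents the expectation, $u_x$ and $u_y$ represent the expectations of X and Y, and $\sigma_X$ and $\sigma_Y$ represent their standard deviations.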

The training module 306 includes a training sub-module, a receiving sub-module, a stability calculation sub-module, a first obtaining sub-module and a second obtaining sub-module. The training sub-module is used to train the second portrait model based on the true value corresponding to the target variable to obtain an initial portrait model; the receiving sub-module is used to receive, based on the target variable, the true value corresponding to the target variable in the next time period as an intertemporal sample; the stability calculation sub-module is used to calculate, using the intertemporal sample, the stability of each target variable in the initial portrait model on the intertemporal sample; the first obtaining sub-module is used to adjust the target variable based on the stability to obtain the adjusted target variable; the second obtaining sub-module is used to adjust the initial portrait model based on the adjusted target variable to obtain the adjusted initial portrait model, and to train the adjusted initial portrait model based on the true value corresponding to the adjusted target variable to obtain the target portrait model.

To solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 4 for details. FIG. 4 is a block diagram of a basic structure of a computer device according to this embodiment.

The computer device 200 includes a memory 201, a processor 202, and a network interface 203 that communicate with each other through a system bus. It should be noted that only the computer device 200 with components 201-203 is shown in the figure, but it should be understood that implementation of all the shown components is not required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.

The computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment. The computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.

The memory 201 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc. The computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200 , such as a hard disk or a memory of the computer device 200 . In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Of course, the memory 201 may also include both an internal storage unit of the computer device 200 and an external storage device thereof. In this embodiment, the memory 201 is generally used to store the operating system and various application software installed on the computer device 200 , such as computer-readable instructions of the customer portrait method based on the customer response corpus. In addition, the memory 201 can also be used to temporarily store various types of data that have been output or will be output.

In some embodiments, the processor 202 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips. The processor 202 is typically used to control the overall operation of the computer device 200 . In this embodiment, the processor 202 is configured to execute computer-readable instructions or process data stored in the memory 201 , for example, computer-readable instructions for executing the customer portrait method based on the customer response corpus.

The network interface 203 may include a wireless network interface or a wired network interface, and the network interface 203 is generally used to establish a communication connection between the computer device 200 and other electronic devices.

In this embodiment, independent variables are generated based on historical customer response corpus, and the independent variables are screened to select those highly correlated with the customer portrait, so that a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model.

The present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the above-mentioned customer portrait method based on customer response corpus.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or CD-ROM) and includes several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) execute the methods described in the various embodiments of this application.

Obviously, the above-described embodiments are only some of the embodiments of the present application, not all of them. The accompanying drawings show the preferred embodiments of the present application, but do not limit its patent scope. This application may be embodied in many different forms; rather, these embodiments are provided so that the disclosure of this application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent replacements for some of the technical features. Any equivalent structure made using the contents of the description and drawings of this application, and used directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims

A customer portrait method based on customer response corpus, comprising the following steps:

receiving a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;

Perform a word segmentation operation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords;

A feature dictionary is constructed based on the target keyword, and vector transformation is performed on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and the vector value of each dimension in the corpus feature vector is taken as the variable value of the preset corpus-derived variable of the corresponding dimension;

Perform variable determination operations on the true value and the intent label based on different preset strategies to obtain a true value derived variable;

The corpus-derived variable and the true-value derived variable are used as independent variables, and the independent variables are screened based on a preset univariate analysis method to obtain a target variable;

Adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model;

Receive the value of the variable to be identified, input the value of the variable to be identified into the target portrait model, and obtain a customer portrait.
The customer portrait method based on customer response corpus according to claim 1, wherein the step of performing a word segmentation operation on the customer response corpus to obtain target words, adjusting the target words, and obtaining target keywords comprises:

Adjust a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;

Perform word segmentation on the customer response corpus under each of the intent tags based on the customer response word segmentation dictionary to obtain target words;

Extracting the target words under each of the intent tags based on a preset keyword extraction method to obtain initial keywords;

The initial keywords under each of the intent tags are filtered to obtain the target keywords.
The customer portrait method based on the customer response corpus according to claim 2, wherein the step of adjusting a preset initial word segmentation dictionary based on the customer response corpus, and obtaining the customer response word segmentation dictionary comprises:

Identify the customer response corpus under the same intent label;

Perform word segmentation on the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain the first feature word;

Extracting the first feature word based on the keyword extraction method to obtain a second feature word;

Adjust the second feature word to obtain a unique word;

The unique word is added to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
The customer portrait method based on customer response corpus according to claim 1, wherein the step of screening the independent variable based on a preset univariate analysis method, and obtaining the target variable comprises:

Calculate the missing rate of each independent variable, delete the independent variable whose missing rate is greater than the preset missing threshold, and obtain the initial independent variable;

Calculate the correlation coefficient between the initial independent variables, and generate a set of relevant independent variables according to the correlation coefficient;

An initial independent variable is randomly selected as the target variable from each set of relevant independent variables.
The customer portrait method based on customer response corpus according to claim 4, wherein the step of calculating the correlation coefficient between the initial independent variables comprises:

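The correlation coefficient is characterized by:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - u_x)(Y - u_y)]}{\sigma_X \sigma_Y}$$

where $\rho_{X,Y}$ represents the correlation coefficient, X and Y represent different initial independent variables, cov represents the covariance, E represents the expectation, $u_x$ represents the expectation of X, $u_y$ represents the expectation of Y, and $\sigma_X$ and $\sigma_Y$ represent the standard deviations of X and Y.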
The customer portrait method based on customer response corpus according to claim 1, wherein the step of training the second portrait model based on the true value corresponding to the target variable to obtain the target portrait model further comprises:

Train the second portrait model based on the true value corresponding to the target variable to obtain an initial portrait model;

Based on the target variable, receive the true value corresponding to the target variable in the next time period as an intertemporal sample;

Calculate the stability of each target variable in the initial portrait model on the intertemporal sample by using the intertemporal sample;

Adjust the target variable based on the stability to obtain the adjusted target variable;

Adjust the initial portrait model based on the adjusted target variable to obtain the adjusted initial portrait model, and train the adjusted initial portrait model based on the true value corresponding to the adjusted target variable to obtain the target portrait model.
The customer portrait method based on customer response corpus according to claim 1, wherein the true value includes a default true value, the default true value and the intent label have a one-to-one mapping relationship, and the step of performing variable determination operations on the true value and the intent label based on different preset strategies to obtain a true value-derived variable comprises:

Calculate the ratio of the number of default true values to the number of customers in each intent label separately to obtain the default ratio;

Taking a default rate greater than the pre-calculated total default rate as a significant default rate, and using the intent label corresponding to the significant default rate as a significant label;

Deriving a variable of times of repayment refusal based on the significant label, and deriving a variable of the number of times of lying and the number of rejected calls based on the intention label;

The repayment-refusal count variable, the lie count variable and the rejected-call count variable are used as the true value-derived variables.
A customer portrait device based on customer response corpus, comprising:

The receiving module is configured to receive the customer response corpus, the intent label and the true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;

A word segmentation module, configured to perform word segmentation operations on the customer response corpus, obtain target words, adjust the target words, and obtain target keywords;

A building module, configured to construct a feature dictionary based on the target keyword, perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and take the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;

A determination module, configured to perform variable determination operations on the true value and the intent label based on different preset strategies, to obtain true value derived variables;

A screening module, configured to use the corpus-derived variable and the true value-derived variable as independent variables, and screen the independent variables based on a preset univariate analysis method to obtain a target variable;

A training module, configured to adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model; and

An input module, configured to receive the value of the variable to be identified, and input the value of the variable to be identified into the target portrait model to obtain the customer portrait.
A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the following steps of the customer portrait method based on the customer response corpus are implemented:

Receive a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;

Perform a word segmentation operation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords;

A feature dictionary is constructed based on the target keywords, vector conversion is performed on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and the vector value of each dimension in the corpus feature vector is taken as the variable value of the preset corpus-derived variable of the corresponding dimension;

Perform variable determination operations on the true value and the intent label based on different preset strategies to obtain a true value derived variable;

The corpus-derived variable and the true-value derived variable are used as independent variables, and the independent variables are screened based on a preset univariate analysis method to obtain a target variable;

Adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model;

Receive the value of the variable to be identified, input the value of the variable to be identified into the target portrait model, and obtain a customer portrait.
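The feature dictionary and vector conversion steps can be sketched as follows. This is a minimal illustration that assumes a count-based encoding (one dimension per target keyword); the claims do not state which encoding is used, and the function names, the sample keywords and the `corpus_var_i` naming are hypothetical.

```python
def build_feature_dictionary(target_keywords):
    """Map each distinct keyword to a fixed vector dimension (sorted for determinism)."""
    return {kw: i for i, kw in enumerate(sorted(set(target_keywords)))}

def vectorize(tokens, feature_dict):
    """Count-based vector: dimension i holds how often keyword i occurs in tokens."""
    vec = [0] * len(feature_dict)
    for tok in tokens:
        idx = feature_dict.get(tok)
        if idx is not None:  # tokens outside the dictionary are ignored
            vec[idx] += 1
    return vec

# Hypothetical keywords and an already segmented customer response.
feature_dict = build_feature_dictionary(["no_money", "tomorrow", "busy"])
tokens = ["I", "am", "busy", "call", "tomorrow", "busy"]
vector = vectorize(tokens, feature_dict)

# Each dimension becomes the value of one corpus-derived variable.
variables = {f"corpus_var_{i}": v for i, v in enumerate(vector)}
```

Sorting the keywords keeps the dimension order stable between training and recognition, so the same variable always refers to the same keyword.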
The computer device according to claim 9, wherein the step of performing a word segmentation operation on the customer response corpus to obtain a target word, adjusting the target word, and obtaining a target keyword comprises:

Adjust a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;

Perform word segmentation on the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain target words;

Extract keywords from the target words under each intent label based on a preset keyword extraction method to obtain initial keywords;

Filter the initial keywords under each intent label to obtain the target keywords.
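The extraction and filtering steps can be sketched under two assumptions that the claim leaves open: the preset keyword extraction method is taken to be a smoothed TF-IDF score that treats each intent label as one document, and the filtering rule is taken to be the removal of keywords shared by more than one label. Word segmentation is assumed to have been done already, and all names and sample tokens are hypothetical.

```python
import math
from collections import Counter

def initial_keywords(tokens_by_label, top_k=3):
    """tokens_by_label: {intent_label: [token, ...]} after word segmentation.

    Scores tokens with a smoothed TF-IDF and keeps the top_k per label.
    """
    n_labels = len(tokens_by_label)
    doc_freq = Counter()
    for tokens in tokens_by_label.values():
        doc_freq.update(set(tokens))
    result = {}
    for label, tokens in tokens_by_label.items():
        tf = Counter(tokens)
        scores = {
            tok: (count / len(tokens))
            * (math.log((1 + n_labels) / (1 + doc_freq[tok])) + 1)
            for tok, count in tf.items()
        }
        # Ties are broken alphabetically so the ranking is deterministic.
        ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
        result[label] = ranked[:top_k]
    return result

def filter_keywords(initial):
    """Keep only keywords that were selected under exactly one label."""
    owners = Counter(tok for kws in initial.values() for tok in kws)
    return {
        label: [tok for tok in kws if owners[tok] == 1]
        for label, kws in initial.items()
    }

tokens_by_label = {
    "refuse": ["no", "money", "no", "pay"],
    "promise": ["pay", "tomorrow", "pay", "sure"],
}
initial = initial_keywords(tokens_by_label)
target = filter_keywords(initial)
```

Here "pay" is selected under both labels and is therefore dropped, which leaves keywords that discriminate between the two intents.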
The computer device according to claim 10, wherein the step of adjusting a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary comprises:

Identify the customer response corpus under the same intent label;

Perform word segmentation on the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain the first feature word;

Extracting the first feature word based on the keyword extraction method to obtain a second feature word;

Adjust the second feature word to obtain a unique word;

The unique word is added to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
The computer device according to claim 9, wherein the step of screening the independent variables based on a preset univariate analysis method to obtain a target variable comprises:

Calculate the missing rate of each independent variable, delete the independent variable whose missing rate is greater than the preset missing threshold, and obtain the initial independent variable;

Calculate the correlation coefficient between the initial independent variables, and generate a set of relevant independent variables according to the correlation coefficient;

An initial independent variable is randomly selected as the target variable from each set of relevant independent variables.
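The three screening steps can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the thresholds, the rule that two variables belong to the same set when the absolute coefficient reaches `corr_threshold`, and all names are assumptions. The pairwise coefficients are taken as given in a lookup table.

```python
import random

def missing_rate(values):
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values)

def screen(columns, pair_corr, missing_threshold=0.5, corr_threshold=0.8, seed=0):
    """columns: {name: [value or None, ...]};
    pair_corr: {frozenset({a, b}): correlation coefficient}."""
    # Step 1: drop variables whose missing rate exceeds the threshold.
    kept = [n for n, vals in columns.items() if missing_rate(vals) <= missing_threshold]

    # Step 2: group variables linked by a high absolute coefficient (union-find).
    parent = {n: n for n in kept}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if abs(pair_corr.get(frozenset((a, b)), 0.0)) >= corr_threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in kept:
        groups.setdefault(find(n), []).append(n)

    # Step 3: pick one variable at random from each group.
    rng = random.Random(seed)
    targets = [rng.choice(sorted(group)) for group in groups.values()]
    return kept, list(groups.values()), targets

# Hypothetical variables; None marks a missing value.
columns = {
    "refusals": [1, 2, None, 0],
    "lies": [1, 2, 0, 0],
    "hangups": [None, None, None, 3],
    "kw_busy": [0, 1, 1, 0],
}
pair_corr = {
    frozenset(("refusals", "lies")): 0.93,
    frozenset(("refusals", "kw_busy")): 0.10,
    frozenset(("lies", "kw_busy")): -0.20,
}
kept, groups, targets = screen(columns, pair_corr)
```

Choosing one representative per set removes redundant, highly correlated variables while keeping every distinct signal in the model.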
The computer device of claim 12, wherein the step of calculating the correlation coefficient between the initial independent variables comprises:

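The correlation coefficient is characterized by:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-u_x)(Y-u_y)\right]}{\sigma_X \sigma_Y}$$

where $\rho_{X,Y}$ represents the correlation coefficient, X and Y represent different initial independent variables, cov represents the covariance, E represents the expectation, $u_x$ represents the expectation of X, $u_y$ represents the expectation of Y, and $\sigma_X$ and $\sigma_Y$ represent the standard deviations of X and Y.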
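A minimal sketch of computing the coefficient, assuming it is the standard Pearson correlation coefficient, which the symbols cov, E, u x and u y in this claim suggest. Expectations are estimated by sample means; the function name and the sample data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation: cov(X, Y) divided by the product of the standard deviations."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    u_x = sum(x) / n  # expectation of X
    u_y = sum(y) / n  # expectation of Y
    cov = sum((a - u_x) * (b - u_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - u_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - u_y) ** 2 for b in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sigma_x * sigma_y)

rising = pearson([1, 2, 3, 4], [2, 4, 6, 8])   # perfectly linear, same direction
falling = pearson([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly linear, opposite direction
```

The coefficient lies between -1 and 1, so a threshold on its absolute value treats strongly positive and strongly negative relationships alike.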
The computer device according to claim 9, wherein the step of training the second portrait model based on the true value corresponding to the target variable to obtain the target portrait model further comprises:

Train the second portrait model based on the true value corresponding to the target variable to obtain an initial portrait model;

Based on the target variable, receive the true value corresponding to the target variable in the next time period as an intertemporal sample;

Calculate the stability of each target variable in the initial portrait model on the intertemporal sample by using the intertemporal sample;

Adjust the target variable based on the stability to obtain the adjusted target variable;

Adjust the initial portrait model based on the adjusted target variable to obtain an adjusted initial portrait model, and train the adjusted initial portrait model based on the true value corresponding to the adjusted target variable to obtain the target portrait model.
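The stability check on the intertemporal sample can be sketched as follows. The claim does not name a stability metric; this sketch assumes the population stability index (PSI), with 0.25 as a commonly used rule-of-thumb cutoff. The bin edges, the threshold and all names are hypothetical.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population stability index of one variable between two samples.

    edges: ascending interior bin boundaries. A value falls in the bin
    whose index equals the number of edges it meets or exceeds.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        # eps avoids log(0) when a bin is empty in one sample.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def stable_variables(train, later, edges_by_var, threshold=0.25):
    """Keep the target variables whose PSI on the later sample stays below threshold."""
    return [
        name for name in train
        if psi(train[name], later[name], edges_by_var[name]) < threshold
    ]

# Hypothetical training-period and next-period values of two target variables.
train = {"refusals": [0, 0, 1, 1, 2, 2, 3, 3], "kw_busy": [0, 0, 0, 0, 1, 1, 1, 1]}
later = {"refusals": [0, 0, 1, 1, 2, 2, 3, 3], "kw_busy": [1, 1, 1, 1, 1, 1, 1, 0]}
edges_by_var = {"refusals": [1, 2, 3], "kw_busy": [1]}

adjusted_targets = stable_variables(train, later, edges_by_var)
```

In this sample the distribution of `kw_busy` shifts strongly in the next period, so only `refusals` is kept and the model would be retrained on the reduced variable set.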
The computer device according to claim 9, wherein the true value comprises a default true value, the default true value and the intent label are in a one-to-one mapping relationship, and the step of performing a variable determination operation on the true value and the intent label based on different preset strategies to obtain a true value derived variable comprises:

Calculate, separately for each intent label, the ratio of the number of default true values to the number of customers under that intent label, to obtain a default rate;

Take a default rate greater than the pre-calculated total default rate as a significant default rate, and take the intent label corresponding to the significant default rate as a significant label;

Derive a repayment-refusal count variable based on the significant label, and derive a lying count variable and a rejected-call count variable based on the intent label;

Take the repayment-refusal count variable, the lying count variable and the rejected-call count variable as the true value derived variables.
A computer-readable storage medium, on which computer-readable instructions are stored, wherein when the computer-readable instructions are executed by a processor, the following steps of the customer portrait method based on the customer response corpus are implemented:

Receive a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;

Perform word segmentation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords;

A feature dictionary is constructed based on the target keywords, vector conversion is performed on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and the vector value of each dimension in the corpus feature vector is taken as the variable value of the preset corpus-derived variable of the corresponding dimension;

Perform variable determination operations on the true value and the intent label based on different preset strategies to obtain true value derived variables;

Using the corpus-derived variable and the true value-derived variable as independent variables, the independent variables are screened based on a preset univariate analysis method to obtain a target variable;

Adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model;

Receive the value of the variable to be identified, input the value of the variable to be identified into the target portrait model, and obtain the customer portrait.
The computer-readable storage medium according to claim 16, wherein the step of performing a word segmentation operation on the customer response corpus to obtain a target word, and adjusting the target word to obtain a target keyword comprises:

Adjust a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;

Perform word segmentation on the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain target words;

Extract keywords from the target words under each intent label based on a preset keyword extraction method to obtain initial keywords;

Filter the initial keywords under each intent label to obtain the target keywords.
The computer-readable storage medium according to claim 17, wherein the step of adjusting a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary comprises:

Identify the customer response corpus under the same intent label;

Perform word segmentation on the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain the first feature word;

Extracting the first feature word based on the keyword extraction method to obtain a second feature word;

Adjust the second feature word to obtain a unique word;

The unique word is added to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
The computer-readable storage medium according to claim 16, wherein the step of screening the independent variables based on a preset univariate analysis method to obtain a target variable comprises:

Calculate the missing rate of each independent variable, delete the independent variable whose missing rate is greater than the preset missing threshold, and obtain the initial independent variable;

Calculate the correlation coefficient between the initial independent variables, and generate a set of relevant independent variables according to the correlation coefficient;

An initial independent variable is randomly selected as the target variable from each set of relevant independent variables.
The computer-readable storage medium of claim 19, wherein the step of calculating the correlation coefficient between the initial independent variables comprises:

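The correlation coefficient is characterized by:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-u_x)(Y-u_y)\right]}{\sigma_X \sigma_Y}$$

where $\rho_{X,Y}$ represents the correlation coefficient, X and Y represent different initial independent variables, cov represents the covariance, E represents the expectation, $u_x$ represents the expectation of X, $u_y$ represents the expectation of Y, and $\sigma_X$ and $\sigma_Y$ represent the standard deviations of X and Y.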