WO2022126963A1 - Customer profiling method based on customer response corpora, and device related thereto - Google Patents
- Publication number: WO2022126963A1
- Application number: PCT/CN2021/090166
- Authority: WO, WIPO (PCT)
- Prior art keywords: variable, target, corpus, customer, initial
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Definitions
- The present application relates to the field of big data technology, and in particular to a customer portrait method based on customer response corpus and related equipment.
- The inventor has recognized that most current customer portrait models are trained on only a small number of salient labels. This uses a very limited part of the massive customer record data, so the customer portraits output by the trained model have low accuracy and are difficult to reuse in later stages.
- The purpose of the embodiments of the present application is to propose a customer portrait method, device, computer equipment and storage medium based on customer response corpus that can obtain more accurate customer portraits.
- the embodiment of the present application provides a customer portrait method based on customer response corpus, and adopts the following technical solutions:
- a customer portrait method based on customer response corpus comprising the following steps:
- a feature dictionary is constructed based on the target keywords, vector transformation is performed on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and the vector value of each dimension in the corpus feature vector is taken as the variable value of the preset corpus-derived variable of the corresponding dimension;
- the corpus-derived variable and the true-value derived variable are used as independent variables, and the independent variables are screened based on a preset univariate analysis method to obtain a target variable;
- the embodiment of the present application also provides a customer portrait device based on customer response corpus, which adopts the following technical solutions:
- a customer portrait device based on customer response corpus comprising:
- a receiving module configured to receive the customer response corpus, the intent label and the true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;
- a word segmentation module configured to perform word segmentation on the customer response corpus to obtain target words, and to adjust the target words to obtain target keywords;
- a building module configured to construct a feature dictionary based on the target keywords, perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and take the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
- a determination module configured to perform variable determination operations on the true value and the intent label based on different preset strategies to obtain true-value derived variables;
- a screening module configured to use the corpus-derived variables and the true-value derived variables as independent variables, and to screen the independent variables based on a preset univariate analysis method to obtain target variables;
- a training module configured to adjust the preset first portrait model based on the target variables to obtain a second portrait model, and to train the second portrait model based on the variable values corresponding to the target variables to obtain the target portrait model;
- an input module configured to receive the value of the variable to be identified, and to input the value of the variable to be identified into the target portrait model to obtain the customer portrait.
- the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
- a computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the following steps of the customer portrait method based on customer response corpus are implemented:
- a feature dictionary is constructed based on the target keywords, vector transformation is performed on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and the vector value of each dimension in the corpus feature vector is taken as the variable value of the preset corpus-derived variable of the corresponding dimension;
- the corpus-derived variables and the true-value derived variables are used as independent variables, and the independent variables are screened based on a preset univariate analysis method to obtain target variables;
- the value of the variable to be identified is received and input into the target portrait model to obtain a customer portrait.
- the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
- a computer-readable storage medium on which computer-readable instructions are stored, wherein when the computer-readable instructions are executed by a processor, the following steps of the customer portrait method based on customer response corpus are implemented:
- a feature dictionary is constructed based on the target keywords, vector transformation is performed on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and the vector value of each dimension in the corpus feature vector is taken as the variable value of the preset corpus-derived variable of the corresponding dimension;
- the corpus-derived variables and the true-value derived variables are used as independent variables, and the independent variables are screened based on a preset univariate analysis method to obtain target variables;
- the value of the variable to be identified is received and input into the target portrait model to obtain a customer portrait.
- This application makes effective use of a large amount of historical customer response corpus and intent labels: it generates independent variables from the historical customer response corpus and screens them to retain the variables that are highly relevant to customer portraits, so that a more accurate customer portrait can be obtained by entering the values of only a small number of variables into the final target portrait model.
- The output customer portrait clearly displays the key points of the customer, which gives a better-performing customer portrait and allows more reasonable follow-up configuration based on it.
- FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
- FIG. 2 is a flow chart of an embodiment of the customer portrait method based on customer response corpus according to the present application;
- FIG. 3 is a schematic structural diagram of an embodiment of a customer portrait device based on customer response corpus according to the present application;
- FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
- The system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105.
- The network 104 is the medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105.
- The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
- The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
- Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
- The terminal devices 101, 102 and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops and desktops.
- the server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101 , 102 , and 103 .
- The customer portrait method based on customer response corpus provided by the embodiments of the present application is generally executed by the server/terminal device; accordingly, the customer portrait device based on customer response corpus is generally arranged in the server/terminal device.
- The numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
- With continued reference to FIG. 2, a flow chart of an embodiment of the customer portrait method based on customer response corpus according to the present application is shown.
- The customer portrait method based on customer response corpus includes the following steps:
- S1 Receive a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship.
- the customer response corpus is the customer's historical response corpus in the question-and-answer dialogue.
- This application can extract the customer's response corpus over a past period of time (for example, the past six months).
- Receiving the customer response corpus together with the intent labels and true values facilitates subsequent data processing.
- The intent label refers to the customer's intent marked according to the customer response corpus; it can be generated by a pre-trained intent classification model or annotated manually.
- For example, the intent labels can be: willingness to repay the loan, repayment of the loan, and no willingness to repay the loan.
- The true value refers to the actual action of the customer.
- For example, the true value may be whether the customer repays the loan.
- As another example, the true value may be whether the customer rejects the call.
- If the customer rejects the call, the corresponding customer response corpus is None, and the intent label can be "customer rejects the call" or None.
- The electronic device on which the customer portrait method based on customer response corpus runs (for example, the server/terminal device shown in FIG. 1) can receive the customer response corpus, the intent label and the true value through a wired or wireless connection.
- The above wireless connection methods may include, but are not limited to, 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, and other wireless connection methods currently known or developed in the future.
- S2 Perform word segmentation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords.
- Word segmentation splits the customer response corpus into words, which is convenient for further processing.
- The target keywords are obtained by adjusting the words split from the customer response corpus.
- the steps of performing word segmentation on the customer response corpus to obtain target words, adjusting the target words, and obtaining target keywords include:
- the initial keywords under each of the intent tags are filtered to obtain the target keywords.
- The initial word segmentation dictionary in this application is the default dictionary of jieba, which can be obtained directly from an open-source website.
- The keyword extraction method preset in this application is the TF-IDF (term frequency-inverse document frequency) method.
- The preset initial word segmentation dictionary is adjusted based on the customer response corpus to obtain the customer response word segmentation dictionary, so that this dictionary reflects the characteristics of the scene to which the customer response corpus belongs.
- Word segmentation is then performed on the customer response corpus based on the customer response word segmentation dictionary, which reduces segmentation errors and gives better segmentation results.
- The TF-IDF method is used to extract initial keywords from the target words under each intent label, and the extracted initial keywords are then screened.
- A specific screening method may be: select the top n initial keywords with the highest importance under each intent label as the target keywords, where the importance is output directly by the TF-IDF method.
- TF-IDF assesses the importance of a word to one document in a document set or corpus. In this application n is set to 50, so with 50 intent labels a total of 2500 target keywords are finally obtained.
- Together with the word placeholder (nan), these target keywords generate the feature dictionary, that is, the feature dictionary is composed of a total of 2501 words.
- The specific screening method may also be: identify the number of occurrences of each distinct initial keyword and calculate the frequency of each initial keyword from that number; sort the initial keywords by frequency; and delete the initial keywords whose frequency is lower than a preset threshold to obtain the target keywords.
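The top-n TF-IDF screening described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes that all segmented responses under one intent label are pooled into a single "document", uses a smoothed IDF, and the function name `top_keywords_per_label` and the sample labels and words are hypothetical.

```python
import math
from collections import Counter

def top_keywords_per_label(tokens_by_label, n=50):
    """Return the n highest-scoring TF-IDF words for each intent label.

    tokens_by_label maps an intent label to the list of segmented words of
    all customer responses under that label (one "document" per label).
    """
    num_docs = len(tokens_by_label)
    # Document frequency: under how many intent labels does each word occur?
    doc_freq = Counter()
    for tokens in tokens_by_label.values():
        doc_freq.update(set(tokens))

    result = {}
    for label, tokens in tokens_by_label.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {}
        for word, count in counts.items():
            tf = count / total
            # Smoothed inverse document frequency (an assumption of this sketch).
            idf = math.log((1 + num_docs) / (1 + doc_freq[word])) + 1
            scores[word] = tf * idf
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = ranked[:n]
    return result

# Hypothetical, already-segmented corpus with two intent labels.
corpus = {
    "promise_to_repay": ["saved", "enough", "saved", "tomorrow"],
    "refuse_to_repay": ["no_money", "no_salary", "no_money", "tomorrow"],
}
keywords = top_keywords_per_label(corpus, n=2)
```

With n = 50 and 50 intent labels, the same procedure yields at most 50 × 50 = 2500 target keywords, which matches the count given above.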
- the step of adjusting the preset initial word segmentation dictionary based on the customer response corpus to obtain the customer response word segmentation dictionary includes:
- unique words are added to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
- In a collection scenario, the unique words are collection-specific words.
- Collection-specific words are extracted by organizing the customer corpus generated by AI collection in production according to its intent labels. The format is shown in Table 2:
- the feature words under intent label 1 are: I save, on, enough, above, already, in...
- The feature words under each label are sorted manually according to the scene and combined to generate unique words: save on, save enough, save on, save in, have saved.... The unique words are added to the jieba default dictionary to generate the customer response word segmentation dictionary.
- In the subsequent word segmentation of the customer response corpus, when a unique word is encountered, segmentation follows the unique word first; when no unique word is found, segmentation follows the rest of the customer response word segmentation dictionary.
- The general-purpose jieba default dictionary mis-segments some scene-specific words, so the features extracted afterwards are neither representative nor applicable and cannot be used well in subsequent operations.
- Table 1 below shows the word segmentation in the AI collection scenario:
- Customer response corpus | jieba default dictionary | Customer response word segmentation dictionary
- I saved. | I saved. | I saved.
- I can't save. | I can't save. | I can't save.
- With the jieba default dictionary, the features that can be extracted are: I save, on, no, no.
- Downstream services (such as AI collectors) cannot intuitively understand these features or use them in practice.
- With the customer response word segmentation dictionary, the features that can be extracted are: I, saved, cannot be saved. Sending words that conform to the scene characteristics, such as "saved" and "cannot be saved", to downstream services helps them understand the customer's situation more intuitively and make better policy decisions.
- the present application supplements the unique words of the corresponding scene on the basis of the existing jieba word segmentation dictionary, thereby establishing a customer response word segmentation dictionary.
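The effect of supplementing a general dictionary with scene-specific unique words can be illustrated with a small dictionary-based segmenter. This sketch uses forward maximum matching as a simple stand-in; it is not jieba's actual algorithm, and the dictionaries and sample sentences are hypothetical. With jieba itself, user words are normally registered through `jieba.add_word` or `jieba.load_userdict`.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word are emitted one by one.
    """
    max_len = max(map(len, dictionary))
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical general-purpose dictionary (stand-in for the default dictionary).
base_dictionary = {"我", "存", "上", "了", "不"}
# Hypothetical scene-specific unique words for a collection scenario.
unique_words = {"存上了", "存不上"}
# Customer response word segmentation dictionary = default words + unique words.
customer_dictionary = base_dictionary | unique_words

before = segment("我存上了", base_dictionary)      # split into single characters
after = segment("我存上了", customer_dictionary)   # the unique word stays whole
```

Because the longest match wins, a unique word is preferred whenever it is present, and the segmenter falls back to the ordinary dictionary entries otherwise, which mirrors the priority rule described above.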
- S3 Construct a feature dictionary based on the target keywords, perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and use the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension.
- Using the vector value of each dimension in the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension yields a richer set of variables, and because of the preceding steps such as word segmentation, the resulting variable values are also more accurate.
- The target keywords together with the word placeholder (nan) form the feature dictionary. Words in the customer response corpus that are not among the target keywords are replaced with the word placeholder (nan).
- One-hot encoding is then performed on the customer response corpus to create the corpus feature vector (i.e., the customer vocabulary features).
- The form of the feature dictionary is shown in Table 3:
- Word ID | Word
- 0 | save
- 1 | enough
- 2 | deposit
- 3 | already saved
- 4 | save
- 5 | no money
- 6 | no salary
- 7 | difficulty
- 8 | no way
- 9 | not yet
- ... | ...
- 2500 | nan
- each customer response corpus will be converted into a 2501-dimensional corpus feature vector.
- the word ID in the feature dictionary determines the words corresponding to different dimensions during the vector conversion process.
- One-hot encoding is used because it makes it easy to recover the textual meaning of the features that have a significant effect on customer default, which helps to build user portraits that are easier to understand and more practical.
- Other vector conversion methods are also applicable and can be chosen according to actual needs.
- A specific example of the corpus feature vector is as follows: in this application, the word ID of "save" is 0, so "save" occupies the first position of the vector, that is, its first dimension.
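The conversion from a segmented response to a corpus feature vector can be sketched as follows. This is a minimal illustration under the assumption that each dimension simply records whether the corresponding word occurs; the helper names and the three-word dictionary are hypothetical (the dictionary described above has 2501 entries).

```python
def build_feature_dictionary(target_keywords, placeholder="nan"):
    """Map each target keyword to a word ID; the placeholder gets the last ID."""
    dictionary = {word: idx for idx, word in enumerate(target_keywords)}
    dictionary[placeholder] = len(target_keywords)
    return dictionary

def encode(tokens, dictionary, placeholder="nan"):
    """Convert segmented words into a 0/1 vector with one dimension per word ID.

    Words that are not target keywords are mapped to the placeholder dimension.
    """
    vector = [0] * len(dictionary)
    for token in tokens:
        vector[dictionary.get(token, dictionary[placeholder])] = 1
    return vector

# Hypothetical three-keyword dictionary: save -> 0, enough -> 1, deposit -> 2, nan -> 3.
feature_dictionary = build_feature_dictionary(["save", "enough", "deposit"])
# "tomorrow" is not a target keyword, so it activates the nan dimension.
vector = encode(["save", "tomorrow", "enough"], feature_dictionary)
```

Each position of `vector` is then the value of one corpus-derived variable, and the word ID tells which word a given variable stands for.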
- S4 Perform variable determination operations on the true value and the intent label based on different preset strategies to obtain a true value derived variable.
- The true-value derived variables are determined through different preset strategies, which expands the set of variables and gives the subsequent variable selection more candidates.
- Specifically, the true value includes a default true value, the default true value and the intent label have a one-to-one mapping relationship, and the step of performing variable determination operations on the true value and the intent label based on different preset strategies to obtain the true-value derived variables includes:
- variables such as the number of refusals to repay, the number of lies and the number of rejected calls are used as the true-value derived variables.
- The default true value means that, in the collection scenario, the customer fails to repay the loan within the agreed time limit; or, in the logistics scenario, the customer fails to deliver the goods on the agreed date or the delivery quality is lower than the agreed quality. All of these situations count as a customer default. If the customer defaults, a default true value is generated to mark that the customer has defaulted.
- This application extracts the customer's intent labels over a past period of time. The intent label generated in a single contact is to some extent accidental, and more comprehensive information can be obtained by using the repayment intent labels generated by the customer over the past period.
- For example, if 100 customers are given the intent label "refuse to repay" during collection and 50 of them actually default, the default rate under this intent label is 50%.
- The default rate under some intent labels is much higher or lower than the overall customer default rate, so these labels are considered to have a significant role in predicting customer default.
- From such labels a series of variables can be derived. If the default rate under the "refuse to repay" label is much higher than the overall customer default rate, the following variables can be derived: the number of refusals to repay in the past 1 month, the number of refusals to repay in the past 3 months, the number of refusals to repay in the past 6 months, etc.
- Similarly, from an intent label of promising to repay combined with a default true value, the variables that can be derived are: the number of times the customer promised to repay but defaulted in the past 1 month, in the past 3 months, in the past 6 months, etc.
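The default rate under an intent label and the windowed count variables derived from it can be sketched as follows. This is an illustration only: the record layout, field names and sample data are hypothetical, and a month is approximated as 30 days.

```python
from datetime import date, timedelta

def default_rate(records, label):
    """Share of records with this intent label whose true value is a default."""
    hits = [r for r in records if r["intent_label"] == label]
    if not hits:
        return None
    return sum(r["defaulted"] for r in hits) / len(hits)

def count_in_window(records, customer_id, label, as_of, days):
    """How often the customer produced this label in the last `days` days."""
    start = as_of - timedelta(days=days)
    return sum(
        1 for r in records
        if r["customer_id"] == customer_id
        and r["intent_label"] == label
        and start < r["date"] <= as_of
    )

def derive_variables(records, customer_id, label, as_of):
    """Derive 1-, 3- and 6-month counts (a month is approximated as 30 days)."""
    return {
        f"{label}_count_{m}m": count_in_window(records, customer_id, label, as_of, 30 * m)
        for m in (1, 3, 6)
    }

# Hypothetical history: one record per contact.
records = [
    {"customer_id": "A", "intent_label": "refuse_to_repay", "defaulted": True, "date": date(2021, 4, 20)},
    {"customer_id": "A", "intent_label": "refuse_to_repay", "defaulted": True, "date": date(2021, 2, 15)},
    {"customer_id": "A", "intent_label": "promise_to_repay", "defaulted": False, "date": date(2021, 1, 10)},
    {"customer_id": "B", "intent_label": "refuse_to_repay", "defaulted": False, "date": date(2021, 4, 25)},
    {"customer_id": "B", "intent_label": "refuse_to_repay", "defaulted": False, "date": date(2020, 12, 1)},
]
rate = default_rate(records, "refuse_to_repay")  # 2 defaults out of 4 records
variables = derive_variables(records, "A", "refuse_to_repay", as_of=date(2021, 4, 30))
```

A label whose `rate` differs strongly from the overall default rate is a candidate for deriving such count variables.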
- S5 Use the corpus-derived variable and the true value-derived variable as independent variables, and screen the independent variables based on a preset univariate analysis method to obtain a target variable.
- The independent variables are subjected to univariate analysis and screening to obtain the target variables.
- The LightGBM method is then used for modeling, and the model is validated and tracked. Variables that have a significant and stable effect on customer default are screened out, giving a stable and standardized set of user portrait variables.
- the step of screening the independent variable based on a preset univariate analysis method, and obtaining the target variable includes:
- An initial independent variable is randomly selected as the target variable from each set of relevant independent variables.
- the missing rate refers to the missing condition of the variable value corresponding to the variable.
- Univariate analysis is performed on the independent variables, and the missing rate of each variable is calculated. Independent variables whose missing rate reaches a preset threshold are removed. The missing rate threshold in this application is 95%: if the missing rate of an independent variable x_n reaches 95%, that independent variable is deleted. The correlation coefficient between each independent variable and the other independent variables is then calculated, and independent variables that are highly correlated with other independent variables are deleted. For example, if x_1 is highly correlated with x_2 and x_5, these variables form one set of correlated independent variables and only one of them is retained. Screening in this way reduces the number of independent variables and removes redundant ones.
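The missing-rate screening can be sketched as follows. This is a minimal illustration that represents a missing value as `None`; the function names and sample columns are hypothetical, and the 95% threshold is the one stated above.

```python
def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)

def drop_sparse_variables(columns, threshold=0.95):
    """Keep only the variables whose missing rate is below the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if missing_rate(values) < threshold
    }

# Hypothetical variables observed for 20 customers.
columns = {
    "x1": [1] * 20,            # never missing
    "x2": [None] * 19 + [1],   # missing rate 95%: deleted
    "x3": [None] * 10 + [0] * 10,
}
initial_variables = drop_sparse_variables(columns)
```

The surviving variables are the initial independent variables that go on to the correlation analysis.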
- the step of calculating the correlation coefficient between the initial independent variables includes:
- the correlation coefficient is calculated as
  $$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$
- where $\rho_{X,Y}$ represents the correlation coefficient, X and Y represent different initial independent variables, cov(X, Y) represents the covariance between X and Y, $\sigma_X$ and $\sigma_Y$ represent the standard deviations of X and Y, E represents the expectation, $\mu_X$ represents the expectation E(X) of X, and $\mu_Y$ represents the expectation E(Y) of Y.
- The Pearson correlation coefficient between two variables X and Y is calculated by the above formula: it equals the covariance of the two variables divided by the product of their standard deviations.
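The Pearson correlation and the selection of one variable per correlated set can be sketched as follows. The correlation threshold of 0.8, the fixed random seed and the greedy grouping are assumptions of this sketch, since no threshold or grouping procedure is specified here.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations.

    Raises ZeroDivisionError for a constant variable (zero standard deviation).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

def select_uncorrelated(columns, threshold=0.8, seed=0):
    """Greedily group variables with |rho| above the threshold; keep one per group at random."""
    rng = random.Random(seed)
    remaining = list(columns)
    selected = []
    while remaining:
        head = remaining.pop(0)
        group = [head] + [
            v for v in remaining
            if abs(pearson(columns[head], columns[v])) > threshold
        ]
        remaining = [v for v in remaining if v not in group]
        selected.append(rng.choice(group))
    return selected

# Hypothetical variables: x2 is an exact multiple of x1, x3 is only weakly related.
columns = {
    "x1": [1, 2, 3, 4],
    "x2": [2, 4, 6, 8],
    "x3": [1, -1, 1, -1],
}
target_variables = select_uncorrelated(columns)
```

Here x1 and x2 fall into the same correlated set, so only one of them survives, while x3 is kept on its own.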
- S6 Adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model.
- The variables filtered through the above step S5 are put into a preset first portrait model to obtain an intermediate portrait model, wherein the first portrait model is a LightGBM model. According to the variable importance output by the intermediate portrait model, the variables whose importance is lower than the preset importance threshold are deleted, which gives the first target variable set and the second portrait model.
- the second portrait model is trained based on the real values corresponding to the first target variable set to obtain a target portrait model.
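The importance-based pruning between the first and second portrait models can be sketched as follows. This is an illustration only: the importance scores and the threshold are hypothetical values, and in practice the scores would come from the fitted model (for example, the `feature_importances_` attribute of a LightGBM scikit-learn style estimator).

```python
def prune_by_importance(importances, threshold):
    """Split variables into kept and dropped by comparing importance to a threshold."""
    kept = [name for name, score in importances.items() if score >= threshold]
    dropped = [name for name, score in importances.items() if score < threshold]
    return kept, dropped

# Hypothetical importances reported by the intermediate portrait model.
importances = {
    "refuse_to_repay_count_3m": 120,
    "word_no_money": 85,
    "word_enough": 3,
    "reject_call_count_1m": 40,
}
first_target_variable_set, removed = prune_by_importance(importances, threshold=10)
```

The second portrait model is then the same kind of model restricted to `first_target_variable_set`.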
- the step of training the second portrait model based on the true value corresponding to the target variable, and obtaining the target portrait model further includes:
- the initial portrait model is adjusted based on the adjusted target variable to obtain an adjusted initial portrait model, and the adjusted initial portrait model is trained based on the true value corresponding to the adjusted target variable to obtain the target portrait model.
- the true value corresponding to the target variable in the next time period is received as an inter-period sample, wherein the next time period may be the following month;
- the second portrait model is validated on the inter-period sample, and the stability of each target variable in the second portrait model on the inter-period sample is calculated, wherein the stability is measured by PSI (population stability index).
- the present application can also continue to track new intertemporal samples for at least the following two months to determine the stability of the performance of the second portrait model on the new intertemporal samples.
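The stability check on inter-period samples can be sketched with the commonly used definition of the population stability index, PSI = sum over bins of (A_i - E_i) * ln(A_i / E_i), where E_i and A_i are the shares of the training-period and inter-period samples that fall into bin i. This definition, the binning by fixed edges and the small floor `eps` are assumptions of this sketch; the exact formula intended here may differ.

```python
import math

def psi(expected, actual, bin_edges, eps=1e-6):
    """Population stability index of one variable between two samples."""
    def shares(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Bin index = number of edges the value exceeds.
            counts[sum(v > edge for edge in bin_edges)] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

# Hypothetical 0/1 corpus-derived variable, split into two bins at 0.5.
training_sample = [0, 0, 1, 1]
inter_period_sample = [0, 1, 1, 1]
stable = psi(training_sample, training_sample, bin_edges=[0.5])
shifted = psi(training_sample, inter_period_sample, bin_edges=[0.5])
```

A value near zero indicates that the variable is distributed in the inter-period sample much as it was in training; larger values indicate drift, and the cut-off used to reject a variable is a modeling choice.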
- S7 Receive the value of the variable to be identified, input the value of the variable to be identified into the target portrait model, and obtain a customer portrait.
- the variable to be identified is the target variable that is finally determined to be added to the target portrait model.
- The target portrait model can not only stably predict the default probability of customers, but can also generate user portraits from the values of the variables to be identified that are input into the model. These portraits reflect customer risk more comprehensively and intuitively, and help collection staff develop collection strategies effectively.
- For example, for customer A, the values of the variables to be identified are obtained and input into the target portrait model, and the target portrait model outputs the predicted default probability and the labels that are more strongly correlated with the default probability, which together form the customer portrait.
- The target portrait model outputs a default probability of 0.9, and outputs labels such as "complaint", "no money", "annoying", one complaint in the past 3 months, and one promise to repay that was not kept in the past 6 months.
- Such a portrait is intended for relevant users, such as collection personnel in collection scenarios.
- the user portraits output by this application enable the company to understand customers more comprehensively and stably, to manage customer risks, and to make full use of a large volume of valuable natural-language text resources.
- the establishment of customer portraits based on historical customer response corpus and intent labels can accurately display the key risk points of customers and complement the traditional portrait model.
- This application makes effective use of a large volume of historical customer response corpora and intent labels: independent variables are generated from the historical customer response corpora and then adjusted, so that variables highly correlated with the customer portrait are screened out. As a result, a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model.
- the output customer portrait can clearly display the key points of the customer, so as to obtain a better-performing customer portrait, and then a more reasonable follow-up configuration can be carried out through the customer portrait.
- the above-mentioned target portrait model can also be stored in a node of a blockchain.
- the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
- A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
- the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
- the present application can be applied in the field of smart communities, thereby promoting the construction of smart cities.
- the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or may be a random access memory (RAM) or the like.
- the present application provides an embodiment of a customer portrait device based on customer response corpus, and the device embodiment corresponds to the method embodiment shown in FIG. 2 .
- the device can be specifically applied to various electronic devices.
- the customer portrait device 300 based on the customer response corpus described in this embodiment includes:
- a receiving module 301, configured to receive the customer response corpus, the intent label and the true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;
- a word segmentation module 302, configured to perform a word segmentation operation on the customer response corpus to obtain target words, and to adjust the target words to obtain target keywords;
- a building module 303, configured to construct a feature dictionary based on the target keywords, to perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and to take the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
- a determination module 304, configured to perform variable determination operations on the true value and the intent label based on different preset strategies to obtain true-value derived variables;
- a screening module 305, configured to take the corpus-derived variables and the true-value derived variables as independent variables, and to screen the independent variables based on a preset univariate analysis method to obtain target variables;
- a training module 306, configured to adjust a preset first portrait model based on the target variables to obtain a second portrait model, and to train the second portrait model based on the variable values corresponding to the target variables to obtain a target portrait model; and
- an input module 307, configured to receive the value of the variable to be identified and input it into the target portrait model to obtain a customer portrait.
- the application makes effective use of a large volume of historical customer response corpora and intent labels: independent variables are generated from the historical customer response corpora and then adjusted, so that variables highly correlated with the customer portrait are screened out. As a result, a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model.
- the output customer portrait can clearly display the key points of the customer, so as to obtain a better-performing customer portrait, and then a more reasonable follow-up configuration can be carried out through the customer portrait.
- the word segmentation module 302 includes an adjustment sub-module, a word segmentation sub-module, an extraction sub-module and a screening sub-module.
- the adjustment sub-module is used to adjust the preset initial word segmentation dictionary based on the customer response corpus to obtain the customer response word segmentation dictionary;
- the word segmentation sub-module is used to segment the customer response corpus under each intent label separately, based on the customer response word segmentation dictionary, to obtain the target words;
- the extraction sub-module is used to extract keywords from the target words under each intent label based on the preset keyword extraction method, to obtain the initial keywords;
- the screening sub-module is used to screen the initial keywords under each intent label to obtain the target keywords.
- the adjustment sub-module includes an identification unit, a word segmentation unit, an extraction unit, an adjustment unit and an acquisition unit.
- the identification unit is used to identify the customer response corpus under the same intent label;
- the word segmentation unit is used to segment the customer response corpus under the current intent label based on a preset initial word segmentation dictionary, to obtain first feature words;
- the extraction unit is used to extract keywords from the first feature words based on the keyword extraction method, to obtain second feature words;
- the adjustment unit is used to adjust the second feature words to obtain unique words;
- the acquisition unit is used to add the unique words to the initial word segmentation dictionary, to obtain the customer response word segmentation dictionary.
- the determination module 304 includes a calculation sub-module, a default rate sub-module, a first derivative sub-module and a second derivative sub-module.
- the calculation sub-module is used to calculate, for each intent label, the ratio of the number of true values indicating default to the number of customers, to obtain the default ratio;
- the default rate sub-module is used to take each default ratio that is greater than the pre-calculated overall default rate as a significant default rate, and to take the intent label corresponding to the significant default rate as a significant label;
- the first derivative sub-module is used to derive the number-of-repayment-refusals variable based on the significant label, and to derive the number-of-lies variable and the number-of-rejected-calls variable;
- the second derivative sub-module is used to take the number-of-repayment-refusals variable, the number-of-lies variable and the number-of-rejected-calls variable as the true-value derived variables.
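The default-ratio comparison performed by the calculation and default rate sub-modules can be sketched as follows. This is a minimal illustration under stated assumptions: each record is a hypothetical `(intent_label, defaulted)` pair, the overall default rate is computed over all records, and the label names in the usage note are invented for the example.

```python
from collections import defaultdict


def significant_labels(records):
    """Return ({label: default_ratio}, overall_rate) for the significant labels.

    records: non-empty iterable of (intent_label, defaulted) pairs, where
    `defaulted` is the true value (True means the customer defaulted).
    This record layout is an assumption made for illustration.
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for label, defaulted in records:
        totals[label] += 1
        defaults[label] += int(defaulted)

    # Overall default rate across all intent labels.
    overall = sum(defaults.values()) / sum(totals.values())
    # Default ratio within each intent label.
    ratios = {label: defaults[label] / totals[label] for label in totals}
    # A label is significant when its default ratio exceeds the overall rate.
    significant = {label: r for label, r in ratios.items() if r > overall}
    return significant, overall
```

For instance, with three "refuse_to_pay" records of which two defaulted and three "promise_to_pay" records of which one defaulted, the overall rate is 0.5 and only "refuse_to_pay" is returned as significant. Count variables such as the number of repayment refusals per customer could then be derived from the significant labels.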
- the screening module 305 includes a missing rate calculation sub-module, a correlation coefficient calculation sub-module and a selection sub-module.
- the missing rate calculation sub-module is used to calculate the missing rate of each independent variable and to delete the independent variables whose missing rate is greater than the preset missing threshold, to obtain the initial independent variables;
- the correlation coefficient calculation sub-module is used to calculate the correlation coefficients between the initial independent variables, and to generate sets of correlated independent variables according to the correlation coefficients;
- the selection sub-module is used to randomly select one initial independent variable from each set of correlated independent variables as a target variable.
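The three screening sub-modules can be sketched together as below. Several details are assumptions, because the text does not fix them: `None` marks a missing value, the thresholds are arbitrary defaults, correlation is the Pearson coefficient over rows where both variables are present, and the sets are built greedily by comparing each variable with the first member of each existing set.

```python
import math
import random


def missing_rate(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    if vx == 0 or vy == 0:
        return 0.0  # a constant variable has no defined correlation
    return cov / math.sqrt(vx * vy)


def screen(columns, missing_threshold=0.5, corr_threshold=0.8, seed=0):
    """columns: {variable_name: list of values}; returns the target variables."""
    # Step 1: drop variables whose missing rate exceeds the threshold.
    initial = {k: v for k, v in columns.items()
               if missing_rate(v) <= missing_threshold}
    # Step 2: greedily group variables that correlate strongly with the
    # first member of an existing set (grouping rule is an assumption).
    groups = []
    for name in initial:
        for group in groups:
            if abs(pearson(initial[name], initial[group[0]])) >= corr_threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    # Step 3: randomly keep one variable from each set.
    rng = random.Random(seed)
    return [rng.choice(group) for group in groups]
```

Keeping one variable per correlated set removes redundant inputs while still representing each group of related signals in the model.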
- the above-mentioned correlation coefficient calculation sub-module is further configured such that the correlation coefficient is characterized as follows:
- ρ_{X,Y} represents the correlation coefficient
- X and Y represent different initial independent variables
- cov represents the covariance
- E represents the expectation
- μ_X represents the expectation of X
- μ_Y represents the expectation of Y.
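The formula these symbols belong to does not survive in this text. The symbols listed match the standard Pearson correlation coefficient, so, assuming that is the intended definition, it can be written as below. The standard deviations σ_X and σ_Y are not among the symbols listed above and are introduced here as an assumption.

```latex
\rho_{X,Y}
  = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}
  = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y},
\qquad
\sigma_X = \sqrt{E\left[(X - \mu_X)^2\right]},\quad
\sigma_Y = \sqrt{E\left[(Y - \mu_Y)^2\right]}
```

Under this definition the coefficient lies between -1 and 1, and values close to either end indicate that two initial independent variables carry largely the same information.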
- the training module 306 includes a training sub-module, a receiving sub-module, a stability calculation sub-module, a first obtaining sub-module and a second obtaining sub-module.
- the training sub-module is used to train the second portrait model based on the true value corresponding to the target variable, to obtain an initial portrait model;
- the receiving sub-module is used to receive, based on the target variable, the true value corresponding to the target variable in the next time period as an inter-period sample;
- the stability calculation sub-module is used to calculate, on the inter-period sample, the stability of each target variable in the initial portrait model;
- the second obtaining sub-module is used to adjust the initial portrait model based on the adjusted target variable to obtain the adjusted initial portrait model, and to train the adjusted initial portrait model based on the true value corresponding to the adjusted target variable to obtain the target portrait model.
- This application makes effective use of a large volume of historical customer response corpora and intent labels: independent variables are generated from the historical customer response corpora and then adjusted, so that variables highly correlated with the customer portrait are screened out. As a result, a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model.
- the output customer portrait can clearly display the key points of the customer, so as to obtain a better-performing customer portrait, and then a more reasonable follow-up configuration can be carried out through the customer portrait.
- FIG. 4 is a block diagram of a basic structure of a computer device according to this embodiment.
- the computer device 200 includes a memory 201, a processor 202 and a network interface 203 that communicate with each other through a system bus. It should be noted that only the computer device 200 with components 201-203 is shown in the figure, but it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
- the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
- the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
- the memory 201 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and the like.
- the computer-readable storage medium may be non-volatile or volatile.
- the memory 201 may be an internal storage unit of the computer device 200 , such as a hard disk or a memory of the computer device 200 .
- the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
- the memory 201 may also include both an internal storage unit of the computer device 200 and an external storage device thereof.
- the memory 201 is generally used to store the operating system and various application software installed on the computer device 200 , such as computer-readable instructions of the customer portrait method based on the customer response corpus.
- the memory 201 can also be used to temporarily store various types of data that have been output or will be output.
- the processor 202 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
- the processor 202 is typically used to control the overall operation of the computer device 200 .
- the processor 202 is configured to execute computer-readable instructions or process data stored in the memory 201 , for example, computer-readable instructions for executing the customer portrait method based on the customer response corpus.
- the network interface 203 may include a wireless network interface or a wired network interface, and the network interface 203 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
- independent variables are generated from the historical customer response corpus and then adjusted, so that variables highly correlated with the customer portrait are screened out; a more accurate customer portrait can then be obtained by inputting the values of only a small number of variables into the final target portrait model.
- the present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions can be executed by at least one processor to cause the at least one processor to execute the steps of the above-mentioned customer portrait method based on customer response corpus.
- independent variables are generated from the historical customer response corpus and then adjusted, so that variables highly correlated with the customer portrait are screened out; a more accurate customer portrait can then be obtained by inputting the values of only a small number of variables into the final target portrait model.
- the method of the above embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
- the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk or CD-ROM) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Accounting & Taxation (AREA)
- Economics (AREA)
- General Business, Economics & Management (AREA)
- Technology Law (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Development Economics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the present application belongs to the field of big data, and applies to the field of smart communities, and relates to a customer profiling method based on customer response corpora, and a device related thereto, comprising: tokenizing and adjusting the customer response corpora to obtain target keywords; performing vector conversion on the customer response corpora on the basis of a feature dictionary constructed by means of the target keywords to obtain corpora feature vectors; processing true values and intent labels on the basis of a preset policy to obtain true value-derived variables; screening corpora-derived variables and the true value-derived variables on the basis of a single variable analysis method to obtain target variables; adjusting a preset first profile model on the basis of the target variables to obtain a second profile model, and training the second profile model on the basis of variable values corresponding to the target variables to obtain a target profile model; and inputting received values of variables to be recognized into the target profile model to obtain a customer profile. The target profile model can be stored in a blockchain. The present application generates a more accurate customer profile.
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on December 16, 2020, with application number 202011487411.X and the invention title "Customer portrait method based on customer response corpus and related equipment", the entire content of which is incorporated herein by reference.
The present application relates to the field of big data technology, and in particular to a customer portrait method based on customer response corpus and related equipment.
With the continuous innovation and development of computer technology, computer technology has been widely applied across industries. Big data technology occupies an important position among these applications and is used especially in customer behavior analysis, customer prediction and customer portraits. Customer portraits require massive amounts of customer record data: the computer learns from this data through a customer portrait model in order to understand customers better.
The inventor realized that most current training of customer portrait models simply extracts a small number of significant labels for the subsequent learning of the model. This approach uses only a very limited part of the massive customer record data, so the customer portraits output by the trained model have low accuracy and are difficult to reuse, which causes considerable inconvenience.
SUMMARY OF THE INVENTION
The purpose of the embodiments of the present application is to propose a customer portrait method, device, computer equipment and storage medium based on customer response corpus, which can obtain more accurate customer portraits.
In order to solve the above technical problems, the embodiments of the present application provide a customer portrait method based on customer response corpus, which adopts the following technical solution:
A customer portrait method based on customer response corpus, comprising the following steps:
receiving a customer response corpus, intent labels and true values, wherein the customer response corpus and the intent labels have a one-to-one mapping relationship, and the intent labels and the true values have a one-to-one mapping relationship;
performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords;
constructing a feature dictionary based on the target keywords, performing vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and taking the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
performing variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value derived variables;
taking the corpus-derived variables and the true-value derived variables as independent variables, and screening the independent variables based on a preset univariate analysis method to obtain target variables;
adjusting a preset first portrait model based on the target variables to obtain a second portrait model, and training the second portrait model based on the variable values corresponding to the target variables to obtain a target portrait model;
receiving the values of variables to be identified, and inputting the values of the variables to be identified into the target portrait model to obtain a customer portrait.
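The feature-dictionary and vector-transformation step of the method above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes each response has already been segmented into a list of words, it uses keyword counts as vector values (the text does not say whether counts, binary flags or weights are used), and the function names and the `kw_` variable prefix are hypothetical.

```python
def build_feature_dictionary(target_keywords):
    """Map each distinct target keyword to one vector dimension."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return {word: index
            for index, word in enumerate(dict.fromkeys(target_keywords))}


def to_feature_vector(tokens, feature_dictionary):
    """Count how often each dictionary keyword occurs in one segmented response."""
    vector = [0] * len(feature_dictionary)
    for token in tokens:
        index = feature_dictionary.get(token)
        if index is not None:  # words outside the dictionary are ignored
            vector[index] += 1
    return vector


def corpus_derived_variables(tokens, feature_dictionary):
    """Expose each vector dimension as a named corpus-derived variable."""
    vector = to_feature_vector(tokens, feature_dictionary)
    return {f"kw_{word}": vector[index]
            for word, index in feature_dictionary.items()}
```

With the dictionary built from the keywords "complaint" and "no money", a response segmented as "no money", "later", "no money", "complaint" yields one occurrence of "complaint" and two of "no money"; each dimension then serves as the value of one corpus-derived variable.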
In order to solve the above technical problems, the embodiments of the present application also provide a customer portrait device based on customer response corpus, which adopts the following technical solution:
A customer portrait device based on customer response corpus, comprising:
a receiving module, configured to receive a customer response corpus, intent labels and true values, wherein the customer response corpus and the intent labels have a one-to-one mapping relationship, and the intent labels and the true values have a one-to-one mapping relationship;
a word segmentation module, configured to perform a word segmentation operation on the customer response corpus to obtain target words, and to adjust the target words to obtain target keywords;
a building module, configured to construct a feature dictionary based on the target keywords, to perform vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and to take the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
a determination module, configured to perform variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value derived variables;
a screening module, configured to take the corpus-derived variables and the true-value derived variables as independent variables, and to screen the independent variables based on a preset univariate analysis method to obtain target variables;
a training module, configured to adjust a preset first portrait model based on the target variables to obtain a second portrait model, and to train the second portrait model based on the variable values corresponding to the target variables to obtain a target portrait model; and
an input module, configured to receive the values of variables to be identified and input them into the target portrait model to obtain a customer portrait.
In order to solve the above technical problems, the embodiments of the present application also provide a computer device, which adopts the following technical solution:
A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor, when executing the computer-readable instructions, implements the following steps of the customer portrait method based on customer response corpus:
receiving a customer response corpus, intent labels and true values, wherein the customer response corpus and the intent labels have a one-to-one mapping relationship, and the intent labels and the true values have a one-to-one mapping relationship;
performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords;
constructing a feature dictionary based on the target keywords, performing vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and taking the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
performing variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value derived variables;
taking the corpus-derived variables and the true-value derived variables as independent variables, and screening the independent variables based on a preset univariate analysis method to obtain target variables;
adjusting a preset first portrait model based on the target variables to obtain a second portrait model, and training the second portrait model based on the variable values corresponding to the target variables to obtain a target portrait model;
receiving the values of variables to be identified, and inputting the values of the variables to be identified into the target portrait model to obtain a customer portrait.
In order to solve the above technical problems, the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solution:
A computer-readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement the following steps of the customer portrait method based on customer response corpus:
receiving a customer response corpus, intent labels and true values, wherein the customer response corpus and the intent labels have a one-to-one mapping relationship, and the intent labels and the true values have a one-to-one mapping relationship;
performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords;
constructing a feature dictionary based on the target keywords, performing vector transformation on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and taking the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
performing variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value derived variables;
taking the corpus-derived variables and the true-value derived variables as independent variables, and screening the independent variables based on a preset univariate analysis method to obtain target variables;
adjusting a preset first portrait model based on the target variables to obtain a second portrait model, and training the second portrait model based on the variable values corresponding to the target variables to obtain a target portrait model;
receiving the values of variables to be identified, and inputting the values of the variables to be identified into the target portrait model to obtain a customer portrait.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
This application makes effective use of a large volume of historical customer response corpora and intent labels: independent variables are generated from the historical customer response corpora and then adjusted, so that variables highly correlated with the customer portrait are screened out. As a result, a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model. The output customer portrait clearly displays the key points of the customer, yielding a better-performing customer portrait on which more reasonable follow-up configuration can be based.
In order to illustrate the solutions in the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the customer portrait method based on customer response corpus according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the customer portrait apparatus based on customer response corpus according to the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Reference numerals: 200, computer device; 201, memory; 202, processor; 203, network interface; 300, customer portrait apparatus based on customer response corpus; 301, receiving module; 302, word segmentation module; 303, construction module; 304, determination module; 305, screening module; 306, training module; 307, input module.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the application. The terms "comprising" and "having", and any variations thereof, in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects rather than to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, such as a background server that supports the pages displayed on the terminal devices 101, 102, and 103.
It should be noted that the customer portrait method based on customer response corpus provided by the embodiments of the present application is generally executed by the server/terminal device, and accordingly, the customer portrait apparatus based on customer response corpus is generally arranged in the server/terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to FIG. 2, a flowchart of an embodiment of the customer portrait method based on customer response corpus according to the present application is shown. The method includes the following steps:
S1: Receive customer response corpus, intent labels, and true values, where the customer response corpus and the intent labels have a one-to-one mapping relationship, and the intent labels and the true values have a one-to-one mapping relationship.
In this embodiment, the customer response corpus is the customer's historical responses in question-and-answer dialogues. The application can extract the customer response corpus from a past period of time (for example, the last six months). Receiving the customer response corpus, intent labels, and true values facilitates subsequent data processing. An intent label is the customer's intent as marked according to the customer response corpus; it may be generated by a pre-trained intent classification model or annotated manually. In a scenario of urging customers to repay loans, the intent labels may be: willing to repay, already repaid, unwilling to repay, and so on. A true value is the customer's actual action. For example, in the loan repayment scenario, the true value is whether the customer repaid the loan. In a phone call scenario, the true value is whether the customer rejected the call; when the call is rejected, the corresponding customer response corpus is empty, and the intent label may be "customer rejected the call" or empty.
In this embodiment, the electronic device on which the customer portrait method based on customer response corpus runs (for example, the server/terminal device shown in FIG. 1) can receive the customer response corpus, intent labels, and true values through a wired or wireless connection. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connection methods currently known or developed in the future.
S2: Perform word segmentation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords.
In this embodiment, word segmentation splits the customer response corpus into words, which facilitates further processing. The target keywords are obtained by adjusting the segmented customer response corpus.
Specifically, the step of performing word segmentation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords, includes:
Adjusting a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;
Segmenting the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain the target words;
Extracting from the target words under each intent label based on a preset keyword extraction method to obtain initial keywords;
Screening the initial keywords under each intent label to obtain the target keywords.
In this embodiment, the initial word segmentation dictionary is the jieba default dictionary, which can be obtained directly from open-source websites. The preset keyword extraction method is the TF-IDF (term frequency-inverse document frequency) method. The preset initial word segmentation dictionary is adjusted based on the customer response corpus to obtain the customer response word segmentation dictionary, so that this dictionary carries the characteristics of the scenario to which the customer response corpus belongs. Segmenting the customer response corpus with the customer response word segmentation dictionary reduces segmentation errors and yields better segmentation results. The TF-IDF method is then used to extract from the target words under each intent label, and the extracted words are screened. One screening method is to select, under each intent label, the top n initial keywords with the highest importance as the target keywords, where the importance is output directly by the TF-IDF method. TF-IDF evaluates how important a word is to one document in a document set or corpus. In this application, n is set to 50. With 50 intent labels, a total of 2500 target keywords are finally obtained. In the subsequent generation of the feature dictionary, the feature dictionary is built from the target keywords and a preset word placeholder (nan), so the feature dictionary consists of 2501 words in total. Another screening method is: count the occurrences of each distinct initial keyword and calculate the frequency of each initial keyword from the counts; sort the initial keywords by frequency; and delete the initial keywords whose frequency is lower than a preset threshold to obtain the target keywords.
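The top-n screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corpus is already segmented, all responses under one intent label are treated as a single document, and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` is a common convention chosen here, not necessarily the weighting used by the application or by jieba. The function name `top_keywords_per_label` and the sample labels are hypothetical.

```python
import math
from collections import Counter


def top_keywords_per_label(tokens_by_label, n=50):
    """Return the n highest-scoring TF-IDF words for each intent label.

    tokens_by_label maps an intent label to the list of words obtained by
    segmenting all customer responses under that label (assumption: one
    "document" per label).
    """
    num_docs = len(tokens_by_label)
    # Document frequency: in how many labels does each word occur?
    doc_freq = Counter()
    for tokens in tokens_by_label.values():
        doc_freq.update(set(tokens))

    result = {}
    for label, tokens in tokens_by_label.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {}
        for word, count in counts.items():
            tf = count / total
            # Smoothed IDF (an assumed convention, see lead-in).
            idf = math.log((1 + num_docs) / (1 + doc_freq[word])) + 1.0
            scores[word] = tf * idf
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[label] = ranked[:n]
    return result


# Hypothetical, already segmented toy corpus with two intent labels.
corpus = {
    "already_repaid": ["存上", "了", "存上", "我", "存够"],
    "no_money": ["没钱", "了", "我", "没钱", "困难"],
}
keywords = top_keywords_per_label(corpus, n=2)
```

Words that appear under every label (such as 了 and 我 here) receive a lower IDF and drop out of the top n, which is the effect the screening step relies on.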
The step of adjusting the preset initial word segmentation dictionary based on the customer response corpus to obtain the customer response word segmentation dictionary includes:
Identifying the customer response corpus under the same intent label;
Segmenting the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain first feature words;
Extracting from the first feature words based on the keyword extraction method to obtain second feature words;
Adjusting the second feature words to obtain unique words;
Adding the unique words to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
In this embodiment, in an AI collection scenario, the unique words are collection-specific words. They are extracted by grouping the customer corpus generated by AI collection in production according to intent label. The format is shown in Table 2:
Table 2
After the corpus is organized in the above format, the jieba default dictionary is first used to segment the customer response corpus (also called customer scripts) under each intent label, and the TF-IDF method is then used to extract the feature words (also called key feature words) under each intent label. For example, the feature words under intent label 1 are: 我存 (I deposit), 上 (on), 够 (enough), 上面 (above), 已经 (already), 入 (into), and so on. Based on these extracted feature words and the collection-business meaning of intent label 1, the feature words under the label are manually combined according to the scenario to generate unique words: 存上 (deposited), 存够 (deposited enough), 存上面 (deposited onto it), 存入 (deposited into), 已经存 (already deposited), and so on. The unique words are added to the jieba default dictionary to generate the customer response word segmentation dictionary. In subsequent segmentation of the customer response corpus, when a unique word is encountered, segmentation is performed preferentially according to the unique word; when no unique word is found, segmentation is performed according to the customer response word segmentation dictionary.
The general-purpose jieba default dictionary mis-segments some scenario-specific words, so that the features extracted in subsequent feature extraction are neither representative nor applicable and cannot be used well in later operations. For example, Table 1 below shows word segmentation in the AI collection scenario:
Customer response corpus | jieba default dictionary | Customer response word segmentation dictionary |
我存上了。 (I have deposited it.) | 我存上了。 | 我存上了。 |
我存不了了。 (I can no longer deposit.) | 我存不了了。 | 我存不了了。 |
Table 1
When segmentation is performed with the general-purpose jieba default dictionary, the features that can be extracted from the segmentation results are: 我存 (I deposit), 上 (on), 不了 (cannot), 了 (particle). If such features are delivered to downstream services (such as AI collection agents), the downstream services cannot understand them intuitively or use them in practice. When segmentation is performed with the customer response word segmentation dictionary, the features that can be extracted are: 我 (I), 存上了 (have deposited), 存不了 (cannot deposit), 了 (particle). Delivering scenario-appropriate words such as 存上了 and 存不了 to downstream services helps them understand the customer's situation more intuitively and make better strategy decisions. To achieve better segmentation, this application supplements the existing jieba dictionary with the unique words of the corresponding scenario, thereby establishing the customer response word segmentation dictionary.
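The effect of giving scenario-specific words priority can be illustrated with a small, self-contained Python sketch. It deliberately does not use jieba or reproduce jieba's algorithm: it is a simplified stand-in that cuts out unique words first and segments the remainder by forward maximum matching. The base vocabulary below is a hypothetical toy dictionary chosen so that the default behaviour mimics the mis-segmentation described above.

```python
def max_match(text, vocab):
    """Forward maximum matching: take the longest vocabulary word at each position."""
    max_len = max((len(w) for w in vocab), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment(text, base_vocab, unique_words=()):
    """Cut out scenario-specific words first, then segment the rest with base_vocab."""
    best = None  # (position, word) of the earliest, then longest, unique word
    for word in unique_words:
        if not word:
            continue
        pos = text.find(word)
        if pos != -1 and (best is None or (pos, -len(word)) < (best[0], -len(best[1]))):
            best = (pos, word)
    if best is None:
        return max_match(text, base_vocab)
    pos, word = best
    left = segment(text[:pos], base_vocab, unique_words)
    right = segment(text[pos + len(word):], base_vocab, unique_words)
    return left + [word] + right


# Hypothetical toy vocabularies (assumptions for illustration only).
base_vocab = {"我存", "我", "上", "不了", "了"}
unique_words = {"存上了", "存不了"}

default_result = segment("我存不了了", base_vocab)
custom_result = segment("我存不了了", base_vocab, unique_words)
```

With only the base vocabulary the sentence splits into 我存 / 不了 / 了, whereas with the unique words it splits into 我 / 存不了 / 了, matching the features listed in the text.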
S3: Construct a feature dictionary based on the target keywords, perform vector conversion on the customer response corpus based on the feature dictionary to obtain corpus feature vectors, and use the value of each dimension of a corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension.
In this embodiment, the value of each dimension of the corpus feature vector is used as the variable value of the preset corpus-derived variable of the corresponding dimension. This yields a richer set of variables, and the variable values determined through the above word segmentation steps are also more accurate. The feature dictionary is composed of the target keywords and a preset word placeholder (nan). Words in the customer scripts that are not among the target keywords are replaced with the word placeholder (nan). The customer response corpus is then one-hot encoded according to the feature dictionary to create the corpus feature vectors (that is, the customer script features). In the collection scenario, the feature dictionary takes the form shown in Table 3:
Word ID | Word |
0 | 存上 (deposited) |
1 | 存够 (deposited enough) |
2 | 存入 (deposited into) |
3 | 已经存 (already deposited) |
4 | 存好 (deposit done) |
5 | 没钱 (no money) |
6 | 没工资 (no salary) |
7 | 困难 (in difficulty) |
8 | 没办法 (no way) |
9 | 还不上 (cannot repay) |
…… | …… |
2500 | nan |
Table 3
According to the above feature dictionary, each customer response is converted into a 2501-dimensional corpus feature vector. The word IDs in the feature dictionary determine which word corresponds to each dimension during vector conversion. With one-hot encoding, it is easy to recover the textual meaning of the features that have a significant effect on customer default, which helps build a user portrait that is easier to understand and more practical. Other vector conversion methods may also be chosen according to actual needs. A specific example of the corpus feature vector is as follows. In this application, the word ID of 存上 is 0, so 存上 occupies the first position of the vector, that is, its first dimension. When the segmentation result of a customer response contains 存上, the value of the first dimension of the resulting corpus feature vector is 1; otherwise it is 0. The second dimension corresponds to 存够: if the segmentation result of the customer response contains 存够, the second dimension of the corpus feature vector is 1; otherwise it is 0, and so on. An example is shown in Table 4:
Table 4
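The encoding described above can be sketched as follows. This is a minimal illustration, assuming the response has already been segmented; the function names `build_feature_dictionary` and `encode` are hypothetical, and a four-keyword dictionary stands in for the 2500 target keywords.

```python
def build_feature_dictionary(target_keywords, placeholder="nan"):
    """Map each target keyword to a word ID; the placeholder takes the last ID."""
    feature_dict = {}
    for word in target_keywords:
        if word not in feature_dict and word != placeholder:
            feature_dict[word] = len(feature_dict)
    feature_dict[placeholder] = len(feature_dict)
    return feature_dict


def encode(tokens, feature_dict, placeholder="nan"):
    """Return a 0/1 vector with one dimension per entry of the feature dictionary."""
    vector = [0] * len(feature_dict)
    for token in tokens:
        # Words outside the target keywords are replaced by the placeholder.
        word_id = feature_dict.get(token, feature_dict[placeholder])
        vector[word_id] = 1
    return vector


# Toy dictionary: four keywords plus the placeholder gives five dimensions.
feature_dict = build_feature_dictionary(["存上", "存够", "存入", "已经存"])
vector = encode(["我", "已经存", "存上"], feature_dict)
```

Here 存上 sets dimension 0, 已经存 sets dimension 3, and the out-of-dictionary word 我 falls into the last (nan) dimension. With the 2500 target keywords from the text, the same code would produce 2501-dimensional vectors.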
S4: Perform variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value-derived variables.
In this embodiment, the true-value-derived variables are determined through different preset strategies, which expands the set of variables and makes the pool of variables to be screened richer.
Specifically, the true values include default true values, the default true values and the intent labels have a one-to-one mapping relationship, and the step of performing variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value-derived variables includes:
Calculating, for each intent label, the ratio of the number of default true values to the number of customers to obtain a default ratio;
Taking each default ratio greater than the pre-calculated overall default rate as a significant default rate, and taking the intent label corresponding to the significant default rate as a significant label;
Deriving a repayment-refusal count variable based on the significant label, and deriving a lying count variable and a call-rejection count variable based on the intent labels;
Using the repayment-refusal count variable, the lying count variable, and the call-rejection count variable as the true-value-derived variables.
In this embodiment, a default true value means that, in a collection scenario, the customer did not repay within the agreed period, or, in a logistics scenario, the customer did not ship on the agreed date or the quality of the shipment was below the agreed quality. All of these cases constitute a default by the customer; if the customer defaults, the default true value is generated to mark that the customer has defaulted. This application extracts the customer's intent labels over a past period of time. An intent label generated in a single contact is somewhat accidental, whereas the repayment intent labels generated over a past period give more comprehensive information about the customer. For example, a customer may state in the current collection call that the money has been deposited, but in the previous months repeatedly questioned whether the call was made by an AI. This very likely indicates that the customer has recognized the AI call and is merely giving perfunctory answers.
For this information, first, a correlation analysis is performed between the intent labels generated by collection and the customer default rate, and labels that may have a significant effect on predicting customer default are derived from it. The default rate is calculated as the ratio of the number of customers who actually defaulted under an intent label to the total number of customers under that label. For example, if the intent label "refuses to repay" is output for 100 customers during collection and 50 of them actually default, the default rate under that intent label is 50%. After this correlation analysis, the labels whose default rate is far above or far below the overall customer default rate are considered to have a significant effect on predicting customer default, and a series of variables can be derived from them. For example, if the default rate of the label "refuses to repay" is much higher than the overall customer default rate, the following variables can be derived: number of repayment refusals in the last 1 month, in the last 3 months, in the last 6 months, and so on.
Second, the customer's intent labels and the customer's real repayment performance over a past period are extracted, and customers whose statements during collection are inconsistent with their real repayment performance are flagged, from which further variables are derived. For example, if a customer promised to repay during collection but actually defaulted, the customer lied during collection. Variables that can be derived on this basis include: number of times repayment was promised but the customer defaulted in the last 1 month, in the last 3 months, in the last 6 months, and so on. Third, the customer's record of answering AI collection calls over a past period is extracted, and derived variables describing call answering are created, such as the number of unanswered calls over three consecutive months.
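The default-rate comparison and the count variables described above can be sketched as follows. This is a simplified illustration under several assumptions: each record is one collection contact given as a hypothetical `(customer_id, intent_label, defaulted)` tuple, rates are computed per contact rather than per distinct customer, a label counts as significant only when its rate exceeds the overall rate, and the 1/3/6-month time windows are omitted. The label names are invented for the example.

```python
from collections import Counter, defaultdict


def default_rate_by_label(records):
    """Default rate for each intent label: defaults under the label / records under it."""
    totals, defaults = Counter(), Counter()
    for _, label, defaulted in records:
        totals[label] += 1
        defaults[label] += int(defaulted)
    return {label: defaults[label] / totals[label] for label in totals}


def significant_labels(records):
    """Labels whose default rate exceeds the overall default rate."""
    records = list(records)
    overall = sum(int(d) for _, _, d in records) / len(records)
    rates = default_rate_by_label(records)
    return {label for label, rate in rates.items() if rate > overall}


def derive_counts(records, promise_label="promise_to_repay"):
    """Per-customer counts of significant-label contacts and of broken promises."""
    records = list(records)
    significant = significant_labels(records)
    derived = defaultdict(lambda: {"significant_label_count": 0, "lying_count": 0})
    for customer, label, defaulted in records:
        row = derived[customer]
        if label in significant:
            row["significant_label_count"] += 1
        # A promise followed by an actual default is counted as lying.
        if label == promise_label and defaulted:
            row["lying_count"] += 1
    return dict(derived)


# Hypothetical contact history.
records = [
    ("c1", "refuse_to_repay", True),
    ("c1", "refuse_to_repay", True),
    ("c1", "promise_to_repay", True),
    ("c2", "promise_to_repay", False),
    ("c2", "refuse_to_repay", False),
    ("c3", "promise_to_repay", False),
]
derived = derive_counts(records)
```

In this toy data the overall default rate is 50% and the rate under `refuse_to_repay` is about 67%, so that label is significant and drives the refusal count for each customer.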
S5: Use the corpus-derived variables and the true-value-derived variables as independent variables, and screen the independent variables based on a preset univariate analysis method to obtain target variables.
In this embodiment, univariate analysis and screening are performed on the independent variables to obtain the target variables. After the above derived variables have been prepared, they are used as independent variables, with whether the customer defaults as the dependent variable, and a model is built with the LightGBM method. The model is then validated and tracked over time. The variables that have a significant and stable effect on customer default are screened out and used for stable, standardized output of user portrait variables.
Specifically, the step of screening the independent variables based on the preset univariate analysis method to obtain the target variables includes:
Calculating the missing rate of each independent variable, and deleting the independent variables whose missing rate is greater than a preset missing threshold to obtain initial independent variables;
Calculating the correlation coefficients between the initial independent variables, and generating sets of correlated independent variables according to the correlation coefficients;
Randomly selecting one initial independent variable from each set of correlated independent variables as a target variable.
In this embodiment, the missing rate refers to the proportion of missing variable values for a variable. Univariate analysis is performed on the independent variables, and the missing rate of each variable is calculated. Independent variables whose missing rate is greater than the preset threshold are deleted. In this application, the missing threshold is 95%. For example, if the missing rate of an independent variable $x_n$ reaches 95%, that independent variable is deleted. The correlation coefficient between each independent variable and the other independent variables is then calculated, and independent variables that are highly correlated with other independent variables are deleted. For example, if the correlation coefficients between $x_1$ and each of $x_2$, $x_5$, ..., $x_{200}$ are all greater than 0.95, any one of these independent variables is selected as the target variable, for example only $x_1$ is retained. Screening reduces the number of independent variables by removing redundant ones.
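The missing-rate screening can be sketched in Python as below. This is a minimal illustration, assuming missing values are represented as `None` and that a variable is dropped once its missing rate reaches the threshold (the text says both "greater than" and "reaches", so the boundary handling here is an assumption). The function names and sample columns are hypothetical.

```python
def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def drop_sparse_variables(columns, threshold=0.95):
    """Keep only variables whose missing rate is below the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if missing_rate(values) < threshold
    }


# Hypothetical variables: x1 is almost entirely missing, x2 is mostly filled.
columns = {
    "x1": [None] * 39 + [1],
    "x2": [0, 1, None, 1] * 10,
}
initial_variables = drop_sparse_variables(columns)
```

Here `x1` has a missing rate of 97.5% and is removed, while `x2` (25% missing) is kept as an initial independent variable.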
The step of calculating the correlation coefficients between the initial independent variables includes:
The correlation coefficient is characterized by:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-u_x)(Y-u_y)]}{\sigma_X \sigma_Y}$$
where $\rho_{X,Y}$ denotes the correlation coefficient, $X$ and $Y$ denote different initial independent variables, cov denotes the covariance, $E$ denotes the expectation, $u_x$ denotes the expectation of $X$, $u_y$ denotes the expectation of $Y$, and $\sigma_X$ and $\sigma_Y$ denote the standard deviations of $X$ and $Y$.
In this embodiment, the Pearson correlation coefficient between two variables $X$ and $Y$ is calculated by the above formula: the correlation coefficient equals the covariance of the two variables divided by the product of their standard deviations. Here, cov(X, Y) denotes the covariance between $X$ and $Y$, $E$ denotes the expectation, $u_x$ denotes the expectation E(X) of $X$, and $u_y$ denotes the expectation E(Y) of $Y$.
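The Pearson coefficient and the removal of highly correlated variables can be sketched as follows. This is an illustrative sketch only: it uses population (divide-by-n) covariance and standard deviation, returns 0.0 for a constant variable, and keeps the first variable of each correlated group instead of picking one at random, so that the result is reproducible. The function names and the sample columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    ux, uy = sum(x) / n, sum(y) / n
    cov = sum((a - ux) * (b - uy) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - ux) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - uy) ** 2 for b in y) / n)
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant variable is treated as uncorrelated
    return cov / (sx * sy)


def select_representatives(columns, threshold=0.95):
    """Keep a variable only if its correlation with every kept variable is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(pearson(values, columns[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical variables: x2 is nearly a multiple of x1, x3 is unrelated.
columns = {
    "x1": [1, 2, 3, 4],
    "x2": [2, 4, 6, 8.1],
    "x3": [4, 1, 3, 2],
}
target_variables = select_representatives(columns)
```

Because `x2` is almost perfectly correlated with `x1`, only `x1` and `x3` remain, mirroring the example in which only one variable of a highly correlated group is retained.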
S6:基于所述目标变量调整预设的第一画像模型,获得第二画像模型,并基于所述目标变量所对应的变量值训练所述第二画像模型,获得目标画像模型。S6: Adjust the preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain the target portrait model.
在本实施例中,将经过上述步骤S5,即单变量分析方式筛选后的变量,放入预设的第一画像模型中,获得中间画像模型,其中,第一画像模型为lightgbm模型,根据中间画像模型输出的变量重要性,删除变量重要性低于预设重要性阈值的变量,获得第一目标变量集合,和第二画像模型。基于第一目标变量集合所对应的真实值训练所述第二画像模型,获得目标画像模型。In this embodiment, the variables filtered through the above step S5, that is, the univariate analysis method, are put into a preset first portrait model to obtain an intermediate portrait model, wherein the first portrait model is the lightgbm model, according to the intermediate portrait model The variable importance output by the portrait model, delete the variables whose variable importance is lower than the preset importance threshold, and obtain the first target variable set and the second portrait model. The second portrait model is trained based on the real values corresponding to the first target variable set to obtain a target portrait model.
Specifically, the step of training the second portrait model based on the true values corresponding to the target variables to obtain the target portrait model further includes:
training the second portrait model on the true values corresponding to the target variables to obtain an initial portrait model;
receiving, for the target variables, the corresponding true values in the next time period as an intertemporal sample;
calculating, using the intertemporal sample, the stability of each target variable of the initial portrait model on the intertemporal sample;
adjusting the target variables based on the stability to obtain adjusted target variables;
adjusting the initial portrait model based on the adjusted target variables to obtain an adjusted initial portrait model, and training the adjusted initial portrait model on the true values corresponding to the adjusted target variables to obtain the target portrait model.
In this embodiment, the true values corresponding to the target variables in the next time period are received as an intertemporal sample, where the next time period may be the following month. The second portrait model is validated on the intertemporal sample by calculating the stability of each of its target variables on that sample. Stability is measured by the population stability index (PSI), which is calculated as follows:

$$PSI = \sum_{i} (A_i - E_i) \ln\frac{A_i}{E_i}$$

where $A_i$ denotes the actual proportion of the intertemporal sample among all true values, $E_i$ denotes the expected proportion of the intertemporal sample among all true values, and the sum runs over the value bins $i$ of the variable. After the stability of each target variable has been calculated, target variables with PSI > 0.1 are deleted from the second portrait model, yielding a second target variable set, i.e. the adjusted target variables. The present application may also continue to track new intertemporal samples for at least the following two months to determine how stably the second portrait model performs on them.
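The stability check can be sketched as follows, using the conventional PSI definition: the sum over bins of (actual share minus expected share) times the natural log of their ratio. Several details are assumptions rather than part of the source: each variable is taken to be already binned into counts, the `eps` floor for empty bins is a common practical guard, and the names `psi` and `stable_variables` are hypothetical. Only the 0.1 cut-off comes from the text.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between a baseline and an intertemporal sample."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # expected share of this bin
        a_pct = max(a / a_total, eps)  # actual share of this bin
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total

def stable_variables(binned, threshold=0.1):
    """Keep variables whose PSI does not exceed the threshold.

    binned: dict mapping variable name -> (expected_counts, actual_counts)
    """
    return [name for name, (exp, act) in binned.items()
            if psi(exp, act) <= threshold]
```

Variables removed by `stable_variables` correspond to the target variables with PSI > 0.1 that the embodiment deletes before retraining.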
S7: Receive the values of the variables to be identified, and input them into the target portrait model to obtain a customer portrait.
In this embodiment, the variables to be identified are the target variables finally selected for inclusion in the target portrait model. After the above steps, the target portrait model can not only predict a customer's default probability stably, but can also produce a customer portrait from the values of the variables to be identified that are input into the model, reflecting customer risk more comprehensively and intuitively and helping collectors formulate collection strategies more effectively. For example, for customer A, the values of the variables to be identified are obtained and input into the target portrait model; the model outputs the predicted default probability together with the labels most strongly correlated with that probability, which together form the customer portrait. For instance, the target portrait model outputs a default probability of 0.9 along with labels such as "complaint", "no money", "annoyed", "said they would complain, once in the last 3 months", and "promised to repay but did not, once in the last 6 months". By transmitting the customer portrait to a user terminal, the relevant users (for example, collectors in a collection scenario) can better understand key customer information and formulate follow-up collection strategies.
The customer portraits output by the present application also allow the company to understand its customers and manage customer risk more comprehensively and stably, making full use of a large body of valuable natural-language text. Building customer portraits from historical customer response corpus and intent labels accurately exposes each customer's key risk points and complements traditional portrait models. This helps the relevant departments manage customers, allocate resources more rationally, and reduce operating costs. In addition, by directing resources toward the customers that the portraits mark as high-risk, low-risk customers are disturbed less often and the customer experience improves.
The present application makes effective use of a large volume of historical customer response corpus and intent labels. Independent variables are generated from the historical customer response corpus and then adjusted so as to screen out the variables that are highly correlated with the customer portrait, so that a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model. The output customer portrait clearly presents the customer's key points, yielding a better-performing customer portrait on which more rational follow-up configuration can be based.
It should be emphasized that, to further ensure the privacy and security of the target portrait model, the target portrait model may also be stored in a node of a blockchain.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions that is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
The present application can be applied in the field of smart communities, thereby promoting the construction of smart cities.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), or the like.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed sequentially, and they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a customer portrait device based on customer response corpus. The device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied in various electronic devices.
As shown in FIG. 3, the customer portrait device 300 based on customer response corpus of this embodiment includes: a receiving module 301, configured to receive customer response corpus, intent labels, and true values, where the customer response corpus and the intent labels have a one-to-one mapping relationship and the intent labels and the true values have a one-to-one mapping relationship; a word segmentation module 302, configured to perform a word segmentation operation on the customer response corpus to obtain target words, and to adjust the target words to obtain target keywords; a construction module 303, configured to construct a feature dictionary based on the target keywords, perform vector conversion on the customer response corpus based on the feature dictionary to obtain corpus feature vectors, and take the vector value of each dimension of the corpus feature vectors as the variable value of the preset corpus-derived variable of the corresponding dimension; a determination module 304, configured to perform variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value-derived variables; a screening module 305, configured to take the corpus-derived variables and the true-value-derived variables as independent variables and screen the independent variables based on a preset univariate analysis method to obtain target variables; a training module 306, configured to adjust a preset first portrait model based on the target variables to obtain a second portrait model, and to train the second portrait model on the variable values corresponding to the target variables to obtain a target portrait model; and an input module 307, configured to receive the values of the variables to be identified and input them into the target portrait model to obtain a customer portrait.
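The vector conversion performed by the construction module can be illustrated with a simple bag-of-words sketch. The source does not state in this passage which weighting is used, so plain keyword counts are an assumption here, as are the function names and the sample keywords; the input is assumed to be a response that has already been segmented into tokens.

```python
def build_feature_dictionary(keywords):
    """Map each distinct target keyword to a fixed vector dimension."""
    index = {}
    for word in keywords:
        if word not in index:
            index[word] = len(index)
    return index

def vectorize(tokens, feature_dictionary):
    """Count how often each dictionary keyword occurs in a segmented response."""
    vector = [0] * len(feature_dictionary)
    for token in tokens:
        position = feature_dictionary.get(token)
        if position is not None:  # tokens outside the dictionary are ignored
            vector[position] += 1
    return vector


# Hypothetical keywords and one segmented customer response.
dictionary = build_feature_dictionary(["complaint", "no money", "annoyed", "complaint"])
features = vectorize(["no money", "complaint", "later", "complaint"], dictionary)
```

Each position of `features` then serves as the value of one corpus-derived variable, one variable per dictionary keyword.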
In this embodiment, the present application makes effective use of a large volume of historical customer response corpus and intent labels. Independent variables are generated from the historical customer response corpus and then adjusted so as to screen out the variables that are highly correlated with the customer portrait, so that a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model. The output customer portrait clearly presents the customer's key points, yielding a better-performing customer portrait on which more rational follow-up configuration can be based.
The word segmentation module 302 includes an adjustment sub-module, a word segmentation sub-module, an extraction sub-module, and a screening sub-module. The adjustment sub-module is configured to adjust a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary; the word segmentation sub-module is configured to segment the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain target words; the extraction sub-module is configured to extract, based on a preset keyword extraction method, from the target words under each intent label to obtain initial keywords; and the screening sub-module is configured to screen the initial keywords under each intent label to obtain the target keywords.
The adjustment sub-module includes an identification unit, a word segmentation unit, an extraction unit, an adjustment unit, and an obtaining unit. The identification unit is configured to identify the customer response corpus under the same intent label; the word segmentation unit is configured to segment the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain first feature words; the extraction unit is configured to extract from the first feature words based on the keyword extraction method to obtain second feature words; the adjustment unit is configured to adjust the second feature words to obtain unique words; and the obtaining unit is configured to add the unique words to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
The determination module 304 includes a calculation sub-module, a default rate sub-module, a first derivation sub-module, and a second derivation sub-module. The calculation sub-module is configured to calculate, for each intent label, the ratio of the number of default true values to the number of customers, obtaining a default ratio; the default rate sub-module is configured to take each default ratio that exceeds the pre-calculated overall default rate as a significant default rate, and to take the intent label corresponding to a significant default rate as a significant label; the first derivation sub-module is configured to derive a repayment-refusal count variable based on the significant labels, and to derive a lie count variable and a call-rejection count variable based on the intent labels; and the second derivation sub-module is configured to take the repayment-refusal count variable, the lie count variable, and the call-rejection count variable as the true-value-derived variables.
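The significant-label selection can be sketched as below. The sketch simplifies in ways the source does not specify: each record is treated as one customer carrying one intent label and a boolean default outcome, and the overall default rate is computed from the same records rather than supplied in advance. The function name and the label strings are hypothetical.

```python
from collections import defaultdict

def significant_labels(records):
    """Return the labels whose default ratio exceeds the overall default rate.

    records: iterable of (intent_label, defaulted) pairs, one per customer
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for label, defaulted in records:
        totals[label] += 1
        defaults[label] += int(defaulted)
    if not totals:
        raise ValueError("no records supplied")
    overall = sum(defaults.values()) / sum(totals.values())
    ratios = {label: defaults[label] / totals[label] for label in totals}
    significant = {label: r for label, r in ratios.items() if r > overall}
    return significant, overall
```

Counting how often a customer's responses fall under a significant label within a time window would then give a derived variable such as the repayment-refusal count.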
The screening module 305 includes a missing rate calculation sub-module, a correlation coefficient calculation sub-module, and a selection sub-module. The missing rate calculation sub-module is configured to calculate the missing rate of each independent variable and delete the independent variables whose missing rate exceeds a preset missing threshold, obtaining initial independent variables; the correlation coefficient calculation sub-module is configured to calculate the correlation coefficients between the initial independent variables and generate sets of correlated independent variables according to the correlation coefficients; and the selection sub-module is configured to randomly select one initial independent variable from each set of correlated independent variables as a target variable.
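The three screening stages can be put together in one short sketch. How the correlated sets are formed is not detailed in this passage, so the greedy grouping below (a variable joins the first group whose first member it correlates with at or above a threshold) is an assumption, as are the threshold values, the use of `None` for missing entries, and all function names.

```python
import math
import random

def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)

def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mu_x = sum(x for x, _ in pairs) / n
    mu_y = sum(y for _, y in pairs) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in pairs)
    var_x = sum((x - mu_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mu_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # treat constant columns as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def screen_variables(columns, missing_threshold=0.5, corr_threshold=0.8, seed=0):
    """Return (target_variables, correlated_groups) for a dict of columns."""
    # Stage 1: drop variables whose missing rate exceeds the threshold.
    initial = {name: vals for name, vals in columns.items()
               if missing_rate(vals) <= missing_threshold}
    # Stage 2: greedily group strongly correlated variables.
    groups = []
    for name, vals in initial.items():
        for group in groups:
            if abs(pearson(vals, initial[group[0]])) >= corr_threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    # Stage 3: randomly keep one variable per group.
    rng = random.Random(seed)
    return [rng.choice(group) for group in groups], groups
```

Picking one representative per group removes redundant variables while keeping one carrier of each distinct signal.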
In some optional implementations of this embodiment, the correlation coefficient calculation sub-module is further configured such that the correlation coefficient is characterized by:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

where $\rho_{X,Y}$ denotes the correlation coefficient, X and Y denote different initial independent variables, cov denotes the covariance, E denotes the expectation, $\mu_X$ denotes the expectation of X, $\mu_Y$ denotes the expectation of Y, and $\sigma_X$ and $\sigma_Y$ denote the standard deviations of X and Y.
The training module 306 includes a training sub-module, a receiving sub-module, a stability calculation sub-module, a first obtaining sub-module, and a second obtaining sub-module. The training sub-module is configured to train the second portrait model on the true values corresponding to the target variables to obtain an initial portrait model; the receiving sub-module is configured to receive, for the target variables, the corresponding true values in the next time period as an intertemporal sample; the stability calculation sub-module is configured to calculate, using the intertemporal sample, the stability of each target variable of the initial portrait model on the intertemporal sample; the first obtaining sub-module is configured to adjust the target variables based on the stability to obtain adjusted target variables; and the second obtaining sub-module is configured to adjust the initial portrait model based on the adjusted target variables to obtain an adjusted initial portrait model, and to train the adjusted initial portrait model on the true values corresponding to the adjusted target variables to obtain the target portrait model.
The present application makes effective use of a large volume of historical customer response corpus and intent labels. Independent variables are generated from the historical customer response corpus and then adjusted so as to screen out the variables that are highly correlated with the customer portrait, so that a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model. The output customer portrait clearly presents the customer's key points, yielding a better-performing customer portrait on which more rational follow-up configuration can be based.
To solve the above technical problems, embodiments of the present application further provide a computer device. Refer to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 200 includes a memory 201, a processor 202, and a network interface 203 that are communicatively connected to one another through a system bus. Note that the figure shows only a computer device 200 with components 201-203; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
The memory 201 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. The computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as its hard disk or internal memory. In other embodiments, the memory 201 may be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 200. The memory 201 may also include both an internal storage unit of the computer device 200 and an external storage device. In this embodiment, the memory 201 is generally used to store the operating system and the various application software installed on the computer device 200, such as the computer-readable instructions of the customer portrait method based on customer response corpus. The memory 201 can also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 202 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 202 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to run the computer-readable instructions stored in the memory 201 or to process data, for example to run the computer-readable instructions of the customer portrait method based on customer response corpus.
The network interface 203 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 200 and other electronic devices.
In this embodiment, independent variables are generated from historical customer response corpus and then adjusted so as to screen out the variables that are highly correlated with the customer portrait, so that a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model.
The present application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the customer portrait method based on customer response corpus described above.
In this embodiment, independent variables are generated from historical customer response corpus and then adjusted so as to screen out the variables that are highly correlated with the customer portrait, so that a more accurate customer portrait can be obtained by inputting the values of only a small number of variables into the final target portrait model.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is preferable. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
Clearly, the embodiments described above are only some, not all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application may be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present application and applied directly or indirectly in other related technical fields likewise falls within the scope of patent protection of the present application.
Claims (20)
- A customer portrait method based on customer response corpus, comprising the following steps:
  receiving customer response corpus, intent labels, and true values, wherein the customer response corpus and the intent labels have a one-to-one mapping relationship, and the intent labels and the true values have a one-to-one mapping relationship;
  performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords;
  constructing a feature dictionary based on the target keywords, performing vector conversion on the customer response corpus based on the feature dictionary to obtain corpus feature vectors, and taking the vector value of each dimension of the corpus feature vectors as the variable value of the preset corpus-derived variable of the corresponding dimension;
  performing variable determination operations on the true values and the intent labels based on different preset strategies to obtain true-value-derived variables;
  taking the corpus-derived variables and the true-value-derived variables as independent variables, and screening the independent variables based on a preset univariate analysis method to obtain target variables;
  adjusting a preset first portrait model based on the target variables to obtain a second portrait model, and training the second portrait model on the variable values corresponding to the target variables to obtain a target portrait model;
  receiving values of variables to be identified, and inputting the values of the variables to be identified into the target portrait model to obtain a customer portrait.
- The customer portrait method based on customer response corpus according to claim 1, wherein the step of performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords comprises:
  adjusting a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;
  segmenting the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain target words;
  extracting, based on a preset keyword extraction method, from the target words under each intent label to obtain initial keywords;
  screening the initial keywords under each intent label to obtain the target keywords.
- The customer portrait method based on customer response corpus according to claim 2, wherein the step of adjusting the preset initial word segmentation dictionary based on the customer response corpus to obtain the customer response word segmentation dictionary comprises:
  identifying the customer response corpus under the same intent label;
  performing word segmentation on the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain first feature words;
  extracting from the first feature words based on the keyword extraction method to obtain second feature words;
  adjusting the second feature words to obtain unique words; and
  adding the unique words to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
- The customer portrait method based on customer response corpus according to claim 1, wherein the step of screening the independent variables based on the preset univariate analysis method to obtain the target variable comprises:
  calculating the missing rate of each independent variable, and deleting each independent variable whose missing rate is greater than a preset missing threshold to obtain initial independent variables;
  calculating the correlation coefficients between the initial independent variables, and generating sets of correlated independent variables according to the correlation coefficients; and
  randomly selecting one initial independent variable from each set of correlated independent variables as the target variable.
- The customer portrait method based on customer response corpus according to claim 4, wherein, in the step of calculating the correlation coefficients between the initial independent variables, the correlation coefficient is characterized by:
  $$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$
  where $\rho_{X,Y}$ denotes the correlation coefficient, $X$ and $Y$ denote different initial independent variables, $\operatorname{cov}$ denotes the covariance, $E$ denotes the expectation, $\mu_X$ denotes the expectation of $X$, $\mu_Y$ denotes the expectation of $Y$, and $\sigma_X$ and $\sigma_Y$ denote the standard deviations of $X$ and $Y$.
- The customer portrait method based on customer response corpus according to claim 1, wherein the step of training the second portrait model based on the true value corresponding to the target variable to obtain the target portrait model further comprises:
  training the second portrait model based on the true value corresponding to the target variable to obtain an initial portrait model;
  receiving, for the target variable, the true value corresponding to the target variable in the next time period as an intertemporal sample;
  calculating, from the intertemporal sample, the stability of each target variable of the initial portrait model on the intertemporal sample;
  adjusting the target variable based on the stability to obtain an adjusted target variable; and
  adjusting the initial portrait model based on the adjusted target variable to obtain an adjusted initial portrait model, and training the adjusted initial portrait model based on the true value corresponding to the adjusted target variable to obtain the target portrait model.
- The customer portrait method based on customer response corpus according to claim 1, wherein the true value comprises a default true value, the default true value and the intent label have a one-to-one mapping relationship, and the step of performing the variable determination operation on the true value and the intent label based on different preset strategies to obtain the true-value-derived variable comprises:
  calculating, for each intent label, the ratio of the number of default true values to the number of customers to obtain a default ratio;
  taking each default ratio greater than a pre-calculated overall default rate as a significant default rate, and taking the intent label corresponding to the significant default rate as a significant label;
  deriving a repayment-refusal count variable based on the significant label, and deriving a lying count variable and a rejected-call count variable, respectively, based on the intent label; and
  taking the repayment-refusal count variable, the lying count variable, and the rejected-call count variable as the true-value-derived variables.
- A customer portrait device based on customer response corpus, comprising:
  a receiving module, configured to receive a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;
  a word segmentation module, configured to perform a word segmentation operation on the customer response corpus to obtain target words, and adjust the target words to obtain target keywords;
  a building module, configured to construct a feature dictionary based on the target keywords, perform vector conversion on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and take the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
  a determination module, configured to perform a variable determination operation on the true value and the intent label based on different preset strategies to obtain a true-value-derived variable;
  a screening module, configured to take the corpus-derived variable and the true-value-derived variable as independent variables, and screen the independent variables based on a preset univariate analysis method to obtain a target variable;
  a training module, configured to adjust a preset first portrait model based on the target variable to obtain a second portrait model, and train the second portrait model based on the variable value corresponding to the target variable to obtain a target portrait model; and
  an input module, configured to receive the value of a variable to be identified, and input the value of the variable to be identified into the target portrait model to obtain a customer portrait.
- A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor, when executing the computer-readable instructions, implements the following steps of a customer portrait method based on customer response corpus:
  receiving a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;
  performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords;
  constructing a feature dictionary based on the target keywords, performing vector conversion on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and taking the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
  performing a variable determination operation on the true value and the intent label based on different preset strategies to obtain a true-value-derived variable;
  taking the corpus-derived variable and the true-value-derived variable as independent variables, and screening the independent variables based on a preset univariate analysis method to obtain a target variable;
  adjusting a preset first portrait model based on the target variable to obtain a second portrait model, and training the second portrait model based on the variable value corresponding to the target variable to obtain a target portrait model; and
  receiving the value of a variable to be identified, and inputting the value of the variable to be identified into the target portrait model to obtain a customer portrait.
- The computer device according to claim 9, wherein the step of performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords, comprises:
  adjusting a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;
  performing word segmentation on the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain the target words;
  extracting from the target words under each intent label based on a preset keyword extraction method to obtain initial keywords; and
  screening the initial keywords under each intent label to obtain the target keywords.
- The computer device according to claim 10, wherein the step of adjusting the preset initial word segmentation dictionary based on the customer response corpus to obtain the customer response word segmentation dictionary comprises:
  identifying the customer response corpus under the same intent label;
  performing word segmentation on the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain first feature words;
  extracting from the first feature words based on the keyword extraction method to obtain second feature words;
  adjusting the second feature words to obtain unique words; and
  adding the unique words to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
- The computer device according to claim 9, wherein the step of screening the independent variables based on the preset univariate analysis method to obtain the target variable comprises:
  calculating the missing rate of each independent variable, and deleting each independent variable whose missing rate is greater than a preset missing threshold to obtain initial independent variables;
  calculating the correlation coefficients between the initial independent variables, and generating sets of correlated independent variables according to the correlation coefficients; and
  randomly selecting one initial independent variable from each set of correlated independent variables as the target variable.
- The computer device according to claim 12, wherein, in the step of calculating the correlation coefficients between the initial independent variables, the correlation coefficient is characterized by:
  $$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$
  where $\rho_{X,Y}$ denotes the correlation coefficient, $X$ and $Y$ denote different initial independent variables, $\operatorname{cov}$ denotes the covariance, $E$ denotes the expectation, $\mu_X$ denotes the expectation of $X$, $\mu_Y$ denotes the expectation of $Y$, and $\sigma_X$ and $\sigma_Y$ denote the standard deviations of $X$ and $Y$.
- The computer device according to claim 9, wherein the step of training the second portrait model based on the true value corresponding to the target variable to obtain the target portrait model further comprises:
  training the second portrait model based on the true value corresponding to the target variable to obtain an initial portrait model;
  receiving, for the target variable, the true value corresponding to the target variable in the next time period as an intertemporal sample;
  calculating, from the intertemporal sample, the stability of each target variable of the initial portrait model on the intertemporal sample;
  adjusting the target variable based on the stability to obtain an adjusted target variable; and
  adjusting the initial portrait model based on the adjusted target variable to obtain an adjusted initial portrait model, and training the adjusted initial portrait model based on the true value corresponding to the adjusted target variable to obtain the target portrait model.
- The computer device according to claim 9, wherein the true value comprises a default true value, the default true value and the intent label have a one-to-one mapping relationship, and the step of performing the variable determination operation on the true value and the intent label based on different preset strategies to obtain the true-value-derived variable comprises:
  calculating, for each intent label, the ratio of the number of default true values to the number of customers to obtain a default ratio;
  taking each default ratio greater than a pre-calculated overall default rate as a significant default rate, and taking the intent label corresponding to the significant default rate as a significant label;
  deriving a repayment-refusal count variable based on the significant label, and deriving a lying count variable and a rejected-call count variable, respectively, based on the intent label; and
  taking the repayment-refusal count variable, the lying count variable, and the rejected-call count variable as the true-value-derived variables.
- A computer-readable storage medium, on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement the following steps of a customer portrait method based on customer response corpus:
  receiving a customer response corpus, an intent label, and a true value, wherein the customer response corpus and the intent label have a one-to-one mapping relationship, and the intent label and the true value have a one-to-one mapping relationship;
  performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords;
  constructing a feature dictionary based on the target keywords, performing vector conversion on the customer response corpus based on the feature dictionary to obtain a corpus feature vector, and taking the vector value of each dimension of the corpus feature vector as the variable value of the preset corpus-derived variable of the corresponding dimension;
  performing a variable determination operation on the true value and the intent label based on different preset strategies to obtain a true-value-derived variable;
  taking the corpus-derived variable and the true-value-derived variable as independent variables, and screening the independent variables based on a preset univariate analysis method to obtain a target variable;
  adjusting a preset first portrait model based on the target variable to obtain a second portrait model, and training the second portrait model based on the variable value corresponding to the target variable to obtain a target portrait model; and
  receiving the value of a variable to be identified, and inputting the value of the variable to be identified into the target portrait model to obtain a customer portrait.
- The computer-readable storage medium according to claim 16, wherein the step of performing a word segmentation operation on the customer response corpus to obtain target words, and adjusting the target words to obtain target keywords, comprises:
  adjusting a preset initial word segmentation dictionary based on the customer response corpus to obtain a customer response word segmentation dictionary;
  performing word segmentation on the customer response corpus under each intent label based on the customer response word segmentation dictionary to obtain the target words;
  extracting from the target words under each intent label based on a preset keyword extraction method to obtain initial keywords; and
  screening the initial keywords under each intent label to obtain the target keywords.
- The computer-readable storage medium according to claim 17, wherein the step of adjusting the preset initial word segmentation dictionary based on the customer response corpus to obtain the customer response word segmentation dictionary comprises:
  identifying the customer response corpus under the same intent label;
  performing word segmentation on the customer response corpus under the current intent label based on the preset initial word segmentation dictionary to obtain first feature words;
  extracting from the first feature words based on the keyword extraction method to obtain second feature words;
  adjusting the second feature words to obtain unique words; and
  adding the unique words to the initial word segmentation dictionary to obtain the customer response word segmentation dictionary.
- The computer-readable storage medium according to claim 16, wherein the step of screening the independent variables based on the preset univariate analysis method to obtain the target variable comprises:
  calculating the missing rate of each independent variable, and deleting each independent variable whose missing rate is greater than a preset missing threshold to obtain initial independent variables;
  calculating the correlation coefficients between the initial independent variables, and generating sets of correlated independent variables according to the correlation coefficients; and
  randomly selecting one initial independent variable from each set of correlated independent variables as the target variable.
- The computer-readable storage medium according to claim 19, wherein, in the step of calculating the correlation coefficients between the initial independent variables, the correlation coefficient is characterized by:
  $$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$
  where $\rho_{X,Y}$ denotes the correlation coefficient, $X$ and $Y$ denote different initial independent variables, $\operatorname{cov}$ denotes the covariance, $E$ denotes the expectation, $\mu_X$ denotes the expectation of $X$, $\mu_Y$ denotes the expectation of $Y$, and $\sigma_X$ and $\sigma_Y$ denote the standard deviations of $X$ and $Y$.
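The keyword steps recited in the claims above (segment the corpus under each intent label, extract initial keywords, then screen them) can be illustrated with a minimal sketch. The claims do not name the keyword extraction method or the screening rule, so everything here is an assumption: the words are taken as already segmented, TF-IDF stands in for the "preset keyword extraction method", and screening is assumed to drop keywords shared by more than one intent label. The function names and sample labels are hypothetical.

```python
import math
from collections import Counter


def extract_keywords(tokens_by_label, top_k=3):
    """Return the top_k TF-IDF words for each intent label.

    tokens_by_label maps an intent label to the already segmented words of
    all customer responses under that label (one pooled "document" per label).
    TF-IDF is an assumed stand-in for the preset keyword extraction method.
    """
    n_labels = len(tokens_by_label)
    # Document frequency: number of intent labels whose corpus contains the word.
    df = Counter()
    for tokens in tokens_by_label.values():
        df.update(set(tokens))

    keywords = {}
    for label, tokens in tokens_by_label.items():
        tf = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * (math.log(n_labels / df[word]) + 1.0)
            for word, count in tf.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[label] = ranked[:top_k]
    return keywords


def screen_keywords(keywords_by_label):
    """Assumed screening rule: drop keywords shared by more than one label."""
    seen = Counter(w for words in keywords_by_label.values() for w in set(words))
    return {
        label: [w for w in words if seen[w] == 1]
        for label, words in keywords_by_label.items()
    }


# Hypothetical, pre-segmented customer responses grouped by intent label.
tokens_by_label = {
    "refuse_to_repay": ["no", "money", "cannot", "pay", "no", "money"],
    "promise_to_repay": ["will", "pay", "tomorrow", "pay", "tomorrow"],
}
initial_keywords = extract_keywords(tokens_by_label, top_k=2)
target_keywords = screen_keywords(initial_keywords)
```

In a real pipeline the segmentation itself would come from a word segmenter loaded with the customer response word segmentation dictionary; that part is deliberately left out of the sketch.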
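The feature dictionary and vector conversion step in the claims above (each dimension of the corpus feature vector becomes the value of one corpus-derived variable) might look roughly like the following. The claims do not say whether a dimension holds a count, a binary flag, or a weight; this sketch assumes raw counts, and the `kw_` variable naming is hypothetical.

```python
def build_feature_dictionary(target_keywords):
    """Map each distinct target keyword to a fixed vector dimension."""
    # Sorting makes the dimension order deterministic.
    return {word: i for i, word in enumerate(sorted(set(target_keywords)))}


def vectorize(tokens, feature_dict):
    """Count how often each dictionary keyword occurs in one segmented response."""
    vector = [0] * len(feature_dict)
    for token in tokens:
        index = feature_dict.get(token)
        if index is not None:  # words outside the dictionary are ignored
            vector[index] += 1
    return vector


def corpus_derived_variables(tokens, feature_dict):
    """Expose each vector dimension as a named corpus-derived variable."""
    vector = vectorize(tokens, feature_dict)
    return {f"kw_{word}": vector[i] for word, i in feature_dict.items()}


feature_dict = build_feature_dictionary(["money", "no", "tomorrow"])
variables = corpus_derived_variables(["no", "money", "no", "today"], feature_dict)
```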
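The default-ratio steps in the claims above (per-label default ratio, comparison with the overall default rate, significant labels, derived count variables) can be sketched as follows. The record layout `(customer_id, intent_label, defaulted)` and the way the count variable is derived (counting a customer's responses that carry a significant label) are assumptions made for illustration; the claims do not fix either.

```python
from collections import defaultdict


def significant_labels(records):
    """Return (significant labels, per-label default ratios, overall default rate).

    records is an iterable of (customer_id, intent_label, defaulted) tuples,
    a hypothetical layout in which defaulted is 1 for a default and 0 otherwise.
    """
    customers_by_label = defaultdict(set)
    defaulters_by_label = defaultdict(set)
    all_customers, all_defaulters = set(), set()
    for customer_id, label, defaulted in records:
        customers_by_label[label].add(customer_id)
        all_customers.add(customer_id)
        if defaulted:
            defaulters_by_label[label].add(customer_id)
            all_defaulters.add(customer_id)

    overall_rate = len(all_defaulters) / len(all_customers)
    ratios = {
        label: len(defaulters_by_label[label]) / len(customers)
        for label, customers in customers_by_label.items()
    }
    # A label is significant when its default ratio exceeds the overall rate.
    significant = {label for label, ratio in ratios.items() if ratio > overall_rate}
    return significant, ratios, overall_rate


def count_by_labels(records, labels):
    """Per customer, count the responses whose intent label is in labels."""
    counts = defaultdict(int)
    for customer_id, label, _ in records:
        counts[customer_id] += label in labels  # a bool adds as 0 or 1
    return dict(counts)


records = [
    ("c1", "refuse", 1), ("c1", "refuse", 1), ("c2", "refuse", 1),
    ("c3", "promise", 0), ("c4", "promise", 0), ("c2", "promise", 1),
]
significant, ratios, overall_rate = significant_labels(records)
refusal_counts = count_by_labels(records, significant)
```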
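The screening steps in the claims above (missing-rate filter, pairwise correlation, sets of correlated variables, one random pick per set) can be sketched in plain Python. The thresholds, the use of the absolute value of the coefficient, and the grouping of variables by union-find are assumptions; the claims only state that sets are generated "according to the correlation coefficients". Missing values are represented as `None`.

```python
import math
import random


def missing_rate(values):
    """Share of missing (None) entries in one variable."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs) / n)
    if std_x == 0 or std_y == 0:
        return 0.0  # a constant variable has no defined correlation
    return cov / (std_x * std_y)


def screen_variables(columns, missing_threshold=0.5, corr_threshold=0.8, seed=0):
    """Return one randomly chosen variable per set of correlated variables."""
    kept = [n for n, vals in columns.items() if missing_rate(vals) <= missing_threshold]

    # Union-find: variables whose |correlation| reaches the threshold share a set.
    parent = {name: name for name in kept}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= corr_threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in kept:
        groups.setdefault(find(name), []).append(name)

    rng = random.Random(seed)  # seeded so the random pick is reproducible
    return sorted(rng.choice(members) for members in groups.values())


columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],           # perfectly correlated with "a"
    "c": [4, 1, 3, 2],
    "d": [None, None, None, 1],  # 75% missing, removed by the first step
}
target_variables = screen_variables(columns)
```

Because the pick within each set is random, only one of `a` and `b` survives, and which one depends on the seed.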
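The intertemporal stability check in the claims above does not specify a stability measure. As one possible reading, the sketch below uses the population stability index (PSI), a measure commonly used in credit scoring, and keeps only variables whose PSI between the training period and the next period stays below a threshold. The choice of PSI, the equal-width binning, and the 0.25 cut-off are all assumptions, not something the claims state.

```python
import math


def psi(expected, actual, bins=4, eps=1e-6):
    """Population stability index of `actual` against `expected`.

    Bins are equal-width over the range of `expected`; values of `actual`
    outside that range fall into the first or last bin.
    """
    low, high = min(expected), max(expected)
    width = (high - low) / bins or 1.0  # avoid zero width for constant data

    def shares(values):
        counts = [0] * bins
        for v in values:
            index = int((v - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def stable_variables(train, later, threshold=0.25):
    """Keep the variables whose distribution stays stable in the later period."""
    return [name for name in train if psi(train[name], later[name]) < threshold]


train = {"v1": [1, 2, 3, 4], "v2": [1, 2, 3, 4]}
later = {"v1": [1, 2, 3, 4], "v2": [4, 4, 4, 4]}  # v2 has shifted
adjusted_target_variables = stable_variables(train, later)
```

The model would then be retrained on the adjusted variables only, as the claim describes.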
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011487411.XA CN112507116B (en) | 2020-12-16 | 2020-12-16 | Customer portrait method based on customer response corpus and related equipment thereof |
CN202011487411.X | 2020-12-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022126963A1 (en) | 2022-06-23 |
Family ID: 74972719
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/090166 WO2022126963A1 (en) | 2020-12-16 | 2021-04-27 | Customer profiling method based on customer response corpora, and device related thereto |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112507116B (en) |
WO (1) | WO2022126963A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507116B (en) * | 2020-12-16 | 2023-10-10 | 平安科技(深圳)有限公司 | Customer portrait method based on customer response corpus and related equipment thereof |
CN112991049B (en) * | 2021-04-13 | 2023-05-30 | 重庆度小满优扬科技有限公司 | Loan information processing method and electronic equipment |
CN113298385A (en) * | 2021-05-26 | 2021-08-24 | 上海晓途网络科技有限公司 | User management method and device, electronic equipment and storage medium |
CN113435998B (en) * | 2021-06-23 | 2023-05-02 | 平安科技(深圳)有限公司 | Loan overdue prediction method and device, electronic equipment and storage medium |
CN114048283A (en) * | 2022-01-11 | 2022-02-15 | 北京仁科互动网络技术有限公司 | User portrait generation method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649704A (en) * | 2016-12-20 | 2017-05-10 | 竹间智能科技(上海)有限公司 | Intelligent dialogue control method and intelligent dialogue control system |
CN109146610A (en) * | 2018-07-16 | 2019-01-04 | 众安在线财产保险股份有限公司 | It is a kind of intelligently to insure recommended method, device and intelligence insurance robot device |
US20190340250A1 (en) * | 2018-05-02 | 2019-11-07 | International Business Machines Corporation | Associating characters to story topics derived from social media content |
CN111639162A (en) * | 2020-06-03 | 2020-09-08 | 贝壳技术有限公司 | Information interaction method and device, electronic equipment and storage medium |
CN112507116A (en) * | 2020-12-16 | 2021-03-16 | 平安科技(深圳)有限公司 | Customer portrait method based on customer response corpus and related equipment thereof |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154395B (en) * | 2017-12-26 | 2021-10-29 | 上海新炬网络技术有限公司 | Big data-based customer network behavior portrait method |
CN110347823A (en) * | 2019-06-06 | 2019-10-18 | 平安科技(深圳)有限公司 | Voice-based user classification method, device, computer equipment and storage medium |
CN111444341B (en) * | 2020-03-16 | 2024-04-12 | 中国平安人寿保险股份有限公司 | User portrait construction method, device, equipment and readable storage medium |
- 2020-12-16: CN application CN202011487411.XA, patent CN112507116B (en), status: Active
- 2021-04-27: WO application PCT/CN2021/090166, publication WO2022126963A1 (en), status: Application Filing
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114911917A (en) * | 2022-07-13 | 2022-08-16 | 树根互联股份有限公司 | Asset meta-information searching method and device, computer equipment and readable storage medium |
CN115099680A (en) * | 2022-07-14 | 2022-09-23 | 平安科技(深圳)有限公司 | Risk management method, device, equipment and storage medium |
CN115099680B (en) * | 2022-07-14 | 2024-02-02 | 平安科技(深圳)有限公司 | Risk management method, apparatus, device and storage medium |
CN115660695A (en) * | 2022-11-21 | 2023-01-31 | 浪潮通信信息系统有限公司 | Customer service personnel label portrait construction method and device, electronic equipment and storage medium |
CN117575649A (en) * | 2023-11-22 | 2024-02-20 | 中国人寿保险股份有限公司山东省分公司 | Multi-model-based customer portrait distinguishing method, device, equipment and medium |
CN118013399A (en) * | 2024-04-08 | 2024-05-10 | 北京博瑞彤芸科技股份有限公司 | AI model-based user portrait processing method and device |
CN118332175A (en) * | 2024-06-14 | 2024-07-12 | 江西微博科技有限公司 | Data processing system for converting shared data into user portraits |
Also Published As
Publication number | Publication date |
---|---|
CN112507116A (en) | 2021-03-16 |
CN112507116B (en) | 2023-10-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2022126963A1 (en) | Customer profiling method based on customer response corpora, and device related thereto | |
WO2022126971A1 (en) | Density-based text clustering method and apparatus, device, and storage medium | |
CN112148987B (en) | Message pushing method based on target object activity and related equipment | |
WO2021239004A1 (en) | Abnormal community detection method and apparatus, computer device, and storage medium | |
WO2022126970A1 (en) | Method and device for financial fraud risk identification, computer device, and storage medium | |
WO2020062660A1 (en) | Enterprise credit risk evaluation method, apparatus and device, and storage medium | |
WO2022174491A1 (en) | Artificial intelligence-based method and apparatus for medical record quality control, computer device, and storage medium | |
WO2020207167A1 (en) | Text classification method, apparatus and device, and computer-readable storage medium | |
CN111199474B (en) | Risk prediction method and device based on network map data of two parties and electronic equipment | |
CN110866110A (en) | Conference summary generation method, device, equipment and medium based on artificial intelligence | |
CN112307472A (en) | Abnormal user identification method and device based on intelligent decision and computer equipment | |
WO2023134057A1 (en) | Affair information query method and apparatus, and computer device and storage medium | |
CN112686022A (en) | Method and device for detecting illegal corpus, computer equipment and storage medium | |
WO2022105119A1 (en) | Training corpus generation method for intention recognition model, and related device thereof | |
CN112308173B (en) | Multi-target object evaluation method based on multi-evaluation factor fusion and related equipment thereof | |
CN110197426B (en) | Credit scoring model building method, device and readable storage medium | |
JP7499946B2 (en) | Method and device for training sorting model for intelligent recommendation, method and device for intelligent recommendation, electronic device, storage medium, and computer program | |
CN113887214B (en) | Willingness presumption method based on artificial intelligence and related equipment thereof | |
CN110930242B (en) | Reliability prediction method, device, equipment and storage medium | |
US10664457B2 (en) | System for real-time data structuring and storage | |
CN112182390A (en) | Letter pushing method and device, computer equipment and storage medium | |
CN113724058A (en) | Credit and loan limit evaluation method, device, equipment and storage medium | |
CN115545753A (en) | Partner prediction method based on Bayesian algorithm and related equipment | |
CN113936677A (en) | Tone conversion method, device, computer equipment and storage medium | |
CN113065354A (en) | Method for identifying geographic position in corpus and related equipment thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21904902; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 21904902; Country of ref document: EP; Kind code of ref document: A1 |