CN117196031A - Method and system for constructing customer demand cognition system - Google Patents

Method and system for constructing customer demand cognition system

Info

Publication number
CN117196031A
CN117196031A CN202311256271.9A
Authority
CN
China
Prior art keywords
similarity
client
pairs
extracted
distance matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311256271.9A
Other languages
Chinese (zh)
Inventor
朱亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinmao Cloud Technology Service Beijing Co ltd
Original Assignee
Jinmao Cloud Technology Service Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinmao Cloud Technology Service Beijing Co ltd filed Critical Jinmao Cloud Technology Service Beijing Co ltd
Priority to CN202311256271.9A priority Critical patent/CN117196031A/en
Publication of CN117196031A publication Critical patent/CN117196031A/en
Pending legal-status Critical Current

Links

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a method and a system for constructing a customer demand cognition system. The method comprises the following steps: acquiring historical customer consultation data of a project; extracting customer questions from the historical consultation data by means of an LLM model; vectorizing all the extracted questions and computing the similarity between every two questions; and clustering the extracted questions based on the similarity to obtain the customer demand cognition system. Because the extracted questions cover customer concerns comprehensively, salespeople can gain a complete picture of customer demand for the project and resolve customer needs more quickly.

Description

Method and system for constructing customer demand cognition system
Technical Field
The application relates to the technical field of customer demand cognition of projects, in particular to a method and a system for constructing a customer demand cognition system.
Background
In the current market environment, sales work faces a series of challenges, especially in marketing that spans both online and offline channels. Salespeople typically rely on accumulated personal experience to understand customer needs, but this approach makes it difficult to build a global picture of customers quickly. New salespeople need a long time to accumulate experience; even experienced salespeople are limited by their own history and struggle to form a good global understanding, and personal experience is hard to transfer to others, so questions raised by customers often cannot be answered in time. Taking real estate sales as an example, factors such as project location, price, target customers and brand characteristics determine the differing concerns of different customer groups. Sales teams of large real estate enterprises face even greater challenges, especially when projects are handed over between teams or a new project goes on sale.
In the prior art, a customer demand cognition system is built by first constructing a label system from expert experience, then supplementing and refining the label system by sampling user dialogue records, phone-call transcripts and the like, and finally expressing user demand with the label system. The specific steps are: 1. experts construct a label system; 2. users are sampled and their WeChat chat records, phone-call transcripts and so on are collected; 3. labels are extracted from the text, typically by word segmentation plus the TF-IDF algorithm, and the text content is mapped into the existing label system by word or vector matching; 4. the mapped words and any content that cannot be mapped are reviewed, and the label system is expanded. A user cognition system is thereby established.
However, first, constructing the label system depends on manual experience and iteration; such construction has limitations and a high labor cost, and the systems built by different experts differ, so universality and consistency are poor. Second, obtaining a more comprehensive set of labels requires continuously refining the label system, yet sampling users to supplement it also requires expert work and multiple iterations, which still consumes a great deal of manpower. Finally, labels express individual demand at a relatively coarse granularity, for example a demand label such as "first-home need" or "investment", or an intended house-type label such as "three-bedroom". Such short expressions are highly ambiguous and carry little information in a sales experience-accumulation scenario. For example, the reason behind a "first-home need" may be a child's schooling, a marriage home, or something else, and covering such fine granularity is difficult for experts.
Disclosure of Invention
Based on the above, a method for constructing a customer demand cognition system is provided to address the limitations of customer demand cognition systems constructed by the prior art.
In a first aspect, a method of constructing a customer demand cognition system for a project comprises:
acquiring historical customer consultation data of the project;
extracting customer questions from the historical customer consultation data by means of an LLM model;
vectorizing all the extracted questions, and computing the similarity between every two questions;
and clustering the extracted questions based on the similarity to obtain the customer demand cognition system.
In the above solution, optionally, computing the similarity between every two questions specifically comprises:
forming a question pair from each question and every other question, and constructing all question pairs into an N x N distance matrix, where N is the total number of questions;
and splitting the distance matrix by rows into n parts according to the number n of servers or cluster CPUs, and controlling each server or cluster CPU to compute the similarities of the question pairs of one part.
In the above solution, further optionally, splitting the distance matrix by rows into n parts comprises: calling the following formula with i taking the values 1, 2, 3, ..., n-1 in turn to compute the row numbers at which the distance matrix is to be split:
k(i+1)=[sqrt(k(i)*k(i)+N*N/n)-k(i)]
where k(i) is a row number at which to split, and k(1) = 1;
and splitting the distance matrix into n parts at the computed row numbers.
In the above solution, further optionally, after computing the similarity between every two questions, the method further comprises:
sorting the question pairs of each row of the distance matrix in descending order of similarity;
keeping the top a question pairs of each row of the distance matrix and deleting the remaining pairs;
and sorting the question pairs retained across all rows of the distance matrix in descending order of similarity.
In the foregoing solution, further optionally, clustering the extracted questions based on the similarity comprises:
step S301: assigning the top-ranked question pair to one class;
step S302: taking the question pair ranked in position b, where the initial value of b is 2;
step S303: judging whether either question of the pair in position b has already been classified; if so, assigning the unclassified question of the pair to the class of the classified question, and if not, assigning the pair in position b to a new class;
step S304: letting b = b + 1, and repeating step S303 until all question pairs are classified.
In the foregoing solution, further optionally, clustering the extracted questions based on the similarity further comprises:
step S501: vectorizing each class of questions, and computing the similarity between every two classes;
step S502: performing hierarchical iterative clustering on the classes based on the similarity;
step S503: repeating steps S501-S502 until the total number of classes is smaller than a class threshold or the similarity between any two classes is smaller than a similarity threshold.
In the above solution, further optionally, the class threshold is 80 and the similarity threshold is 0.65.
In the above aspect, optionally, the method further comprises:
after a new question is acquired, vectorizing the new question;
computing the similarity between every two existing classes and taking the minimum as a first similarity;
computing the similarity between the new question and each existing class and taking the maximum as a second similarity;
and judging whether the second similarity is larger than the first similarity; if so, assigning the new question to the class corresponding to the second similarity, and if not, placing the new question in a new class of its own.
In a second aspect, a system for constructing a customer demand cognition system for a project comprises:
a data acquisition module for acquiring historical customer consultation data of the project;
a customer question extraction module for extracting customer questions from the historical consultation data by means of an LLM model;
a similarity module for vectorizing all the extracted questions and computing the similarity between every two questions;
and a question classification module for clustering the extracted questions based on the similarity.
In a third aspect, a computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the method of constructing a customer demand cognition system of a project according to the first aspect.
The application has at least the following beneficial effects:
according to the application, the problems in the historical consultation data of the clients are extracted, and the extracted problems are clustered to classify the problems with different expressions and similar semantics into one class, so that a hierarchical system cognition is formed, and sales personnel can comprehensively and quickly know what the clients want to know about the project, so that more accurate answering operation can be prepared in advance. In addition, the client demand cognition system constructed by the application is based on a large number of client problems extracted from historical client consultation data, so that the extracted problems of attention of clients have global property, and the sales comprehensively know the demand cognition of the clients on the project so as to more rapidly solve the client demands.
The application also divides the similarity of all the problem pairs into a matrix, and divides the matrix according to the number of the servers so as to enable a plurality of servers to synchronously calculate the similarity of the problem pairs, thereby reducing the calculated amount and the memory usage amount.
The client demand cognition system established by the method can be perfected by continuously perfecting the client demand cognition system in the working process, and only by acquiring the client consultation data and extracting new problems and classifying the problems, the client demand cognition system can be perfected. Therefore, the system for perfecting the customer demand is simple in mode and adaptive, and does not cost too much labor cost.
Drawings
FIG. 1 is a flow chart of a method for constructing a project customer demand cognition system according to one embodiment of the present application;
FIG. 2 is a flow chart of a method for training an LLM small model according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating the steps of optimizing a pre-training model according to one embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in FIG. 1, a method of constructing a customer demand cognition system for a project is provided, comprising the following steps:
Step S101: acquiring historical customer consultation data of the project.
the historical customer consultation data is data in the process of historical sales and customer communication and can be divided into online communication and offline communication; all communication data are saved after the client agrees; for example, online communication includes various chat records and text records of telephone calls, so when recording the telephone communication process, the telephone recording can be performed with the consent of the client. The offline recording is mainly used for carrying out consultation by taking recording equipment such as a bill board and the like as a main recording client.
After the historical consultation data are obtained, they are preprocessed: the dialogues are sorted by time, customer unique identifiers, timestamps and other semantically irrelevant information are removed, and the text is arranged into the following format:
and (3) a client: XXXXX; sales: XXXXX
And (3) a client: XXXXX; sales: XXXXX
During optimization, spoken-language redundancy and the like are removed and the customer's points of concern are distilled. Whether to retain the sales answers is a trade-off, because speech-to-text deviations (homophones, environmental interference, etc.) cut both ways. Negative: multi-turn clarifying exchanges such as on-site small talk and follow-up questions may cause the model to extract questions by mistake. Positive: when the customer's utterances are mis-transcribed or missing, the extracted content loses information, and retaining the answers lets the LLM use the relevance between answer and question to help complete the question; transcription errors in live voice recordings are not a low-probability event. Data testing shows that the overall benefit of retaining the answer portion is positive. Where the quality and scenarios of the online and offline corpora differ significantly, the corpora can be segmented by scenario and knowledge extracted separately.
After the data are optimized, data of the same type and source are simply concatenated; for example, all chats of the same user may be merged directly.
Because of the LLM's input-length limit and its effect on extraction quality, the input content must be sliced. Longer slices carry more complete information, but large volumes of data strain LLM inference. Practical experience with large amounts of real-estate data shows that when each slice is kept to roughly 600 tokens, the completeness and readability of the extracted questions are comparatively good.
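As an illustration only, the following Python sketch shows one way the slicing might be done. The whitespace token count is a stand-in for the tokenizer of whichever LLM is used later, and closing slices at turn boundaries is an assumption rather than part of the method as described.

```python
# Minimal sketch: slice a cleaned dialogue transcript into ~600-token chunks.
# The whitespace "token" count is illustrative; a real pipeline would use the
# tokenizer of the LLM that extracts the questions in the next step.

def slice_dialogue(turns, max_tokens=600):
    """turns: list of strings like 'Customer: ...' / 'Sales: ...' in time order."""
    chunks, current, current_len = [], [], 0
    for turn in turns:
        n_tokens = len(turn.split())           # illustrative token count
        if current and current_len + n_tokens > max_tokens:
            chunks.append("\n".join(current))  # close the chunk at a turn boundary
            current, current_len = [], 0
        current.append(turn)
        current_len += n_tokens
    if current:
        chunks.append("\n".join(current))
    return chunks

turns = [
    "Customer: Is parking included in the price?",
    "Sales: Parking spaces are sold separately, about 150,000 each.",
]
print(slice_dialogue(turns, max_tokens=600))
```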
Step S102: extracting customer questions from the historical customer consultation data by means of the LLM model.
Specifically, either a trillion-parameter large LLM or a billion-parameter small LLM can be used for question extraction, in the following ways:
a. Using a trillion-parameter large model: with prompt instructions, directly instruct the large model to extract and output the customer questions from the question-answer pairs, and iterate to find a better prompt.
b. Where data confidentiality prevents calling a public-API large model, a billion-parameter small model can be deployed for extraction. Because a small model generalizes less well and understands and follows prompts less capably than a large model, a certain amount of fine-tuning is needed; empirically, roughly 3,000-5,000 high-quality training samples bring the small model close to the large model and achieve an acceptable effect. The specific steps are shown in FIG. 2.
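As an illustration of mode (a), a minimal Python sketch of prompt-based extraction follows. `call_llm` is a hypothetical placeholder for whichever large-model API or locally deployed fine-tuned small model is actually used, and the prompt wording is only an example.

```python
# Sketch of prompt-based question extraction (mode a).  call_llm() is a
# hypothetical placeholder for the chosen model endpoint (a public large-model
# API, or a locally deployed fine-tuned small model when data is confidential).

EXTRACTION_PROMPT = (
    "The following is a dialogue between a customer and a salesperson about a "
    "real-estate project.\n"
    "List, one per line, every question the customer asks about the project, "
    "rewritten as a short self-contained sentence.\n\n"
    "{dialogue}"
)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in the chosen LLM here")

def extract_questions(chunk: str) -> list[str]:
    """Send one ~600-token dialogue slice to the model; parse one question per line."""
    reply = call_llm(EXTRACTION_PROMPT.format(dialogue=chunk))
    return [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]
```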
Step S103: vectorizing all the extracted questions and computing the similarity between every two questions.
Specifically, the extracted questions can be vectorized with a pre-trained word-vector model; common pre-trained models such as word2vec, BERT and RoBERTa are available. The actual steps are as follows:
a. word-segment the question, e.g., "I love Beijing" is segmented into "I", "love", "Beijing";
b. vectorize each word with the pre-trained model, e.g., "I" is converted into a vector a = [0.233, 0.142, ...];
c. average the word vectors to obtain the question's sentence vector, e.g., (a + b + c)/3.
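A minimal Python sketch of steps a-c is given below for illustration. It assumes jieba for word segmentation and a word2vec-format vector file loaded with gensim; the file name is illustrative and any of the pre-trained models mentioned above could be substituted.

```python
import numpy as np
import jieba                                   # Chinese word segmentation
from gensim.models import KeyedVectors

# Illustrative path: any word2vec-format pre-trained vector file.
wv = KeyedVectors.load_word2vec_format("pretrained_vectors.txt")

def question_vector(question: str) -> np.ndarray:
    """Steps a-c: segment the question, look up word vectors, mean-pool them."""
    words = jieba.lcut(question)               # a. word segmentation
    vecs = [wv[w] for w in words if w in wv]   # b. word vectorization
    if not vecs:                               # out-of-vocabulary fallback
        return np.zeros(wv.vector_size)
    return np.mean(vecs, axis=0)               # c. mean of the word vectors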
Specifically, the similarity between every two questions can be computed as follows:
a. Form a question pair from each question and every other question and construct all pairs into an N x N distance matrix, where N is the number of questions. Since the matrix is symmetric, A(i, j) = A(j, i), only the similarities of the pairs on and above the main diagonal are computed, to save calculation time.
b. When the number of questions N is large, the N x N distance matrix A becomes huge and the computation time and memory consumption become prohibitive. The computation can therefore be performed in a multi-process manner, i.e., with multiple servers or a cluster: the distance matrix is split by rows into n parts according to the number n of servers or cluster CPUs, and each server or cluster CPU computes the similarities of the pairs on and above the main diagonal within its part.
c: specifically, the following formula is used for calculating the distance matrix segmentation line number,
k (i+1) = [ sqrt (k (i) ×k (i) +n×n/N) -k (i) ]; k (i) is the segmentation line number
i is sequentially 1, 2 and 3 … n-1, the formula is circularly called, and the value of k (i) is calculated. Finally, the distance matrix is segmented according to the calculated segmentation line number. Starting from line 1, i.e. k (1) =1; k (2) =sqrt (1+n×n/N) -1, approximately equal to N/sqrt (N); and (5) performing subsequent segmentation recursive calculation. The calculated segmentation line number is rounded to obtain the last line number to be segmented, i.e. k (2) = 2.3333 is calculated, i.e. k (2) =2 is taken.
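For illustration only, the following Python sketch shows one possible multi-process implementation of steps b and c. It assumes the bracketed quantity in the formula is the rounded number of rows each slice advances, so that each slice carries roughly 1/n of the triangular workload, and it pairs each row i with questions 0..i, which by symmetry covers the same unordered pairs as the on-and-above-diagonal description; the function names and these readings are assumptions, not part of the disclosure.

```python
import math
import numpy as np
from multiprocessing import Pool

def split_rows(N: int, n: int) -> list[int]:
    """0-indexed slice boundaries b[0]=0 <= ... <= b[n]=N.  One reading of
    k(i+1) = [sqrt(k(i)*k(i) + N*N/n) - k(i)]: the bracketed value is taken as
    the rounded number of rows in the next slice, so each slice holds roughly
    1/n of the triangular pair workload."""
    b = [0]
    for _ in range(n - 1):
        step = int(math.sqrt(b[-1] ** 2 + N * N / n) - b[-1])  # rows in this slice
        b.append(min(b[-1] + max(step, 1), N))
    b.append(N)
    return b

def block_similarities(args):
    """Cosine similarity of each pair within one row block.  Row i is paired with
    questions 0..i; by symmetry this covers each unordered pair exactly once."""
    vectors, row_start, row_end = args
    norms = np.linalg.norm(vectors, axis=1) + 1e-12
    out = []
    for i in range(row_start, row_end):
        sims = vectors[: i + 1] @ vectors[i] / (norms[: i + 1] * norms[i])
        out.extend((j, i, float(s)) for j, s in enumerate(sims))
    return out

def pairwise_similarities(question_vectors, n_workers=4):
    """Distribute the row blocks over n_workers processes (run under
    `if __name__ == "__main__":` on platforms that spawn new processes)."""
    V = np.asarray(question_vectors, dtype=float)
    b = split_rows(len(V), n_workers)
    jobs = [(V, b[p], b[p + 1]) for p in range(n_workers)]
    with Pool(n_workers) as pool:
        parts = pool.map(block_similarities, jobs)
    return [pair for part in parts for pair in part]   # [(question_a, question_b, sim), ...]
```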
Step S104: clustering the extracted questions based on the similarity to obtain the customer demand cognition system.
Specifically, all question pairs are sorted by similarity in descending order. In general, because the similarity is stored at high precision, two pairs with exactly the same similarity hardly ever occur; if such pairs do exist, their relative order is arbitrary.
Next, the top-ranked pair is assigned to one class. Then the pair ranked second is taken, and it is judged whether either of its questions has already been classified: if so, the unclassified question of the pair is assigned to the class of the classified one; if not, the second pair is assigned to a new class. The remaining pairs are processed in the same order until all N questions are classified. After all questions are classified, an index is built for each class to represent the attributes of that class of questions.
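A minimal Python sketch of this greedy scan (steps S301-S304) follows. The handling of a pair whose two questions are both already classified is not specified above and is assumed here to skip the pair; the data structures are illustrative.

```python
def cluster_from_pairs(sorted_pairs):
    """sorted_pairs: [(question_a, question_b, similarity), ...] in descending similarity.
    Greedy scan of steps S301-S304; returns {question_id: class_id}."""
    label, next_class = {}, 0
    for qa, qb, _sim in sorted_pairs:
        in_a, in_b = qa in label, qb in label
        if not in_a and not in_b:          # neither classified yet: found a new class
            label[qa] = label[qb] = next_class
            next_class += 1
        elif in_a and not in_b:            # join the already-classified partner's class
            label[qb] = label[qa]
        elif in_b and not in_a:
            label[qa] = label[qb]
        # if both are already classified, the pair is skipped (case not specified above)
    return label
```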
Finally, each class of questions is vectorized, for example by directly average-pooling the members of the cluster: for a(x) in Ci, the class vector is Ci = Σ a(x) / cn, where cn is the number of questions in the class, giving one vector per class. Steps S103-S104 are then applied to the class vectors for hierarchical iterative clustering, and iteration stops when the total number of classes is smaller than 80 or the similarity between classes falls below 0.65, which basically indicates that the classes are no longer similar. After each hierarchical iteration, an index is built for each class to represent that class of questions.
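A minimal Python sketch of the hierarchical iteration is given below, reusing `cluster_from_pairs` from the previous sketch. The stopping test uses the 80-class and 0.65-similarity values stated above; treating the highest remaining inter-class similarity as the quantity compared against 0.65 is an assumption.

```python
import numpy as np

def one_round(vectors):
    """One S103-S104 round: pairwise cosine similarities, descending sort, greedy grouping."""
    V = np.asarray(vectors, dtype=float)
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    sims = V @ V.T
    pairs = [(i, j, float(sims[i, j]))
             for i in range(len(V)) for j in range(i + 1, len(V))]
    pairs.sort(key=lambda p: p[2], reverse=True)
    best_sim = pairs[0][2] if pairs else 0.0
    return cluster_from_pairs(pairs), best_sim        # reuses the earlier sketch

def hierarchical_clustering(question_vectors, max_classes=80, min_sim=0.65):
    """Repeat the round on class vectors until fewer than max_classes classes remain
    or the highest remaining inter-class similarity drops below min_sim."""
    vectors, hierarchy = list(question_vectors), []
    while True:
        label, best_sim = one_round(vectors)
        hierarchy.append(label)
        if len(set(label.values())) < max_classes or best_sim < min_sim:
            return hierarchy
        # next level: each class is represented by the mean of its member vectors
        members = {}
        for item, cid in label.items():
            members.setdefault(cid, []).append(vectors[item])
        vectors = [np.mean(v, axis=0) for _, v in sorted(members.items())]
```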
After classification is completed, the customer demand cognition system is obtained. It groups and counts the questions raised by customers, so the kinds of questions customers ask about the project can be seen directly and clearly from the classification. When a project changes hands or a new salesperson joins, the constructed system lets the salesperson study and gain a comprehensive understanding of customer demand.
Based on the customer demand cognition system, salespeople can prepare corresponding answer scripts in advance, so that when a customer raises a question, the salesperson can organize the sales language effectively and answer more promptly and accurately, helping the customer understand the project better and helping the salesperson close business faster.
In this way, a complete cognition system of the questions customers care about can be constructed, typically containing 3-4 levels of question classification from coarse to fine. In particular, on real-world scene data, reference results are shown in Table 1: based on the user dialogues, the method quickly identifies classes of questions about price, after-sales service and parking spaces, and classifies and routes the questions accordingly.
TABLE 1
In the above method of constructing a customer demand cognition system for a project, the questions in the historical customer consultation data are extracted and clustered so that questions with different wording but similar semantics fall into one class, forming a hierarchical, systematic cognition from which sales can quickly learn what customers want to know about the project. In addition, because the system is built from a large number of customer questions extracted from historical consultation data, the extracted concerns are comprehensive, allowing sales to understand customer demand for the project fully and resolve customer needs more quickly.
In one embodiment, after computing the similarity between every two questions, the method further comprises:
sorting the question pairs of each row of the distance matrix in descending order of similarity;
keeping the top a question pairs of each row and deleting the remaining pairs; a may be 50;
and sorting the question pairs retained across all rows of the distance matrix in descending order of similarity.
The lower-ranked pairs of each row are deleted to reduce the workload of the overall sorting and clustering, because with high probability they would not be grouped into the same class anyway.
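A minimal Python sketch of the per-row pruning followed by the global sort, with a = 50 as stated above; the `(i, j, similarity)` pair representation is the same illustrative one used in the earlier sketches.

```python
import heapq

def prune_row(row_pairs, a=50):
    """Keep the a most similar pairs of one distance-matrix row (a = 50 above)."""
    return heapq.nlargest(a, row_pairs, key=lambda p: p[2])  # pairs are (i, j, similarity)

def prune_and_sort(rows, a=50):
    """Prune each row, then sort all surviving pairs globally by descending similarity."""
    kept = [pair for row in rows for pair in prune_row(row, a)]
    kept.sort(key=lambda p: p[2], reverse=True)
    return kept
```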
In one embodiment, the similarity of a question pair can be computed as the cosine distance of the two question vectors, and a pre-trained model that adapts well to the questions yields better vectorization. If the content is highly domain-specific, the pre-trained model may not vectorize the questions well, so the pre-trained model needs to be fine-tuned. Since the optimization objective is to strengthen the similarity of similar content, the vectorization is optimized in a double-tower manner, as shown in FIG. 3. The specific adjustment is as follows:
a. word-segment sentences A and B whose meanings are similar or completely dissimilar; for example, sentence A, "What are the advantages of same-floor drainage", is segmented into "same-floor drainage", "has", "which", "advantages";
b. the pre-trained model converts each word into a word vector; the example sentence A above becomes 4 word vectors, e.g., "same-floor drainage" is converted into a vector u1 = [0.983, 0.213, 0.321, ...] of 768 (or another number of) dimensions, and the others are u2, u3, u4;
c. pool the word vectors into a sentence vector by averaging: U = (u1 + u2 + u3 + u4)/4;
d. compute sentence B's vector in the same way: V = (v1 + v2 + v3 + ...)/m, where m is the number of word vectors in B;
e. compute the cosine similarity of U and V: cos(θ) = (U·V)/(|U||V|); a value of -1 means U and V point in exactly opposite directions (lowest similarity), while 1 means they point in the same direction (highest similarity);
f. through training, the parameters of the BERT model are fine-tuned against the computed cos(θ) so that similar A/B sentence pairs end up with a cosine value closer to 1 and dissimilar pairs closer to -1.
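A minimal Python sketch of the double-tower fine-tuning loop (steps a-f) is given below, using Hugging Face Transformers and PyTorch. The `bert-base-chinese` checkpoint, the learning rate and the shared-encoder (Siamese) setup are illustrative assumptions rather than details fixed by the text.

```python
import torch
from torch.nn import CosineEmbeddingLoss
from transformers import AutoModel, AutoTokenizer

# Illustrative choice of Chinese BERT checkpoint; any BERT-like encoder works here.
MODEL = "bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)

def sentence_vector(texts):
    """Steps b-d: encode the tokens and mean-pool them into sentence vectors U / V."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**enc).last_hidden_state          # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)        # mean over real tokens only

# Steps e-f: CosineEmbeddingLoss raises cos(U, V) for +1 pairs and lowers it for -1 pairs.
loss_fn = CosineEmbeddingLoss()
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)

def train_step(sentences_a, sentences_b, labels):
    """labels: tensor of +1 (similar) / -1 (dissimilar) for each A/B pair."""
    u, v = sentence_vector(sentences_a), sentence_vector(sentences_b)
    loss = loss_fn(u, v, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Both towers share the same encoder, which is the usual Siamese arrangement for this kind of similarity fine-tuning.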
In one embodiment, for a new question: after the new question is acquired, it is vectorized to obtain a vector a;
the similarity between every two existing classes C(i) is computed and the minimum is taken as a first similarity min(i);
the similarity between the new question vector a and each existing class C(i) is computed and the maximum is taken as a second similarity max(i);
it is then judged whether the second similarity is larger than the first; if so, the new question is assigned to the class corresponding to the second similarity, and if not, the new question is placed in a new class of its own.
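A minimal Python sketch of this assignment rule follows, assuming cosine similarity and that each existing class is represented by its class vector C(i) as described above.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def assign_new_question(a_vec, class_vectors):
    """class_vectors: {class_id: C(i)} for the built system (at least two classes)."""
    ids = list(class_vectors)
    # first similarity: minimum similarity between any two existing classes
    first = min(cosine(class_vectors[ids[x]], class_vectors[j])
                for x in range(len(ids)) for j in ids[x + 1:])
    # second similarity: the new question's best match among the existing classes
    best_id, second = max(((cid, cosine(a_vec, c)) for cid, c in class_vectors.items()),
                          key=lambda t: t[1])
    return best_id if second > first else None   # None: start a new class for this question
```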
The application has the following beneficial technical effects:
1. LLM-based question extraction combined with vectorization and multi-level clustering is proposed. Customer questions can be extracted quickly from massive sales dialogue text, forming a global, hierarchical, systematic cognition of the questions customers care about.
2. Compared with common methods such as k-means with Euclidean distance, the unsupervised approach adopted here adapts to complex scenes with many categories and hierarchical classification logic.
3. For performance under large data volumes, a fast distributed/multi-process computation method is provided, enabling fast output of results.
In one embodiment, a system for constructing a customer demand cognition system for a project is provided, comprising:
a data acquisition module for acquiring historical customer consultation data of the project;
a customer question extraction module for extracting customer questions from the historical consultation data by means of an LLM model;
a similarity module for vectorizing all the extracted questions and computing the similarity between every two questions;
and a question classification module for clustering the extracted questions based on the similarity.
For specific limitations on the system for constructing a customer demand cognition system of a project, reference may be made to the limitations on the corresponding method above, which are not repeated here. The modules of the system can be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a client demand awareness architecture method of building an item as described above.
In an embodiment, a computer readable storage medium is also provided, on which a computer program is stored that, when executed, implements all or part of the flow of the methods in the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the application; they are described in detail but are not to be construed as limiting its scope. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, and these all fall within its protection scope. Accordingly, the scope of protection of the present application is defined by the appended claims.

Claims (10)

1. A method of constructing a customer demand cognition system, the method comprising:
acquiring historical customer consultation data of a project;
extracting customer questions from the historical customer consultation data by means of an LLM model;
vectorizing all the extracted questions, and computing the similarity between every two questions;
and clustering the extracted questions based on the similarity to obtain the customer demand cognition system.
2. The method of constructing a customer demand cognition system according to claim 1, wherein computing the similarity between every two questions specifically comprises:
forming a question pair from each question and every other question, and constructing all question pairs into an N x N distance matrix, where N is the total number of questions;
and splitting the distance matrix by rows into n parts according to the number n of servers or cluster CPUs, and controlling each server or cluster CPU to compute the similarities of the question pairs of one part.
3. The method of constructing a customer demand cognition system according to claim 2, wherein splitting the distance matrix by rows into n parts comprises: calling the following formula with i taking the values 1, 2, 3, ..., n-1 in turn to compute the row numbers at which the distance matrix is to be split:
k(i+1)=[sqrt(k(i)*k(i)+N*N/n)-k(i)]
where k(i) is a row number at which to split, and k(1) = 1;
and splitting the distance matrix into n parts at the computed row numbers.
4. The method of constructing a customer demand cognition system according to claim 3, wherein after computing the similarity between every two questions, the method further comprises:
sorting the question pairs of each row of the distance matrix in descending order of similarity;
keeping the top a question pairs of each row of the distance matrix and deleting the remaining pairs;
and sorting the question pairs retained across all rows of the distance matrix in descending order of similarity.
5. The method according to claim 4, wherein clustering the extracted questions based on the similarity comprises:
step S301: assigning the top-ranked question pair to one class;
step S302: taking the question pair ranked in position b, where the initial value of b is 2;
step S303: judging whether either question of the pair in position b has been classified; if so, assigning the unclassified question of the pair to the class of the classified question, and if not, assigning the pair in position b to a new class;
step S304: letting b = b + 1, and repeating step S303 until all question pairs are classified.
6. The method according to claim 5, wherein clustering the extracted questions based on the similarity further comprises:
step S501: vectorizing each class of questions, and computing the similarity between every two classes;
step S502: performing hierarchical iterative clustering on the classes based on the similarity;
step S503: repeating steps S501-S502 until the total number of classes is smaller than a class threshold or the similarity between any two classes is smaller than a similarity threshold.
7. The method according to claim 6, wherein the class threshold is 80 and the similarity threshold is 0.65.
8. The method of constructing a customer demand cognition system according to claim 1, further comprising:
after a new question is acquired, vectorizing the new question;
computing the similarity between every two classes, and taking the minimum as a first similarity;
computing the similarity between the new question and each existing class, and taking the maximum as a second similarity;
and judging whether the second similarity is larger than the first similarity; if so, assigning the new question to the class corresponding to the second similarity, and if not, placing the new question in a new class of its own.
9. A system for constructing a customer demand cognition system, the system comprising:
a data acquisition module for acquiring historical customer consultation data of a project;
a customer question extraction module for extracting customer questions from the historical customer consultation data by means of an LLM model;
a similarity module for vectorizing all the extracted questions and computing the similarity between every two questions;
and a question classification module for clustering the extracted questions based on the similarity.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 8.
CN202311256271.9A 2023-09-26 2023-09-26 Method and system for constructing customer demand cognition system Pending CN117196031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311256271.9A CN117196031A (en) 2023-09-26 2023-09-26 Method and system for constructing customer demand cognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311256271.9A CN117196031A (en) 2023-09-26 2023-09-26 Method and system for constructing customer demand cognition system

Publications (1)

Publication Number Publication Date
CN117196031A 2023-12-08

Family

ID=89005195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311256271.9A Pending CN117196031A (en) 2023-09-26 2023-09-26 Method and system for constructing customer demand cognition system

Country Status (1)

Country Link
CN (1) CN117196031A (en)

Similar Documents

Publication Publication Date Title
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
WO2020147428A1 (en) Interactive content generation method and apparatus, computer device, and storage medium
CN110377759B (en) Method and device for constructing event relation graph
US20190073416A1 (en) Method and device for processing question clustering in automatic question and answering system
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN111325029B (en) Text similarity calculation method based on deep learning integrated model
CN111708869B (en) Processing method and device for man-machine conversation
CN109857846B (en) Method and device for matching user question and knowledge point
CN111339277A (en) Question-answer interaction method and device based on machine learning
CN115470338B (en) Multi-scenario intelligent question answering method and system based on multi-path recall
CN113157863A (en) Question and answer data processing method and device, computer equipment and storage medium
CN113239169A (en) Artificial intelligence-based answer generation method, device, equipment and storage medium
WO2023045184A1 (en) Text category recognition method and apparatus, computer device, and medium
Windiatmoko et al. Developing facebook chatbot based on deep learning using rasa framework for university enquiries
CN112115252A (en) Intelligent auxiliary writing processing method and device, electronic equipment and storage medium
CN116186219A (en) Man-machine dialogue interaction method, system and storage medium
CN108959327B (en) Service processing method, device and computer readable storage medium
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
CN117196031A (en) Method and system for constructing customer demand cognition system
CN114492452A (en) Method, device and equipment for training and appealing switching of pre-training language model
CN110750989A (en) Statement analysis method and device
CN117493529B (en) Anthropomorphic dialogue method and device based on natural language model and electronic equipment
CN115618968B (en) New idea discovery method and device, electronic device and storage medium
CN115544229A (en) Intelligent customer service question and answer method and device, electronic equipment and storage medium
CN117033435A (en) Service complaint processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination