CN114398512A - Big data-based voice portrait analysis method for communication operator business customer - Google Patents

Big data-based voice portrait analysis method for communication operator business customer

Info

Publication number
CN114398512A
CN114398512A (application CN202110989375.5A)
Authority
CN
China
Prior art keywords
user
data
voice
service
analysis method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110989375.5A
Other languages
Chinese (zh)
Inventor
刘卫平
王福君
樊炳恒
吴金燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongyun Jinnuo Technology Co ltd
Original Assignee
Beijing Zhongyun Jinnuo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongyun Jinnuo Technology Co ltd filed Critical Beijing Zhongyun Jinnuo Technology Co ltd
Priority to CN202110989375.5A
Publication of CN114398512A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/64 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0281 Customer communication at a business location, e.g. providing product or service information, consulting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Strategic Management (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Accounting & Taxation (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)

Abstract

The invention discloses a big data-based voice portrait analysis method for communication operator industry customers, which comprises the following steps: step 1, collecting voice data of the user and the agent during a call, and converting the voice data into text data; step 2, performing word segmentation and feature selection on the text data, establishing a feature vector for each segmented word, and modeling the data; step 3, automatically clustering words according to the feature vector of each segmented word, and, after the words are clustered, applying classification labels according to the cluster semantic labels; step 4, identifying the user intention from the classification labels through voice services, and calculating user tag values through a tag model; and step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes. Beneficial effects: applying the tag model to the user forms a multi-dimensional user portrait, so that the agent can learn the customer's information before communicating with the customer, provide targeted service, improve the customer's perception of service and reduce the complaint rate.

Description

Big data-based voice portrait analysis method for communication operator business customer
Technical Field
The invention relates to the field of communication, in particular to a big data-based voice portrait analysis method for communication operator industry customers.
Background
Against the background of the mobile internet, customers' demands for entertainment, emotion, efficiency and excellent experience have become the core direction driving continuous innovation in technology, applications, terminals and services, and the transformation of enterprise business models that this brings has also become a key driving force for service innovation. With the explosive growth of data volume and the maturing of big data technology, more and more customer behavior data can be captured, so that the user portrait can truly become a valuable portrait.
Based on massive user voice data, the method mines and defines customer appeal behaviors and portraits, extends information such as the user's marketing tendency, complaint tendency, consultation preference, product interest and business handling behavior, describes user characteristics in an all-round way, and provides comprehensive data support for service and marketing operation and maintenance activities. It makes it possible to hear the customer's voice in time, improve service capability and promote business process innovation.
At present, in customer portrait construction, operators mostly portray users with structured information; such portraits cannot comprehensively reflect users' individuality and requirements and cannot provide personalized service experiences during the service process. Meanwhile, the traditional marketing and maintenance mode of operators suffers from unclear targets, directions and rhythms, and the huge investment of operation and maintenance resources causes resource waste.
Disclosure of Invention
The invention provides a big data-based voice portrait analysis method for communication operator business customers, which greatly improves the marketing success rate, reduces user maintenance costs and lightens the labor intensity of customer service personnel, so that the agent understands the customer's business and can provide targeted, personalized service.
A big data-based voice portrait analysis method for communication operator business customers comprises the following steps:
step 1, collecting voice data of the user and the agent during a call, and converting the voice data into text data;
step 2, performing word segmentation and feature selection on the text data through a distributed message system based on the data set, forming a feature vector for each segmented word, and modeling the data;
step 3, automatically clustering words according to the feature vector of each segmented word, and, after the words are clustered, applying classification labels according to the cluster semantic labels;
step 4, identifying the user intention from the classification labels through voice services, and calculating user tag values through a tag model;
step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes;
step 6, verifying the accuracy of the user portrait through a verification model;
and step 7, analyzing the user portrait set to generate visual multi-dimensional reports.
In step 1, the user voice data needs to be cleaned and preprocessed before being converted into text data.
In step 1, ASR is used to transcribe the user recording data, and recognition adopts a deep neural network acoustic model to complete the transcription of the voice recording into dialog text for semantic analysis.
Wherein, in step 2, the data set comprises a user behavior database, a system database, a corpus and a lexicon:
user behavior database: data on the business habits and preferences of users;
system database: basic user information and basic service information data;
corpus: a periodic user portrait model formed from historical dialog texts between users and agents;
lexicon: the operator industry product lexicon and service lexicon.
In step 2, word segmentation is performed with an HMM algorithm; feature selection on the data is performed with the TF-IDF and LDA algorithms so that the text data can be computed; feature vectors of the segmented words are built with the word2vec algorithm; data modeling of the featurized data is performed with a CNN algorithm; and the user tag values are calculated through the data model and a classification model.
In step 3, the semantic tags used for classification labelling after the user dialog text is segmented include: business acceptance semantic tags, complaint semantic tags, business consultation semantic tags, business query semantic tags, fault semantic tags and attitude semantic tags.
In step 4, user intention recognition is performed through a user intention classification model, and the user intention classification model classifies user intentions into the complaint high-risk type, traffic-sensitive type, product-sensitive type, password-sensitive type, complaint type and marketing-refusal type.
In step 4, the tag model of the user includes: basic feature tags, product requirement tags, business feature tags, consumption feature tags, channel feature tags, terminal preference tags, user service evaluation tags, position tags and internet content preference tags.
In step 4, the user tags have four update levels, namely day level, week level, month level and year level, and the tags of different user portraits are updated according to their different update requirements.
In step 5, the user portrait set includes a product portrait, a classification portrait, a complaint portrait, a consultation service portrait, a business tendency portrait and a consumption tendency portrait.
The invention has at least the following beneficial effects:
By applying the tag model to the user, a multi-dimensional user portrait is formed, so that the agent can conveniently learn the customer's information before communicating with the customer, provide targeted service, improve the customer's perception of service, reduce the complaint rate, improve the marketing success rate, reduce user maintenance costs and assist product operation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow block diagram of the big data-based voice portrait analysis method for communication operator business customers according to the present invention;
FIG. 2 is an application architecture diagram of the big data-based voice portrait analysis method for communication operator business customers according to the present invention;
FIG. 3 is a schematic diagram of the relationships between user tags in the big data-based voice portrait analysis method for communication operator business customers according to the present invention;
FIG. 4 shows the user tag weight calculation formula of the big data-based voice portrait analysis method for communication operator business customers according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A big data-based voice portrait analysis method for communication operator business customers comprises the following steps:
step 1, collecting voice data of the user and the agent during a call, and converting the voice data into text data;
step 2, performing word segmentation and feature selection on the text data through a distributed message system based on the data set, forming a feature vector for each segmented word, and modeling the data;
step 3, automatically clustering words according to the feature vector of each segmented word, and, after the words are clustered, applying classification labels according to the cluster semantic labels;
step 4, identifying the user intention from the classification labels through voice services, and calculating user tag values through a tag model;
step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes;
step 6, verifying the accuracy of the user portrait through a verification model, and, when the accuracy is lower than a threshold value, repeating step 4 to recalculate the user tags;
and step 7, analyzing the user portrait set to generate visual multi-dimensional reports.
The distributed message system in step 2 comprises:
portrait application: the customer voice image can be applied to all links of the camp service, and specifically comprises the following steps: service prediction, accurate service, incoming call maintenance, product configuration and value improvement.
Data visualization: user portrait data created by the analysis processing service is presented in a multi-dimensional report form.
Analysis processing service: the method comprises cross analysis, text classification, text clustering and thematic analysis.
And (3) cross analysis: the structured fields of the data are selected for multi-dimensional cross analysis, so that the distribution, the contrast, the variation trend and the like of the analysis subject data on the known dimension can be rapidly known.
Text classification: and according to the keyword modeling, the detailed text data of each record is matched, and the detailed text data is automatically classified into various sets, so that the rapid classification of mass data is realized.
Text clustering: and automatically collecting according to semantic understanding of the detail texts in the analysis data to form different unknown collections, so as to realize rapid classification and collection of mass data.
Analysis of special subjects: and performing deep multi-level analysis on the analysis data by comprehensive rule analysis and intelligent analysis, performing root research on the analysis theme, and supporting the analysis result to be brought into a corpus for knowledge precipitation.
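As a minimal illustration of the text clustering service described above, the sketch below groups a handful of dialog texts with TF-IDF vectors and k-means. The sample texts, the cluster count and the choice of k-means are illustrative assumptions; the patent does not name a specific clustering algorithm.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Illustrative detail texts; in practice these are the transcribed dialogs.
texts = [
    "broadband fault, cannot access the internet",
    "consult the tariff of the data traffic package",
    "broadband fault, please arrange a repair",
    "change my package to a 5G package",
]

vectors = TfidfVectorizer().fit_transform(texts)                  # vectorize
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(list(labels))   # fault-related texts should fall into the same cluster
```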
The voice services include: speech recognition, silence recognition, emotion analysis, scene segmentation, semantic understanding and full-text transcription.
Speech recognition: voice call traffic is converted in real time into speaker-separated dialog pairs;
Silence recognition: silence on the user's side during the call is identified;
Emotion recognition: the emotions of the agent and the user are recognized;
Scene segmentation: the call consists of several scenes, and the different scenes are segmented;
Semantic understanding: the user's intention is understood;
Full-text transcription: call recordings are transcribed into dialog text offline.
Basic data layer: includes the data set used for modeling.
Through the multi-dimensional reports of the user portrait, service prediction is provided before and while the human agent answers the call, so that service, marketing and maintenance can be carried out more accurately.
In this embodiment, in step 1, the user voice data is cleaned and preprocessed before being converted into text data. Cleaning the data means cleaning the customer dialog recordings, removing empty audio, screening out audio that meets the requirements, encoding and classifying the data, and removing abnormal values, completing missing values and removing repeated values.
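A minimal sketch of this cleaning step, assuming the call records are available as a pandas DataFrame; the column names (call_id, audio_path, duration_sec, agent_id) are illustrative and not taken from the patent.

```python
import pandas as pd

def clean_call_records(df: pd.DataFrame) -> pd.DataFrame:
    """Clean customer call records before ASR transcription (illustrative)."""
    df = df.dropna(subset=["audio_path"])              # drop records with no audio
    df = df[df["duration_sec"] > 0]                    # drop empty (null) audio
    df = df.drop_duplicates(subset=["call_id"])        # remove repeated records
    df = df[df["duration_sec"] < 4 * 60 * 60]          # remove abnormal durations
    df["agent_id"] = df["agent_id"].fillna("unknown")  # complete missing values
    return df
```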
In this embodiment, in step 1, ASR transcribes the user recording data, recognition adopts a deep neural network acoustic model to complete the transcription of the voice recording into dialog text for semantic analysis, and the transcribed user voice is passed through the semantic model to obtain the user semantics.
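A minimal sketch of the transcription step. The patent only states that ASR with a deep neural network acoustic model is used, so the `asr_client.recognize()` interface and the speaker-diarization option below are hypothetical placeholders for whatever engine is actually deployed.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    speaker: str      # "user" or "agent"
    text: str
    start_sec: float
    end_sec: float

def transcribe_call(audio_path: str, asr_client) -> List[Utterance]:
    """Turn one call recording into speaker-separated dialog text (sketch)."""
    # Hypothetical ASR interface; replace with the real engine's API.
    result = asr_client.recognize(audio_path, enable_speaker_diarization=True)
    return [
        Utterance(seg["speaker"], seg["text"], seg["start"], seg["end"])
        for seg in result["segments"]
    ]
```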
In this embodiment, in step 2, the data set comprises a user behavior database, a system database, a corpus and a lexicon:
user behavior database: data on the business habits and preferences of users, such as commonly used websites and APPs;
system database: basic user information and basic service information data, such as gender, package and monthly consumption;
corpus: a periodic user portrait model formed from historical dialog texts between users and agents;
lexicon: the operator industry product lexicon and service lexicon, for example ice cream packages and happy shopping.
Example of the product thesaurus: 3G package, Internet financing, bee card, Xinlang V card, Baidu Shen card, unlimited additional product, 4G package, flow rate limitation unlimited, hungry card, Taobao smooth card, ant treasure card, flow rate limitation relieveable, M2M connection service, prepaid product package Mei Tuan card, drip orange card, Tengwang card, voice unlimited, external enterprise payment service, post-paid product package, beep li card, recruit card, drip Wang card, high-value old user smooth experience product, Wo wallet, nail card, enantio card, Jingdong strong card, smooth-crossing ice cream product.
A service word bank: the method comprises the following steps of fixed network basic voice service, internet access service, campus fusion, mobile phone internet surfing, data VPN, fixed network electronic payment, wireless local telephone service, data and network element service, 2I fusion, three-way calling, short message service, video conference, public telephone service, ICT service, caller ID, incoming call restriction, voice VPN, fusion information service, telephone card service, advertisement service, incoming call highlight, call hold, personalized ring back tone, internet payment, unlimited fusion, limited fusion, caller display prohibition, call transfer, call center and mobile phone media.
In this embodiment, in step 2, word segmentation is performed with an HMM algorithm; feature selection on the data is performed with the TF-IDF and LDA algorithms so that the text data can be computed; feature vectors of the segmented words are constructed with algorithms such as word2vec, Deeplearning4j, fastText and LDA; the text is classified and clustered, and the classification and clustering results are used by the data clustering service and the text classification service; data modeling of the featurized data can be performed with algorithms such as fastText, CNN and THUCTC; and the user tag values are calculated through the data model and a classification model.
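A minimal sketch of this step-2 pipeline under stated assumptions: jieba stands in for the HMM-based segmenter (its unknown-word handling uses an HMM), scikit-learn's TfidfVectorizer for TF-IDF feature selection, and gensim's Word2Vec for the word vectors. The LDA, Deeplearning4j and fastText stages are omitted here, and the two sample dialogs are invented.

```python
import jieba                                           # HMM-based Chinese segmentation
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

# Invented sample dialogs; in practice these come from the ASR transcripts.
dialogs = [
    "我想咨询一下流量包的资费",
    "宽带故障无法上网，请帮我报修",
]

# Word segmentation (a custom operator lexicon could be added with
# jieba.load_userdict()).
segmented = [jieba.lcut(text, HMM=True) for text in dialogs]

# TF-IDF feature selection over the already-segmented documents.
tfidf = TfidfVectorizer(analyzer=lambda doc: doc)      # each doc is a token list
tfidf_matrix = tfidf.fit_transform(segmented)

# Feature vector for each segmented word via word2vec.
w2v = Word2Vec(sentences=segmented, vector_size=100, window=5,
               min_count=1, workers=2)
first_word = segmented[0][0]
print(first_word, w2v.wv[first_word][:5])              # first 5 vector components
```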
Introduction to the algorithms applied:
HMM algorithm: applied in the word segmentation process; it determines the hidden parameters of a process from the observable parameters and then uses these parameters for further analysis.
TF-IDF algorithm: applied in the feature selection process to evaluate the importance of a word to a document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency in the corpus.
LDA algorithm: applied in the feature selection, feature vector and data clustering steps; it is a document topic generation model, also called a three-layer Bayesian probability model, with a three-layer structure of words, topics and documents.
word2vec algorithm: applied in the feature vector step; it is a model for generating word vectors, a shallow two-layer neural network. A word2vec model maps each word to a vector that can represent the relationships between words; the vector is the hidden layer of the neural network.
Deeplearning4j algorithm: applied in the feature vector step; it broadly supports the operation frameworks of various deep learning algorithms and can implement the word2vec technique.
fastText algorithm: applied in the feature vector and data modeling steps; a tool for word vector computation and text classification.
CNN algorithm: applied in the data modeling step; after the data is vectorized, data modeling is carried out through a CNN, and the mapping capability from input to output is formed by learning a large number of voice dialog texts (a minimal sketch follows this list).
THUCTC algorithm: applied in the data modeling step; it automatically and efficiently implements training, evaluation and classification on a user-defined text classification corpus.
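A minimal sketch of the CNN modeling step referenced in the list above. The patent does not give the network structure, so the embedding size, kernel sizes, filter counts and the use of PyTorch are illustrative assumptions; the six output classes mirror the six semantic tag families of step 3.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Toy convolutional classifier over segmented-word embeddings."""

    def __init__(self, vocab_size: int, embed_dim: int = 100,
                 num_classes: int = 6, kernel_sizes=(2, 3, 4),
                 num_filters: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids).transpose(1, 2)     # (batch, embed, seq)
        pooled = [torch.relu(conv(x)).max(dim=2).values   # max over time
                  for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))          # class logits

model = TextCNN(vocab_size=50_000)
logits = model(torch.randint(0, 50_000, (8, 120)))        # 8 dialogs, 120 tokens
print(logits.shape)                                       # torch.Size([8, 6])
```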
After the user tags are calculated, the weights of the user tags in the portrait are determined; the user portrait has different tag weights in different scenarios.
User tag weights are calculated with the TF-IDF algorithm. For example, suppose there are 3 users and 5 tags (as shown in FIG. 3); the relationship between tags and users reflects, to some extent, the relationship between the tags. Let w(P, T) denote the number of times tag T is used to mark user P. TF(P, T) represents the proportion of this tag among all taggings of user P, and the formula is:
TF(P, T) = w(P, T) / Σ_T′ w(P, T′), where the sum runs over all tags T′ attached to user P.
As shown in FIG. 3, if user A is marked with tag a 6 times, tag b 4 times and tag c 2 times, the TF of tag a on user A is 6/(6+4+2).
The corresponding IDF(P, T) indicates the scarcity of tag T among all taggings, i.e., the probability of the tag's occurrence. If a tag T has a small probability of occurrence and is nevertheless used to mark a user, the relationship between that user and tag T is tighter. The formula is:
IDF(P, T) = log( Σ_P′ Σ_T′ w(P′, T′) / (1 + Σ_P′ w(P′, T)) ), i.e., the logarithm of the total number of taggings over all users and tags divided by one plus the number of times tag T is used over all users.
The weight value of the user's tag can then be obtained from TF and IDF. This weight does not yet consider the business scenario; obviously the user tag weight also needs to take into account the business scenario, how long ago the tag was generated, the number of times the user generated the tag, and so on. The calculation formula is shown in FIG. 4.
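The full weighting formula is only given in FIG. 4, which is not reproduced in the text, so the sketch below merely combines the TF and IDF terms defined above with an illustrative exponential time-decay factor; the half-life value and the tag counts are assumptions.

```python
import math

# w[(user, tag)] = number of times `tag` has been attached to `user` (invented).
w = {("A", "a"): 6, ("A", "b"): 4, ("A", "c"): 2,
     ("B", "a"): 1, ("B", "d"): 3, ("C", "e"): 5}

def tf(user: str, tag: str) -> float:
    user_total = sum(cnt for (u, _), cnt in w.items() if u == user)
    return w.get((user, tag), 0) / user_total

def idf(tag: str) -> float:
    total = sum(w.values())
    tag_total = sum(cnt for (_, t), cnt in w.items() if t == tag)
    return math.log(total / (1 + tag_total))

def tag_weight(user: str, tag: str, days_since_last_use: float,
               half_life_days: float = 30.0) -> float:
    # Illustrative time decay standing in for the scenario/time factors of FIG. 4.
    decay = 0.5 ** (days_since_last_use / half_life_days)
    return tf(user, tag) * idf(tag) * decay

print(tf("A", "a"))                                   # 6 / (6 + 4 + 2) = 0.5
print(tag_weight("A", "a", days_since_last_use=10))
```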
In this embodiment, in step 3, the semantic tags used for classification labelling after the user dialog text is segmented include: business acceptance semantic tags, complaint semantic tags, business consultation semantic tags, business query semantic tags, fault semantic tags and attitude semantic tags.
Examples of semantic tags are:
the service acceptance type semantics comprise: flow package, main and auxiliary card service, fixed telephone new installation, package change, new installation integration, call transfer, special service function, package change integration, and emergency shutdown and startup.
Complaint class semantics include: the method comprises the following steps of failing to collect short messages, solving the problem of telephone operator service skills, failing to pay account, failing to use services normally, disputing of value-added fees, failing to access the internet, real-name system of client data, solving the problems of certificate blacklist, disputing of traffic fees, failing to talk, harassment, fraud and halt, failing to delay and fail to take effect due to untimely service processing and double capping of traffic.
The business consultation semantics comprise: package consultation, number portability, voyage cloud disk, machine dismantling, international roaming, privilege and exclusive flow, charge storage and delivery machine, remote card supplementing, fixed-width charge, service password, point exchange, handling procedures, charge storage and delivery charge, remote number sale, broadband package year charge, junk information communication fraud consultation, electronic invoice, user passing, 2l package charge, product consultation, fixed-network delivery machine, voyage video, one-card charge and international long distance.
The business query class semantics include: balance inquiry, activity rebate and expiration time inquiry, local phone number inquiry, detailed phone charge inquiry, arrears reason inquiry, business validation inquiry, business hall information inquiry, point inquiry, fixed-broadband account password inquiry, and account balance and expiration time inquiry.
The failure class semantics include: fixed line fault, system upgrade, broadband fault, large area fault.
The attitude class semantics include: bad attitude, friendly attitude, bad temper and mild attitude.
In this embodiment, in step 4, user intention recognition is performed through a user intention classification model, and the user intention classification model classifies user intentions into the complaint high-risk type, traffic-sensitive type, product-sensitive type, password-sensitive type, complaint type and marketing-refusal type.
In this embodiment, in step 4, the tag model of the user includes:
Basic feature tag: describes customer attributes and the corresponding social relations from the perspective of the natural person;
Product requirement tag: analyzes, from the incoming voice data, the Unicom products ordered by the user, including participation in contract plans and the customer's tendencies in choosing marketing activities;
Business feature tag: analyzes the user's usage and calling circle from incoming consultation voice, data traffic, short messages and other aspects;
Consumption feature tag: describes the composition of the user's expenditure and income, settlement and payment, and credit-related information;
Channel feature tag: describes the channels and channel preference information in customer service contacts;
Terminal preference tag: describes the user's terminal usage information and terminal preference information, obtained through incoming consultations and business handling;
User service evaluation tag: describes customer value and the customer's satisfaction with the service from marketing, maintenance and other aspects;
Position tag: records the user's movements and base-station usage track;
Internet content preference tag: classifies internet content to describe the customer's internet-surfing behavior preference.
For example, customer complaint information belongs to the user service evaluation tag, consultation information belongs to the product requirement tag, tariff and historical bill information belong to the consumption feature tag, and the customer's age and gender belong to the basic feature tag.
The user's behavior data is mapped to the user tag system to obtain rules and weights: the semantic tags of the user's call behavior are mapped to the user tag system, the mapping rules and the weights of the different mapping relations are obtained through the tag model, an accurate user portrait is constructed, and the feature tag values and tag weights are calculated.
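A minimal sketch of mapping call-behavior semantic tags to the user tag system. The actual mapping rules and weights come from the tag model and are not disclosed, so the dictionary entries and weights below are invented placeholders.

```python
# Invented mapping rules: call-behavior semantic tag -> (tag-system tag, weight).
SEMANTIC_TO_TAG = {
    "complaint":             ("user service evaluation tag", 0.9),
    "business consultation": ("product requirement tag",     0.6),
    "business acceptance":   ("business feature tag",        0.7),
    "attitude":              ("user service evaluation tag", 0.4),
}

def map_call_semantics(semantic_labels: list) -> dict:
    """Aggregate the semantic labels of one call into tag-system weights."""
    weights: dict = {}
    for label in semantic_labels:
        tag, wgt = SEMANTIC_TO_TAG.get(label, (None, 0.0))
        if tag is not None:
            weights[tag] = weights.get(tag, 0.0) + wgt
    return weights

print(map_call_semantics(["complaint", "attitude"]))
```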
User portrait construction requires first collecting and cleaning the customer voice through steps 1 and 2 and then calculating the user's tag values through the algorithm combination: the customer voice data is collected through the distributed message system, the data set is established, and the dialog is segmented with the operator lexicon.
In this embodiment, in step 4, different tag attributes of the operator's user tags have different update requirements; the user tags have four update levels, namely day level, week level, month level and year level, and the tags of different user portraits are updated according to their different update requirements (an illustrative refresh-scheduling sketch follows this list).
Day-level tags: mood-type tags, for example: irritable, mild;
Week-level tags: product demand tags, for example: data traffic package;
Month-level tags: package-level tags, for example: 5G, traffic package, supplementary card handling;
Year-level tags: customer basic attribute tags, for example: age.
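A minimal scheduling sketch for the four update levels listed above; the concrete intervals (1 day, 1 week, 30 days, 365 days) are an assumption that simply mirrors the level names.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed refresh intervals for the four update levels described above.
UPDATE_INTERVAL = {
    "day":   timedelta(days=1),      # mood-type tags
    "week":  timedelta(weeks=1),     # product demand tags
    "month": timedelta(days=30),     # package-level tags
    "year":  timedelta(days=365),    # basic attribute tags such as age
}

def needs_refresh(level: str, last_updated: date,
                  today: Optional[date] = None) -> bool:
    """Return True if a tag of the given update level is stale."""
    today = today or date.today()
    return today - last_updated >= UPDATE_INTERVAL[level]

print(needs_refresh("week", date(2021, 8, 1), today=date(2021, 8, 26)))   # True
```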
In this embodiment, in step 5, the user portrait set includes a product portrait, a classification portrait, a complaint portrait, a consultation service portrait, a business tendency portrait, a consumption tendency portrait, a service channel preference portrait and a consumption capability portrait.
In this solution, business personnel build the tag system based on business operation requirements, obtain the user's data from different systems, particularly voice data, and calculate the user's tag values through the tag model, finally obtaining user portraits that differ across application layers. The user portrait set improves the customer's perception of service, reduces the complaint rate, improves the marketing success rate, reduces user maintenance costs and assists product operation.
Although embodiments of the present invention have been disclosed above, the above description is only a preferred embodiment of the present invention and is not intended to limit it. The invention can be applied in various suitable fields, and further modifications can easily be implemented by those skilled in the art; therefore, any modifications, equivalents and improvements made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A big data-based voice portrait analysis method for communication operator business customers, characterized in that the method comprises the following steps:
step 1, collecting voice data of the user and the agent during a call, and converting the voice data into text data;
step 2, performing word segmentation and feature selection on the text data through a distributed message system based on the data set, forming a feature vector for each segmented word, and modeling the data;
step 3, automatically clustering words according to the feature vector of each segmented word, and, after the words are clustered, applying classification labels according to the cluster semantic labels;
step 4, identifying the user intention from the classification labels through voice services, and calculating user tag values through a tag model;
step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes;
step 6, verifying the accuracy of the user portrait through a verification model;
and step 7, analyzing the user portrait set to generate visual multi-dimensional reports.
2. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 1, the user voice data is cleaned and preprocessed before being converted into text data.
3. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 1, the user recording data is transcribed through ASR, and recognition adopts a deep neural network acoustic model to complete the transcription of the voice recording into dialog text for semantic analysis.
4. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 2, the data set comprises a user behavior database, a system database, a corpus and a lexicon:
user behavior database: data on the business habits and preferences of users;
system database: basic user information and basic service information data;
corpus: a periodic user portrait model formed from historical dialog texts between users and agents;
lexicon: the operator industry product lexicon and service lexicon.
5. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 2, word segmentation is performed with an HMM algorithm, feature selection on the data is performed with the TF-IDF and LDA algorithms so that the text data can be computed, feature vectors of the segmented words are built with the word2vec algorithm, data modeling of the featurized data is performed with a CNN algorithm, and the user tag values are calculated through the data model and a classification model.
6. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 3, the semantic tags used for classification labelling after the user dialog text is segmented include: business acceptance semantic tags, complaint semantic tags, business consultation semantic tags, business query semantic tags, fault semantic tags and attitude semantic tags.
7. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 4, user intention recognition is performed through a user intention classification model, and the user intention classification model classifies user intentions into the complaint high-risk type, traffic-sensitive type, product-sensitive type, password-sensitive type, complaint type and marketing-refusal type.
8. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 4, the tag model of the user includes: basic feature tags, product requirement tags, business feature tags, consumption feature tags, channel feature tags, terminal preference tags, user service evaluation tags, position tags and internet content preference tags.
9. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 4, the user tags have four update levels, namely day level, week level, month level and year level, and the tags of different user portraits are updated according to their different update requirements.
10. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 5, the user portrait set includes a product portrait, a classification portrait, a complaint portrait, a consultation service portrait, a business tendency portrait and a consumption tendency portrait.
CN202110989375.5A 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer Pending CN114398512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110989375.5A CN114398512A (en) 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110989375.5A CN114398512A (en) 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer

Publications (1)

Publication Number Publication Date
CN114398512A true CN114398512A (en) 2022-04-26

Family

ID=81225905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110989375.5A Pending CN114398512A (en) 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer

Country Status (1)

Country Link
CN (1) CN114398512A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115623130A (en) * 2022-12-19 2023-01-17 北京青牛技术股份有限公司 Agent conversation service business distribution method and system
CN115834940A (en) * 2022-11-14 2023-03-21 浪潮通信信息系统有限公司 IPTV/OTT end-to-end data reverse acquisition analysis method and system
CN115907784A (en) * 2022-11-01 2023-04-04 国网江苏省电力有限公司营销服务中心 Method and system for identifying and actively early warning and notifying sensitive customers in electric power business hall


Similar Documents

Publication Publication Date Title
CN111488433B (en) Artificial intelligence interactive system suitable for bank and capable of improving field experience
US9910845B2 (en) Call flow and discourse analysis
CN114398512A (en) Big data-based voice portrait analysis method for communication operator business customer
WO2020024389A1 (en) Method for collecting overdue payment, device, computer apparatus, and storage medium
CN108521525A (en) Intelligent robot customer service marketing method and system based on user tag system
CN108764649A (en) Insurance sales method for real-time monitoring, device, equipment and storage medium
Pentland et al. Human dynamics: computation for organizations
CN111383093A (en) Intelligent overdue bill collection method and system
CN112507116A (en) Customer portrait method based on customer response corpus and related equipment thereof
Shim et al. Phonetic analytics technology and big data: real-world cases
WO2021022790A1 (en) Active risk control method and system based on intelligent interaction
CN112235470B (en) Incoming call client follow-up method, device and equipment based on voice recognition
CN111539221A (en) Data processing method and system
CN112131358A (en) Scene flow structure and intelligent customer service system applied by same
CN109145050B (en) Computing device
Subramaniam et al. Business intelligence from voice of customer
CN111541819A (en) Harvesting accelerating method and system
CN110222333A (en) A kind of voice interactive method, device and relevant device
CN112200660B (en) Bank counter business supervision method, device and equipment
CN111062422B (en) Method and device for identifying set-way loan system
CN115687754B (en) Active network information mining method based on intelligent dialogue
CN110765242A (en) Method, device and system for providing customer service information
CN112866491B (en) Multi-meaning intelligent question-answering method based on specific field
CN113887214A (en) Artificial intelligence based wish presumption method and related equipment thereof
CN108564380B (en) Telecommunication user classification method based on iterative decision tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination