CN114398512A - Big data-based voice portrait analysis method for communication operator business customer - Google Patents

Big data-based voice portrait analysis method for communication operator business customer

Info

Publication number
CN114398512A
CN114398512A (application CN202110989375.5A)
Authority
CN
China
Prior art keywords
user
data
voice
service
analysis method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110989375.5A
Other languages
Chinese (zh)
Inventor
刘卫平
王福君
樊炳恒
吴金燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongyun Jinnuo Technology Co ltd
Original Assignee
Beijing Zhongyun Jinnuo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongyun Jinnuo Technology Co ltd filed Critical Beijing Zhongyun Jinnuo Technology Co ltd
Priority to CN202110989375.5A
Publication of CN114398512A
Pending legal-status Current

Classifications

    • G06F ELECTRIC DIGITAL DATA PROCESSING
        • G06F16/64 Information retrieval of audio data: Browsing; Visualisation therefor
        • G06F16/65 Information retrieval of audio data: Clustering; Classification
        • G06F16/68 Information retrieval of audio data: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
        • G06F18/23 Pattern recognition: Clustering techniques
        • G06F18/24 Pattern recognition: Classification techniques
        • G06F40/216 Natural language analysis: Parsing using statistical methods
        • G06F40/289 Natural language analysis: Phrasal analysis, e.g. finite state techniques or chunking
        • G06F40/30 Handling natural language data: Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
        • G06N3/045 Neural networks: Combinations of networks
        • G06N3/08 Neural networks: Learning methods
    • G06Q ICT SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
        • G06Q30/0281 Marketing: Customer communication at a business location, e.g. providing product or service information, consulting
        • G06Q50/40 ICT specially adapted for business sectors: Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Strategic Management (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Accounting & Taxation (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Library & Information Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)

Abstract

The invention discloses a big data-based voice portrait analysis method for communication operator industry customers, which comprises the following steps: step 1, collecting voice data of a user and an agent during a call and converting the voice data into text data; step 2, performing word segmentation and feature selection on the text data, building a feature vector for each segmented word, and modeling the data; step 3, automatically clustering words according to the feature vector of each segmented word and, after clustering, labeling the classified words according to cluster semantic tags; step 4, applying voice services to the labeled classes to identify user intent and calculating user tag values through a tag model; and step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes. Beneficial effects: building a tag model for the user forms a multi-dimensional user portrait, so that an agent can learn about a customer before communicating with the customer, provide targeted service, improve customer service perception and reduce the complaint rate.

Description

Big data-based voice portrait analysis method for communication operator business customer
Technical Field
The invention relates to the field of communication, in particular to a big data-based voice portrait analysis method for communication operator industry customers.
Background
Against the background of the mobile internet, customers' demands for entertainment, emotional engagement, efficiency and excellent experience have become the core drivers of continuous innovation in technology, applications, terminals and services, and the resulting transformation of enterprise business models has become a key driving force of service innovation. With the explosive growth of data volume and the maturity of big data technology, more and more customer behavior data can be captured, so that a user portrait can truly become a valuable portrait.
The method develops and refines customer appeal behaviors and portraits based on massive user voice data, expands information such as the user's marketing tendency, complaint tendency, consultation preference, product interest and transaction behavior, describes user characteristics in an all-round way, and provides comprehensive data support for service and marketing operation and maintenance activities. It allows the customer's voice to be understood in time, improves service capability and promotes business-process innovation.
At present, in customer portrait construction, operators mostly portray users with structured information only, which cannot comprehensively reflect users' individuality and needs and cannot provide personalized, differentiated service experiences during the service process. Meanwhile, operators' traditional marketing and maintenance methods suffer from unclear targets, directions and rhythms, and the huge investment of operation and maintenance resources causes resource waste.
Disclosure of Invention
The invention provides a big data-based voice portrait analysis method for communication operator business customers, which greatly improves the marketing success rate, reduces user maintenance costs and lightens the labor intensity of customer-service personnel, so that agents understand the business and can provide targeted, personalized service.
A big data-based voice portrait analysis method for communication operator business customers comprises the following steps:
step 1, collecting voice data of a user and an agent during a call, and converting the voice data into text data;
step 2, performing word segmentation and feature selection on the text data through a distributed message system based on a data set, forming a feature vector for each segmented word, and modeling the data;
step 3, automatically clustering words according to the feature vector of each segmented word and, after clustering, labeling the classified words according to cluster semantic tags;
step 4, applying voice services to the labeled classes to identify user intent, and calculating user tag values through a tag model;
step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes;
step 6, verifying the accuracy of the user portrait through a verification model;
and step 7, analyzing the user portrait set to generate a visual multi-dimensional report.
In step 1, the user voice data is cleaned and preprocessed before being converted into text data.
In step 1, the user recording data is transcribed through ASR, and recognition uses a deep-neural-network acoustic model to complete the semantic analysis that turns the transcribed recording into dialogue text.
In step 2, the data set comprises: a user behavior database, a system database, a corpus and a lexicon,
user behavior database: data on the business habits and preferences of users;
system database: users' basic information and basic service information;
corpus: a periodic user portrait model formed from historical dialogue texts between users and agents;
lexicon: the operator's product lexicon and service lexicon.
In step 2, word segmentation is performed with an HMM algorithm, feature selection is performed on the data with the TF-IDF and LDA algorithms so that the text data can be computed, feature vectors of the segmented words are built with the word2vec algorithm, the featurized data is modeled with a CNN algorithm, and the user tag values are calculated through the data model and a classification model.
In step 3, the semantic tags used to label the classes after the user dialogue text is segmented include: business-acceptance semantic tags, complaint semantic tags, business-consultation semantic tags, business-query semantic tags, fault semantic tags and attitude semantic tags.
In step 4, user intent recognition is performed through a user-intent classification model, which classifies user intent into complaint high-risk, traffic-sensitive, product-sensitive, password-sensitive, complaint and marketing-refusal types.
In step 4, the user tag model includes: basic-feature tags, product-demand tags, business-feature tags, consumption-feature tags, channel-feature tags, terminal-preference tags, user service-evaluation tags, location tags and internet-content-preference tags.
In step 4, the user tags have four update levels, namely daily, weekly, monthly and yearly, and the tags of different user portraits are updated according to their different update requirements.
In step 5, the user portrait set includes a product portrait, a classification portrait, a complaint portrait, a consultation-service portrait, a business-tendency portrait and a consumption-tendency portrait.
The invention has at least the following beneficial effects:
by building a tag model for the user, a multi-dimensional user portrait is formed, so that an agent can conveniently learn about a customer before communicating with that customer, provide targeted service, improve customer service perception, reduce the complaint rate, improve the marketing success rate, reduce user maintenance costs and assist product operation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a block flow diagram of a big data based communications carrier business customer speech profile analysis method according to the present invention;
FIG. 2 is an application architecture diagram of the present invention for a big data based voice portrait analysis method for a communication carrier business customer;
FIG. 3 is a schematic diagram of the relationship between user tags for a big data based voice portrait analysis method of a communication carrier business customer according to the present invention;
FIG. 4 is a user tag weight calculation formula for a big data based voice portrait analysis method for a communication carrier business customer according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A big data-based voice portrait analysis method for communication operator business customers comprises the following steps:
step 1, collecting voice data of a user and an agent during a call, and converting the voice data into text data;
step 2, performing word segmentation and feature selection on the text data through a distributed message system based on a data set, forming a feature vector for each segmented word, and modeling the data;
step 3, automatically clustering words according to the feature vector of each segmented word and, after clustering, labeling the classified words according to cluster semantic tags;
step 4, applying voice services to the labeled classes to identify user intent, and calculating user tag values through a tag model;
step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes;
step 6, verifying the accuracy of the user portrait through a verification model and, when the accuracy falls below a threshold, returning to step 4 to recalculate the user tags (a sketch of this loop follows the step list);
and step 7, analyzing the user portrait set to generate a visual multi-dimensional report.
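Purely as an illustration of the step 4 to step 6 loop above, the following Python sketch shows how recomputation could be wired together; the helper callables, the 0.85 threshold and the iteration limit are assumptions rather than anything specified in the patent.

```python
# Hedged sketch of the step 4-6 loop: recompute tags and rebuild the portrait
# until the verification model's accuracy reaches the threshold. The helper
# callables, the threshold value and the iteration limit are assumptions.
def build_verified_portrait(labeled_dialogues, compute_user_tags, build_portrait,
                            validate_portrait, threshold=0.85, max_iterations=5):
    portrait = None
    for _ in range(max_iterations):
        tags = compute_user_tags(labeled_dialogues)   # step 4: tag model
        portrait = build_portrait(tags)               # step 5: portrait set
        if validate_portrait(portrait) >= threshold:  # step 6: verification
            break
    return portrait


# Toy usage with stand-in callables.
portrait = build_verified_portrait(
    labeled_dialogues=["user asks about a 5G package"],
    compute_user_tags=lambda d: {"product_demand": 0.7},
    build_portrait=lambda tags: {"tags": tags},
    validate_portrait=lambda p: 0.9,
)
print(portrait)
```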
The distributed message system in step 2 comprises:
Portrait application: the customer voice portrait can be applied to every link of marketing and service, specifically: service prediction, precise service, incoming-call maintenance, product configuration and value improvement.
Data visualization: the user portrait data created by the analysis and processing services is presented as multi-dimensional reports.
Analysis and processing services: comprising cross analysis, text classification, text clustering and thematic analysis.
Cross analysis: structured fields of the data are selected for multi-dimensional cross analysis, so that the distribution, contrasts, trends and the like of the analyzed subject data over known dimensions can be understood quickly.
Text classification: based on keyword modeling, the detailed text of each record is matched and automatically sorted into various sets, enabling rapid classification of massive data.
Text clustering: the detailed texts in the analyzed data are grouped automatically according to semantic understanding to form previously unknown sets, enabling rapid classification and aggregation of massive data (a clustering sketch follows this list).
Thematic analysis: the analyzed data undergoes deep, multi-level analysis that combines rule-based and intelligent analysis, the root causes of the analysis topic are investigated, and the analysis results can be fed into the corpus for knowledge accumulation.
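As a hedged illustration of the text-clustering service referenced above, the sketch below groups TF-IDF vectors with k-means; the sample utterances, the cluster count and the use of scikit-learn are illustrative assumptions, not requirements of the patent.

```python
# Illustrative stand-in for the text-clustering service: TF-IDF vectors
# grouped with k-means. Sample utterances and cluster count are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

dialogue_texts = [
    "I want to check my remaining data balance",
    "My broadband has been down since this morning",
    "How much does the 5G family package cost",
    "The bill this month is higher than I expected",
]

vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(dialogue_texts)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(vectors)

for text, cluster_id in zip(dialogue_texts, cluster_ids):
    print(cluster_id, text)
```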
The voice services include: speech recognition, silence recognition, emotion recognition, scene segmentation, semantic understanding and full-text transcription.
Speech recognition: voice-call traffic is converted in real time into dialogue pairs separated by speaker;
Silence recognition: silent periods of the user during the conversation are identified;
Emotion recognition: the emotions of the agent and the user are recognized;
Scene segmentation: the conversation contains several scenes, and the different scenes are separated;
Semantic understanding: the user's intent is understood;
Full-text transcription: call recordings are transcribed offline into dialogue text.
Basic data layer: contains the data sets used for modeling.
Through the multi-dimensional user portrait reports, service prejudgment is provided to the manual agent before and during call answering, so that service, marketing and maintenance are carried out more accurately.
In this embodiment, in step 1, the user voice data is cleaned and preprocessed before being converted into text data. Cleaning consists of cleaning the customer dialogue recordings, removing empty audio, screening out audio that meets the requirements, encoding and classifying the data, and removing abnormal values, completing missing values and removing duplicate values.
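A minimal sketch of this cleaning step, assuming recordings arrive as metadata dictionaries; the field names (call_id, duration, path) and the 3-second duration threshold are illustrative assumptions.

```python
# Hedged sketch of the recording clean-up described above: drop empty audio,
# drop duplicates and keep only recordings that meet a minimum duration.
# The record fields and the 3-second threshold are assumptions.
MIN_DURATION_SECONDS = 3.0


def clean_recordings(records):
    cleaned, seen_call_ids = [], set()
    for record in records:
        if record.get("duration", 0.0) < MIN_DURATION_SECONDS:  # empty audio
            continue
        if record["call_id"] in seen_call_ids:                  # duplicate
            continue
        seen_call_ids.add(record["call_id"])
        cleaned.append(record)
    return cleaned


sample = [
    {"call_id": "c1", "duration": 0.0, "path": "a.wav"},
    {"call_id": "c2", "duration": 42.5, "path": "b.wav"},
    {"call_id": "c2", "duration": 42.5, "path": "b.wav"},
]
print(clean_recordings(sample))  # keeps only the first c2 record
```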
In this embodiment, in step 1, the user recording data is transcribed through ASR, and recognition uses a deep-neural-network acoustic model, so that the transcribed recording is semantically analyzed into dialogue text and the user's semantics are obtained from the transcribed speech through the semantic model.
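The patent does not name a specific ASR engine; as a stand-in, the hedged sketch below transcribes a recording with the open-source Whisper model, and the audio file name is an assumption.

```python
# Illustrative transcription step using the open-source Whisper ASR model
# (a neural acoustic/sequence model) as a stand-in for the unnamed ASR
# engine in the patent. The audio path is an assumption.
import whisper

model = whisper.load_model("base")                # small pretrained model
result = model.transcribe("agent_user_call.wav")  # offline transcription
dialogue_text = result["text"]
print(dialogue_text)
```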
In this embodiment, in step 2, the data set comprises: a user behavior database, a system database, a corpus and a lexicon,
user behavior database: data on the business habits and preferences of users, such as commonly used websites and apps;
system database: users' basic information and basic service information, such as gender, package and monthly consumption;
corpus: a periodic user portrait model formed from historical dialogue texts between users and agents;
lexicon: the operator's product lexicon and service lexicon, with entries such as ice cream packages and happy shopping.
Example of the product thesaurus: 3G package, Internet financing, bee card, Xinlang V card, Baidu Shen card, unlimited additional product, 4G package, flow rate limitation unlimited, hungry card, Taobao smooth card, ant treasure card, flow rate limitation relieveable, M2M connection service, prepaid product package Mei Tuan card, drip orange card, Tengwang card, voice unlimited, external enterprise payment service, post-paid product package, beep li card, recruit card, drip Wang card, high-value old user smooth experience product, Wo wallet, nail card, enantio card, Jingdong strong card, smooth-crossing ice cream product.
A service word bank: the method comprises the following steps of fixed network basic voice service, internet access service, campus fusion, mobile phone internet surfing, data VPN, fixed network electronic payment, wireless local telephone service, data and network element service, 2I fusion, three-way calling, short message service, video conference, public telephone service, ICT service, caller ID, incoming call restriction, voice VPN, fusion information service, telephone card service, advertisement service, incoming call highlight, call hold, personalized ring back tone, internet payment, unlimited fusion, limited fusion, caller display prohibition, call transfer, call center and mobile phone media.
In this embodiment, in step 2, word segmentation is performed with an HMM algorithm; feature selection is performed on the data with the TF-IDF and LDA algorithms so that the text data can be computed; feature vectors of the segmented words are built with algorithms such as word2vec, DeepLearning4j, fastText and LDA; the text is classified and clustered, and the classification and clustering results are applied to the data-clustering service and the text-classification service; the featurized data can be modeled with algorithms such as fastText, CNN and THUCTC; and the user tag values are calculated through the data model and a classification model.
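As a hedged illustration of the segmentation and vectorization steps just described, the sketch below uses jieba (whose segmenter falls back to an HMM for out-of-vocabulary words), scikit-learn's TfidfVectorizer and gensim's word2vec; the sample sentences, vector size and the choice of these particular libraries are assumptions, not requirements of the patent.

```python
# Hedged sketch: HMM-assisted word segmentation (jieba), TF-IDF feature
# selection (scikit-learn) and word vectors (gensim word2vec). Sample
# sentences, vector size and library choices are illustrative assumptions.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

dialogues = [
    "我想查询一下本月的流量余量",
    "宽带从早上开始就无法上网了",
    "请帮我办理一个5G套餐",
]

# Word segmentation; jieba uses an HMM for out-of-vocabulary words.
segmented = [list(jieba.cut(text, HMM=True)) for text in dialogues]

# TF-IDF over the pre-segmented text (identity analyzer, tokens kept as-is).
tfidf = TfidfVectorizer(analyzer=lambda tokens: tokens)
tfidf_matrix = tfidf.fit_transform(segmented)

# word2vec feature vectors for each segmented word.
w2v = Word2Vec(sentences=segmented, vector_size=50, window=3, min_count=1)
vector = w2v.wv[segmented[0][0]]  # vector for the first word of the first call
print(tfidf_matrix.shape, vector.shape)
```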
Notes on the applied algorithms:
HMM algorithm: applied in the word-segmentation step; it determines the hidden parameters of a process from its observable parameters and then uses these parameters for further analysis.
TF-IDF algorithm: applied in the feature-selection step to evaluate how important a word is to a document within a document set or corpus. A word's importance increases in proportion to the number of times it appears in the document but decreases in inverse proportion to its frequency across the corpus.
LDA algorithm: applied in the feature-selection, feature-vector and data-clustering steps; it is a document-topic generative model, also called a three-layer Bayesian probability model, comprising a word layer, a topic layer and a document layer.
word2vec algorithm: applied in the feature-vector step to generate word-vector models; the model is a shallow two-layer neural network that maps each word to a vector representing word-to-word relationships, the vector being the hidden layer of the network.
DeepLearning4j: applied in the feature-vector step; it supports a wide range of deep-learning algorithms and can implement the word2vec technique.
fastText: applied in the feature-vector and data-modeling steps as a word-vector computation and text-classification tool.
CNN algorithm: applied in the data-modeling step; after the data is vectorized, a CNN model is trained on a large number of voice dialogue texts to learn the mapping from input to output (a minimal classification sketch follows this list).
THUCTC: applied in the data-modeling step; it automatically and efficiently trains, evaluates and classifies on a user-defined text-classification corpus.
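As referenced in the CNN note above, the following is a minimal sketch of a CNN text classifier over segmented-word id sequences, using Keras as a stand-in; the vocabulary size, sequence length, layer sizes and the use of six output classes (matching the six semantic-tag families of step 3) are assumptions.

```python
# Hedged sketch of a CNN text classifier over segmented-word sequences, in
# the spirit of the data-modeling step. Vocabulary size, sequence length and
# the six assumed intent classes are illustrative, not from the patent.
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 5000   # assumed vocabulary size after word segmentation
SEQ_LEN = 50        # assumed padded dialogue length
NUM_CLASSES = 6     # assumed: one class per semantic-tag family in step 3

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 64),
    layers.Conv1D(128, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy training data: random word-id sequences and random class labels.
x = np.random.randint(0, VOCAB_SIZE, size=(32, SEQ_LEN))
y = np.random.randint(0, NUM_CLASSES, size=(32,))
model.fit(x, y, epochs=1, verbose=0)
```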
After the user tags are calculated, the weight of each tag within the portrait is determined; the user portrait has different tag weights in different scenarios.
User tag weights are calculated with the TF-IDF algorithm. For example, suppose there are 3 users and 5 tags (as shown in fig. 3); the relationships between tags and users reflect, to some extent, the relationships between tags. The number of times a tag T is used to mark a user P is denoted w(P, T). TF(P, T) is the proportion of these markings among all tags of user P, and the formula is:
TF(P, T) = w(P, T) / Σᵢ w(P, Tᵢ)
As shown in fig. 3, if user A carries 6 occurrences of tag a, 4 of tag b and 2 of tag c, then the TF of tag a on user A is 6/(6+4+2).
The corresponding IDF(P, T) indicates the scarcity of tag T among all tags, i.e. the probability of the tag occurring: if a tag T occurs with low probability and is nevertheless used to mark a user, the relationship between that user and tag T is considered tighter. The formula is:
[IDF(P, T) formula, given as an image in the original document]
The weight of a user's tag can then be obtained from TF and IDF. This weight does not yet take the business scenario into account; clearly, the user tag weight must also consider the business scenario, how long ago the tag was generated, how many times the user has generated the tag, and so on. The calculation formula is shown in fig. 4.
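To make the tag-weight calculation concrete, here is a hedged Python sketch of the scenario-independent TF-IDF weight; the TF part reproduces the 6/(6+4+2) example above, while the IDF formulation and the +1 smoothing are common choices assumed here because the patent's own IDF and fig. 4 formulas are only provided as images.

```python
# Hedged sketch of TF-IDF tag weighting over user-tag counts. The TF part
# matches the 6/(6+4+2) example above; the IDF formulation and +1 smoothing
# are assumed, since the original formulas are given only as images.
import math

# w(P, T): how many times tag T was applied to user P.
tag_counts = {
    "A": {"a": 6, "b": 4, "c": 2},
    "B": {"a": 1, "d": 3},
    "C": {"b": 2, "e": 5},
}


def tf(user, tag):
    counts = tag_counts[user]
    return counts.get(tag, 0) / sum(counts.values())


def idf(tag):
    total = sum(sum(c.values()) for c in tag_counts.values())
    tag_total = sum(c.get(tag, 0) for c in tag_counts.values())
    return math.log(total / (tag_total + 1))


def tag_weight(user, tag):
    return tf(user, tag) * idf(tag)


print(round(tf("A", "a"), 3))          # 0.5, matching 6 / (6 + 4 + 2)
print(round(tag_weight("A", "a"), 3))  # scenario-independent TF-IDF weight
```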
In this embodiment, in step 3, the semantic tags used to label the classes after the user dialogue text is segmented include: business-acceptance semantic tags, complaint semantic tags, business-consultation semantic tags, business-query semantic tags, fault semantic tags and attitude semantic tags.
Examples of the semantic tags:
Business-acceptance semantics include: traffic package, primary and secondary card service, new fixed-telephone installation, package change, new converged-service installation, call transfer, special service functions, converged-package change, and emergency suspension and resumption of service.
Complaint semantics include: short messages not received, agent service-skill issues, payment not credited to the account, services not usable normally, value-added fee disputes, unable to access the internet, real-name registration of customer data, certificate blacklist issues, traffic-fee disputes, unable to make calls, harassment and fraud suspensions, delays or failures to take effect caused by untimely service handling, and double capping of traffic.
Business-consultation semantics include: package consultation, number portability, Voyage cloud disk, service removal, international roaming, privileges and exclusive traffic, store-fees-get-a-phone offers, remote card replacement, fixed-broadband fees, service password, point exchange, handling procedures, store-fees-get-credit offers, remote number cancellation, broadband annual-package fees, junk-message and telecom-fraud consultation, electronic invoices, account transfer, 2I package fees, product consultation, fixed-network phone offers, Voyage video, one-card fees, and international long distance.
Business-query semantics include: balance queries, activity rebate and expiration-time queries, local phone-number queries, itemized phone-bill and arrears-reason queries, service-activation queries, business-hall information queries, point queries, fixed-broadband account and password queries, and account-balance and expiration-time queries.
Fault semantics include: fixed-line faults, system upgrades, broadband faults and large-area faults.
Attitude semantics include: bad attitude, friendly attitude, irritable temper and mild attitude.
In this embodiment, in step 4, user intent recognition is performed through a user-intent classification model, which classifies user intent into complaint high-risk, traffic-sensitive, product-sensitive, password-sensitive, complaint and marketing-refusal types.
In this embodiment, in step 4, the user tag model includes:
basic-feature tags: describing customer attributes and the corresponding social relationships from the perspective of a natural person;
product-demand tags: information on the Unicom products ordered by the user, analyzed from the incoming-call voice data, including contract-plan participation and the customer's tendencies when choosing marketing activities;
business-feature tags: the user's usage and calling circle, analyzed from incoming-call consultations, voice, traffic, short messages and the like;
consumption-feature tags: describing the composition of the user's spending and income, settlement and payment, and credit-related information;
channel-feature tags: describing the channels and channel preferences in customer-service contacts;
terminal-preference tags: describing the user's terminal usage and terminal preferences, derived from incoming-call consultations and business handling;
user service-evaluation tags: describing the customer's value and satisfaction with the service from the perspectives of marketing, maintenance and the like;
location tags: recording the user's movements and base-station usage track;
internet-content-preference tags: classifying internet content to describe the customer's online-behavior preferences.
For example, customer complaint information belongs to the user service-evaluation tags, consultation information to the product-demand tags, tariff and historical bill information to the consumption-feature tags, and the customer's age and gender to the basic-feature tags.
The user's behavior data is mapped onto the user tag system to obtain rules and weights: the semantic tags of the user's call behavior are mapped to the user tag system, the mapping rules and the weights of the different mappings are obtained through the tag model, an accurate user portrait is constructed, and the feature tag values and tag weights are calculated.
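A minimal sketch of this mapping step, under the assumption of an invented rule table; the semantic-tag names follow the tag families above, but the target categories and weight values are purely illustrative.

```python
# Hedged sketch of mapping call-behavior semantic tags onto the user tag
# system with per-rule weights. The rule table and weight values are
# invented for illustration; the patent does not specify them.
MAPPING_RULES = {
    "complaint": ("user_service_evaluation", 0.9),
    "business_consultation": ("product_demand", 0.6),
    "business_query": ("consumption_feature", 0.4),
    "attitude": ("basic_feature", 0.2),
}


def map_semantic_tags(semantic_tag_counts):
    """Aggregate weighted scores per user-tag category from semantic-tag counts."""
    scores = {}
    for semantic_tag, count in semantic_tag_counts.items():
        if semantic_tag not in MAPPING_RULES:
            continue
        category, weight = MAPPING_RULES[semantic_tag]
        scores[category] = scores.get(category, 0.0) + weight * count
    return scores


print(map_semantic_tags({"complaint": 2, "business_query": 1}))
# {'user_service_evaluation': 1.8, 'consumption_feature': 0.4}
```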
User portrait construction requires collecting and cleaning the customer's voice through steps 1 and 2 and then calculating the user's tag values through the combination of algorithms: the customer voice data is collected through the distributed message system, the data set is established, and the dialogues are segmented with the operator's lexicon.
In this embodiment, in step 4, different tag attributes of the operator's user tags have different update demands; the user tags have four update levels, namely daily, weekly, monthly and yearly, and the tags of different user portraits are updated according to their different update requirements (a small configuration sketch follows this list):
daily-level tags: mood-type tags, for example: irritable, mild;
weekly-level tags: product-demand tags, for example: traffic package;
monthly-level tags: package-level tags, for example: 5G, traffic package, secondary-card handling;
yearly-level tags: customer basic-attribute tags, for example: age.
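Following the list above, a small configuration sketch of the four update levels; the timedelta representation, the category names and the helper function are assumptions.

```python
# Hedged sketch: mapping tag categories to their update cadence, as listed
# above. The timedelta values and the helper function are assumptions.
from datetime import datetime, timedelta

UPDATE_CADENCE = {
    "mood": timedelta(days=1),               # daily-level tags
    "product_demand": timedelta(weeks=1),    # weekly-level tags
    "package": timedelta(days=30),           # monthly-level tags
    "basic_attribute": timedelta(days=365),  # yearly-level tags
}


def is_due(category, last_updated, now):
    """Return True when a tag of the given category should be refreshed."""
    return now - last_updated >= UPDATE_CADENCE[category]


print(is_due("mood", datetime(2021, 8, 25), datetime(2021, 8, 26)))  # True
```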
In this embodiment, in step 5, the user portrait set includes a product portrait, a classification portrait, a complaint portrait, a consultation-service portrait, a business-tendency portrait, a consumption-tendency portrait, a service-channel-preference portrait and a consumption-capability portrait.
In this scheme, business personnel build the tag system based on business operation requirements, obtain the user's data from different systems, in particular the voice data, and calculate the user's tag values through the tag model, finally obtaining user portraits that differ across application layers. The user portrait set improves customer service perception, reduces the complaint rate, improves the marketing success rate, reduces user maintenance costs and assists product operation.
Although the embodiments of the present invention have been disclosed above, the above description is only of preferred embodiments and is not intended to limit the present invention. The invention can be applied to various suitable fields, and further modifications can easily be implemented by those skilled in the art; therefore, any modifications, equivalents and improvements made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A big data-based voice portrait analysis method for communication operator business customers, characterized in that the method comprises the following steps:
step 1, collecting voice data of a user and an agent during a call, and converting the voice data into text data;
step 2, performing word segmentation and feature selection on the text data through a distributed message system based on a data set, forming a feature vector for each segmented word, and modeling the data;
step 3, automatically clustering words according to the feature vector of each segmented word and, after clustering, labeling the classified words according to cluster semantic tags;
step 4, applying voice services to the labeled classes to identify user intent, and calculating user tag values through a tag model;
step 5, analyzing the user tags and outputting a user portrait set through multi-dimensional indexes;
step 6, verifying the accuracy of the user portrait through a verification model;
and step 7, analyzing the user portrait set to generate a visual multi-dimensional report.
2. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 1, the user voice data is cleaned and preprocessed before being converted into text data.
3. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 1, the user recording data is transcribed through ASR, and recognition uses a deep-neural-network acoustic model to complete the semantic analysis that turns the transcribed recording into dialogue text.
4. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 2, the data set comprises: a user behavior database, a system database, a corpus and a lexicon,
user behavior database: data on the business habits and preferences of users;
system database: users' basic information and basic service information;
corpus: a periodic user portrait model formed from historical dialogue texts between users and agents;
lexicon: the operator's product lexicon and service lexicon.
5. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 2, word segmentation is performed with an HMM algorithm, feature selection is performed with the TF-IDF and LDA algorithms so that the text data can be computed, feature vectors of the segmented words are built with the word2vec algorithm, the featurized data is modeled with a CNN algorithm, and the user tag values are calculated through the data model and a classification model.
6. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 3, the semantic tags used to label the classes after the user dialogue text is segmented include: business-acceptance semantic tags, complaint semantic tags, business-consultation semantic tags, business-query semantic tags, fault semantic tags and attitude semantic tags.
7. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 4, user intent recognition is performed through a user-intent classification model, which classifies user intent into complaint high-risk, traffic-sensitive, product-sensitive, password-sensitive, complaint and marketing-refusal types.
8. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 4, the user tag model includes: basic-feature tags, product-demand tags, business-feature tags, consumption-feature tags, channel-feature tags, terminal-preference tags, user service-evaluation tags, location tags and internet-content-preference tags.
9. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 4, the user tags have four update levels, namely daily, weekly, monthly and yearly, and the tags of different user portraits are updated according to their different update requirements.
10. The big data-based voice portrait analysis method for communication operator business customers according to claim 1, characterized in that: in step 5, the user portrait set includes a product portrait, a classification portrait, a complaint portrait, a consultation-service portrait, a business-tendency portrait and a consumption-tendency portrait.
CN202110989375.5A 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer Pending CN114398512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110989375.5A CN114398512A (en) 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110989375.5A CN114398512A (en) 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer

Publications (1)

Publication Number Publication Date
CN114398512A true CN114398512A (en) 2022-04-26

Family

ID=81225905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110989375.5A Pending CN114398512A (en) 2021-08-26 2021-08-26 Big data-based voice portrait analysis method for communication operator business customer

Country Status (1)

Country Link
CN (1) CN114398512A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115623130A (en) * 2022-12-19 2023-01-17 北京青牛技术股份有限公司 Agent conversation service business distribution method and system
CN115834940A (en) * 2022-11-14 2023-03-21 浪潮通信信息系统有限公司 IPTV/OTT end-to-end data reverse acquisition analysis method and system
CN115907784A (en) * 2022-11-01 2023-04-04 国网江苏省电力有限公司营销服务中心 Method and system for identifying and actively early warning and notifying sensitive customers in electric power business hall
CN116095619A (en) * 2022-12-30 2023-05-09 天翼物联科技有限公司 Industry short message channel self-selection method based on naive Bayes algorithm
CN117745328A (en) * 2023-12-29 2024-03-22 深圳市南方网通网络技术开发有限公司 Multi-platform-based network marketing data processing method and system


Similar Documents

Publication Publication Date Title
CN114398512A (en) Big data-based voice portrait analysis method for communication operator business customer
CN111488433B (en) Artificial intelligence interactive system suitable for bank and capable of improving field experience
US9910845B2 (en) Call flow and discourse analysis
CN109559221A (en) Collection method, apparatus and storage medium based on user data
CN108764649A (en) Insurance sales method for real-time monitoring, device, equipment and storage medium
CN108521525A (en) Intelligent robot customer service marketing method and system based on user tag system
CN107330706A (en) A kind of electricity battalion's customer service system and commercial operation pattern based on artificial intelligence
CN111383093A (en) Intelligent overdue bill collection method and system
CN111539221B (en) Data processing method and system
CN112507116A (en) Customer portrait method based on customer response corpus and related equipment thereof
CN107071193A (en) The method and apparatus of interactive answering system accessing user
WO2021022790A1 (en) Active risk control method and system based on intelligent interaction
CN112200660B (en) Bank counter business supervision method, device and equipment
CN107527240A (en) A kind of operator's industry product Praise effect identification system and method
CN110009480A (en) The recommended method in judicial collection path, device, medium, electronic equipment
CN109145050B (en) Computing device
CN111541819A (en) Harvesting accelerating method and system
CN114971017A (en) Bank transaction data processing method and device
CN113887214A (en) Artificial intelligence based wish presumption method and related equipment thereof
CN111062422B (en) Method and device for identifying set-way loan system
CN115687754B (en) Active network information mining method based on intelligent dialogue
CN112866491B (en) Multi-meaning intelligent question-answering method based on specific field
CN113283979A (en) Loan credit evaluation method and device for loan applicant and storage medium
CN113191882A (en) Account checking reminding method and device, electronic equipment and medium
CN111192008A (en) Self-help guide system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination