WO2020244073A1 - Voice-based user classification method, apparatus, computer device, and storage medium - Google Patents

Voice-based user classification method, apparatus, computer device, and storage medium

Info

Publication number
WO2020244073A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognition result
keyword
word
voice
model
Prior art date
Application number
PCT/CN2019/103265
Other languages
English (en)
French (fr)
Inventor
黄锦伦
张桂芝
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2020244073A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Classification into predefined classes
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems

Definitions

  • This application relates to the field of voice recognition technology, and in particular to a voice-based user classification method, device, computer equipment and storage medium.
  • at present, telemarketing is widely used in business promotion, but after an agent calls a customer, the quality inspection post can only listen to the call recordings one by one during quality inspection; the recordings cannot be converted into text, so the communication effect cannot be known in real time.
  • the embodiments of the application provide a voice-based user classification method, device, computer equipment, and storage medium, which are designed to solve the prior-art problem that, when performing quality inspection on the voice between the agent and the customer, the quality inspection post can only listen to the recordings one by one, the recorded information cannot be converted into text, and the communication effect between the agent and the customer cannot be known in real time, which reduces the efficiency of voice quality inspection.
  • an embodiment of the present application provides a voice-based user classification method, which includes:
  • an embodiment of the present application provides a voice-based user classification device, which includes:
  • a voice recognition unit configured to receive a voice to be recognized, recognize the voice to be recognized through the N-gram model, and obtain a recognition result
  • the keyword extraction unit is used to extract keywords from the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result;
  • the emotion recognition unit is used to obtain the semantic vector of the keyword set, and use the semantic vector as the input of the text emotion classifier to obtain the text emotion recognition result;
  • the user portrait drawing unit is used to obtain the noun keywords in the keyword set, and convert the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library, so as to obtain the user portrait corresponding to the recognition result.
  • an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the computer program, it implements the voice-based user classification method described in the first aspect.
  • the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the voice-based user classification method described in the first aspect.
  • FIG. 1 is a schematic diagram of an application scenario of a voice-based user classification method provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of a voice-based user classification method provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of another process of a voice-based user classification method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of a sub-flow of a voice-based user classification method provided by an embodiment of this application.
  • FIG. 5 is a schematic diagram of another sub-flow of a voice-based user classification method provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of another sub-flow of a voice-based user classification method provided by an embodiment of this application.
  • FIG. 7 is a schematic block diagram of a voice-based user classification apparatus provided by an embodiment of the application.
  • FIG. 8 is another schematic block diagram of a voice-based user classification apparatus provided by an embodiment of this application.
  • FIG. 9 is a schematic block diagram of subunits of a voice-based user classification apparatus according to an embodiment of the application.
  • FIG. 10 is a schematic block diagram of another subunit of the voice-based user classification apparatus provided by an embodiment of the application.
  • FIG. 11 is a schematic block diagram of another subunit of the voice-based user classification apparatus provided by an embodiment of this application.
  • FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of the application.
  • Figure 1 is a schematic diagram of an application scenario of a voice-based user classification method provided by an embodiment of this application
  • Figure 2 is a schematic flowchart of a voice-based user classification method provided by an embodiment of this application.
  • the user classification method is applied to the server, and the method is executed by the application software installed in the server.
  • the method includes steps S110 to S140.
  • the server can train on a received training set corpus to obtain the N-gram model, and use the N-gram model to recognize the to-be-recognized speech collected at the agent end and uploaded to the server, so as to obtain the recognition result.
  • before step S110, the method further includes:
  • the training set corpus is received, and the training set corpus is input to the initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-ary language model.
  • the N-gram model is a language model (Language Model, LM).
  • the language model is a probability-based discriminative model: its input is a sentence (an ordered sequence of words), and its output is the probability of that sentence, that is, the joint probability of the words.
  • assuming that the sentence T is composed of the word sequence w_1, w_2, w_3, ..., w_n, the N-gram language model is expressed as follows: P(T) = P(w_1 w_2 w_3 ... w_n) = p(w_1) * p(w_2|w_1) * p(w_3|w_1 w_2) * ... * p(w_n|w_1 w_2 ... w_{n-1}).
  • the conditional probability of each word in the sentence T can be obtained by counting in the corpus. The n-ary model is then: p(w_i|w_{i-n+1}, ..., w_{i-1}) = C(w_{i-n+1}, ..., w_i) / C(w_{i-n+1}, ..., w_{i-1});
  • C(w_{i-n+1}, ..., w_i) represents the number of times the string w_{i-n+1}, ..., w_i occurs in the corpus.
  • according to the set model fusion ratio, for example with the ratio of product corpus to general corpus set to 2:8, the model fusion ratio of the first N-gram model to the second N-gram model is also 2:8;
  • the first N-gram model and the second N-gram model are fused, and finally an N-gram model for speech recognition is obtained.
  • S120 Perform keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result.
  • step S120 includes:
  • the word segmentation model based on probability statistics can find the target word string W such that W satisfies P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)), where C is the Chinese character string to be segmented and Wa, Wb, ..., Wk are all possible segmentations of C; the word string W obtained by this model is the word string with the maximum estimated probability.
  • the word frequency-inverse text frequency index model (i.e., the TF-IDF model; TF-IDF is the abbreviation of Term Frequency-Inverse Document Frequency) is used to extract the keyword information ranked before the preset first ranking value in the word segmentation result, as the keyword set.
  • the keyword information before the preset ranking value in the word segmentation result is extracted through the TF-IDF model, as follows:
  • IDF_i = lg[ total number of documents in the corpus / (number of documents containing the word + 1) ];
  • the more common a word is, the larger the denominator and the smaller the inverse document frequency (the closer it is to 0); the reason for adding 1 to the denominator is to prevent the denominator from being 0 (that is, the case where no document contains the word).
  • TF-IDF is directly proportional to the number of occurrences of a word in the document, and inversely proportional to the number of occurrences of the word in the entire language. Therefore, the automatic extraction of keywords is to calculate the TF-IDF value of each word segmentation of the document, and then sort them in descending order, and take the top N words as the keyword list of the document.
  • step S130 includes:
  • the target word vector corresponding to each keyword in the keyword set can be obtained correspondingly.
  • the word vector corresponding to the keyword information is obtained based on a pre-built vocabulary list query.
  • the acquisition process of the word vector is called word2vec, and the function is to convert the words in the natural language into a dense vector that the computer can understand.
  • for example, in the corpus (that is, the vocabulary), AA, BB, CC, and DD (where each of AA, BB, CC, and DD represents a Chinese word) each correspond to a vector in which only one value is 1 and the rest are 0;
  • the words are first converted into discrete individual symbols through a One-Hot Encoder, and then converted through Word2Vec dimensionality reduction into low-dimensional continuous values, that is, dense vectors, in which words with similar meanings are mapped to similar positions in the vector space.
  • when the semantic vector corresponding to the keyword set is obtained, it can be input to a traditional classifier to obtain a text emotion recognition result.
  • the text emotion classifier can be a traditional classifier (an SVM or a Bayesian classifier), and the text emotion recognition result is obtained through the traditional classifier.
  • SVM (Support Vector Machine) refers to the support vector machine, a common discrimination method; in the field of machine learning, it is a supervised learning model, usually used for pattern recognition, classification, and regression analysis.
  • the Bayesian classifier is the classifier with the smallest probability of classification error or the smallest average risk in the case of a predetermined cost. Its design method is one of the most basic statistical classification methods.
  • the classification principle is to compute an object's posterior probability from its prior probability by using the Bayes formula, that is, the probability that the object belongs to a certain class, and to select the class with the largest posterior probability as the class to which the object belongs.
  • after the recognition result is extracted from the customer's to-be-recognized voice and text emotion recognition is performed, the customer's acceptance, pleasure, or irritation on hearing the promotional marketing information can be analyzed. For example, when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to enter the keyword corresponding to the text emotion recognition result to listen to this type of recording.
  • step S140 includes:
  • S141: Obtain, from the tag library, the tag conversion strategy corresponding to each of the noun keywords in the keyword set;
  • transforming qualitative information into quantitative classifications is an important part of building the user portrait and places high demands on the business scenario; its main purpose is to help companies simplify complex data, qualitatively categorize transaction data, and process the data commercially in line with business-analysis requirements.
  • when setting a tag conversion strategy, customers can be divided by age range into life stages such as student, youth, young middle-aged, middle-aged, middle-aged to elderly, and elderly. Since financial service needs differ across life stages, target customers can be located by life stage. Companies can use customers' income, education, assets, and the like to divide them into low-end, mid-range, and high-end customers, and provide different financial services according to their financial service needs. Their financial consumption records and asset information, as well as traded and purchased products, can be consulted to qualitatively describe customer behavior characteristics and distinguish e-commerce customers, wealth-management customers, insurance customers, conservative investors, aggressive investors, and so on.
  • once the noun keywords in the keyword set are obtained, the user portrait corresponding to the recognition result can be drawn based on these keywords.
  • the customer intention model can be analyzed and learned, so that the agent can push more accurate information to the user based on the user portrait.
  • the method further includes:
  • the keyword with the maximum word frequency-inverse text frequency index in the keyword set may be obtained as the target keyword, and then the time point of the target keyword in the voice to be recognized is located and marked (similar to marking the chorus of a song).
  • after step S140, the method further includes:
  • the text emotion recognition result corresponding to the recognition result is added as a user emotion tag to the user portrait corresponding to the recognition result to obtain the user portrait after fusion.
  • after the text emotion recognition result corresponding to a user's to-be-recognized voice and the user portrait are obtained, the text emotion recognition result can also be added as a user emotion tag to the user portrait corresponding to the recognition result, forming a fused user portrait with user emotion tag data.
  • when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to input the keyword corresponding to the text emotion recognition result to retrieve user portraits of this type, together with the to-be-recognized voice and the recognition result corresponding to each user portrait.
  • This method performs speech recognition on the to-be-recognized speech exchanged between the agent and the user, then carries out text emotion recognition and user portrait drawing, effectively classifying customers of various types and facilitating spot checks by the quality inspection post, which improves quality inspection efficiency.
  • the embodiment of the present application also provides a voice-based user classification device, which is used to execute any embodiment of the aforementioned voice-based user classification method.
  • FIG. 7 is a schematic block diagram of a voice-based user classification apparatus provided in an embodiment of the present application.
  • the voice-based user classification device 100 can be configured in a server.
  • the voice-based user classification device 100 includes a voice recognition unit 110, a keyword extraction unit 120, an emotion recognition unit 130, and a user portrait drawing unit 140.
  • the voice recognition unit 110 is configured to receive the voice to be recognized, and to recognize the voice to be recognized through the N-gram model to obtain a recognition result.
  • the server can train on a received training set corpus to obtain the N-gram model, and use the N-gram model to recognize the to-be-recognized speech collected at the agent end and uploaded to the server, so as to obtain the recognition result.
  • the voice-based user classification apparatus 100 further includes:
  • the model training unit is configured to receive a training set corpus, and input the training set corpus to the initial N-gram model for training to obtain an N-gram model; wherein the N-gram model is an N-ary language model.
  • the N-gram model is a language model (Language Model, LM).
  • the language model is a probability-based discriminative model: its input is a sentence (an ordered sequence of words), and its output is the probability of that sentence, that is, the joint probability of the words.
  • according to the set model fusion ratio, for example with the ratio of product corpus to general corpus set to 2:8, the model fusion ratio of the first N-gram model to the second N-gram model is also 2:8;
  • the first N-gram model and the second N-gram model are fused, and finally an N-gram model for speech recognition is obtained.
  • the keyword extraction unit 120 is configured to perform keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result.
  • the keyword extraction unit 120 includes:
  • the word segmentation unit 121 is configured to perform word segmentation on the recognition result through a word segmentation model based on probability statistics to obtain a corresponding word segmentation result;
  • the target extraction unit 122 is configured to extract keyword information located before the preset first ranking value in the word segmentation result through the word frequency-inverse text frequency index model, as a keyword set corresponding to the recognition result.
  • the word frequency-inverse text frequency index model (that is, the TF-IDF model; TF-IDF is short for Term Frequency-Inverse Document Frequency) is used to extract the keyword information ranked before the preset first ranking value in the word segmentation result, as the keyword set.
  • the emotion recognition unit 130 is configured to obtain a semantic vector of the keyword set, and use the semantic vector as an input of a text emotion classifier to obtain a text emotion recognition result.
  • the emotion recognition unit 130 includes:
  • the target word vector obtaining unit 131 is configured to obtain the target word vector corresponding to each keyword information in the keyword set;
  • the semantic vector obtaining unit 132 is configured to obtain a semantic vector corresponding to the keyword set according to each target word vector in the keyword set and the weight corresponding to each target word vector.
  • the target word vector corresponding to each keyword in the keyword set can be obtained correspondingly.
  • the word vector corresponding to the keyword information is obtained based on the pre-built vocabulary list query.
  • the word vector acquisition process is called word2vec, and its function is to convert the words in the natural language into dense vectors that the computer can understand.
  • for example, in the corpus (that is, the vocabulary), AA, BB, CC, and DD (where each of AA, BB, CC, and DD represents a Chinese word) each correspond to a vector in which only one value is 1 and the rest are 0;
  • the words are first converted into discrete individual symbols through a One-Hot Encoder, and then converted through Word2Vec dimensionality reduction into low-dimensional continuous values, that is, dense vectors, in which words with similar meanings are mapped to similar positions in the vector space.
  • when the semantic vector corresponding to the keyword set is obtained, it can be input to a traditional classifier to obtain a text emotion recognition result.
  • the text emotion classifier can be a traditional classifier (an SVM or a Bayesian classifier), and the text emotion recognition result is obtained through the traditional classifier.
  • after the recognition result is extracted from the customer's to-be-recognized voice and text emotion recognition is performed, the customer's acceptance, pleasure, or irritation on hearing the promotional marketing information can be analyzed. For example, when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to enter the keyword corresponding to the text emotion recognition result to listen to this type of recording.
  • the user portrait drawing unit 140 is used to obtain the noun keywords in the keyword set, and convert the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library, so as to obtain the user portrait corresponding to the recognition result.
  • the user portrait drawing unit 140 includes:
  • the strategy obtaining unit 141 is configured to obtain, from the tag library, the tag conversion strategy corresponding to each of the noun keywords in the keyword set;
  • the tag conversion unit 142 is configured to convert each keyword into a tag according to the tag conversion strategy corresponding to each keyword;
  • the portrait drawing unit 143 is configured to form a user portrait corresponding to the recognition result from tags corresponding to each keyword.
  • transforming qualitative information into quantitative classifications is an important part of building the user portrait and places high demands on the business scenario; its main purpose is to help companies simplify complex data, qualitatively categorize transaction data, and process the data commercially in line with business-analysis requirements.
  • when setting a tag conversion strategy, customers can be divided by age range into life stages such as student, youth, young middle-aged, middle-aged, middle-aged to elderly, and elderly. Since financial service needs differ across life stages, target customers can be located by life stage. Companies can use customers' income, education, assets, and the like to divide them into low-end, mid-range, and high-end customers, and provide different financial services according to their financial service needs. Their financial consumption records and asset information, as well as traded and purchased products, can be consulted to qualitatively describe customer behavior characteristics and distinguish e-commerce customers, wealth-management customers, insurance customers, conservative investors, aggressive investors, and so on.
  • once the noun keywords in the keyword set are obtained, the user portrait corresponding to the recognition result can be drawn based on these keywords.
  • the customer intention model can be analyzed and learned, so that the agent can push more accurate information to the user based on the user portrait.
  • the voice-based user classification apparatus 100 further includes:
  • the key point marking unit 150 is used to obtain the keyword with the maximum word frequency-inverse text frequency index in the keyword set as a target keyword, and to locate the time point of the target keyword in the recognition result and perform keyword marking.
  • the keyword with the maximum word frequency-inverse text frequency index in the keyword set may be obtained as the target keyword, and then the time point of the target keyword in the voice to be recognized is located and marked (similar to marking the chorus of a song).
  • the voice-based user classification apparatus 100 further includes:
  • the emotion tag fusion unit is used to add the text emotion recognition result corresponding to the recognition result as the user emotion tag to the user portrait corresponding to the recognition result to obtain the user portrait after fusion.
  • after the text emotion recognition result corresponding to a user's to-be-recognized voice and the user portrait are obtained, the text emotion recognition result can also be added as a user emotion tag to the user portrait corresponding to the recognition result, forming a fused user portrait with user emotion tag data.
  • when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to input the keyword corresponding to the text emotion recognition result to retrieve user portraits of this type, together with the to-be-recognized voice and the recognition result corresponding to each user portrait.
  • the device performs speech recognition on the to-be-recognized speech exchanged between the agent and the user, then carries out text emotion recognition and user portrait drawing, effectively classifying customers of various types and facilitating spot checks by the quality inspection post, which improves quality inspection efficiency.
  • the above voice-based user classification device can be implemented in the form of a computer program, which can run on a computer device as shown in FIG. 12.
  • FIG. 12 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • the processor 502 can execute a voice-based user classification method.
  • the processor 502 is used to provide computing and control capabilities, and support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503.
  • the processor 502 can execute a voice-based user classification method.
  • the network interface 505 is used for network communication, such as providing data information transmission.
  • the structure shown in FIG. 12 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the processor 502 is configured to run a computer program 5032 stored in a memory to implement the voice-based user classification method in the embodiment of the present application.
  • the embodiment of the computer device shown in FIG. 12 does not constitute a limitation on the specific configuration of the computer device; in other embodiments, the computer device may include more or fewer components than those shown in the figure, or combine certain components, or have a different component arrangement.
  • the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 12, and will not be repeated here.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, where the computer program is executed by a processor to implement the voice-based user classification method in the embodiment of the present application.
  • the storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or any other physical storage medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a voice-based user classification method, apparatus, computer device, and storage medium. The method includes: receiving a voice to be recognized, and recognizing the voice to be recognized through an N-gram model to obtain a recognition result; performing keyword extraction on the recognition result to obtain a keyword set corresponding to the recognition result; obtaining a semantic vector of the keyword set, and using the semantic vector as the input of a text emotion classifier to obtain a text emotion recognition result; and converting noun keywords into corresponding tags according to a tag conversion strategy corresponding to a preset tag library, so as to obtain a user portrait corresponding to the recognition result.

Description

Voice-based user classification method, apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application No. 201910492604.5, filed with the Chinese Patent Office on June 6, 2019 and entitled "Voice-based user classification method, apparatus, computer device and storage medium", the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the technical field of speech recognition, and in particular to a voice-based user classification method, apparatus, computer device, and storage medium.
Background
At present, telemarketing is widely used in business promotion. However, after an agent calls a customer to communicate, the quality inspection post can only listen to the call recordings one by one during quality inspection; the recordings cannot be converted into text, so the communication effect between the agent and the customer cannot be known in real time, nor can the recordings be classified by emotion or have key voice nodes marked, which reduces the efficiency of voice quality inspection.
Summary
The embodiments of this application provide a voice-based user classification method, apparatus, computer device, and storage medium, aiming to solve the prior-art problem that, when performing quality inspection on the voice between an agent and a customer, the quality inspection post can only listen to the recordings one by one, the recordings cannot be converted into text, and the communication effect between the agent and the customer cannot be known in real time, which reduces the efficiency of voice quality inspection.
In a first aspect, an embodiment of this application provides a voice-based user classification method, which includes:
receiving a voice to be recognized, and recognizing the voice to be recognized through an N-gram model to obtain a recognition result;
performing keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result;
obtaining a semantic vector of the keyword set, and using the semantic vector as the input of a text emotion classifier to obtain a text emotion recognition result; and
obtaining the noun keywords in the keyword set, and converting the noun keywords into corresponding tags according to the tag conversion strategy corresponding to a preset tag library, so as to obtain a user portrait corresponding to the recognition result.
In a second aspect, an embodiment of this application provides a voice-based user classification apparatus, which includes:
a voice recognition unit, configured to receive a voice to be recognized, and recognize the voice to be recognized through an N-gram model to obtain a recognition result;
a keyword extraction unit, configured to perform keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result;
an emotion recognition unit, configured to obtain a semantic vector of the keyword set, and use the semantic vector as the input of a text emotion classifier to obtain a text emotion recognition result; and
a user portrait drawing unit, configured to obtain the noun keywords in the keyword set, and convert the noun keywords into corresponding tags according to the tag conversion strategy corresponding to a preset tag library, so as to obtain a user portrait corresponding to the recognition result.
In a third aspect, an embodiment of this application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the voice-based user classification method described in the first aspect.
In a fourth aspect, an embodiment of this application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to execute the voice-based user classification method described in the first aspect.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application scenario of a voice-based user classification method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a voice-based user classification method provided by an embodiment of this application;
FIG. 3 is another schematic flowchart of a voice-based user classification method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a sub-flow of a voice-based user classification method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of another sub-flow of a voice-based user classification method provided by an embodiment of this application;
FIG. 6 is a schematic diagram of another sub-flow of a voice-based user classification method provided by an embodiment of this application;
FIG. 7 is a schematic block diagram of a voice-based user classification apparatus provided by an embodiment of this application;
FIG. 8 is another schematic block diagram of a voice-based user classification apparatus provided by an embodiment of this application;
FIG. 9 is a schematic block diagram of subunits of a voice-based user classification apparatus provided by an embodiment of this application;
FIG. 10 is a schematic block diagram of another subunit of a voice-based user classification apparatus provided by an embodiment of this application;
FIG. 11 is a schematic block diagram of another subunit of a voice-based user classification apparatus provided by an embodiment of this application;
FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of this application. Obviously, the described embodiments are some rather than all of the embodiments of this application. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Please refer to FIG. 1 and FIG. 2. FIG. 1 is a schematic diagram of an application scenario of a voice-based user classification method provided by an embodiment of this application, and FIG. 2 is a schematic flowchart of the voice-based user classification method provided by an embodiment of this application. The voice-based user classification method is applied to a server and is executed by application software installed in the server.
As shown in FIG. 2, the method includes steps S110 to S140.
S110: Receive a voice to be recognized, and recognize the voice to be recognized through an N-gram model to obtain a recognition result.
In this embodiment, the technical solution is described from the perspective of the server. The server can train on a received training set corpus to obtain the N-gram model, and use the N-gram model to recognize the to-be-recognized voice collected at the agent end and uploaded to the server, so as to obtain the recognition result.
In an embodiment, before step S110, the method further includes:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model, where the N-gram model is an N-ary language model.
In this embodiment, the N-gram model is a language model (Language Model, LM). A language model is a probability-based discriminative model: its input is a sentence (an ordered sequence of words), and its output is the probability of that sentence, that is, the joint probability of the words.
Assuming that a sentence T is composed of the word sequence w_1, w_2, w_3, ..., w_n, the N-gram language model is expressed as follows:
P(T) = P(w_1 w_2 w_3 ... w_n) = p(w_1) * p(w_2|w_1) * p(w_3|w_1 w_2) * ... * p(w_n|w_1 w_2 ... w_{n-1})
The commonly used N-gram models are the Bi-gram and the Tri-gram, expressed respectively as follows:
Bi-gram: P(T) = p(w_1|begin) * p(w_2|w_1) * p(w_3|w_2) * ... * p(w_n|w_{n-1})
Tri-gram: P(T) = p(w_1|begin_1, begin_2) * p(w_2|w_1, begin_1) * p(w_3|w_2 w_1) * ... * p(w_n|w_{n-1}, w_{n-2})
It can be seen that the conditional probability of each word in the sentence T can be obtained by counting occurrences in the corpus. The n-ary model is then:
p(w_i|w_{i-n+1}, ..., w_{i-1}) = C(w_{i-n+1}, ..., w_i) / C(w_{i-n+1}, ..., w_{i-1})
where C(w_{i-n+1}, ..., w_i) denotes the number of times the string w_{i-n+1}, ..., w_i occurs in the corpus.
According to the set model fusion ratio, for example with the ratio of product corpus to general corpus set to 2:8, the model fusion ratio of the first N-gram model to the second N-gram model is also 2:8; the first N-gram model and the second N-gram model are fused to finally obtain the N-gram model used for speech recognition.
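The patent gives only the counting formula and the 2:8 fusion ratio, not an implementation. The following Python sketch illustrates the bigram count estimate above and reads the fusion as linear interpolation of a product-corpus model and a general-corpus model (the function names, the begin-padding token, and the interpolation reading are assumptions, not part of the patent):

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams so that p(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<begin>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def sentence_prob(model, words):
    """P(T) = p(w_1|begin) * p(w_2|w_1) * ... * p(w_n|w_{n-1})."""
    unigrams, bigrams = model
    prob, prev = 1.0, "<begin>"
    for w in words:
        prob *= bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
        prev = w
    return prob

def fused_prob(product_model, general_model, words):
    """Hypothetical 2:8 fusion of the first (product-corpus) and second
    (general-corpus) N-gram models as linear interpolation."""
    return 0.2 * sentence_prob(product_model, words) + 0.8 * sentence_prob(general_model, words)
```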
S120: Perform keyword extraction on the recognition result through the word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result.
In an embodiment, as shown in FIG. 4, step S120 includes:
S121: Segmenting the recognition result through a word segmentation model based on probability statistics to obtain a corresponding word segmentation result;
S122: Extracting, through the word frequency-inverse text frequency index model, the keyword information ranked before a preset first ranking value in the word segmentation result, as the keyword set corresponding to the recognition result.
In this embodiment, the process of segmenting the recognition result through the word segmentation model based on probability statistics is as follows:
For example, let C = C1 C2 ... Cm, where C is the Chinese character string to be segmented; let W = W1 W2 ... Wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentation schemes of C. Then the word segmentation model based on probability statistics is the model that can find the target word string W such that W satisfies P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)); the word string W obtained by the above model is the word string with the maximum estimated probability. That is:
For a substring S to be segmented, take out all candidate words w1, w2, ..., wi, ..., wn in left-to-right order; look up the probability value P(wi) of each candidate word in the dictionary, and record all left-neighbor words of each candidate word; calculate the cumulative probability of each candidate word, and at the same time compare to obtain the best left-neighbor word of each candidate word; if the current word wn is the tail word of the string S and its cumulative probability P(wn) is the largest, then wn is the end word of S; starting from wn, output the best left-neighbor word of each word in right-to-left order, which is the segmentation result of S.
After the word segmentation result corresponding to the recognition result is obtained, the word frequency-inverse text frequency index model (i.e., the TF-IDF model; TF-IDF is short for Term Frequency-Inverse Document Frequency) is used to extract the keyword information ranked before the preset first ranking value in the word segmentation result as the keyword set. Extracting the keyword information ranked before the preset ranking value through the TF-IDF model proceeds as follows:
1) Calculate the word frequency of each word i in the word segmentation result, denoted TF_i;
2) Calculate the inverse document frequency IDF_i of each word i in the word segmentation result.
When calculating the inverse document frequency IDF_i of each word i, a corpus (similar to the dictionary used in word segmentation) is needed to simulate the language environment:
IDF_i = lg[ total number of documents in the corpus / (number of documents containing the word + 1) ]
The more common a word is, the larger the denominator and the smaller the inverse document frequency (the closer it is to 0). The reason for adding 1 to the denominator is to avoid a denominator of 0 (that is, the case where no document contains the word).
3) Calculate the word frequency-inverse text frequency index TF-IDF_i of each word i in the word segmentation result as TF_i * IDF_i.
Obviously, TF-IDF is directly proportional to the number of occurrences of a word in the document and inversely proportional to the number of occurrences of the word in the entire language. Automatic keyword extraction therefore amounts to calculating the TF-IDF value of each word of a document, sorting in descending order, and taking the top N words as the document's keyword list.
4) Sort the word frequency-inverse text frequency index of each word in the word segmentation result in descending order, and take the words ranked before the preset first ranking value (for example, a preset first ranking value of 6) to form the keyword set corresponding to the recognition result.
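Steps 1) to 4) can be sketched as follows (a minimal illustration; the function name, the normalization of TF by document length, and the toy corpus interface are assumptions rather than part of the patent):

```python
import math
from collections import Counter

def top_keywords(doc_words, corpus_docs, first_rank=6):
    """Rank the words of one segmented document by TF-IDF and keep the
    words ranked before the preset first ranking value (here 6)."""
    tf = Counter(doc_words)                      # step 1: TF_i as raw counts
    n_docs = len(corpus_docs)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus_docs if word in d)
        idf = math.log10(n_docs / (df + 1))      # step 2: IDF_i = lg[N / (docs with word + 1)]
        scores[word] = (count / len(doc_words)) * idf   # step 3: TF_i * IDF_i
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:first_rank]                   # step 4: keep top-ranked words
```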
S130: Obtain the semantic vector of the keyword set, and use the semantic vector as the input of the text emotion classifier to obtain a text emotion recognition result.
In an embodiment, as shown in FIG. 5, step S130 includes:
S131: Obtaining the target word vector corresponding to each piece of keyword information in the keyword set;
S132: Obtaining the semantic vector corresponding to the keyword set according to each target word vector in the keyword set and the weight corresponding to each target word vector.
In this embodiment, after the keyword set corresponding to the recognition result is obtained, the target word vector corresponding to each keyword in the keyword set can be obtained accordingly. The word vector corresponding to the keyword information is obtained by querying a pre-built vocabulary; the word-vector acquisition process is called word2vec, whose function is to convert words in natural language into dense vectors that a computer can understand. For example, in the corpus (that is, the vocabulary), AA, BB, CC, and DD (where each of AA, BB, CC, and DD represents a Chinese word) each correspond to a vector in which only one value is 1 and the rest are 0. That is, the words are first converted into discrete individual symbols through a One-Hot Encoder, and then converted through Word2Vec dimensionality reduction into low-dimensional continuous values, i.e., dense vectors, in which words with similar meanings are mapped to similar positions in the vector space.
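A sketch of S131 and S132, assuming (the patent does not specify this) that the semantic vector is the weighted sum of the keyword vectors with weights normalized to sum to 1; the embedding table and weights are illustrative:

```python
import numpy as np

# Hypothetical pre-built vocabulary: keyword -> dense word2vec-style vector.
EMBEDDINGS = {"理财": np.array([0.2, 0.7, 0.1]),
              "保险": np.array([0.6, 0.1, 0.3])}

def semantic_vector(keywords, weights):
    """S131: look up each keyword's target word vector.
    S132: combine them into one semantic vector using per-keyword weights
    (here assumed to be the TF-IDF scores, normalized to sum to 1)."""
    total = sum(weights[k] for k in keywords)
    return sum((weights[k] / total) * EMBEDDINGS[k] for k in keywords)

vec = semantic_vector(["理财", "保险"], {"理财": 0.8, "保险": 0.2})
```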
When the semantic vector corresponding to the keyword set is obtained, it can be input into a traditional classifier to obtain the text emotion recognition result.
The text emotion classifier can be a traditional classifier (an SVM or a Bayesian classifier), through which the text emotion recognition result is obtained.
SVM (Support Vector Machine) refers to the support vector machine, a common discrimination method. In the field of machine learning, it is a supervised learning model, usually used for pattern recognition, classification, and regression analysis.
The Bayesian classifier is, among various classifiers, the one with the smallest probability of classification error, or the smallest average risk when costs are given in advance. Its design method is one of the most basic statistical classification methods. Its classification principle is to compute an object's posterior probability from its prior probability using the Bayes formula, that is, the probability that the object belongs to a certain class, and to select the class with the largest posterior probability as the class to which the object belongs.
After the recognition result is extracted from the customer's to-be-recognized voice and text emotion recognition is performed, the customer's degree of acceptance, pleasure, or irritation on hearing the promotional marketing information can be analyzed. For example, when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to enter the keyword corresponding to the text emotion recognition result to listen to recordings of this type.
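A minimal sketch of training such a traditional classifier on semantic vectors with scikit-learn; the training vectors, the emotion labels, and the choice of a linear kernel are assumptions made for illustration:

```python
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Hypothetical training data: semantic vectors paired with emotion labels.
X = [[0.2, 0.7, 0.1], [0.6, 0.1, 0.3], [0.1, 0.2, 0.9]]
y = ["pleased", "accepting", "irritable"]

svm_clf = SVC(kernel="linear").fit(X, y)     # SVM variant of the classifier
nb_clf = GaussianNB().fit(X, y)              # Bayesian variant of the classifier

print(svm_clf.predict([[0.15, 0.65, 0.2]]))  # e.g. ['pleased']
```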
S140: Obtain the noun keywords in the keyword set, and convert the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library, so as to obtain the user portrait corresponding to the recognition result.
In an embodiment, as shown in FIG. 6, step S140 includes:
S141: Obtaining, from the tag library, the tag conversion strategy corresponding to each of the noun keywords in the keyword set;
S142: Converting each keyword into a tag according to the tag conversion strategy corresponding to that keyword;
S143: Composing the user portrait corresponding to the recognition result from the tags corresponding to the keywords.
In this embodiment, transforming qualitative information into quantitative classifications is an important part of building the user portrait and places high demands on the business scenario. Its main purpose is to help companies simplify complex data, qualitatively categorize transaction data, and process the data commercially in line with business-analysis requirements.
For example, when setting the tag conversion strategy, customers can be divided by age range into life stages such as student, youth, young middle-aged, middle-aged, middle-aged to elderly, and elderly. Since financial service needs differ across life stages, target customers can be located by life stage. Companies can use customers' income, education, assets, and the like to divide them into low-end, mid-range, and high-end customers, and provide different financial services according to their financial service needs. Their financial consumption records and asset information, as well as traded and purchased products, can be consulted to qualitatively describe customer behavior characteristics and distinguish e-commerce customers, wealth-management customers, insurance customers, conservative investors, aggressive investors, and so on.
Once the noun keywords in the keyword set are obtained, the user portrait corresponding to the recognition result can be drawn based on these keywords. Once the user portrait is known, the customer intention model can be derived from it, making it easier for the agent to push more accurate information to the user based on the user portrait.
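A toy sketch of S141 to S143 with a rule-based tag library keyed on noun keywords; the library contents, the strategy functions, and the dictionary-shaped portrait are invented for illustration:

```python
# Hypothetical tag library: noun keyword -> tag conversion strategy.
TAG_LIBRARY = {
    "大学生": lambda kw: ("life_stage", "student"),
    "退休":   lambda kw: ("life_stage", "elderly"),
    "基金":   lambda kw: ("customer_type", "wealth-management customer"),
    "保险":   lambda kw: ("customer_type", "insurance customer"),
}

def build_user_portrait(noun_keywords):
    """S141: look up each keyword's strategy; S142: convert it to a tag;
    S143: compose the tags into the user portrait (here a simple dict)."""
    portrait = {}
    for kw in noun_keywords:
        strategy = TAG_LIBRARY.get(kw)
        if strategy:
            field, tag = strategy(kw)
            portrait.setdefault(field, []).append(tag)
    return portrait

print(build_user_portrait(["大学生", "基金"]))
# {'life_stage': ['student'], 'customer_type': ['wealth-management customer']}
```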
In an embodiment, as shown in FIG. 3, after step S140 the method further includes:
S150: Obtaining the keyword with the maximum word frequency-inverse text frequency index in the keyword set as a target keyword, and locating the time point of the target keyword in the recognition result and performing keyword marking.
In this embodiment, in order to mark keywords in each segment of to-be-recognized voice, the keyword with the maximum word frequency-inverse text frequency index in the keyword set can first be obtained as the target keyword, and then the time point of the target keyword in the to-be-recognized voice is located and marked (similar to marking the chorus of a song). In this way, quality inspectors know clearly which key parts to listen to, saving time and removing the need to listen from beginning to end, which improves quality inspection efficiency.
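A sketch of S150, assuming the speech recognizer returns word-level timestamps alongside the transcript (an assumption: the patent does not specify the recognizer's output format, and all names are illustrative):

```python
def mark_target_keyword(keyword_scores, word_timestamps):
    """Pick the keyword with the maximum TF-IDF score and locate the time
    point(s) where it occurs in the to-be-recognized voice."""
    target = max(keyword_scores, key=keyword_scores.get)
    marks = [t for word, t in word_timestamps if word == target]
    return target, marks  # jump points for quality-inspection playback

# Hypothetical recognizer output: (word, start time in seconds) pairs.
timestamps = [("你好", 0.5), ("理财", 12.4), ("产品", 13.1), ("理财", 87.9)]
print(mark_target_keyword({"理财": 0.31, "产品": 0.12}, timestamps))
# ('理财', [12.4, 87.9])
```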
In an embodiment, after step S140 the method further includes:
adding the text emotion recognition result corresponding to the recognition result, as a user emotion tag, to the user portrait corresponding to the recognition result, to obtain a fused user portrait.
In this embodiment, after the text emotion recognition result corresponding to a user's to-be-recognized voice and the user portrait are obtained, the text emotion recognition result can also be added as a user emotion tag to the user portrait corresponding to the recognition result, forming a fused user portrait with user emotion tag data. For example, when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to input the keyword corresponding to the text emotion recognition result to retrieve user portraits of this type, together with the to-be-recognized voice and recognition result corresponding to each user portrait.
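The fusion step itself can be sketched in a few lines, assuming the dictionary-shaped portrait of the earlier sketch (the field name is hypothetical):

```python
def fuse_emotion_tag(portrait, emotion_result):
    """Add the text emotion recognition result to the user portrait as a
    user emotion tag, yielding the fused user portrait."""
    fused = dict(portrait)
    fused["emotion_tag"] = emotion_result  # e.g. "irritable"
    return fused
```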
This method performs speech recognition on the to-be-recognized speech exchanged between the agent and the user, then carries out text emotion recognition and user portrait drawing, effectively classifying customers of various types and facilitating spot checks by the quality inspection post, which improves quality inspection efficiency.
An embodiment of this application further provides a voice-based user classification apparatus, which is configured to execute any embodiment of the aforementioned voice-based user classification method. Specifically, please refer to FIG. 7, which is a schematic block diagram of the voice-based user classification apparatus provided by an embodiment of this application. The voice-based user classification apparatus 100 can be configured in a server.
As shown in FIG. 7, the voice-based user classification apparatus 100 includes a voice recognition unit 110, a keyword extraction unit 120, an emotion recognition unit 130, and a user portrait drawing unit 140.
The voice recognition unit 110 is configured to receive the voice to be recognized, and recognize the voice to be recognized through the N-gram model to obtain a recognition result.
In this embodiment, the technical solution is described from the perspective of the server. The server can train on a received training set corpus to obtain the N-gram model, and use the N-gram model to recognize the to-be-recognized voice collected at the agent end and uploaded to the server, so as to obtain the recognition result.
In an embodiment, the voice-based user classification apparatus 100 further includes:
a model training unit, configured to receive a training set corpus, and input the training set corpus into an initial N-gram model for training to obtain the N-gram model, where the N-gram model is an N-ary language model.
In this embodiment, the N-gram model is a language model (Language Model, LM). A language model is a probability-based discriminative model: its input is a sentence (an ordered sequence of words), and its output is the probability of that sentence, that is, the joint probability of the words.
According to the set model fusion ratio, for example with the ratio of product corpus to general corpus set to 2:8, the model fusion ratio of the first N-gram model to the second N-gram model is also 2:8; the first N-gram model and the second N-gram model are fused to finally obtain the N-gram model used for speech recognition.
The keyword extraction unit 120 is configured to perform keyword extraction on the recognition result through the word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result.
In an embodiment, as shown in FIG. 9, the keyword extraction unit 120 includes:
a word segmentation unit 121, configured to segment the recognition result through a word segmentation model based on probability statistics to obtain a corresponding word segmentation result;
a target extraction unit 122, configured to extract, through the word frequency-inverse text frequency index model, the keyword information ranked before the preset first ranking value in the word segmentation result, as the keyword set corresponding to the recognition result.
In this embodiment, after the word segmentation result corresponding to the recognition result is obtained, the word frequency-inverse text frequency index model (i.e., the TF-IDF model; TF-IDF is short for Term Frequency-Inverse Document Frequency) is used to extract the keyword information ranked before the preset first ranking value in the word segmentation result as the keyword set.
The emotion recognition unit 130 is configured to obtain the semantic vector of the keyword set, and use the semantic vector as the input of the text emotion classifier to obtain a text emotion recognition result.
In an embodiment, as shown in FIG. 10, the emotion recognition unit 130 includes:
a target word vector obtaining unit 131, configured to obtain the target word vector corresponding to each piece of keyword information in the keyword set;
a semantic vector obtaining unit 132, configured to obtain the semantic vector corresponding to the keyword set according to each target word vector in the keyword set and the weight corresponding to each target word vector.
In this embodiment, after the keyword set corresponding to the recognition result is obtained, the target word vector corresponding to each keyword in the keyword set can be obtained accordingly. The word vector corresponding to the keyword information is obtained by querying a pre-built vocabulary; the word-vector acquisition process is called word2vec, whose function is to convert words in natural language into dense vectors that a computer can understand. For example, in the corpus (that is, the vocabulary), AA, BB, CC, and DD (where each of AA, BB, CC, and DD represents a Chinese word) each correspond to a vector in which only one value is 1 and the rest are 0. That is, the words are first converted into discrete individual symbols through a One-Hot Encoder, and then converted through Word2Vec dimensionality reduction into low-dimensional continuous values, i.e., dense vectors, in which words with similar meanings are mapped to similar positions in the vector space.
When the semantic vector corresponding to the keyword set is obtained, it can be input into a traditional classifier to obtain the text emotion recognition result.
The text emotion classifier can be a traditional classifier (an SVM or a Bayesian classifier), through which the text emotion recognition result is obtained.
After the recognition result is extracted from the customer's to-be-recognized voice and text emotion recognition is performed, the customer's degree of acceptance, pleasure, or irritation on hearing the promotional marketing information can be analyzed. For example, when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to enter the keyword corresponding to the text emotion recognition result to listen to recordings of this type.
The user portrait drawing unit 140 is configured to obtain the noun keywords in the keyword set, and convert the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library, so as to obtain the user portrait corresponding to the recognition result.
In an embodiment, as shown in FIG. 11, the user portrait drawing unit 140 includes:
a strategy obtaining unit 141, configured to obtain, from the tag library, the tag conversion strategy corresponding to each of the noun keywords in the keyword set;
a tag conversion unit 142, configured to convert each keyword into a tag according to the tag conversion strategy corresponding to that keyword;
a portrait drawing unit 143, configured to compose the user portrait corresponding to the recognition result from the tags corresponding to the keywords.
In this embodiment, transforming qualitative information into quantitative classifications is an important part of building the user portrait and places high demands on the business scenario. Its main purpose is to help companies simplify complex data, qualitatively categorize transaction data, and process the data commercially in line with business-analysis requirements.
For example, when setting the tag conversion strategy, customers can be divided by age range into life stages such as student, youth, young middle-aged, middle-aged, middle-aged to elderly, and elderly. Since financial service needs differ across life stages, target customers can be located by life stage. Companies can use customers' income, education, assets, and the like to divide them into low-end, mid-range, and high-end customers, and provide different financial services according to their financial service needs. Their financial consumption records and asset information, as well as traded and purchased products, can be consulted to qualitatively describe customer behavior characteristics and distinguish e-commerce customers, wealth-management customers, insurance customers, conservative investors, aggressive investors, and so on.
Once the noun keywords in the keyword set are obtained, the user portrait corresponding to the recognition result can be drawn based on these keywords. Once the user portrait is known, the customer intention model can be derived from it, making it easier for the agent to push more accurate information to the user based on the user portrait.
In an embodiment, as shown in FIG. 8, the voice-based user classification apparatus 100 further includes:
a key point marking unit 150, configured to obtain the keyword with the maximum word frequency-inverse text frequency index in the keyword set as a target keyword, and locate the time point of the target keyword in the recognition result and perform keyword marking.
In this embodiment, in order to mark keywords in each segment of to-be-recognized voice, the keyword with the maximum word frequency-inverse text frequency index in the keyword set can first be obtained as the target keyword, and then the time point of the target keyword in the to-be-recognized voice is located and marked (similar to marking the chorus of a song). In this way, quality inspectors know clearly which key parts to listen to, saving time and removing the need to listen from beginning to end, which improves quality inspection efficiency.
In an embodiment, the voice-based user classification apparatus 100 further includes:
an emotion tag fusion unit, configured to add the text emotion recognition result corresponding to the recognition result, as a user emotion tag, to the user portrait corresponding to the recognition result, to obtain a fused user portrait.
In this embodiment, after the text emotion recognition result corresponding to a user's to-be-recognized voice and the user portrait are obtained, the text emotion recognition result can also be added as a user emotion tag to the user portrait corresponding to the recognition result, forming a fused user portrait with user emotion tag data. For example, when the quality inspection post spot-checks recordings with an irritable emotion, it only needs to input the keyword corresponding to the text emotion recognition result to retrieve user portraits of this type, together with the to-be-recognized voice and recognition result corresponding to each user portrait.
This apparatus performs speech recognition on the to-be-recognized speech exchanged between the agent and the user, then carries out text emotion recognition and user portrait drawing, effectively classifying customers of various types and facilitating spot checks by the quality inspection post, which improves quality inspection efficiency.
The above voice-based user classification apparatus can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 12.
Please refer to FIG. 12, which is a schematic block diagram of a computer device provided by an embodiment of this application. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
Referring to FIG. 12, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to execute the voice-based user classification method.
The processor 502 is configured to provide computing and control capabilities and support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can execute the voice-based user classification method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art can understand that the structure shown in FIG. 12 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device 500 to which the solution of this application is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the voice-based user classification method of the embodiments of this application.
Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 12 does not constitute a limitation on the specific configuration of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or have a different component arrangement. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 12 and will not be repeated here.
It should be understood that, in the embodiments of this application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the voice-based user classification method of the embodiments of this application.
The storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or any other physical storage medium that can store program code.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and will not be repeated here.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily think of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A voice-based user classification method, comprising:
    receiving a voice to be recognized, and recognizing the voice to be recognized through an N-gram model to obtain a recognition result;
    performing keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result;
    obtaining a semantic vector of the keyword set, and using the semantic vector as an input of a text emotion classifier to obtain a text emotion recognition result; and
    converting noun keywords into corresponding tags according to a tag conversion strategy corresponding to a preset tag library, to obtain a user portrait corresponding to the recognition result.
  2. The voice-based user classification method according to claim 1, wherein before the receiving a voice to be recognized, recognizing the voice to be recognized through the N-gram model, and obtaining a recognition result, the method further comprises:
    receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model, wherein the N-gram model is an N-ary language model.
  3. The voice-based user classification method according to claim 1, wherein after the obtaining the noun keywords in the keyword set and converting the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library to obtain the user portrait corresponding to the recognition result, the method further comprises:
    obtaining the keyword with the maximum word frequency-inverse text frequency index in the keyword set as a target keyword, and locating the time point of the target keyword in the recognition result and performing keyword marking.
  4. The voice-based user classification method according to claim 1, wherein the performing keyword extraction on the recognition result through the word frequency-inverse text frequency index model to obtain the keyword set corresponding to the recognition result comprises:
    segmenting the recognition result through a word segmentation model based on probability statistics to obtain a corresponding word segmentation result;
    extracting, through the word frequency-inverse text frequency index model, the keyword information ranked before a preset first ranking value in the word segmentation result, as the keyword set corresponding to the recognition result.
  5. The voice-based user classification method according to claim 1, wherein the obtaining a semantic vector of the keyword set comprises:
    obtaining a target word vector corresponding to each piece of keyword information in the keyword set;
    obtaining the semantic vector corresponding to the keyword set according to each target word vector in the keyword set and a weight corresponding to each target word vector.
  6. The voice-based user classification method according to claim 1, wherein the obtaining the user portrait corresponding to the recognition result according to the noun keywords and the preset tag conversion strategy comprises:
    obtaining, from the tag library, the tag conversion strategy corresponding to each of the noun keywords in the keyword set;
    converting each keyword into a tag according to the tag conversion strategy corresponding to that keyword;
    composing the user portrait corresponding to the recognition result from the tags corresponding to the keywords.
  7. The voice-based user classification method according to claim 2, wherein the receiving a training set corpus and inputting the training set corpus into the initial N-gram model for training to obtain the N-gram model comprises:
    setting the ratio of product corpus to general corpus to 2:8, and inputting them separately into the initial N-gram model for training to obtain a first N-gram model and a second N-gram model;
    fusing the first N-gram model and the second N-gram model according to a model fusion ratio of 2:8 to obtain the N-gram model used for speech recognition.
  8. The voice-based user classification method according to claim 4, wherein the extracting, through the word frequency-inverse text frequency index model, the keyword information ranked before the preset first ranking value in the word segmentation result, as the keyword set corresponding to the recognition result, comprises:
    calculating the word frequency of each word i in the word segmentation result, denoted TF_i;
    calculating the inverse document frequency of each word i in the word segmentation result, denoted IDF_i;
    calculating, as TF_i * IDF_i, the word frequency-inverse text frequency index TF-IDF_i corresponding to each word i in the word segmentation result;
    sorting the word frequency-inverse text frequency index of each word in the word segmentation result in descending order, and taking the words ranked before the preset first ranking value to form the keyword set corresponding to the recognition result.
  9. The voice-based user classification method according to claim 4, wherein after the converting the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library to obtain the user portrait corresponding to the recognition result, the method further comprises:
    adding the text emotion recognition result corresponding to the recognition result, as a user emotion tag, to the user portrait corresponding to the recognition result, to obtain a fused user portrait.
  10. A voice-based user classification apparatus, comprising:
    a voice recognition unit, configured to receive a voice to be recognized, and recognize the voice to be recognized through an N-gram model to obtain a recognition result;
    a keyword extraction unit, configured to perform keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result;
    an emotion recognition unit, configured to obtain a semantic vector of the keyword set, and use the semantic vector as an input of a text emotion classifier to obtain a text emotion recognition result; and
    a user portrait drawing unit, configured to obtain the noun keywords in the keyword set, and convert the noun keywords into corresponding tags according to a tag conversion strategy corresponding to a preset tag library, to obtain a user portrait corresponding to the recognition result.
  11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    receiving a voice to be recognized, and recognizing the voice to be recognized through an N-gram model to obtain a recognition result;
    performing keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result;
    obtaining a semantic vector of the keyword set, and using the semantic vector as an input of a text emotion classifier to obtain a text emotion recognition result; and
    converting noun keywords into corresponding tags according to a tag conversion strategy corresponding to a preset tag library, to obtain a user portrait corresponding to the recognition result.
  12. The computer device according to claim 11, wherein before the receiving a voice to be recognized, recognizing the voice to be recognized through the N-gram model, and obtaining a recognition result, the following is further implemented:
    receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain the N-gram model, wherein the N-gram model is an N-ary language model.
  13. The computer device according to claim 11, wherein after the obtaining the noun keywords in the keyword set and converting the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library to obtain the user portrait corresponding to the recognition result, the following is further implemented:
    obtaining the keyword with the maximum word frequency-inverse text frequency index in the keyword set as a target keyword, and locating the time point of the target keyword in the recognition result and performing keyword marking.
  14. The computer device according to claim 11, wherein the performing keyword extraction on the recognition result through the word frequency-inverse text frequency index model to obtain the keyword set corresponding to the recognition result comprises:
    segmenting the recognition result through a word segmentation model based on probability statistics to obtain a corresponding word segmentation result;
    extracting, through the word frequency-inverse text frequency index model, the keyword information ranked before a preset first ranking value in the word segmentation result, as the keyword set corresponding to the recognition result.
  15. The computer device according to claim 11, wherein the obtaining a semantic vector of the keyword set comprises:
    obtaining a target word vector corresponding to each piece of keyword information in the keyword set;
    obtaining the semantic vector corresponding to the keyword set according to each target word vector in the keyword set and a weight corresponding to each target word vector.
  16. The computer device according to claim 11, wherein the obtaining the user portrait corresponding to the recognition result according to the noun keywords and the preset tag conversion strategy comprises:
    obtaining, from the tag library, the tag conversion strategy corresponding to each of the noun keywords in the keyword set;
    converting each keyword into a tag according to the tag conversion strategy corresponding to that keyword;
    composing the user portrait corresponding to the recognition result from the tags corresponding to the keywords.
  17. The computer device according to claim 12, wherein the receiving a training set corpus and inputting the training set corpus into the initial N-gram model for training to obtain the N-gram model comprises:
    setting the ratio of product corpus to general corpus to 2:8, and inputting them separately into the initial N-gram model for training to obtain a first N-gram model and a second N-gram model;
    fusing the first N-gram model and the second N-gram model according to a model fusion ratio of 2:8 to obtain the N-gram model used for speech recognition.
  18. The computer device according to claim 14, wherein the extracting, through the word frequency-inverse text frequency index model, the keyword information ranked before the preset first ranking value in the word segmentation result, as the keyword set corresponding to the recognition result, comprises:
    calculating the word frequency of each word i in the word segmentation result, denoted TF_i;
    calculating the inverse document frequency of each word i in the word segmentation result, denoted IDF_i;
    calculating, as TF_i * IDF_i, the word frequency-inverse text frequency index TF-IDF_i corresponding to each word i in the word segmentation result;
    sorting the word frequency-inverse text frequency index of each word in the word segmentation result in descending order, and taking the words ranked before the preset first ranking value to form the keyword set corresponding to the recognition result.
  19. The computer device according to claim 14, wherein after the converting the noun keywords into corresponding tags according to the tag conversion strategy corresponding to the preset tag library to obtain the user portrait corresponding to the recognition result, the following is further implemented:
    adding the text emotion recognition result corresponding to the recognition result, as a user emotion tag, to the user portrait corresponding to the recognition result, to obtain a fused user portrait.
  20. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the following operations:
    receiving a voice to be recognized, and recognizing the voice to be recognized through an N-gram model to obtain a recognition result;
    performing keyword extraction on the recognition result through a word frequency-inverse text frequency index model to obtain a keyword set corresponding to the recognition result;
    obtaining a semantic vector of the keyword set, and using the semantic vector as an input of a text emotion classifier to obtain a text emotion recognition result; and
    converting noun keywords into corresponding tags according to a tag conversion strategy corresponding to a preset tag library, to obtain a user portrait corresponding to the recognition result.
PCT/CN2019/103265 2019-06-06 2019-08-29 Voice-based user classification method, apparatus, computer device, and storage medium WO2020244073A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910492604.5A CN110347823A (zh) 2019-06-06 2019-06-06 Voice-based user classification method, apparatus, computer device and storage medium
CN201910492604.5 2019-06-06

Publications (1)

Publication Number Publication Date
WO2020244073A1 (zh)

Family

ID=68181606

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103265 WO2020244073A1 (zh) 2019-06-06 2019-08-29 Voice-based user classification method, apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110347823A (zh)
WO (1) WO2020244073A1 (zh)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046163A (zh) * 2019-11-15 2020-04-21 贝壳技术有限公司 未读消息的处理方法、装置、存储介质及设备
CN111061902B (zh) * 2019-12-12 2023-12-19 广东智媒云图科技股份有限公司 一种基于文本语义分析的绘图方法、装置及终端设备
CN111326142A (zh) * 2020-01-21 2020-06-23 青梧桐有限责任公司 基于语音转文本的文本信息提取方法、系统和电子设备
CN111326160A (zh) * 2020-03-11 2020-06-23 南京奥拓电子科技有限公司 一种纠正噪音文本的语音识别方法、系统及存储介质
CN111563190B (zh) * 2020-04-07 2023-03-14 中国电子科技集团公司第二十九研究所 一种区域网络用户行为的多维度分析与监管方法及系统
CN111695353B (zh) * 2020-06-12 2023-07-04 百度在线网络技术(北京)有限公司 时效性文本的识别方法、装置、设备及存储介质
CN111753802B (zh) * 2020-07-06 2024-06-21 北京猿力未来科技有限公司 识别方法及装置
CN112052375B (zh) * 2020-09-30 2024-06-11 北京百度网讯科技有限公司 舆情获取和词粘度模型训练方法及设备、服务器和介质
CN112329437B (zh) * 2020-10-21 2024-05-28 交通银行股份有限公司 一种智能客服语音质检评分方法、设备及存储介质
CN112507116B (zh) * 2020-12-16 2023-10-10 平安科技(深圳)有限公司 基于客户应答语料的客户画像方法及其相关设备
CN112487039B (zh) * 2020-12-16 2024-04-30 平安养老保险股份有限公司 一种数据处理方法、装置、设备及可读存储介质
CN112712407A (zh) * 2020-12-25 2021-04-27 云汉芯城(上海)互联网科技股份有限公司 一种新客引流的方法、装置、存储介质和设备
CN112579781B (zh) * 2020-12-28 2023-09-15 平安银行股份有限公司 文本归类方法、装置、电子设备及介质
CN112818118B (zh) * 2021-01-22 2024-05-21 大连民族大学 基于反向翻译的中文幽默分类模型的构建方法
CN112818009A (zh) * 2021-02-25 2021-05-18 华侨大学 一种在线展会的用户画像建模方法与系统
CN113139141B (zh) * 2021-04-22 2023-10-31 康键信息技术(深圳)有限公司 用户标签扩展标注方法、装置、设备及存储介质
CN113743721A (zh) * 2021-07-29 2021-12-03 深圳市东信时代信息技术有限公司 营销策略生成方法、装置、计算机设备及存储介质
CN114048283A (zh) * 2022-01-11 2022-02-15 北京仁科互动网络技术有限公司 用户画像生成方法、装置、电子设备及存储介质
CN114048714A (zh) * 2022-01-14 2022-02-15 阿里巴巴达摩院(杭州)科技有限公司 逆文本标准化方法和装置
CN116523545B (zh) * 2023-06-28 2023-09-15 大汉电子商务有限公司 基于大数据的用户画像构建方法

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102723078A (zh) * 2012-07-03 2012-10-10 武汉科技大学 基于自然言语理解的语音情感识别方法
KR20140002171A (ko) * 2012-06-28 2014-01-08 한국전자통신연구원 자동통역방법
US20140278408A1 (en) * 2013-03-15 2014-09-18 Lg Electronics Inc. Mobile terminal and method of controlling the mobile terminal
CN104090955A (zh) * 2014-07-07 2014-10-08 科大讯飞股份有限公司 一种音视频标签自动标注方法及系统
CN105335352A (zh) * 2015-11-30 2016-02-17 武汉大学 基于微博情感的实体识别方法
US20170018272A1 (en) * 2015-07-16 2017-01-19 Samsung Electronics Co., Ltd. Interest notification apparatus and method
CN109767791A (zh) * 2019-03-21 2019-05-17 中国—东盟信息港股份有限公司 一种针对呼叫中心通话的语音情绪识别及应用系统
CN109840323A (zh) * 2018-12-14 2019-06-04 深圳壹账通智能科技有限公司 保险产品的语音识别处理方法及服务器

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104184870A (zh) * 2014-07-29 2014-12-03 小米科技有限责任公司 通话记录标记方法、装置及电子设备
CN108564942B (zh) * 2018-04-04 2021-01-26 南京师范大学 一种基于敏感度可调的语音情感识别方法及系统
CN109410986B (zh) * 2018-11-21 2021-08-06 咪咕数字传媒有限公司 一种情绪识别方法、装置及存储介质
CN109658928B (zh) * 2018-12-06 2020-06-23 山东大学 一种家庭服务机器人云端多模态对话方法、装置及系统
CN109325132A (zh) * 2018-12-11 2019-02-12 平安科技(深圳)有限公司 专家知识推荐方法、装置、计算机设备及存储介质


Also Published As

Publication number Publication date
CN110347823A (zh) 2019-10-18

Similar Documents

Publication Publication Date Title
WO2020244073A1 (zh) 基于语音的用户分类方法、装置、计算机设备及存储介质
CN110765244B (zh) 获取应答话术的方法、装置、计算机设备及存储介质
WO2019153737A1 (zh) 用于对评论进行评估的方法、装置、设备和存储介质
Katz et al. ConSent: Context-based sentiment analysis
CN112069298A (zh) 基于语义网和意图识别的人机交互方法、设备及介质
US20130060769A1 (en) System and method for identifying social media interactions
JP6335898B2 (ja) 製品認識に基づく情報分類
WO2021204017A1 (zh) 文本意图识别方法、装置以及相关设备
CN108027814B (zh) 停用词识别方法与装置
CN109086265B (zh) 一种语义训练方法、短文本中多语义词消歧方法
CN110377733B (zh) 一种基于文本的情绪识别方法、终端设备及介质
CN110990532A (zh) 一种处理文本的方法和装置
JP2003223456A (ja) 要約自動評価処理装置、要約自動評価処理プログラム、および要約自動評価処理方法
CN115274086B (zh) 一种智能导诊方法及系统
TWI681304B (zh) 自適應性調整關連搜尋詞的系統及其方法
CN114330366A (zh) 事件抽取方法及相关装置、电子设备和存储介质
CN114022192A (zh) 一种基于智能营销场景的数据建模方法及系统
US11922515B1 (en) Methods and apparatuses for AI digital assistants
US20230351121A1 (en) Method and system for generating conversation flows
CN111783424A (zh) 一种文本分句方法和装置
CN107729509B (zh) 基于隐性高维分布式特征表示的篇章相似度判定方法
WO2023035529A1 (zh) 基于意图识别的信息智能查询方法、装置、设备及介质
CN115358817A (zh) 基于社交数据的智能产品推荐方法、装置、设备及介质
US12001797B2 (en) System and method of automatic topic detection in text
US11475529B2 (en) Systems and methods for identifying and linking events in structured proceedings

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932120

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932120

Country of ref document: EP

Kind code of ref document: A1