WO2020019866A1 - Method for tagging customer service system logs, customer service system, and storage medium

Method for tagging customer service system logs, customer service system, and storage medium

Publication number
WO2020019866A1
WO2020019866A1 PCT/CN2019/089289 CN2019089289W WO2020019866A1 WO 2020019866 A1 WO2020019866 A1 WO 2020019866A1 CN 2019089289 W CN2019089289 W CN 2019089289W WO 2020019866 A1 WO2020019866 A1 WO 2020019866A1
Authority
WO
WIPO (PCT)
Prior art keywords
label
cleaned
model
log
results
Prior art date
Application number
PCT/CN2019/089289
Other languages
English (en)
French (fr)
Inventor
刘俊仕
刘云峰
吴悦
胡晓
汶林丁
Original Assignee
深圳追一科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳追一科技有限公司
Publication of WO2020019866A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Definitions

  • the present application relates to the technical field of natural language processing, and in particular, to a method for tagging customer service system logs, a customer service system, and a non-volatile computer-readable storage medium.
  • An intelligent customer service robot is an artificial intelligence information system that communicates with users by means of natural language processing and speech recognition. It can be used in a variety of user service scenarios, providing functions such as service consultation, business query and handling, and product marketing and promotion, and it brings users a new communication experience. It can take over a large volume of repetitive daily work from human agents and answer users' frequently asked questions, greatly reducing the workload of existing service personnel and thereby cutting corporate labor costs.
  • Tagging customer service system logs is widely used in intelligent customer service robot systems; its purpose is to tag the conversation logs between customers and customer service.
  • A conversation may carry one or more tags, and the tags may span several dimensions, such as user intent, emotion, and service satisfaction. Tagging helps improve the user experience of the product and helps companies build user profiles to refine their marketing strategies.
  • In the related art, the customer service agent tags the conversation manually through the system after finishing the current round of dialogue with the customer.
  • With manual tagging, however, the agent has to tick the appropriate tags in the tagging system one by one. This is inefficient, and because the agent takes on the next user very shortly after a session ends, there is no time to tag the current conversation accurately; when there are many users, some conversations are not tagged at all.
  • a method for tagging customer service system logs, a customer service system, and a non-volatile computer-readable storage medium are provided.
  • A new session log is input into the semantic analysis model with the corrected parameters for automatic tagging.
  • a customer service system including:
  • Collection module for collecting historical session logs
  • a cleaning module configured to clean the historical session log
  • a label mapping module configured to map the cleaned session logs to corresponding service labels
  • the label integration module is used to filter out at least one most accurate label.
  • One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following operations: collecting historical session logs, and manually labeling the historical session logs;
  • inputting a new session log into the semantic analysis model with the corrected parameters for automatic tagging.
  • FIG. 1 is an application environment diagram of a method for tagging customer service system logs provided by one or more embodiments.
  • FIG. 2 is a schematic flowchart of a method for labeling a log of a customer service system provided by one or more embodiments.
  • FIG. 3 is a block diagram of a customer service system provided by one or more embodiments.
  • FIG. 4 is a schematic diagram of an internal structure of a computer device provided by one or more embodiments.
  • the method for labeling customer service system logs provided in this application can be applied to the application environment shown in FIG. 1.
  • the application environment includes a computer device 102.
  • the computer device 102 may collect historical session logs, manually label the historical session logs, and clean the historical session logs.
  • The computer device 102 may use the semantic analysis model to map the cleaned session logs to corresponding business labels, integrate the corresponding business labels to obtain at least one label, analyze the at least one label against the manually annotated labels, and correct the parameters of the semantic analysis model according to the analysis result.
  • the new session log is input into the semantic analysis model with the modified parameters for automatic labeling.
  • the computer device 102 is various personal computers, notebook computers, smart phones, tablet computers, portable wearable devices, and the like.
  • the computer device 102 may also be a server, and the server may be implemented by an independent server or a server cluster composed of multiple servers.
  • FIG. 2 is a schematic flowchart of a method for tagging customer service system logs according to an embodiment.
  • a method for labeling customer service system logs includes:
  • The annotated labels include the business category of the text, the sentiment category of the text, and the part of speech of each segmented word.
  • The log cleaning module, which is based on natural language processing, cleans the raw log data.
  • Raw customer service logs are usually very noisy: they contain large numbers of greetings, emoticons, system prompts, and web links, which are unrelated to the user or hard to exploit. This noise has to be removed during data cleaning.
  • After noise removal, typos in the log need to be corrected, because agents and customers often mistype and the typos affect tagging. The final step is text normalization.
  • Customer service logs are often highly colloquial and need to be converted into standard expressions.
  • cleaning the historical session log can be completed by the following three rounds of filtering:
  • First round of filtering: the historical session logs are filtered by rules to obtain the first-round filtering results.
  • Rule filtering is, for example, regular expression filtering or generic corpus filtering.
  • For example, the customer service robot reminds the customer that "5 people are queuing ahead of you".
  • When the queue length changes, the robot repeats the reminder, so the regular expression "您前面有\d人在排队" ("there are \d people queuing ahead of you") is used to keep only the queue length and to filter out the robot's repeated, semantically irrelevant replies.
  • The generic corpus used for filtering contains common greetings such as "hello", "okay", and "thank you"; when a customer's reply contains any of these, the greeting can be filtered out through the generic corpus.
  • Second round of filtering: query rewriting (Query Rewrite) is applied to the first-round filtering results to obtain the second-round filtering results. For example, a date that a customer typed in free form is rewritten into a standard date format.
  • Third round of filtering: typos in the second-round filtering results are corrected to obtain the cleaned log. For example, the customer's input "密马" is corrected to "密码" (password).
  • the log is cleaned, noise is eliminated to reduce redundant data, the format is standardized, typos are corrected, and the cleaned log is input to the semantic analysis model for training, improving the accuracy of the model's training data.
  • Semantic analysis models include learning models and predictive models.
  • the learning model is a machine learning model or a deep learning model
  • the prediction model is a support vector machine model, a convolutional neural network model, or a recurrent neural network model.
  • the learning model and prediction model have a good generalization effect.
  • When a customer service session log contains dialogue information that does not appear in the training data, the learning model and the prediction model can still map the log to the corresponding business label.
  • the process of inputting the cleaned session log into the prediction model and inputting the output of the prediction model into the learning model to obtain the class probability includes:
  • the convolutional neural network model convolves the cleaned session log at multiple scales
  • the output results are input to the softmax classifier of the deep learning model to obtain the class probability.
  • Once the learning model and the prediction model have been trained, the semantic analysis model can quickly map a newly generated session log and find the accurate label.
  • The process of using the semantic analysis model to map the cleaned session log to the corresponding business label may further include: performing semantic analysis on the customer's sentences to obtain predicted labels; labeling the cleaned session log with sentiment tags on the basis of the semantic analysis module; and labeling the cleaned session log with custom tags. For example, in some conversation scenarios sensitive-word detection can be enabled, so that when a customer enters a sensitive word it is processed, for instance by replacing it with "*".
  • sentiment analysis mainly distinguishes positive, neutral or negative sentiment of customers based on customer service logs.
  • transaction information consultation belongs to neutral sentiment, and transaction disputes are treated as negative sentiment.
  • sentiment analysis of the conversation log can help the company understand the user's intention more accurately and improve the product.
  • The predicted labels output by the prediction model may include business labels from multiple models of different granularity. The labels therefore need to be integrated further to filter out the one or more most accurate labels.
  • The integration methods include a ranking filtering method and a threshold filtering method.
  • The ranking filtering method sorts all business labels by accuracy and takes the top one or more labels, so the number of labels obtained can be controlled;
  • the threshold filtering method sets an accuracy threshold for business labels in advance and keeps the business labels whose accuracy is greater than the threshold, which is simple and fast.
  • For example, the corresponding business labels are sorted by confidence and the three labels with the highest confidence are selected, or a high-confidence threshold is set and the business labels whose confidence exceeds that threshold are selected.
  • The parameters of the semantic analysis model are continuously corrected, and new session logs are input into the semantic analysis model with the corrected parameters for automatic tagging. This addresses the low efficiency of manual tagging, improves the tagging efficiency of the customer service system, and improves label accuracy.
  • FIG. 3 is a structural block diagram of a customer service system provided by an embodiment. As shown in FIG. 3, in one embodiment, a customer service system is provided.
  • the customer service system includes:
  • the collecting module 31 is configured to collect historical session logs.
  • the cleaning module 32 is configured to clean the historical session log.
  • the label mapping module 33 is configured to map the cleaned session logs to corresponding service labels.
  • the label integration module 34 is configured to filter out at least one most accurate label.
  • the cleaning module 32 is further configured to perform rule filtering on the historical session logs to obtain the first-round filtering results, perform query rewriting (Query Rewrite) on the first-round filtering results to obtain the second-round filtering results, and correct typos in the second-round filtering results to obtain the cleaned log.
  • the label mapping module 33 includes a semantic analysis unit, a sentiment analysis unit, and a custom unit.
  • the semantic analysis unit is used to perform semantic analysis on the customer's sentence, and obtain the predictive label through the semantic model.
  • the sentiment analysis unit labels sentiment tags on the cleaned conversation log based on the semantic analysis module.
  • the sentiment analysis module mainly distinguishes positive, neutral or negative sentiment of customers based on customer service logs. For example, transaction information consultation belongs to neutral sentiment. Transaction disputes are handled as negative emotions. By combining semantics and emotions, customer service conversations can be labeled more accurately, thereby improving the accuracy of labeling. In addition, the sentiment analysis of the conversation log can help the company understand the user's intention more accurately and improve the product.
  • the custom unit labels the cleaned session logs with custom tags. For example, in some conversation scenarios, sensitive word detection can be set. When customers enter sensitive words, the sensitive words are processed, such as using "*" instead of sensitive words.
  • the label integration module 34 is further configured to filter the labels by at least one of a ranking filtering method and a threshold filtering method to obtain the most accurate at least one label.
  • the cleaning module 32 is further configured to perform rule filtering on the historical session log to obtain a first round of filtering results; perform query rewriting on the first round of filtering results to obtain a second round of filtering results; and correct typos in the second round of filtering results to obtain a cleaned log.
  • the label mapping module 33 is further configured to input the cleaned session log into a prediction model; input the output of the prediction model into a learning model to obtain a category probability; and take the category label with the largest category probability as the corresponding service label.
  • the label mapping module 33 is further configured to input the cleaned session log into a convolutional neural network model; and perform convolution of the cleaned session log on multiple scales through the convolutional neural network model; Pooling the results of the convolution; stitching the results of the pooling operation; inputting the stitched results into the fully connected layer and outputting them to obtain the output results; inputting the output results to the softmax classifier of the deep learning model, Obtaining the category probability; and taking the category tag with the largest category probability as the corresponding service tag.
  • Because the label mapping module includes a semantic analysis module, a sentiment analysis module, and a custom module, semantics and sentiment can be combined to label customer service conversations more accurately, which improves labeling accuracy. Further, the custom module meets the needs of different dialogue scenarios.
  • Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the operations of a specific logical function or process.
  • The scope of the preferred embodiments of this application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of this application pertain.
  • FIG. 4 is a schematic diagram of an internal structure of a computer device in an embodiment.
  • the computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus.
  • the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for running an operating system and computer programs in a non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program, when executed by a processor, implements a method for tagging customer service system logs.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the casing of the computer device, or an external keyboard, touchpad, or mouse.
  • the customer service system provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in FIG. 4.
  • the memory of the computer equipment can store each program module constituting the customer service system.
  • the computer program constituted by each program module causes the processor to perform operations in the method for tagging customer service system logs in the embodiments of the present application described in this specification.
  • each part of the application may be implemented by hardware, software, firmware, or a combination thereof.
  • In the above embodiments, multiple operations or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
  • each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist separately physically, or two or more units may be integrated into one module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a non-volatile computer-readable storage medium.
  • the non-volatile computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, or an optical disk.

Abstract

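A method for tagging customer service system logs, comprising: collecting historical session logs and manually labeling the historical session logs; cleaning the historical session logs; mapping the cleaned session logs to corresponding business labels using a semantic analysis model; integrating the corresponding business labels to obtain at least one label; analyzing the at least one label against the manually annotated labels and correcting the parameters of the semantic analysis model according to the analysis result; and inputting a new session log into the semantic analysis model with the corrected parameters for automatic tagging.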

Description

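Method for tagging customer service system logs, customer service system, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 201810830223.9, entitled "Method for tagging customer service system logs and customer service system", filed with the Chinese Patent Office on July 25, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of natural language processing, and in particular to a method for tagging customer service system logs, a customer service system, and a non-volatile computer-readable storage medium.
Background
To save on the cost of human customer service, intelligent customer service robots have been introduced into customer service systems. An intelligent customer service robot is an artificial intelligence information system that communicates with users by means of natural language processing and speech recognition. It can be used in a variety of user service scenarios, providing functions such as service consultation, business query and handling, and product marketing and promotion, and it brings users a new communication experience. It can take over a large volume of repetitive daily work from human agents and answer users' frequently asked questions, greatly reducing the workload of existing service personnel and thereby cutting corporate labor costs.
Tagging customer service system logs is widely used in intelligent customer service robot systems; its purpose is to tag the conversation logs between customers and customer service. A conversation may carry one or more tags, and the tags may span several dimensions, such as user intent, emotion, and service satisfaction. Tagging helps improve the user experience of the product and helps companies build user profiles to refine their marketing strategies.
In the related art, the customer service agent tags the conversation manually through the system after finishing the current round of dialogue with the customer. With manual tagging, however, the agent has to tick the appropriate tags in the tagging system one by one. This is inefficient, and because the agent takes on the next user very shortly after a session ends, there is no time to tag the current conversation accurately; when there are many users, some conversations are not tagged at all.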
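Summary
According to various embodiments of the present application, a method for tagging customer service system logs, a customer service system, and a non-volatile computer-readable storage medium are provided.
A method for tagging customer service system logs includes:
collecting historical session logs, and manually labeling the historical session logs;
cleaning the historical session logs;
mapping the cleaned session logs to corresponding business labels using a semantic analysis model;
integrating the corresponding business labels to obtain at least one label;
analyzing the at least one label against the manually annotated labels, and correcting the parameters of the semantic analysis model according to the analysis result;
inputting a new session log into the semantic analysis model with the corrected parameters for automatic tagging.
A customer service system includes:
a collection module configured to collect historical session logs;
a cleaning module configured to clean the historical session logs;
a label mapping module configured to map the cleaned session logs to corresponding business labels;
a label integration module configured to filter out at least one most accurate label.
One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following operations: collecting historical session logs, and manually labeling the historical session logs;
cleaning the historical session logs;
mapping the cleaned session logs to corresponding business labels using a semantic analysis model;
integrating the corresponding business labels to obtain at least one label;
analyzing the at least one label against the manually annotated labels, and correcting the parameters of the semantic analysis model according to the analysis result;
inputting a new session log into the semantic analysis model with the corrected parameters for automatic tagging.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present application.
Details of one or more embodiments of the present invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present invention will become apparent from the description, the drawings, and the claims.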
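Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
To better describe and illustrate the embodiments and/or examples disclosed in this application, reference may be made to one or more drawings. The additional details or examples used to describe the drawings should not be regarded as limiting the scope of any of the disclosed inventions, the presently described embodiments and/or examples, or the best mode of these inventions as presently understood.
FIG. 1 is an application environment diagram of a method for tagging customer service system logs provided by one or more embodiments.
FIG. 2 is a schematic flowchart of a method for tagging customer service system logs provided by one or more embodiments.
FIG. 3 is a block diagram of a customer service system provided by one or more embodiments.
FIG. 4 is a schematic diagram of the internal structure of a computer device provided by one or more embodiments.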
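Detailed Description
The present invention is described in detail below with reference to the drawings and embodiments.
The method for tagging customer service system logs provided in this application can be applied in the application environment shown in FIG. 1. The application environment includes a computer device 102. The computer device 102 may collect historical session logs and manually label them, clean the historical session logs, use a semantic analysis model to map the cleaned session logs to corresponding business labels, integrate the corresponding business labels to obtain at least one label, analyze the at least one label against the manually annotated labels, correct the parameters of the semantic analysis model according to the analysis result, and input a new session log into the semantic analysis model with the corrected parameters for automatic tagging. The computer device 102 may be any of various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. Optionally, the computer device 102 may also be a server, which may be implemented as an independent server or as a server cluster composed of multiple servers.
FIG. 2 is a schematic flowchart of a method for tagging customer service system logs provided by an embodiment.
As shown in FIG. 2, in one embodiment, the method for tagging customer service system logs includes:
S21: collect historical session logs, and manually label the historical session logs.
For example, if a customer wants to change a password, the customer's question "密码如何变更" ("how do I change my password") can be manually labeled as "password consultation service" and further refined to "password change service".
The annotated labels include the business category of the text, the sentiment category of the text, and the part of speech of each segmented word. Labeling along multiple dimensions helps analyze user intent, sentiment, and service satisfaction, and thereby helps improve the product's user experience. For example, analyzing the distribution of user intents over a period of time reveals which questions users ask most often, so the company can adjust the product accordingly.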
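S22: clean the historical session logs.
The log cleaning module, which is based on natural language processing, cleans the raw log data. Raw customer service logs are usually very noisy: they contain large numbers of greetings, emoticons, system prompts, web links, and the like, which are unrelated to the user or hard to exploit, so this noise has to be removed during data cleaning. After noise removal, typos in the log need to be corrected, because agents and customers often mistype and the typos affect tagging. The final step is text normalization: customer service logs are often highly colloquial and need to be converted into standard expressions.
Specifically, cleaning the historical session logs can be completed through the following three rounds of filtering:
First round of filtering: the historical session logs are filtered by rules to obtain the first-round filtering results. Rule filtering is, for example, regular expression filtering or generic corpus filtering.
For example, the customer service robot reminds the customer "您前面有5人在排队" ("5 people are queuing ahead of you"). When the queue length changes, the robot repeats the reminder, so the regular expression "您前面有\d人在排队" is used to keep only the queue length and to filter out the robot's repeated, semantically irrelevant replies.
For example, the generic corpus used for filtering contains common greetings such as "你好" (hello), "好的" (okay), and "谢谢" (thank you); when a customer's reply contains any of these, the greeting can be filtered out through the generic corpus.
Second round of filtering: query rewriting (Query Rewrite) is applied to the first-round filtering results to obtain the second-round filtering results. For example, a date that a customer typed in free form is rewritten into a standard date format.
Third round of filtering: typos in the second-round filtering results are corrected to obtain the cleaned log. For example, the customer's input "密马" is corrected to "密码" (password).
Cleaning the log eliminates noise and thus reduces redundant data, standardizes the format, and corrects typos; the cleaned log is then input into the semantic analysis model for training, which improves the accuracy of the model's training data.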
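The three cleaning rounds can be sketched as a small pipeline. This is a minimal illustration, not the applicant's implementation: the date pattern, the typo table, the punctuation stripping, and all function names are assumptions introduced here, and queue notices are simply dropped rather than reduced to their count.

```python
import re

# Hypothetical rules: the description names the three rounds, not these details.
QUEUE_NOTICE = re.compile(r"您前面有(\d+)人在排队")   # repeated robot prompt
GREETINGS = ("你好", "好的", "谢谢")                  # generic corpus examples
LOOSE_DATE = re.compile(r"(\d{4})[./年](\d{1,2})[./月](\d{1,2})日?")
TYPO_TABLE = {"密马": "密码"}                         # toy correction dictionary


def rule_filter(lines):
    """Round 1: drop queue notices and strip generic greetings."""
    kept = []
    for line in lines:
        if QUEUE_NOTICE.search(line):
            continue  # system prompt, carries no customer intent
        for greeting in GREETINGS:
            line = line.replace(greeting, "")
        line = line.strip(" ,，。!！")
        if line:
            kept.append(line)
    return kept


def rewrite_query(line):
    """Round 2: rewrite loosely typed dates as YYYY-MM-DD."""
    def to_iso(match):
        year, month, day = (int(group) for group in match.groups())
        return f"{year:04d}-{month:02d}-{day:02d}"
    return LOOSE_DATE.sub(to_iso, line)


def correct_typos(line):
    """Round 3: replace known typos with the intended word."""
    for wrong, right in TYPO_TABLE.items():
        line = line.replace(wrong, right)
    return line


def clean_log(lines):
    """Apply the three rounds in order and return the cleaned lines."""
    return [correct_typos(rewrite_query(line)) for line in rule_filter(lines)]


raw = ["你好", "您前面有5人在排队", "您前面有4人在排队",
       "密马如何变更", "我2018年7月5日办理的"]
cleaned = clean_log(raw)
print(cleaned)
```

Plain substring replacement is used for greetings and typos only to keep the sketch short; a production cleaner would need word segmentation so that a greeting embedded in a longer word is not removed by accident.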
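S23: use the semantic analysis model to map the cleaned session logs to corresponding business labels.
The semantic analysis model includes a learning model and a prediction model. The learning model is a machine learning model or a deep learning model, and the prediction model is a support vector machine model, a convolutional neural network model, or a recurrent neural network model. The learning model and the prediction model generalize well: when a customer service session log contains dialogue information that does not appear in the training data, they can still map the log to the corresponding business label.
Using the semantic analysis model to map the cleaned session logs to corresponding business labels includes:
inputting the cleaned session log into the prediction model;
inputting the output of the prediction model into the learning model to obtain class probabilities;
taking the class label with the largest class probability as the corresponding business label.
Optionally, taking a deep learning model and a convolutional neural network model as an example, the process of inputting the cleaned session log into the prediction model and inputting the output of the prediction model into the learning model to obtain the class probabilities includes:
inputting the cleaned session log into the convolutional neural network model;
convolving the cleaned session log at multiple scales through the convolutional neural network model;
pooling the results of the convolution;
concatenating the results of the pooling operation;
inputting the concatenated result into a fully connected layer to obtain an output result;
inputting the output result into the softmax classifier of the deep learning model to obtain the class probabilities.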
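The convolution, pooling, concatenation, fully connected, and softmax steps can be sketched as a forward pass in plain Python. Everything concrete here is an assumption for illustration: the embedding size, the filter widths 2 and 3, the ReLU activation, max pooling, the random untrained weights, and the three label names.

```python
import math
import random

random.seed(0)
EMBED_DIM, NUM_FILTERS, WIDTHS = 4, 2, (2, 3)   # hypothetical sizes
LABELS = ["password change", "transaction inquiry", "transaction dispute"]


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# One bank of filters per window width; a filter spans width * EMBED_DIM inputs.
FILTERS = {w: rand_matrix(NUM_FILTERS, w * EMBED_DIM) for w in WIDTHS}
FC = rand_matrix(len(LABELS), NUM_FILTERS * len(WIDTHS))


def conv_and_pool(tokens, width):
    """Slide each filter over the token vectors, apply ReLU, keep the maximum."""
    pooled = []
    for filt in FILTERS[width]:
        best = 0.0  # starting at 0 makes max() act as ReLU plus max pooling
        for start in range(len(tokens) - width + 1):
            window = [x for vec in tokens[start:start + width] for x in vec]
            best = max(best, sum(w * x for w, x in zip(filt, window)))
        pooled.append(best)
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens):
    """Multi-scale convolution, pooling, concatenation, dense layer, softmax."""
    features = [v for w in WIDTHS for v in conv_and_pool(tokens, w)]
    logits = [sum(w * f for w, f in zip(row, features)) for row in FC]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs


sentence = rand_matrix(6, EMBED_DIM)   # stand-in for six embedded tokens
label, probs = classify(sentence)
print(label, [round(p, 3) for p in probs])
```

Because the weights are random, the predicted label is meaningless; the sketch only shows how features from several window widths are pooled and concatenated before the class with the largest probability is selected.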
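Once the learning model and the prediction model have been trained, the semantic analysis model can quickly map a newly generated session log and find the accurate label.
Optionally, the process of using the semantic analysis model to map the cleaned session logs to corresponding business labels may further include: performing semantic analysis on the customer's sentences to obtain predicted labels; labeling the cleaned session log with sentiment tags on the basis of the semantic analysis module; and labeling the cleaned session log with custom tags. For example, in some conversation scenarios sensitive-word detection can be enabled, so that when a customer enters a sensitive word it is processed, for instance by replacing it with "*".
Sentiment analysis mainly distinguishes the customer's positive, neutral, or negative sentiment on the basis of the customer service log. For example, a transaction information inquiry is neutral sentiment, while the handling of a transaction dispute is negative sentiment. Combining semantics with sentiment allows customer service conversations to be labeled more accurately, which improves labeling accuracy. In addition, sentiment analysis of the session logs helps the company understand user intent more accurately and improve the product.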
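A custom tag of the sensitive-word kind can be sketched as follows. The word list, the tag name, and the choice to mask each word with one "*" per character are assumptions made for this example; the description only says that a sensitive word may be replaced with "*".

```python
SENSITIVE_WORDS = ("badword", "secret")   # hypothetical word list


def mask_sensitive(text, words=SENSITIVE_WORDS):
    """Return the masked text and the custom tags raised while masking."""
    tags = []
    for word in words:
        if word in text:
            text = text.replace(word, "*" * len(word))
            if "contains_sensitive_word" not in tags:
                tags.append("contains_sensitive_word")
    return text, tags


masked, tags = mask_sensitive("my secret code")
print(masked, tags)
```

Returning the tag alongside the masked text lets the custom unit contribute a label to the same pool as the semantic and sentiment labels.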
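S24: integrate the corresponding business labels to obtain at least one label.
The predicted labels output by the prediction model may include business labels from multiple models of different granularity, so the labels need to be integrated further to filter out the one or more most accurate labels. The integration methods include a ranking filtering method and a threshold filtering method. The ranking filtering method sorts all business labels by accuracy and takes the top one or more labels, so the number of labels obtained can be controlled; the threshold filtering method sets an accuracy threshold for business labels in advance and keeps the business labels whose accuracy is greater than the threshold, which is simple and fast.
For example, the corresponding business labels are sorted by confidence and the three labels with the highest confidence are selected, or a high-confidence threshold is set and the business labels whose confidence exceeds that threshold are selected.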
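Both integration methods reduce to a few lines once every candidate label carries a confidence score. The function names, the example labels, and the 0.8 threshold below are illustrative assumptions; only the top-three ranking and the idea of a confidence threshold come from the description.

```python
def top_k(scored_labels, k=3):
    """Ranking filter: keep the k labels with the highest confidence."""
    ranked = sorted(scored_labels.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def above_threshold(scored_labels, threshold=0.8):
    """Threshold filter: keep every label whose confidence exceeds the threshold."""
    return [label for label, conf in scored_labels.items() if conf > threshold]


scores = {
    "password change": 0.91,
    "password inquiry": 0.85,
    "account service": 0.40,
    "complaint": 0.10,
}
print(top_k(scores))            # three highest-confidence labels
print(above_threshold(scores))  # labels with confidence above 0.8
```

Ranking fixes the number of labels returned, whereas thresholding fixes the minimum confidence and lets the count vary, which matches the trade-off the description draws between the two methods.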
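S25: analyze the at least one label against the manually annotated labels, and correct the parameters of the semantic analysis model according to the analysis result.
The predictions of the semantic analysis model are compared with the manual annotations, and the at least one label is analyzed manually against the manually annotated label. When the at least one label is more accurate than the manually annotated label, the wrongly annotated data is corrected according to the prediction of the semantic analysis model; when the manual annotation is more accurate than every label obtained by integration, the loss weight of that data item is increased to improve what the semantic analysis model learns from it.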
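The two correction branches can be sketched as an update applied to one training sample after a human reviewer has compared the two label sources. The dictionary layout, the verdict strings, the boost factor of 2.0, and the use of the first predicted label as the replacement are all assumptions introduced for this example.

```python
def apply_review(sample, predicted_labels, verdict, boost=2.0):
    """Return an updated copy of a training sample after human review.

    verdict is "model" when a predicted label is judged more accurate than the
    manual one, and "manual" when the manual label beats every predicted label.
    """
    updated = dict(sample)
    if verdict == "model":
        # Fix the wrong annotation using the model's (top-ranked) prediction.
        updated["label"] = predicted_labels[0]
    elif verdict == "manual":
        # Keep the annotation and make this sample count more during retraining.
        updated["loss_weight"] = sample.get("loss_weight", 1.0) * boost
    else:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return updated


sample = {"text": "密码如何变更", "label": "password inquiry", "loss_weight": 1.0}
relabeled = apply_review(sample, ["password change"], "model")
reweighted = apply_review(sample, ["account service"], "manual")
print(relabeled["label"], reweighted["loss_weight"])
```

The corrected labels and per-sample weights would then feed the next training run of the semantic analysis model, for example through a weighted cross-entropy loss.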
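S26: input a new session log into the semantic analysis model with the corrected parameters for automatic tagging.
In this embodiment, historical session logs are manually labeled and cleaned, the historical session logs and the manual labels are input into the semantic analysis model, the parameters of the semantic analysis model are continuously corrected, and new session logs are input into the semantic analysis model with the corrected parameters for automatic tagging. This addresses problems such as the low efficiency of manual tagging, improves the tagging efficiency of the customer service system, and improves label accuracy.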
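FIG. 3 is a structural block diagram of a customer service system provided by an embodiment. As shown in FIG. 3, in one embodiment, a customer service system is provided, which includes:
a collection module 31 configured to collect historical session logs;
a cleaning module 32 configured to clean the historical session logs;
a label mapping module 33 configured to map the cleaned session logs to corresponding business labels;
a label integration module 34 configured to filter out at least one most accurate label.
In one embodiment, the cleaning module 32 is further configured to perform rule filtering on the historical session logs to obtain the first-round filtering results, perform query rewriting (Query Rewrite) on the first-round filtering results to obtain the second-round filtering results, and correct typos in the second-round filtering results to obtain the cleaned log.
The label mapping module 33 includes a semantic analysis unit, a sentiment analysis unit, and a custom unit.
The semantic analysis unit is configured to perform semantic analysis on the customer's sentences and obtain predicted labels through the semantic model.
The sentiment analysis unit labels the cleaned session log with sentiment tags on the basis of the semantic analysis module. The sentiment analysis module mainly distinguishes the customer's positive, neutral, or negative sentiment on the basis of the customer service log. For example, a transaction information inquiry is neutral sentiment, while the handling of a transaction dispute is negative sentiment. Combining semantics with sentiment allows customer service conversations to be labeled more accurately, which improves labeling accuracy. In addition, sentiment analysis of the session logs helps the company understand user intent more accurately and improve the product.
The custom unit labels the cleaned session log with custom tags. For example, in some conversation scenarios sensitive-word detection can be enabled, so that when a customer enters a sensitive word it is processed, for instance by replacing it with "*".
The label integration module 34 is further configured to filter the labels by at least one of the ranking filtering method and the threshold filtering method to obtain the at least one most accurate label.
In one embodiment, the cleaning module 32 is further configured to perform rule filtering on the historical session logs to obtain first-round filtering results; perform query rewriting on the first-round filtering results to obtain second-round filtering results; and correct typos in the second-round filtering results to obtain the cleaned log.
In one embodiment, the label mapping module 33 is further configured to input the cleaned session log into the prediction model; input the output of the prediction model into the learning model to obtain class probabilities; and take the class label with the largest class probability as the corresponding business label.
In one embodiment, the label mapping module 33 is further configured to input the cleaned session log into a convolutional neural network model; convolve the cleaned session log at multiple scales through the convolutional neural network model; pool the results of the convolution; concatenate the results of the pooling operation; input the concatenated result into a fully connected layer to obtain an output result; input the output result into the softmax classifier of the deep learning model to obtain the class probabilities; and take the class label with the largest class probability as the corresponding business label.
In this embodiment, because the label mapping module includes a semantic analysis module, a sentiment analysis module, and a custom module, semantics and sentiment can be combined to label customer service conversations more accurately, which improves labeling accuracy. Further, the custom module meets the needs of different dialogue scenarios.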
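It can be understood that identical or similar parts of the above embodiments may be cross-referenced, and content not described in detail in one embodiment may be found in the identical or similar content of other embodiments.
It should be noted that, in the description of this application, the terms "first", "second", and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of this application, unless otherwise stated, "multiple" means at least two.
Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the operations of a specific logical function or process. The scope of the preferred embodiments of this application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of this application pertain.
FIG. 4 is a schematic diagram of the internal structure of a computer device in one embodiment. As shown in FIG. 4, the computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for tagging customer service system logs. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the casing of the computer device, or an external keyboard, touchpad, or mouse.
In one embodiment, the customer service system provided in this application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 4. The memory of the computer device may store the program modules that make up the customer service system. The computer program formed by the program modules causes the processor to perform the operations of the method for tagging customer service system logs in the embodiments of this application described in this specification.
It should be understood that each part of this application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple operations or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
A person of ordinary skill in the art can understand that all or part of the operations carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the operations of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of this application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a non-volatile computer-readable storage medium.
The non-volatile computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the application; a person of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the application.
It should be noted that the present invention is not limited to the above preferred embodiments. Inspired by the present invention, a person skilled in the art may arrive at products in various other forms; however, whatever changes are made to shape or structure, any technical solution identical or similar to that of this application falls within the scope of protection of the present invention.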

Claims (21)

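  1. A method for tagging customer service system logs, comprising:
    collecting historical session logs, and manually labeling the historical session logs;
    cleaning the historical session logs;
    mapping the cleaned session logs to corresponding business labels using a semantic analysis model;
    integrating the corresponding business labels to obtain at least one label;
    analyzing the at least one label against the manually annotated labels, and correcting the parameters of the semantic analysis model according to the analysis result;
    inputting a new session log into the semantic analysis model with the corrected parameters for automatic tagging.
  2. The method according to claim 1, wherein cleaning the historical session logs comprises:
    a first round of filtering: performing rule filtering on the historical session logs to obtain first-round filtering results;
    a second round of filtering: performing query rewriting on the first-round filtering results to obtain second-round filtering results;
    a third round of filtering: correcting typos in the second-round filtering results to obtain the cleaned log.
  3. The method according to claim 1, wherein mapping the cleaned session logs to corresponding business labels using the semantic analysis model comprises:
    performing semantic analysis on the cleaned session log to obtain predicted labels;
    labeling the cleaned session log with sentiment tags; and
    labeling the cleaned session log with custom tags.
  4. The method according to claim 1, wherein the semantic analysis model comprises a learning model and a prediction model.
  5. The method according to claim 1 or 4, wherein mapping the cleaned session logs to corresponding business labels using the semantic analysis model comprises:
    inputting the cleaned session log into the prediction model;
    inputting the output of the prediction model into the learning model to obtain class probabilities;
    taking the class label with the largest class probability as the corresponding business label.
  6. The method according to claim 5, wherein inputting the cleaned session log into the prediction model and inputting the output of the prediction model into the learning model to obtain the class probabilities comprises:
    inputting the cleaned session log into a convolutional neural network model;
    convolving the cleaned session log at multiple scales through the convolutional neural network model;
    pooling the results of the convolution;
    concatenating the results of the pooling operation;
    inputting the concatenated result into a fully connected layer to obtain an output result;
    inputting the output result into the softmax classifier of a deep learning model to obtain the class probabilities.
  7. The method according to claim 1, wherein the annotated labels include the business category of the text, the sentiment category of the text, and the part of speech of each segmented word of the text.
  8. The method according to claim 1, wherein, in integrating the corresponding business labels to obtain at least one label, the integration methods include a ranking filtering method and a threshold filtering method.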
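  9. A customer service system, comprising:
    a collection module configured to collect historical session logs;
    a cleaning module configured to clean the historical session logs;
    a label mapping module configured to map the cleaned session logs to corresponding business labels;
    a label integration module configured to integrate the corresponding business labels to obtain at least one label.
  10. The system according to claim 9, further comprising:
    a model correction module configured to analyze the at least one label against the manually annotated labels and correct the parameters of the semantic analysis model according to the analysis result;
    a tagging module configured to input a new session log into the semantic analysis model with the corrected parameters for automatic tagging.
  11. The system according to claim 9, wherein the label mapping module comprises a semantic analysis unit, a sentiment analysis unit, and a custom unit.
  12. The system according to claim 11, wherein the sentiment analysis unit is configured to label the cleaned session log with sentiment tags.
  13. The system according to claim 11, wherein the custom module labels the cleaned session log with custom tags.
  14. The system according to claim 9, wherein the cleaning module is further configured to perform rule filtering on the historical session logs to obtain first-round filtering results; perform query rewriting on the first-round filtering results to obtain second-round filtering results; and correct typos in the second-round filtering results to obtain the cleaned log.
  15. The system according to claim 9, wherein the label mapping module is further configured to input the cleaned session log into a prediction model; input the output of the prediction model into a learning model to obtain class probabilities; and take the class label with the largest class probability as the corresponding business label.
  16. The system according to claim 15, wherein the label mapping module is further configured to input the cleaned session log into a convolutional neural network model; convolve the cleaned session log at multiple scales through the convolutional neural network model; pool the results of the convolution; concatenate the results of the pooling operation; input the concatenated result into a fully connected layer to obtain an output result; input the output result into the softmax classifier of a deep learning model to obtain the class probabilities; and take the class label with the largest class probability as the corresponding business label.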
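  17. One or more non-volatile computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following operations: collecting historical session logs, and manually labeling the historical session logs;
    cleaning the historical session logs;
    mapping the cleaned session logs to corresponding business labels using a semantic analysis model;
    integrating the corresponding business labels to obtain at least one label;
    analyzing the at least one label against the manually annotated labels, and correcting the parameters of the semantic analysis model according to the analysis result;
    inputting a new session log into the semantic analysis model with the corrected parameters for automatic tagging.
  18. The non-volatile computer-readable storage medium according to claim 17, wherein, when cleaning the historical session logs, the one or more processors further perform the following operations:
    performing rule filtering on the historical session logs to obtain first-round filtering results;
    performing query rewriting on the first-round filtering results to obtain second-round filtering results;
    correcting typos in the second-round filtering results to obtain the cleaned log.
  19. The non-volatile computer-readable storage medium according to claim 17, wherein, when mapping the cleaned session logs to corresponding business labels using the semantic analysis model, the one or more processors further perform the following operations:
    performing semantic analysis on the cleaned session log to obtain predicted labels;
    labeling the cleaned session log with sentiment tags; and
    labeling the cleaned session log with custom tags.
  20. The non-volatile computer-readable storage medium according to claim 17, wherein, when mapping the cleaned session logs to corresponding business labels using the semantic analysis model, the one or more processors further perform the following operations:
    inputting the cleaned session log into a prediction model;
    inputting the output of the prediction model into a learning model to obtain class probabilities;
    taking the class label with the largest class probability as the corresponding business label.
  21. The non-volatile computer-readable storage medium according to claim 20, wherein, when inputting the cleaned session log into the prediction model and inputting the output of the prediction model into the learning model to obtain the class probabilities, the one or more processors further perform the following operations:
    inputting the cleaned session log into a convolutional neural network model;
    convolving the cleaned session log at multiple scales through the convolutional neural network model;
    pooling the results of the convolution;
    concatenating the results of the pooling operation;
    inputting the concatenated result into a fully connected layer to obtain an output result;
    inputting the output result into the softmax classifier of a deep learning model to obtain the class probabilities.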
PCT/CN2019/089289 2018-07-25 2019-05-30 Method for tagging customer service system logs, customer service system, and storage medium WO2020019866A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810830223.9A CN109033356A (zh) 2018-07-25 2018-07-25 Method for tagging customer service system logs and customer service system
CN201810830223.9 2018-07-25

Publications (1)

Publication Number Publication Date
WO2020019866A1 true WO2020019866A1 (zh) 2020-01-30

Family

ID=64646369

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089289 WO2020019866A1 (zh) 2018-07-25 2019-05-30 Method for tagging customer service system logs, customer service system, and storage medium

Country Status (2)

Country Link
CN (1) CN109033356A (zh)
WO (1) WO2020019866A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033356A (zh) * 2018-07-25 2018-12-18 深圳追一科技有限公司 为客服系统日志打标签的方法及客服系统
CN110160583A (zh) * 2019-05-05 2019-08-23 任志刚 一种文物监测装置、文物环境状态监测系统及存储箱
CN112395261A (zh) * 2019-08-16 2021-02-23 中国移动通信集团浙江有限公司 业务推荐方法、装置、计算设备及计算机存储介质
CN112487186A (zh) * 2020-11-27 2021-03-12 上海浦东发展银行股份有限公司 一种人人对话日志分析方法、系统、设备及存储介质
CN113609825B (zh) * 2021-10-11 2022-03-25 北京百炼智能科技有限公司 一种客户属性标签智能标识方法和装置
CN117149988B (zh) * 2023-11-01 2024-02-27 广州市威士丹利智能科技有限公司 基于教育数字化的数据管理处理方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103309948A (zh) * 2013-05-20 2013-09-18 携程计算机技术(上海)有限公司 联络中心舆情监控分析和智能分配处理系统及方法
US20150310862A1 (en) * 2014-04-24 2015-10-29 Microsoft Corporation Deep learning for semantic parsing including semantic utterance classification
CN106202159A (zh) * 2016-06-23 2016-12-07 深圳追一科技有限公司 一种客服系统的人机交互方法
CN106802951A (zh) * 2017-01-17 2017-06-06 厦门快商通科技股份有限公司 一种用于智能对话的话题抽取方法及系统
CN109033356A (zh) * 2018-07-25 2018-12-18 深圳追一科技有限公司 为客服系统日志打标签的方法及客服系统

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844750A (zh) * 2017-02-16 2017-06-13 深圳追一科技有限公司 一种基于客服机器人中情感安抚的人机交互方法及系统
CN106897268B (zh) * 2017-02-28 2020-06-02 科大讯飞股份有限公司 文本语义理解方法、装置和系统
CN107153672A (zh) * 2017-03-22 2017-09-12 中国科学院自动化研究所 基于言语行为理论的用户交互意图识别方法及系统
CN107025284B (zh) * 2017-04-06 2020-10-27 中南大学 网络评论文本情感倾向的识别方法及卷积神经网络模型
CN107562856A (zh) * 2017-08-28 2018-01-09 深圳追一科技有限公司 一种自助式客户服务系统及方法

Also Published As

Publication number Publication date
CN109033356A (zh) 2018-12-18

Similar Documents

Publication Publication Date Title
WO2020019866A1 (zh) Method for tagging customer service system logs, customer service system, and storage medium
US11663409B2 (en) Systems and methods for training machine learning models using active learning
WO2019084810A1 (zh) 一种信息处理方法及终端、计算机存储介质
US20190179903A1 (en) Systems and methods for multi language automated action response
US11258902B2 (en) Partial automation of text chat conversations
KR20200009117A (ko) 텍스트 데이터 수집 및 분석을 위한 시스템
US20210026924A1 (en) Natural language response improvement in machine assisted agents
US20220237376A1 (en) Method, apparatus, electronic device and storage medium for text classification
WO2018182501A1 (en) Method and system of intelligent semtiment and emotion sensing with adaptive learning
Saha et al. Towards sentiment-aware multi-modal dialogue policy learning
Wei et al. Sentiment classification of Chinese Weibo based on extended sentiment dictionary and organisational structure of comments
Miao et al. A dynamic financial knowledge graph based on reinforcement learning and transfer learning
Garg et al. Potential use-cases of natural language processing for a logistics organization
CN116956068A (zh) 基于规则引擎的意图识别方法、装置、电子设备及介质
CN115017271B (zh) 用于智能生成rpa流程组件块的方法及系统
EP3876228A1 (en) Automated assessment of the quality of a dialogue system in real time
CN114546326A (zh) 一种虚拟人手语生成方法和系统
WO2020010930A1 (zh) 客服机器人知识库歧义检测方法、存储介质和计算机设备
Asha et al. Implication and advantages of machine learning-based chatbots in diverse disciplines
CN110442716A (zh) 智能文本数据处理方法和装置、计算设备、存储介质
Bhola et al. Hybrid Framework for Sentiment Analysis Using ConvBiLSTM and BERT
US11860824B2 (en) Graphical user interface for display of real-time feedback data changes
US20230351170A1 (en) Automated processing of feedback data to identify real-time changes
US11907500B2 (en) Automated processing and dynamic filtering of content for display
CN111191030B (zh) 基于分类的单句意图识别方法、装置和系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19840648

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 10/06/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19840648

Country of ref document: EP

Kind code of ref document: A1