WO2017118333A1 - Data-driven method and apparatus for predicting user problems

Info

Publication number
WO2017118333A1
Authority
WO
WIPO (PCT)
Prior art keywords
behavior data
user
data
predicting
target
Prior art date
Application number
PCT/CN2016/112853
Other languages
English (en)
French (fr)
Inventor
薛少飞
张家兴
崔恒斌
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to EP16883458.8A (EP3401853A4)
Priority to JP2018535292A (JP2019505909A)
Publication of WO2017118333A1
Priority to US16/029,508 (US11481698B2)
Priority to US18/045,801 (US11928617B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/01 Customer relationship services
    • G06Q30/015 Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
    • G06Q30/016 After-sales

Definitions

  • The invention belongs to the technical field of data processing, and in particular relates to a data-driven method and apparatus for predicting user problems.
  • Predicting in advance the problems that users may encounter is a typical multi-class classification problem, usually consisting of two parts: feature selection and model building.
  • In existing methods, features are extracted on the feature selection side by manually set rules that are empirically considered to be related to the questions a user may ask, such as whether the user has activated a certain service or whether the user has any consumption records in the past few days.
  • These behaviors include, but are not limited to, clicks in mobile phone and tablet clients, PC web browsing, and other operations performed by the user. They contain the user's behavior trajectory before the question is asked, and in theory these trajectories are strongly correlated with the user's subsequent request for help.
  • Predicting the problem encountered by the user as accurately as possible before the user describes it avoids the influence of manual intervention in the prior art and improves the accuracy of classification prediction.
  • A data-driven method for predicting user problems includes:
  • The candidate behavior data is filtered against the preset target behavior data set, the candidate behavior data contained in the target behavior data set is retained, and the retained candidate behavior data is input into the trained classifier model to predict the category to which the user's question belongs.
  • The training process of the trained classifier model includes the following steps:
  • A data-driven method is used to score the candidate behavior data corresponding to each problem reported by a user, and the target behavior data meeting the set condition is selected.
  • The union of the target behavior data corresponding to all problems reported by users forms the target behavior data set;
  • The classifier model is trained according to each problem reported by a user and the target behavior data set.
  • The preprocessing includes:
  • Interfering behavior data whose frequency is lower than the set frequency threshold is removed.
  • The preprocessing further includes:
  • Numeric identifiers are assigned to the behavior data so that subsequent steps can operate directly on the identifiers rather than on the underlying behavior data, such as long strings like URLs or API names, which makes processing simpler.
  • The present invention uses windowed truncation to intercept, from the preprocessed user behavior data, the candidate behavior data that contributes to the question raised by the user; the windowed truncation includes:
  • The method further includes:
  • The method further includes the steps of:
  • The target behavior data in the target behavior data set is vectorized.
  • The method further includes:
  • The candidate behavior data is vectorized.
  • Vectorized user behavior data can be used directly to train the classifier model and for actual prediction, which simplifies computation.
  • The present invention also provides a data-driven apparatus for predicting user problems, and the apparatus includes:
  • A preprocessing module configured to collect user behavior data and preprocess it when a question raised by the user is received;
  • An interception module configured to intercept, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user;
  • A prediction module configured to filter the candidate behavior data against the preset target behavior data set, retain the candidate behavior data contained in the target behavior data set, and input the retained candidate behavior data into the trained classifier model to predict the category to which the user's question belongs.
  • The apparatus further includes a model training module for training the classifier model, and the model training module performs the following operations when training the classifier model:
  • A data-driven method is used to score the candidate behavior data corresponding to each problem reported by a user, and the target behavior data meeting the set condition is selected.
  • The union of the target behavior data corresponding to all problems reported by users forms the target behavior data set;
  • The classifier model is trained according to each problem reported by a user and the target behavior data set.
  • The preprocessing module of the present invention performs the following step when preprocessing the collected user behavior data:
  • Interfering behavior data whose frequency is lower than the set frequency threshold is removed.
  • The preprocessing module is further configured to assign numeric identifiers to the user behavior data.
  • Windowed truncation is used, and the windowed truncation includes:
  • The model training module is further configured to reassign numeric identifiers to the target behavior data in the target behavior data set after the union of the target behavior data corresponding to all reported problems has formed the target behavior data set.
  • The model training module is further configured to vectorize the target behavior data in the target behavior data set before training the classifier model.
  • The prediction module is further configured to vectorize the candidate behavior data before inputting the retained candidate behavior data into the trained classifier model.
  • The invention provides a data-driven method and apparatus for predicting user problems that use the user's behavior trajectory over a short period to classify and predict user problems, improving classification accuracy and significantly improving on the prediction performance of models that do not include this information.
  • FIG. 1 is a flow chart of training the classifier model according to the present invention
  • FIG. 2 is a flow chart of the data-driven method for predicting user problems according to the present invention
  • FIG. 3 is a schematic structural diagram of the data-driven apparatus for predicting user problems according to the present invention.
  • The general idea of the present invention is to train a classifier model from training data and to analyze user behavior data with the trained classifier model in order to predict the problem the user has encountered.
  • The process of training the classifier model from training data in this embodiment is as follows:
  • F1: Collect problems reported by users and the corresponding behavior data, and preprocess the collected user behavior data. Preprocessing includes removing interfering behavior data and assigning numeric identifiers to the behavior data.
  • Behavior data consists of user operations, including clicks in mobile phone and tablet clients, PC web browsing, and other operations performed by the user. Each operation is represented by a URL or API name, prefixed with a Unix timestamp.
  • The behavior of a user X over a past period can be expressed as:
  • The preprocessing of this embodiment includes removing interfering behavior data and assigning numeric identifiers to the behavior data.
  • Interfering behavior data is behavior data that occurs with extremely low frequency, for example below the set frequency threshold. Such behavior data is unlikely to have caused the problem reported by the user, so this embodiment disregards it, thereby eliminating the interference it would introduce.
  • Numeric identifiers are assigned to the behavior data so that subsequent steps can operate directly on the identifiers rather than on the underlying behavior data, such as long strings like URLs or API names, which makes processing simpler.
  • The URLs or API names of the behavior data above may be mapped to numeric identifiers using a mapping table prepared in advance; alternatively, the occurrences of each behavior may be counted, the behaviors ranked by count, and the rank used as the numeric identifier; or the identifier may be computed from the content of the behavior data with a hash function.
  • After numeric identifiers are assigned, the behavior data becomes:
  • The numeric identifiers are used directly for filtering and processing in subsequent steps.
  • What actually bears on the problem a user reports is usually the user's behavior data from the most recent period before the problem occurred. That is, the behavior data that contributes to the reported problem is the user's most recent behavior data, and the influence of older historical behavior data can be ignored. This embodiment therefore truncates the user behavior data.
  • The user's most recent behavior data is selected as the candidate behavior data.
  • Either a fixed window length or a variable window length can be chosen.
  • A fixed window length is, for example, 30 to 120 behavior records, that is, the 30 to 120 records immediately preceding the current one;
  • A variable window length selects the behavior data within a certain duration before the current record, for example the behavior data within the 0.5 to 2 hours before the current time.
  • Windowed truncation traces back from the last record, 1438662999:111, over a fixed window length (30 to 120 records) or a variable window length (0.5 to 2 hours, determined from the Unix timestamps). Suppose that after windowed truncation the data becomes:
  • This yields the candidate behavior data that contributes to the problem reported by the user. Traversing the behavior data corresponding to each reported problem yields the candidate behavior data for each reported problem.
  • A data-driven method is used to score the candidate behavior data corresponding to each reported problem, and the target behavior data meeting the set condition is selected.
  • The union of the target behavior data corresponding to all reported problems forms the target behavior data set.
  • This embodiment treats all reported problems as a document set, with each reported problem as one document.
  • The data-driven method in this embodiment is TF-IDF, a statistical weighting technique commonly used in information retrieval and data mining to evaluate how important a term is to one document in a document set or corpus. The importance of a term increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency across the corpus.
  • The terms correspond to candidate behavior data, all reported problems form the document set, each reported problem is one document, and TF-IDF is used to score the candidate behavior data corresponding to each reported problem.
  • The main idea of TF-IDF is that if a word or phrase appears with high frequency (TF) in one article and rarely appears in other articles, it is considered to have good class-discriminating ability and to be suitable for classification. TF-IDF is TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. TF is the frequency with which a term appears in document d.
  • The main idea of IDF is that the fewer documents contain term t, that is, the smaller n is, the larger the IDF, indicating that term t has good class-discriminating ability. If the number of documents in a class C that contain term t is m, and the total number of documents in other classes that contain t is k, then the number of documents containing t is n = m + k. When m is large, n is also large, and the IDF value is small, indicating that term t has weak class-discriminating ability.
  • The first step is to calculate the term frequency.
  • The second step is to calculate the inverse document frequency.
  • The third step is to calculate the TF-IDF.
  • TF-IDF = term frequency (TF) × inverse document frequency (IDF)
  • User behavior data can, to some extent, be regarded as terms.
  • The TF-IDF technique, which measures the importance of a term to one document in a document set or corpus, is borrowed and applied to filtering behavior data.
  • The behavior data suitable for classification is referred to as the target behavior data.
  • For each reported problem, the N highest-scoring behavior data items (N being 50 to 200), or those scoring above a certain threshold, are taken as the target behavior data.
  • The union of the target behavior data corresponding to all reported problems forms the target behavior data set, which contains far fewer behavior data items than the training data as a whole.
  • The behavior data corresponding to problem A is (expressed as numeric identifiers):
  • The union of the numeric identifier sets for all problems forms the target behavior data set. Thus, once the reported problems have been identified as known problems, the target behavior data set contains the target behavior data of all known problems.
  • The target behavior data in the target behavior data set is assigned new numeric identifiers, making the set simpler and easier to process subsequently.
  • The classifier model is trained according to each reported problem and the target behavior data set.
  • Classifier models include, but are not limited to, logistic regression models, deep neural network models, support vector machine models, and recursive neural network models. Since the prior art offers many methods for training a model from training data, they are not described here.
  • The data-driven method for predicting user problems of this embodiment includes:
  • Step S1: When a question raised by the user is received, collect user behavior data and preprocess it.
  • Step S2: Intercept, from the preprocessed user behavior data, the behavior data that contributes to the problem reported by the user as the candidate behavior data.
  • After customer service receives the user's question, the user's behavior data can be retrieved and preprocessed.
  • The specific preprocessing method and the windowed truncation were described above for training the classifier model and are not repeated here.
  • Step S3: Filter the candidate behavior data against the preset target behavior data set, retain the candidate behavior data contained in the target behavior data set, and input the retained candidate behavior data into the trained classifier model to predict the category to which the user's question belongs.
  • After filtering against the target behavior data set, user X's behavior data becomes:
  • The three records are removed because their identifiers are not contained in the target behavior data set.
  • Because the target behavior data set has already been obtained by filtering and the classifier model has been trained, when a user submits a question to customer service, the user's candidate behavior data can be submitted to the trained classifier model. The model computes which class of problem the user's question belongs to, outputs a probability for each problem, and the problem with the highest probability is selected as the category of the user's question.
  • The method of this embodiment also vectorizes the target behavior data in the target behavior data set and vectorizes the candidate behavior data.
  • Binary vectorization sets the corresponding vector position to 1 if the behavior occurs and 0 if it does not; count-based vectorization stores the number of times the behavior occurs at the corresponding vector position.
  • The vectorized user behavior data can be used directly to train the classifier model and for actual prediction, or combined with existing features for training and prediction.
  • The present invention also provides a data-driven apparatus for predicting user problems, the apparatus comprising:
  • A preprocessing module configured to collect user behavior data and preprocess it when a question raised by the user is received;
  • An interception module configured to intercept, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user;
  • A prediction module configured to filter the candidate behavior data against the preset target behavior data set, retain the candidate behavior data contained in the target behavior data set, and input the retained candidate behavior data into the trained classifier model to predict the category to which the user's question belongs.
  • The apparatus of this embodiment further includes a model training module for training the classifier model, and the model training module performs the following operations when training the classifier model:
  • A data-driven method is used to score the candidate behavior data corresponding to each problem reported by a user, and the target behavior data meeting the set condition is selected.
  • The union of the target behavior data corresponding to all problems reported by users forms the target behavior data set;
  • The classifier model is trained according to each problem reported by a user and the target behavior data set.
  • Interfering behavior data whose frequency is lower than the set frequency threshold is removed.
  • The preprocessing module is further configured to assign numeric identifiers to the user behavior data.
  • When intercepting, from the preprocessed user behavior data, the candidate behavior data that contributes to the question raised by the user, the interception module uses windowed truncation, and the windowed truncation includes:
  • After taking the union of the target behavior data corresponding to all reported problems to form the target behavior data set, the model training module of this embodiment is also used to reassign numeric identifiers to the target behavior data in the target behavior data set.
  • The model training module of this embodiment is further used to vectorize the target behavior data in the target behavior data set before training the classifier model.
  • The prediction module of this embodiment is further used to vectorize the candidate behavior data before inputting the retained candidate behavior data into the trained classifier model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

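A data-driven method and apparatus for predicting user problems. When a question raised by a user is received, the method collects user behavior data and preprocesses it (S1); intercepts, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user (S2); filters the candidate behavior data against a preset target behavior data set so as to retain the candidate behavior data contained in the target behavior data set; and inputs the retained candidate behavior data into a trained classifier model to predict the category to which the user's question belongs (S3). The apparatus comprises a preprocessing module, an interception module, and a prediction module. The method and apparatus can significantly improve prediction performance.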

Description

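Data-driven method and apparatus for predicting user problems
This application claims priority to Chinese patent application No. 201610014971.0, filed on January 8, 2016 and entitled "Data-driven method and apparatus for predicting user problems", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention belongs to the technical field of data processing, and in particular relates to a data-driven method and apparatus for predicting user problems.
Background
Users of a product or service often run into problems they cannot resolve themselves and turn to customer service for help. Customer service staff usually need several rounds of dialogue with the user to determine what problem the user has encountered, which requires a large investment in labor. If the problem a user is likely to encounter can be predicted in advance, relevant answers can be pushed automatically, or customer service staff can be helped to locate the user's problem more efficiently.
Predicting in advance the problem a user is likely to encounter is a typical multi-class classification problem, usually consisting of two parts: feature selection and model building. In existing methods, features are extracted on the feature selection side by manually setting rules that are considered, from experience, to be related to the questions a user might ask, such as whether the user has activated a certain service or whether the user has any consumption records in the past few days. Matching against these rules yields features describing the user's state before asking the question. Logistic regression is then used to model these features, producing a classifier that is used to make predictions on new features.
In the prior art, rules considered from experience to be related to problems the user may encounter are set manually, and features describing the user's state before asking are obtained by matching against these rules. This has two problems: 1. It is not data-driven; it requires heavy manual intervention and requires the person intervening to fully understand and be familiar with the relevant product or business. When the product changes frequently or the business coverage expands, this introduces considerable inconvenience, and scalability is poor. 2. It does not take into account the relationship between the user's behavior shortly before seeking help from customer service and the user's problem. Users typically perform a series of actions shortly (for example, within 2 hours) before seeking help from customer service. These actions include, but are not limited to, clicks in mobile phone and tablet clients, PC web browsing, and other operations performed by the user. They contain the user's behavior trajectory before asking the question, and in theory these trajectories are strongly correlated with the user's subsequent request for help.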
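Summary of the Invention
The object of the present invention is to provide a data-driven method and apparatus for predicting user problems which, given some known user information and operations, predict the problem the user has encountered as accurately as possible before the user describes it, avoid the influence of manual intervention present in the prior art, and improve the accuracy of classification prediction.
To achieve the above object, the technical solution of the present invention is as follows:
A data-driven method for predicting user problems, the method comprising:
when a question raised by a user is received, collecting user behavior data and preprocessing it;
intercepting, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user;
filtering the candidate behavior data against a preset target behavior data set so as to retain the candidate behavior data contained in the target behavior data set, and inputting the retained candidate behavior data into a trained classifier model to predict the category to which the user's question belongs.
Further, the training process of the trained classifier model comprises the following steps:
collecting problems reported by users and the corresponding behavior data, and preprocessing the collected user behavior data;
intercepting, from the preprocessed user behavior data, the behavior data that contributes to the problem reported by the user as candidate behavior data;
based on the problems reported by all users and their corresponding candidate behavior data, scoring the candidate behavior data corresponding to each reported problem using a data-driven method, selecting the target behavior data that meets a set condition, and taking the union of the target behavior data corresponding to all reported problems to form the target behavior data set;
training the classifier model according to each reported problem and the target behavior data set.
Further, the preprocessing comprises:
removing interfering behavior data whose frequency is lower than a set frequency threshold.
The preprocessing further comprises:
assigning numeric identifiers to the user behavior data. Numeric identifiers allow subsequent steps to operate directly on the identifiers rather than on the underlying behavior data, such as long strings like URLs or API names, which makes processing simpler.
Further, the present invention uses windowed truncation to intercept, from the preprocessed user behavior data, the candidate behavior data that contributes to the question raised by the user, the windowed truncation comprising:
intercepting the user behavior data from the most recent period before the problem occurred.
Further, after taking the union of the target behavior data corresponding to all reported problems to form the target behavior data set, the method further comprises:
reassigning numeric identifiers to the target behavior data in the target behavior data set.
Further, before training the classifier model, the method further comprises the step of:
vectorizing the target behavior data in the target behavior data set.
Further, before inputting the retained candidate behavior data into the trained classifier model, the method further comprises:
vectorizing the candidate behavior data. Vectorized user behavior data can be used directly to train the classifier model and for actual prediction, which simplifies computation.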
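The present invention also provides a data-driven apparatus for predicting user problems, the apparatus comprising:
a preprocessing module, configured to collect user behavior data and preprocess it when a question raised by a user is received;
an interception module, configured to intercept, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user;
a prediction module, configured to filter the candidate behavior data against a preset target behavior data set so as to retain the candidate behavior data contained in the target behavior data set, and to input the retained candidate behavior data into a trained classifier model to predict the category to which the user's question belongs.
Further, the apparatus also comprises a model training module for training the classifier model, the model training module performing the following operations when training the classifier model:
collecting problems reported by users and the corresponding behavior data, and preprocessing the collected user behavior data;
intercepting, from the preprocessed user behavior data, the behavior data that contributes to the problem reported by the user as candidate behavior data;
based on the problems reported by all users and their corresponding candidate behavior data, scoring the candidate behavior data corresponding to each reported problem using a data-driven method, selecting the target behavior data that meets a set condition, and taking the union of the target behavior data corresponding to all reported problems to form the target behavior data set;
training the classifier model according to each reported problem and the target behavior data set.
When preprocessing the collected user behavior data, the preprocessing module of the present invention performs the following step:
removing interfering behavior data whose frequency is lower than a set frequency threshold.
The preprocessing module is further configured to assign numeric identifiers to the user behavior data.
Further, when intercepting, from the preprocessed user behavior data, the candidate behavior data that contributes to the question raised by the user, the interception module uses windowed truncation, the windowed truncation comprising:
intercepting the user behavior data from the most recent period before the problem occurred.
Further, after taking the union of the target behavior data corresponding to all reported problems to form the target behavior data set, the model training module is further configured to reassign numeric identifiers to the target behavior data in the target behavior data set.
Further, before training the classifier model, the model training module is further configured to vectorize the target behavior data in the target behavior data set.
Further, before inputting the retained candidate behavior data into the trained classifier model, the prediction module is further configured to vectorize the candidate behavior data.
The data-driven method and apparatus for predicting user problems proposed by the present invention use the user's behavior trajectory over a short period to classify and predict user problems, improving classification accuracy and significantly improving on the prediction performance of models that do not include this information.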
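Brief Description of the Drawings
FIG. 1 is a flow chart of training the classifier model according to the present invention;
FIG. 2 is a flow chart of the data-driven method for predicting user problems according to the present invention;
FIG. 3 is a schematic structural diagram of the data-driven apparatus for predicting user problems according to the present invention.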
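Detailed Description
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments. The following embodiments do not limit the present invention.
The general idea of the present invention is to train a classifier model from training data and to analyze user behavior data with the trained classifier model in order to predict the problem the user has encountered.
As shown in FIG. 1, the process of training the classifier model from training data in this embodiment is as follows:
F1. Collect problems reported by users and the corresponding behavior data, and preprocess the collected user behavior data. Preprocessing includes removing interfering behavior data and assigning numeric identifiers to the behavior data.
For every problem reported by a user, that user's behavior data is collected, yielding a large amount of behavior data. Behavior data consists of user operations, including clicks in mobile phone and tablet clients, PC web browsing, and other operations performed by the user. Each operation is represented by a URL or API name, prefixed with a Unix timestamp. For example, the behavior of a user X over a past period can be expressed as: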
1438661879:alipay.mappprod.shop.queryPage
1438661885:alipay.client.mobileapp.checkResult
1438661889:alipay.commerce.category.queryByCategoryId
1438661899:alipay.siteprobe.sync.queryWifis
1438661909:alipay.charity.mobile.donate.deduct.unsign
…..
…..
1438661999:https://couriercore.alipay.com/errorRepeatSubmit.htm
1438662999:https://cshall.alipay.com/lab/question.htm
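The record format above (a Unix timestamp, a colon, then a URL or API name) can be parsed as in the following minimal sketch. The function name `parse_behavior_line` is hypothetical and not part of the application; the only point illustrated is that the split must happen at the first colon, because URLs contain colons of their own.

```python
def parse_behavior_line(line):
    """Split 'timestamp:action' at the first colon only, so URLs keep theirs."""
    timestamp, action = line.strip().split(":", 1)
    return int(timestamp), action


# Two of the records listed above.
api_record = parse_behavior_line("1438661879:alipay.mappprod.shop.queryPage")
url_record = parse_behavior_line(
    "1438661999:https://couriercore.alipay.com/errorRepeatSubmit.htm"
)
```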
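To make subsequent processing more accurate and convenient, preprocessing in this embodiment includes removing interfering behavior data and assigning numeric identifiers to the behavior data.
Interfering behavior data is behavior data that occurs with extremely low frequency, for example below a set frequency threshold. Such behavior data is unlikely to have caused the problem reported by the user, so this embodiment disregards it, thereby eliminating the interference it would introduce.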
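As an illustration of the frequency filter, the sketch below counts how often each behavior occurs and drops those below a threshold. The function name `remove_rare_behaviors`, the parameter `min_count`, and the choice to count over one list of records are assumptions made for the example; the application does not say over which population the frequency is measured.

```python
from collections import Counter


def remove_rare_behaviors(records, min_count):
    """Drop records whose behavior occurs fewer than min_count times overall."""
    counts = Counter(action for _, action in records)
    return [(ts, action) for ts, action in records if counts[action] >= min_count]


# Toy input: the second behavior occurs only once and is treated as interference.
records = [
    (1438661879, "alipay.mappprod.shop.queryPage"),
    (1438661885, "alipay.client.mobileapp.checkResult"),
    (1438661889, "alipay.mappprod.shop.queryPage"),
]
kept = remove_rare_behaviors(records, min_count=2)
```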
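Numeric identifiers are assigned to the behavior data so that subsequent steps can operate directly on the identifiers rather than on the underlying behavior data, such as long strings like URLs or API names, which makes processing simpler.
To assign numeric identifiers, the URLs or API names above may be mapped using a mapping table prepared in advance; alternatively, the occurrences of each behavior may be counted, the behaviors ranked by count, and the rank used as the numeric identifier; or the identifier may be computed from the content of the behavior data with a hash function. After numeric identifiers are assigned, the behavior data becomes: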
1438661879:2
1438661885:65
1438661889:11
1438661899:6
1438661909:18
…..
…..
1438661999:108
1438662999:111
Subsequent steps filter and process the data directly using these numeric identifiers.
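Two of the identifier schemes mentioned above, ranking by frequency and hashing, might look like the sketch below. The function names, the tie-breaking rule, the use of SHA-256, and the bucket count are all assumptions for illustration; the application names neither a hash function nor an identifier range.

```python
import hashlib
from collections import Counter


def ids_by_frequency(actions):
    """Number behaviors 1, 2, 3, ... in order of decreasing frequency."""
    counts = Counter(actions)
    # Ties are broken by name so the numbering is deterministic.
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    return {action: rank for rank, action in enumerate(ranked, start=1)}


def id_by_hash(action, buckets=1 << 20):
    """Derive an identifier from the behavior string itself (collisions possible)."""
    digest = hashlib.sha256(action.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


mapping = ids_by_frequency(["a", "b", "a", "c", "b", "a"])
hashed = id_by_hash("alipay.mappprod.shop.queryPage")
```

The frequency ranking gives small, dense identifiers but needs a pass over the data first; hashing needs no shared table but can map two different behaviors to the same identifier.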
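F2. Intercept, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user.
Of a large volume of user behavior data, what actually bears on the problem a user reports is usually the user's behavior in the most recent period before the problem occurred. That is, the behavior data that contributes to the reported problem is the user's most recent behavior data, and the influence of older historical behavior data can be ignored. This embodiment therefore truncates the user behavior data and selects the user's most recent behavior data as the candidate behavior data.
Specifically, truncation is performed by windowing, with either a fixed window length or a variable window length. A fixed window length is, for example, 30 to 120 behavior records, that is, the 30 to 120 records immediately preceding the current one. A variable window length selects the behavior data within a certain duration before the current record, for example within the 0.5 to 2 hours before the current time.
For example, for the behavior data above, windowed truncation traces back from the last record, 1438662999:111, over a fixed window length (30 to 120 records) or a variable window length (0.5 to 2 hours, determined from the Unix timestamps). Suppose that after windowed truncation the data becomes: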
1438661885:65
1438661889:11
1438661899:6
1438661909:18
…..
…..
1438661999:108
1438662999:111
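This yields the candidate behavior data that contributes to the problem reported by the user. Traversing the behavior data corresponding to each reported problem yields the candidate behavior data for each reported problem.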
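The two window types can be sketched as follows. The function names are hypothetical, the records are assumed to be sorted by ascending timestamp, and the sample list contains only the records printed above (the ones elided with dots in the source are simply absent). The window sizes in the example are deliberately tiny so that the effect is visible on seven records.

```python
def window_by_count(records, n):
    """Fixed window: keep the last n records."""
    return records[-n:] if n > 0 else []


def window_by_time(records, seconds):
    """Variable window: keep records within `seconds` before the last record."""
    if not records:
        return []
    cutoff = records[-1][0] - seconds
    return [(ts, ident) for ts, ident in records if ts >= cutoff]


records = [
    (1438661879, 2), (1438661885, 65), (1438661889, 11), (1438661899, 6),
    (1438661909, 18), (1438661999, 108), (1438662999, 111),
]
last_three = window_by_count(records, 3)
last_1100_seconds = window_by_time(records, 1100)
```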
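F3. Based on the problems reported by all users and their corresponding candidate behavior data, score the candidate behavior data corresponding to each reported problem using a data-driven method, select the target behavior data that meets a set condition, and take the union of the target behavior data corresponding to all reported problems to form the target behavior data set.
This embodiment treats all reported problems as a document set, with each reported problem as one document. The data-driven method of this embodiment is TF-IDF, a statistical weighting technique commonly used in information retrieval and data mining to evaluate how important a term is to one document in a document set or corpus. The importance of a term increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency across the corpus. In this embodiment, the terms correspond to candidate behavior data, and TF-IDF is used to score the candidate behavior data corresponding to each reported problem.
The main idea of TF-IDF is that if a word or phrase appears with high frequency (TF) in one article and rarely appears in other articles, it is considered to have good class-discriminating ability and to be suitable for classification. TF-IDF is TF × IDF, where TF is the term frequency and IDF is the inverse document frequency. TF is the frequency with which a term appears in document d. The main idea of IDF is that the fewer documents contain term t, that is, the smaller n is, the larger the IDF, indicating that term t has good class-discriminating ability. If the number of documents in a class C that contain term t is m, and the total number of documents in other classes that contain t is k, then the number of documents containing t is n = m + k. When m is large, n is also large, and the IDF value given by the IDF formula is small, indicating that term t has weak class-discriminating ability.
The detailed algorithm is as follows:
Step 1: compute the term frequency.
Figure PCTCN2016112853-appb-000001
Step 2: compute the inverse document frequency.
Figure PCTCN2016112853-appb-000002
Step 3: compute TF-IDF.
TF-IDF = term frequency (TF) × inverse document frequency (IDF)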
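The two formula figures referenced above are not reproduced in this text. For orientation only, a commonly used textbook form of the three steps is given below; it is an assumption that the figures show this form or a close variant (some variants add 1 to the denominator of the IDF to avoid division by zero). Here f_{t,d} is the number of times term t occurs in document d, N is the total number of documents, and n is the number of documents containing t, matching the n used in the text.

```latex
\mathrm{TF}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{IDF}(t) = \log \frac{N}{n}, \qquad
\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)
```

With this form, a behavior that appears in every document has n = N and therefore an IDF of 0, which is consistent with the statement that a large n indicates weak class-discriminating ability.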
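User behavior data can, to some extent, be regarded as terms, with a corresponding importance to one document in a document set or corpus. The TF-IDF technique is borrowed and applied to filtering behavior data, selecting the behavior data suitable for classification. The selected behavior data is referred to as target behavior data.
For each reported problem, this embodiment takes the N highest-scoring behavior data items (N being 50 to 200), or those scoring above a certain threshold, as the target behavior data. The union of the target behavior data corresponding to all reported problems forms the target behavior data set, which contains far fewer behavior data items than the training data as a whole.
For example, the behavior data corresponding to problem A is (expressed as numeric identifiers):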
6 18 1 9 77 98 69………………….
………………………………88 189 87
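All the numeric identifiers are scored with TF-IDF, and the top N (for example 50) for problem A are taken in descending order of score, giving the behavior data most important to problem A: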
A:11 18……..108…….
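The union of the numeric identifier sets for all problems forms the target behavior data set. Thus, once the reported problems have been identified as known problems, the target behavior data set contains the target behavior data of all known problems.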
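The scoring, top-N selection, and union described above can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, each problem's candidate identifiers are treated as one document, the IDF uses the plain log(N / n) form without smoothing, and the toy data uses N = 2 rather than the 50 to 200 mentioned in the text.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """docs maps problem -> list of behavior ids; returns problem -> {id: score}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for ids in docs.values():
        doc_freq.update(set(ids))  # count each id once per document
    scores = {}
    for problem, ids in docs.items():
        counts = Counter(ids)
        total = len(ids)
        scores[problem] = {
            ident: (count / total) * math.log(n_docs / doc_freq[ident])
            for ident, count in counts.items()
        }
    return scores


def target_behavior_set(docs, top_n):
    """Union over all problems of each problem's top_n highest-scoring ids."""
    targets = set()
    for per_problem in tfidf_scores(docs).values():
        ranked = sorted(per_problem, key=lambda i: (-per_problem[i], i))
        targets.update(ranked[:top_n])
    return targets


docs = {"A": [11, 18, 11, 6], "B": [6, 65, 65, 2]}
targets = target_behavior_set(docs, top_n=2)
```

In the toy data, identifier 6 occurs under both problems, so its IDF is log(2 / 2) = 0 and it is left out of the target set, while identifiers specific to one problem are kept.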
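Further, the target behavior data in the target behavior data set is assigned new numeric identifiers, making the set simpler and easier to process subsequently.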
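One plausible reading of this renumbering step is that the surviving identifiers are mapped to a dense range, so that they can serve directly as vector positions later. The sketch below assumes that reading; the function name `reindex` and the zero-based, sorted numbering are illustrative choices.

```python
def reindex(target_ids):
    """Map the surviving identifiers to 0..len-1 in ascending order."""
    return {old: new for new, old in enumerate(sorted(target_ids))}


index = reindex({108, 11, 18})
```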
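F4. Train the classifier model according to each reported problem and the target behavior data set.
The classifier model is trained from the known problems and their corresponding target behavior data. With this model, when a user reports a problem, the user's behavior data can be analyzed to predict which known problem it is likely to be, making it easier for customer service to answer the user's question and offer a solution.
Classifier models include, but are not limited to, logistic regression models, deep neural network models, support vector machine models, and recursive neural network models. Since the prior art offers many methods for training a model from training data, they are not described here.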
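The application leaves the training method open. Purely as an example of one of the listed model families, the sketch below trains a multi-class logistic regression (softmax regression) with plain stochastic gradient descent on vectorized behavior data. Everything in it is an assumption made for illustration: the function names, the learning rate, the epoch count, and the four toy training vectors.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Return one probability per problem category for feature vector x."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train_softmax(samples, labels, n_classes, epochs=200, lr=0.5):
    """Fit weights and biases by stochastic gradient descent on cross-entropy."""
    n_features = len(samples[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
                biases[k] -= lr * grad
    return weights, biases


# Toy data: position 0 marks problem 0, position 2 marks problem 1.
samples = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
labels = [0, 0, 1, 1]
weights, biases = train_softmax(samples, labels, n_classes=2)
probs = predict_proba(weights, biases, [1, 0, 0])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Taking the index of the largest probability mirrors the prediction rule described for step S3, where the problem with the highest probability is chosen.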
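As shown in FIG. 2, the data-driven method for predicting user problems of this embodiment comprises:
Step S1: when a question raised by a user is received, collect user behavior data and preprocess it.
Step S2: intercept, from the preprocessed user behavior data, the behavior data that contributes to the problem reported by the user as candidate behavior data.
After customer service receives the user's question, the user's behavior data can be retrieved and preprocessed. The specific preprocessing method and the windowed truncation were described above for training the classifier model and are not repeated here.
Step S3: filter the candidate behavior data against the preset target behavior data set so as to retain the candidate behavior data contained in the target behavior data set, and input the retained candidate behavior data into the trained classifier model to predict the category to which the user's question belongs.
For example, after filtering against the target behavior data set, the behavior data of user X becomes: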
1438661889:11
1438661909:18
…..
…..
1438661999:108
Assuming the identifiers in records 1438661885:65, 1438661899:6, and 1438662999:111 are not in the target behavior data set, these three records are removed.
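The filtering in this example amounts to a set-membership test on the numeric identifier. In the sketch below the function name is hypothetical, and the target set holds only the three identifiers visible in the example; the source elides the rest with dots.

```python
def filter_by_target_set(candidates, target_ids):
    """Keep only the records whose identifier is in the target behavior data set."""
    return [(ts, ident) for ts, ident in candidates if ident in target_ids]


candidates = [
    (1438661885, 65), (1438661889, 11), (1438661899, 6),
    (1438661909, 18), (1438661999, 108), (1438662999, 111),
]
selected = filter_by_target_set(candidates, target_ids={11, 18, 108})
```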
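Because the target behavior data set has already been obtained by filtering and the classifier model has been trained, when a user submits a question to customer service, the user's candidate behavior data can be submitted to the trained classifier model. The model computes which class of problem the user's question belongs to, outputs a probability for each problem, and the problem with the highest probability is selected as the category of the user's question.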
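Further, to facilitate training the classifier model and subsequent prediction, the method of this embodiment also vectorizes the target behavior data in the target behavior data set and vectorizes the candidate behavior data.
Vectorization is either binary or count-based. Binary vectorization sets the corresponding vector position to 1 if the behavior occurs and 0 if it does not; count-based vectorization stores the number of times the behavior occurs at the corresponding vector position. The vectorized user behavior data can be used directly to train the classifier model and for actual prediction, or combined with existing features for training and prediction.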
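Both vectorization modes can be sketched with one function. The name `vectorize`, the `binary` flag, and the `index` mapping from identifier to vector position are assumptions for the example; identifiers outside the index are ignored, which matches the earlier filtering against the target behavior data set.

```python
from collections import Counter


def vectorize(behavior_ids, index, binary=True):
    """Turn a list of behavior ids into a fixed-length vector.

    index maps each target behavior id to its vector position.
    """
    vector = [0] * len(index)
    for ident, count in Counter(behavior_ids).items():
        if ident in index:
            vector[index[ident]] = 1 if binary else count
    return vector


index = {11: 0, 18: 1, 108: 2}
binary_vector = vectorize([11, 18, 11], index)
count_vector = vectorize([11, 18, 11], index, binary=False)
```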
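As shown in FIG. 3, corresponding to the above method, the present invention also provides a data-driven apparatus for predicting user problems, the apparatus comprising:
a preprocessing module, configured to collect user behavior data and preprocess it when a question raised by a user is received;
an interception module, configured to intercept, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user;
a prediction module, configured to filter the candidate behavior data against a preset target behavior data set so as to retain the candidate behavior data contained in the target behavior data set, and to input the retained candidate behavior data into a trained classifier model to predict the category to which the user's question belongs.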
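One way to picture how the three modules fit together is the small class below. It is a hypothetical wiring, not the structure shown in FIG. 3: the class name, the constructor parameters, and the stand-in callables (a sort for preprocessing, a three-record window for interception, a rule in place of a trained classifier) are all invented for the example.

```python
class PredictionApparatus:
    """Hypothetical wiring of the three modules; all names are illustrative."""

    def __init__(self, preprocess, intercept, target_ids, classify):
        self.preprocess = preprocess    # preprocessing module
        self.intercept = intercept      # interception module
        self.target_ids = target_ids    # preset target behavior data set
        self.classify = classify        # trained classifier, any callable

    def predict(self, raw_records):
        cleaned = self.preprocess(raw_records)
        candidates = self.intercept(cleaned)
        # Prediction module: keep only identifiers in the target set.
        selected = [ident for _, ident in candidates if ident in self.target_ids]
        return self.classify(selected)


apparatus = PredictionApparatus(
    preprocess=lambda records: sorted(records),
    intercept=lambda records: records[-3:],
    target_ids={11, 18, 108},
    classify=lambda ids: "problem A" if 108 in ids else "unknown",
)
category = apparatus.predict(
    [(1438661999, 108), (1438661889, 11), (1438662999, 111), (1438661909, 18)]
)
```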
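The apparatus for predicting user problems of this embodiment further comprises a model training module for training the classifier model, the model training module performing the following operations when training the classifier model:
collecting problems reported by users and the corresponding behavior data, and preprocessing the collected user behavior data;
intercepting, from the preprocessed user behavior data, the behavior data that contributes to the problem reported by the user as candidate behavior data;
based on the problems reported by all users and their corresponding candidate behavior data, scoring the candidate behavior data corresponding to each reported problem using a data-driven method, selecting the target behavior data that meets a set condition, and taking the union of the target behavior data corresponding to all reported problems to form the target behavior data set;
training the classifier model according to each reported problem and the target behavior data set.
When preprocessing the collected user behavior data, the preprocessing module of this embodiment performs the following step:
removing interfering behavior data whose frequency is lower than a set frequency threshold.
Further, the preprocessing module is also configured to assign numeric identifiers to the user behavior data.
When intercepting, from the preprocessed user behavior data, the candidate behavior data that contributes to the question raised by the user, the interception module of this embodiment uses windowed truncation, the windowed truncation comprising:
intercepting the user behavior data from the most recent period before the problem occurred.
Corresponding to the above method, after taking the union of the target behavior data corresponding to all reported problems to form the target behavior data set, the model training module of this embodiment is further configured to reassign numeric identifiers to the target behavior data in the target behavior data set.
Before training the classifier model, the model training module of this embodiment is further configured to vectorize the target behavior data in the target behavior data set.
Corresponding to the above method, before inputting the retained candidate behavior data into the trained classifier model, the prediction module of this embodiment is further configured to vectorize the candidate behavior data.
The above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Without departing from the spirit and essence of the present invention, those skilled in the art may make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the claims appended to the present invention.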

Claims (16)

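  1. A data-driven method for predicting user questions, characterized in that the method comprises:
    when a question raised by a user is received, collecting user behavior data and preprocessing it;
    extracting, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user;
    filtering the candidate behavior data against a configured target behavior data set, selecting from the candidate behavior data the candidate behavior data contained in the target behavior data set, inputting the selected candidate behavior data into a trained classifier model, and predicting the category to which the question raised by the user belongs.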
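  2. The data-driven method for predicting user questions according to claim 1, characterized in that the training process of the trained classifier model comprises the following steps:
    collecting questions fed back by users and their corresponding behavior data, and preprocessing the collected user behavior data;
    extracting, from the preprocessed user behavior data, the behavior data that contributes to the questions fed back by the users as candidate behavior data;
    according to the questions fed back by all users and their corresponding candidate behavior data, scoring, by a data-driven method, the candidate behavior data corresponding to each question fed back by a user, selecting the target behavior data that satisfies a set condition, and taking the union of the target behavior data corresponding to the questions fed back by all users to form the target behavior data set;
    training the classifier model according to each question fed back by a user and the target behavior data set.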
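  3. The data-driven method for predicting user questions according to claim 1 or 2, characterized in that the preprocessing comprises:
    removing interfering behavior data whose frequency is below a set frequency threshold.
  4. The data-driven method for predicting user questions according to claim 3, characterized in that the preprocessing further comprises:
    assigning numeric identifiers to the user behavior data.
  5. The data-driven method for predicting user questions according to claim 1 or 2, characterized in that the extracting, from the preprocessed user behavior data, of candidate behavior data that contributes to the question raised by the user uses windowed truncation, the windowed truncation comprising:
    extracting the user behavior data within the most recent period of time before the problem occurred.
  6. The data-driven method for predicting user questions according to claim 4, characterized in that, after the taking of the union of the target behavior data corresponding to the questions fed back by all users to form the target behavior data set, the method further comprises:
    re-assigning numeric identifiers to the target behavior data in the target behavior data set.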
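  7. The data-driven method for predicting user questions according to claim 2, characterized in that, before the classifier model is trained, the method further comprises the step of:
    vectorizing the target behavior data in the target behavior data set.
  8. The data-driven method for predicting user questions according to claim 7, characterized in that, before the selected candidate behavior data is input into the trained classifier model, the method further comprises:
    vectorizing the candidate behavior data.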
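  9. A data-driven apparatus for predicting user questions, characterized in that the apparatus comprises:
    a preprocessing module, configured to collect user behavior data and preprocess it when a question raised by a user is received;
    an extraction module, configured to extract, from the preprocessed user behavior data, candidate behavior data that contributes to the question raised by the user;
    a prediction module, configured to filter the candidate behavior data against a configured target behavior data set, select from the candidate behavior data the candidate behavior data contained in the target behavior data set, input the selected candidate behavior data into a trained classifier model, and predict the category to which the question raised by the user belongs.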
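  10. The data-driven apparatus for predicting user questions according to claim 9, characterized in that the apparatus further comprises a model training module configured to train the classifier model, the model training module performing the following operations when training the classifier model:
    collecting questions fed back by users and their corresponding behavior data, and preprocessing the collected user behavior data;
    extracting, from the preprocessed user behavior data, the behavior data that contributes to the questions fed back by the users as candidate behavior data;
    according to the questions fed back by all users and their corresponding candidate behavior data, scoring, by a data-driven method, the candidate behavior data corresponding to each question fed back by a user, selecting the target behavior data that satisfies a set condition, and taking the union of the target behavior data corresponding to the questions fed back by all users to form the target behavior data set;
    training the classifier model according to each question fed back by a user and the target behavior data set.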
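  11. The data-driven apparatus for predicting user questions according to claim 9 or 10, characterized in that, when preprocessing the collected user behavior data, the preprocessing module performs the following step:
    removing interfering behavior data whose frequency is below a set frequency threshold.
  12. The data-driven apparatus for predicting user questions according to claim 11, characterized in that the preprocessing module is further configured to assign numeric identifiers to the user behavior data.
  13. The data-driven apparatus for predicting user questions according to claim 9 or 10, characterized in that, when extracting, from the preprocessed user behavior data, the candidate behavior data that contributes to the question raised by the user, the extraction module uses windowed truncation, the windowed truncation comprising:
    extracting the user behavior data within the most recent period of time before the problem occurred.
  14. The data-driven apparatus for predicting user questions according to claim 12, characterized in that, after taking the union of the target behavior data corresponding to the questions fed back by all users to form the target behavior data set, the model training module is further configured to re-assign numeric identifiers to the target behavior data in the target behavior data set.
  15. The data-driven apparatus for predicting user questions according to claim 10, characterized in that, before the classifier model is trained, the model training module is further configured to vectorize the target behavior data in the target behavior data set.
  16. The data-driven apparatus for predicting user questions according to claim 15, characterized in that, before inputting the selected candidate behavior data into the trained classifier model, the prediction module is further configured to vectorize the candidate behavior data.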
PCT/CN2016/112853 2016-01-08 2016-12-29 Data-driven method and apparatus for predicting user questions WO2017118333A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP16883458.8A EP3401853A4 (en) 2016-01-08 2016-12-29 METHOD AND APPARATUS FOR PREDICTING A USER PROBLEM BASED ON DATA CONTROL
JP2018535292A JP2019505909A (ja) 2016-01-08 2016-12-29 ユーザ質問を予測するためのデータ駆動型方法及び装置
US16/029,508 US11481698B2 (en) 2016-01-08 2018-07-06 Data-driven method and apparatus for handling user inquiries using collected data
US18/045,801 US11928617B2 (en) 2016-01-08 2022-10-11 Data-driven method and apparatus for handling user inquiries using collected data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610014971.0A CN106960248B (zh) 2016-01-08 2016-01-08 Data-driven method and apparatus for predicting user questions
CN201610014971.0 2016-01-08

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/029,508 Continuation US11481698B2 (en) 2016-01-08 2018-07-06 Data-driven method and apparatus for handling user inquiries using collected data

Publications (1)

Publication Number Publication Date
WO2017118333A1 true WO2017118333A1 (zh) 2017-07-13

Family

ID=59273370

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/112853 WO2017118333A1 (zh) 2016-01-08 2016-12-29 Data-driven method and apparatus for predicting user questions

Country Status (5)

Country Link
US (2) US11481698B2 (zh)
EP (1) EP3401853A4 (zh)
JP (1) JP2019505909A (zh)
CN (1) CN106960248B (zh)
WO (1) WO2017118333A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634827A (zh) * 2018-12-12 2019-04-16 北京字节跳动网络技术有限公司 Method and apparatus for generating information
US11481698B2 (en) 2016-01-08 2022-10-25 Alibaba Group Holding Limited Data-driven method and apparatus for handling user inquiries using collected data

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933705B (zh) * 2019-03-22 2021-10-19 国家电网有限公司 Big data platform operation and maintenance management system
US11727312B2 (en) * 2019-09-03 2023-08-15 International Business Machines Corporation Generating personalized recommendations to address a target problem
CN110569446B (zh) * 2019-09-04 2022-05-17 第四范式(北京)技术有限公司 Method and system for constructing a candidate set of recommended objects
CN110889551A (zh) * 2019-11-23 2020-03-17 湖南新泉工程造价咨询有限公司 Whole-process engineering consulting service method
CN112348614A (zh) * 2019-11-27 2021-02-09 北京京东尚科信息技术有限公司 Method and apparatus for pushing information
CN111159015B (zh) * 2019-12-13 2022-01-14 华为技术有限公司 Method and apparatus for locating problems
CN115423485B (zh) * 2022-11-03 2023-03-21 支付宝(杭州)信息技术有限公司 Data processing method, apparatus, and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390194A (zh) * 2012-05-07 2013-11-13 北京三星通信技术研究有限公司 Method, device, and system for predicting user intention and recommending suggestions
US20150106284A1 (en) * 2013-10-10 2015-04-16 Askem Tlv Ltd System and method for a user online experience distilling the collective knowledge and experience of a plurality of participants
CN104572734A (zh) * 2013-10-23 2015-04-29 腾讯科技(深圳)有限公司 Question recommendation method, apparatus, and system
CN104778176A (zh) * 2014-01-13 2015-07-15 阿里巴巴集团控股有限公司 Data search processing method and apparatus
CN104951433A (zh) * 2015-06-24 2015-09-30 北京京东尚科信息技术有限公司 Method and system for context-based intention recognition

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090132347A1 (en) * 2003-08-12 2009-05-21 Russell Wayne Anderson Systems And Methods For Aggregating And Utilizing Retail Transaction Records At The Customer Level
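KR100873373B1 (ko) * 2007-02-14 2008-12-10 성균관대학교산학협력단 User intention recognition system and method
JP2008262362A (ja) * 2007-04-11 2008-10-30 Denso Corp Information communication system, facility-side device, user-side device, program for facility-side device, and program for user-side device
CN101079851B (zh) * 2007-07-09 2011-01-05 华为技术有限公司 Mail type determination method, apparatus, and system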
US8126881B1 (en) * 2007-12-12 2012-02-28 Vast.com, Inc. Predictive conversion systems and methods
US8140328B2 (en) * 2008-12-01 2012-03-20 At&T Intellectual Property I, L.P. User intention based on N-best list of recognition hypotheses for utterances in a dialog
US8326777B2 (en) * 2009-07-31 2012-12-04 Yahoo! Inc. Supplementing a trained model using incremental data in making item recommendations
CN102279851B (zh) * 2010-06-12 2017-05-03 阿里巴巴集团控股有限公司 Intelligent navigation method, apparatus, and system
US8918331B2 (en) * 2010-12-21 2014-12-23 Yahoo ! Inc. Time-triggered advertisement replacement
CN102096717B (zh) * 2011-02-15 2013-01-16 百度在线网络技术(北京)有限公司 Search method and search engine
US10096033B2 (en) * 2011-09-15 2018-10-09 Stephan HEATH System and method for providing educational related social/geo/promo link promotional data sets for end user display of interactive ad links, promotions and sale of products, goods, and/or services integrated with 3D spatial geomapping, company and local information for selected worldwide locations and social networking
CN103957777B (zh) * 2011-12-07 2018-01-09 捷通国际有限公司 Behavior tracking and modification system
CN103310343A (zh) * 2012-03-15 2013-09-18 阿里巴巴集团控股有限公司 Method and apparatus for publishing commodity information
US20150294078A1 (en) * 2012-09-06 2015-10-15 Koninklijke Philips N.V. Scheduling instruction items
US20140149177A1 (en) * 2012-11-23 2014-05-29 Ari M. Frank Responding to uncertainty of a user regarding an experience by presenting a prior experience
US9600774B1 (en) * 2013-09-25 2017-03-21 Amazon Technologies, Inc. Predictive instance suspension and resumption
US9508360B2 (en) * 2014-05-28 2016-11-29 International Business Machines Corporation Semantic-free text analysis for identifying traits
CN105447038A (zh) * 2014-08-29 2016-03-30 国际商业机器公司 Method and system for obtaining user characteristics
US20160364757A1 (en) * 2015-06-09 2016-12-15 Yahoo! Inc. Method and system for sponsored search results placement in a search results page
CN106960248B (zh) 2016-01-08 2021-02-23 阿里巴巴集团控股有限公司 Data-driven method and apparatus for predicting user questions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3401853A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11481698B2 (en) 2016-01-08 2022-10-25 Alibaba Group Holding Limited Data-driven method and apparatus for handling user inquiries using collected data
US11928617B2 (en) 2016-01-08 2024-03-12 Alibaba Group Holding Limited Data-driven method and apparatus for handling user inquiries using collected data
CN109634827A (zh) * 2018-12-12 2019-04-16 北京字节跳动网络技术有限公司 Method and apparatus for generating information

Also Published As

Publication number Publication date
US20230116895A1 (en) 2023-04-13
US11481698B2 (en) 2022-10-25
CN106960248B (zh) 2021-02-23
EP3401853A1 (en) 2018-11-14
US11928617B2 (en) 2024-03-12
EP3401853A4 (en) 2019-01-23
CN106960248A (zh) 2017-07-18
US20180314990A1 (en) 2018-11-01
JP2019505909A (ja) 2019-02-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16883458

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018535292

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016883458

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2016883458

Country of ref document: EP

Effective date: 20180808