CN101119326A - Method and device for managing instant communication conversation recording - Google Patents

Publication number
CN101119326A
Authority
CN
China
Prior art keywords
session
classification
corresponding
user
recording
Prior art date
Application number
CN 200610109539
Other languages
Chinese (zh)
Other versions
CN101119326B (en)
Inventor
石燕伟
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Priority to CN 200610109539
Publication of CN101119326A
Application granted
Publication of CN101119326B

Abstract

The present invention discloses a method for managing instant messaging session records, which addresses the prior-art problem that querying information in session records is both cumbersome and inefficient for instant messaging users. The method comprises the following steps: obtaining a user's session records and classifying them to obtain sample sets; performing correlation analysis on each sample set to generate a corresponding classification group, which contains the feature vectors corresponding to the session records in that sample set; determining the session topic corresponding to each classification group according to the frequency with which words occur in the group, and associating the session topic with the session records corresponding to the group; and finding the session topic that matches a keyword entered by the user, and presenting the session records associated with that topic to the user. The invention also discloses a device for managing instant messaging session records.

Description

Method and device for managing instant messaging session records

Technical Field

The present invention relates to the field of communications and computer technology, and in particular to a method and device for managing instant messaging session records.

Background

With the continuing development and spread of instant messaging (IM) technology, more and more users not only use IM software to communicate with other users over the network, but also use it as a tool for consulting other users about problems encountered at work or in study. At the same time, the session records produced by these exchanges are saved in the IM system, giving users material in which to look up information of interest later.

For example, suppose user A consults user B about a problem and user B returns the answer. When user C later consults user A or user B about the same problem, user A needs to look up the relevant information in the session record with user B, or user B needs to look it up in the session record with user A. In either case the user has to search the session records manually. When there are many session records, or when a long time has passed between user A's consultation and user C's, the prior-art approach both increases the manual search workload and makes the search inefficient.

If user A has consulted several users about the same problem and wants to query information from the session records with those users, the prior art offers limited help. Where the IM system provides only a function for viewing session records, user A has to go through the session records of each user one by one to find the information of interest. Even where an IM system provides import and export of session record data, user A must first export the session records of the several users and then search the exported data. User A can search the exported data by a keyword for the information of interest, but a keyword search only locates passages that contain the keyword. Such a passage is not necessarily related to the information the user cares about, so the user still cannot find information in the session records effectively.

Summary of the Invention

The present invention provides a method and device for managing instant messaging session records, to solve the prior-art problem that querying information in session records is both cumbersome and inefficient for instant messaging users.

The invention provides the following technical solution. A method for managing instant messaging session records comprises the following steps: obtaining a user's session records and classifying them to obtain sample sets; performing correlation analysis on each sample set to generate a corresponding classification group, which contains the feature vectors corresponding to the session records in that sample set; determining the session topic corresponding to each classification group according to the frequency with which words occur in the group, and associating the session topic with the session records corresponding to the group; and finding, according to the keyword entered in a user query, the session topic that matches the keyword, and presenting the session records associated with the found session topic to the user.

After the session topics are generated, the correlation between session topics is further analysed, and session topics whose correlation exceeds a predetermined threshold are merged into a single session topic, so that the merged session topic is associated with the session records corresponding to all of the session topics that were merged.

The session records are classified by session user to generate the sample sets.
Preferably, a sample set is further divided into several different sample sets according to the time interval between the session records it contains.

Performing correlation analysis on a sample set to generate a classification group comprises the steps of: generating the feature vector corresponding to each session record in the sample set; analysing the correlation between each feature vector and the other feature vectors; and classifying the feature vectors according to that correlation to generate the classification groups.

Each session record is word-segmented, words with no practical meaning are deleted from it, and synonyms among the remaining words are merged, to generate the feature vector corresponding to that session record.

The correlation between feature vectors is calculated from the weight that each word composing a feature vector has within that vector.

The session topic of a classification group is determined from the words whose frequency of occurrence in the group exceeds a predetermined threshold.
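The two-stage grouping described above (first by session users, then by time interval) can be illustrated with a minimal Python sketch. The record layout `(user_a, user_b, timestamp, text)`, the function name `build_sample_sets` and the reading of "time interval" as the gap between consecutive records are all assumptions made for illustration; the text only states that the interval is application-dependent and may be, for example, one week.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_sample_sets(records, max_gap=timedelta(weeks=1)):
    """Group records into sample sets by user pair, then split on long gaps.

    records: iterable of (user_a, user_b, timestamp, text) tuples
             (a hypothetical layout, not taken from the patent text).
    Returns a list of (user_pair, [(timestamp, text), ...]) sample sets.
    """
    # First pass: one bucket per unordered pair of session users.
    by_pair = defaultdict(list)
    for user_a, user_b, ts, text in records:
        by_pair[frozenset((user_a, user_b))].append((ts, text))

    # Second pass: within a pair, start a new sample set whenever the gap
    # between two consecutive records exceeds max_gap.
    sample_sets = []
    for pair, items in by_pair.items():
        items.sort(key=lambda item: item[0])
        current = [items[0]]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[0] > max_gap:
                sample_sets.append((pair, current))
                current = []
            current.append(cur)
        sample_sets.append((pair, current))
    return sample_sets
```

Using a `frozenset` as the key treats a record from A to B and a record from B to A as belonging to the same session users, which matches the "sessions between the same users" wording; a fixed calendar window would be an equally plausible reading of the time split.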
A device for managing instant messaging session records comprises: a unit for storing user session records; a unit for classifying the session records to generate sample sets; a unit for performing correlation analysis on the sample sets to generate the corresponding classification groups; a unit for determining the session topic corresponding to each classification group and associating that session topic with the session records corresponding to the group; and a unit for finding, according to the keyword entered in a user query, the session topic that matches the keyword, and presenting the session records associated with the found session topic to the user.

Preferably, the device further comprises a unit for analysing the correlation between session topics, merging session topics whose correlation exceeds a predetermined threshold into a single session topic, and associating the merged session topic with the session records corresponding to all of the session topics that were merged.

The advantageous effects of the invention are as follows. After classifying the user's session records to generate sample sets, the invention performs correlation analysis on each sample set to generate the corresponding classification group, determines the session topic corresponding to each classification group, and associates the session topic with the session records corresponding to the group.
With the invention, a user who needs to query information from the session records only has to enter a keyword. The system automatically finds the session topics that match the keyword and presents the session records associated with those topics to the user. This spares the user the tedious work of searching manually and also improves query efficiency.

Brief Description of the Drawings

Fig. 1 is a schematic structural diagram of the device for managing user session records in an embodiment of the invention;
Fig. 2 is a schematic diagram of the method for managing user session records in an embodiment of the invention;
Fig. 3 is a flowchart of the processing that classifies user session records in an embodiment of the invention;
Fig. 4 is a flowchart of the processing that performs correlation analysis on a sample set in an embodiment of the invention.

Detailed Description

To solve the prior-art problem that querying information in session records is both cumbersome and inefficient for instant messaging users, this embodiment classifies the user's session records to generate sample sets, performs correlation analysis on each sample set to generate the corresponding classification group, determines the session topic corresponding to each classification group, associates the session topic with the session records corresponding to the group, finds the session topic that matches a keyword entered by the user, and presents the session records associated with the found session topic to the user.

Referring to Fig. 1, the device for managing user session records in this embodiment comprises a storage unit 101, a classification unit 102, an analysis unit 103, a session topic unit 104, a merging unit 105 and a query unit 106.

The storage unit 101 stores the user's session records and session topics.
The classification unit 102 obtains the session records and classifies them to obtain sample sets.

The analysis unit 103 performs correlation analysis on the sample sets and generates the classification groups of the sample sets.

The session topic unit 104 determines the session topic of each classification group of a sample set and associates that session topic with the session records corresponding to the group.

The merging unit 105 analyses the correlation between session topics, merges session topics whose correlation exceeds a predetermined threshold into a single session topic, and associates the merged session topic with the session records corresponding to all of the session topics that were merged.

The query unit 106 receives the keyword that the user enters when querying information in the session records, finds the session topic that matches the keyword, and presents the session records associated with the found session topic to the user.

Referring to Fig. 2, the method for managing user session records in this embodiment comprises the following steps.

Step 201: The user's session records are obtained and classified to obtain sample sets.

Step 202: Correlation analysis is performed on the generated sample sets to generate the corresponding classification groups.

Step 203: The session topic corresponding to each classification group is determined according to the frequency with which words occur in the group, and the session topic is associated with the session records corresponding to the group.

Step 204: The correlation between session topics is analysed, session topics whose correlation exceeds a predetermined threshold are merged into a single session topic, and the merged session topic is associated with the session records corresponding to all of the session topics that were merged.

Step 205: When the user queries information in the session records, the session topic that matches the keyword entered in the query is found, and the session records associated with the found session topic are presented to the user.

The processing that classifies the session records in step 201 is shown in Fig. 3 and proceeds as follows.

Step 301: It is determined whether a session record has already been classified. If it has, it is not processed again; otherwise, step 302 is performed.
Step 302: Session records that have not yet been classified are classified by user. For example, it is determined whether session record TR_i and session record TR_j belong to sessions between the same users. If TR_i and TR_j belong to sessions between different users, they are placed in different sample sets TS; if they belong to sessions between the same users, they are placed in the same sample set.

Step 303: Each sample set is further divided into different sample sets according to the time interval between the session records it contains. The interval depends on the application and may be set to, for example, one week.

The sample sets TS produced by step 303 are the sample sets on which correlation analysis is performed.

Referring to Fig. 4, correlation analysis of a sample set using the KNN (K Nearest Neighbor) algorithm proceeds as follows.

Step 401: A corresponding feature vector is generated for each session record TR in the sample set TS.

First, each session record TR is word-segmented, and words with no practical meaning, such as particles, are removed to obtain a set S. Synonyms in S are then merged; for example, {"电脑", "计算机"} (two words for "computer") becomes {"计算机", "计算机"}. The set S obtained for each session record after synonym merging is vectorized to generate a feature vector $S(W_1, W_2, W_3, \ldots, W_n)$, where $W_i$ is the weight of the i-th element and the elements are the words in S.

Step 402: The weight W of each element of the feature vector corresponding to each session record TR is calculated with the following formula:

$$W(t,S) = \frac{tf(t,S)\,\log\left(N/n_t + 0.01\right)}{\sqrt{\sum_{t \in S}\left[tf(t,S)\,\log\left(N/n_t + 0.01\right)\right]^2}}$$

where $W(t,S)$ is the weight of word t in feature vector S, $tf(t,S)$ is the frequency of word t in feature vector S, N is the total number of session records TR in the sample set TS, $n_t$ is the number of session records TR in the sample set TS in which word t occurs, and the denominator is a normalization factor.

Step 403: The correlation coefficients between the feature vectors corresponding to the session records are calculated, and the K feature vectors most similar to each feature vector are determined from them. In a specific implementation, a formula is used in which $Sim(d_i, d_j)$ is the correlation coefficient between feature vector $d_i$ and feature vector $d_j$, and $W_{ik}$ and $W_{jk}$ are the weights of the k-th elements of $d_i$ and $d_j$ respectively. Once the correlation coefficients between the feature vectors have been obtained, the K feature vectors most strongly correlated with each feature vector are combined into a set. The value of K is determined by the application.

Step 404: The feature vectors corresponding to the session records are divided among the classes of class set C to generate the classification groups. Class set C is the set formed from the feature vectors corresponding to the session records in the sample set TS. Once the correlation coefficients between the feature vectors have been calculated, the classes are generated as described in Method 1 and Method 2.
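Steps 401 to 403 can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the patented implementation: segmentation is a whitespace split (Chinese text would need a real word segmenter), `STOPWORDS` and `SYNONYMS` are hypothetical tables, and because the similarity formula of step 403 does not survive in this text, cosine similarity over the word weights is assumed.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "my"}           # hypothetical stop-word list
SYNONYMS = {"pc": "computer", "laptop": "computer"}  # hypothetical synonym table


def term_counts(text):
    """Step 401: segment, drop meaningless words, merge synonyms."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return Counter(SYNONYMS.get(w, w) for w in words)


def weight_vectors(sample_set):
    """Step 402: normalised tf * log(N/n_t + 0.01) weights, one dict per record."""
    counts = [term_counts(text) for text in sample_set]
    n_records = len(counts)
    # n_t: number of records in which each term occurs at least once.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        raw = {t: tf * math.log(n_records / doc_freq[t] + 0.01)
               for t, tf in c.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        vectors.append({t: v / norm for t, v in raw.items()} if norm else {})
    return vectors


def similarity(u, v):
    """Step 403: cosine similarity of two sparse weight vectors (assumed form)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_neighbours(vectors, k):
    """For each vector, the indices of its k most similar other vectors."""
    result = []
    for i, u in enumerate(vectors):
        scored = [(similarity(u, v), j) for j, v in enumerate(vectors) if j != i]
        scored.sort(reverse=True)
        result.append([j for _, j in scored[:k]])
    return result
```

Since N/n_t is never below 1, the logarithm stays positive even for a word that occurs in every record, so the 0.01 term keeps such words from dropping to zero weight.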

Method 1: When class set C is empty, a vector set c of C is generated as follows and then added to C. If two feature vectors $d_i$ and $d_j$, each corresponding to a session record, each belong to the set formed by the other's K most similar neighbours, then $d_i$ and $d_j$ belong to the same class c. Class c is generated and associated with the session records corresponding to $d_i$ and $d_j$, and class c is then added to class set C. The feature vectors in each class c make up one classification group.

Method 2: When class set C is not empty, the weight with which the feature vector x corresponding to each session record belongs to a class $c_j$ ($c_j \in C$) is calculated with the following formula:

$$W(x, c_j) = \sum_{d_i \in KNN(x)} Sim(x, d_i)\, y(d_i, c_j)$$

where x is the feature vector corresponding to a session record, $d_i$ is a feature vector in the set formed by the K neighbours most similar to x, $Sim(x, d_i)$ is the correlation coefficient between x and $d_i$, which can be obtained from the results of step 403, and $y(d_i, c_j)$ is the class attribute function, whose value is 1 if feature vector $d_i$ belongs to class $c_j$ and 0 otherwise. The calculated weights $W(x, c_j)$ of feature vector x for the various classes $c_j$ are compared, x is assigned to the class with the greatest weight, and that class is associated with the session record corresponding to x.

When Method 2 is used and the correlation between feature vector x and every existing class c is very small, a new class c' can be generated in the manner of Method 1, added to class set C, and associated with the session record corresponding to x.

After all of the feature vectors have been processed, every feature vector has been assigned to a class, and each class forms one classification group.

In each classification group generated, the N most frequent words, or the words whose frequency exceeds a, are taken as the session topic of that group. The values of N and a are determined by the application.

After each sample set TS has been processed as above to generate its classification groups and their session topics, the generated session topics are themselves analysed for correlation. The session topics are treated as one sample set for the KNN algorithm, the weight of each word of each session topic within that topic is calculated, the correlation coefficients between session topics are calculated from those weights using the formula of step 403, and session topics whose correlation coefficient exceeds a set threshold are merged.

When session records are presented to the user, they can be arranged by session user, or ordered by the weight of each session record within the session topic.

The embodiment above uses the KNN algorithm to perform correlation analysis on the sample sets, but the invention is not limited to the KNN algorithm. Correlation analysis of session records can also use other vector-space-based training algorithms and classification methods, such as vector machine algorithms, neural network algorithms and Bayesian algorithms. With a Bayesian algorithm, for example, the probability that each word of the feature vector corresponding to a session record occurs in a given session is calculated, the probability that the feature vector belongs to a given session is then calculated with Bayes' formula, and the feature vector is added to the session with the greatest probability.
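The interplay of Method 1 and Method 2 can be illustrated with a small Python sketch. It takes a precomputed similarity matrix `sim` and neighbour lists `neighbours` as input; the function name `assign_classes`, the cut-off `min_weight` standing in for "very small" correlation, the processing order, and the choice to let a vector with no mutual neighbours start a class of its own are all assumptions made for illustration.

```python
def assign_classes(sim, neighbours, min_weight=0.1):
    """Assign feature vectors to classes using Method 2, falling back to Method 1.

    sim[i][j]     correlation coefficient between vectors i and j
    neighbours[i] indices of the K vectors most similar to vector i
    min_weight    hypothetical cut-off below which a class weight is "very small"
    Returns a list of classes, each a set of vector indices.
    """
    label = {}     # vector index -> class index
    classes = []   # class index -> set of vector indices

    for i in range(len(sim)):
        if i in label:
            continue
        # Method 2: the weight of a class is the sum of the similarities
        # between i and those of its neighbours already in that class.
        weights = {}
        for j in neighbours[i]:
            if j in label:
                weights[label[j]] = weights.get(label[j], 0.0) + sim[i][j]
        best = max(weights, key=weights.get) if weights else None
        if best is not None and weights[best] >= min_weight:
            classes[best].add(i)
            label[i] = best
            continue
        # Method 1: form a new class from i and its unassigned mutual
        # nearest neighbours (possibly none, giving a singleton class).
        mutual = [j for j in neighbours[i]
                  if j not in label and i in neighbours[j]]
        classes.append({i, *mutual})
        for member in (i, *mutual):
            label[member] = len(classes) - 1
    return classes
```

When no class exists yet, `weights` is empty and the sketch falls straight through to Method 1, which mirrors the "class set C is empty" condition in the text.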
With the invention, a user who wants to look up information of interest in the session records only needs to enter a keyword. The system automatically finds the session topics that match the keyword and presents the session records associated with those topics to the user, which both spares the user tedious manual searching and improves query efficiency.

Those skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.
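The topic determination and keyword lookup of steps 203 and 205 can be sketched as below. The function names, the data layout (record ids paired with already cleaned word lists) and the use of exact keyword-to-topic-word matching are assumptions for illustration; the text does not specify how a keyword is matched against a session topic.

```python
from collections import Counter


def session_topic(group_words, top_n=3):
    """Return the top_n most frequent words of one classification group.

    group_words: one list of cleaned words per session record in the group.
    """
    freq = Counter(word for words in group_words for word in words)
    return [word for word, _ in freq.most_common(top_n)]


def build_index(groups, top_n=3):
    """Map each topic word to the ids of the session records of its group.

    groups: list of (record_ids, group_words) pairs (hypothetical layout).
    """
    index = {}
    for record_ids, group_words in groups:
        for word in session_topic(group_words, top_n):
            index.setdefault(word, set()).update(record_ids)
    return index


def query(index, keyword):
    """Return the ids of the records whose session topic contains the keyword."""
    return sorted(index.get(keyword, set()))
```

Because the index maps topic words rather than every word of every record, a query for a word that merely appears in a record but is not part of its topic returns nothing, which is the behaviour that distinguishes topic lookup from the plain keyword search criticised in the background section.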

Claims (10)

1. A method for managing instant messaging session records, characterized by comprising the following steps: obtaining a user's session records and classifying them to obtain sample sets; performing correlation analysis on each sample set to generate a corresponding classification group, the classification group containing the feature vectors corresponding to the session records in the sample set; determining the session topic corresponding to each classification group according to the frequency with which words occur in the group, and associating the session topic with the session records corresponding to the group; and finding, according to the keyword entered in a user query, the session topic that matches the keyword, and presenting the session records associated with the found session topic to the user.
2. The method of claim 1, characterized in that, after the session topics are generated, the correlation between session topics is further analysed, and session topics whose correlation exceeds a predetermined threshold are merged into a single session topic, so that the merged session topic is associated with the session records corresponding to all of the session topics that were merged.
3. The method of claim 1 or 2, characterized in that the session records are classified by session user to generate the sample sets.
4. The method of claim 3, characterized in that a sample set is further divided into several different sample sets according to the time interval between the session records in the sample set.
5. The method of claim 1, characterized in that performing correlation analysis on a sample set to generate a classification group comprises the steps of: generating the feature vector corresponding to each session record in the sample set; analysing the correlation between each feature vector and the other feature vectors; and classifying the feature vectors according to the correlation to generate the classification groups.
6. The method of claim 5, characterized in that each session record is word-segmented, words with no practical meaning are deleted from the session record, and synonyms among the remaining words are merged, to generate the feature vector corresponding to the session record.
7. The method of claim 6, characterized in that the correlation between feature vectors is calculated from the weight that each word composing a feature vector has within that feature vector.
8. The method of claim 5, characterized in that the session topic of a classification group is determined from the words whose frequency of occurrence in the classification group exceeds a predetermined threshold.
9. A device for managing instant messaging session records, characterized by comprising: a unit for storing user session records; a unit for classifying the session records to generate sample sets; a unit for performing correlation analysis on the sample sets to generate the corresponding classification groups; a unit for determining the session topic corresponding to each classification group and associating the session topic with the session records corresponding to the group; and a unit for finding, according to the keyword entered in a user query, the session topic that matches the keyword, and presenting the session records associated with the found session topic to the user.
10. The device of claim 9, characterized by further comprising: a unit for analysing the correlation between session topics, merging session topics whose correlation exceeds a predetermined threshold into a single session topic, and associating the merged session topic with the session records corresponding to all of the session topics that were merged.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200610109539 CN101119326B (en) 2006-08-04 2006-08-04 Method and device for managing instant communication conversation record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200610109539 CN101119326B (en) 2006-08-04 2006-08-04 Method and device for managing instant communication conversation record

Publications (2)

Publication Number Publication Date
CN101119326A (en) 2008-02-06
CN101119326B (en) 2010-07-28

Family

ID=39055265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200610109539 CN101119326B (en) 2006-08-04 2006-08-04 Method and device for managing instant communication conversation record

Country Status (1)

Country Link
CN (1) CN101119326B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101997964A (en) * 2009-08-13 2011-03-30 中国电信股份有限公司 Processing method of mobile communication terminal and contact records thereof
CN102246175A (en) * 2008-12-12 2011-11-16 皇家飞利浦电子股份有限公司 An assertion-based record linkage in distributed and autonomous healthcare environments
CN102646134A (en) * 2012-03-29 2012-08-22 百度在线网络技术(北京)有限公司 Method and device for determining message session in message record
CN101483620B (en) 2009-02-17 2012-09-26 腾讯科技(深圳)有限公司 Session reservation method and system in instant communication tool
CN103078781A (en) * 2011-10-25 2013-05-01 国际商业机器公司 Method for instant messaging system and instant messaging system
CN103279465A (en) * 2012-12-18 2013-09-04 北京奇虎科技有限公司 Method and device for controlling communication historical data
CN103279466A (en) * 2012-12-18 2013-09-04 北京奇虎科技有限公司 Method and device for controlling communication historical data
CN103425648A (en) * 2012-05-15 2013-12-04 腾讯科技(深圳)有限公司 Relationship circle processing method and relationship circle processing system
CN104361003A (en) * 2014-10-10 2015-02-18 金硕澳门离岸商业服务有限公司 Method and device for classified displaying of chat records
CN104462518A (en) * 2014-12-22 2015-03-25 百度在线网络技术(北京)有限公司 IM information labeling method and device
CN105024906A (en) * 2014-04-21 2015-11-04 腾讯科技(深圳)有限公司 SNS (social networking services) group message storing, inquiring methods and systems
CN105049336A (en) * 2015-08-12 2015-11-11 深圳前海珩昌科技有限公司 Method and system for processing instant communication messages, server and client
CN105141502A (en) * 2015-08-12 2015-12-09 深圳前海珩昌科技有限公司 Method and device for managing instant communication process
CN105450497A (en) * 2014-07-31 2016-03-30 国际商业机器公司 Method and device for generating clustering model and carrying out clustering based on clustering model
CN105589625A (en) * 2015-12-21 2016-05-18 惠州Tcl移动通信有限公司 Method and device for processing social media message and communication terminal
CN105959205A (en) * 2016-04-29 2016-09-21 杨夫春 Chatting records keeping method
CN106487640A (en) * 2015-08-25 2017-03-08 平安科技(深圳)有限公司 Many communication modules control method and server
CN106599147A (en) * 2016-12-06 2017-04-26 庄爱芹 Method and device for browser browsing history management
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
CN106888236A (en) * 2015-12-15 2017-06-23 腾讯科技(深圳)有限公司 Conversation managing method and session management device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1306440C (en) 2004-05-27 2007-03-21 威盛电子股份有限公司 Related document connecting managing system and method
CN100535895C (en) 2004-08-23 2009-09-02 富士施乐株式会社 Test search apparatus and method
CN1609859A (en) 2004-11-26 2005-04-27 孙斌 Search results clustering method

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102246175A (en) * 2008-12-12 2011-11-16 皇家飞利浦电子股份有限公司 An assertion-based record linkage in distributed and autonomous healthcare environments
CN101483620B (en) 2009-02-17 2012-09-26 腾讯科技(深圳)有限公司 Session reservation method and system in instant communication tool
CN101997964A (en) * 2009-08-13 2011-03-30 中国电信股份有限公司 Processing method of mobile communication terminal and contact records thereof
CN103078781A (en) * 2011-10-25 2013-05-01 国际商业机器公司 Method for instant messaging system and instant messaging system
CN102646134A (en) * 2012-03-29 2012-08-22 百度在线网络技术(北京)有限公司 Method and device for determining message session in message record
CN103425648A (en) * 2012-05-15 2013-12-04 腾讯科技(深圳)有限公司 Relationship circle processing method and relationship circle processing system
CN103425648B (en) * 2012-05-15 2016-04-13 腾讯科技(深圳)有限公司 The disposal route of relation loop and system
CN103279465A (en) * 2012-12-18 2013-09-04 北京奇虎科技有限公司 Method and device for controlling communication historical data
CN103279466A (en) * 2012-12-18 2013-09-04 北京奇虎科技有限公司 Method and device for controlling communication historical data
CN103279466B (en) * 2012-12-18 2018-01-26 北京奇虎科技有限公司 Control the method and device of communication historical data
CN103279465B (en) * 2012-12-18 2018-05-25 北京奇虎科技有限公司 The control method and device of communication historical data
CN105024906B (en) * 2014-04-21 2018-10-02 腾讯科技(深圳)有限公司 The storage of group's message, querying method and system in social networks
CN105024906A (en) * 2014-04-21 2015-11-04 腾讯科技(深圳)有限公司 SNS (social networking services) group message storing, inquiring methods and systems
CN105450497A (en) * 2014-07-31 2016-03-30 国际商业机器公司 Method and device for generating clustering model and carrying out clustering based on clustering model
CN104361003A (en) * 2014-10-10 2015-02-18 金硕澳门离岸商业服务有限公司 Method and device for classified displaying of chat records
CN104462518A (en) * 2014-12-22 2015-03-25 百度在线网络技术(北京)有限公司 IM information labeling method and device
CN104462518B (en) * 2014-12-22 2018-10-19 百度在线网络技术(北京)有限公司 Method and apparatus for being labeled to IM information
CN105049336A (en) * 2015-08-12 2015-11-11 深圳前海珩昌科技有限公司 Method and system for processing instant communication messages, server and client
CN105141502A (en) * 2015-08-12 2015-12-09 深圳前海珩昌科技有限公司 Method and device for managing instant communication process
CN106487640A (en) * 2015-08-25 2017-03-08 平安科技(深圳)有限公司 Many communication modules control method and server
CN106888236A (en) * 2015-12-15 2017-06-23 腾讯科技(深圳)有限公司 Conversation managing method and session management device
CN105589625A (en) * 2015-12-21 2016-05-18 惠州Tcl移动通信有限公司 Method and device for processing social media message and communication terminal
CN105959205A (en) * 2016-04-29 2016-09-21 杨夫春 Chatting records keeping method
CN106599147A (en) * 2016-12-06 2017-04-26 庄爱芹 Method and device for browser browsing history management
CN106777013A (en) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus

Also Published As

Publication number Publication date
CN101119326B (en) 2010-07-28

Similar Documents

Publication Publication Date Title
US7151852B2 (en) Method and system for segmentation, classification, and summarization of video images
EP1062590B1 (en) A scalable system for clustering of large databases
AU2006277608B2 (en) Method and system for extracting web data
US6438579B1 (en) Automated content and collaboration-based system and methods for determining and providing content recommendations
US6507841B2 (en) Methods of and apparatus for refining descriptors
EP0802489B1 (en) Method of image retrieval based on probabilistic function
US7849089B2 (en) Method and system for adapting search results to personal information needs
DE60313283T2 (en) Method for summary of unknown video content
US7421429B2 (en) Generate blog context ranking using track-back weight, context weight and, cumulative comment weight
US7062498B2 (en) Systems, methods, and software for classifying text from judicial opinions and other documents
US6744935B2 (en) Content-based image retrieval apparatus and method via relevance feedback by using fuzzy integral
US9672217B2 (en) System and methods for generation of a concept based database
US5852821A (en) High-speed data base query method and apparatus
US20150026182A1 (en) Systems and methods for generation of searchable structures respective of multimedia data content
US20040107194A1 (en) Information storage and retrieval
CN101295305B (en) The image search device
US7693904B2 (en) Method and system for determining relation between search terms in the internet search system
US7158970B2 (en) Maximizing expected generalization for learning complex query concepts
US8538955B2 (en) Ranking expert responses and finding experts based on rank
US20090265290A1 (en) Optimizing ranking functions using click data
US10152517B2 (en) System and method for identifying similar media objects
US8959037B2 (en) Signature based system and methods for generation of personalized multimedia channels
US20030004966A1 (en) Business method and apparatus for employing induced multimedia classifiers based on unified representation of features reflecting disparate modalities
US7818315B2 (en) Re-ranking search results based on query log
US7127127B2 (en) System and method for adaptive video fast forward using scene generative models

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted