CN101119326B - Method and device for managing instant communication conversation record - Google Patents
- Publication number
- CN101119326B (application CN2006101095396A)
- Authority
- CN
- China
- Prior art keywords
- conversation recording
- session
- conversation
- session theme
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a method for managing instant messaging conversation records, which addresses the problem in the prior art that querying information in conversation records is both cumbersome and inefficient for instant messaging users. The method comprises the following steps: obtaining the user's conversation records and classifying them to obtain sample sets; performing correlation analysis on each sample set to obtain the corresponding classification groups, each containing the feature vectors that correspond to conversation records in the sample set; determining the conversation topic of each classification group according to the frequency with which words occur in the group, and associating the conversation topic with the conversation records corresponding to the group; and looking up the conversation topic that matches a keyword entered by the user and presenting the conversation records associated with that topic to the user. The present invention also discloses a device for managing instant messaging conversation records.
Description
Technical field
The present invention relates to the fields of communications and computer technology, and in particular to a method and device for managing instant messaging conversation records.
Background
With the continuing development and spread of instant messaging (IM) technology, more and more users employ IM software not only to communicate with other users on the network but also as a tool for consulting other users about problems encountered in work or study. The conversation records produced by these exchanges are saved in the IM system, providing data in which users can later look up information of interest.
For example, user A consults user B about a problem and user B returns an answer. When user C later consults user A or user B about the same problem, user A has to review the relevant information in the conversation record with user B, or user B has to review it in the conversation record with user A, and either must search the conversation record manually. When there are many conversation records, or when a long time separates the consultations by user A and user C, the prior-art approach increases the manual search workload and makes searching inefficient.
If user A consults several users about the same problem and then wishes to query information from the conversation records with those users, then with the prior art, when the IM system in use only provides a function for viewing conversation records, user A can only inspect each user's conversation record manually, one by one, to find the information of interest. Even if user A uses an IM system that provides import and export of conversation record data, user A must first export the conversation record data of the several users and then query the exported data. User A can search the exported data by a keyword related to the information of interest, but a keyword search only locates the passages containing that keyword, and those passages are not necessarily relevant to the information the user cares about, so the user still cannot search the conversation records effectively.
Summary of the invention
The invention provides a method and device for managing instant messaging conversation records, in order to solve the problem in the prior art that querying information in conversation records is both cumbersome to operate and inefficient for instant messaging users.
The invention provides the following technical solution:
A method for managing instant messaging conversation records comprises the following steps:
Obtaining the user's conversation records and classifying them to obtain sample sets;
Generating a feature vector for each conversation record in a sample set, analyzing the correlation between each feature vector and the other feature vectors, and classifying the feature vectors according to that correlation to generate classification groups;
Determining the conversation topic of each classification group according to the frequency with which words occur in the group, and associating the conversation topic with the conversation records corresponding to the group; and
Looking up the conversation topic that matches the keyword entered in a user query, and presenting the conversation records associated with the topic found to the user.
After the conversation topics are generated, the correlation between conversation topics is further analyzed, conversation topics whose correlation exceeds a predetermined threshold are merged into a single topic, and the merged topic is associated with the conversation records of all the topics that were merged.
The conversation records are classified by conversation partner to generate the sample sets.
Preferably, a sample set is further divided into several sample sets according to the time interval between the conversation records in it.
Generating a feature vector for each conversation record in a sample set and analyzing the correlation between each feature vector and the other feature vectors specifically comprises:
Performing word segmentation on each conversation record, deleting the words that carry no practical meaning to obtain a set S, merging the synonyms in S, and vectorizing the result to generate the feature vector $\vec{d} = (W_1, W_2, W_3, \ldots, W_n)$ corresponding to the conversation record, where $W_i$ is the weight of the i-th element and each element is a word in S;
Calculating the weight of each word in the feature vector $\vec{d}$ corresponding to each conversation record, and calculating the correlation between the feature vectors from the weights of the words that make up each feature vector.
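The preprocessing and weighting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: whitespace splitting stands in for a real word segmenter, `STOP_WORDS` and `SYNONYMS` are hypothetical sample data, and the weight is assumed to be a length-normalized TF-IDF value.

```python
import math
from collections import Counter

# Hypothetical stop-word list and synonym map, for illustration only.
STOP_WORDS = {"the", "a", "is", "to", "of", "and"}
SYNONYMS = {"pc": "computer", "laptop": "computer"}


def tokenize(record: str) -> list[str]:
    """Split a record into words, drop stop words, and merge synonyms."""
    words = record.lower().split()  # stand-in for a real word segmenter
    return [SYNONYMS.get(w, w) for w in words if w not in STOP_WORDS]


def tfidf_vectors(records: list[str]) -> list[dict[str, float]]:
    """Return one normalized weight vector (word -> weight) per record."""
    token_lists = [tokenize(r) for r in records]
    n_records = len(token_lists)
    # n_t: number of records in which each word appears at least once
    doc_freq = Counter(w for tokens in token_lists for w in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        raw = {w: tf[w] * math.log(n_records / doc_freq[w]) for w in tf}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        # A zero norm means every word occurs in every record.
        vectors.append({w: (v / norm if norm else 0.0) for w, v in raw.items()})
    return vectors
```

Storing each vector as a sparse word-to-weight mapping avoids building a fixed vocabulary index, which keeps the sketch short; a dense array would work equally well.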
The conversation topic of a classification group is determined from the words whose frequency of occurrence in the group exceeds a predetermined threshold.
A device for managing instant messaging conversation records comprises:
A unit for storing the user's conversation records;
A unit for classifying the conversation records to generate sample sets;
A unit for generating a feature vector for each conversation record in a sample set, analyzing the correlation between each feature vector and the other feature vectors, and classifying the feature vectors according to that correlation to generate classification groups;
A unit for determining the conversation topic of each classification group and associating the topic with the conversation records corresponding to the group; and
A unit for looking up the conversation topic that matches the keyword entered in a user query and presenting the conversation records associated with the topic found to the user.
Preferably, the device further comprises:
A unit for analyzing the correlation between conversation topics, merging topics whose correlation exceeds a predetermined threshold into a single topic, and associating the merged topic with the conversation records of all the topics that were merged.
The beneficial effects of the invention are as follows:
The invention classifies the user's conversation records to generate sample sets, performs correlation analysis on each sample set to generate the corresponding classification groups and determine the conversation topic of each group, and associates each conversation topic with the conversation records of its group. With the invention, a user who needs to query information in conversation records only has to enter a keyword; the system automatically looks up the conversation topic matching the keyword and presents the conversation records associated with that topic, which spares the user tedious manual searching and improves search efficiency.
Description of drawings
Fig. 1 is a schematic structural diagram of the device for managing user conversation records in an embodiment of the invention;
Fig. 2 is a schematic diagram of the method for managing user conversation records in an embodiment of the invention;
Fig. 3 is a flow chart of classifying user conversation records in an embodiment of the invention;
Fig. 4 is a flow chart of performing correlation analysis on a sample set in an embodiment of the invention.
Embodiment
To solve the problem in the prior art that querying information in conversation records is both cumbersome and inefficient for instant messaging users, this embodiment classifies the user's conversation records to generate sample sets, performs correlation analysis on each sample set to generate the corresponding classification groups and determine the conversation topic of each group, associates each conversation topic with the conversation records of its classification group, looks up the conversation topic that matches a keyword entered by the user, and presents the conversation records associated with the topic found to the user.
Fig. 1 is a schematic structural diagram of the device for managing user conversation records in this embodiment, which comprises: a storage unit 101, a classification unit 102, an analysis unit 103, a conversation topic unit 104, a merging unit 105 and a query unit 106.
Fig. 2 is a schematic diagram of the method for managing user conversation records in this embodiment, which comprises:
Step 204: analyze the correlation between conversation topics, merge topics whose correlation exceeds a predetermined threshold into a single topic, and associate the merged topic with the conversation records of all the topics that were merged.
In step 201, the process of classifying the conversation records is shown in Fig. 3 and proceeds as follows:
The sample set TS generated by step 303 is the sample set on which correlation analysis is performed.
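The individual classification steps of Fig. 3 are not reproduced in this text. A minimal sketch of the classification described in the summary, grouping records by conversation partner and then splitting each group on long time gaps, might look as follows. The record layout `(peer, timestamp, text)`, the function name and the one-hour default gap are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_sample_sets(records, max_gap=timedelta(hours=1)):
    """Group (peer, timestamp, text) records by peer, then split on long gaps."""
    by_peer = defaultdict(list)
    for peer, ts, text in records:
        by_peer[peer].append((ts, text))

    sample_sets = []
    for peer, items in by_peer.items():
        items.sort(key=lambda item: item[0])  # chronological order
        current = [items[0]]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[0] > max_gap:
                # The pause is too long: close this sample set, start a new one.
                sample_sets.append((peer, current))
                current = []
            current.append(cur)
        sample_sets.append((peer, current))
    return sample_sets
```

Each returned pair holds a peer and one run of records with no gap longer than `max_gap`, so a single peer can contribute several sample sets.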
As shown in Fig. 4, the process of performing correlation analysis on a sample set with the KNN (K Nearest Neighbor) algorithm is as follows:
Here, $W(t, \vec{d})$ is the weight of word t in feature vector $\vec{d}$, $tf(t, \vec{d})$ is the term frequency of word t in feature vector $\vec{d}$, N is the total number of conversation records TR in each sample set TS, $n_t$ is the number of conversation records TR in each sample set TS in which word t occurs, and the denominator is a normalization factor.
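The weighting formula that these definitions refer to does not survive in this text. A form consistent with the symbols defined here, assuming the standard length-normalized TF-IDF weighting, would be the following; the original may differ in detail, for example by adding a small constant inside the logarithm.

```latex
W(t,\vec{d}) =
  \frac{tf(t,\vec{d}) \cdot \log\left(N / n_t\right)}
       {\sqrt{\sum_{t' \in \vec{d}} \left[ tf(t',\vec{d}) \cdot \log\left(N / n_{t'}\right) \right]^{2}}}
```

The numerator rewards words that are frequent in one record but rare across the sample set, and the denominator scales every feature vector to unit length.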
Step 403: calculate the correlation coefficients between the feature vectors corresponding to the conversation records, and from the calculated coefficients determine the K feature vectors most similar to each feature vector.
In a specific implementation, a formula is used to calculate the correlation coefficient between the feature vectors corresponding to the conversation records, where $Sim(d_i, d_j)$ is the correlation coefficient of feature vectors $d_i$ and $d_j$, and $W_{ik}$ and $W_{jk}$ are the weights of the k-th element of $d_i$ and $d_j$ respectively.
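The formula itself is missing from this text. Assuming the usual cosine similarity of the vector space model, which matches the symbols defined here, it would read as follows, with n taken to be the number of elements in each feature vector.

```latex
Sim(d_i, d_j) =
  \frac{\sum_{k=1}^{n} W_{ik} \, W_{jk}}
       {\sqrt{\sum_{k=1}^{n} W_{ik}^{2}} \; \sqrt{\sum_{k=1}^{n} W_{jk}^{2}}}
```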
This calculation yields the correlation coefficient between every pair of feature vectors. Based on these coefficients, the K feature vectors most correlated with each feature vector are combined into a set; the value of K can be chosen according to the application.
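Step 403 can be sketched as below, assuming cosine similarity as the correlation coefficient and sparse word-to-weight dictionaries as feature vectors. The function names are illustrative and do not come from the patent.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def k_nearest(vectors: list[dict[str, float]], k: int) -> list[list[int]]:
    """For each vector, return the indices of its k most similar vectors."""
    neighbours = []
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        # Highest similarity first; ties are broken by the lower index.
        scored.sort(key=lambda s: (-s[0], s[1]))
        neighbours.append([j for _, j in scored[:k]])
    return neighbours
```

Comparing every pair costs quadratic time in the number of records, which is acceptable for one sample set at a time but would call for an index on larger collections.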
Classification C is the collection formed from the feature vectors corresponding to the conversation records in sample set TS.
Method one: when classification C is empty, a vector set c is generated as follows and then added to classification C:
For the feature vectors $\vec{d_i}$ and $\vec{d_j}$ corresponding to two conversation records, if each belongs to the set formed by the other's K most similar neighbours, then $\vec{d_i}$ and $\vec{d_j}$ belong to the same class c. Class c is generated and associated with the conversation records corresponding to $\vec{d_i}$ and $\vec{d_j}$, and class c is then added to classification C. The feature vectors in each class c form a classification group.
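Method one amounts to grouping vectors that are mutual nearest neighbours. A minimal sketch follows, assuming the neighbour sets are given as lists of indices; the greedy rule that lets an existing class absorb a further mutual neighbour is a simplification introduced here, not something the text specifies.

```python
def mutual_knn_classes(neighbours: list[list[int]]) -> list[list[int]]:
    """Group indices whose K-nearest-neighbour sets contain each other."""
    class_of = {}  # vector index -> class id
    classes = []   # class id -> list of vector indices
    for i, near_i in enumerate(neighbours):
        for j in near_i:
            if i not in neighbours[j]:
                continue  # not mutual neighbours
            if i in class_of and j in class_of:
                continue  # both already placed
            if i in class_of:
                cid = class_of[i]
            elif j in class_of:
                cid = class_of[j]
            else:
                cid = len(classes)  # start a new class c
                classes.append([])
            for idx in (i, j):
                if idx not in class_of:
                    class_of[idx] = cid
                    classes[cid].append(idx)
    return classes
```

Vectors with no mutual neighbour stay unassigned after this pass and are left for the weight-based assignment of method two.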
Method two: when classification C is not empty, the weight with which the feature vector $\vec{d}$ corresponding to each conversation record belongs to a class c (c ∈ C) is calculated using a formula in which $\vec{d}$ is the feature vector corresponding to a conversation record, $\vec{d_i}$ is a feature vector in the set formed by the K neighbours most similar to $\vec{d}$, $Sim(\vec{d}, \vec{d_i})$ is the correlation coefficient between $\vec{d}$ and its most correlated feature vector $\vec{d_i}$, which can be obtained from the result of step 403, and $y(\vec{d_i}, C_j)$ is the class attribute function, whose value is 1 if feature vector $\vec{d_i}$ belongs to class $C_j$ and 0 otherwise. Using the calculated $W(\vec{d}, C_j)$, the weights of feature vector $\vec{d}$ in the classes $C_j$ are compared, $\vec{d}$ is assigned to the class $C_j$ with the larger weight, and that class is associated with the conversation record corresponding to $\vec{d}$.
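The weight formula is missing from this text. Assuming the standard KNN class-weight rule, which fits the symbols defined for method two, it would be the following, where $KNN(\vec{d})$ is a notation introduced here for the set of the K neighbours most similar to $\vec{d}$.

```latex
W(\vec{d}, C_j) = \sum_{\vec{d_i} \in KNN(\vec{d})} Sim(\vec{d}, \vec{d_i}) \; y(\vec{d_i}, C_j)
```

Each neighbour votes for its own class with a strength equal to its similarity to $\vec{d}$, so close neighbours count for more than distant ones.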
When method two is used, if the correlation between feature vector $\vec{d}$ and every existing class c is very small, a new class c' can be generated in the manner of method one, added to classification C, and associated with the conversation record corresponding to $\vec{d}$.
After every feature vector has been processed, all feature vectors have been assigned to classes, and each class forms a classification group.
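The assignment rule of method two, including the fallback to a new class, can be sketched as follows. The sketch assumes the class weight is the sum of similarities to the neighbours already in that class; the function name and the `min_weight` cut-off of 0.2 are illustrative choices, since the text only says the correlation is "very small".

```python
def assign_to_class(similarities, class_of, min_weight=0.2):
    """Pick a class for one feature vector from its K nearest neighbours.

    similarities: {neighbour index: similarity to the vector being placed}
    class_of:     {vector index: class id} for vectors already classified
    Returns the best class id, or None when a new class should be created.
    """
    weights = {}
    for idx, sim in similarities.items():
        cid = class_of.get(idx)
        if cid is not None:
            # Each classified neighbour votes for its class with weight sim.
            weights[cid] = weights.get(cid, 0.0) + sim
    if not weights:
        return None  # no neighbour has a class yet
    best = max(weights, key=weights.get)
    return best if weights[best] >= min_weight else None
```

A return value of `None` signals the caller to open a new class for the vector, which corresponds to generating class c' in the text.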
The N words that occur most frequently in each generated classification group, or the words whose frequency exceeds a value a, are taken as the conversation topic of that group; N and a are chosen according to the application.
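Both selection rules can be sketched with a word counter. The function and parameter names are assumptions; `top_n` plays the role of N and `min_count` the role of a.

```python
from collections import Counter


def topic_words(group_tokens, top_n=None, min_count=None):
    """Pick topic words for one group of records.

    group_tokens: list of token lists, one per conversation record.
    With min_count set, return the words occurring more than min_count times;
    otherwise return the top_n most frequent words.
    """
    counts = Counter(w for tokens in group_tokens for w in tokens)
    if min_count is not None:
        return sorted(w for w, c in counts.items() if c > min_count)
    return [w for w, _ in counts.most_common(top_n)]
```

The threshold rule adapts to the size of the group, whereas the top-N rule always yields a topic of fixed length.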
After each sample set TS has been processed as above, the classification groups and their conversation topics have been generated. To perform correlation analysis on the generated conversation topics, the topics are used as the sample set for the KNN algorithm: the weight of each word within each conversation topic in the set is calculated, the correlation coefficients between topics are calculated from these weights using the formula in step 403, and topics whose correlation coefficient exceeds a set threshold are merged.
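Topic merging can be sketched as follows, assuming each topic is a pair of a word-weight vector and a set of record identifiers, and assuming cosine similarity as the correlation coefficient. The greedy single pass and the 0.5 threshold are simplifications chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def merge_topics(topics, threshold=0.5):
    """Merge topics whose similarity exceeds threshold.

    topics: list of (word-weight dict, set of record ids).
    A merged topic keeps the record ids of every topic folded into it.
    """
    merged = []
    for vec, record_ids in topics:
        for target_vec, target_ids in merged:
            if cosine(vec, target_vec) > threshold:
                for word, w in vec.items():
                    target_vec[word] = target_vec.get(word, 0.0) + w
                target_ids.update(record_ids)
                break
        else:
            # No sufficiently similar topic exists yet: keep this one as new.
            merged.append((dict(vec), set(record_ids)))
    return merged
```

Because the merged topic carries the union of record identifiers, a later keyword query that matches it reaches the records of every original topic.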
When conversation records are presented to the user, they can be arranged by conversation partner, or ordered by the weight of each conversation record within the conversation topic.
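The query step and the ordering of results can be sketched as below. The index layout, the substring match and the combined ordering (by peer, then by descending weight) are assumptions made for illustration; the text presents the two orderings as alternatives.

```python
def find_records(keyword, topic_index):
    """Return the records of every topic that matches keyword.

    topic_index: list of (topic words, [(peer, weight, text), ...]).
    """
    keyword = keyword.lower()
    hits = []
    for words, records in topic_index:
        if any(keyword in w.lower() for w in words):
            hits.extend(records)
    # Order by peer, then by descending weight within each peer.
    return sorted(hits, key=lambda r: (r[0], -r[1]))
```

For example, a query for `"driver"` against a topic whose words are `["printer", "driver"]` returns all records of that topic, including those that never mention the keyword themselves.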
The embodiment above uses the KNN algorithm for correlation analysis of the sample sets, but the invention is not limited to KNN. Correlation analysis of conversation records can also use other vector-space-based training algorithms and classification techniques, such as support vector machines, neural networks and Bayesian algorithms. With a Bayesian algorithm, for example, the probability that each word in a conversation record's feature vector appears in a given conversation is calculated, the probability that the feature vector belongs to each conversation is then calculated by Bayes' formula, and the feature vector is added to the conversation with the highest probability.
With the invention, a user who wants to find information of interest in conversation records only has to enter a keyword; the system automatically queries the conversation topic matching the keyword and presents the conversation records associated with that topic, which spares the user tedious manual searching and improves search efficiency.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.
Claims (8)
1. A method for managing instant messaging conversation records, characterized in that it comprises the following steps:
Obtaining the user's conversation records and classifying them to obtain sample sets;
Generating a feature vector for each conversation record in a sample set, analyzing the correlation between each feature vector and the other feature vectors, and classifying the feature vectors according to that correlation to generate classification groups;
Determining the conversation topic of each classification group according to the frequency with which words occur in the group, and associating the conversation topic with the conversation records corresponding to the group; and
Looking up the conversation topic that matches the keyword entered in a user query, and presenting the conversation records associated with the topic found to the user.
2. The method of claim 1, characterized in that, after the conversation topics are generated, the correlation between conversation topics is further analyzed, conversation topics whose correlation exceeds a predetermined threshold are merged into a single topic, and the merged topic is associated with the conversation records of all the topics that were merged.
3. The method of claim 1 or 2, characterized in that the conversation records are classified by conversation partner to generate the sample sets.
4. The method of claim 3, characterized in that a sample set is further divided into several sample sets according to the time interval between the conversation records in it.
5. The method of claim 1, characterized in that generating a feature vector for each conversation record in the sample set and analyzing the correlation between each feature vector and the other feature vectors specifically comprises:
Performing word segmentation on each conversation record, deleting the words that carry no practical meaning to obtain a set S, merging the synonyms in S, and vectorizing the result to generate the feature vector $\vec{d} = (W_1, W_2, W_3, \ldots, W_n)$ corresponding to the conversation record, where $W_i$ is the weight of the i-th element and each element is a word in S;
6. The method of claim 1, characterized in that the conversation topic of a classification group is determined from the words whose frequency of occurrence in the group exceeds a predetermined threshold.
7. A device for managing instant messaging conversation records, characterized in that it comprises:
A unit for storing the user's conversation records;
A unit for classifying the conversation records to generate sample sets;
A unit for generating a feature vector for each conversation record in a sample set, analyzing the correlation between each feature vector and the other feature vectors, and classifying the feature vectors according to that correlation to generate classification groups;
A unit for determining the conversation topic of each classification group and associating the topic with the conversation records corresponding to the group; and
A unit for looking up the conversation topic that matches the keyword entered in a user query and presenting the conversation records associated with the topic found to the user.
8. The device of claim 7, characterized in that it further comprises:
A unit for analyzing the correlation between conversation topics, merging topics whose correlation exceeds a predetermined threshold into a single topic, and associating the merged topic with the conversation records of all the topics that were merged.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2006101095396A CN101119326B (en) | 2006-08-04 | 2006-08-04 | Method and device for managing instant communication conversation record |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2006101095396A CN101119326B (en) | 2006-08-04 | 2006-08-04 | Method and device for managing instant communication conversation record |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101119326A CN101119326A (en) | 2008-02-06 |
CN101119326B | 2010-07-28 |
Family ID: 39055265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2006101095396A Active CN101119326B (en) | 2006-08-04 | 2006-08-04 | Method and device for managing instant communication conversation record |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101119326B (en) |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102246175A (en) * | 2008-12-12 | 2011-11-16 | 皇家飞利浦电子股份有限公司 | An assertion-based record linkage in distributed and autonomous healthcare environments |
CN101483620B (en) * | 2009-02-17 | 2012-09-26 | 腾讯科技(深圳)有限公司 | Session reservation method and system in instant communication tool |
CN101997964A (en) * | 2009-08-13 | 2011-03-30 | 中国电信股份有限公司 | Processing method of mobile communication terminal and contact records thereof |
CN103078781A (en) * | 2011-10-25 | 2013-05-01 | 国际商业机器公司 | Method for instant messaging system and instant messaging system |
CN102646134A (en) * | 2012-03-29 | 2012-08-22 | 百度在线网络技术(北京)有限公司 | Method and device for determining message session in message record |
CN103425648B (en) * | 2012-05-15 | 2016-04-13 | 腾讯科技(深圳)有限公司 | The disposal route of relation loop and system |
CN103279465B (en) * | 2012-12-18 | 2018-05-25 | 北京奇虎科技有限公司 | The control method and device of communication historical data |
CN103279466B (en) * | 2012-12-18 | 2018-01-26 | 北京奇虎科技有限公司 | Control the method and device of communication historical data |
CN105024906B (en) * | 2014-04-21 | 2018-10-02 | 腾讯科技(深圳)有限公司 | The storage of group's message, querying method and system in social networks |
CN105450497A (en) * | 2014-07-31 | 2016-03-30 | 国际商业机器公司 | Method and device for generating clustering model and carrying out clustering based on clustering model |
CN104361003A (en) * | 2014-10-10 | 2015-02-18 | 金硕澳门离岸商业服务有限公司 | Method and device for classified displaying of chat records |
CN104462518B (en) * | 2014-12-22 | 2018-10-19 | 百度在线网络技术(北京)有限公司 | Method and apparatus for being labeled to IM information |
CN105141502A (en) * | 2015-08-12 | 2015-12-09 | 深圳前海珩昌科技有限公司 | Method and device for managing instant communication process |
CN105049336A (en) * | 2015-08-12 | 2015-11-11 | 深圳前海珩昌科技有限公司 | Method and system for processing instant communication messages, server and client |
CN106487640A (en) * | 2015-08-25 | 2017-03-08 | 平安科技(深圳)有限公司 | Many communication modules control method and server |
CN106888236B (en) * | 2015-12-15 | 2021-08-31 | 腾讯科技(深圳)有限公司 | Session management method and session management device |
CN105589625B (en) * | 2015-12-21 | 2020-06-02 | 惠州Tcl移动通信有限公司 | Processing method and device of social media message and communication terminal |
CN105959205A (en) * | 2016-04-29 | 2016-09-21 | 杨夫春 | Chatting records keeping method |
CN106599147A (en) * | 2016-12-06 | 2017-04-26 | 庄爱芹 | Method and device for browser browsing history management |
CN106777013B (en) * | 2016-12-07 | 2020-09-11 | 科大讯飞股份有限公司 | Conversation management method and device |
CN108737240A (en) * | 2017-04-18 | 2018-11-02 | 阿里巴巴集团控股有限公司 | The method that the method, apparatus and group that chat group automatically creates create |
CN111357245B (en) * | 2017-11-15 | 2022-08-09 | 华为技术有限公司 | Information searching method, terminal, network equipment and system |
CN111698143B (en) * | 2019-03-14 | 2022-12-16 | 阿里巴巴集团控股有限公司 | Information processing method, information display method and device |
CN110138645B (en) * | 2019-03-29 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Session message display method, device, equipment and storage medium |
CN110781930A (en) * | 2019-10-14 | 2020-02-11 | 西安交通大学 | User portrait grouping and behavior analysis method and system based on log data of network security equipment |
CN112769673A (en) * | 2019-11-05 | 2021-05-07 | 钉钉控股(开曼)有限公司 | Communication record generation, recommendation and display method and device |
CN111327518B (en) * | 2020-01-21 | 2022-10-11 | 上海掌门科技有限公司 | Method and equipment for splicing messages |
CN111708866B (en) * | 2020-08-24 | 2020-12-11 | 北京世纪好未来教育科技有限公司 | Session segmentation method and device, electronic equipment and storage medium |
CN111798870A (en) * | 2020-09-08 | 2020-10-20 | 共道网络科技有限公司 | Session link determining method, device and equipment and storage medium |
CN113113017B (en) * | 2021-04-08 | 2024-04-09 | 百度在线网络技术(北京)有限公司 | Audio processing method and device |
CN113595886A (en) * | 2021-07-29 | 2021-11-02 | 北京达佳互联信息技术有限公司 | Instant messaging message processing method and device, electronic equipment and storage medium |
CN114691830B (en) * | 2022-03-31 | 2022-12-20 | 江苏冬云云计算股份有限公司 | Network security analysis method and system based on big data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1584883A (en) * | 2004-05-27 | 2005-02-23 | 威盛电子股份有限公司 | Related document connecting managing system, method and recording media |
CN1609859A (en) * | 2004-11-26 | 2005-04-27 | 孙斌 | Search result clustering method |
CN1741012A (en) * | 2004-08-23 | 2006-03-01 | 富士施乐株式会社 | Test search apparatus and method |
- 2006-08-04: CN2006101095396A, patent CN101119326B, status Active
Non-Patent Citations (2)
Title |
---|
JP Laid-Open 2005-173847A 2005.06.30 |
Same as above. |
Also Published As
Publication number | Publication date |
---|---|
CN101119326A (en) | 2008-02-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101119326B (en) | Method and device for managing instant communication conversation record | |
CN103678670B (en) | Micro-blog hot word and hot topic mining system and method | |
CN105808590B (en) | Search engine implementation method, searching method and device | |
JP5092165B2 (en) | Data construction method and system | |
US8543380B2 (en) | Determining a document specificity | |
Weng et al. | Using text classification and multiple concepts to answer e-mails | |
CN104573130B (en) | The entity resolution method and device calculated based on colony | |
CN109101479A (en) | A kind of clustering method and device for Chinese sentence | |
CN107729336A (en) | Data processing method, equipment and system | |
CN108549647B (en) | Method for realizing active prediction of emergency in mobile customer service field without marking corpus based on SinglePass algorithm | |
CN104731954A (en) | Music recommendation method and system based on group perspective | |
CN101621391A (en) | Method and system for classifying short texts based on probability topic | |
US20110191335A1 (en) | Method and system for conducting legal research using clustering analytics | |
CN116455861B (en) | Big data-based computer network security monitoring system and method | |
CN112257419A (en) | Intelligent retrieval method and device for calculating patent document similarity based on word frequency and semantics, electronic equipment and storage medium thereof | |
EP2045732A2 (en) | Determining the depths of words and documents | |
CN105787662A (en) | Mobile application software performance prediction method based on attributes | |
CN112149422A (en) | Enterprise news dynamic monitoring method based on natural language | |
CN105159898A (en) | Searching method and searching device | |
JP2005092442A (en) | Multi-dimensional space model expressing device and method | |
CN105512270B (en) | Method and device for determining related objects | |
Goldberg et al. | CASTLE: crowd-assisted system for text labeling and extraction | |
CN113761104A (en) | Method and device for detecting entity relationship in knowledge graph and electronic equipment | |
CN103793448B (en) | Article information providing method and system | |
Siegen | Virtual Citation Proximity (VCP): Calculating Co-Citation-Proximity-Based Document Relatedness for Uncited Documents with Machine Learning (preprint) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |