CN100410943C - Extraction for instant message subject content - Google Patents


Info

Publication number
CN100410943C
Authority
CN
China
Prior art keywords
key word
subject content
instant message
word
extracting method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2005101344549A
Other languages
Chinese (zh)
Other versions
CN1983252A (en
Inventor
李建成
梁柱
王麒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CNB2005101344549A
Publication of CN1983252A
Application granted
Publication of CN100410943C
Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

A method for extracting the subject content of an instant message includes extracting keywords from the instant message according to preset extraction conditions, and then selecting the subject content from the extracted keywords according to a policy.

Description

A method for extracting the subject content of instant messages
Technical field
The present invention relates to the fields of computer and communication technology, and in particular to a method for extracting the subject content of instant messages.
Background art
With the development of the Internet and the spread of networks, instant messaging (IM, Instant Messenger) software has gradually become an indispensable means of online communication in people's daily work, study, and life. With IM software, people can communicate in real time by text, voice, and video. However, when a user chats by text through IM software, current IM software neither intelligently extracts the subject content of the current instant messages nor applies the extracted subject information, and so has yet to give users a better and more convenient experience.
In the text chat of current IM software, there is still no application that intelligently extracts the subject content of the chat. If a user wants to search for information on a topic of interest that comes up during a chat, the user must manually select and copy the topic from the chat, start a search engine (such as Google or Baidu) themselves, and search for the chat subject content. If the user wants to share a search result with the chat partner, the information must again be copied manually. This approach is very inconvenient for the user.
Furthermore, existing text-based techniques for intelligently extracting subject content perform poorly in terms of real-time performance and relevance. Real-time performance refers to whether the captured subject content is the subject of the current chat; relevance refers to the accuracy of the captured subject content.
Summary of the invention
The invention provides a method that solves the problem that the prior art cannot intelligently extract subject content in instant messaging.
The invention further solves the problem that, in the prior art, subject content in instant messaging can only be extracted and applied manually.
The invention provides the following technical solution:
A method for extracting subject content in instant messaging comprises the following steps:
A. Take the word with the highest frequency of occurrence in the instant message as the keyword, and extract this keyword from the instant message;
B. Calculate the degree of correlation of each extracted keyword, compare this degree of correlation with a threshold, and, when it is greater than the threshold, take the corresponding keyword as the subject content.
Wherein:
Taking the word with the highest frequency of occurrence in the instant message as the keyword comprises: matching all the text contained in the instant message against a theme dictionary, and taking the most frequent word among the matching results as the keyword.
The instant message is an instant message present in the sliding window.
The sliding window has two parameters, window size and sliding speed, and captures instant messages in real time according to these two parameters.
The theme dictionary is extensible: one or more dictionaries can be freely added or deleted, and one or more words can be added to or deleted from a dictionary.
Alternatively, taking the word with the highest frequency of occurrence in the instant message as the keyword comprises: performing word-frequency statistics on all the text of the instant message, and taking the most frequent word as the keyword.
The keyword is stored in a first-in, first-out keyword queue of preset length.
The degree of correlation is the ratio of the number of times the keyword occurs in the keyword queue to the length of the keyword queue.
After the subject content is determined, a search engine is started, a search is performed according to the subject content, and the search results are displayed.
Displaying the search results means displaying some or all of the search results in the instant message windows of both parties, of multiple parties, or of any one party.
The beneficial effects of the present invention are as follows:
When a user communicates through IM software, the subject content of the current instant messages can be intelligently extracted in time and then applied, giving IM users a better and more convenient experience.
Description of the drawings
Fig. 1 is a schematic flowchart of the present invention.
Detailed description
The present invention intelligently extracts subject content from the real-time instant messages in IM, and thereby obtains the subject content of the user's current chat. To capture the subject content, the invention uses a sliding window technique, a keyword queue, and a degree-of-subject-correlation parameter, which together give the extraction both real-time performance and relevance.
The invention designs a sliding window for capturing keywords. Keywords serve as the basis for finally obtaining the subject content, and statistics and analysis are performed only on the instant messages currently within the sliding window. The sliding window has a certain size and slides at a certain speed; each time it slides, the instant messages within it are counted and analyzed once. The smallest unit of statistical analysis is one instant message.
The sliding window has two parameters: window size (WindowSize) and sliding speed (SlideVelocity), both measured in number of instant messages. For example, WindowSize=2 and SlideVelocity=2 means that the sliding window holds two instant messages for processing and that, when two new instant messages appear, the window slides down by two messages and the two instant messages in the window are counted and analyzed. In general, the sliding speed equals the window size: if the sliding speed is less than the window size, some messages in the window are counted repeatedly; conversely, if the sliding speed is greater than the window size, some instant messages are skipped and never counted.
IM users can define these two variables according to their own needs; their initial values are WindowSize=2 and SlideVelocity=2.
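The interaction between the two parameters can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sliding_windows` and the list-based buffer are hypothetical, and the patent does not prescribe any particular implementation.

```python
from typing import Iterable, Iterator, List


def sliding_windows(messages: Iterable[str],
                    window_size: int = 2,
                    slide_velocity: int = 2) -> Iterator[List[str]]:
    """Yield the window contents each time the window slides.

    window_size and slide_velocity mirror WindowSize and SlideVelocity
    and are both counted in instant messages (defaults of 2, as in the text).
    """
    if window_size < 1 or slide_velocity < 1:
        raise ValueError("window_size and slide_velocity must be positive")
    buffer: List[str] = []
    new_count = 0  # messages received since the last slide
    for message in messages:
        buffer.append(message)
        new_count += 1
        if new_count == slide_velocity:
            # The window always covers the most recent window_size messages.
            window = buffer[-window_size:]
            yield window
            buffer = list(window)  # older messages are no longer needed
            new_count = 0


# Equal parameters: every message is analyzed exactly once.
print(list(sliding_windows(["m1", "m2", "m3", "m4"])))
# Velocity < size: "m2" is analyzed twice.
print(list(sliding_windows(["m1", "m2", "m3"], 2, 1)))
# Velocity > size: "m1" is never analyzed.
print(list(sliding_windows(["m1", "m2", "m3"], 2, 3)))
```

The three calls illustrate why the text recommends setting the sliding speed equal to the window size.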
Each time the sliding window slides, a keyword is extracted from the instant messages newly captured in the window.
Based on the habits and common expressions of people's text chats, a set of theme dictionaries is provided, covering common themes such as a weather dictionary, a scenic-spot dictionary, a sports dictionary, and a military dictionary. Each theme dictionary collects the common words of its category. For example, the weather dictionary contains: weather, temperature, cold, sweltering heat, and so on; the scenic-spot dictionary contains: Jiuzhaigou, Zhangjiajie, Gulangyu, Mount Huang, the West Lake, and so on. The theme dictionaries are extensible: one or more dictionaries can be freely added or deleted, and one or more words can easily be added to or deleted from one or more theme dictionaries.
With the theme dictionaries in place, keywords can be counted and extracted. When the sliding window slides according to its preset parameters, all the text of the instant messages in the window is matched against each theme dictionary. During matching, each matched word and its frequency of occurrence are recorded, and the keyword for this window is obtained from the matching results. If there are multiple matching results, the word with the highest frequency is selected as the final keyword.
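A minimal Python sketch of this dictionary matching is shown below. The dictionary contents are taken from the examples in the text; the names `THEME_DICTIONARIES` and `match_keyword`, the use of substring counting, and the alphabetical tie-break are assumptions made for the illustration, since the patent does not specify how matching or ties are handled.

```python
from typing import Dict, List, Optional, Set

# Hypothetical dictionaries, using example words from the description.
THEME_DICTIONARIES: Dict[str, Set[str]] = {
    "weather": {"weather", "temperature", "cold"},
    "scenic spots": {"Jiuzhaigou", "Zhangjiajie", "West Lake"},
}


def match_keyword(window_messages: List[str],
                  dictionaries: Dict[str, Set[str]]) -> Optional[str]:
    """Return the most frequent dictionary word in the window, or None."""
    text = " ".join(window_messages)
    # Merge all dictionaries so a word listed twice is only counted once.
    all_words: Set[str] = set().union(*dictionaries.values())
    counts = {word: text.count(word) for word in all_words if word in text}
    if not counts:
        return None  # nothing matched; the caller may fall back to word statistics
    # sorted() makes ties deterministic (alphabetical); this is an assumption.
    return max(sorted(counts), key=lambda word: counts[word])


window = ["Shall we go to West Lake?", "West Lake is cold in winter"]
print(match_keyword(window, THEME_DICTIONARIES))
```

Because the dictionaries are plain sets in a mapping, adding or deleting a dictionary or a word is an ordinary dictionary or set update, which matches the extensibility described above.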
However, this matching may fail to find any keyword.
Therefore, if dictionary matching finds no keyword, a word-frequency count is performed. That is, when the instant messages in the sliding window match nothing in the theme dictionaries, the user's chat is outside the scope of the theme dictionaries. In that case, word statistics are run on all the text of the instant messages in the window, and the word with the highest frequency of occurrence is recorded and used as the extracted keyword.
Thus, after dictionary matching and word-frequency statistics, a keyword is certain to be extracted from the current sliding window.
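The word-frequency fallback can be sketched as follows. The function name `most_frequent_word` and the regular-expression tokenization are assumptions: splitting on `\w+` works for space-delimited text such as English, whereas Chinese chat text would need a word segmenter, which the patent does not name.

```python
import re
from collections import Counter
from typing import List, Optional


def most_frequent_word(window_messages: List[str]) -> Optional[str]:
    """Return the most frequent word in the window, or None if it has no words."""
    # Assumed tokenization: lowercase runs of word characters.
    words = re.findall(r"\w+", " ".join(window_messages).lower())
    if not words:
        return None
    # most_common(1) returns [(word, count)]; ties go to the word seen first.
    return Counter(words).most_common(1)[0][0]


print(most_frequent_word(["which phone should I buy", "the new phone", "phone prices"]))
```

A real implementation would probably also filter out function words such as "the", which would otherwise dominate raw counts; the source does not discuss this.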
Through the above processing, each slide of the sliding window produces one keyword. After the window has slid several times there are multiple keywords, and one of them is then selected as the subject content according to a certain strategy.
First, a keyword queue is designed to store the keyword produced by each slide of the sliding window. The queue follows the first-in, first-out (FIFO) rule and has a fixed maximum length. Setting the queue length controls how long a keyword remains valid: a keyword extracted from chat long ago should not become the subject content of the current chat. With the length limit and the FIFO rule, recently enqueued keywords push the earliest enqueued keywords out of the queue. The subject content derived from the keyword queue is therefore always the most recent chat subject. The initial queue length is 5, meaning that the queue keeps the 5 most recent keywords.
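A bounded first-in, first-out queue of this kind maps directly onto Python's `collections.deque` with a `maxlen`. The sketch below is illustrative only; the names `QUEUE_LENGTH` and `make_keyword_queue` and the sample keywords are hypothetical.

```python
from collections import deque
from typing import Deque

QUEUE_LENGTH = 5  # initial value given in the text


def make_keyword_queue(length: int = QUEUE_LENGTH) -> Deque[str]:
    """Create a FIFO queue that silently drops its oldest entry when full."""
    return deque(maxlen=length)


queue = make_keyword_queue()
for keyword in ["weather", "West Lake", "West Lake", "cold", "West Lake", "football"]:
    queue.append(keyword)  # the sixth append evicts "weather"

print(list(queue))
```

After the sixth keyword arrives, the oldest one ("weather") is gone, so stale topics cannot influence the current subject.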
Second, a degree of subject correlation is introduced to characterize how often the same subject appears. When this frequency reaches or exceeds the correlation threshold, the keyword is considered to be the subject content of the current chat. The degree of correlation of a keyword equals the number of times the keyword occurs in the keyword queue divided by the length of the keyword queue. For example, if the keyword "West Lake" occurs 4 times in the keyword queue and the queue length is 5, its degree of correlation is 80%; if the preset correlation threshold is 60%, "West Lake" is selected as the subject content.
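The worked example can be reproduced with a few lines of Python. The function name `correlation_degree` is hypothetical, and dividing by the configured queue length (rather than by the number of keywords currently stored) is an assumption; the two coincide once the queue is full.

```python
from typing import Sequence


def correlation_degree(keyword: str, queue: Sequence[str], queue_length: int) -> float:
    """Occurrences of keyword in the queue divided by the queue length."""
    if queue_length <= 0:
        raise ValueError("queue_length must be positive")
    occurrences = sum(1 for item in queue if item == keyword)
    return occurrences / queue_length


# The example from the text: 4 occurrences in a queue of length 5.
example_queue = ["West Lake", "weather", "West Lake", "West Lake", "West Lake"]
degree = correlation_degree("West Lake", example_queue, 5)
print(degree)  # 0.8, which is above the 0.6 threshold
```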
Once the keyword queue length and the correlation threshold are set, the subject keyword can be obtained. Each time the sliding window slides, the degree of correlation is calculated for every keyword in the keyword queue, and the keyword with the highest degree of correlation is compared with the preset correlation threshold. If it exceeds the threshold, that keyword is the final subject keyword; otherwise, the current chat has no extractable subject, and the method waits for the next slide of the window.
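The per-slide selection can be sketched as below. The name `select_subject` and the defaults are hypothetical (5 and 60% come from the examples in the text). The strict "greater than" comparison follows the wording of this paragraph; a "reaches or exceeds" reading would use `>=` instead.

```python
from collections import Counter
from typing import Optional, Sequence


def select_subject(queue: Sequence[str],
                   queue_length: int = 5,
                   threshold: float = 0.6) -> Optional[str]:
    """Return the keyword with the highest correlation if it beats the threshold."""
    if not queue:
        return None
    # The keyword with the most occurrences also has the highest correlation.
    keyword, occurrences = Counter(queue).most_common(1)[0]
    if occurrences / queue_length > threshold:
        return keyword
    return None  # no extractable subject; wait for the next slide


print(select_subject(["West Lake"] * 4 + ["weather"]))
print(select_subject(["a", "b", "c", "a", "b"]))
```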
Each time the window slides, the subject content of the user's chat is obtained in real time. Once subject content is obtained, a previously specified search engine, such as Google or Baidu, is started to search for it, and some or all of the search result links (the number can be preset, for example to 3) are displayed in the chat window of the IM software, which may be the chat windows of both parties, of multiple parties, or of any one party. IM users thus do not need to do anything manually: they can quickly and easily look up the current chat subject and automatically share the results.
The complete scheme of the invention is further described below with reference to the drawing. Referring to Fig. 1, the method provided by the invention is:
Step 100: Match all the text contained in the instant messages in the sliding window against the theme dictionaries;
Step 200: If any words match, push the most frequent one into the keyword queue as the keyword and go directly to step 400; if no word matches, go to step 300;
Step 300: Count the frequency of occurrence of all the text contained in the instant messages, and push the most frequent word into the keyword queue;
Step 400: Calculate the degree of correlation of every keyword in the keyword queue and compare the maximum with the correlation threshold; if it exceeds the threshold, take the keyword with the maximum degree of correlation as the subject content;
Step 500: Using the selected subject content, start the preset search engine, and automatically share and display the search results in the chat windows of both parties, of multiple parties, or of any one party.
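Steps 100 to 400 can be tied together in a short Python sketch. Everything named here (`SubjectExtractor`, `on_slide`, the regular-expression tokenization, the alphabetical tie-break) is an assumption made for illustration, not the patented implementation. Step 500 is left out because it depends on an external search engine; the returned subject would simply be passed to it.

```python
import re
from collections import Counter, deque
from typing import Deque, List, Optional, Set


class SubjectExtractor:
    """Illustrative pipeline: one call to on_slide() per window slide."""

    def __init__(self, theme_words: Set[str],
                 queue_length: int = 5, threshold: float = 0.6) -> None:
        self.theme_words = theme_words
        self.queue_length = queue_length
        self.threshold = threshold
        self.queue: Deque[str] = deque(maxlen=queue_length)  # FIFO keyword queue

    def _extract_keyword(self, window: List[str]) -> Optional[str]:
        text = " ".join(window)
        # Steps 100 and 200: match against the theme dictionary words.
        hits = {w: text.count(w) for w in self.theme_words if w in text}
        if hits:
            return max(sorted(hits), key=lambda w: hits[w])
        # Step 300: fall back to plain word-frequency statistics.
        words = re.findall(r"\w+", text)
        return Counter(words).most_common(1)[0][0] if words else None

    def on_slide(self, window: List[str]) -> Optional[str]:
        keyword = self._extract_keyword(window)
        if keyword is not None:
            self.queue.append(keyword)
        if not self.queue:
            return None
        # Step 400: highest correlation degree versus the threshold.
        best, occurrences = Counter(self.queue).most_common(1)[0]
        if occurrences / self.queue_length > self.threshold:
            return best
        return None


extractor = SubjectExtractor({"West Lake", "weather"})
windows = [
    ["Let's visit West Lake", "ok"],
    ["West Lake is nice", "yes"],
    ["how do we get to West Lake", "by bus"],
    ["West Lake tickets", "free"],
]
results = [extractor.on_slide(w) for w in windows]
print(results)
```

With the default queue length of 5 and threshold of 60%, the subject only emerges on the fourth slide, when "West Lake" reaches 4 of 5 queue slots.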
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (10)

1. A method for extracting the subject content of an instant message, characterized in that it comprises the following steps:
A. taking the word with the highest frequency of occurrence in the instant message as a keyword, and extracting this keyword from the instant message;
B. calculating the degree of correlation of each extracted keyword, comparing this degree of correlation with a threshold, and, when it is greater than the threshold, taking the keyword corresponding to this degree of correlation as the subject content.
2. The subject content extracting method as claimed in claim 1, characterized in that taking the word with the highest frequency of occurrence in the instant message as the keyword comprises:
matching all the text contained in the instant message against a theme dictionary, and taking the most frequent word among the matching results as the keyword.
3. The subject content extracting method as claimed in claim 2, characterized in that the instant message is an instant message present in a sliding window.
4. The subject content extracting method as claimed in claim 3, characterized in that the sliding window has two parameters, window size and sliding speed, and captures instant messages in real time according to these two parameters.
5. The subject content extracting method as claimed in claim 2, characterized in that the theme dictionary is extensible: one or more dictionaries can be freely added or deleted, and one or more words can be added to or deleted from a dictionary.
6. The subject content extracting method as claimed in claim 1, characterized in that taking the word with the highest frequency of occurrence in the instant message as the keyword comprises:
performing word-frequency statistics on all the text of the instant message, and taking the most frequent word as the keyword.
7. The subject content extracting method as claimed in claim 1, characterized in that the keyword is stored in a first-in, first-out keyword queue of preset length.
8. The subject content extracting method as claimed in claim 1 or 7, characterized in that the degree of correlation is the ratio of the number of times the keyword occurs in the keyword queue to the length of the keyword queue.
9. The subject content extracting method as claimed in claim 1, characterized in that, after the subject content is determined, a search engine is started, a search is performed according to the subject content, and the search results are displayed.
10. The subject content extracting method as claimed in claim 9, characterized in that displaying the search results means displaying some or all of the search results in the instant message windows of both parties, of multiple parties, or of any one party.
CNB2005101344549A 2005-12-15 2005-12-15 Extraction for instant message subject content Active CN100410943C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005101344549A CN100410943C (en) 2005-12-15 2005-12-15 Extraction for instant message subject content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005101344549A CN100410943C (en) 2005-12-15 2005-12-15 Extraction for instant message subject content

Publications (2)

Publication Number Publication Date
CN1983252A CN1983252A (en) 2007-06-20
CN100410943C true CN100410943C (en) 2008-08-13

Family

ID=38165792

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101344549A Active CN100410943C (en) 2005-12-15 2005-12-15 Extraction for instant message subject content

Country Status (1)

Country Link
CN (1) CN100410943C (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249380A1 (en) 2004-05-06 2008-10-09 Koninklijke Philips Electronics, N.V. Protection Mechanism for Spectroscopic Analysis of Biological Tissue
CN100579055C (en) * 2007-08-13 2010-01-06 腾讯科技(深圳)有限公司 Processing method and device for instant communication information including hyperlink
US8718610B2 (en) * 2008-12-03 2014-05-06 Sony Corporation Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages
CN102520853A (en) * 2011-11-29 2012-06-27 上海量明科技发展有限公司 Method, terminal and system for triggering instant messaging interaction interface
US8812527B2 (en) 2011-11-29 2014-08-19 International Business Machines Corporation Automatically recommending asynchronous discussion forum posts during a real-time collaboration
CN102594734A (en) * 2012-03-24 2012-07-18 上海量明科技发展有限公司 Method, terminal and system for starting instant communication interactive interface
CN103577587A (en) * 2013-11-08 2014-02-12 南京绿色科技研究院有限公司 News theme classification method
KR20160076201A (en) * 2014-12-22 2016-06-30 엘지전자 주식회사 Mobile terminal and method for controlling the same
CN104683218B (en) * 2015-02-11 2018-10-23 广州酷狗计算机科技有限公司 Service the method and device of early warning
CN106294725B (en) * 2016-08-08 2019-02-15 浪潮集团有限公司 A kind of key message extracting method based on mobile terminal instant messaging application
CN107770037A (en) * 2016-08-18 2018-03-06 丁雷 By group chat Content Transformation into the method for seminar, server, terminal and system
CN107453978B (en) * 2017-07-06 2021-04-27 深圳Tcl新技术有限公司 Group-based data statistical method, mobile terminal, server and storage medium
CN111352685B (en) * 2020-02-28 2024-04-09 北京百度网讯科技有限公司 Display method, device, equipment and storage medium of input method keyboard
CN111859119A (en) * 2020-06-30 2020-10-30 维沃移动通信有限公司 Information processing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002058304A2 (en) * 2001-01-18 2002-07-25 Buzzcity Pte Ltd. Wireless messaging system and method
CN1518703A (en) * 2001-03-26 2004-08-04 Instant messaging system and method
CN1574749A (en) * 2003-06-19 2005-02-02 微软公司 Method and device for instant messaging for multi-user computers
EP1515521A2 (en) * 2003-09-12 2005-03-16 Axel Druschel A method and an apparatus for automatic conversion of text-based messages to a query of internet-based applications
CN1620045A (en) * 2003-08-19 2005-05-25 罗技欧洲公司 Instant messenger presence and identity management
KR20050096422A (en) * 2004-03-30 2005-10-06 (주)퓨쳐인포넷 System and method for interlocking instant messaging services and web services

Also Published As

Publication number Publication date
CN1983252A (en) 2007-06-20

Similar Documents

Publication Publication Date Title
CN100410943C (en) Extraction for instant message subject content
CN110457404B (en) Social media account classification method based on complex heterogeneous network
CN103617169B (en) A kind of hot microblog topic extracting method based on Hadoop
CN103761242B (en) Search method, searching system and natural language understanding system
CN103309998B (en) A kind of message query method and device, terminal device
WO2017186054A1 (en) Emoticon recommendation method and apparatus
US9087106B2 (en) Behavior targeting social recommendations
CN103116605A (en) Method and system of microblog hot events real-time detection based on detection subnet
CN103823844A (en) Question forwarding system and question forwarding method on the basis of subjective and objective context and in community question-and-answer service
CN106357416A (en) Group information recommendation method, device and terminal
CN104615627B (en) A kind of event public feelings information extracting method and system based on microblog
CA2753775C (en) Method and device for extracting characteristic relation circle from network
CN102867020A (en) Personal character trait-based friend making matching method
EP2606421A1 (en) Bulletin board data mapping and presentation
CN102750346A (en) Method, system and terminal device for recommending software
CN112235230B (en) Malicious traffic identification method and system
US20090024591A1 (en) Device, method and program for producing related words dictionary, and content search device
CN102193949A (en) Search method, device and system
CN106407287A (en) Multimedia resource pushing method and system
CN113032557A (en) Microblog hot topic discovery method based on frequent word set and BERT semantics
CN106372083B (en) A kind of method and system that controversial news clue is found automatically
CN114881041A (en) Multi-dimensional intelligent extraction system for microblog big data hot topics
CN102810103A (en) Search result sharing method and system
US20140032675A1 (en) Method and system for pushing associated users in social networking service network
CN112199601B (en) News recommendation method based on event popularity of mass news data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant