CN1983252A - Extraction for instant message subject content - Google Patents

Info

Publication number
CN1983252A
Authority
CN
China
Prior art keywords
subject content
key word
instant message
extracting method
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200510134454
Other languages
Chinese (zh)
Other versions
CN100410943C (en)
Inventor
李建成
梁柱
王麒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CNB2005101344549A (CN100410943C)
Publication of CN1983252A
Application granted
Publication of CN100410943C
Legal status: Active
Anticipated expiration

Abstract

A method for extracting the subject content of instant messages: keywords are extracted from the instant messages according to a preset extraction condition, and the subject content is then selected from the extracted keywords according to a policy.

Description

A method for extracting the subject content of instant messages
Technical field
The present invention relates to the field of computer and communication technology, and in particular to a method for extracting the subject content of instant messages.
Background
With the development of the Internet and the spread of networks, instant messaging (IM, Instant Messenger) software has gradually become an indispensable means of online communication in people's daily work, study, and life. Through IM software, people can communicate by text, voice, and video in real time. However, current IM software neither extracts the subject content of the instant messages exchanged in a text chat nor makes use of such subject information, so it has yet to give users a better and more convenient experience.
At present, no text chat application in IM software intelligently extracts the subject content of a chat. If, during a chat, a user wants to search for information about a topic of interest in the conversation, the user must manually select and copy the relevant text, open a search engine (such as Google or Baidu), and search for it. If the user then wants to share a search result with the other party, the information must again be copied manually. This is highly inconvenient for the user.
Furthermore, existing text-based techniques for intelligent subject extraction perform poorly in terms of real-time performance and relevance. Real-time performance refers to whether the captured subject content is the topic of the current chat; relevance refers to how accurately the subject content is captured.
Summary of the invention
The invention provides a method that solves the problem in the prior art that the subject content of instant messaging cannot be extracted intelligently.
The invention further solves the problem in the prior art that the subject content of instant messaging can only be extracted and used manually.
The invention provides the following technical solution:
A subject content extraction method applied in instant messaging comprises the following steps:
A. extracting keywords from instant messages according to a predetermined extraction condition;
B. determining the subject content from the extracted keywords according to a policy.
Wherein:
The predetermined extraction condition is: all the text contained in the instant messages is matched against a theme dictionary, and the word with the highest frequency of occurrence among the matches is taken as the keyword.
The instant messages are the instant messages present in a sliding window.
The sliding window has two parameters, window size and sliding speed, and captures instant messages in real time according to these two parameters.
The theme dictionary is extensible: one or more dictionaries can be freely added or deleted, and one or more words can be added to or deleted from a dictionary.
Alternatively, the predetermined extraction condition is: word-frequency statistics are computed over all the text of the instant messages, and the word with the highest frequency of occurrence is taken as the keyword.
The policy is: the relevance of each extracted keyword is calculated and compared with a threshold, and a keyword whose relevance is greater than the threshold is taken as the subject content.
The keywords are held in a first-in, first-out keyword queue of preset length.
The relevance is the ratio of the number of times a keyword occurs in the keyword queue to the length of the keyword queue.
After the subject content is determined, a search engine is started, a search is performed on the subject content, and the search results are displayed.
Displaying the search results means showing some or all of the search results in the instant message windows of both parties, of multiple parties, or of any one party.
The beneficial effects of the present invention are as follows:
When a user communicates through IM software, the subject content of the current instant messages can be extracted intelligently and promptly, and the extracted subject content can be put to use, giving IM users a better and more convenient experience.
Description of drawings
Fig. 1 is a schematic flowchart of the present invention.
Detailed description
The present invention intelligently extracts subject content from the real-time instant messages in IM, thereby obtaining the subject content of the user's current chat. To capture the subject content, the invention uses a sliding window, a keyword queue, and a subject relevance parameter, which together provide the real-time performance and relevance of the extracted subject content.
The invention defines a sliding window for capturing keywords. Keywords are the basis on which the subject content is finally determined, and statistics and analysis are performed only on the instant messages currently inside the sliding window. The sliding window has a fixed size and slides at a fixed speed; each time it slides, the instant messages in the window are counted and analysed once. The smallest unit of statistical analysis is one instant message.
The sliding window has two parameters, window size (WindowSize) and sliding speed (SlideVelocity), both measured in number of instant messages. For example, WindowSize=2 and SlideVelocity=2 means that the sliding window holds two instant messages for processing and that, whenever two new instant messages appear, the window slides down by two messages and the two instant messages now in the window are counted and analysed. In general, the sliding speed equals the window size: if the sliding speed is less than the window size, some messages in the window are counted more than once; conversely, if the sliding speed is greater than the window size, some instant messages are skipped and never counted.
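The effect of the two parameters can be illustrated with a minimal Python sketch. The function name `slide_windows` and the list-of-strings message model are assumptions made for illustration only; the patent does not prescribe an implementation.

```python
from typing import Iterator, List


def slide_windows(messages: List[str], window_size: int = 2,
                  slide_velocity: int = 2) -> Iterator[List[str]]:
    """Yield the messages inside the window after each slide.

    The window holds `window_size` messages and advances by
    `slide_velocity` messages at a time (both hypothetical names
    mirroring WindowSize and SlideVelocity).
    """
    start = 0
    while start + window_size <= len(messages):
        yield messages[start:start + window_size]
        start += slide_velocity


chat = ["m1", "m2", "m3", "m4", "m5", "m6"]
equal = list(slide_windows(chat, 2, 2))    # no overlap, no gaps
slower = list(slide_windows(chat, 2, 1))   # m2..m5 are each counted twice
faster = list(slide_windows(chat, 2, 3))   # m3 and m6 are never counted
```

With equal values every message is analysed exactly once, which matches the reason given above for keeping the sliding speed equal to the window size.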
IM users can set these two variables according to their own needs. Their initial values are WindowSize=2 and SlideVelocity=2.
Each time the sliding window slides, a keyword is extracted from the instant messages newly captured in the window.
A set of theme dictionaries is built from the habits and common expressions of people's text chats, covering frequently discussed themes such as a weather dictionary, a scenic spot dictionary, a sports dictionary, and a military dictionary. Each theme dictionary collects the common words of its category. For example, the weather dictionary contains: weather, temperature, cold, sweltering heat, etc.; the scenic spot dictionary contains: Jiuzhaigou, Zhangjiajie, Gulang Island, Mount Huang, West Lake, etc. The theme dictionaries are extensible: one or more dictionaries can be freely added or deleted, and one or more words can easily be added to or deleted from one or more theme dictionaries.
Once the theme dictionaries exist, keywords can be counted and extracted. Each time the sliding window slides according to its preset parameters, all the text of the instant messages in the window is matched against every theme dictionary. During matching, each matched word and its frequency of occurrence are recorded, and the keyword for the window is obtained from the matching result. If there are several matches, the word with the highest frequency of occurrence is selected as the final keyword.
However, this matching may fail to produce any keyword.
Therefore, if dictionary matching yields no keyword, a word-frequency count is performed. That is, when the instant messages in the sliding window match nothing in the theme dictionaries, the user's chat lies outside the scope of the dictionaries; in that case, word statistics are computed over all the text of the instant messages in the window, and the word with the highest frequency of occurrence is recorded and taken as the extracted keyword.
Thus, after dictionary matching and, where needed, word-frequency counting, a keyword is always extracted from the current sliding window.
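The two-stage extraction described above might be sketched in Python as follows. Everything here is an illustrative assumption: the dictionary contents, the name `extract_keyword`, plain substring counting for dictionary matching, and a regular-expression tokenizer for the fallback (real Chinese text would need a proper word segmenter, which the patent does not specify).

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical theme dictionaries; entries echo the examples in the text.
THEME_DICTIONARIES: Dict[str, Set[str]] = {
    "weather": {"weather", "temperature", "cold"},
    "scenic_spots": {"Jiuzhaigou", "Zhangjiajie", "West Lake"},
}


def extract_keyword(window: List[str]) -> str:
    """Return one keyword for the messages currently in the window."""
    text = " ".join(window)

    # Stage 1: count how often each dictionary word occurs in the window.
    matches = Counter()
    for words in THEME_DICTIONARIES.values():
        for word in words:
            n = text.count(word)  # naive substring match, a simplification
            if n:
                matches[word] += n
    if matches:
        return matches.most_common(1)[0][0]

    # Stage 2: no dictionary hit, so fall back to word-frequency counting.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common(1)[0][0] if tokens else ""
```

For instance, `extract_keyword(["West Lake is lovely", "let's go to West Lake", "is it cold there"])` selects "West Lake" because it matches twice while "cold" matches once. How ties between equally frequent words are broken is not stated in the source and is left arbitrary in this sketch.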
Through the above processing, each slide of the window produces one keyword. After the window has slid several times there are several keywords, and one of them is then selected as the subject content according to a policy.
First, a keyword queue is designed to hold the keyword produced by each slide of the window. The queue follows the first-in, first-out (FIFO) rule and has a fixed maximum length. Setting the queue length controls how long a keyword remains valid: a keyword extracted from a chat long ago should not become the subject content of the current chat. With the length limit and the FIFO rule, recently enqueued keywords push the earliest keywords out of the queue. The subject content computed from the keyword queue therefore always reflects the most recent chat topic. The initial queue length is 5, meaning that the queue keeps the 5 most recent keywords.
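A bounded FIFO queue of this kind can be sketched with Python's `collections.deque`, whose `maxlen` argument discards the oldest item automatically once the queue is full. The keyword values below are made up for illustration.

```python
from collections import deque

QUEUE_LENGTH = 5  # initial value given in the text

keyword_queue = deque(maxlen=QUEUE_LENGTH)
for kw in ["weather", "West Lake", "West Lake", "cold", "West Lake", "West Lake"]:
    keyword_queue.append(kw)  # once full, the oldest keyword drops out

# "weather" was enqueued first and has been pushed out by the sixth keyword.
print(list(keyword_queue))
```

Only the five most recent keywords remain, so an early topic stops influencing the result as soon as enough newer keywords arrive.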
Second, a subject relevance measure is introduced to characterize how often the same topic occurs: when a keyword's relevance reaches or exceeds the relevance threshold, that keyword is considered the subject content of the current chat. The relevance of a keyword equals the number of times the keyword occurs in the keyword queue divided by the length of the queue. For example, if the keyword "West Lake" occurs 4 times in the keyword queue and the queue length is 5, the relevance of "West Lake" is 80%; if the preset relevance threshold is 60%, "West Lake" is selected as the subject content.
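Written as a formula, the relevance and the worked example read as follows. The symbols are notation introduced here for convenience and do not appear in the patent: R(k) is the relevance of keyword k, n_Q(k) the number of times k occurs in the keyword queue Q, L the queue length, and T the relevance threshold.

```latex
R(k) = \frac{n_Q(k)}{L}, \qquad
R(\text{West Lake}) = \frac{4}{5} = 0.8 > T = 0.6
```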
Once the keyword queue length and the relevance threshold are set, the subject keyword can be obtained. Each time the sliding window slides, the relevance of every keyword in the keyword queue is calculated, and the keyword with the highest relevance is compared with the preset relevance threshold. If its relevance exceeds the threshold, that keyword is the final subject keyword; otherwise, the current chat has no extractable subject, and the method waits for the next slide of the window.
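The selection step can be sketched as a small Python function. The name `pick_subject` and its defaults are assumptions; the sketch divides by the configured queue length, as the text describes, and uses a strict "greater than" comparison, following the wording of this paragraph.

```python
from collections import Counter
from typing import Iterable, Optional


def pick_subject(queue: Iterable[str], queue_length: int = 5,
                 threshold: float = 0.6) -> Optional[str]:
    """Return the keyword with the highest relevance if it exceeds the threshold."""
    counts = Counter(queue)
    if not counts:
        return None
    keyword, hits = counts.most_common(1)[0]
    relevance = hits / queue_length  # occurrences divided by queue length
    return keyword if relevance > threshold else None
```

With the queue from the "West Lake" example (4 occurrences out of 5) the function returns "West Lake"; with a queue in which no keyword occurs more than twice it returns `None`, meaning no subject is extracted on this slide.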
Each time the window slides, the subject content of the user's chat is obtained in real time. Once subject content has been obtained, a search engine specified in advance, such as Google or Baidu, is started to search for it, and some or all of the search result links (the number can be preset, for example to 3) are displayed in the chat window of the IM software. The links can be shown in the chat windows of both parties, of multiple parties, or of any one party. IM users can thus query the subject of the current chat quickly and conveniently, without manual operation, and the query results are shared automatically.
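The sharing step might look like the following Python sketch. The search engine is represented by a caller-supplied function, and `fake_search`, the `example.com` URLs, and the dictionary of chat windows are all placeholders invented for illustration; no real search API is implied.

```python
from typing import Callable, Dict, List
from urllib.parse import quote


def share_search_results(subject: str,
                         search: Callable[[str], List[str]],
                         chat_windows: Dict[str, List[str]],
                         max_links: int = 3) -> None:
    """Append up to `max_links` result links to every listed chat window."""
    links = search(subject)[:max_links]
    for shown in chat_windows.values():
        shown.extend(links)


def fake_search(query: str) -> List[str]:
    # Stand-in for a real search engine call; returns placeholder links.
    return [f"https://example.com/search?q={quote(query)}&hit={i}" for i in range(5)]


windows: Dict[str, List[str]] = {"alice": [], "bob": []}
share_search_results("West Lake", fake_search, windows)
```

Passing only one participant's window in `chat_windows` would correspond to displaying the results to a single party.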
The complete scheme of the present invention is further described below with reference to the accompanying drawing. Referring to Fig. 1, the method provided by the present invention is:
Step 100: match all the text of the instant messages in the sliding window against the theme dictionaries;
Step 200: if any words match, push the matched word with the highest frequency of occurrence into the keyword queue as the keyword and go directly to step 400; if no word matches, go to step 300;
Step 300: count the frequency of occurrence of every word in the text of the instant messages and push the word with the highest frequency into the keyword queue;
Step 400: calculate the relevance of every keyword in the keyword queue, take the highest relevance, and compare it with the relevance threshold; if it exceeds the threshold, take the corresponding keyword as the subject content;
Step 500: using the selected subject content, start the preset search engine and automatically share and display the search results in the chat windows of both parties, of multiple parties, or of any one party.
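Steps 100 to 400 can be tied together in one compact Python sketch. The class name `SubjectExtractor`, the tiny `THEME_WORDS` set, substring matching, and the regular-expression tokenizer are illustrative assumptions, and the sliding speed is fixed equal to the window size for brevity. Step 500 (search and sharing) is left out.

```python
import re
from collections import Counter, deque
from typing import Deque, List, Optional, Set

THEME_WORDS: Set[str] = {"weather", "cold", "West Lake", "Zhangjiajie"}  # hypothetical


class SubjectExtractor:
    def __init__(self, window_size: int = 2, queue_length: int = 5,
                 threshold: float = 0.6) -> None:
        self.window_size = window_size
        self.queue_length = queue_length
        self.threshold = threshold
        self.pending: List[str] = []
        self.queue: Deque[str] = deque(maxlen=queue_length)

    def _keyword(self, text: str) -> Optional[str]:
        # Steps 100-200: match against the theme dictionary.
        hits = Counter({w: text.count(w) for w in THEME_WORDS if w in text})
        if hits:
            return hits.most_common(1)[0][0]
        # Step 300: fall back to word-frequency counting.
        tokens = re.findall(r"\w+", text.lower())
        return Counter(tokens).most_common(1)[0][0] if tokens else None

    def feed(self, message: str) -> Optional[str]:
        """Add one message; return the subject if this slide produces one."""
        self.pending.append(message)
        if len(self.pending) < self.window_size:
            return None                     # window not full yet
        keyword = self._keyword(" ".join(self.pending))
        self.pending.clear()                # sliding speed equals window size
        if keyword is None:
            return None
        self.queue.append(keyword)
        # Step 400: compare the highest relevance with the threshold.
        best, count = Counter(self.queue).most_common(1)[0]
        return best if count / self.queue_length > self.threshold else None


extractor = SubjectExtractor()
chat = ["West Lake this weekend?", "sure, West Lake sounds good"] * 4
subjects = [extractor.feed(m) for m in chat]
```

In this run the first three slides yield relevances of 0.2, 0.4, and 0.6, none of which exceeds the 0.6 threshold, so a subject is reported only on the fourth slide, when "West Lake" reaches 0.8.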
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (11)

1. A method for extracting the subject content of instant messages, characterized in that it comprises the following steps:
A. extracting keywords from instant messages according to a predetermined extraction condition;
B. determining the subject content from the extracted keywords according to a policy.
2. The subject content extraction method of claim 1, characterized in that the predetermined extraction condition is: all the text contained in the instant messages is matched against a theme dictionary, and the word with the highest frequency of occurrence among the matches is taken as the keyword.
3. The subject content extraction method of claim 2, characterized in that the instant messages are the instant messages present in a sliding window.
4. The subject content extraction method of claim 3, characterized in that the sliding window has two parameters, window size and sliding speed, and captures instant messages in real time according to these two parameters.
5. The subject content extraction method of claim 2, characterized in that the theme dictionary is extensible: one or more dictionaries can be freely added or deleted, and one or more words can be added to or deleted from a dictionary.
6. The subject content extraction method of claim 1, characterized in that the predetermined extraction condition is: word-frequency statistics are computed over all the text of the instant messages, and the word with the highest frequency of occurrence is taken as the keyword.
7. The subject content extraction method of claim 1, characterized in that the policy is: the relevance of each extracted keyword is calculated and compared with a threshold, and a keyword whose relevance is greater than the threshold is taken as the subject content.
8. The subject content extraction method of claim 7, characterized in that the keywords are held in a first-in, first-out keyword queue of preset length.
9. The subject content extraction method of claim 7 or 8, characterized in that the relevance is the ratio of the number of times a keyword occurs in the keyword queue to the length of the keyword queue.
10. The subject content extraction method of claim 1, characterized in that, after the subject content is determined, a search engine is started, a search is performed on the subject content, and the search results are displayed.
11. The subject content extraction method of claim 10, characterized in that displaying the search results means showing some or all of the search results in the instant message windows of both parties, of multiple parties, or of any one party.
CNB2005101344549A 2005-12-15 2005-12-15 Extraction for instant message subject content Active CN100410943C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005101344549A CN100410943C (en) 2005-12-15 2005-12-15 Extraction for instant message subject content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005101344549A CN100410943C (en) 2005-12-15 2005-12-15 Extraction for instant message subject content

Publications (2)

Publication Number Publication Date
CN1983252A (en) 2007-06-20
CN100410943C (en) 2008-08-13

Family

ID=38165792

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101344549A Active CN100410943C (en) 2005-12-15 2005-12-15 Extraction for instant message subject content

Country Status (1)

Country Link
CN (1) CN100410943C (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005107578A1 (en) 2004-05-06 2005-11-17 Koninklijke Philips Electronics N. V. Protection mechanism for spectroscopic analysis of biological tissue
WO2009021429A1 (en) * 2007-08-13 2009-02-19 Tencent Technology (Shenzhen) Company Limited Method and device for dealing with the instant messaging information
CN102246497A (en) * 2008-12-03 2011-11-16 索尼爱立信移动通讯有限公司 Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages
CN102520853A (en) * 2011-11-29 2012-06-27 上海量明科技发展有限公司 Method, terminal and system for triggering instant messaging interaction interface
CN102594734A (en) * 2012-03-24 2012-07-18 上海量明科技发展有限公司 Method, terminal and system for starting instant communication interactive interface
CN103150318A (en) * 2011-11-29 2013-06-12 国际商业机器公司 Automatically recommending asynchronous discussion forum posts during a real-time collaboration
CN103577587A (en) * 2013-11-08 2014-02-12 南京绿色科技研究院有限公司 News theme classification method
CN104683218A (en) * 2015-02-11 2015-06-03 广州酷狗计算机科技有限公司 Service warning method and device
CN105721668A (en) * 2014-12-22 2016-06-29 Lg电子株式会社 Mobile terminal and method of controlling therefor
CN106294725A (en) * 2016-08-08 2017-01-04 浪潮集团有限公司 A kind of key message extracting method based on mobile terminal instant messaging application
CN107453978A (en) * 2017-07-06 2017-12-08 深圳Tcl新技术有限公司 Data statistical approach, mobile terminal, server and storage medium based on group
CN107770037A (en) * 2016-08-18 2018-03-06 丁雷 By group chat Content Transformation into the method for seminar, server, terminal and system
CN111352685A (en) * 2020-02-28 2020-06-30 北京百度网讯科技有限公司 Input method keyboard display method, device, equipment and storage medium
CN111859119A (en) * 2020-06-30 2020-10-30 维沃移动通信有限公司 Information processing method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG99879A1 (en) * 2001-01-18 2003-11-27 Buzzcity Pte Ltd Wireless messaging system and method
CN1251108C (en) * 2001-03-26 2006-04-12 腾讯科技(深圳)有限公司 Instant messaging system and method
US7640341B2 (en) * 2003-06-19 2009-12-29 Microsoft Corporation Instant messaging for multi-user computers
US20050044143A1 (en) * 2003-08-19 2005-02-24 Logitech Europe S.A. Instant messenger presence and identity management
EP1515521A3 (en) * 2003-09-12 2005-07-06 Axel Druschel A method and an apparatus for automatic conversion of text-based messages to a query of internet-based applications
KR20050096422A (en) * 2004-03-30 2005-10-06 (주)퓨쳐인포넷 System and method for interlocking instant messaging services and web services

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005107578A1 (en) 2004-05-06 2005-11-17 Koninklijke Philips Electronics N. V. Protection mechanism for spectroscopic analysis of biological tissue
WO2009021429A1 (en) * 2007-08-13 2009-02-19 Tencent Technology (Shenzhen) Company Limited Method and device for dealing with the instant messaging information
US8204946B2 (en) 2007-08-13 2012-06-19 Tencent Technology (Shenzhen) Company Ltd. Method and apparatus for processing instant messaging information
CN102246497A (en) * 2008-12-03 2011-11-16 索尼爱立信移动通讯有限公司 Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages
US8718610B2 (en) 2008-12-03 2014-05-06 Sony Corporation Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages
CN102246497B (en) * 2008-12-03 2014-09-17 索尼爱立信移动通讯有限公司 Controlling sound characteristics of alert tunes that signal receipt of messages responsive to content of the messages
CN102520853A (en) * 2011-11-29 2012-06-27 上海量明科技发展有限公司 Method, terminal and system for triggering instant messaging interaction interface
CN103150318A (en) * 2011-11-29 2013-06-12 国际商业机器公司 Automatically recommending asynchronous discussion forum posts during a real-time collaboration
US9294420B2 (en) 2011-11-29 2016-03-22 International Business Machines Corporation Augmenting a real-time collaboration with ranked electronic bulletin board posts
CN102594734A (en) * 2012-03-24 2012-07-18 上海量明科技发展有限公司 Method, terminal and system for starting instant communication interactive interface
CN103577587A (en) * 2013-11-08 2014-02-12 南京绿色科技研究院有限公司 News theme classification method
CN105721668A (en) * 2014-12-22 2016-06-29 Lg电子株式会社 Mobile terminal and method of controlling therefor
CN104683218A (en) * 2015-02-11 2015-06-03 广州酷狗计算机科技有限公司 Service warning method and device
CN104683218B (en) * 2015-02-11 2018-10-23 广州酷狗计算机科技有限公司 Service the method and device of early warning
CN106294725A (en) * 2016-08-08 2017-01-04 浪潮集团有限公司 A kind of key message extracting method based on mobile terminal instant messaging application
CN106294725B (en) * 2016-08-08 2019-02-15 浪潮集团有限公司 A kind of key message extracting method based on mobile terminal instant messaging application
CN107770037A (en) * 2016-08-18 2018-03-06 丁雷 By group chat Content Transformation into the method for seminar, server, terminal and system
CN107453978A (en) * 2017-07-06 2017-12-08 深圳Tcl新技术有限公司 Data statistical approach, mobile terminal, server and storage medium based on group
CN111352685A (en) * 2020-02-28 2020-06-30 北京百度网讯科技有限公司 Input method keyboard display method, device, equipment and storage medium
CN111352685B (en) * 2020-02-28 2024-04-09 北京百度网讯科技有限公司 Display method, device, equipment and storage medium of input method keyboard
CN111859119A (en) * 2020-06-30 2020-10-30 维沃移动通信有限公司 Information processing method and device

Also Published As

Publication number Publication date
CN100410943C (en) 2008-08-13

Similar Documents

Publication Publication Date Title
CN100410943C (en) Extraction for instant message subject content
CN110457404B (en) Social media account classification method based on complex heterogeneous network
CN103309998B (en) A kind of message query method and device, terminal device
CN103761242B (en) Search method, searching system and natural language understanding system
CN103617169B (en) A kind of hot microblog topic extracting method based on Hadoop
CN106357416A (en) Group information recommendation method, device and terminal
CN103116605A (en) Method and system of microblog hot events real-time detection based on detection subnet
CN104615627B (en) A kind of event public feelings information extracting method and system based on microblog
CN105159938B (en) Search method and device
CN103823844A (en) Question forwarding system and question forwarding method on the basis of subjective and objective context and in community question-and-answer service
CN103761279B (en) Method and system for scheduling network crawlers on basis of keyword search
CN105095434B (en) The recognition methods of timeliness demand and device
CN102456054B (en) A kind of searching method and system
CA2753775C (en) Method and device for extracting characteristic relation circle from network
CN107609103A (en) It is a kind of based on push away spy event detecting method
CN102750346A (en) Method, system and terminal device for recommending software
CN112235230B (en) Malicious traffic identification method and system
CN112149422B (en) Dynamic enterprise news monitoring method based on natural language
CN103942268A (en) Method and device for combining search and application and application interface
CN106372083B (en) A kind of method and system that controversial news clue is found automatically
CN114881041A (en) Multi-dimensional intelligent extraction system for microblog big data hot topics
CN110008405A (en) A kind of personalization message method for pushing and system based on timeliness
CN102810103A (en) Search result sharing method and system
CN104063479B (en) A kind of branded network temperature computational methods based on community network
CN112199601B (en) News recommendation method based on event popularity of mass news data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant