CN104346480B - information mining method and device - Google Patents


Info

Publication number
CN104346480B
Authority
CN
China
Prior art keywords
information
message
message content
characterization information
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410710424.7A
Other languages
Chinese (zh)
Other versions
CN104346480A (en)
Inventor
刘松
孙凯
陶明远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410710424.7A
Publication of CN104346480A
Priority to PCT/CN2015/086095 (WO2016082575A1)
Application granted
Publication of CN104346480B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Abstract

Embodiments of the present invention provide an information mining method and device. The method includes: monitoring messages published in an instant communication software application; parsing the monitored message to obtain message content; matching the message content against keywords in a pre-established feature recognition dictionary; and, when the match succeeds, capturing the message content, or the message content together with related content of the message content, as characterization information and saving the characterization information. Because messages published in an instant communication software application are both clearly categorized and highly specialized, matching the parsed message content against the keywords in the feature recognition dictionary and capturing the matched message content (or the matched message content together with its related content) allows the characterization information of a specific object to be captured automatically. This saves labor cost and improves the professionalism and accuracy of the captured characterization information.

Description

Information mining method and device
Technical field
Embodiments of the present invention relate to the field of information technology, and in particular to an information mining method and device.
Background
In the prior art, information about an object such as a product or service, for example product defect descriptions that help improve the product, is typically collected manually from forums or web pages in the relevant field. This is inefficient and not very accurate.
Summary of the invention
Embodiments of the present invention provide an information mining method and device that automatically capture the characterization information of a specific object, saving labor cost and improving the accuracy of the captured characterization information.
In a first aspect, an embodiment of the present invention provides an information mining method, including:
monitoring messages published in an instant communication software application;
parsing the monitored message to obtain message content;
matching the message content against keywords in a pre-established feature recognition dictionary;
when the match succeeds, capturing the message content, or the message content together with related content of the message content, as characterization information, and saving the characterization information.
In a second aspect, an embodiment of the present invention further provides an information mining device, including:
a message monitoring module, configured to monitor messages published in an instant communication software application;
a message parsing module, configured to parse the monitored message to obtain message content;
a matching module, configured to match the message content against keywords in a pre-established feature recognition dictionary;
a characterization information processing module, configured to, when the match succeeds, capture the message content, or the message content together with related content of the message content, as characterization information, and save the characterization information.
The information mining method and device provided by the embodiments of the present invention monitor and parse messages published in an instant communication software application. Because such messages are both clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the matched message content (or the matched message content together with its related content), allows the characterization information of a specific object to be captured automatically. This saves labor cost, improves the professionalism and accuracy of the captured characterization information, and makes it easier to improve the specific object based on that information.
Brief description of the drawings
Fig. 1 is a flow chart of an information mining method provided by Embodiment one of the present invention;
Fig. 2 is a flow chart of an information mining method provided by Embodiment two of the present invention;
Fig. 3a is a flow chart of an information mining method provided by Embodiment three of the present invention;
Fig. 3b is a flow chart of another information mining method provided by Embodiment three of the present invention;
Fig. 3c is a flow chart of yet another information mining method provided by Embodiment three of the present invention;
Fig. 4 is a structural diagram of an information mining device provided by Embodiment four of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Referring to Fig. 1, which is a flow chart of an information mining method provided by Embodiment one of the present invention. The method of this embodiment may be performed by an information mining device implemented in hardware and/or software; the device is typically deployed on a server capable of providing data mining services.
The method includes operations 110 to 140.
110. Monitor messages published in an instant communication software application.
In general, an enterprise maintains instant communication software application groups associated with its products or departments, so that the teams responsible for developing each product, or for operating and maintaining it, can conveniently publish messages.
For example, Baidu Hi, released by Baidu, is an instant communication software application that combines text messaging, voice and video calls, file transfer and other functions. Groups corresponding to products such as "Baidu Map" or "Baidu Translate" are set up in Baidu Hi so that the Baidu staff responsible for developing or operating and maintaining each product can conveniently publish messages.
Messages may be published in many ways: as text, or in other forms such as voice, video or pictures. This embodiment places no limit on the form, as long as the instant communication software application supports it.
Specifically, this operation monitors the text messages published in groups of the instant communication software application that are related to the enterprise's products or departments.
120. Parse the monitored message to obtain the message content.
In this operation, the monitored message is decoded according to the communication protocol of the instant communication software application, correctly restoring the original data corresponding to the monitored message, that is, a readable character string.
130. Match the message content against the keywords in the pre-established feature recognition dictionary.
Specifically, this operation uses keyword matching against the pre-established feature recognition dictionary to determine whether the message content contains any keyword in that dictionary.
It should be noted that the groups corresponding to different objects in an enterprise publish different messages, so the parsed message content differs. Groups are clearly categorized, highly specialized, and have distinctive language features (for example, the members of a group belong to one category or work on the same product, and share the same or similar professional background). The messages published by different groups can therefore reflect information about the enterprise's objects.
Here, an object may be a concrete item such as an individual product, or a macro-level subject such as enterprise management.
For example, the group corresponding to the "Baidu Map" product is the group at Baidu responsible for developing or operating and maintaining "Baidu Map"; the messages its members publish contain information about the product's strengths and weaknesses or about planned improvements.
As another example, the messages published by members of the debugging group for the "Baidu Browser" product contain the bugs or suspected problems that arise while debugging the product.
Therefore, a corresponding feature recognition dictionary can be established for the group of each object in the enterprise, so as to obtain the characterization information of different objects (such as different products or enterprise management), for example the strengths and weaknesses of different products or the problems in enterprise management. For different groups that concern the same object, a separate feature recognition dictionary is preferably established for each group, so as to obtain characterization information about that object at different levels.
For example, a feature recognition dictionary related to research and development is established for the research and development group of the "Baidu Map" product; its keywords may include "research and development", "progress", "trend", "cost" and "competitor". A feature recognition dictionary related to debugging is established for the debugging group of the "Baidu Map" product; its keywords may include "debugging error", "debugging cycle", "bug", "loophole" and "defect". A feature recognition dictionary related to releases is established for the release group of the "Baidu Map" product; its keywords may include "release", "press conference", "release schedule" and "release date".
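The per-group keyword matching of operation 130 can be sketched as follows. This is a minimal illustration, not the patented implementation: the group identifiers, the `FEATURE_DICTIONARIES` table and the `match_keywords` function are hypothetical names, and plain substring matching is assumed as the keyword matching technique.

```python
# Hypothetical per-group feature recognition dictionaries; the group ids
# and keyword sets are illustrative, loosely following the examples above.
FEATURE_DICTIONARIES = {
    "baidu-map-rd": {"research and development", "progress", "cost"},
    "baidu-map-debug": {"debugging error", "bug", "loophole", "defect"},
}


def match_keywords(group, message_content):
    """Return the dictionary keywords that occur in the message content.

    Assumes simple case-insensitive substring matching; an empty list
    means the match failed and nothing should be captured.
    """
    text = message_content.lower()
    keywords = FEATURE_DICTIONARIES.get(group, set())
    return sorted(k for k in keywords if k in text)


hits = match_keywords("baidu-map-debug", "Found a bug: the route defect is back")
print(hits)
```

A non-empty result corresponds to a successful match and would trigger the capture described in operation 140.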
140. When the match succeeds, capture the message content, or the message content together with related content of the message content, as characterization information, and save the characterization information.
This operation has two implementations. In the first, when the match succeeds, the message content is captured as characterization information and the characterization information is saved. In the second, when the match succeeds, the message content together with its related content is captured as characterization information and the characterization information is saved.
The second implementation is preferred: compared with capturing only the message content, capturing the message content together with its related content helps obtain complete characterization information for the object.
A capture time interval and/or a capture item count may be set for capturing the related content of the matched message content; for example, the capture time interval is set to 15 s and the capture item count to 5.
Further, the related content of the message content may include: context messages of the message content; and/or supplementary content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to that user.
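Bounding the capture of context messages by a time interval and an item count can be sketched as below. This is a sketch under stated assumptions: the `Message` record and `capture_related` function are hypothetical, only messages that follow the matched one are treated as context, and the defaults reuse the 15 s and 5-item figures from the text.

```python
from dataclasses import dataclass


@dataclass
class Message:
    timestamp: float  # seconds since the start of the chat (assumed unit)
    sender: str
    content: str


def capture_related(messages, matched_index, max_interval=15.0, max_items=5):
    """Collect messages following the matched one, bounded by time and count."""
    matched = messages[matched_index]
    related = []
    for msg in messages[matched_index + 1:]:
        too_late = msg.timestamp - matched.timestamp > max_interval
        if too_late or len(related) >= max_items:
            break
        related.append(msg)
    return related


chat = [
    Message(0.0, "designer-1", "Login permissions are problematic"),
    Message(5.0, "designer-2", "Indeed, the cause is A"),
    Message(12.0, "designer-1", "Thanks, recording it"),
    Message(20.0, "designer-3", "Unrelated question"),
]
for msg in capture_related(chat, 0):
    print(msg.sender, msg.content)
```

The matched message plus the returned list together form the characterization information in the second implementation of operation 140.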
Example 1
The "Baidu Browser" product is taken as the object in this example. The messages published in a group for this product contain many evaluations and discussions of problems with the product. For example, one designer of the product publishes in the development group the message "When logging in to Baidu Browser, the login permissions are problematic", and another designer then publishes in the same group the message "Indeed, the cause is A". After the matching operation, the message "When logging in to Baidu Browser, the login permissions are problematic" matches the keyword "problem" in the feature recognition dictionary. Capturing this message content yields the characterization information for the product's defect, and capturing its context message "Indeed, the cause is A" yields the characterization information for the cause of the defect, which enriches the characterization information of the product.
It should be noted that the above takes capturing the characterization information of a product defect and of the defect's cause as an example. Besides the cause of the defect, other characterization information, such as the solution to the defect, may also be captured as part of the complete product defect information and stored in a structured format (for example [product name, defect content, cause]). This embodiment places no limit on this.
Example 2
After a session is established with the user who published the message content, a message content supplement request is sent to that user in the form of heuristic questions, asking for a complete description of the defect. Capture can then be session-based: when a defect has many descriptive dimensions (such as defect type and cause), a longer capture time (for example one minute) is set, and the supplementary content returned by the user within that time is captured. If no supplementary description arrives within that time, either only the basic information is recorded, or a failure is returned because the necessary information is incomplete.
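The session-based capture with a time limit can be sketched as follows. This is a hedged illustration only: the `collect_supplement` function, the `(field, text)` reply format and the use of an in-process `queue.Queue` as the session channel are all assumptions, since the source does not specify how replies are delivered.

```python
import queue
import time


def collect_supplement(replies, required_fields, timeout_s=60.0):
    """Gather (field, text) replies until every required field has arrived
    or the capture time runs out. Returns (collected, complete)."""
    deadline = time.monotonic() + timeout_s
    collected = {}
    while set(required_fields) - set(collected):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            field, text = replies.get(timeout=remaining)
        except queue.Empty:
            break  # the user did not answer within the capture time
        collected[field] = text
    complete = not (set(required_fields) - set(collected))
    return collected, complete


session = queue.Queue()
session.put(("defect_type", "login permission"))
session.put(("cause", "A"))
info, complete = collect_supplement(session, ["defect_type", "cause"], timeout_s=1.0)
print(info, complete)
```

When `complete` is false, the caller can either keep only the basic information or report a failure, matching the two fallbacks described above.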
In the technical solution of this embodiment, messages published in an instant communication software application are monitored and parsed. Because such messages are both clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the matched message content (or the matched message content together with its related content), allows the characterization information of a specific object to be captured automatically. This saves labor cost, improves the professionalism and accuracy of the captured characterization information, and makes it easier to improve the specific object based on that information.
In this embodiment, establishing the feature recognition dictionary may specifically include:
receiving manually configured keywords for the feature recognition dictionary; or,
searching the chat history of the instant communication software for manually collected typical sentences, mining keywords that express the corresponding feature according to the context co-occurrence relations of those typical sentences, and adding the keywords to the feature recognition dictionary.
In other words, each keyword in the feature recognition dictionary may be configured manually, for example keywords such as "problem", "defect" or "improvement".
Alternatively, some typical sentences may be collected manually and, based on the context co-occurrence relations of these typical sentences in the chat history, the feature-expressing words that co-occur with them at or above a certain frequency are taken as keywords and added to the feature recognition dictionary; or a semantic template expressing the feature is mined.
For example, in the research and development group for the "Baidu Browser" product in Baidu Hi, one person says "query=xxx, figure mistake, so-and-so please take a look", and another replies "Quite right, that is a problem; defect recorded". If "figure mistake" and "defect recorded" repeatedly appear as a pair in the group's messages, the two phrases are considered to have a co-occurrence relation, indicating a defect that needs to be recorded. On this basis, the semantic template "[any word] figure mistake" expressing a defect can be mined.
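Mining keywords from context co-occurrence can be sketched as a simple count over adjacent messages. This is a simplified sketch, assuming that "context" means the message immediately before a typical sentence and that candidate phrases are supplied in advance; `mine_cooccurring_phrases` and its parameters are hypothetical names.

```python
from collections import Counter


def mine_cooccurring_phrases(history, typical_sentence, candidates, min_count=2):
    """Return candidate phrases that appear, at least min_count times, in the
    message immediately preceding an occurrence of the typical sentence."""
    counts = Counter()
    for previous, current in zip(history, history[1:]):
        if typical_sentence in current:
            for phrase in candidates:
                if phrase in previous:
                    counts[phrase] += 1
    return [phrase for phrase, n in counts.items() if n >= min_count]


history = [
    "query=a, figure mistake, please take a look",
    "quite right, that is a problem; defect recorded",
    "lunch?",
    "query=b, figure mistake again",
    "defect recorded",
    "the page feels slow",
    "ok",
]
keywords = mine_cooccurring_phrases(history, "defect recorded", ["figure mistake", "slow"])
print(keywords)
```

Phrases that pass the frequency threshold would then be added to the feature recognition dictionary or generalized into a template such as "[any word] figure mistake".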
The information mining method provided by this embodiment can be applied to many scenarios. For example, defect description information for a product can be obtained using a feature recognition dictionary established for product defects; debugging problem descriptions for a product can be obtained using a feature recognition dictionary established for product debugging; and description information such as collected opinions on enterprise management matters can be obtained using a feature recognition dictionary established for enterprise management. This embodiment places no limit on this.
Specifically, when the method is used to capture defect description information for a product, the keywords in the feature recognition dictionary include keywords reflecting product defects, and the characterization information is information describing product defects. This implementation provides a fully automatic way of mining defect-related content from product groups, capturing it, and finally saving it to a designated space, and it can cover the major product groups of all of an enterprise's product lines.
Embodiment two
Referring to Fig. 2, which is a flow chart of an information mining method provided by Embodiment two of the present invention. On the basis of the above embodiment, this embodiment provides a preferred implementation of the steps performed before monitoring messages published in the instant communication software application. The preferred method includes operations 210 to 220.
210. After obtaining access rights to the server corresponding to the instant communication software application, establish a connection with the server.
For example, access rights to the server corresponding to the instant communication software application "Baidu Hi" are obtained, and a connection is established with that server.
220. Send to the server a request to join a group account or personal user account in the instant communication software application.
For example, a request to join the group account "Baidu Browser - R&D group" is sent to the server corresponding to the instant communication software application "Baidu Hi", so that the newly joined group member can publish messages related to the product "Baidu Browser" in that group.
As another example, a join request for a personal user account is sent to the server corresponding to the instant communication software application "Baidu Hi". The newly added personal account can chat about the same product with other personal accounts already in the application, and these chats form published messages; the newly added personal account can also apply to join a group account already in the application, so that the newly joined group member can publish messages in that group.
In the technical solution of this embodiment, before messages published in the instant communication software application are monitored, a connection is established with the server corresponding to the instant communication software application and an account join request is exchanged, so that the account joined to the instant communication software application can publish messages in that application.
It should be noted that, after the join request for a group account or personal user account in the instant communication software application is sent to the server, monitoring messages published in the instant communication software application specifically includes: after receiving a response message returned by the server indicating that the join succeeded, monitoring messages published by users in the joined group or by the joined personal user.
Embodiment three
Referring to Fig. 3a, which is a flow chart of an information mining method provided by Embodiment three of the present invention. On the basis of the above embodiments, this embodiment provides a preferred implementation of the steps performed after the message content, or the message content together with its related content, is captured as characterization information and before the characterization information is saved.
The preferred method includes operations 310 to 360.
310. Monitor messages published in an instant communication software application.
320. Parse the monitored message to obtain the message content.
330. Match the message content against the keywords in the pre-established feature recognition dictionary.
340. When the match succeeds, capture the message content, or the message content together with related content of the message content, as characterization information.
350. Match the characterization information against the keywords in a pre-established category identification dictionary, and determine the category corresponding to the characterization information according to the matching result.
As noted above, the information mining method provided by the embodiments of the present invention can be applied to many scenarios, so a category identification dictionary covering multiple application requirements can be established according to actual needs.
The keywords in the category identification dictionary may be configured manually. They may include: Baidu Map R&D defect, Baidu Browser debugging defect, Baidu Translate R&D improvement, and so on. This embodiment places no limit on this.
360. Save the determined category in association with the characterization information.
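Operations 350 and 360 can be sketched together as below. This is an illustrative sketch only: the source lists category names but not how a category identification dictionary is structured, so the keyword sets per category, the hit-count scoring, and the names `CATEGORY_DICTIONARY`, `determine_category` and `save_with_category` are assumptions.

```python
# Hypothetical category identification dictionary: category -> keywords.
CATEGORY_DICTIONARY = {
    "Baidu Map R&D defect": {"baidu map", "defect"},
    "Baidu Browser debugging defect": {"baidu browser", "debugging", "defect"},
}


def determine_category(characterization, default="Other"):
    """Operation 350 (sketch): pick the category with the most keyword hits."""
    text = characterization.lower()
    best, best_hits = default, 0
    for category, keywords in CATEGORY_DICTIONARY.items():
        hits = sum(1 for k in keywords if k in text)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def save_with_category(store, characterization):
    """Operation 360 (sketch): save the category with the characterization."""
    record = {
        "category": determine_category(characterization),
        "characterization": characterization,
    }
    store.append(record)
    return record


store = []
print(save_with_category(store, "Baidu Map route defect when zooming"))
```

Here `store` is a plain list standing in for whatever persistent storage the device uses.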
In the technical solution of this embodiment, messages published in an instant communication software application are monitored and parsed. Because such messages are both clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary, and capturing the matched message content (or the matched message content together with its related content), allows the characterization information of a specific object to be captured automatically. This saves labor cost, improves the professionalism and accuracy of the captured characterization information, and makes it easier to improve the specific object based on that information. After the characterization information of the object is captured, the category corresponding to it is determined and saved in association with it. This makes it possible to bind each category to the group responsible for it, so that the responsible group learns of valuable feedback about the object in a timely manner from the specialized characterization information.
It should be noted that operation 350 is only one implementation of determining the category corresponding to the characterization information. The category may also be determined by a natural language processing (NLP) model (operation 351 shown in Fig. 3b).
Specifically, a semantic similarity algorithm model and/or a click similarity algorithm model may be used to determine the category corresponding to the characterization information.
Semantic similarity analyzes how similar two text segments are, using a supervised model trained in the natural language processing cloud backend. The larger the value, the more similar the texts. A networked semantic similarity service provides the function for calculating the similarity. For example, for the inputs "laptop" and "notebook", the semantic similarity is 2.08478.
Click similarity may be used when the semantic similarity fails to reach a threshold (for example 1.8). It analyzes the click similarity of two text segments (such as a query and a title in the search results) by computing the cosine similarity of trained embedding vectors. The value range is [-1, 1], and a larger value indicates stronger click similarity. For example, the click similarity of the inputs "Hello, Baidu" and "Hello, Zhou Hongyi" is -0.121407, while that of "Hello, Baidu" and "Hello, Li Yanhong" is 0.218664; the latter is higher than the former.
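The cosine similarity used for click similarity can be written out directly. The sketch below shows only the standard formula on plain Python lists; how the embedding vectors are trained is not described in the source, so the vectors here are placeholders.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1].

    Returns 0.0 when either vector has zero length, which is a convention
    chosen for this sketch rather than something the source specifies.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder embedding vectors, for illustration only.
print(cosine_similarity([1.0, 2.0, 0.5], [0.9, 1.8, 0.7]))
```

Vectors pointing in the same direction score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1, which is why the range quoted above is [-1, 1].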
In actual use, semantic similarity is evaluated first between the characterization information and each of the preset categories, and the category whose semantic similarity reaches the threshold and is the highest is returned. If the semantic similarity between the characterization information and the preset categories does not reach the threshold, click similarity is then evaluated between the characterization information and the preset categories. If the click similarity reaches its threshold, the corresponding category is returned; otherwise a default category (for example "Other") is returned. The thresholds may be continually refitted from historical data to maintain high accuracy.
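The two-stage decision described above can be sketched as a small cascade. The similarity functions are passed in as parameters because the source does not expose the actual models; `choose_category` is a hypothetical name, the semantic threshold 1.8 comes from the text, and the click threshold 0.2 is an assumed value for illustration.

```python
def choose_category(text, categories, semantic_sim, click_sim,
                    semantic_threshold=1.8, click_threshold=0.2,
                    default="Other"):
    """Try semantic similarity first, then click similarity, then the default."""
    if categories:
        best = max(categories, key=lambda c: semantic_sim(text, c))
        if semantic_sim(text, best) >= semantic_threshold:
            return best
        best = max(categories, key=lambda c: click_sim(text, c))
        if click_sim(text, best) >= click_threshold:
            return best
    return default


# Stub similarity functions standing in for the real models (assumptions).
def stub_semantic(text, category):
    return {"defect": 2.0, "improvement": 1.0}[category]


def stub_click(text, category):
    return {"defect": 0.1, "improvement": 0.5}[category]


print(choose_category("login fails", ["defect", "improvement"], stub_semantic, stub_click))
```

Keeping the thresholds as parameters reflects the note that they may be refitted from historical data over time.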
It should also be noted that the category corresponding to the characterization information may alternatively be determined with a probability model trained in advance on feature description texts labeled with category information. The input of the probability model is a feature description text and its output is the probability of belonging to a given category (operation 352 shown in Fig. 3c). Specifically, a probability model is trained in advance on the labeled feature description texts; the characterization information is input to the model to obtain a category A and the probability value for category A; if the probability value meets a certain threshold, the category corresponding to the characterization information is determined to be category A. For example, a model of P(type | characterization information) can be trained from manually assigned category labels in chat records and the corresponding description texts. The training method can be chosen flexibly according to the business domain of the system; a typical choice is the naive Bayes method. In application, if the probability that a user's problem description belongs to a certain problem category meets a certain threshold, the description is considered to belong to that category.
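A naive Bayes model for P(type | characterization information) can be sketched in a few lines. This is a minimal multinomial naive Bayes with add-one smoothing and whitespace tokenization, written from scratch for illustration; the class name, the 0.6 threshold and the training texts are assumptions, not details from the source.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        words = [w for w in text.lower().split() if w in self.vocab]
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())  # subtract the max for stability
        exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def classify_with_threshold(model, text, threshold=0.6, default="Other"):
    """Return the most probable category only if it meets the threshold."""
    probs = model.predict_proba(text)
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else default


model = NaiveBayes().fit(
    ["login permission bug", "crash defect on start",
     "release schedule confirmed", "press conference date"],
    ["defect", "defect", "release", "release"],
)
print(classify_with_threshold(model, "login bug crash"))
```

Texts with no known words fall back to the class priors, so with balanced training data they stay below the threshold and receive the default category.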
On the basis of this embodiment, after the category corresponding to the characterization information is determined, the following operations may also be included:
determining information of a recipient of the characterization information according to the category;
sending the characterization information to the recipient according to the information of the recipient.
The information of the recipient may be the address of a designated website, a designated SMS number, the mailbox address of a receiving user, or the instant communication software account of a designated receiving user.
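Routing the characterization information to a recipient by category can be sketched as a lookup followed by a channel-specific send. Everything named here is hypothetical: the `RECIPIENTS` table, the channel names, the example addresses and the `route` function are illustrative, and the sender callables stand in for real e-mail, SMS or instant communication clients.

```python
# Hypothetical mapping from category to recipient information.
RECIPIENTS = {
    "Baidu Map R&D defect": {"channel": "email", "address": "map-rd@example.com"},
    "Baidu Browser debugging defect": {"channel": "im", "address": "browser-debug-group"},
}


def route(category, characterization, senders):
    """Look up the recipient for the category and send the information.

    `senders` maps a channel name to a callable(address, text). Returns
    False when no recipient is configured for the category.
    """
    recipient = RECIPIENTS.get(category)
    if recipient is None:
        return False
    senders[recipient["channel"]](recipient["address"], characterization)
    return True


outbox = []
senders = {
    "email": lambda address, text: outbox.append(("email", address, text)),
    "im": lambda address, text: outbox.append(("im", address, text)),
}
route("Baidu Map R&D defect", "Route defect when zooming", senders)
print(outbox)
```

Injecting the senders keeps the routing logic independent of any particular delivery mechanism.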
This implementation provides a way for the recipient to learn of an object's characterization information after that information is captured and its category is determined. The recipient is the group responsible for the category, and the object's specialized characterization information is delivered to that group, so that the responsible group learns of valuable feedback about the object in a timely manner.
Example IV
Referring to Fig. 4, a structural diagram of an information mining device provided by Embodiment 4 of the present invention is shown. The device includes: a message monitoring module 410, a message parsing module 420, a matching module 430, and a characterization information processing module 440.
The message monitoring module 410 is configured to monitor messages published in an instant messaging application. The message parsing module 420 is configured to parse the monitored messages to obtain message content. The matching module 430 is configured to match the message content against keywords in a pre-established feature recognition dictionary. The characterization information processing module 440 is configured to, when the match succeeds, capture the message content, or the message content and the related content of the message content, as characterization information, and to save the characterization information.
In the technical solution of this embodiment, messages published in an instant messaging application are monitored and parsed. Because messages published in instant messaging applications are both clearly categorized and highly professional, matching the parsed message content against keywords in a pre-established feature recognition dictionary, and capturing either the matched message content alone or the matched message content together with its related content, allows the characterization information of a specific object to be captured automatically. This saves labor cost, improves the professionalism and accuracy of the captured characterization information of the specific object, and helps improve the specific object on the basis of that information.
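The flow through modules 420 to 440 can be sketched as a short pipeline over messages that have already been received by the monitoring step. This is a minimal illustration, not the patented implementation: the JSON wire format, the `text` field, the example keywords, and the in-memory list used as storage are all assumptions.

```python
import json

# Hypothetical feature recognition dictionary.
FEATURE_KEYWORDS = {"crash", "freeze", "cannot log in"}


def parse_message(raw):
    """Parse one raw message (assumed to be JSON) into its text content."""
    return json.loads(raw)["text"]


def match_keywords(content, keywords):
    """Return the dictionary keywords that occur in the message content."""
    lowered = content.lower()
    return sorted(k for k in keywords if k in lowered)


def mine(raw_messages, keywords, store):
    """Parse each monitored message, match it, and save the matches."""
    for raw in raw_messages:
        content = parse_message(raw)
        hits = match_keywords(content, keywords)
        if hits:  # successful match: capture as characterization information
            store.append({"content": content, "keywords": hits})
    return store
```

Substring matching is used here only for brevity; a real dictionary match would likely need tokenization suited to the language of the messages.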
In the above solution, the device may further include a connection establishment module and a request sending module.
The connection establishment module is configured to, before the messages published in the instant messaging application are monitored, establish a connection with the server corresponding to the instant messaging application after access rights to that server have been obtained. The request sending module is configured to send to the server a request to join a group account, or to add a personal user account, in the instant messaging application. The message monitoring module 410 is specifically configured to, after receiving a response message returned by the server indicating that the request succeeded, monitor the messages published by users in the joined group or by the added personal user.
In the above solution, the device may further include a feature recognition dictionary establishing module, configured to receive manually configured keywords for the feature recognition dictionary; or,
configured to search the chat history of the instant messaging software for manually collected typical sentences, to mine keywords expressing the corresponding features according to the contextual co-occurrence relations of the typical sentences, and to add those keywords to the feature recognition dictionary.
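One simple reading of "contextual co-occurrence" is to count the words that appear in and around each occurrence of a typical sentence and keep those seen often enough. The sketch below follows that reading; the window size, the stopword list, the `min_count` cutoff, and the function name are assumptions, since the patent does not specify the mining procedure.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "this", "is", "a", "on", "yes", "too", "after"}


def mine_keywords(history, typical_sentences, window=1, min_count=2):
    """Mine candidate keywords that co-occur with typical sentences.

    For every message in `history` equal to a typical sentence, count the
    distinct non-stopword tokens of that message and of its neighbours
    within `window` positions. Tokens counted at least `min_count` times
    are returned as candidates for the feature recognition dictionary.
    """
    typical = set(typical_sentences)
    counts = Counter()
    for i, message in enumerate(history):
        if message not in typical:
            continue
        lo = max(0, i - window)
        hi = min(len(history), i + window + 1)
        for context in history[lo:hi]:
            counts.update(set(context.lower().split()) - STOPWORDS)
    return sorted(word for word, c in counts.items() if c >= min_count)
```

Candidates produced this way would normally still be reviewed before being added to the dictionary, because raw co-occurrence also picks up incidental words.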
In the above solution, the device may further include a first category determining module, a second category determining module, or a third category determining module.
The first category determining module is configured to, after the message content, or the message content and the related content of the message content, is captured as characterization information and before the characterization information is saved, match the characterization information against keywords in a pre-established category recognition dictionary and determine the category corresponding to the characterization information according to the matching result. The second category determining module is configured to, after the message content, or the message content and the related content of the message content, is captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information by means of a natural language processing (NLP) model. The third category determining module is configured to, after the message content, or the message content and the related content of the message content, is captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information using a probabilistic model trained in advance on feature description texts labeled with category information. The characterization information processing module 440 is specifically configured to save the determined category in association with the characterization information.
The second category determining module is specifically configured to determine the category corresponding to the characterization information using a semantic similarity algorithm model and/or a click similarity algorithm model.
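The patent does not say which semantic similarity algorithm is used. As a stand-in, the sketch below scores a description against per-category example texts with bag-of-words cosine similarity; the exemplar texts, the 0.3 cutoff, and the function names are assumptions, and the click similarity model is not sketched because the text gives no detail about it.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts over lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_by_similarity(description, exemplars, min_similarity=0.3):
    """Return the category of the most similar exemplar, or None.

    `exemplars` maps a category to a list of example texts for it.
    """
    vec = vectorize(description)
    best_label, best_score = None, 0.0
    for label, examples in exemplars.items():
        for example in examples:
            score = cosine(vec, vectorize(example))
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= min_similarity else None


# Hypothetical category exemplars.
EXEMPLARS = {
    "crash": ["app crashes on startup"],
    "payment": ["payment cannot be completed"],
}
```

Word overlap is a crude proxy for meaning; an embedding-based similarity could be substituted for `cosine` without changing the surrounding logic.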
Further, the device may also include a recipient information determining module and a characterization information sending module.
The recipient information determining module is configured to, after the category corresponding to the characterization information is determined, determine information of the recipient of the characterization information according to the category. The characterization information sending module is configured to send the characterization information to the recipient according to the information of the recipient.
The information of the recipient may be the address of a set website, or the SMS number, email address, or instant messaging account of a set receiving user.
The related content of the message content may include: context messages of the message content; and/or supplementary content returned by the user who published the message content, after a session has been established with that user and a message content supplement request has been sent to the user.
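The two kinds of related content can be gathered as follows. This is a sketch under assumptions: the conversation is modelled as a plain list of strings, the window size and prompt wording are invented for illustration, and `ask_user` is a hypothetical callback standing in for opening a session with the user and waiting for the reply.

```python
def context_messages(history, index, window=2):
    """Return the messages surrounding history[index], excluding itself."""
    lo = max(0, index - window)
    hi = min(len(history), index + window + 1)
    return history[lo:index] + history[index + 1:hi]


def build_description(history, index, ask_user=None, window=2):
    """Assemble characterization information for a matched message.

    `ask_user` (optional) takes a prompt string and returns the user's
    supplementary content; when omitted, only context is collected.
    """
    record = {
        "content": history[index],
        "context": context_messages(history, index, window),
    }
    if ask_user is not None:
        record["supplement"] = ask_user(
            "Could you describe the problem in more detail?"
        )
    return record
```

Because `ask_user` is injected, the same function works for the context-only case and for the case where a supplement request is also sent.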
In the above solution, the keywords in the feature recognition dictionary may include keywords that reflect product defects; correspondingly, the characterization information may be information describing product defects.
The information mining device provided by the embodiments of the present invention can perform the information mining method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method performed.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include further equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims (16)

1. An information mining method, characterized by comprising:
Monitoring messages published in an instant messaging application;
Parsing the monitored messages to obtain message content;
Matching the message content against keywords in a pre-established feature recognition dictionary;
When the match succeeds, capturing the message content and the related content of the message content as characterization information, and saving the characterization information;
Wherein the related content of the message content includes: supplementary content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to the user; or, context messages of the message content together with supplementary content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to the user.
2. The method of claim 1, characterized in that, before monitoring the messages published in the instant messaging application, the method further comprises:
After obtaining access rights to the server corresponding to the instant messaging application, establishing a connection with the server;
Sending to the server a request to join a group account, or to add a personal user account, in the instant messaging application;
Monitoring the messages published in the instant messaging application specifically comprises:
After receiving a response message returned by the server indicating that the request succeeded, monitoring the messages published by users in the joined group or by the added personal user.
3. The method of claim 1, characterized in that establishing the feature recognition dictionary specifically comprises:
Receiving manually configured keywords for the feature recognition dictionary; or,
Searching the chat history of the instant messaging software for manually collected typical sentences, mining keywords expressing the corresponding features according to the contextual co-occurrence relations of the typical sentences, and adding those keywords to the feature recognition dictionary.
4. The method of claim 1, characterized in that, after capturing the message content and the related content of the message content as characterization information and before saving the characterization information, the method further comprises:
Matching the characterization information against keywords in a pre-established category recognition dictionary, and determining the category corresponding to the characterization information according to the matching result; or, determining the category corresponding to the characterization information by means of a natural language processing (NLP) model; or, determining the category corresponding to the characterization information using a probabilistic model trained in advance on feature description texts labeled with category information;
Saving the characterization information comprises: saving the determined category in association with the characterization information.
5. The method of claim 4, characterized in that determining the category corresponding to the characterization information by means of a natural language processing (NLP) model specifically comprises:
Determining the category corresponding to the characterization information using a semantic similarity algorithm model and/or a click similarity algorithm model.
6. The method of claim 4, characterized in that, after the category corresponding to the characterization information is determined, the method further comprises:
Determining information of the recipient of the characterization information according to the category;
Sending the characterization information to the recipient according to the information of the recipient.
7. The method of claim 6, characterized in that the information of the recipient is the address of a set website, or the SMS number, email address, or instant messaging account of a set receiving user.
8. The method of any one of claims 1-7, characterized in that the keywords in the feature recognition dictionary include keywords that reflect product defects, and the characterization information is information describing product defects.
9. An information mining device, characterized by comprising:
A message monitoring module, configured to monitor messages published in an instant messaging application;
A message parsing module, configured to parse the monitored messages to obtain message content;
A matching module, configured to match the message content against keywords in a pre-established feature recognition dictionary;
A characterization information processing module, configured to, when the match succeeds, capture the message content and the related content of the message content as characterization information, and save the characterization information;
Wherein the related content of the message content includes: supplementary content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to the user; or, context messages of the message content together with supplementary content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to the user.
10. The device of claim 9, characterized in that the device further comprises:
A connection establishment module, configured to, before the messages published in the instant messaging application are monitored, establish a connection with the server corresponding to the instant messaging application after access rights to that server have been obtained;
A request sending module, configured to send to the server a request to join a group account, or to add a personal user account, in the instant messaging application;
The message monitoring module is specifically configured to: after receiving a response message returned by the server indicating that the request succeeded, monitor the messages published by users in the joined group or by the added personal user.
11. The device of claim 9, characterized in that the device further comprises a feature recognition dictionary establishing module, configured to receive manually configured keywords for the feature recognition dictionary; or,
configured to search the chat history of the instant messaging software for manually collected typical sentences, to mine keywords expressing the corresponding features according to the contextual co-occurrence relations of the typical sentences, and to add those keywords to the feature recognition dictionary.
12. The device of claim 9, characterized in that the device further comprises:
A first category determining module, configured to, after the message content and the related content of the message content are captured as characterization information and before the characterization information is saved, match the characterization information against keywords in a pre-established category recognition dictionary and determine the category corresponding to the characterization information according to the matching result; or
A second category determining module, configured to, after the message content and the related content of the message content are captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information by means of a natural language processing (NLP) model; or
A third category determining module, configured to, after the message content and the related content of the message content are captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information using a probabilistic model trained in advance on feature description texts labeled with category information;
The characterization information processing module is specifically configured to: save the determined category in association with the characterization information.
13. The device of claim 12, characterized in that the second category determining module is specifically configured to: determine the category corresponding to the characterization information using a semantic similarity algorithm model and/or a click similarity algorithm model.
14. The device of claim 12, characterized in that the device further comprises:
A recipient information determining module, configured to, after the category corresponding to the characterization information is determined, determine information of the recipient of the characterization information according to the category;
A characterization information sending module, configured to send the characterization information to the recipient according to the information of the recipient.
15. The device of claim 14, characterized in that the information of the recipient is the address of a set website, or the SMS number, email address, or instant messaging account of a set receiving user.
16. The device of any one of claims 9-15, characterized in that the keywords in the feature recognition dictionary include keywords that reflect product defects, and the characterization information is information describing product defects.
CN201410710424.7A 2014-11-27 2014-11-27 information mining method and device Active CN104346480B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410710424.7A CN104346480B (en) 2014-11-27 2014-11-27 information mining method and device
PCT/CN2015/086095 WO2016082575A1 (en) 2014-11-27 2015-08-05 Information mining method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410710424.7A CN104346480B (en) 2014-11-27 2014-11-27 information mining method and device

Publications (2)

Publication Number Publication Date
CN104346480A CN104346480A (en) 2015-02-11
CN104346480B true CN104346480B (en) 2018-06-26

Family

ID=52502071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410710424.7A Active CN104346480B (en) 2014-11-27 2014-11-27 information mining method and device

Country Status (2)

Country Link
CN (1) CN104346480B (en)
WO (1) WO2016082575A1 (en)

Also Published As

Publication number Publication date
CN104346480A (en) 2015-02-11
WO2016082575A1 (en) 2016-06-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant