CN104346480B - information mining method and device - Google Patents
- Publication number
- CN104346480B
- Authority
- CN
- China
- Prior art keywords
- information
- message
- message content
- characterization information
- content
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Abstract
Embodiments of the present invention provide an information mining method and device. The method includes: monitoring messages published in an instant messaging application; parsing each monitored message to obtain its message content; matching the message content against the keywords in a pre-established feature recognition dictionary; and, when the match succeeds, capturing the message content, or the message content together with its related content, as feature description information, and saving the feature description information. Because messages published in an instant messaging application are clearly categorized and highly specialized, matching the parsed message content against the keywords in the feature recognition dictionary and capturing the successfully matched message content, or that content together with its related content, makes it possible to capture the feature description information of a specific object automatically, which saves labor cost and improves the professional quality and accuracy of that feature description information.
Description
Technical field
Embodiments of the present invention relate to the field of information technology, and in particular to an information mining method and device.
Background art
In the prior art, information related to objects such as products or services, for example descriptions of product defects that help improve a product, is typically collected manually from forums or web pages in the relevant field, which is inefficient and not very accurate.
Summary of the invention
Embodiments of the present invention provide an information mining method and device that capture the feature information of a specific object automatically, save labor cost, and improve the accuracy of the captured feature information.
In a first aspect, an embodiment of the present invention provides an information mining method, including:
monitoring messages published in an instant messaging application;
parsing each monitored message to obtain its message content;
matching the message content against the keywords in a pre-established feature recognition dictionary; and
when the match succeeds, capturing the message content, or the message content together with its related content, as feature description information, and saving the feature description information.
In a second aspect, an embodiment of the present invention further provides an information mining device, including:
a message monitoring module, configured to monitor messages published in an instant messaging application;
a message parsing module, configured to parse each monitored message to obtain its message content;
a matching module, configured to match the message content against the keywords in a pre-established feature recognition dictionary; and
a feature description information processing module, configured to, when the match succeeds, capture the message content, or the message content together with its related content, as feature description information, and save the feature description information.
The information mining method and device provided by the embodiments of the present invention monitor and parse the messages published in an instant messaging application. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary and capturing the successfully matched message content, or that content together with its related content, makes it possible to capture the feature description information of a specific object automatically. This saves labor cost, improves the professional quality and accuracy of the feature description information, and helps improve the specific object on the basis of that information.
Description of the drawings
Fig. 1 is a flowchart of an information mining method provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of an information mining method provided by Embodiment two of the present invention;
Fig. 3a is a flowchart of an information mining method provided by Embodiment three of the present invention;
Fig. 3b is a flowchart of another information mining method provided by Embodiment three of the present invention;
Fig. 3c is a flowchart of yet another information mining method provided by Embodiment three of the present invention;
Fig. 4 is a structural diagram of an information mining device provided by Embodiment four of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Referring to Fig. 1, a flowchart of an information mining method provided by Embodiment one of the present invention is shown. The method of this embodiment may be performed by an information mining device implemented in hardware and/or software, which is typically deployed on a server capable of providing data mining services.
The method includes operations 110 to 140.
110. Monitor messages published in an instant messaging application.
In general, an enterprise has an instant messaging application associated with its products or departments, in which the teams responsible for developing each product, or for operating and maintaining it, can conveniently publish messages.
For example, Baidu Hi, released by Baidu, is an instant messaging application that combines text messaging, voice and video calls, file transfer, and similar functions. Groups corresponding to products such as "Baidu Map" or "Baidu Translate" are set up in Baidu Hi so that the Baidu staff responsible for developing, operating, or maintaining each product can publish messages conveniently.
A message may be published in several forms: as text, or as voice, video, pictures, and so on. This embodiment places no limit on the form, as long as the instant messaging application supports it.
Specifically, this operation monitors the text messages published in groups of the instant messaging application that are related to the enterprise's products or departments.
120. Parse the monitored message to obtain its message content.
In this operation, the monitored message is decoded according to the communication protocol of the instant messaging application, so that the original data corresponding to the message is correctly restored, that is, a readable character string is recovered.
130. Match the message content against the keywords in the pre-established feature recognition dictionary.
Specifically, this operation uses keyword matching to determine, against the pre-established feature recognition dictionary, whether the message content contains any keyword in that dictionary.
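As a minimal sketch of the matching in operation 130, the following Python function checks which dictionary keywords occur in a message. The function name, the sample dictionary, and the choice of case-insensitive substring matching are assumptions made for illustration; the embodiment does not prescribe a particular matching technique.

```python
def match_keywords(message_content: str, dictionary: set[str]) -> list[str]:
    """Return the dictionary keywords found in the message, in sorted order."""
    text = message_content.lower()
    return sorted(kw for kw in dictionary if kw.lower() in text)


# Hypothetical feature recognition dictionary for a debugging group.
DEBUG_DICTIONARY = {"bug", "defect", "problem", "debugging error"}

hits = match_keywords(
    "When logging in to Baidu Browser, the login permissions have a problem",
    DEBUG_DICTIONARY,
)
matched = bool(hits)  # a non-empty result counts as a successful match
```

Substring matching is used here because it needs no word segmentation; a production system would likely pair the dictionary with a tokenizer suited to the language of the messages.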
It should be noted that the groups corresponding to different objects in an enterprise publish different messages, so the parsed message content differs as well. A group is clearly categorized, highly specialized, and has distinct language characteristics (for example, the members of a group belong to one category or work on the same product, and share the same or a similar professional background), so the messages published by different groups reflect information about the enterprise's objects.
An object may be concrete, such as an individual product, or broad, such as enterprise management.
For example, the group corresponding to the "Baidu Map" product is the group at Baidu responsible for developing, operating, or maintaining "Baidu Map", and the messages its members publish contain information about the product's strengths and weaknesses or about follow-up improvements to the product.
As another example, the messages published by members of the debugging group for the "Baidu Browser" product contain the bugs or suspected problems that arise while the product is being debugged.
Therefore, a separate feature recognition dictionary can be established for the group corresponding to each object of the enterprise, so as to obtain the feature description information of different objects (such as different products or enterprise management), for example the strengths and weaknesses of different products or the problems in enterprise management. For different groups concerned with the same object, a separate feature recognition dictionary is preferably established for each group, so as to obtain feature description information about the same object at different levels.
For example, a feature recognition dictionary related to research and development is established for the R&D group of the "Baidu Map" product; its keywords may include "R&D", "progress", "trend", "cost", and "competitor". A feature recognition dictionary related to debugging is established for the debugging group of the "Baidu Map" product; its keywords may include "debugging error", "debugging cycle", "bug", "vulnerability", and "defect". A feature recognition dictionary related to releases is established for the release group of the "Baidu Map" product; its keywords may include "release", "press conference", "release schedule", and "release date".
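The per-group dictionaries described above could be held in a simple mapping from group name to keyword set, as in the following sketch. The group names, the English keyword renderings, and the helper names are illustrative assumptions, not part of the embodiment.

```python
# Hypothetical mapping: one feature recognition dictionary per group.
FEATURE_DICTIONARIES = {
    "Baidu Map R&D group": {"R&D", "progress", "trend", "cost", "competitor"},
    "Baidu Map debugging group": {
        "debugging error", "debugging cycle", "bug", "vulnerability", "defect",
    },
    "Baidu Map release group": {
        "release", "press conference", "release schedule", "release date",
    },
}


def keywords_for_group(group_name: str) -> set[str]:
    """Look up the dictionary for a group; unknown groups get an empty set."""
    return FEATURE_DICTIONARIES.get(group_name, set())


def matches(group_name: str, message_content: str) -> bool:
    """True if the message contains any keyword from its group's dictionary."""
    text = message_content.lower()
    return any(kw.lower() in text for kw in keywords_for_group(group_name))
```

Keeping one dictionary per group means the same message text can match in the debugging group and be ignored in the release group, which is how the same object yields feature descriptions at different levels.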
140. When the match succeeds, capture the message content, or the message content together with its related content, as feature description information, and save the feature description information.
This operation can be implemented in two ways. In the first, when the match succeeds, the message content is captured as feature description information and saved. In the second, when the match succeeds, the message content and its related content are captured together as feature description information and saved.
Preferably, the message content and its related content are captured together as feature description information and saved. Compared with capturing the message content alone, this approach helps obtain complete feature description information about the object.
A capture time interval and/or a capture count can be set for capturing the related content of a successfully matched message; for example, the capture time interval is set to 15 s and the capture count to 5.
Further, the related content of the message content may include: the context messages of the message content; and/or the supplementary content returned by the user who published the message content, after a session is established with that user and a content supplement request is sent to the user.
Example 1
Take the "Baidu Browser" product as the object. The messages published in a group for this product contain many evaluations and discussions of problems concerning the product. For example, one designer of the product publishes the message "When logging in to Baidu Browser, the login permissions have a problem" in the development group, and another designer then publishes the message "Indeed, the cause is A" in the same group. After the matching operation, the published message "When logging in to Baidu Browser, the login permissions have a problem" matches the keyword "problem" in the feature recognition dictionary. Capturing this message content yields feature description information for a defect of the product, and capturing its context message "Indeed, the cause is A" yields feature description information for the cause of that defect, which enriches the feature description information of the product.
It should be noted that the above uses the capture of the feature description information for a product defect and for the cause of that defect as an illustration. Besides the cause of the defect, other feature description information, such as the corresponding solution, can also be captured as the complete information of the product defect and stored in a structured format (for example, [product name, defect content, cause]). This embodiment places no limit on this.
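One possible shape for the structured record [product name, defect content, cause] is sketched below. The field names, the optional `solution` field, and the JSON serialization are assumptions for illustration; the embodiment only requires that the information be stored in some format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DefectRecord:
    product_name: str
    defect_content: str
    cause: Optional[str] = None      # filled from a context message, if any
    solution: Optional[str] = None   # optional extra feature description


record = DefectRecord(
    product_name="Baidu Browser",
    defect_content="When logging in, the login permissions have a problem",
    cause="A",
)
# Serialize one record per line so records can be appended to a store.
line = json.dumps(asdict(record), ensure_ascii=False)
```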
Example 2
After a session is established with the user who published the message content, heuristic questions are used to send the user a content supplement request, asking for a complete description of the defect. Capture can then be session-based: when the defect is described along many dimensions (such as defect type and cause of the defect), a longer capture time (for example, one minute) is set, and the supplementary content returned by the user within that time is captured. If no supplementary description arrives within that time, either only the basic information is recorded, or a failure is returned because the necessary information is incomplete.
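The session-based capture with a time limit can be sketched with a queue that stands in for the user's replies. The function name, the result layout, and the use of `queue.Queue` are assumptions for illustration; this version takes the "record only the basic information" branch when the wait times out.

```python
import queue


def collect_supplement(replies: "queue.Queue[str]", timeout_seconds: float,
                       basic_info: str) -> dict:
    """Wait up to timeout_seconds for the user's supplementary description."""
    try:
        supplement = replies.get(timeout=timeout_seconds)
    except queue.Empty:
        # No reply arrived in time: keep only the basic information.
        return {"description": basic_info, "supplement": None}
    return {"description": basic_info, "supplement": supplement}


# Simulated session in which the user answers the supplement request.
answered = queue.Queue()
answered.put("defect type: permissions; cause: A")
with_reply = collect_supplement(answered, 0.1, "login permissions have a problem")

# Simulated session in which the user never answers.
without_reply = collect_supplement(queue.Queue(), 0.05,
                                   "login permissions have a problem")
```

In a real deployment the timeout would be closer to the one minute mentioned above; the short values here only keep the example quick to run.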
In the technical solution of this embodiment, messages published in an instant messaging application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary and capturing the successfully matched message content, or that content together with its related content, makes it possible to capture the feature description information of a specific object automatically. This saves labor cost, improves the professional quality and accuracy of the feature description information, and helps improve the specific object on the basis of that information.
In this embodiment, establishing the feature recognition dictionary may specifically include:
receiving manually configured keywords for the feature recognition dictionary; or
searching the chat history of the instant messaging application for manually collected typical sentences and, according to the contextual co-occurrence relations of those typical sentences, mining keywords that express the corresponding features and adding them to the feature recognition dictionary.
In other words, each keyword in the feature recognition dictionary can be configured manually; for example, keywords such as "problem", "defect", or "improvement" are configured in the dictionary.
Alternatively, some typical sentences can be collected manually, and, according to the contextual co-occurrence relations of those sentences in the chat history, words expressing the feature that reach a certain co-occurrence frequency with the typical sentences are taken as keywords and added to the feature recognition dictionary; or semantic templates that express the feature are mined.
For example, in the R&D group of the "Baidu Browser" product in Baidu Hi, one person says "query = xxx, image error, so-and-so please take a look", and another replies "Quite right, there is a problem; defect recorded". If the phrases "image error" and "defect recorded" appear paired in the group's messages many times, they are considered to have a co-occurrence relation, which indicates a defect that needs to be recorded. On this basis, the semantic template "[any word] image error" expressing a defect can be mined.
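The co-occurrence idea can be sketched as a count of the words that appear in the message immediately before each occurrence of a typical sentence. The function name, the adjacent-message notion of context, the regular-expression tokenization, and the frequency threshold are all assumptions for illustration; Chinese chat text would need a proper word segmenter.

```python
import re
from collections import Counter


def mine_cooccurring_terms(history, typical_sentence, min_count=2):
    """Return words that precede the typical sentence at least min_count times."""
    counts = Counter()
    for previous, current in zip(history, history[1:]):
        if typical_sentence in current.lower():
            # Count each word once per pairing, not once per repetition.
            counts.update(set(re.findall(r"\w+", previous.lower())))
    return {term for term, n in counts.items() if n >= min_count}


chat_history = [
    "query=abc image error, please check",
    "right, defect recorded",
    "query=xyz image error again",
    "yes, defect recorded",
    "lunch anyone",
    "sure",
]
candidates = mine_cooccurring_terms(chat_history, "defect recorded")
```

The surviving candidates would then be reviewed or filtered (for example, dropping generic words such as "query") before being added to the feature recognition dictionary.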
The information mining method provided by this embodiment can be applied in many scenarios. For example, defect description information for a product can be obtained with a feature recognition dictionary established for product defects; debugging problem description information for a product can be obtained with a feature recognition dictionary established for product debugging; and description information such as collected management opinions, where the object is enterprise management, can be obtained with a feature recognition dictionary established for enterprise management. This embodiment places no limit on this.
Specifically, when the method is used to capture defect description information for a product, the keywords in the feature recognition dictionary include keywords that reflect product defects, and the feature description information is information describing product defects. This implementation provides a fully automatic path from mining in product groups, through capturing defect-related content, to saving it in a designated space, and can cover the main product groups of all of an enterprise's product lines.
Embodiment two
Referring to Fig. 2, a flowchart of an information mining method provided by Embodiment two of the present invention is shown. On the basis of the above embodiment, this embodiment provides a preferred implementation of the steps performed before monitoring messages published in the instant messaging application. The preferred method includes operations 210 to 220.
210. After obtaining access rights to the server corresponding to the instant messaging application, establish a connection with the server.
For example, access rights to the server corresponding to the instant messaging application "Baidu Hi" are obtained, and a connection is established with that server.
220. Send the server a request to join a group account or a personal user account in the instant messaging application.
For example, a request to join the group account "Baidu Browser R&D group" is sent to the server corresponding to the instant messaging application "Baidu Hi", so that the newly joined group member can publish messages related to the product "Baidu Browser" in the group.
As another example, a join request for a personal user account is sent to the server corresponding to the instant messaging application "Baidu Hi". The newly joined personal account can chat about the same product with other personal accounts already in the application, producing published messages; it can also apply to join a group account already in the application, so that the newly joined group member publishes messages in that group.
In the technical solution of this embodiment, before messages published in the instant messaging application are monitored, a connection is established with the server corresponding to the application and an account join request is exchanged with it, so that the account that has joined the instant messaging application can publish messages in the application.
It should be noted that, after the request to join a group account or personal user account in the instant messaging application has been sent to the server, monitoring messages published in the application specifically includes: after receiving the response message returned by the server indicating that the join succeeded, monitoring the messages published by users in the joined group or by the added personal user.
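The ordering constraint described here, that monitoring starts only after the server confirms the join, can be sketched with an in-memory stand-in for the server. `FakeImServer`, its `request_join` method, and the response layout are entirely hypothetical and do not correspond to any real instant messaging API.

```python
class FakeImServer:
    """In-memory stand-in for an instant messaging server (not a real API)."""

    def __init__(self, joinable_groups):
        self.joinable_groups = set(joinable_groups)

    def request_join(self, group: str) -> dict:
        # The response says whether the join request succeeded.
        return {"group": group, "joined": group in self.joinable_groups}


def groups_to_monitor(server: FakeImServer, requested_groups) -> list:
    """Send one join request per group; keep only groups that were joined."""
    return [g for g in requested_groups if server.request_join(g)["joined"]]


server = FakeImServer({"Baidu Browser R&D group"})
monitored = groups_to_monitor(
    server, ["Baidu Browser R&D group", "Some private group"]
)
```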
Embodiment three
Referring to Fig. 3a, a flowchart of an information mining method provided by Embodiment three of the present invention is shown. On the basis of the above embodiments, this embodiment provides a preferred implementation of the steps performed after the message content, or the message content together with its related content, is captured as feature description information and before the feature description information is saved.
The preferred method includes operations 310 to 360.
310. Monitor messages published in an instant messaging application.
320. Parse the monitored message to obtain its message content.
330. Match the message content against the keywords in the pre-established feature recognition dictionary.
340. When the match succeeds, capture the message content, or the message content together with its related content, as feature description information.
350. Match the feature description information against the keywords in a pre-established category recognition dictionary, and determine the category corresponding to the feature description information according to the matching result.
As mentioned above, the information mining method provided by the embodiments of the present invention can be applied in many scenarios, so a category recognition dictionary covering multiple application needs can be established according to actual application requirements.
The keywords in the category recognition dictionary can be configured manually. They may include "Baidu Map R&D defect", "Baidu Browser debugging defect", "Baidu Translate R&D improvement", and so on; this embodiment places no limit on this.
360. Save the determined category in association with the feature description information.
In the technical solution of this embodiment, messages published in an instant messaging application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary and capturing the successfully matched message content, or that content together with its related content, makes it possible to capture the feature description information of a specific object automatically. This saves labor cost, improves the professional quality and accuracy of the feature description information, and helps improve the specific object on the basis of that information. After the feature description information of the object has been captured, the corresponding category is determined and saved in association with the feature description information. This makes it possible to bind each category to the group responsible for it, so that the responsible group learns of valuable feedback on the object in time from the specialized feature description information of the specific object.
It should be noted that operation 350 is only one way of determining the category corresponding to the feature description information. The category can also be determined by a natural language processing (NLP) model (operation 351 in Fig. 3b).
Specifically, a semantic similarity model and/or a click similarity model may be used to determine the category corresponding to the feature description information.
Semantic similarity uses a supervised model trained on a natural language processing cloud backend to analyze the similarity of two pieces of text; the larger the value, the more similar the texts. The semantic similarity service provides the similarity computation. For example, for the inputs "laptop" and "notebook", the semantic similarity is 2.08478.
Click similarity can be used when the semantic similarity does not reach the threshold (for example, 1.8). It analyzes the click similarity of two pieces of text (for example, a query and a title in the search results) by computing the cosine similarity of trained embedding vectors; the value lies in [-1, 1], and a larger value indicates stronger click similarity. For example, for the inputs "Hello Baidu" and "Hello Zhou Hongyi" the click similarity is -0.121407, while for "Hello Baidu" and "Hello Li Yanhong" it is 0.218664; the latter pair has the higher click similarity.
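The cosine similarity underlying the click similarity value can be written directly from its definition. The function below is a generic sketch; the toy vectors in the test are not real trained embeddings, and how the embeddings are trained is outside what the text specifies.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Identical directions give 1, orthogonal vectors give 0, and opposite directions give -1, which matches the stated value range.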
In actual use, semantic similarity is evaluated first between the feature description information and each of the preset categories, and the category whose semantic similarity reaches the threshold and is highest is returned. If the semantic similarity with the preset categories does not reach the threshold, click similarity is then evaluated between the feature description information and the preset categories. If the click similarity reaches its threshold, the corresponding category is returned; otherwise a default category (for example, "Other") is returned. The thresholds can be refitted continually against historical data to maintain high accuracy.
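The two-stage decision described above can be sketched as follows. The two similarity functions are passed in as callables because the underlying models are not specified; the function name and the click threshold of 0.2 are assumptions, while 1.8 is the example semantic threshold from the text.

```python
def choose_category(text, categories, semantic_sim, click_sim,
                    semantic_threshold=1.8, click_threshold=0.2,
                    default="Other"):
    """Try semantic similarity first, then click similarity, then the default."""
    best = max(categories, key=lambda c: semantic_sim(text, c))
    if semantic_sim(text, best) >= semantic_threshold:
        return best
    best = max(categories, key=lambda c: click_sim(text, c))
    if click_sim(text, best) >= click_threshold:
        return best
    return default


# Stub scores standing in for the real similarity models.
semantic_scores = {"defect": 2.1, "improvement": 0.5}
click_scores = {"defect": 0.1, "improvement": 0.3}

first_stage = choose_category(
    "login fails", ["defect", "improvement"],
    lambda t, c: semantic_scores[c], lambda t, c: click_scores[c],
)
second_stage = choose_category(
    "login fails", ["defect", "improvement"],
    lambda t, c: 0.0, lambda t, c: click_scores[c],
)
fallback = choose_category(
    "login fails", ["defect", "improvement"],
    lambda t, c: 0.0, lambda t, c: 0.0,
)
```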
It should also be noted that the category corresponding to the feature description information can also be determined with a probability model trained in advance on feature description texts labeled with category information. The input of the probability model is a feature description text, and its output is the probability that the text belongs to a given category (operation 352 in Fig. 3c). Specifically, the probability model is trained in advance on the labeled feature description texts; the feature description information is fed to the model, which outputs a category A and the probability of belonging to category A; if that probability meets a certain threshold, the category corresponding to the feature description information is determined to be category A. For example, the manual category labels in chat records and the corresponding description texts can be used to train a probability model of P(type | feature description information). The training method can be chosen flexibly according to the characteristics of the system's business domain; a typical choice is the naive Bayes method. In application, if the probability that a user's problem description belongs to a certain problem category meets a certain threshold, the description is considered to belong to that category.
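A small multinomial naive Bayes classifier with add-one smoothing illustrates the probability model P(type | feature description information). The class name, whitespace tokenization, the tiny training set, and the 0.6 threshold are assumptions for illustration; a real system would train on labeled chat records and segment text appropriately.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        """Return P(category | text) for every category seen in training."""
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())
        exp = {l: math.exp(s - peak) for l, s in log_scores.items()}
        z = sum(exp.values())
        return {l: v / z for l, v in exp.items()}


def classify(model, text, threshold=0.6, default="Other"):
    probs = model.predict_proba(text)
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else default


model = NaiveBayes().fit(
    ["login crash bug", "page crash defect",
     "release date announced", "press conference release"],
    ["defect", "defect", "release", "release"],
)
probabilities = model.predict_proba("crash bug on login")
```

When no word of the input was seen in training, the model falls back to the class priors, and with balanced classes that stays below the threshold, so the default category is returned.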
On the basis of this embodiment, after the category corresponding to the feature description information is determined, the following operations may also be included:
determining the information of the recipient of the feature description information according to the category; and
sending the feature description information to the recipient according to the information of the recipient.
The information of the recipient may be the address of a designated website, a designated SMS number, the mailbox address of a receiving user, or the instant messaging account of a designated receiving user.
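Routing by category can be sketched as a lookup from category to recipient information. The table contents, the `example.com` address, the account name, and the function name are placeholders invented for illustration; actual delivery (mail, SMS, or instant message) is left out.

```python
# Hypothetical routing table: category -> recipient information.
RECIPIENTS = {
    "Baidu Browser debugging defect": {
        "channel": "email",
        "address": "browser-debug@example.com",
    },
    "Baidu Map R&D defect": {
        "channel": "instant_message",
        "address": "map-rd-owner",
    },
}


def build_notification(category: str, description: str, recipients=RECIPIENTS):
    """Return the outgoing notification, or None if no recipient is configured."""
    recipient = recipients.get(category)
    if recipient is None:
        return None
    return {
        "channel": recipient["channel"],
        "to": recipient["address"],
        "body": description,
    }


notification = build_notification(
    "Baidu Browser debugging defect",
    "When logging in, the login permissions have a problem",
)
```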
This implementation provides a way for the recipient to learn of the feature description information of an object after that information has been captured and its category determined. The recipient serves as the group responsible for the category, and the specialized feature description information of the object is exchanged with that group, so that the responsible group learns of valuable feedback on the object in time.
Embodiment four
Referring to Fig. 4, a structural diagram of an information mining device provided by Embodiment four of the present invention is shown. The device includes a message monitoring module 410, a message parsing module 420, a matching module 430, and a feature description information processing module 440.
The message monitoring module 410 is configured to monitor messages published in an instant messaging application. The message parsing module 420 is configured to parse each monitored message to obtain its message content. The matching module 430 is configured to match the message content against the keywords in the pre-established feature recognition dictionary. The feature description information processing module 440 is configured to, when the match succeeds, capture the message content, or the message content together with its related content, as feature description information, and save the feature description information.
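The division into modules 410 to 440 can be sketched as one class whose methods mirror the modules. The class and method names are hypothetical, UTF-8 decoding stands in for protocol-specific parsing, and a plain list stands in for persistent storage.

```python
class InformationMiningDevice:
    def __init__(self, dictionary, store):
        self.dictionary = dictionary  # feature recognition dictionary
        self.store = store            # list standing in for saved records

    def parse(self, raw: bytes) -> str:
        """Message parsing module 420: recover a readable string."""
        return raw.decode("utf-8")

    def match(self, content: str) -> bool:
        """Matching module 430: does the content contain any keyword?"""
        return any(keyword in content for keyword in self.dictionary)

    def process(self, content: str) -> None:
        """Feature description information processing module 440: save it."""
        self.store.append(content)

    def on_message(self, raw: bytes) -> bool:
        """Entry point called by the message monitoring module 410."""
        content = self.parse(raw)
        if self.match(content):
            self.process(content)
            return True
        return False


saved = []
device = InformationMiningDevice({"problem", "defect"}, saved)
captured = device.on_message("login permissions have a problem".encode("utf-8"))
ignored = device.on_message(b"see you at lunch")
```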
In the technical solution of this embodiment, messages published in an instant messaging application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in the pre-established feature recognition dictionary and capturing the successfully matched message content, or that content together with its related content, makes it possible to capture the feature description information of a specific object automatically. This saves labor cost, improves the professional quality and accuracy of the feature description information, and helps improve the specific object on the basis of that information.
In the above solution, the device may further include a connection establishment module and a request sending module.
The connection establishment module is configured to, before the messages published in the instant communication software application are monitored, establish a connection with the server corresponding to the instant communication software application after obtaining access rights to that server. The request sending module is configured to send to the server a request to join a group account, or to add a personal user account, in the instant communication software application. The message monitoring module 410 is specifically configured to, after receiving a response message returned by the server indicating that the join succeeded, monitor the messages published by users in the joined group or by the added personal user.
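The connect, join, and monitor sequence described above might be sketched as follows. The in-memory `FakeServer`, the token check, and every method name are assumptions made for illustration; a real instant communication server would expose its own protocol.

```python
class FakeServer:
    """In-memory stand-in for the instant communication server (illustrative only)."""

    def __init__(self, allowed_tokens):
        self.allowed_tokens = set(allowed_tokens)
        self.members = {}  # group id -> set of joined accounts

    def connect(self, token):
        # Access rights must be obtained before a connection is established.
        if token not in self.allowed_tokens:
            raise PermissionError("no access rights for this server")
        return True

    def request_join(self, account, group_id):
        # Returns a response message indicating that the join succeeded.
        self.members.setdefault(group_id, set()).add(account)
        return {"group": group_id, "joined": True}


class Monitor:
    def __init__(self, server, account, token):
        server.connect(token)  # connection establishment module
        self.server, self.account = server, account
        self.joined = set()
        self.received = []

    def join(self, group_id):
        # Request sending module: ask the server to join a group account.
        response = self.server.request_join(self.account, group_id)
        if response.get("joined"):
            self.joined.add(group_id)
        return response

    def on_message(self, group_id, sender, text):
        # Message monitoring module: only listens to groups joined successfully.
        if group_id in self.joined:
            self.received.append((group_id, sender, text))


server = FakeServer(allowed_tokens={"token-1"})
monitor = Monitor(server, account="miner_bot", token="token-1")
monitor.on_message("group_42", "user_a", "ignored, not joined yet")
monitor.join("group_42")
monitor.on_message("group_42", "user_a", "the page keeps freezing")
```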
In the above solution, the device may further include a feature recognition dictionary establishment module, configured to receive manually configured keywords for the feature recognition dictionary; or,
configured to search the chat history of the instant communication software for manually collected typical sentences, mine keywords expressing the corresponding features according to the context co-occurrence relations of the typical sentences, and add the keywords to the feature recognition dictionary.
In the above solution, the device may further include a first category determination module, a second category determination module, or a third category determination module.
The first category determination module is configured to, after the message content, or the message content together with its related content, is captured as characterization information and before the characterization information is saved, match the characterization information against the keywords in a pre-established category recognition dictionary and determine the category corresponding to the characterization information according to the matching result. The second category determination module is configured to, after the message content, or the message content together with its related content, is captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information by means of a natural language processing (NLP) model. The third category determination module is configured to, after the message content, or the message content together with its related content, is captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information using a probabilistic model trained in advance on feature description texts labeled with category information. The characterization information processing module 440 is specifically configured to save the determined category in association with the characterization information.
The second category determination module is specifically configured to determine the category corresponding to the characterization information using a semantic similarity algorithm model and/or a click similarity algorithm model.
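Two of the category determination options might be sketched as follows: matching against a category recognition dictionary, and a probabilistic model trained on labeled text. The embodiment does not name a specific probabilistic model; multinomial Naive Bayes with add-one smoothing is used here purely as an assumed example, and all names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def classify_by_dictionary(text, category_keywords):
    """Pick the category whose keywords match most often; None if nothing matches."""
    lowered = text.lower()
    scores = {cat: sum(k in lowered for k in kws)
              for cat, kws in category_keywords.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed model choice)."""

    def __init__(self, labelled):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        for text, category in labelled:
            self.doc_counts[category] += 1
            self.word_counts[category].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_lp = None, -math.inf
        for category in self.doc_counts:
            # Log prior plus smoothed log likelihood of each word.
            lp = math.log(self.doc_counts[category] / total_docs)
            denom = sum(self.word_counts[category].values()) + len(self.vocab)
            for word in text.lower().split():
                lp += math.log((self.word_counts[category][word] + 1) / denom)
            if lp > best_lp:
                best, best_lp = category, lp
        return best


category_keywords = {"account": ["login", "password"], "playback": ["playback", "sound"]}
model = NaiveBayes([
    ("login fails with wrong password error", "account"),
    ("cannot login to my account", "account"),
    ("video playback keeps stuttering", "playback"),
    ("no sound during playback", "playback"),
])
by_dictionary = classify_by_dictionary("Video playback stutters", category_keywords)
by_model = model.predict("login error again")
```

The determined category would then be saved in association with the characterization information, as the processing module 440 is described as doing.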
Further, the device may also include a recipient information determination module and a characterization information sending module.
The recipient information determination module is configured to, after the category corresponding to the characterization information is determined, determine the information of the recipient of the characterization information according to the category. The characterization information sending module is configured to send the characterization information to the recipient according to the information of the recipient.
The information of the recipient may be the address of a designated website, the SMS number or email address of a designated receiving user, or the instant communication software account of a designated receiving user.
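Determining the recipient from the category and sending the characterization information might be sketched as a simple lookup table. The `Recipient` type, the route table, the placeholder addresses, and the `send` callback are all hypothetical; the actual delivery channel (website, SMS, email, or instant communication account) is left abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipient:
    channel: str  # one of "website", "sms", "email", "im"
    address: str


# Hypothetical category-to-recipient table; every address below is a placeholder.
ROUTES = {
    "account": Recipient("email", "account-team@example.com"),
    "playback": Recipient("im", "playback_team_bot"),
}


def dispatch(category, description, routes, send):
    """Look up the recipient for a category and hand the information to `send`."""
    recipient = routes.get(category)
    if recipient is None:
        return None  # no responsible group configured for this category
    send(recipient, description)
    return recipient


outbox = []


def record(recipient, description):
    # Stand-in for a real delivery function; it only records what would be sent.
    outbox.append((recipient.channel, recipient.address, description))


dispatch("playback", "video stutters after update", ROUTES, record)
dispatch("billing", "charged twice", ROUTES, record)
```

A category without a configured recipient is simply skipped in this sketch; a real deployment might instead fall back to a default group.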
The related content of the message content may include: the context messages of the message content; and/or supplemental content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to that user.
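Gathering the related content might be sketched as follows: the context is taken as a few messages before and after the matched one, and the supplement is whatever the publishing user returns when asked. The window sizes, the wording of the request, and the `ask_user` callback are assumptions of this sketch.

```python
def context_of(messages, index, before=2, after=2):
    """Messages immediately before and after the matched message."""
    start = max(0, index - before)
    return messages[start:index] + messages[index + 1:index + 1 + after]


def capture(messages, index, ask_user=None, before=2, after=2):
    """Build a characterization record from a matched message and its related content."""
    record = {
        "content": messages[index],
        "context": context_of(messages, index, before, after),
    }
    if ask_user is not None:
        # Session with the publishing user: send a supplement request, keep the reply.
        record["supplement"] = ask_user("Could you describe the problem in more detail?")
    return record


messages = ["hi", "the app is slow", "it crashes on start", "same here", "bye"]
plain = capture(messages, 2, before=1, after=1)
detailed = capture(messages, 2, ask_user=lambda question: "Version 2.1 on Android")
```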
In the above solution, the keywords in the feature recognition dictionary may include keywords reflecting product defects; correspondingly, the characterization information may be information describing product defects.
The information mining device provided by the embodiments of the present invention can perform the information mining method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method performed.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and the scope of the present invention is determined by the scope of the appended claims.
Claims (16)
1. An information mining method, characterized by comprising:
monitoring messages published in an instant communication software application;
parsing the monitored messages to obtain message content;
matching the message content against the keywords in a pre-established feature recognition dictionary;
when the match succeeds, capturing the message content and the related content of the message content as characterization information, and saving the characterization information;
wherein the related content of the message content includes: supplemental content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to that user; or, the context messages of the message content together with the supplemental content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to that user.
2. The method according to claim 1, characterized in that, before monitoring the messages published in the instant communication software application, the method further comprises:
after obtaining access rights to the server corresponding to the instant communication software application, establishing a connection with the server;
sending to the server a request to join a group account, or to add a personal user account, in the instant communication software application;
wherein monitoring the messages published in the instant communication software application specifically comprises:
after receiving a response message returned by the server indicating that the join succeeded, monitoring the messages published by users in the joined group or by the added personal user.
3. The method according to claim 1, characterized in that establishing the feature recognition dictionary specifically comprises:
receiving manually configured keywords for the feature recognition dictionary; or,
searching the chat history of the instant communication software for manually collected typical sentences, mining keywords expressing the corresponding features according to the context co-occurrence relations of the typical sentences, and adding the keywords to the feature recognition dictionary.
4. The method according to claim 1, characterized in that, after capturing the message content and the related content of the message content as characterization information and before saving the characterization information, the method further comprises:
matching the characterization information against the keywords in a pre-established category recognition dictionary, and determining the category corresponding to the characterization information according to the matching result; or, determining the category corresponding to the characterization information by means of a natural language processing (NLP) model; or, determining the category corresponding to the characterization information using a probabilistic model trained in advance on feature description texts labeled with category information;
wherein saving the characterization information comprises: saving the determined category in association with the characterization information.
5. The method according to claim 4, characterized in that determining the category corresponding to the characterization information by means of the natural language processing (NLP) model specifically comprises:
determining the category corresponding to the characterization information using a semantic similarity algorithm model and/or a click similarity algorithm model.
6. The method according to claim 4, characterized in that, after determining the category corresponding to the characterization information, the method further comprises:
determining the information of the recipient of the characterization information according to the category;
sending the characterization information to the recipient according to the information of the recipient.
7. The method according to claim 6, characterized in that the information of the recipient is the address of a designated website, the SMS number or email address of a designated receiving user, or the instant communication software account of a designated receiving user.
8. The method according to any one of claims 1-7, characterized in that the keywords in the feature recognition dictionary include keywords reflecting product defects, and the characterization information is information describing product defects.
9. An information mining device, characterized by comprising:
a message monitoring module, configured to monitor messages published in an instant communication software application;
a message parsing module, configured to parse the monitored messages to obtain message content;
a matching module, configured to match the message content against the keywords in a pre-established feature recognition dictionary;
a characterization information processing module, configured to, when the match succeeds, capture the message content and the related content of the message content as characterization information, and save the characterization information;
wherein the related content of the message content includes: supplemental content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to that user; or, the context messages of the message content together with the supplemental content returned by the user who published the message content, after a session is established with that user and a message content supplement request is sent to that user.
10. The device according to claim 9, characterized in that the device further comprises:
a connection establishment module, configured to, before the messages published in the instant communication software application are monitored, establish a connection with the server corresponding to the instant communication software application after obtaining access rights to that server;
a request sending module, configured to send to the server a request to join a group account, or to add a personal user account, in the instant communication software application;
wherein the message monitoring module is specifically configured to, after receiving a response message returned by the server indicating that the join succeeded, monitor the messages published by users in the joined group or by the added personal user.
11. The device according to claim 9, characterized in that the device further comprises a feature recognition dictionary establishment module, configured to receive manually configured keywords for the feature recognition dictionary; or,
configured to search the chat history of the instant communication software for manually collected typical sentences, mine keywords expressing the corresponding features according to the context co-occurrence relations of the typical sentences, and add the keywords to the feature recognition dictionary.
12. The device according to claim 9, characterized in that the device further comprises:
a first category determination module, configured to, after the message content and the related content of the message content are captured as characterization information and before the characterization information is saved, match the characterization information against the keywords in a pre-established category recognition dictionary and determine the category corresponding to the characterization information according to the matching result; or
a second category determination module, configured to, after the message content and the related content of the message content are captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information by means of a natural language processing (NLP) model; or
a third category determination module, configured to, after the message content and the related content of the message content are captured as characterization information and before the characterization information is saved, determine the category corresponding to the characterization information using a probabilistic model trained in advance on feature description texts labeled with category information;
wherein the characterization information processing module is specifically configured to save the determined category in association with the characterization information.
13. The device according to claim 12, characterized in that the second category determination module is specifically configured to determine the category corresponding to the characterization information using a semantic similarity algorithm model and/or a click similarity algorithm model.
14. The device according to claim 12, characterized in that the device further comprises:
a recipient information determination module, configured to, after the category corresponding to the characterization information is determined, determine the information of the recipient of the characterization information according to the category;
a characterization information sending module, configured to send the characterization information to the recipient according to the information of the recipient.
15. The device according to claim 14, characterized in that the information of the recipient is the address of a designated website, the SMS number or email address of a designated receiving user, or the instant communication software account of a designated receiving user.
16. The device according to any one of claims 9-15, characterized in that the keywords in the feature recognition dictionary include keywords reflecting product defects, and the characterization information is information describing product defects.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410710424.7A CN104346480B (en) | 2014-11-27 | 2014-11-27 | information mining method and device |
PCT/CN2015/086095 WO2016082575A1 (en) | 2014-11-27 | 2015-08-05 | Information mining method and apparatus, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410710424.7A CN104346480B (en) | 2014-11-27 | 2014-11-27 | information mining method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104346480A CN104346480A (en) | 2015-02-11 |
CN104346480B true CN104346480B (en) | 2018-06-26 |
Family
ID=52502071
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410710424.7A Active CN104346480B (en) | 2014-11-27 | 2014-11-27 | information mining method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104346480B (en) |
WO (1) | WO2016082575A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104346480B (en) * | 2014-11-27 | 2018-06-26 | 百度在线网络技术(北京)有限公司 | information mining method and device |
CN105282012A (en) * | 2015-10-23 | 2016-01-27 | 广东小天才科技有限公司 | Method and system for enhancing information reminding when group chat is carried out |
CN106649404B (en) * | 2015-11-04 | 2019-12-27 | 陈包容 | Method and device for creating session scene database |
CN108345582B (en) * | 2017-01-23 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Method and device for identifying social group engaged business |
CN107526779A (en) * | 2017-07-22 | 2017-12-29 | 长沙兔子代跑网络科技有限公司 | A kind of method and device for excavating generation race client |
CN107491493A (en) * | 2017-07-22 | 2017-12-19 | 长沙兔子代跑网络科技有限公司 | A kind of intelligence obtains the method and device for running chat record in generation |
CN109063029A (en) * | 2018-07-10 | 2018-12-21 | 苏奇 | A kind of information filing management method based on instant communication software |
CN109582719B (en) * | 2018-10-19 | 2021-08-24 | 国电南瑞科技股份有限公司 | Method and system for automatically linking SCD file of intelligent substation to virtual terminal |
US11587095B2 (en) * | 2019-10-15 | 2023-02-21 | Microsoft Technology Licensing, Llc | Semantic sweeping of metadata enriched service data |
CN113765767A (en) * | 2020-06-02 | 2021-12-07 | 上海回声网络科技有限公司 | Enterprise WeChat supervision method and system |
CN113051476B (en) * | 2021-03-25 | 2023-06-13 | 北京百度网讯科技有限公司 | Method and device for sending message |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020133477A1 (en) * | 2001-03-05 | 2002-09-19 | Glenn Abel | Method for profile-based notice and broadcast of multimedia content |
CN1987852A (en) * | 2005-12-21 | 2007-06-27 | 腾讯科技(深圳)有限公司 | Method and device for determining communication object attribute according to news content |
CN101166160B (en) * | 2006-10-20 | 2010-09-15 | 阿里巴巴集团控股有限公司 | A method and system for filtering instant communication rubbish information |
CN102323933A (en) * | 2011-08-31 | 2012-01-18 | 张潇 | Information embedding and interaction system facing real-time communication and method |
CN102419778B (en) * | 2012-01-09 | 2013-03-20 | 中国科学院软件研究所 | Information searching method for discovering and clustering sub-topics of query statement |
CN103577416B (en) * | 2012-07-20 | 2017-09-22 | 阿里巴巴集团控股有限公司 | Expanding query method and system |
CN102970210A (en) * | 2012-11-02 | 2013-03-13 | 北京百度网讯科技有限公司 | Method and device for reminding group messages in instant chat tool |
CN103605690A (en) * | 2013-11-04 | 2014-02-26 | 北京奇虎科技有限公司 | Device and method for recognizing advertising messages in instant messaging |
CN104346480B (en) * | 2014-11-27 | 2018-06-26 | 百度在线网络技术(北京)有限公司 | information mining method and device |
2014
- 2014-11-27 CN CN201410710424.7A patent/CN104346480B/en active Active
2015
- 2015-08-05 WO PCT/CN2015/086095 patent/WO2016082575A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN104346480A (en) | 2015-02-11 |
WO2016082575A1 (en) | 2016-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104346480B (en) | information mining method and device | |
US20200143288A1 (en) | Training of Chatbots from Corpus of Human-to-Human Chats | |
CN107870896B (en) | Conversation analysis method and device | |
US20190243916A1 (en) | Cognitive Ranking of Terms Used During a Conversation | |
CN111182162B (en) | Telephone quality inspection method, device, equipment and storage medium based on artificial intelligence | |
CN107680019A (en) | A kind of implementation method of Examination Scheme, device, equipment and storage medium | |
CN113468296B (en) | Model self-iteration type intelligent customer service quality inspection system and method capable of configuring business logic | |
CN107451110A (en) | A kind of method, apparatus and server for generating meeting summary | |
US11520983B2 (en) | Methods and systems for trending issue identification in text streams | |
CN104050221A (en) | Automatic note taking within a virtual meeting | |
CN105373478B (en) | Automated testing method and system | |
US10885080B2 (en) | Cognitive ranking of terms used during a conversation | |
US10685655B2 (en) | Leveraging natural language processing | |
CN110222513B (en) | Abnormality monitoring method and device for online activities and storage medium | |
CN115099239B (en) | Resource identification method, device, equipment and storage medium | |
CN113111658B (en) | Method, device, equipment and storage medium for checking information | |
CN110163013A (en) | A kind of method and apparatus detecting sensitive information | |
US20200220741A1 (en) | System and Method for Modeling an Asynchronous Communication Channel | |
CN106649102A (en) | Graphical interface program testing log record and replay method based on hook function | |
Lima et al. | Land of lost knowledge: an initial investigation into projects lost knowledge | |
WO2022206307A1 (en) | Method for electronic messaging using image based noisy content | |
CN115114495B (en) | Airworthiness data management auxiliary method and system based on deep learning | |
CN114827237B (en) | Remote connection operation log recording method and electronic equipment | |
WO2021232282A1 (en) | Vulnerability information obtaining method and apparatus, and electronic device and storage medium | |
CN113688280B (en) | Ordering method, ordering device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||