CN104346480A - Information mining method and device - Google Patents

Information mining method and device

Info

Publication number
CN104346480A
Authority
CN
China
Prior art keywords
information
message content
message
feature interpretation
interpretation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410710424.7A
Other languages
Chinese (zh)
Other versions
CN104346480B (en
Inventor
刘松
孙凯
陶明远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410710424.7A (patent CN104346480B)
Publication of CN104346480A
Priority to PCT/CN2015/086095 (WO2016082575A1)
Application granted
Publication of CN104346480B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24537 Query rewriting; Transformation of operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24575 Query processing with adaptation to user needs using context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)

Abstract

Embodiments of the invention provide an information mining method and device. The method comprises: monitoring messages posted in an instant messaging application; parsing each monitored message to obtain its message content; matching the message content against keywords in a pre-established feature identification dictionary; and, when the match succeeds, capturing the message content, or the message content together with its related content, as feature description information and saving that feature description information. Because messages posted in an instant messaging application are clearly categorized and highly specialized, matching the parsed message content against the dictionary keywords and capturing the matched content (optionally together with its related content) allows the feature description information of a specific object to be captured automatically. This reduces labor cost and improves the specialization and accuracy of the feature description information obtained for the specific object.

Description

Information mining method and device
Technical field
Embodiments of the present invention relate to the field of information technology, and in particular to an information mining method and device.
Background
In the prior art, information about an object such as a product or service (for example, product defect descriptions that help improve the product) is usually collected manually from forums or web pages in the relevant field, which is inefficient and not very accurate.
Summary of the invention
Embodiments of the present invention provide an information mining method and device that automatically capture the feature information of a specific object, saving labor cost and improving the accuracy of the captured feature information.
In a first aspect, an embodiment of the present invention provides an information mining method, comprising:
monitoring messages posted in an instant messaging application;
parsing the monitored message to obtain message content;
matching the message content against keywords in a pre-established feature identification dictionary;
when the match succeeds, capturing the message content, or the message content together with its related content, as feature description information, and saving the feature description information.
In a second aspect, an embodiment of the present invention further provides an information mining device, comprising:
a message monitoring module, configured to monitor messages posted in an instant messaging application;
a message parsing module, configured to parse the monitored message to obtain message content;
a matching module, configured to match the message content against keywords in a pre-established feature identification dictionary;
a feature description information processing module, configured to, when the match succeeds, capture the message content, or the message content together with its related content, as feature description information, and save the feature description information.
The information mining method and device provided by the embodiments monitor and parse the messages posted in an instant messaging application. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in a pre-established feature identification dictionary, and capturing the matched message content (or the matched message content together with its related content), allows the feature description information of a specific object to be captured automatically. This saves labor cost, improves the specialization and accuracy of the feature description information obtained, and makes it easier to improve the object based on that information.
Brief description of the drawings
Fig. 1 is a flowchart of an information mining method provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of an information mining method provided by Embodiment two of the present invention;
Fig. 3a is a flowchart of an information mining method provided by Embodiment three of the present invention;
Fig. 3b is a flowchart of another information mining method provided by Embodiment three of the present invention;
Fig. 3c is a flowchart of yet another information mining method provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of an information mining device provided by Embodiment four of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of the information mining method provided by Embodiment one of the present invention. The method may be performed by an information mining device implemented in hardware and/or software, typically deployed on a server that provides a data mining service.
The method comprises operations 110 to 140.
110. Monitor messages posted in an instant messaging application.
Typically, an enterprise runs an instant messaging application containing groups tied to its products or departments, so that the teams responsible for developing, or for operating and maintaining, each product can post messages.
For example, Baidu Hi, released by Baidu, is an instant messaging (IM) application that combines text messaging, voice and video calls, and file transfer. Groups corresponding to products such as "Baidu Maps" or "Baidu Translate" are set up in Baidu Hi so that the Baidu staff responsible for developing or operating each product can post messages.
Messages can be posted in several forms: as text, or as voice, video, pictures, and so on. This embodiment places no restriction on the form, as long as the instant messaging application supports it.
Specifically, this operation monitors the text messages posted in the groups related to the enterprise's products or departments within the instant messaging application.
120. Parse the monitored message to obtain message content.
In this operation, the monitored message is decoded according to the communication protocol of the instant messaging application, restoring the original data corresponding to the message, that is, a readable character string.
130. Match the message content against the keywords in a pre-established feature identification dictionary.
Specifically, this operation uses keyword matching to determine, based on the pre-established feature identification dictionary, whether the message content contains any keyword from that dictionary.
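As a minimal sketch of operation 130, the matching step can be approximated by a case-insensitive substring test. The function name `match_keywords` and the sample dictionary are assumptions made for illustration; the description does not prescribe a particular matching algorithm.

```python
from typing import List

# Hypothetical dictionary; the keywords echo examples given in this description.
FEATURE_DICTIONARY = ["problem", "defect", "bug", "improvement"]


def match_keywords(message_content: str, dictionary: List[str]) -> List[str]:
    """Return the dictionary keywords that occur in the message content."""
    text = message_content.lower()
    return [kw for kw in dictionary if kw.lower() in text]


hits = match_keywords(
    "Login permissions have a problem when logging in to Baidu Browser",
    FEATURE_DICTIONARY,
)
# A non-empty list means the match succeeded and capture can proceed.
print(hits)
```

Plain substring matching is the simplest choice here; Chinese text or inflected keywords would probably need word segmentation or a dedicated multi-pattern matcher.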
Note that the groups corresponding to different objects within an enterprise post different messages, so the parsed message content differs as well. A group is clearly categorized, highly specialized, and has distinctive language (for example, the members of a group all do the same kind of work or work on the same product, and share the same or similar professional backgrounds), so the messages posted in different groups reflect information about the corresponding enterprise objects.
An object may be a specific product, or a broader object such as enterprise management.
For example, the "Baidu Maps" product group consists of the Baidu staff responsible for developing or operating "Baidu Maps", and the messages its members post contain information about the product's strengths and weaknesses or about planned improvements.
As another example, the messages posted by members of the debugging group for the "Baidu Browser" product contain the bugs or suspected problems found while debugging the product.
Therefore, a corresponding feature identification dictionary can be built for the group of each enterprise object, yielding the feature description information of different objects (for example, the strengths and weaknesses of different products, or problems in enterprise management). For different groups concerned with the same object, a separate feature identification dictionary is preferably built for each, yielding feature description information on different aspects of that object.
For example, for the R&D group of the "Baidu Maps" product, an R&D-related feature identification dictionary is built whose keywords may include "R&D", "progress", "trend", "cost", and "competitor"; for the debugging group of the "Baidu Maps" product, a debugging-related dictionary is built whose keywords may include "debugging error", "debugging cycle", "bug", "vulnerability", and "defect"; for the release group of the "Baidu Maps" product, a release-related dictionary is built whose keywords may include "release", "launch event", "release schedule", and "release date".
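One way to hold a separate dictionary per group is a plain mapping from group name to keyword list. The sketch below is illustrative only: the group names, the `GROUP_DICTIONARIES` structure, and the helper `match_in_group` are assumptions, and the keywords are English renderings of the examples above.

```python
from typing import Dict, List

# Hypothetical per-group feature identification dictionaries.
GROUP_DICTIONARIES: Dict[str, List[str]] = {
    "Baidu Maps R&D": ["R&D", "progress", "trend", "cost", "competitor"],
    "Baidu Maps debugging": [
        "debugging error", "debugging cycle", "bug", "vulnerability", "defect",
    ],
    "Baidu Maps release": [
        "release", "launch event", "release schedule", "release date",
    ],
}


def match_in_group(group_name: str, message_content: str) -> List[str]:
    """Match a message against the dictionary of the group it was posted in."""
    text = message_content.lower()
    keywords = GROUP_DICTIONARIES.get(group_name, [])  # unknown group: no keywords
    return [kw for kw in keywords if kw.lower() in text]
```

Because each group only consults its own dictionary, the same sentence can match in the debugging group and be ignored in the release group.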
140. When the match succeeds, capture the message content, or the message content together with its related content, as feature description information, and save the feature description information.
This operation has two possible implementations. In the first, when the match succeeds, the message content alone is captured as feature description information and saved. In the second, when the match succeeds, the message content together with its related content is captured as feature description information and saved.
Capturing the message content together with its related content is preferred because, compared with capturing the message content alone, it yields more complete feature description information about the object.
A capture time interval and/or a capture count can be set to bound the related content captured for a matched message; for example, the capture time interval is set to 15 s and the capture count to 5.
Further, the related content of the message content may comprise: the context messages of the message content; and/or the supplementary content returned by the user who posted the message content, after a session is established with that user and a message content supplement request is sent to that user.
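The capture interval and capture count described above can be sketched as a bounded scan over the messages that follow the matched one. This is only an illustration under stated assumptions: the `Message` type and `capture_context` function are hypothetical, and the sketch treats "context" as the following messages, whereas an implementation could equally include preceding ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    sender: str
    timestamp: float  # seconds since some reference point
    content: str


def capture_context(
    messages: List[Message],
    matched_index: int,
    interval_s: float = 15.0,  # capture time interval from the example above
    max_count: int = 5,        # capture count from the example above
) -> List[Message]:
    """Collect messages posted after the matched one, bounded by time and count."""
    matched = messages[matched_index]
    context: List[Message] = []
    for msg in messages[matched_index + 1:]:
        too_late = msg.timestamp - matched.timestamp > interval_s
        if too_late or len(context) >= max_count:
            break
        context.append(msg)
    return context
```

The scan stops at the first message outside the window, which assumes the message list is ordered by timestamp.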
Example 1
Take the "Baidu Browser" product as the object. The messages posted in a group for this product contain many evaluations of the product and discussions of its problems. For example, one designer of the product posts in the development group "Login permissions have a problem when logging in to Baidu Browser", and another designer then posts in the same group "Indeed, the cause is A". After the matching operation, the message "Login permissions have a problem when logging in to Baidu Browser" matches the keyword "problem" in the feature identification dictionary. Capturing the message content "Login permissions have a problem when logging in to Baidu Browser" yields the feature description information for the product's defect, and capturing its context message "Indeed, the cause is A" yields the feature description information for the cause of that defect, which enriches the feature description information of the product.
Note that the above describes capturing the feature description information for a product defect and for the cause of that defect. Beyond the cause, other feature description information, such as the solution to the defect, can also be captured to complete the product defect information, which is then stored in a formatted form (for example, [product name, defect content, cause]). This embodiment places no restriction on this.
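The formatted record [product name, defect content, cause] could be represented as a small data class and serialized for storage. The `DefectRecord` type and the choice of JSON are assumptions for illustration; the description does not name a storage format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DefectRecord:
    """Hypothetical record mirroring [product name, defect content, cause]."""
    product_name: str
    defect_content: str
    cause: str


record = DefectRecord(
    product_name="Baidu Browser",
    defect_content="Login permissions have a problem when logging in to Baidu Browser",
    cause="Indeed, the cause is A",
)

# ensure_ascii=False keeps non-ASCII (for example Chinese) text readable in storage.
serialized = json.dumps(asdict(record), ensure_ascii=False)
print(serialized)
```

Further fields, such as a solution, could be added to the record in the same way.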
Example 2
After a session is established with the user who posted the message content, a message content supplement request is sent to that user in the form of a guided (heuristic) question, asking for a complete description of the defect. Capture is then session-based: when a defect is described along several dimensions (such as defect type and cause), a longer capture time (for example, one minute) is set, and the supplementary content the user returns within that time is captured. If no supplementary description arrives within that time, either only the basic information is recorded, or a failure is returned because required information is missing.
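The session-based capture in Example 2 can be sketched as a timeout filter over the user's replies, with the two fallback behaviours selected by a flag. The function names, the `(timestamp, text)` reply representation, and the `supplement_required` flag are all assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def capture_supplement(
    replies: List[Tuple[float, str]],  # (timestamp in seconds, reply text)
    requested_at: float,
    timeout_s: float = 60.0,           # the one-minute capture time from the example
) -> List[str]:
    """Keep only the replies that arrive within the capture time."""
    return [
        text for ts, text in replies
        if requested_at <= ts <= requested_at + timeout_s
    ]


def build_record(
    basic_info: str,
    replies: List[Tuple[float, str]],
    requested_at: float,
    timeout_s: float = 60.0,
    supplement_required: bool = False,
) -> Optional[Dict[str, object]]:
    """Return a record, or None to signal failure when required detail is missing."""
    supplement = capture_supplement(replies, requested_at, timeout_s)
    if not supplement and supplement_required:
        return None  # failure: required information is incomplete
    return {"basic": basic_info, "supplement": supplement}
```

With `supplement_required=False` the sketch records only the basic information when the user stays silent, matching the first fallback described above.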
In the technical solution of this embodiment, messages posted in an instant messaging application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in a pre-established feature identification dictionary, and capturing the matched message content (or the matched message content together with its related content), allows the feature description information of a specific object to be captured automatically. This saves labor cost, improves the specialization and accuracy of the feature description information obtained, and makes it easier to improve the object based on that information.
In this embodiment, building the feature identification dictionary may specifically comprise:
receiving manually configured keywords for the feature identification dictionary; or
searching the chat history of the instant messaging software for manually collected typical statements and, based on the contextual co-occurrence relations of each typical statement, mining keywords that express the corresponding feature and adding them to the feature identification dictionary.
In other words, each keyword in the feature identification dictionary can be configured manually, for example by configuring keywords such as "problem", "defect", or "improvement".
Alternatively, some typical statements can be collected manually and their contextual co-occurrence relations in the chat history analyzed, so that feature-expressing words that co-occur with a typical statement at or above a certain frequency are taken as keywords and added to the feature identification dictionary; semantic templates that express the feature can be mined in the same way.
For example, in the R&D group of the "Baidu Browser" product in Baidu Hi, one person says "query=xxx, image error, so-and-so please take a look", and another replies "Right, there is a problem, log the defect". If the pairing of the phrases "image error" and "log the defect" appears repeatedly in the group's messages, the two phrases are considered to co-occur, indicating a defect that needs to be recorded. On this basis, the semantic template "[any words] image error", which expresses a defect, can be mined.
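A rough sketch of the co-occurrence idea: count how often a candidate phrase appears in the message immediately before one containing a manually collected typical statement, and keep candidates that reach a minimum count. The function name, the adjacency rule, and the `min_count` threshold are assumptions; the sample phrases are illustrative English renderings of the example above.

```python
from collections import Counter
from typing import List


def mine_cooccurring_phrases(
    chat_history: List[str],
    typical_statement: str,
    candidates: List[str],
    min_count: int = 2,  # hypothetical co-occurrence frequency threshold
) -> List[str]:
    """Return candidates that repeatedly precede the typical statement."""
    counts: Counter = Counter()
    # Walk over adjacent (previous, current) message pairs.
    for previous, current in zip(chat_history, chat_history[1:]):
        if typical_statement in current:
            for phrase in candidates:
                if phrase in previous:
                    counts[phrase] += 1
    return [phrase for phrase, n in counts.items() if n >= min_count]


history = [
    "query=xxx, image error, someone please take a look",
    "Right, there is a problem, log the defect",
    "query=yyy shows an image error again",
    "log the defect please",
    "lunch?",
    "ok",
]
new_keywords = mine_cooccurring_phrases(
    history, "log the defect", ["image error", "lunch"]
)
print(new_keywords)
```

Phrases returned this way could then be added to the feature identification dictionary, or generalized into a template such as "[any words] image error".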
The information mining method of this embodiment can be applied in many scenarios. For example, with a feature identification dictionary built for product defects, it obtains defect descriptions of a product; with a dictionary built for product debugging, it obtains descriptions of a product's debugging problems; with a dictionary built for enterprise management, it obtains descriptions such as collected management suggestions for enterprise management matters. This embodiment places no restriction on this.
Specifically, when the goal is to capture defect descriptions of a product, the keywords in the feature identification dictionary include keywords that reflect product defects, and the feature description information is information describing product defects. This embodiment provides a fully automated implementation that runs from mining product groups, through capturing defect-related content, to saving it in a designated storage location, and it can cover the main product groups of all of an enterprise's product lines.
Embodiment two
Fig. 2 is a flowchart of the information mining method provided by Embodiment two of the present invention. Building on the above embodiment, this embodiment provides a preferred scheme that is carried out before monitoring the messages posted in the instant messaging application. The preferred method comprises operations 210 to 220.
210. After obtaining access rights to the server corresponding to the instant messaging application, establish a connection with the server.
For example, obtain access rights to the server of the instant messaging application "Baidu Hi" and establish a connection with that server.
220. Send to the server a request to join a group account or a personal user account in the instant messaging application.
For example, send the server of "Baidu Hi" a request to join the group account "Baidu Browser - R&D group", so that the newly joined group member can post messages about the "Baidu Browser" product in that group.
As another example, send the server of "Baidu Hi" a request to join as a personal user account. The newly joined personal account can chat about the same product with other personal accounts in the application, producing posted messages. It can also apply to join a group account that already exists in the application, so that the newly joined member can post messages in that group.
In the technical solution of this embodiment, before the messages posted in the instant messaging application are monitored, a connection is established with the application's server and account join requests are exchanged, so that the account that has joined the instant messaging application can post messages in it.
Note that, after the request to join a group account or personal user account has been sent to the server, monitoring the messages posted in the instant messaging application specifically comprises: after receiving the join-approval response returned by the server, monitoring the messages posted by the users in the joined group or by the joined personal user.
Embodiment three
Fig. 3a is a flowchart of an information mining method provided by Embodiment three of the present invention. Building on the above embodiments, this embodiment provides a preferred scheme that is carried out after the message content (or the message content together with its related content) is captured as feature description information and before the feature description information is saved.
The preferred method comprises operations 310 to 360.
310. Monitor messages posted in an instant messaging application.
320. Parse the monitored message to obtain message content.
330. Match the message content against the keywords in a pre-established feature identification dictionary.
340. When the match succeeds, capture the message content, or the message content together with its related content, as feature description information.
350. Match the feature description information against the keywords in a pre-established category identification dictionary, and determine the category of the feature description information from the matching result.
As noted above, the information mining method provided by the embodiments can be applied in many scenarios, so a category identification dictionary covering multiple application needs can be built according to actual requirements.
The keywords in the category identification dictionary can be configured manually. They may include, for example, "Baidu Maps R&D defect", "Baidu Browser debugging defect", and "Baidu Translate R&D improvement"; this embodiment places no restriction on this.
360. Save the determined category in association with the feature description information.
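Operations 350 and 360 can be sketched as follows, assuming each category in the category identification dictionary is described by a few terms and the category with the most hits wins. The dictionary layout, the `classify_by_keywords` and `save_with_category` helpers, and the default category "other" are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical category identification dictionary: category -> indicative terms.
CATEGORY_DICTIONARY: Dict[str, List[str]] = {
    "Baidu Maps R&D defect": ["baidu maps", "r&d"],
    "Baidu Browser debugging defect": ["baidu browser", "debugging"],
    "Baidu Translate R&D improvement": ["baidu translate", "improvement"],
}


def classify_by_keywords(
    feature_description: str,
    category_dictionary: Dict[str, List[str]],
    default: str = "other",
) -> str:
    """Pick the category whose terms occur most often in the description."""
    text = feature_description.lower()
    best_category, best_hits = default, 0
    for category, terms in category_dictionary.items():
        hits = sum(1 for term in terms if term.lower() in text)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category


def save_with_category(
    store: List[Tuple[str, str]], feature_description: str
) -> None:
    """Operation 360: keep the category alongside the description."""
    category = classify_by_keywords(feature_description, CATEGORY_DICTIONARY)
    store.append((category, feature_description))
```

Here `store` is an in-memory list standing in for whatever persistent storage an implementation would use.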
In the technical solution of this embodiment, messages posted in an instant messaging application are monitored and parsed. Because such messages are clearly categorized and highly specialized, matching the parsed message content against the keywords in a pre-established feature identification dictionary, and capturing the matched message content (or the matched message content together with its related content), allows the feature description information of a specific object to be captured automatically. This saves labor cost, improves the specialization and accuracy of the feature description information obtained, and makes it easier to improve the object based on that information. After the feature description information of the object has been captured, determining its category and saving the category in association with it makes it possible to bind each category to the group responsible for it, so that the responsible group can learn of valuable feedback about the object promptly, based on the specialized feature description information.
Note that operation 350 is only one way of determining the category of the feature description information. The category can also be determined by a natural language processing (NLP) model (operation 351, shown in Fig. 3b).
Specifically, a semantic similarity model and/or a click similarity model can be used to determine the category of the feature description information.
Semantic similarity uses a model trained with supervision on the natural language processing cloud back end to analyze how similar two pieces of text are; the larger the value, the more similar the texts. The semantic similarity network provides the similarity computation. For example, for the inputs "notebook computer" and "notebook", the semantic similarity is 2.08478.
Click similarity is used when the semantic similarity fails to reach a threshold (such as 1.8). It analyzes the click similarity of two pieces of text (for example, a query and the title of a search result) by computing the cosine similarity of trained embedding vectors. The value lies in [-1, 1], and a larger value means stronger click similarity. For example, the click similarity of "hello in Baidu" and "hello for all great Yi" is -0.121407, while the click similarity of "hello in Baidu" and "Li Yanhong, hello" is 0.218664; the latter pair has the higher click similarity.
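The cosine similarity used for click similarity is the standard one: the dot product of two vectors divided by the product of their lengths, which always falls in [-1, 1]. The sketch below shows only that computation; how the embedding vectors are trained is outside its scope, and the zero-vector convention is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1.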
In practice, the feature description information is first compared for semantic similarity against each of the preset categories, and the category whose semantic similarity is highest and reaches the threshold is returned. If the semantic similarity between the feature description information and the preset categories does not reach the threshold, click similarity is then evaluated against the preset categories: if the click similarity reaches its threshold, the corresponding category is returned; otherwise a default category (such as "other") is returned. The thresholds can be refitted continually against historical data to keep accuracy high.
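The two-stage decision described above can be sketched as a cascade that takes the two similarity functions as parameters, since the actual models are not specified here. The function name, the click threshold of 0.2, and the way the best category is chosen in the second stage are assumptions; only the semantic threshold of 1.8 comes from the example in the text.

```python
from typing import Callable, Sequence

Similarity = Callable[[str, str], float]


def classify_by_similarity(
    text: str,
    categories: Sequence[str],
    semantic_similarity: Similarity,
    click_similarity: Similarity,
    semantic_threshold: float = 1.8,  # example threshold from the description
    click_threshold: float = 0.2,     # hypothetical value
    default: str = "other",
) -> str:
    """Try semantic similarity first, then click similarity, then the default."""
    if not categories:
        return default
    best = max(categories, key=lambda c: semantic_similarity(text, c))
    if semantic_similarity(text, best) >= semantic_threshold:
        return best
    best = max(categories, key=lambda c: click_similarity(text, c))
    if click_similarity(text, best) >= click_threshold:
        return best
    return default
```

Passing the similarity functions in keeps the cascade testable with simple stand-ins and lets the thresholds be retuned without touching the control flow.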
Note also that the category of the feature description information can alternatively be determined by a probability model trained in advance on feature description texts labeled with category information. The input of the probability model is a feature description text, and its output is the probability of belonging to a given category (operation 352, shown in Fig. 3c). Specifically, a probability model is trained in advance on feature description texts labeled with category information; the feature description information is fed into the model, which outputs a category A for the feature description information together with the probability of that category. If the probability meets a certain threshold, the category of the feature description information is determined to be category A. For example, the manual category labels in chat records and the corresponding description texts can be used to train a probability model P(category | feature description information). The training method can be chosen flexibly according to the characteristics of the system's business domain; a typical choice is the naive Bayes method. In application, if the probability that a user's problem description belongs to a given problem category meets a certain threshold, the description is considered to belong to that category.
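A compact multinomial naive Bayes classifier with Laplace smoothing illustrates the probability model P(category | feature description information) and the threshold test. It is a sketch under stated assumptions: the class and function names, the whitespace tokenization (Chinese text would need word segmentation), the 0.6 threshold, and the tiny training set are all illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: set = set()

    def fit(self, texts: List[str], labels: List[str]) -> None:
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict_proba(self, text: str) -> Dict[str, float]:
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores: Dict[str, float] = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalize in a numerically stable way so probabilities sum to 1.
        top = max(log_scores.values())
        exp_scores = {l: math.exp(s - top) for l, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {l: v / norm for l, v in exp_scores.items()}


def classify(model: NaiveBayesClassifier, text: str,
             threshold: float = 0.6, default: str = "other") -> str:
    """Return the most probable category if it meets the threshold."""
    probabilities = model.predict_proba(text)
    label = max(probabilities, key=probabilities.get)
    return label if probabilities[label] >= threshold else default


model = NaiveBayesClassifier()
model.fit(
    ["login bug crash", "crash bug error",
     "add feature improvement", "improvement suggestion feature"],
    ["defect", "defect", "improvement", "improvement"],
)
print(classify(model, "bug crash"))
```

A text made only of unseen words scores equally for every category, falls below the threshold, and is assigned the default category.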
On the basis of the present embodiment, after the classification corresponding to the feature interpretation information is determined, the following operations may also be included:
Determining the information of the receiving party of the feature interpretation information according to the classification;
Sending the feature interpretation information to the receiving party according to the information of the receiving party.
The information of the receiving party can be the address of a set website, the SMS number or email address of a set receiving user, or the instant communication software account of a set receiving user.
The present embodiment provides an implementation in which, after the feature interpretation information of an object is captured and its classification is determined, the receiving party is informed of that feature interpretation information. The receiving party serves as the group responsible for the classification, and the professional feature interpretation information of the object is delivered to that responsible group, so that the group can learn of valuable feedback about the specific object in a timely manner.
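The routing step above can be sketched as a lookup from classification to receiving party followed by a channel-specific send. Everything named here (the `Recipient` record, the channel labels, the routing table, and the sender callables) is hypothetical and chosen only for illustration; the patent lists the possible kinds of receiving-party information but no concrete interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Recipient:
    channel: str  # assumed labels: "web", "sms", "email" or "im"
    address: str  # website address, SMS number, email address or IM account


# A sender takes (address, text) and delivers the text over one channel.
Sender = Callable[[str, str], None]


def route(description: str, classification: str,
          routing_table: Dict[str, Recipient],
          senders: Dict[str, Sender],
          fallback: Recipient) -> Recipient:
    """Look up the receiving party for a classification and send to it."""
    recipient = routing_table.get(classification, fallback)
    senders[recipient.channel](recipient.address, description)
    return recipient
```

A caller would register one sender per channel, for example a function that posts to a website or one that sends an email, and map each classification to the group responsible for it.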
Embodiment four
Fig. 4 is a structural diagram of an information mining device provided by Embodiment four of the present invention. The device comprises: a message monitoring module 410, a message resolution module 420, a matching module 430 and a feature interpretation message processing module 440.
The message monitoring module 410 is configured to monitor messages issued in an instant communication software application; the message resolution module 420 is configured to parse the monitored messages to obtain message content; the matching module 430 is configured to match the message content against the keywords in a feature identification dictionary set up in advance; and the feature interpretation message processing module 440 is configured to, when the match succeeds, capture the message content, or the message content together with related content of the message content, as feature interpretation information, and to save the feature interpretation information.
In the technical scheme of the present embodiment, messages issued in an instant communication software application are monitored and parsed. Because messages issued in such applications are clearly categorized and highly professional, matching the parsed message content against the keywords in the pre-established feature identification dictionary, and capturing the successfully matched message content (or that content together with its related content), allows the feature interpretation information of a specific object to be captured automatically. This saves labor cost, improves the professionalism and accuracy of the acquired feature interpretation information, and helps improve the specific object according to that information.
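The four modules can be pictured as one small pipeline. The sketch below is a simplified illustration, not the device itself: the JSON wire format with a `content` field, the substring matching, and the in-memory list used as storage are assumptions, and a plain iterable stands in for the live message stream that the monitoring module would provide.

```python
import json
from typing import Iterable, List


def parse_message(raw: str) -> str:
    """Resolution step: extract the message content (assumed JSON format)."""
    return json.loads(raw)["content"]


def match_keywords(content: str, dictionary: Iterable[str]) -> List[str]:
    """Matching step: return dictionary keywords found in the content."""
    lowered = content.lower()
    return [kw for kw in dictionary if kw.lower() in lowered]


def mine(raw_messages: Iterable[str], dictionary: List[str]) -> List[str]:
    """Processing step: save content whose keyword match succeeds."""
    saved = []
    for raw in raw_messages:  # stands in for the monitoring module
        content = parse_message(raw)
        if match_keywords(content, dictionary):
            saved.append(content)
    return saved
```

Substring matching is the simplest possible choice; for Chinese chat text a real system would more likely segment words before matching.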
In the above scheme, the device may further comprise a connection establishment module and a request sending module.
The connection establishment module is configured to, before messages issued in the instant communication software application are monitored, establish a connection with the server corresponding to the instant communication software application after obtaining access rights to that server; the request sending module is configured to send to the server a request to join a group account or to add a personal user account in the instant communication software application; and the message monitoring module 410 is specifically configured to: after receiving a join-success response message returned by the server, monitor the messages issued by the users in the joined group or by the added personal user.
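The connect, join, then monitor sequence can be sketched against an abstract client. The `ImClient` interface and the in-memory `FakeClient` below are purely hypothetical stand-ins; no real instant-messaging server API is implied, and obtaining access rights is reduced here to a single `connect` call.

```python
from typing import Dict, Iterable, Iterator, List, Protocol, Set


class ImClient(Protocol):
    """Hypothetical client for an instant communication server."""

    def connect(self) -> None: ...
    def request_join(self, account: str) -> bool: ...
    def messages(self, account: str) -> Iterable[str]: ...


def listen_after_join(client: ImClient, account: str) -> Iterator[str]:
    """Connect, ask to join, and monitor only after the join succeeds."""
    client.connect()
    if not client.request_join(account):
        return  # no join-success response, so nothing is monitored
    yield from client.messages(account)


class FakeClient:
    """In-memory stand-in used only to exercise the flow."""

    def __init__(self, accepted: Set[str], inbox: Dict[str, List[str]]) -> None:
        self.accepted = accepted
        self.inbox = inbox
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def request_join(self, account: str) -> bool:
        return self.connected and account in self.accepted

    def messages(self, account: str) -> Iterable[str]:
        return self.inbox.get(account, [])
```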
In the above scheme, the device may further comprise a feature identification dictionary establishment module, configured to receive manually configured keywords for the feature identification dictionary; or,
configured to search the chat history of the instant communication software for manually collected typical statements, mine keywords expressing the corresponding feature according to the context co-occurrence relations of those typical statements, and add them to the feature identification dictionary.
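One simple way to read "context co-occurrence" is to count which words repeatedly appear in the messages surrounding each occurrence of a typical statement. The sketch below follows that reading; the window size, the minimum count, the stopword filtering, and the whitespace tokenizer are all assumptions, and the patent does not commit to a particular mining algorithm.

```python
from collections import Counter
from typing import Iterable, List, Set


def mine_keywords(chat_history: List[str],
                  typical_statements: Set[str],
                  stopwords: Iterable[str],
                  window: int = 1,
                  min_count: int = 2) -> Set[str]:
    """Return words that co-occur with typical statements often enough."""
    stop = set(stopwords)
    counts: Counter = Counter()
    for i, message in enumerate(chat_history):
        if message not in typical_statements:
            continue
        # Context = the statement plus `window` messages on each side.
        context = chat_history[max(0, i - window): i + window + 1]
        words = {w for m in context for w in m.lower().split()} - stop
        counts.update(words)  # each word counted once per occurrence
    return {w for w, c in counts.items() if c >= min_count}
```

The returned set would then be reviewed or added directly to the feature identification dictionary.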
In the above scheme, the device may further comprise: a first classification determination module, a second classification determination module, or a third classification determination module.
The first classification determination module is configured to, after the message content (or the message content and its related content) is captured as feature interpretation information and before the feature interpretation information is saved, match the feature interpretation information against the keywords in a classification identification dictionary set up in advance, and determine the classification corresponding to the feature interpretation information according to the matching result. The second classification determination module is configured to, after that capture and before the saving, determine the classification corresponding to the feature interpretation information by means of a natural language processing (NLP) model. The third classification determination module is configured to, after that capture and before the saving, determine the classification corresponding to the feature interpretation information using a probability model trained in advance on feature interpretation text labeled with classification information. The feature interpretation message processing module 440 is specifically configured to: associate the determined classification with the feature interpretation information and save them.
The second classification determination module is specifically configured to: determine the classification corresponding to the feature interpretation information using a semantic similarity algorithm model and/or a click similarity algorithm model.
Further, the device may also comprise: a receiving party information determination module and a feature interpretation information sending module.
The receiving party information determination module is configured to, after the classification corresponding to the feature interpretation information is determined, determine the information of the receiving party of the feature interpretation information according to the classification; the feature interpretation information sending module is configured to send the feature interpretation information to the receiving party according to the information of the receiving party.
The information of the receiving party can be the address of a set website, the SMS number or email address of a set receiving user, or the instant communication software account of a set receiving user.
The related content of the message content can comprise: the context messages of the message content; and/or the supplemental content returned by the user who issued the message content, after a session is established with that user and a message content supplement request is sent to that user.
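Capturing the context messages of a matched message can be as simple as slicing a window around it in the conversation. The sketch below assumes the conversation is available as an ordered list and that two messages on each side count as context; both are illustrative assumptions. The second kind of related content, a supplement requested from the user in a separate session, is not covered here.

```python
from typing import Dict, List, Union


def capture_with_context(messages: List[str], index: int,
                         before: int = 2, after: int = 2
                         ) -> Dict[str, Union[str, List[str]]]:
    """Return the matched message plus its neighbouring context messages."""
    start = max(0, index - before)
    end = min(len(messages), index + after + 1)
    return {
        "matched": messages[index],
        # Neighbours only; the matched message itself is excluded.
        "context": messages[start:index] + messages[index + 1:end],
    }
```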
In the above scheme, the keywords in the feature identification dictionary can comprise keywords reflecting product defects; correspondingly, the feature interpretation information can be information describing product defects.
The information mining device provided by the embodiments of the present invention can perform the information mining method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the invention. Therefore, although the invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from its concept, and its scope is determined by the appended claims.

Claims (18)

1. An information mining method, characterized by comprising:
monitoring messages issued in an instant communication software application;
parsing the monitored messages to obtain message content;
matching the message content against the keywords in a feature identification dictionary set up in advance;
when the match succeeds, capturing the message content, or the message content and related content of the message content, as feature interpretation information, and saving the feature interpretation information.
2. the method for claim 1, is characterized in that, before monitoring the message issued in instant communication software application, also comprises:
After acquisition applies the access rights of corresponding server with described instant communication software, connect with described server;
Joining request to the group's account in described instant communication software application or personal user's account is sent to described server;
The message issued in described monitoring instant communication software application, specifically comprises:
After receiving the response message of adhereing to that described server returns, monitor the message that the user in the group added or the personal user that adds issue.
3. the method for claim 1, is characterized in that, sets up described feature identification dictionary, specifically comprises:
Receive the keyword in the feature identification dictionary of human configuration; Or,
In the chat history of described instant communication software, search the typical statement of manually including, according to the context cooccurrence relation of this typical statement, excavate the keyword of expression individual features and be added in feature identification dictionary.
4. the method for claim 1, is characterized in that, at the described message content of crawl, or the related content of described message content and described message content is as before preserving after feature interpretation information, by described feature interpretation information, also comprises:
Described feature interpretation information is mated with the keyword in the classification identification dictionary set up in advance, determines according to matching result the classification that described feature interpretation information is corresponding; Or, determine by natural language processing NLP model the classification that described feature interpretation information is corresponding; Or, adopt the probability model trained according to the feature interpretation text having marked classification information in advance to determine the classification that described feature interpretation information is corresponding;
Described feature interpretation information is carried out preservation comprise: the classification determined is carried out associating with described feature interpretation information and preserves.
5. method as claimed in claim 4, is characterized in that, determine specifically to comprise the classification that described feature interpretation information is corresponding by natural language processing NLP model:
Adopt Arithmetic of Semantic Similarity model and/or click similarity algorithm model, determining the classification that described feature interpretation information is corresponding.
6. The method of claim 4, characterized in that, after determining the classification corresponding to the feature interpretation information, the method further comprises:
determining the information of the receiving party of the feature interpretation information according to the classification;
sending the feature interpretation information to the receiving party according to the information of the receiving party.
7. The method of claim 6, characterized in that the information of the receiving party is the address of a set website, the SMS number or email address of a set receiving user, or the instant communication software account of a set receiving user.
8. The method of claim 1, characterized in that the related content of the message content comprises: the context messages of the message content; and/or the supplemental content returned by the user who issued the message content, after a session is established with that user and a message content supplement request is sent to that user.
9. The method of any one of claims 1-8, characterized in that the keywords in the feature identification dictionary comprise keywords reflecting product defects, and the feature interpretation information is information describing product defects.
10. An information mining device, characterized by comprising:
a message monitoring module, configured to monitor messages issued in an instant communication software application;
a message resolution module, configured to parse the monitored messages to obtain message content;
a matching module, configured to match the message content against the keywords in a feature identification dictionary set up in advance;
a feature interpretation message processing module, configured to, when the match succeeds, capture the message content, or the message content and related content of the message content, as feature interpretation information, and save the feature interpretation information.
11. The device of claim 10, characterized in that the device further comprises:
a connection establishment module, configured to, before the messages issued in the instant communication software application are monitored, establish a connection with the server corresponding to the instant communication software application after obtaining access rights to that server;
a request sending module, configured to send to the server a request to join a group account or to add a personal user account in the instant communication software application;
wherein the message monitoring module is specifically configured to: after receiving a join-success response message returned by the server, monitor the messages issued by the users in the joined group or by the added personal user.
12. The device of claim 10, characterized in that the device further comprises a feature identification dictionary establishment module, configured to receive manually configured keywords for the feature identification dictionary; or,
configured to search the chat history of the instant communication software for manually collected typical statements, mine keywords expressing the corresponding feature according to the context co-occurrence relations of those typical statements, and add them to the feature identification dictionary.
13. The device of claim 10, characterized in that the device further comprises:
a first classification determination module, configured to, after the message content, or the message content and the related content of the message content, is captured as feature interpretation information and before the feature interpretation information is saved, match the feature interpretation information against the keywords in a classification identification dictionary set up in advance, and determine the classification corresponding to the feature interpretation information according to the matching result; or
a second classification determination module, configured to, after the message content, or the message content and the related content of the message content, is captured as feature interpretation information and before the feature interpretation information is saved, determine the classification corresponding to the feature interpretation information by means of a natural language processing (NLP) model; or
a third classification determination module, configured to, after the message content, or the message content and the related content of the message content, is captured as feature interpretation information and before the feature interpretation information is saved, determine the classification corresponding to the feature interpretation information using a probability model trained in advance on feature interpretation text labeled with classification information;
wherein the feature interpretation message processing module is specifically configured to: associate the determined classification with the feature interpretation information and save them.
14. The device of claim 13, characterized in that the second classification determination module is specifically configured to: determine the classification corresponding to the feature interpretation information using a semantic similarity algorithm model and/or a click similarity algorithm model.
15. The device of claim 13, characterized in that the device further comprises:
a receiving party information determination module, configured to, after the classification corresponding to the feature interpretation information is determined, determine the information of the receiving party of the feature interpretation information according to the classification;
a feature interpretation information sending module, configured to send the feature interpretation information to the receiving party according to the information of the receiving party.
16. The device of claim 15, characterized in that the information of the receiving party is the address of a set website, the SMS number or email address of a set receiving user, or the instant communication software account of a set receiving user.
17. The device of claim 10, characterized in that the related content of the message content comprises: the context messages of the message content; and/or the supplemental content returned by the user who issued the message content, after a session is established with that user and a message content supplement request is sent to that user.
18. The device of any one of claims 10-17, characterized in that the keywords in the feature identification dictionary comprise keywords reflecting product defects, and the feature interpretation information is information describing product defects.
CN201410710424.7A 2014-11-27 2014-11-27 information mining method and device Active CN104346480B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410710424.7A CN104346480B (en) 2014-11-27 2014-11-27 information mining method and device
PCT/CN2015/086095 WO2016082575A1 (en) 2014-11-27 2015-08-05 Information mining method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410710424.7A CN104346480B (en) 2014-11-27 2014-11-27 information mining method and device

Publications (2)

Publication Number Publication Date
CN104346480A true CN104346480A (en) 2015-02-11
CN104346480B CN104346480B (en) 2018-06-26

Family

ID=52502071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410710424.7A Active CN104346480B (en) 2014-11-27 2014-11-27 information mining method and device

Country Status (2)

Country Link
CN (1) CN104346480B (en)
WO (1) WO2016082575A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105282012A (en) * 2015-10-23 2016-01-27 广东小天才科技有限公司 Method and system for enhancing information reminding when group chat is carried out
WO2016082575A1 (en) * 2014-11-27 2016-06-02 百度在线网络技术(北京)有限公司 Information mining method and apparatus, and storage medium
CN106649404A (en) * 2015-11-04 2017-05-10 陈包容 Session scene database creation method and apparatus
CN107491493A (en) * 2017-07-22 2017-12-19 长沙兔子代跑网络科技有限公司 A kind of intelligence obtains the method and device for running chat record in generation
CN107526779A (en) * 2017-07-22 2017-12-29 长沙兔子代跑网络科技有限公司 A kind of method and device for excavating generation race client
CN108345582A (en) * 2017-01-23 2018-07-31 腾讯科技(深圳)有限公司 A kind of method and device that identification social group is done business
CN109063029A (en) * 2018-07-10 2018-12-21 苏奇 A kind of information filing management method based on instant communication software
CN109582719A (en) * 2018-10-19 2019-04-05 国电南瑞科技股份有限公司 A kind of method and system of intelligent substation SCD file AutoLink virtual terminator
CN113765767A (en) * 2020-06-02 2021-12-07 上海回声网络科技有限公司 Enterprise WeChat supervision method and system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11587095B2 (en) * 2019-10-15 2023-02-21 Microsoft Technology Licensing, Llc Semantic sweeping of metadata enriched service data
CN113051476B (en) * 2021-03-25 2023-06-13 北京百度网讯科技有限公司 Method and device for sending message

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020133477A1 (en) * 2001-03-05 2002-09-19 Glenn Abel Method for profile-based notice and broadcast of multimedia content
CN101166160A (en) * 2006-10-20 2008-04-23 阿里巴巴公司 A method and system for filtering instant communication rubbish information
CN102323933A (en) * 2011-08-31 2012-01-18 张潇 Information embedding and interaction system facing real-time communication and method
CN102419778A (en) * 2012-01-09 2012-04-18 中国科学院软件研究所 Information searching method for discovering and clustering sub-topics of query statement
CN102970210A (en) * 2012-11-02 2013-03-13 北京百度网讯科技有限公司 Method and device for reminding group messages in instant chat tool
CN103577416A (en) * 2012-07-20 2014-02-12 阿里巴巴集团控股有限公司 Query expansion method and system
CN103605690A (en) * 2013-11-04 2014-02-26 北京奇虎科技有限公司 Device and method for recognizing advertising messages in instant messaging

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1987852A (en) * 2005-12-21 2007-06-27 腾讯科技(深圳)有限公司 Method and device for determining communication object attribute according to news content
CN104346480B (en) * 2014-11-27 2018-06-26 百度在线网络技术(北京)有限公司 information mining method and device

Also Published As

Publication number Publication date
WO2016082575A1 (en) 2016-06-02
CN104346480B (en) 2018-06-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant