CN106095965A - A kind of data processing method and device - Google Patents

A data processing method and device

Info

Publication number
CN106095965A
Authority
CN
China
Prior art keywords
data
user
interactive log
target problem
log data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610435485.6A
Other languages
Chinese (zh)
Inventor
李广增
张磊
朱频频
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhizhen Intelligent Network Technology Co Ltd
Original Assignee
Shanghai Zhizhen Intelligent Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhizhen Intelligent Network Technology Co Ltd
Priority to CN201610435485.6A
Publication of CN106095965A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention discloses a data processing method and device. The method includes: obtaining real-time user interaction log data; filtering the user interaction log data in real time according to a configured analysis and filtering policy, to obtain a target question; crawling, from a designated website, the result data corresponding to the target question; and extending the knowledge points of a knowledge base based on the target question and its corresponding result data. By obtaining and analyzing user interaction logs in real time, the method improves the timeliness of data processing. Because data collection, analysis, and result crawling require no manual involvement, it also improves data processing efficiency, which in turn improves the maintenance efficiency of the knowledge base of an intelligent question answering system and improves the user experience.

Description

A data processing method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a data processing method and device.
Background art
A knowledge base, also called an intelligent database or artificial intelligence database, is a structured, easy-to-operate, easy-to-use, comprehensive, and organized collection of knowledge in knowledge engineering. It is a set of interrelated knowledge pieces that are stored, organized, managed, and used in computer memory using one or more knowledge representation schemes, in order to solve problems in one or more domains. These knowledge pieces include theoretical knowledge related to the domain, factual data, and heuristic knowledge derived from expert experience, such as the definitions, theorems, and algorithms relevant to a domain, as well as common-sense knowledge.
Knowledge bases are widely used; a typical application is an intelligent question answering system, also called an automatic question answering system. Such a system contains a knowledge base that holds a large number of questions and the answer corresponding to each question. The system first has to recognize the question the user asked, that is, find the question in the knowledge base that corresponds to the user question, and then find the answer that matches it. Whether the knowledge base can output an accurate or reasonable answer to a user question is therefore an important indicator of the performance of an intelligent question answering system. To safeguard that performance, a mechanism is needed that determines whether the system gave an accurate or reasonable answer and, for questions that were answered with low quality, redetermines the answer and updates the knowledge base.
At present this is generally done by batch processing combined with manual work. Specifically, a daily batch job analyzes the log data of the previous day, finds the questions that were answered poorly or could not be answered, and adds them to a database. Knowledge engineers then search for these questions manually with a search engine and add the corresponding standard questions and answers.
This mechanism has two shortcomings. First, its timeliness is poor: questions that receive low-quality answers online cannot be analyzed in real time. Second, it relies on manual search, so processing efficiency is low.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a data processing method and device that solve, or at least partly solve, those problems.
According to one aspect of the present invention, a data processing method is provided, including:
obtaining real-time user interaction log data;
filtering the user interaction log data in real time according to a configured analysis and filtering policy, to obtain a target question;
crawling, from a designated website, the result data corresponding to the target question;
extending the knowledge points of a knowledge base based on the target question and its corresponding result data.
Optionally, obtaining the real-time user interaction log data specifically includes:
deploying a log collection agent node on each server that stores user interaction logs, and obtaining the user interaction log data that the agent nodes collect and report in real time.
Optionally, after the target question is obtained, the method further includes: performing word segmentation on the target question to obtain multiple target words;
crawling the result data corresponding to the target question then includes crawling the result data corresponding to at least some of the target words.
Optionally, the method further includes:
after the real-time user interaction log data is obtained, storing it in a first message buffer queue and, by subscribing to a log topic, extracting the user interaction log data from the first message buffer queue for real-time filtering;
sending the target question obtained by filtering to a second message buffer queue as a pending question and, by subscribing to a pending-question topic, extracting questions from the second message buffer queue for result data crawling.
Optionally, the analysis and filtering policy includes one of the following strategies or a combination of several of them:
Strategy 1: select target questions from the user questions in the user interaction log data according to a configured answer type;
Strategy 2: select target questions from the user questions in the user interaction log data according to configured keywords;
Strategy 3: select target questions from the user questions in the user interaction log data according to the semantic similarity between the answer content and the question;
Strategy 4: select target questions from the user questions in the user interaction log data according to sentiment information obtained by analyzing the user interaction log data.
Optionally, extending the knowledge points of the knowledge base based on the target question and its corresponding result data includes: storing the target question and its corresponding result data in the knowledge base as a new knowledge point.
Optionally, extending the knowledge points of the knowledge base based on the target question and its corresponding result data includes:
storing the target question and its corresponding result data in a relational database and, after the result data in the relational database passes review and verification, storing it in the knowledge base as a new knowledge point.
Optionally, the Flume log collection system is used to obtain the real-time user interaction log data;
Spark Streaming is used to filter the user interaction log data in real time.
According to another aspect of the present invention, a data processing device is provided, including:
a data acquisition module, configured to obtain real-time user interaction log data;
a data processing module, configured to filter the user interaction log data in real time according to a configured analysis and filtering policy, to obtain a target question;
a result crawling module, configured to crawl, from a designated website, the result data corresponding to the target question;
a management module, configured to extend the knowledge points of a knowledge base based on the target question and its corresponding result data.
Optionally, the data acquisition module is specifically configured to obtain the user interaction log data that log collection agent nodes collect and report in real time, where a log collection agent node is deployed on each server that stores user interaction logs.
Optionally, the data processing module is further configured to perform word segmentation on the target question after it is obtained, to obtain multiple target words;
when crawling the result data corresponding to the target question, the result crawling module also crawls the result data corresponding to at least some of the target words.
Optionally, the data acquisition module is further configured to store the real-time user interaction log data in a first message buffer queue after obtaining it;
the data processing module is further configured to extract the user interaction log data from the first message buffer queue by subscribing to a log topic, for real-time filtering, and to send the target question obtained by filtering to a second message buffer queue as a pending question;
the result crawling module is further configured to extract questions from the second message buffer queue by subscribing to a pending-question topic, for result data crawling.
Optionally, the analysis and filtering policy applied by the data processing module includes one of the following strategies or a combination of several of them:
Strategy 1: select target questions from the user questions in the user interaction log data according to a configured answer type;
Strategy 2: select target questions from the user questions in the user interaction log data according to configured keywords;
Strategy 3: select target questions from the user questions in the user interaction log data according to the semantic similarity between the answer content and the question;
Strategy 4: select target questions from the user questions in the user interaction log data according to sentiment information obtained by analyzing the user interaction log data.
Optionally, the management module is specifically configured to store the target question and its corresponding result data in the knowledge base as a new knowledge point.
Optionally, the management module is specifically configured to store the target question and its corresponding result data in a relational database and, after the result data in the relational database passes review and verification, to store it in the knowledge base as a new knowledge point.
Optionally, the data acquisition module is specifically configured to use the Flume log collection system to obtain the real-time user interaction log data;
the data processing module is specifically configured to use Spark Streaming to filter the user interaction log data in real time.
Compared with the prior art, the beneficial effects of the present invention are as follows:
First, by obtaining and analyzing user interaction logs in real time, the invention improves the timeliness of data processing.
Second, the whole process of data collection, analysis, and result crawling requires no manual involvement, which improves data processing efficiency.
Third, the data processing scheme of the invention improves the maintenance efficiency of the knowledge base of an intelligent question answering system, so that users can be given more accurate answers and the user experience improves.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the detailed description of the preferred embodiments below. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 is a flow chart of a data processing method provided by an embodiment of the present invention;
Fig. 2 is an architecture diagram of the system to which the data processing method provided by an embodiment of the present invention is applied;
Fig. 3 is a schematic diagram of the Flume log collection system in an embodiment of the present invention;
Fig. 4 is a structural block diagram of a data processing device provided by an embodiment of the present invention.
Detailed description of the invention
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope can be conveyed fully to those skilled in the art.
An intelligent question answering system produces a large amount of user interaction log data in use. Each log entry includes the question the user asked and the answer the system gave to that question. The data processing method provided by this embodiment uses real-time analysis to find the questions in the user logs that were answered poorly or could not be answered, uses a crawler to fetch related questions and answers from the Internet, and then adds the crawled answers to the knowledge base to improve the user experience.
Specifically, the data processing method provided by this embodiment, as shown in Fig. 1, includes the following steps:
Step S101: obtain real-time user interaction log data.
In one specific embodiment of the present invention, the real-time user interaction log data is obtained as follows:
1) a log collection agent node is deployed on each server that stores user interaction logs;
2) the user interaction log data that each log collection agent node collects and reports in real time is obtained.
The log collection agent node is preferably based on Flume; that is, the invention uses the Flume log collection system to obtain the real-time user interaction log data.
It should be noted that the log acquisition approach described above is only one of many. The invention is not limited to this approach, and any other way of obtaining log data in real time that those skilled in the art can readily conceive falls within the scope of protection of the invention.
In another embodiment of the present invention, the obtained real-time user interaction log data is stored in a first message buffer queue, which exposes an interface for subscribing to data by topic.
Step S102: filter the user interaction log data in real time according to the configured analysis and filtering policy, to obtain a target question.
When the user interaction log data is stored in the first message buffer queue, this step extracts the user interaction log data from the first message buffer queue by subscribing to the log topic, and then filters it in real time.
In one specific embodiment of the present invention, Spark Streaming is used to filter the user interaction log data in real time according to the configured analysis and filtering policy.
In yet another embodiment of the present invention, the configured analysis and filtering policy includes one of the following strategies or a combination of several of them:
Strategy 1: select target questions from the user questions in the user interaction log data according to a configured answer type. For example, if the answer type recorded for a user question in the user interaction log is "not answered", "suggested questions given", or "counter-question asked", the user question corresponding to that answer is selected.
Strategy 2: select target questions from the user questions in the user interaction log data according to configured keywords. For example, if the answer to a user question in the user interaction log contains keywords such as "cannot answer" or "description is unclear", the user question whose answer contains these keywords is selected.
Strategy 3: select target questions from the user questions in the user interaction log data according to the semantic similarity between the answer content and the question. Specifically, when the semantic similarity between the answer content and the question is below a configured threshold, the user question is selected.
Strategy 4: select target questions from the user questions in the user interaction log data according to sentiment information obtained by analyzing the user interaction log data. For example, if the user uses words with a high negative sentiment value in the dialogue, the user is dissatisfied with the answer, and the corresponding question can be selected.
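The four strategies can be combined as independent predicates over a log record, where a question is selected if any predicate fires. The following is a minimal sketch under stated assumptions, not the patent's implementation: the `Interaction` field names, the answer type labels, the toy sentiment lexicon, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical record layout; the patent does not name the log fields.
@dataclass
class Interaction:
    question: str      # what the user asked
    answer: str        # what the system replied
    answer_type: str   # assumed labels, e.g. "normal", "not_answered"
    similarity: float  # semantic similarity of answer and question, 0..1
    user_text: str     # the user's wording in the dialogue

# All constants below are illustrative assumptions.
FLAGGED_TYPES = {"not_answered", "suggested_questions", "counter_question"}
FLAGGED_KEYWORDS = ("cannot answer", "description is unclear")
NEGATIVE_WORDS = {"useless": 0.9, "wrong": 0.7, "terrible": 0.9}
SIMILARITY_THRESHOLD = 0.5
NEGATIVE_THRESHOLD = 0.8

def by_answer_type(r):   # Strategy 1
    return r.answer_type in FLAGGED_TYPES

def by_keyword(r):       # Strategy 2
    return any(k in r.answer.lower() for k in FLAGGED_KEYWORDS)

def by_similarity(r):    # Strategy 3
    return r.similarity < SIMILARITY_THRESHOLD

def by_sentiment(r):     # Strategy 4
    words = r.user_text.lower().split()
    return any(NEGATIVE_WORDS.get(w, 0.0) >= NEGATIVE_THRESHOLD for w in words)

STRATEGIES = (by_answer_type, by_keyword, by_similarity, by_sentiment)

def select_target_questions(records, strategies=STRATEGIES):
    """Return the questions flagged by at least one enabled strategy."""
    return [r.question for r in records if any(s(r) for s in strategies)]
```

Passing a subset such as `strategies=(by_keyword,)` corresponds to using a single strategy rather than a combination.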
In a preferred embodiment of the present invention, after the target question is obtained by filtering, the method further includes: performing word segmentation on the target question to obtain multiple target words.
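To illustrate what the segmentation step produces, the sketch below swaps in a much simpler technique than a real word segmenter: it lowercases an English question, splits it into alphanumeric tokens, and drops a few stop words. Chinese text would need a dedicated segmenter, which the patent does not name; the `STOP_WORDS` set and the `segment` function are assumptions for illustration only.

```python
import re

# Toy stop-word list (assumption); a production system would use a
# proper segmenter and lexicon for the language of the questions.
STOP_WORDS = {"how", "to", "the", "a", "an", "do", "i", "is", "what"}

def segment(question):
    """Split a question into target words, dropping stop words."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For a question such as "How to install the fingerprint driver software", this yields the content words that can later be crawled alongside the full question.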
In an optional embodiment of the present invention:
If word segmentation is not performed on the target question, then after the target question is obtained by filtering, the method further includes: sending the target question to a second message buffer queue as a pending question, where the second message buffer queue exposes an interface for subscribing to data by topic.
If word segmentation has been performed on the target question, then after the multiple target words are obtained, the method further includes: sending the target question and at least some of the target words to the second message buffer queue as a pending question, where the second message buffer queue exposes an interface for subscribing to data by topic.
The second message buffer queue and the first message buffer queue may be the same queue or different queues.
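The hand-off between the two queues can be pictured with an in-memory stand-in for a topic-based message buffer. This is a sketch only: `TopicQueue`, the topic names, and the message shape are hypothetical, and a real deployment would use a message broker rather than this class. Using one `TopicQueue` with two topics corresponds to the case where the first and second queues are the same.

```python
from collections import defaultdict, deque

class TopicQueue:
    """In-memory stand-in for a message buffer queue with topics."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def poll(self, topic):
        """Return and remove every message buffered for a topic."""
        buffered = self._topics[topic]
        messages = list(buffered)
        buffered.clear()
        return messages

LOG_TOPIC = "user-logs"              # hypothetical topic names
PENDING_TOPIC = "pending-questions"

def filter_stage(queue, is_target, segment=None):
    """Read log records, keep target questions, publish them as pending."""
    for record in queue.poll(LOG_TOPIC):
        if not is_target(record):
            continue
        message = {"question": record["question"]}
        if segment is not None:      # segmentation is optional
            message["target_words"] = segment(record["question"])
        queue.publish(PENDING_TOPIC, message)
```

A crawler stage would then call `queue.poll(PENDING_TOPIC)` to pick up the pending questions.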
Step S103: crawl, from a designated website, the result data corresponding to the target question.
In the embodiment of the present invention, when the target question, or the target question and at least some of the target words, is stored in the second message buffer queue, this step extracts questions from the second message buffer queue by subscribing to the pending-question topic, and then crawls the result data for them.
Further, in the embodiment of the present invention, when step S102 does not perform word segmentation on the target question, the crawled result data corresponding to the target question includes: the result data obtained by crawling the target question directly.
When step S102 performs word segmentation on the target question, the crawled result data corresponding to the target question includes: the result data obtained by crawling the target question directly, and the result data corresponding to at least some of the target words.
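The two cases differ only in which search terms are crawled. A minimal sketch, assuming a caller-supplied `fetch(query)` function that returns a list of result strings; `build_crawl_queries`, `crawl`, and the `max_words` parameter are hypothetical names, and no real website API is implied.

```python
def build_crawl_queries(question, target_words=None, max_words=None):
    """Return the search terms to crawl for one pending question."""
    queries = [question]              # the question itself is always crawled
    words = list(target_words or [])
    if max_words is not None:         # "at least some" of the target words
        words = words[:max_words]
    for word in words:
        if word not in queries:       # avoid crawling the same term twice
            queries.append(word)
    return queries

def crawl(question, fetch, target_words=None, max_words=None):
    """Map each search term to the results returned by fetch(term)."""
    terms = build_crawl_queries(question, target_words, max_words)
    return {term: fetch(term) for term in terms}
```

Results crawled for the target words can be kept as alternative candidate answers for the same pending question.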
Further, in the embodiment of the present invention, the designated website may be a website whose data is relatively credible, for example Baidu Baike (Baidu Encyclopedia).
Step S104: extend the knowledge points of the knowledge base based on the target question and its corresponding result data.
In the embodiment of the present invention, one implementation of this step is to store the target question and its corresponding result data in the knowledge base as a new knowledge point.
Another implementation of this step is to store the target question and its corresponding result data in a relational database and, after the result data in the relational database passes review and verification, to store it in the knowledge base as a new knowledge point. The review and verification may be performed manually or by a machine running a specific algorithm.
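The second implementation is a stage-review-promote flow. The sketch below uses Python's built-in `sqlite3` module as a stand-in relational database; the `candidates` table, its `status` values, and the function names are assumptions for illustration, and the knowledge base is represented by a plain list.

```python
import sqlite3

def open_staging(path=":memory:"):
    """Open the staging database and create the candidate table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS candidates (
               id INTEGER PRIMARY KEY,
               question TEXT NOT NULL,
               result TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending')"""
    )
    return db

def stage(db, question, result):
    """Store a crawled question/result pair for later review."""
    cur = db.execute(
        "INSERT INTO candidates (question, result) VALUES (?, ?)",
        (question, result),
    )
    db.commit()
    return cur.lastrowid

def review(db, candidate_id, approved):
    """Record the outcome of manual or automated verification."""
    status = "approved" if approved else "rejected"
    db.execute("UPDATE candidates SET status = ? WHERE id = ?",
               (status, candidate_id))
    db.commit()

def promote_approved(db, knowledge_base):
    """Copy approved candidates into the knowledge base as new points."""
    rows = db.execute(
        "SELECT id, question, result FROM candidates "
        "WHERE status = 'approved'"
    ).fetchall()
    for cid, question, result in rows:
        knowledge_base.append({"standard_question": question,
                               "answer": result})
        db.execute("UPDATE candidates SET status = 'stored' WHERE id = ?",
                   (cid,))
    db.commit()
    return len(rows)
```

Marking promoted rows as `stored` keeps a repeated call from inserting the same knowledge point twice.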
The knowledge base includes multiple knowledge points, and each knowledge point includes a standard question, multiple extended questions, and an answer. The result data is the source of the answer, and the target question serves as the standard question or as an extended question. In addition, further extended questions can be generated from the target question.
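The knowledge point structure described above maps naturally onto a small data type. This is an illustrative sketch only; the class and method names are hypothetical, and matching is by exact string equality, whereas a real question answering system would match questions semantically.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgePoint:
    standard_question: str
    answer: str                      # derived from the crawled result data
    extended_questions: list = field(default_factory=list)

class KnowledgeBase:
    def __init__(self):
        self.points = []

    def find(self, question):
        """Return the knowledge point that contains the question, if any."""
        for point in self.points:
            if (question == point.standard_question
                    or question in point.extended_questions):
                return point
        return None

    def add_point(self, target_question, result_data):
        """Use the target question as the standard question of a new point."""
        point = KnowledgePoint(target_question, result_data)
        self.points.append(point)
        return point

    def add_extended_question(self, standard_question, target_question):
        """Attach the target question to an existing point as an extension."""
        point = self.find(standard_question)
        if point is None:
            raise KeyError(standard_question)
        if self.find(target_question) is not point:
            point.extended_questions.append(target_question)
        return point
```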
In summary, the data processing method of this embodiment improves the timeliness of data processing by obtaining and analyzing user interaction logs in real time. The whole process of data collection, analysis, and result crawling requires no manual involvement, which improves data processing efficiency.
The data processing scheme of this embodiment improves the maintenance efficiency of the knowledge base of an intelligent question answering system, so that users can be given more accurate answers and the user experience improves.
A specific embodiment of the present invention is given below to explain the implementation process of the invention more clearly.
The data processing method of this embodiment uses the Flume log collection system to collect the interaction logs of online users in real time. The Spark Streaming real-time computing engine analyzes and filters the real-time user interaction log stream collected by Flume to obtain pending questions. A crawler then fetches the corresponding question and answer information for each pending question and stores it in a database, where knowledge base maintenance personnel can edit, review, and verify it, so that the user experience is improved promptly.
Fig. 2 shows the architecture of the system to which the data processing method of this embodiment is applied. The data processing method of the present invention is described in detail below on the basis of this architecture, and includes the following steps:
Step 1: Apache Flume collects the logs, gathering the user interaction logs scattered across the servers into an Apache Kafka message queue.
The Flume log collection mechanism is explained below with reference to the drawings. Fig. 3 shows the deployment structure of the Flume log collection system.
The basic unit of data that Flume transmits is the Event; for text, an Event is usually one line of a record. The Event is also the basic unit of a transaction (Transaction).
The core of a Flume program is the Agent. An Agent is a complete data collection tool made up of three components: Source, Channel, and Sink.
An Event represents the smallest complete unit of the data flow. It is essentially a block of bytes and can carry Headers (message header) information. An Event flows from the Source to the Channel, and then from the Channel to the Sink.
The Source collects the log data, packages it as Events within Transactions, and caches them in the Channel. The Channel acts as a buffer queue that caches the data sent by the Source. The Sink takes the data out of the Channel and stores it in the Apache Kafka message queue.
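The Source, Channel, and Sink roles can be modelled in a few lines. This is a toy model of the data flow only, not the Flume API: the classes and functions below are hypothetical, transactions are omitted, and `deliver` stands in for whatever writes to the downstream message queue.

```python
from collections import deque

class Event:
    """Smallest unit of the flow: a byte payload plus optional headers."""

    def __init__(self, body, headers=None):
        self.body = body
        self.headers = dict(headers or {})

class Channel:
    """Buffer queue between a source and a sink."""

    def __init__(self):
        self._buffer = deque()

    def put(self, event):
        self._buffer.append(event)

    def take(self):
        """Return the oldest buffered event, or None if empty."""
        return self._buffer.popleft() if self._buffer else None

def source(lines, channel, host):
    """Wrap each log line in an Event and cache it in the channel."""
    for line in lines:
        channel.put(Event(line.encode("utf-8"), {"host": host}))

def sink(channel, deliver):
    """Drain the channel, handing each event to the delivery function."""
    count = 0
    while (event := channel.take()) is not None:
        deliver(event)
        count += 1
    return count
```

Because the channel decouples the two ends, the source can keep accepting log lines while the sink is temporarily slower.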
Step 2: Spark Streaming subscribes to the log topic in the Apache Kafka message queue (in Kafka, a topic is called a Topic), analyzes the user interaction log stream in real time, and filters it to obtain target questions. It then performs word segmentation on each target question and sends the target question together with its segmentation result to the pending-question topic in the Apache Kafka message queue.
Spark Streaming analyzes the stream and obtains the target questions as follows:
Spark Streaming is a real-time computing framework built on Spark that extends Spark's ability to process large-scale streaming data. The data can come from sources such as Kafka, Flume, HDFS, and TCP sockets, and can be processed with complex algorithms built from higher-order functions (map, reduce, join, window, and so on). The processed data is finally sent to a file system or a database for storage.
Internally, Spark Streaming receives the real-time input data stream and splits it into batches of a fixed time interval (DStream). It then hands the batches to the Spark task execution engine, which produces the final processing result for each batch.
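The idea of cutting a stream into fixed-interval batches can be shown without Spark. The sketch below is plain Python, not the Spark Streaming API: `micro_batches` is a hypothetical helper that groups timestamped records by interval, and each resulting batch would then be processed as a unit.

```python
from collections import defaultdict

def micro_batches(records, interval_seconds):
    """Group (timestamp, payload) pairs into fixed-interval batches.

    Batches are returned in time order; intervals with no records
    produce no batch.
    """
    buckets = defaultdict(list)
    for timestamp, payload in records:
        buckets[int(timestamp // interval_seconds)].append(payload)
    return [buckets[key] for key in sorted(buckets)]
```

With a 5-second interval, records at 0.5 s, 1.2 s, and 4.9 s fall into the first batch, and a record at exactly 5.0 s starts the second.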
In this embodiment, Spark Streaming obtains the current user interaction log records from the Apache Kafka message queue, for example:
Record 1
2016-05-08 19:47:56 | ... | brightness cannot be adjusted | 8 | if the brightness cannot be adjusted on the console display and cannot be adjusted with the physical buttons ... | | 1 | brightness cannot be adjusted | 1.0 | ...
Record 2
2016-05-08 19:47:56 | ... | how to install the fingerprint driver software | 8 | We are very sorry, I am still learning about the question you raised and cannot answer it yet ... | 1 | how to install the fingerprint driver software | 1.0 | ...
……
Spark Streaming splits the data stream into batches of a fixed time interval (DStream) and hands them to the Spark task execution engine for target question filtering.
The analysis and filtering policy that the Spark task execution engine uses for target question filtering includes, but is not limited to, the following: 1. user questions in the user interaction log whose answer type is 0 (not answered) or 11 (suggested questions given); 2. answers that contain keywords such as "cannot answer" or "description is unclear"; 3. log entries in which the semantic similarity field for the question and the answer is below the configured threshold; 4. dialogues that contain words with a high negative sentiment value.
The target question in the user interaction log is obtained (Record 2 above). The target question and the segmentation result obtained by performing word segmentation on it are sent to the pending-question topic of the Apache Kafka message queue, so that the crawler program can fetch answers.
Step 3: by subscribing to the pending-question topic in Apache Kafka, the crawler system crawls the question and answer information related to each pending question and saves the results for knowledge base maintenance personnel to review and verify.
Specifically, the crawler program obtains a pending question (for example: how to install the fingerprint driver software) and the corresponding segmentation result (for example: install, fingerprint, driver, software) from the Apache Kafka message queue. It then crawls the corresponding answers from the designated list of websites (Baidu Baike, Baidu Zhidao) and saves them in a MySQL database. Answers crawled for the segmented words are likewise kept as alternative answers to the pending question.
Step 4: the records in the MySQL database that knowledge base maintenance personnel have reviewed and verified are added to the knowledge base. From then on, when a user asks a similar question again, the correct answer can be given.
The knowledge base also exposes an editing function, so that knowledge base maintenance personnel can edit the correct question and answer before adding them to the knowledge base.
Further, an embodiment of the present invention also provides a data processing device. As shown in Fig. 4, the device specifically includes:
a data acquisition module 410, configured to obtain real-time user interaction log data;
a data processing module 420, configured to filter the user interaction log data in real time according to a configured analysis and filtering policy, to obtain a target question;
a result crawling module 430, configured to crawl, from a designated website, the result data corresponding to the target question;
a management module 440, configured to extend the knowledge points of the knowledge base based on the target question and its corresponding result data.
Based on said structure framework and enforcement principle, several concrete and be preferable to carry out side under the above constitution is given below Formula, in order to refine and to optimize the function of device of the present invention, so that the enforcement of the present invention program is more convenient, accurately.Specifically relate to And following content:
In the embodiment of the present invention, data acquisition module 410, specifically for obtaining log collection agent node real-time collecting also The user's interactive log data reported;Wherein, described log collection agent node is arranged on storage each of user's interactive log In server.
Further, in this embodiment, the data processing module 420 is also configured to perform word segmentation on the target problem after it is obtained, yielding multiple target words;
In this case, when the result handling module 430 captures the result data corresponding to the target problem, this includes capturing the result data corresponding to at least some of the target words.
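A rough sketch of this step is shown below. The regular-expression tokenizer is only a stand-in for a real word segmenter (Chinese text would need a dedicated one), and the stop-word list, the function names, and the `q` query parameter are all illustrative assumptions.

```python
import re
from typing import List
from urllib.parse import urlencode

STOP_WORDS = {"what", "is", "the", "a", "an", "how", "to", "of"}  # illustrative only


def segment(question: str) -> List[str]:
    """Stand-in word segmentation: lower-case alphanumeric tokens, order preserved."""
    return re.findall(r"[a-z0-9]+", question.lower())


def select_target_words(words: List[str]) -> List[str]:
    """Keep a subset of the target words by dropping stop words and duplicates."""
    seen, selected = set(), []
    for w in words:
        if w not in STOP_WORDS and w not in seen:
            seen.add(w)
            selected.append(w)
    return selected


def build_query_url(base_url: str, words: List[str]) -> str:
    """Build a search URL for the designated website from the selected words."""
    return base_url + "?" + urlencode({"q": " ".join(words)})
```

Querying with only the informative words corresponds to capturing result data for at least some of the target words rather than for the full question text.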
Further, in this embodiment, the data acquisition module 410 is also configured to store the real-time user interaction log data in a first message buffer queue after obtaining it;
The data processing module 420 is also configured to extract user interaction log data from the first message buffer queue by subscribing to a log topic, so as to filter it in real time, and to send the target problem obtained by filtering to a second message buffer queue as a pending problem;
The result handling module 430 is also configured to extract problems from the second message buffer queue by subscribing to a pending-problem topic, so as to capture result data for those problems.
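The two-queue arrangement decouples the three modules: each one only reads from or writes to a topic. The sketch below mimics this with a small in-memory broker. `TopicBroker`, the topic names, and the `fetch` callback are hypothetical; a real deployment would use a message queue product, which the text above does not name.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional


class TopicBroker:
    """Tiny in-memory stand-in for a message buffer queue with named topics."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[str]] = defaultdict(deque)

    def publish(self, topic: str, message: str) -> None:
        self._topics[topic].append(message)

    def poll(self, topic: str) -> Optional[str]:
        """Return the oldest unread message on the topic, or None if it is empty."""
        queue = self._topics[topic]
        return queue.popleft() if queue else None


LOG_TOPIC = "interaction-log"       # hypothetical topic names
PENDING_TOPIC = "pending-problems"


def filter_stage(first: TopicBroker, second: TopicBroker,
                 is_target: Callable[[str], bool]) -> None:
    """Data processing: consume the log topic and publish target problems."""
    while (message := first.poll(LOG_TOPIC)) is not None:
        if is_target(message):
            second.publish(PENDING_TOPIC, message)


def capture_stage(second: TopicBroker, fetch: Callable[[str], str]) -> List[tuple]:
    """Result handling: consume pending problems and fetch their result data."""
    results = []
    while (problem := second.poll(PENDING_TOPIC)) is not None:
        results.append((problem, fetch(problem)))
    return results
```

Because the stages share nothing but the queues, a slow capture stage does not block log intake; messages simply accumulate on the pending-problem topic.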
Further, in this embodiment, the analysis and filtering policy applied by the data processing module 420 includes one of the following strategies or a combination of several of them:
Strategy 1: select the target problem from the user questions in the user interaction log data according to a set answer type;
Strategy 2: select the target problem from the user questions in the user interaction log data according to set keywords;
Strategy 3: select the target problem from the user questions in the user interaction log data according to the semantic similarity between the answer content and the question;
Strategy 4: select the target problem from the user questions in the user interaction log data according to sentiment information obtained by analyzing the user interaction log data.
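The four strategies can be modeled as interchangeable predicates, as in the sketch below. Several points are assumptions of this example and are not stated in the text: the record fields, the use of word overlap (Jaccard) as a crude proxy for semantic similarity, the choice that low similarity and negative sentiment mark a question as a target, and the rule that a question is selected when any one policy matches.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Interaction:
    question: str
    answer: str
    answer_type: str  # e.g. "default" when the system fell back to a canned reply
    sentiment: float  # assumed score in [-1, 1] from an upstream sentiment analysis


Policy = Callable[[Interaction], bool]


def by_answer_type(types: Iterable[str]) -> Policy:
    wanted = set(types)
    return lambda it: it.answer_type in wanted


def by_keyword(keywords: Iterable[str]) -> Policy:
    kws = [k.lower() for k in keywords]
    return lambda it: any(k in it.question.lower() for k in kws)


def by_low_similarity(threshold: float) -> Policy:
    def jaccard(a: str, b: str) -> float:
        # Word-overlap ratio, used here as a crude proxy for semantic similarity.
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    return lambda it: jaccard(it.question, it.answer) < threshold


def by_negative_sentiment(threshold: float) -> Policy:
    return lambda it: it.sentiment < threshold


def select_targets(items: Iterable[Interaction], policies: List[Policy]) -> List[str]:
    """Return questions matched by at least one policy (an assumed combination rule)."""
    return [it.question for it in items if any(p(it) for p in policies)]
```

Passing a single policy corresponds to using one strategy alone; passing several corresponds to a combination of strategies.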
Further, in this embodiment, the management module 440 is specifically configured to store the target problem and its corresponding result data in the knowledge base as a new knowledge point.
Alternatively, the management module 440 is specifically configured to store the target problem and its corresponding result data in a relational database and, once the result data in the relational database has passed review and verification, to store it in the knowledge base as a new knowledge point.
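The second variant (stage first, add after review) can be sketched as follows. SQLite is used here only so that the example is self-contained; the table name `candidate`, its columns, the status values, and the dictionary standing in for the knowledge base are assumptions of this sketch.

```python
import sqlite3
from typing import Dict


def create_staging(conn: sqlite3.Connection) -> None:
    """Create the staging table that holds captured results awaiting review."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS candidate ("
        " id INTEGER PRIMARY KEY, question TEXT NOT NULL,"
        " result TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'pending')"
    )


def stage(conn: sqlite3.Connection, question: str, result: str) -> int:
    """Store a captured (problem, result) pair for later review; return its row id."""
    cur = conn.execute(
        "INSERT INTO candidate (question, result) VALUES (?, ?)", (question, result)
    )
    conn.commit()
    return cur.lastrowid


def review(conn: sqlite3.Connection, row_id: int, approved: bool) -> None:
    """Record the reviewer's decision for one staged row."""
    status = "approved" if approved else "rejected"
    conn.execute("UPDATE candidate SET status = ? WHERE id = ?", (status, row_id))
    conn.commit()


def promote(conn: sqlite3.Connection, knowledge_base: Dict[str, str]) -> int:
    """Copy approved rows into the knowledge base as new knowledge points."""
    rows = conn.execute(
        "SELECT id, question, result FROM candidate WHERE status = 'approved'"
    ).fetchall()
    for row_id, question, result in rows:
        knowledge_base[question] = result
        conn.execute("UPDATE candidate SET status = 'added' WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)
```

Marking promoted rows as `added` keeps `promote` idempotent, so it can run on a schedule without inserting the same knowledge point twice.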
Preferably, in this embodiment, the data acquisition module 410 uses the Flume log collection system to obtain the real-time user interaction log data.
The data processing module 420 uses Spark Streaming to filter the user interaction log data in real time.
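Spark Streaming processes a stream as a sequence of small batches. The plain-Python sketch below imitates that micro-batch style of filtering without using Spark itself; the function names are hypothetical, and batches here are cut by record count for simplicity, whereas Spark Streaming cuts them by time interval.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut a (possibly unbounded) stream into consecutive fixed-size batches."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch


def filter_stream(stream: Iterable[str], is_target: Callable[[str], bool],
                  batch_size: int = 2) -> Iterator[List[str]]:
    """Filter each batch as it arrives and emit the surviving records."""
    for batch in micro_batches(stream, batch_size):
        yield [record for record in batch if is_target(record)]
```

Because both functions are generators, each batch is filtered as soon as it is complete instead of waiting for the whole stream to end.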
In summary, the device described in this embodiment obtains user interaction logs in real time and analyzes them in real time, which improves the timeliness of data processing; the entire process of data acquisition, analysis and result data capture requires no manual participation, which improves data processing efficiency.
The data processing scheme described in this embodiment improves the maintenance efficiency of the knowledge base of an intelligent question answering system, so that more accurate answers can be provided to users and the user experience is improved.
Each embodiment in this specification is described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The device embodiment in particular is described relatively briefly because it is essentially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment.
A person of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include ROM, RAM, a magnetic disk, an optical disc, and the like.
In short, the foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (16)

1. A data processing method, characterized by comprising:
Obtaining real-time user interaction log data;
Filtering the user interaction log data in real time according to a set analysis and filtering policy, so as to obtain a target problem;
Capturing, from a designated website, the result data corresponding to the target problem;
Extending the knowledge points of a knowledge base based on the target problem and its corresponding result data.
2. The method of claim 1, characterized in that obtaining the real-time user interaction log data specifically comprises:
Deploying a log collection agent node on each server that stores user interaction logs, and obtaining the user interaction log data that the agent nodes collect and report in real time.
3. The method of claim 1 or 2, characterized in that, after the target problem is obtained, the method further comprises: performing word segmentation on the target problem to obtain multiple target words;
Capturing the result data corresponding to the target problem comprises capturing the result data corresponding to at least some of the target words.
4. The method of claim 1, characterized by further comprising:
After obtaining the real-time user interaction log data, storing it in a first message buffer queue, and extracting user interaction log data from the first message buffer queue by subscribing to a log topic, so as to filter it in real time;
Sending the target problem obtained by filtering to a second message buffer queue as a pending problem, and extracting problems from the second message buffer queue by subscribing to a pending-problem topic, so as to capture result data for those problems.
5. The method of claim 1, 2 or 4, characterized in that the analysis and filtering policy includes one of the following strategies or a combination of several of them:
Strategy 1: select the target problem from the user questions in the user interaction log data according to a set answer type;
Strategy 2: select the target problem from the user questions in the user interaction log data according to set keywords;
Strategy 3: select the target problem from the user questions in the user interaction log data according to the semantic similarity between the answer content and the question;
Strategy 4: select the target problem from the user questions in the user interaction log data according to sentiment information obtained by analyzing the user interaction log data.
6. The method of claim 1, characterized in that extending the knowledge points of the knowledge base based on the target problem and its corresponding result data comprises: storing the target problem and its corresponding result data in the knowledge base as a new knowledge point.
7. The method of claim 1, characterized in that extending the knowledge points of the knowledge base based on the target problem and its corresponding result data comprises:
Storing the target problem and its corresponding result data in a relational database and, once the result data in the relational database has passed review and verification, storing it in the knowledge base as a new knowledge point.
8. The method of claim 1, characterized in that:
The Flume log collection system is used to obtain the real-time user interaction log data;
Spark Streaming is used to filter the user interaction log data in real time.
9. A data processing device, characterized by comprising:
A data acquisition module, configured to obtain real-time user interaction log data;
A data processing module, configured to filter the user interaction log data in real time according to a set analysis and filtering policy, so as to obtain a target problem;
A result handling module, configured to capture, from a designated website, the result data corresponding to the target problem;
A management module, configured to extend the knowledge points of a knowledge base based on the target problem and its corresponding result data.
10. The device of claim 9, characterized in that:
The data acquisition module is specifically configured to obtain the user interaction log data that log collection agent nodes collect and report in real time, where a log collection agent node is deployed on each server that stores user interaction logs.
11. The device of claim 9 or 10, characterized in that the data processing module is also configured to perform word segmentation on the target problem after it is obtained, yielding multiple target words;
When the result handling module captures the result data corresponding to the target problem, this includes capturing the result data corresponding to at least some of the target words.
12. The device of claim 9, characterized in that:
The data acquisition module is also configured to store the real-time user interaction log data in a first message buffer queue after obtaining it;
The data processing module is also configured to extract user interaction log data from the first message buffer queue by subscribing to a log topic, so as to filter it in real time, and to send the target problem obtained by filtering to a second message buffer queue as a pending problem;
The result handling module is also configured to extract problems from the second message buffer queue by subscribing to a pending-problem topic, so as to capture result data for those problems.
13. The device of claim 9, 10 or 12, characterized in that the analysis and filtering policy applied by the data processing module includes one of the following strategies or a combination of several of them:
Strategy 1: select the target problem from the user questions in the user interaction log data according to a set answer type;
Strategy 2: select the target problem from the user questions in the user interaction log data according to set keywords;
Strategy 3: select the target problem from the user questions in the user interaction log data according to the semantic similarity between the answer content and the question;
Strategy 4: select the target problem from the user questions in the user interaction log data according to sentiment information obtained by analyzing the user interaction log data.
14. The device of claim 9, characterized in that the management module is specifically configured to store the target problem and its corresponding result data in the knowledge base as a new knowledge point.
15. The device of claim 9, characterized in that the management module is specifically configured to store the target problem and its corresponding result data in a relational database and, once the result data in the relational database has passed review and verification, to store it in the knowledge base as a new knowledge point.
16. The device of claim 9, characterized in that:
The data acquisition module is specifically configured to use the Flume log collection system to obtain the real-time user interaction log data;
The data processing module is specifically configured to use Spark Streaming to filter the user interaction log data in real time.
CN201610435485.6A 2016-06-17 2016-06-17 A kind of data processing method and device Pending CN106095965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610435485.6A CN106095965A (en) 2016-06-17 2016-06-17 A kind of data processing method and device

Publications (1)

Publication Number Publication Date
CN106095965A true CN106095965A (en) 2016-11-09

Family

ID=57236468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610435485.6A Pending CN106095965A (en) 2016-06-17 2016-06-17 A kind of data processing method and device

Country Status (1)

Country Link
CN (1) CN106095965A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256227A (en) * 2017-04-28 2017-10-17 北京神州泰岳软件股份有限公司 Towards the semantic concept spread generating method and device of knowledge content
CN108717468A (en) * 2018-06-11 2018-10-30 泰康保险集团股份有限公司 A kind of data-updating method, device, medium and electronic equipment
CN108989314A (en) * 2018-07-20 2018-12-11 北京木瓜移动科技股份有限公司 A kind of Transmitting Data Stream, processing method and processing device
CN109729130A (en) * 2018-04-04 2019-05-07 中国平安人寿保险股份有限公司 Information analysis method, service server, storage medium and device
CN109766494A (en) * 2018-12-25 2019-05-17 出门问问信息科技有限公司 Problem answers are to extending method, device, equipment and computer readable storage medium
CN110955769A (en) * 2019-12-17 2020-04-03 联想(北京)有限公司 Processing flow construction method and electronic equipment
CN111506672A (en) * 2020-03-24 2020-08-07 平安国际智慧城市科技股份有限公司 Method, device, equipment and storage medium for analyzing environmental protection monitoring data in real time
CN111915378A (en) * 2020-08-17 2020-11-10 深圳墨世科技有限公司 User attribute prediction method, device, computer equipment and storage medium
CN112948564A (en) * 2021-04-15 2021-06-11 苏州数海长云数据信息科技有限公司 Computer question-answering method and system based on artificial intelligence technology

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902652A (en) * 2014-02-27 2014-07-02 深圳市智搜信息技术有限公司 Automatic question-answering system
CN105550361A (en) * 2015-12-31 2016-05-04 上海智臻智能网络科技股份有限公司 Log processing method and apparatus, and ask-answer information processing method and apparatus
CN105631026A (en) * 2015-12-30 2016-06-01 北京奇艺世纪科技有限公司 Security data analysis system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20161109