CN105550361A - Log processing method and apparatus, and ask-answer information processing method and apparatus

Info

Publication number: CN105550361A
Application number: CN201511030354.1A
Authority: CN
Inventors: 曾永梅; 朱频频
Original assignee: Shanghai Zhizhen Intelligent Network Technology Co Ltd
Current assignee: Shanghai Zhizhen Intelligent Network Technology Co Ltd
Priority date: 2015-12-31
Filing date: 2015-12-31
Publication date: 2016-05-04
Anticipated expiration: 2035-12-31
Also published as: CN105550361B

Abstract

The invention provides a log processing method for a question answering system. The method comprises: acquiring user log data; filtering the user log data to obtain to-be-processed log data; acquiring a first standard question obtained by applying a first similarity calculation to the to-be-processed log data; acquiring a second standard question obtained by applying a second similarity calculation to the to-be-processed log data; and, when the similarity between the to-be-processed log data and the first standard question is greater than a first threshold, the similarity between the to-be-processed log data and the second standard question is greater than a second threshold, and the first standard question and the second standard question are identical, marking the to-be-processed log as correct.

Description

Log processing method and device and question and answer information processing method and device

Technical field

The present invention relates to the technical field of human-computer interaction, and in particular to a log processing method and device for a question answering system, and to a question and answer information processing method and device.

Background art

Human-computer interaction is the study of the interaction between systems and users. The system may be any of various machines, or a computerized system or piece of software. Various artificial intelligence systems, such as intelligent customer service systems and voice control systems, can be realized through human-computer interaction. Artificial-intelligence semantic recognition is the basis of human-computer interaction: it recognizes human language and converts it into a form that a machine can understand.

An intelligent question answering system is a typical application of human-computer interaction: after a user asks a question, the system provides the answer to that question. To this end, the system maintains a knowledge base containing a large number of questions and the answer corresponding to each question. The system first needs to identify the question the user has posed, that is, find the question in the knowledge base that corresponds to the user question, and then find the answer that matches it.

Maintaining and updating an intelligent question answering system is a significant challenge.

Summary of the invention

The following presents a brief summary of one or more aspects in order to provide a basic understanding of those aspects. This summary is not an extensive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description given later.

According to one aspect of the present invention, a log processing method for a question answering system is provided, comprising:

obtaining user log data;

filtering the user log data to obtain pending log data;

obtaining a first standard question produced by a first similarity calculation on the pending log data;

obtaining a second standard question produced by a second similarity calculation on the pending log data; and

when the similarity between the pending log data and the first standard question is greater than a first threshold, the similarity between the pending log data and the second standard question is greater than a second threshold, and the first standard question and the second standard question are identical, marking the pending log as correct.

In one example, the filtering comprises:

providing a correct log library and a meaningless log library; and

determining, by comparison, the log data in the user log data that belongs to neither the correct log library nor the meaningless log library, and taking that log data as the pending log data.

In one example, the first similarity calculation comprises:

providing a Q&A database that contains a plurality of Q&A standard questions; and

selecting one Q&A standard question for the pending log by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the second similarity calculation comprises:

providing a correct log library that contains correct standard questions; and

clustering the pending log to a correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, the first standard question is extracted directly from the user log data.

In one example, the method further comprises: when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, the first standard question and the second standard question differ, and the difference ratio between the similarity to the first standard question and the first threshold is greater than the difference ratio between the similarity to the second standard question and the second threshold, marking the pending log as correct.

In one example, the method further comprises:

performing cluster analysis on all user log data whose similarity to the first standard question is less than the first threshold and whose similarity to the second standard question is less than the second threshold, so as to cluster that data into a plurality of user log clusters for manual confirmation.

According to another aspect of the present invention, a question and answer information processing method is provided, comprising:

receiving a user question;

performing a first similarity calculation on the user question to obtain a first standard question;

performing a second similarity calculation on the user question to obtain a second standard question; and

when the similarity between the user question and the first standard question is greater than a first threshold, the similarity between the user question and the second standard question is greater than a second threshold, and the first standard question and the second standard question are identical, returning to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question.

In one example, the first similarity calculation comprises:

selecting one Q&A standard question for the user question by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the second similarity calculation comprises:

clustering the user question to a correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, the method further comprises: when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question and the second standard question differ, returning to the user the answer information corresponding to whichever standard question has the larger difference ratio between its similarity and its corresponding threshold.
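The answer-selection rule of this aspect can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `choose_answer`, the `answers` mapping and the numeric thresholds are hypothetical, and the behavior when either similarity is not above its threshold, or when the two difference ratios are equal, is not specified in the text and is chosen here only for the sake of a runnable example.

```python
from typing import Optional


def difference_ratio(similarity: float, threshold: float) -> float:
    """(similarity - threshold) / threshold, as defined in the description."""
    return (similarity - threshold) / threshold


def choose_answer(q1: str, sim1: float, q2: str, sim2: float,
                  t1: float, t2: float,
                  answers: dict[str, str]) -> Optional[str]:
    """Pick the answer to return for a user question.

    q1/sim1: first standard question and its similarity to the user question.
    q2/sim2: second standard question and its similarity to the user question.
    """
    if not (sim1 > t1 and sim2 > t2):
        # Assumption: the text does not say what to return in this case.
        return None
    if q1 == q2:
        return answers[q1]
    # The standard questions differ: prefer the larger difference ratio.
    # Assumption: on a tie the second standard question is used.
    if difference_ratio(sim1, t1) > difference_ratio(sim2, t2):
        return answers[q1]
    return answers[q2]


answers = {"A": "answer A", "B": "answer B"}
print(choose_answer("A", 0.90, "A", 0.80, 0.6, 0.7, answers))
print(choose_answer("A", 0.66, "B", 0.98, 0.6, 0.7, answers))
```

Normalizing by the threshold lets two similarity measures with different scales be compared on how far each result clears its own bar.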

According to another aspect of the present invention, a log processing device for a question answering system is provided, comprising:

an acquisition module, configured to obtain user log data;

a filtering module, configured to filter the user log data to obtain pending log data;

a first similarity calculation module, configured to obtain a first standard question produced by a first similarity calculation on the pending log data;

a second similarity calculation module, configured to obtain a second standard question produced by a second similarity calculation on the pending log data;

a judging module, configured to judge whether the similarity between the pending log data and the first standard question is greater than a first threshold, whether the similarity between the pending log data and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

a labeling module, configured to mark the pending log as correct when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question and the second standard question are identical.

In one example, the question answering system provides a correct log library and a meaningless log library, and the filtering module further determines, by comparison, the log data in the user log data that belongs to neither the correct log library nor the meaningless log library, and takes that log data as the pending log data.

In one example, the question answering system provides a Q&A database that contains a plurality of Q&A standard questions, and the first similarity calculation module comprises:

a semantic-expression similarity calculation module, configured to select one Q&A standard question for the pending log by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the question answering system provides a correct log library that contains correct standard questions, and the second similarity calculation module comprises:

a cluster module, configured to cluster the pending log to a correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, the first similarity calculation module extracts the first standard question directly from the user log data.

In one example, when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question and the second standard question differ, the judging module further judges whether the difference ratio between the similarity to the first standard question and the first threshold is greater than the difference ratio between the similarity to the second standard question and the second threshold.

In response to the difference ratio between the similarity to the first standard question and the first threshold being greater than the difference ratio between the similarity to the second standard question and the second threshold, the labeling module marks the pending log as correct.

According to yet another aspect of the present invention, a question and answer information processing device is provided, comprising:

a receiving module, configured to receive a user question;

a first similarity calculation module, configured to perform a first similarity calculation on the user question to obtain a first standard question;

a second similarity calculation module, configured to perform a second similarity calculation on the user question to obtain a second standard question;

a judging module, configured to judge whether the similarity between the user question and the first standard question is greater than a first threshold, whether the similarity between the user question and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

an output module, configured to return to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question and the second standard question are identical.

In one example, the first similarity calculation module comprises a semantic-expression similarity calculation module, configured to select one Q&A standard question for the user question by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the question answering system provides a correct log library that contains correct standard questions, and the second similarity calculation module comprises:

a cluster module, configured to cluster the user question to a correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question and the second standard question differ, the judging module further judges whether the difference ratio between the similarity to the first standard question and the first threshold is greater than the difference ratio between the similarity to the second standard question and the second threshold;

the output module returns to the user the answer information corresponding to whichever standard question has the larger difference ratio between its similarity and its corresponding threshold.

According to the solution of the present invention, using different similarity calculations based on the Q&A database and the correct log library allows a considerable portion of the user log data to be screened and confirmed automatically, which greatly reduces manual workload, improves processing efficiency and lowers cost. In addition, using different similarity calculations based on the Q&A database and the correct log library improves the answer accuracy of the question answering system.

Brief description of the drawings

The above features and advantages of the present invention can be better understood after reading the detailed description of the embodiments of the present disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components with similar characteristics or features may have the same or similar reference numerals.

Fig. 1 shows a flowchart of the log processing method of a question answering system according to one aspect of the present invention;

Fig. 2 shows a flowchart of the question and answer information processing method according to one aspect of the present invention;

Fig. 3 shows a block diagram of the log processing device of a question answering system according to one aspect of the present invention; and

Fig. 4 shows a block diagram of the question and answer information processing device according to one aspect of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to the drawings and specific embodiments. Note that the aspects described below with reference to the drawings and specific embodiments are only exemplary and should not be understood as limiting the scope of protection of the present invention in any way.

The most primitive and simplest form of a basic knowledge point in a knowledge base is the familiar FAQ, generally in the form of a "question-answer" pair. In the present invention, a "standard question" is the wording used to represent a knowledge point; its main goal is to be clearly expressed and easy to maintain. For example, "rates for CRBT" is a clearly expressed standard question. The "question" here should not be narrowly interpreted as an "inquiry"; it should be broadly understood as an "input" that has a corresponding "output". For example, in semantic recognition for a control system, a user instruction such as "turn on the radio" should also be understood as a "question", and the corresponding "answer" may be a call to the control program that performs the corresponding control.

When a user provides input to a machine, the ideal case is that the user uses the standard question, so that the machine's intelligent semantic recognition system immediately understands the user's meaning. However, users often do not use the standard question but some variant of it. For example, if the standard question for switching radio stations is "change a radio station", the user may instead say "switch a radio station", and the machine must recognize that the user means the same thing. Therefore, for intelligent semantic recognition, the knowledge base needs similar questions for each standard question: a similar question differs slightly in expression from the standard question but expresses the same meaning.

Furthermore, to identify user questions more accurately and efficiently, intelligent question answering systems have developed the concept of the semantic expression.

A semantic expression consists mainly of words, word classes and their "or" relations, and its core is the word class. A word class can be understood simply as a group of words that share something in common; these words may or may not be semantically similar, and they may also be marked as important or unimportant. The relationship between a semantic expression and a user question is represented by a quantified value (similarity), and this value can be compared directly with the similarity between a similar question and the user question.

Semantic expressions are briefly introduced below.

Symbols in semantic expressions

A. Representation of word classes ([])

To distinguish words from word classes in an expression, a word class must appear inside square brackets "[]". A word class appearing in square brackets is generally a "narrow word class", but "broad word classes" can also be supported by configuring system parameters.

Here are some examples:

[Fetion] [how] [activate]

[introduction] [multimedia message] [business]

[login] [method] of [Fetion]

[call reminding] [how] [charge]

B. Representation of the "or" relation (|)

Several word classes may appear inside one pair of square brackets, joined by the "or" relation. When similarity is calculated, the word classes in an "or" relation are calculated separately by "expansion", that is, by expanding the semantic expression into multiple structures according to the meaning of "or". For example, "[CRBT] [activate] [method | step]" can be expanded into two simple semantic expressions: "[CRBT] [activate] [method]" and "[CRBT] [activate] [step]".

Examples of this kind of semantic expression are as follows:

[CRBT] [activate] [method | step]

[how] [inquiry | know] [PUK code]

[unsubscribe | cancel | close | stop using] [IP | 17951] [national long-distance discount package]

[call reminder] [function fee | monthly fee | information fee | communication fee]
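The "or" expansion described above can be sketched as a Cartesian product over the bracketed groups. This is only an illustration under stated assumptions: the function name `expand_or` is hypothetical, the parser looks only at text inside square brackets, and it does not handle the "?" or "&" markers.

```python
import itertools
import re


def expand_or(expression: str) -> list[str]:
    """Expand every [a | b] group into one simple expression per alternative."""
    # Each bracketed group becomes a list of its "|"-separated alternatives.
    groups = re.findall(r"\[([^\]]*)\]", expression)
    alternatives = [[alt.strip() for alt in group.split("|")] for group in groups]
    # One simple expression for each combination of alternatives.
    return [" ".join(f"[{alt}]" for alt in combo)
            for combo in itertools.product(*alternatives)]


for simple in expand_or("[how] [inquiry | know] [PUK code]"):
    print(simple)
```

An expression with several "or" groups expands multiplicatively, so two groups of two and four alternatives yield eight simple expressions.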

C. Representation of optional elements (?)

A word class in square brackets may have "?" appended at its end, meaning that it may or may not appear in the expression, i.e. it is optional. A word class in this optional relation is likewise calculated separately by "expansion" when similarity is calculated. Here, "expansion" is the process of expanding a semantic expression that contains an optional word class (or an "or" combination of word classes) into two simple semantic expressions, one that includes the word class and one that does not. For example, "[introduction] [mobile video] [military column] [content] [what?]" can be expanded into two simple semantic expressions: "[introduction] [mobile video] [military column] [content]" and "[introduction] [mobile video] [military column] [content] [what]".

Examples of this kind of semantic expression are as follows:

[CRBT] [cancellation] [method | step?]

[introduction] [mobile video] [military column] [content] [what?]

[introduction] [12580?] [life is reported] [quality life version] [freely] [business?]

[how] [activate] [Mobile data | flow | online] [100 yuan] [package?] [note]
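A sketch of expansion with optional groups follows, assuming that a trailing "?" makes the whole bracketed group optional (including an "or" combination inside it), as the paragraph above describes. The function name `expand` is hypothetical, and the "&" marker is ignored.

```python
import itertools
import re


def expand(expression: str) -> list[str]:
    """Expand "|" alternatives and "?" optional groups into simple expressions."""
    slots = []
    for group in re.findall(r"\[([^\]]*)\]", expression):
        body = group.strip()
        optional = body.endswith("?")
        if optional:
            body = body[:-1].strip()
        choices = [alt.strip() for alt in body.split("|")]
        if optional:
            choices.append(None)  # None stands for "group omitted"
        slots.append(choices)
    # Build one simple expression per combination, skipping omitted groups.
    return [" ".join(f"[{c}]" for c in combo if c is not None)
            for combo in itertools.product(*slots)]


for simple in expand("[introduction] [mobile video] [military column] [content] [what?]"):
    print(simple)
```

With this reading, "[CRBT] [cancellation] [method | step?]" expands into three simple expressions: one with "[method]", one with "[step]" and one with the group left out.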

D. Semantic enhancement (&)

A "&" symbol at the far left of a semantic expression raises the weight of the word classes in that expression during similarity calculation. Such semantic expressions can ignore more of the words in the user question, so their matching range is broader.

Examples of this kind of semantic expression are as follows:

& [mobile video] [discount package | discount]

& [the whole network music box] [starlight is sparking] [1 yuan] [package]

& [17951] [Mobile IP phone] [business?]

& [IP?] [through train] [business?]

Semantic expressions are generally subject to the following requirements:

1) Write concisely: do not include content irrelevant to the semantics, and do not use generic prefixes or suffixes such as "I would like to know", "I don't know" or "I want to ask". In such cases the corpus needs to be filtered to remove content that is irrelevant to the semantics.

2) Do not summarize, transform or generalize the semantics to be expressed into, for example, conditions, restrictions, precautions or problems. For example:

Semantics to be expressed: when I operate online banking, it prompts ERROR-001

Incorrect templates:

(1) [operation] [online banking] [reports an error]

(2) [operation] [online banking] [problem]

Correct template:

[operation?] [online banking] [prompt?] [ERROR] [001]

3) Do not abuse "?" and "|".

After the words marked with "?" or "|" are removed from a semantic expression, the remaining expression must still express a similar meaning.

Example question: the web page reports the error "ERROR"

Incorrect template: [open?] [web page] [reports an error?] [ERROR?]

Correct template: [open?] [web page] [reports an error] [ERROR]

The Q&A database contains a set of Q&A standard questions and the set of standard answers associated with them. After a user poses a user question, the corresponding standard question is matched in the Q&A database, and the standard answer of that standard question is then provided to the user. To improve the matching success rate, each Q&A standard question is in fact associated with several semantic expressions. The matching standard question is found by calculating the similarity between the user question and the semantic expressions: the Q&A standard question associated with the semantic expression that has the highest semantic similarity to the user question is taken as the standard question corresponding to that user question, and the corresponding answer is provided.
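The lookup described above, taking the standard question whose associated semantic expression scores highest, can be sketched as follows. The text does not define the semantic similarity function itself, so the coverage score used here (the fraction of word classes in an expression matched by some word of the user question) is only a stand-in, and `StandardQuestion`, `expression_similarity` and `match_standard_question` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StandardQuestion:
    text: str
    answer: str
    # Each expression is a list of word classes; each word class is a set of words.
    expressions: list[list[set[str]]]


def expression_similarity(question_words: set[str], expression: list[set[str]]) -> float:
    """Stand-in score: share of word classes hit by at least one question word."""
    if not expression:
        return 0.0
    hits = sum(1 for word_class in expression if word_class & question_words)
    return hits / len(expression)


def match_standard_question(user_question: str, database: list[StandardQuestion]):
    """Return (standard question, score) for the best-scoring semantic expression."""
    words = set(user_question.lower().split())
    best, best_score = None, -1.0
    for standard_question in database:
        for expression in standard_question.expressions:
            score = expression_similarity(words, expression)
            if score > best_score:
                best, best_score = standard_question, score
    return best, best_score


database = [
    StandardQuestion("rates for CRBT", "answer about CRBT rates",
                     [[{"crbt"}, {"rate", "rates", "price", "cost"}]]),
    StandardQuestion("change a radio station", "switching station",
                     [[{"change", "switch"}, {"station", "channel"}]]),
]
match, score = match_standard_question("switch a station", database)
print(match.text, score)
```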

In addition, the question answering system includes a correct log library, which is a database storing all correct logs in the intelligent question answering system. A correct log is a log that has been confirmed as error-free by the system or manually; for example, each correct log contains a correct standard question and the standard answer to that correct standard question. The set of correct standard questions here is generally identical to the set of Q&A standard questions in the Q&A database.

In use, an intelligent question answering system produces a large number of user logs. Each log contains the user question provided by the user, together with the Q&A standard question matched for that user question and the corresponding answer. This mass of user logs needs to be analyzed and confirmed manually so that the intelligent question answering system can be optimized and maintained.

Fig. 1 shows a flowchart of the log processing method 100 of a question answering system according to one aspect of the present invention. The method 100 automatically sorts through the mass of user logs in order to reduce manual workload.

In step 102, user log data is first obtained.

The data can be collected one log at a time as the question answering system produces each user log, or collected from the question answering system in batches at regular intervals.

In step 104, the obtained user log data is filtered to obtain pending log data.

In one example, if a user log falls in the correct log library or in the meaningless log library, that user log clearly does not need to be sorted and confirmed. The meaningless log library is a database that gathers meaningless logs, such as nonsensical or joking questions that users pose at random.

By comparison, the log data in the user log data that belongs to neither the correct log library nor the meaningless log library is taken as the pending log data, while log data that falls in the correct log library or the meaningless log library is filtered out and needs no further processing.
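The filtering in step 104 amounts to a set-membership test. The sketch below assumes that each log can be represented by the text of its user question and that "belonging to a library" means an exact match; both are simplifications made for illustration, and `filter_logs` is a hypothetical name.

```python
def filter_logs(user_logs: list[str],
                correct_logs: set[str],
                meaningless_logs: set[str]) -> list[str]:
    """Keep only logs that are in neither the correct nor the meaningless library."""
    known = correct_logs | meaningless_logs
    return [log for log in user_logs if log not in known]


pending = filter_logs(
    ["how do I activate CRBT", "asdf lol", "switch a radio station"],
    correct_logs={"how do I activate CRBT"},
    meaningless_logs={"asdf lol"},
)
print(pending)
```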

In step 106, a first standard question, produced by a first similarity calculation on the pending log data, is obtained.

For a given pending log, the corresponding first standard question is matched by the first similarity calculation. Specifically, this matching is based on the Q&A database. As described above, the Q&A database contains a plurality of Q&A standard questions, and each Q&A standard question is additionally associated with semantic expressions that represent it.

A semantic-expression similarity calculation is performed between the pending log (for example, the user question contained in the pending log) and the semantic expressions of the Q&A standard questions in the Q&A database, in order to find the semantic expression with the highest similarity. The Q&A standard question corresponding to that semantic expression is the first standard question obtained by matching.

As described above, each user log contains the user question posed by the user, together with the Q&A standard question that the question answering system matched for that user question in the Q&A database at the time and the answer corresponding to that Q&A standard question. Therefore, in another example, the Q&A standard question contained in the pending log is used directly as the first standard question.

In step 108, a second standard question, produced by a second similarity calculation on the pending log data, is obtained.

For the pending log, the corresponding second standard question is also matched by the second similarity calculation. Specifically, this matching is based on the correct log library.

Specifically, big-data clustering is performed on all pending logs on the basis of the correct log library (for example, the set of correct standard questions it contains), so that each pending log is clustered to some correct standard question, and that correct standard question is taken as the second standard question obtained by matching. For example, a semantic similarity calculation is performed between each pending log (for example, the user question of that user log) and each correct standard question, and the pending log is then assigned to the correct standard question with the highest semantic similarity.
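The assignment described in step 108 can be sketched as a nearest-match search over the correct standard questions. The text does not specify the semantic similarity measure, so word-level Jaccard similarity is used here purely as a stand-in; `jaccard` and `assign_to_correct_questions` are hypothetical names.

```python
def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: shared words divided by all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def assign_to_correct_questions(pending: list[str],
                                correct_questions: list[str],
                                similarity=jaccard) -> dict[str, tuple[str, float]]:
    """Map each pending log to (second standard question, similarity)."""
    assignment = {}
    for question in pending:
        best = max(correct_questions, key=lambda s: similarity(question, s))
        assignment[question] = (best, similarity(question, best))
    return assignment


result = assign_to_correct_questions(
    ["switch a radio station"],
    ["change a radio station", "rates for CRBT"],
)
print(result)
```

The similarity returned alongside the chosen question is the value later compared against the second threshold.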

In step 110, when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question and the second standard question are identical, the pending log is marked as correct.

A similarity between the pending log data and the first standard question that is greater than the first threshold indicates that the match between the pending log and the first standard question has high confidence. Likewise, a similarity between the pending log data and the second standard question that is greater than the second threshold indicates that the match between the pending log and the second standard question has high confidence. If, in addition, the first standard question and the second standard question are identical, the pending log has been matched to the same standard question by two different similarity calculations, so the user log can be judged to be correct.

On the other hand, if the similarity between the pending log data and the first standard question is greater than the first threshold and the similarity between the pending log data and the second standard question is greater than the second threshold, so that both matches have high confidence, but the first standard question and the second standard question differ, the user log has been matched to different standard questions by the two similarity calculations. In this case the result with the higher matching confidence should prevail.

Specifically, the difference ratio between the similarity of the pending log data to the first standard question and the first threshold can be calculated (that is, the similarity minus the first threshold, divided by the first threshold), and the difference ratio between the similarity of the pending log data to the second standard question and the second threshold can be calculated (the similarity minus the second threshold, divided by the second threshold).

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the pending log data and the first standard question (that is, the Q&A standard question in the Q&A database) is the more reliable one. In other words, the answer given from the Q&A database is correct, and the pending log produced from the Q&A database is correct, so the pending log is marked as correct.
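The labeling logic of step 110, including the difference-ratio comparison, can be sketched as follows. The function names, the label strings and the numeric thresholds are hypothetical. The text only states when a log is marked as correct and when manual confirmation is needed; returning "unresolved" for the remaining cases is an assumption made so that the example is complete.

```python
def difference_ratio(similarity: float, threshold: float) -> float:
    """(similarity - threshold) / threshold."""
    return (similarity - threshold) / threshold


def label_log(q1: str, sim1: float, q2: str, sim2: float,
              t1: float, t2: float) -> str:
    """Label a pending log from its two matched standard questions."""
    if sim1 > t1 and sim2 > t2:
        if q1 == q2:
            return "correct"  # both calculations agree
        if difference_ratio(sim1, t1) > difference_ratio(sim2, t2):
            return "correct"  # the Q&A database match is the more reliable one
        return "unresolved"   # assumption: not specified in the text
    if sim1 < t1 and sim2 < t2:
        return "manual"       # low confidence on both: needs manual confirmation
    return "unresolved"       # assumption: mixed cases are not specified


# Hypothetical thresholds: first 0.6, second 0.7.
print(label_log("A", 0.90, "A", 0.80, 0.6, 0.7))
print(label_log("A", 0.90, "B", 0.80, 0.6, 0.7))
print(label_log("A", 0.50, "B", 0.50, 0.6, 0.7))
```

In the second call the difference ratios are about 0.5 for the first standard question and about 0.14 for the second, so the first match prevails and the log is marked as correct.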

In yet another case, if the similarity between the to-be-processed log data and the first standard question is less than the first threshold, and the similarity between the to-be-processed log data and the second standard question is also less than the second threshold, the confidence of the to-be-processed log data is low and manual confirmation is required. Confirming such logs one by one, however, is very labor-intensive.

For this reason, cluster analysis can be performed on all user log data whose similarity to the first standard question is less than the first threshold and whose similarity to the second standard question is less than the second threshold, so that they are grouped into multiple user log clusters for manual confirmation. The user logs within each cluster are then highly similar to one another, which makes manual confirmation easier.
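One simple way to form such clusters is a single greedy pass over the low-confidence logs, sketched below. The text does not name a clustering algorithm or a similarity measure, so everything here is an assumption made for illustration: the token-set Jaccard similarity, the `min_similarity` cutoff of 0.5, and the function names `jaccard` and `cluster_logs`.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercase whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def cluster_logs(logs: List[str], min_similarity: float = 0.5) -> List[List[str]]:
    """Greedily group logs; the first log of each cluster acts as its representative."""
    clusters: List[List[str]] = []
    for log in logs:
        for cluster in clusters:
            if jaccard(log, cluster[0]) >= min_similarity:
                cluster.append(log)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([log])
    return clusters
```

A reviewer can then confirm one cluster at a time instead of one log at a time. A production system would likely replace the stand-in similarity with the same semantic similarity used elsewhere in the question answering system.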

In one example, given the differences between the two semantic similarity calculations, the first threshold may be less than the second threshold.

In this way, user logs can be marked automatically, which greatly reduces manual effort.

Fig. 2 shows a flowchart of a question-answer information processing method 200 according to an aspect of the present invention. This method can be run by a question answering system to provide a corresponding answer to a question raised by a user.

In step 202, a user question is received.

The user question can be received via the interactive interface of the question answering system.

In step 204, a first similarity calculation is performed on the user question to obtain a first standard question.

For a given user question, the corresponding first standard question is matched by the first similarity calculation. Specifically, this matching is based on the Q&A database. As described above, the Q&A database contains multiple Q&A standard questions, and each Q&A standard question is additionally associated with semantic expressions that represent it.

An expression semantic similarity calculation is performed between the user question and the semantic expressions of the Q&A standard questions in the Q&A database in order to find the semantic expression with the highest similarity. The Q&A standard question corresponding to that semantic expression is the first standard question obtained by the matching.
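The matching in step 204 amounts to a search for the highest-scoring semantic expression, as in the minimal sketch below. The data layout (a dictionary from each Q&A standard question to its list of semantic expressions), the function names, and the `token_jaccard` stand-in similarity are assumptions made for illustration; the text does not specify how expression semantic similarity is actually computed.

```python
from typing import Callable, Dict, List, Optional, Tuple


def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercase whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def match_first_standard_question(
    user_question: str,
    qa_database: Dict[str, List[str]],
    similarity: Callable[[str, str], float] = token_jaccard,
) -> Tuple[Optional[str], float]:
    """Return the standard question owning the most similar expression, with its score.

    Returns (None, 0.0) when the database holds no expressions.
    """
    best_question: Optional[str] = None
    best_score = 0.0
    for standard_question, expressions in qa_database.items():
        for expression in expressions:
            score = similarity(user_question, expression)
            if best_question is None or score > best_score:
                best_question, best_score = standard_question, score
    return best_question, best_score
```

The returned score is the value that is later compared against the first threshold.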

In step 206, a second similarity calculation is performed on the user question to obtain a second standard question.

For this user question, the corresponding second standard question is also matched by the second similarity calculation. Specifically, this matching is based on the correct log library.

Specifically, big-data clustering is performed on the user question on the basis of the correct log library (for example, the set of correct standard questions it contains), so that the user question is clustered to a certain correct standard question, which is then taken as the second standard question obtained by the matching. For example, a semantic similarity calculation is performed between the user question and each correct standard question, and the user question is then clustered to the correct standard question with the highest semantic similarity.

In step 208, when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question and the second standard question are identical, the answer information corresponding to the first standard question or the second standard question is fed back to the user.

When the similarity between the user question and the first standard question is greater than the first threshold, the user question is matched to the first standard question with high confidence. When the similarity between the user question and the second standard question is greater than the second threshold, the user question is likewise matched to the second standard question with high confidence. If, in addition, the first standard question and the second standard question are identical, the user question has been matched to the same standard question by two different similarity calculations, so the match can be judged to be accurate. The answer information corresponding to the first standard question or the second standard question is therefore fed back to the user, and the answer given in this case has very high confidence.

On the other hand, the similarity between the user question and the first standard question may be greater than the first threshold, and the similarity between the user question and the second standard question may be greater than the second threshold, so that both matches have high confidence, while the first standard question differs from the second standard question. In that case the two different similarity calculations have matched the user question to different standard questions, and the result with the higher matching confidence should prevail.

Specifically, the difference ratio between the similarity of the user question to the first standard question and the first threshold can be calculated (that is, the similarity minus the first threshold, divided by the first threshold), as can the difference ratio between the similarity of the user question to the second standard question and the second threshold (the similarity minus the second threshold, divided by the second threshold).

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the user question and the first standard question (that is, the Q&A standard question in the Q&A database) is the more reliable one, and the answer information corresponding to the first standard question is provided to the user. Otherwise, the answer corresponding to the second standard question is provided to the user.
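The answer selection of step 208 and the tie-breaking rule that follows it can be sketched as below. This is an illustrative sketch under assumptions: the function name `choose_answer` and the `answers` dictionary mapping each standard question to its answer text are hypothetical. The passage does not state what the system returns when either similarity fails to exceed its threshold, so the sketch returns `None` in that case.

```python
from typing import Dict, Optional


def choose_answer(first_q: str, first_sim: float,
                  second_q: str, second_sim: float,
                  first_threshold: float, second_threshold: float,
                  answers: Dict[str, str]) -> Optional[str]:
    """Pick the answer to feed back to the user, or None if the text gives no rule."""
    if not (first_sim > first_threshold and second_sim > second_threshold):
        # Not covered by the passage above; returning None is an assumption.
        return None
    if first_q == second_q:
        # Both calculations agree, so either standard question gives the same answer.
        return answers[first_q]
    # Difference ratio: (similarity - threshold) / threshold for each match.
    first_ratio = (first_sim - first_threshold) / first_threshold
    second_ratio = (second_sim - second_threshold) / second_threshold
    # The match with the larger relative margin over its threshold prevails.
    return answers[first_q] if first_ratio > second_ratio else answers[second_q]
```

Normalizing each margin by its own threshold makes the two scores comparable even when the two similarity calculations use different thresholds.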

In this way, the answer accuracy of the question answering system is improved.

Although the above methods are illustrated and described as a series of actions for simplicity of explanation, it should be understood and appreciated that these methods are not limited by the order of the actions. According to one or more embodiments, some actions may occur in a different order and/or concurrently with other actions that are illustrated and described herein, or that are not illustrated and described herein but would be understood by those skilled in the art.

Fig. 3 shows a block diagram of a log processing device 300 of a question answering system according to an aspect of the present invention. The log processing device 300 can be used to sort through massive volumes of user logs automatically, so as to reduce manual effort. The log processing device 300 may comprise an acquisition module 302, a filtering module 304, a first similarity calculation module 306, a second similarity calculation module 308, a judging module 310 and a labeling module 312.

First, the acquisition module 302 acquires user log data.

The logs can be collected one by one as the question answering system produces each user log, or collected from the question answering system in batches at regular intervals.

The filtering module 304 can filter the acquired user log data to obtain to-be-processed log data.

By comparison, the filtering module 304 takes the log data in the user log data that belongs to neither the correct log library nor the meaningless log library as the to-be-processed log data. Log data that falls in the correct log library or in the meaningless log library is filtered out and needs no further processing.
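The filtering performed by the filtering module 304 can be sketched as a set-membership test. The sketch assumes that each log can be compared by exact string equality and that both libraries fit in memory; the function name `filter_logs` and these simplifications are assumptions, since the text only says that the decision is made by comparison.

```python
from typing import Iterable, List


def filter_logs(user_logs: Iterable[str],
                correct_library: Iterable[str],
                meaningless_library: Iterable[str]) -> List[str]:
    """Keep only logs found in neither the correct nor the meaningless log library."""
    already_classified = set(correct_library) | set(meaningless_library)
    return [log for log in user_logs if log not in already_classified]
```

Only the logs returned by this step are passed on to the two similarity calculations.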

The first similarity calculation module 306 can acquire a first standard question obtained by performing a first similarity calculation on the to-be-processed log data.

The first similarity calculation module 306 may comprise an expression semantic similarity calculation module (not shown), which performs an expression semantic similarity calculation between the to-be-processed log (for example, the user question contained in the to-be-processed log) and the semantic expressions of the Q&A standard questions in the Q&A database in order to find the semantic expression with the highest similarity. The Q&A standard question corresponding to that semantic expression is the first standard question obtained by the matching.

The second similarity calculation module 308 can acquire a second standard question obtained by performing a second similarity calculation on the to-be-processed log data.

For this to-be-processed log, the corresponding second standard question is also matched by the second similarity calculation module 308. Specifically, this matching is based on the correct log library.

Specifically, the second similarity calculation module 308 may comprise a clustering module (not shown), which performs big-data clustering on all to-be-processed logs on the basis of the correct log library (for example, the set of correct standard questions it contains), so that each to-be-processed log is clustered to a certain correct standard question, which is then taken as the second standard question obtained by the matching. For example, a semantic similarity calculation is performed between each to-be-processed log (for example, the user question of that user log) and each correct standard question, and the to-be-processed log is then clustered to the correct standard question with the highest semantic similarity.

The judging module 310 can judge whether the similarity between the to-be-processed log data and the first standard question is greater than the first threshold, whether the similarity between the to-be-processed log data and the second standard question is greater than the second threshold, and whether the first standard question and the second standard question are the same standard question.

When the similarity between the to-be-processed log data and the first standard question is greater than the first threshold, the similarity between the to-be-processed log data and the second standard question is greater than the second threshold, and the first standard question and the second standard question are identical, the labeling module 312 can mark the to-be-processed log as correct.

When the similarity between the to-be-processed log data and the first standard question is greater than the first threshold, the log is matched to the first standard question with high confidence. When the similarity between the to-be-processed log data and the second standard question is greater than the second threshold, the log is likewise matched to the second standard question with high confidence. If, in addition, the first standard question and the second standard question are identical, the log has been matched to the same standard question by two different similarity calculations, so the labeling module 312 can judge the user log to be correct.

Specifically, when the two standard questions differ, the judging module 310 can calculate the difference ratio between the similarity of the to-be-processed log data to the first standard question and the first threshold (that is, the similarity minus the first threshold, divided by the first threshold), and the difference ratio between the similarity of the to-be-processed log data to the second standard question and the second threshold (the similarity minus the second threshold, divided by the second threshold), and thereby determine whether the difference ratio for the first standard question is greater than or less than the difference ratio for the second standard question.

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the to-be-processed log data and the first standard question (that is, the Q&A standard question in the Q&A database) is the more reliable one. In other words, the answer given from the Q&A database is correct, so the to-be-processed log produced from the Q&A database is correct, and the labeling module 312 can therefore mark the to-be-processed log as correct.

Fig. 4 shows a block diagram of a question-answer information processing device 400 according to an aspect of the present invention.

The question-answer information processing device 400 may comprise a receiving module 402, a first similarity calculation module 404, a second similarity calculation module 406, a judging module 408 and an output module 410.

The receiving module 402 can receive a user question. The user question may be in text format, or in another form such as speech.

The receiving module 402 can receive the user question via the interactive interface of the question answering system.

The first similarity calculation module 404 can perform a first similarity calculation on the user question to obtain a first standard question.

For a given user question, the corresponding first standard question is matched by the first similarity calculation module 404. Specifically, this matching is based on the Q&A database. As described above, the Q&A database contains multiple Q&A standard questions, and each Q&A standard question is additionally associated with semantic expressions that represent it.

The first similarity calculation module 404 may comprise an expression semantic similarity calculation module (not shown), which performs an expression semantic similarity calculation between the user question and the semantic expressions of the Q&A standard questions in the Q&A database in order to find the semantic expression with the highest similarity. The Q&A standard question corresponding to that semantic expression is the first standard question obtained by the matching.

The second similarity calculation module 406 can perform a second similarity calculation on the user question to obtain a second standard question.

For this user question, the corresponding second standard question is also matched by the second similarity calculation module 406. Specifically, this matching is based on the correct log library.

Specifically, the second similarity calculation module 406 may comprise a clustering module (not shown), which performs big-data clustering on the user question on the basis of the correct log library (for example, the set of correct standard questions it contains), so that the user question is clustered to a certain correct standard question, which is then taken as the second standard question obtained by the matching. For example, the clustering module can perform a semantic similarity calculation between the user question and each correct standard question, and then cluster the user question to the correct standard question with the highest semantic similarity.

The judging module 408 can judge whether the similarity between the user question and the first standard question is greater than the first threshold, whether the similarity between the user question and the second standard question is greater than the second threshold, and whether the first standard question and the second standard question are the same standard question.

When the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question and the second standard question are identical, the output module 410 can feed back to the user the answer information corresponding to the first standard question or the second standard question.

When the similarity between the user question and the first standard question is greater than the first threshold, the user question is matched to the first standard question with high confidence. When the similarity between the user question and the second standard question is greater than the second threshold, the user question is likewise matched to the second standard question with high confidence. If, in addition, the first standard question and the second standard question are identical, the user question has been matched to the same standard question by two different similarity calculations, so the match can be judged to be accurate. The output module 410 can therefore feed back to the user the answer information corresponding to the first standard question or the second standard question, and the answer given in this case has very high confidence.

Specifically, when the two standard questions differ, the judging module 408 can calculate the difference ratio between the similarity of the user question to the first standard question and the first threshold (that is, the similarity minus the first threshold, divided by the first threshold), and the difference ratio between the similarity of the user question to the second standard question and the second threshold (the similarity minus the second threshold, divided by the second threshold), and thereby determine whether the difference ratio for the first standard question is greater than or less than the difference ratio for the second standard question.

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the user question and the first standard question (that is, the Q&A standard question in the Q&A database) is the more reliable one, and the output module 410 can provide the answer information corresponding to the first standard question to the user. Otherwise, the answer corresponding to the second standard question is provided to the user.

Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions and the like, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

The various illustrative logical blocks, modules and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative the processor may be any conventional processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.

The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A log processing method for a question answering system, characterized by comprising:

acquiring user log data;

filtering the user log data to obtain to-be-processed log data;

acquiring a first standard question obtained by performing a first similarity calculation on the to-be-processed log data;

acquiring a second standard question obtained by performing a second similarity calculation on the to-be-processed log data;

when the similarity between the to-be-processed log data and the first standard question is greater than a first threshold, the similarity between the to-be-processed log data and the second standard question is greater than a second threshold, and the first standard question and the second standard question are identical, marking the to-be-processed log as correct.

2. The log processing method according to claim 1, characterized in that the filtering comprises:

taking, by comparison, the log data in the user log data that belongs to neither a correct log library nor a meaningless log library as the to-be-processed log data.

3. The log processing method according to claim 1, characterized in that the first similarity calculation comprises:

providing a Q&A database, the Q&A database containing multiple Q&A standard questions;

selecting, by expression semantic similarity calculation, a Q&A standard question for the to-be-processed log, this Q&A standard question serving as the first standard question.

4. The log processing method according to claim 1, characterized in that the second similarity calculation comprises:

providing a correct log library, the correct log library containing correct standard questions;

clustering the to-be-processed log to a correct standard question by big-data cluster analysis, this correct standard question serving as the second standard question.

5. The method according to claim 1, characterized in that the first standard question is extracted directly from the user log data.

6. The log processing method according to claim 5, characterized by further comprising: when the similarity between the to-be-processed log data and the first standard question is greater than the first threshold, the similarity between the to-be-processed log data and the second standard question is greater than the second threshold, the first standard question differs from the second standard question, and the difference ratio between the similarity of the to-be-processed log data to the first standard question and the first threshold is greater than the difference ratio between the similarity of the to-be-processed log data to the second standard question and the second threshold, marking the to-be-processed log as correct.

7. The method according to claim 1, characterized by further comprising:

performing cluster analysis on all user log data whose similarity to the first standard question is less than the first threshold and whose similarity to the second standard question is less than the second threshold, so as to cluster them into multiple user log clusters for manual confirmation.

8. A question-answer information processing method, characterized by comprising:

receiving a user question;

performing a first similarity calculation on the user question to obtain a first standard question;

performing a second similarity calculation on the user question to obtain a second standard question;

when the similarity between the user question and the first standard question is greater than a first threshold, the similarity between the user question and the second standard question is greater than a second threshold, and the first standard question and the second standard question are identical, feeding back to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question.

9. The question-answer information processing method according to claim 8, characterized in that the first similarity calculation comprises:

selecting, by expression semantic similarity calculation, a Q&A standard question for the user question, this Q&A standard question serving as the first standard question.

10. The question-answer information processing method according to claim 8, characterized in that the second similarity calculation comprises:

clustering the user question to a correct standard question by big-data cluster analysis, this correct standard question serving as the second standard question.

11. The question-answer information processing method according to claim 8, characterized by further comprising: when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, feeding back to the user the answer information corresponding to the standard question whose difference ratio between similarity and corresponding threshold is larger.

12. A log processing device for a question answering system, characterized by comprising:

an acquisition module, configured to acquire user log data;

a filtering module, configured to filter the user log data to obtain to-be-processed log data;

a first similarity calculation module, configured to acquire a first standard question obtained by performing a first similarity calculation on the to-be-processed log data;

a second similarity calculation module, configured to acquire a second standard question obtained by performing a second similarity calculation on the to-be-processed log data;

a judging module, configured to judge whether the similarity between the to-be-processed log data and the first standard question is greater than a first threshold, whether the similarity between the to-be-processed log data and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

a labeling module, configured to mark the to-be-processed log as correct when the similarity between the to-be-processed log data and the first standard question is greater than the first threshold, the similarity between the to-be-processed log data and the second standard question is greater than the second threshold, and the first standard question and the second standard question are identical.

13. The log processing device according to claim 12, characterized in that the question answering system provides a correct log library and a meaningless log library, and the filtering module further takes, by comparison, the log data in the user log data that belongs to neither the correct log library nor the meaningless log library as the to-be-processed log data.

14. The log processing device according to claim 12, characterized in that the question answering system provides a Q&A database, the Q&A database contains multiple Q&A standard questions, and the first similarity calculation module comprises:

an expression semantic similarity calculation module, configured to select, by expression semantic similarity calculation, a Q&A standard question for the to-be-processed log, this Q&A standard question serving as the first standard question.

15. The log processing device according to claim 12, characterized in that the question answering system provides a correct log library, the correct log library contains correct standard questions, and the second similarity calculation module comprises:

a clustering module, configured to cluster the to-be-processed log to a correct standard question by big-data cluster analysis, this correct standard question serving as the second standard question.

16. The log processing device according to claim 12, characterized in that the first similarity calculation module extracts the first standard question directly from the user log data.

17. The log processing device according to claim 16, characterized in that the judging module is configured to further judge, when the similarity between the to-be-processed log data and the first standard question is greater than the first threshold, the similarity between the to-be-processed log data and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, whether the difference ratio between the similarity of the to-be-processed log data to the first standard question and the first threshold is greater than the difference ratio between the similarity of the to-be-processed log data to the second standard question and the second threshold;

in response to the difference ratio between the similarity of the to-be-processed log data to the first standard question and the first threshold being greater than the difference ratio between the similarity of the to-be-processed log data to the second standard question and the second threshold, the labeling module marks the to-be-processed log as correct.

18. A question and answer information processing apparatus, characterized by comprising:

A receiving module, configured to receive a user question;

A first similarity calculation module, configured to perform a first similarity calculation on the user question to obtain a first standard question;

A second similarity calculation module, configured to perform a second similarity calculation on the user question to obtain a second standard question;

A judging module, configured to judge whether the similarity between the user question and the first standard question is greater than a first threshold, whether the similarity between the user question and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

An output module, configured to, when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question is identical to the second standard question, feed back to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question.
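The cooperation of the modules in claim 18 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the name `answer_question`, the `Matcher` type, and the `answers` lookup table are hypothetical, and each similarity calculation is modelled as a function that returns a standard question together with its similarity.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface: a similarity calculation maps a user question
# to (matched standard question, similarity score).
Matcher = Callable[[str], Tuple[str, float]]


def answer_question(question: str,
                    first: Matcher, first_threshold: float,
                    second: Matcher, second_threshold: float,
                    answers: Dict[str, str]) -> Optional[str]:
    """Return answer information only when both calculations agree."""
    std1, sim1 = first(question)    # first similarity calculation module
    std2, sim2 = second(question)   # second similarity calculation module

    # Judging module: both similarities exceed their thresholds and the
    # two standard questions are the same standard question.
    if sim1 > first_threshold and sim2 > second_threshold and std1 == std2:
        # Output module: the answers for std1 and std2 coincide here.
        return answers.get(std1)
    return None  # case not covered by claim 18
```

A caller would supply the two similarity calculations and the mapping from standard questions to answer information; when the calculations disagree, the sketch returns nothing rather than guessing.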

19. The question and answer information processing apparatus as claimed in claim 18, characterized in that the question answering system provides a question and answer database, the question and answer database comprises a plurality of question and answer standard questions, and the first similarity calculation module comprises:

An expression semantic similarity calculation module, configured to select a question and answer standard question for the user question by expression semantic similarity calculation, the selected question and answer standard question serving as the first standard question.

20. The question and answer information processing apparatus as claimed in claim 18, characterized in that the question answering system provides a correct log library, the correct log library comprises correct standard questions, and the second similarity calculation module comprises:

A clustering module, configured to cluster the user question to a correct standard question by big data cluster analysis, the correct standard question serving as the second standard question.

21. The question and answer information processing apparatus as claimed in claim 18, characterized in that the judging module is further configured to, when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, judge whether the difference ratio between the similarity of the user question to the first standard question and the first threshold is greater than the difference ratio between the similarity of the user question to the second standard question and the second threshold;

The output module feeds back to the user the answer information corresponding to the standard question whose similarity has the larger difference ratio relative to its corresponding threshold.
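The selection described in claims 18 and 21 can be sketched as follows. The function names are hypothetical, and the definition of the difference ratio as (similarity - threshold) / threshold is an assumption, since the claims do not give a formula. The handling of an exact tie is likewise an arbitrary choice made only for this sketch.

```python
from typing import Optional


def difference_ratio(similarity: float, threshold: float) -> float:
    # Assumed definition: excess over the threshold, relative to the threshold.
    return (similarity - threshold) / threshold


def choose_standard(std1: str, sim1: float, thr1: float,
                    std2: str, sim2: float, thr2: float) -> Optional[str]:
    """Pick the standard question whose answer is fed back, or None."""
    if sim1 <= thr1 or sim2 <= thr2:
        return None  # outside the cases covered by claims 18 and 21
    if std1 == std2:
        return std1  # claim 18: both calculations agree
    # Claim 21: the calculations disagree, so prefer the standard question
    # whose similarity has the larger difference ratio.
    r1 = difference_ratio(sim1, thr1)
    r2 = difference_ratio(sim2, thr2)
    # On an exact tie this sketch falls back to the second standard question;
    # the claims do not specify tie handling.
    return std1 if r1 > r2 else std2
```

Normalising by the threshold makes the two scores comparable even when the two similarity calculations use different thresholds, which is presumably why a ratio rather than a raw difference is compared.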