CN105550361B - Log processing method and device and question and answer information processing method and device

Info

Publication number: CN105550361B
Application number: CN201511030354.1A
Authority: CN
Inventors: 曾永梅; 朱频频
Original assignee: Shanghai Zhizhen Intelligent Network Technology Co Ltd
Current assignee: Shanghai Zhizhen Intelligent Network Technology Co Ltd
Priority date: 2015-12-31
Filing date: 2015-12-31
Publication date: 2018-11-09
Anticipated expiration: 2035-12-31
Also published as: CN105550361A

Abstract

The present invention provides a log processing method for a question answering system, comprising: obtaining user log data; filtering the user log data to obtain pending log data; obtaining a first standard question by performing a first similarity calculation on the pending log data; obtaining a second standard question by performing a second similarity calculation on the pending log data; and, when the similarity between the pending log data and the first standard question is greater than a first threshold, the similarity between the pending log data and the second standard question is greater than a second threshold, and the first standard question is identical to the second standard question, labeling the pending log as correct.

Description

Log processing method and device and question and answer information processing method and device

Technical field

The present invention relates to the field of human-computer interaction, and in particular to a log processing method and device for a question answering system and to a question-and-answer information processing method and device.

Background art

Human-computer interaction is the study of the interaction between a system and its users. The system may be any kind of machine, or a computerized system or piece of software. Human-computer interaction makes various artificial intelligence systems possible, for example intelligent customer service systems and voice control systems. Artificial-intelligence semantic recognition is the foundation of human-computer interaction: it recognizes human language and converts it into a language that a machine can understand.

An intelligent question answering system is a typical application of human-computer interaction: after a user poses a question, the system provides an answer to it. To this end, the system maintains a knowledge base that contains a large number of questions and an answer corresponding to each question. The system must first identify the question posed by the user, that is, find in the knowledge base the question that corresponds to the user question, and then retrieve the answer that matches that question.

Maintaining and updating an intelligent question answering system is a significant challenge.

Summary of the invention

A brief summary of one or more aspects is given below to provide a basic understanding of these aspects. This summary is not an extensive overview of all contemplated aspects, and it is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description given later.

According to one aspect of the present invention, a log processing method for a question answering system is provided, comprising:

obtaining user log data;

filtering the user log data to obtain pending log data;

obtaining a first standard question by performing a first similarity calculation on the pending log data;

obtaining a second standard question by performing a second similarity calculation on the pending log data; and

when the similarity between the pending log data and the first standard question is greater than a first threshold, the similarity between the pending log data and the second standard question is greater than a second threshold, and the first standard question is identical to the second standard question, labeling the pending log as correct.

In one example, the filtering comprises:

providing a correct log library and a meaningless log library; and

determining, by comparison, the log data in the user log data that belongs to neither the correct log library nor the meaningless log library, and taking that log data as the pending log data.

In one example, the first similarity calculation comprises:

providing a Q&A database that includes a plurality of Q&A standard questions; and

selecting one Q&A standard question for the pending log by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the second similarity calculation comprises:

providing a correct log library that includes correct standard questions; and

clustering the pending log to one correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, the first standard question is extracted directly from the user log data.

In one example, the method further comprises: when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, the first standard question differs from the second standard question, and the difference ratio between the similarity to the first standard question and the first threshold is greater than the difference ratio between the similarity to the second standard question and the second threshold, labeling the pending log as correct.

In one example, the method further comprises:

performing cluster analysis on all user log data whose similarity to the first standard question is less than the first threshold and whose similarity to the second standard question is less than the second threshold, so as to group them into a plurality of user log clusters for manual confirmation.

According to another aspect of the present invention, a question-and-answer information processing method is provided, comprising:

receiving a user question;

performing a first similarity calculation on the user question to obtain a first standard question;

performing a second similarity calculation on the user question to obtain a second standard question; and

when the similarity between the user question and the first standard question is greater than a first threshold, the similarity between the user question and the second standard question is greater than a second threshold, and the first standard question is identical to the second standard question, feeding back to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question.

In one example, the first similarity calculation comprises:

selecting one Q&A standard question for the user question by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the second similarity calculation comprises:

clustering the user question to one correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, the method further comprises: when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, feeding back to the user the answer information corresponding to whichever standard question has the larger difference ratio between its similarity and its corresponding threshold.
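
The answer selection described in this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `select_answer` and `difference_ratio` and the `answers` mapping are hypothetical, the difference ratio is taken as (similarity minus threshold) divided by threshold as defined in the detailed description, and the behaviour when a similarity does not exceed its threshold, or when the two ratios are equal, is not specified by the text and is chosen here only for illustration.

```python
from typing import Dict, Optional, Tuple

Match = Tuple[str, float]  # (standard question, similarity score)


def difference_ratio(similarity: float, threshold: float) -> float:
    """(similarity - threshold) / threshold, per the detailed description."""
    return (similarity - threshold) / threshold


def select_answer(first: Match, second: Match,
                  first_threshold: float, second_threshold: float,
                  answers: Dict[str, str]) -> Optional[str]:
    """Pick the answer to feed back, or None when the rule does not apply."""
    (q1, s1), (q2, s2) = first, second
    # Assumption: cases where either similarity fails its threshold are
    # left to the caller, since the text above does not cover them.
    if s1 <= first_threshold or s2 <= second_threshold:
        return None
    if q1 == q2:
        return answers[q1]
    r1 = difference_ratio(s1, first_threshold)
    r2 = difference_ratio(s2, second_threshold)
    # Assumption: on an exact tie the second match is used.
    return answers[q1] if r1 > r2 else answers[q2]
```

Dividing by the threshold makes the two margins comparable even when the two similarity calculations use different scales or thresholds.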

According to another aspect of the present invention, a log processing device for a question answering system is provided, comprising:

an acquisition module, for obtaining user log data;

a filtering module, for filtering the user log data to obtain pending log data;

a first similarity calculation module, for obtaining a first standard question by performing a first similarity calculation on the pending log data;

a second similarity calculation module, for obtaining a second standard question by performing a second similarity calculation on the pending log data;

a judgment module, for judging whether the similarity between the pending log data and the first standard question is greater than a first threshold, whether the similarity between the pending log data and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

a labeling module, for labeling the pending log as correct when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question is identical to the second standard question.

In one example, the device is provided with a correct log library and a meaningless log library, and the filtering module further determines, by comparison, the log data in the user log data that belongs to neither the correct log library nor the meaningless log library, and takes that log data as the pending log data.

In one example, the device is provided with a Q&A database that includes a plurality of Q&A standard questions, and the first similarity calculation module comprises:

a semantic-expression similarity calculation module, for selecting one Q&A standard question for the pending log by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the device is provided with a correct log library that includes correct standard questions, and the second similarity calculation module comprises:

a cluster module, for clustering the pending log to one correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, the first similarity calculation module extracts the first standard question directly from the user log data.

In one example, when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, the judgment module further judges whether the difference ratio between the similarity to the first standard question and the first threshold is greater than the difference ratio between the similarity to the second standard question and the second threshold.

In response to the difference ratio between the similarity of the pending log data to the first standard question and the first threshold being greater than the difference ratio between the similarity of the pending log data to the second standard question and the second threshold, the labeling module labels the pending log as correct.

According to a further aspect of the present invention, a question-and-answer information processing device is provided, comprising:

a receiving module, for receiving a user question;

a first similarity calculation module, for performing a first similarity calculation on the user question to obtain a first standard question;

a second similarity calculation module, for performing a second similarity calculation on the user question to obtain a second standard question;

a judgment module, for judging whether the similarity between the user question and the first standard question is greater than a first threshold, whether the similarity between the user question and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

an output module, for feeding back to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question is identical to the second standard question.

In one example, the first similarity calculation module comprises a semantic-expression similarity calculation module, for selecting one Q&A standard question for the user question by semantic-expression similarity calculation, the selected Q&A standard question serving as the first standard question.

In one example, the device is provided with a correct log library that includes correct standard questions, and the second similarity calculation module comprises:

a cluster module, for clustering the user question to one correct standard question by big-data cluster analysis, that correct standard question serving as the second standard question.

In one example, when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, the judgment module further judges whether the difference ratio between the similarity to the first standard question and the first threshold is greater than the difference ratio between the similarity to the second standard question and the second threshold;

the output module feeds back to the user the answer information corresponding to whichever standard question has the larger difference ratio between its similarity and its corresponding threshold.

According to the solution of the present invention, applying different similarity calculations based on the Q&A database and on the correct log library allows a substantial portion of the user log data to be screened and confirmed automatically, which greatly reduces manual workload, improves processing efficiency, and lowers cost. In addition, applying the different similarity calculations based on the Q&A database and the correct log library improves the answer accuracy of the question answering system.

Brief description of the drawings

The above features and advantages of the present invention can be better understood after reading the detailed description of the embodiments of the disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components with similar properties or features may have the same or similar reference numerals.

Fig. 1 is a flowchart showing a log processing method for a question answering system according to an aspect of the present invention;

Fig. 2 is a flowchart showing a question-and-answer information processing method according to an aspect of the present invention;

Fig. 3 is a block diagram showing a log processing device for a question answering system according to an aspect of the present invention; and

Fig. 4 is a block diagram showing a question-and-answer information processing device according to an aspect of the present invention.

Detailed description

The present invention is described in detail below with reference to the drawings and specific embodiments. Note that the aspects described below in conjunction with the drawings and specific embodiments are merely exemplary and should not be understood as limiting the scope of protection of the present invention in any way.

The most primitive and simplest form of a basic knowledge point in a knowledge base is the familiar FAQ, generally in the form of a "question-answer" pair. In the present invention, a "standard question" is the wording used to represent a knowledge point; its main goals are clear expression and ease of maintenance. For example, "CRBT rates" (CRBT: color ring back tone) is a clearly expressed standard question. "Question" here should not be narrowly understood as an "inquiry" but broadly as an "input" that has a corresponding "output". For example, in semantic recognition for a control system, a user instruction such as "turn on the radio" should also be understood as a "question", and the corresponding "answer" may be a call to the control program that performs the corresponding control.

When a user gives input to a machine, the ideal case is that the user uses the standard question, so that the machine's intelligent semantic recognition system immediately understands what the user means. In practice, however, users often do not use the standard question but some variant of it. For example, if the standard question for switching stations on a radio is "change a radio station", the user's command might be "switch a radio station", and the machine must recognize that the user means the same thing. Therefore, for intelligent semantic recognition, the knowledge base needs similar questions for each standard question; a similar question differs slightly in wording from the standard question but expresses the same meaning.

Further, in order to identify user questions more accurately and efficiently, intelligent question answering systems have also developed the concept of the semantic expression.

A semantic expression consists mainly of words, word classes, and the "or" relationships among them, and its core is the "word class". A word class can be understood simply as a group of words that share something in common; these words may or may not be semantically similar, and they may also be marked as important or unimportant. The relationship between a semantic expression and a user question can be represented by a quantized value (the similarity), and this value can be compared with the similarity between a similar question and the user question.

Semantic expressions are briefly introduced below.

Symbols in semantic expressions

A. Representation of a word class ([])

To distinguish words from word classes in an expression, a word class must appear inside square brackets "[]". A word class in square brackets is generally a "narrow word class", but "broad word classes" can also be supported by configuring system parameters.

Some example structures:

[Fetion] [how] [activate]

[introduce] [MMS] [service]

[Fetion] [login] [method]

[call reminder] [how] [charge]

B. Representation of the "or" relationship (|)

Word classes inside square brackets can appear multiple times through the "or" relationship, and word classes in an "or" relationship are calculated separately, by "expansion", when similarity is computed. "Expansion" is the process of unfolding a semantic expression into multiple structures according to the meaning of "or". For example, [CRBT] [activate] [method | step] can be expanded into two simple semantic expressions, "[CRBT] [activate] [method]" and "[CRBT] [activate] [step]".

Examples of this kind of semantic expression:

[CRBT] [activate] [method | step]

[how] [query | know] [PUK code]

[unsubscribe | cancel | close | deactivate] [IP | 17951] [domestic long-distance discount package]

[call reminder] [function fee | monthly fee | information fee | communication fee]
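
The "expansion" of "or" relationships described above can be sketched as a Cartesian product over the alternatives in each bracket. This is only an illustrative sketch: the function name `expand_alternatives` and the string representation of expressions are assumptions, and the description does not state how the system actually stores or expands expressions.

```python
import itertools
import re
from typing import List


def expand_alternatives(expression: str) -> List[str]:
    """Expand every "|" group into separate simple semantic expressions."""
    # Each [...] group holds one word class or several joined by "|".
    groups = re.findall(r"\[([^\]]*)\]", expression)
    choices = [[alt.strip() for alt in group.split("|")] for group in groups]
    # One simple expression per combination of alternatives.
    return [" ".join(f"[{word}]" for word in combo)
            for combo in itertools.product(*choices)]
```

An expression with groups of two and three alternatives therefore expands into six simple expressions, each of which can be scored separately.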

C. Optional marker (?)

A "?" can be added at the end of a word class inside square brackets to indicate that the word class may or may not appear, that is, an optional relationship. Word classes in an optional relationship are likewise calculated separately, by "expansion", when similarity is computed. Here "expansion" is the process of unfolding a semantic expression that contains an optional word class (or an "or" combination of word classes) into two simple semantic expressions, one that includes the word class and one that does not. For example, [introduce] [mobile video] [military column] [content] [what?] can be expanded into two simple semantic expressions, "[introduce] [mobile video] [military column] [content]" and "[introduce] [mobile video] [military column] [content] [what]".

Examples of this kind of semantic expression:

[CRBT] [cancel] [method | step?]

[introduce] [mobile video] [military column] [content] [what?]

[introduce] [12580?] [Life Broadcast] [Quality Life edition] [free] [service?]

[how] [activate] [mobile data | traffic | Internet access] [100 yuan] [package?] [SMS]
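
Expansion of optional word classes can be sketched in the same way, by treating "omitted" as one more alternative for any bracket that ends with the optional marker. The sketch below is an assumption-laden illustration: `expand_expression` is a hypothetical name, a trailing marker is taken to apply to the whole bracket (including an "or" combination), both the ASCII "?" and the full-width "？" are accepted, and a leading "&" is simply ignored.

```python
import itertools
import re
from typing import List, Optional


def expand_expression(expression: str) -> List[str]:
    """Expand "|" alternatives and optional "?" groups into simple expressions."""
    groups = re.findall(r"\[([^\]]*)\]", expression)
    choices: List[List[Optional[str]]] = []
    for group in groups:
        body = group.strip()
        optional = body.endswith(("?", "？"))
        if optional:
            body = body[:-1]  # drop the optional marker
        alternatives: List[Optional[str]] = [alt.strip() for alt in body.split("|")]
        if optional:
            alternatives.append(None)  # None means "this word class is omitted"
        choices.append(alternatives)
    return [" ".join(f"[{word}]" for word in combo if word is not None)
            for combo in itertools.product(*choices)]
```

Under these assumptions, a bracket with two alternatives and an optional marker contributes three variants: each alternative, plus the expression without that bracket.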

D. Semantic enhancement (&)

An "&" symbol at the far left of a semantic expression raises the weight of the word classes in that expression during similarity calculation. Such a semantic expression can often ignore additional words in the user question, so its matching range is broader.

Examples of this kind of semantic expression:

& [mobile video] [discount package | discount]

& [network-wide music box] [Starlight Shining] [1 yuan] [package]

& [17951] [mobile IP phone] [service?]

& [IP?] [through train] [service?]

Semantic expressions generally have the following requirements:

1) Write concisely. Do not include content unrelated to the semantics, and do not use generic prefixes or suffixes such as "I would like to know", "I don't know", or "I want to ask". Where necessary, the text must be filtered to remove such semantically irrelevant content.

2) Do not summarize, transform, or generalize the semantics to be expressed, for example into "condition", "restriction", "precautions", or "problem". For example:

Semantics to be expressed: I am prompted with ERROR-001 when operating online banking

Incorrect templates:

(1) [operate] [online banking] [report error]

(2) [operate] [online banking] [problem]

Correct template:

[operate?] [online banking] [prompt?] [ERROR] [001]

3) Do not overuse "?" and "|".

After the words marked with "?" or "|" are removed from a semantic expression, the remaining expression must still express a similar meaning.

Typical question: the web page reports the error "ERROR"

Incorrect template: [open?] [web page] [report error?] [ERROR?]

Correct template: [open?] [web page] [report error] [ERROR]

The Q&A database includes a set of Q&A standard questions and a set of associated standard answers. After a user poses a user question, the corresponding standard question is matched in the Q&A database, and the standard answer to that standard question is provided to the user. To improve the match success rate, each Q&A standard question is in fact associated with several semantic expressions. The matching standard question is found by calculating the similarity between the user question and the semantic expressions: the Q&A standard question associated with the semantic expression that has the highest semantic similarity to the user question is taken as the corresponding standard question, and the corresponding answer is provided.

In addition, the question answering system includes a correct log library, which is a database in the intelligent question answering system for storing all correct logs. A correct log is a log that has been confirmed as error-free by the system or manually; for example, each correct log includes a correct standard question of the system and the standard answer to that question. The set of correct standard questions here is usually identical to the set of Q&A standard questions in the Q&A database.

In use, an intelligent question answering system generates a large number of user logs. Each log includes the user question provided by the user, the Q&A standard question matched to that user question, and the corresponding answer. This massive volume of user logs has to be analyzed and confirmed manually in order to optimize and maintain the system.

Fig. 1 is a flowchart showing a log processing method 100 for a question answering system according to an aspect of the present invention. The method 100 is used to sort through the massive volume of user logs automatically, so as to reduce manual workload.

In step 102, user log data is first obtained.

The logs may be collected one by one as the question answering system generates each user log, or collected from the question answering system in batches at regular intervals.

In step 104, the obtained user log data is filtered to obtain pending log data.

In one example, if a user log falls within the correct log library or the meaningless log library, that user log does not need to be sorted and confirmed. The meaningless log library is a database that gathers meaningless logs, for example nonsensical or joking questions posed casually by users; such logs are collected to form the meaningless log library.

By comparison, the log data in the user log data that belongs to neither the correct log library nor the meaningless log library is taken as the pending log data, while log data that falls within the correct log library or the meaningless log library is filtered out and is not processed further.
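
The filtering of step 104 can be sketched as a set-membership test. This is a minimal sketch under assumptions: `filter_pending` is a hypothetical name, logs are represented only by their user question text, and "belonging to" a library is taken to mean an exact text match, which the description does not specify.

```python
from typing import Iterable, List, Set


def filter_pending(user_questions: Iterable[str],
                   correct_log_library: Set[str],
                   meaningless_log_library: Set[str]) -> List[str]:
    """Keep only logs found in neither the correct nor the meaningless library."""
    return [question for question in user_questions
            if question not in correct_log_library
            and question not in meaningless_log_library]
```

A real system would likely normalize the text or use a fuzzier comparison before the lookup; that choice is outside what the description states.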

In step 106, a first standard question is obtained by performing a first similarity calculation on the pending log data.

For a specific pending log, the first similarity calculation is used to match the corresponding first standard question. Specifically, this matching is based on the Q&A database. As mentioned above, the Q&A database includes a plurality of Q&A standard questions, and each Q&A standard question is additionally associated with semantic expressions that represent it.

A semantic-expression similarity calculation is performed between the pending log (for example, the user question contained in the pending log) and the semantic expressions of the Q&A standard questions in the Q&A database, in order to find the semantic expression with the highest similarity. The Q&A standard question corresponding to that semantic expression is taken as the matched first standard question.
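
The selection in step 106 amounts to taking the best-scoring semantic expression across all Q&A standard questions. The sketch below illustrates only that selection step. The description does not define the semantic-expression similarity itself, so `toy_similarity` is a stand-in (Jaccard overlap between the question's whitespace-separated words and the word classes of an already expanded expression), and the dictionary layout of `qa_database` is likewise an assumption.

```python
import re
from typing import Dict, List, Tuple


def toy_similarity(question: str, expression: str) -> float:
    """Stand-in score: Jaccard overlap of question words and word classes."""
    question_words = set(question.lower().split())
    expression_words = {w.strip().lower()
                        for w in re.findall(r"\[([^\]]*)\]", expression)}
    if not question_words or not expression_words:
        return 0.0
    shared = question_words & expression_words
    return len(shared) / len(question_words | expression_words)


def first_standard_question(question: str,
                            qa_database: Dict[str, List[str]]) -> Tuple[str, float]:
    """Return the Q&A standard question whose expression scores highest."""
    best_question, best_score = "", 0.0
    for standard_question, expressions in qa_database.items():
        for expression in expressions:
            score = toy_similarity(question, expression)
            if score > best_score:
                best_question, best_score = standard_question, score
    return best_question, best_score
```

The returned score is what is later compared against the first threshold.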

As mentioned above, each user log contains the user question posed by the user, the Q&A standard question that the question answering system matched to that user question in the Q&A database at the time, and the answer corresponding to that Q&A standard question. Therefore, in another example, the Q&A standard question contained in the pending log is used directly as the first standard question.

In step 108, a second standard question is obtained by performing a second similarity calculation on the pending log data.

For the pending log, the second similarity calculation is likewise used to match the corresponding second standard question. Specifically, this matching is based on the correct log library.

Specifically, big-data clustering is performed on all pending logs based on the correct log library (for example, the set of correct standard questions it contains), so that each pending log is clustered to a certain correct standard question, and that correct standard question is taken as the matched second standard question. For example, a semantic similarity calculation may be performed between each pending log (for example, the user question of that user log) and each correct standard question, and the pending log is then clustered to the correct standard question with the highest semantic similarity.
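
The assignment in step 108 can be sketched as grouping each pending log under its most similar correct standard question. This is a small-scale illustration only: the description does not name the semantic similarity measure or the clustering tooling, so the character-level ratio from Python's standard `difflib.SequenceMatcher` is used here purely as a stand-in, and `assign_to_correct_questions` is a hypothetical name.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def text_similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; not the patent's semantic measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def assign_to_correct_questions(
        pending_questions: List[str],
        correct_standard_questions: List[str],
) -> Dict[str, List[Tuple[str, float]]]:
    """Group each pending question under its nearest correct standard question."""
    clusters: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for question in pending_questions:
        best = max(correct_standard_questions,
                   key=lambda standard: text_similarity(question, standard))
        clusters[best].append((question, text_similarity(question, best)))
    return dict(clusters)
```

Each entry keeps the similarity alongside the question so that it can later be compared against the second threshold.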

In step 110, when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question is identical to the second standard question, the pending log is labeled as correct.

A similarity between the pending log data and the first standard question that exceeds the first threshold indicates that the pending log has been matched to the first standard question with relatively high confidence. A similarity between the pending log data and the second standard question that exceeds the second threshold likewise indicates that the pending log has been matched to the second standard question with relatively high confidence. If, in addition, the first standard question and the second standard question are identical, the pending log has been matched to the same standard question by two different similarity calculations, so the user log can be judged to be correct.
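
The condition of step 110 is a conjunction of three tests, which can be written down directly. The function name and argument layout below are illustrative assumptions.

```python
def is_confirmed_correct(first_question: str, first_similarity: float,
                         second_question: str, second_similarity: float,
                         first_threshold: float, second_threshold: float) -> bool:
    """True when both matches are confident and agree on the standard question."""
    return (first_similarity > first_threshold
            and second_similarity > second_threshold
            and first_question == second_question)
```

Both comparisons are strict, following the "greater than" wording of the description.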

On the other hand, suppose the similarity between the pending log data and the first standard question is greater than the first threshold, so that the pending log has been matched to the first standard question with relatively high confidence, and the similarity between the pending log data and the second standard question is greater than the second threshold, so that the pending log has also been matched to the second standard question with relatively high confidence, but the first standard question differs from the second standard question. This means that the two different similarity calculations have matched the user log to different standard questions, and the result with the higher matching confidence should prevail.

Specifically, the difference ratio between the similarity of the pending log data to the first standard question and the first threshold can be calculated (that is, the similarity minus the first threshold, divided by the first threshold), and likewise the difference ratio between the similarity of the pending log data to the second standard question and the second threshold (the similarity minus the second threshold, divided by the second threshold).

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the pending log data and the first standard question (that is, the Q&A standard question in the Q&A database) is the more reliable one. In other words, the answer provided by the Q&A database is correct and the pending log generated from the Q&A database is correct, so the pending log is labeled as correct.
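
The difference-ratio comparison can be illustrated with a short worked sketch. The function names and the "unresolved" label are assumptions: the description states only that the log is labeled correct when the first ratio is the larger one and does not say what label applies otherwise. The numeric values are made up for the example.

```python
def difference_ratio(similarity: float, threshold: float) -> float:
    """(similarity - threshold) / threshold, as defined in the text."""
    return (similarity - threshold) / threshold


def label_conflicting_match(first_similarity: float, second_similarity: float,
                            first_threshold: float, second_threshold: float) -> str:
    """Label a log whose two confident matches point to different questions."""
    first_ratio = difference_ratio(first_similarity, first_threshold)
    second_ratio = difference_ratio(second_similarity, second_threshold)
    # Assumption: any outcome other than "first ratio is larger" is left
    # unresolved, because the description does not specify it.
    return "correct" if first_ratio > second_ratio else "unresolved"


# Worked example with illustrative numbers:
# first match:  similarity 0.9, threshold 0.6 -> (0.9 - 0.6) / 0.6 = 0.5
# second match: similarity 0.8, threshold 0.7 -> (0.8 - 0.7) / 0.7 = about 0.143
# 0.5 > 0.143, so the Q&A database match prevails and the log is correct.
```

Normalizing by the threshold expresses each margin relative to its own bar, which matters when the two thresholds differ.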

In yet another case, if the similarity between the pending log data and the first standard question is below the first threshold and the similarity between the pending log data and the second standard question is also below the second threshold, the confidence of the pending log data is low and manual confirmation is required. Confirming every such log individually, however, is a heavy manual workload.

For this reason, all user log data whose similarity to the first standard question is below the first threshold and whose similarity to the second standard question is below the second threshold can be subjected to cluster analysis and grouped into multiple user log clusters for manual confirmation. The user logs within each cluster are highly similar to one another, which makes manual confirmation easier.
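The text does not name a clustering algorithm, so the following Python sketch is only one possible illustration. It assumes a simple greedy grouping and a token-level Jaccard similarity as a stand-in for semantic similarity; `jaccard`, `cluster_for_review`, and the `min_sim` parameter are hypothetical names.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a stand-in for a real semantic similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def cluster_for_review(questions: List[str], min_sim: float = 0.5) -> List[List[str]]:
    """Greedily group low-confidence user questions for manual confirmation."""
    clusters: List[List[str]] = []
    for question in questions:
        for cluster in clusters:
            # Compare against the first member, used as the cluster representative.
            if jaccard(question, cluster[0]) >= min_sim:
                cluster.append(question)
                break
        else:
            # No sufficiently similar cluster exists, so start a new one.
            clusters.append([question])
    return clusters
```

A reviewer can then confirm one cluster at a time rather than one log at a time.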

In one example, given that the two semantic similarity calculations differ, the first threshold may be lower than the second threshold.

In this way, user logs can be labeled automatically, which greatly reduces manual work.

Fig. 2 is a flowchart of a question-answer information processing method 200 according to an aspect of the present invention. The method can be run by a question answering system to provide a corresponding answer to a question raised by a user.

In step 202, a user question is received.

The user question can be received via the interactive interface of the question answering system.

In step 204, a first similarity calculation is performed on the user question to obtain a first standard question.

For a specific user question, the first similarity calculation matches the corresponding first standard question. Specifically, this matching is based on the question-answer database. As described above, the question-answer database contains multiple question-answer standard questions, and each question-answer standard question is associated with semantic expressions that represent it.

An expression semantic similarity calculation is performed between the user question and the semantic expressions of the question-answer standard questions in the question-answer database to find the semantic expression with the highest similarity. The question-answer standard question corresponding to that semantic expression is taken as the matched first standard question.
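As a rough illustration of step 204, the sketch below scans every semantic expression and keeps the standard question whose expression scores highest. The data layout (a dictionary from standard question to its expressions), the function names, and the token-overlap similarity are all assumptions made for the example; the actual expression semantic similarity calculation is not specified here.

```python
from typing import Callable, Dict, List, Optional, Tuple


def token_overlap(a: str, b: str) -> float:
    """Toy similarity: shared tokens divided by all tokens (Jaccard)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def match_first_standard_question(
    user_question: str,
    qa_database: Dict[str, List[str]],
    similarity: Callable[[str, str], float] = token_overlap,
) -> Tuple[Optional[str], float]:
    """Return the standard question owning the most similar semantic expression."""
    best_question, best_sim = None, -1.0
    for standard_question, expressions in qa_database.items():
        for expression in expressions:
            sim = similarity(user_question, expression)
            if sim > best_sim:
                best_question, best_sim = standard_question, sim
    return best_question, best_sim
```

The returned similarity is the value that is later compared against the first threshold.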

In step 206, a second similarity calculation is performed on the user question to obtain a second standard question.

For the user question, the second similarity calculation likewise matches a corresponding second standard question. Specifically, this matching is based on the correct log library.

Specifically, big data clustering is performed on the user question on the basis of the correct log library (for example, the set of correct standard questions it contains), so that the user question is clustered to one of the correct standard questions, and that correct standard question is taken as the matched second standard question. For example, a semantic similarity calculation is performed between the user question and each correct standard question, and the user question is then clustered to the correct standard question with the highest semantic similarity.
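The example in step 206 amounts to assigning the user question to its most similar correct standard question. A minimal Python sketch follows, assuming character-level matching from the standard library's `difflib.SequenceMatcher` as a placeholder for semantic similarity; the function name `match_second_standard_question` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def match_second_standard_question(
    user_question: str,
    correct_standard_questions: Iterable[str],
) -> Tuple[Optional[str], float]:
    """Assign the user question to the most similar correct standard question."""
    best_question, best_sim = None, -1.0
    for candidate in correct_standard_questions:
        # SequenceMatcher.ratio() returns a value in [0, 1]; it is only a
        # surface-level stand-in for the semantic similarity in the text.
        sim = SequenceMatcher(None, user_question.lower(), candidate.lower()).ratio()
        if sim > best_sim:
            best_question, best_sim = candidate, sim
    return best_question, best_sim
```

The similarity returned here plays the role of the value compared against the second threshold.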

In step 208, when the similarity between the user question and the first standard question exceeds the first threshold, the similarity between the user question and the second standard question exceeds the second threshold, and the first standard question is identical to the second standard question, the answer information corresponding to the first standard question or the second standard question is fed back to the user.

A similarity between the user question and the first standard question that exceeds the first threshold indicates that the user question was matched to the first standard question with high confidence. Likewise, a similarity between the user question and the second standard question that exceeds the second threshold indicates that the user question was matched to the second standard question with high confidence. When the first standard question and the second standard question are identical, the user question has been matched to the same standard question by two different similarity calculations, so the match can be judged accurate. The answer information corresponding to the first standard question or the second standard question is therefore fed back to the user, and the answer given in this case has very high confidence.

On the other hand, the similarity between the user question and the first standard question may exceed the first threshold, and the similarity between the user question and the second standard question may exceed the second threshold, so that each match has high confidence, while the first standard question differs from the second standard question. The user question has then been matched to different standard questions by the two similarity calculations, and the result with the higher matching confidence should prevail.

Specifically, the difference ratio between the first threshold and the similarity of the user question to the first standard question can be calculated (that is, the similarity minus the first threshold, divided by the first threshold), as can the difference ratio between the second threshold and the similarity of the user question to the second standard question (the similarity minus the second threshold, divided by the second threshold).

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the user question and the first standard question (that is, the question-answer standard question in the question-answer database) is the more reliable one, and the answer information corresponding to the first standard question is provided to the user. Otherwise, the answer corresponding to the second standard question is provided to the user.
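The decision in step 208 can be sketched as follows. The function name `select_answer`, the `answers` dictionary, and returning `None` when either similarity fails its threshold are assumptions for illustration; the text above does not say what happens in that last case.

```python
from typing import Dict, Optional


def select_answer(sim1: float, question1: str,
                  sim2: float, question2: str,
                  threshold1: float, threshold2: float,
                  answers: Dict[str, str]) -> Optional[str]:
    """Pick the answer to feed back; thresholds are assumed to be positive."""
    if not (sim1 > threshold1 and sim2 > threshold2):
        # Assumption: low-confidence cases are handled elsewhere.
        return None
    if question1 == question2:
        # Both calculations agree, so either standard question gives the same answer.
        return answers[question1]
    ratio1 = (sim1 - threshold1) / threshold1
    ratio2 = (sim2 - threshold2) / threshold2
    # The first standard question wins only on a strictly larger difference ratio.
    chosen = question1 if ratio1 > ratio2 else question2
    return answers[chosen]
```

Normalizing by the threshold lets the two similarity scales be compared even when the first and second thresholds differ.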

In this way, the answering accuracy of the question answering system is improved.

Although the above methods are illustrated and described as a series of actions for simplicity of explanation, it should be understood and appreciated that these methods are not limited by the order of the actions, because according to one or more embodiments some actions may occur in a different order and/or concurrently with other actions that are illustrated and described herein, or that are not illustrated and described herein but would be understood by those skilled in the art.

Fig. 3 is a block diagram of a log processing device 300 of a question answering system according to an aspect of the present invention. The log processing device 300 can be used to sort through massive volumes of user logs automatically, thereby reducing manual work. The log processing device 300 may include an acquisition module 302, a filtering module 304, a first similarity calculation module 306, a second similarity calculation module 308, a judgment module 310, and a labeling module 312.

The acquisition module 302 first obtains user log data.

The user logs can be collected one by one as the question answering system generates them, or collected in batches from the question answering system at regular intervals.

The filtering module 304 can filter the obtained user log data to obtain pending log data.

By comparison, the filtering module 304 can identify the log data in the user log data that belongs neither to the correct log library nor to the meaningless log library and take it as the pending log data. Log data that falls in the correct log library or in the meaningless log library is filtered out and is not processed further.
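The filtering behavior of module 304 can be illustrated with a short Python sketch. It assumes that each log is represented by its user question as a string and that membership in a library is tested by exact match; both are simplifications, and `filter_pending_logs` is a hypothetical name.

```python
from typing import Iterable, List


def filter_pending_logs(user_logs: Iterable[str],
                        correct_library: Iterable[str],
                        meaningless_library: Iterable[str]) -> List[str]:
    """Keep only logs found in neither the correct nor the meaningless library."""
    # Assumption: exact string equality decides library membership.
    already_known = set(correct_library) | set(meaningless_library)
    return [log for log in user_logs if log not in already_known]
```

Only the logs that survive this filter go on to the two similarity calculations.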

The first similarity calculation module 306 can obtain the first standard question by performing the first similarity calculation on the pending log data.

The first similarity calculation module 306 may include an expression semantic similarity calculation module (not shown), which performs an expression semantic similarity calculation between the pending log (for example, the user question contained in the pending log) and the semantic expressions of the question-answer standard questions in the question-answer database to find the semantic expression with the highest similarity. The question-answer standard question corresponding to that semantic expression is taken as the matched first standard question.

The second similarity calculation module 308 can obtain the second standard question by performing the second similarity calculation on the pending log data.

For the pending log, the second similarity calculation module 308 likewise matches a corresponding second standard question. Specifically, this matching is based on the correct log library.

Specifically, the second similarity calculation module 308 may include a clustering module (not shown), which performs big data clustering on all pending logs on the basis of the correct log library (for example, the set of correct standard questions it contains), so that each pending log is clustered to one of the correct standard questions, and that correct standard question is taken as the matched second standard question. For example, a semantic similarity calculation is performed between each pending log (for example, the user question of that user log) and each correct standard question, and the pending log is then clustered to the correct standard question with the highest semantic similarity.

The judgment module 310 can determine whether the similarity between the pending log data and the first standard question is greater than the first threshold, whether the similarity between the pending log data and the second standard question is greater than the second threshold, and whether the first standard question and the second standard question are the same standard question.

When the similarity between the pending log data and the first standard question exceeds the first threshold, the similarity between the pending log data and the second standard question exceeds the second threshold, and the first standard question is identical to the second standard question, the labeling module 312 can label the pending log as correct.

A similarity between the pending log data and the first standard question that exceeds the first threshold indicates that the pending log was matched to the first standard question with high confidence. Likewise, a similarity between the pending log data and the second standard question that exceeds the second threshold indicates that the pending log was matched to the second standard question with high confidence. When the first standard question and the second standard question are identical, the pending log has been matched to the same standard question by two different similarity calculations, so the labeling module 312 can judge the user log to be correct.

Where the first standard question and the second standard question differ, the judgment module 310 can calculate the difference ratio between the first threshold and the similarity of the pending log data to the first standard question (that is, the similarity minus the first threshold, divided by the first threshold) and the difference ratio between the second threshold and the similarity of the pending log data to the second standard question (the similarity minus the second threshold, divided by the second threshold), and thereby judge whether the former difference ratio is greater than or less than the latter.

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the pending log data and the first standard question (that is, the question-answer standard question in the question-answer database) is the more reliable one. In other words, the answer given by the question-answer database is correct and the pending log generated from the question-answer database is correct, so the labeling module 312 can label the pending log as correct.

Fig. 4 is a block diagram of a question-answer information processing device 400 according to an aspect of the present invention.

The question-answer information processing device 400 may include a receiving module 402, a first similarity calculation module 404, a second similarity calculation module 406, a judgment module 408, and an output module 410.

The receiving module 402 can receive a user question. The user question may be in text format or in another format such as voice.

The receiving module 402 can receive the user question through the interactive interface of the question answering system.

The first similarity calculation module 404 can perform the first similarity calculation on the user question to obtain the first standard question.

For a specific user question, the first similarity calculation module 404 matches the corresponding first standard question. Specifically, this matching is based on the question-answer database. As described above, the question-answer database contains multiple question-answer standard questions, and each question-answer standard question is associated with semantic expressions that represent it.

The first similarity calculation module 404 may include an expression semantic similarity calculation module (not shown), which performs an expression semantic similarity calculation between the user question and the semantic expressions of the question-answer standard questions in the question-answer database to find the semantic expression with the highest similarity. The question-answer standard question corresponding to that semantic expression is taken as the matched first standard question.

The second similarity calculation module 406 can perform the second similarity calculation on the user question to obtain the second standard question.

For the user question, the second similarity calculation module 406 likewise matches a corresponding second standard question. Specifically, this matching is based on the correct log library.

Specifically, the second similarity calculation module 406 may include a clustering module (not shown), which performs big data clustering on the user question on the basis of the correct log library (for example, the set of correct standard questions it contains), so that the user question is clustered to one of the correct standard questions, and that correct standard question is taken as the matched second standard question. For example, the clustering module can perform a semantic similarity calculation between the user question and each correct standard question and then cluster the user question to the correct standard question with the highest semantic similarity.

The judgment module 408 can determine whether the similarity between the user question and the first standard question is greater than the first threshold, whether the similarity between the user question and the second standard question is greater than the second threshold, and whether the first standard question and the second standard question are the same standard question.

When the similarity between the user question and the first standard question exceeds the first threshold, the similarity between the user question and the second standard question exceeds the second threshold, and the first standard question is identical to the second standard question, the output module 410 can feed back to the user the answer information corresponding to the first standard question or the second standard question.

A similarity between the user question and the first standard question that exceeds the first threshold indicates that the user question was matched to the first standard question with high confidence. Likewise, a similarity between the user question and the second standard question that exceeds the second threshold indicates that the user question was matched to the second standard question with high confidence. When the first standard question and the second standard question are identical, the user question has been matched to the same standard question by two different similarity calculations, so the match can be judged accurate. The output module 410 can therefore feed back to the user the answer information corresponding to the first standard question or the second standard question, and the answer given in this case has very high confidence.

Where the first standard question and the second standard question differ, the judgment module 408 can calculate the difference ratio between the first threshold and the similarity of the user question to the first standard question (that is, the similarity minus the first threshold, divided by the first threshold) and the difference ratio between the second threshold and the similarity of the user question to the second standard question (the similarity minus the second threshold, divided by the second threshold), and thereby judge whether the former difference ratio is greater than or less than the latter.

If the difference ratio for the first standard question is greater than the difference ratio for the second standard question, the match between the user question and the first standard question (that is, the question-answer standard question in the question-answer database) is the more reliable one, and the output module 410 can provide the answer information corresponding to the first standard question to the user. Otherwise, the answer corresponding to the second standard question is provided to the user.

Those skilled in the art will further appreciate that the various illustrative logic blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, and the like, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

The various illustrative logic blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.

The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A log processing method for a question answering system, characterized by comprising:

Obtaining user log data;

When the similarity between the pending log data and the first standard question is greater than a first threshold, the similarity between the pending log data and the second standard question is greater than a second threshold, and the first standard question is identical to the second standard question, labeling the pending log as correct.

2. The log processing method of claim 1, characterized in that the filtering comprises:

Determining, by comparison, the log data in the user log data that belongs neither to a correct log library nor to a meaningless log library, and taking it as the pending log data.

3. The log processing method of claim 1, characterized in that the first similarity calculation comprises:

Providing a question-answer database, the question-answer database comprising multiple question-answer standard questions;

Selecting a question-answer standard question for the pending log by expression semantic similarity calculation, the question-answer standard question serving as the first standard question.

4. The log processing method of claim 1, characterized in that the second similarity calculation comprises:

Providing a correct log library, the correct log library comprising correct standard questions;

Clustering the pending log to a correct standard question using big data cluster analysis, the correct standard question serving as the second standard question.

5. The method of claim 1, characterized in that the first standard question is extracted directly from the user log data.

6. The log processing method of claim 5, characterized by further comprising: when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, the first standard question differs from the second standard question, and the difference ratio between the first threshold and the similarity of the pending log data to the first standard question is greater than the difference ratio between the second threshold and the similarity of the pending log data to the second standard question, labeling the pending log as correct.

7. The method of claim 1, characterized by further comprising:

8. A question-answer information processing method, characterized by comprising:

Receiving a user question;

When the similarity between the user question and the first standard question is greater than a first threshold, the similarity between the user question and the second standard question is greater than a second threshold, and the first standard question is identical to the second standard question, feeding back to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question.

9. The question-answer information processing method of claim 8, characterized in that the first similarity calculation comprises:

Selecting a question-answer standard question for the user question by expression semantic similarity calculation, the question-answer standard question serving as the first standard question.

10. The question-answer information processing method of claim 8, characterized in that the second similarity calculation comprises:

Clustering the user question to a correct standard question using big data cluster analysis, the correct standard question serving as the second standard question.

11. The question-answer information processing method of claim 8, characterized by further comprising: when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, feeding back to the user the answer information corresponding to the standard question whose difference ratio between similarity and corresponding threshold is greater.

12. A log processing device for a question answering system, characterized by comprising:

An acquisition module for obtaining user log data;

A first similarity calculation module for obtaining a first standard question obtained by performing a first similarity calculation on the pending log data;

A second similarity calculation module for obtaining a second standard question obtained by performing a second similarity calculation on the pending log data;

A judgment module for judging whether the similarity between the pending log data and the first standard question is greater than a first threshold, whether the similarity between the pending log data and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

A labeling module for labeling the pending log as correct when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question is identical to the second standard question.

13. The log processing device of claim 12, characterized in that the question answering system is provided with a correct log library and a meaningless log library, and the filtering module further determines, by comparison, the log data in the user log data that belongs neither to the correct log library nor to the meaningless log library and takes it as the pending log data.

14. The log processing device of claim 12, characterized in that the question answering system is provided with a question-answer database, the question-answer database comprising multiple question-answer standard questions, and the first similarity calculation module comprises:

An expression semantic similarity calculation module for selecting a question-answer standard question for the pending log by expression semantic similarity calculation, the question-answer standard question serving as the first standard question.

15. The log processing device of claim 12, characterized in that the question answering system is provided with a correct log library, the correct log library comprising correct standard questions, and the second similarity calculation module comprises:

A clustering module for clustering the pending log to a correct standard question using big data cluster analysis, the correct standard question serving as the second standard question.

16. The log processing device of claim 12, characterized in that the first similarity calculation module extracts the first standard question directly from the user log data.

17. The log processing device of claim 16, characterized in that the judgment module is configured, when the similarity between the pending log data and the first standard question is greater than the first threshold, the similarity between the pending log data and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, to further judge whether the difference ratio between the first threshold and the similarity of the pending log data to the first standard question is greater than the difference ratio between the second threshold and the similarity of the pending log data to the second standard question,

In response to the difference ratio between the first threshold and the similarity of the pending log data to the first standard question being greater than the difference ratio between the second threshold and the similarity of the pending log data to the second standard question, the labeling module labels the pending log as correct.

18. A question-answer information processing device, characterized by comprising:

A receiving module for receiving a user question;

A judgment module for judging whether the similarity between the user question and the first standard question is greater than a first threshold, whether the similarity between the user question and the second standard question is greater than a second threshold, and whether the first standard question and the second standard question are the same standard question; and

An output module for feeding back to the user the answer information corresponding to the first standard question or the answer information corresponding to the second standard question when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question is identical to the second standard question.

19. The question-answer information processing device of claim 18, characterized in that the question answering system is provided with a question-answer database, the question-answer database comprising multiple question-answer standard questions, and the first similarity calculation module comprises:

An expression semantic similarity calculation module that selects a question-answer standard question for the user question by expression semantic similarity calculation, the question-answer standard question serving as the first standard question.

20. The question-answer information processing device of claim 18, characterized in that the question answering system is provided with a correct log library, the correct log library comprising correct standard questions, and the second similarity calculation module comprises:

A clustering module for clustering the user question to a correct standard question using big data cluster analysis, the correct standard question serving as the second standard question.

21. The question-answer information processing device of claim 18, characterized in that the judgment module, when the similarity between the user question and the first standard question is greater than the first threshold, the similarity between the user question and the second standard question is greater than the second threshold, and the first standard question differs from the second standard question, further judges whether the difference ratio between the first threshold and the similarity of the user question to the first standard question is greater than the difference ratio between the second threshold and the similarity of the user question to the second standard question;

The output module feeds back to the user the answer information corresponding to the standard question whose difference ratio between similarity and corresponding threshold is greater.