CN105468468B - Data error correction method and device for a question answering system - Google Patents


Info

Publication number
CN105468468B
Authority
CN
China
Prior art keywords
information
content
wrong
error correction
question answering
Legal status
Active
Application number
CN201510870038.9A
Other languages
Chinese (zh)
Other versions
CN105468468A (en)
Inventor
孙永超
Current Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Original Assignee
Beijing Guangnian Wuxian Technology Co Ltd
Application filed by Beijing Guangnian Wuxian Technology Co Ltd
Priority to CN201510870038.9A
Publication of CN105468468A
Application granted
Publication of CN105468468B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a data error correction method and device for a question answering system. The method includes: receiving user input information and converting it into standard-text-format information, where the user input information includes voice information and/or text information; denoising the standard-text-format information to obtain first information; performing error discrimination on the first information using an error dictionary; when the first information contains erroneous content, extracting the erroneous content from the first information; and replacing the erroneous content according to a preset processing algorithm to obtain and output second information. The method and device can effectively reduce errors in users' input to the question answering system, thereby improving the accuracy with which the system answers users' questions and effectively improving its user experience.

Description

Data error correction method and device for a question answering system
Technical field
The present invention relates to the field of information retrieval and query, and in particular to a data error correction method and device for a question answering system.
Background
A question answering system is an advanced form of information retrieval system that answers questions posed by users in natural language with accurate, concise natural-language answers. Research on such systems has grown mainly because people need to obtain information quickly and accurately. A question answering system is built around natural language understanding technology, which enables a machine to understand what the user says and thus supports effective communication between humans and machines; it is an artificial intelligence system that converses with users automatically by means of natural language technology. Question answering systems are currently widely used in computer customer service systems, robots, children's toys, voice assistants, secretary-type products, and the like.
Because users pose questions to the question answering system in natural language, recognition errors and other input errors inevitably occur. When these errors enter the system as input, they seriously affect the accuracy of its output. Current question answering systems focus mainly on improving the quality of the answers obtained for users' questions, and do not check or correct the questions themselves.
As a result, user input errors inevitably lower the quality of the answers the system obtains, reducing its accuracy and degrading the user experience.
Summary of the invention
The present invention provides a data error correction method and device for a question answering system, to solve the technical problem in the prior art that, when a user asks a question answering system a question in natural language, recognition errors or other input errors reduce the accuracy of the system.
One aspect of the present invention provides a data error correction method for a question answering system, including:
Receiving user input information and converting it into standard-text-format information, where the user input information includes voice information and/or text information;
Denoising the standard-text-format information to obtain first information;
Performing error discrimination on the first information using an error dictionary;
When the first information contains erroneous content, extracting the erroneous content from the first information;
Replacing the erroneous content according to a preset processing algorithm;
Obtaining second information from the replacement result and outputting it.
Further, performing error discrimination on the first information using the error dictionary includes:
Searching the error dictionary for the first information and, when the first information contains erroneous content stored in the error dictionary, determining that the first information contains erroneous content;
The method further includes:
When the search fails, calculating, with a correct-corpus baseline model, the probability that the first information contains erroneous content, the error dictionary being obtained through training with the correct-corpus baseline model;
When the probability is greater than a preset threshold, determining that the first information contains erroneous content.
Further, replacing the erroneous content according to the preset processing algorithm includes:
Classifying the erroneous content by error type and, based on the classification result, generating multiple candidate corrections for the erroneous content;
Ranking the multiple candidate corrections according to overall syntactic analysis and a context system;
Generating the correction according to the ranking result;
Replacing the erroneous content with the correction.
Further, the method also includes: collecting corpus labeled as correct and training the correct-corpus baseline model with it.
Further, the method also includes: adding the erroneous content to the error dictionary.
Another aspect of the present invention provides a data error correction device for a question answering system, including:
A user information receiving module, configured to receive user input information and convert it into standard-text-format information, where the user input information includes voice information and/or text information;
A preprocessing module, configured to denoise the standard-text-format information to obtain first information;
An error discrimination module, configured to perform error discrimination on the first information using an error dictionary;
An erroneous content extraction module, configured to extract the erroneous content from the first information when the first information contains erroneous content;
A preset processing algorithm module, configured to replace the erroneous content according to a preset processing algorithm;
A correct content output module, configured to obtain second information from the replacement result and output it.
Further, the error discrimination module also includes:
An erroneous content probability calculation submodule, configured to calculate, with the correct-corpus baseline model, the probability that the first information contains erroneous content when the search of the error dictionary for the first information fails;
An erroneous content discrimination submodule, configured to determine that the first information contains erroneous content when the probability is greater than a preset threshold.
Further, the preset processing algorithm module includes:
A candidate correction acquisition submodule, configured to classify the erroneous content by error type and, based on the classification result, generate multiple candidate corrections for the erroneous content;
A candidate correction ranking submodule, configured to rank the multiple candidate corrections according to overall syntactic analysis and the context system;
A correction generation submodule, configured to generate the correction according to the ranking result;
A replacement submodule, configured to replace the erroneous content with the correction.
Further, the device also includes a correct corpus training module, configured to collect corpus labeled as correct and train the correct-corpus baseline model with it.
Further, the device also includes an erroneous corpus supplementing module, configured to add the erroneous content to the error dictionary.
The data error correction method and device for a question answering system provided by the present invention convert received user input information into standard-text-format information, denoise it to obtain first information, and perform error discrimination on the first information using an error dictionary. When the first information contains erroneous content, that content is extracted and replaced according to a preset processing algorithm, yielding second information that is then output; the second information is the corrected information. The method and device can effectively reduce user input errors to the question answering system, thereby improving the accuracy with which the system answers users' questions and effectively improving its user experience.
Brief description of the drawings
The invention is described in more detail below with reference to embodiments and the accompanying drawings, in which:
Fig. 1 is a flowchart of the data error correction method for a question answering system according to Embodiment one of the present invention;
Fig. 2a is a flowchart of the data error correction method for a question answering system according to Embodiment two of the present invention;
Fig. 2b is a flowchart of replacing the erroneous content according to the preset processing algorithm in the data error correction method for a question answering system according to Embodiment two of the present invention;
Fig. 3 is a structural diagram of the data error correction device for a question answering system according to Embodiment three of the present invention;
Fig. 4 is a structural diagram of the data error correction device for a question answering system according to Embodiment four of the present invention.
In the drawings, identical components are denoted by identical reference numerals. The drawings are not drawn to scale.
Detailed description of embodiments
The present invention is further described below with reference to the drawings.
Embodiment one
Fig. 1 is a flowchart of the data error correction method for a question answering system according to Embodiment one of the present invention. As shown in Fig. 1, the method includes:
Step 101: receive user input information and convert it into standard-text-format information, where the user input information includes voice information and/or text information.
Specifically, the user input information may include voice information, text information, or both. So that the user input information can be processed uniformly, it is converted here into standard-text-format information.
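The patent does not say what the "standard text format" is. As a minimal sketch, assume speech has already been transcribed upstream and that "standard" means Unicode NFKC normalization plus collapsed whitespace; the helper name `to_standard_text` is hypothetical.

```python
import unicodedata
from typing import Optional


def to_standard_text(text: Optional[str] = None,
                     speech_transcript: Optional[str] = None) -> str:
    """Merge text input and an already-transcribed speech input into one
    normalized string. Speech recognition itself is assumed to run upstream."""
    parts = [p for p in (speech_transcript, text) if p]
    if not parts:
        raise ValueError("user input must contain voice and/or text information")
    merged = " ".join(parts)
    # NFKC folds full-width letters, digits and punctuation (and the
    # ideographic space) into their standard half-width forms.
    normalized = unicodedata.normalize("NFKC", merged)
    # Collapse any run of whitespace into a single ASCII space.
    return " ".join(normalized.split())
```

Because NFKC folds full-width and half-width variants into one form, later dictionary lookups only need to store one spelling of each entry.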
Step 102: denoise the standard-text-format information to obtain first information.
Specifically, the standard-text-format information usually contains some erroneous or useless text that would interfere with later processing. The denoising in this step filters out such text; this does not affect the correctness of the user input information but reduces interference with subsequent steps. The first information is the result of denoising the standard-text-format information.
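The patent leaves the noise rules open. The sketch below assumes three illustrative rules: drop control, format and symbol characters (which covers most emoji), drop a hypothetical set of speech filler characters, and collapse repeated punctuation. These rules are assumptions, not the patent's definition of noise.

```python
import re
import unicodedata

# Hypothetical filler characters often produced by speech transcription.
FILLERS = {"嗯", "呃"}
# Runs of the same punctuation mark, e.g. "！！！" or "???".
REPEATED_PUNCT = re.compile(r"([!?.,。！？，])\1+")


def denoise(text: str) -> str:
    """Return the 'first information': text with noise characters removed."""
    kept = []
    for ch in text:
        # Cc = control, Cf = format (e.g. zero-width space),
        # So = other symbol (most emoji). Whitespace is kept.
        if unicodedata.category(ch) in ("Cc", "Cf", "So") and not ch.isspace():
            continue
        if ch in FILLERS:
            continue
        kept.append(ch)
    cleaned = REPEATED_PUNCT.sub(r"\1", "".join(kept))
    return cleaned.strip()
```

Removing fillers character by character is deliberately crude; a production system would filter them at the word level so that legitimate uses are not deleted.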
Step 103: perform error discrimination on the first information using the error dictionary.
Specifically, the error dictionary is searched for the first information; when the first information contains erroneous content stored in the error dictionary, it is determined that the first information contains erroneous content. The error dictionary is a database used to check whether the input information contains erroneous words. The more erroneous words it contains, the more likely the discrimination is to find the erroneous words in the first information, that is, the more easily those words are detected.
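A minimal sketch of the dictionary search, assuming the error dictionary is simply a collection of known erroneous strings. A non-empty result means the first information contains erroneous content; an empty result means it can be output directly. For large dictionaries, a multi-pattern matcher such as an Aho-Corasick automaton would replace this per-entry scan.

```python
from typing import Iterable, List, Tuple


def find_dictionary_errors(first_info: str,
                           error_dict: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (offset, entry) for every occurrence of an error-dictionary
    entry in first_info, ordered by offset."""
    hits = []
    for entry in error_dict:
        if not entry:
            continue  # an empty entry would match everywhere
        start = first_info.find(entry)
        while start != -1:
            hits.append((start, entry))
            start = first_info.find(entry, start + 1)
    return sorted(hits)
```

Searching again from `start + 1` also reports overlapping occurrences, which keeps every suspected span available to the extraction step.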
The discrimination determines whether the first information contains erroneous content. If it does, the method proceeds to step 104; if it does not, the first information is output directly.
Step 104: when the first information contains erroneous content, extract the erroneous content from the first information.
Specifically, this step extracts the erroneous content from the first information, that is, the erroneous content found with the error dictionary in step 103.
Step 105: replace the erroneous content according to the preset processing algorithm.
Specifically, the preset processing algorithm replaces the erroneous content in the first information with correction content, thereby obtaining the second information.
Step 106: obtain the second information from the replacement result and output it. Specifically, the information obtained after the erroneous content in the first information has been replaced is the second information, which is then output.
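Steps 103 to 106 can be sketched end to end under one simplifying assumption: the "preset processing algorithm" is a lookup in a mapping from each erroneous string to its correction. This mapping is hypothetical; the patent's own algorithm, detailed in Embodiment two, generates and ranks candidates instead.

```python
from typing import Dict


def correct_with_dictionary(first_info: str, error_dict: Dict[str, str]) -> str:
    """Steps 103-106 with a hypothetical 'preset processing algorithm' that
    looks the correction up in a wrong -> correct mapping."""
    # Steps 103/104: discriminate and extract the erroneous content.
    found = [wrong for wrong in error_dict if wrong and wrong in first_info]
    if not found:
        # No erroneous content: the first information is output directly.
        return first_info
    second_info = first_info
    # Step 105: replace longer entries first so that a shorter entry cannot
    # rewrite part of a longer, overlapping one before it is matched.
    for wrong in sorted(found, key=len, reverse=True):
        second_info = second_info.replace(wrong, error_dict[wrong])
    # Step 106: the result is the second information.
    return second_info
```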
The data error correction method for a question answering system provided by the present invention converts received user input information into standard-text-format information, denoises it to obtain first information, and performs error discrimination on the first information using the error dictionary. When the first information contains erroneous content, the method extracts that content and replaces it according to the preset processing algorithm, obtaining and outputting second information, which is the corrected information. The method can effectively reduce user input errors to the question answering system, thereby improving the accuracy with which the system answers users' questions and effectively improving its user experience.
The error correction method provided by the present invention can be used not only in question answering systems but also in other systems that place high demands on input text and need to identify and correct errors in it.
Embodiment two
This embodiment supplements the embodiment above.
Fig. 2a is a flowchart of the data error correction method for a question answering system according to Embodiment two of the present invention. As shown in Fig. 2a, the method includes:
Step 201: receive user input information and convert it into standard-text-format information, where the user input information includes voice information and/or text information.
Step 202: denoise the standard-text-format information to obtain first information.
Steps 201 and 202 are the same as steps 101 and 102 in Embodiment one and are not described again here.
Step 203: perform error discrimination on the first information using the error dictionary.
The discrimination determines whether the first information contains erroneous content. If it does, the method proceeds to step 204; if the error dictionary reveals no erroneous content, the method proceeds to step 2031.
Step 2031: when the search of the error dictionary for the first information fails, that is, when the first information contains none of the erroneous content stored in the error dictionary, calculate, with the correct-corpus baseline model, the probability that the first information contains erroneous content.
The correct-corpus baseline model is obtained by training on a large amount of corpus labeled as correct. Because it is a statistical model, the larger the training corpus, the more accurately erroneous content is extracted. When the error dictionary is used to discriminate the first information, exact matching sometimes fails, that is, the error dictionary does not store the erroneous content contained in the first information.
In that case, this step uses the correct-corpus baseline model to calculate the probability that the first information contains erroneous content, so that a subsequent step can decide whether it actually does. For example, suppose the user enters "我要吃工包鸡丁" ("I want to eat 工包鸡丁"), where "工包鸡丁" is a pinyin-input error for "宫保鸡丁" (Kung Pao chicken): both "工包" and "宫保" are typed as "gongbao". When the correct-corpus baseline model is applied, it tentatively finds that "工包" followed by "鸡丁" (diced chicken) is not a correct combination, so "工包鸡丁" is treated as erroneous content.
At this point "工包鸡丁" is only a suspected error; whether it really is erroneous content must be determined further. This step therefore calculates the probability that "工包鸡丁" is erroneous content, for use in that further judgment.
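The patent does not specify the model's form. As one plausible stand-in, the sketch below assumes an add-one smoothed word-bigram model trained on pre-segmented sentences labeled as correct, and takes the error probability of a word pair to be 1 - P(word | previous word). The class name and the scoring rule are assumptions.

```python
from collections import Counter
from typing import Sequence


class CorrectCorpusBigramModel:
    """Hypothetical stand-in for the correct-corpus baseline model: an
    add-one smoothed word-bigram model trained on segmented sentences
    that are labeled as correct."""

    def __init__(self, sentences: Sequence[Sequence[str]]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for words in sentences:
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
        self.vocab_size = len(self.unigrams)

    def cond_prob(self, prev: str, word: str) -> float:
        """Laplace-smoothed P(word | prev)."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def error_probability(self, prev: str, word: str) -> float:
        """Score that the combination prev + word is erroneous, taken
        here (an assumption) as 1 - P(word | prev)."""
        return 1.0 - self.cond_prob(prev, word)
```

With a toy corpus such as `[["我","要","吃","宫保","鸡丁"], ["宫保","鸡丁","很","好吃"], ["我","爱","吃","鸡丁"]]`, the unseen pair ("工包", "鸡丁") scores higher than the attested pair ("宫保", "鸡丁"). Smoothing still dominates at this size, however, so even correct pairs receive high scores, which is consistent with the patent's point that the model needs a large training corpus.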
Step 2032: when the probability is greater than a preset threshold, determine that the first information contains erroneous content.
Specifically, the preset threshold can be set according to actual conditions. In general, the lower the threshold, the more easily erroneous content is found; the higher the threshold, the more easily erroneous content in the first information is missed. When the probability calculated in step 2031 exceeds the preset threshold, the first information is considered to contain erroneous content. For example, if the probability that "工包鸡丁" is erroneous content is 0.95 and the preset threshold is set to 0.9, then "工包鸡丁" is considered erroneous content.
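The threshold decision is a strict comparison. The sketch below uses the 0.95 and 0.9 figures from the example and shows the trade-off described above: lowering the threshold flags more spans. The function name and the input format, a mapping from suspected span to probability, are hypothetical.

```python
from typing import Dict, List


def flag_errors(error_probs: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Step 2032 as a sketch: keep the suspected spans whose error
    probability is strictly greater than the preset threshold,
    most suspicious first."""
    flagged = [(p, span) for span, p in error_probs.items() if p > threshold]
    return [span for p, span in sorted(flagged, reverse=True)]
```

For example, with `{"工包鸡丁": 0.95, "因该": 0.6}`, a threshold of 0.9 flags only "工包鸡丁", while a threshold of 0.5 flags both.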
Step 204: when the first information contains erroneous content, extract the erroneous content from the first information and replace it according to the preset processing algorithm.
Specifically, this step replaces and corrects the erroneous content in the first information. The erroneous content is first analyzed to obtain possible correct content, the possible correct content is evaluated, a single correct content is finally determined, and the erroneous content is then replaced with it. For example, once "工包鸡丁" is confirmed as erroneous content, it is analyzed to obtain possible correct content, that content is evaluated, the correct content is finally determined to be "宫保鸡丁" (Kung Pao chicken), and "工包鸡丁" is replaced with "宫保鸡丁".
Further, referring to Fig. 2b, step 204 specifically includes:
Step 2051: classify the erroneous content by error type and, based on the classification result, generate multiple candidate corrections for the erroneous content.
Specifically, the erroneous content is analyzed to determine which error type it belongs to, that is, whether it was caused by a text input error or by a voice input error (for example, unclear pronunciation causing an error when speech is converted into text).
Further, text input errors include pinyin input errors, Wubi input errors, and handwriting input errors (errors made when characters are written directly). This step can be implemented with a classification algorithm. After the erroneous content has been classified, multiple candidate corrections are generated for it according to the classification result. For example, "工包鸡丁" is classified by error type as a pinyin input error; under that error type, multiple candidate corrections such as "宫保鸡丁" (Kung Pao chicken) and "公报鸡丁" (literally "bulletin diced chicken") are generated from it.
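Only the pinyin branch is easy to show compactly. The sketch below assumes a toy classification rule: if a known-correct word has the same toneless pinyin but different characters, the error is treated as a pinyin input error, and those homophones become the candidates. The pinyin table and vocabulary are hypothetical and cover only this example; Wubi, handwriting and speech errors would each need their own candidate generator.

```python
from typing import List, Tuple

# Hypothetical tone-less pinyin table covering only this example's
# characters; a real system would use a complete pinyin lexicon.
PINYIN = {"工": "gong", "宫": "gong", "公": "gong",
          "包": "bao", "保": "bao", "报": "bao",
          "鸡": "ji", "丁": "ding"}

# Hypothetical vocabulary of known-correct words.
VOCAB = ["宫保鸡丁", "公报鸡丁", "鸡丁"]


def to_pinyin(word: str) -> str:
    # Characters missing from the table are kept as-is.
    return " ".join(PINYIN.get(ch, ch) for ch in word)


def same_pinyin_words(wrong: str, vocab: List[str]) -> List[str]:
    key = to_pinyin(wrong)
    return [w for w in vocab if w != wrong and to_pinyin(w) == key]


def generate_candidates(wrong: str, vocab: List[str]) -> Tuple[str, List[str]]:
    """Step 2051 as a toy: classify the error, then generate candidates
    for that error type. Only the pinyin branch is sketched here."""
    homophones = same_pinyin_words(wrong, vocab)
    if homophones:
        # A different word typed with the same pinyin: pinyin input error.
        return "pinyin", homophones
    # Wubi, handwriting and speech errors would need their own generators.
    return "unknown", []
```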
Step 2052: rank the multiple candidate corrections according to overall syntactic analysis and the context system.
Specifically, when there are multiple candidate corrections, the one with the highest probability of being correct must be selected as the replacement, so the candidates are ranked using overall syntactic analysis and the context system. Overall syntactic analysis means analyzing the grammatical function of each word in a sentence. For example, in "我来晚了" ("I came late"), "我" (I) is the subject, "来" (came) is the predicate, and "晚了" (late) is the complement. Syntactic analysis is mainly used in Chinese information processing, for example in machine translation.
The candidate corrections are scored using whole-sentence analysis, and the candidate with the highest score is taken as the correct content. This step can be implemented with a quicksort algorithm or an optimal selection algorithm. For example, when "宫保鸡丁" and "公报鸡丁" are scored using whole-sentence analysis and the context system, "宫保鸡丁" scores higher than "公报鸡丁" and is therefore ranked ahead of it.
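Full syntactic analysis is beyond a short example. The sketch below substitutes a much simpler context score: how often the candidate follows the preceding word in a correct corpus, plus a small frequency prior. The scoring rule, the `prior_weight` parameter and the corpus are all assumptions. Python's `sorted` uses Timsort rather than quicksort, which makes no practical difference for a handful of candidates.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def rank_candidates(candidates: Sequence[str], left_word: str,
                    corpus: Sequence[Sequence[str]],
                    prior_weight: float = 0.1) -> List[Tuple[str, float]]:
    """Score each candidate by how often it follows left_word in a
    correct corpus, plus a small frequency prior, and sort best-first."""
    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
    scored = [(cand, bigrams[(left_word, cand)] + prior_weight * unigrams[cand])
              for cand in candidates]
    # sorted() is stable even with reverse=True, so equal scores keep
    # their original candidate order.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a corpus in which "宫保鸡丁" follows "吃" but "公报鸡丁" never appears, "宫保鸡丁" is ranked first, mirroring the example above.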
Step 2053: generate the correction according to the ranking result.
Specifically, the candidate ranked first in step 2052 is used as the correction. For example, because "宫保鸡丁" is ranked first, that is, it has the highest score, it is selected as the correction.
Step 2054: replace the erroneous content with the correction.
Specifically, the erroneous content in the first information is replaced directly with the correction. For example, replacing "工包鸡丁" in "我要吃工包鸡丁" with "宫保鸡丁" yields "我要吃宫保鸡丁" ("I want to eat Kung Pao chicken"). This is the second information, that is, the corrected information.
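Steps 2053 and 2054 amount to taking the head of the ranked list and substituting it. A minimal sketch, assuming the ranking is a best-first list of (candidate, score) pairs; the function name is hypothetical.

```python
from typing import List, Tuple


def apply_best_correction(first_info: str, wrong: str,
                          ranked: List[Tuple[str, float]]) -> str:
    """Steps 2053-2054: take the first-ranked candidate as the correction
    and substitute it for the erroneous content."""
    if not ranked:
        # No candidate was generated: leave the text unchanged.
        return first_info
    correction, _score = ranked[0]
    return first_info.replace(wrong, correction)
```

Applied to "我要吃工包鸡丁" with "宫保鸡丁" ranked first, this yields "我要吃宫保鸡丁".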
Further, the method also includes adding the erroneous content to the error dictionary. This supplements the erroneous corpus in the error dictionary and makes it more comprehensive, so that when later input contains the same erroneous content, error discrimination can be completed directly in the error dictionary, improving error correction efficiency.
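Feeding confirmed errors back into the dictionary can be sketched as follows. Storing the chosen correction alongside each erroneous string is an assumption of this sketch, since the patent only says the erroneous content is added. With the mapping, the next occurrence can be both detected and corrected on the fast dictionary path of step 203.

```python
from typing import Dict


def record_error(error_dict: Dict[str, str], wrong: str, correction: str) -> None:
    """Add a confirmed error to the error dictionary. Storing the chosen
    correction alongside it is an assumption of this sketch; an existing
    entry is not overwritten."""
    error_dict.setdefault(wrong, correction)


def lookup(first_info: str, error_dict: Dict[str, str]) -> Dict[str, str]:
    """Fast path of step 203: dictionary entries found in the text."""
    return {w: c for w, c in error_dict.items() if w in first_info}
```

Using `setdefault` keeps the first recorded correction; whether a later, different correction should replace it is a policy choice the patent does not address.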
Finally, the second information is obtained from the replacement result and output.
Further, preferably, the technical solution of the present invention also includes collecting corpus labeled as correct and training the correct-corpus baseline model with it. Specifically, before it is used, the correct-corpus baseline model must be trained on a large amount of corpus labeled as correct. In addition, if no erroneous content is found during error discrimination of the first information, each word can be extracted from the first information and labeled as correct. Corpus labeled as correct is thus collected in real time and used to train the correct-corpus baseline model.
In the data error correction method for a question answering system provided by the present invention, when the search of the error dictionary for the first information fails, the correct-corpus baseline model, obtained by training on a large amount of corpus labeled as correct, is used to extract the erroneous content in the first information. The erroneous content is then classified. Based on the classification result, multiple candidate corrections are generated and ranked according to overall syntactic analysis and the context system, and the correction is generated from the ranking result. The erroneous content is replaced with the correction to obtain and output the second information, which is the corrected information. This method can effectively reduce user input errors to the question answering system, thereby improving the accuracy with which the system answers users' questions and effectively improving its user experience.
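Real-time collection can be sketched as incremental count updates: whenever discrimination finds no error, the segmented words of the first information are labeled correct and added to the counts behind a bigram-style baseline model. The class and function names, and the use of bigram counts, are assumptions.

```python
from collections import Counter
from typing import List, Sequence


class CorrectCorpusCounts:
    """Hypothetical running counts behind the correct-corpus baseline
    model, updated as sentences labeled correct are collected."""

    def __init__(self) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()

    def add_correct(self, words: Sequence[str]) -> None:
        self.unigrams.update(words)
        self.bigrams.update(zip(words, words[1:]))


def collect_if_clean(counts: CorrectCorpusCounts, words: Sequence[str],
                     errors_found: List[str]) -> bool:
    """Label the words of the first information as correct and add them to
    the training counts only when discrimination found no errors."""
    if errors_found:
        return False
    counts.add_correct(words)
    return True
```

One risk to note is that an error the model failed to detect would be added to the correct corpus and reinforced; a deployment might require a confidence margin before labeling input as correct.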
Embodiment three
This embodiment is a device embodiment for executing the data error correction method for a question answering system provided by the present invention.
Fig. 3 is a structural diagram of the data error correction device for a question answering system according to Embodiment three of the present invention. As shown in Fig. 3, the device includes a user information receiving module 301, a preprocessing module 302, an error discrimination module 303, an erroneous content extraction module 304, a preset processing algorithm module 305, and a correct content output module 306.
The user information receiving module 301 is configured to receive user input information and convert it into standard-text-format information, where the user input information includes voice information and/or text information;
The preprocessing module 302 is configured to denoise the standard-text-format information to obtain first information;
The error discrimination module 303 is configured to perform error discrimination on the first information using the error dictionary;
The erroneous content extraction module 304 is configured to extract the erroneous content from the first information when the first information contains erroneous content;
The preset processing algorithm module 305 is configured to replace the erroneous content according to the preset processing algorithm;
The correct content output module 306 is configured to obtain the second information from the replacement result and output it.
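The module structure of Embodiment three maps naturally onto a small composition of pluggable callables. The sketch below is one hypothetical way to wire modules 301 to 306; the class name and the choice to model each module as an injected function are assumptions, not the patent's design.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataErrorCorrectionDevice:
    """Hypothetical composition of modules 301-306; each module is an
    injected callable so that any implementation can be plugged in."""
    receive: Callable[[str], str]             # 301: to standard text format
    preprocess: Callable[[str], str]          # 302: denoise -> first information
    discriminate: Callable[[str], bool]       # 303: does it contain errors?
    extract: Callable[[str], List[str]]       # 304: extract erroneous content
    replace: Callable[[str, List[str]], str]  # 305: preset processing algorithm
    output: Callable[[str], str] = lambda s: s  # 306: emit second information

    def run(self, user_input: str) -> str:
        first_info = self.preprocess(self.receive(user_input))
        if not self.discriminate(first_info):
            # No erroneous content: output the first information directly.
            return self.output(first_info)
        wrong = self.extract(first_info)
        return self.output(self.replace(first_info, wrong))
```

Injecting each module keeps the device testable: a dictionary-only discriminator can later be swapped for one that also consults a statistical model, as in Embodiment four, without changing `run`.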
This embodiment is the device embodiment corresponding to method Embodiment one; for details, refer to the corresponding description in Embodiment one, which is not repeated here.
In the data error correction device for a question answering system provided by the present invention, the user information receiving module 301 receives user input information and converts it into standard-text-format information; the preprocessing module 302 denoises it to obtain first information; the error discrimination module 303 performs error discrimination on the first information using the error dictionary; the erroneous content extraction module 304 extracts the erroneous content from the first information when it contains any; the preset processing algorithm module 305 replaces the erroneous content according to the preset processing algorithm; and the correct content output module 306 obtains and outputs the second information, which is the corrected information. The device can effectively reduce user input errors to the question answering system, thereby improving the accuracy with which the system answers users' questions and effectively improving its user experience.
Embodiment four
This embodiment supplements the embodiment above.
Fig. 4 is a structural diagram of the data error correction device for a question answering system according to Embodiment four of the present invention. As shown in Fig. 4, the device includes a user information receiving module 401, a preprocessing module 402, an error discrimination module 403, an erroneous content extraction module 404, a preset processing algorithm module 405, and a correct content output module 406.
Wherein, user information receiving module 401 is converted to for receiving user's input information, and by user's input information Received text format information, wherein user's input information includes voice messaging and/or text message.
Preprocessing module 402 carries out denoising for being ceased to received text format, and obtains the first information.
The error discrimination module 403 is configured to determine, using the error dictionary, whether the first information contains errors.
Further, the error discrimination module 403 includes an error content probability calculation submodule 4031 and an error content discrimination submodule 4032.
The error content probability calculation submodule 4031 is configured to calculate, by means of the correct-corpus baseline model, the probability that the first information contains error content when the search of the error dictionary for the first information fails;
The error content discrimination submodule 4032 is configured to determine that the first information contains error content when the probability exceeds the preset threshold.
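Submodules 4031 and 4032 can be approximated as follows. The patent does not define the correct-corpus baseline model or how the probability is computed; this sketch assumes a character-bigram model trained on correct sentences and uses the share of unseen bigrams as a crude stand-in for the error probability, with a hypothetical default threshold of 0.3. A production system would more likely use a smoothed n-gram or neural language model.

```python
from collections import Counter


def train_bigrams(correct_corpus: list[str]) -> Counter:
    """Assumed baseline model: character-bigram counts from sentences labeled correct."""
    counts: Counter = Counter()
    for sentence in correct_corpus:
        counts.update(zip(sentence, sentence[1:]))
    return counts


def error_probability(text: str, bigrams: Counter) -> float:
    """Submodule 4031 (proxy): share of the text's character bigrams never seen
    in the correct corpus. Texts shorter than two characters score 0.0."""
    pairs = list(zip(text, text[1:]))
    if not pairs:
        return 0.0
    unseen = sum(1 for p in pairs if bigrams[p] == 0)
    return unseen / len(pairs)


def contains_error(text: str, error_dict: set[str], bigrams: Counter,
                   threshold: float = 0.3) -> bool:
    """Module 403 with submodules 4031/4032 (sketch)."""
    # Step 1: look the first information up in the error dictionary.
    if any(wrong in text for wrong in error_dict):
        return True
    # Step 2: the lookup failed, so fall back to the corpus model and the threshold.
    return error_probability(text, bigrams) > threshold
```

With `bigrams = train_bigrams(["what is the weather", "the weather is nice"])`, the text `"the weather"` scores 0.0 and is accepted, while `"xqz vvk"` scores 1.0 and is flagged.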
The error content extraction module 404 is configured to extract the error content from the first information when the first information contains error content;
The preset processing algorithm module 405 is configured to replace the error content according to the preset processing algorithm, and to obtain and output the second information.
Further, the preset processing algorithm module 405 includes a candidate correction acquisition submodule 4051, a candidate correction ranking submodule 4052, a correction generation submodule 4053, and a replacement submodule 4054.
The candidate correction acquisition submodule 4051 is configured to classify the error content by error type and, based on the classification result, generate a plurality of candidate corrections for the error content.
The candidate correction ranking submodule 4052 is configured to rank the plurality of candidate corrections according to overall syntactic analysis and the context. The correction generation submodule 4053 is configured to generate the correction according to the ranking result.
The replacement submodule 4054 is configured to replace the error content with the correction.
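The preset processing algorithm of module 405 (submodules 4051 to 4054) can be sketched at word level. Several assumptions are made for illustration only: just two error types are distinguished (homophone errors, looked up in a hypothetical confusion set `CONFUSION`, and spelling errors, whose candidates are vocabulary words one edit away), and the "overall syntactic analysis and context" ranking is replaced by a simple score counting how often each candidate co-occurs with its left and right neighbors in a correct corpus. The edit-generation routine follows the well-known single-edit spelling-corrector pattern.

```python
from collections import Counter

# Hypothetical confusion set for homophone-type errors (illustration only).
CONFUSION = {"their": ["there", "they're"], "to": ["too", "two"]}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def word_bigrams(corpus: list[str]) -> Counter:
    """Count adjacent word pairs, with sentence boundary markers, in a correct corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def classify(word: str) -> str:
    """Assumed error typing: homophone if the word has a confusion set, else spelling."""
    return "homophone" if word in CONFUSION else "spelling"


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in LETTERS}
    inserts = {a + c + b for a, b in splits for c in LETTERS}
    return deletes | transposes | replaces | inserts


def candidates(word: str, vocab: set[str]) -> list[str]:
    """Submodule 4051 (sketch): generate candidates according to the error type."""
    if classify(word) == "homophone":
        return list(CONFUSION[word])
    return sorted(edits1(word) & vocab)


def context_score(cand: str, left: str, right: str, bigrams: Counter) -> int:
    """Submodule 4052 (proxy ranking): co-occurrence of the candidate with its neighbors."""
    return bigrams[(left, cand)] + bigrams[(cand, right)]


def correct_word(tokens: list[str], i: int, vocab: set[str], bigrams: Counter) -> list[str]:
    """Submodules 4053/4054 (sketch): pick the top-ranked candidate and substitute it."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    cands = candidates(tokens[i], vocab)
    if not cands:
        return tokens  # nothing to rank: leave the error content unchanged
    best = max(cands, key=lambda c: context_score(c, left, right, bigrams))
    return tokens[:i] + [best] + tokens[i + 1:]
```

Trained on `["is it going to rain over there", "put it over there", "the weather is nice"]`, correcting position 3 of `"put it over their".split()` selects `"there"`, because the pairs `("over", "there")` and `("there", "</s>")` each occur twice in that corpus while `"they're"` never appears.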
The correct content output module 406 is configured to obtain the second information according to the replacement result and output it.
Further, the data error correction apparatus for a question answering system provided by the invention includes a correct-corpus training module, configured to collect corpus labeled as correct and to train the correct-corpus baseline model with it.
Further, the data error correction apparatus for a question answering system provided by the invention includes an error corpus supplement module, configured to input the error content into the error dictionary.
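The correct-corpus training module and the error corpus supplement module amount to keeping two stores up to date. A minimal sketch, assuming a character-bigram model as an illustration of the baseline model and representing the error dictionary as a set of strings; `CorpusStore` and its method names are hypothetical:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CorpusStore:
    """Hypothetical holder for the error dictionary and the baseline model."""
    error_dict: set[str] = field(default_factory=set)
    bigrams: Counter = field(default_factory=Counter)

    def train_correct(self, labeled: list[tuple[str, bool]]) -> int:
        """Correct-corpus training module (sketch): update the character-bigram
        model using only sentences labeled as correct; return how many were used."""
        used = 0
        for sentence, is_correct in labeled:
            if is_correct:
                self.bigrams.update(zip(sentence, sentence[1:]))
                used += 1
        return used

    def supplement_error(self, wrong_content: str) -> None:
        """Error corpus supplement module (sketch): remember detected error content."""
        self.error_dict.add(wrong_content)
```

Feeding detected errors back through `supplement_error` means a recurring mistake is caught by the cheap dictionary lookup the next time, before the probability step is needed.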
This embodiment is the apparatus embodiment corresponding to method Embodiment 2; for details, refer to the corresponding description in Embodiment 2, which is not repeated here.
In the data error correction apparatus for a question answering system provided by the invention, the error content probability calculation submodule 4031 uses the correct-corpus baseline model to calculate the probability that the first information contains error content. When the probability exceeds the preset threshold, the error content discrimination submodule 4032 determines that the first information contains error content, and the error content extraction module 404 extracts the error content from the first information. The candidate correction acquisition submodule 4051 classifies the error content by error type and generates a plurality of candidate corrections; the candidate correction ranking submodule 4052 ranks the candidate corrections according to the preset processing algorithm; the correction generation submodule 4053 generates the correction according to the ranking result; and the replacement submodule 4054 replaces the error content with the correction, so that the second information is obtained and output. The apparatus thus effectively reduces the input errors that users introduce into the question answering system, improves the accuracy with which the system answers users' questions, and effectively improves the user experience of the question answering system.
Although the invention has been described with reference to preferred embodiments, various improvements may be made to it, and its components may be replaced with equivalents, without departing from the scope of the invention. In particular, as long as there is no structural conflict, the technical features mentioned in the various embodiments may be combined in any way. The invention is not limited to the specific embodiments disclosed herein, but includes all technical solutions falling within the scope of the appended claims.

Claims (9)

1. A data error correction method for a question answering system, characterized by comprising:
receiving user input information, and converting the user input information into standard text format information, wherein the user input information includes voice information and/or text information;
denoising the standard text format information to obtain first information;
performing error discrimination on the first information using an error dictionary;
when the first information contains error content, extracting the error content from the first information;
classifying the error content by error type; based on the classification result, generating a plurality of candidate corrections for the error content; ranking the plurality of candidate corrections according to overall syntactic analysis and the context; generating a correction according to the ranking result; and replacing the error content with the correction;
obtaining second information according to the replacement result and outputting it.
2. The data error correction method for a question answering system according to claim 1, characterized in that performing error discrimination on the first information using the error dictionary comprises:
searching the error dictionary for the first information, and when the first information contains error content stored in the error dictionary, determining that the first information contains error content;
and further comprises:
when the search fails, calculating, by means of a correct-corpus baseline model, the probability that the first information contains error content;
when the probability exceeds a preset threshold, determining that the first information contains error content.
3. The data error correction method for a question answering system according to claim 2, characterized by further comprising: collecting corpus labeled as correct, and training the correct-corpus baseline model with it.
4. The data error correction method for a question answering system according to claim 2, characterized by further comprising:
inputting the error content into the error dictionary.
5. A data error correction apparatus for a question answering system, characterized by comprising:
a user information receiving module, configured to receive user input information and convert the user input information into standard text format information, wherein the user input information includes voice information and/or text information;
a preprocessing module, configured to denoise the standard text format information to obtain first information;
an error discrimination module, configured to perform error discrimination on the first information using an error dictionary;
an error content extraction module, configured to extract the error content from the first information when the first information contains error content;
a preset processing algorithm module, configured to classify the error content by error type; based on the classification result, generate a plurality of candidate corrections for the error content; rank the plurality of candidate corrections according to overall syntactic analysis and the context; generate a correction according to the ranking result; and replace the error content with the correction;
a correct content output module, configured to obtain second information according to the replacement result and output it.
6. The data error correction apparatus for a question answering system according to claim 5, characterized in that the error discrimination module further comprises:
an error content probability calculation submodule, configured to calculate, by means of a correct-corpus baseline model, the probability that the first information contains error content when the search of the error dictionary for the first information fails;
an error content discrimination submodule, configured to determine that the first information contains error content when the probability exceeds a preset threshold.
7. The data error correction apparatus for a question answering system according to claim 5, characterized in that the preset processing algorithm module comprises:
a candidate correction acquisition submodule, configured to classify the error content by error type and, based on the classification result, generate a plurality of candidate corrections for the error content;
a candidate correction ranking submodule, configured to rank the plurality of candidate corrections according to overall syntactic analysis and the context;
a correction generation submodule, configured to generate a correction according to the ranking result;
a replacement submodule, configured to replace the error content with the correction.
8. The data error correction apparatus for a question answering system according to claim 6, characterized by further comprising a correct-corpus training module, configured to collect corpus labeled as correct and to train the correct-corpus baseline model with it.
9. The data error correction apparatus for a question answering system according to any one of claims 5 to 8, characterized by further comprising an error corpus supplement module, configured to input the error content into the error dictionary.
CN201510870038.9A 2015-12-02 2015-12-02 Data error-correcting method towards question answering system and device Active CN105468468B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510870038.9A CN105468468B (en) 2015-12-02 2015-12-02 Data error-correcting method towards question answering system and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510870038.9A CN105468468B (en) 2015-12-02 2015-12-02 Data error-correcting method towards question answering system and device

Publications (2)

Publication Number Publication Date
CN105468468A CN105468468A (en) 2016-04-06
CN105468468B true CN105468468B (en) 2018-07-27

Family

ID=55606203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510870038.9A Active CN105468468B (en) 2015-12-02 2015-12-02 Data error-correcting method towards question answering system and device

Country Status (1)

Country Link
CN (1) CN105468468B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122346B (en) * 2016-12-28 2018-02-27 平安科技(深圳)有限公司 The error correction method and device of a kind of read statement
CN107729316A (en) * 2017-10-12 2018-02-23 福建富士通信息软件有限公司 The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese
CN108038098A (en) * 2017-11-28 2018-05-15 苏州市东皓计算机系统工程有限公司 A kind of computword correcting method
CN108829674A (en) * 2018-06-08 2018-11-16 Oppo(重庆)智能科技有限公司 Content error correction method and relevant apparatus
CN109344392B (en) * 2018-08-23 2023-02-03 广州市万隆证券咨询顾问有限公司 Intelligent message pushing method, system and device for security customer service consultation
CN109376224B (en) * 2018-10-24 2020-07-21 深圳市壹鸽科技有限公司 Corpus filtering method and apparatus
CN111523305A (en) * 2019-01-17 2020-08-11 阿里巴巴集团控股有限公司 Text error correction method, device and system
CN110489723A (en) * 2019-08-19 2019-11-22 绍兴数纺科技有限公司 A kind of data error detection and error correction system of dyeing information system
CN110674276B (en) * 2019-09-23 2024-08-16 深圳前海微众银行股份有限公司 Robot self-learning method, robot terminal, device and readable storage medium
CN110598218A (en) * 2019-09-24 2019-12-20 腾讯科技(深圳)有限公司 Information error surveying method and related device
CN112733529B (en) * 2019-10-28 2023-09-29 阿里巴巴集团控股有限公司 Text error correction method and device
CN111708870A (en) * 2020-05-27 2020-09-25 盛视科技股份有限公司 Deep neural network-based question answering method and device and storage medium
CN112329476B (en) * 2020-11-11 2024-07-19 北京京东尚科信息技术有限公司 Text error correction method and device, equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101984422A (en) * 2010-10-18 2011-03-09 百度在线网络技术(北京)有限公司 Fault-tolerant text query method and equipment
CN103366741A (en) * 2012-03-31 2013-10-23 盛乐信息技术(上海)有限公司 Voice input error correction method and system
CN103488752A (en) * 2013-09-24 2014-01-01 沈阳美行科技有限公司 POI (point of interest) searching method
CN103871407A (en) * 2012-12-07 2014-06-18 浦项工科大学校产学协力团 Method and apparatus for correcting speech recognition error
CN103942223A (en) * 2013-01-23 2014-07-23 北京百度网讯科技有限公司 Method and system for conducting online error correction on language model

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8077983B2 (en) * 2007-10-04 2011-12-13 Zi Corporation Of Canada, Inc. Systems and methods for character correction in communication devices

Also Published As

Publication number Publication date
CN105468468A (en) 2016-04-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant