CN109753636A - Machine processing and text error correction method and device, computing device, and storage medium - Google Patents
Machine processing and text error correction method and device, computing device, and storage medium
- Publication number
- CN109753636A CN109753636A CN201711060088.6A CN201711060088A CN109753636A CN 109753636 A CN109753636 A CN 109753636A CN 201711060088 A CN201711060088 A CN 201711060088A CN 109753636 A CN109753636 A CN 109753636A
- Authority
- CN
- China
- Prior art keywords
- text
- error correction
- machine processing
- model
- log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a machine processing and text error correction method and device, a computing device, and a storage medium. Error-correction rewrite pairs, each consisting of an erroneous text and the corresponding correct text, are prepared. Using these rewrite pairs as a training corpus, a machine processing model is trained, thereby preparing a machine processing model suitable for text error correction. The rewrite pairs can be mined from logs, so that the trained model is adapted to correcting text. A first text is input into the machine processing model to obtain a second text, i.e. the error-corrected result text. Furthermore, a language model or a common-word dictionary may first be used to judge whether the first text needs correction at all. The language model can be trained on a training corpus mined from the logs, and the common-word dictionary can be built by segmenting and counting the text in the logs. Text error correction can thus be realized conveniently.
Description
Technical field
This disclosure relates to text processing technology, and in particular to text error correction techniques.
Background art
With the rapid development of computer technology and Internet technology, the forms of human-computer interaction have become increasingly rich and increasingly convenient.
Keyboard input is the traditional mode of human-computer interaction. Users often make typing errors. For example, when using the five-stroke (Wubi) input method, visually similar wrong characters are often entered by mistake; when using the Pinyin input method, homophonous or near-homophonous wrong characters are frequently entered. When such an error occurs, the user must delete the erroneous input and type it again. Some errors are not discovered in time and remain in the document.
Handwriting input is another well-known mode of human-computer interaction. The user writes on, for example, a handwriting pad, and the computing system recognizes the written characters. However, every user's handwriting style differs, so the system also often makes recognition errors, and the user must delete the result and re-enter it.
Image recognition technology has also developed rapidly in recent years and can recognize images of handwritten text to obtain the corresponding characters. However, image recognition likewise produces some erroneous results for a variety of reasons.
In addition, computing systems have begun to be able to process human natural language: they can analyze natural language input, extract the useful information, and respond or perform corresponding operations. Such natural language input may be written or spoken. When a system directly receives a user's spoken natural-language input, analyzes and understands it, and then makes the corresponding response, such as executing the corresponding operation, the convenience of human-computer interaction is greatly increased. Accordingly, voice input has also become a very important entry point in computing fields such as artificial intelligence.
However, the text that many intelligent software or hardware products obtain from input speech still often fails to match what the user actually said, and sometimes the output is simply incomprehensible. If incorrectly recognized text is passed to subsequent processing stages, it significantly impairs subsequent processing such as natural language understanding, and may even make further processing impossible.
Speech recognition errors have many possible causes, for example non-standard pronunciation by the user, defects in the hardware or software itself, and errors in the speech recognition algorithm.
At present, some solutions for optimizing speech recognition already exist. Most of them start from local segments: they locate the fragment that needs rewriting, pull corresponding error-correction candidate texts, and then select the best rewrite among them. In other words, they try to improve the whole by rewriting a part.
Such solutions often have the following problems.
On the one hand, the whole is ignored when a part is modified. In fact, in many cases, the other parts of the whole should supervise and constrain the modification of a part.
On the other hand, some solutions depend on error-correction rules compiled offline in advance. Owing to factors such as the diversity of speech recognition errors, these rules tend to become very complicated in order to reach a high quality requirement, so the cost of compiling them is relatively high.
In summary, for the various modes of human-computer interaction, a solution capable of performing text error correction is still needed.
Summary of the invention
A technical problem to be solved by the invention is to provide a machine processing scheme that makes text error correction more convenient.
According to a first aspect of the invention, a machine processing method is provided, comprising: preparing error-correction rewrite pairs, each rewrite pair including an erroneous text and a corresponding correct text; and training a machine processing model using the rewrite pairs as a training corpus.
Preferably, the step of preparing the error-correction rewrite pairs may include mining the rewrite pairs from logs.
Preferably, a first text and a later text satisfying at least one of the following conditions may be found in the logs and used as an error-correction rewrite pair: the time interval between the first text and the later text recorded in the log does not exceed a predetermined time interval; the edit distance between the first text and the later text, divided by the maximum length of the two texts, does not exceed a first predetermined ratio threshold; the number of occurrences of the later text is not less than a first predetermined count threshold; the number of occurrences of the first text and the later text as a rewrite pair is not less than a second predetermined count threshold.
Preferably, the machine processing model may be a machine translation model.
Preferably, the machine translation model may be a sequence-to-sequence model.
Preferably, the machine translation model may be an attention-based sequence-to-sequence model.
Preferably, the erroneous text and the correct text may be externally input texts.
Preferably, an externally input text may be one of typed input text, handwritten input text, a speech recognition result text, and an image recognition result text.
According to a second aspect of the disclosure, a text error correction method is provided, comprising: preparing a machine processing model suitable for text error correction; and inputting a first text into the machine processing model to obtain a second text.
Preferably, the machine processing model may be prepared by the machine processing method according to the above first aspect of the disclosure.
Preferably, the text error correction method may further include judging whether the first text needs correction, wherein the first text is input into the machine processing model if correction is judged to be needed, and is not input into the machine processing model if correction is judged not to be needed.
Preferably, judging whether the first text needs correction may include: using a language model to judge whether the first text needs correction; and/or judging, based on a common-word dictionary, whether the first text needs correction.
Preferably, correction may be judged to be needed when the perplexity the language model assigns to the first text is higher than a first predetermined perplexity threshold.
Preferably, correction may be judged to be needed when the first text contains an uncommon word.
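The language-model test above can be illustrated with a toy add-one-smoothed unigram model. The disclosure does not specify the language model or the threshold, so this class and its default value are purely illustrative stand-ins.

```python
import math
from collections import Counter

class UnigramLM:
    """Toy add-one-smoothed unigram language model, standing in for the
    unspecified language model of the disclosure."""
    def __init__(self, corpus_tokens):
        self.counts = Counter(corpus_tokens)
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 pseudo-slot for unknown tokens

    def perplexity(self, tokens):
        log_prob = 0.0
        for t in tokens:
            # add-one smoothing so unseen tokens get nonzero probability
            p = (self.counts[t] + 1) / (self.total + self.vocab)
            log_prob += math.log(p)
        return math.exp(-log_prob / max(len(tokens), 1))

def needs_correction(lm, tokens, ppl_threshold=8.0):
    """Judge that the text needs correction when its perplexity exceeds the
    (assumed) first predetermined perplexity threshold."""
    return lm.perplexity(tokens) > ppl_threshold
```

A text full of tokens the model has never seen scores a high perplexity and is routed to the correction model; fluent, in-domain text is passed through untouched.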
Preferably, the text error correction method may further include: mining, from logs, a training corpus suitable for training the language model; and training the language model using the training corpus.
Preferably, the text error correction method may further include mixing the training corpus mined from the logs with a general corpus to obtain a combined training corpus, the language model then being trained on the combined training corpus.
Preferably, the text error correction method may further include: segmenting the text in the logs into words; counting the number of occurrences of each word in the logs; and recording each word whose occurrence count is not less than a third predetermined count threshold in the common-word dictionary as a common word.
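The three steps above (segment, count, threshold) can be sketched as follows. The segmentation function is left as a parameter because the disclosure does not fix a particular segmenter, and the threshold value is an assumption.

```python
from collections import Counter

def build_common_dictionary(log_texts, segment, min_count=3):
    """Segment each log text, count word occurrences across the logs, and
    keep words whose count reaches the threshold as the common-word set."""
    counts = Counter()
    for text in log_texts:
        counts.update(segment(text))
    return {word for word, n in counts.items() if n >= min_count}

def contains_uncommon_word(text, segment, common_words):
    # per the disclosure, correction is judged necessary when the
    # text contains any word outside the common-word dictionary
    return any(word not in common_words for word in segment(text))
```

For whitespace-delimited text, `str.split` suffices as the segmenter; for Chinese logs a dedicated word segmenter would be substituted.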
Preferably, the text error correction method may further include judging whether the second text is qualified.
Preferably, whether the second text is qualified may be judged based on at least one of the following conditions: the confidence the machine processing model assigns to the second text is not less than a predetermined confidence threshold; the perplexity score the language model assigns to the second text is lower than that of the first text, and/or the perplexity score is lower than a second predetermined perplexity threshold; the edit distance between the first text and the second text, divided by the maximum length of the two, does not exceed a second predetermined ratio threshold.
Preferably, the first text may be one of typed input text, handwritten input text, a speech recognition result text, and an image recognition result text.
According to a third aspect of the disclosure, a machine processing device is provided, comprising: an error-correction rewrite pair preparation device for preparing error-correction rewrite pairs, each rewrite pair including an erroneous recognition result and a corresponding correct recognition result; and a first training device for training a machine processing model using the rewrite pairs as a training corpus.
Preferably, the error-correction rewrite pair preparation device may mine the rewrite pairs from logs.
Preferably, a first text and a later text satisfying at least one of the following conditions may be found in the logs and used as an error-correction rewrite pair: the time interval between the first text and the later text recorded in the log does not exceed a predetermined time interval; the edit distance between the first text and the later text, divided by the maximum length of the two texts, does not exceed a first predetermined ratio threshold; the number of occurrences of the later text is not less than a first predetermined count threshold; the number of occurrences of the first text and the later text as a rewrite pair is not less than a second predetermined count threshold.
Preferably, the machine processing model may be a machine translation model.
Preferably, the machine translation model may be a sequence-to-sequence model.
Preferably, the machine translation model may be an attention-based sequence-to-sequence model.
Preferably, the erroneous text and the correct text may be externally input texts.
Preferably, an externally input text may be one of typed input text, handwritten input text, a speech recognition result text, and an image recognition result text.
According to a fourth aspect of the disclosure, a text error correction device is provided, comprising: an offline module including a machine processing model preparation device for preparing a machine processing model suitable for text error correction; and an online module including an error-correction rewriting device for inputting a first text into the machine processing model to obtain a second text.
Preferably, the machine processing model preparation device may be the machine processing device according to the above third aspect of the disclosure.
Preferably, the online module may further include an error-correction judgment device for judging whether the first text needs correction, wherein the first text is input into the machine processing model if the error-correction judgment device judges that correction is needed, and is not input into the machine processing model if the error-correction judgment device judges that correction is not needed.
Preferably, the error-correction judgment device may include: a first judgment device that uses a language model to judge whether the first text needs correction; and/or a second judgment device that judges, based on a common-word dictionary, whether the first text needs correction.
Preferably, the first judgment device may judge that correction is needed when the perplexity the language model assigns to the first text is higher than a first predetermined perplexity threshold.
Preferably, the second judgment device may judge that correction is needed when the first text contains an uncommon word.
Preferably, the offline module may include: a corpus mining device for mining, from logs, a training corpus suitable for training the language model; and a second training device for training the language model using the training corpus.
Preferably, the offline module may further include a corpus mixing device for mixing the training corpus mined from the logs with a general corpus to obtain a combined training corpus, the second training device then training the language model using the combined training corpus.
Preferably, the offline module may further include: a segmentation device for segmenting the text in the logs; a counting device for counting the occurrences of each segmented word in the logs; and a collating device for recording each word whose count is not less than a third predetermined count threshold in the common-word dictionary as a common word.
Preferably, the online module may include a result judgment device for judging whether the second text is qualified.
Preferably, the result judgment device may judge whether the second text is qualified based on at least one of the following conditions: the confidence the machine processing model assigns to the second text is not less than a predetermined confidence threshold; the perplexity score the language model assigns to the second text is lower than that of the first text, and/or the perplexity score is lower than a second predetermined perplexity threshold; the edit distance between the first text and the second text, divided by the maximum length of the two, does not exceed a second predetermined ratio threshold.
Preferably, the first text may be one of typed input text, handwritten input text, a speech recognition result text, and an image recognition result text.
According to a fifth aspect of the disclosure, a computing device is provided, comprising: a processor; and a memory storing executable code which, when executed by the processor, causes the processor to perform the method according to the first or second aspect of the disclosure.
According to a sixth aspect of the disclosure, a non-transitory machine-readable storage medium is provided, storing executable code which, when executed by the processor of an electronic device, causes the processor to perform the method according to the first or second aspect of the disclosure.
The machine processing scheme of the disclosure makes it possible to realize text error correction conveniently.
Brief description of the drawings
The above and other objects, features, and advantages of the disclosure will become more apparent from the following more detailed description of exemplary embodiments of the disclosure in conjunction with the accompanying drawings, in which identical reference labels generally denote identical components.
Fig. 1 is a schematic block diagram of the speech recognition result error correction scheme of the disclosure;
Fig. 2 is a schematic block diagram of the speech recognition result error correction device of the disclosure;
Fig. 3 is a schematic block diagram of a computing device that can be used to perform the speech recognition result error correction method of the disclosure.
Detailed description of embodiments
Preferred embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be appreciated that the disclosure can be realized in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey its scope to those skilled in the art.
In the following, the machine processing scheme and text error correction scheme of the disclosure are described with reference to Figs. 1 to 3.
Fig. 1 is a schematic block diagram of the machine processing and text error correction scheme of the disclosure, taking speech recognition result error correction as an example. As shown in Fig. 1, the speech recognition result error correction scheme of the disclosure includes an offline part (to the left of the dotted line in Fig. 1, which may be handled by an offline module) and an online part (to the right of the dotted line in Fig. 1, which may be handled by an online module). The offline part makes preparations in advance, while the online part performs error correction in real time.
[Overview]
By analyzing the logs of applications involving speech recognition, the present inventors noticed certain regularities; for example, certain pronunciations are relatively frequently recognized as other pronunciations. That is, across a large volume of logs, speech recognition errors follow certain patterns. The inventors therefore realized that such patterns, and in particular such mapping relationships, can be mined algorithmically and used to correct speech recognition results.
In view of this, the disclosure proposes a speech recognition result error correction scheme based on a machine processing model, in which a pre-trained machine processing model performs the correction of speech recognition results.
The machine processing model of the disclosure can be any machine processing model suitable for text processing. In some embodiments, it can be a model suitable for text conversion processing, which may be called a "text conversion model". At present, one widely used text conversion model is the machine translation model, and the description below takes a machine translation model as an example. It should be understood that the technical solution disclosed here can also be realized with other machine processing models.
On the other hand, the disclosure takes speech recognition result error correction as an example. It should be understood that the technical solution disclosed here is fully applicable to all kinds of text error correction scenarios. By training the machine processing model with error-correction pairs, each consisting of a prepared erroneous text and the corresponding correct text, a machine processing model suitable for text error correction is obtained. That model can then be used to correct various texts. The text to be corrected may be text already present in the system, or input text. Text may be input in many ways, for example handwriting input, typing input, voice input (speech recognition), and image recognition input. All of these texts may contain errors, such as handwriting recognition errors, typing errors, speech recognition errors, and image recognition errors. These errors likewise follow certain patterns, and those patterns, in particular the mapping relationships, can similarly be mined algorithmically and used for text error correction.
In brief, the speech recognition result error correction system of a preferred embodiment of the disclosure mainly includes two large modules: an offline log mining module (also simply called the "offline module") and an online correction module (also simply called the "online module"). The former generates training corpus 110 for training machine translation model 120. The latter, after speech recognition has been performed on real-time voice input 210 and a speech recognition result text (generically, a "first text") 220 has been obtained, uses trained machine translation model 120 to correct speech recognition result text 220 and finally outputs the corrected speech recognition result text (generically, a "second text"; "first" and "second" in this disclosure are used only to distinguish the texts being described and imply no further limitation on the texts) 240.
The offline log mining module prepares for the online correction module by mining data from logs 100. If the error correction results of the online correction module are fed back to the offline log mining module, the mining results can be further improved.
Since the offline log mining module serves the online correction module, it can be designed according to the needs of the online correction module.
In particular, the online correction module uses machine translation model 120 to correct speech recognition results. Correspondingly, the offline log mining module can mine error-correction rewrite pairs 110 as training samples for training machine translation model 120.
Further, if the online correction module calls machine translation model 120 only when it judges (230) that correction is needed, efficiency can be greatly improved. Correspondingly, the offline log mining module can prepare for the judgment of whether correction is needed. However, this judgment (230) should not be regarded as required for realizing the technical solution of the disclosure; even without it, the purpose of correcting speech recognition results can be achieved. Therefore, the parts related to error correction decision 230 are outlined with a dashed box in Fig. 1, indicating that this feature can be omitted.
In the following, the various aspects of the technical solution of the disclosure are described in turn.
[Machine Translation Model]
First, machine translation model 120 is briefly introduced.
Translation converts a sentence in one linguistic form (such as English) into a sentence in another linguistic form (such as Chinese); the sentences in the two linguistic forms express essentially the same meaning.
Regarding machine translation (also known as "automatic translation"), people have carried out long-term research and exploration, proposing rule-based machine translation schemes and corpus-based machine translation schemes. Corpus-based machine translation schemes can in turn be divided into statistics-based schemes and example-based schemes.
Since 2013, as research on deep learning has made great progress, neural machine translation based on artificial neural networks has gradually risen. Its technical core is a deep neural network with a massive number of nodes (neurons), which can automatically learn translation knowledge from corpora. CNNs (convolutional neural networks) and RNNs (recurrent neural networks) are widely used. After a sentence of one language is vectorized, it is passed through the network layer by layer and converted into a representation the computer can "understand"; a translation in another language is then generated through complex multi-layer operations. This realizes a style of translation that "understands the language and generates the translation". The greatest advantage of this approach is that the translations read smoothly, conform better to grammatical norms, and are easy to understand. Compared with earlier translation technologies, quality has taken a "leap" forward.
In September 2016, Google released the Google Neural Machine Translation system, which uses a sequence-to-sequence (seq2seq) learning model. Without requiring deep learning researchers to master language translation knowledge in depth, its translation quality surpassed that of language translation systems built by the best language experts in the world. Attention-based sequence-to-sequence models have since gained increasing attention in the machine translation field.
In June 2017, Google further published the paper "Attention Is All You Need" on arXiv (https://arxiv.org/pdf/1706.03762.pdf), proposing a machine translation mechanism based solely on attention, without using CNNs or RNNs.
Through researchers' long-term, in-depth, and continuous research and exploration, machine translation schemes have become increasingly mature, and translation quality has become better and better.
The inventors of the technical solution of the present disclosure noticed that there are similarities between text error correction (such as speech recognition result error correction) and language translation; they share a similar logic. The two language expressions converted between in language translation have the same physical meaning, and there is a mapping relation between them. Likewise, the erroneous text (erroneous recognition result) and the correct text (correct recognition result) in text error correction (such as speech recognition result error correction) both correspond to the same content that the user intends to input (such as the same voice uttered successively), and there is also likely to be a certain degree of mapping relation between them. Just as the translation conversion between two language expressions follows certain rules and regularities, the error-correction conversion between erroneous text and correct text also follows certain rules.
Therefore, the inventors propose that a Machine Translation Model 120 can be borrowed to perform text error correction (such as speech recognition result error correction). Taking speech recognition result error correction as an example, erroneous recognition result (generally, may be referred to as "erroneous text") - correct recognition result (generally, may be referred to as "correct text") pairs obtained or compiled in advance are used as input-output pairs (generally, may be referred to as "error-correction rewrite pairs") for training the Machine Translation Model 120, so that the Machine Translation Model 120 grasps (learns) the mapping relations and/or conversion rules between erroneous recognition results and correct recognition results, thereby obtaining a Machine Translation Model 120 suitable for speech recognition result error correction.
Preferably, the technical solution of the present disclosure may use a sequence-to-sequence model, preferably an attention-based sequence-to-sequence model. The model may or may not use CNNs and/or RNNs. Sequence-to-sequence models and attention mechanisms have been extensively discussed in the machine translation field, and details are not repeated here.
[Offline module]
The offline module of the present disclosure is mainly used for log mining, and may therefore also be referred to as the "offline log mining module".
The offline log mining module can carry out three aspects of work: mining error-correction rewrite pairs (training corpus), mining language model training corpus, and collecting everyday words. The error-correction rewrite pairs 110 are used to train the Machine Translation Model 120, while the language model 160 and the everyday-word dictionary 140 are used to judge whether the system needs to correct the recognition result of the voice input by the user.
[Error-correction rewrite pairs]
The above-mentioned erroneous recognition result - correct recognition result pairs for training the Machine Translation Model 120 can be compiled manually. That is, some common erroneous recognition results can be manually collected, with corresponding correct recognition results provided, to form training corpus that is put into the training corpus library for training the Machine Translation Model 120.
However, manual compilation is rather inefficient, and the compiled training corpus inevitably has omissions; it may not be comprehensive enough, and the training effect may not be satisfactory.
The inventors propose that such erroneous recognition result - correct recognition result pairs (error-correction rewrite pairs 110) can be mined from the relevant logs 100 of a speech recognition application (more generally, an application involving text input), to serve as training corpus for the Machine Translation Model 120.
The log 100 records the system's speech recognition results, the time the user initiated the request and/or the time the speech recognition result was generated, and some other relevant information.
When a user uses intelligent software or hardware through voice input, if the correct speech recognition result or the corresponding correct service cannot be obtained because speech recognition goes wrong, the user often re-initiates the request. Therefore, the log 100 can contain a large number of spontaneous, user-generated speech recognition error-correction rewrite pairs 110, i.e., erroneous recognition result - correct recognition result pairs consisting of an erroneous recognition result (former text) and the correct recognition result (latter text) obtained after the request is re-initiated. Under other human-machine interaction modes, similar erroneous text - correct text pairs are also recorded in the corresponding logs.
The offline module identifies and mines such error-correction rewrite pairs 110 from the log 100, and can thereby construct erroneous recognition result - correct recognition result pairs for training the Machine Translation Model 120. The offline module can mine such error-correction rewrite pairs 110 from the log 100 through a series of strong rules, constructing the training corpus of the Machine Translation Model 120.
In the following, the mining logic, in other words the mining rules, for the error-correction rewrite pairs 110 is analyzed.
On the one hand, when speech recognition goes wrong, the user re-initiates the request quickly. Therefore, the times (recognition times or request-initiation times) corresponding to the erroneous recognition result and the correct recognition result obtained after re-initiating the request will not be too far apart.
On the other hand, the difference between the correct recognition result and the erroneous recognition result is often not too large; there is a certain similarity between the two. The concept of "edit distance" can be introduced here. Edit distance refers to the minimum number of edit operations needed to change one string (in the present disclosure, one sentence) into another. The permitted edit operations include substituting one character for another, inserting a character, and deleting a character. In general, the smaller the edit distance, the greater the similarity between the two strings.
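Under these definitions, the edit distance between two candidate sentences can be computed with the classic dynamic-programming (Levenshtein) algorithm. The sketch below, with hypothetical function names, also derives the length-normalized ratio used in the mining rules that follow.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions
    needed to turn string a into string b (Levenshtein distance)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # substitute / match
    return dp[m][n]

def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the maximum length of the two strings."""
    return edit_distance(a, b) / max(len(a), len(b), 1)
```

The normalized form is what the mining rules compare against a ratio threshold, so that long sentences are not unfairly penalized for a fixed number of differing characters.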
In another aspect, some sentences may be recognized incorrectly many times due to factors such as pronunciation, morphology, and syntax, and may even be repeatedly recognized as the same erroneous result. Using such sentences, or such error-correction rewrite pairs 110, to train the Machine Translation Model 120 for error correction will be advantageous, since such error-correction rewrite pairs 110 are more representative. Moreover, when mining a large volume of logs 100, some less representative error-correction rewrite pairs 110 can be filtered out, improving training efficiency.
In addition, there may be some other rules or association relations between such an erroneous recognition result (erroneous text) and the correct recognition result (correct text) obtained by the re-request, which can also be used as mining logic.
Thus, the mining logic for mining error-correction rewrite pairs 110 from the log 100 may include at least one of the following:
a) the time (request time or recognition time) interval between the two sentences (the two recognition results, generally referred to as the "former text" and the "latter text") is not greater than a predetermined time interval;
b) the ratio obtained by dividing the edit distance between the two sentences by the maximum length of the two sentences is not greater than a first predetermined ratio threshold;
c) the number of occurrences of the corrected sentence (the result obtained from the later recognition (latter text)) is not less than a first predetermined count threshold, for example 5 times;
d) the number of occurrences of the two sentences as an error-correction rewrite pair 110 is not less than a second predetermined count threshold, for example 5 times.
As described above, considering some other rules or association relations, there may also be other mining logic.
Two sentences (a former speech recognition result (former text) and a latter speech recognition result (latter text)) that satisfy at least one of the above conditions can be used as an error-correction rewrite pair.
It should be understood that although the logs of multiple users can be mined, the two sentences mined as an error-correction rewrite pair should be two sentences from the log of the same user.
Then, the offline module can analyze the recognition results (sentences) in the log 100 two by two, to see whether they satisfy the above mining logic. If they do, it shows that these two sentences recognized in the log 100 likely correspond to the same utterance input twice in succession by the user's voice: the former sentence is likely the erroneous recognition result, and the latter sentence is likely the correct recognition result.
The two sentences analyzed each time are usually two adjacent sentences. However, in some cases the user may also input some voice without practical significance between the two requests. In that case, these recognition results without practical significance can be skipped, and the sentences before and after them analyzed to judge whether they satisfy the above mining logic.
For example, a user wishes to input "the military plane of the army", but after the first input the recognition yields "essence force hedgehog face". Upon finding the recognition wrong, the user re-inputs and obtains the correct recognition result "the military plane of the army". In this way, the log successively records the two texts "essence force hedgehog face" (former text) and "the military plane of the army" (latter text), and an error-correction rewrite pair ("essence force hedgehog face", "the military plane of the army") can be obtained by mining the log.
For another example, a user wishes to input "huge mind war strike team", but after the first input the recognition yields "mind single machine pair". Upon finding the recognition wrong, the user re-inputs and obtains the correct recognition result "huge mind war strike team". In this way, the log successively records the two texts "mind single machine pair" (former text) and "huge mind war strike team" (latter text), and the error-correction rewrite pair "mind single machine pair" - "huge mind war strike team" can be obtained by mining the log.
In this way, error-correction rewrite pairs 110 satisfying the several prescribed pieces of mining logic can be obtained. These error-correction rewrite pairs 110 can be regarded as the above-mentioned erroneous recognition result - correct recognition result pairs, to be used as training corpus for training the Machine Translation Model 120.
Here, mining error-correction rewrite pair training corpus from the log of a speech recognition application has been described. It should be understood that, by the same principle, corresponding error-correction rewrite pair training corpus can also be mined from the logs of other applications involving text input.
Each of the two texts of a mined error-correction rewrite pair 110 can be segmented at character granularity (that is, with the character as the smallest unit) for training. When training the Machine Translation Model 120, character granularity is used rather than word granularity (that is, with the word as the smallest unit), mainly considering that the text produced by speech recognition is not written text and is relatively disorderly, which is likely to affect word segmentation. If word granularity were used, erroneous segmentation could affect the understanding of the sentence and thus adversely affect the training of the model.
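Character-granularity preparation of a rewrite pair can be sketched as follows; the function names are hypothetical, and the tokenization simply treats every character as one unit, as the passage describes.

```python
def to_char_tokens(text: str) -> list:
    """Cut a sentence into character-granularity tokens
    (each character is the smallest training unit)."""
    return [ch for ch in text if not ch.isspace()]

def prepare_training_pair(erroneous: str, correct: str):
    """Turn one error-correction rewrite pair into a source/target
    token-sequence pair for sequence-to-sequence training."""
    return to_char_tokens(erroneous), to_char_tokens(correct)
```

Because no word segmenter is involved, a disorderly recognition result cannot produce misleading word boundaries; the model sees the raw character sequence.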
In the case where the Machine Translation Model 120 uses an attention-based sequence-to-sequence model (seq2seq+attention), the Machine Translation Model 120 can be trained in the seq2seq+attention manner.
[Language model]
The language model 160 is an abstract mathematical model of a language built according to objective facts of the language. In brief, the language model 160 is a model for computing the probability of a sentence (or word sequence). Using the language model 160, one can determine which word sequence is more likely, or, given several words, predict the most likely next word. After the language model 160 is trained with training corpus, it can be used for the corresponding language processing application.
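As a hedged illustration (not the disclosure's actual model), a tiny add-one-smoothed bigram language model can compute the kind of perplexity score used below for error-correction judgment; character-level tokens keep it consistent with the training granularity discussed above.

```python
import math
from collections import Counter

class BigramLM:
    """Add-one-smoothed character bigram language model."""
    def __init__(self, corpus):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        for sent in corpus:
            chars = ["<s>"] + list(sent) + ["</s>"]
            self.vocab.update(chars)
            self.unigrams.update(chars[:-1])       # contexts
            self.bigrams.update(zip(chars, chars[1:]))

    def prob(self, prev, cur):
        # Laplace (add-one) smoothing over the vocabulary
        return (self.bigrams[(prev, cur)] + 1) / (
            self.unigrams[prev] + len(self.vocab))

    def perplexity(self, sent):
        chars = ["<s>"] + list(sent) + ["</s>"]
        log_p = sum(math.log(self.prob(a, b))
                    for a, b in zip(chars, chars[1:]))
        return math.exp(-log_p / (len(chars) - 1))
```

A sentence resembling the training corpus receives a low perplexity; a sentence unlike anything seen in training receives a high one, which is the signal exploited by the error-correction judgment below.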
The language (recognition results) recorded in the logs 100 of intelligent software or hardware with various text input modes (such as voice input) generally reflects users' manner of expression under the special scenario of using that intelligent software or hardware. The language recorded in the log 100 can be used as training corpus 150 to train the language model 160, so that during online text error correction (such as speech recognition error correction), the language model 160 can judge whether the recognition result (the first text) of a sentence newly input by the user (for example, through the speech recognition system) is erroneous.
The offline module can mine from the log 100 the language suitable as language model training corpus 150, for example language determined to be correctly recognized. For instance, if, after the user issued a request and the system obtained the recognition result, a corresponding response was made and/or the user did not re-initiate the request, then this recognition result can be judged to be correct, and this sentence in the log 100 can serve as training corpus.
When the intelligent software or hardware is for a specific field, such as plane ticket booking or smart home control, the language recorded in the log 100 can have specific patterns, keywords, and content. Using the language in the log 100 as training corpus 150 to train the language model 160 can embody the particularity of the specific field involved in the intelligent software or hardware.
When the intelligent software or hardware is for a universal field, the patterns, keywords, and content of the language recorded in the log 100 will be broader.
The training corpus 150 mined from the log 100 and common training corpus can also be mixed to form the training corpus library, increasing the capacity and coverage of the training corpus.
For example, in the case where the intelligent software or hardware is for a specific field, mixing the language model training corpus 150 mined from the log 100 with common training corpus to train the language model 160 can take both generality and particularity into account.
The trained language model 160 can be used for online error-correction judgment, to determine whether a sentence is clear, coherent, and smooth.
[Everyday words]
Furthermore, the language in the log 100, especially language (text) that can be determined to be correctly recognized, can be segmented into words, and the number of occurrences of each word in the log 100 counted. Words whose number of occurrences is not less than a third predetermined count threshold (for example 5 times) are recorded as everyday words 130. These everyday words 130 can be stored in an everyday-word dictionary or everyday-word list 140 for the online module to query.
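Collecting the everyday-word list can be sketched with a plain frequency count. Here whitespace splitting stands in for a real word segmenter, and the threshold of 5 mirrors the example above; both are illustrative assumptions.

```python
from collections import Counter

def collect_everyday_words(sentences, min_count=5):
    """Count word occurrences over correctly recognized log sentences;
    words occurring at least min_count times become everyday words.
    Whitespace splitting stands in for a real word segmenter."""
    counts = Counter()
    for sent in sentences:
        counts.update(sent.split())
    return {w for w, c in counts.items() if c >= min_count}
```

The resulting set plays the role of the everyday-word dictionary 140: any word absent from it is treated as a non-everyday word by the online judgment logic.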
Especially in the case where the intelligent software or hardware is for a specific field, the words that can appear in the language input by general users are limited and often occur repeatedly. When the text obtained by speech recognition (the first text) contains a rarely occurring word (a non-everyday word), this speech recognition result (the first text) is likely to contain an error and needs error-correction rewriting.
For example, under the scenario of a plane ticket booking application, the everyday words include place names, times, airline names, and the like. If a word wide of the mark suddenly pops up, such as "blue whale" (for Nanjing) or "beluga" (for Beijing), then an error is likely to have occurred.
When the online module carries out error-correction judgment, it can judge whether a recognized sentence (the first text) contains an error and needs error correction according to whether the sentence contains a non-everyday word.
[Online module]
The online module of the present disclosure mainly executes online error correction, and may also be referred to as the "online error-correction module".
The online error-correction module may include two aspects of functionality. The core function is to use the trained Machine Translation Model 120 described above to perform error-correction rewriting on recognition result text (the first text) that may contain errors. As an auxiliary function, it can also perform error-correction judgment, determining whether the text obtained by recognition (the first text) contains an error and whether error correction needs to be performed on it. The online error-correction module can call the Machine Translation Model 120 for error correction only when it judges that error correction is needed, which can greatly improve efficiency.
[Error-correction judgment]
The description here takes speech recognition as an example. It should be understood that the error-correction judgment scheme described here is equally applicable to text input in other ways.
In a preferred embodiment, after the speech recognition module outputs the speech recognition text (the speech recognition result, more generally the first text), the online error-correction module can first analyze and judge it through the error-correction judgment logic, to determine whether the speech recognition result contains an error and whether error-correction rewriting is needed. In the case where it is judged that error-correction rewriting is needed, the trained Machine Translation Model 120 is then used to perform error-correction rewriting. In the case where it is judged that error-correction rewriting is not needed, the speech recognition result is not input into the Machine Translation Model 120 for error-correction rewriting; instead, the speech recognition result text 220 obtained by speech recognition is directly output as the final speech recognition result text 220.
Whether the speech recognition result text 220 contains an error and needs error-correction rewriting can be judged based on the language model 160 and/or the everyday-word dictionary 140 described above.
The speech recognition result text 220 is considered to need rewriting when it satisfies the following conditions; otherwise the text of the speech recognition is output directly:
a) the perplexity score given by the language model 160 for the speech recognition result text 220 is higher than a first predetermined perplexity threshold; and/or
b) a non-everyday word appears, a non-everyday word being a word not present in the everyday-word dictionary 140.
In the technical solution of the present disclosure, either of these conditions can be used for judgment alone, or the two conditions can be combined. It should be understood that other judgment conditions (logic) can also be used to judge whether the speech recognition result contains an error.
On the one hand, the speech recognition result text 220 is input into the trained language model 160 described above, and the language model 160 can give a perplexity score by analyzing the text. If the score is higher than the first predetermined perplexity threshold, it shows that the speech recognition result text 220 is likely to contain an error and needs error-correction rewriting.
On the other hand, the speech recognition result text 220 can be segmented to obtain multiple words. These words are each looked up in the everyday-word dictionary or everyday-word list 140 mined as described above. If some word is not found in the everyday-word dictionary or everyday-word list 140, then that word is a non-everyday word. In this case, it likewise shows that the speech recognition result is likely to contain an error and needs error-correction rewriting.
It should be understood that the above judgment can also be carried out using other error-correction judgment methods.
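The two judgment conditions can be combined as sketched below; the threshold value, the `perplexity` callable, and the whitespace tokenizer are illustrative assumptions rather than values from the disclosure.

```python
def needs_correction(text, perplexity, everyday_words,
                     ppl_threshold=100.0):
    """Judge whether a recognition result needs error-correction
    rewriting: high language-model perplexity and/or presence of a
    non-everyday word. Whitespace splitting stands in for a real
    word segmenter."""
    if perplexity(text) > ppl_threshold:
        return True  # condition a): the sentence reads as incoherent
    if any(w not in everyday_words for w in text.split()):
        return True  # condition b): a non-everyday word appears
    return False
```

Only when this returns True is the text handed to the Machine Translation Model 120, which is how the online module avoids paying the rewriting cost on every request.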
[Error-correction rewriting]
As described above, the core function of the online error-correction module is real-time error-correction rewriting of text (speech recognition results).
The present disclosure proposes performing error-correction rewriting on speech recognition results using the Machine Translation Model 120 trained with the error-correction rewrite pairs 110. The speech recognition result text (the first text) that needs error-correction rewriting is input into the Machine Translation Model 120, and the model outputs the corrected speech recognition text (the second text) 240. The Machine Translation Model 120 has been described in detail above.
In this way, for example, when the user inputs "let us swing the double paddles" but it is recognized as "the enough violent current double paddles" (the first text), the Machine Translation Model 120 can convert it into "let us swing the double paddles" (the second text) that the user actually intended to input, without the user re-inputting it for recognition.
For the corrected speech recognition text (the second text), whether the result of the error-correction rewriting is valid can also be comprehensively judged according to some predetermined filter conditions (or judgment conditions), that is, whether the corrected speech recognition text (the second text) is qualified.
Such filter conditions may include at least one of the following:
1) the confidence given by the Machine Translation Model 120 is not less than a predetermined confidence threshold;
2) the perplexity score of the language model 160 for the corrected text (the second text) is less than that of the text before the error-correction rewriting, and/or the perplexity score is less than a second predetermined perplexity threshold;
3) the ratio obtained by dividing the edit distance between the two texts before and after the error-correction rewriting (the first text and the second text) by the maximum length of the two is not greater than a second predetermined ratio threshold.
Regarding item 1) above: after the Machine Translation Model 120 processes and converts the input speech recognition result text (the first text), it can output, together with the rewritten speech recognition result text (the second text), the confidence of this conversion. When the confidence is high (not less than the predetermined confidence threshold), it shows that the rewritten text (the second text) is more credible. When the confidence is lower than the predetermined confidence threshold, it shows that the effect of the error-correction rewriting is not ideal enough.
Regarding item 2) above: the trained language model 160 described above can be used not only to judge whether the speech recognition result (the text before error-correction rewriting, the first text) contains an error, but also to judge whether the corrected text (the second text) contains an error. On the one hand, the perplexity score given by the language model 160 for the corrected text (the second text) should generally be less than the perplexity score given for the text before rewriting (the first text). On the other hand, this perplexity score should be less than the second predetermined perplexity threshold. This second predetermined perplexity threshold can be equal to the first predetermined perplexity threshold used above when judging whether the speech recognition result (the first text) contains an error, or it can be smaller than the first predetermined perplexity threshold (judging with a stricter standard).
Regarding item 3) above: if, relative to the maximum length of the two texts before and after the rewriting (the first text and the second text), the edit distance between the two texts is too large, the rewriting likely deviates from the literal meaning of the user's voice input.
In the case where the result of the error-correction rewriting is judged invalid, the text can be returned to the Machine Translation Model 120 to re-perform error-correction rewriting.
In the case where the result of the error-correction rewriting is judged valid, the text resulting from the error-correction rewriting (the second text) is output.
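The three filter conditions can be combined into one validity check, as sketched below. The `edit_distance` helper, the threshold values, and the `perplexity` callable are illustrative assumptions, not parameters fixed by the disclosure.

```python
def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row table."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def rewrite_is_valid(first, second, confidence, perplexity,
                     min_conf=0.8, ppl_threshold=100.0, max_ratio=0.5):
    """Judge whether an error-correction rewrite is qualified:
    1) model confidence high enough, 2) perplexity improved and
    below threshold, 3) edit-distance ratio not too large."""
    if confidence < min_conf:
        return False  # condition 1) fails
    ppl_second = perplexity(second)
    if ppl_second >= perplexity(first) or ppl_second >= ppl_threshold:
        return False  # condition 2) fails
    dist = edit_distance(first, second)
    if dist / max(len(first), len(second), 1) > max_ratio:
        return False  # condition 3) fails: rewrite drifted too far
    return True
```

When the check fails, the first text can be resubmitted to the model for another rewriting attempt, as the passage above describes.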
The Machine Translation Model 120 performs error-correction rewriting from the whole, and can take global semantic information into account; this semantic information can constrain the local error correction. In particular, with an attention-based sequence-to-sequence model, the encoding (encoder) stage can encode the whole sentence into one semantic vector, and the decoding (decoder) stage can achieve local alignment through the attention mechanism, so that the Machine Translation Model 120 considers the whole while also taking the parts into account.
In addition, the Machine Translation Model 120 does not require rewrite rules to be explicitly distilled; the model itself also has generalization ability and, in the case where a large number of training samples (training corpus) are used, can tolerate the presence of a small number of dirty samples.
It should also be noted that, in the technical solution of the present disclosure, no further interaction between the system and the user is needed for the error-correction operation during text error correction (such as speech recognition result error correction).
[Text error-correction device]
Above, the machine processing method and text error-correction method of the present disclosure have been described in detail with reference to Fig. 1, taking speech recognition result error correction as an example. Below, the machine processing device and text error-correction device of the present disclosure are described with reference to Fig. 2, again taking speech recognition result error correction as an example.
Fig. 2 is a schematic block diagram illustrating the text error-correction device of the present disclosure by way of a speech recognition result error-correction device. Details of some contents are identical to the description above with reference to Fig. 1 and are not repeated here.
As described above, the description here takes speech recognition result error correction as an example. It should be understood that a device of the same principle can be applied to error correction of text input in other ways.
As shown in Fig. 2, the speech recognition result error-correction device (more generally, text error-correction device) 300 of the present disclosure may include an offline module 310 and an online module 320.
The offline module 310 may include a machine translation model (machine processing model) preparation device 311 for preparing a machine translation model suitable for speech recognition result error correction.
The machine translation model may be a sequence-to-sequence model. Preferably, the machine translation model may be a sequence-to-sequence model based on an attention mechanism.
The machine translation model preparation device 311 may include an error-correction rewrite pair preparation device 311-1 and a first training device 311-2.
The error-correction rewrite pair preparation device 311-1 may be used to prepare error-correction rewrite pairs, each including an erroneous recognition result and the corresponding correct recognition result.
The error-correction rewrite pair preparation device 311-1 may also mine error-correction rewrite pairs from a log.
For example, the error-correction rewrite pair preparation device 311-1 may find from the log a former speech recognition result (former text) and a latter speech recognition result (latter text) that satisfy at least one of the following conditions, to serve as an error-correction rewrite pair:
a) the time interval between the former speech recognition result and the latter speech recognition result recorded in the log is not greater than a predetermined time interval; and/or
b) the ratio obtained by dividing the edit distance between the former speech recognition result and the latter speech recognition result by the maximum length of the two speech recognition results is not greater than a first predetermined ratio threshold; and/or
c) the number of occurrences of the latter speech recognition result is not less than a first predetermined count threshold; and/or
d) the number of occurrences of the former speech recognition result and the latter speech recognition result as an error-correction rewrite pair is not less than a second predetermined count threshold.
The first training device 311-2 may be used to train the machine translation model using the error-correction rewrite pairs as training corpus.
The online module 320 may include an error-correction rewriting device 321 for inputting speech recognition result text (i.e., the first text) into the machine translation model to obtain speech recognition error-correction result text (i.e., the second text).
The online module 320 may also include an error-correction judgment device 322 for judging whether the speech recognition result text needs error correction. In the case where the error-correction judgment device 322 judges that error correction is needed, the speech recognition result text may be input into the machine translation model; in the case where the error-correction judgment device 322 judges that error correction is not needed, the speech recognition result text may not be input into the machine translation model.
The error-correction judgment device 322 may include a first judgment device 322-1 and a second judgment device 322-2.
The first judgment device 322-1 may use a language model to judge whether the speech recognition result text needs error correction. For example, it may judge that error correction is needed in the case where the perplexity given by the language model for the speech recognition result text is higher than a first predetermined perplexity threshold.
The second judgment device 322-2 judges whether the speech recognition result text needs error correction based on an everyday-word dictionary. For example, it may judge that error correction is needed in the case where the speech recognition result text contains a non-everyday word.
Preferably, off-line module 310 can also include corpora mining device 312 and the second training device 313.
Corpora mining device 312 can be used for excavating the training corpus for being suitable for train language model from log.Second instruction
White silk device 313 is used for training corpus and carrys out train language model.
Preferably, off-line module 310 can also include corpus mixing arrangement 314, obtain for will excavate from log
Training corpus is mixed with common corpus, obtains combined training corpus wherein, and the second training device 313 uses combined training corpus
Carry out train language model.
Preferably, the offline module 310 may further include a word segmentation device 315, a statistics device 316, and a collation device 317. The word segmentation device 315 may be used to segment the speech recognition result texts in the logs. The statistics device 316 may be used to count the number of occurrences of each segmented word in the logs. The collation device 317 may be used to record, as common words in the common-word dictionary, those words whose number of occurrences is not less than a third predetermined count threshold.
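These three steps amount to a word-frequency cutoff over the logs, which can be sketched as follows; `str.split` stands in for a real word segmenter, and the threshold value 3 is an arbitrary example of the "third predetermined count threshold".

```python
from collections import Counter

def build_common_dictionary(log_texts, segment, min_count=3):
    """Build the common-word dictionary: segment each logged
    recognition result, count word occurrences across the logs, and
    keep words seen at least min_count times."""
    counts = Counter()
    for text in log_texts:
        counts.update(segment(text))
    return {word for word, n in counts.items() if n >= min_count}

logs = ["play some music", "play some jazz", "play some music",
        "play that song", "play some music please"]
common = build_common_dictionary(logs, str.split, min_count=3)
print(sorted(common))  # → ['music', 'play', 'some']
```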
In addition, the online module 320 may further include a result judgment device 323 for judging whether the speech recognition error correction result text is qualified.
For example, the result judgment device 323 may judge whether the speech recognition error correction result text is qualified based on at least one of the following conditions:
1) the confidence that the machine translation model assigns to the speech recognition error correction result text is not less than a predetermined confidence threshold;
2) the perplexity score that the language model assigns to the speech recognition error correction result text is less than that of the speech recognition result text, and/or the perplexity score is less than a second predetermined perplexity threshold;
3) the ratio obtained by dividing the edit distance between the speech recognition result text and the speech recognition error correction result text by the maximum length of the two is not more than a second predetermined ratio threshold.
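Conditions 1) and 3) above can be sketched as follows; the threshold values are illustrative, not taken from the disclosure, and the edit distance is the standard Levenshtein distance.

```python
def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def result_qualified(original, corrected, confidence,
                     min_confidence=0.8, max_edit_ratio=0.3):
    """Accept a correction only if the model is confident enough and
    the correction did not change the text too drastically."""
    if confidence < min_confidence:
        return False
    ratio = edit_distance(original, corrected) / max(len(original),
                                                     len(corrected))
    return ratio <= max_edit_ratio

print(result_qualified("recognise speach", "recognise speech", 0.95))  # True
print(result_qualified("hello", "completely different", 0.95))         # False
```

The edit-distance ratio acts as a safety net: a low-confidence or heavily rewritten output is discarded rather than shown to the user.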
[Computing device]
According to the disclosure, a computing device that can be used to execute the machine processing method and text error correction method of the disclosure is also provided. The computing device may be a server for speech recognition error correction.
Fig. 3 is a schematic block diagram of a computing device that can be used to execute the machine processing method and text error correction method of the disclosure.
As shown in Fig. 3, the computing device 400 may include a processor 420 and a memory 430. Executable code is stored on the memory 430. When the processor 420 executes the executable code, the processor 420 is caused to execute the machine processing method and text error correction method described above.
The machine processing method, the text error correction method, and the corresponding devices and systems according to the disclosure have been described in detail above with reference to the accompanying drawings.
In addition, the method according to the present invention may also be implemented as a computer program or computer program product, the computer program or computer program product comprising computer program code instructions for executing the above-described steps defined in the above-described method of the present invention.
Alternatively, the present invention may also be implemented as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) on which executable code (or a computer program, or computer instruction code) is stored; when the executable code (or computer program, or computer instruction code) is executed by a processor of an electronic device (or computing device, server, etc.), the processor is caused to execute each step of the above-described method according to the present invention.
Those skilled in the art will also understand that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems and methods according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical functions. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by combinations of dedicated hardware and computer instructions.
The embodiments of the present invention have been described above. The above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical applications, or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (40)
1. A machine processing method, comprising:
preparing error correction rewriting pairs, each error correction rewriting pair comprising an error text and a corresponding correct text; and
training a machine processing model using the error correction rewriting pairs as training corpora.
2. The machine processing method according to claim 1, wherein the step of preparing error correction rewriting pairs comprises:
mining the error correction rewriting pairs from logs.
3. The machine processing method according to claim 2, wherein a first text and a second text satisfying at least one of the following conditions are found in the logs and taken as an error correction rewriting pair:
the time interval between the first text and the second text recorded in the logs does not exceed a predetermined time interval;
the ratio obtained by dividing the edit distance between the first text and the second text by the maximum length of the two texts does not exceed a first predetermined ratio threshold;
the number of occurrences of the second text is not less than a first predetermined count threshold;
the number of occurrences of the first text and the second text as an error correction rewriting pair is not less than a second predetermined count threshold.
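The pair-mining conditions of claim 3 can be sketched as follows; the record format, the threshold values, and the use of `difflib`'s similarity ratio as a cheap proxy for the edit-distance ratio are all illustrative assumptions, not part of the claims.

```python
import difflib
from collections import Counter

def mine_pairs(log, max_gap=60.0, max_edit_ratio=0.34, min_pair_count=2):
    """Find (first_text, second_text) candidates in a log of
    (timestamp, text) records: the second text follows the first
    within max_gap seconds and differs only slightly, suggesting the
    user re-entered a corrected query."""
    candidates = []
    for (t1, a), (t2, b) in zip(log, log[1:]):
        if a == b or t2 - t1 > max_gap:
            continue
        # difflib similarity as a proxy for the edit-distance ratio
        if 1.0 - difflib.SequenceMatcher(None, a, b).ratio() <= max_edit_ratio:
            candidates.append((a, b))
    # Keep only pairs that recur, per the second count threshold.
    pair_counts = Counter(candidates)
    return [p for p, n in pair_counts.items() if n >= min_pair_count]

log = [(0.0, "weather in bejing"), (5.0, "weather in beijing"),
       (300.0, "play jazz"),
       (400.0, "weather in bejing"), (404.0, "weather in beijing")]
print(mine_pairs(log))  # → [('weather in bejing', 'weather in beijing')]
```

A pair that recurs across many sessions is strong evidence of a systematic recognition error and its correction, which is exactly what the training corpus needs.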
4. The machine processing method according to any one of claims 1 to 3, wherein
the machine processing model is a machine translation model.
5. The machine processing method according to claim 4, wherein
the machine translation model is a sequence-to-sequence model.
6. The machine processing method according to claim 5, wherein
the machine translation model is an attention-based sequence-to-sequence model.
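As a rough illustration of the attention mechanism named in claim 6, the following plain-Python sketch computes one decoder step of dot-product attention over toy encoder states; the vectors, dimensions, and scaling choice are arbitrary examples, not part of the claims.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """One attention step of a sequence-to-sequence decoder: score each
    encoder state (key) against the decoder state (query), normalize
    with softmax, and return the weighted sum of values."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([1.0, 0.0], keys, values)
print(round(sum(weights), 6))  # → 1.0, the weights form a distribution
```

At each output step the decoder can thereby focus on the input positions most relevant to the character or word being emitted, which is what makes such models effective for character-level rewriting tasks like error correction.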
7. The machine processing method according to any one of claims 1 to 3, wherein
the error text and the correct text are both externally input texts.
8. The machine processing method according to claim 7, wherein
the externally input text is one of a typed input text, a handwriting input text, a speech recognition result text, and an image recognition result text.
9. A text error correction method, comprising:
preparing a machine processing model suitable for text error correction; and
inputting a first text into the machine processing model to obtain a second text.
10. The text error correction method according to claim 9, wherein
the machine processing model is prepared by the machine processing method according to any one of claims 1 to 8.
11. The text error correction method according to claim 9, further comprising:
judging whether the first text needs error correction,
wherein, when it is determined that error correction is needed, the first text is input into the machine processing model, and
when it is determined that error correction is not needed, the first text is not input into the machine processing model.
12. The text error correction method according to claim 11, wherein the step of judging whether the first text needs error correction comprises:
judging, using a language model, whether the first text needs error correction; and/or
judging, based on a common-word dictionary, whether the first text needs error correction.
13. The text error correction method according to claim 12, wherein
when the perplexity that the language model assigns to the first text is higher than a first predetermined perplexity threshold, it is determined that error correction is needed; and/or
when the first text contains an uncommon word, it is determined that error correction is needed.
14. The text error correction method according to claim 12, further comprising:
mining, from logs, training corpora suitable for training the language model; and
training the language model using the training corpora.
15. The text error correction method according to claim 14, further comprising:
mixing the training corpora mined from the logs with a common corpus to obtain a combined training corpus,
wherein the language model is trained using the combined training corpus.
16. The text error correction method according to claim 14, further comprising:
segmenting the texts in the logs;
counting the number of occurrences of each segmented word in the logs; and
recording, as common words in the common-word dictionary, words whose number of occurrences is not less than a third predetermined count threshold.
17. The text error correction method according to claim 9, further comprising:
judging whether the second text is qualified.
18. The text error correction method according to claim 17, wherein whether the second text is qualified is judged based on at least one of the following conditions:
the confidence that the machine processing model assigns to the second text is not less than a predetermined confidence threshold;
the perplexity score that a language model assigns to the second text is less than that of the first text, and/or the perplexity score is less than a second predetermined perplexity threshold;
the ratio obtained by dividing the edit distance between the first text and the second text by the maximum length of the two is not more than a second predetermined ratio threshold.
19. The text error correction method according to claim 9, wherein
the first text is one of a typed input text, a handwriting input text, a speech recognition result text, and an image recognition result text.
20. A machine processing device, comprising:
an error correction rewriting pair preparation device for preparing error correction rewriting pairs, each error correction rewriting pair comprising an erroneous recognition result and a corresponding correct recognition result; and
a first training device for training a machine processing model using the error correction rewriting pairs as training corpora.
21. The machine processing device according to claim 20, wherein the error correction rewriting pair preparation device mines the error correction rewriting pairs from logs.
22. The machine processing device according to claim 21, wherein a first text and a second text satisfying at least one of the following conditions are found in the logs and taken as an error correction rewriting pair:
the time interval between the first text and the second text recorded in the logs does not exceed a predetermined time interval;
the ratio obtained by dividing the edit distance between the first text and the second text by the maximum length of the two texts does not exceed a first predetermined ratio threshold;
the number of occurrences of the second text is not less than a first predetermined count threshold;
the number of occurrences of the first text and the second text as an error correction rewriting pair is not less than a second predetermined count threshold.
23. The machine processing device according to any one of claims 20 to 22, wherein
the machine processing model is a machine translation model.
24. The machine processing device according to claim 23, wherein
the machine translation model is a sequence-to-sequence model.
25. The machine processing device according to claim 24, wherein
the machine translation model is an attention-based sequence-to-sequence model.
26. The machine processing device according to any one of claims 20 to 22, wherein
the error text and the correct text are both externally input texts.
27. The machine processing device according to claim 26, wherein
the externally input text is one of a typed input text, a handwriting input text, a speech recognition result text, and an image recognition result text.
28. A text error correction device, comprising:
an offline module including a machine processing model preparation device for preparing a machine processing model suitable for text error correction; and
an online module including an error correction rewriting device for inputting a first text into the machine processing model to obtain a second text.
29. The text error correction device according to claim 28, wherein
the machine processing model preparation device is the machine processing device according to any one of claims 20 to 27.
30. The text error correction device according to claim 28, wherein the online module further comprises:
an error correction decision device for judging whether the first text needs error correction,
wherein, when the error correction decision device determines that error correction is needed, the first text is input into the machine processing model, and when the error correction decision device determines that error correction is not needed, the first text is not input into the machine processing model.
31. The text error correction device according to claim 30, wherein the error correction decision device comprises:
a first judgment device for judging, using a language model, whether the first text needs error correction; and/or
a second judgment device for judging, based on a common-word dictionary, whether the first text needs error correction.
32. The text error correction device according to claim 31, wherein
the first judgment device determines that error correction is needed when the perplexity that the language model assigns to the first text is higher than a first predetermined perplexity threshold; and/or
the second judgment device determines that error correction is needed when the first text contains an uncommon word.
33. The text error correction device according to claim 31, wherein the offline module further comprises:
a corpus mining device for mining, from logs, training corpora suitable for training the language model; and
a second training device for training the language model using the training corpora.
34. The text error correction device according to claim 33, wherein the offline module further comprises:
a corpus mixing device for mixing the training corpora mined from the logs with a common corpus to obtain a combined training corpus,
wherein the second training device trains the language model using the combined training corpus.
35. The text error correction device according to claim 33, wherein the offline module further comprises:
a word segmentation device for segmenting the texts in the logs;
a statistics device for counting the number of occurrences of each segmented word in the logs; and
a collation device for recording, as common words in the common-word dictionary, words whose number of occurrences is not less than a third predetermined count threshold.
36. The text error correction device according to claim 28, wherein the online module further comprises:
a result judgment device for judging whether the second text is qualified.
37. The text error correction device according to claim 36, wherein the result judgment device judges whether the second text is qualified based on at least one of the following conditions:
the confidence that the machine processing model assigns to the second text is not less than a predetermined confidence threshold;
the perplexity score that a language model assigns to the second text is less than that of the first text, and/or the perplexity score is less than a second predetermined perplexity threshold;
the ratio obtained by dividing the edit distance between the first text and the second text by the maximum length of the two is not more than a second predetermined ratio threshold.
38. The text error correction device according to claim 28, wherein
the first text is one of a typed input text, a handwriting input text, a speech recognition result text, and an image recognition result text.
39. A computing device, comprising:
a processor; and
a memory on which executable code is stored, wherein, when the executable code is executed by the processor, the processor is caused to execute the method according to any one of claims 1 to 19.
40. A non-transitory machine-readable storage medium on which executable code is stored, wherein, when the executable code is executed by a processor of an electronic device, the processor is caused to execute the method according to any one of claims 1 to 19.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711060088.6A CN109753636A (en) | 2017-11-01 | 2017-11-01 | Machine processing and text error correction method and device, computing equipment and storage medium |
TW107130128A TW201918913A (en) | 2017-11-01 | 2018-08-29 | Machine processing and text correction method and device, computing equipment and storage media |
PCT/CN2018/111173 WO2019085779A1 (en) | 2017-11-01 | 2018-10-22 | Machine processing and text correction method and device, computing equipment and storage media |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711060088.6A CN109753636A (en) | 2017-11-01 | 2017-11-01 | Machine processing and text error correction method and device, computing equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109753636A true CN109753636A (en) | 2019-05-14 |
Family
ID=66331335
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711060088.6A Pending CN109753636A (en) | 2017-11-01 | 2017-11-01 | Machine processing and text error correction method and device calculate equipment and storage medium |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109753636A (en) |
TW (1) | TW201918913A (en) |
WO (1) | WO2019085779A1 (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549694B (en) * | 2018-04-16 | 2021-11-23 | 南京云问网络技术有限公司 | Method for processing time information in text |
KR20210037307A (en) * | 2019-09-27 | 2021-04-06 | 삼성전자주식회사 | Electronic device and controlling method of electronic device |
TWI750622B (en) * | 2020-03-31 | 2021-12-21 | 群光電子股份有限公司 | Deep learning model training system, deep learning model training method, and non-transitory computer readable storage medium |
CN112749553B (en) * | 2020-06-05 | 2023-07-25 | 腾讯科技(深圳)有限公司 | Text information processing method and device for video file and server |
CN113947092A (en) * | 2020-07-16 | 2022-01-18 | 阿里巴巴集团控股有限公司 | Translation method and device |
CN111897535B (en) * | 2020-07-30 | 2024-07-02 | 平安科技(深圳)有限公司 | Grammar error correction method, grammar error correction device, computer system and readable storage medium |
CN111985241B (en) * | 2020-09-03 | 2023-08-08 | 深圳平安智慧医健科技有限公司 | Medical information query method, device, electronic equipment and medium |
CN112329476A (en) * | 2020-11-11 | 2021-02-05 | 北京京东尚科信息技术有限公司 | Text error correction method and device, equipment and storage medium |
CN113129865A (en) * | 2021-03-05 | 2021-07-16 | 联通(广东)产业互联网有限公司 | Method and device for processing communication voice transcription AI connector intermediate element |
CN113076739A (en) * | 2021-04-09 | 2021-07-06 | 厦门快商通科技股份有限公司 | Method and system for realizing cross-domain Chinese text error correction |
CN113177419B (en) * | 2021-04-27 | 2024-04-30 | 北京小米移动软件有限公司 | Text rewriting method and device, storage medium and electronic equipment |
CN113192497B (en) * | 2021-04-28 | 2024-03-01 | 平安科技(深圳)有限公司 | Speech recognition method, device, equipment and medium based on natural language processing |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170025117A1 (en) * | 2015-07-23 | 2017-01-26 | Samsung Electronics Co., Ltd. | Speech recognition apparatus and method |
CN106527756A (en) * | 2016-10-26 | 2017-03-22 | 长沙军鸽软件有限公司 | Method and device for intelligently correcting input information |
CN106598939A (en) * | 2016-10-21 | 2017-04-26 | 北京三快在线科技有限公司 | Method and device for text error correction, server and storage medium |
US20170154258A1 (en) * | 2015-11-30 | 2017-06-01 | National Institute Of Information And Communications Technology | Joint estimation method and method of training sequence-to-sequence model therefor |
CN106844368A (en) * | 2015-12-03 | 2017-06-13 | 华为技术有限公司 | For interactive method, nerve network system and user equipment |
CN106919646A (en) * | 2017-01-18 | 2017-07-04 | 南京云思创智信息科技有限公司 | Chinese text summarization generation system and method |
CN107092664A (en) * | 2017-03-30 | 2017-08-25 | 华为技术有限公司 | A kind of content means of interpretation and device |
CN107122346A (en) * | 2016-12-28 | 2017-09-01 | 平安科技(深圳)有限公司 | The error correction method and device of a kind of read statement |
CN107170453A (en) * | 2017-05-18 | 2017-09-15 | 百度在线网络技术(北京)有限公司 | Across languages phonetic transcription methods, equipment and computer-readable recording medium based on artificial intelligence |
CN107229348A (en) * | 2016-03-23 | 2017-10-03 | 北京搜狗科技发展有限公司 | A kind of input error correction method, device and the device for inputting error correction |
US20170308526A1 (en) * | 2016-04-21 | 2017-10-26 | National Institute Of Information And Communications Technology | Compcuter Implemented machine translation apparatus and machine translation method |
2017
- 2017-11-01 CN CN201711060088.6A patent/CN109753636A/en active Pending
2018
- 2018-08-29 TW TW107130128A patent/TW201918913A/en unknown
- 2018-10-22 WO PCT/CN2018/111173 patent/WO2019085779A1/en active Application Filing
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110232129B (en) * | 2019-06-11 | 2020-09-29 | 北京百度网讯科技有限公司 | Scene error correction method, device, equipment and storage medium |
CN110232129A (en) * | 2019-06-11 | 2019-09-13 | 北京百度网讯科技有限公司 | Scene error correction method, device, equipment and storage medium |
CN110148418B (en) * | 2019-06-14 | 2024-05-03 | 安徽咪鼠科技有限公司 | Scene record analysis system, method and device |
CN110148418A (en) * | 2019-06-14 | 2019-08-20 | 安徽咪鼠科技有限公司 | A kind of scene record analysis system, method and device thereof |
CN110543812A (en) * | 2019-07-19 | 2019-12-06 | 拉扎斯网络科技(上海)有限公司 | information extraction method and device, electronic equipment and storage medium |
CN112489632B (en) * | 2019-09-11 | 2024-04-05 | 甲骨文国际公司 | Implementing correction models to reduce propagation of automatic speech recognition errors |
CN112489632A (en) * | 2019-09-11 | 2021-03-12 | 甲骨文国际公司 | Implementing correction models to reduce propagation of automatic speech recognition errors |
CN110750959B (en) * | 2019-10-28 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Text information processing method, model training method and related device |
CN110750959A (en) * | 2019-10-28 | 2020-02-04 | 腾讯科技(深圳)有限公司 | Text information processing method, model training method and related device |
CN111125302A (en) * | 2019-11-29 | 2020-05-08 | 海信视像科技股份有限公司 | Error detection method and device for user input statement and electronic equipment |
CN111104480A (en) * | 2019-11-30 | 2020-05-05 | 广东新瑞世纪科技有限公司 | Innovative AI intelligent text processing system |
CN111126072B (en) * | 2019-12-13 | 2023-06-20 | 北京声智科技有限公司 | Method, device, medium and equipment for training Seq2Seq model |
CN111126072A (en) * | 2019-12-13 | 2020-05-08 | 北京声智科技有限公司 | Method, device, medium and equipment for training Seq2Seq model |
CN111209740A (en) * | 2019-12-31 | 2020-05-29 | 中移(杭州)信息技术有限公司 | Text model training method, text error correction method, electronic device and storage medium |
CN111209740B (en) * | 2019-12-31 | 2023-08-15 | 中移(杭州)信息技术有限公司 | Text model training method, text error correction method, electronic device and storage medium |
CN111191441A (en) * | 2020-01-06 | 2020-05-22 | 广东博智林机器人有限公司 | Text error correction method, device and storage medium |
CN110827801B (en) * | 2020-01-09 | 2020-04-17 | 成都无糖信息技术有限公司 | Automatic voice recognition method and system based on artificial intelligence |
CN110827801A (en) * | 2020-01-09 | 2020-02-21 | 成都无糖信息技术有限公司 | Automatic voice recognition method and system based on artificial intelligence |
CN111310473A (en) * | 2020-02-04 | 2020-06-19 | 四川无声信息技术有限公司 | Text error correction method and model training method and device thereof |
CN113449511A (en) * | 2020-03-24 | 2021-09-28 | 百度在线网络技术(北京)有限公司 | Text processing method, device, equipment and storage medium |
CN111539199B (en) * | 2020-04-17 | 2023-08-18 | 中移(杭州)信息技术有限公司 | Text error correction method, device, terminal and storage medium |
CN111539199A (en) * | 2020-04-17 | 2020-08-14 | 中移(杭州)信息技术有限公司 | Text error correction method, device, terminal and storage medium |
CN111861731A (en) * | 2020-07-31 | 2020-10-30 | 重庆富民银行股份有限公司 | Post-credit check system and method based on OCR |
CN112435671B (en) * | 2020-11-11 | 2021-06-29 | 深圳市小顺智控科技有限公司 | Intelligent voice control method and system for accurately recognizing Chinese |
CN112435671A (en) * | 2020-11-11 | 2021-03-02 | 深圳市小顺智控科技有限公司 | Intelligent voice control method and system for accurately recognizing Chinese |
CN112464650A (en) * | 2020-11-12 | 2021-03-09 | 创新工场(北京)企业管理股份有限公司 | Text error correction method and device |
CN112784581A (en) * | 2020-11-20 | 2021-05-11 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment |
CN112784581B (en) * | 2020-11-20 | 2024-02-13 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment |
CN112183073A (en) * | 2020-11-27 | 2021-01-05 | 北京擎盾信息科技有限公司 | Text error correction and completion method suitable for legal hot-line speech recognition |
CN112733552A (en) * | 2020-12-30 | 2021-04-30 | 科大讯飞股份有限公司 | Machine translation model construction method, device and equipment |
CN112733552B (en) * | 2020-12-30 | 2024-04-12 | 中国科学技术大学 | Machine translation model construction method, device and equipment |
CN112767924A (en) * | 2021-02-26 | 2021-05-07 | 北京百度网讯科技有限公司 | Voice recognition method and device, electronic equipment and storage medium |
US11842726B2 (en) | 2021-02-26 | 2023-12-12 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus, electronic device and storage medium for speech recognition |
CN113705202A (en) * | 2021-08-31 | 2021-11-26 | 北京金堤科技有限公司 | Search input information error correction method and device, electronic equipment and storage medium |
CN113948066A (en) * | 2021-09-06 | 2022-01-18 | 北京数美时代科技有限公司 | Error correction method, system, storage medium and device for real-time translation text |
CN113569545A (en) * | 2021-09-26 | 2021-10-29 | 中国电子科技集团公司第二十八研究所 | Control information extraction method based on voice recognition error correction model |
Also Published As
Publication number | Publication date |
---|---|
WO2019085779A1 (en) | 2019-05-09 |
TW201918913A (en) | 2019-05-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109753636A (en) | Machine processing and text error correction method and apparatus, computing device, and storage medium | |
CN109766540B (en) | General text information extraction method and device, computer equipment and storage medium | |
CN110750959B (en) | Text information processing method, model training method and related device | |
CN108984683B (en) | Method, system, equipment and storage medium for extracting structured data | |
EP3230896B1 (en) | Localization complexity of arbitrary language assets and resources | |
US20210216819A1 (en) | Method, electronic device, and storage medium for extracting spo triples | |
US11797607B2 (en) | Method and apparatus for constructing quality evaluation model, device and storage medium | |
US20100023319A1 (en) | Model-driven feedback for annotation | |
CN112789591A (en) | Automatic content editor | |
US11361002B2 (en) | Method and apparatus for recognizing entity word, and storage medium | |
EP4364044A1 (en) | Automated troubleshooter | |
CN112528605B (en) | Text style processing method, device, electronic equipment and storage medium | |
CN107885744A (en) | Conversational data analysis | |
CN111881683A (en) | Method and device for generating relation triples, storage medium and electronic equipment | |
CN110991175A (en) | Text generation method, system, device and storage medium under multiple modes | |
US20230014904A1 (en) | Searchable data structure for electronic documents | |
CN111062216B (en) | Named entity identification method, device, terminal and readable medium | |
CN115510362A (en) | System for automatically generating web front-end codes according to natural language description documents | |
CN113468875A (en) | MNet method for semantic analysis of natural language interaction interface of SCADA system | |
CN107491443B (en) | Method and system for translating Chinese sentences containing unconventional words | |
CN113934450A (en) | Method, apparatus, computer device and medium for generating annotation information | |
US12032605B2 (en) | Searchable data structure for electronic documents | |
US20220269869A1 (en) | Handwriting text summarization | |
US20230305863A1 (en) | Self-Supervised System for Learning a User Interface Language | |
US20230153335A1 (en) | Searchable data structure for electronic documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190514 |