CN108345581A - Information recognition method, device and terminal device
- Publication number
- CN108345581A (application CN201710054957.8A)
- Authority
- CN
- China
- Prior art keywords
- character
- error correction
- recognition result
- identification
- recognition
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/274—Converting codes to words; Guess-ahead of partial word inputs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Character Discrimination (AREA)
Abstract
Embodiments of the present application provide an information recognition method, an information recognition device, and a terminal device for reducing information recognition errors. The method includes: obtaining at least two recognition results, where different recognition results are obtained using different recognition modes and the input information recognized for each result derives from the same information content; comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, where the character at an error correction position in a recognition result is a character to be corrected; and correcting, for each error correction position in each recognition result, the corresponding character to be corrected, to obtain the corrected recognition content. Correcting the character to be corrected at each error correction position separately effectively improves recognition accuracy.
Description
Technical field
The present application relates to the field of communication technology, and in particular to an information recognition method, an information recognition device, and a terminal device.
Background
Human-computer interaction, which has developed along with science and technology, refers to the process in which a person and a computer exchange information, using a certain dialogue language and a certain mode of interaction, to complete a given task.
During human-computer interaction, a machine can obtain information in many ways, such as voice, image, and text. However, each way of obtaining information may introduce some error when the information is recognized, which leads to recognition errors. For example, speech recognition input may recognize "narrow strip of water" as "one by one band water" or "Pavilion of Prince Teng" as "the king Teng Yan", and image recognition may recognize "threshold" as "threshold values". Such recognition errors can cause problems in the subsequent interaction.
Summary of the invention
The technical problem to be solved by the embodiments of the present application is to provide an information recognition method that reduces recognition errors.
Correspondingly, the embodiments of the present application also provide an information recognition device and a terminal device, to ensure the implementation and application of the above method.
To solve the above problem, the embodiments of the present application disclose an information recognition method, including: obtaining at least two recognition results, where different recognition results are obtained using different recognition modes and the input information recognized for each result derives from the same information content; comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, where the character at an error correction position in a recognition result is a character to be corrected; and correcting, for each error correction position in each recognition result, the corresponding character to be corrected, to obtain the corrected recognition content.
Optionally, comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected includes: comparing the characters of the recognition results in order to obtain a common character sequence, where the common character sequence includes at least one common character arranged in order, and the common characters are the characters that are identical across the recognition results in a given order; and determining the error correction positions in each recognition result according to the common character sequence, and looking up the character to be corrected at each error correction position in each recognition result.
Optionally, comparing the characters of the recognition results in order to obtain a common character sequence includes: comparing the characters of the recognition results in order to obtain the characters that are identical across the different recognition results, as common characters; and arranging the common characters in comparison order to form the common character sequence.
Optionally, determining the error correction positions in each recognition result according to the common character sequence includes: aligning the at least two recognition results character by character according to the common character sequence, and determining as error correction positions those aligned positions at which the recognition results have different characters.
Optionally, after the at least two recognition results are aligned character by character according to the common character sequence, and before the aligned positions at which the recognition results have different characters are determined as error correction positions, the method further includes: when the number of characters between two common characters differs across recognition results, filtering the characters between those two common characters in the recognition result that has more of them.
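One possible reading of this filtering step can be sketched in Python. The sketch assumes exactly two recognition results and uses the standard library's `difflib.SequenceMatcher` as a stand-in for the alignment on common characters; the application does not name an alignment algorithm. Trimming the longer gap down to the length of the shorter one is also an assumption about what "filtering" means here, and `align_and_filter` is a hypothetical name.

```python
from difflib import SequenceMatcher


def align_and_filter(a, b):
    """Align two results on their common characters and trim surplus gap characters.

    Assumption: "filtering" is read as dropping the extra characters that the
    longer result has between two common characters.
    """
    out_a, out_b = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        part_a, part_b = a[i1:i2], b[j1:j2]
        # Matching runs have equal length on both sides and are kept whole;
        # differing runs are cut to the length of the shorter side.
        keep = min(len(part_a), len(part_b))
        out_a.append(part_a[:keep])
        out_b.append(part_b[:keep])
    return "".join(out_a), "".join(out_b)


aligned_a, aligned_b = align_and_filter("abxxcd", "abycd")
# aligned_a == "abxcd", aligned_b == "abycd"
```

After this step both results have the same length, so the remaining differing positions can be compared character by character.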
Optionally, correcting the character to be corrected at each error correction position in each recognition result to obtain the corrected recognition content includes: determining the character set for each error correction position according to the characters to be corrected at that position in the recognition results, and matching a correction character from the character set; and replacing the character at the error correction position in each recognition result with the correction character, to obtain the corrected recognition content.
Optionally, determining the character set for each error correction position according to the characters to be corrected at that position in the recognition results, and matching a correction character from the character set, includes: for each error correction position, matching at least one candidate character according to the characters to be corrected at that position in the recognition results and the recognition mode of each recognition result, and generating the corresponding character set; and matching the candidate characters in the character set along at least one dimension, and selecting a candidate character as the correction character according to the matching result.
Optionally, matching the candidate characters in the character set along at least one dimension includes at least one of the following: matching the candidate characters in the character set by glyph (character shape) to determine a glyph similarity; matching the candidate characters in the character set by pronunciation to determine a pronunciation similarity; matching the candidate characters in the character set by language to determine a context probability; and determining the correction value of each candidate character, as the matching result, according to the glyph similarity, the pronunciation similarity, and/or the context probability.
Optionally, selecting a candidate character as the correction character according to the matching result includes: sorting the candidate characters by correction value, and choosing a candidate character as the correction character according to the sorted order.
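The scoring and ranking described above can be illustrated with a short Python sketch. The application does not give a formula for the correction value, so the weighted sum, the weights, and the similarity scores below are all assumptions made for illustration; `correction_value` and `pick_correction` are hypothetical names.

```python
def correction_value(scores, weights=(0.3, 0.3, 0.4)):
    """Combine glyph similarity, pronunciation similarity, and context probability.

    Assumption: a weighted sum; the application only says the correction value
    is determined from these quantities.
    """
    glyph, sound, context = scores
    w_glyph, w_sound, w_context = weights
    return w_glyph * glyph + w_sound * sound + w_context * context


def pick_correction(candidates):
    """Rank candidate characters by correction value, highest first.

    candidates maps each candidate character to a tuple of
    (glyph similarity, pronunciation similarity, context probability).
    """
    ranked = sorted(
        candidates,
        key=lambda ch: correction_value(candidates[ch]),
        reverse=True,
    )
    return ranked[0], ranked


# Made-up scores for the character set of one error correction position.
best, ranking = pick_correction({"c": (0.6, 0.2, 0.1), "d": (0.6, 0.9, 0.8)})
# best == "d", ranking == ["d", "c"]
```

Taking the top-ranked candidate is one way to "choose according to the sorted order"; other selection rules would fit the same description.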
Optionally, obtaining at least two recognition results includes: obtaining input information for the same information content in at least two ways, and determining the corresponding recognition mode according to the type of the input information obtained in each way; and recognizing the different types of input information using the corresponding recognition modes, to obtain at least two recognition results. The type of the input information includes at least one of the following: voice, image, text.
The embodiments of the present application further disclose an information recognition device, including an acquisition module, an error correction identification module, and an error correction processing module.
The acquisition module is configured to obtain at least two recognition results, where different recognition results are obtained using different recognition modes and the input information recognized for each result derives from the same information content.
The error correction identification module is configured to compare the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, where the character at an error correction position in a recognition result is a character to be corrected.
The error correction processing module is configured to correct the character to be corrected at each error correction position, to obtain the corrected recognition content.
Optionally, the error correction identification module includes a comparison submodule and an error correction determination submodule.
The comparison submodule is configured to compare the characters of the recognition results in order to obtain a common character sequence, where the common character sequence includes at least one common character arranged in order, and the common characters are the characters that are identical across the recognition results in a given order.
The error correction determination submodule is configured to determine the error correction positions in each recognition result according to the common character sequence, and to look up the character to be corrected at each error correction position in each recognition result.
Optionally, the comparison submodule is configured to compare the characters of the recognition results in order to obtain the characters that are identical across the different recognition results, as common characters, and to arrange the common characters in comparison order to form the common character sequence.
Optionally, the error correction determination submodule is configured to align the at least two recognition results character by character according to the common character sequence, and to determine as error correction positions those aligned positions at which the recognition results have different characters.
Optionally, the error correction determination submodule is further configured to, when the number of characters between two common characters differs across recognition results, filter the characters between those two common characters in the recognition result that has more of them.
Optionally, the error correction processing module includes a correction submodule and a replacement submodule.
The correction submodule is configured to determine the character set for each error correction position according to the characters to be corrected at that position in the recognition results, and to match a correction character from the character set.
The replacement submodule is configured to replace the character at the error correction position in each recognition result with the correction character, to obtain the corrected recognition content.
Optionally, the correction submodule includes a matching unit and a character correction unit.
The matching unit is configured to, for each error correction position, match at least one candidate character according to the characters to be corrected at that position in the recognition results and the recognition mode of each recognition result, and to generate the corresponding character set.
The character correction unit is configured to match the candidate characters in the character set along at least one dimension, and to select a candidate character as the correction character according to the matching result.
Optionally, the matching unit is configured to match the candidate characters in the character set by glyph (character shape) to determine a glyph similarity; to match the candidate characters in the character set by pronunciation to determine a pronunciation similarity; to match the candidate characters in the character set by language to determine a context probability; and to determine the correction value of each candidate character, as the matching result, according to the glyph similarity, the pronunciation similarity, and/or the context probability.
Optionally, the character correction unit is configured to sort the candidate characters by correction value and to choose a candidate character as the correction character according to the sorted order.
Optionally, the acquisition module is configured to obtain input information for the same information content in at least two ways, and to determine the corresponding recognition mode according to the type of the input information obtained in each way; and to recognize the different types of input information using the corresponding recognition modes, to obtain at least two recognition results. The type of the input information includes at least one of the following: voice, image, text.
The embodiments of the present application further disclose a terminal device, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations: obtaining at least two recognition results, where different recognition results are obtained using different recognition modes and the input information recognized for each result derives from the same information content; comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, where the character at an error correction position in a recognition result is a character to be corrected; and correcting the character to be corrected at each error correction position, to obtain the corrected recognition content.
Optionally, comparing the characters of the recognition results in order to obtain a common character sequence includes: comparing the characters of the recognition results in order to obtain the characters that are identical across the different recognition results, as common characters; and arranging the common characters in comparison order to form the common character sequence.
Optionally, determining the error correction positions in each recognition result according to the common character sequence includes: aligning the at least two recognition results character by character according to the common character sequence, and determining as error correction positions those aligned positions at which the recognition results have different characters.
Optionally, after the at least two recognition results are aligned character by character according to the common character sequence, and before the aligned positions at which the recognition results have different characters are determined as error correction positions, the operations further include: when the number of characters between two common characters differs across recognition results, filtering the characters between those two common characters in the recognition result that has more of them.
Optionally, correcting the character to be corrected at each error correction position in each recognition result to obtain the corrected recognition content includes: determining the character set for each error correction position according to the characters to be corrected at that position in the recognition results, and matching a correction character from the character set; and replacing the character at the error correction position in each recognition result with the correction character, to obtain the corrected recognition content.
Optionally, determining the character set for each error correction position according to the characters to be corrected at that position in the recognition results, and matching a correction character from the character set, includes: for each error correction position, matching at least one candidate character according to the characters to be corrected at that position in the recognition results and the recognition mode of each recognition result, and generating the corresponding character set; and matching the candidate characters in the character set along at least one dimension, and selecting a candidate character as the correction character according to the matching result.
Optionally, matching the candidate characters in the character set along at least one dimension includes at least one of the following: matching the candidate characters in the character set by glyph (character shape) to determine a glyph similarity; matching the candidate characters in the character set by pronunciation to determine a pronunciation similarity; matching the candidate characters in the character set by language to determine a context probability; and determining the correction value of each candidate character, as the matching result, according to the glyph similarity, the pronunciation similarity, and/or the context probability.
Optionally, selecting a candidate character as the correction character according to the matching result includes: sorting the candidate characters by correction value, and choosing a candidate character as the correction character according to the sorted order.
Optionally, obtaining at least two recognition results includes: obtaining input information for the same information content in at least two ways, and determining the corresponding recognition mode according to the type of the input information obtained in each way; and recognizing the different types of input information using the corresponding recognition modes, to obtain at least two recognition results. The type of the input information includes at least one of the following: voice, image, text.
The embodiments of the present application have the following advantages:
The embodiments of the present application obtain at least two recognition results, where different recognition results are obtained using different recognition modes and the input information recognized for each result derives from the same information content. The at least two recognition results are then compared to obtain each error correction position and the corresponding character to be corrected in each recognition result. Because the recognition errors produced by different recognition modes differ considerably, using the different recognition results to check one another detects recognition errors effectively. The character to be corrected at each error correction position is then corrected, yielding the corrected recognition content and improving recognition accuracy.
Description of the drawings
Fig. 1 is a flowchart of the steps of an embodiment of an information recognition method of the present application;
Fig. 2 is a flowchart of the steps of an optional embodiment of an information recognition method of the present application;
Fig. 3 is a structural block diagram of an embodiment of an information recognition device of the present application;
Fig. 4 is a structural block diagram of another embodiment of an information recognition device of the present application;
Fig. 5 is a structural block diagram of a terminal device for information recognition according to an exemplary embodiment;
Fig. 6 is a block diagram of an information recognition device implemented as a server according to an exemplary embodiment.
Detailed description
To make the above objects, features, and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flowchart of the steps of an embodiment of an information recognition method of the present application is shown. The method may specifically include the following steps:
Step 102: obtain at least two recognition results, where different recognition results are obtained using different recognition modes and the input information recognized for each result derives from the same information content.
For the same information content, the embodiments of the present application may collect information in different ways and feed the collected information to the recognition device through the input mode that corresponds to each collection mode, so that each input is recognized using the recognition mode corresponding to its input mode and a corresponding recognition result is obtained. For example, to make machine recognition easier during human-computer interaction, the same content can be collected in different ways and then input to a computer device, for example through voice input, text input, or picture input. The computer device thereby obtains multiple channels of input information, selects a recognition mode for each input according to its input mode, and uses that recognition mode to obtain the recognition result for the input.
Step 104: compare the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, where the character at an error correction position in a recognition result is a character to be corrected.
The recognition result of any recognition mode may contain errors, that is, recognition errors may occur, and different recognition modes usually produce different types of recognition error. Therefore, after several recognition modes have been applied to the same content, the incorrectly recognized characters can be determined by comparing the different recognition results.
A character in the embodiments of the present application refers to the letters, digits, words, and symbols used in a computer. For example, the characters of a Chinese recognition result may be Chinese characters, the characters of an English recognition result may be English letters, and the characters of a Japanese recognition result may be katakana. Characters also include the digits, punctuation marks, separators, and special symbols in a recognition result.
Therefore, when at least two recognition results obtained by different recognition modes are compared, some characters are identical across the results and some differ.
Characters that are identical across the different recognition results in a given order are called common characters. Because the results derive from the same information content, these common characters are ordered, that is, they have a definite sequence relationship. For example, suppose recognition result A is {abcddsc} and recognition result B is {abddesc}, and the comparison runs from front to back starting at the first character. The common characters are then character a, character b, character d, character s, and character c. These 5 characters are called common characters and are arranged in order as {a, b, d, s, c}.
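The comparison in this example can be sketched in Python. This is a minimal illustration that assumes the recognition results already have the same length and compares them position by position, which reproduces the example above; the function name `common_characters` is hypothetical and does not come from the application.

```python
def common_characters(results):
    """Return, in order, the characters that are identical at the same
    position in every recognition result.

    Assumption: the results are already aligned and of equal length.
    """
    return [chars[0] for chars in zip(*results) if len(set(chars)) == 1]


common = common_characters(["abcddsc", "abddesc"])
# common == ["a", "b", "d", "s", "c"]
```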
The comparison thereby identifies the positions at which the recognition results have different characters. In the example above, these are the third and fifth positions of recognition result A and recognition result B: the character at the third position is c in recognition result A and d in recognition result B.
The positions of the differing characters are determined from the ordered common characters in the recognition results. Each such position is taken as an error correction position, and the character at that position in each recognition result is taken as a character to be corrected. In the example above, the error correction positions are the third and fifth positions; at the third position, the character to be corrected is c in recognition result A and d in recognition result B.
A recognition result includes at least one error correction position, and each error correction position corresponds to one character to be corrected in each recognition result. In other words, the number of characters to be corrected for an error correction position equals the number of recognition results.
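As a sketch of how the error correction positions and their characters to be corrected might be collected, the following Python function walks the aligned results position by position. It assumes equal-length, already aligned results and 1-based positions as in the example above; `error_positions` is a hypothetical name.

```python
def error_positions(results):
    """Map each 1-based error correction position to the characters to be
    corrected, one per recognition result.

    Assumption: the results are already aligned and of equal length.
    """
    positions = {}
    for index, chars in enumerate(zip(*results), start=1):
        if len(set(chars)) > 1:  # the results disagree at this position
            positions[index] = list(chars)
    return positions


found = error_positions(["abcddsc", "abddesc"])
# found == {3: ["c", "d"], 5: ["d", "e"]}
```

Each entry holds as many characters as there are recognition results, matching the statement above.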
Step 106: correct the character to be corrected at each error correction position in each recognition result, to obtain the corrected recognition content.
In step 106, the character to be corrected at each error correction position in each recognition result can be corrected separately. That is, a correction character is determined for each error correction position, and the character to be corrected at that position in each recognition result is replaced with the correction character. This yields the corrected recognition content and thus the correct recognition result for the input information content. Across the different recognition results, the character to be corrected at an error correction position may be wrong in some results but correct in others. Therefore, in the embodiments of the present application, when the character to be corrected at an error correction position is correct, that character is itself the correction character.
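The replacement in step 106 can be sketched as follows. The correction characters passed in are hypothetical values chosen only to show the mechanics; how they are selected is a separate matter. The sketch assumes aligned, equal-length results and 1-based positions, and `apply_corrections` is a hypothetical name.

```python
def apply_corrections(results, corrections):
    """Replace the character at every error correction position in every result.

    corrections maps a 1-based error correction position to its correction
    character. A result that already holds the correction character at a
    position is left unchanged there.
    """
    corrected = []
    for result in results:
        chars = list(result)
        for position, char in corrections.items():
            chars[position - 1] = char
        corrected.append("".join(chars))
    return corrected


# Hypothetical correction characters for positions 3 and 5.
fixed = apply_corrections(["abcddsc", "abddesc"], {3: "d", 5: "e"})
# fixed == ["abddesc", "abddesc"]
```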
In summary, at least two recognition results are obtained, where different recognition results are obtained using different recognition modes and the input information recognized for each result derives from the same information content. The at least two recognition results are then compared to obtain the error correction positions and the character to be corrected at each error correction position in each recognition result. Because the recognition errors produced by different recognition modes differ considerably, using the different recognition results to check one another detects recognition errors effectively. The character to be corrected at each error correction position in each recognition result is then corrected, yielding the corrected recognition content and improving recognition accuracy.
Referring to Fig. 2, a flowchart of the steps of another embodiment of an information recognition method of the present application is shown. The method may specifically include the following steps:
Step 202: obtain input information using at least two acquisition modes, and determine the corresponding recognition method according to the type of the input information obtained by each acquisition mode.
Step 204: recognize the different types of input information using their corresponding recognition methods, and determine the corresponding recognition results.
In the embodiments of the present application, the same information is acquired in several ways and then input to a computer device, such as an intelligent robot, a mobile device, or a PC. Here, an intelligent robot is a computer device that implements human-computer interaction based on artificial intelligence. The different types of input information obtained by the different acquisition modes can be received through different input interfaces. The type of the input information arriving at an interface is determined from the acquisition mode associated with that interface, and the corresponding recognition method is determined accordingly. The type of the input information includes at least one of the following: voice, image, and text. The corresponding recognition methods may include speech recognition, optical character recognition, text recognition, and so on. Other types of input information and corresponding recognition methods may of course also be included; the embodiments of the present invention do not list them one by one. Voice input information can be captured by an audio device such as a microphone, that is, a voice interface can be called to obtain voice input information, and the corresponding recognition method can be speech recognition. Image input information can be captured by a device such as a camera, that is, a shooting interface can be called to obtain image input information, and the corresponding recognition method is image recognition. Text input information can be obtained through various data transmission interfaces such as a network interface or a USB interface, and the corresponding recognition method is text recognition. Of course, all of the above kinds of information can also be obtained through data transmission interfaces such as a network interface or a USB interface. After the different types of input information are obtained, each is recognized with the recognition method corresponding to its type, yielding at least two recognition results.
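The mapping from input interface to input type to recognition method described above can be sketched as two lookup tables. This is only an illustrative sketch: the interface names, the `recognize` function, and the three placeholder recognizers are assumptions made for the example and do not come from the application; a real system would wrap actual speech, OCR, and text engines.

```python
from typing import Callable, Dict

# Hypothetical recognizers: real ones would wrap a speech, OCR, or text engine.
def recognize_speech(payload: bytes) -> str:
    return payload.decode("utf-8")

def recognize_image(payload: bytes) -> str:
    return payload.decode("utf-8")

def recognize_text(payload: bytes) -> str:
    return payload.decode("utf-8")

# The input interface fixes the acquisition mode, and hence the input type.
INTERFACE_TYPE: Dict[str, str] = {
    "voice_interface": "voice",
    "shooting_interface": "image",
    "data_interface": "text",
}

# Each input type maps to its recognition method.
RECOGNIZER: Dict[str, Callable[[bytes], str]] = {
    "voice": recognize_speech,
    "image": recognize_image,
    "text": recognize_text,
}

def recognize(interface: str, payload: bytes) -> str:
    """Pick the recognition method from the interface the input arrived on."""
    input_type = INTERFACE_TYPE[interface]
    return RECOGNIZER[input_type](payload)
```

Calling `recognize` once per acquisition mode on the same content would yield the at least two recognition results that the later steps compare.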
For example, in a human-computer interactive quiz scenario, the question content is input in two ways at the same time: the host reads the question aloud (voice input) and the question board displays it (image input). The intelligent robot can thus obtain voice input information and image input information simultaneously.
For the image information from the image input, OCR (Optical Character Recognition) technology may be used. OCR refers to the process of determining the shapes of characters by detecting dark and light patterns, and then translating those shapes into computer text using a character recognition method.
That is, the intelligent robot can capture voice input information while the host reads the question aloud, and can obtain image input information by photographing the question board. Speech recognition is then performed on the voice input information to obtain a speech recognition result, and OCR is performed on the image input information to obtain an OCR recognition result. For example, for the same information, the speech recognition result for the voice input information may be "Tengwang Each is located in Jiangxi Province", while the OCR recognition result for the image input information may be "Tengwang Yan is located in Jiangsi Province" (in the original Chinese, each of these is an eight-character string).
Step 206: perform sequential comparison on the characters included in each recognition result to obtain a common character sequence. The common character sequence includes at least one common character arranged in order, and the common characters are the characters that are identical across the recognition results in a given order.
Step 208: determine the error correction positions in each recognition result according to the common character sequence, and look up the character to be corrected at each error correction position in each recognition result. An error correction position is a position at which the recognition results contain different characters.
In order to detect problems in the recognition results and correct them, this embodiment can perform sequential comparison on the characters included in the recognition results produced by the different recognition methods. Sequential comparison means comparing in an order determined by the common characters: the common characters found during the comparison are arranged in order to form the common character sequence, and that order is their front-to-back order in the recognition results. In other words, the common character sequence appears in order in each recognition result. Whether the common characters are contiguous is not restricted; they may be contiguous or non-contiguous. For example, if the common character sequence is abc, then in each recognition result a precedes b and b precedes c.
After the sequential comparison yields the common character sequence, the common character sequence can be compared with each recognition result to determine the non-common characters in each recognition result and the position of each non-common character. The error correction positions are then selected from the positions of these non-common characters, and the character to be corrected at each error correction position is looked up in each recognition result.
In an optional embodiment of the present invention, performing sequential comparison on the characters included in each recognition result in step 206 to obtain a common character sequence, where the common character sequence includes at least one common character arranged in order, includes: performing sequential comparison on the characters included in each recognition result in a given order to obtain the identical characters in the different recognition results as common characters; and arranging the common characters according to the comparison order to form the common character sequence. As the comparison of the characters in the recognition results proceeds, at least one common character is obtained in comparison order, and these common characters are then arranged in that order to form the common character sequence. For example, the longest common character sequence can be obtained with an LCS (Longest Common Subsequence) algorithm; the longest common character sequence is the longest subsequence shared by two or more known sequences. As a further example, the comparison may proceed forward starting from the first character of a recognition result, or backward starting from the last character of a recognition result; this is not limited here.
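As an illustration of the LCS option mentioned above, the following is a minimal sketch of the standard dynamic-programming algorithm for two sequences. The function name is an assumption for the example, and the application does not prescribe this particular implementation. Recognition results are passed as sequences with one element per character.

```python
from typing import List, Sequence

def longest_common_subsequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk forward from the first character to recover the common characters
    # in the order in which they appear in both inputs.
    common: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return common
```

For instance, `longest_common_subsequence(list("abXcdYe"), list("abZcdWe"))` returns the characters of `abcde`, and the positions of `X`/`Z` and `Y`/`W` are the places where the two inputs disagree.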
For example, sequential comparison of speech recognition result A, "Tengwang Each is located in Jiangxi Province", with OCR recognition result B, "Tengwang Yan is located in Jiangsi Province", determines that the common character sequence is "Tengwang is located in Jiang Province".
As another example, sequential comparison of speech recognition result A, "Tengwang Each is located in Jiangxi Province", with OCR recognition result C, "Tengwang Yan, is located in Jiangsi Province", determines that the common character sequence is still "Tengwang is located in Jiang Province".
In another optional embodiment of the present invention, determining the error correction positions in each recognition result according to the common character sequence in step 208 may include: aligning the at least two recognition results character by character according to the common character sequence, and determining as error correction positions the aligned positions at which the recognition results contain different characters. Specifically, the recognition results are synchronized according to the common character sequence, that is, the character order is fixed by the common character sequence, and the recognition results are then aligned character by character. Alignment means ensuring that, between two common characters that are not adjacent to each other, the different recognition results contain the same number of non-common characters, so that the characters of the different recognition results correspond one to one. The aligned positions at which the characters differ can then be determined as error correction positions; that is, the positions of the non-common characters between two non-adjacent common characters are taken as error correction positions.
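The alignment step can be sketched as follows, assuming the common character sequence has already been computed and that the gaps between common characters already have equal length in both results. The function names are hypothetical, and the leftmost matching used to locate the common characters is a simplification chosen for the example. Each list element stands for one character of a recognition result.

```python
from typing import List, Sequence, Tuple

def split_by_common(seq: Sequence[str], common: Sequence[str]) -> List[List[str]]:
    """Split seq into the gaps before, between, and after the common characters."""
    gaps: List[List[str]] = [[]]
    k = 0  # index of the next common character to match
    for ch in seq:
        if k < len(common) and ch == common[k]:
            gaps.append([])  # a common character closes the current gap
            k += 1
        else:
            gaps[-1].append(ch)
    if k != len(common):
        raise ValueError("common is not a subsequence of seq")
    return gaps

def error_positions(
    a: Sequence[str], b: Sequence[str], common: Sequence[str]
) -> List[Tuple[int, str, str]]:
    """Return (1-based aligned position, character in a, character in b)."""
    gaps_a, gaps_b = split_by_common(a, common), split_by_common(b, common)
    errors: List[Tuple[int, str, str]] = []
    position = 0
    for index, (gap_a, gap_b) in enumerate(zip(gaps_a, gaps_b)):
        if len(gap_a) != len(gap_b):
            raise ValueError("gap lengths differ; filter extra characters first")
        for char_a, char_b in zip(gap_a, gap_b):
            position += 1
            errors.append((position, char_a, char_b))
        if index < len(common):
            position += 1  # step over the common character after this gap
    return errors
```

With the two eight-character results from the running example, written as one token per character, this returns the third and seventh positions with the pairs ("Each", "Yan") and ("xi", "si").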
Optionally, after the at least two recognition results are aligned character by character according to the common character sequence, and before the aligned positions at which the characters differ are determined as error correction positions, the method may further include: when the number of characters between two common characters differs between recognition results, filtering the characters between those two common characters in the recognition result that has more of them. In practice, although the different pieces of input information derive from the same content, the different input modes may introduce differences between them. For example, image input information usually contains punctuation marks, whereas voice input often does not; and, as in the example above, when the host reads the question aloud, the voice input information may include modal particles, chat between the host and the guests, and so on. As a result, the number of characters between two non-adjacent common characters may differ across the input information of the different input modes. Such characters therefore need to be filtered. Filtering can remove or ignore meaningless information such as punctuation marks, modal particles, and chat, where "meaningless" means meaningless relative to the original information content. After filtering, the number of non-common characters between two non-adjacent common characters is the same in the different recognition results, which makes it easy to determine the error correction positions.
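A minimal sketch of this filtering step is shown below, operating on the characters found between one pair of common characters. The function names and the filler set are assumptions for illustration; punctuation is detected with the standard Unicode general categories, and how modal particles or chat are actually identified is left open by the application.

```python
import unicodedata
from typing import List, Sequence, Set

def is_punctuation(token: str) -> bool:
    """True for a single character whose Unicode category is punctuation."""
    return len(token) == 1 and unicodedata.category(token).startswith("P")

def filter_gap(longer: Sequence[str], target_len: int, fillers: Set[str]) -> List[str]:
    """Drop punctuation and filler tokens until the gap has target_len items.

    `fillers` is a hypothetical set of modal particles and similar noise.
    If nothing removable is left, the gap is returned unchanged in length.
    """
    kept = list(longer)
    i = 0
    while len(kept) > target_len and i < len(kept):
        if is_punctuation(kept[i]) or kept[i] in fillers:
            del kept[i]
        else:
            i += 1
    return kept
```

For example, `filter_gap(["Yan", ","], 1, set())` keeps only `"Yan"`, which brings the gap down to the single character found in the other recognition result.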
For example, comparing speech recognition result A, "Tengwang Each is located in Jiangxi Province", with OCR recognition result B, "Tengwang Yan is located in Jiangsi Province", determines that the common character sequence is "Tengwang is located in Jiang Province". The error correction positions are then the third character position and the seventh character position, counted in the characters of the original Chinese strings. At the third character position, the characters to be corrected in recognition results A and B are "Each" and "Yan" respectively; at the seventh character position, they are "xi" ("west") and "si" ("four") respectively.
As another example, comparing speech recognition result A, "Tengwang Each is located in Jiangxi Province", with OCR recognition result C, "Tengwang Yan, is located in Jiangsi Province", determines that the common character sequence is "Tengwang is located in Jiang Province". It can be seen that the number of characters between the common characters "wang" and "located" differs, with recognition result C containing more characters between them. Filtering is therefore needed during alignment, and the punctuation character "," is filtered out. From the aligned recognition results C and A, the error correction positions are determined to be the third aligned character position and the seventh aligned character position. At the third aligned character position, the characters to be corrected in recognition results A and C are "Each" and "Yan" respectively; at the seventh aligned character position, they are "xi" ("west") and "si" ("four") respectively.
Step 210: determine the character set corresponding to each error correction position according to the characters to be corrected at that position in the recognition results, and obtain a correction character by matching within the character set.
Step 212: replace the character at the error correction position in each recognition result with the correction character to obtain the recognition content after error correction.
For a given error correction position, the characters to be corrected in the recognition results may include the correct character, or may all be wrong. The characters to be corrected at that position in the recognition results can therefore be taken as candidate characters, and further candidate characters can be determined from each character to be corrected and its recognition method. This yields the candidate characters for the error correction position, from which the corresponding character set is generated. The candidate characters in the character set are then matched to determine the correction character, and the character to be corrected at that error correction position in each recognition result is replaced with the correction character to obtain the recognition content after error correction. The recognition content after error correction may be the result of replacing the error correction positions in each recognition result with the correction characters, so that each recognition result corresponds to its own corrected recognition content; or it may be the content obtained after the recognition results are aligned and the correction characters are substituted, giving a single corrected recognition content. In addition, the recognition content after error correction may be identical to the source information content.
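The replacement in step 212 can be sketched as below. This assumes the error correction positions and their correction characters are already known and that positions are 1-based aligned positions; the function name is hypothetical. Each list element stands for one character of a recognition result.

```python
from typing import Dict, List, Sequence

def apply_corrections(result: Sequence[str], corrections: Dict[int, str]) -> List[str]:
    """Replace the character at each 1-based error correction position."""
    corrected = list(result)  # leave the original recognition result untouched
    for position, correction in corrections.items():
        corrected[position - 1] = correction
    return corrected
```

Applied to each aligned recognition result with the same corrections, this produces one corrected content per result; when every differing position is an error correction position, those corrected contents coincide, which corresponds to the single corrected content mentioned above.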
In an optional embodiment of the present invention, determining the character set corresponding to the error correction position according to the characters to be corrected at that position in the recognition results in step 210, and obtaining a correction character by matching within the character set, includes: for each error correction position, matching at least one candidate character according to the character to be corrected at that position in each recognition result and the recognition method corresponding to each recognition result, and generating the corresponding character set; and matching the candidate characters in the character set in at least one dimension, and selecting a candidate character as the correction character according to the matching result. For each error correction position, the characters to be corrected at that position in the recognition results are taken as candidate characters, and each recognition result is expanded with error-prone character data according to its recognition method, with the characters so obtained also taken as candidate characters. This yields a character set for each error correction position, and each character set includes at least one candidate character. The candidate characters in the character set can then be matched in at least one dimension, where the dimensions can be chosen according to the recognition method, the language, and so on. That is, a dimension value is obtained for the candidate character in each dimension, the correction value of the candidate character is obtained from those dimension values, the matching result for the error correction position is determined, and a candidate character is selected from the matching result as the correction character.
For the speech recognition method, matching the character to be corrected against the speech recognition method can determine at least one candidate character with a similar pronunciation. For an image recognition method (such as OCR), matching the character to be corrected against the image recognition method can determine at least one candidate character with a similar glyph shape or the like.
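Candidate generation for one error correction position can be sketched with per-method confusion tables. The tables below are hypothetical placeholders with a single entry each, standing in for the error-prone character data that would be mined offline; the names `SOUND_ALIKE`, `SHAPE_ALIKE`, and `build_character_set` are assumptions for the example.

```python
from typing import Dict, List, Set

# Hypothetical error-prone character data, one table per recognition method.
SOUND_ALIKE: Dict[str, Set[str]] = {"Each": {"Pavilion"}}  # similar pronunciation
SHAPE_ALIKE: Dict[str, Set[str]] = {"Yan": {"Pavilion"}}   # similar glyph shape
CONFUSION_BY_METHOD: Dict[str, Dict[str, Set[str]]] = {
    "speech": SOUND_ALIKE,
    "ocr": SHAPE_ALIKE,
}

def build_character_set(chars_by_method: Dict[str, str]) -> List[str]:
    """Collect the candidates for one error correction position.

    chars_by_method maps a recognition method to the character to be
    corrected that this method produced at the position.
    """
    candidates: List[str] = []
    for method, char in chars_by_method.items():
        similar = CONFUSION_BY_METHOD.get(method, {}).get(char, ())
        # The character itself is always a candidate, followed by look-alikes.
        for candidate in [char, *sorted(similar)]:
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

For the third character position of the running example, `build_character_set({"speech": "Each", "ocr": "Yan"})` gives a character set containing "Each", "Pavilion", and "Yan".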
The dimensions include at least one of the following: a glyph-shape dimension, a pronunciation dimension, and a language dimension. The glyph-shape dimension concerns the shape of a character, that is, characters may look alike. In pictographic scripts such as Chinese, many characters have similar structures and shapes; English letters may likewise be misrecognized because their shapes are similar, for example u and v. Image recognition methods such as OCR are therefore prone to recognition errors caused by similar glyph shapes. The pronunciation dimension concerns the sound of a character, that is, characters may be pronounced alike. Some English words have similar pronunciations, and Chinese has many homophones, so a character may not be recognized accurately from its pronunciation alone; the speech recognition method is therefore prone to recognition errors caused by similar pronunciation. The language dimension concerns the language environment, that is, errors caused by differing contexts in different language environments. For example, for "Pavilion" in "Tengwang Pavilion", "Each" is a candidate with a similar pronunciation, "Yan" is a candidate with a similar glyph shape, and "Pavilion" is a candidate obtained from the language environment.
In another optional embodiment of the present invention, the step of matching the candidate characters in the character set in at least one dimension includes at least one of the following: matching the candidate characters in the character set in the glyph-shape dimension to determine a glyph-shape similarity; matching the candidate characters in the character set in the pronunciation dimension to determine a pronunciation similarity; and matching the candidate characters in the character set in the language dimension to determine a context probability. The correction value of each candidate character is then determined from the glyph-shape similarity, the pronunciation similarity, and/or the context probability, and serves as the matching result.
Matching the candidate characters in the character set in the glyph-shape dimension to determine the glyph-shape similarity may include: calculating, for each candidate character in the character set, its similarity in glyph shape to the character to be corrected, and taking this as the glyph-shape similarity. If the candidate character is itself the character to be corrected, the similarity may be 100% or another value.
Matching the candidate characters in the character set in the pronunciation dimension to determine the pronunciation similarity may include: calculating, for each candidate character in the character set, its similarity in pronunciation to the character to be corrected, and taking this as the pronunciation similarity. If the candidate character is itself the character to be corrected, the similarity may be 100% or another value.
Matching the candidate characters in the character set in the language dimension to determine the context probability may include: placing each candidate character in the character set into the context formed by the corresponding recognition result or by the common character sequence, and obtaining the context probability of the candidate character in that context from the corresponding language model.
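As one possible illustration of a context probability, the sketch below scores a candidate given the preceding character using a bigram model with add-one smoothing. The bigram model, the smoothing, the function name, and the counts in the usage note are all assumptions made for the example; the application does not specify which kind of language model is used.

```python
from collections import Counter

def context_probability(
    previous: str,
    candidate: str,
    bigrams: Counter,
    unigrams: Counter,
    vocabulary_size: int,
) -> float:
    """Estimate P(candidate | previous) with add-one smoothing.

    bigrams counts (previous, candidate) pairs and unigrams counts single
    characters, both gathered from some training text.
    """
    return (bigrams[(previous, candidate)] + 1) / (unigrams[previous] + vocabulary_size)
```

With made-up counts such as `Counter({("wang", "Pavilion"): 3})` and `Counter({"wang": 4})` and a vocabulary of 6 characters, "Pavilion" after "wang" scores 0.4 while an unseen candidate such as "Each" scores 0.1, so the context favors "Pavilion".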
The glyph-shape similarity, the pronunciation similarity, and the context probability can each be called the dimension value in the corresponding dimension. A candidate character may have dimension values in one or more of these dimensions, and its correction value can then be calculated from the glyph-shape similarity, the pronunciation similarity, and/or the context probability as the matching result. The correction value may be a specific number or a normalized probability value; the embodiments of the present invention place no limit on this. Weights can also be set for the different dimensions, and the corresponding matching value is obtained by weighting the dimension values with those weights.
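One simple way to combine weighted dimension values into a correction value is a weighted average over the dimensions for which a candidate actually has a value. This is an assumed combination rule for illustration only, since the text leaves the exact formula open; the function name and the dimension keys are hypothetical.

```python
from typing import Mapping

def correction_value(
    dimension_values: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Weighted average over the dimensions present for this candidate."""
    total_weight = sum(weights[name] for name in dimension_values)
    if total_weight == 0:
        return 0.0  # no usable dimension: treat the candidate as unsupported
    weighted_sum = sum(weights[name] * value for name, value in dimension_values.items())
    return weighted_sum / total_weight
```

For example, a candidate with a glyph-shape similarity of 0.5 and a pronunciation similarity of 1.0, under equal weights for those two dimensions, gets a correction value of 0.75. Dividing by the total weight keeps the value in the same range as the inputs, which matches the option of using a normalized value.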
In another optional embodiment of the present invention, selecting a candidate character as the correction character according to the matching result may include: sorting by correction value, and choosing a candidate character as the correction character according to the sorted order. The candidate characters in the character set of an error correction position can be sorted by the magnitude of their correction values in the matching result, for example in descending or ascending order. A candidate character is then chosen as the correction character according to the sorted order, for example the one with the largest value or the one with the smallest value, depending on the specific requirements.
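The sort-and-select step can be sketched in a few lines. The function name and the `prefer_largest` flag are assumptions for the example; the flag reflects the statement that either end of the ordering may be chosen depending on how the correction value is defined.

```python
from typing import Dict

def choose_correction(correction_values: Dict[str, float], prefer_largest: bool = True) -> str:
    """Sort candidates by correction value and return the first one."""
    ranked = sorted(
        correction_values.items(),
        key=lambda item: item[1],
        reverse=prefer_largest,
    )
    return ranked[0][0]
```

With illustrative values such as `{"Each": 0.2, "Yan": 0.1, "Pavilion": 0.9}`, the default choice is "Pavilion".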
For example, comparing the above speech recognition result "Tengwang Each is located in Jiangxi Province" with the OCR recognition result "Tengwang Yan is located in Jiangsi Province" determines that the common character sequence is "Tengwang is located in Jiang Province". After correction, the recognition content after error correction is "Tengwang Pavilion is located in Jiangxi Province".
The embodiments of the present invention can also perform offline mining in advance to obtain speech error-prone characters and words, OCR error-prone characters and words, and the corresponding language model, which provide the data basis for determining candidate characters and dimension values. Taking Chinese as an example, speech error-prone characters may arise from flat-tongue versus retroflex sounds, front versus back nasal sounds, and the like in Chinese pronunciation; in English, speech error-prone words may arise from voiceless consonants, voiced consonants, liaison, and so on. For Chinese, OCR error-prone characters may arise from radicals, glyph structure, and so on. As for the language model, a Chinese language model can use the character as its token, that is, language-model matching is performed character by character, whereas an English language model can use the word as its token.
Thus, by matching the errors in the inputs from multiple modes against one another, the embodiments of the present invention can use the different recognition methods to perform multi-dimensional matching, improving the accuracy and efficiency of error correction. Even if every recognition result contains recognition errors, error correction can still be performed effectively.
In an example scenario, such as a human-computer interactive quiz, the question content is input in two ways: the host reads the question aloud (voice input) and the question board displays it (image input). The intelligent robot can obtain both voice input information and image input information. That is, the intelligent robot records voice input information while the host reads the question aloud, and can photograph the question board to obtain image input information. Speech recognition is then performed on the voice input information to obtain a speech recognition result, and OCR is performed on the image input information to obtain an OCR recognition result.
Let the recognition result of speech recognition be Q1 = (t11, t12, ..., t1n) and the recognition result of OCR be Q2 = (t21, t22, ..., t2m). The longest common sequence of the two recognition results can be determined with the LCS algorithm. The two inputs are then aligned character by character on the basis of this longest common sequence, and the error correction positions of the two recognition results are found accordingly. For example, the speech recognition result for the voice input information is "Tengwang Each is located in that Jiangxi Province", and the OCR recognition result for the image input information is "Tengwang Yan is located in Jiangsi Province". The LCS algorithm determines that the longest common sequence is "Tengwang is located in Jiang Province", and the two recognition results are aligned according to it. During alignment, it is found that the voice input information has two extra characters, "that", between the common characters "in" and "Jiang"; these are determined to be a modal particle and are filtered out. Here the common characters "in" and "Jiang" are contiguous in the OCR recognition result but not contiguous in the speech recognition result. The error correction positions for the above recognition results are thus determined to be the third character position and the seventh character position: the characters to be corrected at the third character position are "Each" and "Yan", and those at the seventh character position are "xi" ("west") and "si" ("four").
For the error correction positions corresponding to these recognition errors, characters with similar glyph shapes or similar pronunciations are mined as candidate characters, using the respective recognition methods of Q1 and Q2. For example, the character set for the above error correction position (the third character position) may include {Each, Yan, Pavilion, ...}. Each candidate is matched in each dimension to obtain the corresponding matching values, including the glyph-shape similarity, the pronunciation similarity, and the context probability based on the language model. A regression model can then be used to rank the candidates, and the correction character for the position is finally given. For example, the correction character for the third character position is "Pavilion" and the correction character for the seventh character position is "xi" ("west"), so the corresponding recognition content after error correction is "Tengwang Pavilion is located in Jiangxi Province".
Based on the above embodiments, consider the alignment of multiple input modes (that is, multimodal input). The input content of each input mode depends on its input source and carries noise characteristic of that source. For example, recognition accuracy is lower at the start and end of speech, and a noisy environment or particles in the spoken language also affect the precision of speech recognition; likewise, lighting effects and the sharpness of the captured image affect recognition accuracy in OCR. Moreover, the recognition results of multimodal input are often not naturally aligned. By matching the multiple recognition results jointly, the features of the multiple inputs, such as pronunciation, glyph shape, and context, can be exploited so that the different recognition methods correct one another, improving recognition accuracy and precision in different dimensions and enabling effective error correction of multi-channel input.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of action combinations. Those skilled in the art should understand, however, that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application certain steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Referring to Fig. 3, a structural block diagram of an embodiment of an information recognition device of the present application is shown, which may specifically include the following modules: an acquisition module 302, an error correction identification module 304, and an error correction processing module 306.
The acquisition module 302 is configured to obtain at least two recognition results, where different recognition results are produced by different recognition methods, and the input information recognized for each result derives from the same information content.
The error correction identification module 304 is configured to compare the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, where the character at an error correction position in a recognition result is a character to be corrected.
The error correction processing module 306 is configured to correct the character to be corrected at each error correction position in each recognition result, obtaining the recognition content after error correction.
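The three-module structure can be sketched as a small composition of callables. This is only a structural illustration: the class name, the field names, and the type aliases are assumptions, and the behavior of each module is supplied by the caller rather than defined here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Result = Sequence[str]                    # one recognition result, one item per character
ErrorSite = Tuple[int, Tuple[str, ...]]   # position and the characters to be corrected there

@dataclass
class InformationRecognitionDevice:
    # Acquisition module 302: returns at least two recognition results.
    acquire: Callable[[], List[Result]]
    # Error correction identification module 304: compares the results.
    identify_errors: Callable[[List[Result]], List[ErrorSite]]
    # Error correction processing module 306: produces the corrected content.
    correct: Callable[[List[Result], List[ErrorSite]], List[str]]

    def run(self) -> List[str]:
        results = self.acquire()
        if len(results) < 2:
            raise ValueError("at least two recognition results are required")
        sites = self.identify_errors(results)
        return self.correct(results, sites)
```

Keeping the modules as injected callables mirrors the block diagram: each module can be replaced independently, for example to swap in a different comparison strategy.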
In summary, at least two recognition results can be obtained, where different recognition results are produced by different recognition methods and the input information recognized for each result derives from the same information content. The at least two recognition results can then be compared to obtain the character to be corrected at each error correction position in each recognition result. Because different recognition methods tend to make very different recognition errors, cross-checking the recognition results against one another can effectively detect recognition errors. The character to be corrected at each error correction position is then corrected, yielding the recognition content after error correction and improving recognition accuracy.
Referring to Fig. 4, a structural block diagram of another embodiment of the information recognition device of the present application is shown, which may specifically include the following modules:
The acquisition module 302 is configured to obtain at least two recognition results, where different recognition results are produced by different recognition methods, and the input information recognized for each result derives from the same information content.
Error correction identification module 304 obtains corresponding error correction position at least two recognition result to be compared
And corresponding wait for error correction character, wherein the corresponding character in error correction position is to wait for error correction character in recognition result.
Correction process module 306, for waiting for error correction character point to each error correction position in each recognition result is corresponding
Error correction is not carried out, obtains the identification content after error correction.
The error correction identification module 304 includes a comparison submodule 3042 and an error correction determination submodule 3044.
Comparison submodule 3042, configured to compare, in order, the characters contained in each recognition result to obtain a common character sequence. The common character sequence includes at least one common character arranged in order, where a common character is a character that is identical across the recognition results when they are compared in a given order.
Error correction determination submodule 3044, configured to determine the error correction positions in each recognition result according to the common character sequence, and to look up the character to be corrected at each error correction position in each recognition result.
The comparison submodule 3042 is configured to compare, in order, the characters contained in each recognition result, take the characters that are identical in the different recognition results as common characters, and arrange the common characters in comparison order to form the common character sequence.
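One way to picture the common character sequence is as a longest common subsequence (LCS) of two recognition results. The text does not name a specific algorithm, so the LCS choice, the function name `common_character_sequence`, and the sample strings are assumptions made purely for illustration.

```python
def common_character_sequence(a: str, b: str) -> str:
    """Return one longest common subsequence of two recognition results.

    Assumption: the "common character sequence" is modeled as an LCS.
    """
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table, collecting matched characters in order.
    common = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(common)


# Hypothetical results from two recognition methods for the same content.
speech_result = "the cot sat"
image_result = "the cat sat"
print(common_character_sequence(speech_result, image_result))
```

The characters that fall outside the returned sequence are the ones that later steps treat as candidates for error correction.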
The error correction determination submodule 3044 is configured to align the at least two recognition results character by character according to the common character sequence, and to determine as an error correction position each aligned position that is the same across the recognition results but holds different characters.
The error correction determination submodule 3044 is further configured to, when the number of characters between two common characters differs across recognition results, filter the characters lying between those two common characters in the recognition result that has more characters between them.
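The alignment and filtering just described can be sketched with Python's standard `difflib.SequenceMatcher`, which finds matching blocks between two strings. This is a stand-in for the alignment by common character sequence, not the method of the application itself: the matcher's block-matching heuristic, the function name `find_error_positions`, and the decision to set aside a whole unequal-length gap are all assumptions.

```python
from difflib import SequenceMatcher


def find_error_positions(a: str, b: str):
    """Align two recognition results and locate error correction positions.

    Returns (errors, filtered):
      errors   - list of (pos_in_a, pos_in_b, char_a, char_b) where the
                 aligned positions match but the characters differ
      filtered - list of (gap_in_a, gap_in_b) where the gaps between two
                 common runs have different lengths and are set aside
    """
    errors, filtered = [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # run of common characters, nothing to correct
        if (i2 - i1) == (j2 - j1):
            # Same number of characters between the common runs:
            # compare position by position.
            for k in range(i2 - i1):
                errors.append((i1 + k, j1 + k, a[i1 + k], b[j1 + k]))
        else:
            # Character counts differ: treat the gap as filtered.
            filtered.append((a[i1:i2], b[j1:j2]))
    return errors, filtered


errors, filtered = find_error_positions("the cot sat", "the cat sat")
print(errors, filtered)
```

Keeping the filtered gaps in a separate list, rather than discarding them, makes it easy to inspect what the filtering step removed.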
The correction processing module 306 includes a correction submodule 3062 and a replacement submodule 3064.
Correction submodule 3062, configured to determine the character set corresponding to each error correction position according to the characters to be corrected at that position in each recognition result, and to obtain a correction character by matching within the character set.
Replacement submodule 3064, configured to replace the character at the error correction position in each recognition result with the correction character, obtaining the corrected recognized content.
The correction submodule 3062 includes a matching unit and a character correction unit.
The matching unit is configured to, for each error correction position, match at least one candidate character according to the character to be corrected at that position in each recognition result and the recognition method corresponding to each recognition result, and to generate the corresponding character set.
The character correction unit is configured to match the candidate characters in the character set along at least one dimension and to select a candidate character as the correction character according to the matching result.
The matching unit is configured to match the candidate characters in the character set along the glyph dimension to determine glyph similarity; to match them along the pronunciation dimension to determine pronunciation similarity; to match them along the language dimension to determine context probability; and to determine the correction value of each candidate character as the matching result according to the glyph similarity, pronunciation similarity and/or context probability.
The character correction unit is configured to sort the candidate characters by correction value and to select a candidate character as the correction character according to the sort order.
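As a rough illustration of how a correction value could combine the three dimensions, the sketch below uses a weighted sum and then ranks the candidates. The weighted-sum formula, the weights, the function names, and the sample scores are all hypothetical; the text only states that the correction value is determined from glyph similarity, pronunciation similarity and/or context probability.

```python
from typing import Dict, List, Tuple

# (glyph similarity, pronunciation similarity, context probability)
Scores = Tuple[float, float, float]


def correction_value(scores: Scores,
                     weights: Scores = (0.3, 0.3, 0.4)) -> float:
    """Combine the three dimension scores; a weighted sum is an assumption."""
    return sum(w * s for w, s in zip(weights, scores))


def choose_correction(candidates: Dict[str, Scores]) -> Tuple[str, List[str]]:
    """Rank candidate characters by correction value, highest first."""
    ranked = sorted(candidates,
                    key=lambda ch: correction_value(candidates[ch]),
                    reverse=True)
    return ranked[0], ranked


# Hypothetical scores for the two characters seen at one error position.
candidates = {
    "o": (0.9, 0.2, 0.1),
    "a": (0.9, 0.2, 0.8),
}
best, ranking = choose_correction(candidates)
print(best, ranking)
```

In this sketch the context probability breaks the tie between two visually similar candidates, which is the kind of case where a single-dimension comparison would not be enough.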
The acquisition module 302 is configured to obtain input information for the same information content in at least two ways, and to determine the corresponding recognition method according to the type of input information obtained in each way; and to recognize the different types of input information with their corresponding recognition methods, obtaining at least two recognition results. The type of the input information includes at least one of the following: voice, image, text.
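The selection of a recognition method by input type can be pictured as a simple lookup table. In the sketch below the recognizers are stubs that return their payload unchanged, and the method names, the `RECOGNIZERS` table, and `acquire_results` are hypothetical; a real system would plug in actual speech, image, and text recognizers.

```python
from typing import Callable, Dict, List, Tuple


def _stub_recognizer(payload: str) -> str:
    # Stand-in for a real recognizer; returns the payload unchanged.
    return payload


# Input type -> (recognition method name, recognizer). Names are assumed.
RECOGNIZERS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "voice": ("speech_recognition", _stub_recognizer),
    "image": ("optical_character_recognition", _stub_recognizer),
    "text": ("text_parsing", _stub_recognizer),
}


def acquire_results(inputs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Recognize each (input type, payload) pair with its matching method."""
    if len(inputs) < 2:
        raise ValueError("at least two inputs are needed for cross-checking")
    results = []
    for kind, payload in inputs:
        method, recognize = RECOGNIZERS[kind]
        results.append((method, recognize(payload)))
    return results


# Two inputs that describe the same content, obtained in different ways.
print(acquire_results([("voice", "the cot sat"), ("image", "the cat sat")]))
```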
Thus, by matching and correcting errors across inputs obtained in multiple ways, the embodiments of the present invention can use different recognition methods to perform multi-dimensional matching, improving error correction accuracy and efficiency. Even if every recognition result contains recognition errors, error correction can still be performed effectively.
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments.
Fig. 5 is a structural block diagram of a terminal device 500 for information recognition according to an exemplary embodiment. For example, the terminal device 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, an intelligent robot, and the like.
Referring to Fig. 5, the terminal device 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls the overall operation of the terminal device 500, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 502 may include one or more processors 520 to execute instructions so as to perform all or part of the steps of the method described above. In addition, the processing component 502 may include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 may include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operation of the terminal device 500. Examples of such data include instructions for any application or method operated on the terminal device 500, contact data, phone book data, messages, pictures, videos, and so on. The memory 504 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 506 provides power to the various components of the terminal device 500. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal device 500.
The multimedia component 508 includes a screen that provides an output interface between the terminal device 500 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. A touch sensor may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front camera and/or a rear camera. When the terminal device 500 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a microphone (MIC), which is configured to receive external audio signals when the terminal device 500 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 504 or sent via the communication component 516. In some embodiments, the audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 514 includes one or more sensors for providing status assessments of various aspects of the terminal device 500. For example, the sensor component 514 can detect the open/closed state of the terminal device 500 and the relative positioning of components, such as the display and keypad of the terminal device 500. The sensor component 514 can also detect a change in position of the terminal device 500 or of one of its components, the presence or absence of user contact with the terminal device 500, the orientation or acceleration/deceleration of the terminal device 500, and temperature changes of the terminal device 500. The sensor component 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 514 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the terminal device 500 and other devices. The terminal device 500 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the terminal device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 504 including instructions, where the instructions can be executed by the processor 520 of the terminal device 500 to carry out the method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a terminal device, enable the terminal device to perform an information recognition method, the method including: obtaining at least two recognition results, where different recognition results are produced by different recognition methods and the input information recognized for each result derives from the same information content; comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, where the character at an error correction position in a recognition result is a character to be corrected; and correcting the character to be corrected at each error correction position, obtaining the corrected recognized content.
Comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected includes: comparing, in order, the characters contained in each recognition result to obtain a common character sequence, where the common character sequence includes at least one common character arranged in order and a common character is a character that is identical across the recognition results when they are compared in a given order; and determining the error correction positions in each recognition result according to the common character sequence, and looking up the character to be corrected at each error correction position in each recognition result.
Comparing, in order, the characters contained in each recognition result to obtain the common character sequence includes: comparing, in order, the characters contained in each recognition result and taking the characters that are identical in the different recognition results as common characters; and arranging the common characters in comparison order to form the common character sequence.
Determining the error correction positions in each recognition result according to the common character sequence includes: aligning the at least two recognition results character by character according to the common character sequence, and determining as an error correction position each aligned position that is the same across the recognition results but holds different characters.
After aligning the at least two recognition results character by character according to the common character sequence, and before determining as an error correction position each aligned position that is the same across the recognition results but holds different characters, the method further includes: when the number of characters between two common characters differs across recognition results, filtering the characters lying between those two common characters in the recognition result that has more characters between them.
Correcting the character to be corrected at each error correction position in each recognition result to obtain the corrected recognized content includes: determining the character set corresponding to each error correction position according to the characters to be corrected at that position in each recognition result, and obtaining a correction character by matching within the character set; and replacing the character at the error correction position in each recognition result with the correction character, obtaining the corrected recognized content.
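The replacement step can be sketched as a position-indexed substitution applied to each recognition result. The function name `apply_corrections` and the sample data are hypothetical, and the sketch assumes the correction characters have already been chosen for each error correction position.

```python
from typing import Dict


def apply_corrections(result: str, corrections: Dict[int, str]) -> str:
    """Replace the character at each error correction position.

    `corrections` maps a position in `result` to its correction character.
    """
    chars = list(result)
    for position, correction_char in corrections.items():
        chars[position] = correction_char
    return "".join(chars)


# Hypothetical case: position 5 was flagged and "a" was selected.
print(apply_corrections("the cot sat", {5: "a"}))
```

Because positions can differ between recognition results after alignment, each result would get its own `corrections` mapping in practice.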
Determining the character set corresponding to each error correction position according to the characters to be corrected at that position in each recognition result, and obtaining a correction character by matching within the character set, includes: for each error correction position, matching at least one candidate character according to the character to be corrected at that position in each recognition result and the recognition method corresponding to each recognition result, and generating the corresponding character set; and matching the candidate characters in the character set along at least one dimension, and selecting a candidate character as the correction character according to the matching result.
The step of matching the candidate characters in the character set along at least one dimension includes at least one of the following: matching the candidate characters in the character set along the glyph dimension to determine glyph similarity; matching the candidate characters in the character set along the pronunciation dimension to determine pronunciation similarity; matching the candidate characters in the character set along the language dimension to determine context probability; and determining the correction value of each candidate character as the matching result according to the glyph similarity, pronunciation similarity and/or context probability.
Selecting a candidate character as the correction character according to the matching result includes: sorting by correction value, and selecting a candidate character as the correction character according to the sort order.
Obtaining at least two recognition results includes: obtaining input information for the same information content in at least two ways, and determining the corresponding recognition method according to the type of input information obtained in each way; and recognizing the different types of input information with their corresponding recognition methods, obtaining at least two recognition results. The type of the input information includes at least one of the following: voice, image, text.
Fig. 6 is a block diagram of an information recognition device implemented as a server according to an exemplary embodiment. The server 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. A program stored in the storage medium 630 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the server 600, the series of instruction operations in the storage medium 630.
The server 600 may also include one or more power supplies 626, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, one or more keyboards 656, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, where the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps are performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.
The information recognition method, information recognition device, and terminal device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application in accordance with the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.
Claims (12)
1. An information recognition method, characterized by comprising:
obtaining at least two recognition results, wherein different recognition results are produced by different recognition methods, and the input information recognized for each recognition result derives from the same information content;
comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, wherein the character at an error correction position in a recognition result is a character to be corrected;
correcting the character to be corrected at each error correction position in each recognition result, to obtain the corrected recognized content.
2. The method according to claim 1, characterized in that comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected comprises:
comparing, in order, the characters contained in each recognition result to obtain a common character sequence, wherein the common character sequence comprises at least one common character arranged in order, and a common character is a character that is identical across the recognition results when they are compared in a given order;
determining the error correction positions in each recognition result according to the common character sequence, and looking up the character to be corrected at each error correction position in each recognition result.
3. The method according to claim 2, characterized in that comparing, in order, the characters contained in each recognition result to obtain a common character sequence comprises:
comparing, in order, the characters contained in each recognition result, and taking the characters that are identical in the different recognition results as common characters;
arranging the common characters in comparison order to form the common character sequence.
4. The method according to claim 3, characterized in that determining the error correction positions in each recognition result according to the common character sequence comprises:
aligning the at least two recognition results character by character according to the common character sequence, and determining as an error correction position each aligned position that is the same across the recognition results but holds different characters.
5. The method according to claim 4, characterized in that, after aligning the at least two recognition results character by character according to the common character sequence, and before determining as an error correction position each aligned position that is the same across the recognition results but holds different characters, the method further comprises:
when the number of characters between two common characters differs across recognition results, filtering the characters lying between those two common characters in the recognition result that has more characters between them.
6. The method according to claim 1, characterized in that correcting the character to be corrected at each error correction position in each recognition result to obtain the corrected recognized content comprises:
determining the character set corresponding to each error correction position according to the characters to be corrected at that position in each recognition result, and obtaining a correction character by matching within the character set;
replacing the character at the error correction position in each recognition result with the correction character, to obtain the corrected recognized content.
7. The method according to claim 6, characterized in that determining the character set corresponding to each error correction position according to the characters to be corrected at that position in each recognition result, and obtaining a correction character by matching within the character set, comprises:
for each error correction position, matching at least one candidate character according to the character to be corrected at that position in each recognition result and the recognition method corresponding to each recognition result, and generating the corresponding character set;
matching the candidate characters in the character set along at least one dimension, and selecting a candidate character as the correction character according to the matching result.
8. The method according to claim 7, characterized in that the step of matching the candidate characters in the character set along at least one dimension comprises at least one of the following:
matching the candidate characters in the character set along the glyph dimension to determine glyph similarity;
matching the candidate characters in the character set along the pronunciation dimension to determine pronunciation similarity;
matching the candidate characters in the character set along the language dimension to determine context probability;
determining the correction value of each candidate character as the matching result according to the glyph similarity, pronunciation similarity and/or context probability.
9. The method according to claim 7, characterized in that selecting a candidate character as the correction character according to the matching result comprises:
sorting by the correction value, and selecting a candidate character as the correction character according to the sort order.
10. The method according to claim 1, characterized in that obtaining at least two recognition results comprises:
obtaining input information for the same information content in at least two ways, and determining the corresponding recognition method according to the type of input information obtained in each way;
recognizing the different types of input information with their corresponding recognition methods, to obtain at least two recognition results;
wherein the type of the input information comprises at least one of the following: voice, image, text.
11. An information recognition device, characterized by comprising:
An acquisition module, configured to obtain at least two recognition results, wherein different recognition results are obtained by different recognition methods, and the input information recognized to produce the different recognition results derives from the same information content;
An error correction identification module, configured to compare the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, wherein the character at an error correction position in a recognition result is a character to be corrected;
A correction processing module, configured to correct the character to be corrected at each error correction position, to obtain the recognized content after error correction.
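The comparison step performed by the error correction identification module can be illustrated with a short sketch. It assumes two recognition results given as strings and uses Python's standard `difflib.SequenceMatcher` to align them. That alignment is a stand-in chosen for illustration, not the comparison procedure of the patent, and the function name `find_error_positions` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def find_error_positions(result_a: str, result_b: str) -> List[Tuple[int, int, str, str]]:
    """Return one (start, end, chars_a, chars_b) tuple per span where the results disagree.

    start and end index into result_a; chars_a and chars_b are the conflicting
    characters from each result, i.e. the characters to be corrected.
    """
    matcher = SequenceMatcher(a=result_a, b=result_b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert" marks a disagreement
            diffs.append((i1, i2, result_a[i1:i2], result_b[j1:j2]))
    return diffs


# An OCR-style result and a speech-style result for the same content.
positions = find_error_positions("hello w0rld", "hello world")
# positions == [(7, 8, "0", "o")]
```

Each returned span is an error correction position; the conflicting characters from both results can then be passed to a candidate-ranking step to pick the corrected character.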
12. A terminal device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
Obtaining at least two recognition results, wherein different recognition results are obtained by different recognition methods, and the input information recognized to produce the different recognition results derives from the same information content;
Comparing the at least two recognition results to obtain the error correction positions and the corresponding characters to be corrected, wherein the character at an error correction position in a recognition result is a character to be corrected;
Correcting the character to be corrected at each error correction position, to obtain the recognized content after error correction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710054957.8A CN108345581B (en) | 2017-01-24 | 2017-01-24 | Information identification method and device and terminal equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710054957.8A CN108345581B (en) | 2017-01-24 | 2017-01-24 | Information identification method and device and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108345581A (en) | 2018-07-31 |
CN108345581B (en) | 2022-10-14 |
Family
ID=62962818
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710054957.8A Active CN108345581B (en) | 2017-01-24 | 2017-01-24 | Information identification method and device and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108345581B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214387A (en) * | 2018-09-14 | 2019-01-15 | 辽宁奇辉电子系统工程有限公司 | A kind of railway operation detection system based on character recognition technology |
CN109344831A (en) * | 2018-08-22 | 2019-02-15 | 中国平安人寿保险股份有限公司 | A kind of tables of data recognition methods, device and terminal device |
CN109344730A (en) * | 2018-09-06 | 2019-02-15 | 康美健康云服务有限公司 | Data extraction method, device and computer readable storage medium |
CN110659639A (en) * | 2019-09-24 | 2020-01-07 | 北京字节跳动网络技术有限公司 | Chinese character recognition method and device, computer readable medium and electronic equipment |
CN111126370A (en) * | 2018-10-31 | 2020-05-08 | 上海迈弦网络科技有限公司 | OCR recognition result-based longest common substring automatic error correction method and system |
CN111178049A (en) * | 2019-12-09 | 2020-05-19 | 天津幸福生命科技有限公司 | Text correction method and device, readable medium and electronic equipment |
CN112927087A (en) * | 2021-02-03 | 2021-06-08 | 泛华普益基金销售有限公司 | Financing information processing system, financing information processing method, computer device, and storage medium |
CN113595717A (en) * | 2020-04-30 | 2021-11-02 | 比亚迪股份有限公司 | ECB mode block encryption method, ECB mode block decryption method, ECB mode block encryption control device, ECB mode block decryption control device and vehicle |
CN116992496A (en) * | 2023-09-28 | 2023-11-03 | 武汉彤新科技有限公司 | Data resource safety supervision system for enterprise service management |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103000176A (en) * | 2012-12-28 | 2013-03-27 | 安徽科大讯飞信息科技股份有限公司 | Speech recognition method and system |
JP2014149612A (en) * | 2013-01-31 | 2014-08-21 | Nippon Hoso Kyokai <Nhk> | Voice recognition error correction device and its program |
CN105374356A (en) * | 2014-08-29 | 2016-03-02 | 株式会社理光 | Speech recognition method, speech assessment method, speech recognition system, and speech assessment system |
CN106098060A (en) * | 2016-05-19 | 2016-11-09 | 北京搜狗科技发展有限公司 | The correction processing method of voice and device, the device of correction process for voice |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344831A (en) * | 2018-08-22 | 2019-02-15 | 中国平安人寿保险股份有限公司 | A kind of tables of data recognition methods, device and terminal device |
CN109344831B (en) * | 2018-08-22 | 2024-04-05 | 中国平安人寿保险股份有限公司 | Data table identification method and device and terminal equipment |
CN109344730A (en) * | 2018-09-06 | 2019-02-15 | 康美健康云服务有限公司 | Data extraction method, device and computer readable storage medium |
CN109214387A (en) * | 2018-09-14 | 2019-01-15 | 辽宁奇辉电子系统工程有限公司 | A kind of railway operation detection system based on character recognition technology |
CN111126370A (en) * | 2018-10-31 | 2020-05-08 | 上海迈弦网络科技有限公司 | OCR recognition result-based longest common substring automatic error correction method and system |
CN110659639A (en) * | 2019-09-24 | 2020-01-07 | 北京字节跳动网络技术有限公司 | Chinese character recognition method and device, computer readable medium and electronic equipment |
CN110659639B (en) * | 2019-09-24 | 2021-11-05 | 北京字节跳动网络技术有限公司 | Chinese character recognition method and device, computer readable medium and electronic equipment |
CN111178049A (en) * | 2019-12-09 | 2020-05-19 | 天津幸福生命科技有限公司 | Text correction method and device, readable medium and electronic equipment |
CN111178049B (en) * | 2019-12-09 | 2023-12-12 | 北京懿医云科技有限公司 | Text correction method and device, readable medium and electronic equipment |
CN113595717B (en) * | 2020-04-30 | 2023-10-17 | 比亚迪股份有限公司 | ECB mode packet encryption method and decryption method, control device and vehicle |
CN113595717A (en) * | 2020-04-30 | 2021-11-02 | 比亚迪股份有限公司 | ECB mode block encryption method, ECB mode block decryption method, ECB mode block encryption control device, ECB mode block decryption control device and vehicle |
CN112927087A (en) * | 2021-02-03 | 2021-06-08 | 泛华普益基金销售有限公司 | Financing information processing system, financing information processing method, computer device, and storage medium |
CN116992496A (en) * | 2023-09-28 | 2023-11-03 | 武汉彤新科技有限公司 | Data resource safety supervision system for enterprise service management |
CN116992496B (en) * | 2023-09-28 | 2023-12-29 | 武汉彤新科技有限公司 | Data resource safety supervision system for enterprise service management |
Also Published As
Publication number | Publication date |
---|---|
CN108345581B (en) | 2022-10-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108345581A (en) | A kind of information identifying method, device and terminal device | |
CN107102746B (en) | Candidate word generation method and device and candidate word generation device | |
WO2021128880A1 (en) | Speech recognition method, device, and device for speech recognition | |
US20210224592A1 (en) | Method and device for training image recognition model, and storage medium | |
CN107992812A (en) | A kind of lip reading recognition methods and device | |
CN109446961B (en) | Gesture detection method, device, equipment and storage medium | |
CN108008832A (en) | A kind of input method and device, a kind of device for being used to input | |
CN108121736A (en) | A kind of descriptor determines the method for building up, device and electronic equipment of model | |
CN107221330A (en) | Punctuate adding method and device, the device added for punctuate | |
CN107564526B (en) | Processing method, apparatus and machine-readable medium | |
CN108922531B (en) | Slot position identification method and device, electronic equipment and storage medium | |
CN108509412A (en) | A kind of data processing method, device, electronic equipment and storage medium | |
US11335348B2 (en) | Input method, device, apparatus, and storage medium | |
KR20210032875A (en) | Voice information processing method, apparatus, program and storage medium | |
EP3734472A1 (en) | Method and device for text processing | |
CN112735396A (en) | Speech recognition error correction method, device and storage medium | |
CN110069143A (en) | A kind of information is anti-error to entangle method, apparatus and electronic equipment | |
CN112036174B (en) | Punctuation marking method and device | |
CN113936697B (en) | Voice processing method and device for voice processing | |
CN111797746B (en) | Face recognition method, device and computer readable storage medium | |
CN114154485A (en) | Text error correction method and device | |
CN111816174B (en) | Speech recognition method, device and computer readable storage medium | |
CN108182002A (en) | Layout method, device, equipment and the storage medium of enter key | |
CN110858099B (en) | Candidate word generation method and device | |
CN116860913A (en) | Voice interaction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||