CN104699675B - Method and apparatus for translating information - Google Patents
- Publication number: CN104699675B (also listed as CN201510119654.0A)
- Authority: CN (China)
- Prior art keywords: emoticon, information, mark, word, replaced
- Legal status: Active (as listed by Google, which notes that the status is an assumption, not a legal conclusion, and that no legal analysis has been performed)
Abstract
The invention discloses a method and apparatus for translating information, in the field of natural language processing. The method includes: obtaining an emoticon contained in first information in a source-language format; replacing, in the first information, the emoticon with a first identifier that identifies the emoticon, to obtain second information; translating the second information into third information in a target-language format; extracting, from the third information, a second identifier corresponding to the first identifier; and replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information. The apparatus includes a first acquisition module, a first replacement module, a translation module, a first extraction module and a second replacement module. The invention translates emoticons with high accuracy without being limited by an emoticon library or a translation dictionary, reduces the cost of constructing translation dictionaries, translation rules, translation models and language models that include emoticons, and solves the problem of recognizing, translating and generating emoticons that are not registered in an emoticon dictionary.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a method and apparatus for translating information.
Background art
Currently, with the development of computer networks and communication technology and the increasing popularity of mobile terminals, e-mail, short messages and social media such as Facebook, QQ, WeChat and microblogs have penetrated more and more deeply into people's daily work and life. In everyday communication, short text messages appear in large numbers, and their text is often interspersed with emoticons composed of strings of multiple symbols.
On the other hand, the development of Internet and communication technology keeps expanding the space in which people communicate, and communication is becoming increasingly transnational. Translating information is an important means of cross-cultural communication. In particular, when users with a poor command of a foreign language see information written in that language, they typically use machine translation to translate it into the target language. Such information may contain many emoticons, and machine translation usually translates emoticons into the target language using a translation dictionary that contains emoticons and their corresponding words in the target language.
In the course of making the present invention, the inventors found that the prior art has at least the following problem:
Because emoticons change continually, constructing a translation dictionary is time-consuming and costly, and when an emoticon in the information is not in the translation dictionary, the translation model or the translation examples, that emoticon cannot be translated.
Summary of the invention
To solve the problems of the prior art, the invention provides a method for translating information. The technical solution is as follows:
In one aspect, the invention provides a method for translating information, the method including:
Obtaining an emoticon contained in first information in a source-language format;
Replacing, in the first information, the emoticon with a first identifier that identifies the emoticon, to obtain second information;
Translating the second information into third information in a target-language format;
Extracting, from the third information, a second identifier corresponding to the first identifier;
Replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information.
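The five steps above amount to masking emoticons before machine translation and unmasking them afterwards. The Python sketch below illustrates that flow under stated assumptions: emoticons are single Unicode emoji characters, the first identifier (the "first mark") is a hypothetical token of the form `X0`, `X1`, ..., and `toy_translate` is a hypothetical word-for-word stand-in for a real translation engine. It is an illustration, not the patent's implementation.

```python
import re
from typing import Callable, Dict, Tuple

# Assumption: emoticons are single characters from common Unicode emoji ranges.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Hypothetical identifier format: "X" followed by a sequence number.
IDENT_RE = re.compile(r"X\d+")


def protect(first_info: str) -> Tuple[str, Dict[str, str]]:
    """Obtain the emoticons and swap each for a first identifier (second information)."""
    table: Dict[str, str] = {}

    def _swap(match):
        ident = f"X{len(table)}"
        table[ident] = match.group(0)
        return ident

    return EMOJI_RE.sub(_swap, first_info), table


def restore(third_info: str, table: Dict[str, str]) -> str:
    """Find identifiers in the translation and put the emoticons back (fourth information)."""
    return IDENT_RE.sub(lambda m: table.get(m.group(0), m.group(0)), third_info)


def translate_with_emoticons(first_info: str, translate: Callable[[str], str]) -> str:
    second_info, table = protect(first_info)
    third_info = translate(second_info)  # any MT engine that leaves identifiers intact
    return restore(third_info, table)


def toy_translate(text: str) -> str:
    # Hypothetical stand-in for an MT engine: word-for-word English -> German.
    lexicon = {"for": "für", "you": "dich", "and": "und"}
    return " ".join(lexicon.get(word, word) for word in text.split(" "))


print(translate_with_emoticons("🌹 and 🍭 for you 😄", toy_translate))
# prints: 🌹 und 🍭 für dich 😄
```

The approach depends on the translation engine passing the identifier through unchanged; an engine that transliterates, splits or drops the token would break the final replacement.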
Further, the first identifier is a temporary variable, and the temporary variable has the same form in every language format;
Replacing, in the first information, the emoticon with the first identifier that identifies the emoticon, to obtain the second information, includes:
Allocating a temporary number to the emoticon;
Replacing the emoticon in the first information with the temporary number of the emoticon, to obtain fifth information;
Allocating a temporary variable to the emoticon according to the position of the emoticon in the first information;
Associating the temporary variable of the emoticon with its temporary number;
Replacing, in the fifth information, the temporary number of the emoticon with the temporary variable associated with that temporary number, to obtain the second information.
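The two-stage replacement (emoticon to temporary number, then temporary number to temporary variable) can be sketched as follows. This is a minimal Python illustration, assuming single-character Unicode emoji, the `tempNNN` number format used in Embodiment 2, and a hypothetical 26-letter variable pool that starts with X, Y, Z as in that embodiment's example. It returns both association tables so that the mappings can be reversed after translation.

```python
import re
from typing import Dict, Tuple

# Assumption: emoticons are single Unicode emoji characters.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Hypothetical variable pool: all 26 letters, ordered to start with X, Y, Z.
VARIABLE_POOL = "XYZABCDEFGHIJKLMNOPQRSTUVW"


def to_second_info(first_info: str) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (second information, temp number -> emoticon, variable -> temp number)."""
    numbers: Dict[str, str] = {}

    def _assign_number(match):
        # Temporary numbers follow the order in which the emoticons appear.
        number = f"temp{len(numbers) + 1:03d}"
        numbers[number] = match.group(0)
        return number

    fifth_info = EMOJI_RE.sub(_assign_number, first_info)
    if len(numbers) > len(VARIABLE_POOL):
        raise ValueError("more emoticons than single-letter variables")

    # One variable per emoticon, by position, associated with its temporary number.
    variables = {VARIABLE_POOL[pos]: number for pos, number in enumerate(numbers)}
    number_to_variable = {number: var for var, number in variables.items()}
    second_info = re.sub(
        r"temp\d+",
        lambda m: number_to_variable.get(m.group(0), m.group(0)),
        fifth_info,
    )
    return second_info, numbers, variables


second, numbers, variables = to_second_info("Valentine's Day, for you 🌹 and 🍭! 😄!")
print(second)     # Valentine's Day, for you X and Y! Z!
print(variables)  # {'X': 'temp001', 'Y': 'temp002', 'Z': 'temp003'}
```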
Further, the second identifier is a temporary variable, and extracting, from the third information, the second identifier corresponding to the first identifier includes:
Extracting, from the third information, the temporary variables contained in the third information;
Correspondingly, replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain the fourth information, includes:
Obtaining the temporary number associated with the temporary variable;
Replacing, in the third information, the temporary variable with the temporary number associated with it, to obtain sixth information;
Obtaining the emoticon corresponding to the temporary number;
Replacing, in the sixth information, the temporary number with the emoticon corresponding to it, to obtain the fourth information.
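The reverse path (temporary variable to temporary number to emoticon) might look like the sketch below. It assumes the two association tables produced during masking are still available, and uses a boundary rule of our own choosing: a variable is only recognized when it is not adjacent to other ASCII letters or digits. That rule is an assumption; it lets `X` be found inside Japanese text such as "あなたにXとY" while skipping the "X" in a word like "Xylophone".

```python
import re
from typing import Dict


def restore_variables(third_info: str, variables: Dict[str, str], numbers: Dict[str, str]) -> str:
    """variables: temp variable -> temp number; numbers: temp number -> emoticon."""
    if not variables:
        return third_info
    # Longest names first, so that e.g. "X10" is preferred over "X1".
    names = "|".join(map(re.escape, sorted(variables, key=len, reverse=True)))
    var_re = re.compile(rf"(?<![A-Za-z0-9])(?:{names})(?![A-Za-z0-9])")
    # Variable -> associated temporary number gives the sixth information.
    sixth_info = var_re.sub(lambda m: variables[m.group(0)], third_info)
    # Temporary number -> emoticon gives the fourth information.
    return re.sub(r"temp\d+", lambda m: numbers.get(m.group(0), m.group(0)), sixth_info)


variables = {"X": "temp001", "Y": "temp002", "Z": "temp003"}
numbers = {"temp001": "🌹", "temp002": "🍭", "temp003": "😄"}
print(restore_variables("バレンタインデーは、あなたにXとYを与える!Z!", variables, numbers))
# prints: バレンタインデーは、あなたに🌹と🍭を与える!😄!
```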
Further, the first identifier is a word corresponding to the emoticon, and the word is in the source-language format;
Replacing, in the first information, the emoticon with the first identifier that identifies the emoticon, to obtain the second information, includes:
Obtaining attribute information of the emoticon according to the emoticon;
Obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
Replacing the emoticon in the first information with each of the at least one word respectively, to obtain second information corresponding to each word.
Further, obtaining the attribute information of the emoticon according to the emoticon includes:
Obtaining the index number of the emoticon from a correspondence between icon data and index numbers, according to the icon data of the emoticon;
Obtaining the attribute information of the emoticon from a correspondence between index numbers and source-language attribute information, according to the index number of the emoticon.
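The two-table lookup (icon data to index number to source-language attribute information) can be sketched as below. The table contents, the choice of English as the source language, and the use of emoji strings in place of real icon data (which would typically be image bytes or an image ID) are all illustrative assumptions.

```python
from typing import Dict, Optional

# Hypothetical tables; emoji strings stand in for real icon data.
ICON_TO_INDEX: Dict[str, int] = {"🌹": 1, "🍭": 2, "😄": 3}
# Index number -> attribute information in the source language (English assumed).
INDEX_TO_SOURCE_ATTR: Dict[int, str] = {1: "rose", 2: "lollipop", 3: "giggle"}


def source_attribute(icon: str) -> Optional[str]:
    """Icon data -> index number -> source-language attribute information."""
    index = ICON_TO_INDEX.get(icon)
    if index is None:
        return None  # unknown icon: no attribute information available
    return INDEX_TO_SOURCE_ATTR.get(index)


print(source_attribute("🍭"))  # lollipop
```

Using the index number as the pivot lets a single icon table serve several per-language attribute tables, which the later variants rely on.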
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
Calculating the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary storing correspondences between attribute information and words;
Obtaining, from the semantic dictionary, at least one piece of attribute information whose similarity to the attribute information of the emoticon satisfies a preset condition;
Obtaining, from the semantic dictionary, the word corresponding to each of the at least one piece of attribute information.
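The patent specifies neither the similarity measure nor the preset condition. The sketch below substitutes Python's `difflib.SequenceMatcher.ratio()` (a character-level score in [0, 1]) and a threshold of 0.8 purely as assumptions, over a small hypothetical semantic dictionary.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical semantic dictionary: attribute information -> words.
SEMANTIC_DICT: Dict[str, List[str]] = {
    "rose": ["rose"],
    "red rose": ["rose", "flower"],
    "lollipop": ["lollipop", "candy"],
}


def words_for_attribute(attribute: str, threshold: float = 0.8) -> List[str]:
    """Collect the words of every dictionary entry similar enough to `attribute`."""
    words: List[str] = []
    for entry, entry_words in SEMANTIC_DICT.items():
        # Assumption: character-level similarity stands in for the unspecified
        # measure, and ">= threshold" stands in for the preset condition.
        if SequenceMatcher(None, attribute, entry).ratio() >= threshold:
            for word in entry_words:
                if word not in words:
                    words.append(word)
    return words


print(words_for_attribute("rose"))                 # ['rose']
print(words_for_attribute("rose", threshold=0.6))  # ['rose', 'flower']
```

A practical system would more likely compare semantic representations than raw strings; the threshold trades recall against spurious candidate words.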
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
Extracting the word corresponding to the emoticon from the attribute information of the emoticon;
Obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
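Combining this variant with the earlier step that produces one second information per word gives the sketch below. The "word: description" layout of the attribute information and the synonym table are hypothetical; a thesaurus could supply the synonyms in practice.

```python
from typing import Dict, List

# Hypothetical synonym / near-synonym table.
SYNONYMS: Dict[str, List[str]] = {"giggle": ["titter", "snicker"]}


def candidate_words(attribute: str) -> List[str]:
    # Assumption: attribute information looks like "word: description",
    # so the emoticon's word is the text before the first colon.
    head = attribute.split(":", 1)[0].strip()
    return [head] + [s for s in SYNONYMS.get(head, []) if s != head]


def second_infos(first_info: str, emoticon: str, attribute: str) -> List[str]:
    """One second information per candidate word."""
    return [first_info.replace(emoticon, word) for word in candidate_words(attribute)]


for info in second_infos("Happy Valentine's Day! 😄", "😄", "giggle: laughing face, hand over mouth"):
    print(info)
```

Each resulting string can then be translated separately, which is why the method produces several second information candidates rather than one.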
Further, extracting, from the third information, the second identifier corresponding to the first identifier includes:
Extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain the fourth information, includes:
Obtaining, from a correspondence between target-language attribute information and index numbers, the entry that contains the second identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from a correspondence between index numbers and icon data, according to the index number;
Replacing, in the third information, the second identifier with the obtained icon data of the emoticon, to obtain the fourth information.
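A sketch of this target-side restoration follows. It takes the second identifier (the translated word) as already identified, since the patent does not say how the word is located in the third information; it assumes German as the target language, exact-match lookup rather than "contains", and emoji strings standing in for icon data. All table contents are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical tables keyed by the same index numbers as the source-language table.
TARGET_ATTR_TO_INDEX: Dict[str, int] = {"Rose": 1, "Lutscher": 2, "Kichern": 3}  # German assumed
INDEX_TO_ICON: Dict[int, str] = {1: "🌹", 2: "🍭", 3: "😄"}  # emoji stand in for icon data


def icon_for_target_word(second_ident: str) -> Optional[str]:
    """Target-language word -> index number -> icon data (None if unknown)."""
    index = TARGET_ATTR_TO_INDEX.get(second_ident)
    return INDEX_TO_ICON.get(index) if index is not None else None


def restore_by_target_word(third_info: str, second_ident: str) -> str:
    icon = icon_for_target_word(second_ident)
    # If no entry matches the word, keep the translated word as it is.
    return third_info.replace(second_ident, icon) if icon else third_info


print(restore_by_target_word("Zum Valentinstag eine Rose für dich!", "Rose"))
# prints: Zum Valentinstag eine 🌹 für dich!
```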
Further, extracting, from the third information, the second identifier corresponding to the first identifier includes:
Extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain the fourth information, includes:
Obtaining the first identifier corresponding to the second identifier;
Obtaining, from a correspondence between source-language attribute information and index numbers, the entry that contains the first identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from a correspondence between index numbers and icon data, according to the index number;
Replacing, in the third information, the second identifier with the icon data of the emoticon, to obtain the fourth information.
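This variant maps the translated word back to its source word before the lookup. In the sketch below, the `alignment` dictionary (target word to source word) is a hypothetical record that the translation step would have to keep; the tables, English source language, German target language, and emoji stand-ins for icon data are likewise assumptions.

```python
from typing import Dict, Optional

SOURCE_ATTR_TO_INDEX: Dict[str, int] = {"rose": 1, "lollipop": 2, "giggle": 3}  # English assumed
INDEX_TO_ICON: Dict[int, str] = {1: "🌹", 2: "🍭", 3: "😄"}  # emoji stand in for icon data


def restore_by_source_word(third_info: str, second_ident: str, alignment: Dict[str, str]) -> str:
    """alignment: hypothetical map from target words back to source words (first identifiers)."""
    first_ident: Optional[str] = alignment.get(second_ident)
    index = SOURCE_ATTR_TO_INDEX.get(first_ident) if first_ident is not None else None
    icon = INDEX_TO_ICON.get(index) if index is not None else None
    return third_info.replace(second_ident, icon) if icon else third_info


print(restore_by_source_word("Zum Valentinstag eine Rose für dich!", "Rose", {"Rose": "rose"}))
# prints: Zum Valentinstag eine 🌹 für dich!
```

Compared with the target-side variant, this needs only the source-language attribute table, at the cost of keeping track of which target word came from which source word.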
In another aspect, the invention provides an apparatus for translating information, the apparatus including:
A first acquisition module, configured to obtain an emoticon contained in first information in a source-language format;
A first replacement module, configured to replace, in the first information, the emoticon with a first identifier that identifies the emoticon, to obtain second information;
A translation module, configured to translate the second information into third information in a target-language format;
A first extraction module, configured to extract, from the third information, a second identifier corresponding to the first identifier;
A second replacement module, configured to replace, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information.
In the embodiments of the present invention, an emoticon contained in first information in a source-language format is obtained; the emoticon is replaced in the first information with a first identifier that identifies it, yielding second information; the second information is translated into third information in a target-language format; a second identifier corresponding to the first identifier is extracted from the third information; and the second identifier is replaced in the third information with the corresponding emoticon, yielding fourth information. Translation is therefore not limited by an emoticon library or a translation dictionary: emoticons can be translated with high accuracy, and the cost of constructing translation dictionaries, translation rules or translation models, and language models that include emoticons is reduced. The problem of recognizing, translating and generating emoticons that are not registered in an emoticon dictionary is also effectively solved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for translating information provided by Embodiment 1 of the present invention;
Fig. 2-1 is a flowchart of a method for translating information provided by Embodiment 2 of the present invention;
Fig. 2-2 is a schematic diagram of first information provided by Embodiment 2 of the present invention;
Fig. 2-3 is a schematic diagram of a segmentation analysis of second information provided by Embodiment 2 of the present invention;
Fig. 2-4 is another schematic diagram of a segmentation analysis of second information provided by Embodiment 2 of the present invention;
Fig. 2-5 is a schematic diagram of fourth information provided by Embodiment 2 of the present invention;
Fig. 2-6 is a schematic diagram of another piece of fourth information provided by Embodiment 2 of the present invention;
Fig. 3-1 is a flowchart of a method for translating information provided by Embodiment 3 of the present invention;
Fig. 3-2 is a schematic diagram of first information provided by Embodiment 3 of the present invention;
Fig. 3-3 is a schematic diagram of fourth information provided by Embodiment 3 of the present invention;
Fig. 3-4 is a schematic diagram of another piece of fourth information provided by Embodiment 3 of the present invention;
Fig. 3-5 is a schematic diagram of another piece of fourth information provided by Embodiment 3 of the present invention;
Fig. 3-6 is a schematic diagram of another piece of fourth information provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for translating information provided by Embodiment 4 of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
An embodiment of the present invention provides a method for translating information. The method is executed by a terminal and may be implemented, as part or all of the terminal, in software, hardware, or a combination of both; the terminal need not have any symbol database. The terminal includes a mobile terminal, a fixed terminal, a server, or the like.
Referring to Fig. 1, the method includes:
Step 101: Obtain an emoticon contained in first information in a source-language format;
Step 102: Replace, in the first information, the emoticon with a first identifier that identifies the emoticon, to obtain second information;
Step 103: Translate the second information into third information in a target-language format;
Step 104: Extract, from the third information, a second identifier corresponding to the first identifier;
Step 105: Replace, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain fourth information.
Further, the first identifier is a temporary variable, and the temporary variable has the same form in every language format;
Replacing, in the first information, the emoticon with the first identifier that identifies the emoticon, to obtain the second information, includes:
Allocating a temporary number to the emoticon;
Replacing the emoticon in the first information with the temporary number of the emoticon, to obtain fifth information;
Allocating a temporary variable to the emoticon according to the position of the emoticon in the first information;
Associating the temporary variable of the emoticon with its temporary number;
Replacing, in the fifth information, the temporary number of the emoticon with the temporary variable associated with that temporary number, to obtain the second information.
Further, the second identifier is a temporary variable, and extracting, from the third information, the second identifier corresponding to the first identifier includes:
Extracting, from the third information, the temporary variables contained in the third information;
Correspondingly, replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain the fourth information, includes:
Obtaining the temporary number associated with the temporary variable;
Replacing, in the third information, the temporary variable with the temporary number associated with it, to obtain sixth information;
Obtaining the emoticon corresponding to the temporary number;
Replacing, in the sixth information, the temporary number with the emoticon corresponding to it, to obtain the fourth information.
Further, the first identifier is a word corresponding to the emoticon, and the word is in the source-language format;
Replacing, in the first information, the emoticon with the first identifier that identifies the emoticon, to obtain the second information, includes:
Obtaining attribute information of the emoticon according to the emoticon;
Obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
Replacing the emoticon in the first information with each of the at least one word respectively, to obtain second information corresponding to each word.
Further, obtaining the attribute information of the emoticon according to the emoticon includes:
Obtaining the index number of the emoticon from the correspondence between icon data and index numbers, according to the icon data of the emoticon;
Obtaining the attribute information of the emoticon from the correspondence between index numbers and source-language attribute information, according to the index number of the emoticon.
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
Calculating the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary storing correspondences between attribute information and words;
Obtaining, from the semantic dictionary, at least one piece of attribute information whose similarity to the attribute information of the emoticon satisfies a preset condition;
Obtaining, from the semantic dictionary, the word corresponding to each of the at least one piece of attribute information.
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
Extracting the word corresponding to the emoticon from the attribute information of the emoticon;
Obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
Further, extracting, from the third information, the second identifier corresponding to the first identifier includes:
Extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain the fourth information, includes:
Obtaining, from the correspondence between target-language attribute information and index numbers, the entry that contains the second identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Replacing, in the third information, the second identifier with the obtained icon data of the emoticon, to obtain the fourth information.
Further, extracting, from the third information, the second identifier corresponding to the first identifier includes:
Extracting, from the third information, the target-language word corresponding to the first identifier, and using the extracted word as the second identifier;
Correspondingly, replacing, in the third information, the second identifier with the emoticon corresponding to the second identifier, to obtain the fourth information, includes:
Obtaining the first identifier corresponding to the second identifier;
Obtaining, from the correspondence between source-language attribute information and index numbers, the entry that contains the first identifier;
Extracting the index number contained in the obtained entry;
Obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
Replacing, in the third information, the second identifier with the icon data of the emoticon, to obtain the fourth information.
In this embodiment of the present invention, an emoticon contained in first information in a source-language format is obtained; the emoticon is replaced in the first information with a first identifier that identifies it, yielding second information; the second information is translated into third information in a target-language format; a second identifier corresponding to the first identifier is extracted from the third information; and the second identifier is replaced in the third information with the corresponding emoticon, yielding fourth information. Translation is therefore not limited by an emoticon library or a translation dictionary: emoticons can be translated with high accuracy, and the cost of constructing translation dictionaries, translation rules or translation models, and language models that include emoticons is reduced. The problem of recognizing, translating and generating emoticons that are not registered in an emoticon dictionary is also effectively solved.
Embodiment 2
An embodiment of the present invention provides a method for translating information. The method is executed by a terminal and may be implemented, as part or all of the terminal, in software, hardware, or a combination of both; the terminal need not have any symbol database. The terminal may be a mobile terminal, a fixed terminal, a server, or the like. This embodiment applies to scenarios in which the user's input device does not have any emoticon database.
This embodiment is described using the example in which the first identifier and the second identifier are both temporary variables of the emoticon.
Referring to Fig. 2-1, the method includes:
Step 201: Obtain an emoticon contained in first information in a source-language format;
Step 201 may be implemented by the following steps (1) and (2):
(1): The terminal obtains the first information, in the source-language format, entered by the user;
The user enters first information in the source-language format into the terminal, and the terminal obtains it. The first information includes at least one sentence to be translated.
In this embodiment, the user may enter the first information manually or by copying and pasting it.
Manual input may be one or more of document input, voice input, keyboard input, touch input, handwriting input, and optical character recognition (OCR) input.
The source language may be any language and is not specifically limited in this embodiment. For example, take Chinese as the source language and the first information "Valentine's Day, for you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!", as shown in Fig. 2-2. In this step, the terminal obtains this first information as entered by the user.
(2): Detect whether the first information contains an emoticon; if it does, obtain the emoticons contained in the first information.
An emoticon is a symbol with a certain meaning formed by one or more of letters, digits, punctuation marks, pinyin, kana, fonts, categories, images and the like.
Because emoticons and text use different encoding formats, detecting whether the first information contains an emoticon may proceed as follows:
Determine whether the first information contains content whose encoding format is a preset encoding format. If it does, the first information contains an emoticon, and the content in the preset encoding format is the emoticon; if it does not, the first information contains no emoticon.
The preset encoding format may be, for example, the encoding format corresponding to a picture format or to a symbol string.
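A minimal sketch of the detection step, assuming that the "preset encoding format" is modelled as Unicode code-point ranges holding most pictographic emoji. The patent does not list concrete ranges; multi-character emoticons (emoji sequences joined by zero-width joiners, or symbol strings such as ":-)") would need additional patterns and are not handled here.

```python
from typing import List, Tuple

# Assumption: code-point ranges standing in for the "preset encoding format".
EMOJI_RANGES: List[Tuple[int, int]] = [
    (0x1F300, 0x1FAFF),  # Misc Symbols and Pictographs through Symbols and Pictographs Extended-A
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats
]


def is_emoticon(ch: str) -> bool:
    code_point = ord(ch)
    return any(lo <= code_point <= hi for lo, hi in EMOJI_RANGES)


def find_emoticons(first_info: str) -> List[str]:
    """Emoticons in order of appearance; an empty list means the text contains none."""
    return [ch for ch in first_info if is_emoticon(ch)]


print(find_emoticons("Valentine's Day, for you 🌹 and 🍭! 😄!"))  # ['🌹', '🍭', '😄']
```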
For example, it is determined that the first information "Valentine's Day, for you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!" contains emoticons, and the emoticons "rose emoticon", "lollipop emoticon" and "titter emoticon" contained in it are obtained.
Further, the terminal has a real-time saving function: when it obtains the emoticons contained in the first information, it stores the words and emoticons of the first information in a preset data structure, such as a hash array "$hash[key] = $value" or a linked list.
Step 202: Allocate a temporary number to each emoticon;
Temporary numbers are allocated to the emoticons according to their storage order in the first information. After a temporary number is allocated, the emoticon is associated with it, that is, the emoticon and its temporary number are stored in the correspondence between emoticons and temporary numbers.
For example, the storage order of "rose emoticon", "lollipop emoticon" and "titter emoticon" in the first information is 1, 2 and 3, so they are allocated the temporary numbers temp001, temp002 and temp003 respectively. "Rose emoticon" is then associated with temp001, "lollipop emoticon" with temp002, and "titter emoticon" with temp003, and these pairs are stored in the correspondence between emoticons and temporary numbers, as shown in Table 1 below:
Table 1
Emoticon | Temporary number |
Rose | temp001 |
Lollipop | temp002 |
Titter | temp003 |
... | ... |
Step 203: Replace each emoticon in the first information with its temporary number, to obtain fifth information;
For example, in the first information "Valentine's Day, for you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!", "rose emoticon" is replaced with its temporary number temp001, "lollipop emoticon" with temp002, and "titter emoticon" with temp003, giving the fifth information "Valentine's Day, for you temp001 and temp002! temp003!".
Step 204: Allocate a temporary variable to each emoticon according to its position in the first information;
For example, according to the positions of "rose emoticon", "lollipop emoticon" and "titter emoticon" in the first information, temporary variable X is allocated to "rose emoticon", temporary variable Y to "lollipop emoticon", and temporary variable Z to "titter emoticon".
In this embodiment, temporary variables may also be allocated according to the position and/or category of the emoticon in the first information; the temporary variable has the same form in every language format.
If the first information contains many emoticons, combinations of letters and digits can be used as temporary variables, so that each emoticon in the first information is allocated a unique temporary variable. For example, if temporary variables are English letters, of which there are only 26, then when the first information contains more than 26 emoticons, digits can be added and letter-digit combinations used as the temporary variables of the emoticons, for example X0, X1, X2 ... Xn and Y1, Y2 ....
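The text gives X0, X1, ... and Y1, Y2, ... as examples but does not fix an allocation order. The sketch below is one hypothetical scheme: single letters (starting with X, Y, Z as in the example) for the first 26 emoticons, then letter-plus-number combinations, which keeps every variable unique.

```python
from typing import List

# Hypothetical ordering: X, Y, Z first (as in the example), then the rest of A-Z.
LETTERS = "XYZABCDEFGHIJKLMNOPQRSTUVW"


def allocate_variables(count: int) -> List[str]:
    """Give each of `count` emoticons, by position, a unique temporary variable."""
    names: List[str] = []
    for pos in range(count):
        if pos < len(LETTERS):
            names.append(LETTERS[pos])  # single letters for the first 26 emoticons
        else:
            k = pos - len(LETTERS)
            # Letter + number once the letters run out: X0, Y0, ..., W0, X1, ...
            names.append(f"{LETTERS[k % len(LETTERS)]}{k // len(LETTERS)}")
    return names


print(allocate_variables(3))        # ['X', 'Y', 'Z']
print(allocate_variables(28)[26:])  # ['X0', 'Y0']
```

Single capital letters can collide with letters that occur in the text itself (for example the English word "A"), so a deployment might prefer tokens that are unlikely to appear in natural text.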
Step 205: Associate the temporary variable of each emoticon with its temporary number;
The temporary variable and the temporary number of the emoticon are stored in the correspondence between temporary variables and temporary numbers, thereby associating them.
For example, the temporary variable X and temporary number temp001 of "rose emoticon", the temporary variable Y and temporary number temp002 of "lollipop emoticon", and the temporary variable Z and temporary number temp003 of "titter emoticon" are stored in the correspondence between temporary variables and temporary numbers, as shown in Table 2 below:
Table 2
Temporary variable | Temporary number |
X | temp001 |
Y | temp002 |
Z | temp003 |
... | ... |
Step 206: In the fifth information, replace the temporary number of each emoticon with the temporary variable associated with that temporary number, to obtain second information;
According to the temporary number of the emoticon, the temporary variable associated with it is obtained from the correspondence between temporary numbers and temporary variables; then, in the fifth information, the temporary number of the emoticon is replaced with that temporary variable to obtain the second information.
For example, according to the temporary number temp001 of "rose emoticon", temp002 of "lollipop emoticon" and temp003 of "titter emoticon", the associated temporary variables X, Y and Z are obtained from Table 2. In the fifth information, temp001 is replaced with X, temp002 with Y, and temp003 with Z, giving the second information "Valentine's Day, for you X and Y! Z!".
Step 207:Second information is translated as to the 3rd information of object language form;
The target language can be any language and is not specifically limited in the embodiments of the present invention. Step 207 can be implemented by the following steps (1) and (2):
(1): Select at least one translation algorithm from a set of translation algorithms;
The set of translation algorithms includes a rule-based translation algorithm, an example-based translation algorithm, and a statistical translation algorithm.
Any one, any two, or all three of these translation algorithms may be selected from the set.
(2): Translate the second information into third information in the target language using the selected translation algorithm(s);
For example, suppose the target language is Japanese and the rule-based translation algorithm is selected. The second information "Valentine's Day, I give you X and Y! Z!" is then translated into the Japanese third information "バレンタインデーは、あなたにXとYを与える！Z！". As another example, suppose the target language is Japanese and the example-based translation algorithm is selected. The same second information is then translated into the Japanese third information "バレンタインデーは、あなたに贈り、XとY！Z！".
When the second information is translated into the third information with the selected translation algorithm, lexical analysis and/or syntactic analysis may be performed on the second information, depending on that algorithm. Three options are possible: lexical analysis only, syntactic analysis only, or lexical analysis followed by syntactic analysis. The lexical and syntactic analysis are not specifically limited in the embodiments of the present invention.
Rule-based, example-based, and syntax-based translation algorithms require both lexical analysis and syntactic analysis of the second information. Some other translation algorithms may perform only lexical analysis or only syntactic analysis.
Many tools are available for lexical and syntactic analysis.
- Word-segmentation and part-of-speech tagging tools include the Stanford POS Tagger (English, Chinese, Arabic), the ICTCLAS Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences, the THULAC segmentation system of Tsinghua University, and ChaSen, MeCab, and JUMAN for Japanese.
- Syntactic parsers include the Stanford Parser (English, Chinese, Arabic), the Chinese parser of Harbin Institute of Technology, and CaboCha and KNP for Japanese.

The lexical and syntactic analysis tools are not specifically limited in the embodiments of the present invention.
For example, the THULAC segmentation tool of Tsinghua University is used to segment the second information "Valentine's Day, I give you X and Y! Z!". The resulting segmentation is "Valentine's Day/t ,/w give/v you/r X/x and/c Y/x !/w Z/x !/w".
The part-of-speech tags used in the segmentation result are listed in Table 3:
Table 3
Tag | Part of speech | Tag | Part of speech | Tag | Part of speech |
n | Noun | s | Place word | r | Pronoun |
np | Person name | v | Verb | c | Conjunction |
ns | Place name | vm | Modal verb | p | Preposition |
ni | Organization name | vd | Directional verb | u | Particle |
nz | Other proper noun | a | Adjective | y | Modal particle |
m | Numeral | d | Adverb | e | Interjection |
q | Classifier (measure word) | h | Prefix component | o | Onomatopoeia |
mq | Numeral-classifier compound | k | Suffix component | g | Morpheme |
t | Time word | i | Idiom | w | Punctuation |
f | Locative word | j | Abbreviation | x | Other |
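The segmentation result above is a sequence of word/tag pairs that use the tags in Table 3. The following Python sketch parses such a string and picks out the tokens tagged x, which is the tag the placeholder variables receive in the example. It is a hedged illustration and not THULAC's own API: the "/" separator follows the example above, and the tool's real output format may be configured differently. The function name `parse_segmented` is hypothetical.

```python
import re

# One "word/tag" pair per token. A word may itself contain spaces (for example
# the English gloss "Valentine's Day"), so the pattern matches lazily up to the
# next "/tag" that is followed by whitespace or the end of the string.
TOKEN = re.compile(r"(\S.*?)/([a-z]+)(?=\s|$)")


def parse_segmented(text: str) -> list[tuple[str, str]]:
    """Split a 'word/tag word/tag ...' string into (word, tag) pairs."""
    return TOKEN.findall(text)


segmented = "Valentine's Day/t ,/w give/v you/r X/x and/c Y/x !/w Z/x !/w"
pairs = parse_segmented(segmented)
placeholders = [word for word, tag in pairs if tag == "x"]  # "x" = Other in Table 3
print(placeholders)  # ['X', 'Y', 'Z']
```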
For example, the Chinese parser of Harbin Institute of Technology is used to perform dependency parsing on the second information "Valentine's Day, I give you X and Y! Z!". The resulting dependency trees are shown in Fig. 2-3 and Fig. 2-4. Techniques for converting between Chinese dependency trees and phrase-structure trees are relatively mature. Statistical machine translation methods based on phrase-structure trees or dependency trees include tree-to-string, string-to-tree, tree-to-tree, and forest-to-string models. Combined with the present invention, the syntactic analysis results described above apply equally to these syntax-based statistical machine translation methods and to example-based machine translation methods.
Step 208: Extract the temporary variables contained in the third information;
For example, the temporary variables X, Y, and Z are extracted from the third information "バレンタインデーは、あなたにXとYを与える！Z！" or from "バレンタインデーは、あなたに贈り、XとY！Z！".
Step 209: Obtain the temporary number associated with each temporary variable;
Using each extracted temporary variable, the associated temporary number is obtained from the correspondence between temporary variables and temporary numbers.
For example, the temporary variables X, Y, and Z are looked up in the correspondence in Table 2. This gives temporary number temp001 for X, temp002 for Y, and temp003 for Z.
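Steps 208 and 209 can be sketched in Python as follows. The sketch assumes that the temporary variables are the ASCII names recorded in Table 2, and that a variable counts only when it is not glued to other Latin letters. Japanese text directly adjacent to a variable (as in "にX") is still accepted. The function names are hypothetical.

```python
import re

# Table 2, read in the direction Step 209 needs: temporary variable -> temporary number.
NUMBER_BY_VARIABLE = {"X": "temp001", "Y": "temp002", "Z": "temp003"}


def extract_variables(third_info: str, known_variables) -> list[str]:
    """Step 208: return the known temporary variables in order of appearance.

    The X inside a Latin word such as "eXample" is ignored, because the
    lookarounds reject matches that touch another ASCII letter.
    """
    names = sorted(known_variables, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in names)
    pattern = re.compile(rf"(?<![A-Za-z])(?:{alternation})(?![A-Za-z])")
    return pattern.findall(third_info)


def numbers_for(variables: list[str]) -> list[str]:
    """Step 209: look up the temporary number associated with each variable."""
    return [NUMBER_BY_VARIABLE[v] for v in variables]


third_info = "バレンタインデーは、あなたにXとYを与える！Z！"
variables = extract_variables(third_info, NUMBER_BY_VARIABLE)
print(variables, numbers_for(variables))
# ['X', 'Y', 'Z'] ['temp001', 'temp002', 'temp003']
```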
Step 210: In the third information, replace each temporary variable with its associated temporary number to obtain sixth information;
For example, take the third information "バレンタインデーは、あなたにXとYを与える！Z！". The temporary variable X is replaced with its associated temporary number temp001, Y with temp002, and Z with temp003. This gives the sixth information "バレンタインデーは、あなたにtemp001とtemp002を与える！temp003！". As another example, take the third information "バレンタインデーは、あなたに贈り、XとY！Z！". Replacing X with temp001, Y with temp002, and Z with temp003 gives the sixth information "バレンタインデーは、あなたに贈り、temp001とtemp002！temp003！".
Step 211: Obtain the emoticon corresponding to each temporary number;
Using each temporary number, the corresponding emoticon is obtained from the correspondence between temporary numbers and emoticons.
For example, the temporary numbers temp001, temp002, and temp003 are looked up in Table 1. This gives the "rose emoticon" for temp001, the "lollipop emoticon" for temp002, and the "titter emoticon" for temp003.
Step 212: In the sixth information, replace each temporary number with its corresponding emoticon to obtain the fourth information.
For example, in the sixth information "バレンタインデーは、あなたにtemp001とtemp002を与える！temp003！", temp001 is replaced with the "rose emoticon", temp002 with the "lollipop emoticon", and temp003 with the "titter emoticon". This gives the fourth information "バレンタインデーは、あなたに[rose emoticon]と[lollipop emoticon]を与える！[titter emoticon]！", as shown in Fig. 2-5. As another example, applying the same replacements to the sixth information "バレンタインデーは、あなたに贈り、temp001とtemp002！temp003！" gives the fourth information "バレンタインデーは、あなたに贈り、[rose emoticon]と[lollipop emoticon]！[titter emoticon]！", as shown in Fig. 2-6.
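Steps 210 to 212 amount to two chained substitutions: variable to temporary number, then temporary number to emoticon. The Python sketch below shows this under stated assumptions. The dictionaries stand in for Table 2 and Table 1, and the emoticons are represented by text labels such as "[rose emoticon]" rather than image data. The helper names are hypothetical. Each substitution is a single regex pass, so replacements are never rescanned.

```python
import re

NUMBER_BY_VARIABLE = {"X": "temp001", "Y": "temp002", "Z": "temp003"}  # Table 2
# Table 1 (temporary number -> emoticon); emoticons shown as text labels here.
EMOTICON_BY_NUMBER = {
    "temp001": "[rose emoticon]",
    "temp002": "[lollipop emoticon]",
    "temp003": "[titter emoticon]",
}


def substitute(text: str, mapping: dict[str, str], ascii_bounded: bool = False) -> str:
    """Replace every key of `mapping` in one left-to-right pass."""
    if not mapping:
        return text
    alternation = "|".join(re.escape(k) for k in sorted(mapping, key=len, reverse=True))
    if ascii_bounded:  # temporary variables must not be part of a Latin word
        alternation = rf"(?<![A-Za-z])(?:{alternation})(?![A-Za-z])"
    return re.sub(alternation, lambda m: mapping[m.group(0)], text)


def restore_emoticons(third_info: str) -> tuple[str, str]:
    sixth_info = substitute(third_info, NUMBER_BY_VARIABLE, ascii_bounded=True)  # Step 210
    fourth_info = substitute(sixth_info, EMOTICON_BY_NUMBER)  # Steps 211-212
    return sixth_info, fourth_info


sixth, fourth = restore_emoticons("バレンタインデーは、あなたにXとYを与える！Z！")
print(sixth)   # バレンタインデーは、あなたにtemp001とtemp002を与える！temp003！
print(fourth)  # バレンタインデーは、あなたに[rose emoticon]と[lollipop emoticon]を与える！[titter emoticon]！
```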
Compare the two fourth-information results, "バレンタインデーは、あなたに[rose emoticon]と[lollipop emoticon]を与える！[titter emoticon]！" and "バレンタインデーは、あなたに贈り、[rose emoticon]と[lollipop emoticon]！[titter emoticon]！". Different translation algorithms produce different translations, but this does not affect the effect of the present invention: the emoticons are restored correctly in both.
Further, once the temporary numbers in the sixth information have been replaced with their emoticons to obtain the fourth information, the fourth information is output. It may be output in one or more forms, such as text, image, speech, or emoticon. The output mode is not specifically limited in the embodiments of the present invention.
In this embodiment of the present invention, the method proceeds as follows:
- the emoticons contained in the first information in the source language are obtained;
- in the first information, each emoticon is replaced with a temporary variable identifying it, giving the second information;
- the second information is translated into the third information in the target language;
- the temporary variables are extracted from the third information;
- in the third information, each temporary variable is replaced with its corresponding emoticon, giving the fourth information.

The method is therefore not limited by an emoticon library or a translation dictionary and achieves high-accuracy translation of emoticons. It also reduces the cost of building emoticon-aware translation dictionaries, translation rules or translation models, language models, and the like. It handles the recognition, translation, and generation of emoticons that are not registered in the emoticon dictionary. It also determines where each emoticon belongs in the target-language sentence and generates it there: the position of each emoticon is correctly identified on the target-language side, which preserves the sentence structure and semantic integrity of the translation result. Moreover, the present invention is not limited by language and can handle the recognition, translation, and generation of emoticons in any language.
Embodiment 3
An embodiment of the present invention provides a method of translating information, executed by a terminal. The method may be implemented as part or all of the terminal through software, hardware, or a combination of both. The terminal may be a mobile terminal, a fixed terminal, a server, or the like. This embodiment applies, for example, when the user's input device has an emoticon database, a synonym dictionary, and similar resources.
In this embodiment, the first identifier is the word corresponding to the emoticon in the source language, and the second identifier is the word corresponding to the emoticon in the target language. Referring to Fig. 3-1, the method includes:
Step 301: Obtain the emoticon contained in the first information in the source language;
Step 301 can be implemented by the following steps (1) and (2):
(1): The terminal obtains the first information in the source language entered by the user;
The user enters the first information in the source language into the terminal, and the terminal obtains it. The first information includes at least one sentence to be translated.
In this embodiment, the user may enter the first information into the terminal manually or by copying and pasting.
Manual input may be one or more of document input, voice input, keyboard input, touch input, handwriting input, and optical character recognition (OCR) input.
The source language can be any language and is not specifically limited in the embodiments of the present invention. For example, take Chinese as the source language and "He feels very [happiness emoticon]" as the first information, as shown in Fig. 3-2. In this step, the terminal obtains the "He feels very [happiness emoticon]" entered by the user.
(2): Detect whether the first information contains an emoticon; if it does, obtain the emoticon contained in the first information.
An emoticon may be a symbol with a particular meaning, formed from one or more of the following: letters, digits, punctuation, pinyin, kana, glyphs, categories, images, and the like.
Because emoticons and text use different encoding formats, detecting whether the first information contains an emoticon may proceed as follows:
Determine whether the first information contains content whose encoding format is a preset encoding format. If it does, the first information contains an emoticon, and the content in the preset encoding format is that emoticon. If it does not, the first information contains no emoticon.
The preset encoding format may be, for example, the encoding format corresponding to a picture format or to a symbol string.
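The detection rule can be sketched in Python under one explicit assumption: the "preset encoding format" is modelled as a set of Unicode code-point ranges commonly used for emoji. The ranges below are illustrative, not exhaustive. Multi-code-point emoji (skin-tone modifiers, ZWJ sequences) and picture-format emoticons would need separate handling. The function names are hypothetical.

```python
# Assumed "preset encoding format": these Unicode ranges (illustrative, not exhaustive).
EMOTICON_RANGES = [
    (0x1F300, 0x1FAFF),  # Misc Symbols and Pictographs ... Symbols and Pictographs Extended-A
    (0x2600, 0x27BF),    # Miscellaneous Symbols and Dingbats
]


def is_emoticon(ch: str) -> bool:
    """True if the single character falls inside one of the assumed ranges."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in EMOTICON_RANGES)


def find_emoticons(first_info: str) -> list[str]:
    """Return the emoticons found in the text, in order (empty if there are none)."""
    return [ch for ch in first_info if is_emoticon(ch)]


print(find_emoticons("He feels very \U0001F60A"))  # ['😊']
print(find_emoticons("He feels very happy"))       # []
```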
For example, the first information "He feels very [happiness emoticon]" entered by the user is found to contain an emoticon, and the emoticon "happiness emoticon" is obtained from it.
Further, the terminal can save data in real time. When the emoticons contained in the first information are obtained, the text and emoticons of the first information are stored in a preset data structure, such as a hash array "$hash[key] = $value" or a linked list.
Further, after the emoticons contained in the first information are obtained, a temporary number is assigned to each emoticon. In the first information, each emoticon is then replaced with its temporary number.
For example, the temporary number temp001 is assigned to the "happiness emoticon". In the first information "He feels very [happiness emoticon]", the "happiness emoticon" is replaced with temp001, which gives "He feels very temp001".
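Number assignment and replacement can be sketched in Python as below. A Python dict plays the role of the hash array "$hash[key] = $value". The sketch assumes that an emoticon is a single code point in U+1F300 to U+1FAFF and that numbers follow the temp001, temp002, ... pattern used in the examples. The function names are hypothetical.

```python
def is_emoticon(ch: str) -> bool:
    # Assumption for this sketch: emoticons are single code points in U+1F300..U+1FAFF.
    return 0x1F300 <= ord(ch) <= 0x1FAFF


def mask_emoticons(first_info: str) -> tuple[str, dict[str, str]]:
    """Assign temp001, temp002, ... to the emoticons and substitute them into the text."""
    table: dict[str, str] = {}  # temporary number -> emoticon (the "hash array")
    pieces = []
    for ch in first_info:
        if is_emoticon(ch):
            number = f"temp{len(table) + 1:03d}"
            table[number] = ch
            pieces.append(number)
        else:
            pieces.append(ch)
    return "".join(pieces), table


masked, table = mask_emoticons("He feels very \U0001F60A")
print(masked)  # He feels very temp001
```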
Step 302: Obtain the attribute information of the emoticon;
The attribute information may include the emoticon's word, semantics, category, part of speech, structure, concept, length, name, size, format, content, and/or pinyin. The emoticon is pattern-matched against the emoticons in an emoticon library to obtain its attribute information; the library stores the correspondence between emoticons and attribute information. The library may contain its name, the total length of the emoticon data, the number of emoticons, and the recently used emoticons. For each emoticon it may also store the index, length, name, size, format, content, written form, semantics, category, part of speech, structure, concept, display position, and other information.
Step 302 can be implemented by the following steps (1) and (2):
(1): Using the icon data of the emoticon, obtain its index number from the correspondence between icon data and index numbers;
The icon data of the emoticon is first obtained from the correspondence between emoticons and icon data. That icon data is then used to obtain the emoticon's index number from the correspondence between icon data and index numbers.
In this embodiment, the terminal stores two correspondences in advance: between emoticons and icon data (Table 4), and between icon data and index numbers (Table 5).
Table 4
Icon data | Emoticon |
010011000111……0100100 | Surprised |
010011000111……0100101 | Glad |
010011000111……0100110 | Titter |
010011000111……0100111 | Strong |
010011000111……0101000 | Lollipop |
… | … |
010011000111……0111000 | Rose |
Table 5
Index number | Icon data |
X…X001 | 010011000111……0100100 |
X…X002 | 010011000111……0100101 |
X…X003 | 010011000111……0100110 |
X…X004 | 010011000111……0100111 |
X…X005 | 010011000111……0101000 |
… | … |
X…X100 | 010011000111……0111000 |
For example, for the "happiness emoticon", the icon data 010011000111……0100101 is obtained from the correspondence between emoticons and icon data in Table 4. That icon data is then used to obtain the index number X…X002 of the "happiness emoticon" from the correspondence between icon data and index numbers in Table 5.
(2): Using the index number of the emoticon, obtain its attribute information from the source-language correspondence between index numbers and attribute information.
The correspondence between index numbers and attribute information for the source language is selected first. The emoticon's index number is then used to obtain its attribute information from that correspondence.
In this embodiment, the terminal stores in advance the correspondence between index numbers and attribute information for each language. For example, the correspondence for Chinese is shown in Tables 6 and 7:
Table 6
Length | Name | Size | Format | Content | Word | Position | Index number | … |
100bytes | /jy | 16*16 | bmp | (⊙o⊙) | Surprised | 1 | X…X001 | … |
… | /gx | 16*16 | bmp | (* ^ ﹏ ^*) | Glad | 2 | X…X002 | … |
… | /tx | 16*16 | bmp | … | Titter | 3 | X…X003 | … |
… | /qiang | 16*16 | bmp | … | Strong | 4 | X…X004 | … |
… | /bangbangt | 16*16 | bmp | … | Lollipop | 5 | X…X005 | |
… | … | … | … | … | … | … | … | … |
Table 7
Index number | Word | Name | Pinyin | Part of speech | Semantics | … |
X…X001 | Surprised | /jy | jingya | adj | Emotion | … |
X…X002 | Glad | /gx | gaoxing | adj | Emotion | … |
X…X003 | Titter | /tx | touxiao | v | Behavior | … |
X…X004 | Strong | /qiang | qiang | adj | Degree | … |
X…X005 | Lollipop | /bangbangt | bangbangtang | n | Food | … |
… | … | … | … | … | … | … |
For example, the index number X…X002 is used to obtain the attribute information of the "happiness emoticon" from Tables 6 and 7: name /gx, size 16*16, format bmp, content (* ^ ﹏ ^*), word "glad", position 2, pinyin gaoxing, part of speech adj, semantics emotion, and so on.

As another example, the "happiness emoticon" and its temporary number temp001 form the array (temp001, happiness emoticon), which is pattern-matched against the emoticons in the emoticon library. Tables 4 and 5 give the index number "X…X002" of the happiness emoticon. Pattern matching this index number against Tables 6 and 7 then yields the attribute information of the "happiness emoticon": an emoticon length of 100bytes, the name Happy, size 16*16, format bmp, word "glad", pinyin gaoxing, semantics emotion, and part of speech adj (adjective).
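The lookup chain of Step 302 (emoticon to icon data to index number to attribute information) can be sketched with Python dictionaries. The sketch assumes that Tables 4 to 7 are held as in-memory maps. The icon data and index numbers are copied exactly as shown, elisions included, and serve only as opaque keys. "zh" is an assumed language key, and only some of the attribute columns are modelled.

```python
# Table 4: icon data -> emoticon (values copied exactly as shown in the table).
EMOTICON_BY_ICON = {
    "010011000111……0100100": "Surprised",
    "010011000111……0100101": "Glad",
    "010011000111……0100110": "Titter",
}
# Table 5: index number -> icon data.
ICON_BY_INDEX = {
    "X…X001": "010011000111……0100100",
    "X…X002": "010011000111……0100101",
    "X…X003": "010011000111……0100110",
}
# Tables 6 and 7 merged per language (subset of columns; "zh" is an assumed key).
ATTRIBUTES_BY_LANGUAGE = {
    "zh": {
        "X…X002": {"name": "/gx", "size": "16*16", "format": "bmp", "word": "glad",
                   "pinyin": "gaoxing", "pos": "adj", "semantics": "emotion"},
    },
}

# Invert the stored tables once so that each lookup is a single dictionary access.
ICON_BY_EMOTICON = {emoticon: icon for icon, emoticon in EMOTICON_BY_ICON.items()}
INDEX_BY_ICON = {icon: index for index, icon in ICON_BY_INDEX.items()}


def index_number(emoticon: str) -> str:
    """Step 302 (1): emoticon -> icon data (Table 4) -> index number (Table 5)."""
    return INDEX_BY_ICON[ICON_BY_EMOTICON[emoticon]]


def attribute_information(emoticon: str, source_language: str) -> dict[str, str]:
    """Step 302 (2): index number -> attribute information for the source language."""
    return ATTRIBUTES_BY_LANGUAGE[source_language][index_number(emoticon)]


print(index_number("Glad"))                           # X…X002
print(attribute_information("Glad", "zh")["pinyin"])  # gaoxing
```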
Step 303: Obtain at least one word corresponding to the emoticon from its attribute information;
Step 303 can be implemented in one of two ways. In the first way, Step 303 is implemented by the following steps (1) to (3):
(1): Calculate the similarity between the attribute information of the emoticon and each item of attribute information in a semantic dictionary, which stores the correspondence between attribute information and words;
The semantic dictionary may be a synonym or near-synonym dictionary of the source language, a translation dictionary between the source and target languages, a translation model, a language model, or the like. The similarity calculation method may be adjusted to suit the lexical resource used.
The source-target translation dictionary, translation model, or language model may be resources already provided by the translation system. Examples include the bilingual translation dictionaries of rule-based and example-based translation algorithms, and the translation models or language models of statistical translation algorithms. All of these can be used to calculate word-level semantic similarity. Such techniques are relatively mature and are not described further here.
For Chinese, a Chinese thesaurus or HowNet (http://www.keenage.com/) may be used; for English, the synonym/near-synonym dictionary WordNet (http://wordnet.princeton.edu/) may be used.
In addition, EuroWordNet (http://www.illc.uva.nl/EuroWordNet/) is a multilingual European semantic network dictionary. It can be used for semantic similarity calculation in languages such as Dutch, Italian, Spanish, German, French, Czech, and Estonian.
The Indian semantic dictionary IndoWordNet (http://en.wikipedia.org/wiki/IndoWordNet) contains semantic networks for 18 official languages of India.
For Japanese, semantic similarity can be calculated with resources such as the Japanese WordNet (http://nlpwww.nict.go.jp/wn-ja/) and the Japanese lexicon Goi-Taikei (http://www.kecl.ntt.co.jp/icl/lirg/resources/GoiTaikei/).
Semantic similarity calculation methods differ slightly from language to language and are not specifically limited here. For Chinese, for example, a HowNet-based method may be used, such as: Liu Qun, Li Sujian. Word similarity computing based on HowNet [J]. Computational Linguistics and Chinese Language Processing, 2002, 7(2): 59-76.
For English, see for example: Pedersen T, Patwardhan S, Michelizzi J. WordNet::Similarity: Measuring the relatedness of concepts [C]. Demonstration Papers at HLT-NAACL 2004. Association for Computational Linguistics, 2004: 38-41.
Many existing semantic similarity techniques for other languages can likewise be applied to the present invention; they are not described further here.
(2): Obtain from the semantic dictionary at least one item of attribute information whose similarity to the emoticon's attribute information satisfies a preset condition;
The preset condition may be either of the following:
- the similarity is greater than a preset threshold, in which case step (2) obtains from the semantic dictionary every item whose similarity to the emoticon's attribute information exceeds the threshold;
- the similarity is among the top preset number of values, in which case step (2) obtains from the semantic dictionary that number of items with the highest similarity to the emoticon's attribute information.
The preset threshold and the preset number can be set as needed and are not specifically limited in the embodiments of the present invention.
(3): Obtain from the semantic dictionary the word corresponding to each of the obtained items of attribute information.
For each obtained item of attribute information, the corresponding word is read from the correspondence between attribute information and words in the semantic dictionary.
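The first way can be sketched as below. The patent leaves the similarity measure open. Purely for illustration, this sketch treats attribute information as a set of feature labels and uses Jaccard similarity, which is a stand-in for the HowNet- or WordNet-based measures cited above, not one of them. The semantic-dictionary entries and the function name `candidate_words` are hypothetical.

```python
from typing import Optional

# Hypothetical semantic dictionary: word -> attribute features.
SEMANTIC_DICTIONARY = {
    "glad": frozenset({"adj", "emotion", "positive"}),
    "cheerful": frozenset({"adj", "emotion", "positive", "lively"}),
    "lollipop": frozenset({"n", "food"}),
}


def jaccard(a: frozenset, b: frozenset) -> float:
    """Step (1): similarity = |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_words(attrs: frozenset, threshold: Optional[float] = None, top_k: int = 1) -> list[str]:
    """Steps (2)-(3): keep entries above `threshold`, or else the `top_k` most similar."""
    scored = sorted(
        ((jaccard(attrs, features), word) for word, features in SEMANTIC_DICTIONARY.items()),
        key=lambda pair: (-pair[0], pair[1]),  # highest similarity first, ties by word
    )
    if threshold is not None:
        return [word for score, word in scored if score > threshold]
    return [word for _, word in scored[:top_k]]


emoticon_attrs = frozenset({"adj", "emotion", "positive"})
print(candidate_words(emoticon_attrs, threshold=0.5))  # ['glad', 'cheerful']
print(candidate_words(emoticon_attrs, top_k=1))        # ['glad']
```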
In the second way, Step 303 is implemented by the following steps (A) and (B):
(A): Extract the word corresponding to the emoticon from the emoticon's attribute information;
The attribute information includes a word, and that word is extracted as the word corresponding to the emoticon.
For example, the word "glad" is extracted from the attribute information of the "happiness emoticon".
(B): Obtain the synonyms or near-synonyms of the word corresponding to the emoticon, and take them as further words corresponding to the emoticon.
For example, the synonyms and near-synonyms of "glad" include joyful, merry, happy, joy, cheerful, elated, roused, at ease, proud, comfortable, light-hearted, satisfactory, overjoyed, jumping for joy, satisfied, joyous, excited, jubilant, delighted, and so on. From these, "glad" and "cheerful" are obtained and taken as the words corresponding to the "happiness emoticon".
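Steps (A) and (B) can be sketched in Python as follows. The thesaurus dictionaries are hypothetical stand-ins for the synonym and near-synonym resources mentioned above. The patent does not say how a subset such as "glad" and "cheerful" is chosen from the candidates, so this sketch simply returns every candidate once, in order.

```python
def words_for_emoticon(attributes: dict[str, str],
                       synonyms: dict[str, list[str]],
                       near_synonyms: dict[str, list[str]]) -> list[str]:
    word = attributes["word"]  # (A) the word stored in the attribute information
    # (B) add its synonyms and near-synonyms as further candidate words
    candidates = [word, *synonyms.get(word, []), *near_synonyms.get(word, [])]
    return list(dict.fromkeys(candidates))  # remove duplicates, keep first occurrence


# Hypothetical thesaurus entries for the running example.
SYNONYMS = {"glad": ["happy"]}
NEAR_SYNONYMS = {"glad": ["cheerful", "joyful", "happy"]}
print(words_for_emoticon({"word": "glad"}, SYNONYMS, NEAR_SYNONYMS))
# ['glad', 'happy', 'cheerful', 'joyful']
```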
Step 304: In the first information, replace the emoticon with each of the obtained words in turn, to obtain one second information per word;
For example, in the first information "He feels very [happiness emoticon]", the "happiness emoticon" is replaced with "glad" and with "cheerful" in turn. This gives the second information "He feels very glad" for "glad" and "He feels very cheerful" for "cheerful".
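Step 304 can be sketched in Python as follows. The patent's example has only one emoticon. As an assumption beyond the source, this sketch handles several emoticons by forming every combination of their candidate words with `itertools.product`. The function name `second_informations` is hypothetical.

```python
from itertools import product


def second_informations(first_info: str, words_by_emoticon: dict[str, list[str]]) -> list[str]:
    """Produce one candidate second information per combination of candidate words."""
    emoticons = list(words_by_emoticon)
    results = []
    for choice in product(*(words_by_emoticon[e] for e in emoticons)):
        text = first_info
        for emoticon, word in zip(emoticons, choice):
            text = text.replace(emoticon, word)
        results.append(text)
    return results


print(second_informations("He feels very \U0001F60A", {"\U0001F60A": ["glad", "cheerful"]}))
# ['He feels very glad', 'He feels very cheerful']
```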
Step 305: Translate the second information into third information in the target language;
The target language can be any language and is not specifically limited in the embodiments of the present invention. Step 305 can be implemented by the following steps (1) and (2):
(1): Select at least one translation algorithm from a set of translation algorithms;
The set of translation algorithms includes a rule-based translation algorithm, an example-based translation algorithm, and a statistical translation algorithm.
Any one, any two, or all three of these translation algorithms may be selected from the set.
(2): Translate the second information into third information in the target language using the selected translation algorithm(s);
For example, with Japanese as the target language, the second information "He feels very glad" is translated into the Japanese third information "彼はとてもうれしい", where "うれしい" is a Japanese word meaning glad. As another example, the second information "He feels very cheerful" is translated into the Japanese third information "彼はとても楽しい", where "楽しい" is a Japanese word meaning happy or cheerful.
When the second information is translated into the third information with the selected translation algorithm, lexical analysis and/or syntactic analysis may be performed on the second information, depending on that algorithm. Three options are possible: lexical analysis only, syntactic analysis only, or lexical analysis followed by syntactic analysis. The lexical and syntactic analysis are not specifically limited in the embodiments of the present invention.
Rule-based, example-based, and syntax-based translation algorithms require both lexical analysis and syntactic analysis of the second information. Some other translation algorithms may perform only lexical analysis or only syntactic analysis.
Many tools are available for lexical and syntactic analysis.
- Word-segmentation and part-of-speech tagging tools include the Stanford POS Tagger (English, Chinese, Arabic), the ICTCLAS Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences, the THULAC segmentation system of Tsinghua University, and ChaSen, MeCab, and JUMAN for Japanese.
- Syntactic parsers include the Stanford Parser (English, Chinese, Arabic), the Chinese parser of Harbin Institute of Technology, and CaboCha and KNP for Japanese.

The lexical and syntactic analysis tools are not specifically limited in the embodiments of the present invention.
Step 306: Extract the target-language word corresponding to the emoticon from the third information;
The target-language word corresponding to the emoticon is obtained from the emoticon's source-language word. That target-language word is then extracted from the third information.
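Step 306 can be sketched in Python as below. The sketch assumes a hypothetical bilingual map from each source-language word to its possible Japanese renderings, for example drawn from the translation dictionary used in Step 305. The first rendering found in the third information is returned together with its position.

```python
from typing import Optional

# Hypothetical source-word -> Japanese candidates (e.g. from a bilingual dictionary).
TARGET_WORDS = {"glad": ["うれしい", "嬉しい"], "cheerful": ["楽しい"]}


def extract_target_word(third_info: str, source_word: str,
                        target_words: dict[str, list[str]] = TARGET_WORDS) -> Optional[tuple[str, int]]:
    """Return the first candidate target word found in the translation and its position."""
    for candidate in target_words.get(source_word, []):
        position = third_info.find(candidate)
        if position != -1:
            return candidate, position
    return None  # the translation kept none of the known renderings of the word


print(extract_target_word("彼はとてもうれしい", "glad"))  # ('うれしい', 5)
```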
Step 307: In the third information, replace the extracted word with the emoticon corresponding to it, to obtain the fourth information.
Step 307 can be implemented in one of two ways. In the first way, Step 307 is implemented by the following steps (1) to (4):
(1): From the target-language correspondence between attribute information and index numbers, obtain the entry that contains the target-language word corresponding to the emoticon;
The correspondence between attribute information and index numbers for the target language is selected first. The emoticon's source-language word is then used to obtain, from that correspondence, the entry containing the emoticon's target-language word.
The correspondence among index numbers, icon data, and picture content for Japanese is shown in Table 8 below:
Table 8
Call number | Icon data | Picture material (remarks) |
Y…Y001 | 010011000111……1000100 | Happiness emoticon in Japanese |
Y…Y002 | 010011000111……1000101 | Happy emoticon in Japanese |
… | … | … |
Y…Y100 | 010011000111……1011010 | Emoticon tired out in Japanese |
The attribute information of the Japanese emoticons is shown in Table 9 below:
Table 9
For example, "うれしい" and "楽しい" are identified in the third information as the Japanese (target-side) words for which icons are to be generated. Looking up "うれしい" and "楽しい" in Table 8 gives the icon index numbers "Y…Y001" and "Y…Y002" respectively. Substituting "Y…Y001" into "それはとてもうれしい" gives "それはとてもうれしい (Y…Y001)". Substituting "Y…Y002" into "それはとても楽しい" gives "それはとても楽しい (Y…Y002)".
(2): Extract the index number contained in the obtained entry.
Each entry in the correspondence contains an index number, which is extracted from the obtained entry.
(3): Using the index number, obtain the icon data of the emoticon from the correspondence between index numbers and icon data.
The correspondence between index numbers and icon data is stored on the terminal.
(4): In the third information, replace the word with the obtained icon data of the emoticon to obtain the fourth information.
For example, in the third information "それはとてもうれしい", "うれしい" is replaced with the emoticon for happiness. This gives the fourth information "それはとても + happiness emoticon", as shown in Fig. 3-3. Likewise, in the third information "それはとても楽しい", "楽しい" is replaced with the happy emoticon. This gives the fourth information "それはとても + happy emoticon", as shown in Fig. 3-4.
Alternatively, the "happiness emoticon" may be inserted to the right of "うれしい". This gives the fourth information "それはとてもうれしい (happiness emoticon)", as shown in Fig. 3-5. The "happy emoticon" may likewise be inserted to the right of "楽しい", giving the fourth information "それはとても楽しい (happy emoticon)", as shown in Fig. 3-6.
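The four steps of the first implementation reduce to two table lookups followed by a string substitution. The following is a minimal Python sketch under stated assumptions. The dictionaries, the index numbers `idx-001`/`idx-002`, and the bracketed strings standing in for icon data are all hypothetical, because the patent elides its real values. A plain dictionary keyed by the target-language word stands in for the attribute-information table.

```python
# Hypothetical tables; the patent's real index numbers and icon data are elided.
# Target-language attribute information, reduced here to word -> index number.
TARGET_WORD_TO_INDEX = {"うれしい": "idx-001", "楽しい": "idx-002"}
# Index number -> icon data (placeholder strings instead of image bytes).
INDEX_TO_ICON = {"idx-001": "[happiness-icon]", "idx-002": "[happy-icon]"}


def restore_emoticon(third_info: str, target_word: str, insert_right: bool = False) -> str:
    """Turn a target-language word in the translated text back into an emoticon."""
    index = TARGET_WORD_TO_INDEX[target_word]  # steps (1)-(2): find entry, extract index number
    icon = INDEX_TO_ICON[index]                # step (3): index number -> icon data
    # Step (4): replace the word, or keep it and insert the icon to its right.
    new_text = target_word + icon if insert_right else icon
    return third_info.replace(target_word, new_text, 1)
```

Calling `restore_emoticon("それはとてもうれしい", "うれしい")` yields `"それはとても[happiness-icon]"`. Passing `insert_right=True` keeps the word and appends the icon, which mirrors the Fig. 3-5 variant.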
In the second implementation, step 307 consists of the following steps (A) to (E):
(A): Obtain the source-language word corresponding to the emoticon.
For example, the source-language word obtained for the "happiness emoticon" is "happiness".
(B): From the correspondence between attribute information and index numbers for the source language, obtain the entry that contains the emoticon's source-language word.
That is, obtain the attribute-information/index-number correspondence for the source language, and from it obtain the entry containing the emoticon's source-language word.
(C): Extract the index number contained in the obtained entry.
Each entry contains an index number, which is extracted from the obtained entry.
(D): Using the index number, obtain the icon data of the emoticon from the correspondence between index numbers and icon data.
(E): In the third information, replace the word with the icon data of the emoticon to obtain the fourth information.
Alternatively, in step (E) the icon data of the emoticon may be inserted to the left or right of the word in the third information.
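The second implementation differs from the first in one respect. It first maps the target-language word back to its source-language word and then consults the source-language table. The following is a minimal Python sketch. It assumes the second-mark-to-first-mark pairs were recorded during translation. All table contents and the `idx-…` index numbers are hypothetical.

```python
# Hypothetical tables; real contents are not given in the patent.
# Second mark (target word) -> first mark (source word), assumed recorded during translation.
SECOND_TO_FIRST = {"うれしい": "happiness", "楽しい": "happy"}
# Source-language attribute information, reduced to word -> index number.
SOURCE_WORD_TO_INDEX = {"happiness": "idx-001", "happy": "idx-002"}
# Index number -> icon data (placeholder strings instead of image bytes).
INDEX_TO_ICON = {"idx-001": "[happiness-icon]", "idx-002": "[happy-icon]"}


def restore_via_source(third_info: str, second_mark: str, side: str = "replace") -> str:
    """Restore an emoticon by looking up the source-language word (steps A-E)."""
    first_mark = SECOND_TO_FIRST[second_mark]  # step (A): back to the source-language word
    index = SOURCE_WORD_TO_INDEX[first_mark]   # steps (B)-(C): entry lookup, index number
    icon = INDEX_TO_ICON[index]                # step (D): icon data
    # Step (E): replace the word, or insert the icon on its left or right.
    if side == "left":
        new_text = icon + second_mark
    elif side == "right":
        new_text = second_mark + icon
    else:
        new_text = icon
    return third_info.replace(second_mark, new_text, 1)
```

One design consequence: only the source-language table needs icon mappings in this variant, while the first implementation needs one table per target language.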
Note that the two implementations differ as follows: in the first, the emoticon is looked up as an emoticon of the target-language form; in the second, it is looked up as an emoticon of the source-language form.
Further, after the extracted word in the third information is replaced with its corresponding emoticon to obtain the fourth information, the fourth information is output. The fourth information may be output in one or more forms, such as text, image, voice, or emoticon.
The first effect of the present invention is that it is not restricted by an emoticon library or a translation dictionary. It achieves high-accuracy translation of emoticons and reduces the cost of building emoticon-aware translation dictionaries, translation rules or translation models, language models, and the like.
The second effect of the present invention is that it solves the recognition, translation, and generation problems for emoticons that are not registered in the emoticon dictionary.
The third effect of the present invention is that it solves the problem of determining and generating the structural position of emoticons on the target-language side. Because the position of each emoticon is correctly identified on the target-language side, the sentence structure and semantic integrity of the translation result are preserved.
The fourth effect of the present invention is that it is not restricted to particular languages and solves the recognition, translation, and generation problems for emoticons in any language.
Embodiment 4
An embodiment of the present invention provides an apparatus for translating information. Referring to Fig. 4, the apparatus includes:
a first acquisition module 401, configured to obtain the emoticon contained in first information in the source-language form;
a first replacement module 402, configured to replace, in the first information, the emoticon with a first mark identifying the emoticon, to obtain second information;
a translation module 403, configured to translate the second information into third information in the target-language form;
a first extraction module 404, configured to extract, from the third information, a second mark corresponding to the first mark;
a second replacement module 405, configured to replace, in the third information, the second mark with the emoticon corresponding to the second mark, to obtain fourth information.
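The five modules form a mask, translate, and unmask pipeline. The following is a minimal Python sketch that treats the translation engine as an injected callable. Several parts are assumptions rather than the patent's design: the emoticon inventory (two Unicode smiley code points), the `__EMO<n>__` mark format, and the requirement that the translator pass marks through unchanged.

```python
import re
from typing import Callable

# Hypothetical emoticon inventory; a real system would consult an emoticon library.
EMOTICONS = {"\U0001F60A", "\U0001F604"}
MARK_PATTERN = re.compile(r"__EMO\d+__")  # assumed format of first/second marks


def translate_with_emoticons(first_info: str, translate: Callable[[str], str]) -> str:
    marks: dict[str, str] = {}
    parts: list[str] = []
    # Modules 401 and 402: find each emoticon and replace it with a first mark.
    for ch in first_info:
        if ch in EMOTICONS:
            mark = f"__EMO{len(marks)}__"
            marks[mark] = ch
            parts.append(mark)
        else:
            parts.append(ch)
    second_info = "".join(parts)
    # Module 403: any MT engine, assumed to copy the marks through untouched.
    third_info = translate(second_info)
    # Modules 404 and 405: find the second marks and swap the emoticons back in.
    return MARK_PATTERN.sub(lambda m: marks.get(m.group(0), m.group(0)), third_info)
```

For example, with a stub translator `lambda s: s.replace("Hello", "Bonjour")`, the input `"Hello \U0001F60A"` comes back as `"Bonjour \U0001F60A"`.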
Further, when the first mark is a temporary variable, the temporary variable has the same form in every language format. The first replacement module 402 includes:
a first allocation unit, configured to allocate an interim number to the emoticon;
a first replacement unit, configured to replace, in the first information, the emoticon with its interim number, to obtain fifth information;
a second allocation unit, configured to allocate a temporary variable to the emoticon according to the emoticon's position in the first information;
an association unit, configured to associate the emoticon's temporary variable with its interim number;
a second replacement unit, configured to replace, in the fifth information, the emoticon's interim number with the temporary variable associated with that interim number, to obtain the second information.
Further, when the second mark is a temporary variable, the first extraction module 404 includes:
a first extraction unit, configured to extract, from the third information, the temporary variables contained in it;
Correspondingly, the second replacement module 405 includes:
a first acquisition unit, configured to obtain the interim number associated with the temporary variable;
a third replacement unit, configured to replace, in the third information, the temporary variable with its associated interim number, to obtain sixth information;
a second acquisition unit, configured to obtain the emoticon corresponding to the interim number;
a fourth replacement unit, configured to replace, in the sixth information, the interim number with its corresponding emoticon, to obtain the fourth information.
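The temporary-variable scheme uses two levels of indirection: emoticon to interim number (fifth information), then interim number to position-based temporary variable (second information). Decoding unwinds both levels. The following is a minimal Python sketch of the units listed above. It assumes the formats `#n#` for interim numbers and `$V<pos>$` for temporary variables, and it uses a hypothetical emoticon set. The patent does not specify either format.

```python
import re
from dataclasses import dataclass, field

EMOTICONS = {"\U0001F60A", "\U0001F604"}  # hypothetical emoticon inventory


@dataclass
class MarkTable:
    interim_to_emoticon: dict = field(default_factory=dict)  # interim number -> emoticon
    var_to_interim: dict = field(default_factory=dict)       # temporary variable -> interim number


def encode(first_info: str) -> tuple[str, MarkTable]:
    table = MarkTable()
    fifth_parts, positions = [], []
    for pos, ch in enumerate(first_info):
        if ch in EMOTICONS:
            interim = f"#{len(table.interim_to_emoticon) + 1}#"  # first allocation unit
            table.interim_to_emoticon[interim] = ch
            fifth_parts.append(interim)                           # first replacement unit
            positions.append((pos, interim))
        else:
            fifth_parts.append(ch)
    second_info = "".join(fifth_parts)  # this is the fifth information
    for pos, interim in positions:
        var = f"$V{pos}$"                    # second allocation unit: variable from position
        table.var_to_interim[var] = interim  # association unit
        second_info = second_info.replace(interim, var, 1)  # second replacement unit
    return second_info, table


def decode(third_info: str, table: MarkTable) -> str:
    # First extraction + third replacement units: variables -> interim numbers (sixth information).
    sixth_info = re.sub(r"\$V\d+\$", lambda m: table.var_to_interim.get(m.group(0), m.group(0)), third_info)
    # Second acquisition + fourth replacement units: interim numbers -> emoticons.
    return re.sub(r"#\d+#", lambda m: table.interim_to_emoticon.get(m.group(0), m.group(0)), sixth_info)
```

The variables are keyed by source position rather than by emoticon. This lets the decoder restore emoticons correctly even when the translator reorders them, for example when `"$V11$ salut $V3$"` decodes to the two emoticons in swapped order.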
Further, when the first mark is a word corresponding to the emoticon, the word is in the source-language form. The first replacement module 402 includes:
a third acquisition unit, configured to obtain the attribute information of the emoticon according to the emoticon;
a fourth acquisition unit, configured to obtain at least one word corresponding to the emoticon according to its attribute information;
a fifth replacement unit, configured to replace the emoticon in the first information with each of the at least one word in turn, obtaining one piece of second information per word.
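The fifth replacement unit produces one candidate second information per word. The following is a minimal Python sketch, assuming the emoticon is a single string and every occurrence of it is replaced. How the candidates are later ranked or selected is not described in this section.

```python
def expand_candidates(first_info: str, emoticon: str, words: list[str]) -> list[str]:
    """Return one candidate second information per word (fifth replacement unit)."""
    # Each candidate is the source text with the emoticon replaced by one word.
    return [first_info.replace(emoticon, word) for word in words]
```

For example, `expand_candidates("I am \U0001F60A", "\U0001F60A", ["happy", "glad"])` gives two candidates, `"I am happy"` and `"I am glad"`, and each would be sent to the translation module.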
Further, the third acquisition unit includes:
a first acquisition subunit, configured to obtain the emoticon's index number from the correspondence between icon data and index numbers, according to the emoticon's icon data;
a second acquisition subunit, configured to obtain the emoticon's attribute information from the correspondence between index numbers and attribute information for the source language, according to the emoticon's index number.
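The third acquisition unit chains two lookups: icon data to index number, then index number to source-language attribute information. The following is a minimal Python sketch that uses raw icon bytes as the dictionary key. The byte strings, the `idx-001` index number, and the attribute fields are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical correspondences; real icon data and attribute schemas are not given.
ICON_TO_INDEX: dict[bytes, str] = {b"icon-bytes-A": "idx-001"}
SOURCE_INDEX_TO_ATTRIBUTES: dict[str, dict] = {
    "idx-001": {"meaning": "happiness", "word": "happy"},
}


def attributes_of(icon_data: bytes) -> Optional[dict]:
    """Icon data -> index number -> source-language attribute information."""
    index = ICON_TO_INDEX.get(icon_data)  # first acquisition subunit
    if index is None:
        return None  # emoticon not registered in the table
    return SOURCE_INDEX_TO_ATTRIBUTES.get(index)  # second acquisition subunit
```

Returning `None` for an unknown icon leaves room for a fallback, such as the similarity search described next, instead of raising an error.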
Further, the fourth acquisition unit includes:
a computation subunit, configured to calculate the similarity between the emoticon's attribute information and each attribute-information entry in a semantic dictionary, the semantic dictionary storing correspondences between attribute information and words;
a third acquisition subunit, configured to obtain from the semantic dictionary at least one attribute-information entry whose similarity to the emoticon's attribute information satisfies a preset condition;
a fourth acquisition subunit, configured to obtain from the semantic dictionary the word corresponding to each of the at least one attribute-information entry.
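The patent specifies neither the similarity measure nor the "preset condition". The following Python sketch substitutes Jaccard similarity over sets of attribute keywords and uses an assumed threshold of 0.5 as the condition. The semantic-dictionary entries are hypothetical.

```python
# Hypothetical semantic dictionary: (attribute keyword set, word) pairs.
SEMANTIC_DICT = [
    ({"joy", "smile", "positive"}, "happy"),
    ({"joy", "laugh"}, "cheerful"),
    ({"sad", "tears"}, "sad"),
]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def words_by_similarity(emoticon_attrs: set, semantic_dict=SEMANTIC_DICT, threshold: float = 0.5) -> list:
    # Computation subunit + third/fourth acquisition subunits:
    # keep the words whose attribute sets are similar enough to the emoticon's.
    return [word for attrs, word in semantic_dict if jaccard(emoticon_attrs, attrs) >= threshold]
```

With attributes `{"joy", "smile"}`, only "happy" passes the 0.5 threshold (similarity 2/3). Lowering the threshold to 0.3 also admits "cheerful" (similarity 1/3).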
Alternatively, the fourth acquisition unit includes:
an extraction subunit, configured to extract the word corresponding to the emoticon from the emoticon's attribute information;
a fifth acquisition subunit, configured to obtain synonyms or near-synonyms of that word and to use them as words corresponding to the emoticon.
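The following is a minimal Python sketch of the synonym-based alternative. It assumes the attribute information carries a `word` field and that synonyms and near-synonyms come from two small lookup tables. Both tables are hypothetical; a real system might draw on a thesaurus instead. The extracted word itself is kept as the first candidate, which is also an assumption.

```python
# Hypothetical synonym and near-synonym tables.
SYNONYMS = {"happy": ["glad"]}
NEAR_SYNONYMS = {"happy": ["cheerful", "joyful"]}


def words_with_synonyms(attribute_info: dict) -> list[str]:
    base = attribute_info["word"]  # extraction subunit: word from the attribute information
    words = [base]
    # Fifth acquisition subunit: add synonyms, then near-synonyms, without duplicates.
    for w in SYNONYMS.get(base, []) + NEAR_SYNONYMS.get(base, []):
        if w not in words:
            words.append(w)
    return words
```

Expanding a single attribute word this way feeds more candidates into the fifth replacement unit, so an emoticon can still be mapped to a word when the dictionary lacks an exact entry.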
Further, the first extraction module 404 includes:
a second extraction unit, configured to extract, from the third information, the target-language word corresponding to the first mark and to use the extracted word as the second mark;
Correspondingly, the second replacement module 405 includes:
a fifth acquisition unit, configured to obtain, from the correspondence between attribute information and index numbers for the target language, the entry containing the second mark;
a third extraction unit, configured to extract the index number contained in the obtained entry;
a sixth acquisition unit, configured to obtain the emoticon's icon data from the correspondence between index numbers and icon data, according to the index number;
a sixth replacement unit, configured to replace, in the third information, the second mark with the obtained icon data of the emoticon, to obtain the fourth information.
Alternatively, the first extraction module 404 includes:
a third extraction unit, configured to extract, from the third information, the target-language word corresponding to the first mark and to use the extracted word as the second mark;
Correspondingly, the second replacement module 405 includes:
a seventh acquisition unit, configured to obtain the first mark corresponding to the second mark;
an eighth acquisition unit, configured to obtain, from the correspondence between attribute information and index numbers for the source language, the entry containing the first mark;
a fourth extraction unit, configured to extract the index number contained in the obtained entry;
a ninth acquisition unit, configured to obtain the emoticon's icon data from the correspondence between index numbers and icon data, according to the index number;
a seventh replacement unit, configured to replace, in the third information, the second mark with the icon data of the emoticon, to obtain the fourth information.
It should be noted that the division into the functional modules above is only an example of how the apparatus for translating information in the above embodiment may translate information. In practical applications, these functions may be allocated to different functional modules as needed. That is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and the method for translating information provided in the above embodiments belong to the same concept. For the specific implementation process, refer to the method embodiment; it is not repeated here.
It should be added that the method and apparatus for translating information of the present invention are not designed for any two specific languages and are generally applicable. The present invention applies equally to other language pairs.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (8)
- 1. A method for translating information, characterized in that the method comprises: obtaining an emoticon contained in first information in a source-language form; replacing, in the first information, the emoticon with a first mark for identifying the emoticon to obtain second information, the first mark being a temporary variable or a word corresponding to the emoticon; translating the second information into third information in a target-language form; extracting, from the third information, a second mark corresponding to the first mark; and replacing, in the third information, the second mark with the emoticon corresponding to the second mark to obtain fourth information; wherein, when the first mark is a temporary variable, the temporary variable has the same form in every language format, and replacing, in the first information, the emoticon with the first mark to obtain the second information comprises: allocating an interim number to the emoticon; replacing, in the first information, the emoticon with the interim number of the emoticon to obtain fifth information; allocating a temporary variable to the emoticon according to the position of the emoticon in the first information; associating the temporary variable of the emoticon with its interim number; and replacing, in the fifth information, the interim number of the emoticon with the temporary variable associated with that interim number to obtain the second information; and when the first mark is a word corresponding to the emoticon, the word is in the source-language form, and replacing, in the first information, the emoticon with the first mark to obtain the second information comprises: obtaining attribute information of the emoticon according to the emoticon; obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon; and replacing the emoticon in the first information with each of the at least one word in turn, obtaining second information corresponding to each word.
- 2. The method according to claim 1, characterized in that the second mark is a temporary variable, and extracting, from the third information, the second mark corresponding to the first mark comprises: extracting, from the third information, the temporary variable contained in the third information; correspondingly, replacing, in the third information, the second mark with the emoticon corresponding to the second mark to obtain the fourth information comprises: obtaining the interim number associated with the temporary variable; replacing, in the third information, the temporary variable with the interim number associated with it to obtain sixth information; obtaining the emoticon corresponding to the interim number; and replacing, in the sixth information, the interim number with the emoticon corresponding to the interim number to obtain the fourth information.
- 3. The method according to claim 1, characterized in that obtaining the attribute information of the emoticon according to the emoticon comprises: obtaining the index number of the emoticon from a correspondence between icon data and index numbers, according to the icon data of the emoticon; and obtaining the attribute information of the emoticon from a correspondence between index numbers and attribute information for the source language, according to the index number of the emoticon.
- 4. The method according to claim 1, characterized in that obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises: calculating the similarity between the attribute information of the emoticon and each attribute-information entry in a semantic dictionary, the semantic dictionary storing correspondences between attribute information and words; obtaining, from the semantic dictionary, at least one attribute-information entry whose similarity to the attribute information of the emoticon satisfies a preset condition; and obtaining, from the semantic dictionary, the word corresponding to each of the at least one attribute-information entry.
- 5. The method according to claim 1, characterized in that obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises: extracting the word corresponding to the emoticon from the attribute information corresponding to the emoticon; and obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
- 6. The method according to claim 1, characterized in that extracting, from the third information, the second mark corresponding to the first mark comprises: extracting, from the third information, the target-language word corresponding to the first mark, and using the extracted word as the second mark; correspondingly, replacing, in the third information, the second mark with the emoticon corresponding to the second mark to obtain the fourth information comprises: obtaining, from a correspondence between attribute information and index numbers for the target language, the entry containing the second mark; extracting the index number contained in the obtained entry; obtaining the icon data of the emoticon from a correspondence between index numbers and icon data, according to the index number; and replacing, in the third information, the second mark with the obtained icon data of the emoticon to obtain the fourth information.
- 7. The method according to claim 1, characterized in that extracting, from the third information, the second mark corresponding to the first mark comprises: extracting, from the third information, the target-language word corresponding to the first mark, and using the extracted word as the second mark; correspondingly, replacing, in the third information, the second mark with the emoticon corresponding to the second mark to obtain the fourth information comprises: obtaining the first mark corresponding to the second mark; obtaining, from a correspondence between attribute information and index numbers for the source language, the entry containing the first mark; extracting the index number contained in the obtained entry; obtaining the icon data of the emoticon from a correspondence between index numbers and icon data, according to the index number; and replacing, in the third information, the second mark with the icon data of the emoticon to obtain the fourth information.
- 8. An apparatus for translating information, characterized in that the apparatus comprises: a first acquisition module, configured to obtain an emoticon contained in first information in a source-language form; a first replacement module, configured to replace, in the first information, the emoticon with a first mark for identifying the emoticon to obtain second information, the first mark being a temporary variable or a word corresponding to the emoticon; a translation module, configured to translate the second information into third information in a target-language form; a first extraction module, configured to extract, from the third information, a second mark corresponding to the first mark; and a second replacement module, configured to replace, in the third information, the second mark with the emoticon corresponding to the second mark to obtain fourth information; wherein, when the first mark is a temporary variable, the temporary variable has the same form in every language format, and the first replacement module is further configured to: allocate an interim number to the emoticon; replace, in the first information, the emoticon with the interim number of the emoticon to obtain fifth information; allocate a temporary variable to the emoticon according to the position of the emoticon in the first information; associate the temporary variable of the emoticon with its interim number; and replace, in the fifth information, the interim number of the emoticon with the temporary variable associated with that interim number to obtain the second information; and when the first mark is a word corresponding to the emoticon, the word is in the source-language form, and the first replacement module is further configured to: obtain attribute information of the emoticon according to the emoticon; obtain at least one word corresponding to the emoticon according to the attribute information of the emoticon; and replace the emoticon in the first information with each of the at least one word in turn, obtaining second information corresponding to each word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510119654.0A CN104699675B (en) | 2015-03-18 | 2015-03-18 | The method and apparatus of translation information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104699675A CN104699675A (en) | 2015-06-10 |
CN104699675B true CN104699675B (en) | 2018-01-30 |
Family
ID=53346814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510119654.0A Active CN104699675B (en) | 2015-03-18 | 2015-03-18 | The method and apparatus of translation information |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9928236B2 (en) * | 2015-09-18 | 2018-03-27 | Mcafee, Llc | Systems and methods for multi-path language translation |
CN106708810A (en) * | 2016-12-19 | 2017-05-24 | 新译信息科技(深圳)有限公司 | Machine translation method, device and terminal device |
CN110688840B (en) * | 2019-09-26 | 2022-07-26 | 联想(北京)有限公司 | Text conversion method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1655231A (en) * | 2004-02-10 | 2005-08-17 | 乐金电子(中国)研究开发中心有限公司 | Expression figure explanation treatment method for text and voice transfer system |
US6963839B1 (en) * | 2000-11-03 | 2005-11-08 | At&T Corp. | System and method of controlling sound in a multi-media communication application |
CN101030368A (en) * | 2006-03-03 | 2007-09-05 | 国际商业机器公司 | Method and system for communicating across channels simultaneously with emotion preservation |
CN101937431A (en) * | 2010-08-18 | 2011-01-05 | 华南理工大学 | Emotional voice translation device and processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |