CN104699675B - The method and apparatus of translation information - Google Patents

Publication number
CN104699675B
Authority
CN
China
Prior art keywords
emoticon
information
mark
word
replaced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510119654.0A
Other languages
Chinese (zh)
Other versions
CN104699675A (en)
Inventor
徐金安
赵雁榕
韩晓光
肖冰
徐凡
陈钰枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN201510119654.0A priority Critical patent/CN104699675B/en
Publication of CN104699675A publication Critical patent/CN104699675A/en
Application granted granted Critical
Publication of CN104699675B publication Critical patent/CN104699675B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method and apparatus for translating information, belonging to the field of natural language processing research. The method includes: obtaining the emoticon contained in first information in source-language form; replacing the emoticon in the first information with a first mark that identifies the emoticon, to obtain second information; translating the second information into third information in target-language form; extracting from the third information a second mark corresponding to the first mark; and replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain fourth information. The apparatus includes a first acquisition module, a first replacement module, a translation module, a first extraction module, and a second replacement module. The invention is not limited by an emoticon library or a translation dictionary, translates emoticons with high accuracy, reduces the cost of constructing translation dictionaries, translation rules, translation models, and language models that cover emoticons, and solves the problem of recognizing, translating, and generating emoticons that are not registered in an emoticon dictionary.

Description

The method and apparatus of translation information
Technical field
The present invention relates to the field of natural language processing research, and in particular to a method and apparatus for translating information.
Background
At present, with the development of computer networks and communication technology, mobile terminals are increasingly widespread, and e-mail, short messages, and social media such as Facebook, QQ, WeChat, and Weibo increasingly permeate people's daily work and life. In everyday communication, short text messages appear in large numbers, and the text is interspersed with various emoticons, each composed of a string of symbols.
On the other hand, the development of Internet and communication technology keeps expanding the space in which people communicate, and communication is becoming more and more transnational. Translating information is an important means of cross-cultural communication. In particular, when a user with a poor command of a foreign language sees information in that language, the user typically relies on machine translation to translate the information into the target language. The information may contain a large number of emoticons, and machine translation usually translates an emoticon into the target language by means of a translation dictionary, which contains emoticons and their corresponding words in target-language form.
In the course of realizing the present invention, the inventors found that the prior art has at least the following problem:
Because emoticons change constantly, constructing a translation dictionary is time-consuming and costly. When an emoticon in the information is not in the translation dictionary, the translation model, or the translation examples, that emoticon cannot be translated.
Summary of the invention
To solve the problems of the prior art, the present invention provides a method for translating information. The technical solution is as follows:
In one aspect, the invention provides a method for translating information, the method including:
obtaining the emoticon contained in first information in source-language form;
replacing the emoticon in the first information with a first mark that identifies the emoticon, to obtain second information;
translating the second information into third information in target-language form;
extracting from the third information a second mark corresponding to the first mark;
replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain fourth information.
Further, the first mark is a temporary variable, and the form of the temporary variable is the same in every language;
replacing the emoticon in the first information with the first mark that identifies the emoticon, to obtain the second information, includes:
assigning a temporary number to the emoticon;
replacing the emoticon in the first information with the temporary number of the emoticon, to obtain fifth information;
assigning a temporary variable to the emoticon according to the position of the emoticon in the first information;
associating the temporary variable of the emoticon with its temporary number;
replacing the temporary number of the emoticon in the fifth information with the temporary variable associated with that temporary number, to obtain the second information.
Further, the second mark is a temporary variable, and extracting from the third information the second mark corresponding to the first mark includes:
extracting from the third information the temporary variable contained in the third information;
correspondingly, replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain the fourth information, includes:
obtaining the temporary number associated with the temporary variable;
replacing the temporary variable in the third information with the temporary number associated with the temporary variable, to obtain sixth information;
obtaining the emoticon corresponding to the temporary number;
replacing the temporary number in the sixth information with the emoticon corresponding to the temporary number, to obtain the fourth information.
Further, the first mark is a word corresponding to the emoticon, and the word is in source-language form;
replacing the emoticon in the first information with the first mark that identifies the emoticon, to obtain the second information, includes:
obtaining the attribute information of the emoticon according to the emoticon;
obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
replacing the emoticon in the first information with each of the at least one word in turn, to obtain the second information corresponding to each word.
Further, obtaining the attribute information of the emoticon according to the emoticon includes:
obtaining the index number of the emoticon from the correspondence between icon data and index numbers, according to the icon data of the emoticon;
obtaining the attribute information of the emoticon from the correspondence between index numbers and attribute information for the source language, according to the index number of the emoticon.
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
calculating the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary storing the correspondence between attribute information and words;
obtaining from the semantic dictionary at least one piece of attribute information whose similarity to the attribute information of the emoticon meets a preset condition;
obtaining from the semantic dictionary the word corresponding to each of the at least one piece of attribute information.
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
extracting the word corresponding to the emoticon from the attribute information corresponding to the emoticon;
obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and also taking the synonyms and near-synonyms as words corresponding to the emoticon.
Further, extracting from the third information the second mark corresponding to the first mark includes:
extracting from the third information the word in target-language form that corresponds to the first mark, and taking the extracted word as the second mark;
correspondingly, replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain the fourth information, includes:
obtaining, from the correspondence between attribute information and index numbers for the target language, the entry that contains the second mark;
extracting the index number contained in the obtained entry;
obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
replacing the second mark in the third information with the obtained icon data of the emoticon, to obtain the fourth information.
Further, extracting from the third information the second mark corresponding to the first mark includes:
extracting from the third information the word in target-language form that corresponds to the first mark, and taking the extracted word as the second mark;
correspondingly, replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain the fourth information, includes:
obtaining the first mark corresponding to the second mark;
obtaining, from the correspondence between attribute information and index numbers for the source language, the entry that contains the first mark;
extracting the index number contained in the obtained entry;
obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
replacing the second mark in the third information with the icon data of the emoticon, to obtain the fourth information.
In another aspect, the invention provides an apparatus for translating information, the apparatus including:
a first acquisition module, configured to obtain the emoticon contained in first information in source-language form;
a first replacement module, configured to replace the emoticon in the first information with a first mark that identifies the emoticon, to obtain second information;
a translation module, configured to translate the second information into third information in target-language form;
a first extraction module, configured to extract from the third information a second mark corresponding to the first mark;
a second replacement module, configured to replace the second mark in the third information with the emoticon corresponding to the second mark, to obtain fourth information.
In embodiments of the present invention, the emoticon contained in first information in source-language form is obtained; the emoticon in the first information is replaced with a first mark that identifies the emoticon, to obtain second information; the second information is translated into third information in target-language form; a second mark corresponding to the first mark is extracted from the third information; and the second mark in the third information is replaced with the emoticon corresponding to the second mark, to obtain fourth information. The method is therefore not limited by an emoticon library or a translation dictionary, can translate emoticons with high accuracy, and reduces the cost of constructing translation dictionaries, translation rules or translation models, and language models that cover emoticons. It can also solve the problem of recognizing, translating, and generating emoticons that are not registered in an emoticon dictionary.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for translating information provided by Embodiment 1 of the present invention;
Fig. 2-1 is a flowchart of a method for translating information provided by Embodiment 2 of the present invention;
Fig. 2-2 is a schematic diagram of first information provided by Embodiment 2 of the present invention;
Fig. 2-3 is a schematic diagram of the segment analysis of second information provided by Embodiment 2 of the present invention;
Fig. 2-4 is a schematic diagram of the segment analysis of second information provided by Embodiment 2 of the present invention;
Fig. 2-5 is a schematic diagram of fourth information provided by Embodiment 2 of the present invention;
Fig. 2-6 is a schematic diagram of another fourth information provided by Embodiment 2 of the present invention;
Fig. 3-1 is a flowchart of a method for translating information provided by Embodiment 3 of the present invention;
Fig. 3-2 is a schematic diagram of first information provided by Embodiment 3 of the present invention;
Fig. 3-3 is a schematic diagram of fourth information provided by Embodiment 3 of the present invention;
Fig. 3-4 is a schematic diagram of another fourth information provided by Embodiment 3 of the present invention;
Fig. 3-5 is a schematic diagram of another fourth information provided by Embodiment 3 of the present invention;
Fig. 3-6 is a schematic diagram of another fourth information provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for translating information provided by Embodiment 4 of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
An embodiment of the present invention provides a method for translating information. The method is executed by a terminal and may be implemented as part or all of the terminal by software, hardware, or a combination of the two; the terminal need not have any symbol database. The terminal includes a mobile terminal, a fixed terminal, a server, or the like.
Referring to Fig. 1, the method includes:
Step 101: Obtain the emoticon contained in first information in source-language form;
Step 102: Replace the emoticon in the first information with a first mark that identifies the emoticon, to obtain second information;
Step 103: Translate the second information into third information in target-language form;
Step 104: Extract from the third information a second mark corresponding to the first mark;
Step 105: Replace the second mark in the third information with the emoticon corresponding to the second mark, to obtain fourth information.
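Steps 101 to 105 can be sketched end to end as follows. This is only an illustration under stated assumptions, not the patented implementation: emoticons are assumed to be single characters in two Unicode ranges, the marks are hypothetical names of the form `X0`, `X1`, ..., and `translate` stands in for any machine translation engine that leaves such marks untouched.

```python
import re
from typing import Callable

# Assumption: an emoticon is one character in these Unicode ranges.
EMOTICON = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def translate_with_emoticons(first: str, translate: Callable[[str], str]) -> str:
    marks: dict[str, str] = {}  # mark -> emoticon

    def to_mark(match: re.Match) -> str:
        mark = f"X{len(marks)}"  # hypothetical mark format
        marks[mark] = match.group(0)
        return mark

    second = EMOTICON.sub(to_mark, first)  # steps 101 and 102
    third = translate(second)              # step 103
    # Steps 104 and 105: find the marks again and put the emoticons back.
    return re.sub(r"X\d+", lambda m: marks.get(m.group(0), m.group(0)), third)


def toy_translator(text: str) -> str:
    """Stand-in for a real engine; it only rewrites one phrase."""
    return text.replace("For you:", "Pour toi :")


fourth = translate_with_emoticons("For you: 🌹 and 🍭", toy_translator)
```

The translator never sees the emoticons themselves, which is why no emoticon dictionary is needed on the translation side. A real system would have to choose marks that cannot collide with ordinary text.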
Further, the first mark is a temporary variable, and the form of the temporary variable is the same in every language;
replacing the emoticon in the first information with the first mark that identifies the emoticon, to obtain the second information, includes:
assigning a temporary number to the emoticon;
replacing the emoticon in the first information with the temporary number of the emoticon, to obtain fifth information;
assigning a temporary variable to the emoticon according to the position of the emoticon in the first information;
associating the temporary variable of the emoticon with its temporary number;
replacing the temporary number of the emoticon in the fifth information with the temporary variable associated with that temporary number, to obtain the second information.
Further, the second mark is a temporary variable, and extracting from the third information the second mark corresponding to the first mark includes:
extracting from the third information the temporary variable contained in the third information;
correspondingly, replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain the fourth information, includes:
obtaining the temporary number associated with the temporary variable;
replacing the temporary variable in the third information with the temporary number associated with the temporary variable, to obtain sixth information;
obtaining the emoticon corresponding to the temporary number;
replacing the temporary number in the sixth information with the emoticon corresponding to the temporary number, to obtain the fourth information.
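The restoration path (third information, then sixth information, then fourth information) might look like the sketch below. The function name `restore_emoticons`, the two lookup tables, and the `temp` plus three digits number format are illustrative assumptions. Variables are matched only when they are not glued to other ASCII letters or digits, so the sketch also works for target languages written without spaces.

```python
import re


def restore_emoticons(third, variable_to_number, number_to_emoticon):
    """Return (sixth, fourth) information for a translated third information."""
    if not variable_to_number:
        return third, third
    names = sorted(variable_to_number, key=len, reverse=True)
    # Match a variable only when it is not part of a longer ASCII token.
    variable = re.compile(
        r"(?<![A-Za-z0-9])(" + "|".join(map(re.escape, names)) + r")(?![A-Za-z0-9])"
    )
    # Temporary variable -> temporary number (sixth information).
    sixth = variable.sub(lambda m: variable_to_number[m.group(1)], third)
    # Temporary number -> emoticon (fourth information).
    fourth = re.sub(
        r"temp\d{3}",
        lambda m: number_to_emoticon.get(m.group(0), m.group(0)),
        sixth,
    )
    return sixth, fourth


variable_to_number = {"X": "temp001", "Y": "temp002", "Z": "temp003"}
number_to_emoticon = {"temp001": "🌹", "temp002": "🍭", "temp003": "😏"}
sixth, fourth = restore_emoticons(
    "バレンタインデーは、あなたにXとYを与える!Z!",
    variable_to_number,
    number_to_emoticon,
)
```

Going through the temporary number rather than straight from variable to emoticon mirrors the two-table design described above; a direct mapping would also work if the intermediate table is not needed elsewhere.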
Further, the first mark is a word corresponding to the emoticon, and the word is in source-language form;
replacing the emoticon in the first information with the first mark that identifies the emoticon, to obtain the second information, includes:
obtaining the attribute information of the emoticon according to the emoticon;
obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon;
replacing the emoticon in the first information with each of the at least one word in turn, to obtain the second information corresponding to each word.
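The last step of the word-based variant, producing one candidate second information per word, can be illustrated with a short sketch. The function name and the sample words are hypothetical.

```python
def second_information_candidates(first: str, emoticon: str, words: list[str]) -> dict[str, str]:
    """Map each candidate word to the second information it produces."""
    return {word: first.replace(emoticon, word) for word in words}


candidates = second_information_candidates("For you 🌹!", "🌹", ["rose", "flower"])
```

Each candidate can then be translated separately, which leaves room to pick the best translation afterwards.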
Further, obtaining the attribute information of the emoticon according to the emoticon includes:
obtaining the index number of the emoticon from the correspondence between icon data and index numbers, according to the icon data of the emoticon;
obtaining the attribute information of the emoticon from the correspondence between index numbers and attribute information for the source language, according to the index number of the emoticon.
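The two-stage lookup (icon data to index number, then index number to attribute information for a given language) can be pictured as two dictionaries. The table contents, index values, and language codes below are invented for illustration only.

```python
# Hypothetical tables; real systems would load these from storage.
ICON_TO_INDEX = {"🌹": 1001, "🍭": 1002}
INDEX_TO_ATTRIBUTES = {
    "en": {1001: "rose, flower, love", 1002: "lollipop, candy, sweet"},
}


def attribute_info(icon: str, language: str):
    """Return the attribute information for an icon, or None if unknown."""
    index = ICON_TO_INDEX.get(icon)
    if index is None:
        return None
    return INDEX_TO_ATTRIBUTES.get(language, {}).get(index)
```

Because the index number is language-independent, the same index can key into a separate attribute table for each language.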
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
calculating the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary storing the correspondence between attribute information and words;
obtaining from the semantic dictionary at least one piece of attribute information whose similarity to the attribute information of the emoticon meets a preset condition;
obtaining from the semantic dictionary the word corresponding to each of the at least one piece of attribute information.
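The text does not say which similarity measure or preset condition is used. As one possible reading, the sketch below treats attribute information as a set of keywords, uses Jaccard similarity, and takes "similarity at or above a threshold" as the preset condition. The dictionary contents and the threshold value are assumptions.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity of two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical semantic dictionary: attribute keywords -> word.
SEMANTIC_DICTIONARY = {
    ("rose", "flower"): "rose",
    ("candy", "sweet"): "candy",
    ("flower", "plant"): "flower",
}


def candidate_words(attributes, threshold: float = 0.3) -> list[str]:
    """Words whose attribute information is similar enough to `attributes`."""
    return [
        word
        for keywords, word in SEMANTIC_DICTIONARY.items()
        if jaccard(attributes, keywords) >= threshold
    ]
```

Lowering the threshold admits looser matches, which yields more candidate words at the cost of precision.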
Further, obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon includes:
extracting the word corresponding to the emoticon from the attribute information corresponding to the emoticon;
obtaining synonyms or near-synonyms of the word corresponding to the emoticon, and also taking the synonyms and near-synonyms as words corresponding to the emoticon.
Further, extracting from the third information the second mark corresponding to the first mark includes:
extracting from the third information the word in target-language form that corresponds to the first mark, and taking the extracted word as the second mark;
correspondingly, replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain the fourth information, includes:
obtaining, from the correspondence between attribute information and index numbers for the target language, the entry that contains the second mark;
extracting the index number contained in the obtained entry;
obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
replacing the second mark in the third information with the obtained icon data of the emoticon, to obtain the fourth information.
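A minimal sketch of this reverse lookup follows, assuming the target-language attribute information is a plain string and that "contains the second mark" means a substring match. The tables, index values, and function name are hypothetical.

```python
# Hypothetical target-language table: attribute information -> index number.
TARGET_ATTRIBUTES_TO_INDEX = {
    "ja": {"バラ, 花, 愛": 1001, "キャンディー, 甘い": 1002},
}
INDEX_TO_ICON = {1001: "🌹", 1002: "🍭"}


def restore_icon(third: str, second_mark: str, language: str) -> str:
    """Replace the second mark with its icon; return `third` unchanged if no entry matches."""
    for attributes, index in TARGET_ATTRIBUTES_TO_INDEX.get(language, {}).items():
        if second_mark in attributes:
            icon = INDEX_TO_ICON.get(index)
            if icon is not None:
                return third.replace(second_mark, icon)
    return third
```

Returning the third information unchanged when nothing matches keeps the translated word in place, so the output still reads correctly even without the icon.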
Further, extracting from the third information the second mark corresponding to the first mark includes:
extracting from the third information the word in target-language form that corresponds to the first mark, and taking the extracted word as the second mark;
correspondingly, replacing the second mark in the third information with the emoticon corresponding to the second mark, to obtain the fourth information, includes:
obtaining the first mark corresponding to the second mark;
obtaining, from the correspondence between attribute information and index numbers for the source language, the entry that contains the first mark;
extracting the index number contained in the obtained entry;
obtaining the icon data of the emoticon from the correspondence between index numbers and icon data, according to the index number;
replacing the second mark in the third information with the icon data of the emoticon, to obtain the fourth information.
In embodiments of the present invention, the emoticon contained in first information in source-language form is obtained; the emoticon in the first information is replaced with a first mark that identifies the emoticon, to obtain second information; the second information is translated into third information in target-language form; a second mark corresponding to the first mark is extracted from the third information; and the second mark in the third information is replaced with the emoticon corresponding to the second mark, to obtain fourth information. The method is therefore not limited by an emoticon library or a translation dictionary, can translate emoticons with high accuracy, and reduces the cost of constructing translation dictionaries, translation rules or translation models, and language models that cover emoticons. It can also solve the problem of recognizing, translating, and generating emoticons that are not registered in an emoticon dictionary.
Embodiment 2
An embodiment of the present invention provides a method for translating information. The method is executed by a terminal and may be implemented as part or all of the terminal by software, hardware, or a combination of the two; the terminal need not have any symbol database. The terminal may be a mobile terminal, a fixed terminal, a server, or the like. This embodiment applies to the scenario in which the user's input device does not have any emoticon database.
This embodiment is described using the example in which the first mark and the second mark are temporary variables of the emoticon. Referring to Fig. 2-1, the method includes:
Step 201: Obtain the emoticon contained in first information in source-language form;
Step 201 can be implemented by the following steps (1) to (2):
(1): The terminal obtains the first information in source-language form input by the user;
The user inputs the first information in source-language form to the terminal, and the terminal obtains it. The first information includes at least one sentence to be translated.
In this embodiment, the user may input the first information to the terminal manually, or by copying and pasting.
Manual input may be one or more of file input, voice input, keyboard input, touch input, handwriting input, and optical character recognition input.
The source language may be any language and is not specifically limited in this embodiment. For example, take Chinese as the source language and "Valentine's Day, giving you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!" as the first information; see Fig. 2-2. In this step, the terminal obtains the first information input by the user: "Valentine's Day, giving you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!".
(2): Detect whether the first information contains an emoticon, and if so, obtain the emoticon contained in the first information.
An emoticon may be a symbol with a particular meaning formed from one or more of letters, digits, punctuation, pinyin, kana, glyphs, categories, images, and the like.
Because emoticons and words have different encoding formats, detecting whether the first information contains an emoticon can be done as follows:
Determine whether the first information contains content whose encoding format is a preset encoding format. If it does, the first information contains an emoticon, and the content in the preset encoding format is the emoticon; if it does not, the first information contains no emoticon.
The preset encoding format may be the encoding format corresponding to a picture format or to a symbol string, among others.
For example, it is determined that the first information "Valentine's Day, giving you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!" contains emoticons, and the emoticons contained in the first information, "rose emoticon", "lollipop emoticon", and "titter emoticon", are obtained.
Further, the terminal has a real-time saving function. When the emoticons contained in the first information are obtained, the words and emoticons in the first information are stored according to a preset data structure; for example, the preset data structure may be a hash array "$hash[key] = $value", a linked list, or the like.
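One way to picture detection by preset encoding format is a pattern over code-point ranges plus a few symbol-string emoticons, with the matches stored in a hash keyed by position. The specific ranges, the ASCII emoticon pattern, and the function name are assumptions made for this sketch; the patent does not fix them.

```python
import re

# Assumed "preset encoding format": emoji code-point ranges, or a short
# symbol string such as :-) or :D.
PRESET_FORMAT = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]|:-?[)(DP]")


def find_emoticons(first: str) -> dict[int, str]:
    """Return a hash of character position -> emoticon found in `first`."""
    return {m.start(): m.group(0) for m in PRESET_FORMAT.finditer(first)}


found = find_emoticons("Hi :-) 🌹")
```

An empty result means the first information contains no emoticon and can be translated directly. Keying by position preserves the order in which the emoticons occur, which the later numbering step relies on.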
Step 202: Assign a temporary number to the emoticon;
A temporary number is assigned to the emoticon according to the order in which the emoticon is stored in the first information. After the temporary number is assigned, the emoticon and its temporary number are associated, that is, stored in the correspondence between emoticons and temporary numbers.
For example, the storage order of "rose emoticon", "lollipop emoticon", and "titter emoticon" in the first information is 1, 2, 3. The temporary numbers temp001, temp002, and temp003 are therefore assigned to "rose emoticon", "lollipop emoticon", and "titter emoticon" respectively, and the pairs "rose emoticon" and temp001, "lollipop emoticon" and temp002, and "titter emoticon" and temp003 are associated, that is, stored in the correspondence between emoticons and temporary numbers, as shown in Table 1 below:
Table 1
Emoticon      Temporary number
Rose          temp001
Lollipop      temp002
Titter        temp003
...           ...
Step 203: Replace the emoticon in the first information with the temporary number of the emoticon, to obtain fifth information;
For example, in the first information "Valentine's Day, giving you [rose emoticon] and [lollipop emoticon]! [titter emoticon]!", "rose emoticon" is replaced with its temporary number temp001, "lollipop emoticon" is replaced with its temporary number temp002, and "titter emoticon" is replaced with its temporary number temp003, giving the fifth information "Valentine's Day, giving you temp001 and temp002! temp003!".
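Steps 202 and 203 can be sketched together: number the emoticons in order of occurrence and substitute each number into the text. Real emoji stand in for the bracketed emoticon placeholders of the example, and the function name is hypothetical.

```python
def to_fifth_information(first: str, emoticons: list[str]) -> tuple[str, list[tuple[str, str]]]:
    """Assign temp001, temp002, ... in order and substitute them into the text."""
    table = []  # rows of (emoticon, temporary number), as in Table 1
    fifth = first
    for position, emoticon in enumerate(emoticons, start=1):
        number = f"temp{position:03d}"
        table.append((emoticon, number))
        # Replace only the first remaining occurrence so that repeated
        # emoticons each receive their own number.
        fifth = fifth.replace(emoticon, number, 1)
    return fifth, table


fifth, table = to_fifth_information(
    "Valentine's Day, giving you 🌹 and 🍭! 😏!", ["🌹", "🍭", "😏"]
)
```

The list `emoticons` is assumed to hold the emoticons in the order they occur in the first information.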
Step 204: Assign a temporary variable to the emoticon according to the position of the emoticon in the first information;
For example, according to the positions of "rose emoticon", "lollipop emoticon", and "titter emoticon" in the first information, temporary variable X is assigned to "rose emoticon", temporary variable Y to "lollipop emoticon", and temporary variable Z to "titter emoticon".
In this embodiment, the temporary variable may also be assigned according to the position and/or category of the emoticon in the first information; the form of the temporary variable is the same in every language.
If the first information contains many emoticons, a combination of a temporary variable and a digit can serve as the temporary variable of an emoticon, so that every emoticon in the first information is assigned a unique temporary variable. For example, if temporary variables are English letters, there are only 26 of them, so when the first information contains more than 26 emoticons, digits can be added and combinations of letters and digits used as temporary variables, such as X0, X1, X2, ..., Xn and Y1, Y2, ....
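A small generator of unique temporary variables along these lines is sketched below. The letter order (X, Y, Z first, then A to W) and the rule for attaching digits are choices made for this illustration rather than details from the text.

```python
from string import ascii_uppercase

# X, Y, Z first to match the example, then A..W; 26 letters in total.
LETTERS = "XYZ" + ascii_uppercase[:23]


def temporary_variables(count: int) -> list[str]:
    """Return `count` distinct variable names."""
    if count <= len(LETTERS):
        return list(LETTERS[:count])
    # More emoticons than letters: combine a letter with a digit suffix.
    return [f"{LETTERS[i % 26]}{i // 26}" for i in range(count)]
```

Single letters are convenient because most translation engines pass them through unchanged, but they can collide with letters already present in the text, so a production system would likely need a more distinctive token.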
Step 205: Associate the temporary variable of the emoticon with its temporary number;
The temporary variable and the temporary number of the emoticon are stored in the correspondence between temporary variables and temporary numbers, which associates the two.
For example, the temporary variable X and temporary number temp001 of "rose emoticon", the temporary variable Y and temporary number temp002 of "lollipop emoticon", and the temporary variable Z and temporary number temp003 of "titter emoticon" are stored in the correspondence between temporary variables and temporary numbers, as shown in Table 2 below:
Table 2
Temporary variable    Temporary number
X                     temp001
Y                     temp002
Z                     temp003
...                   ...
Step 206: In the fifth information, replace the interim number of the emoticon with the temporary variable associated with that interim number to obtain the second information;
According to the interim number of the emoticon, the temporary variable associated with it is obtained from the correspondence between interim numbers and temporary variables; in the fifth information, the interim number of the emoticon is then replaced with that temporary variable to obtain the second information.
For example, according to the interim number temp001 of the "rose emoticon", the interim number temp002 of the "lollipop emoticon", and the interim number temp003 of the "titter emoticon", the temporary variable X associated with temp001, the temporary variable Y associated with temp002, and the temporary variable Z associated with temp003 are obtained from Table 2; in the fifth information, temp001 is replaced with X, temp002 with Y, and temp003 with Z, giving the second information "Valentine's Day, give you X and Y! Z!".
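Steps 205 and 206 amount to a table lookup followed by a text substitution. The sketch below assumes the fifth information is a plain string and that Table 2 is held as a dictionary; the names `NUMBER_TO_VARIABLE` and `numbers_to_variables` are hypothetical.

```python
import re

# Table 2, read as a lookup from interim number to temporary variable.
NUMBER_TO_VARIABLE = {"temp001": "X", "temp002": "Y", "temp003": "Z"}


def numbers_to_variables(fifth_info, mapping=NUMBER_TO_VARIABLE):
    """Replace every known interim number in the text with its temporary variable."""
    # Longest keys first, so a key that is a prefix of another cannot shadow it.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], fifth_info)


second_info = numbers_to_variables(
    "Valentine's Day, give you temp001 and temp002! temp003!"
)
```

A single compiled pattern replaces all numbers in one pass, which avoids one replacement accidentally rewriting the output of another.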
Step 207: Translate the second information into the third information in the target language;
The target language can be any language and is not specifically limited in the embodiments of the present invention. Step 207 can be implemented through the following steps (1) to (2):
(1): Select at least one translation algorithm from the set of translation algorithms;
The set of translation algorithms includes a rule-based translation algorithm, an example-based translation algorithm, and a statistics-based translation algorithm.
Any one, any two, or all three translation algorithms can be selected from the set.
(2): Translate the second information into the third information in the target language using the selected translation algorithm;
For example, taking Japanese as the target language and the rule-based translation algorithm as the selected algorithm, the second information "Valentine's Day, give you X and Y! Z!" is translated by the rule-based translation algorithm into the third information in Japanese, "バレンタインデーは、あなたにXとYを与える!Z!". As another example, taking Japanese as the target language and the example-based translation algorithm as the selected algorithm, the second information "Valentine's Day, give you X and Y! Z!" is translated into the third information in Japanese, "バレンタインデーは、あなたに贈り、XとY!Z!".
When the second information is translated into the third information in the target language by the selected translation algorithm, lexical analysis and/or syntactic analysis can be performed on the second information according to the selected algorithm: lexical analysis only, syntactic analysis only, or lexical analysis followed by syntactic analysis. Lexical analysis and syntactic analysis are not specifically limited in the embodiments of the present invention.
The rule-based translation algorithm, the example-based translation algorithm, and the syntax-based translation algorithm require both lexical analysis and syntactic analysis of the second information, whereas some translation algorithms require only lexical analysis or only syntactic analysis of the second information.
Many tools are available for lexical and syntactic analysis. Lexical analysis tools include, for example, the Stanford POS Tagger (English, Chinese, Arabic), the ICTCLAS Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences, the THULAC word segmentation system of Tsinghua University, and ChaSen, MeCab, and JUMAN for Japanese. Syntactic analysis tools include, for example, the Stanford Parser (English, Chinese, Arabic), the Harbin Institute of Technology Chinese parser, and the Japanese parsers CaboCha and KNP. The lexical analysis tools and syntactic analysis tools are not specifically limited in the embodiments of the present invention.
For example, the THULAC word segmentation tool of Tsinghua University is used to segment the second information "Valentine's Day, give you X and Y! Z!", and the resulting segmentation is "Valentine's Day/t ,/w give/v you/r X/x and/c Y/x !/w Z/x !/w".
The parts of speech used in the segmentation result are shown in Table 3:
Table 3
Symbol | Part of speech | Symbol | Part of speech | Symbol | Part of speech
n | Noun | s | Location word | r | Pronoun
np | Personal name | v | Verb | c | Conjunction
ns | Place name | vm | Modal verb | p | Preposition
ni | Organization name | vd | Directional verb | u | Auxiliary word
nz | Other proper noun | a | Adjective | y | Modal particle
m | Numeral | d | Adverb | e | Interjection
q | Measure word | h | Prefix | o | Onomatopoeia
mq | Numeral-classifier compound | k | Suffix | g | Morpheme
t | Time word | i | Idiom | w | Punctuation
f | Locative noun | j | Abbreviation | x | Other
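A segmentation result in the "word/tag" form shown above can be read back into pairs, and the temporary variables recovered from the tokens tagged x. The sketch below is illustrative only: it parses a string in that output format and does not call THULAC itself, and the sample string joins "Valentine's Day" with an underscore so that it stays one token.

```python
def parse_tagged(result):
    """Split a 'word/tag word/tag ...' string into (word, tag) pairs."""
    pairs = []
    for token in result.split():
        word, _, tag = token.rpartition("/")  # split on the last slash only
        pairs.append((word, tag))
    return pairs


def placeholder_tokens(pairs):
    """Return the words tagged x, which is where the temporary variables appear."""
    return [word for word, tag in pairs if tag == "x"]


tagged = "Valentine's_Day/t ,/w give/v you/r X/x and/c Y/x !/w Z/x !/w"
pairs = parse_tagged(tagged)
variables = placeholder_tokens(pairs)
```

Splitting on the last slash keeps any slash inside the word itself intact.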
For example, the Harbin Institute of Technology Chinese parser is used to perform dependency parsing on the second information "Valentine's Day, give you X and Y! Z!", and the resulting dependency trees are shown in Fig. 2-3 and Fig. 2-4. Techniques for converting between Chinese dependency trees and phrase structure trees are relatively mature. Statistical machine translation methods based on phrase structure trees or on dependency structure trees include tree-to-string models, string-to-tree models, tree-to-tree models, forest-to-string models, and so on. The syntactic analysis results described here are likewise applicable to the syntax-based statistical machine translation methods and the example-based translation methods referred to in the present invention.
Step 208: Extract from the third information the temporary variables contained in it;
For example, the temporary variables X, Y, and Z contained in the third information are extracted from the third information "バレンタインデーは、あなたにXとYを与える!Z!" or "バレンタインデーは、あなたに贈り、XとY!Z!".
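One way to perform the extraction in step 208 is a regular expression built from the temporary variables that were assigned earlier. This is a sketch under the assumption that the variables are ASCII letters or letter-digit combinations; the function name is hypothetical, and the sample text is only the tail of the translated sentence.

```python
import re


def extract_variables(third_info, known_variables):
    """Return the known temporary variables found in the text, in order of appearance."""
    alternatives = "|".join(
        re.escape(v) for v in sorted(known_variables, key=len, reverse=True)
    )
    # ASCII-only lookarounds keep "X" from matching inside a longer token
    # such as "X1" or "FAX", while still matching next to Japanese text.
    pattern = re.compile(rf"(?<![A-Za-z0-9])(?:{alternatives})(?![A-Za-z0-9])")
    return pattern.findall(third_info)


found = extract_variables("XとY!Z!", {"X", "Y", "Z"})
```

Restricting the search to variables that were actually assigned avoids treating unrelated Latin letters in the translation as placeholders.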
Step 209: Obtain the interim number associated with each temporary variable;
According to the extracted temporary variable, the interim number associated with it is obtained from the correspondence between temporary variables and interim numbers.
For example, according to the temporary variables X, Y, and Z, the interim number temp001 of X, the interim number temp002 of Y, and the interim number temp003 of Z are obtained from the correspondence between temporary variables and interim numbers in Table 2.
Step 210: In the third information, replace each temporary variable with the interim number associated with it to obtain the sixth information;
For example, in the third information "バレンタインデーは、あなたにXとYを与える!Z!", the temporary variable X is replaced with its associated interim number temp001, Y with temp002, and Z with temp003, giving the sixth information "バレンタインデーは、あなたにtemp001とtemp002を与える!temp003!". As another example, in the third information "バレンタインデーは、あなたに贈り、XとY!Z!", X is replaced with temp001, Y with temp002, and Z with temp003, giving the sixth information "バレンタインデーは、あなたに贈り、temp001とtemp002!temp003!".
Step 211: Obtain the emoticon corresponding to each interim number;
According to the interim number, the emoticon corresponding to it is obtained from the correspondence between interim numbers and emoticons.
For example, according to the interim numbers temp001, temp002, and temp003, the "rose emoticon" corresponding to temp001, the "lollipop emoticon" corresponding to temp002, and the "titter emoticon" corresponding to temp003 are obtained from Table 1.
Step 212: In the sixth information, replace each interim number with the emoticon corresponding to it to obtain the fourth information.
For example, in the sixth information "バレンタインデーは、あなたにtemp001とtemp002を与える!temp003!", the interim number temp001 is replaced with the "rose emoticon", temp002 with the "lollipop emoticon", and temp003 with the "titter emoticon", giving the fourth information "バレンタインデーは、あなたに+rose emoticonと+lollipop emoticonを与える!+titter emoticon!", as shown in Fig. 2-5. As another example, in the sixth information "バレンタインデーは、あなたに贈り、temp001とtemp002!temp003!", temp001 is replaced with the "rose emoticon", temp002 with the "lollipop emoticon", and temp003 with the "titter emoticon", giving the fourth information "バレンタインデーは、あなたに贈り、+rose emoticonと+lollipop emoticon!+titter emoticon!", as shown in Fig. 2-6.
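Steps 209 to 212 reverse the earlier substitutions in two stages: temporary variable to interim number, then interim number to emoticon. The sketch below uses bracketed text such as "[rose]" as a stand-in for the actual emoticon images, and its dictionaries are hypothetical stand-ins for Table 2 and Table 1.

```python
import re

VARIABLE_TO_NUMBER = {"X": "temp001", "Y": "temp002", "Z": "temp003"}  # Table 2
NUMBER_TO_EMOTICON = {  # stand-in for Table 1; real entries would be images
    "temp001": "[rose]",
    "temp002": "[lollipop]",
    "temp003": "[titter]",
}


def substitute(text, mapping):
    """Replace whole ASCII tokens of the text that are keys of the mapping."""
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(?:" + "|".join(map(re.escape, keys)) + r")(?![A-Za-z0-9])"
    )
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def restore_emoticons(third_info):
    """Return (sixth information, fourth information) for a translated sentence."""
    sixth_info = substitute(third_info, VARIABLE_TO_NUMBER)   # step 210
    fourth_info = substitute(sixth_info, NUMBER_TO_EMOTICON)  # step 212
    return sixth_info, fourth_info


sixth, fourth = restore_emoticons("XとY!Z!")
```

The emoticons end up wherever the translation algorithm placed the variables, which is how the position on the target-language side is decided.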
As the fourth information "バレンタインデーは、あなたに+rose emoticonと+lollipop emoticonを与える!+titter emoticon!" and "バレンタインデーは、あなたに贈り、+rose emoticonと+lollipop emoticon!+titter emoticon!" shows, different translation algorithms produce different translation results, but this does not affect the effect of the present invention.
Further, after the interim numbers in the sixth information are replaced with the corresponding emoticons to obtain the fourth information, the fourth information is output. It can be output as one or more of text, images, voice, emoticons, and the like. The output mode of the fourth information is not specifically limited in the embodiments of the present invention.
In the embodiments of the present invention, the emoticon contained in the first information in the source language is obtained; the emoticon in the first information is replaced with a temporary variable that identifies it, giving the second information; the second information is translated into the third information in the target language; the temporary variable is extracted from the third information; and the temporary variable in the third information is replaced with the corresponding emoticon, giving the fourth information. The method is therefore not limited by an emoticon library or a translation dictionary, translates emoticons with high accuracy, and reduces the cost of constructing translation dictionaries, translation rules, translation models, language models, and the like that contain emoticons. It also solves the problems of recognizing, translating, and generating emoticons that are not registered in an emoticon dictionary, as well as the problems of determining and generating the structural position of an emoticon on the target-language side: the position of the emoticon on the target-language side is identified correctly, which preserves the sentence structure and semantic integrity of the translation result. In addition, the present invention is not limited by language and can solve the recognition, translation, and generation problems for emoticons in any language.
Embodiment 3
An embodiment of the present invention provides a method of translating information. The method is executed by a terminal and can be implemented as part or all of the terminal by software, hardware, or a combination of both. The terminal can be a mobile terminal, a fixed terminal, a server, or the like. This embodiment applies to situations in which the user's input device is provided with a given emoticon database, a synonym dictionary, and the like.
In this embodiment, the description takes as an example the case where the first mark is the word corresponding to the emoticon in the source language and the second mark is the word corresponding to the emoticon in the target language. Referring to Fig. 3-1, the method includes:
Step 301: Obtain the emoticon contained in the first information in the source language;
Step 301 can be implemented through the following steps (1) to (2):
(1): The terminal obtains the first information in the source language input by the user;
The user inputs the first information in the source language to the terminal, and the terminal obtains it. The first information contains at least one sentence to be translated.
In the embodiments of the present invention, the user can input the first information to the terminal manually or by copying and pasting.
Manual input can be one or more of file input, voice input, keyboard input, touch input, handwriting input, and optical character recognition input.
The source language can be any language and is not specifically limited in the embodiments of the present invention. For example, the description takes Chinese as the source language and "he feels very+happiness emoticon" as the first information, as shown in Fig. 3-2. In this step, the terminal obtains "he feels very+happiness emoticon" input by the user.
(2): Detect whether the first information contains an emoticon and, if so, obtain the emoticon contained in the first information.
An emoticon can be a symbol with a particular meaning formed from one or more of letters, digits, punctuation marks, pinyin, kana, glyphs, categories, images, and the like.
Because emoticons and words are encoded in different formats, whether the first information contains an emoticon can be detected as follows:
Determine whether the first information contains content whose encoding format is a preset encoding format. If it does, the first information contains an emoticon, and the content in the preset encoding format is the emoticon; if it does not, the first information contains no emoticon.
The preset encoding format can be the encoding format corresponding to a picture format, a symbol string, or the like.
For example, it is determined that the first information "he feels very+happiness emoticon" contains an emoticon, and the emoticon "happiness emoticon" contained in the first information is obtained.
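The detection rule above can be sketched by modelling the first information as a list of segments, each tagged with its encoding format. The `Segment` type, the format labels, and the function name are all assumptions made for this illustration; the source does not say how the terminal represents mixed text and image content.

```python
from dataclasses import dataclass

# Assumed labels for the preset encoding formats (picture format, symbol string).
PRESET_FORMATS = {"bmp", "symbol_string"}


@dataclass(frozen=True)
class Segment:
    encoding: str  # e.g. "text", "bmp", "symbol_string"
    content: str


def find_emoticons(first_info):
    """Return the segments whose encoding format is one of the preset formats."""
    return [seg for seg in first_info if seg.encoding in PRESET_FORMATS]


message = [Segment("text", "he feels very"), Segment("bmp", "happiness emoticon")]
emoticons = find_emoticons(message)
contains_emoticon = bool(emoticons)
```

An empty result means the first information contains no emoticon and can go to translation unchanged.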
Further, the terminal has a real-time save function: when the emoticon contained in the first information is obtained, the words and emoticons contained in the first information are stored in a preset data structure, for example a hash array "$hash[key]=$value", a linked list, or the like.
Further, after the emoticon contained in the first information is obtained, an interim number is assigned to it, and the emoticon in the first information is replaced with its interim number.
For example, the interim number temp001 is assigned to the "happiness emoticon"; in the first information "he feels very+happiness emoticon", the "happiness emoticon" is replaced with temp001, giving "he feels very temp001".
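Assigning interim numbers and storing them in a hash structure might look like the sketch below, with a Python dictionary playing the role of the hash array. The input representation (a list of flagged pairs) and the choice to reuse one number for a repeated emoticon are assumptions; the source does not specify either.

```python
def number_emoticons(segments):
    """Replace each emoticon in a message with an interim number.

    `segments` is a list of (is_emoticon, content) pairs. Returns the
    rewritten text and a dict from interim number to emoticon.
    """
    table = {}     # interim number -> emoticon (the "hash array")
    assigned = {}  # emoticon -> interim number, so repeats reuse a number
    parts = []
    for is_emoticon, content in segments:
        if is_emoticon:
            if content not in assigned:
                number = f"temp{len(assigned) + 1:03d}"
                assigned[content] = number
                table[number] = content
            parts.append(assigned[content])
        else:
            parts.append(content)
    return " ".join(parts), table


text, table = number_emoticons(
    [(False, "he feels very"), (True, "happiness emoticon")]
)
```

The returned table is what later steps consult to turn interim numbers back into emoticons.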
Step 302: Obtain the attribute information of the emoticon according to the emoticon;
The attribute information can be the word, semantics, category, part of speech, structure, concept, length, name, size, format, content, and/or pinyin of the emoticon. The emoticon is pattern-matched against the emoticons in the emoticon library to obtain its attribute information; the emoticon library stores the correspondence between emoticons and attribute information. The content of the emoticon library can include the library name, total emoticon data length, number of emoticons, most recently used emoticons, emoticon index, emoticon length, emoticon name, emoticon size, emoticon format, emoticon content, written form, semantics, category, part of speech, structure, concept, display position, and other information.
Step 302 can be implemented through the following steps (1) and (2):
(1): According to the icon data of the emoticon, obtain the index number of the emoticon from the correspondence between icon data and index numbers;
According to the emoticon, its icon data is obtained from the correspondence between emoticons and icon data; according to that icon data, the index number of the emoticon is obtained from the correspondence between icon data and index numbers.
In the embodiments of the present invention, the terminal stores in advance the correspondence between emoticons and icon data and the correspondence between icon data and index numbers; the former is shown in Table 4 below and the latter in Table 5 below:
Table 4
Icon data | Emoticon
010011000111……0100100 | Surprised
010011000111……0100101 | Happiness
010011000111……0100110 | Titter
010011000111……0100111 | Strong
010011000111……0101000 | Lollipop
010011000111……0111000 | Rose
Table 5
Index number | Icon data
X…X001 | 010011000111……0100100
X…X002 | 010011000111……0100101
X…X003 | 010011000111……0100110
X…X004 | 010011000111……0100111
X…X005 | 010011000111……0101000
X…X100 | 010011000111……0111000
For example, according to the "happiness emoticon", the icon data corresponding to the "happiness emoticon", 010011000111……0100101, is obtained from the correspondence between emoticons and icon data in Table 4; according to this icon data, the index number corresponding to the "happiness emoticon", X…X002, is obtained from the correspondence between icon data and index numbers in Table 5.
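The two lookups in step (1) chain naturally. The sketch below copies two rows of Table 4 and Table 5 as dictionaries, keeping the elided icon data and index numbers exactly as printed (so the strings are labels, not real bit patterns); the constant and function names are hypothetical.

```python
# Rows copied from Table 4 (emoticon -> icon data) with the elisions left as printed.
EMOTICON_TO_ICON = {
    "happiness": "010011000111……0100101",
    "titter": "010011000111……0100110",
}

# Rows copied from Table 5 (index number -> icon data).
TABLE_5 = {
    "X…X002": "010011000111……0100101",
    "X…X003": "010011000111……0100110",
}

# Step (1) needs the reverse direction: icon data -> index number.
ICON_TO_INDEX = {icon: index for index, icon in TABLE_5.items()}


def index_of(emoticon):
    """Look up emoticon -> icon data -> index number; None if either step fails."""
    icon = EMOTICON_TO_ICON.get(emoticon)
    if icon is None:
        return None
    return ICON_TO_INDEX.get(icon)
```

Returning `None` for an unknown emoticon leaves the caller free to fall back to the temporary-variable method of the previous embodiment.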
(2): According to the index number of the emoticon, obtain the attribute information of the emoticon from the correspondence between index numbers and attribute information for the source language.
According to the source language, the correspondence between index numbers and attribute information for the source language is obtained; according to the index number of the emoticon, the attribute information of the emoticon is obtained from that correspondence.
In the embodiments of the present invention, the terminal stores in advance the correspondence between index numbers and attribute information for each language. For example, the correspondence between index numbers and attribute information for Chinese is shown in Table 6 and Table 7 below:
Table 6
Length | Name | Emoticon size | Format | Content | Word | Position | Index number
100bytes | /jy | 16*16 | bmp | (⊙o⊙) | Surprised | 1 | X…X001
 | /gx | 16*16 | bmp | (* ^ ﹏ ^*) | Happy | 2 | X…X002
 | /tx | 16*16 | bmp | | Titter | 3 | X…X003
 | /qiang | 16*16 | bmp | | Strong | 4 | X…X004
 | /bangbangt | 16*16 | bmp | | Lollipop | 5 | X…X005
Table 7
Index number | Word | Name | Pinyin | Part of speech | Semantics
X…X001 | Surprised | /jy | jingya | adj | Emotion
X…X002 | Happy | /gx | gaoxing | adj | Emotion
X…X003 | Titter | /tx | touxiao | v | Behavior
X…X004 | Strong | /qiang | qiang | adj | Degree
X…X005 | Lollipop | /bangbangt | bangbangtang | n | Food
For example, according to the index number X…X002, the attribute information corresponding to the "happiness emoticon" is obtained from Table 6 and Table 7: the name is /gx, the emoticon size is 16*16, the format is bmp, the content is (* ^ ﹏ ^*), the word is happy, the position is 2, the pinyin is gaoxing, the part of speech is adj, the semantics is emotion, and so on. As another example, the "happiness emoticon" and the interim number temp001 form the array (temp001, happiness emoticon); the array is pattern-matched against the emoticons in the emoticon library, and the index number "X…X002" of the happiness emoticon is obtained from Table 4 and Table 5; pattern matching against Table 6 and Table 7 according to the index number "X…X002" then yields the attribute information of the "happiness emoticon", such as an emoticon length of 100bytes, the name Happy, an emoticon size of 16*16, the format bmp, the word happy, the pinyin gaoxing, the semantics emotion, and the part of speech adj (adjective).
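Collecting attribute information from the per-language tables can be sketched as a merge of the rows that share an index number. The nested-dictionary layout, the language key "zh", and the field names are assumptions made for the example; only the values come from Table 6 and Table 7.

```python
# Per-language attribute tables, keyed by index number (values from Tables 6 and 7).
ATTRIBUTES_BY_LANGUAGE = {
    "zh": {
        "table_6": {
            "X…X002": {"name": "/gx", "size": "16*16", "format": "bmp", "position": 2},
        },
        "table_7": {
            "X…X002": {"pinyin": "gaoxing", "part_of_speech": "adj", "semantics": "emotion"},
        },
    }
}


def attributes_of(index_number, language):
    """Merge every table row for this index number into one attribute dict."""
    tables = ATTRIBUTES_BY_LANGUAGE.get(language, {})
    merged = {}
    for table in tables.values():
        merged.update(table.get(index_number, {}))
    return merged


attrs = attributes_of("X…X002", "zh")
```

Keeping one set of tables per language mirrors the statement that the terminal stores a separate correspondence for each language.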
Step 303: Obtain at least one word corresponding to the emoticon according to the attribute information of the emoticon;
Step 303 can be implemented in a first way or a second way. In the first implementation, step 303 can be implemented through the following steps (1) to (3):
(1): Calculate the similarity between the attribute information of the emoticon and each attribute information in the semantic dictionary, the semantic dictionary being used to store the correspondence between attribute information and words;
The semantic dictionary can be a synonym or near-synonym dictionary of the source language, a translation dictionary between the source language and the target language, a translation model, a language model, or the like. The method of calculating similarity can be adjusted according to the lexical resource used.
The translation dictionary between the source language and the target language, the translation model, or the language model can be resources that the translation system already carries. For example, the bilingual translation dictionaries of the rule-based and example-based translation algorithms, and the translation model or language model of the statistics-based translation algorithm, can all be used to calculate the semantic similarity of words. Such techniques are relatively mature and are not described further here.
For Chinese, a Chinese thesaurus or HowNet (http://www.keenage.com/) can be used; for English synonyms and near-synonyms, WordNet (http://wordnet.princeton.edu/) can be used.
In addition, EuroWordNet (http://www.illc.uva.nl/EuroWordNet/) is a multilingual European semantic network dictionary that can be used to calculate semantic similarity for Dutch, Italian, Spanish, German, French, Czech, Estonian, and other languages.
The Indian semantic dictionary IndoWordNet (http://en.wikipedia.org/wiki/IndoWordNet) contains semantic networks for 18 official languages of India.
For Japanese, semantic similarity can be calculated using Japanese WordNet (http://nlpwww.nict.go.jp/wn-ja/), the Japanese lexicon Goi-Taikei (http://www.kecl.ntt.co.jp/icl/lirg/resources/GoiTaikei/), and the like.
The method of calculating semantic similarity differs slightly between languages and is not specifically limited here. For Chinese, a HowNet-based method can be used, for example: Liu Qun, Li Sujian. Word similarity computation based on HowNet [J]. Computational Linguistics and Chinese Language Processing, 2002, 7(2): 59-76.
For English, see for example: Pedersen T, Patwardhan S, Michelizzi J. WordNet::Similarity: Measuring the relatedness of concepts [C]. Demonstration Papers at HLT-NAACL 2004. Association for Computational Linguistics, 2004: 38-41.
Many existing semantic similarity techniques and methods are likewise available for other languages and are applicable to the present invention; they are not described further here.
(2): Obtain from the semantic dictionary at least one attribute information whose similarity to the attribute information of the emoticon meets a preset condition;
The preset condition can be that the similarity is greater than a preset threshold, or that the similarity is among a preset number of largest values. Step (2) can therefore be: obtain from the semantic dictionary the attribute information whose similarity to the attribute information of the emoticon is greater than the preset threshold; or obtain from the semantic dictionary the preset number of attribute information items with the greatest similarity to the attribute information of the emoticon.
The preset threshold and the preset number can be set as needed and are not specifically limited in the embodiments of the present invention.
(3): Obtain from the semantic dictionary the word corresponding to each of the at least one attribute information.
According to each of the at least one attribute information obtained, the word corresponding to it is obtained from the correspondence between attribute information and words in the semantic dictionary.
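Steps (1) to (3) can be sketched with any similarity measure. The example below uses Jaccard similarity over attribute pairs purely as a stand-in; it is not the HowNet or WordNet measure cited above, and the dictionary layout and function names are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute dicts, compared as sets of (key, value) pairs."""
    sa, sb = set(a.items()), set(b.items())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def candidate_words(attrs, dictionary, threshold=None, top_k=None):
    """Pick words whose attribute information is similar enough to `attrs`.

    `dictionary` is a list of (attribute_dict, word) entries. Pass `threshold`
    to keep entries scoring above it, or `top_k` to keep the best-scoring ones.
    """
    scored = sorted(
        ((jaccard(attrs, entry_attrs), word) for entry_attrs, word in dictionary),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [word for _, word in scored]


semantic_dictionary = [
    ({"part_of_speech": "adj", "semantics": "emotion"}, "happy"),
    ({"part_of_speech": "adj", "semantics": "degree"}, "strong"),
    ({"part_of_speech": "n", "semantics": "food"}, "lollipop"),
]
query = {"part_of_speech": "adj", "semantics": "emotion"}
above_threshold = candidate_words(query, semantic_dictionary, threshold=0.5)
best_two = candidate_words(query, semantic_dictionary, top_k=2)
```

The two keyword arguments correspond to the two forms of the preset condition: a similarity threshold, or a fixed number of best matches.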
In the second implementation, step 303 can be implemented through the following steps (A) and (B):
(A): Extract the word corresponding to the emoticon from the attribute information corresponding to the emoticon;
The attribute information includes a word, and the word corresponding to the emoticon is extracted from the attribute information corresponding to the emoticon.
For example, the word "happy" corresponding to the "happiness emoticon" is extracted from the attribute information corresponding to the "happiness emoticon".
(B): Obtain the synonyms or near-synonyms of the word corresponding to the emoticon, and use the obtained synonyms and near-synonyms as words corresponding to the emoticon.
For example, the synonyms and near-synonyms of "happy" include pleased, joyful, cheerful, delighted, elated, excited, satisfied, jubilant, glad, and so on. "Happy" and "pleased" are obtained from these synonyms and near-synonyms and are used as the words corresponding to the "happiness emoticon".
Step 304: Replace the emoticon in the first information with each of the at least one word in turn, obtaining the second information corresponding to each word;
For example, in the first information "he feels very+happiness emoticon", the "happiness emoticon" is replaced with "happy" and with "pleased" in turn; the second information corresponding to "happy" is "he feels very happy", and the second information corresponding to "pleased" is "he feels very pleased".
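Step 304 produces one candidate sentence per word. A minimal sketch, assuming the emoticon appears in the text as a recognizable marker (the marker "<happiness>" below is hypothetical):

```python
def expand_candidates(first_info, emoticon, words):
    """Return a dict mapping each candidate word to its second information."""
    return {word: first_info.replace(emoticon, word) for word in words}


candidates = expand_candidates(
    "he feels very <happiness>", "<happiness>", ["happy", "pleased"]
)
```

Each candidate sentence is then translated separately in step 305.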
Step 305: Translate the second information into the third information in the target language;
The target language can be any language and is not specifically limited in the embodiments of the present invention. Step 305 can be implemented through the following steps (1) to (2):
(1): Select at least one translation algorithm from the set of translation algorithms;
The set of translation algorithms includes a rule-based translation algorithm, an example-based translation algorithm, and a statistics-based translation algorithm.
Any one, any two, or all three translation algorithms can be selected from the set.
(2): Translate the second information into the third information in the target language using the selected translation algorithm;
For example, taking Japanese as the target language, the second information "he feels very happy" is translated into the third information in Japanese, "彼はとてもうれしい", where "うれしい" is a Japanese word meaning happy. As another example, the second information "he feels very pleased" is translated into the third information in Japanese, "彼はとても楽しい", where "楽しい" is a Japanese word meaning pleased.
When the second information is translated into the third information in the target language by the selected translation algorithm, lexical analysis and/or syntactic analysis can be performed on the second information according to the selected algorithm: lexical analysis only, syntactic analysis only, or lexical analysis followed by syntactic analysis. Lexical analysis and syntactic analysis are not specifically limited in the embodiments of the present invention.
The rule-based translation algorithm, the example-based translation algorithm, and the syntax-based translation algorithm require both lexical analysis and syntactic analysis of the second information, whereas some translation algorithms require only lexical analysis or only syntactic analysis of the second information.
Many tools are available for lexical and syntactic analysis. Lexical analysis tools include, for example, the Stanford POS Tagger (English, Chinese, Arabic), the ICTCLAS Chinese lexical analysis system of the Institute of Computing Technology, Chinese Academy of Sciences, the THULAC word segmentation system of Tsinghua University, and ChaSen, MeCab, and JUMAN for Japanese. Syntactic analysis tools include, for example, the Stanford Parser (English, Chinese, Arabic), the Harbin Institute of Technology Chinese parser, and the Japanese parsers CaboCha and KNP. The lexical analysis tools and syntactic analysis tools are not specifically limited in the embodiments of the present invention.
Step 306: Extract from the third information the word in the target language corresponding to the emoticon;
According to the word in the source language corresponding to the emoticon, the word in the target language corresponding to the emoticon is obtained and extracted from the third information.
Step 307: In the third information, replace the extracted word with the emoticon corresponding to it to obtain the fourth information.
Step 307 can be implemented in a first way or a second way. In the first implementation, step 307 can be implemented through the following steps (1) to (4):
(1): From the correspondence between attribute information and index numbers for the target language, obtain the correspondence entry that contains the word in the target language corresponding to the emoticon;
According to the target language, the correspondence between attribute information and index numbers for the target language is obtained; according to the word in the source language corresponding to the emoticon, the correspondence entry containing the word in the target language corresponding to the emoticon is obtained from that correspondence.
The correspondence between index numbers, icon data, and picture content for Japanese is shown in Table 8 below:
Table 8
Index number | Icon data | Picture content (remarks)
Y…Y001 | 010011000111……1000100 | Happiness emoticon in Japanese
Y…Y002 | 010011000111……1000101 | Happy emoticon in Japanese
Y…Y100 | 010011000111……1011010 | Tired emoticon in Japanese
Wherein, the attribute information of the emoticon in Japanese is as shown in table 9 below:
Table 9
For example, in the third information, "うれしい" and "楽しい" are identified as the objects for which icons are to be generated on the Japanese (target) side. According to "うれしい" and "楽しい", the call numbers of the corresponding icons are found from Table 8 to be "Y…Y001" and "Y…Y002" respectively. Substituting "Y…Y001" into "彼はとてもうれしい" gives "彼はとてもうれしい (Y…Y001)", and substituting "Y…Y002" into "彼はとても楽しい" gives "彼はとても楽しい (Y…Y002)".
(2): Extract the call number contained in the obtained correspondence entry.
The correspondence entry contains a call number, and this call number is extracted from the obtained entry.
(3): According to the call number, obtain the icon data of the emoticon from the correspondence between call numbers and icon data.
The correspondence between call numbers and icon data is stored in the terminal.
(4): In the third information, replace the word with the obtained icon data of the emoticon to obtain the fourth information.
For example, in the third information "彼はとてもうれしい", "うれしい" is replaced with the emoticon corresponding to "happiness", giving the fourth information "彼はとても + happiness emoticon", as shown in Fig. 3-3. As another example, in the third information "彼はとても楽しい", "楽しい" is replaced with the emoticon corresponding to "happy", giving the fourth information "彼はとても + happy emoticon", as shown in Fig. 3-4.
For example, inserting the "happiness emoticon" to the right of "彼はとてもうれしい" gives the fourth information "彼はとてもうれしい (happiness emoticon)", as shown in Fig. 3-5; inserting the "happy emoticon" to the right of "彼はとても楽しい" gives the fourth information "彼はとても楽しい (happy emoticon)", as shown in Fig. 3-6.
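Steps (1) to (4) of the first implementation can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the table names `TARGET_WORD_TO_INDEX` and `INDEX_TO_ICON`, the function `replace_word_with_icon`, the Japanese words うれしい and 楽しい as table keys, and the use of Unicode emoji in place of binary icon data are all hypothetical. The call numbers are copied from Table 8 with their elisions left as they are.

```python
# Hypothetical lookup tables; the text only says such correspondences are
# stored on the terminal. Emoji stand in for the binary icon data.
TARGET_WORD_TO_INDEX = {   # target-language word -> call number
    "うれしい": "Y…Y001",
    "楽しい": "Y…Y002",
}
INDEX_TO_ICON = {          # call number -> icon data
    "Y…Y001": "😊",
    "Y…Y002": "😄",
}


def replace_word_with_icon(third_info: str, word: str, mode: str = "replace") -> str:
    """Build the fourth information for one extracted target-language word."""
    index = TARGET_WORD_TO_INDEX[word]   # steps (1)-(2): find the call number
    icon = INDEX_TO_ICON[index]          # step (3): find the icon data
    if mode == "replace":                # step (4): substitute the word
        return third_info.replace(word, icon, 1)
    if mode == "insert_right":           # variant: keep the word, append the icon
        return third_info.replace(word, word + icon, 1)
    raise ValueError(f"unknown mode: {mode}")
```

Calling `replace_word_with_icon("彼はとてもうれしい", "うれしい")` substitutes the word, while `mode="insert_right"` keeps the word and places the icon after it, corresponding to the two kinds of example given above.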
In the second implementation, step 307 comprises the following steps (A) to (E):
(A): Obtain the source-language word corresponding to the emoticon.
According to the emoticon, the source-language word corresponding to the emoticon is obtained. For example, according to the "happiness emoticon", the corresponding source-language word "happiness" is obtained.
(B): From the correspondence between attribute information and call numbers for the source language, obtain the correspondence entry that contains the source-language word of the emoticon.
According to the source language, the correspondence between attribute information and call numbers for the source language is obtained, and the entry containing the source-language word of the emoticon is obtained from that correspondence.
(C): Extract the call number contained in the obtained correspondence entry.
The correspondence entry contains a call number, and this call number is extracted from the obtained entry.
(D): According to the call number, obtain the icon data of the emoticon from the correspondence between call numbers and icon data.
(E): In the third information, replace the word with the icon data of the emoticon to obtain the fourth information.
In step (E), the icon data of the emoticon may alternatively be inserted to the left or right of the word in the third information.
It should be noted that the difference between the two implementations is that the emoticon in the first implementation is an emoticon of the target-language form, whereas the emoticon in the second implementation is an emoticon of the source-language form.
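The second implementation can be sketched in the same way. The sketch below is a hypothetical illustration: it assumes a Chinese source language, the table names `SOURCE_WORD_TO_INDEX` and `INDEX_TO_ICON`, the call number `"S001"`, the function `restore_with_source_emoticon`, and an emoji standing in for icon data, none of which come from the text. Step (A) is represented by passing in the source-language word directly.

```python
# Hypothetical tables for an assumed Chinese source language.
SOURCE_WORD_TO_INDEX = {"高兴": "S001"}   # source-language word -> call number
INDEX_TO_ICON = {"S001": "😊"}            # call number -> icon data (emoji stand-in)


def restore_with_source_emoticon(third_info: str, target_word: str,
                                 source_word: str, placement: str = "replace") -> str:
    """Swap the translated word for the emoticon found via the source-language word."""
    index = SOURCE_WORD_TO_INDEX[source_word]   # steps (B) and (C)
    icon = INDEX_TO_ICON[index]                 # step (D)
    if placement == "replace":                  # step (E)
        return third_info.replace(target_word, icon, 1)
    if placement == "left":                     # insert to the left of the word
        return third_info.replace(target_word, icon + target_word, 1)
    if placement == "right":                    # insert to the right of the word
        return third_info.replace(target_word, target_word + icon, 1)
    raise ValueError(f"unknown placement: {placement}")
```

Because the lookup goes through the source-language table, the icon that is restored is the one the sender originally used, which matches the distinction drawn between the two implementations.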
Further, after the extracted word has been replaced in the third information with the corresponding emoticon to obtain the fourth information, the fourth information is output. The fourth information may be output in one or more of the following forms: text, image, voice, emoticon, and the like.
A first effect of the present invention is that it is not limited by an emoticon library or a translation dictionary, can effectively achieve high-accuracy translation of emoticons, and reduces the cost of constructing translation dictionaries containing emoticons, translation rules or translation models, language models, and the like.
A second effect of the present invention is that it can effectively solve the problems of recognizing, translating and generating emoticons that are not registered in the emoticon dictionary.
A third effect of the present invention is that it can effectively solve the problem of determining and generating the structural position of the emoticon on the target-language side. On the target-language side, the position of the emoticon is correctly identified, which preserves the sentence structure and semantic integrity of the translation result.
A fourth effect of the present invention is that it is not limited by language, and can effectively solve the problems of recognizing, translating and generating emoticons in any language.
Embodiment 4
An embodiment of the present invention provides a device for translating information. Referring to Fig. 4, the device comprises:
a first acquisition module 401, configured to obtain the emoticon contained in first information of the source-language form;
a first replacement module 402, configured to replace, in the first information, the emoticon with a first mark used to identify the emoticon, to obtain second information;
a translation module 403, configured to translate the second information into third information of the target-language form;
a first extraction module 404, configured to extract, from the third information, a second mark corresponding to the first mark;
a second replacement module 405, configured to replace, in the third information, the second mark with the emoticon corresponding to the second mark, to obtain fourth information.
Further, the first mark is a temporary variable, and the format of the temporary variable is the same in every language form;
the first replacement module 402 comprises:
a first allocation unit, configured to allocate an interim numbering to the emoticon;
a first replacement unit, configured to replace the emoticon in the first information with the interim numbering of the emoticon, to obtain fifth information;
a second allocation unit, configured to allocate a temporary variable to the emoticon according to the position of the emoticon in the first information;
an association unit, configured to associate the temporary variable of the emoticon with the interim numbering;
a second replacement unit, configured to replace, in the fifth information, the interim numbering of the emoticon with the temporary variable associated with that interim numbering, to obtain the second information.
Further, the second mark is a temporary variable, and the first extraction module 404 comprises:
a first extraction unit, configured to extract, from the third information, the temporary variable contained in the third information;
correspondingly, the second replacement module 405 comprises:
a first acquiring unit, configured to obtain the interim numbering associated with the temporary variable;
a third replacement unit, configured to replace, in the third information, the temporary variable with the interim numbering associated with the temporary variable, to obtain sixth information;
a second acquiring unit, configured to obtain the emoticon corresponding to the interim numbering;
a fourth replacement unit, configured to replace, in the sixth information, the interim numbering with the emoticon corresponding to the interim numbering, to obtain the fourth information.
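The temporary-variable route through the first replacement module 402 and the second replacement module 405 can be sketched as a round trip. The following Python is a minimal illustration under assumptions: the function names `mask_emoticons` and `unmask_emoticons`, the `#n#` form of the interim numbering and the `__EMOn__` form of the temporary variable are hypothetical, and the sketch assumes the input text does not already contain strings of those forms.

```python
import re
from typing import Dict, List, Tuple


def mask_emoticons(first_info: str, emoticons: List[str]) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """First information -> fifth information -> second information."""
    number_to_emoticon: Dict[str, str] = {}
    variable_to_number: Dict[str, str] = {}
    # Longest first, so a long emoticon is not split by a shorter one.
    alternatives = sorted((e for e in emoticons if e), key=len, reverse=True)
    if not alternatives:
        return first_info, variable_to_number, number_to_emoticon
    pattern = re.compile("|".join(re.escape(e) for e in alternatives))

    def assign_number(match):
        # Allocate an interim numbering to each emoticon occurrence.
        number = f"#{len(number_to_emoticon) + 1}#"
        number_to_emoticon[number] = match.group(0)
        return number

    fifth_info = pattern.sub(assign_number, first_info)
    second_info = fifth_info
    # Numbers were allocated left to right, so enumeration follows position.
    for position, number in enumerate(number_to_emoticon, start=1):
        variable = f"__EMO{position}__"
        variable_to_number[variable] = number     # associate variable and numbering
        second_info = second_info.replace(number, variable)
    return second_info, variable_to_number, number_to_emoticon


def unmask_emoticons(third_info: str, variable_to_number: Dict[str, str],
                     number_to_emoticon: Dict[str, str]) -> str:
    """Third information -> sixth information -> fourth information."""
    sixth_info = third_info
    for variable, number in variable_to_number.items():
        sixth_info = sixth_info.replace(variable, number)
    fourth_info = sixth_info
    for number, emoticon in number_to_emoticon.items():
        fourth_info = fourth_info.replace(number, emoticon)
    return fourth_info
```

Because the temporary variable has the same format in every language, a translator that passes it through unchanged can still reorder it within the sentence, and the emoticon is restored at whatever position the variable ends up in.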
Further, the first mark is a word corresponding to the emoticon, and the language form of the word is the source-language form;
the first replacement module 402 comprises:
a third acquiring unit, configured to obtain the attribute information of the emoticon according to the emoticon;
a fourth acquiring unit, configured to obtain at least one word corresponding to the emoticon according to the attribute information of the emoticon;
a fifth replacement unit, configured to replace the emoticon in the first information with each word of the at least one word respectively, to obtain second information corresponding to each word.
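When the first mark is a word, the fifth replacement unit produces one candidate second information per word. A minimal Python sketch, in which the function name `second_infos_per_word` and the sample sentence are hypothetical:

```python
from typing import Dict, List


def second_infos_per_word(first_info: str, emoticon: str, words: List[str]) -> Dict[str, str]:
    """Return one second information for each candidate word of the emoticon."""
    return {word: first_info.replace(emoticon, word) for word in words}
```

Each candidate can then be translated separately, so that the later steps can work with the target-language word produced for each candidate.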
Further, the third acquiring unit comprises:
a first obtaining subunit, configured to obtain, according to the icon data of the emoticon, the call number of the emoticon from the correspondence between icon data and call numbers;
a second obtaining subunit, configured to obtain, according to the call number of the emoticon, the attribute information of the emoticon from the correspondence between call numbers and attribute information for the source language.
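The two lookups performed by the third acquiring unit can be sketched as a chain of dictionary accesses. Everything named below is an assumption for illustration: the tables `ICON_TO_INDEX` and `INDEX_TO_ATTR_BY_LANG`, the call number `"S001"`, the one-byte icon data, the language code `"en"` and the shape of the attribute record.

```python
from typing import Dict, Optional

# Hypothetical tables; real icon data would be a full bitmap, not one byte.
ICON_TO_INDEX: Dict[bytes, str] = {bytes([0b01001100]): "S001"}
INDEX_TO_ATTR_BY_LANG: Dict[str, Dict[str, dict]] = {
    "en": {"S001": {"word": "glad", "kind": "facial expression"}},
}


def attribute_info(icon_data: bytes, source_lang: str) -> Optional[dict]:
    """Icon data -> call number -> attribute information for the source language."""
    index = ICON_TO_INDEX.get(icon_data)          # first obtaining subunit
    if index is None:
        return None                               # icon not registered
    # Second obtaining subunit: per-language attribute table.
    return INDEX_TO_ATTR_BY_LANG.get(source_lang, {}).get(index)
```

Keeping the icon table language-independent and the attribute table per language means a new language only needs a new attribute table.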
Further, the fourth acquiring unit comprises:
a computation subunit, configured to calculate the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary being used to store the correspondence between attribute information and words;
a third obtaining subunit, configured to obtain, from the semantic dictionary, at least one piece of attribute information whose similarity to the attribute information of the emoticon satisfies a preset condition;
a fourth obtaining subunit, configured to obtain, from the semantic dictionary, the word corresponding to each piece of attribute information in the at least one piece of attribute information.
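The similarity-based selection can be sketched as follows. The text does not specify the similarity measure or the preset condition, so this sketch assumes that attribute information is a set of descriptive terms, uses Jaccard similarity, and treats a threshold as the preset condition; `SEMANTIC_DICTIONARY`, `jaccard` and `words_for` are hypothetical names with made-up entries.

```python
from typing import List, Set, Tuple

# Hypothetical semantic dictionary: (attribute information, word) pairs.
SEMANTIC_DICTIONARY: List[Tuple[Set[str], str]] = [
    ({"smile", "positive", "joy"}, "happy"),
    ({"smile", "positive", "calm"}, "pleased"),
    ({"tears", "negative"}, "sad"),
]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed similarity measure: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def words_for(emoticon_attr: Set[str], threshold: float = 0.5) -> List[str]:
    """Words whose attribute information is similar enough to the emoticon's."""
    return [
        word
        for entry_attr, word in SEMANTIC_DICTIONARY
        if jaccard(emoticon_attr, entry_attr) >= threshold   # assumed preset condition
    ]
```

With the sample entries, an emoticon described by `{"smile", "positive", "joy"}` matches "happy" exactly and "pleased" at a similarity of 0.5, so raising the threshold narrows the candidate list.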
Further, the fourth acquiring unit comprises:
an extraction subunit, configured to extract the word corresponding to the emoticon from the attribute information corresponding to the emoticon;
a fifth obtaining subunit, configured to obtain the synonyms or near-synonyms of the word corresponding to the emoticon, and to use the synonyms and near-synonyms as words corresponding to the emoticon.
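The synonym-based alternative can be sketched with a small lookup. The thesaurus `SYNONYMS`, the attribute record with a `"word"` field and the function `candidate_words` are hypothetical; keeping the extracted word itself at the head of the list is also an assumption of this sketch.

```python
from typing import Dict, List

# Hypothetical thesaurus of synonyms and near-synonyms.
SYNONYMS: Dict[str, List[str]] = {"happy": ["glad", "joyful"]}


def candidate_words(attr: dict) -> List[str]:
    """Extract the emoticon's word and extend it with synonyms and near-synonyms."""
    word = attr["word"]                      # extraction subunit
    return [word] + SYNONYMS.get(word, [])   # fifth obtaining subunit
```

A word with no thesaurus entry simply yields a single candidate.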
Further, the first extraction module 404 comprises:
a second extraction unit, configured to extract, from the third information, the target-language word corresponding to the first mark, and to use the extracted word as the second mark;
correspondingly, the second replacement module 405 comprises:
a fifth acquiring unit, configured to obtain, from the correspondence between attribute information and call numbers for the target language, the correspondence entry containing the second mark;
a third extraction unit, configured to extract the call number contained in the obtained correspondence entry;
a sixth acquiring unit, configured to obtain, according to the call number, the icon data of the emoticon from the correspondence between call numbers and icon data;
a sixth replacement unit, configured to replace, in the third information, the second mark with the obtained icon data of the emoticon, to obtain the fourth information.
Further, the first extraction module 404 comprises:
a third extraction unit, configured to extract, from the third information, the target-language word corresponding to the first mark, and to use the extracted word as the second mark;
correspondingly, the second replacement module 405 comprises:
a seventh acquiring unit, configured to obtain the first mark corresponding to the second mark;
an eighth acquiring unit, configured to obtain, from the correspondence between attribute information and call numbers for the source language, the correspondence entry containing the first mark;
a fourth extraction unit, configured to extract the call number contained in the obtained correspondence entry;
a ninth acquiring unit, configured to obtain, according to the call number, the icon data of the emoticon from the correspondence between call numbers and icon data;
a seventh replacement unit, configured to replace, in the third information, the second mark with the icon data of the emoticon, to obtain the fourth information.
A first effect of the present invention is that it is not limited by an emoticon library or a translation dictionary, can effectively achieve high-accuracy translation of emoticons, and reduces the cost of constructing translation dictionaries containing emoticons, translation rules or translation models, language models, and the like.
A second effect of the present invention is that it can effectively solve the problems of recognizing, translating and generating emoticons that are not registered in the emoticon dictionary.
A third effect of the present invention is that it can effectively solve the problem of determining and generating the structural position of the emoticon on the target-language side. On the target-language side, the position of the emoticon is correctly identified, which preserves the sentence structure and semantic integrity of the translation result.
A fourth effect of the present invention is that it is not limited by language, and can effectively solve the problems of recognizing, translating and generating emoticons in any language.
It should be noted that when the device for translating information provided in the above embodiment translates information, the division into the above functional modules is described only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the device for translating information and the method for translating information provided by the above embodiments belong to the same concept; for the specific implementation of the device, refer to the method embodiments, which are not repeated here.
It should be added that the method and device for translating information of the present invention are not proposed for two particular languages. The method of the invention is generally applicable, and the present invention applies equally to other language pairs.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

  1. A method of translating information, characterized in that the method comprises:
    obtaining an emoticon contained in first information of a source-language form;
    replacing, in the first information, the emoticon with a first mark used to identify the emoticon, to obtain second information, the first mark being a temporary variable or a word corresponding to the emoticon;
    translating the second information into third information of a target-language form;
    extracting, from the third information, a second mark corresponding to the first mark;
    replacing, in the third information, the second mark with the emoticon corresponding to the second mark, to obtain fourth information;
    wherein, when the first mark is a temporary variable, the format of the temporary variable is the same in every language form, and the replacing, in the first information, the emoticon with a first mark used to identify the emoticon, to obtain second information, comprises:
    allocating an interim numbering to the emoticon; replacing the emoticon in the first information with the interim numbering of the emoticon to obtain fifth information; allocating a temporary variable to the emoticon according to the position of the emoticon in the first information; associating the temporary variable of the emoticon with the interim numbering; and replacing, in the fifth information, the interim numbering of the emoticon with the temporary variable associated with the interim numbering of the emoticon, to obtain the second information;
    and wherein, when the first mark is a word corresponding to the emoticon, the language form of the word is the source-language form, and the replacing, in the first information, the emoticon with a first mark used to identify the emoticon, to obtain second information, comprises:
    obtaining attribute information of the emoticon according to the emoticon; obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon; and replacing the emoticon in the first information with each word of the at least one word respectively, to obtain second information corresponding to each word.
  2. The method according to claim 1, characterized in that the second mark is a temporary variable, and the extracting, from the third information, a second mark corresponding to the first mark comprises:
    extracting, from the third information, the temporary variable contained in the third information;
    correspondingly, the replacing, in the third information, the second mark with the emoticon corresponding to the second mark, to obtain fourth information, comprises:
    obtaining the interim numbering associated with the temporary variable;
    replacing, in the third information, the temporary variable with the interim numbering associated with the temporary variable, to obtain sixth information;
    obtaining the emoticon corresponding to the interim numbering;
    replacing, in the sixth information, the interim numbering with the emoticon corresponding to the interim numbering, to obtain the fourth information.
  3. The method according to claim 1, characterized in that the obtaining attribute information of the emoticon according to the emoticon comprises:
    obtaining, according to the icon data of the emoticon, the call number of the emoticon from the correspondence between icon data and call numbers;
    obtaining, according to the call number of the emoticon, the attribute information of the emoticon from the correspondence between call numbers and attribute information for the source language.
  4. The method according to claim 1, characterized in that the obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
    calculating the similarity between the attribute information of the emoticon and each piece of attribute information in a semantic dictionary, the semantic dictionary being used to store the correspondence between attribute information and words;
    obtaining, from the semantic dictionary, at least one piece of attribute information whose similarity to the attribute information of the emoticon satisfies a preset condition;
    obtaining, from the semantic dictionary, the word corresponding to each piece of attribute information in the at least one piece of attribute information.
  5. The method according to claim 1, characterized in that the obtaining at least one word corresponding to the emoticon according to the attribute information of the emoticon comprises:
    extracting the word corresponding to the emoticon from the attribute information corresponding to the emoticon;
    obtaining the synonyms or near-synonyms of the word corresponding to the emoticon, and using the synonyms and near-synonyms as words corresponding to the emoticon.
  6. The method according to claim 1, characterized in that the extracting, from the third information, a second mark corresponding to the first mark comprises:
    extracting, from the third information, the target-language word corresponding to the first mark, and using the extracted word as the second mark;
    correspondingly, the replacing, in the third information, the second mark with the emoticon corresponding to the second mark, to obtain fourth information, comprises:
    obtaining, from the correspondence between attribute information and call numbers for the target language, the correspondence entry containing the second mark;
    extracting the call number contained in the obtained correspondence entry;
    obtaining, according to the call number, the icon data of the emoticon from the correspondence between call numbers and icon data;
    replacing, in the third information, the second mark with the obtained icon data of the emoticon, to obtain the fourth information.
  7. The method according to claim 1, characterized in that the extracting, from the third information, a second mark corresponding to the first mark comprises:
    extracting, from the third information, the target-language word corresponding to the first mark, and using the extracted word as the second mark;
    correspondingly, the replacing, in the third information, the second mark with the emoticon corresponding to the second mark, to obtain fourth information, comprises:
    obtaining the first mark corresponding to the second mark;
    obtaining, from the correspondence between attribute information and call numbers for the source language, the correspondence entry containing the first mark;
    extracting the call number contained in the obtained correspondence entry;
    obtaining, according to the call number, the icon data of the emoticon from the correspondence between call numbers and icon data;
    replacing, in the third information, the second mark with the icon data of the emoticon, to obtain the fourth information.
  8. A device for translating information, characterized in that the device comprises:
    a first acquisition module, configured to obtain an emoticon contained in first information of a source-language form;
    a first replacement module, configured to replace, in the first information, the emoticon with a first mark used to identify the emoticon, to obtain second information, the first mark being a temporary variable or a word corresponding to the emoticon;
    a translation module, configured to translate the second information into third information of a target-language form;
    a first extraction module, configured to extract, from the third information, a second mark corresponding to the first mark;
    a second replacement module, configured to replace, in the third information, the second mark with the emoticon corresponding to the second mark, to obtain fourth information;
    wherein, when the first mark is a temporary variable, the format of the temporary variable is the same in every language form, and the first replacement module is further configured to: allocate an interim numbering to the emoticon; replace the emoticon in the first information with the interim numbering of the emoticon to obtain fifth information; allocate a temporary variable to the emoticon according to the position of the emoticon in the first information; associate the temporary variable of the emoticon with the interim numbering; and replace, in the fifth information, the interim numbering of the emoticon with the temporary variable associated with the interim numbering of the emoticon, to obtain the second information;
    and wherein, when the first mark is a word corresponding to the emoticon, the language form of the word is the source-language form, and the first replacement module is further configured to: obtain attribute information of the emoticon according to the emoticon; obtain at least one word corresponding to the emoticon according to the attribute information of the emoticon; and replace the emoticon in the first information with each word of the at least one word respectively, to obtain second information corresponding to each word.
CN201510119654.0A 2015-03-18 2015-03-18 The method and apparatus of translation information Active CN104699675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510119654.0A CN104699675B (en) 2015-03-18 2015-03-18 The method and apparatus of translation information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510119654.0A CN104699675B (en) 2015-03-18 2015-03-18 The method and apparatus of translation information

Publications (2)

Publication Number Publication Date
CN104699675A CN104699675A (en) 2015-06-10
CN104699675B true CN104699675B (en) 2018-01-30

Family

ID=53346814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510119654.0A Active CN104699675B (en) 2015-03-18 2015-03-18 The method and apparatus of translation information

Country Status (1)

Country Link
CN (1) CN104699675B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928236B2 (en) * 2015-09-18 2018-03-27 Mcafee, Llc Systems and methods for multi-path language translation
CN106708810A (en) * 2016-12-19 2017-05-24 新译信息科技(深圳)有限公司 Machine translation method, device and terminal device
CN110688840B (en) * 2019-09-26 2022-07-26 联想(北京)有限公司 Text conversion method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1655231A (en) * 2004-02-10 2005-08-17 乐金电子(中国)研究开发中心有限公司 Expression figure explanation treatment method for text and voice transfer system
US6963839B1 (en) * 2000-11-03 2005-11-08 At&T Corp. System and method of controlling sound in a multi-media communication application
CN101030368A (en) * 2006-03-03 2007-09-05 国际商业机器公司 Method and system for communicating across channels simultaneously with emotion preservation
CN101937431A (en) * 2010-08-18 2011-01-05 华南理工大学 Emotional voice translation device and processing method

Also Published As

Publication number Publication date
CN104699675A (en) 2015-06-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant