CN104537066A

CN104537066A - Near-synonym correlation method based on multi-language translation

Info

Publication number: CN104537066A
Application number: CN201410839087.1A
Authority: CN
Inventors: Chen Lijie (陈立杰); Li Zhiguang (李之光)
Original assignee: ZHENGZHOU ZONEYET TECHNOLOGY Co Ltd
Current assignee: ZHENGZHOU ZONEYET TECHNOLOGY Co Ltd
Priority date: 2014-12-30
Filing date: 2014-12-30
Publication date: 2015-04-22
Anticipated expiration: 2034-12-30
Also published as: CN104537066B

Abstract

The invention relates to a near-synonym correlation method based on multilingual translation. Each element record is given a unique identifier. If phrases in different languages have the same meaning, they are given a unified tag. If phrases in the same language have the same meaning, they are given a near-synonym tag. The unified tags and near-synonym tags in each element record are encrypted. When multiple element records in different languages share the same unified tag or near-synonym tag, this indicates that they are cross-language near-synonyms or synonyms. As a result, the details of near-synonyms or synonyms of an input keyword can be retrieved conveniently, and cross-language synonyms and near-synonyms can be matched. This improves the efficiency of information retrieval and language translation.

Description

Near-synonym correlation method based on multilingual translation

Technical field

The present invention relates to the fields of network data retrieval and semantic analysis. In particular, it relates to a near-synonym correlation method based on multilingual translation.

Background

In everyday retrieval, a search normally returns only information that contains the search keyword. Retrieving information that contains synonyms or near-synonyms of the keyword is usually very difficult, and doing so in a multilingual environment is almost impossible. Yet both professional multilingual translation and everyday cross-language information retrieval need to solve this problem.

Summary of the invention

The prior art struggles with multilingual translation and with retrieving synonyms across languages. To address this, the invention provides a near-synonym correlation method based on multilingual translation. The method is applicable to information retrieval, search engines, language translation and the like. It solves the difficulty of translating between multiple languages and of associating near-synonyms and synonyms across languages.

According to the design provided by the present invention, a near-synonym correlation method based on multilingual translation comprises the following steps:

Step 1: convert the information to be translated into source text information;

Step 2: according to its language, store the converted source text information in a text processing unit, which performs word and sentence segmentation on it. The source text information and the segmented information together form an element record. The record is stored in a result set and given an identity tag and a time tag. If phrases in different languages are judged to have the same meaning, they are given a unified tag. If phrases in the same language are judged to have the same meaning, they are given a near-synonym tag. The unified tag and near-synonym tag in the element record are encrypted. Element records that have the same near-synonym tag or unified tag are given the same parent tag;
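The record layout and tagging rules of Step 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation:

- The field names, the `protect_tag` helper and `TAG_KEY` are hypothetical.
- The text does not say how tags are encrypted, so a keyed one-way hash (HMAC-SHA256) is swapped in. This is not encryption, but it still lets equal tags be matched.
- Records linked by a shared tag, directly or through another record, are assumed to receive one parent tag.

```python
import hashlib
import hmac
import time
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical key. The source says tags are "encrypted" but names no scheme;
# a keyed one-way hash is substituted, which still supports equality matching.
TAG_KEY = b"example-secret-key"


def protect_tag(tag: str) -> str:
    """Keyed hash (HMAC-SHA256) of a plaintext tag; equal tags give equal digests."""
    return hmac.new(TAG_KEY, tag.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass
class ElementRecord:
    source_text: str
    language: str
    segments: List[str]  # output of word and sentence segmentation
    identity: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique identity tag
    time_tag: float = field(default_factory=time.time)  # time the record was created
    unified_tag: Optional[str] = None  # same meaning across languages
    near_synonym_tag: Optional[str] = None  # same meaning within one language
    parent_tag: Optional[str] = None


def assign_unified_tag(records: List[ElementRecord], tag: str) -> None:
    for r in records:
        r.unified_tag = protect_tag(tag)


def assign_near_synonym_tag(records: List[ElementRecord], tag: str) -> None:
    if len({r.language for r in records}) != 1:
        raise ValueError("near-synonym tags link phrases of a single language")
    for r in records:
        r.near_synonym_tag = protect_tag(tag)


def assign_parent_tags(records: List[ElementRecord]) -> None:
    """Records linked (directly or transitively) by a shared tag get one parent tag.

    Union-find over the protected tag values. Treating the link as transitive
    is an assumption, chosen to match how Tag3 joins Tag1 and Tag2 in Fig. 4.
    """
    root = {}

    def find(x: str) -> str:
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    def tags_of(r: ElementRecord) -> List[str]:
        return [t for t in (r.unified_tag, r.near_synonym_tag) if t]

    for r in records:
        tags = tags_of(r)
        for t in tags[1:]:
            root[find(t)] = find(tags[0])  # merge this record's tag groups
    for r in records:
        tags = tags_of(r)
        if tags:
            r.parent_tag = "parent-" + find(tags[0])[:12]
```

Because the hash is keyed, protected tags are only comparable between records produced with the same key. In practice, that means within a single association database.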

Step 3: search the information association database using the keywords in the element record to check whether a corresponding record exists. If one exists, the association is complete; otherwise, proceed to the next step;

Step 4: store the element record in the information association database, together with the identity tag, time tag, unified tag, near-synonym tag and parent tag assigned in Step 2. Users can then retrieve it by searching the information association database.
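Steps 3 and 4 amount to a look-up-then-insert against the association database. The sketch below makes two assumptions. First, it uses a hypothetical in-memory `AssociationDB` keyed by keyword. Second, it represents records as plain dictionaries. A real system would use a persistent store.

```python
from typing import Dict, List


class AssociationDB:
    """Hypothetical in-memory stand-in for the information association database."""

    def __init__(self) -> None:
        self._by_keyword: Dict[str, List[dict]] = {}

    def lookup(self, keyword: str) -> List[dict]:
        return self._by_keyword.get(keyword, [])

    def store(self, record: dict, keywords: List[str]) -> None:
        for kw in keywords:
            self._by_keyword.setdefault(kw, []).append(record)


def associate(db: AssociationDB, record: dict, keywords: List[str]) -> bool:
    """Return True if Step 3 finds an existing association; otherwise store per Step 4."""
    # Step 3: search the database by the record's keywords.
    if any(db.lookup(kw) for kw in keywords):
        return True
    # Step 4: no match, so store the tagged record for later user retrieval.
    db.store(record, keywords)
    return False
```

Storing each record under every keyword keeps the Step 3 check a simple per-keyword lookup. The cost is duplicate references to the record in the index.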

In the above, the information to be translated in Step 1 includes speech, text, image and video information. Each non-text type is converted into source text information by its own recognizer:

- speech information by speech recognition;
- image information by image recognition;
- video information by video recognition.
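The conversion in Step 1 is essentially a dispatch on information type. The text names speech, image and video recognition but no particular engine. In this sketch, the recognizers are therefore hypothetical callables supplied by the caller, and `to_source_text` and the type names are illustrative.

```python
from typing import Callable, Dict, Union

# A recognizer turns raw media bytes into source text (hypothetical hook).
Recognizer = Callable[[bytes], str]


def to_source_text(kind: str, payload: Union[str, bytes],
                   recognizers: Dict[str, Recognizer]) -> str:
    """Convert speech/image/video/text input into source text information."""
    if kind == "text":
        # Text is already source text; decode if it arrived as bytes.
        return payload if isinstance(payload, str) else payload.decode("utf-8")
    try:
        recognize = recognizers[kind]  # e.g. "speech", "image", "video"
    except KeyError:
        raise ValueError(f"unsupported information type: {kind}") from None
    return recognize(payload)
```

Passing the recognizers in, rather than hard-coding them, keeps the step independent of any particular speech, OCR or video engine.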

In the above, the word and sentence segmentation of Step 2 means that the text processing unit segments the source text information according to its language family.

If the source text information is in a Romance language, it is segmented on spaces and stored in the result set.

If the source text information is in an Oriental language, it is handled as follows:

1. It is first split into individual characters.
2. The characters are recombined into phrases, which are matched against the dictionary for that language family.
3. A phrase that matches is a valid phrase; otherwise it is treated as an invalid phrase.
4. The individual characters and the valid phrases are stored in the result set.
5. The phrases in the result set are matched against the information association database. Any phrase that matches is treated as already associated and removed from the result set.
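The segmentation rule can be sketched as follows. This rests on several assumptions:

- The function names and the `max_len` limit on recombined phrases are hypothetical.
- A Python set stands in for both the language-family dictionary and the association database.
- Candidate phrases are formed from adjacent characters only.
- Sentence breaking is omitted.
- Every result-set entry is treated as a phrase when checking the database.

```python
from typing import Iterable, List, Set


def segment(text: str, family: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Word segmentation per language family (sketch of the Step 2 rule)."""
    if family == "romance":
        return text.split()  # segment on whitespace
    if family != "oriental":
        raise ValueError(f"unknown language family: {family}")
    chars = [c for c in text if not c.isspace()]
    result = list(chars)  # individual characters go into the result set
    # Recombine adjacent characters into candidate phrases of length 2..max_len.
    for i in range(len(chars)):
        for n in range(2, max_len + 1):
            if i + n > len(chars):
                break
            candidate = "".join(chars[i:i + n])
            if candidate in dictionary:  # valid phrase; invalid ones are dropped
                result.append(candidate)
    return result


def drop_associated(result_set: Iterable[str], association_db: Set[str]) -> List[str]:
    """Remove entries already present in the association database."""
    return [p for p in result_set if p not in association_db]
```

For example, `segment("我爱吉他", "oriental", {"吉他"})` yields the four single characters followed by the valid phrase "吉他". If "吉他" is already in the association database, `drop_associated` then removes it, leaving only the characters.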

Beneficial effects of the near-synonym correlation method based on multilingual translation of the present invention:

The present invention is applicable to information retrieval, search engines, language translation and the like. It solves the difficulty of translating between multiple languages and of associating near-synonyms and synonyms across languages.

The tagging scheme works as follows:

- Every element record is given a unique identity.
- Phrases in different languages with the same meaning are given a unified tag.
- Phrases in the same language with the same meaning are given a near-synonym tag.
- The unified and near-synonym tags in each element record are encrypted.

When multiple element records in different languages share the same unified tag or near-synonym tag, this shows that they are cross-language near-synonyms or synonyms. The details of the near-synonyms or synonyms of an input keyword can therefore be retrieved easily. Cross-language synonyms and near-synonyms can be matched, and the efficiency of information retrieval and language translation is improved.

Description of the drawings:

Fig. 1 is a flow chart of the near-synonym correlation method based on multilingual translation of the present invention;

Fig. 2 is a schematic diagram of result-set storage in the present invention;

Fig. 3 is a schematic diagram of the tag association method of the present invention;

Fig. 4 is a schematic diagram of the association in an embodiment of the present invention.

Detailed description of embodiments:

The present invention is described in further detail below with reference to the drawings and the technical solution. Its implementation is illustrated by a preferred embodiment, but implementations of the invention are not limited to it.

Embodiment: as shown in Figs. 1 to 4, a near-synonym correlation method based on multilingual translation comprises the following steps:

Step 1: convert the information to be translated into source text information;

As shown in Fig. 2, each stored element record has the following fields:

- a unique identity;
- a time tag, which is the time at which the record is stored;
- a status field, which identifies the record's state;
- a unified tag and a near-synonym tag, which are used for multilingual translation association and for associating near-synonyms within the same language.

As shown in Fig. 3, multiple element records may share the same unified tag or near-synonym tag, and therefore the same parent tag. Such records are near-synonyms or synonyms across languages, and the entries can be retrieved easily.

As a concrete example, shown in Fig. 4, "guitar" is expressed differently in different languages: "Guitar" in English, "Guitarra" in Spanish and "ギター" in Japanese. Across these languages, "guitar" is associated through the unified tag "Tag1", which indicates that the words are translations of one another. Meanwhile, in Chinese, "guitar" and "Chinese lute" are closely related instruments, and the two are associated through the near-synonym tag "Tag2". "Tag1" and "Tag2" are in turn associated through "Tag3".

Once the association is complete, the records serve two kinds of query:

- A user who wants to translate "ギター" into English can quickly retrieve the target content through "Tag1" and the target language.
- A user who wants everything about "ギター" can go through "Tag1", "Tag2" and "Tag3" to quickly retrieve "guitar" (Chinese), "Guitar", "Guitarra", "ギター" and "Chinese lute", efficiently and conveniently.
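The two retrieval paths in the Fig. 4 example can be sketched as a small tag index. Assumptions: the record layout, the `parents` map and the functions `translate` and `related` are hypothetical. 吉他 and 琵琶 are used as illustrative Chinese forms of "guitar" and "Chinese lute". Tags are kept in plaintext here for readability.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Illustrative records modeled on Fig. 4 (plaintext tags for readability).
records: List[dict] = [
    {"text": "吉他", "lang": "zh", "tags": {"Tag1", "Tag2"}},  # "guitar"
    {"text": "Guitar", "lang": "en", "tags": {"Tag1"}},
    {"text": "Guitarra", "lang": "es", "tags": {"Tag1"}},
    {"text": "ギター", "lang": "ja", "tags": {"Tag1"}},
    {"text": "琵琶", "lang": "zh", "tags": {"Tag2"}},  # "Chinese lute"
]
# Parent tag -> the child tags it links (Tag3 joins Tag1 and Tag2).
parents: Dict[str, Set[str]] = {"Tag3": {"Tag1", "Tag2"}}

# Index: tag -> records carrying that tag.
index: Dict[str, List[dict]] = defaultdict(list)
for rec in records:
    for tag in rec["tags"]:
        index[tag].append(rec)


def _find(word: str) -> Optional[dict]:
    return next((r for r in records if r["text"] == word), None)


def translate(word: str, target_lang: str) -> List[str]:
    """Follow the word's own tags, then filter by target language."""
    src = _find(word)
    if src is None:
        return []
    return sorted({r["text"] for t in src["tags"] for r in index[t]
                   if r["lang"] == target_lang})


def related(word: str) -> List[str]:
    """Widen the word's tags through any parent tag covering them (one level)."""
    src = _find(word)
    if src is None:
        return []
    tags = set(src["tags"])
    for children in parents.values():
        if tags & children:
            tags |= children
    return sorted({r["text"] for t in tags for r in index[t]})
```

Here `translate("ギター", "en")` returns `["Guitar"]`. `related("ギター")` reaches all five entries through the parent tag "Tag3".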

The present invention is not limited to the above embodiment, and those skilled in the art may make various changes accordingly. However, any change equivalent or similar to the present invention shall fall within the scope of the claims of the present invention.

Claims

1. A near-synonym correlation method based on multilingual translation, characterized by comprising the following steps:

Step 1: convert the information to be translated into source text information;

2. The near-synonym correlation method based on multilingual translation according to claim 1, characterized in that: the information to be translated in Step 1 includes speech, text, image and video information; speech information is converted into source text information by speech recognition, image information by image recognition, and video information by video recognition.

3. The near-synonym correlation method based on multilingual translation according to claim 1, characterized in that: the word and sentence segmentation of Step 2 comprises the text processing unit segmenting the source text information according to its language family; if the source text information is in a Romance language, it is segmented on spaces and stored in the result set; if the source text information is in an Oriental language, it is first split into individual characters, which are recombined into phrases and matched against the dictionary of that language family; a phrase that matches is a valid phrase, otherwise it is treated as an invalid phrase; the individual characters and the valid phrases are stored in the result set; the phrases in the result set are matched against the information association database, and a phrase that matches is treated as an already-associated phrase and removed from the result set.