CN104537066A - Near-synonym correlation method based on multi-language translation - Google Patents

Publication number
CN104537066A
Authority
CN (China)
Prior art keywords
label, phrase, information, word message, source word
Legal status
Granted, active
Application number
CN201410839087.1A
Other languages
Chinese (zh)
Other versions
CN104537066B (en)
Inventor
陈立杰
李之光
Current Assignee
ZHENGZHOU ZONEYET TECHNOLOGY Co Ltd
Original Assignee
ZHENGZHOU ZONEYET TECHNOLOGY Co Ltd
Priority date
2014-12-30
Filing date
2014-12-30
Publication date
2015-04-22 (CN104537066A), 2017-10-03 (CN104537066B)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion

Abstract

The invention relates to a near-synonym correlation method based on multi-language translation. Each element record is given a unique identity. Phrases in different languages that have the same meaning are given a unified label, phrases in the same language that have the same meaning are given a near-synonym label, and the unified labels and near-synonym labels in the element record are encrypted. When multiple cross-language element records share the same unified label or near-synonym label, they are cross-language near-synonyms or synonyms. Detailed information on the synonyms or near-synonyms of an input keyword can therefore be retrieved conveniently, cross-language synonyms and near-synonyms can be matched, and the efficiency of information retrieval and language translation is improved.

Description

Near-synonym correlation method based on multi-language translation
Technical field
The present invention relates to the field of network data retrieval and semantic analysis, and in particular to a near-synonym correlation method based on multi-language translation.
Background
In everyday retrieval, a search usually returns only information that contains the search keyword itself. Retrieving information that contains synonyms or near-synonyms of the keyword is typically very difficult, and keyword retrieval in a multilingual environment is almost impossible. Yet both professional multi-language translation and everyday cross-language information retrieval need to solve this problem.
Summary of the invention
To address the difficulty of multi-language translation and cross-language synonym retrieval in the prior art, the invention provides a near-synonym correlation method based on multi-language translation. The method is applicable to information retrieval, search engines, language translation and similar uses, and addresses the difficulty of multi-language inter-translation and of associating synonyms and near-synonyms across languages.
According to the design provided by the invention, a near-synonym correlation method based on multi-language translation comprises the following steps:
Step 1: convert the information to be translated into source text information.
Step 2: according to its language, store the converted source text information in a text processing unit and perform word segmentation and sentence splitting on it. The source text information and the information obtained after word segmentation and sentence splitting form an element record, which is stored in a result set and given an identity label and a time label. If phrases in different languages are judged to have the same meaning, they are given a unified label; if phrases in the same language are judged to have the same meaning, they are given a near-synonym label. The unified label and near-synonym label in the element record are encrypted, and element records that have the same near-synonym label and unified label are given the same parent label.
Step 3: search the information association database using the keyword in the element record to check whether a corresponding record exists. If it does, the association is complete; otherwise, proceed to the next step.
Step 4: store the element record, carrying the identity label, time label, unified label, near-synonym label and parent label given in step 2, in the information association database, so that users can search the information association database.
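The record structure and the lookup-then-store flow of steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class and function names (`ElementRecord`, `AssociationDB`, `make_record`, `protect`) are hypothetical, the text does not say how "same meaning" is judged (so labels are passed in by the caller), and the encryption scheme is unspecified (a SHA-256 digest is used here purely as a stand-in).

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


def protect(label: str) -> str:
    """Stand-in for the label 'encryption' step (assumption: SHA-256 digest)."""
    return hashlib.sha256(label.encode("utf-8")).hexdigest()


@dataclass
class ElementRecord:
    keyword: str
    language: str
    unified_label: Optional[str] = None   # shared by same-meaning phrases across languages
    near_label: Optional[str] = None      # shared by same-meaning phrases within one language
    parent_label: Optional[str] = None    # joins records whose labels are related
    # Identity label and time label are assigned when the record is created.
    identity: str = field(default_factory=lambda: uuid.uuid4().hex)
    stored_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def make_record(keyword: str, language: str, unified: Optional[str] = None,
                near: Optional[str] = None, parent: Optional[str] = None) -> ElementRecord:
    """Step 2: build an element record with protected unified and near-synonym labels."""
    return ElementRecord(
        keyword=keyword,
        language=language,
        unified_label=protect(unified) if unified else None,
        near_label=protect(near) if near else None,
        parent_label=parent,
    )


class AssociationDB:
    """In-memory stand-in for the information association database."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], ElementRecord] = {}

    def associate(self, record: ElementRecord) -> ElementRecord:
        """Step 3: look the keyword up; step 4: store the record if it is new."""
        key = (record.language, record.keyword)
        existing = self._records.get(key)
        if existing is not None:
            return existing          # already associated, nothing to store
        self._records[key] = record
        return record

    def with_label(self, protected_label: str) -> List[ElementRecord]:
        """Return every record carrying the given (already protected) label."""
        return [r for r in self._records.values()
                if protected_label in (r.unified_label, r.near_label)]
```

Keying the store on language plus keyword is a design choice made for the sketch; it makes the step 3 check a single dictionary lookup.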
In the above method, the information to be translated in step 1 includes voice information, text information, image information and video information. Voice information is converted into source text information by speech recognition, image information by image recognition, and video information by video recognition.
In the above method, the word segmentation and sentence splitting of step 2 comprises the text processing unit segmenting the source text information according to its language family. If the source text information belongs to the Romance language family, it is segmented at spaces and stored in the result set. If the source text information belongs to the Oriental language family, it is first broken down into single characters, which are recombined into phrases and matched against a dictionary for that language family; a phrase that matches is a valid phrase, and otherwise it is regarded as an invalid phrase. The single characters and the valid phrases are stored in the result set. The phrases in the result set are then matched against the information association database; a phrase that matches is regarded as already associated and is removed from the result set.
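The two segmentation branches described above can be sketched as follows. This is an illustrative sketch only: the function names, the `max_len` cap on candidate phrase length and the example dictionary are assumptions that do not appear in the source, and sentence splitting is not covered.

```python
from typing import Iterable, List, Set


def segment_space_delimited(text: str) -> List[str]:
    """Space-delimited branch: split the source text at whitespace."""
    return text.split()


def segment_by_dictionary(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Character-based branch: break the text into single characters,
    recombine adjacent characters into candidate phrases, and keep only
    the candidates found in the dictionary (the 'valid phrases')."""
    chars = [c for c in text if not c.isspace()]
    result = list(chars)  # single characters are stored as well
    for start in range(len(chars)):
        for size in range(2, max_len + 1):
            if start + size > len(chars):
                break
            candidate = "".join(chars[start:start + size])
            if candidate in dictionary:
                result.append(candidate)
    return result


def drop_associated(result_set: Iterable[str], associated: Set[str]) -> List[str]:
    """Remove entries that already exist in the association database."""
    return [item for item in result_set if item not in associated]
```

For example, with a dictionary containing only "吉他", the input "弹吉他" yields its three single characters plus the valid phrase "吉他"; if "吉他" is already in the association database, `drop_associated` removes it from the result set.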
Beneficial effects of the near-synonym correlation method based on multi-language translation:
The invention is applicable to information retrieval, search engines, language translation and similar uses, and addresses the difficulty of multi-language inter-translation and of associating synonyms and near-synonyms across languages. Every element record is given a unique identity. Phrases in different languages that have the same meaning are given a unified label, phrases in the same language that have the same meaning are given a near-synonym label, and the unified label and near-synonym label in the element record are encrypted. When multiple cross-language element records have the same unified label or near-synonym label, they are cross-language near-synonyms or synonyms. Detailed information on the synonyms or near-synonyms of an input keyword can therefore be retrieved easily, cross-language synonyms and near-synonyms can be matched, and the efficiency of information retrieval and language translation is improved.
Brief description of the drawings
Fig. 1 is a flow chart of the near-synonym correlation method based on multi-language translation of the present invention;
Fig. 2 is a schematic diagram of result set storage in the present invention;
Fig. 3 is a schematic diagram of the label association method of the present invention;
Fig. 4 is a schematic diagram of the association in an embodiment of the present invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and the technical solution, and its implementation is described in detail through a preferred embodiment; the implementation of the invention is not limited to this embodiment.
Embodiment: as shown in Figs. 1 to 4, a near-synonym correlation method based on multi-language translation comprises the following steps:
Step 1: convert the information to be translated into source text information.
Step 2: according to its language, store the converted source text information in a text processing unit and perform word segmentation and sentence splitting on it. The source text information and the information obtained after word segmentation and sentence splitting form an element record, which is stored in a result set and given an identity label and a time label. If phrases in different languages are judged to have the same meaning, they are given a unified label; if phrases in the same language are judged to have the same meaning, they are given a near-synonym label. The unified label and near-synonym label in the element record are encrypted, and element records that have the same near-synonym label and unified label are given the same parent label.
Step 3: search the information association database using the keyword in the element record to check whether a corresponding record exists. If it does, the association is complete; otherwise, proceed to the next step.
Step 4: store the element record, carrying the identity label, time label, unified label, near-synonym label and parent label given in step 2, in the information association database, so that users can search the information association database.
In the above method, the information to be translated in step 1 includes voice information, text information, image information and video information. Voice information is converted into source text information by speech recognition, image information by image recognition, and video information by video recognition.
In the above method, the word segmentation and sentence splitting of step 2 comprises the text processing unit segmenting the source text information according to its language family. If the source text information belongs to the Romance language family, it is segmented at spaces and stored in the result set. If the source text information belongs to the Oriental language family, it is first broken down into single characters, which are recombined into phrases and matched against a dictionary for that language family; a phrase that matches is a valid phrase, and otherwise it is regarded as an invalid phrase. The single characters and the valid phrases are stored in the result set. The phrases in the result set are then matched against the information association database; a phrase that matches is regarded as already associated and is removed from the result set.
As shown in Fig. 2, each stored element record is given a unique identity, and the time at which the element record is stored serves as its time label. A status field in the stored record identifies the state of the record. The unified label and the near-synonym label are used for multi-language translation association and for associating synonyms and near-synonyms within the same language.
As shown in Fig. 3, when multiple element records have the same unified label or near-synonym label, that is, the same parent label, they are cross-language near-synonyms or synonyms, and the entry information can be retrieved easily.
As a concrete example, shown in Fig. 4, "guitar" is expressed differently in different languages, such as "Guitar" in English, "Guitarra" in Spanish and "ギター" in Japanese. In these languages "guitar" is associated through the near-synonym label "Tag1", which indicates that the entries are translations of one another. Meanwhile, in Chinese, "guitar" and "Chinese lute" are fairly similar musical instruments, and the two are associated through the unified label "Tag2". "Tag1" and "Tag2" are then associated through "Tag3". Once the association is complete, a user who wants to translate "ギター" into English can quickly retrieve the target content through "Tag1" and the language. A user who wants to query all content related to "ギター" can quickly retrieve "guitar", "Guitar", "Guitarra", "ギター" and "Chinese lute" through "Tag1", "Tag2" and "Tag3".
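The guitar example can be sketched as a small in-memory lookup. The sketch assumes a simplified record layout (`Entry`, with directly attached tags and one parent tag) and hypothetical helper names `translate` and `related`; the English strings "guitar" and "Chinese lute" stand in for the Chinese entries, as they do in the text above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Entry:
    text: str
    lang: str
    tags: FrozenSet[str]  # labels attached directly to this entry
    parent: str           # parent label that joins related label groups


ENTRIES = [
    Entry("guitar", "zh", frozenset({"Tag1", "Tag2"}), "Tag3"),   # Chinese word for guitar
    Entry("Guitar", "en", frozenset({"Tag1"}), "Tag3"),
    Entry("Guitarra", "es", frozenset({"Tag1"}), "Tag3"),
    Entry("ギター", "ja", frozenset({"Tag1"}), "Tag3"),
    Entry("Chinese lute", "zh", frozenset({"Tag2"}), "Tag3"),     # Chinese word for the lute
]


def translate(text: str, target_lang: str, entries: List[Entry] = ENTRIES) -> Optional[str]:
    """Find an entry in the target language that shares a direct tag with the source entry."""
    for source in entries:
        if source.text != text:
            continue
        for candidate in entries:
            if candidate.lang == target_lang and candidate.tags & source.tags:
                return candidate.text
    return None


def related(text: str, entries: List[Entry] = ENTRIES) -> List[str]:
    """Return every entry that shares a parent label with the query entry."""
    parents = {e.parent for e in entries if e.text == text}
    return [e.text for e in entries if e.parent in parents]
```

Here `translate("ギター", "en")` follows "Tag1" to the English entry, while `related("ギター")` follows the parent label "Tag3" and therefore also reaches "Chinese lute", which shares no direct tag with the Japanese entry.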
The invention is not limited to the above embodiment. Those skilled in the art may make various changes accordingly, but any change that is equivalent or similar to the invention shall fall within the scope of the claims of the invention.

Claims (3)

1. A near-synonym correlation method based on multi-language translation, characterized by comprising the following steps:
Step 1: convert the information to be translated into source text information.
Step 2: according to its language, store the converted source text information in a text processing unit and perform word segmentation and sentence splitting on it. The source text information and the information obtained after word segmentation and sentence splitting form an element record, which is stored in a result set and given an identity label and a time label. If phrases in different languages are judged to have the same meaning, they are given a unified label; if phrases in the same language are judged to have the same meaning, they are given a near-synonym label. The unified label and near-synonym label in the element record are encrypted, and element records that have the same near-synonym label and unified label are given the same parent label.
Step 3: search the information association database using the keyword in the element record to check whether a corresponding record exists. If it does, the association is complete; otherwise, proceed to the next step.
Step 4: store the element record, carrying the identity label, time label, unified label, near-synonym label and parent label given in step 2, in the information association database, so that users can search the information association database.
2. The near-synonym correlation method based on multi-language translation according to claim 1, characterized in that: the information to be translated in step 1 includes voice information, text information, image information and video information; voice information is converted into source text information by speech recognition, image information by image recognition, and video information by video recognition.
3. The near-synonym correlation method based on multi-language translation according to claim 1, characterized in that: the word segmentation and sentence splitting of step 2 comprises the text processing unit segmenting the source text information according to its language family; if the source text information belongs to the Romance language family, it is segmented at spaces and stored in the result set; if the source text information belongs to the Oriental language family, it is first broken down into single characters, which are recombined into phrases and matched against a dictionary for that language family; a phrase that matches is a valid phrase, and otherwise it is regarded as an invalid phrase; the single characters and the valid phrases are stored in the result set; the phrases in the result set are matched against the information association database, and a phrase that matches is regarded as already associated and is removed from the result set.
CN201410839087.1A 2014-12-30 2014-12-30 Near synonym correlating method based on multilingual translation Active CN104537066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410839087.1A CN104537066B (en) 2014-12-30 2014-12-30 Near synonym correlating method based on multilingual translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410839087.1A CN104537066B (en) 2014-12-30 2014-12-30 Near synonym correlating method based on multilingual translation

Publications (2)

Publication Number Publication Date
CN104537066A 2015-04-22
CN104537066B 2017-10-03

Family

ID=52852594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410839087.1A Active CN104537066B (en) 2014-12-30 2014-12-30 Near synonym correlating method based on multilingual translation

Country Status (1)

Country Link
CN (1) CN104537066B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368117A (en) * 2018-12-26 2020-07-03 财团法人工业技术研究院 Cross-language information constructing and processing method and cross-language information system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070067291A1 (en) * 2005-09-19 2007-03-22 Kolo Brian A System and method for negative entity extraction technique
CN102637167A (en) * 2011-12-23 2012-08-15 东莞康明电子有限公司 Multilingual inter-translation method
CN102662937A (en) * 2012-04-12 2012-09-12 传神联合(北京)信息技术有限公司 Automatic translation system and automatic translation method thereof
CN102799661A (en) * 2012-07-09 2012-11-28 北京中科希望软件股份有限公司 Method and system for implementing semantic retrieval on electronic files
CN103885940A (en) * 2012-12-19 2014-06-25 新疆信息产业有限责任公司 Multilingual dictionary translation method based on network services

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘薇: "浅论中韩翻译过程中同类\同义单词的择选原则", 《云梦学刊》 *

Also Published As

Publication number Publication date
CN104537066B (en) 2017-10-03

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: High tech Zone in Henan province Zhengzhou City, Tsui Chuk Street 450000 No. 6 9 Unit 1 building 4 layer 421

Applicant after: Zhengzhou Polytron Technologies Inc

Address before: 450002, Zhengzhou, Henan, Nanyang Road, Kong Gang Du North Street, seven floor business building

Applicant before: ZHENGZHOU ZONEYET TECHNOLOGY CO., LTD.

CB02 Change of applicant information
GR01 Patent grant