WO2015096625A1 - Information fragment translating method and system - Google Patents

Information fragment translating method and system

Info

Publication number
WO2015096625A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
translation
text content
fragment
Prior art date
Application number
PCT/CN2014/093657
Other languages
French (fr)
Chinese (zh)
Inventor
江潮
Original Assignee
语联网(武汉)信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 语联网(武汉)信息技术有限公司
Publication of WO2015096625A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language

Definitions

  • the present invention relates to the field of computers, and in particular, to a method and system for translating information fragments.
  • the present invention provides a method and system for translating information fragments to solve the problem that the information fragments in various languages cannot be directly viewed in the prior art.
  • the invention discloses a method for translating information fragments, comprising:
  • the textual content of the information fragment is displayed against the translation in a document format selected by the user.
  • the translation direction of the information fragment is determined by a target language set by the user.
  • the translation direction of the information fragment is determined by identifying a common language of the user and using the common language of the user as the target language.
  • the process of identifying a common language of the user includes:
  • the system language of the digital terminal of the user is identified, and the system language of the digital terminal is used as the common language of the user.
  • the method further includes: identifying a source of information of the information fragment when the user selects the information fragment;
  • the text content and the information source of the information fragment are respectively placed in a corresponding database for collection and storage;
  • the information source of the information fragment is displayed while displaying the text content and the translation of the information fragment.
  • the method further comprises:
  • the text content, translation and information source of the selected information fragment are displayed in the document format selected by the user.
  • the method further includes: after identifying the text content of the plurality of information fragments selected by the user, determining a keyword in the text content of each information fragment, and displaying the obtained keyword as the summary of that information fragment in the index directory.
  • the information fragment comprises: a text format and a picture format
  • the user triggers the corresponding global hotkey to invoke the corresponding selection function, and selects the information fragment in the text format or the picture format.
  • the invention discloses a translation system for information fragments, comprising: an information recognition module, configured to identify text content and information source of information fragments selected by a user, and send the text content of the information fragment to a translation processing module for translation,
  • the text content, the translation and the information source of the information fragment are respectively placed in a corresponding database for collection and storage;
  • the translation processing module is configured to determine a translation direction of the information fragment, and translate the text content of the information fragment according to the determined translation direction;
  • a document output module is configured to display the text content, the translation, and the information source of the information fragment stored in the collection in a user-selected document format.
  • the system further includes: a parsing module, configured to identify a global hotkey triggered by the user, and send the control instruction mapped to the identified global hotkey to the corresponding selection module, which provides the user with the corresponding selection function.
  • a directory indexing module configured to create an index directory for all information fragments in the database for the user to select
  • the present invention includes the following advantages:
  • the recognized text content is translated directly, and the text content and translation are stored, so the user can view the information fragments directly at any time;
  • Figure 1 shows a first flow chart of an embodiment
  • Figure 2 shows a second flow chart of an embodiment
  • Fig. 3 shows a schematic structural view of an embodiment.
  • the present invention discloses a translation system for information fragments, including:
  • the parsing module 1, the text selecting module 2, the picture selecting module 3, the information identifying module 4, the translation processing module 5, the directory indexing module 9 and the document output module 10;
  • the parsing module is configured to identify a global hotkey triggered by the user, and send the identified global hotkey mapping control instruction to the corresponding selecting module to provide a corresponding selection function of the user;
  • the global hotkey can be a single button or a combination of multiple individual buttons.
  • an information fragment is not limited to selectable text; it also includes non-selectable text and pictures containing fragment information;
  • after the parsing module recognizes the first global hotkey triggered by the user, it sends the control instruction mapped to the first global hotkey to the text selection module;
  • after receiving the control instruction mapped to the first global hotkey from the parsing module, the text selection module provides the user with a function for directly selecting an information fragment in the text format;
  • after the parsing module recognizes the second global hotkey triggered by the user, it sends the control instruction mapped to the second global hotkey to the picture selection module;
  • after receiving the control instruction mapped to the second global hotkey from the parsing module, the picture selection module provides the user with a function for capturing an information fragment in the picture format by screenshot.
  • after the user selects an information fragment, the system automatically sends the selected information fragment to the information identification module;
  • the information identification module is configured to receive the information fragment selected by the user, and identify the text content and the information source of the information fragment;
  • for local resources, the information source is the local storage address of the information fragment, for example, c:\1\2\3\ followed by the document where the information fragment is located; the document where the information fragment is located can be in various document formats, for example: various Office documents, Notepad files, documents used to compile code, etc.; for network resources, the information source is the network address of the information fragment.
  • a translation processing module comprising: a translation direction identification module, a language matching module, and a translation module;
  • a translation direction identification module for identifying a source language of the text content of the information fragment and a target language of the translation
  • the target language of the translation is either set by the user or determined by identifying the user's common language and using it as the target language
  • the user's region is identified and the native language of that region is used as the user's common language; or the system language of the user's digital terminal on which the system is installed is identified and used as the user's common language.
  • a matching module configured to detect whether the source language of the information fragment and the translated target language are consistent
  • the information recognition module separately stores the text content and the information source of the information fragment into the corresponding database for collection and storage;
  • the text content of the information fragment is sent to the translation module for translation into the target language to obtain the translation of the information fragment, and the information recognition module then places the text content, translation and information source of the information fragment into the corresponding databases for collection and storage.
  • the database comprises: a first database 8, a second database 6, and a third database 7;
  • the first database stores the text content of the information fragments
  • the second database stores the information source of the information fragments
  • the third database stores the translations of the information fragments
  • the text content, translation, and information source of the same information fragment have a mapping relationship among the three databases.
  • the corresponding database can be searched by text content, translation or information source to find the information fragments that match the user's search words, which are then output and displayed through the document output module.
  • a document output module configured to display the text content and the information source of the information fragment in a document format selected by a user
  • where the information fragment has a translation, the translation of the information fragment is displayed at the same time;
  • the text content and translation of the information fragment are displayed as a comparison.
  • the text content of multiple pieces of information can also be integrated into one document for display.
  • a directory indexing module for indexing information fragments in a database
  • the name in the index directory may be a number assigned in a certain order, for example, a logical number obtained by sorting the information fragments by length, size, or acquisition time.
  • the name can also be a name written by the user or a word marked by the user in the information fragment.
  • for an information fragment in picture format, the word is marked by selecting it in the picture through a screenshot; after the information recognition module recognizes it, it is used as the name in the index directory;
  • the user determines a keyword in the information fragment, wherein there may be one or more keywords, and a keyword is either a word written by the user or a word marked by the user in the information fragment;
  • after the keyword of the information fragment is determined, the keyword is displayed together with the name of the index directory entry corresponding to the information fragment as a summary of the information fragment, helping the user identify the information fragment more clearly and unambiguously.
  • the required pieces of information selected by the user in the index directory are outputted through the document output module.
  • the information association module finds the text content of every pair of information fragments in the database and calculates their similarity; for a given information fragment, the other information fragments whose similarity to it falls within the preset threshold range are selected and associated with it;
  • when an information fragment in the database that has been associated by the information association module is output and displayed through the document output module, the text content and the information source of the information fragments associated with it are displayed at the same time.
  • the similarity calculation specifically includes: representing the fragment texts $D_1$ and $D_2$ with the vector text model:
  • $D_1 = \{T_{11}, W_{11};\ T_{12}, W_{12};\ \ldots;\ T_{1n}, W_{1n}\}$;
  • $T_{1n}$ is a feature item of $D_1$
  • $W_{1n}$ is a weight determined according to the word frequency of $T_{1n}$
  • $n$ is the sequence number of a feature item in the first feature set
  • $D_2 = \{T_{21}, W_{21};\ T_{22}, W_{22};\ \ldots;\ T_{2m}, W_{2m}\}$;
  • $T_{2m}$ is a feature item of $D_2$
  • $W_{2m}$ is a weight determined according to the word frequency of $T_{2m}$
  • $m$ is the sequence number of a feature item in the second feature set
  • $\mathrm{Sim}(D_1, D_2)$ is the similarity of the two information fragments, calculated over the feature items, and $k$ is the sequence number of a feature item.
  • the association table contains information about the other information fragments associated with the information fragment, sorted in descending order of similarity;
  • the document displays the text content of the information fragment and, below it, the text content of the other information fragments in the order in which they appear in the association table.
  • the present invention also discloses an information fragment translation method, including:
  • a preferred embodiment is provided based on the present invention, comprising:
  • the selected information fragment is identified, and the text content of the information fragment is identified;
  • the information source of the information fragment can also be identified
  • the text content of the information fragment is translated according to the target language to obtain a translation of the information fragment;
  • the text content, translation and information source of the information fragment are separated and stored in the corresponding database for collection and storage.
  • determining a keyword for each information fragment
  • the user selects the pieces of information he needs in the index directory based on the keyword; or
  • the information fragment selected by the user in the index directory, or found by searching the database, is displayed in a document in the user-selected document format for viewing by the user.
  • the text content and the information source of the information fragment are displayed; in the case that the information fragment has a translation, the translation of the information fragment is simultaneously displayed; and the text content of the information fragment is displayed in comparison with the translation.
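
The similarity calculation and association table listed in the definitions above could be sketched as follows. The text names a vector text model with word-frequency weights but does not reproduce the formula, so cosine similarity is used here as an assumption. The whitespace tokenizer, the function names, and the reading of "within the threshold range" as "at or above the threshold" are also illustrative choices, not part of the source.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def feature_vector(text: str) -> Dict[str, int]:
    """Feature items T with weights W taken as raw word frequencies.

    Whitespace tokenization is an assumption; Chinese text would need a
    real word segmenter.
    """
    return dict(Counter(text.lower().split()))


def similarity(d1: Dict[str, int], d2: Dict[str, int]) -> float:
    """Cosine similarity between two weighted feature sets (assumed formula)."""
    dot = sum(weight * d2[term] for term, weight in d1.items() if term in d2)
    norm1 = math.sqrt(sum(w * w for w in d1.values()))
    norm2 = math.sqrt(sum(w * w for w in d2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def association_table(target: str, others: List[str],
                      threshold: float) -> List[Tuple[str, float]]:
    """Other fragments with similarity >= threshold, most similar first."""
    target_vector = feature_vector(target)
    scored = [(other, similarity(target_vector, feature_vector(other)))
              for other in others]
    kept = [(other, score) for other, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Sorting the kept fragments in descending order of similarity matches the ordering the definitions give for the association table.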

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed are an information fragment translating method and system. The method comprises: recognizing text content of information fragments selected by a user and a source language of the text content, and determining a translation target language; translating, under the condition that the source language of the text content is inconsistent with the translation target language, the text content of the information fragments according to the target language so as to obtain a translation of the information fragments; and displaying, in a document format selected by the user, the text content and the translation of the information fragments in a contrast mode.

Description

Method and system for translating information fragments
Technical Field
The present invention relates to the field of computers, and in particular to a method and system for translating information fragments.
Background
With the arrival of the Internet era, completing a report or writing a document often requires collecting information. Much of this information is scattered across different places in fragmentary form. Once a fragment is found, the user has to copy and paste from the whole document to collect the text content. After collecting the fragments, the user must also classify and organize them; if they are classified improperly, the user has to search for them again, which is very cumbersome. Moreover, when the language of a fragment differs from the language the user needs, the user has to translate the fragment's text content with a separate translation tool.
Summary of the Invention
The present invention provides a method and system for translating information fragments, to solve the prior-art problem that information fragments in different languages cannot be viewed directly.
The invention discloses a method for translating information fragments, comprising:
identifying the text content of an information fragment selected by a user, determining the translation direction of the information fragment, and translating the text content of the information fragment according to the determined translation direction to obtain a translation of the information fragment;
displaying the text content of the information fragment side by side with the translation in a document format selected by the user.
Preferably, the translation direction of the information fragment is determined by a target language set by the user.
Preferably, the translation direction of the information fragment is determined by identifying the user's common language and using it as the target language.
Preferably, the process of identifying the user's common language comprises:
identifying the region where the user is located, and using the native language of that region as the user's common language; or
identifying the system language of the user's digital terminal, and using that system language as the user's common language.
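
The ways of determining the target language described above can be sketched as a small resolver. The source lists the region and system-language options as alternatives without ranking them, so the priority order below is an assumption, as are the function name and the region-to-language table.

```python
from typing import Optional

# Hypothetical mapping from a region code to its native language code.
REGION_NATIVE_LANGUAGE = {"CN": "zh", "FR": "fr", "US": "en"}


def resolve_target_language(user_setting: Optional[str],
                            region: Optional[str],
                            system_language: Optional[str]) -> Optional[str]:
    """Pick the translation target language.

    Assumed priority: an explicit user setting, then the native language
    of the user's region, then the terminal's system language.
    """
    if user_setting:
        return user_setting
    if region and region in REGION_NATIVE_LANGUAGE:
        return REGION_NATIVE_LANGUAGE[region]
    return system_language
```

Passing the region and system language in as arguments keeps the sketch independent of any particular platform API for detecting them.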
Preferably, the method further comprises: identifying the information source of the information fragment when the user selects the information fragment;
placing the text content and the information source of the information fragment into respective corresponding databases for collection and storage;
displaying the information source of the information fragment together with its text content and translation.
Preferably, the method further comprises:
establishing an index directory for all of the collected and stored information fragments;
after the user selects the required information fragments in the index directory, displaying the text content, translation, and information source of the selected information fragments in the document format selected by the user.
Preferably, the method further comprises: after identifying the text content of a plurality of information fragments selected by the user, determining a keyword in the text content of each information fragment, and displaying the obtained keyword as the summary of that information fragment in the index directory.
Preferably, the information fragment comprises a text format and a picture format;
the method further comprises:
when the user triggers the corresponding global hotkey, invoking the corresponding selection function to select the information fragment in the text format or the picture format.
The invention also discloses a translation system for information fragments, comprising: an information recognition module, configured to identify the text content and information source of an information fragment selected by a user, send the text content of the information fragment to a translation processing module for translation, and place the text content, translation, and information source of the information fragment into respective corresponding databases for collection and storage;
the translation processing module, configured to determine the translation direction of the information fragment and translate the text content of the information fragment according to the determined translation direction;
a document output module, configured to display the text content, translation, and information source of the collected and stored information fragment in a document format selected by the user.
Preferably, the system further comprises: a parsing module, configured to identify a global hotkey triggered by the user and send the control instruction mapped to the identified global hotkey to the corresponding selection module, which provides the user with the corresponding selection function.
A directory indexing module, configured to establish an index directory for all information fragments in the database for the user to choose from;
Compared with the prior art, the present invention has the following advantages:
1. The recognized text content is translated directly, and the text content and translation are stored, so the user can view the information fragments directly at any time;
2. Fragments can be collected continuously, which improves efficiency;
3. The user's translation direction is identified automatically, which simplifies the translation process;
4. Because collection is triggered by a global hotkey, information fragments can be collected without interrupting the user's other operations.
Brief Description of the Drawings
The drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Figure 1 shows a first flow chart of an embodiment;
Figure 2 shows a second flow chart of an embodiment;
Figure 3 shows a schematic structural view of an embodiment.
Detailed Description
To make the above objects, features, and advantages of the present invention more apparent, the invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Figure 1, the present invention discloses a translation system for information fragments, comprising:
a parsing module 1, a text selection module 2, a picture selection module 3, an information recognition module 4, a translation processing module 5, a directory indexing module 9, and a document output module 10;
the parsing module is configured to identify a global hotkey triggered by the user and send the control instruction mapped to the identified global hotkey to the corresponding selection module, which provides the user with the corresponding selection function;
A global hotkey may be a single key or a combination of several keys.
When the user selects the required information fragments, an information fragment is not limited to selectable text; it also includes non-selectable text and pictures containing fragment information;
after the parsing module identifies a first global hotkey triggered by the user, it sends the control instruction mapped to the first global hotkey to the text selection module;
after receiving that control instruction from the parsing module, the text selection module provides the user with a function for directly selecting an information fragment in text format;
after the parsing module identifies a second global hotkey triggered by the user, it sends the control instruction mapped to the second global hotkey to the picture selection module;
after receiving that control instruction from the parsing module, the picture selection module provides the user with a function for selecting an information fragment in picture format by taking a screenshot.
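
The routing from global hotkeys to selection modules described above can be illustrated with a dispatch table. The key combinations and function names below are hypothetical, since the text does not name specific keys, and the handlers are placeholders rather than real text-selection or screenshot code.

```python
from typing import Callable, Dict


def select_text() -> str:
    # Placeholder for the text selection module (direct text selection).
    return "text selection"


def select_picture() -> str:
    # Placeholder for the picture selection module (screenshot capture).
    return "picture selection"


# Hypothetical key combinations mapped to their selection modules.
HOTKEY_MAP: Dict[str, Callable[[], str]] = {
    "ctrl+alt+t": select_text,      # first global hotkey
    "ctrl+alt+p": select_picture,   # second global hotkey
}


def parse_hotkey(hotkey: str) -> str:
    """Parsing module: route a recognized global hotkey to its selection module."""
    handler = HOTKEY_MAP.get(hotkey.lower())
    if handler is None:
        raise KeyError(f"no selection module mapped to {hotkey!r}")
    return handler()
```

A table keeps the mapping in one place, so adding a further selection module would only need one more entry.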
After the user selects an information fragment, the system automatically sends the selected information fragment to the information recognition module;
the information recognition module is configured to receive the information fragment selected by the user and identify its text content and information source;
For local resources, the information source is the local storage address of the information fragment, for example c:\1\2\3\ followed by the document in which the information fragment is located; that document may be in any of various formats, for example Office documents, Notepad files, or documents used for compiling code. For network resources, the information source is the network address of the information fragment, for example:
http://wenku.baidu.com/link?url=yKLV9Z1UyA3SCZqcZkDM0miWl5LW LgEJvOh_cY-iPQRIOP23sWg2sNgP_2-is2h_32e2Cr_u3HjVmraorpLEpt8v9J5V GTKEC9dVPi8-Fle;
Using the information source of an information fragment, the document containing the fragment can be found quickly, so the user can conveniently view, open, and select other parts of that document related to the fragment.
The translation processing module comprises: a translation direction identification module, a language matching module, and a translation module;
the translation direction identification module is configured to identify the source language of the text content of the information fragment and the target language of the translation;
the target language of the translation is either set by the user or determined by identifying the user's common language and using it as the target language;
the user's common language is obtained as follows:
identifying the region where the user is located and using the native language of that region as the user's common language; or identifying the system language of the user's digital terminal on which this system is installed, and using that system language as the user's common language.
The language matching module is configured to detect whether the source language of the information fragment is the same as the target language of the translation;
if the source language and the target language are the same, the information recognition module places the text content and the information source of the information fragment into the respective corresponding databases for collection and storage;
if the source language and the target language differ, the text content of the information fragment is sent to the translation module and translated into the target language to obtain the translation of the information fragment, after which the information recognition module places the text content, translation, and information source of the information fragment into the respective corresponding databases for collection and storage.
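
The branching performed by the language matching module can be sketched as below. The `translate` callable stands in for the translation module and is an assumption; the text does not specify how translation is carried out, and the record layout is illustrative.

```python
from typing import Callable, Dict, Optional


def process_fragment(text: str, source: str, source_lang: str, target_lang: str,
                     translate: Callable[[str, str, str], str]
                     ) -> Dict[str, Optional[str]]:
    """Build the record to store for one information fragment.

    `translate(text, source_lang, target_lang)` is a stand-in for the
    translation module and is only called when the languages differ.
    """
    translation: Optional[str] = None
    if source_lang != target_lang:
        translation = translate(text, source_lang, target_lang)
    return {"text": text, "translation": translation, "source": source}
```

When the languages match, the record carries no translation, mirroring the case in which only the text content and information source are stored.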
The databases comprise: a first database 8, a second database 6, and a third database 7;
the first database stores the text content of information fragments;
the second database stores the information sources of information fragments;
the third database stores the translations of information fragments;
the text content, translation, and information source of the same information fragment are mapped to one another across the three databases.
The corresponding database can be searched by text content, translation, or information source to find the information fragments that match the user's search terms, which are then output and displayed through the document output module.
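
One way to picture the three databases and their mapping is three tables keyed by a shared fragment id. This is a minimal in-memory sketch; the class name, the use of dictionaries, and substring matching for search are all assumptions rather than the storage design of the invention.

```python
import itertools
from typing import Dict, List, Optional


class FragmentStore:
    """Three stores linked by a shared fragment id (illustrative only)."""

    def __init__(self) -> None:
        self.texts: Dict[int, str] = {}          # first database: text content
        self.sources: Dict[int, str] = {}        # second database: information source
        self.translations: Dict[int, str] = {}   # third database: translation
        self._ids = itertools.count(1)

    def add(self, text: str, source: str, translation: Optional[str] = None) -> int:
        fragment_id = next(self._ids)
        self.texts[fragment_id] = text
        self.sources[fragment_id] = source
        if translation is not None:
            self.translations[fragment_id] = translation
        return fragment_id

    def search(self, term: str) -> List[int]:
        """Ids of fragments whose text, translation, or source contains term."""
        hits = set()
        for table in (self.texts, self.translations, self.sources):
            hits.update(fid for fid, value in table.items() if term in value)
        return sorted(hits)

    def get(self, fragment_id: int) -> Dict[str, Optional[str]]:
        """Follow the id mapping to reassemble one fragment's record."""
        return {
            "text": self.texts[fragment_id],
            "translation": self.translations.get(fragment_id),
            "source": self.sources[fragment_id],
        }
```

The shared id plays the role of the mapping relationship: a hit in any one store leads back to the other two entries for the same fragment.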
The document output module is configured to display the text content and the information source of the information fragment in a document format selected by the user;
if the information fragment has a translation, the translation is displayed at the same time;
the text content and the translation of the information fragment are displayed side by side for comparison.
The text content of multiple information fragments may also be combined into a single document for display.
The directory index module is used to build an index directory for the information fragments in the database;
The names in the index directory may be numbers arranged in a certain order, for example logical numbers assigned after sorting the information fragments by length, by size, or by the time at which they were acquired;
They may also be names composed by the user, or words the user has marked in the information fragment; for an information fragment in picture format, a word is marked by selecting it in the picture with a screenshot, and once the information recognition module has recognized it, it is used as the name in the index directory;
Further, the user determines keywords for the information fragment; there may be one or more keywords, and each keyword is either a word composed by the user or a word the user has marked in the information fragment;
Once the keywords of an information fragment are determined, they are displayed together with the name of the fragment's entry in the index directory as a summary of the fragment, so that the user can identify the information fragment more clearly and unambiguously.
The information fragments that the user selects in the index directory are output and displayed by the document output module.
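The naming rules above can be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: the `Fragment` record, the three-digit logical numbers, and the bracketed keyword summary are all choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    text: str
    acquired_at: float          # acquisition time, e.g. a Unix timestamp
    name: str = ""              # optional user-composed or user-marked name
    keywords: list = field(default_factory=list)


def build_index(fragments, order_by="time"):
    """Return (entry_label, fragment) pairs for the index directory.

    Fragments are sorted by acquisition time or by text length; a fragment
    without a user-supplied name gets its logical number as its name, and
    any keywords are appended to the name as a short summary.
    """
    if order_by == "time":
        ordered = sorted(fragments, key=lambda f: f.acquired_at)
    else:
        ordered = sorted(fragments, key=lambda f: len(f.text))
    entries = []
    for number, frag in enumerate(ordered, start=1):
        name = frag.name or f"{number:03d}"
        label = f"{name} [{', '.join(frag.keywords)}]" if frag.keywords else name
        entries.append((label, frag))
    return entries
```

The user would then pick an entry by its label, and the selected fragment would be passed to the document output module.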
The information association module computes the similarity between the text content of every pair of information fragments in the database; for a given information fragment, it selects the other information fragments whose similarity to that fragment falls within a preset threshold range and associates them with it;
When an information fragment that has been associated by the information association module is output through the document output module, the text content and information sources of its associated information fragments are displayed at the same time.
The similarity calculation specifically includes:
Selecting a first information fragment $D_1$ and a second information fragment $D_2$ from the information fragments;
From the text content of the first information fragment and of the second information fragment respectively, determining the keywords/words whose word frequency is higher than a preset second threshold as feature items;
Establishing a first feature set for the first information fragment, as follows:
$D_1 = \{T_{11}, W_{11};\ T_{12}, W_{12};\ \ldots;\ T_{1n}, W_{1n}\}$;
where $T_{1n}$ is a feature item of $D_1$, $W_{1n}$ is the weight determined from the word frequency of $T_{1n}$, and $n$ is the index of the feature item in the first feature set;
Establishing a second feature set for the second information fragment, as follows:
$D_2 = \{T_{21}, W_{21};\ T_{22}, W_{22};\ \ldots;\ T_{2m}, W_{2m}\}$;
where $T_{2m}$ is a feature item of $D_2$, $W_{2m}$ is the weight determined from the word frequency of $T_{2m}$, and $m$ is the index of the feature item in the second feature set;
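Building such a feature set can be sketched as below. The description only says that the weight is determined from the word frequency, so using the relative frequency among the retained items as the weight is an assumption, as are the function name `feature_set` and the regular-expression tokenizer (which would need to be replaced by a proper word segmenter for Chinese text).

```python
import re
from collections import Counter


def feature_set(text, min_count=1):
    """Map each feature item T to its weight W for one information fragment.

    A word becomes a feature item only if its count is strictly greater
    than `min_count` (standing in for the preset second threshold).
    The weight is the item's share of all retained counts (an assumption).
    """
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)
    kept = {word: count for word, count in counts.items() if count > min_count}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in kept.items()}
```

For example, in the text "a b a c a b" with `min_count=1`, the words "a" (3 occurrences) and "b" (2 occurrences) are kept and "c" is dropped.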
The similarity of the two information fragments is calculated using the cosine formula, which is as follows:
[Correction according to Rule 26 02.02.2015]
Cosine:
Figure WO-DOC-MATHS-1
where $\mathrm{Sim}(D_1, D_2)$ is the similarity of the two information fragments and $k$ is the index of the feature item.
The fragment texts $D_1$ and $D_2$ are represented using the vector space model, and the calculation is as follows:
Figure PCTCN2014093657-appb-000002
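The published formulas survive here only as image placeholders, so the sketch below assumes the standard vector-space cosine similarity: the sum over feature items k of W1k·W2k, divided by the product of the Euclidean norms of the two weight vectors. The function name `cosine_similarity` and the dictionary representation of a feature set (feature item mapped to weight) are assumptions.

```python
import math


def cosine_similarity(d1, d2):
    """Cosine of the angle between two feature sets given as {item: weight}.

    Feature items missing from one set are treated as having weight 0,
    so only shared items contribute to the dot product.
    """
    shared = set(d1) & set(d2)
    dot = sum(d1[item] * d2[item] for item in shared)
    norm1 = math.sqrt(sum(w * w for w in d1.values()))
    norm2 = math.sqrt(sum(w * w for w in d2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty feature set is similar to nothing
    return dot / (norm1 * norm2)
```

With non-negative weights the result lies between 0 (no shared feature items) and 1 (weight vectors pointing in the same direction).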
The above calculation yields the similarity between each information fragment and every other information fragment;
All information fragments whose similarity to a given information fragment falls within the threshold range (low, high) are selected and associated with that fragment, and an association table is established:
The association table contains the information of the other information fragments associated with the given fragment, sorted in descending order of similarity;
After the user selects the information fragment to view, a document is created that displays the text content of that fragment and, below it, the text content of the other information fragments in the order in which they appear in the association table.
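The association table described above can be sketched as follows. The sketch assumes that the pairwise similarities are already available as a nested dictionary and that (low, high) denotes an open interval; the function name `build_association_table` is hypothetical.

```python
def build_association_table(similarities, low, high):
    """Build {fragment_id: [(other_id, similarity), ...]} sorted by similarity.

    `similarities[a][b]` holds the similarity between fragments a and b.
    Only fragments with low < similarity < high are associated, and each
    list is ordered from most to least similar.
    """
    table = {}
    for frag_id, others in similarities.items():
        related = [
            (other_id, score)
            for other_id, score in others.items()
            if low < score < high
        ]
        related.sort(key=lambda pair: pair[1], reverse=True)
        table[frag_id] = related
    return table
```

When the user opens a fragment, its text would be shown first, followed by the text of each associated fragment in the order given by that fragment's list.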
As shown in FIG. 2, the present invention also discloses a method for translating information fragments, comprising:
S11. Identifying the text content of the information fragment selected by the user and the source language of the text content, and determining the target language of the translation;
S12. When the source language of the text content differs from the target language of the translation, translating the text content of the information fragment into the target language to obtain the translation of the information fragment;
S13. Displaying the text content and the translation of the information fragment side by side in a document format selected by the user.
Based on the present invention, a preferred embodiment is provided, comprising:
S21. Fragment collection;
Waiting for the user to trigger a specific global hotkey, then invoking the corresponding selection function so that the user can select an information fragment of the corresponding format;
S22. Fragment recognition;
After the user selects an information fragment, recognizing the selected fragment to obtain its text content;
Further, the information source of the information fragment can also be identified;
S23. Fragment translation;
Determining the source language of the recognized text content of the information fragment and the target language into which it is to be translated;
When the source language and the target language differ, translating the text content of the information fragment into the target language to obtain the translation of the information fragment;
S24. Collection and storage;
Separating the text content, translation, and information source of the information fragment and storing each in the corresponding database for collection and storage.
S25. Building the directory;
Building an index directory for the information fragments in the database;
This further includes: determining the keywords of the information fragments;
Displaying the keywords as a summary in the index directory.
S26. Selecting fragments;
The user selects the required information fragments from the index directory according to the keywords; or
The database is searched using the text content or the information source of the information fragment as the search term, and the matching information fragments are retrieved;
S27. Outputting fragments;
The information fragments that the user selected in the index directory or that were retrieved by searching the database are displayed together in a single document, in the document format selected by the user, for the user to view.
The text content and the information source of each information fragment are displayed; if the information fragment has a translation, the translation is displayed at the same time, with the text content and the translation shown side by side for comparison.
The above description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. A person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific embodiments and to the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

  1. A method for translating information fragments, characterized by comprising:
    identifying the text content of an information fragment selected by a user and the source language of the text content, and determining a target language for translation;
    when the source language of the text content differs from the target language of the translation, translating the text content of the information fragment into the target language to obtain a translation of the information fragment;
    displaying the text content and the translation of the information fragment side by side in a document format selected by the user.
  2. The method according to claim 1, characterized in that the target language is obtained from a user setting; or
    by using the user's common language as the target language.
  3. The method according to claim 2, characterized in that the user's common language is obtained as follows:
    identifying the region in which the user is located and using the native language of that region as the user's common language; or
    identifying the system language of the user's digital terminal and using the system language of the digital terminal as the user's common language.
  4. The method according to claim 1, characterized by further comprising: when the user selects the information fragment, identifying the information source of the information fragment;
    placing the text content, the translation, and the information source of the information fragment into the corresponding databases for collection and storage;
    displaying the information source of the information fragment together with its text content and translation.
  5. The method according to claim 4, characterized by further comprising:
    building an index directory for all of the collected and stored information fragments;
    after the user has selected the required information fragments in the index directory, displaying the text content, translation, and information source of the selected information fragments in the document format selected by the user.
  6. The method according to claim 1, characterized by further comprising: after obtaining the translation of the information fragment, determining keywords in the translation and displaying the obtained keywords as a summary of the information fragment in the index directory.
  7. The method according to claim 1, characterized in that the information fragments include fragments in text format and in picture format;
    the method further comprising:
    when the user triggers the corresponding global hotkey, invoking the corresponding selection function to select the information fragment in text format or picture format.
  8. A system for translating information fragments, characterized by comprising: an information recognition module, configured to identify the text content and the information source of an information fragment selected by a user, to send the text content of the information fragment to a translation processing module for translation, and to place the text content, the translation, and the information source of the information fragment into the corresponding databases for collection and storage;
    the translation processing module, configured to identify the source language of the text content of the information fragment, to determine a target language for translation, and to translate the text content of the information fragment when the source language and the target language differ;
    a document output module, configured to display the collected and stored text content, translation, and information source of the information fragment in a document format selected by the user.
  9. The system according to claim 8, characterized by further comprising:
    a parsing module, configured to recognize a global hotkey triggered by the user and to send the control instruction mapped to the recognized global hotkey to the corresponding selection module, thereby providing the user with the corresponding selection function.
  10. The system according to claim 8, characterized by further comprising: a directory index module, configured to build an index directory for all information fragments in the database, for the user to select from.
PCT/CN2014/093657 2013-12-23 2014-12-12 Information fragment translating method and system WO2015096625A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310713245.4 2013-12-23
CN201310713245.4A CN103744841A (en) 2013-12-23 2013-12-23 Information fragment translating method and system

Publications (1)

Publication Number Publication Date
WO2015096625A1 true WO2015096625A1 (en) 2015-07-02

Family

ID=50501859

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/093657 WO2015096625A1 (en) 2013-12-23 2014-12-12 Information fragment translating method and system

Country Status (2)

Country Link
CN (1) CN103744841A (en)
WO (1) WO2015096625A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103744841A (en) * 2013-12-23 2014-04-23 武汉传神信息技术有限公司 Information fragment translating method and system
CN104391840A (en) * 2014-11-24 2015-03-04 上海迈外迪网络科技有限公司 Translation method and device
CN105243058B (en) * 2015-09-30 2018-04-13 北京奇虎科技有限公司 A kind of web page contents interpretation method and electronic equipment
CN107766335A (en) * 2016-08-23 2018-03-06 耿诚 A kind of interpretation method and device of software to be translated
CN111104805A (en) * 2018-10-26 2020-05-05 广州金山移动科技有限公司 Translation processing method and device, computer storage medium and terminal
CN109977429A (en) * 2019-04-03 2019-07-05 新疆语视未来信息科技有限公司 A kind of information interacting method based on translation content instant playback

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1620659A (en) * 2001-12-21 2005-05-25 埃里·阿博 Multilingual database creation system and method
CN103167360A (en) * 2013-02-21 2013-06-19 中国对外翻译出版有限公司 Method for achieving multilingual subtitle translation
US20130332450A1 (en) * 2012-06-11 2013-12-12 International Business Machines Corporation System and Method for Automatically Detecting and Interactively Displaying Information About Entities, Activities, and Events from Multiple-Modality Natural Language Sources
CN103744841A (en) * 2013-12-23 2014-04-23 武汉传神信息技术有限公司 Information fragment translating method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPR360701A0 (en) * 2001-03-06 2001-04-05 Worldlingo, Inc Seamless translation system
US7698126B2 (en) * 2005-03-08 2010-04-13 Microsoft Corporation Localization matching component
CN101650716A (en) * 2008-08-12 2010-02-17 英业达股份有限公司 System and method for translating multiple languages
EP2629211A1 (en) * 2009-08-21 2013-08-21 Mikko Kalervo Väänänen Method and means for data searching and language translation

Also Published As

Publication number Publication date
CN103744841A (en) 2014-04-23

Similar Documents

Publication Publication Date Title
WO2015096625A1 (en) Information fragment translating method and system
US10552467B2 (en) System and method for language sensitive contextual searching
US11461386B2 (en) Visual recognition using user tap locations
US8577882B2 (en) Method and system for searching multilingual documents
US9235643B2 (en) Method and system for generating search results from a user-selected area
US20080215548A1 (en) Information search method and system
KR20140093957A (en) Interactive multi-modal image search
CN107861753B (en) APP generation index, retrieval method and system and readable storage medium
TW201541266A (en) Providing search results corresponding to displayed content
US11709881B2 (en) Visual menu
US10152540B2 (en) Linking thumbnail of image to web page
CN110866091A (en) Data retrieval method and device
US9165058B2 (en) Apparatus and method for searching for personalized content based on user's comment
WO2015188719A1 (en) Association method and association device for structural data and picture
CN116152831A (en) Method and system for ideographic character analysis
CN116508004A (en) Method for point of interest information management, electronic device, and storage medium
JP5484113B2 (en) Document image related information providing apparatus and document image related information acquisition system
CN105183729A (en) Method and device for retrieving audio/video content
JP2011053881A (en) Document management system
US10606875B2 (en) Search support apparatus and method
CN103744884A (en) Method and system for collating information fragments
JP2011054148A (en) Retrieval device and method, and program
JP2024130142A (en) News information analysis device, news information analysis program, and news information analysis method
TWI484356B (en) Retrieval methods, devices and systems
JP2007148625A (en) Information presentation device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14873264

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14873264

Country of ref document: EP

Kind code of ref document: A1