WO2019153480A1 - Method, apparatus, server and medium for text translation - Google Patents

Method, apparatus, server and medium for text translation

Info

Publication number
WO2019153480A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
text object
local
text
principal component
Prior art date
Application number
PCT/CN2018/082606
Other languages
English (en)
French (fr)
Inventor
蔡锦升
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019153480A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/44 Statistical methods, e.g. probability models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Definitions

  • the present application belongs to the field of artificial intelligence technologies, and in particular, to a method, device, server and medium for text translation.
  • the invention solves the problem that the prior art has poor user convenience when the text is translated in an unfamiliar environment, and the processing efficiency is low.
  • a first aspect of the embodiments of the present application provides a method for text translation, including:
  • classification hyperplanes of one or more languages are obtained; the text object is subjected to dimensionality reduction by principal component analysis to generate a principal component matrix of the text object, and the principal component matrix of the text object is then mapped to a high-dimensional feature space by a Gaussian kernel function to generate test parameters of the text object.
  • the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages is calculated as the Euclidean distance corresponding to that language; the language with the smallest Euclidean distance is determined as the target language, and the text object is translated into the native language.
  • a second aspect of the embodiments of the present application provides a device for text translation, including:
  • an acquiring module, configured to acquire a native language entered by the user, detect the user location, and determine the local language corresponding to the user location according to a preset correspondence between geographic locations and local languages; an enabling module, configured to enable a translation function if the local language is not the native language; a determining module, configured to detect a text object after the translation function is enabled and determine whether the text object belongs to the native language or the local language; a generating module, configured to, if the text object belongs to neither the native language nor the local language, obtain classification hyperplanes of one or more languages, perform dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then map the principal component matrix of the text object to a high-dimensional feature space by a Gaussian kernel function to generate test parameters of the text object; and a first translation module, configured to calculate the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages as the Euclidean distance corresponding to that language, determine the language with the smallest Euclidean distance as the target language, and translate the text object into the native language.
  • a third aspect of the embodiments of the present application provides a server for text translation, including a memory and a processor, the memory storing computer readable instructions executable on the processor, wherein the processor, when executing the computer readable instructions, implements the method for text translation provided in the first aspect of the embodiments of the present application.
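  • FIG. 1 is a flowchart of an implementation of a method for text translation provided by an embodiment of the present application
  • FIG. 2 is a flowchart of a specific implementation of step S105 of the method for text translation provided by an embodiment of the present application
  • FIG. 3 is a flowchart of a specific implementation for computing the classification hyperplanes of languages provided by an embodiment of the present application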
  • FIG. 4 is a structural block diagram of an apparatus for text translation provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.
  • FIG. 1 shows an implementation flow of a method for text translation provided by an embodiment of the present application, and the method flow includes steps S101 to S105.
  • the specific implementation principle of each step is as follows.
  • S101 Acquire a native language entered by the user, detect the user location, and determine the local language corresponding to the user location according to a preset correspondence between geographic locations and local languages.
  • multiple correspondences between geographic locations and local languages are preset; for example, in the range of 73 degrees to 125 degrees west longitude and 25 degrees to 49 degrees north latitude, the corresponding local language is English; in the range of 139 degrees to 142 degrees east longitude and 35 degrees to 40 degrees north latitude, the corresponding local language is Japanese.
  • the local language corresponding to the detected user location can be determined by the corresponding relationship between the geographical location and the local language.
  • since the correspondence between geographic locations and local languages is preset, the user does not need to enter the local language each time text is to be translated; the local language can be determined automatically from the user location.
  • the embodiment of the present application further provides a method for establishing the correspondence between geographic locations and local languages, performed before the native language entered by the user is acquired, the user location is detected, and the language corresponding to the user location is taken as the local language according to the preset correspondence; the method includes the following.
  • the local language input by multiple users is counted, and the position coordinates of each user when inputting the local language are detected.
  • the local language can be automatically determined by the user location, but it does not mean that the user cannot manually input the local language. In many cases, for example, if the user finds that the automatically determined local language does not match the real situation, or finds that the local language cannot be determined by the user location, the user can determine the local language by manually inputting.
  • after receiving a local language entered by a user, the embodiment of the present application determines and records the current location coordinates and uses them as reference data for subsequently generating or modifying the correspondence between geographic locations and local languages.
  • a unit area is set, and the language with the highest share among the local languages entered within one unit area of the map is determined as the local language for all location coordinates in that unit area, so as to generate the correspondence between geographic locations and local languages.
  • optionally, a plurality of regions are drawn on a world map, each region being a rectangle whose area is set to the unit area; for example, the world map is divided into square regions of 1 square kilometer each.
  • the share of each local language entered by users in each region can be calculated in this step. For example, if local-language inputs are received from 100 users in a region, of which 90 are English, 8 are French and 2 are Spanish, the local language for all location coordinates in that region is determined to be English.
  • the local language is intelligently determined by detecting the location of the user, thereby reducing the number of steps for the user to manually select a language, and improving the operation efficiency.
  • S102 Determine whether the local language is a native language.
  • the text translation function is not continuously opened due to the consideration of saving CPU processing resources and power, and the translation function is automatically turned on only when the local language and the native language are different. Therefore, it is necessary to first judge whether the native language language and the local language determined according to the above steps are consistent.
  • although the translation function is turned on automatically only when the local language is not the native language, this does not mean that it is enabled only in that case, because the user can also turn it on manually; for example, when the local language is the native language, the translation function is still enabled if the user manually enters an instruction to turn it on.
  • S105 After the translation function is turned on, detecting a text object, and determining whether the text object belongs to the native language language or the local language. It can be understood that when the translation function is turned on, the text object to be translated can be detected by the camera of the electronic device to further determine whether the text object belongs to a native language or a local language.
  • the above S105 includes:
  • S1051 Detect the gaps in the text object, and divide the text object into a plurality of text characters according to the gaps. Specifically, since a text object may consist of several text characters, the text object is divided into parts by detecting its gaps, and existing image recognition techniques can identify the text character corresponding to each part.
  • S1052 Select a preset number of the text characters, and extract the principal components of the text characters.
  • optionally, the principal components of each text character may be extracted by principal component analysis (PCA), thereby reducing the amount of subsequent computation.
  • S1053 Retrieve the character principal component database of the native language and the character principal component database of the local language; calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language, as a first ratio, and calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the local language, as a second ratio.
  • since the native language and the local language have already been determined by the method described above, the two character principal component databases can be called in this step. The two databases contain, respectively, a large number of character principal components of the native language and of the local language, so the principal components of the preset number of text characters can be looked up in each database in turn to obtain the first ratio and the second ratio.
  • S1054 if the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold, the text object belongs to the native language; S1055, if the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold, the text object belongs to the local language; S1056, if the first ratio is less than the preset ratio threshold and the second ratio is less than the preset ratio threshold, the text object belongs to neither the native language nor the local language.
  • S106 if the text object does not belong to the native language language, and does not belong to the local language, obtain a classification hyperplane of one or more languages; perform a dimensionality reduction process on the text object by using a principal component analysis method to generate the The principal component matrix of the text object is then mapped to the high-dimensional feature space by the Gaussian kernel function to generate the test parameters of the text object.
  • since the text object most likely belongs to the native language or the local language, the preceding steps first determine whether it does; if so, the corresponding dictionary can be called directly to translate the characters in the text object, which reduces the computation needed for translation.
  • if, however, the text object belongs to neither the native language nor the local language, the embodiment of the present application uses a pattern recognition method to identify the language to which the text object belongs.
  • classification hyperplanes of a plurality of languages are trained in advance, for example a classification hyperplane of German, a classification hyperplane of Korean and a classification hyperplane of English, and the language to which the text object belongs is then determined based on these classification hyperplanes; the specific determination process is described in detail below.
  • first described is a process of computing the classification hyperplanes of the languages, which is performed before the step of obtaining the classification hyperplanes of one or more languages, reducing the dimensionality of the text object by principal component analysis, and mapping its principal component matrix to a high-dimensional feature space by a Gaussian kernel function to generate the test parameters of the text object.
  • FIG. 3 shows a specific implementation flow for computing the classification hyperplanes of languages provided by an embodiment of the present application, which is described in detail as follows:
  • S201 Select one or more candidate languages, and obtain one or more language databases of the candidate languages.
  • a plurality of commonly used languages are taken as candidate languages, for example French, English and Japanese, and the language databases of these candidate languages are retrieved. These language databases resemble dictionaries of the languages but can be more concise than ordinary dictionaries: they need not contain explanations of words and may include only the words themselves.
  • S202 Perform a dimensionality reduction process on each of the language databases by using a principal component analysis method to generate a principal component matrix of each of the candidate languages.
  • since the database of a language has too many dimensions once it is converted into matrix form, which slows subsequent computation, and since the words of a language tend to share common features in how they are written, the principal components of each candidate language can be extracted by the PCA algorithm to generate a principal component matrix of each candidate language.
  • S203 Map a principal component matrix of each of the candidate languages to a high-dimensional feature space by a Gaussian kernel function to generate training parameters of each of the candidate languages.
  • an average of the principal component matrices of all candidate languages is calculated, an average principal component matrix is generated, and training parameters for each candidate language are calculated.
  • the positive training set and the negative training set corresponding to one candidate language may be input together into a support vector machine model to compute the classification hyperplane of that candidate language.
  • the classification hyperplanes can be computed before the text object is detected; that is, the classification hyperplanes of the languages can be pre-computed and stored in the mobile terminal, and simply called when the mobile terminal needs to recognize a text object. Since the classification hyperplanes can be pre-stored, the embodiment of the present application may also compute them by a method other than the one above; any method that yields classification hyperplanes of a plurality of languages stored in the mobile terminal supports the subsequent calculation.
  • the principal components of the text object may be extracted by the PCA algorithm to generate a principal component matrix of the text object. The principal component matrix of the text object is then mapped to the high-dimensional feature space by a Gaussian kernel function to generate the test parameters of the text object; the calculation is the same as that used for the training parameters of the candidate languages and is not repeated here.
  • the Euclidean distance formula can be used to calculate the Euclidean distance from the test parameters to the classification hyperplane of each language. A smaller Euclidean distance indicates a smaller difference between the test parameters and the classification hyperplane, and hence that the text object is most similar to the language corresponding to that hyperplane; in the embodiment of the present application, the language with the smallest Euclidean distance is therefore determined as the target language.
  • the text object is translated into a native language.
  • FIG. 4 is a structural block diagram of an apparatus for text translation provided by the embodiment of the present application. For the convenience of description, only parts related to the embodiment of the present application are shown.
  • the apparatus includes:
  • the acquiring module 401 is configured to acquire a native language entered by the user, detect the user location, and determine the local language corresponding to the user location according to a preset correspondence between geographic locations and local languages; the enabling module 402 is configured to enable the translation function if the local language is not the native language; the determining module 403 is configured to detect the text object after the translation function is enabled and determine whether the text object belongs to the native language or the local language; the generating module 404 is configured to, if the text object belongs to neither the native language nor the local language, obtain classification hyperplanes of one or more languages, perform dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and map the principal component matrix of the text object to a high-dimensional feature space by a Gaussian kernel function to generate test parameters of the text object; the first translation module 405 is configured to calculate the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages as the Euclidean distance corresponding to that language, determine the language with the smallest Euclidean distance as the target language, and translate the text object into the native language.
  • optionally, the device further includes: a statistics module, configured to collect the local languages entered by multiple users and detect the location coordinates of each user when entering the local language; and a correspondence module, configured to set a unit area and determine the language with the highest share among the local languages entered within one unit area of the map as the local language for all location coordinates in that unit area, so as to generate the correspondence between geographic locations and local languages.
  • optionally, the determining module 403 includes: a detecting submodule, configured to detect the gaps in the text object and divide the text object into a plurality of text characters according to the gaps; a selecting submodule, configured to select a preset number of the text characters and extract the principal components of the text characters; a calculation submodule, configured to retrieve the character principal component database of the native language and the character principal component database of the local language, calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language as a first ratio, and calculate the proportion that exist in the character principal component database of the local language as a second ratio; a first determining submodule, configured to determine that the text object belongs to the native language if the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold; a second determining submodule, configured to determine that the text object belongs to the local language if the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold; and a third determining submodule, configured to determine that the text object belongs to neither the native language nor the local language if the first ratio is less than the preset ratio threshold and the second ratio is less than the preset ratio threshold.
  • optionally, the device is further configured to: select one or more candidate languages and obtain one or more language databases of the candidate languages; perform dimensionality reduction on each of the language databases by principal component analysis to generate a principal component matrix of each of the candidate languages; map the principal component matrix of each of the candidate languages to a high-dimensional feature space by a Gaussian kernel function to generate training parameters of each of the candidate languages; and repeatedly select one language from the candidate languages as the selected language, form the training parameters of the selected language into a positive training set and the training parameters of the languages other than the selected language into a negative training set, and compute a classification hyperplane from the positive training set and the negative training set, until the classification hyperplanes of all candidate languages have been computed.
  • optionally, the device further includes: a second translation module, configured to translate the text object into the local language if the text object belongs to the native language; and a third translation module, configured to translate the text object into the native language if the text object belongs to the local language.
  • FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.
  • the electronic device 5 of this embodiment includes a processor 50, a memory 51, and computer readable instructions 52, such as a text translation program, stored in the memory 51 and executable on the processor 50.
  • the processor 50 when executing the computer readable instructions 52, implements the functions of the various modules/units in the various apparatus embodiments described above, such as the functions of the units 401 through 409 shown in FIG.
  • the computer readable instructions 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to carry out the present application.
  • the one or more modules/units may be a series of computer readable instruction segments capable of performing a particular function, the instruction segments being used to describe the execution of the computer readable instructions 52 in the electronic device 5.

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • Physics & Mathematics
  • Health & Medical Sciences
  • Artificial Intelligence
  • Audiology, Speech & Language Pathology
  • Computational Linguistics
  • General Health & Medical Sciences
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Probability & Statistics with Applications
  • Machine Translation

Abstract

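The solution is applicable to the field of artificial intelligence and provides a method, apparatus, server and medium for text translation. A native language is acquired, and the local language corresponding to the user location is determined according to a preset correspondence between geographic locations and local languages. When the local language is not the native language, a translation function is enabled; once it is enabled, the language of a detected text object is determined. If the text object belongs to neither the native language nor the local language, the text object is recognized by a preset algorithm to obtain a target language, and the text object is translated into the native language. The user can thus translate text in an unfamiliar environment without manually selecting the local language or manually entering the text to be translated, which makes automatic translation more convenient.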

Description

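Method, apparatus, server and medium for text translation
This application claims priority to Chinese patent application No. 201810121444.9, entitled "Method and terminal device for text translation", filed with the Chinese Patent Office on February 7, 2018, the entire contents of which are incorporated herein by reference.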
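Technical Field
The present application belongs to the field of artificial intelligence, and in particular relates to a method, apparatus, server and medium for text translation.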
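Background
With economic development, more and more people travel abroad to tour and work in places where the language is unfamiliar. People abroad often cannot read the local language, which causes great inconvenience in daily life and work. For example, someone walking into a restaurant in Japan may be unable to order because they cannot read the menu, and someone visiting a museum in France may get less out of the visit because they cannot read the descriptions of the artworks.
To overcome this reading barrier, people usually rely on an electronic dictionary to translate the local text. An electronic dictionary, however, requires the user to manually enter the text to be translated and to manually select both the local language and the native language before it can translate.
The current translation process therefore depends on manual operation, involves cumbersome steps, and takes a long time, so current text translation technology suffers from poor user convenience and low processing efficiency.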
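Technical Problem
The present application addresses the poor user convenience and low processing efficiency of the prior art when translating text in an unfamiliar environment.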
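Technical Solution
A first aspect of the embodiments of the present application provides a method for text translation, including:
acquiring a native language entered by a user, detecting the user location, and determining the local language corresponding to the user location according to a preset correspondence between geographic locations and local languages; if the local language is not the native language, enabling a translation function; after the translation function is enabled, detecting a text object and determining whether the text object belongs to the native language or the local language; if the text object belongs to neither the native language nor the local language, obtaining classification hyperplanes of one or more languages; performing dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then mapping the principal component matrix of the text object to a high-dimensional feature space by a Gaussian kernel function to generate test parameters of the text object; and calculating the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages as the Euclidean distance corresponding to that language, determining the language with the smallest Euclidean distance as the target language, and translating the text object into the native language.
A second aspect of the embodiments of the present application provides an apparatus for text translation, including:
an acquiring module, configured to acquire a native language entered by a user, detect the user location, and determine the local language corresponding to the user location according to a preset correspondence between geographic locations and local languages; an enabling module, configured to enable a translation function if the local language is not the native language; a determining module, configured to detect a text object after the translation function is enabled and determine whether the text object belongs to the native language or the local language; a generating module, configured to, if the text object belongs to neither the native language nor the local language, obtain classification hyperplanes of one or more languages, perform dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then map the principal component matrix of the text object to a high-dimensional feature space by a Gaussian kernel function to generate test parameters of the text object; and a first translation module, configured to calculate the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages as the Euclidean distance corresponding to that language, determine the language with the smallest Euclidean distance as the target language, and translate the text object into the native language.
A third aspect of the embodiments of the present application provides a server for text translation, including a memory and a processor, the memory storing computer readable instructions executable on the processor, wherein the processor, when executing the computer readable instructions, implements the method for text translation provided in the first aspect of the embodiments of the present application.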
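Advantageous Effects
The user can translate text in an unfamiliar environment without manually selecting the local language or manually entering the text to be translated, which makes automatic translation more convenient.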
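Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an implementation of the method for text translation provided by an embodiment of the present application;
FIG. 2 is a flowchart of a specific implementation of step S105 of the method for text translation provided by an embodiment of the present application;
FIG. 3 is a flowchart of a specific implementation for computing the classification hyperplanes of languages provided by an embodiment of the present application;
FIG. 4 is a structural block diagram of the apparatus for text translation provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the electronic device provided by an embodiment of the present application.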
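Embodiments of the Invention
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present application. It should be clear to those skilled in the art, however, that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present application.
The technical solutions of the present application are described below by way of specific embodiments.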
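FIG. 1 shows the implementation flow of the method for text translation provided by an embodiment of the present application; the flow includes steps S101 to S105. The implementation principle of each step is as follows.
S101: Acquire a native language entered by the user, detect the user location, and determine the local language corresponding to the user location according to a preset correspondence between geographic locations and local languages.
In the embodiment of the present application, multiple correspondences between geographic locations and local languages are preset. For example, in the range of 73 degrees to 125 degrees west longitude and 25 degrees to 49 degrees north latitude, the corresponding local language is English; in the range of 139 degrees to 142 degrees east longitude and 35 degrees to 40 degrees north latitude, the corresponding local language is Japanese.
With these correspondences, the local language corresponding to the detected user location can be determined.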
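The lookup in S101 can be sketched as a table of bounding boxes. The two boxes below are the ones given as examples in the text; the names `REGIONS` and `local_language`, the signed-degree convention (west longitudes negative), and the first-match rule are assumptions made for illustration only.

```python
from typing import Optional

# (min_lon, max_lon, min_lat, max_lat, language); west longitudes are negative.
REGIONS = [
    (-125.0, -73.0, 25.0, 49.0, "English"),
    (139.0, 142.0, 35.0, 40.0, "Japanese"),
]

def local_language(lon: float, lat: float) -> Optional[str]:
    """Return the language of the first box containing (lon, lat), else None."""
    for min_lon, max_lon, min_lat, max_lat, language in REGIONS:
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            return language
    return None  # no preset correspondence covers this location
```

A `None` result corresponds to the case, mentioned later in the text, where the local language cannot be determined from the user location and the user enters it manually.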
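Further, since the correspondence between geographic locations and local languages is preset in the embodiment of the present application, the user does not need to enter the local language each time text is to be translated; the local language can be determined automatically from the user location. In addition, the embodiment of the present application provides a method for establishing the correspondence between geographic locations and local languages, performed before the native language entered by the user is acquired, the user location is detected, and the language corresponding to the user location is taken as the local language according to the preset correspondence. The method includes the following.
First, the local languages entered by multiple users are collected, and the location coordinates of each user when entering the local language are detected.
Within a large geographic area there may be a smaller area whose local language differs from that of the larger area around it, and the local language of such a smaller area is often hard to determine directly. For example, taking Canada as a large geographic area, its official local language would be English; within Canada, however, lies Quebec, a smaller geographic area whose official local language is French, and most text in Quebec, such as museum descriptions, restaurant menus and signs, is written in French. To make the automatically determined local language match reality more closely, the embodiment of the present application can accept the user's choice of local language.
Although, as described above, the user does not need to enter the local language manually for every translation because it can be determined automatically from the user location, this does not mean that a manually entered local language cannot be accepted. In many cases, for example when the user finds that the automatically determined local language does not match reality, or that the local language cannot be determined from the user location, the user can specify the local language by manual input. After receiving a local language entered by a user, the embodiment of the present application determines and records the current location coordinates and uses them as reference data for subsequently generating or modifying the correspondence between geographic locations and local languages.
Second, a unit area is set, and the language with the highest share among the local languages entered within one unit area of the map is determined as the local language for all location coordinates in that unit area, so as to generate the correspondence between geographic locations and local languages.
Optionally, a plurality of regions are drawn on a world map, each region being a rectangle whose area is set to the unit area; for example, the world map is divided into square regions of 1 square kilometer each.
Since the local languages entered by multiple users were collected in the step above, the share of each local language entered by users in each region can be calculated in this step. For example, if local-language inputs are received from 100 users in a region, of which 90 are English, 8 are French and 2 are Spanish, the local language for all location coordinates in that region is determined to be English.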
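The majority rule above can be sketched as a vote per grid cell. This is only an illustration: the text divides the map into 1 square kilometer squares, whereas the hypothetical `cell_of` helper below uses a fixed step in degrees as a simple stand-in, and the function names and the `cell_deg` parameter are assumptions.

```python
import math
from collections import Counter, defaultdict

def cell_of(lon, lat, cell_deg=0.01):
    """Map a coordinate to a grid-cell index (cell size in degrees is an assumption)."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))

def build_mapping(reports, cell_deg=0.01):
    """reports: iterable of (lon, lat, language) entered by users.
    Returns {cell: language with the highest share of reports in that cell}."""
    votes = defaultdict(Counter)
    for lon, lat, language in reports:
        votes[cell_of(lon, lat, cell_deg)][language] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}

# The 100-user example from the text: 90 English, 8 French, 2 Spanish.
reports = ([(-73.5, 45.5, "English")] * 90
           + [(-73.5, 45.5, "French")] * 8
           + [(-73.5, 45.5, "Spanish")] * 2)
mapping = build_mapping(reports)
```

With these reports, the single cell containing the coordinate is assigned English, matching the outcome described in the text.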
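In the embodiment of the present application, the local language is determined intelligently by detecting the user location, which reduces the number of times the user has to select a language manually and improves operating efficiency.
S102: Determine whether the local language is the native language.
In the embodiment of the present application, to save CPU processing resources and power, the text translation function is not kept on continuously; it is turned on automatically only when the local language differs from the native language. It is therefore necessary to first determine whether the native language and the local language determined in the steps above are the same.
S103: If the local language is the native language, the translation function is not turned on automatically. S104: If the local language is not the native language, the translation function is enabled.
Note that although the translation function is turned on automatically only when the local language is not the native language, this does not mean that it is enabled only in that case, because the user can also turn it on manually. For example, when the local language is the native language, the translation function is still enabled if the user manually enters an instruction to turn it on.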
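Steps S102 to S104, together with the manual override, reduce to a single condition. A minimal sketch follows; the function name and the `manual_on` flag are hypothetical.

```python
def translation_enabled(native: str, local: str, manual_on: bool = False) -> bool:
    """True if translation should be on: the languages differ (S104)
    or the user has switched it on manually."""
    return manual_on or native != local
```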
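S105: After the translation function is enabled, detect a text object and determine whether the text object belongs to the native language or the local language. Once the translation function is on, the text object to be translated can be detected through the camera of the electronic device, and it can then be determined whether the text object belongs to the native language or the local language.
As an embodiment of the present application, as shown in FIG. 2, S105 includes:
S1051: Detect the gaps in the text object, and divide the text object into a plurality of text characters according to the gaps. Specifically, since a text object may consist of several text characters, the text object is divided into parts by detecting its gaps, and existing image recognition techniques can identify the text character corresponding to each part.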
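The text does not say how gaps are detected. One common approach, shown here purely as an assumption, is a column projection over a binarized image: columns with no ink are gaps, and each run of inked columns is one character candidate. The function name and the `min_gap` parameter are hypothetical.

```python
def split_on_gaps(image, min_gap=1):
    """image: list of rows of 0/1 pixels (1 = ink).
    Returns (start, end) column ranges, end exclusive, one per run of inked
    columns separated by at least min_gap blank columns."""
    width = len(image[0]) if image else 0
    inked = [any(row[c] for row in image) for c in range(width)]
    segments, start, gap = [], None, 0
    for c, has_ink in enumerate(inked):
        if has_ink:
            if start is None:
                start = c          # a new character begins here
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # gap wide enough: close the character
                segments.append((start, c - gap + 1))
                start, gap = None, 0
    if start is not None:          # character running to the right edge
        segments.append((start, width - gap))
    return segments
```

Raising `min_gap` keeps narrow gaps inside a single character from splitting it, at the cost of merging characters that sit close together.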
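S1052: Select a preset number of the text characters, and extract the principal components of the text characters. Optionally, in the embodiment of the present application, the principal components of each text character can be extracted by principal component analysis (PCA), which reduces the amount of subsequent computation.
S1053: Retrieve the character principal component database of the native language and the character principal component database of the local language; calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language, as a first ratio, and calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the local language, as a second ratio.
Since the native language and the local language have already been determined by the method described above, the two character principal component databases can be called in this step. The two databases contain, respectively, a large number of character principal components of the native language and of the local language, so the principal components of the preset number of text characters can be looked up in each database in turn to obtain the first ratio and the second ratio.
S1054: If the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold, the text object belongs to the native language. S1055: If the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold, the text object belongs to the local language. S1056: If the first ratio is less than the preset ratio threshold and the second ratio is less than the preset ratio threshold, the text object belongs to neither the native language nor the local language.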
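The decision in S1053 to S1056 can be sketched as follows. The sketch assumes each character's principal components have been reduced to a hashable key (for example a quantized tuple) so that database membership is a set lookup; the text does not specify the matching rule. The threshold of 0.8 and all names are illustrative, and the case where both ratios reach the threshold is not covered by the text, so it is reported separately.

```python
def share_in_db(components, db):
    """Fraction of the sampled character components found in one language's database."""
    return sum(1 for c in components if c in db) / len(components)

def classify(components, native_db, local_db, threshold=0.8):
    first = share_in_db(components, native_db)    # first ratio (native language)
    second = share_in_db(components, local_db)    # second ratio (local language)
    if first >= threshold and second < threshold:
        return "native"        # S1054
    if second >= threshold and first < threshold:
        return "local"         # S1055
    if first < threshold and second < threshold:
        return "neither"       # S1056: continue with hyperplane classification
    return "undetermined"      # both ratios >= threshold: not addressed in the text
```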
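S106: If the text object belongs to neither the native language nor the local language, obtain classification hyperplanes of one or more languages; perform dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then map the principal component matrix of the text object to a high-dimensional feature space by a Gaussian kernel function to generate test parameters of the text object. Since the text object most likely belongs to the native language or the local language, the preceding steps first determine whether it does; if so, the corresponding dictionary can be called directly to translate the characters in the text object, which reduces the computation needed for translation. It is possible, however, that the text object belongs to neither the native language nor the local language, in which case the embodiment of the present application uses a pattern recognition method to identify the language to which the text object belongs.
In the embodiment of the present application, classification hyperplanes of a plurality of languages are trained in advance, for example a classification hyperplane of German, a classification hyperplane of Korean and a classification hyperplane of English, and the language to which the text object belongs is then determined based on these classification hyperplanes. The determination process is described in detail below. First described is the process of computing the classification hyperplanes of the languages, which is performed before S106, that is, before the classification hyperplanes are obtained and the test parameters of the text object are generated.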
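As an embodiment of the present application, FIG. 3 shows a specific implementation flow for computing the classification hyperplanes of languages, described in detail as follows:
S201: Select one or more candidate languages, and obtain one or more language databases of the candidate languages.
In the embodiment of the present application, a plurality of commonly used languages are taken as candidate languages, for example French, English and Japanese, and the language databases of these candidate languages are retrieved. These language databases resemble dictionaries of the languages but can be more concise than ordinary dictionaries: they need not contain explanations of words and may include only the words themselves.
S202: Perform dimensionality reduction on each of the language databases by principal component analysis to generate a principal component matrix of each of the candidate languages. Since the database of a language has too many dimensions once it is converted into matrix form, which slows subsequent computation, and since the words of a language tend to share common features in how they are written, the principal components of each candidate language can be extracted by the PCA algorithm to generate a principal component matrix of each candidate language.
S203: Map the principal component matrix of each of the candidate languages to a high-dimensional feature space by a Gaussian kernel function to generate training parameters of each of the candidate languages. Optionally, the average of the principal component matrices of all candidate languages is calculated to generate an average principal component matrix, and the training parameters of each candidate language are then calculated.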
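The text names the two operations in S202 and S203 but gives no formulas. Under the usual textbook definitions, with the number of retained components $k$ and the kernel width $\sigma$ as unspecified assumptions, they can be written as:

```latex
% PCA on a centered data matrix X (n samples, d features)
C = \frac{1}{n-1} X^{\top} X, \qquad
C\,v_i = \lambda_i v_i, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d

% keep the k leading eigenvectors and project
W_k = [\,v_1, \dots, v_k\,], \qquad Z = X W_k \in \mathbb{R}^{n \times k}

% Gaussian (RBF) kernel between two reduced vectors
K(z, z') = \exp\!\left(-\frac{\lVert z - z' \rVert^{2}}{2\sigma^{2}}\right)
```

The kernel defines the high-dimensional feature space implicitly: inner products in that space equal $K(z, z')$, so the mapped vectors never need to be computed explicitly.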
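S204: Repeatedly select one language from the one or more candidate languages as the selected language, form a positive training set from the training parameters corresponding to the selected language, form a negative training set from the training parameters corresponding to the languages other than the selected language, and calculate a classification hyperplane from the positive training set and the negative training set, until the classification hyperplanes of all candidate languages have been calculated.
Optionally, the positive training set and the negative training set corresponding to a candidate language may be input together into a support vector machine model to calculate the classification hyperplane of that candidate language.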
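The one-versus-rest construction of S204 can be sketched as follows. The sketch only builds the positive and negative training sets for each candidate language; fitting the hyperplane itself is left to a support vector machine solver that is not shown. The dictionary layout and the name `one_vs_rest_sets` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def one_vs_rest_sets(
    params: Dict[str, List[Vector]]
) -> Dict[str, Tuple[List[Vector], List[Vector]]]:
    """For each language, return (positive set, negative set).

    `params` maps a candidate language to its training parameters.
    The positive set holds the selected language's parameters; the
    negative set holds the parameters of every other language.
    """
    sets: Dict[str, Tuple[List[Vector], List[Vector]]] = {}
    for selected in params:
        positive = list(params[selected])
        negative = [vec
                    for language, vectors in params.items()
                    if language != selected
                    for vec in vectors]
        sets[selected] = (positive, negative)
    return sets
```

Each (positive, negative) pair would then be handed to the SVM model to obtain one hyperplane per language, so N candidate languages yield N hyperplanes.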
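It is worth noting that the calculation of the classification hyperplanes described above may be performed before the text object is detected; that is, the classification hyperplanes corresponding to the multiple languages may be calculated in advance and stored in the mobile terminal, and simply retrieved when the mobile terminal needs to recognize a text object. Understandably, since the classification hyperplanes of the multiple languages can be stored in advance, this embodiment of the present application need not calculate them by the above method; any other method that can calculate the classification hyperplanes of the multiple languages and store them in the mobile terminal allows the subsequent calculation to be carried out.
In this embodiment of the present application, the principal components of the text object can be extracted by the PCA algorithm to generate the principal component matrix of the text object. Further, the principal component matrix of the text object is mapped to a high-dimensional feature space through a Gaussian kernel function to generate the test parameters of the text object. The calculation is the same as that used for the training parameters of the candidate languages and is not repeated here.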
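The Gaussian kernel step can be illustrated as follows. The source gives neither the kernel formula nor its width, so the standard form K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) and the parameter `sigma` are assumptions. The feature space of a Gaussian kernel is implicit, so this sketch represents a vector by its kernel values against a set of reference vectors (an empirical kernel map); that choice is also an assumption and not necessarily what the application intends by "test parameters".

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float],
                    sigma: float = 1.0) -> float:
    """Standard Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def kernel_features(x: Sequence[float],
                    references: Sequence[Sequence[float]],
                    sigma: float = 1.0) -> List[float]:
    """Hypothetical feature vector: kernel value of x against each reference."""
    return [gaussian_kernel(x, ref, sigma) for ref in references]
```

Using the same references and the same `sigma` for both training and test vectors keeps the two sets of parameters comparable, which matches the statement that both are calculated in the same way.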
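S107: Calculate the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages, as the Euclidean distance corresponding to each language; determine the language with the smallest Euclidean distance as the object language; and translate the text object into the native language.
Understandably, the Euclidean distance from the test parameters to the classification hyperplane of each language can be calculated with the Euclidean distance formula. The smaller the Euclidean distance, the smaller the difference between the test parameters and the classification hyperplane, and thus the more similar the test parameters are to the language corresponding to that classification hyperplane. In this embodiment of the present application, the language with the smallest Euclidean distance is therefore determined as the object language.
Further, after the object language corresponding to the text object has been determined, the text object is translated into the native language.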
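The selection rule of S107 can be sketched as follows. It assumes each classification hyperplane is available in the explicit form w·x + b = 0, stored as a pair (w, b); the source does not state the representation, so this form and the function names are assumptions. The distance used is the usual point-to-hyperplane distance |w·x + b| / ||w||.

```python
import math
from typing import Dict, List, Sequence, Tuple

Hyperplane = Tuple[List[float], float]   # (w, b) describing w.x + b = 0


def distance_to_hyperplane(x: Sequence[float], plane: Hyperplane) -> float:
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    w, b = plane
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("hyperplane normal vector must be non-zero")
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def closest_language(x: Sequence[float], planes: Dict[str, Hyperplane]) -> str:
    """Return the language whose hyperplane is nearest to x (the S107 rule)."""
    return min(planes, key=lambda lang: distance_to_hyperplane(x, planes[lang]))
```

The language returned plays the role of the object language, after which the text object is translated into the native language.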
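S108: If the text object belongs to the native language, translate the text object into the local language.
S109: If the text object belongs to the local language, translate the text object into the native language.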
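Taken together, S107 to S109 fix the translation direction from the outcome of the language check. A minimal sketch, assuming the check yields one of the hypothetical labels "native", "local", or "neither":

```python
def translation_target(label: str, native: str, local: str) -> str:
    """Return the language the text object should be translated into.

    "native"  -> translate into the local language   (S108)
    "local"   -> translate into the native language  (S109)
    "neither" -> translate into the native language  (S107, after the
                 object language has been identified)
    """
    if label == "native":
        return local
    if label in ("local", "neither"):
        return native
    raise ValueError(f"unknown label: {label!r}")
```

For a user whose native language is Chinese travelling where the local language is French, `translation_target("native", "zh", "fr")` gives `"fr"`, while text in French or in any third language is rendered into `"zh"`.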
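Corresponding to the text translation method described in the above embodiments, FIG. 4 shows a structural block diagram of the text translation apparatus provided by an embodiment of the present application. For ease of description, only the parts related to the embodiment of the present application are shown.
Referring to FIG. 4, the apparatus includes:
an obtaining module 401, configured to obtain a native language input by a user, detect a user location, and determine a local language corresponding to the user location according to a preset correspondence between geographic locations and local languages;
an enabling module 402, configured to enable a translation function if the local language is not the native language;
a determining module 403, configured to detect a text object after the translation function is enabled and determine whether the text object belongs to the native language or the local language;
a generating module 404, configured to, if the text object belongs to neither the native language nor the local language, obtain classification hyperplanes of one or more languages, perform dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then map the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object;
a first translation module 405, configured to calculate the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages, as the Euclidean distance corresponding to each language, determine the language with the smallest Euclidean distance as the object language, and translate the text object into the native language.
Optionally, the apparatus further includes: a statistics module, configured to collect the local languages input by multiple users and detect the position coordinates of each user at the time of inputting the local language; and a correspondence module, configured to set a unit area and determine the language with the highest proportion among the local languages within one unit area on the map as the local language corresponding to all position coordinates within that unit area, so as to generate the correspondence between geographic locations and local languages.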
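The work of the statistics module and the correspondence module can be sketched as follows. The sketch assumes planar (x, y) coordinates and square unit areas of side `cell_size`; the source does not specify the coordinate system or the shape of a unit area, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

Cell = Tuple[int, int]


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Index of the unit area that contains the point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_location_language_map(reports: Iterable[Tuple[float, float, str]],
                                cell_size: float) -> Dict[Cell, str]:
    """Map each unit area to the local language reported most often in it.

    `reports` holds (x, y, language) triples: the local language a user
    entered and the position at which it was entered.
    """
    counts: Dict[Cell, Counter] = defaultdict(Counter)
    for x, y, language in reports:
        counts[cell_of(x, y, cell_size)][language] += 1
    return {cell: counter.most_common(1)[0][0] for cell, counter in counts.items()}


def local_language(x: float, y: float, mapping: Dict[Cell, str],
                   cell_size: float) -> Optional[str]:
    """Look up the local language for a position, or None if the area is unknown."""
    return mapping.get(cell_of(x, y, cell_size))
```

Every position inside a unit area inherits that area's majority language, which is what lets a single detected user location be resolved to a local language.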
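Optionally, the determining module 403 includes:
a detection sub-module, configured to detect the gaps in the text object and divide the text object into a plurality of text characters according to the gaps;
a selection sub-module, configured to select a preset number of the text characters and extract the principal components of the text characters;
a calculation sub-module, configured to retrieve the character principal component database of the native language and the character principal component database of the local language, calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language, as a first ratio, and calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the local language, as a second ratio;
a first decision sub-module, configured to determine that the text object belongs to the native language if the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold;
a second decision sub-module, configured to determine that the text object belongs to the local language if the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold;
a third decision sub-module, configured to determine that the text object belongs to neither the native language nor the local language if the first ratio is less than the preset ratio threshold and the second ratio is also less than the preset ratio threshold.
Optionally, the apparatus is further configured to: select one or more candidate languages and obtain one or more language databases of the candidate languages; perform dimensionality reduction on each language database by principal component analysis to generate a principal component matrix of each candidate language; map the principal component matrix of each candidate language to a high-dimensional feature space through a Gaussian kernel function to generate training parameters of each candidate language; and repeatedly select one language from the one or more candidate languages as the selected language, form a positive training set from the training parameters corresponding to the selected language, form a negative training set from the training parameters corresponding to the languages other than the selected language, and calculate a classification hyperplane from the positive training set and the negative training set, until the classification hyperplanes of all candidate languages have been calculated.
Optionally, the apparatus further includes: a second translation module, configured to translate the text object into the local language if the text object belongs to the native language; and a third translation module, configured to translate the text object into the native language if the text object belongs to the local language.
FIG. 5 is a schematic diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 5, the electronic device 5 of this embodiment includes a processor 50, a memory 51, and computer-readable instructions 52 stored in the memory 51 and executable on the processor 50, for example a text translation program. When executing the computer-readable instructions 52, the processor 50 implements the steps of the text translation method embodiments described above, for example steps S101 to S109 shown in FIG. 1. Alternatively, when executing the computer-readable instructions 52, the processor 50 implements the functions of the modules/units in the apparatus embodiments described above, for example the functions of modules 401 to 405 shown in FIG. 4. Illustratively, the computer-readable instructions 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to carry out the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer-readable instructions 52 in the electronic device 5.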

Claims (20)

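  1. A text translation method, characterized by comprising:
    obtaining a native language input by a user, detecting a user location, and determining a local language corresponding to the user location according to a preset correspondence between geographic locations and local languages;
    if the local language is not the native language, enabling a translation function;
    after the translation function is enabled, detecting a text object and determining whether the text object belongs to the native language or the local language;
    if the text object belongs to neither the native language nor the local language, obtaining classification hyperplanes of one or more languages; performing dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object; and then mapping the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object;
    calculating the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages, as the Euclidean distance corresponding to each language; determining the language with the smallest Euclidean distance as an object language; and translating the text object into the native language.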
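  2. The text translation method according to claim 1, characterized in that, before the obtaining a native language input by a user, detecting a user location, and taking the language corresponding to the user location as the local language according to a preset correspondence between geographic locations and languages, the method further comprises:
    collecting the local languages input by a plurality of users, and detecting the position coordinates of each user at the time of inputting the local language;
    setting a unit area, and determining the language with the highest proportion among the local languages within one unit area on a map as the local language corresponding to all position coordinates within that unit area, so as to generate the correspondence between geographic locations and local languages.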
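  3. The text translation method according to claim 1, characterized in that the detecting a text object and determining whether the text object belongs to the native language or the local language comprises:
    detecting the gaps in the text object, and dividing the text object into a plurality of text characters according to the gaps;
    selecting a preset number of the text characters, and extracting the principal components of the text characters;
    retrieving the character principal component database of the native language and the character principal component database of the local language; calculating the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language, as a first ratio; and calculating the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the local language, as a second ratio;
    if the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold, the text object belongs to the native language;
    if the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold, the text object belongs to the local language;
    if the first ratio is less than the preset ratio threshold and the second ratio is also less than the preset ratio threshold, the text object belongs to neither the native language nor the local language.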
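  4. The text translation method according to claim 1, characterized in that, before the step of, if the text object belongs to neither the native language nor the local language, obtaining classification hyperplanes of one or more languages, performing dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then mapping the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object, the method further comprises:
    selecting one or more candidate languages, and obtaining one or more language databases of the candidate languages;
    performing dimensionality reduction on each of the language databases by principal component analysis to generate a principal component matrix of each of the candidate languages;
    mapping the principal component matrix of each of the candidate languages to a high-dimensional feature space through a Gaussian kernel function to generate training parameters of each of the candidate languages;
    repeatedly selecting one language from the one or more candidate languages as a selected language, forming a positive training set from the training parameters corresponding to the selected language, forming a negative training set from the training parameters corresponding to the languages other than the selected language, and calculating a classification hyperplane from the positive training set and the negative training set, until the classification hyperplanes of all candidate languages have been calculated.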
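  5. The text translation method according to claim 1, characterized by further comprising:
    if the text object belongs to the native language, translating the text object into the local language;
    if the text object belongs to the local language, translating the text object into the native language.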
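  6. A text translation apparatus, characterized by comprising:
    an obtaining module, configured to obtain a native language input by a user, detect a user location, and determine a local language corresponding to the user location according to a preset correspondence between geographic locations and local languages;
    an enabling module, configured to enable a translation function if the local language is not the native language;
    a determining module, configured to detect a text object after the translation function is enabled and determine whether the text object belongs to the native language or the local language;
    a generating module, configured to, if the text object belongs to neither the native language nor the local language, obtain classification hyperplanes of one or more languages, perform dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then map the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object;
    a first translation module, configured to calculate the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages, as the Euclidean distance corresponding to each language, determine the language with the smallest Euclidean distance as an object language, and translate the text object into the native language.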
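  7. The text translation apparatus according to claim 6, characterized by further comprising:
    a statistics module, configured to collect the local languages input by a plurality of users and detect the position coordinates of each user at the time of inputting the local language;
    a correspondence module, configured to set a unit area and determine the language with the highest proportion among the local languages within one unit area on a map as the local language corresponding to all position coordinates within that unit area, so as to generate the correspondence between geographic locations and local languages.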
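  8. The text translation apparatus according to claim 6, characterized in that the determining module comprises:
    a detection sub-module, configured to detect the gaps in the text object and divide the text object into a plurality of text characters according to the gaps;
    a selection sub-module, configured to select a preset number of the text characters and extract the principal components of the text characters;
    a calculation sub-module, configured to retrieve the character principal component database of the native language and the character principal component database of the local language, calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language, as a first ratio, and calculate the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the local language, as a second ratio;
    a first decision sub-module, configured to determine that the text object belongs to the native language if the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold;
    a second decision sub-module, configured to determine that the text object belongs to the local language if the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold;
    a third decision sub-module, configured to determine that the text object belongs to neither the native language nor the local language if the first ratio is less than the preset ratio threshold and the second ratio is also less than the preset ratio threshold.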
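  9. The text translation apparatus according to claim 6, characterized by being further configured to:
    select one or more candidate languages, and obtain one or more language databases of the candidate languages;
    perform dimensionality reduction on each of the language databases by principal component analysis to generate a principal component matrix of each of the candidate languages;
    map the principal component matrix of each of the candidate languages to a high-dimensional feature space through a Gaussian kernel function to generate training parameters of each of the candidate languages;
    repeatedly select one language from the one or more candidate languages as a selected language, form a positive training set from the training parameters corresponding to the selected language, form a negative training set from the training parameters corresponding to the languages other than the selected language, and calculate a classification hyperplane from the positive training set and the negative training set, until the classification hyperplanes of all candidate languages have been calculated.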
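  10. The text translation apparatus according to claim 6, characterized by further comprising:
    a second translation module, configured to translate the text object into the local language if the text object belongs to the native language;
    a third translation module, configured to translate the text object into the native language if the text object belongs to the local language.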
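  11. A text translation server, characterized in that the text translation processing server comprises a memory and a processor, the memory stores computer-readable instructions executable on the processor, and the processor implements the following steps when executing the computer-readable instructions:
    obtaining a native language input by a user, detecting a user location, and determining a local language corresponding to the user location according to a preset correspondence between geographic locations and local languages;
    if the local language is not the native language, enabling a translation function;
    after the translation function is enabled, detecting a text object and determining whether the text object belongs to the native language or the local language;
    if the text object belongs to neither the native language nor the local language, obtaining classification hyperplanes of one or more languages; performing dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object; and then mapping the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object;
    calculating the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages, as the Euclidean distance corresponding to each language; determining the language with the smallest Euclidean distance as an object language; and translating the text object into the native language.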
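  12. The text translation server according to claim 11, characterized in that, before the obtaining a native language input by a user, detecting a user location, and taking the language corresponding to the user location as the local language according to a preset correspondence between geographic locations and languages, the steps further comprise:
    collecting the local languages input by a plurality of users, and detecting the position coordinates of each user at the time of inputting the local language;
    setting a unit area, and determining the language with the highest proportion among the local languages within one unit area on a map as the local language corresponding to all position coordinates within that unit area, so as to generate the correspondence between geographic locations and local languages.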
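  13. The text translation server according to claim 12, characterized in that the detecting a text object and determining whether the text object belongs to the native language or the local language comprises:
    detecting the gaps in the text object, and dividing the text object into a plurality of text characters according to the gaps;
    selecting a preset number of the text characters, and extracting the principal components of the text characters;
    retrieving the character principal component database of the native language and the character principal component database of the local language; calculating the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language, as a first ratio; and calculating the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the local language, as a second ratio;
    if the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold, the text object belongs to the native language;
    if the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold, the text object belongs to the local language;
    if the first ratio is less than the preset ratio threshold and the second ratio is also less than the preset ratio threshold, the text object belongs to neither the native language nor the local language.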
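  14. The text translation server according to claim 11, characterized in that, before the step of, if the text object belongs to neither the native language nor the local language, obtaining classification hyperplanes of one or more languages, performing dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then mapping the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object, the steps further comprise:
    selecting one or more candidate languages, and obtaining one or more language databases of the candidate languages;
    performing dimensionality reduction on each of the language databases by principal component analysis to generate a principal component matrix of each of the candidate languages;
    mapping the principal component matrix of each of the candidate languages to a high-dimensional feature space through a Gaussian kernel function to generate training parameters of each of the candidate languages;
    repeatedly selecting one language from the one or more candidate languages as a selected language, forming a positive training set from the training parameters corresponding to the selected language, forming a negative training set from the training parameters corresponding to the languages other than the selected language, and calculating a classification hyperplane from the positive training set and the negative training set, until the classification hyperplanes of all candidate languages have been calculated.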
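  15. The text translation server according to claim 11, characterized in that the steps further comprise:
    if the text object belongs to the native language, translating the text object into the local language;
    if the text object belongs to the local language, translating the text object into the native language.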
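  16. A computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by at least one processor, implement the following steps:
    obtaining a native language input by a user, detecting a user location, and determining a local language corresponding to the user location according to a preset correspondence between geographic locations and local languages;
    if the local language is not the native language, enabling a translation function;
    after the translation function is enabled, detecting a text object and determining whether the text object belongs to the native language or the local language;
    if the text object belongs to neither the native language nor the local language, obtaining classification hyperplanes of one or more languages; performing dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object; and then mapping the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object;
    calculating the Euclidean distance between the test parameters of the text object and the classification hyperplane of each of the languages, as the Euclidean distance corresponding to each language; determining the language with the smallest Euclidean distance as an object language; and translating the text object into the native language.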
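  17. The computer-readable storage medium according to claim 16, characterized in that, before the obtaining a native language input by a user, detecting a user location, and taking the language corresponding to the user location as the local language according to a preset correspondence between geographic locations and languages, the steps further comprise:
    collecting the local languages input by a plurality of users, and detecting the position coordinates of each user at the time of inputting the local language;
    setting a unit area, and determining the language with the highest proportion among the local languages within one unit area on a map as the local language corresponding to all position coordinates within that unit area, so as to generate the correspondence between geographic locations and local languages.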
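  18. The computer-readable storage medium according to claim 16, characterized in that the detecting a text object and determining whether the text object belongs to the native language or the local language comprises:
    detecting the gaps in the text object, and dividing the text object into a plurality of text characters according to the gaps;
    selecting a preset number of the text characters, and extracting the principal components of the text characters;
    retrieving the character principal component database of the native language and the character principal component database of the local language; calculating the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the native language, as a first ratio; and calculating the proportion of the principal components of the preset number of text characters that exist in the character principal component database of the local language, as a second ratio;
    if the first ratio is greater than or equal to a preset ratio threshold and the second ratio is less than the preset ratio threshold, the text object belongs to the native language;
    if the second ratio is greater than or equal to the preset ratio threshold and the first ratio is less than the preset ratio threshold, the text object belongs to the local language;
    if the first ratio is less than the preset ratio threshold and the second ratio is also less than the preset ratio threshold, the text object belongs to neither the native language nor the local language.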
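  19. The computer-readable storage medium according to claim 16, characterized in that, before the step of, if the text object belongs to neither the native language nor the local language, obtaining classification hyperplanes of one or more languages, performing dimensionality reduction on the text object by principal component analysis to generate a principal component matrix of the text object, and then mapping the principal component matrix of the text object to a high-dimensional feature space through a Gaussian kernel function to generate test parameters of the text object, the steps further comprise:
    selecting one or more candidate languages, and obtaining one or more language databases of the candidate languages;
    performing dimensionality reduction on each of the language databases by principal component analysis to generate a principal component matrix of each of the candidate languages;
    mapping the principal component matrix of each of the candidate languages to a high-dimensional feature space through a Gaussian kernel function to generate training parameters of each of the candidate languages;
    repeatedly selecting one language from the one or more candidate languages as a selected language, forming a positive training set from the training parameters corresponding to the selected language, forming a negative training set from the training parameters corresponding to the languages other than the selected language, and calculating a classification hyperplane from the positive training set and the negative training set, until the classification hyperplanes of all candidate languages have been calculated.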
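  20. The computer-readable storage medium according to claim 16, characterized in that the steps further comprise:
    if the text object belongs to the native language, translating the text object into the local language;
    if the text object belongs to the local language, translating the text object into the native language.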
PCT/CN2018/082606 2018-02-07 2018-04-11 Text translation method, apparatus, server, and medium WO2019153480A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810121444.9 2018-02-07
CN201810121444.9A CN108427672B (zh) 2018-02-07 2018-02-07 Text translation method, terminal device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2019153480A1 (zh) 2019-08-15

Family

ID=63156752

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/082606 WO2019153480A1 (zh) 2018-02-07 2018-04-11 Text translation method, apparatus, server, and medium

Country Status (2)

Country Link
CN (1) CN108427672B (zh)
WO (1) WO2019153480A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427672B (zh) * 2018-02-07 2019-05-07 平安科技(深圳)有限公司 Text translation method, terminal device, and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1494695A (zh) * 2001-03-06 2004-05-05 Seamless translation system
CN101702314A (zh) * 2009-10-13 2010-05-05 清华大学 Method for establishing a discriminative language identification model based on language pairs
US8843359B2 (en) * 2009-02-27 2014-09-23 Andrew Nelthropp Lauder Language translation employing a combination of machine and human translations
CN104205093A (zh) * 2012-02-03 2014-12-10 谷歌公司 Translated news
CN105632485A (zh) * 2015-12-28 2016-06-01 浙江大学 Method for obtaining language distance relationships based on a language identification system
CN206639220U (zh) * 2017-01-05 2017-11-14 陈伯妤 Portable simultaneous interpretation device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5100445B2 (ja) * 2008-02-28 2012-12-19 株式会社東芝 Machine translation apparatus and method
CN102650987A (zh) * 2011-02-25 2012-08-29 北京百度网讯科技有限公司 Machine translation method and apparatus based on source-language paraphrase resources
CN104239516A (zh) * 2014-09-17 2014-12-24 南京大学 Imbalanced data classification method
CN105320644B (zh) * 2015-09-23 2018-01-02 陕西中医药大学 Rule-based automatic Chinese syntactic analysis method
US20170308526A1 (en) * 2016-04-21 2017-10-26 National Institute Of Information And Communications Technology Compcuter Implemented machine translation apparatus and machine translation method
CN107357568A (zh) * 2017-06-12 2017-11-17 北京天健通泰科技有限公司 Original-language replacement method based on multilingual tags
CN108427672B (zh) * 2018-02-07 2019-05-07 平安科技(深圳)有限公司 Text translation method, terminal device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN108427672B (zh) 2019-05-07
CN108427672A (zh) 2018-08-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18905747

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.11.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18905747

Country of ref document: EP

Kind code of ref document: A1