WO2006133571A1 - Means and method for adapted language translation - Google Patents

Info

Publication number
WO2006133571A1
Authority
WO
WIPO (PCT)
Prior art keywords
source
translation
language
bilingual
sublanguage
Prior art date
Application number
PCT/CA2006/001004
Other languages
English (en)
Inventor
George Foster
Roland Kuhn
Original Assignee
National Research Council Of Canada
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Research Council Of Canada
Priority to CA2612404A (patent CA2612404C)
Priority to EP06761071.7A (patent EP1894125A4)
Priority to US11/922,311 (patent US8612203B2)
Publication of WO2006133571A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/45 Example-based machine translation; Alignment

Definitions

  • Each human language is made up of "sublanguages" or "discourse domains" within each of which words and phrases tend to have a single, unambiguous meaning.
  • a defender of current statistical machine translation systems might say that these systems can already be adapted to a new domain, by virtue of the fact that they tend to work well for the type of document they have been trained on. For instance, one may have trained such a system on a group of news stories about events of general interest. Then, a client requests a customized version of the system that will produce good translations for specialized articles about finance. One then "adapts" the system to this domain by retraining it on a bilingual, parallel corpus of specialized articles about finance. For instance, a system trained on financial articles will know only the financial meaning of the English word "bank”, and thus accurately translate this word when new sentences from financial articles are given to it.
  • CMU Carnegie Mellon University
  • Another object of the invention is to provide a means and a method for adaptive machine translation.
  • Another embodiment of the invention comprises a computer readable memory for translating text from a source language into a target language comprising a translation unit, a source text unit, said source text unit comprising said translation unit, and a bilingual sublanguage space.
  • Figure 1A illustrates a conventional translation system
  • Figure 1B illustrates an embodiment of the proposed system
  • Figure 2 illustrates different means of defining the source text unit
  • Figure 3 illustrates an alternative means of defining the source text unit
  • Figure 4 illustrates an example of decoding in a continuous space
  • Figure 5 illustrates an example of adaptive decoding based on weights estimated on a dynamically constructed development corpus
  • Figure 6 illustrates one embodiment of adaptive rescoring based on a bilingual sublanguage space
  • mapping S → T there is a variant embodiment of the invention in which the space is discrete, rather than continuous, in the sense that only certain discrete points in the space are associated with mappings S → T.
  • source text unit might be mapped (prior to translation) onto the nearest point in the space where a mapping S → T is defined. For instance, an article describing the finances of major-league baseball might have to be mapped either onto the region labeled "sport", or that labeled "finance". For most purposes this discrete variant of the invention is less desirable than the continuous one.
  • bilingual sublanguage space may correlate with topic, others may reflect aspects of genre or style. For instance, most languages maintain a distinction between formality and informality that is independent of the topic. Thus, two documents in the same language about the same topic may occupy different positions in sublanguage space, if one is written formally and the other informally.
  • bilingual sublanguage space is defined automatically by statistical means, and its dimensions may thus not always be entirely interpretable in terms of topic, style, or genre (though they will often be correlated with them).
  • Figure 1 shows the translation of the i-th translation unit according to the invention. (In the figure, the translation unit happens to be a sentence.)
  • the system locates a source text unit that includes translation unit i in source sublanguage space, at the position marked by an "X".
  • it constructs a specialized translation mapping S → T at this position.
  • it uses this mapping to carry out the translation of translation unit i into the target language.
  • the distinction between the source text unit and the translation unit it contains is important: the former is used to find the appropriate mapping S → T, while the latter is the word sequence to which this mapping is applied.
  • the source text unit could be defined to be the same as the translation unit. In this case, estimation of the position of translation unit i in source sublanguage space would be carried out only using the translation unit itself.
  • the correct granularity of text to use as the source text unit depends on the application. For instance, one could determine a location in sublanguage space for each source paragraph, or for the entire source text file. If the source text comes with pre-defined typographically indicated boundaries (e.g., a segmentation into "stories", "sections", or "articles"), it may be convenient to use the units defined by these boundaries as the source text units. Alternatively, one might employ an automatic text segment boundary detector, as is well-known in the field of natural language processing (see for instance US patents #6,104,989 and #6,529,902, which show how to segment text into segments so that the topic does not change within a segment). If this approach is adopted, the source text unit will be defined as a window of varying size that includes the sentence or line currently being translated, whose boundaries are adjusted so that the text included in the window satisfies some criterion of homogeneity.
  • the source text unit may weight the information in the source text unit in order to assign less importance to words that are remote from the translation unit. For example, in the current embodiment, computation of the vector representing the source text unit could be modified by discounting the counts of words that are remote from the translation unit.
  • the appearance of a vocabulary word w anywhere within the source text unit causes the count for w to be incremented by 1. Instead, one could define the increment for an appearance of w as being 1/d, where d is the number of words separating this appearance of w from the sentence or line currently being translated (i.e., the translation unit). Note that in this variant of the current embodiment, only the words contained in the current translation unit would receive the full weight of 1. Many other schemes of this kind, in which text in the source text unit is weighted in some way, are possible.
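The distance-discounted counting scheme can be sketched as follows. This is only an illustration under stated assumptions: the function name `discounted_counts` and the token-index interface are hypothetical, and d is taken here to be one plus the number of positions between a word and the nearest edge of the translation unit, so that only words inside the unit receive the full weight of 1. Other definitions of d are equally compatible with the text.

```python
from collections import Counter

def discounted_counts(tokens, unit_start, unit_end):
    """Weighted word counts for a source text unit (illustrative sketch).

    tokens: all words of the source text unit, in order.
    unit_start, unit_end: half-open token range [unit_start, unit_end)
        covering the translation unit inside the source text unit.
    Assumption: d = 1 + distance to the nearest edge of the translation
    unit, so a word adjacent to the unit gets weight 1/2.
    """
    counts = Counter()
    for pos, word in enumerate(tokens):
        if unit_start <= pos < unit_end:
            weight = 1.0  # words inside the translation unit count fully
        elif pos < unit_start:
            weight = 1.0 / (1 + unit_start - pos)
        else:
            weight = 1.0 / (pos - unit_end + 2)
        counts[word] += weight
    return counts
```

The resulting weighted counts could then be used in place of raw counts when the vector representing the source text unit is computed.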
  • Figure 3 indicates another way in which the source text unit might be defined in the case where the translation unit is located in a web page on the Internet.
  • the source text unit that includes source-language text from pages that have a "web distance" (number of links) of 1 from the parts of the original page near the translation unit - i.e., from pages that link directly to or from nearby text. It would be easy to generalize this kind of definition of the source text unit by allowing different values for the web distance (perhaps treating "from” and "to” links differently).
  • One aspect of the invention is the module that finds the location of a source text unit in the source sublanguage space.
  • this module To implement this module, one requires a similarity measure between texts in a given language, with source text units that are similar being assigned to nearby points in the space, and dissimilar source text units being assigned to points that are far apart from each other in the space.
  • these measures underlie, for instance, the ability of search engines to find documents or web pages that are "close” to a set of words typed into the search engine by the user, or to find web pages that are "close” to another web page.
  • the module for finding the location of a source text unit in the source sublanguage space can be constructed according to a wide variety of well-understood techniques used in creating today's information retrieval systems.
  • decoding (translation) of a source sentence S is carried out by finding a word sequence T in the target language that maximizes the function P(S|T) * P(T).
  • the probability P(S|T) is estimated by a part of the system called the "translation model", encoded in a data structure which is called the "phrase table".
  • the probability P(T) is estimated by a part of the system called the "language model".
  • the translation model is estimated from training data consisting of bilingual aligned text in the source and target language; the language model is estimated from text in the target language only.
  • the initial step is phrase alignment between the source-language and target-language portions of each bilingual document pair, as described in the references above.
  • a phrase just means a contiguous sequence of words.
  • alignment is two-pass. In the first pass, word alignment is carried out by means of the IBM models as described above. In the second pass, phrase alignment is carried out on the basis of the word alignments.
  • This kind of two-pass phrase alignment is well-known in the statistical machine translation community; it is described, for instance, in "Improved alignment models for statistical phrase translation" by F. Och et al. in Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999 (pp. 2-28).
  • a table comprising the joint counts C(S, T) for the document pair is constructed; each entry in the table gives the number of times source-language phrase S was aligned with target-language phrase T in a given document pair.
  • a phrase table of conditional probabilities P(S|T) for use in decoding can easily be constructed, for instance using relative-frequency estimates: P(S|T) = C(S, T) / C(T)
  • P(T|S) the reversed condition
  • P(S|T) the conditional probability phrase table
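A relative-frequency phrase table can be derived from joint counts in a few lines. The sketch below assumes the joint counts are held in a dictionary keyed by (source phrase, target phrase); the function name `phrase_table` and this data layout are illustrative choices, not part of the described system.

```python
from collections import defaultdict

def phrase_table(joint_counts):
    """Relative-frequency estimate P(S|T) = C(S, T) / C(T).

    joint_counts: dict mapping (source_phrase, target_phrase) -> C(S, T).
    Returns a dict mapping the same keys to P(S|T).
    """
    # C(T) is the total count of alignments involving target phrase T.
    target_totals = defaultdict(float)
    for (_, tgt), count in joint_counts.items():
        target_totals[tgt] += count
    return {
        (src, tgt): count / target_totals[tgt]
        for (src, tgt), count in joint_counts.items()
        if count > 0
    }
```

Swapping the roles of source and target in the normalization gives the reversed table P(T|S) in the same way.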
  • the language model P(T), which is also required for decoding, is trained separately; a single, global language model is applied irrespective of the nature of the source text unit.
  • the translation mapping applied to the translation unit S is a search over possible T such that P(S|T) * P(T) is maximized.
  • P(T) could also be adapted.
  • P(S|T) could also be adapted.
  • only P(S|T), only P(T), or both. Adapting both language and translation models has the advantage that elements from the current sublanguage are unlikely to be suppressed just because they may be rare in general usage.
  • the target-language portion of the document pair does not enter the similarity calculation; only the similarity between the source text unit and the source-language portion of the document pair needs to be calculated.
  • Two reference books in this area are "Automatic Text Processing" by G. Salton (Addison-Wesley Publishing Company, 1989) and "Modern Information Retrieval" by R. Baeza-Yates and B. Ribeiro-Neto (Addison-Wesley Publishing Company, 1999). Any of the text similarity measures described in these references could be employed in our invention in the place of the measure included in the current embodiment. Note that some measures assign a high number to two dissimilar texts, and a low number to texts that are similar; such measures are typically called "distance measures" rather than "similarity measures". It is mathematically trivial to transform a distance measure into a similarity measure. Thus, the invention could also use any of the distance measures described in the above references.
  • both the source text unit and the source-language portion of the document are represented by a vector of real numbers of dimension V obtained from vectors of word counts within each text. This is done by application of the so-called "term frequency inverse document frequency" transformation to the counts (see Salton pg. 304) in order to increase the weight accorded to the most informative words.
  • V-dimensional vector as an S-vector, because it contains only information related to the source language.
  • the dimension V is the number of source-language words in the system's vocabulary.
  • V may be less than the number of words in the system's vocabulary. In an implementation of the invention, no words are dropped, and V is thus equal to the size of the source-language vocabulary.
  • the similarity between the two S-vectors thus obtained from the source text unit and the source-language portion of the document pair respectively is defined to be the cosine measure (see Salton, Table 10.1, pg. 318).
  • the cosine measure is a similarity measure as defined above: it yields a number close to 1 when v1 and v2 are obtained from highly similar texts, and yields a value close to 0 when v1 and v2 are obtained from highly dissimilar texts.
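The construction of S-vectors and their comparison can be sketched as below. This is a minimal illustration, assuming one common tf-idf variant (raw term count times log(N / document frequency)); Salton describes several variants and the text does not say which one is used. Vectors are stored sparsely as dictionaries, and the names `idf_weights`, `s_vector`, and `cosine` are hypothetical.

```python
import math
from collections import Counter

def idf_weights(documents):
    """documents: list of token lists. Returns word -> log(N / df(word))."""
    n = len(documents)
    df = Counter(word for doc in documents for word in set(doc))
    return {word: math.log(n / df[word]) for word in df}

def s_vector(tokens, idf):
    """Sparse tf-idf vector for a text; words outside the vocabulary are dropped."""
    counts = Counter(tokens)
    return {w: c * idf[w] for w, c in counts.items() if w in idf}

def cosine(u, v):
    """Cosine measure between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A source text unit would be converted with `s_vector` and compared by `cosine` against the S-vector of each document pair's source-language portion.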
  • This optional step is a clustering of bilingual document pairs into K clusters, where K is less than D; we currently perform it using the well-known K-means clustering algorithm. The reason for performing this step is to reduce storage requirements, to reduce computational complexity during decoding, and to potentially enhance performance in cases where the original document pairs are small.
  • K-means clustering for instance K-medoids, hierarchical clustering, mixture modeling, and self-organizing maps.
  • J-dimensional vector For each of the document pairs, create a J-dimensional vector; this is then filled in with real numbers derived by means of the "term frequency inverse document frequency" transformation from the joint phrase counts for that document pair.
  • this J-dimensional vector obtained from joint phrase counts for a document pair as the document pair's "J-vector".
  • entry i in each J-vector represent the aligned phrase pair (s,
  • the similarity measure currently used for these vectors is, again, the cosine measure (although it is now applied to two J-vectors of dimension J, rather than two S-vectors of dimension V). Given two document pairs A and B, this measure applied to the J-vectors obtained from A and B will yield a value close to 1 if the same phrases are translated in similar ways in A and B, but will yield a value close to 0 if A and B have dissimilar translation usages.
  • An important difference between the S-vectors defined earlier and these J-vectors is that the former contain only information related to source-language text characteristics, while the latter are directly related to bilingual translation characteristics.
  • a disadvantage of the current procedure is that the J-vectors are very sparse, i.e., contain mostly zeros, since for a particular document pair, most of the J possible aligned phrase pairs do not appear.
  • K-means clustering of document pairs a value of K less than D must be chosen.
  • Each of the K clusters will be characterized by a centroid J-vector; this can be called a J-centroid.
  • the K J-centroids are initialized to be K different J-vectors chosen at random from the D J-vectors associated with the D document pairs.
  • each of the D documents is assigned to the cluster whose J-centroid is closest to the document's J-vector.
  • each of the K J-centroids is recomputed to take account of all the aligned phrases in the documents assigned to its cluster.
  • documents are reassigned to the cluster with the closest J-centroid. The process is iterated until cluster membership stops changing.
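The clustering loop described in the preceding items can be sketched as follows. The sketch makes several assumptions that are not stated in the text: J-vectors are dense lists of equal length, "closest" means highest cosine similarity, and a J-centroid is recomputed as the component-wise mean of the J-vectors in its cluster. The function names and the `seed` and `max_iter` parameters are illustrative.

```python
import math
import random

def cosine(u, v):
    """Cosine measure between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans(j_vectors, k, seed=0, max_iter=100):
    """Cluster J-vectors into k clusters; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Initialize the centroids to k different J-vectors chosen at random.
    centroids = [list(v) for v in rng.sample(j_vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign each document pair to the cluster with the closest centroid.
        new_assignment = [
            max(range(k), key=lambda j: cosine(v, centroids[j]))
            for v in j_vectors
        ]
        if new_assignment == assignment:
            break  # cluster membership stopped changing
        assignment = new_assignment
        # Recompute each centroid from the members of its cluster.
        for j in range(k):
            members = [v for v, a in zip(j_vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

Because the J-vectors are very sparse in practice, a real implementation would probably use a sparse representation; dense lists are used here only to keep the example short.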
  • the J-vectors and J-centroids are used only during clustering, which is part of the process of training the bilingual sublanguage space.
  • these J-vectors are irrelevant, but the S-centroids are important; source text units are located in the space by measuring their similarity to the S-centroids.
  • the target language model P(T) is a global one that is employed in all cases, independently of the characteristics of the source text unit.
  • the language model component P(T) of the system is adapted to a source text unit including the translation unit currently being translated. This could be done, for instance, by associating a language model in the target language with each of the clusters constructed as described above. Note that since each of the clusters consists of a set of document pairs where half of each pair is a document written in the target language, it would be straightforward to train a language model for each cluster on all the target-language documents in that cluster.
  • the language model could be adapted in a way that directly parallels TM adaptation: weight counts derived from each cluster, then derive an adapted model from the weighted counts.
  • target language models for machine translation can be trained on more data than the translation model, since there are often more unilingual target-language documents available than there are bilingual source-target translated document pairs available.
  • the initial language model for each cluster has been obtained as described in the preceding paragraph, it can be improved by assigning additional target-language documents to the cluster, and retraining the language model on the larger set of documents now associated with the cluster.
  • Assignment of new target-language documents to a cluster could be done, for instance, by employing techniques from the information retrieval literature, whereby text from target-language documents originally in the cluster would be used in queries to find similar target-language documents from a large collection of such documents.
  • a language model for a particular language yields a probability for any document in that language.
  • each target-language document could be assigned to the cluster whose language model assigns it the highest probability.
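Assigning a new target-language document to the cluster whose language model scores it highest can be illustrated with a deliberately simple model. The sketch assumes a unigram language model with add-one smoothing, which is a stand-in chosen for brevity; the text does not specify the kind of language model, and N-gram models would normally be used. All function names are hypothetical.

```python
import math
from collections import Counter

def train_unigram(documents):
    """documents: list of token lists. Returns (word counts, total tokens)."""
    counts = Counter(word for doc in documents for word in doc)
    return counts, sum(counts.values())

def log_prob(doc, model, vocab_size):
    """Log-probability of a document under an add-one-smoothed unigram model."""
    counts, total = model
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in doc)

def assign_to_cluster(doc, models, vocab_size):
    """Index of the cluster whose language model gives doc the highest probability."""
    return max(range(len(models)), key=lambda j: log_prob(doc, models[j], vocab_size))
```

After assignment, the cluster's language model would be retrained on the enlarged set of target-language documents.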
  • the bilingual sublanguage space will consist either of clusters consisting of the original D document pairs or of K clusters derived from these document pairs via K-means clustering. Associated with each cluster is an S-centroid vector of dimension V characterizing its position in the source-language subspace.
  • a source text unit including the translation unit, as already described.
  • a V-dimensional S-vector is obtained from the source text unit by calculating word counts and transforming them (currently by means of the "term frequency inverse document frequency” transformation as previously described). The distances between this S-vector for the source text unit and the bilingual sublanguage centroids are then calculated; for instance, by means of the cosine measure already described.
  • a source text unit would be assigned to the cluster whose S-centroid is closest. Decoding for the new source sentence would then proceed according to the translation mapping associated with that cluster.
  • a specialized translation model can be constructed for the current translation unit by combining statistics from the clusters in a way that assigns more importance to clusters that are near the source text unit than to clusters that are far from it.
  • IJ_i(S, T) represent the i-th entry in the interpolated joint phrase count table for source phrases S and target phrases T.
  • Let u represent the S-vector for the current source text unit
  • let c1 represent the S-centroid for cluster 1
  • let c2 represent the S-centroid for cluster 2
  • C_j(S, T) represent the joint phrase count table for cluster j
  • C_j,i(S, T) represent the i-th entry in this table.
  • IJ_i(S, T) = cosine(u, c1) * C_1,i(S, T) + cosine(u, c2) * C_2,i(S, T) + ... + cosine(u, cK) * C_K,i(S, T)
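The similarity-weighted combination of cluster count tables can be sketched as below. The sketch assumes each cluster's joint phrase count table is a dictionary keyed by (source phrase, target phrase) and that the cosine similarities between the source text unit and the cluster S-centroids have already been computed; the function name `adapt_counts` is illustrative.

```python
def adapt_counts(similarities, cluster_counts):
    """Interpolated joint phrase counts for the current source text unit.

    similarities: list where similarities[j] is the cosine measure between
        the source text unit's S-vector and the S-centroid of cluster j.
    cluster_counts: list of dicts; cluster_counts[j] maps
        (source_phrase, target_phrase) -> joint count in cluster j.
    Returns a dict of weighted sums of the per-cluster counts.
    """
    adapted = {}
    for weight, table in zip(similarities, cluster_counts):
        for pair, count in table.items():
            adapted[pair] = adapted.get(pair, 0.0) + weight * count
    return adapted
```

An adapted conditional phrase table could then be obtained from these interpolated counts by relative-frequency estimation, in the same way as for an unadapted table.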
  • phrase pairs within a joint phrase count table will be treated the same, but this need not be the case. Different phrase pairs will tend to exhibit different degrees of variation across sublanguages, and those that are more variable should have their counts changed more in the final adapted table than those that are relatively static. This could be accomplished by classifying phrase pairs according to how much their counts varied among joint count phrase tables for all clusters. Pairs with low variation would be multiplied by a "flattened" version of the cosine weight (one that is closer to a constant across all clusters, possibly implemented using the sigmoid function described below), while those with high variation would be multiplied by the unmodified cosine weight. This would have the effect of reducing the effect of accidental statistical variation on parts of the core language that should remain stable within the adapted model.
  • Figure 4 shows how decoding is carried out in a variant of the current embodiment, where the distance measure d() is a true distance and not a similarity (i.e., a large value of d(t1,t2) indicates that its arguments t1 and t2 are dissimilar).
  • Each cluster has an S-centroid (shown as a black circle in the middle of each cluster) and an associated table of aligned joint phrase counts C_i(S, T). Two source text units, A and B, are shown.
  • a V-dimensional S-vector derived from counts of the words in the unit is obtained, as described above; these are the steps "Find S-vector representing A" and "find S-vector representing B" in the figure.
  • the distances between the two S-vectors representing the source text units, on the one hand, and the cluster S-centroids, on the other hand, are then calculated. For instance, "d_A3" in the figure represents the value of the distance between the S-vector representing source text unit A and the S-centroid associated with cluster 3.
  • These distances are used to weight the contribution of the joint phrase count tables from the clusters to the adapted joint phrase count tables, as shown in the figure. For instance, it is clear that the information from cluster 1 will have a greater impact on the adapted count table for source text unit A than on the adapted count table for source text unit B, since d_A1 is smaller than d_B1.
  • P(T) is estimated by a language model that is not itself adapted (in the current embodiment of the invention), so we only require a model for the conditional probability: P(S|T)
  • P(S|T) For each aligned phrase pair (s,t), the conditional probability P(s|t)
  • the translation model component P(S|T) is adapted to a source text unit that includes the translation unit currently being translated.
  • the adapted language model for the target language could be obtained by interpolating the predictions of the cluster language models, or by interpolating the N-gram counts associated with the cluster language models.
  • language model adaptation would be carried out by assigning greater weights to language models or language model counts associated with clusters that are close to the current source text unit, according to a similarity measure applied to the S-vector for the current source text unit and the S-centroids for the clusters.
  • the similarity measure would not have to be the same as the one used for translation model adaptation, though it could be; a sigmoid function (not necessarily the same one) could also be used for language model adaptation.
  • a key aspect of the invention is the speed with which adaptation takes place, due to the simplicity of the computation involved during decoding.
  • precompilation of the summary statistics associated with each training document pair or cluster e.g., the phrase table, joint phrase count table, or language model for the pair or cluster
  • This section describes a form of adaptive decoding somewhat different from that described in the previous section.
  • a state-of-the-art phrase-based machine translation system often comprises three components: the translation model (for estimating P(S|T)), the language model (for estimating P(T)), and the decoder (for finding a sequence of words T that maximizes the estimated P(S|T) * P(T)).
  • other components may also communicate with the decoder and influence the translations it produces: for instance, a "forward" phrase translation model P(T|S), a distortion model, a sentence length model, and so on.
  • each of these components is defined so that it can yield a numerical score for a partial or complete translation hypothesis for a given source-language input; each of the components is trained separately.
  • For the decoder to carry out translation it must know how to combine the scores from these components. This requires assigning a weight to each component score. For instance, if the scores are given as probabilities, we can raise each to a power α and then take the product of the resulting component scores as the overall score.
  • if the decoder only employs component scores from the translation models P(S|T) and P(T|S), and from the language model P(T), we could define the overall score for target hypothesis T as
  • log(score(T|S)) = α1 * log P(S|T) + α2 * log P(T|S) + α3 * log P(T).
  • this kind of model is called a "log-linear model" (where the overall score is a linear combination of the logarithms of the component scores).
  • a log-linear model will have the form
  • log(score(T|S)) = α1 * log f1(S,T) + α2 * log f2(S,T) + ... + αn * log fn(S,T), where fi() represents the score returned by the i-th component for hypothesis T.
  • weights α1, α2, ..., αn for the components.
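A log-linear combination of component scores is straightforward to compute. The sketch below assumes every component returns a strictly positive score (such as a probability) for each hypothesis; the function names and the toy hypotheses in the usage note are illustrative only.

```python
import math

def log_linear_score(component_scores, weights):
    """Weighted sum of the logarithms of the component scores.

    component_scores: positive scores from the components for one hypothesis.
    weights: one weight per component, in the same order.
    """
    return sum(a * math.log(f) for a, f in zip(weights, component_scores))

def best_hypothesis(hypotheses, weights):
    """hypotheses: dict mapping a target hypothesis to its list of component scores."""
    return max(hypotheses, key=lambda h: log_linear_score(hypotheses[h], weights))
```

For example, with component scores ordered as backward translation model, forward translation model, and language model, changing the weights can change which hypothesis wins: a weight vector of [1, 1, 1] and one of [0, 0, 1] may select different translations for the same input.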
  • this is done (after training each of the components) by means of an estimation procedure performed on a small "development" corpus (often called the "dev" corpus) which is made up of bilingual sentence pairs and chosen at random.
  • the decoder is given initial settings for the weights α1, α2, ..., αn and used to decode the source sentences in the dev corpus.
  • the weights are changed, according to a mathematical formula, until the translations produced sufficiently resemble the target-language translations in the dev corpus.
  • the estimation procedure adjusts the weights α1, α2, ..., αn until the decoder with these weights on its components is capable of producing translations similar to those seen in the dev corpus.
  • the dev corpus is chosen more or less at random.
  • the dev corpus is constructed from bilingual document pairs whose source-language portion is similar to the source text unit.
  • the dev corpus is made up of documents which are close to the source text unit in the bilingual sublanguage space, according to an appropriate similarity measure between a source text unit and a document pair (as described earlier).
  • adaptation to the source text unit occurs by changing the weights on the scores returned by the components that communicate with the decoder.
  • this embodiment of the invention provides a slightly different method for combining information from different phrase translation models and different language models than the methods described in the previous section.
  • Figure 5 shows schematically how this aspect of the invention works.
  • a "dev corpus selection module” takes as input the source text unit and a collection of bilingual aligned document pairs, and selects a subset of the document pairs whose source-language portions are sufficiently close to the source text unit (according to a distance metric defined in source sublanguage space).
  • the selected document pairs are #1, #2, and #4. These are aggregated to form a development ("dev") corpus for weight estimation.
  • the chosen weight estimation procedure is then invoked on this dynamically constructed dev corpus to find good values for the N weights on the N scores output by the N components that communicate with the decoder.
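The selection step performed by the dev corpus selection module can be sketched as a similarity threshold over the available document pairs. The threshold-based rule, the sparse-dictionary vectors, and the names `select_dev_corpus` and `threshold` are assumptions made for illustration; the text only requires that the selected pairs be sufficiently close to the source text unit. Weight estimation itself is not shown.

```python
import math

def cosine(u, v):
    """Cosine measure between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_dev_corpus(unit_vector, document_pairs, threshold):
    """Pick document pairs whose source side is close to the source text unit.

    unit_vector: S-vector of the source text unit.
    document_pairs: list of (pair_id, source_side_S_vector) tuples.
    threshold: minimum cosine similarity for a pair to be selected.
    Returns the ids of the selected pairs, in their original order.
    """
    return [
        pair_id
        for pair_id, source_vector in document_pairs
        if cosine(unit_vector, source_vector) >= threshold
    ]
```

The selected pairs would then be aggregated into the dev corpus on which the chosen weight estimation procedure is run.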
  • each selected document may be a single sentence pair (a source-language sentence and its target-language translation) extracted from a different bilingual text.
  • the scoring function for combining component scores in the decoder is log-linear
  • the weight estimation procedure used is that described above
  • the measure of quality of translation used to find the weights on the component functions from the dev corpus is the BLEU metric.
  • adaptive decoding based on adjusting component weights on a dev corpus chosen for its similarity to the source text unit could be applied to scoring functions of a different form, using other weight estimation procedures, and using other measures of translation quality. For instance, it would be possible to apply maximum-likelihood estimation of component weights, by attempting to maximize the probability that the source-language sentences in a dev corpus chosen for its similarity to the source text unit generated the target-language translations for them found in that dev corpus.
  • the output of the first step is a representation of the most probable translation hypotheses for the translation unit.
  • This representation may, for instance, be a set of N hypotheses, each accompanied by a probability score (an "N-best list"), or a word lattice with probabilities associated with transitions in the lattice.
  • the first step is typically performed by the decoder as described above, in a manner whereby the decoder calculates an overall score for each hypothesis that is a weighted combination of component scores.
  • a set of information sources is used to assign new probability scores to the translation hypotheses encoded in the representation output from the first step; this is called "rescoring".
  • the only requirement for an information source in this framework is that it be capable of generating a numerical score for each translation hypothesis.
  • the set of information sources for rescoring is different (usually larger) than the set of components that communicate with the decoder in the first step.
  • a weight estimation procedure is invoked prior to use of the complete two-step system to assign weights to the information sources employed in the second step, with larger weights being assigned to more reliable information sources.
  • This weight estimation procedure is often the same as that employed to estimate the component weights for the first step
  • Figure 6 shows another embodiment of the invention, in which the bilingual sublanguage space is optionally used in a rescoring step to improve translation of the translation unit (possibly along with other information sources for rescoring).
  • the source text unit containing the translation unit is used to obtain an adapted translation mapping.
  • the adapted translation mapping is used for rescoring. This can be done because the adapted translation mapping is capable of generating a statistical score for a particular translation hypothesis, based on the target-language word sequence of this hypothesis and the source-language word sequence of the translation unit.
  • the adapted translation mapping can be used for rescoring no matter how the representation of the translation hypotheses was generated.
  • Just as a dynamically constructed development corpus can be used in the procedure for estimating weights on the components that communicate with the decoder, as described in the previous section, so such a dynamically constructed corpus can be used to estimate weights on the information sources for rescoring.
  • the method is exactly as shown in Figure 5, except that the dev corpus selected for similarity to the source text unit is employed to estimate weights on information sources for rescoring.
  • the bilingual sublanguage space is constructed in advance, from a static collection of documents, before the source text is available, and is not subsequently changed.
  • the bilingual sublanguage space is itself adapted after the source text becomes available, shortly before translation takes place. This will not be a complete retraining on a new collection of documents, starting from scratch, as is already known to be a means of building a machine translation system for a new domain. As we have seen, such a retraining is a computationally expensive, time-consuming, offline operation. Instead, this aspect of the invention is a way of quickly "tweaking" or dynamically adapting a previously trained bilingual sublanguage space.
  • the reason this may be desirable is that the newly located bilingual documents are likely to have content closely related to that in the web page to be translated, since they were directly or indirectly linked to that page. Certain source-language words and phrases contained in the web page to be translated may not occur in the documents originally used to train the space; incorporation of the linked bilingual documents may enable the system to deal with these more effectively.
  • a variant of this scenario would be the case of an Internet-based email translation service based on the invention, where users submit their source-language emails to the service, edit the translation provided, and then send it off when they're satisfied.
  • the sending of an email would be the sign that the translation has been "approved" by the user, triggering use of the document pair (the source-language and target-language versions of the email) to adapt the original bilingual sublanguage space.
  • the new bilingual document pair must first be phrase-aligned, and its joint phrase counts and J-vector calculated. Then, the distance between the J-vector of the new document pair and the J-centroid for each cluster is calculated.
  • the first strategy for adapting the space to the new document pair is to assign the new document pair to the nearest existing cluster, i.e., to increment the C(S, T) table associated with the cluster by the joint phrase counts from the new pair. Then, the S-centroid vector associated with the chosen cluster is recomputed. All other clusters are left unchanged.
  • the second strategy is to increase the number of clusters to accommodate the new data, without changing the previously defined clusters.
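The two adaptation strategies above can be sketched as follows. This is an illustration under stated assumptions, not the patent's definition: here a J-vector is taken to be the normalized joint phrase counts, an S-centroid is taken to be the normalized source-side marginal of a cluster's C(S, T) table, and the distance is Euclidean. The names `Cluster`, `adapt_nearest`, and `adapt_new_cluster` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


def normalize(counts: Counter) -> Dict:
    """Turn raw counts into relative frequencies (assumed vector form)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def distance(u: Dict, v: Dict) -> float:
    """Euclidean distance between two sparse vectors (an assumed metric)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


@dataclass
class Cluster:
    # C(S, T): joint counts keyed by (source phrase, target phrase)
    joint_counts: Counter = field(default_factory=Counter)

    def j_centroid(self) -> Dict:
        return normalize(self.joint_counts)

    def s_centroid(self) -> Dict:
        # Derived from the current counts on each call, so it reflects any
        # increments made to the table (assumed definition of the S-centroid).
        source_counts = Counter()
        for (s, _t), c in self.joint_counts.items():
            source_counts[s] += c
        return normalize(source_counts)


def adapt_nearest(clusters: List[Cluster], pair_counts: Counter) -> int:
    """Strategy 1: add the new pair's counts to the nearest existing cluster."""
    j_vector = normalize(pair_counts)
    nearest = min(
        range(len(clusters)),
        key=lambda i: distance(j_vector, clusters[i].j_centroid()),
    )
    clusters[nearest].joint_counts.update(pair_counts)  # Counter.update adds counts
    return nearest


def adapt_new_cluster(clusters: List[Cluster], pair_counts: Counter) -> int:
    """Strategy 2: add a cluster for the new data, leaving existing ones untouched."""
    clusters.append(Cluster(Counter(pair_counts)))
    return len(clusters) - 1
```

In both functions only one cluster is touched, which is what keeps the adaptation cheap compared with retraining the whole space.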
  • new data gathered from the Internet or from recently performed translations can be used in a rescoring step, by training new mappings for translation on them and using these new mappings to rescore translation hypotheses output by the decoder.
  • dynamic adaptation is carried out not by changing the subspace used to perform the initial decoding, but by using information from the newly gathered data for rescoring.
  • Two key aspects of the invention are the construction of the bilingual sublanguage space and the way in which a source text unit is assigned to a location in that space.
  • a particularly interesting alternative embodiment of the invention is one in which the construction of the bilingual sublanguage space involves a form of dimensionality reduction such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), or Independent Component Analysis (ICA).
  • When decoding occurs, the source text unit is projected into the space of dimension L. Distances from the source text unit to the D documents are calculated in this space, and used to calculate the adapted model from the counts associated with the document pairs as described above. Alternatively, dimensionality reduction could be carried out on the J-vectors.
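The projection and distance computation described above can be sketched with PCA, one of the reduction methods the text names. This is a sketch only: the vector representation of documents, the function names, and the use of Euclidean distance in the reduced space are assumptions, and NumPy's SVD stands in for whatever PCA implementation would actually be used. How the distances feed into the adapted model is not shown.

```python
import numpy as np


def fit_pca(doc_vectors: np.ndarray, L: int):
    """Fit PCA on a (D, V) matrix of document vectors; keep the top L directions."""
    mean = doc_vectors.mean(axis=0)
    centered = doc_vectors - mean
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:L]


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project one vector (V,) or a matrix (D, V) into the L-dimensional space."""
    return (x - mean) @ components.T


def distances_to_documents(
    source_vec: np.ndarray, doc_vectors: np.ndarray, L: int
) -> np.ndarray:
    """Distance from the source text unit to each of the D documents, in dimension L."""
    mean, components = fit_pca(doc_vectors, L)
    docs_low = project(doc_vectors, mean, components)  # shape (D, L)
    src_low = project(source_vec, mean, components)    # shape (L,)
    return np.linalg.norm(docs_low - src_low, axis=1)
```

The source text unit is projected with the mean and components fitted on the training documents, so it lands in the same reduced space as they do.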
  • Another interesting embodiment is one in which the space is associated with a model that gives the probability of generating word sequences in the source language. This could be achieved, for instance, by means of a minor modification to the current embodiment, in which the source-language portions of the documents associated with each cluster would be used to train a statistical language model for each cluster. Given a similarity measure between the source text unit and the clusters, and an interpolation function for a source language model based on the similarity measure, the system would search for the point in the space whose interpolated source-language model had the highest conditional probability of generating the source text unit.
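The search for the best interpolated source-language model can be sketched as below. Several choices here are assumptions rather than anything stated in the text: each cluster's model is an add-one-smoothed unigram model, a "point in the space" is represented by a vector of interpolation weights over the clusters, and the search is done with expectation-maximization (EM) on those weights. The text's similarity-based interpolation function is replaced by this direct maximum-likelihood fit, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def train_unigram(tokens: Sequence[str], vocab: Set[str]) -> Dict[str, float]:
    """Add-one-smoothed unigram model over a fixed vocabulary (assumed model type)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def log_likelihood(
    tokens: Sequence[str], models: List[Dict[str, float]], weights: Sequence[float]
) -> float:
    """Log-probability of the source text unit under the interpolated model."""
    return sum(
        math.log(sum(lam * m[w] for lam, m in zip(weights, models))) for w in tokens
    )


def fit_weights(
    tokens: Sequence[str], models: List[Dict[str, float]], iterations: int = 50
) -> List[float]:
    """EM for mixture weights; assumes tokens is non-empty and covered by the vocab."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        expected = [0.0] * k
        for w in tokens:
            parts = [lam * m[w] for lam, m in zip(weights, models)]
            z = sum(parts)
            for i, p in enumerate(parts):
                expected[i] += p / z  # posterior share of cluster i for this word
        weights = [e / len(tokens) for e in expected]
    return weights
```

Each EM iteration cannot decrease the likelihood of the source text unit, so the returned weights are at least as good as the uniform starting point.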

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

This invention relates to a means and a method for translating a source text into a target text in which context information is taken into account. A source text unit is defined around a translation unit to be translated. This source text unit is mapped onto a bilingual sublanguage space, where the bilingual sublanguage space comprises a source sublanguage space and mappings to the target language. The translation is adapted to the source text unit, thereby taking contextual information into account.
PCT/CA2006/001004 2005-06-17 2006-06-16 Moyen et procede pour une traduction de langue adaptee WO2006133571A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CA2612404A CA2612404C (fr) 2005-06-17 2006-06-16 Moyen et procede pour une traduction de langue adaptee
EP06761071.7A EP1894125A4 (fr) 2005-06-17 2006-06-16 Moyen et procede pour une traduction de langue adaptee
US11/922,311 US8612203B2 (en) 2005-06-17 2006-06-16 Statistical machine translation adapted to context

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US69123805P 2005-06-17 2005-06-17
US60/691,238 2005-06-17

Publications (1)

Publication Number Publication Date
WO2006133571A1 true WO2006133571A1 (fr) 2006-12-21

Family

ID=37531936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2006/001004 WO2006133571A1 (fr) 2005-06-17 2006-06-16 Moyen et procede pour une traduction de langue adaptee

Country Status (4)

Country Link
US (1) US8612203B2 (fr)
EP (1) EP1894125A4 (fr)
CA (1) CA2612404C (fr)
WO (1) WO2006133571A1 (fr)

Families Citing this family (133)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060116865A1 (en) 1999-09-17 2006-06-01 Www.Uniscape.Com E-services translation utilizing machine translation and translation memory
US7904595B2 (en) 2001-01-18 2011-03-08 Sdl International America Incorporated Globalization management system and method therefor
WO2003005166A2 (fr) 2001-07-03 2003-01-16 University Of Southern California Modele de traduction statistique fonde sur la syntaxe
WO2004001623A2 (fr) * 2002-03-26 2003-12-31 University Of Southern California Construction d'un lexique de traduction a partir de corpus comparables et non paralleles
US7711545B2 (en) * 2003-07-02 2010-05-04 Language Weaver, Inc. Empirical methods for splitting compound words with application to machine translation
US8548794B2 (en) * 2003-07-02 2013-10-01 University Of Southern California Statistical noun phrase translation
US7983896B2 (en) 2004-03-05 2011-07-19 SDL Language Technology In-context exact (ICE) matching
US8296127B2 (en) 2004-03-23 2012-10-23 University Of Southern California Discovery of parallel text portions in comparable collections of corpora and training using comparable texts
US8666725B2 (en) 2004-04-16 2014-03-04 University Of Southern California Selection and use of nonstatistical translation components in a statistical machine translation framework
DE112005002534T5 (de) * 2004-10-12 2007-11-08 University Of Southern California, Los Angeles Training für eine Text-Text-Anwendung, die eine Zeichenketten-Baum-Umwandlung zum Training und Decodieren verwendet
US8886517B2 (en) 2005-06-17 2014-11-11 Language Weaver, Inc. Trust scoring for language translation systems
US8676563B2 (en) 2009-10-01 2014-03-18 Language Weaver, Inc. Providing human-generated and machine-generated trusted translations
US7813918B2 (en) * 2005-08-03 2010-10-12 Language Weaver, Inc. Identifying documents which form translated pairs, within a document collection
US10319252B2 (en) 2005-11-09 2019-06-11 Sdl Inc. Language capability assessment and training apparatus and techniques
US8943080B2 (en) 2006-04-07 2015-01-27 University Of Southern California Systems and methods for identifying parallel documents and sentence fragments in multilingual document collections
US8886518B1 (en) 2006-08-07 2014-11-11 Language Weaver, Inc. System and method for capitalizing machine translated text
EP2054817A4 (fr) * 2006-08-18 2009-10-21 Ca Nat Research Council Moyens et procédé pour entraînement d'un système de traduction à machine statistique
US8433556B2 (en) 2006-11-02 2013-04-30 University Of Southern California Semi-supervised training for statistical word alignment
US20080120092A1 (en) * 2006-11-20 2008-05-22 Microsoft Corporation Phrase pair extraction for statistical machine translation
US9122674B1 (en) 2006-12-15 2015-09-01 Language Weaver, Inc. Use of annotations in statistical machine translation
US8468149B1 (en) 2007-01-26 2013-06-18 Language Weaver, Inc. Multi-lingual online community
US8615389B1 (en) 2007-03-16 2013-12-24 Language Weaver, Inc. Generation and exploitation of an approximate language model
US8959011B2 (en) * 2007-03-22 2015-02-17 Abbyy Infopoisk Llc Indicating and correcting errors in machine translation systems
US8831928B2 (en) * 2007-04-04 2014-09-09 Language Weaver, Inc. Customizable machine translation service
US8825466B1 (en) 2007-06-08 2014-09-02 Language Weaver, Inc. Modification of annotated bilingual segment pairs in syntax-based machine translation
JP5342760B2 (ja) * 2007-09-03 2013-11-13 株式会社東芝 訳語学習のためのデータを作成する装置、方法、およびプログラム
WO2009089180A1 (fr) * 2008-01-04 2009-07-16 Educational Testing Service Procédé de notation de réponse par un nombre réel
JP5007977B2 (ja) * 2008-02-13 2012-08-22 独立行政法人情報通信研究機構 機械翻訳装置、機械翻訳方法、及びプログラム
US8910110B2 (en) * 2008-03-19 2014-12-09 Oracle International Corporation Application translation cost estimator
US8521516B2 (en) * 2008-03-26 2013-08-27 Google Inc. Linguistic key normalization
US7958125B2 (en) * 2008-06-26 2011-06-07 Microsoft Corporation Clustering aggregator for RSS feeds
US8990064B2 (en) 2009-07-28 2015-03-24 Language Weaver, Inc. Translating documents based on content
US8364463B2 (en) * 2009-09-25 2013-01-29 International Business Machines Corporation Optimizing a language/media translation map
US8380486B2 (en) 2009-10-01 2013-02-19 Language Weaver, Inc. Providing machine-generated translations and corresponding trust levels
US8682640B2 (en) * 2009-11-25 2014-03-25 International Business Machines Corporation Self-configuring language translation device
US8589396B2 (en) * 2010-01-06 2013-11-19 International Business Machines Corporation Cross-guided data clustering based on alignment between data domains
US8229929B2 (en) 2010-01-06 2012-07-24 International Business Machines Corporation Cross-domain clusterability evaluation for cross-guided data clustering based on alignment between data domains
US10417646B2 (en) 2010-03-09 2019-09-17 Sdl Inc. Predicting the cost associated with translating textual content
US8798998B2 (en) * 2010-04-05 2014-08-05 Microsoft Corporation Pre-saved data compression for TTS concatenation cost
US8560297B2 (en) * 2010-06-07 2013-10-15 Microsoft Corporation Locating parallel word sequences in electronic documents
US10133737B2 (en) * 2010-08-26 2018-11-20 Google Llc Conversion of input text strings
US10657540B2 (en) 2011-01-29 2020-05-19 Sdl Netherlands B.V. Systems, methods, and media for web content management
US9547626B2 (en) 2011-01-29 2017-01-17 Sdl Plc Systems, methods, and media for managing ambient adaptability of web applications and web services
WO2012105231A1 (fr) * 2011-02-03 2012-08-09 日本電気株式会社 Dispositif d'adaptation de modèle, procédé d'adaptation de modèle et programme d'adaptation de modèle
US10580015B2 (en) 2011-02-25 2020-03-03 Sdl Netherlands B.V. Systems, methods, and media for executing and optimizing online marketing initiatives
US10140320B2 (en) 2011-02-28 2018-11-27 Sdl Inc. Systems, methods, and media for generating analytical data
US8880406B2 (en) * 2011-03-28 2014-11-04 Epic Systems Corporation Automatic determination of and response to a topic of a conversation
US11003838B2 (en) 2011-04-18 2021-05-11 Sdl Inc. Systems and methods for monitoring post translation editing
US8260615B1 (en) * 2011-04-25 2012-09-04 Google Inc. Cross-lingual initialization of language models
US8694303B2 (en) 2011-06-15 2014-04-08 Language Weaver, Inc. Systems and methods for tuning parameters in statistical machine translation
US9984054B2 (en) 2011-08-24 2018-05-29 Sdl Inc. Web interface including the review and manipulation of a web document and utilizing permission based control
US8886515B2 (en) 2011-10-19 2014-11-11 Language Weaver, Inc. Systems and methods for enhancing machine translation post edit review processes
US9256597B2 (en) * 2012-01-24 2016-02-09 Ming Li System, method and computer program for correcting machine translation information
US8942973B2 (en) 2012-03-09 2015-01-27 Language Weaver, Inc. Content page URL translation
US9773270B2 (en) 2012-05-11 2017-09-26 Fredhopper B.V. Method and system for recommending products based on a ranking cocktail
US10261994B2 (en) 2012-05-25 2019-04-16 Sdl Inc. Method and system for automatic management of reputation of translators
US9026425B2 (en) * 2012-08-28 2015-05-05 Xerox Corporation Lexical and phrasal feature domain adaptation in statistical machine translation
US10452740B2 (en) 2012-09-14 2019-10-22 Sdl Netherlands B.V. External content libraries
US11386186B2 (en) 2012-09-14 2022-07-12 Sdl Netherlands B.V. External content library connector systems and methods
US11308528B2 (en) 2012-09-14 2022-04-19 Sdl Netherlands B.V. Blueprinting of multimedia assets
US9916306B2 (en) 2012-10-19 2018-03-13 Sdl Inc. Statistical linguistic analysis of source content
US9152622B2 (en) 2012-11-26 2015-10-06 Language Weaver, Inc. Personalized machine translation via online adaptation
US10664657B2 (en) * 2012-12-27 2020-05-26 Touchtype Limited System and method for inputting images or labels into electronic devices
GB201223450D0 (en) 2012-12-27 2013-02-13 Touchtype Ltd Search and corresponding method
US20150356076A1 (en) * 2013-01-11 2015-12-10 Qatar Foundation For Education, Science And Community Development System and method of machine translation
US9697821B2 (en) * 2013-01-29 2017-07-04 Tencent Technology (Shenzhen) Company Limited Method and system for building a topic specific language model for use in automatic speech recognition
US20140350931A1 (en) * 2013-05-24 2014-11-27 Microsoft Corporation Language model trained using predicted queries from statistical machine translation
US20160132491A1 (en) * 2013-06-17 2016-05-12 National Institute Of Information And Communications Technology Bilingual phrase learning apparatus, statistical machine translation apparatus, bilingual phrase learning method, and storage medium
US9213694B2 (en) 2013-10-10 2015-12-15 Language Weaver, Inc. Efficient online domain adaptation
US9842592B2 (en) 2014-02-12 2017-12-12 Google Inc. Language models using non-linguistic context
JP6217468B2 (ja) * 2014-03-10 2017-10-25 富士ゼロックス株式会社 多言語文書分類プログラム及び情報処理装置
US9412365B2 (en) 2014-03-24 2016-08-09 Google Inc. Enhanced maximum entropy models
US9740687B2 (en) 2014-06-11 2017-08-22 Facebook, Inc. Classifying languages for objects and entities
US9805028B1 (en) * 2014-09-17 2017-10-31 Google Inc. Translating terms using numeric representations
US9864744B2 (en) 2014-12-03 2018-01-09 Facebook, Inc. Mining multi-lingual data
US10067936B2 (en) 2014-12-30 2018-09-04 Facebook, Inc. Machine translation output reranking
US9830386B2 (en) 2014-12-30 2017-11-28 Facebook, Inc. Determining trending topics in social media
US9830404B2 (en) 2014-12-30 2017-11-28 Facebook, Inc. Analyzing language dependency structures
US9477652B2 (en) 2015-02-13 2016-10-25 Facebook, Inc. Machine learning dialect identification
US9940324B2 (en) 2015-03-10 2018-04-10 International Business Machines Corporation Performance detection and enhancement of machine translation
US9934203B2 (en) 2015-03-10 2018-04-03 International Business Machines Corporation Performance detection and enhancement of machine translation
US10134394B2 (en) 2015-03-20 2018-11-20 Google Llc Speech recognition using log-linear model
US9519643B1 (en) 2015-06-15 2016-12-13 Microsoft Technology Licensing, Llc Machine map label translation
KR102396250B1 (ko) * 2015-07-31 2022-05-09 삼성전자주식회사 대역 어휘 결정 장치 및 방법
CN106484682B (zh) 2015-08-25 2019-06-25 阿里巴巴集团控股有限公司 基于统计的机器翻译方法、装置及电子设备
CN106484681B (zh) 2015-08-25 2019-07-09 阿里巴巴集团控股有限公司 一种生成候选译文的方法、装置及电子设备
US9734142B2 (en) * 2015-09-22 2017-08-15 Facebook, Inc. Universal translation
US10268684B1 (en) 2015-09-28 2019-04-23 Amazon Technologies, Inc. Optimized statistical machine translation system with rapid adaptation capability
US10185713B1 (en) * 2015-09-28 2019-01-22 Amazon Technologies, Inc. Optimized statistical machine translation system with rapid adaptation capability
US9959271B1 (en) 2015-09-28 2018-05-01 Amazon Technologies, Inc. Optimized statistical machine translation system with rapid adaptation capability
US10614167B2 (en) 2015-10-30 2020-04-07 Sdl Plc Translation review workflow systems and methods
US10133738B2 (en) 2015-12-14 2018-11-20 Facebook, Inc. Translation confidence scores
US9734143B2 (en) 2015-12-17 2017-08-15 Facebook, Inc. Multi-media context language processing
US9519871B1 (en) 2015-12-21 2016-12-13 International Business Machines Corporation Contextual text adaptation
US9805029B2 (en) 2015-12-28 2017-10-31 Facebook, Inc. Predicting future translations
US10002125B2 (en) 2015-12-28 2018-06-19 Facebook, Inc. Language model personalization
US9747283B2 (en) 2015-12-28 2017-08-29 Facebook, Inc. Predicting future translations
US9558182B1 (en) * 2016-01-08 2017-01-31 International Business Machines Corporation Smart terminology marker system for a language translation system
US10540357B2 (en) * 2016-03-21 2020-01-21 Ebay Inc. Dynamic topic adaptation for machine translation using user session context
US10083155B2 (en) * 2016-05-17 2018-09-25 International Business Machines Corporation Method for detecting original language of translated document
US10902221B1 (en) 2016-06-30 2021-01-26 Facebook, Inc. Social hash for language models
CN107564513B (zh) * 2016-06-30 2020-09-08 Alibaba Group Holding Limited Speech recognition method and apparatus
US10902215B1 (en) 2016-06-30 2021-01-26 Facebook, Inc. Social hash for language models
US10832664B2 (en) 2016-08-19 2020-11-10 Google Llc Automated speech recognition using language models that selectively use domain-specific model components
US10229113B1 (en) * 2016-09-28 2019-03-12 Amazon Technologies, Inc. Leveraging content dimensions during the translation of human-readable languages
US10275459B1 (en) 2016-09-28 2019-04-30 Amazon Technologies, Inc. Source language content scoring for localizability
US10261995B1 (en) 2016-09-28 2019-04-16 Amazon Technologies, Inc. Semantic and natural language processing for content categorization and routing
US10180935B2 (en) 2016-12-30 2019-01-15 Facebook, Inc. Identifying multiple languages in a content item
KR102637338B1 (ko) * 2017-01-26 2024-02-16 Samsung Electronics Co., Ltd. Translation correction method and apparatus, and translation system
IL252071A0 (en) * 2017-05-03 2017-07-31 Google Inc Contextual language translation
US11468234B2 (en) * 2017-06-26 2022-10-11 International Business Machines Corporation Identifying linguistic replacements to improve textual message effectiveness
CN107368476B (zh) * 2017-07-25 2020-11-03 Shenzhen Tencent Computer Systems Co., Ltd. Translation method, target information determination method and related apparatus
US10275462B2 (en) * 2017-09-18 2019-04-30 Sap Se Automatic translation of string collections
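KR102509821B1 (ko) * 2017-09-18 2023-03-14 Samsung Electronics Co., Ltd. Method for generating OOS sentences and apparatus for performing the same
KR102509822B1 (ko) * 2017-09-25 2023-03-14 Samsung Electronics Co., Ltd. Sentence generation method and apparatus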
US10380249B2 (en) 2017-10-02 2019-08-13 Facebook, Inc. Predicting future trending topics
CN107957989B9 (zh) 2017-10-23 2021-01-12 Advanced New Technologies Co., Ltd. Cluster-based word vector processing method, apparatus and device
US10635863B2 (en) 2017-10-30 2020-04-28 Sdl Inc. Fragment recall and adaptive automated translation
CN108170663A (zh) 2017-11-14 2018-06-15 Alibaba Group Holding Limited Cluster-based word vector processing method, apparatus and device
US10817676B2 (en) 2017-12-27 2020-10-27 Sdl Inc. Intelligent routing services and systems
JP7247460B2 (ja) * 2018-03-13 2023-03-29 Fujitsu Limited Correspondence generation program, correspondence generation apparatus, correspondence generation method, and translation program
US11163952B2 (en) 2018-07-11 2021-11-02 International Business Machines Corporation Linked data seeded multi-lingual lexicon extraction
US11256867B2 (en) 2018-10-09 2022-02-22 Sdl Inc. Systems and methods of machine learning for digital assets and message creation
CN109190134B (zh) * 2018-11-21 2023-05-30 iFlytek Co., Ltd. Text translation method and apparatus
CN111563381B (zh) * 2019-02-12 2023-04-21 Alibaba Group Holding Limited Text processing method and apparatus
US11080601B2 (en) * 2019-04-03 2021-08-03 Mashtraxx Limited Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
US11763098B2 (en) * 2019-08-07 2023-09-19 7299362 Canada Inc. System and method for language translation
US11328132B2 (en) * 2019-09-09 2022-05-10 International Business Machines Corporation Translation engine suggestion via targeted probes
US11675963B2 (en) * 2019-09-09 2023-06-13 Adobe Inc. Suggestion techniques for documents to-be-translated
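CN110941966A (zh) * 2019-12-10 2020-03-31 Beijing Xiaomi Mobile Software Co., Ltd. Training method, apparatus and system for machine translation model
KR20210150842A (ko) * 2020-06-04 2021-12-13 Samsung Electronics Co., Ltd. Electronic device for translating voice or text and method thereof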
GB2599441B (en) * 2020-10-02 2024-02-28 Emotional Perception Ai Ltd System and method for recommending semantically relevant content
US11907678B2 (en) 2020-11-10 2024-02-20 International Business Machines Corporation Context-aware machine language identification

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5867811A (en) * 1993-06-18 1999-02-02 Canon Research Centre Europe Ltd. Method, an apparatus, a system, a storage device, and a computer readable medium using a bilingual database including aligned corpora
US6092034A (en) * 1998-07-27 2000-07-18 International Business Machines Corporation Statistical translation system and method for fast sense disambiguation and translation of large corpora using fertility models and sense models
US6278967B1 (en) * 1992-08-31 2001-08-21 Logovista Corporation Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis
US20020107683A1 (en) * 2000-12-19 2002-08-08 Xerox Corporation Extracting sentence translations from translated documents
US6598015B1 (en) * 1999-09-10 2003-07-22 Rws Group, Llc Context based computer-assisted language translation
US20040102956A1 (en) * 2002-11-22 2004-05-27 Levin Robert E. Language translation system and method
US20050033567A1 (en) * 2002-11-28 2005-02-10 Tatsuya Sukehiro Alignment system and aligning method for multilingual documents

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5497319A (en) 1990-12-31 1996-03-05 Trans-Link International Corp. Machine translation and telecommunications system
US5477451A (en) 1991-07-25 1995-12-19 International Business Machines Corp. Method and system for natural language translation
US5510981A (en) 1993-10-28 1996-04-23 International Business Machines Corporation Language translation apparatus and method using context-based translation models
US6085162A (en) * 1996-10-18 2000-07-04 Gedanken Corporation Translation system and method in which words are translated by a specialized dictionary and then a general dictionary
US5991710A (en) 1997-05-20 1999-11-23 International Business Machines Corporation Statistical translation system with features based on phrases or groups of words
DE69818796T2 (de) * 1997-06-26 2004-08-05 Koninklijke Philips Electronics N.V. Machine-organized method and apparatus for translating a word-organized source text into a word-organized target text
DE69837979T2 (de) * 1997-06-27 2008-03-06 International Business Machines Corp. System for extracting multilingual terminology
US6195631B1 (en) * 1998-04-15 2001-02-27 At&T Corporation Method and apparatus for automatic construction of hierarchical transduction models for language translation
JP2004501429A (ja) 2000-05-11 2004-01-15 University of Southern California Machine translation techniques
US6491456B2 (en) 2000-06-23 2002-12-10 Darfon Electronics Corp. Keyboard thin film circuit board with trenches to release air from hollow rubber domes
US6990439B2 (en) 2001-01-10 2006-01-24 Microsoft Corporation Method and apparatus for performing machine translation using a unified language model and translation model
US7295962B2 (en) 2001-05-11 2007-11-13 University Of Southern California Statistical memory-based translation system
US7689405B2 (en) 2001-05-17 2010-03-30 Language Weaver, Inc. Statistical method for building a translation memory
US20030154071A1 (en) * 2002-02-11 2003-08-14 Shreve Gregory M. Process for the document management and computer-assisted translation of documents utilizing document corpora constructed by intelligent agents
AU2003267953A1 (en) 2002-03-26 2003-12-22 University Of Southern California Statistical machine translation using a large monolingual corpus
CA2480398C (fr) 2002-03-27 2011-06-14 University Of Southern California Phrase-based joint probability model for statistical machine translation
AU2003222126A1 (en) 2002-03-28 2003-10-13 University Of Southern California Statistical machine translation
TWI256562B (en) 2002-05-03 2006-06-11 Ind Tech Res Inst Method for named-entity recognition and verification
US7383542B2 (en) 2003-06-20 2008-06-03 Microsoft Corporation Adaptive machine translation service
US8548794B2 (en) 2003-07-02 2013-10-01 University Of Southern California Statistical noun phrase translation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1894125A4 *

Cited By (166)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US8583418B2 (en) 2008-09-29 2013-11-12 Apple Inc. Systems and methods of detecting language and natural language strings for text to speech synthesis
US8712776B2 (en) 2008-09-29 2014-04-29 Apple Inc. Systems and methods for selective text to speech synthesis
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US8380507B2 (en) 2009-03-09 2013-02-19 Apple Inc. Systems and methods for determining the language to use for speech generated by a text to speech engine
US8751238B2 (en) 2009-03-09 2014-06-10 Apple Inc. Systems and methods for determining the language to use for speech generated by a text to speech engine
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US11410053B2 (en) 2010-01-25 2022-08-09 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10607140B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10984327B2 (en) 2010-01-25 2021-04-20 New Valuexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10984326B2 (en) 2010-01-25 2021-04-20 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10607141B2 (en) 2010-01-25 2020-03-31 Newvaluexchange Ltd. Apparatuses, methods and systems for a digital conversation management platform
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US8768686B2 (en) 2010-05-13 2014-07-01 International Business Machines Corporation Machine translation with side information
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US10949904B2 (en) * 2014-10-04 2021-03-16 Proz.Com Knowledgebase with work products of service providers and processing thereof
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10049663B2 (en) 2016-06-08 2018-08-14 Apple Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11694041B2 (en) 2018-05-15 2023-07-04 Iflytek Co., Ltd. Chapter-level text translation method and device
EP3839799A1 (fr) * 2019-12-19 2021-06-23 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, electronic device and readable storage medium for translation
US11574135B2 (en) 2019-12-19 2023-02-07 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, electronic device and readable storage medium for translation

Also Published As

Publication number Publication date
CA2612404C (fr) 2014-05-27
US8612203B2 (en) 2013-12-17
US20090083023A1 (en) 2009-03-26
EP1894125A4 (fr) 2015-12-02
EP1894125A1 (fr) 2008-03-05
CA2612404A1 (fr) 2006-12-21

Similar Documents

Publication Publication Date Title
US8612203B2 (en) Statistical machine translation adapted to context
Quirk et al. Dependency treelet translation: Syntactically informed phrasal SMT
Mairesse et al. Stochastic language generation in dialogue using factored language models
Ueffing et al. Transductive learning for statistical machine translation
EP2269148B1 (fr) Intra-language statistical machine translation
JP5243167B2 (ja) Information retrieval system
JP2009533728A (ja) Machine translation method and system
Durrani et al. Investigating the usefulness of generalized word representations in SMT
KR20040044176A (ko) Statistical method and apparatus for learning translation relationships among phrases
Ueffing et al. Semi-supervised model adaptation for statistical machine translation
Arisoy et al. Discriminative language modeling with linguistic and statistically derived features
Yuan Grammatical error correction in non-native English
Callison-Burch et al. Co-training for statistical machine translation
Gao et al. A unified approach to statistical language modeling for Chinese
Sajjad et al. Statistical models for unsupervised, semi-supervised, and supervised transliteration mining
Cuong et al. A survey of domain adaptation for statistical machine translation
Kuo et al. A phonetic similarity model for automatic extraction of transliteration pairs
Meyers et al. A multilingual procedure for dictionary-based sentence alignment
JP5500636B2 (ja) Phrase table generator and computer program therefor
JP2006004366A (ja) Machine translation system and computer program therefor
Hoang Improving statistical machine translation with linguistic information
Alshawi et al. Learning dependency transduction models from unannotated examples
Ueffing et al. Semisupervised learning for machine translation
Shah Model adaptation techniques in machine translation
Garcia-Varea et al. Maximum Entropy Modeling: A Suitable Framework to Learn Context-Dependent Lexicon Models for Statistical Machine Translation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase (Ref document number: 11922311; Country of ref document: US)
Ref document number: 2612404; Country of ref document: CA
NENP Non-entry into the national phase (Ref country code: DE)
WWW Wipo information: withdrawn in national office (Ref document number: DE)
WWE Wipo information: entry into national phase (Ref document number: 2006761071; Country of ref document: EP)
WWP Wipo information: published in national office (Ref document number: 2006761071; Country of ref document: EP)