CN116644764A - Machine translation method and device, electronic equipment and storage medium - Google Patents

Publication number
CN116644764A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310672071.5A
Other languages
Chinese (zh)
Inventor
凌天东
程宁
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application provides a machine translation method and device, electronic equipment and a storage medium, and belongs to the technical field of digital medical treatment. The method comprises the following steps: inputting a target sentence into a preset phrase-based statistical machine translation system, wherein the phrase-based statistical machine translation system comprises a temporal model based on an N-gram; analyzing the target sentence through a temporal model based on the N-gram to obtain inter-sentence temporal characteristics and intra-sentence temporal characteristics of the target sentence; determining the temporal information of the target sentence according to the inter-sentence temporal characteristics and the intra-sentence temporal characteristics; and outputting the translation result of the target sentence according to the temporal information. Based on the method, the inter-sentence temporal characteristics and the intra-sentence temporal characteristics are integrated into the phrase-based SMT system, so that the temporal information can be fully utilized, the machine translation quality is remarkably improved, the translation information can be effectively integrated, good expansibility is achieved, multi-scene practical application is promoted, and the user satisfaction is improved.

Description

Machine translation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of digital medical technology, and in particular, to a machine translation method and apparatus, an electronic device, and a storage medium.
Background
In machine translation with high quality requirements, tense is an important factor in evaluating translation quality: an incorrect tense produces ungrammatical output, which in turn can cause the meaning of the sentence to be misunderstood and lead to adverse effects. Tense has therefore attracted attention in many natural language processing (NLP) applications; in tasks such as event extraction and summarization, tense is considered a key factor in establishing temporal order.
However, temporal information is largely ignored in current statistical machine translation research. Most current statistical machine translation (SMT) systems rely primarily on translation models and language models and do not take temporal information into account, resulting in poor machine translation quality. As a result, existing machine translation methods yield low translation quality for medical texts and a poor user experience.
Disclosure of Invention
The embodiment of the application mainly aims to provide a machine translation method and device, electronic equipment and a storage medium, which can fully utilize temporal information, thereby remarkably improving the translation quality of medical texts.
To achieve the above object, a first aspect of an embodiment of the present application provides a machine translation method, including:
inputting a target sentence into a preset phrase-based statistical machine translation system, wherein the phrase-based statistical machine translation system comprises a temporal model based on an N-gram;
analyzing the target sentence through the N-gram-based temporal model to obtain inter-sentence temporal characteristics and intra-sentence temporal characteristics of the target sentence;
determining the temporal information of the target sentence according to the inter-sentence temporal characteristics and intra-sentence temporal characteristics;
and outputting the translation result of the target sentence according to the temporal information.
In some embodiments, the N-gram based temporal model is trained by the following method:
obtaining training sentences from a corpus;
performing temporal extraction on the training sentences to obtain temporal sequences of the training sentences;
analyzing the temporal sequence of the training sentences to obtain inter-sentence temporal characteristics and intra-sentence temporal characteristics;
and generating a temporal model based on the N-gram based on the inter-sentence temporal features and the intra-sentence temporal features.
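By way of illustration only, the training steps above can be sketched as follows. The sketch assumes that the tense sequences have already been extracted from the corpus (one main-tense label per sentence of a document for the inter-sentence model, one label per tensed verb for the intra-sentence model). The three-label tense set, the bigram order, the `<s>` start marker and the add-one smoothing are assumptions made for the example and are not specified in the application.

```python
import math
from collections import Counter

BOS = "<s>"  # hypothetical start-of-sequence marker
TENSES = ["past", "present", "future"]  # assumed label set

def train_bigram(sequences, vocab):
    """Count tense bigrams and return an add-one-smoothed log-probability function."""
    history_counts, bigram_counts = Counter(), Counter()
    for seq in sequences:
        prev = BOS
        for tense in seq:
            history_counts[prev] += 1
            bigram_counts[(prev, tense)] += 1
            prev = tense

    def logprob(prev, tense):
        # Add-one smoothing keeps unseen tense transitions at a non-zero probability
        return math.log((bigram_counts[(prev, tense)] + 1)
                        / (history_counts[prev] + len(vocab)))

    return logprob

# Inter-sentence data: the main tense of each sentence, one list per document
doc_main_tenses = [["past", "past", "present"], ["past", "past", "past"]]
# Intra-sentence data: the tenses of the tensed verbs, one list per sentence
sent_verb_tenses = [["past", "past"], ["present", "future"], ["past", "past", "past"]]

inter_model = train_bigram(doc_main_tenses, TENSES)
intra_model = train_bigram(sent_verb_tenses, TENSES)
```

In this toy corpus a past main tense is followed by another past main tense more often than by any other tense, so `inter_model("past", "past")` is larger than `inter_model("past", "future")`.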
In some embodiments, the analyzing the temporal sequence of the training sentences to obtain the inter-sentence temporal features and the intra-sentence temporal features includes:
analyzing the temporal sequence of the training sentences to obtain the main tense of the training sentences and the tense of each target word in the training sentences;
calculating the inter-sentence temporal features according to the main tense of the training sentences;
and calculating the intra-sentence temporal features according to the tense of each target word in the training sentences.
In some embodiments, the calculating the inter-sentence temporal features according to the main tense of the training sentences includes:
acquiring a first temporal sequence of the main tense of the training sentences;
adjusting the weight of the first temporal sequence based on a minimum error rate training method;
and calculating the inter-sentence temporal features according to the first temporal sequence and the weight of the first temporal sequence.
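As a hedged illustration of the steps above, the inter-sentence feature can be read as the log-probability of a sentence's main tense given the main tense of the preceding sentence, multiplied by a tuned weight when it enters a log-linear model score. The probability table, the floor value and the weight of 0.3 below are placeholders invented for the example; in the described method the weight would be obtained by minimum error rate training.

```python
import math

# Hypothetical bigram probabilities P(current main tense | previous main tense)
MAIN_TENSE_BIGRAM = {
    ("past", "past"): 0.7, ("past", "present"): 0.2, ("past", "future"): 0.1,
    ("present", "past"): 0.2, ("present", "present"): 0.6, ("present", "future"): 0.2,
}

def inter_sentence_feature(prev_main_tense, main_tense, floor=1e-6):
    """Log-probability of this sentence's main tense given the previous one."""
    return math.log(MAIN_TENSE_BIGRAM.get((prev_main_tense, main_tense), floor))

def weighted_contribution(feature_value, weight):
    """Contribution of the feature to a log-linear model score."""
    return weight * feature_value

f_consistent = inter_sentence_feature("past", "past")  # tense kept across sentences
f_shift = inter_sentence_feature("past", "future")     # abrupt tense shift
weight = 0.3  # placeholder value; assumed to come from MERT in practice
```

A hypothesis that keeps the tense of the preceding sentence receives a higher feature value than one that shifts tense abruptly, and the weight controls how strongly this preference influences the final score.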
In some embodiments, the calculating the intra-sentence temporal features according to the tense of each target word in the training sentences includes:
acquiring a second temporal sequence of the tense of each target word in the training sentences;
determining an average length of the second temporal sequence;
and calculating the intra-sentence temporal features according to the average length of the second temporal sequence.
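The application does not give a formula for how the average length enters the intra-sentence feature. One plausible reading, shown here purely as an assumption, is that the log-probability of a sentence's tense sequence is rescaled to the average sequence length, so that sentences with many tensed verbs are not penalized merely for being long. The probability table, the `<s>` marker and the function names are hypothetical.

```python
import math

# Hypothetical intra-sentence bigram probabilities; "<s>" marks the sequence start
INTRA_BIGRAM = {
    ("<s>", "past"): 0.5, ("<s>", "present"): 0.4, ("<s>", "future"): 0.1,
    ("past", "past"): 0.8, ("past", "present"): 0.15, ("past", "future"): 0.05,
    ("present", "present"): 0.7, ("present", "past"): 0.2, ("present", "future"): 0.1,
}

def sequence_logprob(tenses, floor=1e-6):
    """Sum of bigram log-probabilities over one sentence's tense sequence."""
    total, prev = 0.0, "<s>"
    for tense in tenses:
        total += math.log(INTRA_BIGRAM.get((prev, tense), floor))
        prev = tense
    return total

def average_length(sequences):
    """Mean number of tensed verbs per sentence in the training data."""
    return sum(len(seq) for seq in sequences) / len(sequences)

def intra_sentence_feature(tenses, avg_len):
    """Sequence log-probability rescaled to the average length (assumed normalization)."""
    if not tenses:
        return 0.0  # a sentence without tensed verbs contributes nothing
    return sequence_logprob(tenses) * avg_len / len(tenses)

training_sequences = [["past", "past"], ["present"], ["past", "past", "present"]]
avg_len = average_length(training_sequences)  # (2 + 1 + 3) / 3 = 2.0
```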
In some embodiments, the outputting the translation result of the target sentence according to the temporal information includes:
Analyzing the context information of the target sentence through a part-of-speech tagger in the phrase-based statistical machine translation system;
and outputting the translation result of the target sentence according to the context information and the temporal information of the target sentence.
In some embodiments, the outputting the translation result of the target sentence according to the context information and the temporal information of the target sentence includes:
determining, according to the context information and the temporal information of the target sentence, the verb correspondence by which Chinese nodes of the target sentence are mapped to English nodes;
and outputting a translation result of the target sentence based on the verb correspondence.
To achieve the above object, a second aspect of an embodiment of the present application proposes a machine translation apparatus, including:
the input module is used for inputting the target sentence into a preset phrase-based statistical machine translation system, wherein the phrase-based statistical machine translation system comprises a temporal model based on an N-gram;
the analysis module is used for analyzing the target sentence through the N-gram-based temporal model to obtain the inter-sentence temporal characteristics and the intra-sentence temporal characteristics of the target sentence;
The determining module is used for determining the temporal information of the target sentence according to the inter-sentence temporal characteristics and the intra-sentence temporal characteristics;
and the output module is used for outputting the translation result of the target sentence according to the temporal information.
To achieve the above object, a third aspect of the embodiments of the present application proposes an electronic device, including a memory storing a computer program and a processor implementing the method according to the first aspect when the processor executes the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of the first aspect.
The application provides a machine translation method and device, electronic equipment and a storage medium. A target sentence is input into a preset phrase-based statistical machine translation system that comprises an N-gram-based temporal model; the target sentence is analyzed through the N-gram-based temporal model to obtain its inter-sentence temporal features and intra-sentence temporal features; the temporal information of the target sentence is determined according to these features; and the translation result of the target sentence is output according to the temporal information. Because the inter-sentence and intra-sentence temporal features are integrated into the phrase-based SMT system, temporal information can be fully utilized when the translation result is produced. This can remarkably improve machine translation quality, allows translation information to be effectively integrated, and offers good extensibility, thereby promoting overall translation quality, facilitating practical application in multiple scenarios, and improving user satisfaction. In the medical text translation scenario, the translation quality of medical texts can be markedly improved.
Drawings
FIG. 1 is a flow chart of a machine translation method provided by an embodiment of the present application;
FIG. 2 is a flow chart of an N-gram based temporal model training method;
FIG. 3 is a flowchart of step S203 in FIG. 2;
FIG. 4 is a flowchart of step S302 in FIG. 3;
FIG. 5 is a flowchart of step S303 in FIG. 3;
FIG. 6 is a flowchart of step S104 in FIG. 1;
FIG. 7 is a flowchart of step S602 in FIG. 6;
FIG. 8 is a system flow diagram of a machine translation method provided by one embodiment of the present application;
FIG. 9 is a schematic diagram of a machine translation device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
First, several terms used in the present application are explained:
Artificial intelligence (AI): a new technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, language recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. Artificial intelligence is also the theory, method, technique, and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Machine translation: also known as automatic translation, the process of using a computer to convert one natural language (the source language) into another natural language (the target language). It is a branch of computational linguistics, is one of the ultimate goals of artificial intelligence, and has important scientific research value.
Statistical machine translation (SMT): a machine translation approach that performs comparatively well in unrestricted domains. The basic idea of statistical machine translation is to construct a statistical translation model by performing statistical analysis on a large number of parallel corpora, and then to use the model for translation. Statistical machine translation progressed from early word-based translation to phrase-based translation, and has since incorporated syntactic information to further improve translation accuracy.
Natural language processing (NLP): the discipline that studies the language problems arising in human interaction with computers. According to the difficulty of technical implementation, such systems can be divided into three types: simple matching, fuzzy matching, and paragraph understanding.
N-gram: a language model commonly used in large-vocabulary continuous speech recognition; for Chinese, it is referred to as the Chinese language model (CLM). The Chinese language model uses collocation information between adjacent words in the context to realize automatic conversion to Chinese characters.
Corpus: a large-scale electronic text library that has been scientifically sampled and processed, storing language material that has actually occurred in real language use.
Part-of-speech tagging (POS tagging): the process of tagging each word in a word segmentation result with its correct part of speech, i.e., determining whether each word is a noun, a verb, an adjective, or another part of speech.
Minimum error rate training (MERT): optimizes a given criterion by tuning feature weights on a tuning set. Common optimization criteria include information entropy, BLEU, and TER. This stage requires decoding the tuning set with the decoder multiple times; each decoding pass yields the N highest-scoring results, after which the feature weights are adjusted. When the weights are adjusted, the ranking of the N results also changes, and the highest-scoring result, i.e. the decoding result, is used to calculate the BLEU score or TER. When a new set of weights improves the score over the whole tuning set, the next round of decoding begins. This is repeated until no further improvement is observed.
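The weight-tuning loop described for minimum error rate training can be illustrated with a deliberately simplified sketch. It assumes fixed N-best lists, a precomputed per-hypothesis error, and a grid search over a single weight; an actual MERT implementation re-decodes the tuning set between rounds and optimizes a corpus-level criterion such as BLEU with a line search. All data values and function names below are invented for the example.

```python
def rerank(nbest, weights):
    """Return the hypothesis with the highest weighted feature sum."""
    return max(nbest, key=lambda h: sum(w * f for w, f in zip(weights, h["features"])))

def corpus_error(nbest_lists, weights):
    """Total error of the top-ranked hypotheses over the whole tuning set."""
    return sum(rerank(nbest, weights)["error"] for nbest in nbest_lists)

def tune_second_weight(nbest_lists, candidates):
    """Grid-search the second weight (the first is fixed at 1.0) to minimize error."""
    return min(candidates, key=lambda w: corpus_error(nbest_lists, [1.0, w]))

# Toy tuning set: features = [base model score, tense feature]; error lies in [0, 1]
nbest_lists = [
    [{"features": [-1.0, -3.0], "error": 0.6},
     {"features": [-1.2, -0.5], "error": 0.2}],
    [{"features": [-2.0, -0.4], "error": 0.1},
     {"features": [-1.9, -2.5], "error": 0.5}],
]
candidates = [0.0, 0.5, 1.0]
best_w = tune_second_weight(nbest_lists, candidates)
```

With the tense feature switched off (weight 0.0) the re-ranker picks the higher-error hypothesis in both toy lists; any positive candidate weight moves the lower-error hypotheses to the top, which is the effect the tuning step is meant to find.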
Based on the above, the embodiment of the application provides a machine translation method and device, an electronic device and a storage medium. A target sentence is input into a preset phrase-based statistical machine translation system that comprises an N-gram-based temporal model; the target sentence is analyzed through the N-gram-based temporal model to obtain its inter-sentence temporal features and intra-sentence temporal features; the temporal information of the target sentence is determined according to these features; and the translation result of the target sentence is output according to the temporal information. Because the inter-sentence and intra-sentence temporal features are integrated into the phrase-based SMT system, temporal information can be fully utilized when the translation result is produced. This can remarkably improve machine translation quality, allows translation information to be effectively integrated, and offers good extensibility, thereby promoting overall translation quality, facilitating practical application in multiple scenarios, and improving user satisfaction. In the medical text translation scenario, the translation quality of medical texts can be markedly improved.
The machine translation method, the device, the electronic equipment and the storage medium provided by the embodiment of the application are specifically described through the following embodiments, and the machine translation method in the embodiment of the application is described first.
The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The embodiment of the application provides a machine translation method, which relates to the technical field of artificial intelligence. The machine translation method provided by the embodiment of the application can be applied to the terminal, can be applied to the server side, and can also be software running in the terminal or the server side. In some embodiments, the terminal may be a smart phone, tablet, notebook, desktop, etc.; the server side can be configured as an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and a cloud server for providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligent platforms and the like; the software may be an application or the like that implements a machine translation method, but is not limited to the above form.
The application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
It should be noted that, in each specific embodiment of the present application, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of the data comply with related laws and regulations and standards. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through popup or jump to a confirmation page and the like, and after the independent permission or independent consent of the user is definitely acquired, the necessary relevant data of the user for enabling the embodiment of the application to normally operate is acquired.
Fig. 1 is an optional flowchart of a machine translation method according to an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S104.
Step S101, inputting a target sentence into a preset phrase-based statistical machine translation system, wherein the phrase-based statistical machine translation system comprises a temporal model based on an N-gram;
Step S102, analyzing a target sentence through a temporal model based on an N-gram to obtain inter-sentence temporal characteristics and intra-sentence temporal characteristics of the target sentence;
Step S103, determining the temporal information of the target sentence according to the inter-sentence temporal characteristics and the intra-sentence temporal characteristics;
Step S104, outputting the translation result of the target sentence according to the temporal information.
It should be noted that the application can be applied to translation scenarios with high requirements on English translation quality, such as news translation, legal translation and medical translation, and can be adjusted in a timely manner according to the translated text, so that the service content is delivered efficiently.
In step S101 of some embodiments, a target sentence is input to a preset phrase-based statistical machine translation (SMT) system, which includes an N-gram-based temporal model. The target sentence is the sentence to be translated; for example, in a digital medical scenario, the target sentence is a medical text to be translated. The sentence to be translated is input to the phrase-based SMT system for translation, and the translation direction includes, but is not limited to, English to Chinese.
In step S102 of some embodiments, the target sentence is analyzed through the N-gram-based temporal model to obtain the inter-sentence temporal feature and the intra-sentence temporal feature of the target sentence. In the application, the N-gram-based temporal model is combined with the phrase-based SMT system to improve tense consistency between the source text and its translation. In the N-gram-based temporal model, the main tense of a sentence is called the inter-sentence tense (i.e., the document-level temporal feature), and the tense of each target word within the sentence is called the intra-sentence tense (i.e., the sentence-level temporal feature). To use these two features, the inter-sentence and intra-sentence tenses of an English corpus are obtained from English syntactic analysis results by a related algorithm, and the resulting tense-annotated corpus is used to train the N-gram-based temporal model. All translation hypotheses produced by the decoder are then re-scored according to the scores of the two temporal models, and the highest-scoring translation is taken as the final result. When the translation of a sentence is finished, the main tense of that sentence is saved in the decoder cache and passed on to the next sentence; the main-tense cache is cleared once the translation of the document is finished. In this way the target-side tense is propagated across sentences.
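The re-scoring and cross-sentence caching described above can be sketched as follows. This is a minimal illustration under stated assumptions: each hypothesis is a dictionary that already carries a base decoder score and a recognized main tense, only the inter-sentence feature is applied, and the probability table, the weight and the example sentences are invented for the example.

```python
import math

FLOOR = math.log(1e-6)  # fallback log-probability for unseen tense pairs

# Hypothetical inter-sentence model: log P(main tense | previous main tense)
INTER = {
    ("past", "past"): math.log(0.7), ("past", "present"): math.log(0.3),
    ("present", "present"): math.log(0.6), ("present", "past"): math.log(0.4),
}

def rescore(hyp, prev_main_tense, weight):
    """Base decoder score plus the weighted inter-sentence tense feature."""
    if prev_main_tense is None:  # first sentence of a document: nothing cached yet
        return hyp["score"]
    return hyp["score"] + weight * INTER.get((prev_main_tense, hyp["main_tense"]), FLOOR)

def translate_document(nbest_per_sentence, weight=1.0):
    """Pick one hypothesis per sentence, carrying the main tense forward in a cache."""
    cache = None  # main-tense cache, empty at the start of every document
    chosen = []
    for nbest in nbest_per_sentence:
        best = max(nbest, key=lambda h: rescore(h, cache, weight))
        chosen.append(best["text"])
        cache = best["main_tense"]  # passed on to the next sentence
    return chosen  # the cache is local, so it is discarded when the document ends

document = [
    [{"text": "The patient was admitted.", "main_tense": "past", "score": -1.0},
     {"text": "The patient is admitted.", "main_tense": "present", "score": -1.5}],
    [{"text": "He receives treatment.", "main_tense": "present", "score": -2.0},
     {"text": "He received treatment.", "main_tense": "past", "score": -2.1}],
]
with_tense = translate_document(document, weight=1.0)
without_tense = translate_document(document, weight=0.0)
```

With the tense feature active, the second sentence follows the cached past tense of the first; with the weight set to zero, the decoder falls back to the base score and selects the present-tense hypothesis.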
In step S103 of some embodiments, the temporal information of the target sentence is determined according to the inter-sentence temporal feature and the intra-sentence temporal feature, and the sentence-level temporal feature and the document-level temporal feature are integrated into the phrase-based SMT system by introducing the inter-sentence temporal feature and the intra-sentence temporal feature.
In step S104 of some embodiments, the translation result of the target sentence is output according to the temporal information. Because temporal information is fully utilized, this not only remarkably improves machine translation quality but also effectively integrates translation information and offers good extensibility, thereby promoting overall translation quality, facilitating practical application in multiple scenarios, and improving user satisfaction.
In some embodiments, take as an example machine translation in which the N-gram-based temporal model is combined with a phrase-based statistical machine translation (SMT) system. During decoding, once a hypothesis covers all source-side words, the decoder first obtains the temporal sequence of the hypothesis and calculates the intra-sentence temporal feature Fs. At the same time, it identifies the main tense of the sentence, relates it to the main tense of the previous sentence, and calculates the inter-sentence temporal feature Fm. The decoder then uses these two additional features to re-score the hypotheses. After selecting the highest-scoring hypothesis as the final translation of the sentence, the decoder caches its main tense and passes it on to the next sentence. After a whole document has been processed, the decoder clears the cache.
SMT systems often produce unusual translations, partly because of abnormal word order and incomplete text. A syntactic parser does not work well on such output, so the methods previously used to parse the main tense and temporal sequence of regular text are not applicable here. The Stanford POS tagger may therefore be applied to the SMT output, because phrase-based SMT output contains short contexts that a POS tagger can still use when a parser fails. Once a complete hypothesis is obtained, the decoder passes it to the Stanford POS tagger and obtains the full temporal sequence from the tensed verbs. Because the POS tagger does not return hierarchical structure, the decoder cannot identify the main tense from the temporal sequence alone. However, since Chinese verbs correlate well with English verbs, the main tensed verb of the SMT output can be found from the top-level "VV" node (a Chinese verb) of the source-side parse tree: the English word corresponding to this node is a tensed verb that expresses the main tense of the English sentence. Thus, before translating a sentence, the decoder first parses the source and records the top-level Chinese "VV" node. Once a complete hypothesis is generated, the decoder maps this node to its English position using the phrase alignment information and obtains the main tense from the POS tags. If the English position mapped from the Chinese node does not contain a tensed verb, the tense type cannot be determined directly; in that case the system searches the three words to the left and right of that position for a tensed verb corresponding to the top-level Chinese verb. In this way, the main tense of the English output can be identified at a rate of up to 83%, and the two additional features, the sentence-level temporal feature and the document-level temporal feature, are successfully integrated into the phrase-based SMT system, thereby improving translation quality.
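By way of a non-limiting illustration, the re-scoring and main-tense caching described above might be sketched as follows. The hypothesis dictionaries, the feature callables `fs` and `fm`, and the weights `w_s` and `w_m` are hypothetical names introduced only for this sketch; the application does not prescribe a particular data structure or scoring combination.

```python
def translate_document(sentence_hypotheses, fs, fm, w_s=1.0, w_m=1.0):
    """Pick the best hypothesis per sentence, carrying the main tense forward.

    sentence_hypotheses: one list per source sentence; each hypothesis is a dict
        with keys 'text', 'score', 'tenses' and 'main_tense' (assumed layout).
    fs: callable(tenses) -> intra-sentence feature value (assumed interface).
    fm: callable(previous_main_tense, main_tense) -> inter-sentence feature value.
    """
    cached_main_tense = None  # no previous sentence at the start of a document
    translations = []
    for hypotheses in sentence_hypotheses:
        # Re-score every complete hypothesis with the two additional features.
        best = max(
            hypotheses,
            key=lambda h: h["score"]
            + w_s * fs(h["tenses"])
            + w_m * fm(cached_main_tense, h["main_tense"]),
        )
        translations.append(best["text"])
        # Cache the main tense of the chosen hypothesis for the next sentence.
        cached_main_tense = best["main_tense"]
    # The cache is local to this call, so it is discarded when the document ends.
    return translations
```

In this sketch a hypothesis with a lower base score can still be selected when its main tense agrees with that of the previous sentence.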
In some embodiments, when a medical text is translated, the target sentence is parsed by the N-gram-based temporal model to obtain its inter-sentence and intra-sentence temporal features. The temporal information of the target sentence is then determined from these features, and both features are integrated into the phrase-based SMT system. The temporal information is thereby fully utilized when the translation result is output, which can significantly improve machine translation quality, effectively integrate translation information, and offer good extensibility, thereby improving overall translation quality and translation accuracy.
In steps S101 to S104 of the embodiments of the present application, a target sentence is input into a preset phrase-based statistical machine translation system that includes an N-gram-based temporal model; the target sentence is parsed by the N-gram-based temporal model to obtain its inter-sentence and intra-sentence temporal features; the temporal information of the target sentence is determined from the inter-sentence and intra-sentence temporal features; and the translation result of the target sentence is output according to the temporal information. Because both features are integrated into the phrase-based SMT system, the temporal information is fully utilized when the translation result is output, which can significantly improve machine translation quality, effectively integrate translation information, and offer good extensibility, thereby promoting practical application in multiple scenarios and improving user satisfaction. In medical text translation scenarios, the quality of medical text translation can be significantly improved.
Referring to FIG. 2, in some embodiments, the N-gram based temporal model may be derived by training including, but not limited to, steps S201 through S204:
step S201, obtaining training sentences from a corpus;
step S202, performing temporal extraction on the training sentences to obtain temporal sequences of the training sentences;
step S203, analyzing the temporal sequence of the training sentence to obtain inter-sentence temporal characteristics and intra-sentence temporal characteristics;
step S204, generating a temporal model based on the N-gram based on the inter-sentence temporal features and the intra-sentence temporal features.
In some embodiments, training sentences are obtained from a corpus, and tense extraction is performed on them to obtain the temporal sequence of each training sentence. The temporal sequences are analyzed to obtain inter-sentence and intra-sentence temporal features, where the main tense of a sentence is the inter-sentence tense at the document level and the tense of each target word in the sentence is the intra-sentence tense at the sentence level. The N-gram-based temporal model is then generated from the inter-sentence and intra-sentence temporal features.
In some embodiments, after the main tense of each sentence is extracted, N-gram models are built for the intra-sentence tense and the inter-sentence (document-level) tense and integrated into the SMT system. The application parses the sentences in the training corpus, extracts their temporal sequences, and builds the N-gram models. These models can be used to estimate how plausible a combination of tenses within a sentence is, and also to supervise the SMT system with respect to sentence-level problems such as inconsistent tenses among coordinated verbs in a simple sentence, or between a clause and its main clause in a compound sentence.
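As a minimal sketch of steps S201 to S204, the following shows how a bigram temporal model could be estimated from already extracted temporal sequences. The tense inventory `TENSES`, the function name, and the add-alpha smoothing are assumptions made for illustration; the application does not specify the estimator.

```python
from collections import Counter

# Assumed inventory of four tense types; the labels themselves are hypothetical.
TENSES = ("present", "past", "future", "unknown")


def train_bigram_tense_model(sequences, alpha=1.0):
    """Return prob(cur, prev), an estimate of P(cur | prev) over tense labels.

    sequences: iterable of tense-label lists. Pass one list of main tenses per
        document for the inter-sentence model, or one list of word tenses per
        sentence for the intra-sentence model.
    alpha: add-alpha smoothing constant (an assumption, not from the source).
    """
    bigram_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigram_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def prob(cur, prev):
        # Smoothed relative frequency, so unseen bigrams keep non-zero mass.
        return (bigram_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * len(TENSES)
        )

    return prob
```

The same routine can be called twice, once on document-level main-tense sequences and once on sentence-level sequences, to obtain the two models described above.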
Referring to fig. 3, in some embodiments, step S203 may include, but is not limited to, steps S301 to S303:
step S301, analyzing a temporal sequence of the training sentence to obtain a main temporal of the training sentence and a temporal of each target word in the training sentence;
step S302, calculating inter-sentence temporal features according to the main temporal state of the training sentences;
step S303, calculating intra-sentence temporal features according to the temporal state of each target word in the training sentence.
In some embodiments, the temporal sequence of the training sentence is analyzed to obtain the main tense of the training sentence and the tense of each target word in it. The main tense of a sentence is the inter-sentence tense at the document level, and the tense of each target word in the sentence is the intra-sentence tense at the sentence level. The inter-sentence temporal feature is calculated from the main tense of the training sentence, and the intra-sentence temporal feature is calculated from the tense of each target word in the training sentence.
Referring to fig. 4, in some embodiments, step S302 may include, but is not limited to, steps S401 to S403:
step S401, a first temporal sequence of the main temporal of the training sentence is obtained;
step S402, adjusting the weight of the first temporal sequence based on a minimum error rate training method;
step S403, calculating the inter-sentence temporal feature according to the first temporal sequence and the weight of the first temporal sequence.
In some embodiments, a first temporal sequence of the main tenses of the training sentences is obtained, the weight of the first temporal sequence is adjusted by minimum error rate training (MERT), and the inter-sentence temporal feature is calculated from the first temporal sequence and its weight.
In some embodiments, given the main-tense sequence t1, …, tm of a document, the inter-sentence temporal feature Fm may be calculated according to the following formula:
where P(·) in the formula can be estimated by the formula. Notably, the first sentence of a document usually carries no tense, because in most cases it corresponds to the title. For the first sentence, the P(·) value is set to 1/4 (there are 4 tense types).
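The formula for Fm is not reproduced in this text. Purely as an illustrative assumption, the sketch below scores a document's main-tense sequence as the mean of its bigram probabilities and assigns the first sentence a uniform value of 1/4 over four tense types. The function name, the `prob` callable, and the averaging are hypothetical and should not be read as the formula of the application.

```python
def inter_sentence_feature(main_tenses, prob, num_tenses=4):
    """Assumed form of Fm: mean transition probability over t1..tm.

    main_tenses: list of main-tense labels, one per sentence of the document.
    prob: callable(cur, prev) -> P(cur | prev), e.g. a trained bigram model.
    """
    if not main_tenses:
        return 0.0
    # The first sentence has no predecessor, so it gets the uniform value.
    scores = [1.0 / num_tenses]
    scores += [prob(cur, prev) for prev, cur in zip(main_tenses, main_tenses[1:])]
    return sum(scores) / len(scores)


def toy_prob(cur, prev):
    """Toy transition model used only to exercise the function."""
    return 0.7 if cur == prev else 0.1


consistent = inter_sentence_feature(["past", "past", "past"], toy_prob)
mixed = inter_sentence_feature(["past", "present", "past"], toy_prob)
```

Under the toy model, a document whose main tense stays the same receives a higher value than one whose main tense switches at every sentence.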
Referring to fig. 5, in some embodiments, step S303 may include, but is not limited to, steps S501 to S503:
step S501, a second temporal sequence of the temporal state of each target word in the training sentence is obtained;
step S502, determining the average length of the second temporal sequence;
step S503, calculating intra-sentence temporal features according to the average length of the second temporal sequence.
In some embodiments, a second temporal sequence consisting of the tense of each target word in the training sentence is obtained, the average length of the second temporal sequence is determined, and the intra-sentence temporal feature is calculated according to that average length.
In some embodiments, given the temporal sequence s1, …, se (e > 1) of a sentence, the intra-sentence temporal feature Fs is calculated as follows:
The square root operator is used here to avoid penalizing translations with long temporal sequences. Notably, if the sentence contains only one tense, the P(·) value in the formula is also set to 1/4. Since the average length of intra-sentence temporal sequences is about 2.5, the intra-sentence bigram model is mainly considered, that is, n is set to 2.
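The formula for Fs is likewise missing from this text. The following sketch assumes one possible reading: the square root of the mean bigram probability over the intra-sentence temporal sequence, with a probability of 1/4 (uniform over four tense types) when the sentence has fewer than two tenses. The function name, the `prob` callable, and the exact placement of the square root are assumptions, not the formula of the application.

```python
import math


def intra_sentence_feature(tenses, prob, num_tenses=4):
    """Assumed form of Fs: square root of the mean bigram probability.

    tenses: tense labels s1..se found in one translated sentence.
    prob: callable(cur, prev) -> P(cur | prev) from a bigram (n = 2) model.
    """
    if len(tenses) < 2:
        # Zero or one tense: fall back to the uniform probability 1/num_tenses.
        return math.sqrt(1.0 / num_tenses)
    probs = [prob(cur, prev) for prev, cur in zip(tenses, tenses[1:])]
    # Averaging keeps long sequences from being penalized for their length.
    return math.sqrt(sum(probs) / len(probs))


def toy_prob(cur, prev):
    """Toy transition model used only to exercise the function."""
    return 0.7 if cur == prev else 0.1
```

Because the value is an average rather than a product, a sentence with many tensed verbs is not scored lower merely for being long.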
Referring to fig. 6, in some embodiments, step S104 may include, but is not limited to, steps S601 to S602:
step S601, analyzing the context information of the target sentence through a part-of-speech tagger in a phrase-based statistical machine translation system;
step S602, outputting the translation result of the target sentence according to the context information and the temporal information of the target sentence.
In some embodiments, the context information of the target sentence is analyzed by a part-of-speech (POS) tagger in the phrase-based SMT system, where the POS tagger includes, but is not limited to, the Stanford POS tagger. SMT systems often produce unusual translations, partly because of abnormal word order and incomplete text. A syntactic parser does not work well on such output, so the methods previously used to parse the main tense and temporal sequence of regular text are not applicable here. The Stanford POS tagger may therefore be applied to the SMT output, because phrase-based SMT output contains short contexts that a POS tagger can still use when a parser fails. Once a complete hypothesis is obtained, the decoder passes it to the Stanford POS tagger and obtains the full temporal sequence from the tensed verbs, and the translation result of the target sentence is output according to the context information and the temporal information of the target sentence.
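To illustrate how a temporal sequence could be read off POS-tagged output, the sketch below takes (word, tag) pairs using Penn Treebank tags, which is the kind of output an English POS tagger typically produces, and keeps only tensed verbs. The mapping from tags to tense labels and the treatment of "will" and "shall" as future markers are simplifying assumptions; the tagger itself is not invoked here.

```python
# Assumed mapping from Penn Treebank tags of finite verbs to tense labels.
TAG_TO_TENSE = {"VBD": "past", "VBP": "present", "VBZ": "present"}
FUTURE_MODALS = {"will", "shall", "'ll"}  # hypothetical list of future markers


def tense_sequence(tagged_tokens):
    """Return the tense labels of the tensed verbs, in sentence order.

    tagged_tokens: list of (word, tag) pairs for one hypothesis.
    """
    sequence = []
    for word, tag in tagged_tokens:
        if tag in TAG_TO_TENSE:
            sequence.append(TAG_TO_TENSE[tag])
        elif tag == "MD" and word.lower() in FUTURE_MODALS:
            sequence.append("future")
        # Non-finite forms (VB, VBG, VBN) carry no tense and are skipped.
    return sequence


example = [
    ("The", "DT"), ("patient", "NN"), ("was", "VBD"), ("admitted", "VBN"),
    ("and", "CC"), ("will", "MD"), ("recover", "VB"),
]
```

For the example sentence, only "was" and "will" contribute to the temporal sequence, since the participle and the bare infinitive are not tensed.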
Referring to fig. 7, in some embodiments, step S602 may include, but is not limited to, steps S701 to S702:
step S701, determining, according to the context information and the temporal information of the target sentence, the verb correspondence between the Chinese node and the English node of the target sentence;
step S702, outputting the translation result of the target sentence based on the verb correspondence.
In some embodiments, taking translation between English and Chinese as an example, the verb correspondence between the Chinese node and the English node of the target sentence is determined according to the context information and the temporal information of the target sentence. Since Chinese verbs correlate well with English verbs, the main tensed verb of the SMT output can be found from the top-level "VV" node (a Chinese verb) of the source-side parse tree: the English word corresponding to this node is a tensed verb that expresses the main tense of the English sentence. Thus, before translating a sentence, the decoder first parses the source and records the top-level Chinese "VV" node. Once a complete hypothesis is generated, the decoder maps this node to its English position using the phrase alignment information and obtains the main tense from the POS tags. If the English position mapped from the Chinese node does not contain a tensed verb, the tense type cannot be determined directly; in that case the system searches the three words to the left and right of that position for a tensed verb corresponding to the top-level Chinese verb, and the translation result of the target sentence is output based on this verb correspondence. In this way, the main tense of the English output can be identified at a rate of up to 83%, and the two additional features, the sentence-level temporal feature and the document-level temporal feature, are successfully integrated into the phrase-based SMT system, thereby improving translation quality.
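The mapping from the top-level Chinese "VV" node to the English main tense might be sketched as follows. The alignment is simplified to a dictionary from source word index to target word index, and the function name, the tag-to-tense mapping, and the nearest-first search order are assumptions for illustration only.

```python
# Assumed mapping from Penn Treebank tags of finite verbs to tense labels.
TENSED_TAGS = {"VBD": "past", "VBP": "present", "VBZ": "present"}


def main_tense(src_vv_index, alignment, tagged_target, window=3):
    """Find the main tense of a hypothesis from the top-level source verb.

    src_vv_index: position of the top-level Chinese "VV" word in the source.
    alignment: dict mapping source index -> target index (a simplified,
        word-level view of the phrase alignment information).
    tagged_target: list of (word, tag) pairs for the English hypothesis.
    window: number of words searched on each side of the mapped position.
    """
    center = alignment.get(src_vv_index)
    if center is None:
        return None
    # Visit offsets 0, -1, +1, -2, +2, ... so the nearest tensed verb wins.
    for offset in sorted(range(-window, window + 1), key=abs):
        j = center + offset
        if 0 <= j < len(tagged_target):
            word, tag = tagged_target[j]
            if tag in TENSED_TAGS:
                return TENSED_TAGS[tag]
            if tag == "MD" and word.lower() in ("will", "shall"):
                return "future"
    return None  # no tensed verb within the window


target = [("He", "PRP"), ("has", "VBZ"), ("been", "VBN"), ("treated", "VBN")]
```

Here the source verb aligns to the participle "treated", which is not tensed, so the search moves outward and finds the auxiliary "has" two words to the left.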
On this basis, in statistical machine translation, the sentences in the training corpus can be parsed, their temporal sequences extracted, and the N-gram-based temporal model built, so that the model can be used to estimate how plausible a combination of tenses within a sentence is and to supervise the SMT system with respect to sentence-level grammatical problems. The application is suitable for translation scenarios with high requirements on the quality of English translation, such as news translation, legal translation, and medical translation. The N-gram-based temporal model successfully integrates two features, the sentence-level temporal feature and the document-level temporal feature, into the phrase-based SMT system. This can significantly improve machine translation quality, effectively integrate translation information, and offer good extensibility, thereby promoting practical application in multiple scenarios and improving user satisfaction. In medical text translation scenarios, the quality of medical text translation can be significantly improved.
The machine translation method of the present application is further described below with reference to the drawings and specific examples.
Referring to fig. 8, take as an example machine translation in which the N-gram-based temporal model is combined with a phrase-based statistical machine translation (SMT) system. During decoding, once a hypothesis covers all source-side words, the decoder first obtains the temporal sequence of the hypothesis and calculates the intra-sentence temporal feature Fs. At the same time, it identifies the main tense of the sentence, relates it to the main tense of the previous sentence, and calculates the inter-sentence temporal feature Fm. The decoder then uses these two additional features to re-score the hypotheses. After selecting the highest-scoring hypothesis as the final translation of the sentence, the decoder caches its main tense and passes it on to the next sentence. After a whole document has been processed, the decoder clears the cache.
Because SMT output often has abnormal word order or incomplete text, a syntactic parser does not work well on it, and the methods previously used to parse the main tense and temporal sequence of regular text are not applicable. The Stanford POS tagger may therefore be applied to the SMT output, since phrase-based SMT output contains short contexts that a POS tagger can still use when a parser fails. Once a complete hypothesis is obtained, the decoder passes it to the Stanford POS tagger and obtains the full temporal sequence from the tensed verbs. Because the POS tagger does not return hierarchical structure, the decoder cannot identify the main tense from the temporal sequence alone. However, since Chinese verbs correlate well with English verbs, the main tensed verb of the SMT output can be found from the top-level "VV" node (a Chinese verb) of the source-side parse tree: the English word corresponding to this node is a tensed verb that expresses the main tense of the English sentence. Thus, before translating a sentence, the decoder first parses the source and records the top-level Chinese "VV" node. Once a complete hypothesis is generated, the decoder maps this node to its English position using the phrase alignment information and obtains the main tense from the POS tags. If the English position mapped from the Chinese node does not contain a tensed verb, the system searches the three words to the left and right of that position for a tensed verb corresponding to the top-level Chinese verb. In this way, the main tense of the English output can be identified at a rate of up to 83%, and the sentence-level and document-level temporal features are successfully integrated into the phrase-based SMT system, thereby improving translation quality.
On this basis, the target sentence is input into a preset phrase-based statistical machine translation system that includes an N-gram-based temporal model; the target sentence is parsed by the N-gram-based temporal model to obtain its inter-sentence and intra-sentence temporal features; the temporal information of the target sentence is determined from these features; and the translation result of the target sentence is output according to the temporal information. Because both features are integrated into the phrase-based SMT system, the temporal information is fully utilized, which can significantly improve machine translation quality, effectively integrate translation information, and offer good extensibility, thereby promoting practical application in multiple scenarios and improving user satisfaction. In medical text translation scenarios, the quality of medical text translation can be significantly improved.
Referring to fig. 9, an embodiment of the present application further provides a machine translation device, which may implement the machine translation method, where the device includes:
an input module 910, configured to input the target sentence into a preset phrase-based statistical machine translation system, where the phrase-based statistical machine translation system includes an N-gram-based temporal model;
the parsing module 920 is configured to parse the target sentence through a temporal model based on the N-gram, so as to obtain inter-sentence temporal features and intra-sentence temporal features of the target sentence;
a determining module 930, configured to determine temporal information of the target sentence according to the inter-sentence temporal feature and the intra-sentence temporal feature;
and the output module 940 is configured to output a translation result of the target sentence according to the temporal information.
In some embodiments of the present application, the input module 910 inputs the target sentence into a preset phrase-based statistical machine translation system, wherein the phrase-based statistical machine translation system includes an N-gram-based temporal model; the parsing module 920 parses the target sentence through a temporal model based on the N-gram to obtain inter-sentence temporal features and intra-sentence temporal features of the target sentence; the determining module 930 determines the temporal information of the target sentence according to the inter-sentence temporal feature and the intra-sentence temporal feature; the output module 940 outputs the translation result of the target sentence according to the temporal information.
In some embodiments of the present application, the input module 910 inputs the target sentence into a preset phrase-based statistical machine translation (SMT) system that includes an N-gram-based temporal model. The target sentence is the sentence to be translated; it is input into the phrase-based SMT system for translation, and the translation direction includes, but is not limited to, English to Chinese.
In some embodiments of the present application, the parsing module 920 parses the target sentence through the N-gram-based temporal model to obtain the inter-sentence and intra-sentence temporal features of the target sentence. In the application, the temporal model, which is based on the N-gram language model, is combined with the phrase-based SMT system to improve tense consistency before and after translation. In the N-gram-based temporal model, the main tense of a sentence is called the document-level inter-sentence tense (i.e., the document-level temporal feature), and the tense of each target word within the sentence is called the sentence-level intra-sentence tense (i.e., the sentence-level temporal feature). To use these two features, the inter-sentence and intra-sentence tenses of an English corpus are obtained from English syntactic parsing results by a related algorithm, and the resulting tense-annotated corpus is used to train the N-gram-based temporal model. All hypothesized translations of the decoder are re-scored according to the scores of the two temporal models, and the highest-scoring translation is taken as the final result. When the translation of a sentence is finished, its main tense is saved in the decoder cache and passed to the next sentence; the main-tense cache is cleared once the whole document has been translated. Tense information of the target language is thereby carried across sentences.
In some embodiments of the present application, the determining module 930 determines the temporal information of the target sentence according to the inter-sentence temporal feature and the intra-sentence temporal feature, and integrates the sentence-level temporal feature and the document-level temporal feature into the phrase-based SMT system by introducing the inter-sentence temporal feature and the intra-sentence temporal feature.
In some embodiments of the present application, the output module 940 outputs the translation result of the target sentence according to the temporal information, so that the temporal information is fully utilized. This can significantly improve machine translation quality, effectively integrate translation information, and offer good extensibility, thereby promoting practical application in multiple scenarios and improving user satisfaction.
On this basis, in the machine translation device of the embodiments of the present application, the input module 910 inputs the target sentence into a preset phrase-based statistical machine translation system that includes an N-gram-based temporal model; the parsing module 920 parses the target sentence through the N-gram-based temporal model to obtain the inter-sentence and intra-sentence temporal features of the target sentence; the determining module 930 determines the temporal information of the target sentence according to the inter-sentence and intra-sentence temporal features; and the output module 940 outputs the translation result of the target sentence according to the temporal information.
Because the inter-sentence and intra-sentence temporal features are integrated into the phrase-based SMT system, the temporal information is fully utilized when the translation result is output, which can significantly improve machine translation quality, effectively integrate translation information, and offer good extensibility, thereby promoting practical application in multiple scenarios and improving user satisfaction. In medical text translation scenarios, the quality of medical text translation can be significantly improved.
The embodiment of the machine translation device is basically the same as the embodiment of the machine translation method described above, and will not be described herein.
The embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the machine translation method when executing the computer program. The electronic equipment can be any intelligent terminal including a tablet personal computer, a vehicle-mounted computer and the like.
Referring to fig. 10, fig. 10 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
The processor 1001 may be implemented by a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes related programs to implement the technical solutions provided by the embodiments of the present application.
The memory 1002 may be implemented in the form of read-only memory (ROM), static storage, dynamic storage, or random access memory (RAM). The memory 1002 may store an operating system and other application programs. When the technical solutions provided by the embodiments of the present application are implemented by software or firmware, the relevant program code is stored in the memory 1002 and invoked by the processor 1001 to execute the machine translation method of the embodiments of the present application, that is: inputting a target sentence into a preset phrase-based statistical machine translation system that includes an N-gram-based temporal model; parsing the target sentence through the N-gram-based temporal model to obtain the inter-sentence and intra-sentence temporal features of the target sentence; determining the temporal information of the target sentence according to the inter-sentence and intra-sentence temporal features; and outputting the translation result of the target sentence according to the temporal information.
Because the inter-sentence and intra-sentence temporal features are integrated into the phrase-based SMT system, the temporal information is fully utilized when the translation result is output, which can significantly improve machine translation quality, effectively integrate translation information, and offer good extensibility, thereby promoting practical application in multiple scenarios and improving user satisfaction. In medical text translation scenarios, the quality of medical text translation can be significantly improved.
An input/output interface 1003 for implementing information input and output.
The communication interface 1004 is configured to implement communication between the device and other devices, either in a wired manner (such as USB or network cable) or in a wireless manner (such as mobile network, Wi-Fi, or Bluetooth).
A bus that transfers information between the various components of the device, such as the processor 1001, memory 1002, input/output interfaces 1003, and communication interfaces 1004.
Wherein the processor 1001, the memory 1002, the input/output interface 1003, and the communication interface 1004 realize communication connection between each other inside the device through a bus.
The embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the machine translation method when being executed by a processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiment of the application provides a machine translation method, a machine translation device, electronic equipment and a storage medium, wherein a target voice signal is acquired; preprocessing a target voice signal to obtain a semantic information feature vector; pooling the semantic information feature vectors to obtain dimension reduction feature vectors; sequentially inputting the dimension-reducing feature vector into a plurality of cascade task classifiers from low to high according to the semantic hierarchy of the target task to obtain classification prediction results output by the task classifiers; and identifying emotion information of the target voice signal based on classification prediction results corresponding to the plurality of target tasks. Based on the method, semantic information feature vectors are extracted from voice signals, the semantic information feature vectors are pooled, the obtained dimension reduction feature vectors are input into a plurality of cascaded task classifiers for multitasking machine translation, and compared with the prior art that the parallel classifiers are adopted to directly output various classification results, the task classifiers are sequentially cascaded together according to the semantic hierarchy of tasks, so that the input voice feature hierarchy is gradually deepened, emotion information can be extracted layer by layer and interacted with the next-layer task, classification prediction results of different semantic hierarchies output by the task classifiers are obtained, emotion information of the voice signals is recognized based on the classification prediction results of the different semantic hierarchies, therefore, high-level semantics can be effectively extracted from the voice signals, the classification effect of the multitasking model is further improved, and the voice machine translation is more accurate. 
In medical text translation scenarios, the quality of medical text translation can be significantly improved.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable programs, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable programs, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The embodiments described herein are intended to explain the technical solutions of the embodiments of the present application more clearly and do not limit those technical solutions. Those skilled in the art will recognize that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by persons skilled in the art that the embodiments of the application are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A is present, only B is present, or both A and B are present, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of a single item or plural items. For example, at least one of a, b or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural.
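The enumeration given for "at least one of a, b or c" corresponds to the non-empty subsets of the listed items. A minimal Python sketch of that reading follows; the function name `at_least_one_of` is illustrative only and does not appear in the application.

```python
from itertools import combinations


def at_least_one_of(items):
    """Return every non-empty combination of the given items, smallest first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


combos = at_least_one_of(["a", "b", "c"])
for combo in combos:
    # Each combination is printed in the "x and y" form used in the text.
    print(" and ".join(combo))
```

For three items this yields the seven combinations listed in the paragraph above.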
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing a program.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings; they do not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A machine translation method, the method comprising:
inputting a target sentence into a preset phrase-based statistical machine translation system, wherein the phrase-based statistical machine translation system comprises an N-gram-based temporal model;
analyzing the target sentence through the N-gram-based temporal model to obtain inter-sentence temporal features and intra-sentence temporal features of the target sentence;
determining temporal information of the target sentence according to the inter-sentence temporal features and the intra-sentence temporal features;
and outputting a translation result of the target sentence according to the temporal information.
2. The method according to claim 1, wherein the N-gram based temporal model is trained by:
obtaining training sentences from a corpus;
performing temporal extraction on the training sentences to obtain temporal sequences of the training sentences;
analyzing the temporal sequences of the training sentences to obtain inter-sentence temporal features and intra-sentence temporal features;
and generating the N-gram-based temporal model from the inter-sentence temporal features and the intra-sentence temporal features.
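The training steps of claim 2 can be sketched as fitting an N-gram model over sequences of temporal (tense) tags. The sketch below is only an illustration under stated assumptions: the claims do not fix the N-gram order, the smoothing scheme, or the tag inventory, so the bigram order, add-alpha smoothing, the tags `PAST`/`PRESENT`, and the class name `TenseNGramModel` are all hypothetical, and the tense-extraction step is assumed to have already produced the tag sequences.

```python
import math
from collections import Counter


class TenseNGramModel:
    """Add-alpha smoothed n-gram model over tense-tag sequences (illustrative only)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram order (assumed; not fixed by the claims)
        self.alpha = alpha  # smoothing constant (assumed)
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def _pad(self, seq):
        # Add start markers for the first contexts and one end marker.
        return ["<s>"] * (self.n - 1) + list(seq) + ["</s>"]

    def fit(self, sequences):
        for seq in sequences:
            padded = self._pad(seq)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.ngram_counts[context + (padded[i],)] += 1
                self.context_counts[context] += 1
                self.vocab.add(padded[i])
        return self

    def prob(self, context, tag):
        # Smoothed conditional probability P(tag | context).
        context = tuple(context)
        numerator = self.ngram_counts[context + (tag,)] + self.alpha
        denominator = self.context_counts[context] + self.alpha * len(self.vocab)
        return numerator / denominator

    def log_prob(self, seq):
        # Log-probability of a whole tag sequence, including the end marker.
        padded = self._pad(seq)
        return sum(
            math.log(self.prob(padded[i - self.n + 1:i], padded[i]))
            for i in range(self.n - 1, len(padded))
        )


# Hypothetical tense sequences, assumed to come from an upstream extraction step.
training_sequences = [["PAST", "PAST", "PRESENT"], ["PAST", "PAST"]]
model = TenseNGramModel(n=2).fit(training_sequences)
print(model.log_prob(["PAST", "PAST"]))
```

The same class could be fitted once on main-tense sequences across sentences and once on per-word tense sequences within sentences, which is one possible way to obtain separate inter-sentence and intra-sentence scores.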
3. The method of claim 2, wherein analyzing the temporal sequence of training sentences to obtain inter-sentence temporal features and intra-sentence temporal features comprises:
analyzing the temporal sequence of the training sentences to obtain a main temporal state of the training sentences and a temporal state of each target word in the training sentences;
calculating the inter-sentence temporal features according to the main temporal state of the training sentences;
and calculating the intra-sentence temporal features according to the temporal state of each target word in the training sentences.
4. The method of claim 3, wherein calculating the inter-sentence temporal features according to the main temporal state of the training sentences comprises:
acquiring a first temporal sequence of the main temporal state of the training sentences;
adjusting the weight of the first temporal sequence based on a minimum error rate training method;
and calculating the inter-sentence temporal features according to the first temporal sequence and the weight of the first temporal sequence.
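Claim 4 tunes the weight of the first temporal sequence by minimum error rate training. Standard minimum error rate training uses an exact line search over n-best lists; the sketch below deliberately substitutes a plain grid search over a single weight to show the idea of picking the weight that minimizes error on held-out data. The candidate layout `(base_score, tense_feature, error)`, the function name `tune_weight`, and all numbers are hypothetical.

```python
def tune_weight(dev_nbest, grid):
    """Pick the feature weight whose top-ranked candidates have the lowest total error.

    dev_nbest: list of n-best lists; each candidate is a tuple
               (base_score, tense_feature, error), a layout assumed for illustration.
    grid:      candidate weights to try (grid search stands in for MERT line search).
    """
    best_weight, best_error = None, float("inf")
    for weight in grid:
        total_error = 0.0
        for nbest in dev_nbest:
            # The decoder would output the candidate with the highest combined score.
            top = max(nbest, key=lambda c: c[0] + weight * c[1])
            total_error += top[2]
        if total_error < best_error:
            best_weight, best_error = weight, total_error
    return best_weight, best_error


# Two hypothetical development sentences with two candidates each.
dev_nbest = [
    [(-1.0, -5.0, 1), (-1.5, -1.0, 0)],
    [(-2.0, -0.5, 0), (-1.8, -4.0, 1)],
]
weight, error = tune_weight(dev_nbest, grid=[0.0, 0.5, 1.0])
print(weight, error)
```

With a weight of zero the tense feature is ignored and both erroneous candidates win; a positive weight lets the tense feature promote the error-free candidates.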
5. The method of claim 3, wherein calculating the intra-sentence temporal features according to the temporal state of each target word in the training sentences comprises:
acquiring a second temporal sequence of the temporal state of each target word in the training sentences;
determining an average length of the second temporal sequence;
and calculating the intra-sentence temporal features according to the average length of the second temporal sequence.
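Claim 5 derives the intra-sentence feature from the average length of the per-word temporal sequences, without stating how the average enters the feature. As a rough sketch only, the code below computes that average and then uses a relative-length ratio as a purely hypothetical feature; the function names, the tags, and the ratio itself are assumptions, not the method of the application.

```python
from statistics import mean


def average_tense_sequence_length(sequences):
    """Mean number of tense tags per sentence; an empty corpus yields 0.0."""
    return mean(len(seq) for seq in sequences) if sequences else 0.0


def relative_length(seq, average_length):
    """Hypothetical intra-sentence feature: sequence length relative to the average."""
    return len(seq) / average_length if average_length else 0.0


# Hypothetical per-word tense sequences for three training sentences.
word_tenses = [
    ["PAST", "PAST"],
    ["PRESENT", "PAST", "PAST", "FUTURE"],
    ["PRESENT", "PRESENT", "PRESENT"],
]
avg = average_tense_sequence_length(word_tenses)
print(avg, relative_length(word_tenses[1], avg))
```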
6. The method according to claim 1, wherein outputting the translation result of the target sentence according to the temporal information comprises:
analyzing the context information of the target sentence through a part-of-speech tagger in the phrase-based statistical machine translation system;
and outputting the translation result of the target sentence according to the context information and the temporal information of the target sentence.
7. The method of claim 6, wherein the outputting the translation result of the target sentence according to the context information and the temporal information of the target sentence comprises:
determining, according to the context information and the temporal information of the target sentence, a verb correspondence between Chinese nodes and English nodes for the target sentence;
and outputting a translation result of the target sentence based on the verb correspondence.
8. A machine translation apparatus, the apparatus comprising:
an input module, configured to input a target sentence into a preset phrase-based statistical machine translation system, wherein the phrase-based statistical machine translation system comprises an N-gram-based temporal model;
an analysis module, configured to analyze the target sentence through the N-gram-based temporal model to obtain inter-sentence temporal features and intra-sentence temporal features of the target sentence;
a determining module, configured to determine temporal information of the target sentence according to the inter-sentence temporal features and the intra-sentence temporal features;
and an output module, configured to output a translation result of the target sentence according to the temporal information.
9. An electronic device comprising a memory storing a computer program and a processor that when executing the computer program implements the machine translation method of any of claims 1 to 7.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the machine translation method of any one of claims 1 to 7.
Application CN202310672071.5A, filed 2023-06-07 (priority date 2023-06-07), published as CN116644764A (en) on 2023-08-25; status: Pending.

Title: Machine translation method and device, electronic equipment and storage medium

Family ID: 87624542

Country: CN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination