CN116432666A - Text evaluation method and device, electronic equipment and storage medium - Google Patents

Text evaluation method and device, electronic equipment and storage medium

Publication number
CN116432666A
Authority
CN
China
Prior art keywords
evaluated
text
phrase
target
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111649387.XA
Other languages
Chinese (zh)
Inventor
严渊蒙
杨振
孟凡东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111649387.XA
Publication of CN116432666A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/51 Translation evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of computers, in particular to a text evaluation method, a text evaluation device, electronic equipment and a storage medium, which are used for improving the accuracy of quality evaluation of machine translation text. The method comprises the following steps: obtaining a text to be evaluated, produced by machine translation of an original text, and a corresponding standard translation text; performing phrase alignment between candidate phrases to be evaluated in the text to be evaluated and standard phrases in the standard translation text to obtain the standard phrase corresponding to each candidate phrase to be evaluated; replacing at least one target phrase to be evaluated among the candidate phrases to be evaluated with its corresponding standard phrase, and determining the confusion degree variation of the text to be evaluated before and after replacement; and determining the target label of each target phrase to be evaluated based on the confusion degree variation. Because the target label of a target phrase to be evaluated is determined from the confusion degree variation of the text to be evaluated before and after that phrase is replaced, the accuracy of quality evaluation of the machine translation text can be effectively improved.

Description

Text evaluation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a text evaluation method, a text evaluation device, an electronic device, and a storage medium.
Background
In recent years, with the development of artificial intelligence, and in particular the increasing maturity of deep learning technology, artificial intelligence has been widely applied in various industries and has greatly improved production efficiency. In machine translation, a subfield of natural language processing, neural-network-based machine translation has also achieved good results. However, current machine translation still does not reach the level of a professional translator, so machine translation output needs to be checked to locate possible translation errors.
In the related art, in order to evaluate and detect the quality of a machine translation text, a translation error rate toolkit (Translation Error Rate Toolkit, TER toolkit) is mainly used to perform word level alignment on a machine translation (Machine Translation, MT) sentence and a reference translation sentence, and obtain a translation tag of each word in the MT sentence so as to label the word level of the MT sentence.
However, because the TER toolkit performs word-level alignment by exact string matching, words and sentence components that have the same meaning as the reference and contain no translation error may fail to align and are then marked as translation errors, which makes the quality evaluation of MT sentences inaccurate. Therefore, how to accurately evaluate the translation quality of machine translation text is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a text evaluation method, a text evaluation device, electronic equipment and a storage medium, which are used for improving the quality evaluation accuracy of machine translation text.
The text evaluation method provided by the embodiment of the application comprises the following steps:
obtaining a text to be evaluated obtained by performing machine translation on an original text and a standard translation text corresponding to the original text;
carrying out phrase alignment on each candidate phrase to be evaluated in the text to be evaluated and the standard phrases in the standard translation text to obtain standard phrases corresponding to each candidate phrase to be evaluated;
respectively replacing at least one target phrase to be evaluated in the candidate phrases to be evaluated with a corresponding standard phrase, and determining the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced, wherein the confusion degree characterizes the semantic fluency of the text to be evaluated;
and determining target labels of corresponding target phrases to be evaluated based on the confusion degree variation.
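The replacement-and-scoring step above can be illustrated with a minimal sketch. It assumes that the "confusion degree" corresponds to language-model perplexity, that the variation is computed as the value after replacement minus the value before (the sign convention is not stated in this passage), and that a toy add-one-smoothed unigram model stands in for whatever language model an implementation would actually use. All function names are hypothetical.

```python
import math
from collections import Counter

def make_unigram_lm(corpus_tokens):
    """Toy stand-in language model: add-one-smoothed unigram log-probabilities."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words

    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab))

    return logprob

def perplexity(tokens, logprob):
    """exp of the negative mean token log-probability."""
    return math.exp(-sum(logprob(t) for t in tokens) / len(tokens))

def replace_span(tokens, start, end, replacement):
    """Return a copy of tokens with tokens[start:end] replaced."""
    return tokens[:start] + list(replacement) + tokens[end:]

def confusion_variation(tokens, start, end, replacement, logprob):
    """Assumed convention: perplexity after replacement minus perplexity before."""
    before = perplexity(tokens, logprob)
    after = perplexity(replace_span(tokens, start, end, replacement), logprob)
    return after - before

lm = make_unigram_lm("the cat sat on the mat".split())
mt_tokens = ["the", "dog", "sat"]
delta = confusion_variation(mt_tokens, 1, 2, ["cat"], lm)
```

In this toy setup, replacing the unseen word "dog" with the seen word "cat" lowers the perplexity, so `delta` is negative.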
The text evaluation device provided by the embodiment of the application comprises:
the acquisition unit is used for acquiring a text to be evaluated obtained by performing machine translation on the original text and a standard translation text corresponding to the original text;
The alignment unit is used for carrying out phrase alignment on each candidate phrase to be evaluated in the text to be evaluated and the standard phrases in the standard translation text to obtain standard phrases corresponding to each candidate phrase to be evaluated;
the replacing unit is used for respectively replacing at least one target phrase to be evaluated in the candidate phrases to be evaluated with a corresponding standard phrase, and determining the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced, wherein the confusion degree characterizes the semantic fluency of the text to be evaluated;
and the determining unit is used for determining the target label of the corresponding target phrase to be evaluated based on the confusion degree variation.
Optionally, the alignment unit is specifically configured to:
performing word alignment between each word to be evaluated in the text to be evaluated and each standard word in the standard translation text, based on the similarity between the words to be evaluated and the standard words, to obtain the standard word corresponding to each word to be evaluated;
determining at least one candidate phrase to be evaluated consisting of the words to be evaluated aiming at the text to be evaluated, wherein each candidate phrase to be evaluated comprises at least one word to be evaluated;
And extracting the standard phrases from the standard translation text based on the words to be evaluated and the corresponding standard words contained in each candidate phrase to be evaluated, and obtaining the standard phrases corresponding to each candidate phrase to be evaluated.
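The three alignment steps above can be sketched as follows. The similarity measure is not specified in this passage, so the sketch assumes a simple character-overlap ratio from the standard library as a placeholder; the threshold `min_sim`, the greedy best-match strategy, and the rule of taking the smallest reference span that covers all aligned words are likewise illustrative assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Placeholder similarity: character-overlap ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def align_words(mt_words, ref_words, min_sim=0.5):
    """Map each MT word index to its most similar reference index, or None."""
    alignment = {}
    for i, word in enumerate(mt_words):
        best_j, best_score = max(
            ((j, similarity(word, ref)) for j, ref in enumerate(ref_words)),
            key=lambda pair: pair[1],
        )
        alignment[i] = best_j if best_score >= min_sim else None
    return alignment

def standard_phrase(span, alignment, ref_words):
    """Reference words covering everything aligned to the MT span [start, end)."""
    start, end = span
    targets = [alignment[i] for i in range(start, end) if alignment[i] is not None]
    if not targets:
        return []
    return ref_words[min(targets):max(targets) + 1]

mt = ["the", "cat", "sat"]
ref = ["a", "cat", "sat", "down"]
word_alignment = align_words(mt, ref)
phrase = standard_phrase((0, 3), word_alignment, ref)
```

A real system would more plausibly compare word embeddings than surface strings; the placeholder only keeps the example self-contained.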
Optionally, the alignment unit is specifically configured to:
based on the positions of the words to be evaluated in the text to be evaluated, obtaining candidate phrases to be evaluated composed of the words to be evaluated, wherein adjacent words to be evaluated contained in each candidate phrase to be evaluated are also adjacent in the text to be evaluated; or
and carrying out component syntactic analysis on the text to be evaluated to obtain phrases which are contained in the text to be evaluated and accord with the specified grammar rules, and taking the phrases which accord with the specified grammar rules as candidate phrases to be evaluated in the text to be evaluated.
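The first option, building candidate phrases from positionally adjacent words, amounts to enumerating contiguous word spans. A minimal sketch follows; the length cap `max_len` is an assumption added to keep the number of candidates bounded and does not come from the text. The second option would require a syntactic parser and is not shown.

```python
def contiguous_candidates(words, max_len=3):
    """Enumerate (start, end) spans of adjacent words, at most max_len words each."""
    spans = []
    for start in range(len(words)):
        last_end = min(start + max_len, len(words))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans

words = ["he", "likes", "apples"]
spans = contiguous_candidates(words, max_len=2)
phrases = [" ".join(words[s:e]) for s, e in spans]
```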
Optionally, the target phrase under evaluation includes at least one of:
a candidate phrase to be evaluated whose initial tag is wrong, wherein the initial tag is obtained by aligning the words to be evaluated in the text to be evaluated with the standard words in the standard translation text based on a preset editing frequency rule;
a phrase contained in the text to be evaluated that conforms to the specified grammar rules, obtained by carrying out component syntactic analysis on the text to be evaluated.
Optionally, the determining unit is specifically configured to:
for the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced:
if the confusion degree variation is smaller than a first threshold, setting the target label of the corresponding target phrase to be evaluated to correct;
if the confusion degree variation is not smaller than the first threshold, taking the initial label as the target label, wherein the initial label is obtained by aligning the words to be evaluated in the text to be evaluated with the standard words in the standard translation text based on a preset editing frequency rule.
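The first-threshold rule above reduces to a small decision function. The sketch below uses the OK/BAD label names that the description uses elsewhere for correct and wrong; the threshold value is a placeholder, since no concrete value is given in this passage.

```python
def target_label(variation, initial_label, first_threshold):
    """Relabel a phrase as OK when the variation is below the first threshold."""
    if variation < first_threshold:
        return "OK"
    # Otherwise keep the label produced by the edit-based word alignment.
    return initial_label

FIRST_THRESHOLD = 0.5  # placeholder value for illustration only
relabelled = target_label(0.2, "BAD", FIRST_THRESHOLD)
kept = target_label(0.9, "BAD", FIRST_THRESHOLD)
```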
Optionally, the initial tag for each target phrase under evaluation is determined in the following manner:
aiming at one target phrase to be evaluated, carrying out word alignment on the word to be evaluated in the target phrase to be evaluated and the standard word in the standard translation text based on a preset editing frequency rule;
if each word to be evaluated in the target phrase to be evaluated is consistent with the corresponding standard word, the initial label of the target phrase to be evaluated is correct;
if each word to be evaluated in the target phrase to be evaluated is inconsistent with the corresponding standard word, the initial label of the target phrase to be evaluated is wrong.
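A minimal sketch of the initial-label rule follows, assuming the word alignment has already been computed and is passed in as a list of aligned standard words (None where a word has no counterpart). The passage only covers the cases where every word matches or every word differs; treating a partially matching phrase as wrong is an assumption of this sketch.

```python
def initial_label(phrase_words, aligned_standard_words):
    """Return OK if every word equals its aligned standard word, else BAD.

    aligned_standard_words[i] is the standard word aligned to phrase_words[i],
    or None when the word could not be aligned.
    """
    matches = [
        std is not None and word == std
        for word, std in zip(phrase_words, aligned_standard_words)
    ]
    # Assumption: a phrase with only some matching words is also labelled BAD.
    return "OK" if all(matches) else "BAD"

all_match = initial_label(["red", "apples"], ["red", "apples"])
none_match = initial_label(["likes", "very"], ["loves", None])
```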
Optionally, the determining unit is specifically configured to:
determining the word confusion degree variation of each target phrase to be evaluated based on the confusion degree variation and the word number of the corresponding target phrase to be evaluated;
sorting the word confusion degree variations of the target phrases to be evaluated, and determining the minimum word confusion degree variation;
if the minimum word confusion degree variation is smaller than a second threshold, the target label of the target phrase to be evaluated corresponding to the minimum word confusion degree variation is wrong;
if the minimum word confusion degree variation is not smaller than a second threshold value, all target tags of the target phrases to be evaluated are correct.
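The second-threshold rule can be sketched as below. Two points are assumptions rather than statements from the text: the per-word variation is taken to be the phrase-level variation divided by the phrase's word count, and phrases other than the one with the minimum value are left labelled OK, because the passage does not say how they are handled when the minimum falls below the threshold.

```python
def label_by_min_per_word_variation(variations, word_counts, second_threshold):
    """Label only the phrase with the smallest per-word variation as BAD,
    and only if that smallest value is below the second threshold."""
    per_word = [v / n for v, n in zip(variations, word_counts)]
    labels = ["OK"] * len(per_word)
    smallest = min(range(len(per_word)), key=per_word.__getitem__)
    if per_word[smallest] < second_threshold:
        labels[smallest] = "BAD"
    return labels

# Three target phrases with 2, 1 and 1 words respectively (illustrative numbers).
labels = label_by_min_per_word_variation([-6.0, -1.0, 0.5], [2, 1, 1], -2.0)
```

With these illustrative numbers the per-word values are -3.0, -1.0 and 0.5, so only the first phrase falls below the threshold of -2.0.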
Optionally, after determining the target tag of the corresponding target phrase to be evaluated based on each confusion degree variation, the apparatus further includes a construction unit configured to:
constructing parallel corpus based on a text to be evaluated obtained by performing machine translation on an original text, a standard translation text corresponding to the text to be evaluated and target labels corresponding to target phrases to be evaluated in the text to be evaluated;
model training is carried out based on the parallel corpus, a trained machine translation quality evaluation model is obtained, and the machine translation quality evaluation model is used for carrying out quality evaluation labeling on machine translation texts obtained by machine translation.
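One possible shape for a corpus record is sketched below. The field names, the JSON Lines serialization, and the expansion of phrase-level labels into word-level OK/BAD tags are all illustrative assumptions; the passage only states which three pieces of information the corpus is built from.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class QESample:
    mt: str               # text to be evaluated (machine translation output)
    reference: str        # standard translation text
    phrase_labels: list   # (start, end, "OK" or "BAD") spans over MT word positions

def word_tags(sample):
    """Expand phrase-level labels into one tag per MT word (default OK)."""
    tags = ["OK"] * len(sample.mt.split())
    for start, end, label in sample.phrase_labels:
        if label == "BAD":
            tags[start:end] = ["BAD"] * (end - start)
    return tags

def to_jsonl(samples):
    """Serialize samples as JSON Lines, one training record per line."""
    return "\n".join(
        json.dumps({**asdict(s), "tags": word_tags(s)}, ensure_ascii=False)
        for s in samples
    )

sample = QESample("he likes red apples", "he loves apples", [(1, 3, "BAD"), (3, 4, "OK")])
record = to_jsonl([sample])
```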
An electronic device provided in an embodiment of the present application includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, causes the processor to execute any one of the steps of the text evaluation method described above.
The embodiment of the application provides a computer readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to execute the steps of any one of the text evaluation methods described above.
Embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium; when the processor of the electronic device reads the computer program from the computer-readable storage medium, the processor executes the computer program so that the electronic device performs the steps of any one of the text evaluation methods described above.
The beneficial effects of the application are as follows:
the text evaluation method, the device, the electronic equipment and the storage medium provided by the embodiment of the application firstly acquire a text to be evaluated obtained by performing machine translation on an original text and a standard translation text corresponding to the original text; then, each candidate phrase to be evaluated in the text to be evaluated is phrase-aligned with the standard phrases in the standard translation text to obtain the standard phrase corresponding to each candidate phrase to be evaluated, so that phrases conforming to the sentence grammar structure can be obtained; further, at least one target phrase to be evaluated in each candidate phrase to be evaluated is replaced by a corresponding standard phrase respectively, and the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced is determined; and the target label of the corresponding target phrase to be evaluated is determined based on each confusion degree variation. Because the target label of a target phrase to be evaluated is determined from the confusion degree variation of the text to be evaluated before and after that phrase is replaced, fewer phrases or sentence components in the text to be evaluated that have the same meaning as the standard translation text are left unaligned and marked as errors, as happens when word-level alignment is performed with the TER toolkit. This improves the accuracy of the labels of the phrases to be evaluated and, in turn, the accuracy of quality evaluation of the machine translation text.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1A is a schematic text diagram of an embodiment of the present application;
FIG. 1B is a schematic diagram of one of the related art for generating labeling labels;
FIG. 2 is a flow chart of an automatic generation of labeling labels in the related art;
FIG. 3 is a schematic diagram of a text labeled in the related art;
FIG. 4 is a schematic diagram of a text after dividing the text in the related art;
FIG. 5 is an alternative schematic diagram of an application scenario in an embodiment of the present application;
FIG. 6 is a flowchart of an implementation of a text evaluation method in an embodiment of the present application;
FIG. 7 is a schematic diagram of phrase replacement and confusion scoring in embodiments of the present application;
FIG. 8 is a schematic diagram of a word alignment method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of syntax analysis of syntax tree-based components in an embodiment of the present application;
FIG. 10 is a schematic diagram of a phrase alignment method in an embodiment of the present application;
FIG. 11 is a schematic diagram of obtaining an initial tag according to an embodiment of the present application;
FIG. 12A is a schematic diagram of obtaining a target tag according to an embodiment of the present application;
FIG. 12B is a schematic diagram of the results of a text evaluation method in an embodiment of the present application;
FIG. 13 is a schematic diagram of a text evaluation apparatus according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a composition structure of a text evaluation apparatus in an embodiment of the present application;
FIG. 15 is a schematic diagram of a hardware composition structure of an electronic device to which the embodiments of the present application are applied;
FIG. 16 is a schematic diagram of a hardware composition structure of another electronic device to which the embodiments of the present application are applied.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, which can be made by a person of ordinary skill in the art without any inventive effort, based on the embodiments described in the present application are intended to be within the scope of the technical solutions of the present application.
Some of the concepts involved in the embodiments of the present application are described below.
Neural machine translation (Neural Machine Translation, NMT) model: a model that performs translation with a neural network. Because current machine translation still does not reach the level of a professional translator, its translation results still need to be checked, and possible translation errors marked to alert the user.
Word-level machine translation quality estimation task (Word-level Machine Translation Quality Estimation, word-level MTQE): one of the machine translation quality assessment tasks, in which word-by-word labels locate the components of the MT sentence that contain translation errors (each label is OK or BAD; OK indicates no translation error, BAD indicates a translation error).
MT sentence: the translated sentence obtained by translating an original sentence with a neural machine translation model, i.e., the text to be evaluated in the embodiments of the application. Machine translation quality is evaluated by labeling translation errors in the MT sentence.
Post-edit (Post Edit, PE) sentence: the sentence obtained after a professional translator post-edits the MT sentence and corrects its translation errors. It serves as the reference sentence when the translation quality of the MT sentence is evaluated, i.e., the standard translation text in the embodiments of the application.
TER toolkit: a toolkit that matches the MT sentence and the PE sentence at the word level based on the principle of the minimum number of edits, yielding the labels of the MT sentence in the MTQE task and the translation error rate of the MT sentence.
Human direct assessment (Human's Direct Assessment, Human's DA): a professional translator directly marks the translation errors in the MT sentence. The accuracy of an MTQE model's labeling of MT sentences can be judged against this human direct assessment.
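To make the OK/BAD labels and the minimum-edit matching in the definitions above concrete, the following sketch tags MT words with a plain word-level Levenshtein alignment. It is a simplified stand-in and not the TER toolkit itself: TER additionally allows block shifts, which are omitted here, and the function name is hypothetical.

```python
def edit_tags(mt, pe):
    """Tag each MT word OK if it is matched exactly in a minimum-edit
    alignment with the PE sentence, otherwise BAD. Returns (tags, distance)."""
    n, m = len(mt), len(pe)
    # dist[i][j] = minimum number of edits turning mt[:i] into pe[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = dist[i - 1][j - 1] + (mt[i - 1] != pe[j - 1])
            dist[i][j] = min(substitute, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Trace one optimal path back from the bottom-right corner.
    tags = ["BAD"] * n
    i, j = n, m
    while i > 0 and j > 0:
        if mt[i - 1] == pe[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            tags[i - 1] = "OK"      # exact match
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1     # substitution
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1                  # MT word has no counterpart
        else:
            j -= 1                  # PE word has no counterpart
    return tags, dist[n][m]

tags, distance = edit_tags("he likes red apples".split(), "he loves apples".split())
```

Note that "likes" is tagged BAD even though "loves" may be an acceptable rendering, which is the exact-match limitation that the embodiments set out to address.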
Embodiments of the present application relate to artificial intelligence (Artificial Intelligence, AI), natural language processing (Natural Language Processing, NLP), and machine learning (Machine Learning, ML) techniques, and are designed based on computer vision techniques and machine learning in artificial intelligence.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence.
Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of sensing, reasoning and decision-making. Artificial intelligence technology mainly covers computer vision, natural language processing, machine learning/deep learning, automatic driving, intelligent traffic and other directions. With the research and progress of artificial intelligence technology, artificial intelligence has been developed and applied in many fields, such as smart homes, intelligent customer service, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, robots and smart medical care. It is believed that, as the technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value.
Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graph techniques, and the like.
Machine learning is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Compared with data mining, which looks for shared characteristics within big data, machine learning focuses more on algorithm design, so that a computer can automatically learn rules from data and use those rules to make predictions on unknown data.
Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, induction learning, and the like. The machine translation quality assessment model in the embodiment of the application is trained by adopting a machine learning or deep learning technology. Based on the text evaluation method in the embodiment of the application, the accuracy of the machine translation quality evaluation model can be improved.
The following briefly describes the design concept of the embodiment of the present application:
In recent years, with the development of artificial intelligence, and in particular the increasing maturity of deep learning technology, artificial intelligence has been widely applied in various industries and has greatly improved production efficiency. In machine translation, a subfield of natural language processing, neural-network-based machine translation has also achieved good results. However, current machine translation still does not reach the level of a professional translator, so machine translation output needs to be checked to locate possible translation errors.
In the related art, the following two modes are mainly adopted:
evaluation mode one: as shown in fig. 1A, first, an original sentence is sampled from a source language corpus, a corresponding MT sentence is generated by using an NMT model, and then a professional translator rewrites the MT sentence to obtain a PE sentence. Then, as shown in fig. 1B, according to the principle of the minimum editing times, the word-by-word alignment relationship of the MT sentence and the PE sentence is generated by the TER toolkit according to the MT sentence and the PE sentence. If the word in the MT sentence can be aligned to a certain word in the PE sentence, the word is considered to contain no translation error (tag OK), and if the word cannot be aligned, the word is considered to contain a translation error (tag BAD).
Evaluation mode two: as shown in fig. 2, a large-scale machine translation parallel corpus (containing original sentences and corresponding target sentences) is first split into a training set and a test set, and an NMT model is trained on the training set. The trained NMT model then translates the original sentences in the test set to obtain MT sentences, the target sentences in the test set are used as pseudo PE sentences, and the MTQE labels of the MT sentences are generated with the TER toolkit. Using 10-fold cross validation, the training set and the test set are divided in a 9:1 ratio and the label generation process is repeated 10 times, so that MTQE labels are finally generated for all samples in the machine translation parallel corpus.
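The 9:1 rotation used in the second mode can be sketched as a plain k-fold split. The interleaved fold assignment below is one simple choice and is an assumption; training the NMT model and tagging with the TER toolkit are left out, with comments marking where they would occur.

```python
def k_fold_splits(pairs, k=10):
    """Yield (train, held_out) splits so that every pair is held out exactly once."""
    folds = [pairs[i::k] for i in range(k)]
    for held_out_index in range(k):
        train = [
            pair
            for fold_index in range(k)
            if fold_index != held_out_index
            for pair in folds[fold_index]
        ]
        yield train, folds[held_out_index]

# Placeholder corpus of (original sentence, target sentence) pairs.
corpus = [(f"src {i}", f"tgt {i}") for i in range(20)]
labelled_sources = []
for train, held_out in k_fold_splits(corpus, k=10):
    # 1. Train an NMT model on `train` (not shown).
    # 2. Translate the originals in `held_out` to obtain MT sentences (not shown).
    # 3. Use the targets in `held_out` as pseudo PE sentences and tag the MT
    #    sentences with an edit-based aligner (not shown).
    labelled_sources.extend(src for src, _ in held_out)
```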
Although evaluation mode one labels MT sentences against high-quality PE sentences as the standard translation text, it requires manual intervention, the amount of text that can be labeled is very limited (on the order of ten thousand), and the cost is high. In addition, the TER toolkit performs word-level alignment by exact string matching. On the one hand, as shown in fig. 3, some words and sentence components with the same meaning (synonyms, free translations, etc.) cannot be mapped by the TER toolkit; the bold words represent words that cannot be mapped. On the other hand, as shown in fig. 4, the TER toolkit has no knowledge of sentence grammar structure when it runs the minimum edit count algorithm for alignment, so the labeled spans are fragmented and do not correspond to the most appropriate phrases. As a result, words and sentence components that have the same meaning and contain no translation error may fail to align and are marked as translation errors, making the quality evaluation of MT sentences inaccurate.
Evaluation mode two, which automatically generates pseudo PE sentences, can label tens of millions of texts, but because it also uses the TER toolkit, it suffers from the same inaccurate quality evaluation of MT sentences as evaluation mode one. Therefore, how to accurately evaluate the translation quality of machine translation text is a problem to be solved.
In view of this, the embodiments of the present application provide a text evaluation method, apparatus, electronic device, and storage medium, where first, a text to be evaluated obtained by machine-translating an original text, and a standard translation text corresponding to the original text are obtained; then, each candidate phrase to be evaluated in the text to be evaluated is subjected to phrase alignment with the standard phrase in the standard translation text to obtain the standard phrase corresponding to each candidate phrase to be evaluated, and the phrase conforming to the sentence grammar structure can be obtained; further, at least one target phrase to be evaluated in each candidate phrase to be evaluated is replaced by a corresponding standard phrase respectively, and the confusion degree change amount of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced is determined; and determining target labels of corresponding target phrases to be evaluated based on the confusion degree variation. The target label of the target phrase to be evaluated is determined through the confusion degree variation of the text to be evaluated before and after the target phrase to be evaluated is replaced, so that the situation that phrases with the same meaning cannot be aligned and marked as errors can be reduced, the accuracy of the label of the phrase to be evaluated is effectively improved, and the quality evaluation accuracy of the machine translation text is improved.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and are not intended to limit the present application, and embodiments and features of embodiments of the present application may be combined with each other without conflict.
Fig. 5 is a schematic view of an application scenario in an embodiment of the present application. The application scenario diagram includes two terminal devices 510 and a server 520.
In the embodiment of the present application, the terminal device 510 includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a desktop computer, an electronic book reader, an intelligent voice interaction device, an intelligent home appliance, a vehicle-mounted terminal, and the like. The terminal device may be provided with a client related to quality assessment of machine-translated text; the client may be software (such as a browser or translation software), a web page, an applet, or the like. The server 520 may be a background server corresponding to the software, web page, or applet, or a server dedicated to quality assessment of machine-translated text, which is not specifically limited in this application. The server 520 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms.
Note that the text evaluation method in the embodiment of the present application may be performed by an electronic device, which may be the server 520 or the terminal device 510; that is, the method may be performed by the server 520 or the terminal device 510 alone, or by the server 520 and the terminal device 510 together. For example, when the method is performed by the server 520 alone, the server 520 obtains a text to be evaluated, obtained by performing machine translation on the original text, and a standard translation text corresponding to the original text; performs phrase alignment between each candidate phrase to be evaluated in the text to be evaluated and the standard phrases in the standard translation text to obtain the standard phrase corresponding to each candidate phrase to be evaluated; replaces at least one target phrase to be evaluated among the candidate phrases to be evaluated with the corresponding standard phrase, and determines the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated is replaced; and finally determines the target label of the corresponding target phrase to be evaluated based on the confusion degree variation.
In an alternative embodiment, the terminal device 510 and the server 520 may communicate via a communication network.
In an alternative embodiment, the communication network is a wired network or a wireless network.
It should be noted that, the number of terminal devices and servers shown in fig. 5 is merely illustrative, and the number of terminal devices and servers is not limited in practice, and is not specifically limited in the embodiments of the present application.
In the embodiment of the present application, when there are multiple servers, the servers may form a blockchain, with each server being a node on the blockchain; the original text, the text to be evaluated, and the standard translation text involved in the text evaluation method disclosed in the embodiment of the application may be stored on the blockchain.
In addition, the embodiment of the application can be applied to various scenes, including not only a machine translation quality evaluation scene, but also scenes such as cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
The text evaluation method provided by the exemplary embodiments of the present application will be described below with reference to the accompanying drawings in conjunction with the application scenario described above, and it should be noted that the application scenario described above is merely shown for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in any way in this respect.
Referring to fig. 6, which is a flowchart of an implementation of the text evaluation method provided in an embodiment of the present application. Taking a server as the executing entity as an example, the specific implementation flow of the method is as follows:
S601: the server obtains a text to be evaluated, which is obtained by performing machine translation on an original text, and a standard translation text corresponding to the original text;
the following description mainly takes the case in which the original text is English text and the text to be evaluated and the standard translation text are Chinese text as an example; in practice, the text evaluation method in the present application may be used to evaluate text in any language, which is not limited herein.
S602: the server performs phrase alignment on each candidate phrase to be evaluated in the text to be evaluated and the standard phrases in the standard translation text to obtain standard phrases corresponding to each candidate phrase to be evaluated;
for example, the text to be evaluated is MT sentence "i am happy is required to speak here" in fig. 3, and the standard translation text is PE sentence "is invited to speak here i am happy" in fig. 3. The correspondence between the candidate phrase under evaluation and the standard phrase may be "i happy" corresponds to "i happy", "requested" corresponds to "invited", "here" corresponds to "here", "speaking" corresponds to "speaking".
S603: the server respectively replaces at least one target phrase to be evaluated in the candidate phrases to be evaluated with a corresponding standard phrase, and determines the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced;
the confusion degree (that is, the perplexity assigned by a language model) characterizes the semantic fluency of the text to be evaluated. A language model may be trained with the KenLM language model toolkit to measure the rationality and fluency of the text to be evaluated, using all target-side sentences in the machine translation parallel corpus as the training corpus. The trained language model can then measure the confusion degree of a given sentence: the higher the confusion degree, the less fluent and the more abnormal the sentence, and hence the more likely it is to contain translation errors. In addition, the language model may be replaced by other sentence-level scoring models, such as a sentence-level quality assessment model or a phrase-level semantic similarity assessment model.
Specifically, for a selected phrase (target phrase to be evaluated) in an MT sentence (text to be evaluated), the phrase (standard phrase) corresponding to the selected phrase in the PE sentence (standard translation text) is first found according to the phrase alignment; the selected phrase in the MT sentence is then replaced by that corresponding phrase from the PE sentence; the confusion degrees of the two sentences before and after replacement are calculated with the language model and recorded as confusion degree 1 and confusion degree 2, respectively; and the confusion degree variation between the two is determined.
For example, still taking the text in fig. 3 as an example, as shown in fig. 7, the target phrase to be evaluated is "speak here", the "speak here" is replaced by the corresponding standard phrase "speak here", the text to be evaluated before replacement is "i very happy is required to speak here". The confusion degree obtained by inputting the text to be evaluated before replacement into the language model is 20.91; the text to be evaluated after the substitution is "i happy and is required to speak here". The confusion degree obtained by inputting the text to be evaluated after the substitution into the language model is 21.38, and the change amount of the confusion degree of the text to be evaluated is 0.47.
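The replace-and-rescore step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the add-alpha unigram model is a toy stand-in for the trained language model, and the names `make_unigram_ppl`, `replace_span`, and `perplexity_delta` are hypothetical. The variation is taken as the confusion degree after replacement minus the confusion degree before replacement, which matches the 21.38 − 20.91 = 0.47 example.

```python
import math
from collections import Counter


def make_unigram_ppl(corpus, alpha=1.0):
    """Build a perplexity function from a toy add-alpha unigram model.

    Assumption: this stands in for the trained language model; any
    sentence-level scorer with the same call shape could be used instead.
    """
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens

    def ppl(tokens):
        log_prob = sum(
            math.log((counts[t] + alpha) / (total + alpha * vocab)) for t in tokens
        )
        return math.exp(-log_prob / len(tokens))

    return ppl


def replace_span(tokens, start, end, replacement):
    """Return a copy of tokens with tokens[start:end] replaced."""
    return tokens[:start] + list(replacement) + tokens[end:]


def perplexity_delta(tokens, start, end, replacement, ppl):
    """Return (before, after, after - before) for one phrase replacement."""
    before = ppl(tokens)
    after = ppl(replace_span(tokens, start, end, replacement))
    return before, after, after - before


# Hypothetical usage: the corpus and sentences are illustrative only.
corpus = [["i", "am", "invited", "to", "speak", "here"]]
ppl = make_unigram_ppl(corpus)
mt = ["i", "am", "required", "to", "speak", "here"]
before, after, delta = perplexity_delta(mt, 2, 3, ["invited"], ppl)
```

Passing the scorer in as a function keeps the replacement logic independent of the particular language model that is used.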
S604: the server determines the target labels of the corresponding target phrases to be evaluated based on the confusion degree variation.
For example, the target tag may be "BAD" or "OK", and according to the amount of change in the confusion, it is possible to determine how the phrase replacement affects the smoothness and rationality of the entire sentence.
In the embodiment of the application, the target label of the target phrase to be evaluated is determined from the confusion degree variation of the text to be evaluated before and after the target phrase is replaced. This reduces cases in which phrases or sentence components that have the same meaning in the text to be evaluated and the standard translation text cannot be aligned during word-level alignment with the TER toolkit and are therefore marked as errors, which effectively improves the label accuracy of the phrases to be evaluated and the accuracy of quality evaluation of machine-translated text.
Optionally, after the text to be evaluated and the standard translation text are obtained, step S602 may be implemented based on the following steps:
step 1: based on the similarity between the words to be evaluated in the text to be evaluated and the standard words in the standard translation text, performing word alignment between the words to be evaluated and the standard words to obtain the standard word corresponding to each word to be evaluated;
specifically, the text to be evaluated is "i am happy and is required to speak here," and the standard translation text is "i am invited to speak here. In this embodiment of the present application, the fast_align tool is used to perform word alignment, where the alignment result is shown in fig. 8, and the correspondence between the words to be evaluated and the standard words is: "me" corresponds to "me", "very" corresponds to "very", "happy" corresponds to "happy", "by" corresponds to "by", "ask" corresponds to "invite", "in" corresponds to "in", "here" corresponds to "here", "speaking" corresponds to "speaking", and the sentence-final period corresponds to the sentence-final period.
Step 2: determining at least one candidate phrase to be evaluated, which is composed of words to be evaluated, aiming at the text to be evaluated;
wherein each candidate phrase to be evaluated contains at least one word to be evaluated, for example, 3 words to be evaluated are contained in "i am happy", and 1 word to be evaluated is contained in "speaking".
Optionally, the candidate phrases to be evaluated may be determined in either of the following two ways:
Determination mode one: based on the positions of the words to be evaluated in the text to be evaluated, obtaining candidate phrases to be evaluated composed of the words to be evaluated, wherein adjacent words to be evaluated contained in each candidate phrase to be evaluated are also adjacent in position in the text to be evaluated;
specifically, adjacent words to be evaluated in the text to be evaluated are combined into candidate phrases to be evaluated; for example, the candidate phrases to be evaluated may be "i very happy", "required", "here", "speak", "here speak", etc., whereas phrases such as "i speak" and "i happy" are not candidate phrases to be evaluated because they are not formed from adjacent words to be evaluated. The candidate phrases to be evaluated obtained in this way cover all possible combinations of adjacent words to be evaluated; only 5 candidate phrases to be evaluated are listed here for illustration.
Determination mode two: performing component syntactic analysis on the text to be evaluated to obtain the phrases that are contained in the text to be evaluated and conform to the specified grammar rules, and taking the phrases that conform to the specified grammar rules as candidate phrases to be evaluated in the text to be evaluated.
Specifically, component syntactic analysis is performed on the text to be evaluated to obtain a component tree of the text to be evaluated, in which each non-leaf node represents an accurate phrase that conforms to grammatical conventions (such as a noun phrase NP, a verb phrase VP, a prepositional phrase PP, and the like). As shown in FIG. 9, "hit on hook", "on hook" and "hook" are all non-leaf nodes. That is, in the embodiment of the application, the phrases conforming to the specified grammar rules are taken as candidate phrases to be evaluated.
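The two determination modes above can be sketched as follows. This is an illustrative sketch only: the nested-tuple tree format, the toy tree, and the function names `contiguous_spans` and `constituent_spans` are assumptions made for the example, and the toy tree is not the tree of FIG. 9. A real system would obtain the tree from a constituency parser.

```python
def contiguous_spans(tokens, max_len=None):
    """Mode one: every run of adjacent words, as half-open (start, end) spans."""
    n = len(tokens)
    max_len = n if max_len is None else max_len
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, min(n, i + max_len) + 1)
    ]


def constituent_spans(tree, start=0):
    """Mode two: spans covered by the non-leaf nodes of a constituency tree.

    Assumed format: a non-leaf node is a tuple (label, child, ...), and a
    leaf is a plain string token. Returns (list of (label, start, end), end).
    """
    if isinstance(tree, str):
        return [], start + 1
    label, *children = tree
    spans, pos = [], start
    for child in children:
        child_spans, pos = constituent_spans(child, pos)
        spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos


# Hypothetical toy tree over the tokens "hit on hook".
toy_tree = ("VP", "hit", ("PP", "on", ("NP", "hook")))
phrases, length = constituent_spans(toy_tree)
```

Mode one produces n(n+1)/2 spans for a sentence of n words, whereas mode two produces one span per non-leaf node, which is why the tree-based mode yields far fewer replacement candidates.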
Step 3: and extracting the standard phrases from the standard translation text based on the to-be-evaluated words and the corresponding standard words contained in each candidate to-be-evaluated phrase to obtain the corresponding standard phrases of each candidate to-be-evaluated phrase.
Specifically, in the embodiment of the application, phrase-level alignment information is obtained using the phrase extraction algorithm in the Natural Language Toolkit (NLTK). For example, when the candidate phrases to be evaluated are determined based on determination mode one, the correspondence between the candidate phrases to be evaluated and the standard phrases is as shown in fig. 10. Taking the candidate phrase to be evaluated "is required" as an example, the words to be evaluated that it contains are "is" and "required"; the standard word corresponding to the word to be evaluated "is" is "is", and the standard word corresponding to the word to be evaluated "required" is "invited", so the standard phrase corresponding to the candidate phrase to be evaluated "is required" is "is invited".
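How a standard phrase can be read off a word alignment is sketched below. This is a simplified stand-in written for illustration, not NLTK's phrase extraction routine: the function name `aligned_target_span`, the token lists, and the alignment pairs are all hypothetical. It keeps only the usual consistency condition, namely that no word inside the extracted standard span may be aligned to a word outside the candidate phrase.

```python
def aligned_target_span(src_start, src_end, alignment):
    """Map a half-open source span to its target span via word alignment.

    alignment is a set of (source_index, target_index) pairs. Returns a
    half-open (start, end) target span, or None if the span is unaligned
    or the alignment is inconsistent with the span.
    """
    targets = [t for s, t in alignment if src_start <= s < src_end]
    if not targets:
        return None
    lo, hi = min(targets), max(targets) + 1
    # Consistency: target words in [lo, hi) must not align outside the span.
    for s, t in alignment:
        if lo <= t < hi and not (src_start <= s < src_end):
            return None
    return lo, hi


# Hypothetical tokens and alignment, loosely modeled on the running example.
mt = ["i", "very", "happy", "is", "required", "here", "speak"]
pe = ["is", "invited", "here", "speak", "i", "very", "happy"]
alignment = {(0, 4), (1, 5), (2, 6), (3, 0), (4, 1), (5, 2), (6, 3)}

span = aligned_target_span(3, 5, alignment)  # candidate phrase "is required"
standard_phrase = pe[span[0]:span[1]]
```

A candidate phrase whose aligned words are scattered across the standard translation text, such as the span covering "happy is" here, gets no standard phrase and is simply skipped.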
Optionally, after obtaining the candidate phrase to be evaluated and the corresponding standard phrase, determining the target phrase to be evaluated according to whether the candidate phrase to be evaluated has an initial tag or not:
if the candidate phrase to be evaluated has the initial tag, the target phrase to be evaluated is the candidate phrase to be evaluated with the initial tag being wrong.
Wherein the initial tags of the candidate phrases under evaluation may be obtained by:
aiming at a candidate phrase to be evaluated, carrying out word alignment on the word to be evaluated in the candidate phrase to be evaluated and the standard word in the standard translation text based on a preset editing frequency rule; if each word to be evaluated in one candidate phrase to be evaluated is consistent with the corresponding standard word, the initial label of the one candidate phrase to be evaluated is correct; if each word to be evaluated in one candidate phrase to be evaluated is inconsistent with the corresponding standard word, the initial label of the one candidate phrase to be evaluated is wrong.
For example, the TER toolkit is used to generate the word-by-word alignment between the MT sentence and the PE sentence based on the principle of the minimum number of edits. As shown in fig. 11, the word to be evaluated "by" is consistent with the standard word "by", so the initial label of the word to be evaluated "by" is OK (i.e., correct); the word to be evaluated "speech" is inconsistent with the standard word "speech", so the initial label of the word to be evaluated "speech" is BAD (i.e., incorrect).
If the candidate phrase to be evaluated does not have the initial tag, the target phrase to be evaluated is a phrase which is contained in the text to be evaluated and accords with the specified grammar rule, namely the candidate phrase to be evaluated obtained based on the determination mode II is directly used as the target phrase to be evaluated.
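The way initial labels arise from an edit-based word alignment can be sketched as follows. Note the assumptions: Python's `difflib.SequenceMatcher` is used here as a simple stand-in for the TER toolkit (it finds matching blocks but does not model TER's shift operation), the function names are hypothetical, and a phrase containing a mix of OK and BAD words is treated as BAD, a case the description above does not spell out.

```python
from difflib import SequenceMatcher


def initial_word_labels(mt_tokens, pe_tokens):
    """Label each MT word OK if it falls in a matching block, else BAD."""
    labels = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            labels[i] = "OK"
    return labels


def initial_phrase_label(word_labels, start, end):
    """A phrase is OK only if every word in word_labels[start:end] is OK."""
    return "OK" if all(l == "OK" for l in word_labels[start:end]) else "BAD"


# Hypothetical sentences for illustration.
mt = ["is", "required", "to", "speak", "here"]
pe = ["is", "invited", "to", "speak", "here"]
word_labels = initial_word_labels(mt, pe)
```

Only phrases whose initial label comes out as BAD would then be passed on as target phrases to be evaluated for replacement and rescoring.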
Optionally, after the target phrase to be evaluated is determined and the confusion degree variation of the text to be evaluated before and after the target phrase to be evaluated is replaced is determined based on step S603, step S604 may be implemented in either of the following ways:
embodiment one: obtaining the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced; if the confusion degree variation is smaller than a first threshold, setting the target label of the corresponding target phrase to be evaluated to correct; if the confusion degree variation is not smaller than the first threshold, taking the initial label as the target label, where the initial label is obtained by aligning, based on a preset editing frequency rule, the words to be evaluated in the text to be evaluated with the standard words in the standard translation text.
For example, the target phrases to be evaluated whose initial label is BAD in fig. 11 are each replaced by the corresponding standard phrase. As shown in fig. 12A, "i am happy", "required", and "speaking" are replaced by the corresponding standard phrases "i am happy", "invited", and "speaking", respectively. The confusion degree variation 1 corresponding to "i am happy" is 0 and the confusion degree variation 3 corresponding to "speaking" is 0.47; both are smaller than the first threshold 3, indicating that these phrases are similar in meaning to the corresponding phrases in the standard translation text and are not true translation errors, so their initial labels are corrected to OK. The confusion degree variation 2 corresponding to "required" is 10.89, which is larger than the first threshold 3, indicating that this phrase has a large influence on translation quality, so the BAD label is retained and the initial label BAD is used as the target label.
In addition, the target phrase to be evaluated "i am happy" contains 3 words to be evaluated whose initial labels are all BAD. Adjacent words to be evaluated whose initial labels are BAD can be merged and replaced together, or they can be replaced individually, as when "required" is replaced by "invited".
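The label correction rule of embodiment one can be sketched as follows, using the figures quoted above (variations 0, 10.89, and 0.47 with a first threshold of 3). The function name `correct_labels` and the dictionary layout are hypothetical choices made for the example.

```python
def correct_labels(initial_labels, deltas, first_threshold):
    """Embodiment one: relabel a phrase OK when its variation is small.

    initial_labels: phrase -> "OK" / "BAD" from the edit-based alignment.
    deltas: phrase -> confusion degree variation after replacement.
    A variation below first_threshold overrides the label with OK;
    otherwise the initial label is kept as the target label.
    """
    return {
        phrase: "OK" if deltas[phrase] < first_threshold else initial_labels[phrase]
        for phrase in deltas
    }


initial = {"i am happy": "BAD", "required": "BAD", "speaking": "BAD"}
deltas = {"i am happy": 0.0, "required": 10.89, "speaking": 0.47}
target = correct_labels(initial, deltas, first_threshold=3.0)
```

The rule only ever relaxes a label: a phrase can move from BAD to OK, but nothing here turns an initially OK phrase into BAD.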
Embodiment two: determining the word-average confusion degree variation of each target phrase to be evaluated based on the confusion degree variation and the number of words of the corresponding target phrase to be evaluated; sorting the word-average confusion degree variations and determining the minimum word-average confusion degree variation; if the minimum word-average confusion degree variation is smaller than a second threshold, the target label of the target phrase to be evaluated corresponding to the minimum word-average confusion degree variation is wrong; if the minimum word-average confusion degree variation is not smaller than the second threshold, the target labels of all the target phrases to be evaluated are correct.
Specifically, after the confusion degree variation $\Delta \mathrm{PPL}$ is obtained, in order to find the minimum phrase unit that causes the decrease in translation quality, the phrase length can be used for normalization, and the normalized word-average confusion degree variation is calculated and recorded as
$$\overline{\Delta \mathrm{PPL}} = \frac{\Delta \mathrm{PPL}}{N}$$
where $N$ is the number of words contained in the target phrase to be evaluated. If the word-average confusion degree variation of a target phrase to be evaluated is lower than the second threshold and is also the lowest among the other target phrases to be evaluated that overlap with it, the target phrase to be evaluated is marked as BAD.
For example, the target phrases to be evaluated are the phrases "hit on hook", "on hook", and "hook" in fig. 9. The confusion degree variation 1 of target phrase to be evaluated 1 "hit on hook" is -101.18, and its word-average confusion degree variation 1 is -20.24; the confusion degree variation 2 of target phrase to be evaluated 2 "hit on hook" is -101.18, and its word-average confusion degree variation 2 is -25.30; the confusion degree variation 3 of target phrase to be evaluated 3 "on hook" is -20.82, and its word-average confusion degree variation 3 is -6.94; the confusion degree variation 4 of target phrase to be evaluated 4 "hook" is -20.82, and its word-average confusion degree variation 4 is -20.82. After the word-average confusion degree variations are sorted, the smallest is word-average confusion degree variation 2, which is below the second threshold -3, so target phrase to be evaluated 2 "hit on hook" is labeled as BAD.
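The normalization and selection rule of embodiment two can be sketched as follows, using the four variations quoted above. The word counts 5, 4, 3, and 1 are not stated in the text; they are inferred here from the ratios of the reported values and should be read as an assumption, as should the function name `label_by_word_average` and the placeholder phrase keys.

```python
def label_by_word_average(deltas, lengths, second_threshold):
    """Embodiment two: mark only the phrase with the lowest per-word variation.

    deltas: phrase -> confusion degree variation after replacement.
    lengths: phrase -> number of words in the phrase.
    The phrase with the minimum word-average variation is labeled BAD if
    that minimum is below second_threshold; all other phrases are OK.
    """
    averages = {p: deltas[p] / lengths[p] for p in deltas}
    worst = min(averages, key=averages.get)
    labels = {p: "OK" for p in deltas}
    if averages[worst] < second_threshold:
        labels[worst] = "BAD"
    return averages, labels


deltas = {"phrase 1": -101.18, "phrase 2": -101.18, "phrase 3": -20.82, "phrase 4": -20.82}
lengths = {"phrase 1": 5, "phrase 2": 4, "phrase 3": 3, "phrase 4": 1}  # inferred
averages, labels = label_by_word_average(deltas, lengths, second_threshold=-3.0)
```

Dividing by the phrase length is what lets the shorter of two overlapping phrases with the same total variation win, so the BAD label lands on the smallest unit that explains the drop.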
The embodiment of the application provides two label rewriting strategies, namely "TER label correction" and "grammar tree-based label generation", which can automatically generate a large amount of high-quality training corpus for the machine translation quality assessment task. The TER label correction strategy is based on phrase alignment, phrase replacement, and language model scoring, and corrects labels according to the change in confusion degree after phrase replacement. The grammar tree-based label generation strategy is a variant that further introduces component grammar trees, performs selective phrase replacement according to the structure of the grammar tree, and generates labels according to the word-average confusion degree variation after replacement. The label rewriting strategies in the application focus on solving the problem that labels produced by the TER-toolkit-based label generation strategy are inconsistent with manual judgment.
Referring to fig. 12B, which is a schematic comparison between the result of text evaluation based on the text evaluation method in the present application and the result of direct human evaluation, in which bolded text indicates a translation error, that is, a BAD tag. As can be seen from fig. 12B, the problem in the related art that the generated labels for machine-translated text are inconsistent with human judgment is effectively alleviated. Applying the text evaluation method in the present application to MTQE can greatly improve MTQE performance, and in particular improves the consistency between predicted tags and human judgment.
Optionally, after obtaining the target tag of the target phrase to be evaluated in the text to be evaluated based on the text evaluation method in the application, a parallel corpus may be constructed based on the text to be evaluated obtained by performing machine translation on the original text, the standard translation text corresponding to the text to be evaluated, and the target tag corresponding to each target phrase to be evaluated in the text to be evaluated; and training the model based on the parallel corpus to obtain a trained machine translation quality evaluation model.
The machine translation quality evaluation model is used to perform quality evaluation labeling on machine-translated text. The prediction results of an MTQE model trained on the parallel corpus data built from the target labels obtained by the method conform better to human judgment criteria than those of a model trained on labels generated automatically with TER, which demonstrates the effectiveness of the method.
Based on the label rewriting strategy, a large amount of high-quality MTQE training corpus can be generated, so that the effect of an MTQE model is remarkably improved, and especially the consistency of model prediction and human judgment is improved.
In the embodiment of the application, on the one hand, the method can be applied to a word-level machine translation quality evaluation function, which marks the parts of an MT sentence generated by a machine translation model that may contain errors and prompts the user. By introducing the label rewriting strategies, sentence components misjudged as BAD are revised to OK based on techniques such as phrase alignment, component syntactic analysis, and language models; at the same time, the syntactic structure information of the translated sentence is introduced so that the generated BAD labels are consolidated, which effectively avoids the problem that the generated MTQE tags are inconsistent with human cognition. On the other hand, the MTQE training corpus can be constructed automatically, making the fullest use of the machine translation parallel corpus, and a large amount of high-quality corpus close to human judgment is generated automatically, which gives the model a basis for good generalization and robustness and significantly improves the performance of the MTQE model.
The text evaluation method in the application is evaluated both automatically and manually on the DAQE (Direct Assessment Quality Estimation) data set, a translation quality evaluation data set that is labeled directly by humans and reflects human judgment (Direct Assessment, DA). The automatic model evaluation results are shown in table 1, and the results of the manual evaluation of the model outputs are shown in table 2.
TABLE 1 (automatic model evaluation results; provided as an image in the original publication)
Here, DAQE denotes the translation quality evaluation data set labeled directly by humans. The experimental results show that, compared with data generated by the related-art method of automatically generating labels with the TER toolkit, the method provided by the application greatly improves performance: the MCC (Matthews correlation coefficient, an index for measuring classification performance) in the quality evaluation task of English translation is improved overall by 2.85, and the MCC in the quality evaluation task of English translation is improved overall by 2.24.
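For reference, the standard definition of the Matthews correlation coefficient for a binary OK/BAD classification is given below, where $TP$, $TN$, $FP$, and $FN$ are the counts of true positives, true negatives, false positives, and false negatives. This is the textbook formula rather than anything specific to the embodiment, and the scale on which the improvements above are reported is not stated in this section.

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}
```

The coefficient ranges from -1 to 1 and remains informative when the OK and BAD classes are heavily imbalanced, which is typical of word-level quality labels.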
TABLE 2 (manual evaluation results of the model outputs; provided as an image in the original publication)
In the manual evaluation experiment, the prediction results of the model trained on the corpus generated by the text evaluation method in the application (the text evaluation method column) conform better to human judgment criteria than those of the model trained on the labeled data generated by the TER toolkit in the related art (the TER toolkit column): the score is improved by 0.61 on the Ind translation and by 0.85 on the English translation. According to the last row of table 2, among the scores of the trained models, the score based on the text evaluation method (DA) is higher than the score based on the TER toolkit (TER). These experimental results demonstrate the effectiveness of the text evaluation method in the application.
Referring to fig. 13, which is a logic diagram of a text evaluation method according to an embodiment of the present application, a specific implementation of the text evaluation method according to the present application is described below with reference to fig. 13.
First, the text to be evaluated "i am happy to be asked to speak here." and the standard translation text "is invited where speaking me is happy." are input, where the candidate phrases to be evaluated whose initial tag is BAD are "I happy", "require", and "speak". Phrase alignment is then performed: "i happy" corresponds to "i happy", "by" corresponds to "by", "required" corresponds to "invite", "here" corresponds to "here", and "speaking" corresponds to "speak". Next, phrase replacement is performed: the target phrases to be evaluated "I happy", "required", and "speaking" are each replaced with the corresponding standard phrase, and the confusion degree variation before and after replacement is calculated as 0 for "I happy", 10.89 for "required", and 0.47 for "speaking". Since the first threshold is 3, the target labels of the phrases "I happy" and "speaking" are corrected to OK, while the target label of the phrase "required" remains BAD. Finally, the text evaluation result "I happy is required to speak here." is output, in which the phrase labeled BAD is bolded to indicate to the user that a translation error may exist.
Based on the same inventive concept, the embodiment of the application also provides a text evaluation device. Fig. 14 is a schematic structural diagram of the text evaluation apparatus 1400, which may include:
an obtaining unit 1401, configured to obtain a text to be evaluated obtained by performing machine translation on an original text, and a standard translation text corresponding to the original text;
an alignment unit 1402, configured to perform phrase alignment on each candidate phrase to be evaluated in the text to be evaluated and each standard phrase in the standard translation text to obtain a standard phrase corresponding to each candidate phrase to be evaluated;
a replacing unit 1403, configured to replace at least one target phrase to be evaluated in each candidate phrase to be evaluated with a corresponding standard phrase, and determine a confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced, where the confusion degree characterizes semantic fluency of the text to be evaluated;
a determining unit 1404, configured to determine a target tag of the corresponding target phrase to be evaluated based on the confusion degree variation.
Optionally, the alignment unit 1402 is specifically configured to:
based on the similarity between the words to be evaluated in the text to be evaluated and the standard words in the standard translation text, performing word alignment between the words to be evaluated and the standard words to obtain the standard word corresponding to each word to be evaluated;
Determining at least one candidate phrase to be evaluated consisting of words to be evaluated aiming at the text to be evaluated, wherein each candidate phrase to be evaluated comprises at least one word to be evaluated;
and extracting the standard phrases from the standard translation text based on the to-be-evaluated words and the corresponding standard words contained in each candidate to-be-evaluated phrase to obtain the corresponding standard phrases of each candidate to-be-evaluated phrase.
Optionally, the alignment unit 1402 is specifically configured to:
based on the positions of the words to be evaluated in the text to be evaluated, obtaining candidate phrases to be evaluated composed of the words to be evaluated, wherein adjacent words to be evaluated contained in each candidate phrase to be evaluated are also adjacent in position in the text to be evaluated; or,
and carrying out component syntactic analysis on the text to be evaluated to obtain phrases which are contained in the text to be evaluated and accord with the specified grammar rules, and taking the phrases which accord with the specified grammar rules as candidate phrases to be evaluated in the text to be evaluated.
Optionally, the target phrase to be evaluated includes at least one of:
a candidate phrase to be evaluated whose initial tag is wrong, where the initial tag is obtained by aligning, based on a preset editing frequency rule, the words to be evaluated in the text to be evaluated with the standard words in the standard translation text;
a phrase that is contained in the text to be evaluated and conforms to the specified grammar rules, obtained by performing component syntactic analysis on the text to be evaluated.
Optionally, the determining unit 1404 is specifically configured to:
for the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced:
if the confusion degree variation is smaller than a first threshold value, setting the target label of the corresponding target phrase to be evaluated to be correct;
if the confusion degree variation is not smaller than the first threshold value, taking the initial label as the target label, wherein the initial label is obtained by aligning the words to be evaluated in the text to be evaluated with the standard words in the standard translation text based on a preset editing frequency rule.
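The threshold rule above can be sketched as follows. Two assumptions are made for illustration only: the confusion degree is taken to be the standard language-model perplexity (the exponential of the negative mean token log-probability), and the variation is taken as the absolute difference between the values before and after replacement. The application does not fix either choice. The labels "OK" and "BAD" stand for "correct" and "wrong", and the function names are hypothetical.

```python
import math
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity from natural-log token probabilities: exp(-mean(log p))."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def target_label(
    ppl_before: float,
    ppl_after: float,
    initial_label: str,
    first_threshold: float,
) -> str:
    """Rewrite the label to OK when replacement barely changes the perplexity."""
    # Assumption: the variation is the absolute difference of the two values.
    variation = abs(ppl_after - ppl_before)
    if variation < first_threshold:
        return "OK"
    return initial_label  # otherwise keep the edit-rule-based initial label


# Hypothetical values: a small change rewrites BAD to OK, a large one does not.
small_change = target_label(10.0, 10.5, "BAD", first_threshold=1.0)
large_change = target_label(10.0, 14.0, "BAD", first_threshold=1.0)
```

In practice the token log-probabilities would come from a language model scoring the text to be evaluated before and after the replacement; that scoring step is outside this sketch.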
Optionally, the initial tag for each target phrase under evaluation is determined in the following manner:
for one target phrase to be evaluated, carrying out word alignment between the words to be evaluated in the target phrase to be evaluated and the standard words in the standard translation text based on a preset editing frequency rule;
if each word to be evaluated in the one target phrase to be evaluated is consistent with the corresponding standard word, the initial label of the one target phrase to be evaluated is correct;
if each word to be evaluated in the one target phrase to be evaluated is inconsistent with the corresponding standard word, the initial label of the one target phrase to be evaluated is wrong.
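A rough sketch of the initial labelling is given below. The preset editing frequency rule is not detailed here, so Python's `difflib.SequenceMatcher` (a longest-matching-block alignment) is swapped in as a stand-in; it is not the rule of the application. The sketch also assumes that a phrase is labelled wrong as soon as any of its words is unmatched, which is one possible reading of the text above. The function names and the "OK"/"BAD" labels are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_match_flags(evaluated: List[str], standard: List[str]) -> List[bool]:
    """Flag each evaluated word that falls inside a matching block."""
    flags = [False] * len(evaluated)
    matcher = SequenceMatcher(a=evaluated, b=standard, autojunk=False)
    for block in matcher.get_matching_blocks():
        # The final block has size 0 and therefore marks nothing.
        for i in range(block.a, block.a + block.size):
            flags[i] = True
    return flags


def initial_label(span: Tuple[int, int], flags: List[bool]) -> str:
    """Label the half-open span OK only if all of its words are matched."""
    start, end = span
    return "OK" if all(flags[start:end]) else "BAD"


# Hypothetical example data.
flags = word_match_flags(
    ["he", "like", "red", "apples"],
    ["he", "likes", "red", "apples"],
)
```

With this data only the word "like" is unmatched, so any phrase containing it receives the initial label BAD.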
Optionally, the determining unit 1404 is specifically configured to:
determining the word confusion degree variation of each target phrase to be evaluated based on the confusion degree variation and the word number of the corresponding target phrase to be evaluated;
sorting the word confusion degree variations and determining the minimum word confusion degree variation;
if the minimum word confusion degree variation is smaller than the second threshold, the target label of the target phrase to be evaluated corresponding to the minimum word confusion degree variation is wrong;
if the minimum word confusion degree variation is not smaller than the second threshold value, all target tags of the target phrases to be evaluated are correct.
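The selection rule above can be sketched as follows. It assumes that the word confusion degree variation is the signed confusion degree variation divided by the number of words in the phrase; the application only states that it is determined from these two quantities. The function name `phrase_to_mark_wrong` is hypothetical. The function returns the index of the phrase to be labelled wrong, or `None` when all target labels are correct.

```python
from typing import List, Optional


def phrase_to_mark_wrong(
    variations: List[float],
    word_counts: List[int],
    second_threshold: float,
) -> Optional[int]:
    """Return the index of the phrase with the smallest per-word variation
    if that value is below the second threshold, otherwise None."""
    if not variations:
        return None
    # Assumption: per-word variation is the phrase variation over its length.
    per_word = [v / n for v, n in zip(variations, word_counts)]
    idx = min(range(len(per_word)), key=lambda k: per_word[k])
    return idx if per_word[idx] < second_threshold else None


# Hypothetical values: per-word variations are -3.0, -1.0 and 0.5.
wrong_idx = phrase_to_mark_wrong([-6.0, -1.0, 2.0], [2, 1, 4], second_threshold=-2.0)
```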
Optionally, after determining the target tags of the corresponding target phrases to be evaluated based on the respective confusion degree variation, the apparatus further comprises a construction unit 1405 for:
constructing a parallel corpus based on the text to be evaluated obtained by machine translation of the original text, the standard translation text corresponding to the text to be evaluated, and the target labels corresponding to the target phrases to be evaluated in the text to be evaluated;
carrying out model training based on the parallel corpus to obtain a trained machine translation quality evaluation model, wherein the machine translation quality evaluation model is used for carrying out quality evaluation labeling on machine translation texts obtained by machine translation.
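One corpus entry of the kind described above might be assembled as in the following sketch. The record layout `QEExample`, the projection of phrase labels onto word-level tags, and the default tag OK for words outside any labelled phrase are all assumptions for illustration; the application does not prescribe a storage format. Model training itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QEExample:
    """Hypothetical record holding one entry of the training corpus."""
    mt_words: List[str]         # text to be evaluated (machine translation)
    reference_words: List[str]  # standard translation text
    word_tags: List[str]        # one OK/BAD tag per word of the MT text


def build_example(
    mt_words: List[str],
    reference_words: List[str],
    phrase_labels: Dict[Tuple[int, int], str],
) -> QEExample:
    """Project target labels of half-open phrase spans onto word-level tags."""
    tags = ["OK"] * len(mt_words)  # assumption: unlabelled words default to OK
    for (start, end), label in phrase_labels.items():
        if label == "BAD":
            for i in range(start, end):
                tags[i] = "BAD"
    return QEExample(mt_words, reference_words, tags)


# Hypothetical example data.
example = build_example(
    ["he", "like", "red", "apples"],
    ["he", "likes", "red", "apples"],
    {(1, 2): "BAD", (2, 4): "OK"},
)
```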
In the embodiments of the present application, on the one hand, the method can be applied to a word-level machine translation quality evaluation (MTQE) function, which marks the parts of a machine-translated (MT) sentence generated by a machine translation model that may contain errors and prompts the user accordingly. By introducing a tag rewriting strategy, sentence components misjudged as BAD are revised to OK based on techniques such as phrase alignment, component syntactic analysis, and a language model; at the same time, syntactic structure information of the translated sentence is introduced so that the generated BAD tags are consolidated, which helps avoid generating MTQE tags that are inconsistent with human judgment. On the other hand, MTQE training corpora can be constructed automatically from machine translation parallel corpora, so that a large amount of high-quality corpus close to human judgment is generated automatically; this gives the model a basis for good generalization and robustness and improves the performance of the MTQE model.
For convenience of description, the above parts are described as being functionally divided into modules (or units) respectively. Of course, the functions of each module (or unit) may be implemented in the same piece or pieces of software or hardware when implementing the present application.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein collectively as a "circuit," "module," or "system."
The embodiment of the application also provides electronic equipment based on the same inventive concept as the embodiment of the method. In one embodiment, the electronic device may be a server, such as server 520 shown in FIG. 5. In this embodiment, the structure of the electronic device may include a memory 1501, a communication module 1503, and one or more processors 1502 as shown in fig. 15.
The memory 1501 is used for storing computer programs executed by the processor 1502. The memory 1501 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, a program required for running an instant communication function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.
The memory 1501 may be a volatile memory, such as a random-access memory (RAM); the memory 1501 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1501 may be any other medium that can carry or store a desired computer program in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1501 may also be a combination of the above memories.
The processor 1502 may include one or more central processing units (CPU), digital processing units, or the like. The processor 1502 is configured to implement the above-described text evaluation method when calling a computer program stored in the memory 1501.
The communication module 1503 is used for communicating with the terminal device and other servers.
The specific connection medium between the memory 1501, the communication module 1503, and the processor 1502 is not limited in the embodiments of the present application. In the embodiment of the present application, the memory 1501 and the processor 1502 are connected by the bus 1504 in fig. 15, where the bus 1504 is depicted as a bold line; the connections between other components are illustrated only schematically and are not limiting. The bus 1504 may be divided into an address bus, a data bus, a control bus, and the like. For ease of description, only one bold line is drawn in fig. 15, but this does not mean that there is only one bus or only one type of bus.
The memory 1501 stores therein a computer storage medium in which computer executable instructions for implementing the text evaluation method of the embodiment of the present application are stored. The processor 1502 is configured to perform the text evaluation method described above, as shown in fig. 6.
In another embodiment, the electronic device may also be other electronic devices, such as terminal device 510 shown in fig. 5. In this embodiment, the structure of the electronic device may include, as shown in fig. 16: communication component 1610, memory 1620, display unit 1630, camera 1640, sensor 1650, audio circuitry 1660, bluetooth module 1670, processor 1680, and the like.
The communication component 1610 is used for communicating with a server. In some embodiments, it may include a wireless fidelity (WiFi) module; WiFi is a short-range wireless transmission technology, and the electronic device can help the user send and receive information through the WiFi module.
Memory 1620 may be used to store software programs and data. The processor 1680 performs various functions of the terminal device 510 and data processing by executing software programs or data stored in the memory 1620. The memory 1620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. The memory 1620 stores an operating system that enables the terminal device 510 to operate. The memory 1620 in this application may store an operating system and various applications, and may also store code for performing the text evaluation methods of embodiments of the present application.
The display unit 1630 may also be used to display information input by a user or information provided to the user and a graphical user interface (graphical user interface, GUI) of various menus of the terminal device 510. In particular, the display unit 1630 may include a display screen 1632 disposed on the front side of the terminal device 510. The display 1632 may be configured in the form of a liquid crystal display, light emitting diodes, or the like. The display unit 1630 may be used to display a machine translation user interface or the like in the embodiments of the present application.
The display unit 1630 may also be used to receive input numeric or character information and to generate signal inputs related to user settings and function control of the terminal device 510. In particular, the display unit 1630 may include a touch screen 1631 disposed on the front of the terminal device 510, which may collect touch operations by the user on or near it, such as clicking buttons, dragging scroll boxes, and the like.
The touch screen 1631 may cover the display screen 1632, or the touch screen 1631 may be integrated with the display screen 1632 to implement the input and output functions of the terminal device 510; after integration, the two may be referred to simply as a touch display screen. The display unit 1630 may display application programs and corresponding operation steps.
The camera 1640 may be used to capture still images, and a user may post comments on the image captured by the camera 1640 through an application. The camera 1640 may be one or a plurality of cameras. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive elements convert the optical signals to electrical signals, which are then passed to the processor 1680 for conversion to digital image signals.
The terminal device may further include at least one sensor 1650, such as an acceleration sensor 1651, a distance sensor 1652, a fingerprint sensor 1653, a temperature sensor 1654. The terminal device may also be configured with other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, light sensors, motion sensors, and the like.
Audio circuitry 1660, speakers 1661, and microphone 1662 may provide an audio interface between the user and the terminal device 510. The audio circuit 1660 may transmit the received electrical signal converted from audio data to the speaker 1661, and convert the electrical signal into an audio signal by the speaker 1661 to be output. The terminal device 510 may also be configured with a volume button for adjusting the volume of the sound signal. On the other hand, the microphone 1662 converts the collected sound signals into electrical signals, which are received by the audio circuit 1660 and converted into audio data, which are output to the communication component 1610 for transmission to, for example, another terminal device 510, or to the memory 1620 for further processing.
The bluetooth module 1670 is used to exchange information with other bluetooth devices having bluetooth modules through bluetooth protocols. For example, the terminal device may establish a bluetooth connection with a wearable electronic device (e.g., a smart watch) that also has a bluetooth module through bluetooth module 1670, thereby performing data interaction.
The processor 1680 is a control center of the terminal device, connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal device and processes data by running or executing software programs stored in the memory 1620 and calling data stored in the memory 1620. In some embodiments, the processor 1680 may include one or more processing units; the processor 1680 may also integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., and a baseband processor that primarily handles wireless communications. It will be appreciated that the baseband processor described above may not be integrated into the processor 1680. Processor 1680 in the present application may run an operating system, application programs, user interface displays, and touch responses, as well as text evaluation methods of embodiments of the present application. In addition, a processor 1680 is coupled to the display unit 1630.
In some possible embodiments, aspects of the text evaluation method provided herein may also be implemented in the form of a program product comprising a computer program for causing an electronic device to perform the steps of the text evaluation method according to the various exemplary embodiments of the present application described herein above when the program product is run on the electronic device, e.g. the electronic device may perform the steps as shown in fig. 6.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may employ a portable compact disc read only memory (CD-ROM) and comprise a computer program and may run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with a command execution system, apparatus, or device.
A computer program embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer programs for performing the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer program may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having a computer-usable computer program embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus produce means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (12)

1. A method of text evaluation, the method comprising:
obtaining a text to be evaluated obtained by performing machine translation on an original text and a standard translation text corresponding to the original text;
carrying out phrase alignment on each candidate phrase to be evaluated in the text to be evaluated and the standard phrases in the standard translation text to obtain standard phrases corresponding to each candidate phrase to be evaluated;
respectively replacing at least one target phrase to be evaluated in the candidate phrases to be evaluated with a corresponding standard phrase, and determining the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced, wherein the confusion degree characterizes the semantic fluency of the text to be evaluated;
and determining target labels of corresponding target phrases to be evaluated based on the confusion degree variation.
2. The method of claim 1, wherein said phrase aligning each candidate phrase under evaluation in the text under evaluation with a standard phrase in the standard translation text to obtain a standard phrase corresponding to each candidate phrase under evaluation, comprises:
based on the to-be-evaluated words in the to-be-evaluated text and the similarity between the standard words in the standard translation text, carrying out word alignment between each to-be-evaluated word and each standard word to obtain each corresponding standard word of each to-be-evaluated word;
determining at least one candidate phrase to be evaluated consisting of the words to be evaluated aiming at the text to be evaluated, wherein each candidate phrase to be evaluated comprises at least one word to be evaluated;
and extracting the standard phrases from the standard translation text based on the words to be evaluated and the corresponding standard words contained in each candidate phrase to be evaluated, and obtaining the standard phrases corresponding to each candidate phrase to be evaluated.
3. The method of claim 2, wherein the determining at least one candidate phrase under evaluation consisting of the word under evaluation for the text under evaluation comprises:
based on the positions of the words to be evaluated in the text to be evaluated, obtaining candidate phrases to be evaluated composed of the words to be evaluated, wherein adjacent words to be evaluated contained in each candidate phrase to be evaluated are also adjacent in position in the text to be evaluated; or,
carrying out component syntactic analysis on the text to be evaluated to obtain phrases which are contained in the text to be evaluated and accord with the specified grammar rules, and taking the phrases which accord with the specified grammar rules as candidate phrases to be evaluated in the text to be evaluated.
4. The method of claim 1, wherein the target phrase under evaluation comprises at least one of:
candidate phrases to be evaluated whose initial tag is wrong, wherein the initial tag is obtained by aligning the words to be evaluated in the text to be evaluated with the standard words in the standard translation text based on a preset editing frequency rule;
phrases which are contained in the text to be evaluated and accord with the specified grammar rules, obtained by carrying out component syntactic analysis on the text to be evaluated.
5. The method of claim 1, wherein the determining the target tag for the corresponding target phrase under evaluation based on the respective confusion degree variation comprises:
for the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced:
if the confusion degree variation is smaller than a first threshold value, setting a target label of a corresponding target phrase to be evaluated to be correct;
if the confusion degree variation is not smaller than the first threshold value, taking an initial label as the target label, wherein the initial label is obtained by aligning the words to be evaluated in the text to be evaluated with the standard words in the standard translation text based on a preset editing frequency rule.
6. The method of claim 4 or 5, wherein the initial tag for each target phrase under evaluation is determined by:
aiming at one target phrase to be evaluated, carrying out word alignment on the word to be evaluated in the target phrase to be evaluated and the standard word in the standard translation text based on a preset editing frequency rule;
if each word to be evaluated in the target phrase to be evaluated is consistent with the corresponding standard word, the initial label of the target phrase to be evaluated is correct;
if each word to be evaluated in the target phrase to be evaluated is inconsistent with the corresponding standard word, the initial label of the target phrase to be evaluated is wrong.
7. The method of claim 1, wherein the determining the target tag for the corresponding target phrase under evaluation based on the respective confusion degree variation comprises:
determining the word confusion degree variation of each target phrase to be evaluated based on the confusion degree variation and the word number of the corresponding target phrase to be evaluated;
sorting the word confusion degree variations and determining the minimum word confusion degree variation;
if the minimum word confusion degree variation is smaller than a second threshold, the target label of the target phrase to be evaluated corresponding to the minimum word confusion degree variation is wrong;
if the minimum word confusion degree variation is not smaller than a second threshold value, all target tags of the target phrases to be evaluated are correct.
8. The method of any one of claims 1-5, 7, further comprising, after said determining target tags for respective target phrases under evaluation based on respective confusion degree variations:
constructing parallel corpus based on a text to be evaluated obtained by performing machine translation on an original text, a standard translation text corresponding to the text to be evaluated and target labels corresponding to target phrases to be evaluated in the text to be evaluated;
model training is carried out based on the parallel corpus, a trained machine translation quality evaluation model is obtained, and the machine translation quality evaluation model is used for carrying out quality evaluation labeling on machine translation texts obtained by machine translation.
9. A text evaluation apparatus, comprising:
the acquisition unit is used for acquiring a text to be evaluated obtained by performing machine translation on the original text and a standard translation text corresponding to the original text;
the alignment unit is used for carrying out phrase alignment on each candidate phrase to be evaluated in the text to be evaluated and the standard phrases in the standard translation text to obtain standard phrases corresponding to each candidate phrase to be evaluated;
the replacing unit is used for replacing at least one target phrase to be evaluated in the candidate phrases to be evaluated with corresponding standard phrases respectively, and determining the confusion degree variation of the text to be evaluated before and after each target phrase to be evaluated in the at least one target phrase to be evaluated is replaced, wherein the confusion degree characterizes the semantic fluency of the text to be evaluated;
and the determining unit is used for determining the target label of the corresponding target phrase to be evaluated based on the confusion degree variation.
10. An electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 8.
11. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to perform the steps of the method according to any one of claims 1-8 when said computer program is run on the electronic device.
12. A computer program product comprising a computer program, the computer program being stored on a computer readable storage medium; when the computer program is read from the computer readable storage medium by a processor of an electronic device, the processor executes the computer program, causing the electronic device to perform the steps of the method of any one of claims 1-8.
CN202111649387.XA 2021-12-30 2021-12-30 Text evaluation method and device, electronic equipment and storage medium Pending CN116432666A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111649387.XA CN116432666A (en) 2021-12-30 2021-12-30 Text evaluation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111649387.XA CN116432666A (en) 2021-12-30 2021-12-30 Text evaluation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116432666A true CN116432666A (en) 2023-07-14

Family

ID=87080205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111649387.XA Pending CN116432666A (en) 2021-12-30 2021-12-30 Text evaluation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116432666A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118095302A (en) * 2024-04-26 2024-05-28 四川交通运输职业学校 Auxiliary translation method and system based on computer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40090583; Country of ref document: HK)