CN111126087A - Domain translation processing method, device and equipment - Google Patents


Info

Publication number
CN111126087A
Authority
CN
China
Prior art keywords
corpus
keywords
model
keyword
target
Legal status
Granted
Application number
CN201911352107.1A
Other languages
Chinese (zh)
Other versions
CN111126087B (en)
Inventor
张睿卿
熊皓
何中军
李芝
吴华
王海峰
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911352107.1A
Publication of CN111126087A
Application granted
Publication of CN111126087B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a domain translation processing method, apparatus and device, relating to the technical field of artificial intelligence. The specific implementation scheme is as follows: a keyword set of a target domain is acquired, the keyword set comprising keywords in a target language; retrieval is performed according to the keywords to obtain a first corpus in the target language; whether a preset condition is met is judged according to the first corpus, and if it is met, the first corpus is translated to obtain a second corpus in the source language; and processing parameters of a preset model are adjusted according to the first corpus and the second corpus to generate a domain translation model for the target domain. Domain-adaptive translation is thereby achieved with less manual involvement, lower labor cost and higher processing efficiency.

Description

Domain translation processing method, device and equipment
Technical Field
The application relates to the field of computer technology, in particular to artificial intelligence, and provides a domain translation processing method, apparatus and device.
Background
At present, general-purpose machine translation can translate text in a source language into a target language. When the text belongs to a specific domain, however, polysemous words and a failure to understand the text as a whole can cause mistranslations and incoherent output. Obtaining more accurate translation results for domain-specific text is therefore an important research direction in machine translation.
In the related art, in-domain parallel corpora are annotated manually and the translation model is fine-tuned on the annotated parallel corpora. This approach requires a large amount of manual annotation, incurs high labor cost and is inefficient.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present application is to provide a domain translation processing method that achieves domain-adaptive translation, reduces manual involvement, lowers labor cost and improves processing efficiency.
A second object of the present application is to provide a domain translation processing apparatus.
A third object of the present application is to provide an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
An embodiment of a first aspect of the present application provides a domain translation processing method, including:
acquiring a keyword set of a target domain, wherein the keyword set comprises keywords in a target language;
retrieving according to the keywords to obtain a first corpus in the target language;
judging, according to the first corpus, whether a preset condition is met, and if the preset condition is met, translating the first corpus to obtain a second corpus in a source language; and
adjusting processing parameters of a preset model according to the first corpus and the second corpus to generate a domain translation model for the target domain.
In addition, the domain translation processing method according to the above embodiment of the present application may further have the following additional technical features:
optionally, after judging whether the preset condition is met according to the first corpus, the method further includes: if the preset condition is not met, extracting candidate keywords from the first corpus; and determining target keywords from the candidate keywords according to their term frequency and inverse document frequency, and adding the target keywords to the keyword set.
Optionally, before retrieving according to the keywords, the method further includes: obtaining word vectors of the keywords, and classifying the keywords according to the word vectors;
the retrieving according to the keywords includes: extracting at least one keyword from each resulting category, and retrieving according to the extracted keywords.
Optionally, judging whether the preset condition is met according to the first corpus includes: obtaining the number of sentences in the first corpus, and determining that the preset condition is met if the number of sentences is greater than a preset threshold.
Optionally, adjusting the processing parameters of the preset model according to the first corpus and the second corpus to generate the domain translation model for the target domain includes: training the processing parameters of the preset model on the first corpus and the second corpus to generate an adjusted model; and performing model averaging on the adjusted model and the preset model to generate the domain translation model.
An embodiment of a second aspect of the present application provides a domain translation processing apparatus, including:
an acquisition module, configured to acquire a keyword set of a target domain, wherein the keyword set comprises keywords in a target language;
a retrieval module, configured to retrieve according to the keywords to obtain a first corpus in the target language;
a processing module, configured to judge, according to the first corpus, whether a preset condition is met, and if the preset condition is met, translate the first corpus to obtain a second corpus in a source language; and
a generation module, configured to adjust processing parameters of a preset model according to the first corpus and the second corpus to generate a domain translation model for the target domain.
In addition, the domain translation processing apparatus according to the above embodiment of the present application may further have the following additional technical features:
optionally, the apparatus further includes: an expansion module, configured to extract candidate keywords from the first corpus if the preset condition is not met, determine target keywords from the candidate keywords according to their term frequency and inverse document frequency, and add the target keywords to the keyword set.
Optionally, the apparatus further includes: a classification module, configured to obtain word vectors of the keywords and classify the keywords according to the word vectors;
the retrieval module is specifically configured to extract at least one keyword from each resulting category and retrieve according to the extracted keywords.
Optionally, the processing module is specifically configured to obtain the number of sentences in the first corpus, and determine that the preset condition is met if the number of sentences is greater than a preset threshold.
Optionally, the generation module is specifically configured to train the processing parameters of the preset model on the first corpus and the second corpus to generate an adjusted model, and perform model averaging on the adjusted model and the preset model to generate the domain translation model.
An embodiment of a third aspect of the present application provides an electronic device, including at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the domain translation processing method according to the embodiment of the first aspect.
An embodiment of a fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the domain translation processing method according to the embodiment of the first aspect.
An embodiment of the above application has the following advantages or benefits: a keyword set of the target domain, comprising keywords in the target language, is acquired; retrieval is performed according to the keywords to obtain a first corpus in the target language; whether a preset condition is met is judged according to the first corpus, and if it is met, the first corpus is translated to obtain a second corpus in the source language; and the processing parameters of the preset model are adjusted according to the first corpus and the second corpus to generate a domain translation model for the target domain. In-domain parallel corpora can thus be obtained from keywords and used to adjust the translation model, achieving domain-adaptive translation while reducing manual involvement and labor cost and improving processing efficiency.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:
fig. 1 is a schematic flowchart of a domain translation processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of another domain translation processing method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a model adjustment process according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a domain translation processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another domain translation processing apparatus according to an embodiment of the present application;
FIG. 6 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and structures are omitted below for clarity and conciseness.
In practical applications, machine translation translates source-language text into the target language through a pre-trained translation model. For example, a general translation model translates the source-language text "You've got texture. You've got right, while the while is basic typing the bones of the computers" as "You have obtained texture. You have control over essentially putting the bones into the character." This text concerns producing character animation in the CAD (Computer Aided Design) field, and its translation into the target language reads "you have texture and can control, so that the character animation can be basically produced into a skeleton". Thus, polysemous words and a failure to understand the text as a whole can cause mistranslations and incoherent translations in a specific domain.
In the related art, to make the results of a translation model more accurate in domain translation, in-domain parallel corpora are usually annotated manually and the translation model is fine-tuned on them. This requires heavy manual annotation, incurs high labor cost and is inefficient.
Fig. 1 is a schematic flowchart of a domain translation processing method according to an embodiment of the present application, and as shown in fig. 1, the method includes:
Step 101: acquire a keyword set of a target domain, wherein the keyword set comprises keywords in a target language.
In this embodiment, keywords of the target domain may be obtained and stored in the keyword set; optionally, the keywords are in the target language. The target domain may be set as needed; for example, when translating CAD content, the target domain may be the CAD domain.
As one possible implementation, keywords in the source language may be preset for the target domain and translated into the target language by a pre-trained translation model to generate the keyword set of the target domain. Taking the CAD domain as an example, with English as the source language and Chinese as the target language, the source-language keywords GPU (Graphics Processing Unit), acceleration, film, shot, render and animation are preset and translated into the target language, giving the keyword set [GPU, acceleration, movie, shot, rendering, animation]. Optionally, words that are unambiguous within the target domain may be selected as keywords.
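As an illustration only, the first implementation above can be sketched in Python as follows. The `translate` callable is a hypothetical stand-in for the pre-trained translation model, and the small English-to-Chinese dictionary is toy data rather than real model output; duplicate and empty translations are dropped while the original order is kept.

```python
from typing import Callable, Iterable, List


def build_keyword_set(source_keywords: Iterable[str],
                      translate: Callable[[str], str]) -> List[str]:
    """Translate preset source-language keywords into target-language keywords.

    `translate` is a hypothetical interface (one string in, one string out)
    standing in for a pre-trained translation model.
    """
    keyword_set: List[str] = []
    seen = set()
    for word in source_keywords:
        translated = translate(word).strip()
        # Keep each target-language keyword once, in first-seen order.
        if translated and translated not in seen:
            seen.add(translated)
            keyword_set.append(translated)
    return keyword_set


# Toy English -> Chinese lookup used in place of a real model.
toy_dictionary = {"GPU": "GPU", "acceleration": "加速", "film": "电影",
                  "shot": "镜头", "render": "渲染", "animation": "动画"}
print(build_keyword_set(toy_dictionary, lambda w: toy_dictionary.get(w, "")))
```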
As another possible implementation, keywords in the target language may be preset directly for the target domain to generate its keyword set. For example, for the CAD domain, with English as the source language and Chinese as the target language, the keyword set [GPU, acceleration, movie, shot, rendering, animation] is generated directly from preset target-language keywords.
Step 102: retrieve according to the keywords to obtain a first corpus in the target language.
In this embodiment, once the target-language keywords are obtained, retrieval is performed according to them to obtain the first corpus in the target language. For example, retrieving with the keywords "movie" and "rendering" yields target-language texts containing both keywords, which serve as the first corpus.
In an embodiment of the present application, any number of keywords in the keyword set may be selected for retrieval, so as to obtain the first corpus of the target language.
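A minimal retrieval sketch follows, assuming the search backend can be approximated by a substring match over an in-memory list of target-language documents. The function name `retrieve_texts` and the sample sentences are hypothetical; the application does not specify the retrieval engine.

```python
from typing import Iterable, List


def retrieve_texts(keywords: Iterable[str], documents: Iterable[str]) -> List[str]:
    """Return every document that contains all of the query keywords.

    A substring match over an in-memory list stands in for the real search
    backend, which the application does not specify.
    """
    query = [k for k in keywords if k]
    if not query:
        return []  # an empty query retrieves nothing
    return [doc for doc in documents if all(k in doc for k in query)]


# Hypothetical target-language (Chinese) documents.
documents = ["这部电影的渲染使用了GPU", "电影票价上涨", "实时渲染技术"]
print(retrieve_texts(["电影", "渲染"], documents))  # ['这部电影的渲染使用了GPU']
```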
In one embodiment of the present application, word vectors of the keywords are obtained before retrieval, and the keywords are classified according to the word vectors. At least one keyword is then extracted from each resulting category, and retrieval is performed according to the extracted keywords.
Step 103: judge, according to the first corpus, whether a preset condition is met, and if it is met, translate the first corpus to obtain a second corpus in the source language.
In this embodiment, a preset condition for stopping retrieval may be set in advance; once it is met, keyword-based retrieval of corpora stops, and the first corpus is translated to obtain the second corpus in the source language.
Optionally, the preset condition is whether the number of sentences in the first corpus exceeds a preset threshold N: the number of sentences in the first corpus is obtained, and the preset condition is determined to be met if that number is greater than the threshold. For example, as the first corpus is retrieved by keyword, retrieval stops once more than N sentences have been collected. N may be 2000 or more, or may be set according to the size of the model; it is not limited here.
As an example, when the preset condition is met, the first corpus is translated to obtain the second corpus in the source language. With Chinese as the target language and English as the source language, if the first corpus retrieved by keyword contains the Chinese sentence meaning "You have texture", it is translated into the corresponding English "You've got texture" as part of the second corpus. In this way, a source-language second corpus corresponding to the in-domain target-language first corpus is obtained, and the two serve as parallel corpora for model training.
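The pairing step can be sketched as follows, assuming a hypothetical `translate_to_source` callable that wraps a target-to-source translation model. Each pair puts the machine-translated source sentence first and the retrieved target-language sentence second; building training data this way resembles what is commonly called back-translation.

```python
from typing import Callable, Iterable, List, Tuple


def build_parallel_corpus(first_corpus: Iterable[str],
                          translate_to_source: Callable[[str], str],
                          ) -> List[Tuple[str, str]]:
    """Pair each retrieved target-language sentence with its machine
    translation into the source language, as (source, target) samples."""
    pairs: List[Tuple[str, str]] = []
    for target_sentence in first_corpus:
        source_sentence = translate_to_source(target_sentence).strip()
        if source_sentence:  # skip sentences the model could not translate
            pairs.append((source_sentence, target_sentence))
    return pairs


# Toy Chinese -> English lookup standing in for a translation model.
toy_model = {"你有纹理": "You've got texture"}
print(build_parallel_corpus(["你有纹理"], lambda s: toy_model.get(s, "")))
```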
Step 104: adjust the processing parameters of the preset model according to the first corpus and the second corpus to generate a domain translation model for the target domain.
In this embodiment, the parallel corpora for model training are obtained by collecting the first corpus in the target language and translating it into the second corpus in the source language. Optionally, the preset model is a pre-trained translation model, which is fine-tuned with the first corpus and the corresponding second corpus as training samples to generate the domain translation model for the target domain.
The domain translation model is used to translate source-language text into the target language.
In an embodiment of the present application, adjusting the processing parameters of the preset model according to the first corpus and the second corpus to generate the domain translation model for the target domain includes: training the processing parameters of the preset model on the first corpus and the second corpus to generate an adjusted model, and performing model averaging on the adjusted model and the preset model to generate the domain translation model.
As an example, with the first corpus and the second corpus as training samples, training is performed starting from the preset model to generate the adjusted model. Weights are then determined for the adjusted model and the preset model, and the two models are averaged according to these weights to generate the domain translation model.
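The weighted averaging can be illustrated with a minimal sketch, assuming that both models share identical parameter names and shapes and that the two weights sum to one (`weight` for the adjusted model, `1 - weight` for the preset model). Parameters are flattened into plain lists of floats here; a real system would average framework tensors, and the application does not state how the weights are chosen.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values


def average_models(adjusted: Params, preset: Params, weight: float) -> Params:
    """Element-wise weighted average: weight * adjusted + (1 - weight) * preset."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if adjusted.keys() != preset.keys():
        raise ValueError("models must have the same parameter names")
    averaged: Params = {}
    for name, values in adjusted.items():
        base = preset[name]
        if len(values) != len(base):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        averaged[name] = [weight * a + (1.0 - weight) * b
                          for a, b in zip(values, base)]
    return averaged


print(average_models({"w": [1.0, 3.0]}, {"w": [3.0, 1.0]}, weight=0.5))
```

With `weight=1.0` the result is simply the adjusted model; smaller weights pull the domain model back toward the general-purpose preset model.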
With the domain translation processing method of this embodiment, a keyword set of the target domain comprising target-language keywords is acquired; retrieval according to the keywords yields a first corpus in the target language; whether a preset condition is met is judged according to the first corpus, and if so, the first corpus is translated to obtain a second corpus in the source language; and the processing parameters of the preset model are adjusted according to the two corpora to generate a domain translation model for the target domain. In-domain parallel corpora can thus be obtained from keywords to adjust the translation model, achieving domain-adaptive translation while reducing manual involvement and labor cost and improving processing efficiency.
Building on the foregoing embodiment, the domain translation processing method of the present application may further expand the keywords iteratively according to the corpora retrieved with them, thereby collecting more monolingual corpora in the target language.
Fig. 2 is a schematic flowchart of another domain translation processing method provided in the embodiment of the present application, and as shown in fig. 2, the method includes:
Step 201: acquire a keyword set of a target domain, wherein the keyword set comprises keywords in a target language.
Step 202: obtain word vectors of the keywords, and classify the keywords according to the word vectors.
As an example, the keywords are processed by a language model to obtain their word vectors, and the keywords are then clustered with a clustering method using the word vectors as features, so that keywords with similar attributes fall into the same category.
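The application does not name the language model or the clustering method. The sketch below substitutes plain k-means with Euclidean distance, initialised deterministically from the first `k` keywords, and uses made-up two-dimensional vectors in place of real word embeddings; `kmeans_keywords` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def kmeans_keywords(vectors: Dict[str, Sequence[float]], k: int,
                    iterations: int = 20) -> List[List[str]]:
    """Group keywords into k categories with plain k-means on word vectors."""
    words = list(vectors)
    if not 0 < k <= len(words):
        raise ValueError("k must be between 1 and the number of keywords")
    # Deterministic initialisation from the first k keywords (illustrative only).
    centroids = [list(vectors[w]) for w in words[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(iterations):
        # Assignment step: each keyword joins its nearest centroid.
        new_assignment = {
            w: min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged: no keyword changed category
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [[w for w in words if assignment[w] == c] for c in range(k)]


# Made-up 2-D vectors; real word vectors would come from a language model.
toy_vectors = {"电影": (0.0, 1.0), "GPU": (1.0, 0.0), "镜头": (0.1, 0.9),
               "渲染": (0.9, 0.2), "加速": (0.95, 0.05), "动画": (0.8, 0.3)}
print(kmeans_keywords(toy_vectors, k=2))
# [['电影', '镜头'], ['GPU', '渲染', '加速', '动画']]
```

With these toy vectors the keywords split into the same two categories as in the CAD example ([movie, shot] and [GPU, rendering, acceleration, animation]).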
Step 203: extract at least one keyword from each resulting category, and retrieve according to the extracted keywords to obtain a first corpus in the target language.
In this embodiment, at least one keyword is extracted from each category during retrieval, and retrieval is performed according to the extracted keywords to obtain the first corpus in the target language.
For example, after classification, "movie" and "shot" fall into one category, while "GPU", "acceleration", "rendering" and "animation" fall into another. Retrieving with "movie" and "shot" together may return a large amount of irrelevant text; drawing at least one keyword from each category for retrieval makes the first corpus more accurate, reduces irrelevant text and further improves translation accuracy.
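One reading of "at least one keyword from each category", consistent with the "movie" plus "rendering" example in step 102, is that each query combines one keyword from every category, so two keywords from the same category never appear together. The sketch below enumerates such combinations; `build_queries` and the `max_queries` cap are hypothetical.

```python
from itertools import islice, product
from typing import List, Sequence, Tuple


def build_queries(categories: Sequence[Sequence[str]],
                  max_queries: int = 10) -> List[Tuple[str, ...]]:
    """Form queries that take exactly one keyword from each non-empty category."""
    non_empty = [list(c) for c in categories if c]
    if not non_empty:
        return []
    # product() yields every one-per-category combination; islice caps the count.
    return list(islice(product(*non_empty), max_queries))


print(build_queries([["电影", "镜头"], ["GPU", "渲染"]]))
# [('电影', 'GPU'), ('电影', '渲染'), ('镜头', 'GPU'), ('镜头', '渲染')]
```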
Step 204: judge, according to the first corpus, whether the preset condition is met, and if it is not met, extract candidate keywords from the first corpus.
In this embodiment, the number of sentences in the first corpus is obtained, and the preset condition is determined to be met if that number is greater than the preset threshold. If the preset condition is not met, candidate keywords are extracted from the first corpus using a keyword extraction method.
It should be noted that the preset condition is not limited to the number of sentences in the first corpus; it may also take into account the number of keywords and categories. For example, if the number of keywords in the current keyword set is less than or equal to a preset number, the preset condition is determined not to be met; if it is greater than the preset number, whether the condition is met is further judged by the number of sentences. No specific limitation is imposed here.
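A sketch of the combined stop condition follows. The thresholds are parameters (the default of 2000 sentences echoes the example value of N given for step 103), and counting only distinct non-empty sentences is an added assumption, not something the application specifies; `preset_condition_met` is a hypothetical name.

```python
from typing import Sequence


def preset_condition_met(first_corpus: Sequence[str], keyword_count: int,
                         sentence_threshold: int = 2000,
                         min_keywords: int = 0) -> bool:
    """Return True when keyword-based retrieval can stop.

    Not met while the keyword set holds at most `min_keywords` keywords;
    otherwise met once the first corpus holds more than `sentence_threshold`
    distinct non-empty sentences.
    """
    if keyword_count <= min_keywords:
        return False
    distinct = {s.strip() for s in first_corpus if s.strip()}
    return len(distinct) > sentence_threshold
```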
Step 205: determine target keywords from the candidate keywords according to their term frequency and inverse document frequency, and add the target keywords to the keyword set.
In this embodiment, the text of the first corpus is obtained and candidate keywords are extracted from it. The term frequency and inverse document frequency of each candidate keyword in the first corpus are then computed. As an example, the term frequency (TF) is the number of occurrences of the candidate keyword in the text divided by the total number of words in the text, and the inverse document frequency (IDF) is the base-10 logarithm of the total number of texts in the first corpus divided by the number of texts containing the candidate keyword. The candidate keywords are screened by the product TF × IDF, those with high scores are kept as target keywords, and the target keywords are added to the keyword set. Common words are thus filtered out and important keywords retained, expanding the keyword set.
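The TF-IDF screening can be sketched as below, assuming the texts of the first corpus are already word-segmented (Chinese text would first need a segmenter; English tokens stand in here) and treating the whole first corpus as one text when computing TF, while IDF is computed over its individual texts. Every distinct word not yet in the keyword set is treated as a candidate, and `top_k` is a hypothetical cut-off for "high-weight results".

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def select_target_keywords(texts: Sequence[Sequence[str]],
                           keyword_set: Set[str],
                           top_k: int = 5) -> List[str]:
    """Rank new candidate keywords in the first corpus by TF * IDF.

    TF  = occurrences of the word / total words in the first corpus.
    IDF = log10(number of texts / number of texts containing the word).
    """
    total_words = sum(len(t) for t in texts)
    if total_words == 0:
        return []
    counts = Counter(word for t in texts for word in t)
    doc_freq = Counter(word for t in texts for word in set(t))
    scores = {
        word: (counts[word] / total_words) * math.log10(len(texts) / doc_freq[word])
        for word in counts
        if word not in keyword_set  # only words not yet in the set are candidates
    }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    # Words present in every text have IDF 0 and are dropped as too common.
    return [w for w in ranked[:top_k] if scores[w] > 0]


texts = [["gpu", "rendering", "the"],
         ["the", "texture", "rendering"],
         ["the", "skeleton", "texture"]]
print(select_target_keywords(texts, {"gpu", "rendering"}, top_k=2))
# ['skeleton', 'texture']
```

Here "the" appears in every text, so its IDF is zero and it is filtered out, which corresponds to discarding common words.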
In an embodiment of the present application, after the target keywords are added to the keyword set, steps 202 to 204 are repeated with the current keyword set until the preset condition is met, after which the first corpus is translated to obtain the second corpus in the source language.
With the domain translation processing method of this embodiment, word vectors of the keywords are obtained, the keywords are classified according to them, at least one keyword is extracted from each category, and retrieval is performed according to the extracted keywords to obtain the first corpus in the target language. This makes the first corpus more accurate and reduces irrelevant text. If the preset condition is not met, candidate keywords are extracted from the first corpus and the keyword set is expanded iteratively, so that more monolingual corpora in the target language are collected.
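The iteration of steps 202 to 205 can be sketched as a loop over injected callables. `retrieve`, `expand` and `condition` are hypothetical hooks (for example, wrappers around a search engine, a keyword extractor and the sentence-count check), and the `max_rounds` guard plus the stop-on-no-new-keywords rule are added assumptions that keep the loop from running indefinitely.

```python
from typing import Callable, List, Sequence, Set, Tuple


def collect_first_corpus(
    keywords: Sequence[str],
    retrieve: Callable[[List[str]], List[str]],
    expand: Callable[[List[str], Set[str]], List[str]],
    condition: Callable[[List[str], List[str]], bool],
    max_rounds: int = 10,
) -> Tuple[List[str], List[str], bool]:
    """Repeat retrieval and keyword expansion until the preset condition holds.

    Returns (final keyword set, first corpus, whether the condition was met).
    """
    current = list(keywords)
    corpus: List[str] = []
    for _ in range(max_rounds):
        corpus = retrieve(current)       # steps 202-203: classify and retrieve
        if condition(corpus, current):   # step 204: preset condition met
            return current, corpus, True
        # Step 205: keep only genuinely new keywords, preserving their order.
        new_words = [w for w in dict.fromkeys(expand(corpus, set(current)))
                     if w not in current]
        if not new_words:
            break                        # expansion stalled; stop early
        current.extend(new_words)
    return current, corpus, False
```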
The following description is given by way of example.
Referring to fig. 3, the source language is English, the target language is Chinese, and the target domain is CAD. The keywords GPU, acceleration, film, shot, render and animation are input in advance and translated into Chinese, giving the keywords GPU, acceleration, movie, shot, rendering and animation. Classifying the keywords by their word vectors yields two categories, [movie, shot] and [GPU, acceleration, rendering, animation]; at least one keyword is selected from each category to retrieve related corpora, yielding a first corpus in Chinese.
It is then judged that the preset condition is not met, so the keywords are expanded iteratively from the first corpus text, yielding the new keywords modeling, skeleton, texture, ray tracing, special effects and collaboration. Reclassifying by word vectors gives two categories, [movie, shot] and [GPU, acceleration, rendering, animation, modeling, skeleton, texture, ray tracing, special effects, collaboration]. At least one keyword is again selected from each category to retrieve related corpora, yielding a first corpus in Chinese. This time the preset condition is met, so the first corpus is output and translated from Chinese into English to obtain the second corpus. The translation model is adjusted according to the first corpus and the second corpus to generate a domain translation model for the CAD domain. Domain-adaptive translation is thereby achieved with less manual involvement, lower labor cost and higher processing efficiency.
In order to implement the above embodiments, the present application further provides a domain translation processing apparatus.
Fig. 4 is a schematic structural diagram of a domain translation processing apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus includes: the system comprises an acquisition module 10, a retrieval module 20, a processing module 30 and a generation module 40.
The acquiring module 10 is configured to acquire a keyword set of a target field, where the keyword set includes keywords of a target language.
And the retrieval module 20 is configured to perform retrieval according to the keyword to obtain the first corpus of the target language.
And the processing module 30 is configured to determine whether a preset condition is met according to the first corpus, and if the preset condition is met, perform translation processing according to the first corpus to obtain a second corpus of the source language.
And the generating module 40 is configured to adjust a processing parameter of a preset model according to the first corpus and the second corpus, and generate a domain translation model of the target domain.
On the basis of fig. 4, the domain translation processing apparatus shown in fig. 5 further includes: an expansion module 50 and a classification module 60.
The expansion module 50 is configured to extract a candidate keyword according to the first corpus if the preset condition is not met; and determining a target keyword from the candidate keywords according to the word frequency and the inverse text frequency index of the candidate keywords, and adding the target keyword to the keyword set.
And a classification module 60, configured to obtain a word vector of the keyword, and classify the keyword according to the word vector.
The retrieval module 20 is specifically configured to: and for each classified category, extracting at least one keyword, and searching according to the at least one keyword.
In an embodiment of the present application, the processing module 30 is specifically configured to: and obtaining the number of sentences of the first corpus, and determining that the preset condition is met if the number of sentences is greater than a preset threshold value.
In an embodiment of the present application, the generation module 40 is specifically configured to train the processing parameters of the preset model according to the first corpus and the second corpus to generate an adjusted model, and to perform model averaging on the adjusted model and the preset model to generate the domain translation model.
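Model averaging can be illustrated as an element-wise average of the two models' parameters. This sketch assumes both models share the same parameter names and shapes, represents parameters as flat lists instead of framework tensors, and adds a hypothetical weight `alpha` (0.5 gives a plain average); the text does not specify the weighting.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def average_models(adjusted: Params, preset: Params, alpha: float = 0.5) -> Params:
    """Element-wise interpolation: alpha * adjusted + (1 - alpha) * preset."""
    if adjusted.keys() != preset.keys():
        raise ValueError("models must share the same parameter names")
    averaged: Params = {}
    for name in preset:
        a, p = adjusted[name], preset[name]
        if len(a) != len(p):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        averaged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a, p)]
    return averaged


preset_model = {"w": [1.0, 2.0], "b": [0.0]}
adjusted_model = {"w": [3.0, 4.0], "b": [1.0]}
print(average_models(adjusted_model, preset_model))  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Averaging pulls the domain-adjusted parameters back toward the general-purpose preset model, which is one way to limit the loss of general translation quality after adaptation.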
The explanation of the domain translation processing method in the foregoing embodiment is also applicable to the domain translation processing apparatus in this embodiment, and will not be described herein again.
The domain translation processing apparatus of this embodiment acquires a keyword set of the target domain, where the keyword set includes keywords of the target language. It then performs retrieval according to the keywords to obtain a first corpus of the target language, determines according to the first corpus whether a preset condition is met, and, if it is, performs translation processing on the first corpus to obtain a second corpus of the source language. Finally, it adjusts the processing parameters of the preset model according to the first corpus and the second corpus to generate a domain translation model of the target domain. In this way, parallel corpora for the domain can be obtained from keywords and used to adjust the translation model, achieving domain-adaptive translation while reducing manual involvement and labor cost and improving processing efficiency.
In order to implement the foregoing embodiments, the present application further proposes a computer program product, wherein when the instructions in the computer program product are executed by a processor, the domain translation processing method according to any one of the foregoing embodiments is implemented.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for implementing the domain translation processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in Fig. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In Fig. 6, one processor 601 is taken as an example.
The memory 602 is the non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor performs the domain translation processing method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the domain translation processing method provided herein.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the domain translation processing method in the embodiments of the present application (for example, the acquisition module 10, the retrieval module 20, the processing module 30, and the generation module 40 shown in Fig. 4). The processor 601 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 602, thereby implementing the domain translation processing method in the above method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the domain translation processing method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, parallel corpora for the target domain are obtained from domain keywords and used to adjust the translation model, which achieves domain-adaptive translation, reduces manual involvement and labor cost, and improves processing efficiency.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A domain translation processing method is characterized by comprising the following steps:
acquiring a keyword set of a target domain, wherein the keyword set comprises keywords of a target language;
searching according to the keywords to obtain a first corpus of the target language;
judging whether a preset condition is met or not according to the first corpus, and if the preset condition is met, performing translation processing according to the first corpus to obtain a second corpus of the source language;
and adjusting the processing parameters of a preset model according to the first corpus and the second corpus to generate a domain translation model of the target domain.
2. The method according to claim 1, wherein after determining whether a predetermined condition is satisfied according to the first corpus, the method further comprises:
if the preset condition is not met, extracting candidate keywords according to the first corpus;
and determining a target keyword from the candidate keywords according to the term frequency and inverse document frequency of the candidate keywords, and adding the target keyword to the keyword set.
3. The method of claim 1, prior to retrieving based on the keyword, further comprising:
obtaining word vectors of the keywords, and classifying the keywords according to the word vectors;
wherein the searching according to the keywords comprises:
extracting at least one keyword from each classified category, and searching according to the at least one keyword.
4. The method according to claim 1, wherein said determining whether a predetermined condition is satisfied according to the first corpus comprises:
obtaining the number of sentences in the first corpus, and determining that the preset condition is met if the number of sentences is greater than a preset threshold.
5. The method according to claim 1, wherein the generating a domain translation model of the target domain by adjusting processing parameters of a preset model according to the first corpus and the second corpus comprises:
training the processing parameters of the preset model according to the first corpus and the second corpus to generate an adjusted model;
and performing model averaging on the adjusted model and the preset model to generate the domain translation model.
6. A domain translation processing apparatus, comprising:
an acquisition module, used for acquiring a keyword set of a target domain, wherein the keyword set comprises keywords of a target language;
the retrieval module is used for retrieving according to the keywords to obtain a first corpus of the target language;
the processing module is used for judging whether a preset condition is met or not according to the first corpus, and if the preset condition is met, performing translation processing according to the first corpus to obtain a second corpus of the source language;
and the generating module is used for adjusting the processing parameters of a preset model according to the first corpus and the second corpus and generating a domain translation model of the target domain.
7. The apparatus of claim 6, further comprising:
the expansion module is used for extracting candidate keywords according to the first corpus if the preset condition is not met;
and determining a target keyword from the candidate keywords according to the term frequency and inverse document frequency of the candidate keywords, and adding the target keyword to the keyword set.
8. The apparatus of claim 6, further comprising:
the classification module is used for acquiring word vectors of the keywords and classifying the keywords according to the word vectors;
the retrieval module is specifically configured to:
extract at least one keyword from each classified category, and search according to the at least one keyword.
9. The apparatus of claim 6, wherein the processing module is specifically configured to:
obtain the number of sentences in the first corpus, and determine that the preset condition is met if the number of sentences is greater than a preset threshold.
10. The apparatus of claim 6, wherein the generation module is specifically configured to:
train the processing parameters of the preset model according to the first corpus and the second corpus to generate an adjusted model;
and perform model averaging on the adjusted model and the preset model to generate the domain translation model.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the domain translation processing method of any of claims 1-5.
12. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the domain translation processing method according to any one of claims 1 to 5.
CN201911352107.1A 2019-12-25 2019-12-25 Domain translation processing method, device and equipment Active CN111126087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911352107.1A CN111126087B (en) 2019-12-25 2019-12-25 Domain translation processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911352107.1A CN111126087B (en) 2019-12-25 2019-12-25 Domain translation processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN111126087A true CN111126087A (en) 2020-05-08
CN111126087B CN111126087B (en) 2023-08-29

Family

ID=70502178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911352107.1A Active CN111126087B (en) 2019-12-25 2019-12-25 Domain translation processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN111126087B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090192781A1 (en) * 2008-01-30 2009-07-30 At&T Labs System and method of providing machine translation from a source language to a target language
CN106126505A (en) * 2016-06-20 2016-11-16 清华大学 Parallel phrase learning method and device
CN106484681A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 A kind of method generating candidate's translation, device and electronic equipment
CN108255940A (en) * 2017-12-08 2018-07-06 北京搜狗科技发展有限公司 A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN108460027A (en) * 2018-02-14 2018-08-28 广东外语外贸大学 A kind of spoken language instant translation method and system
CN109271644A (en) * 2018-08-16 2019-01-25 北京紫冬认知科技有限公司 A kind of translation model training method and device
US20190236147A1 (en) * 2018-01-26 2019-08-01 Samsung Electronics Co., Ltd. Machine translation method and apparatus
CN110334360A (en) * 2019-07-08 2019-10-15 腾讯科技(深圳)有限公司 Machine translation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111126087B (en) 2023-08-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant