CN115600611A - Training data management method, training data management apparatus, and readable storage medium - Google Patents

Info

Publication number
CN115600611A
Authority
CN
China
Prior art keywords
data set
training data
target
determining
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211291124.0A
Other languages
Chinese (zh)
Inventor
李志伟
邢俊文
柳晓
张旭敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Tranx Information Technology Shenzhen Co ltd
Original Assignee
New Tranx Information Technology Shenzhen Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a training data management method, a training data management device and a computer-readable storage medium. The method comprises: determining a target reference model according to a first selected operation received on a model selection interface, and acquiring a training data set corresponding to the target reference model; determining a target training data set corresponding to the target reference model according to a second selected operation received by a selection interface of the training data set; determining a test data set according to the target training data set; and training the target reference model based on the target training data set and testing the trained target reference model based on the test data set. This solves the technical problem that test data determined by conventional methods lacks objectivity and therefore yields inaccurate test results, and achieves the technical effects of obtaining high-quality test data and accurately evaluating a translation model.

Description

Training data management method, training data management apparatus, and readable storage medium
Technical Field
The present application relates to the field of translation model technology, and in particular, to a training data management method, a training data management device, and a computer-readable storage medium.
Background
The construction of a translation model is generally divided into two main steps: training and testing. The training step learns statistical knowledge from a corpus and trains the model parameters; the testing step uses selected test data to check, after training, how accurately the model's output matches the reference translations in the test data.
In the related art, the quality of a translation model generally depends on its ability to generalize to new samples, and a technician needs to determine test data in order to evaluate the translation model. After a batch of corpus data is obtained, research and development personnel generally divide it into two parts according to a random proportion, one part used as training data and the other as test data.
However, test data determined by this conventional method lacks objectivity, which leads to inaccurate test results.
Disclosure of Invention
By providing the training data management method, the training data management device and the computer readable storage medium, the technical problem that a test result is inaccurate due to poor objectivity of test data determined by a conventional method in the related art is solved, and the technical effects of obtaining high-quality test data and accurately evaluating a translation model are achieved.
The embodiment of the application provides a training data management method, which comprises the following steps:
when a creating instruction of a training task is received, outputting a model selection interface corresponding to the training task;
determining a target reference model based on a first selected operation received by the model selection interface, and acquiring a training data set corresponding to the target reference model;
outputting a selection interface of the training data set, and determining a target training data set corresponding to the target reference model based on a second selection operation received by the selection interface;
determining a test data set according to the target training data set;
and training the target reference model based on the target training data set, and testing the trained target reference model based on the test data set.
Optionally, before the step of determining a test data set according to the target training data set, the method further includes:
when receiving an association processing instruction, outputting a selection interface of the test data set;
determining a preset number and a name of the test data set based on a third selected operation received by the selection interface of the test data set;
the step of determining a test data set from the target training data set comprises:
randomly selecting the preset number of training data in the target training data set to form the test data set, and naming the test data set according to the name to determine the test data set;
after the step of determining a test data set from the target training data set, the method further comprises:
storing the test data set in association with the target training data set.
Optionally, before the step of determining a test data set according to the target training data set, the method further includes:
when it is detected that the target training data set has no associated test data set, receiving the association processing instruction, and executing the step of outputting the selection interface of the test data set when the association processing instruction is received;
when it is detected that the target training data set has an associated test data set, the step of determining the test data set according to the target training data set comprises:
acquiring the test data set associated with the target training data set.
Optionally, the step of randomly selecting the preset number of training data includes:
determining a field type and sentence pair length corresponding to the association processing instruction;
and randomly extracting the training data matched with the field type and the sentence pair length in the target training data set according to the preset number.
Optionally, the step of outputting a selection interface of the training data set, and determining a target training data set corresponding to the target reference model based on a second selected operation received by the selection interface includes:
outputting a selection interface of the training data set, wherein the language type of the training data set is the same as that of the target reference model;
when the second selected operation is received, determining a field type and a training set name corresponding to the second selected operation;
and determining the target training data set according to the field type and the training set name.
Optionally, the step of determining a target reference model based on the first selected operation received by the model selection interface and acquiring a training data set corresponding to the target reference model includes:
determining a model name, a language type and a field type corresponding to the first selected operation;
and determining the target reference model matched with the model name, the language type and the field type in the model selection interface.
Optionally, after the steps of training the target reference model based on the target training data set and testing the trained target reference model based on the test data set, the method further includes:
outputting the tested target reference model, naming the target reference model and storing the target reference model in a memory library;
when a query instruction of translation data is received, outputting a translation data selection interface corresponding to the query instruction;
determining target translation data based on a fourth selected operation received by the translation data selection interface, and outputting a detail interface of the target translation data;
and performing corresponding data processing operation on the target translation data based on the operation instruction received by the detail interface.
Optionally, the step of performing corresponding data processing operation on the target translation data based on the operation instruction received by the detail interface includes at least one of:
when a data insertion instruction is received, determining a reference sentence pair and content to be inserted corresponding to the data insertion instruction, and inserting the content to be inserted after the reference sentence pair;
when a data merging instruction is received, determining a target sentence pair corresponding to the data merging instruction, and merging the target sentence pair into a sentence pair;
when a data import instruction is received, determining a data file to be imported corresponding to the data import instruction, and updating the target translation data according to the data file to be imported.
In addition, the present application also provides a training data management device, which includes a memory, a processor, and a training data management program stored on the memory and operable on the processor, and when the processor executes the training data management program, the steps of the training data management method described above are implemented.
Furthermore, the present application also proposes a computer-readable storage medium having stored thereon a training data management program which, when executed by a processor, implements the steps of the training data management method as described above.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
1. A target reference model is determined according to a first selected operation received on a model selection interface, and a training data set corresponding to the target reference model is acquired; a target training data set corresponding to the target reference model is determined according to a second selected operation received by a selection interface of the training data set; a test data set is determined according to the target training data set; and the target reference model is trained based on the target training data set and the trained target reference model is tested based on the test data set. This effectively solves the technical problem that test data determined by conventional methods lacks objectivity and yields inaccurate test results, and achieves the technical effects of obtaining high-quality test data and accurately evaluating a translation model.
2. The preset number and the name of the test data set are determined based on a third selected operation received by the selection interface of the test data set; the preset number of training data are randomly selected from the target training data set to form the test data set, which is named according to the name; and the test data set is stored in association with the target training data set. This effectively solves the technical problem that test data determined by conventional methods lacks objectivity and yields inaccurate test results, and achieves the technical effects of obtaining high-quality test data and accurately evaluating a translation model.
Drawings
FIG. 1 is a schematic flowchart of a training data management method according to a first embodiment of the present application;
FIG. 2 is a flowchart illustrating a second embodiment of a training data management method according to the present application;
FIG. 3 is a flowchart illustrating a third embodiment of a training data management method of the present application;
FIG. 4 is a schematic diagram of the hardware structure of an embodiment of the training data management apparatus of the present application.
Detailed Description
In the related art, after a batch of corpus data is obtained, research and development personnel usually divide the corpus data into two parts according to a random proportion, one part used as training data and the other as test data. However, test data determined in this conventional way lacks objectivity, which leads to inaccurate test results. The embodiment of the application adopts the following main technical scheme: determining a target reference model according to a first selected operation received on a model selection interface, and acquiring a training data set corresponding to the target reference model; determining a target training data set corresponding to the target reference model according to a second selected operation received by a selection interface of the training data set; determining a test data set according to the target training data set; and training the target reference model based on the target training data set and testing the trained target reference model based on the test data set. The technical effects of obtaining high-quality test data and accurately evaluating the translation model are thereby achieved.
In order to better understand the above technical solutions, exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
The embodiment of the application discloses a training data management method, and with reference to fig. 1, the training data management method includes:
step S10, when a creating instruction of a training task is received, outputting a model selection interface corresponding to the training task;
step S20, determining a target reference model based on a first selection operation received by the model selection interface, and acquiring a training data set corresponding to the target reference model;
In this embodiment, the creation instruction is a click instruction on the "new task" key received by a client interface; the model selection interface is a selection window for reference models; the first selected operation is a selection instruction, received by the model selection interface, for filtering reference models by attribute; and the domain type of the training data set is the same as the domain type of the reference model.
Optionally, step S20 includes:
step S21, determining a model name, a language type and a field type corresponding to the first selected operation;
step S22, determining the target reference model matched with the model name, the language type and the field type in the model selection interface.
As an optional implementation manner, when a client homepage interface receives a creation instruction of a training task, a corresponding model selection interface is displayed with a reference model list; and determining a target reference model based on the first selected operation received by the model selection interface, and acquiring a training data set corresponding to the target reference model.
Illustratively, after the "new training task" key is triggered, the client jumps to a model selection page that displays the available reference models. The reference models can be filtered by the model name, source language, target language and field type corresponding to the received selected operation. After an instruction selecting a reference model is received, the training data sets with the same language type as the selected target reference model are screened out.
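The screening of reference models described in steps S21 and S22 can be illustrated with a minimal sketch. The `ReferenceModel` record, the `filter_models` function and the sample model names are hypothetical and do not come from the application; the sketch also assumes that the model name is matched as a case-insensitive substring, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceModel:
    # Hypothetical record for one entry of the reference model list.
    name: str
    source_lang: str
    target_lang: str
    domain: str


def filter_models(models, name=None, source_lang=None, target_lang=None, domain=None):
    """Return the models that satisfy every criterion that was supplied.

    A criterion left as None is not applied (assumption: unset filters match all).
    """
    def matches(m):
        return (
            (name is None or name.lower() in m.name.lower())
            and (source_lang is None or m.source_lang == source_lang)
            and (target_lang is None or m.target_lang == target_lang)
            and (domain is None or m.domain == domain)
        )

    return [m for m in models if matches(m)]


models = [
    ReferenceModel("general-zh-en", "zh", "en", "general"),
    ReferenceModel("bio-zh-en", "zh", "en", "biology"),
    ReferenceModel("news-en-zh", "en", "zh", "news"),
]

# First selected operation: source language, target language and field type.
candidates = filter_models(models, source_lang="zh", target_lang="en", domain="biology")
```

Keeping each criterion optional mirrors a filter form in which the user may fill in only some of the fields.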
Step S30, outputting a selection interface of the training data set, and determining a target training data set corresponding to the target reference model based on a second selection operation received by the selection interface;
In this embodiment, the selection interface of the training data set shows the data sets available for training, and the target training data set is the training data set that is determined according to the selected operation and used for training the target reference model.
Optionally, step S30 includes:
step S31, outputting a selection interface of the training data set, wherein the language type of the training data set is the same as that of the target reference model;
step S32, when the second selected operation is received, determining a field type and a training set name corresponding to the second selected operation;
step S33, determining the target training data set according to the field type and the training set name.
In this embodiment, the language types are divided into a source language and a target language, and the same language type means that the source language is the same, and the target language is the same. The domain type is a domain in which the training data or the reference model is applied, such as a general domain, a biological domain, a chemical domain, a mathematical domain, a news domain, and the like, and is not particularly limited herein.
As an optional implementation manner, the training data sets with the same language type as the target reference model are determined, and a corresponding selection interface is output for them; when a second selected operation is received, the field type and training set name corresponding to the second selected operation are determined; the training data sets that meet these conditions are screened out according to the specified field type and training set name, and the target training data set is determined among the screened training data sets according to a received click instruction.
Exemplarily, after the target reference model is determined, the training data sets with the same source language and the same target language are determined, and the client jumps to the selection interface of the training data set; the training data sets that meet the conditions are screened out according to the training set name and field type corresponding to the second selected operation; and the corresponding target training data set is determined according to the click instruction received on the selection interface.
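Steps S31 to S33 can be sketched as two filters applied in sequence: first by language type, then by field type and training set name. The `TrainingSet` record and both function names are hypothetical, and the final click is modeled simply as taking the single match, which is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSet:
    # Hypothetical record for one training data set.
    name: str
    source_lang: str
    target_lang: str
    domain: str


def sets_for_model(training_sets, model_source, model_target):
    """Step S31: list only sets whose source AND target language match the model."""
    return [
        t for t in training_sets
        if (t.source_lang, t.target_lang) == (model_source, model_target)
    ]


def select_target_set(listed, domain, name):
    """Steps S32-S33: narrow the listed sets by field type and training set name.

    Returns the matching set, or None when nothing matches.
    """
    matches = [t for t in listed if t.domain == domain and t.name == name]
    return matches[0] if matches else None


training_sets = [
    TrainingSet("bio-corpus-2022", "zh", "en", "biology"),
    TrainingSet("news-corpus", "zh", "en", "news"),
    TrainingSet("bio-corpus-2022", "en", "zh", "biology"),
]

listed = sets_for_model(training_sets, "zh", "en")
target = select_target_set(listed, "biology", "bio-corpus-2022")
```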
Step S40, determining a test data set according to the target training data set;
step S50, training the target reference model based on the target training data set, and testing the trained target reference model based on the test data set.
In this embodiment, the test data set is used for testing the trained reference model, where the test data set may be a test data set associated with the target training data set, or a test data set selected according to a selected operation received by a selection interface of the test data set.
As an optional implementation manner, after the target training data set is determined, when the "test set associated with the training set" button on the selection interface of the training data set is triggered, the test data set associated with the target training data set is determined; when the "start training" key is triggered, the reference model is trained with the target training data set, and the trained reference model is tested with the test data set.
As another optional implementation, after the target training data set is determined, when the "next" key on the selection interface of the training data set is triggered, the test data sets with the same language type as the target reference model are determined, and the selection interface of the test data set is output; a target test data set is then determined according to the selected operation received by the selection interface of the test data set.
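The two ways of obtaining a test data set in steps S40 and S50 (using the associated set, or using one the user picked) can be sketched as a single function. Everything here is hypothetical: `run_training_task`, the `associations` mapping, and the stub `fake_train` and `fake_evaluate` callables stand in for training and scoring routines that the application does not describe.

```python
def run_training_task(model, training_set_name, associations, train, evaluate,
                      chosen_test_set=None):
    """Resolve the test set, then train and test the model.

    chosen_test_set: a test set picked on the test set selection interface.
    associations:    mapping from training set name to its associated test set.
    """
    # Prefer the explicitly chosen test set; otherwise use the associated one.
    test_set = chosen_test_set
    if test_set is None:
        test_set = associations.get(training_set_name)
    if test_set is None:
        raise LookupError(f"no test set available for {training_set_name!r}")

    trained = train(model, training_set_name)   # step S50, training
    score = evaluate(trained, test_set)         # step S50, testing
    return trained, score


# Stub callables so the sketch runs; real ones would train and score a model.
def fake_train(model, training_set_name):
    return f"{model}+{training_set_name}"


def fake_evaluate(trained_model, test_set):
    return len(test_set)


associations = {"bio-zh-en-train": ["pair-1", "pair-2"]}
trained, score = run_training_task(
    "bio-zh-en", "bio-zh-en-train", associations, fake_train, fake_evaluate
)
```

Passing the training and evaluation routines in as arguments keeps the data-management logic independent of any particular translation framework.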
The technical scheme in the embodiment of the application at least has the following technical effects or advantages:
A target reference model is determined according to a first selected operation received on a model selection interface, and a training data set corresponding to the target reference model is acquired; a target training data set corresponding to the target reference model is determined according to a second selected operation received by a selection interface of the training data set; a test data set is determined according to the target training data set; and the target reference model is trained based on the target training data set and the trained target reference model is tested based on the test data set. This effectively solves the technical problem that test data determined by conventional methods lacks objectivity and yields inaccurate test results, and achieves the technical effects of obtaining high-quality test data and accurately evaluating a translation model.
Example two
Based on the first embodiment, the second embodiment of the present application discloses a training data management method, and referring to fig. 2, before step S40, the method further includes:
step S210, outputting a selection interface of the test data set when receiving an association processing instruction;
step S220, determining a preset number and the name of the test data set based on a third selected operation received by the selection interface of the test data set;
optionally, step S40 includes:
step S230, randomly selecting the preset number of training data in the target training data set to form the test data set, and naming the test data set according to the name to determine the test data set;
optionally, after step S40, the method further includes:
step S240, storing the test data set and the target training data set in association.
In this embodiment, the association processing instruction is the instruction received by the display interface of the training data set when the "preprocessing" key is triggered.
As an optional implementation manner, when the display interface of a training data set receives an association processing instruction, the target training data set corresponding to the association processing instruction is determined and the selection interface of the test data set is output, on which the name, screening manner and preset number of the test data set can be set through the received third selected operation; the preset number and the name of the test data set are determined according to the third selected operation; when the "start preprocessing" key is triggered, the preset number of training data are randomly selected from the target training data set to form a test data set, which is named with the name corresponding to the third selected operation; after the test data set is determined, it is stored in association with the target training data set.
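Steps S230 and S240 amount to drawing a fixed number of items at random, naming the result and recording the association. The following is a minimal sketch using Python's standard `random` module; the function names, the dictionary layout and the `seed` parameter are assumptions for illustration. The source does not say whether the selected items are removed from the training data set, so the sketch leaves the training data untouched.

```python
import random


def build_test_set(training_pairs, preset_number, name, seed=None):
    """Step S230: randomly pick `preset_number` items and name the result."""
    if preset_number > len(training_pairs):
        raise ValueError("preset number exceeds the size of the training data set")
    rng = random.Random(seed)  # seed is optional; it only makes runs repeatable
    return {"name": name, "pairs": rng.sample(training_pairs, preset_number)}


def store_association(associations, training_set_name, test_set):
    """Step S240: keep every test set built for a training set, oldest first."""
    associations.setdefault(training_set_name, []).append(test_set)


training_pairs = [(f"source sentence {i}", f"target sentence {i}") for i in range(10)]
associations = {}

test_set = build_test_set(training_pairs, preset_number=3, name="bio-test-v1", seed=0)
store_association(associations, "bio-corpus-2022", test_set)
```

`random.Random.sample` draws without replacement, so no sentence pair appears twice in the test data set.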
Optionally, before step S40, the method further includes:
step S250, when it is detected that the target training data set has no associated test data set, receiving the association processing instruction, and executing the step of outputting the selection interface of the test data set when the association processing instruction is received;
step S260, when it is detected that the target training data set has an associated test data set, the step of determining the test data set according to the target training data set includes:
acquiring the test data set associated with the target training data set.
As an alternative embodiment, after the target training data set is determined on the selection interface of the training data set, the selection interface pops up a "test set associated with the training set" button; after detecting that the button is triggered, whether the target training data set has an associated test data set is queried.
Illustratively, when no associated test data set exists, the client jumps to the display interface of the target training data set; when it is detected that the "preprocessing" key is triggered, the association processing instruction is received, and the step of outputting the selection interface of the test data set when the association processing instruction is received is executed.
Illustratively, when it is detected that the target training data set has an associated test data set, the test data set associated with the target training data set is acquired. The target training data set can be preprocessed multiple times, and the associated test data set is the one obtained by the most recent preprocessing.
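The branch between steps S250 and S260 can be sketched as a lookup that returns the most recent associated test data set, or signals that preprocessing is needed. The `resolve_test_set` function and the shape of `associations` (a mapping from training set name to a list of test sets in creation order) are hypothetical.

```python
def resolve_test_set(associations, training_set_name):
    """Return the latest associated test set, or None if preprocessing is needed.

    associations maps a training set name to its test sets, oldest first,
    so the last element is the one produced by the most recent preprocessing.
    """
    history = associations.get(training_set_name, [])
    return history[-1] if history else None


associations = {
    "bio-corpus-2022": [
        {"name": "bio-test-v1", "pairs": []},
        {"name": "bio-test-v2", "pairs": []},
    ],
}

latest = resolve_test_set(associations, "bio-corpus-2022")   # step S260
missing = resolve_test_set(associations, "news-corpus")      # step S250 applies
```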
Optionally, step S230 includes:
step S231, determining a field type and a sentence pair length corresponding to the association processing instruction;
step S232, according to the preset number, randomly extracting the training data matched with the field type and the sentence pair length in the target training data set.
As an optional implementation manner, determining a target training data set corresponding to the association processing instruction, and determining a field type and a sentence pair length of the target training data set; and randomly extracting training data in the target training data set according to the preset number corresponding to the third selected operation, wherein the extracted training data is matched with the field type and the sentence pair length.
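Steps S231 and S232 can be sketched as filtering before sampling. The application does not define how sentence pair length is measured, so this sketch assumes it is the character count of both the source and the target sentence, bounded by a hypothetical `min_len` and `max_len`. The `SentencePair` record and `sample_matching` function are likewise illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SentencePair:
    # Hypothetical record: one source sentence, its translation and its field type.
    source: str
    target: str
    domain: str


def sample_matching(pairs, domain, min_len, max_len, preset_number, seed=None):
    """Randomly draw `preset_number` pairs that match the field type and length.

    Assumption: length is the character count, checked on both sides of the pair.
    """
    eligible = [
        p for p in pairs
        if p.domain == domain
        and min_len <= len(p.source) <= max_len
        and min_len <= len(p.target) <= max_len
    ]
    if len(eligible) < preset_number:
        raise ValueError("not enough matching sentence pairs")
    return random.Random(seed).sample(eligible, preset_number)


pairs = [
    SentencePair("细胞分裂", "cell division", "biology"),
    SentencePair("酶", "enzyme", "biology"),           # source too short for min_len=2
    SentencePair("股市上涨", "stocks rose", "news"),     # wrong field type
    SentencePair("蛋白质折叠", "protein folding", "biology"),
]

sampled = sample_matching(pairs, "biology", min_len=2, max_len=20, preset_number=2, seed=1)
```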
The technical scheme in the embodiment of the application at least has the following technical effects or advantages:
The preset number and the name of the test data set are determined based on a third selected operation received by the selection interface of the test data set; the preset number of training data are randomly selected from the target training data set to form the test data set, which is named according to the name; and the test data set is stored in association with the target training data set. This effectively solves the technical problem that test data determined by conventional methods lacks objectivity and yields inaccurate test results, and achieves the technical effects of obtaining high-quality test data and accurately evaluating a translation model.
Example three
Based on the first embodiment, the third embodiment of the present application discloses a training data management method, and with reference to fig. 3, after step S50, the method further includes:
step S310, outputting the tested target reference model, naming the target reference model and storing the named target reference model in a memory library;
As an optional implementation manner, after the target reference model is tested, if the test result is higher than an expected value, it is determined that the training result of the target reference model meets a preset requirement, and the target reference model is named and then stored in the memory library.
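The acceptance check in step S310 can be sketched as a threshold comparison followed by a named store. The metric behind the test result is not specified in the application, so `score` and `expected` are treated here as plain numbers; `store_if_passing` and the dictionary used as the memory library are hypothetical.

```python
def store_if_passing(memory_library, model_name, model, score, expected):
    """Store the model under `model_name` only if its score exceeds `expected`.

    Returns True when the model was stored, False otherwise.
    """
    if score > expected:
        memory_library[model_name] = model
        return True
    return False


memory_library = {}
stored = store_if_passing(memory_library, "bio-zh-en-v2", object(), score=0.42, expected=0.35)
rejected = store_if_passing(memory_library, "bio-zh-en-v3", object(), score=0.30, expected=0.35)
```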
Step S320, when a query instruction of the translation data is received, outputting a translation data selection interface corresponding to the query instruction;
step S330, determining target translation data based on a fourth selection operation received by the translation data selection interface, and outputting a detail interface of the target translation data;
In this embodiment, the memory library stores the tested target reference model; the translation data selection interface displays the target reference model and can output the target reference models that meet the screening condition according to a received screening instruction. The target translation data is the selected target reference model, and the detail interface lists the translation data corresponding to the target reference model, such as the original text and the translated text.
As an optional implementation manner, when it is detected that the "memory library" key is triggered, the translation data selection interface is output, which displays the tested target reference model together with its attribute information, such as name, language type, domain type, creator and creation time. Based on a fourth selected operation received by the translation data selection interface, the screening condition corresponding to the fourth selected operation is determined, and the target reference models meeting the screening condition are screened out; after it is detected that the information bar of a target reference model is clicked, that target reference model is determined to be the target translation data, and the client jumps to the detail interface of the target translation data, which displays the translation data of the target reference model.
Step S340, performing corresponding data processing operation on the target translation data based on the operation instruction received by the detail interface.
Optionally, step S340 includes at least one of:
step S341, when a data insertion instruction is received, determining a reference sentence pair and a content to be inserted corresponding to the data insertion instruction, and inserting the content to be inserted after the reference sentence pair;
As an optional implementation manner, when the detail interface detects that the "insert" key is triggered, the reference sentence pair is determined, and the content to be inserted corresponding to the data insertion instruction is inserted after the reference sentence pair.
Illustratively, if the list number of the reference sentence pair is 7, then after the content to be inserted is added to the detail interface, its list number is 8, and the list numbers of the subsequent sentence pairs continue from there.
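The insertion in step S341 can be sketched with list slicing, using the 1-based list numbers from the example above. The `insert_after` function and the tuple representation of a sentence pair are hypothetical.

```python
def insert_after(pairs, reference_number, new_pair):
    """Insert `new_pair` right after the pair whose 1-based list number is given.

    Returns a new list; list numbers are simply positions, so every later
    pair moves down by one.
    """
    if not 1 <= reference_number <= len(pairs):
        raise IndexError("reference list number out of range")
    return pairs[:reference_number] + [new_pair] + pairs[reference_number:]


pairs = [(f"src{i}", f"tgt{i}") for i in range(1, 10)]   # list numbers 1..9
result = insert_after(pairs, 7, ("new src", "new tgt"))

# List number 8 (index 7) now holds the inserted pair.
inserted_number = result.index(("new src", "new tgt")) + 1
```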
Step S342, when a data merging instruction is received, determining target sentence pairs corresponding to the data merging instruction, and merging the target sentence pairs into one sentence pair;
As an optional implementation manner, when the detail interface detects that the "merge" key is triggered, the target sentence pairs to be merged are determined and merged into one sentence pair, wherein the original-text sentences of the target sentence pairs are combined into a first paragraph, the translations of the target sentence pairs are combined into a second paragraph, the first paragraph serves as the original text of the merged sentence pair, and the second paragraph serves as the translation of the merged sentence pair.
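By way of illustration only, the merging of step S342 might be sketched as follows. The function name `merge_pairs`, the tuple representation of a sentence pair, the use of a space as the separator, and the choice to keep the merged pair at the position of the first selected pair are all assumptions of this sketch.

```python
SentencePair = tuple[str, str]  # assumed shape: (original text, translation)


def merge_pairs(pairs: list[SentencePair], numbers: list[int],
                separator: str = " ") -> list[SentencePair]:
    """Merge the sentence pairs with the given 1-based list numbers into one
    pair, placed where the first selected pair used to be."""
    selected = sorted(set(numbers))
    if not selected or selected[0] < 1 or selected[-1] > len(pairs):
        raise ValueError("selected list numbers are outside the list")
    # Original texts form the first paragraph, translations the second.
    first_paragraph = separator.join(pairs[n - 1][0] for n in selected)
    second_paragraph = separator.join(pairs[n - 1][1] for n in selected)
    merged: list[SentencePair] = []
    for number, pair in enumerate(pairs, start=1):
        if number == selected[0]:
            merged.append((first_paragraph, second_paragraph))
        elif number not in selected:
            merged.append(pair)
    return merged


demo = [("A.", "a."), ("B.", "b."), ("C.", "c.")]
print(merge_pairs(demo, [1, 2]))
```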
Step S343, when a data import instruction is received, determining a data file to be imported corresponding to the data import instruction, and updating the target translation data according to the data file to be imported.
As an optional implementation manner, when the detail interface detects that the "import" key is triggered, a bilingual file to be imported is determined, and the target translation data is updated according to the original text and the translated text contained in the bilingual file.
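By way of illustration only, the import of step S343 might be sketched as follows. The present application does not specify the format of the bilingual file or the update rule; this sketch assumes a tab-separated file with one "original text, translation" pair per line, and assumes that an imported pair replaces the translation of an existing pair with the same original text and is otherwise appended. The name `import_bilingual` is hypothetical.

```python
import csv
import io

SentencePair = tuple[str, str]  # assumed shape: (original text, translation)


def import_bilingual(pairs: list[SentencePair], file_obj) -> list[SentencePair]:
    """Return updated translation data after reading a tab-separated
    bilingual file (assumed format: original text <TAB> translation)."""
    updated = list(pairs)
    index_by_source = {source: i for i, (source, _) in enumerate(updated)}
    for row in csv.reader(file_obj, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        source, target = row
        if source in index_by_source:
            # Assumed rule: an imported pair overrides the stored translation.
            updated[index_by_source[source]] = (source, target)
        else:
            index_by_source[source] = len(updated)
            updated.append((source, target))
    return updated


stored = [("Hello", "Bonjour")]
bilingual_file = io.StringIO("Hello\tSalut\nThanks\tMerci\n")
result = import_bilingual(stored, bilingual_file)
print(result)
```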
The technical solution in the embodiments of the present application has at least the following technical effects or advantages:
The tested target reference model is stored in the memory library, so that the tested target reference model can be managed through the page of the memory library, where the target reference model can be added, deleted, modified and queried; in addition, a bilingual file to be imported can be received and used to update the target reference model. This effectively solves the technical problem in the related art that a tested translation model must be retrained whenever it is optimized and iterated, so that the translation model can be optimized and iterated efficiently and the time cost of maintaining the translation model is greatly reduced.
The present application further provides a training data management apparatus. Referring to fig. 4, fig. 4 is a schematic structural diagram of the training data management apparatus in a hardware operating environment according to an embodiment of the present application.
As shown in fig. 4, the training data management apparatus may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 4 does not constitute a limitation of the training data management apparatus, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
Optionally, the memory 1005 is electrically connected to the processor 1001, and the processor 1001 may be configured to control operations of the memory 1005 and read data in the memory 1005 to implement training data management.
Optionally, as shown in fig. 4, the memory 1005, as a storage medium, may include an operating system, a data storage module, a network communication module, a user interface module, and a training data management program.
Optionally, in the training data management apparatus shown in fig. 4, the network interface 1004 is mainly used for data communication with other apparatuses; the user interface 1003 is mainly used for data interaction with a user; and the processor 1001 and the memory 1005 may both be provided in the training data management apparatus.
As shown in fig. 4, the training data management apparatus calls, through the processor 1001, the training data management program stored in the memory 1005, and performs the following operations of the training data management method provided by the embodiments of the present application:
when a creating instruction of a training task is received, outputting a model selection interface corresponding to the training task;
determining a target reference model based on a first selected operation received by the model selection interface, and acquiring a training data set corresponding to the target reference model;
outputting a selection interface of the training data set, and determining a target training data set corresponding to the target reference model based on a second selected operation received by the selection interface;
determining a test data set according to the target training data set;
and training the target reference model based on the target training data set, and testing the trained target reference model based on the test data set.
Optionally, the processor 1001 may call the training data management program stored in the memory 1005, and further perform the following operations:
outputting a selection interface of the test data set when an association processing instruction is received;
determining a preset number and a name of the test data set based on a third selected operation received by the selection interface of the test data set;
the step of determining a test data set from the target training data set comprises:
randomly selecting the preset number of training data in the target training data set to form the test data set, and naming the test data set according to the name to determine the test data set;
after the step of determining a test data set from the target training data set, the method further comprises:
storing the test data set in association with the target training data set.
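By way of illustration only, the random selection, naming and associated storage described above might be sketched as follows. The names `build_test_set` and `associations`, the dictionary layout of a test data set, and the optional `seed` parameter are assumptions of this sketch. The sketch leaves the selected items in the target training data set, since the present application does not state whether they are removed.

```python
import random

SentencePair = tuple[str, str]  # assumed shape: (original text, translation)


def build_test_set(training_set: list[SentencePair], preset_number: int,
                   name: str, seed=None) -> dict:
    """Randomly select `preset_number` items of the training data set and
    return them as a named test data set."""
    if not 0 < preset_number <= len(training_set):
        raise ValueError("preset_number must be between 1 and the set size")
    rng = random.Random(seed)  # a seed makes the selection repeatable
    # random.sample draws without replacement, so no item is picked twice.
    return {"name": name, "pairs": rng.sample(training_set, preset_number)}


# Hypothetical store mapping a training data set name to its test data set.
associations: dict[str, dict] = {}

training = [(f"source {n}", f"target {n}") for n in range(100)]
test_set = build_test_set(training, preset_number=10, name="demo-test", seed=0)
associations["demo-train"] = test_set
print(test_set["name"], len(test_set["pairs"]))
```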
Optionally, the processor 1001 may call the training data management program stored in the memory 1005, and further perform the following operations:
when it is detected that no test data set is associated with the target training data set, receiving the association processing instruction, and executing the step of outputting the selection interface of the test data set when the association processing instruction is received;
when it is detected that a test data set associated with the target training data set exists, the step of determining the test data set according to the target training data set comprises:
acquiring the test data set associated with the target training data set.
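By way of illustration only, the two branches above (reuse an associated test data set if one exists, otherwise create one) might be sketched as follows. The name `resolve_test_set` is hypothetical, and the callback `create_test_set` stands in for the whole interactive flow of outputting the selection interface and receiving the third selected operation.

```python
from typing import Callable

Record = dict  # assumed shape of a test data set, e.g. {"name": ..., "pairs": [...]}


def resolve_test_set(training_set_name: str, associations: dict[str, Record],
                     create_test_set: Callable[[], Record]) -> Record:
    """Return the test data set associated with the training data set,
    creating and storing one only when no association exists."""
    existing = associations.get(training_set_name)
    if existing is not None:
        return existing
    created = create_test_set()
    associations[training_set_name] = created
    return created


associations = {"train-a": {"name": "test-a", "pairs": []}}
calls = []


def fake_selection_flow() -> Record:
    # Stand-in for the user-driven creation of a new test data set.
    calls.append("created")
    return {"name": "test-b", "pairs": []}


reused = resolve_test_set("train-a", associations, fake_selection_flow)
created = resolve_test_set("train-b", associations, fake_selection_flow)
print(reused["name"], created["name"], calls)
```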
Optionally, the processor 1001 may call the training data management program stored in the memory 1005, and further perform the following operations:
determining a field type and sentence pair length corresponding to the association processing instruction;
and randomly extracting the training data matched with the field type and the sentence pair length in the target training data set according to the preset number.
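By way of illustration only, the filtered random extraction might be sketched as follows. The present application does not define how the sentence pair length is measured; this sketch assumes it is the number of whitespace-separated words in the original text and that the instruction specifies an inclusive range. The names `TrainingItem` and `sample_matching` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingItem:
    source: str      # original text
    target: str      # translation
    field_type: str  # e.g. "legal", "medical"


def sample_matching(items: list[TrainingItem], field_type: str,
                    min_words: int, max_words: int,
                    preset_number: int, seed=None) -> list[TrainingItem]:
    """Randomly extract `preset_number` items that match the field type and
    whose source length (assumed: word count) lies in the given range."""
    candidates = [
        item for item in items
        if item.field_type == field_type
        and min_words <= len(item.source.split()) <= max_words
    ]
    if len(candidates) < preset_number:
        raise ValueError("not enough matching training data")
    return random.Random(seed).sample(candidates, preset_number)


items = [
    TrainingItem("The contract is void", "t1", "legal"),             # 4 words
    TrainingItem("Sign here", "t2", "legal"),                        # 2 words
    TrainingItem("Void", "t3", "legal"),                             # 1 word
    TrainingItem("Take two tablets daily", "t4", "medical"),         # other field
    TrainingItem("The court adjourned the hearing", "t5", "legal"),  # 5 words
]
picked = sample_matching(items, "legal", 2, 5, preset_number=2, seed=0)
print([item.source for item in picked])
```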
Optionally, the processor 1001 may call the training data management program stored in the memory 1005, and further perform the following operations:
outputting a selection interface of the training data set, wherein the language type of the training data set is the same as that of the target reference model;
when the second selected operation is received, determining a field type and a training set name corresponding to the second selected operation;
and determining the target training data set according to the field type and the training set name.
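By way of illustration only, the two stages above (listing only training data sets whose language type matches the target reference model, then resolving the selection by field type and training set name) might be sketched as follows. The names `TrainingSetInfo`, `list_selectable` and `pick_target`, and the sample language codes, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetInfo:
    name: str
    language_type: str  # e.g. "zh-en"
    field_type: str     # e.g. "legal"


def list_selectable(sets: list[TrainingSetInfo],
                    model_language_type: str) -> list[TrainingSetInfo]:
    """Training data sets shown on the selection interface: only those with
    the same language type as the target reference model."""
    return [s for s in sets if s.language_type == model_language_type]


def pick_target(selectable: list[TrainingSetInfo], field_type: str,
                name: str) -> TrainingSetInfo:
    """Resolve the second selected operation to one training data set."""
    for candidate in selectable:
        if candidate.field_type == field_type and candidate.name == name:
            return candidate
    raise LookupError("no training data set matches the selection")


all_sets = [
    TrainingSetInfo("contracts-2022", "zh-en", "legal"),
    TrainingSetInfo("contracts-2022", "en-zh", "legal"),
    TrainingSetInfo("leaflets", "zh-en", "medical"),
]
shown = list_selectable(all_sets, "zh-en")
target = pick_target(shown, "legal", "contracts-2022")
print(target)
```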
Optionally, the processor 1001 may call the training data management program stored in the memory 1005, and further perform the following operations:
determining a model name, a language type and a field type corresponding to the first selected operation;
and determining the target reference model matched with the model name, the language type and the field type in the model selection interface.
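By way of illustration only, matching the first selected operation against the models shown on the model selection interface might be sketched as a lookup keyed by model name, language type and field type. The names `ModelKey` and `find_target_model`, and the model identifiers in the catalog, are hypothetical placeholders.

```python
from typing import NamedTuple


class ModelKey(NamedTuple):
    model_name: str
    language_type: str
    field_type: str


def find_target_model(catalog: dict[ModelKey, str], selection: ModelKey) -> str:
    """Return the identifier of the reference model that matches all three
    attributes of the first selected operation."""
    if selection not in catalog:
        raise LookupError(f"no reference model matches {selection}")
    return catalog[selection]


# Hypothetical catalog: attribute triple -> model identifier.
catalog = {
    ModelKey("base-v1", "zh-en", "legal"): "model-001",
    ModelKey("base-v1", "zh-en", "medical"): "model-002",
    ModelKey("base-v2", "en-zh", "legal"): "model-003",
}
target_model = find_target_model(catalog, ModelKey("base-v1", "zh-en", "medical"))
print(target_model)
```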
Optionally, the processor 1001 may call the training data management program stored in the memory 1005, and further perform the following operations:
outputting the tested target reference model, naming the target reference model and storing the named target reference model in a memory library;
when a query instruction of translation data is received, outputting a translation data selection interface corresponding to the query instruction;
determining target translation data based on a fourth selected operation received by the translation data selection interface, and outputting a detail interface of the target translation data;
and performing corresponding data processing operation on the target translation data based on the operation instruction received by the detail interface.
Optionally, the processor 1001 may call the training data management program stored in the memory 1005, and further perform the following operations:
when a data insertion instruction is received, determining a reference sentence pair and content to be inserted corresponding to the data insertion instruction, and inserting the content to be inserted after the reference sentence pair;
when a data merging instruction is received, determining a target sentence pair corresponding to the data merging instruction, and merging the target sentence pair into a sentence pair;
when a data import instruction is received, determining a data file to be imported corresponding to the data import instruction, and updating the target translation data according to the data file to be imported.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a training data management program is stored on the computer-readable storage medium, and when executed by a processor, the training data management program implements the relevant steps of any embodiment of the training data management method described above.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A training data management method, characterized by comprising:
when a creating instruction of a training task is received, outputting a model selection interface corresponding to the training task;
determining a target reference model based on a first selected operation received by the model selection interface, and acquiring a training data set corresponding to the target reference model;
outputting a selection interface of the training data set, and determining a target training data set corresponding to the target reference model based on a second selected operation received by the selection interface;
determining a test data set according to the target training data set;
training the target reference model based on the target training data set, and testing the trained target reference model based on the test data set.
2. The training data management method of claim 1 wherein said step of determining a test data set from said target training data set is preceded by the steps of:
outputting a selection interface of the test data set when an association processing instruction is received;
determining a preset number and a name of the test data set based on a third selected operation received by the selection interface of the test data set;
the step of determining a test data set from the target training data set comprises:
randomly selecting the preset number of training data in the target training data set to form the test data set, and naming the test data set according to the name to determine the test data set;
after the step of determining a test data set from the target training data set, the method further comprises:
storing the test data set in association with the target training data set.
3. The training data management method of claim 2, wherein said step of determining a test data set from said target training data set is preceded by the step of:
when it is detected that no test data set is associated with the target training data set, receiving the association processing instruction, and executing the step of outputting the selection interface of the test data set when the association processing instruction is received;
when it is detected that a test data set associated with the target training data set exists, the step of determining a test data set according to the target training data set comprises:
acquiring the test data set associated with the target training data set.
4. The training data management method according to claim 2, wherein the step of randomly selecting the preset number of training data comprises:
determining a field type and sentence pair length corresponding to the association processing instruction;
and randomly extracting the training data matched with the field type and the sentence pair length in the target training data set according to the preset number.
5. The training data management method of claim 1, wherein the step of outputting a selection interface for the training data set and determining a target training data set corresponding to the target reference model based on a second selected operation received by the selection interface comprises:
outputting a selection interface of the training data set, wherein the language type of the training data set is the same as that of the target reference model;
when the second selected operation is received, determining a field type and a training set name corresponding to the second selected operation;
and determining the target training data set according to the field type and the training set name.
6. The method for managing training data according to claim 1, wherein the step of determining a target reference model based on the first selected operation received by the model selection interface and acquiring the training data set corresponding to the target reference model comprises:
determining a model name, a language type and a field type corresponding to the first selected operation;
and determining the target reference model matched with the model name, the language type and the field type in the model selection interface.
7. The training data management method of claim 1, wherein after the steps of training the target reference model based on the target training data set and testing the trained target reference model based on the test data set, further comprising:
outputting the tested target reference model, naming the target reference model and storing the named target reference model in a memory library;
when a query instruction of translation data is received, outputting a translation data selection interface corresponding to the query instruction;
determining target translation data based on a fourth selected operation received by the translation data selection interface, and outputting a detail interface of the target translation data;
and performing corresponding data processing operation on the target translation data based on the operation instruction received by the detail interface.
8. The training data management method according to claim 7, wherein the step of performing corresponding data processing operations on the target translation data based on the operation instructions received by the detail interface includes at least one of:
when a data insertion instruction is received, determining a reference sentence pair and content to be inserted corresponding to the data insertion instruction, and inserting the content to be inserted after the reference sentence pair;
when a data merging instruction is received, determining a target sentence pair corresponding to the data merging instruction, and merging the target sentence pair into a sentence pair;
when a data import instruction is received, determining a data file to be imported corresponding to the data import instruction, and updating the target translation data according to the data file to be imported.
9. A training data management apparatus comprising a memory, a processor and a training data management program stored on the memory and executable on the processor, the processor implementing the steps of the training data management method according to any one of claims 1 to 8 when executing the training data management program.
10. A computer-readable storage medium, having stored thereon a training data management program which, when executed by a processor, implements the steps of the training data management method according to any one of claims 1 to 8.
CN202211291124.0A 2022-10-21 2022-10-21 Training data management method, training data management apparatus, and readable storage medium Pending CN115600611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211291124.0A CN115600611A (en) 2022-10-21 2022-10-21 Training data management method, training data management apparatus, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211291124.0A CN115600611A (en) 2022-10-21 2022-10-21 Training data management method, training data management apparatus, and readable storage medium

Publications (1)

Publication Number Publication Date
CN115600611A true CN115600611A (en) 2023-01-13

Family

ID=84849870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211291124.0A Pending CN115600611A (en) 2022-10-21 2022-10-21 Training data management method, training data management apparatus, and readable storage medium

Country Status (1)

Country Link
CN (1) CN115600611A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination