CN116796859A - Training method and device for sequence-to-sequence model, electronic equipment and medium

Training method and device for sequence-to-sequence model, electronic equipment and medium

Info

Publication number
CN116796859A
CN116796859A
Authority
CN
China
Prior art keywords
data
model
training
target
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310991099.5A
Other languages
Chinese (zh)
Inventor
张俊锋
李登高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianren Healthcare Big Data Technology Co Ltd
Original Assignee
Lianren Healthcare Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianren Healthcare Big Data Technology Co Ltd
Priority to CN202310991099.5A
Publication of CN116796859A
Legal status: Pending


Landscapes

  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a training method and device for a sequence-to-sequence model, electronic equipment and a storage medium. The method parses pre-stored sample data into semi-structured data according to a preset format, wherein the semi-structured data may include resource name data, resource attribute data and attribute content data; determines source data and target data in the semi-structured data, and determines a training corpus for a model to be trained based on the source data and the target data, the model to be trained being a sequence-to-sequence model; and trains the model to be trained based on the training corpus to obtain a target model. The technical scheme provided by the embodiment of the invention avoids losing the structural relations between the text information in the training samples, thereby improving the effectiveness and accuracy of the target model.

Description

Training method and device for sequence-to-sequence model, electronic equipment and medium
Technical Field
The embodiment of the invention relates to the technical field of machine learning, and in particular to a training method and device for a sequence-to-sequence model, electronic equipment and a storage medium.
Background
The Seq2Seq (Sequence to Sequence, sequence-to-sequence) model is widely used in the field of natural language processing. In the training process of a Seq2Seq model, unstructured data must be extracted from a large amount of sample data to serve as the training corpus, and the training of the Seq2Seq model is then completed based on that training corpus.
However, in the process of implementing the present invention, it was found that the prior art has at least the following technical problem: when unstructured data is used as the training corpus, the structural relations between the text information in the training samples are easily lost, so that the effectiveness of the Seq2Seq model obtained through training is poor.
Disclosure of Invention
The embodiment of the invention provides a training method and device for a sequence-to-sequence model, electronic equipment and a storage medium, so as to improve the effectiveness and accuracy of the target model.
According to an aspect of the present invention, there is provided a training method of a sequence-to-sequence model, including:
parsing pre-stored sample data into semi-structured data according to a preset format; wherein the semi-structured data may include resource name data, resource attribute data, and attribute content data;
determining source data and target data in the semi-structured data, and determining training corpus of a model to be trained based on the source data and the target data; the model to be trained is a sequence-to-sequence model;
and training the model to be trained based on the training corpus to obtain a target model.
According to another aspect of the present invention, there is provided a training apparatus for a sequence-to-sequence model, the apparatus comprising:
the data parsing module is used for parsing pre-stored sample data into semi-structured data according to a preset format; wherein the semi-structured data may include resource name data, resource attribute data, and attribute content data;
the training corpus determining module is used for determining source data and target data in the semi-structured data and determining training corpus of a model to be trained based on the source data and the target data; the model to be trained is a sequence-to-sequence model;
and the model training module is used for training the model to be trained based on the training corpus so as to obtain a target model.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the sequence-to-sequence model training method of any of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a training method of a sequence-to-sequence model according to any of the embodiments of the present invention when executed.
According to the technical scheme of the embodiment of the invention, the pre-stored sample data is parsed into semi-structured data according to a preset format, wherein the semi-structured data may include resource name data, resource attribute data and attribute content data; source data and target data are determined in the semi-structured data, and a training corpus for the model to be trained is determined based on the source data and the target data, the model to be trained being a sequence-to-sequence model; the model to be trained is then trained based on the training corpus to obtain the target model. This solves the problem in the prior art that training on unstructured data loses the structural relations between the text information in the training samples, and achieves the effect of improving the effectiveness and accuracy of the target model.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a training method for a sequence-to-sequence model provided in accordance with an embodiment of the present invention;
FIG. 2 is a diagram showing an example of semi-structured data provided in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training apparatus for sequence-to-sequence modeling according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device implementing a training method of a sequence-to-sequence model according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of a training method for a sequence-to-sequence model according to an embodiment of the present invention. This embodiment is applicable to training a sequence-to-sequence model based on semi-structured data; the method may be performed by a sequence-to-sequence model training device, which may be implemented in hardware and/or software.
As shown in fig. 1, the method of this embodiment may specifically include:
s110, analyzing the pre-stored sample data into semi-structured data according to a preset format.
The pre-stored sample data may be knowledge-encyclopedia data, for example a drug instruction sheet. The semi-structured data may include resource name data, resource attribute data, and attribute content data. For example, the semi-structured data may be internet web page data comprising a title, a body, and subtitles within the body; it may also be literature or books with a hierarchical structure such as a chapter catalogue. The content of the semi-structured data can be abstracted into three parts: resource name data, resource attribute data, and attribute content data.
Illustratively, the semi-structured data may be drug instruction data. As shown in fig. 2, "disease", "drug", "adverse reaction", "instructions for use", "clinical manifestation" and "treatment mode" are leaf nodes composed of text, "treatment drug" is the central node, and each edge reflects the structural relation between the text of a leaf node and the text of the central node; the central node may serve as the resource name data in the semi-structured data. Compared with unstructured data, semi-structured data preserves the structural relations between texts, and the information it reflects is more comprehensive.
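Purely for illustration, such a structure could be held in memory as a central node with labelled edges to its text leaf nodes; the field names and placeholder contents below are assumptions, not part of the patent:

```python
# Illustrative sketch of the Fig. 2 structure: a central node with labelled
# edges to text leaf nodes, preserving the structural relations that flat
# unstructured text would lose. All contents are placeholders.
drug_record = {
    "resource_name": "treatment drug",   # central node
    "edges": {                           # edge label -> leaf-node text
        "disease": "...",
        "adverse reaction": "...",
        "instructions for use": "...",
        "clinical manifestation": "...",
        "treatment mode": "...",
    },
}
```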
In this embodiment, the preset format may be a table format in which the resource type, resource name, resource attribute and attribute content serve as the header; the preset format may also be set to a chart format or a text format, which is not limited by this embodiment.
In a specific implementation, the data belonging to the resource type, resource name, resource attribute and attribute content can be extracted from the sample data, and the resource type data, resource name data, resource attribute data and attribute content data are stored according to the preset format, thereby completing the parsing of the sample data.
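For concreteness, a minimal parsing sketch is given below; the record layout, field names and assumed input shape are chosen for illustration and are not the patent's implementation:

```python
from dataclasses import dataclass

@dataclass
class SemiStructuredRow:
    """One row of the preset table format whose header is
    (resource type, resource name, resource attribute, attribute content)."""
    resource_type: str       # e.g. "drug"
    resource_name: str       # e.g. the name on a drug instruction sheet
    resource_attribute: str  # e.g. "adverse reaction"
    attribute_content: str   # the descriptive text for that attribute

def parse_sample(sample: dict) -> list[SemiStructuredRow]:
    # Assumed raw shape: {"type": ..., "name": ..., "attributes": {attr: text}}
    return [
        SemiStructuredRow(sample["type"], sample["name"], attribute, content)
        for attribute, content in sample.get("attributes", {}).items()
    ]
```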
In this embodiment, before parsing the pre-stored sample data into semi-structured data according to the preset format, the method further includes: performing a data cleaning operation on the sample data.
In order to ensure the training effect of the model to be trained, the sample data used to generate the training corpus can be cleaned before the training corpus is generated. The data cleaning operation comprises at least one of data deduplication, data splitting and special-identifier removal. Data deduplication ensures that there is no repeated data in the training corpus; removing special identifiers deletes useless information from the sample data, ensuring the effectiveness of the training result.
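A minimal sketch of such a cleaning pass, assuming plain-text samples; the regular expression standing in for the "special identifiers" is a placeholder assumption:

```python
import re

def clean_samples(samples: list[str]) -> list[str]:
    """Data cleaning sketch: special-identifier removal plus deduplication.
    The pattern below is an assumed stand-in for the special identifiers."""
    seen, cleaned = set(), []
    for text in samples:
        text = re.sub(r"\[\d+\]|&nbsp;", " ", text)  # strip assumed markers
        text = " ".join(text.split())                # normalise whitespace
        if text and text not in seen:                # deduplicate
            seen.add(text)
            cleaned.append(text)
    return cleaned
```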
S120, determining source data and target data in the semi-structured data, and determining training corpus of the model to be trained based on the source data and the target data.
The model to be trained is a sequence-to-sequence (Sequence to Sequence, Seq2Seq) model; for example, the sequence-to-sequence model includes a T5 model. The source data and the target data in the semi-structured data may be determined as follows: extract at least one of the resource type, resource name and resource attribute data as the source data, and extract the attribute content data as the target data; alternatively, extract the attribute content data as the source data, and extract at least one of the resource type, resource name and resource attribute data as the target data.
In this embodiment, determining the source data and the target data in the semi-structured data includes: determining auxiliary data in the semi-structured data; combining the resource name data, the resource attribute data and the auxiliary data according to a first preset composition mode to obtain first input data; determining attribute content data corresponding to the first input data as first standard data; the source data and the target data are determined based on the first input data and the first standard data.
It should be noted that the semi-structured data may include auxiliary data, where the auxiliary data may include terms such as "includes", "belongs to" and "is", and may also be null. In this embodiment, the resource name data, the resource attribute data and the auxiliary data may be combined according to the first preset composition mode to obtain the first input data, and the first input data is determined to be the source data. The attribute content data corresponding to the first input data is determined to be the first standard data, and the first standard data is taken as the target data.
The first preset composition mode may be "[resource name data][resource attribute data][auxiliary data]", for example "the treatment mode of ELLP syndrome includes"; the first preset composition mode can be determined by a person skilled in the art according to the actual application situation, and is not limited by this embodiment.
Optionally, before determining the first input data as the source data, the method further includes: combining the attribute content data and the auxiliary data according to a second preset composition mode to obtain second input data; combining the resource name data and the resource attribute data according to a third preset composition mode to obtain second standard data; determining source data and target data based on the first input data and the first standard data, comprising: determining the first input data and the second input data as source data; the first standard data and the second standard data are determined as target data corresponding to the source data.
In a specific implementation, the source data and the target data may be expanded. Specifically, the attribute content data and the auxiliary data may be combined according to the second preset composition mode to obtain the second input data, and the resource name data and the resource attribute data may be combined according to the third preset composition mode to obtain the second standard data. The second preset composition mode may be "[attribute content data][auxiliary data]"; the third preset composition mode may be "[resource attribute data] of [resource name data]".
When determining the source data and the target data, the first input data and the second input data may be used as the source data, and the first standard data and the second standard data determined as the target data. This completes the expansion of the source data and the target data, and makes the trained model treat the source data and the target data as semantically equivalent.
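A minimal sketch of how the three composition modes could yield expanded source/target pairs; the template strings and the helper name are assumptions chosen to mirror the bracketed modes above:

```python
def build_pairs(resource_name: str, resource_attribute: str,
                attribute_content: str, auxiliary: str = "includes"):
    """Illustrative composition of source/target pairs; a different auxiliary
    word (e.g. "belongs to") may better suit the reversed direction."""
    # First preset composition: resource name + resource attribute + auxiliary
    first_input = f"the {resource_attribute} of {resource_name} {auxiliary}"
    first_standard = attribute_content               # first standard data
    # Second preset composition: attribute content + auxiliary
    second_input = f"{attribute_content} {auxiliary}"
    # Third preset composition: [resource attribute data] of [resource name data]
    second_standard = f"the {resource_attribute} of {resource_name}"
    # Source data = both inputs; target data = the corresponding standards.
    return [(first_input, first_standard), (second_input, second_standard)]
```

For instance, build_pairs("ELLP syndrome", "treatment mode", "[laser therapy], [drug therapy]") echoes the text pair in the example given below.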
In this embodiment, determining the training corpus of the model to be trained based on the source data and the target data includes: based on the structural relation of the semi-structured data, text pairs comprising source data and target data corresponding to the source data are constructed, and at least one text pair forms a training corpus.
For example, the source data may be "the treatment mode of ELLP syndrome includes", and in the structural relation of the semi-structured data the target data corresponding to this source data is "[laser therapy], [drug therapy]"; "the treatment mode of ELLP syndrome includes" and "[laser therapy], [drug therapy]" constitute a text pair, and at least one such text pair constitutes the training corpus.
S130, training the model to be trained based on the training corpus to obtain a target model.
In a specific implementation, the source data can be input into the model to be trained as input data to obtain the output data of the model to be trained, and the output data is compared with the target data to complete the training of the model to be trained and obtain the target model.
Specifically, training the model to be trained based on the training corpus to obtain the target model includes: inputting the source data in the training corpus into the model to be trained to obtain output data corresponding to the source data; correcting the model parameters of the model to be trained based on the output data and the target data corresponding to the source data; and training with convergence of the loss function of the model to be trained as the training target, so as to obtain the target model.
The loss function includes a cross-entropy loss function. Training with convergence of the loss function of the model to be trained as the training target to obtain the target model includes: training with convergence of the cross-entropy loss function of the model to be trained as the training target, so as to obtain the target model.
The training process is illustrated by taking a T5 model as the model to be trained. In the training process, corresponding input identifiers can be generated for the source data and target identifiers for the target data; the source data and the input identifiers are input into the T5 model, and a query vector, an attribute-name vector and a value vector are determined based on the self-attention mechanism of the T5 model. The query vector represents the question posed by the user, including the resource name; the attribute-name vector represents the attribute name corresponding to the question; the value vector represents the attribute content. The hidden-layer weights are then obtained through the weights of the self-attention mechanism and decoded by the decoding module to generate output data; the model to be trained is trained based on the output data, the target data and the target identifiers, and the model parameters are optimized by back-propagating the loss function, so that the target model is obtained through training.
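As a hedged illustration of this loop, the sketch below fine-tunes a small public T5 checkpoint with the Hugging Face transformers library; the checkpoint name, learning rate and the toy `pairs` corpus are assumptions, not the patent's actual setup:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")   # assumed checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # assumed lr

pairs = [("the treatment mode of ELLP syndrome includes",
          "[laser therapy], [drug therapy]")]          # toy (source, target) corpus

model.train()
for source, target in pairs:
    inputs = tokenizer(source, return_tensors="pt")            # input identifiers
    labels = tokenizer(target, return_tensors="pt").input_ids  # target identifiers
    loss = model(**inputs, labels=labels).loss  # cross-entropy against labels
    loss.backward()                             # back-propagate the loss
    optimizer.step()
    optimizer.zero_grad()
```

In practice the corpus would be batched and iterated over until the cross-entropy loss converges, which is the training target described above.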
According to the technical scheme of the embodiment of the invention, the pre-stored sample data is parsed into semi-structured data according to a preset format, wherein the semi-structured data may include resource name data, resource attribute data and attribute content data; source data and target data are determined in the semi-structured data, and a training corpus for the model to be trained is determined based on the source data and the target data, the model to be trained being a sequence-to-sequence model; the model to be trained is then trained based on the training corpus to obtain the target model. This solves the problem in the prior art that training on unstructured data loses the structural relations between the text information in the training samples, and achieves the effect of improving the effectiveness and accuracy of the target model.
The foregoing describes the embodiments of the training method for a sequence-to-sequence model in detail. To make the technical solution of the method clearer to those skilled in the art, a specific application scenario is given below.
In this embodiment, encyclopedia knowledge is used as the sample data, and the sample data is parsed into semi-structured data; the parsed semi-structured data is shown in Table 1:
TABLE 1
An attribute name in Table 1 may be followed by several sub-attribute names, shown in the form [ ], and each sub-attribute name is in turn followed by a detailed textual content description. Attribute names may use [ ] to identify hierarchical relations between attributes, e.g. treatment mode [medication]. A multi-value attribute is identified by appending [n] to the attribute name, where n represents a positive integer.
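For example (illustrative values only), this notation could yield attribute names such as the following:

```python
# Assumed examples of the attribute-name notation described above:
# "[ ]" marks a sub-attribute level, "[n]" the n-th value of a
# multi-value attribute.
attribute_names = [
    "treatment mode[medication]",  # hierarchical sub-attribute
    "adverse reaction[1]",         # first value of a multi-value attribute
    "adverse reaction[2]",         # second value of the same attribute
]
```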
Further, a training corpus is generated based on the parsed semi-structured data. Generating the training corpus comprises the following steps:
1. Form a text according to the composition form "[resource name][attribute name][auxiliary word]" and use it as the source data, with the corresponding content as the target data; the source data and the target data form a text pair;
2. Form a text according to the composition form "[content][auxiliary word]" and use it as the source data; form a text according to the composition form "[resource name][attribute name]" and use it as the target data; the source data and the target data form a text pair. The auxiliary words include "including", "belonging to" and "is"; the auxiliary word may also be null.
3. The text pairs together form the training corpus. The text pairs of source data and target data are shown in Table 2.
TABLE 2
The model to be trained is then trained on the training corpus as follows: during training, corresponding input identifiers can be generated for the source data and target identifiers for the target data; the source data and the input identifiers are input into the model to be trained, the hidden-layer weights are obtained through the weights of the self-attention mechanism and decoded by the decoding module to generate output data; the model to be trained is trained based on the output data, the target data and the target identifiers, and the model parameters are optimized by back-propagating the loss function, so that the target model is obtained through training.
Fig. 3 is a schematic structural diagram of a training apparatus for a sequence-to-sequence model according to an embodiment of the present invention; the apparatus is used to perform the training method for a sequence-to-sequence model of any of the foregoing embodiments. The apparatus and the training method for a sequence-to-sequence model in the above embodiments belong to the same inventive concept; for details not covered in this apparatus embodiment, reference may be made to the embodiments of the training method. As shown in fig. 3, the apparatus includes:
the data parsing module 10 is configured to parse the pre-stored sample data into semi-structured data according to a preset format; wherein the semi-structured data may include resource name data, resource attribute data, and attribute content data;
the training corpus determining module 11 is configured to determine source data and target data in the semi-structured data, and determine a training corpus of a model to be trained based on the source data and the target data; the model to be trained is a sequence-to-sequence model;
the model training module 12 is configured to train the model to be trained based on the training corpus to obtain the target model.
On the basis of any optional technical scheme in the embodiment of the present invention, optionally, the training corpus determining module 11 includes:
an auxiliary data determining unit for determining auxiliary data in the semi-structured data;
the first combination unit is used for combining the resource name data, the resource attribute data and the auxiliary data according to a first preset composition mode to obtain first input data;
a first standard data determining unit configured to determine first standard data based on attribute content data corresponding to the first input data;
and a target data determining unit configured to determine source data and target data based on the first input data and the first standard data.
On the basis of any optional technical scheme in the embodiment of the present invention, optionally, the corpus determining module 11 further includes:
the second combination unit is used for combining the attribute content data and the auxiliary data according to a second preset composition mode before the first input data is determined to be the source data, so as to obtain second input data;
the third combination unit is used for combining the resource name data and the resource attribute data according to a third preset composition mode to obtain second standard data;
a target data determination unit comprising:
a first determination subunit configured to determine the first input data and the second input data as source data;
and a second determination subunit configured to determine the first standard data and the second standard data as target data corresponding to the source data.
On the basis of any optional technical scheme in the embodiment of the present invention, optionally, the corpus determining module 11 includes:
the text pair construction unit is used for constructing text pairs containing source data and target data corresponding to the source data based on the structural relation of the semi-structured data, and at least one text pair forms a training corpus.
On the basis of any optional technical scheme of the embodiment of the present invention, optionally, the model training module 12 includes:
the output data determining unit is used for inputting the source data in the training corpus into the model to be trained to obtain output data corresponding to the source data;
the model parameter correction unit is used for correcting model parameters of the model to be trained based on the output data and target data corresponding to the source data;
the first target model determining unit is used for converging a loss function in the model to be trained as a training target and training to obtain a target model.
On the basis of any optional technical scheme in the embodiment of the invention, optionally, the loss function comprises a cross entropy loss function; model training module 12, comprising:
and the second target model determining unit is used for converging the cross entropy loss function in the model to be trained as a training target and training to obtain a target model.
On the basis of any optional technical scheme in the embodiment of the invention, optionally, the device further comprises:
the data cleaning module is used for performing a data cleaning operation on the sample data before the pre-stored sample data is parsed into semi-structured data according to the preset format; the data cleaning operation comprises at least one of data deduplication, data splitting and special-identifier removal.
According to the technical scheme of the embodiment of the invention, the pre-stored sample data is parsed into semi-structured data according to a preset format, wherein the semi-structured data may include resource name data, resource attribute data and attribute content data; source data and target data are determined in the semi-structured data, and a training corpus for the model to be trained is determined based on the source data and the target data, the model to be trained being a sequence-to-sequence model; the model to be trained is then trained based on the training corpus to obtain the target model. This solves the problem in the prior art that training on unstructured data loses the structural relations between the text information in the training samples, and achieves the effect of improving the effectiveness and accuracy of the target model.
It should be noted that, in the embodiment of the training device for a sequence-to-sequence model, the units and modules included are divided only according to functional logic, but the division is not limited thereto, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for distinguishing them from each other and are not used to limit the protection scope of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device implementing a training method of a sequence-to-sequence model according to an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 20 includes at least one processor 21, and a memory, such as a read-only memory (ROM) 22, a random access memory (RAM) 23, etc., communicatively connected to the at least one processor 21, wherein the memory stores a computer program executable by the at least one processor, and the processor 21 can perform various suitable actions and processes according to the computer program stored in the ROM 22 or the computer program loaded from the storage unit 28 into the RAM 23. In the RAM 23, various programs and data required for the operation of the electronic device 20 may also be stored. The processor 21, the ROM 22 and the RAM 23 are connected to each other via a bus 24. An input/output (I/O) interface 25 is also connected to the bus 24.
Various components in the electronic device 20 are connected to the I/O interface 25, including: an input unit 26 such as a keyboard, a mouse, etc.; an output unit 27 such as various types of displays, speakers, and the like; a storage unit 28 such as a magnetic disk, an optical disk, or the like; and a communication unit 29 such as a network card, modem, wireless communication transceiver, etc. The communication unit 29 allows the electronic device 20 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 21 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 21 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 21 performs the various methods and processes described above, such as a sequence-to-sequence model training method.
In some embodiments, the training method of the sequence-to-sequence model may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 28. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 20 via the ROM 22 and/or the communication unit 29. When the computer program is loaded into the RAM 23 and executed by the processor 21, one or more steps of the sequence-to-sequence model training method described above may be performed. Alternatively, in other embodiments, the processor 21 may be configured to perform the training method of the sequence-to-sequence model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility of traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of training a sequence-to-sequence model, comprising:
parsing pre-stored sample data into semi-structured data according to a preset format; wherein the semi-structured data may include resource name data, resource attribute data, and attribute content data;
determining source data and target data in the semi-structured data, and determining training corpus of a model to be trained based on the source data and the target data; the model to be trained is a sequence-to-sequence model;
and training the model to be trained based on the training corpus to obtain a target model.
2. The method of claim 1, wherein determining source data and target data in the semi-structured data comprises:
determining auxiliary data in the semi-structured data;
combining the resource name data, the resource attribute data and the auxiliary data according to a first preset composition mode to obtain first input data;
determining first standard data according to the attribute content data corresponding to the first input data;
the source data and the target data are determined based on the first input data and the first standard data.
3. The method of claim 2, wherein prior to determining the first input data as the source data, further comprising:
combining the attribute content data and the auxiliary data according to a second preset composition mode to obtain second input data;
combining the resource name data and the resource attribute data according to a third preset composition mode to obtain second standard data;
the determining the source data and the target data based on the first input data and the first standard data includes:
determining the first input data and the second input data as the source data;
the first standard data and the second standard data are determined as the target data corresponding to the source data.
4. The method of claim 1, wherein the determining a training corpus of a model to be trained based on the source data and the target data comprises:
and constructing text pairs containing the source data and target data corresponding to the source data based on the structural relation of the semi-structured data, and forming the training corpus by at least one text pair.
5. The method according to claim 1, wherein the training the model to be trained based on the training corpus to obtain a target model comprises:
inputting the source data in the training corpus into the model to be trained to obtain output data corresponding to the source data;
correcting model parameters of the model to be trained based on the output data and target data corresponding to the source data;
and converging the loss function in the model to be trained as a training target, and training to obtain the target model.
6. The method of claim 5, wherein the loss function comprises a cross entropy loss function; the step of converging the loss function in the model to be trained as a training target, and training to obtain the target model comprises the following steps:
and converging the cross entropy loss function in the model to be trained as the training target, and training to obtain the target model.
7. The method of claim 1, further comprising, prior to parsing the pre-stored sample data into semi-structured data according to a preset format:
performing data cleaning operation on the sample data; the data cleaning operation comprises at least one operation of data deduplication, data splitting and special identification removal.
8. A training apparatus for a sequence-to-sequence model, comprising:
the data parsing module is used for parsing pre-stored sample data into semi-structured data according to a preset format; wherein the semi-structured data may include resource name data, resource attribute data, and attribute content data;
the training corpus determining module is used for determining source data and target data in the semi-structured data and determining training corpus of a model to be trained based on the source data and the target data; the model to be trained is a sequence-to-sequence model;
and the model training module is used for training the model to be trained based on the training corpus so as to obtain a target model.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the sequence-to-sequence model training method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to implement the sequence-to-sequence model training method of any one of claims 1-7 when executed.
CN202310991099.5A 2023-08-08 2023-08-08 Training method and device for sequence-to-sequence model, electronic equipment and medium Pending CN116796859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310991099.5A CN116796859A (en) 2023-08-08 2023-08-08 Training method and device for sequence-to-sequence model, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310991099.5A CN116796859A (en) 2023-08-08 2023-08-08 Training method and device for sequence-to-sequence model, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN116796859A (en) 2023-09-22

Family

ID=88044011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310991099.5A Pending CN116796859A (en) 2023-08-08 2023-08-08 Training method and device for sequence-to-sequence model, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN116796859A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination