CN114662688A - Model training method, data processing method, device, electronic device and medium - Google Patents

Model training method, data processing method, device, electronic device and medium

Info

Publication number
CN114662688A
CN114662688A (application CN202210315455.7A)
Authority
CN
China
Prior art keywords
data
sample data
sample
pruning
effective
Prior art date
Legal status
Pending
Application number
CN202210315455.7A
Other languages
Chinese (zh)
Inventor
尉德利
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210315455.7A
Publication of CN114662688A

Classifications

    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045: Combinations of networks
    • G06F18/2155: Generating training patterns characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06N3/088: Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a sample data generation method, a model training method, a data processing method, a device, an electronic device, and a medium, which relate to the technical field of artificial intelligence, in particular to the technical fields of computer vision, deep learning, and image processing, and can be applied to scenes such as face recognition. The specific implementation scheme is as follows: determining effective sample data of the sample data to obtain the effective sample data, wherein the length of the effective sample data is the effective sample length; pruning the effective sample data to obtain pruning sample data; filling the pruning sample data to obtain intermediate sample data; masking the filling data in the intermediate sample data to obtain target sample data; and training a deep learning model by using a target sample data sequence to obtain a data processing model, wherein the target sample data sequence comprises a plurality of target sample data.

Description

Model training method, data processing method, device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly to the fields of computer vision, deep learning, and image processing technology, and can be applied to scenes such as face recognition. In particular, the present disclosure relates to a model training method, a data processing method, a data processing device, an electronic device, and a medium.
Background
With the development of computer technology, artificial intelligence technology has also been developed. Artificial intelligence techniques may include computer vision techniques, speech recognition techniques, natural language processing techniques, machine learning, deep learning, big data processing techniques, knowledge-graph techniques, and the like.
Artificial intelligence technology has found wide application in a variety of fields. For example, artificial intelligence techniques can be utilized to generate sample data for training deep learning models.
Disclosure of Invention
The present disclosure provides a model training method, a data processing method, a device, an electronic device, and a medium.
According to an aspect of the present disclosure, there is provided a training method of a data processing model, including: determining effective sample data of the sample data to obtain the effective sample data, wherein the length of the effective sample data is the effective sample length; pruning the effective sample data to obtain pruning sample data; filling the pruning sample data to obtain intermediate sample data; masking the filling data in the intermediate sample data to obtain target sample data; and training a deep learning model by using a target sample data sequence to obtain the data processing model, wherein the target sample data sequence comprises a plurality of target sample data.
According to another aspect of the present disclosure, there is provided a data processing method, including: pruning a data sequence to be processed to obtain a pruned data sequence; and inputting the pruned data sequence into a data processing model to obtain a data processing result, wherein the data processing model is obtained by using the above training method of the data processing model.
According to another aspect of the present disclosure, there is provided a training apparatus for a data processing model, including: a determining module for determining effective sample data of the sample data to obtain the effective sample data, wherein the length of the effective sample data is the effective sample length; a first pruning module for pruning the effective sample data to obtain pruning sample data; a filling module for filling the pruning sample data to obtain intermediate sample data; a mask module for masking the filling data in the intermediate sample data to obtain target sample data; and a training module for training a deep learning model by using a target sample data sequence to obtain the data processing model, wherein the target sample data sequence comprises a plurality of target sample data.
According to another aspect of the present disclosure, there is provided a data processing apparatus including: the second pruning module is used for pruning the data sequence to be processed to obtain a pruning data sequence; and the input module is used for inputting the pruning data sequence into the data processing model to obtain a data processing result, wherein the data processing model is obtained by utilizing the training method of the data processing model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which a training method, data processing method and apparatus of a data processing model may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of training a data processing model according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flowchart for determining valid sample data of sample data, resulting in valid sample data, according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart for pruning valid sample data to obtain pruned sample data according to an embodiment of the present disclosure;
FIG. 5 schematically shows a flowchart for filling pruning sample data to obtain intermediate sample data according to an embodiment of the present disclosure;
fig. 6 schematically illustrates a flowchart of masking filler data in intermediate sample data to obtain target sample data according to an embodiment of the present disclosure;
FIG. 7 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates an example schematic diagram of a training method of a data processing model according to an embodiment of this disclosure;
FIG. 9 schematically shows an example schematic of a data processing method according to an embodiment of the disclosure;
FIG. 10 schematically illustrates a block diagram of a training apparatus for a data processing model according to an embodiment of the present disclosure;
FIG. 11 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure; and
FIG. 12 schematically shows a block diagram of an electronic device suitable for implementing a training method and a data processing method of a data processing model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The data sequence can be processed by using a deep learning model to obtain a data processing result. The data sequence may comprise a text data sequence or an audio data sequence, etc. During this processing, the computational cost of the deep learning model is high. Therefore, the data length can be shortened by sparsifying (i.e., pruning) the data sequence, so that the computation amount and the memory footprint of the deep learning model are reduced, and the computational cost of deep learning is further reduced. Pruning may refer to an operation for shortening the length of data.
In the training phase of the deep learning model, a plurality of sample data can be combined into sample batch data to train the deep learning model. The sample batch data may also be referred to as a sample data sequence. The sample data sequence can be pruned to obtain a pruned sample data sequence. For example, a maximum sample length of sample data in the sequence of sample data may be determined. And pruning the sample data in the sample data sequence to obtain a pruned sample data sequence. The sample length of the pruning sample data in the pruning sample data sequence may be determined according to the maximum sample length.
In the training process of the deep learning model, the sample lengths of different sample data in the same sample data sequence need to be the same. The same applies to pruning sample data sequences, i.e., the sample lengths of different pruning sample data in the same pruning sample data sequence need to be the same. In practice, however, the sample lengths of different pruning sample data in the same pruning sample data sequence are usually not all the same. Therefore, the pruning sample data in the pruning sample data sequence needs to be processed so that the sample lengths of different pruning sample data in the same pruning sample data sequence become the same.
This can be achieved as follows. That is, the maximum sample length in the pruning sample data sequence may be determined. For sample data in the pruning sample data sequence whose sample length is smaller than the maximum sample length, data padding may be performed on the pruning sample data so that the sample length of the pruning sample data is the maximum sample length. A mask may be set to eliminate computational interference of the extra padding data on unpopulated data in the pruning sample data. And training the deep learning model by using the mask sample data. Masking may refer to an operation for removing the effect of padding data in data on unfilled data in the data so that the padding data does not participate in the computation in which the data participates.
In the inference stage of the deep learning model, a "variable length" mode can be used to speed up data processing. That is, the data to be processed included in the data sequence to be processed may be spliced end to end to obtain a spliced data sequence. The spliced data sequence is pruned to obtain a pruned spliced data sequence, and the pruned spliced data sequence is processed by the trained deep learning model to obtain a data processing result. Pruning the spliced data sequence to obtain the pruned spliced data sequence may include: determining an effective length of the spliced data in the spliced data sequence, and pruning the spliced data in the spliced data sequence according to the effective length of the spliced data to obtain the pruned spliced data sequence. That is, the length of the pruned spliced data may be determined according to the effective length.
The model training stage prunes based on the maximum sample length, while the model inference stage prunes based on the effective length, so the pruning modes of the two stages are different. As a result, it is difficult to keep the pruning proportions of the model training stage and the model inference stage consistent. On this basis, a mask set in the training stage cannot eliminate the computational interference of the pruned data on the non-pruned data, and an inference result obtained by using the trained deep learning model for inference is ultimately invalid.
In order to make the inference result obtained by using the trained deep learning model valid, the training phase and the inference phase need to use the same pruning mode, so that the deep learning model obtained in the model training phase is compatible with the inference mode used in the model inference phase and the accuracy of the model inference result is improved.
Fig. 1 schematically illustrates an exemplary system architecture of a training method, a data processing method and apparatus to which a data processing model may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the training method, the data processing method, and the apparatus for a data processing model may be applied may include a terminal device, but the terminal device may implement the training method, the data processing method, and the apparatus for a data processing model provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services. For example, a background management server (for example only) that provides support for content browsed by a user using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
The server 105 may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and solves the defects of high management difficulty and weak service extensibility of a conventional physical host and a VPS (Virtual Private Server) service. The server 105 may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that the data processing method provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the data processing apparatus provided in the embodiment of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Alternatively, the data processing method provided by the embodiment of the present disclosure may also be generally executed by the server 105. Accordingly, the data processing apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 105. The data processing method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the data processing apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be noted that the training method of the data processing model provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the training device of the data processing model provided by the embodiment of the present disclosure may be generally disposed in the server 105. The training method of the data processing model provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the training apparatus for the data processing model provided in the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
Alternatively, the training method of the data processing model provided by the embodiment of the present disclosure may also be generally executed by the terminal device 101, 102, or 103. Correspondingly, the training device of the data processing model provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
FIG. 2 schematically shows a flow chart of a method of training a data processing model according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S250.
In operation S210, valid sample data of the sample data is determined to obtain the valid sample data, where the length of the valid sample data is the valid sample length.
In operation S220, the valid sample data is pruned to obtain pruned sample data.
In operation S230, the pruning sample data is filled to obtain intermediate sample data.
In operation S240, the padding data in the intermediate sample data is masked to obtain target sample data.
In operation S250, a deep learning model is trained using the target sample data sequence to obtain a data processing model.
According to an embodiment of the present disclosure, the data sequence may include at least one of: a text data sequence, an audio data sequence, and an image data sequence. The data may include at least one of: text data, audio data, and image data. For example, the target sample data sequence may comprise a target sample text data sequence or a target sample audio data sequence. The sample data may include one of: sample text data and sample audio data.
According to an embodiment of the present disclosure, the deep learning model may include one of: a text processing model, an audio processing model, and an image processing model. The text processing model may include at least one of: a text recognition model, a text detection model, a text question and answer model, a text translation model and the like. The audio processing model may include at least one of: an audio recognition model, an audio detection model, an audio translation model, an audio synthesis model, and the like.
According to an embodiment of the present disclosure, the deep learning model may include one of: supervised, semi-supervised and unsupervised models. The model structure of the deep learning model can be configured according to actual business requirements, and is not limited herein.
According to the embodiment of the disclosure, the sample data may be obtained from the database by calling the data obtaining interface and using the data obtaining interface in response to receiving the data obtaining request. For example, in a case where the sample data is sample text data, the sample text data may be obtained from a database corresponding to a text processing task by calling a data interface corresponding to the sample text data in response to receiving a text data obtaining request and by using the data interface corresponding to the sample text data. In the case where the sample data is sample audio data, the sample audio data may be acquired from a database corresponding to an audio processing task by calling a data interface corresponding to the sample audio data in response to receiving an audio data acquisition request and using the data interface corresponding to the sample audio data. The sample data may be labeled sample data or unlabeled sample data. For example, in response to receiving a data annotation request, the sample data may be annotated to obtain annotated sample data.
According to an embodiment of the present disclosure, valid sample data may refer to sample data in which data filling is not performed among sample data. The effective sample data can be pruned based on the pruning rule to obtain pruning sample data corresponding to the effective sample data. The pruning rules may be configured according to actual service requirements, and are not limited herein. For example, the pruning rule may be a rule for pruning a predetermined region in the valid sample data based on a pruning coefficient. The predetermined region may include a continuous region or an interval region.
According to the embodiment of the disclosure, data filling can be performed on pruning sample data to obtain intermediate sample data. For example, predetermined data may be filled in a predetermined position of the pruning sample data, resulting in intermediate sample data. The predetermined position may be configured according to actual service requirements, and is not limited herein.
According to the embodiment of the disclosure, in order to eliminate the influence of the padding data in the intermediate sample data on the unfilled data in the intermediate sample data, a mask may be set on the padding data in the intermediate sample data, so as to obtain the target sample data. The setting mode of the mask may be configured according to the actual service requirement, and is not limited herein. The target sample data is obtained by pruning and masking valid sample data of the sample data. The target sample data may be used as sample data for model training of the data processing model.
According to the embodiment of the disclosure, after the target sample data is obtained, the deep learning model can be trained by using the target sample data sequence to obtain the data processing model. The target sample data sequence may comprise a plurality of target sample data. Each target sample data may be obtained by the above processing method. That is, valid sample data of each of the plurality of sample data in the sample data sequence may be determined, and a plurality of valid sample data may be obtained. And pruning the plurality of effective sample data to obtain a pruning sample data sequence. And filling the pruning sample data sequence to obtain an intermediate sample data sequence. And masking the filling data in the intermediate sample data sequence to obtain a target sample data sequence.
According to an embodiment of the present disclosure, the sample data sequence may comprise a plurality of sample data. The dimension of the sample data sequence may be B × N × C, where B may refer to the number of sample data in the sample data sequence, N may refer to the number of object data in each sample data (i.e., the sample length), and C may refer to the number of feature dimensions of each object data. Object data may refer to the smallest data primitive. Valid sample data may refer to sample data in which data filling is not performed. The sample data sequence may be decomposed to obtain the plurality of sample data, which may be stored in a list or array form to facilitate determination of the valid sample data.
According to an embodiment of the present disclosure, the pruning sample data sequence may comprise a plurality of pruning sample data. The plurality of effective sample data can be pruned based on the pruning rule to obtain pruning sample data corresponding to the plurality of effective sample data.
According to an embodiment of the present disclosure, the intermediate sample data sequence may comprise a plurality of intermediate sample data. And for the pruning sample data in the plurality of pruning sample data included in the pruning sample data sequence, performing data filling on the pruning sample data to obtain intermediate sample data.
According to the embodiment of the present disclosure, the target sample data sequence is obtained by pruning and masking valid sample data of each of a plurality of sample data in the sample data sequence. The target sample data sequence may be used as sample data for model training of the data processing model.
According to the embodiment of the disclosure, the target sample data sequence can be input into the deep learning model to obtain an output result. And obtaining a loss function value according to the output result based on the loss function. And adjusting model parameters of the deep learning model according to the loss function values to obtain a trained deep learning model, namely, a data processing model. For example, the output result may be input to a loss function to obtain a loss function value. And adjusting the model parameters of the deep learning model according to the loss function value until a preset ending condition is met. And determining the deep learning model obtained under the condition that the preset end condition is met as the trained deep learning model. The predetermined end condition may include at least one of: the loss function values converge and the training round reaches a predetermined training round. The predetermined training round may be determined according to business scenario requirements. Business scenario requirements may refer to requirements related to a training task. For example, business scenario requirements may include requirements related to a text processing task or requirements related to an audio processing task.
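The training loop described above (forward pass, loss computation, parameter update, stopping when the loss converges or a predetermined number of training rounds is reached) can be sketched as follows. This is a minimal illustration assuming a PyTorch-style model, data loader, and loss function; all names are placeholders and are not part of the disclosure.

```python
import torch

def train_data_processing_model(model, loader, loss_fn, epochs=10, lr=1e-4):
    """Minimal sketch: train a deep learning model on target sample data
    sequences until a predetermined number of training rounds is reached."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                      # predetermined training rounds
        for target_batch, labels in loader:      # target sample data sequence
            output = model(target_batch)         # output result
            loss = loss_fn(output, labels)       # loss function value
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                     # adjust model parameters
    return model                                 # trained data processing model
```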
According to an embodiment of the present disclosure, the training method of the data processing model of the embodiment of the present disclosure may be performed by an electronic device. The electronic device may include at least one processor. The processor may be configured to perform a training method of the data processing model provided by the embodiments of the present disclosure. The training method of the data processing model provided by the embodiment of the disclosure may be performed by a single processor, or may be performed in parallel by a plurality of processors. For example, the electronic device may be a server or a terminal device. The server may be server 105 in fig. 1. The terminal device may be terminal device 101, terminal device 102 or terminal device 103 in fig. 1.
According to the embodiments of the present disclosure, the target sample data is obtained by pruning and masking the valid sample data of the sample data, that is, the part of the sample data with the valid sample length. Because this is the same pruning manner as the "variable length" mode used in the model inference stage, the resulting model is compatible with that mode. In addition, the deep learning model is trained with the target sample data sequence comprising a plurality of target sample data obtained by pruning, rather than with unpruned sample data, so the computation amount and the GPU memory (video memory) footprint of the deep learning model can be reduced, the data processing load of an electronic device such as a processor is reduced, and the processing efficiency of the electronic device is improved, thereby improving the internal performance and core competitiveness of the electronic device. On this basis, the accuracy of the inference result of the data processing model obtained by training the deep learning model with the target sample data sequence can be improved.
The following describes a training method of a data processing model according to an embodiment of the disclosure with reference to fig. 3 to fig. 6.
Fig. 3 schematically illustrates a flowchart of determining valid sample data of sample data, resulting in valid sample data, according to an embodiment of the present disclosure.
As shown in fig. 3, the method 300 is further defined by operation S210 in fig. 2, and the method 300 may include operations S311 to S312.
In operation S311, unpopulated data in the sample data is determined.
In operation S312, valid sample data is obtained according to the unfilled data.
According to an embodiment of the present disclosure, the sample data may include a plurality of object data. The unfilled data may include one or more object data. It may be determined whether object data in the sample data is unpopulated data. If it is unfilled data, the object data may be determined to be object data in valid sample data.
Operations S311 to S312 may be performed by the electronic device according to an embodiment of the present disclosure. The electronic device may be a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be terminal device 101, terminal device 102, and terminal device 103 in fig. 1.
According to an embodiment of the present disclosure, operation S311 may include the following operations.
And determining unfilled data in the sample data according to the result of multiplying the sample data by the first preset data.
According to an embodiment of the present disclosure, the first predetermined data may refer to feature data that performs a logical operation with the sample data to determine unpopulated data in the sample data. And obtaining a calculation result according to the logic operation result of the sample data and the first preset data. The calculation result may be characterized as "0" or "1", and "0" may be characterized as "False", that is, the object data is filled data. A "1" may characterize "True", i.e., the object data is unfilled data. Whether the object data is unfilled data may be determined according to whether the calculation result is "0" or "1".
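As a concrete illustration of the check described above, the following NumPy sketch marks each object datum as filled ("0"/False) or unfilled ("1"/True). It assumes that padded positions are all-zero feature vectors; this all-zeros convention and the function names are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

def unfilled_mask(sample: np.ndarray) -> np.ndarray:
    """sample has shape (N, C): N object data, each with C features.
    Returns a boolean vector: True ("1") for unfilled object data,
    False ("0") for filled object data."""
    return (sample != 0).any(axis=-1)

def valid_sample(sample: np.ndarray) -> np.ndarray:
    """Keep only the unfilled object data, i.e. the valid sample data."""
    return sample[unfilled_mask(sample)]
```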
And pruning the effective sample data based on the preset pruning coefficient to obtain pruning sample data.
According to an embodiment of the present disclosure, the predetermined pruning coefficient may refer to a coefficient according to which valid sample data is pruned. The pruning coefficient may be a value greater than 0 and less than 1. The pruning coefficient may be configured according to actual service requirements, and is not limited herein. For example, the predetermined pruning factor may be analytically determined from historical data. The predetermined pruning factor may be 0.5 or 0.8.
According to the embodiment of the disclosure, for each valid sample data in the plurality of valid sample data, the valid sample data can be pruned based on a predetermined pruning coefficient and a pruning algorithm to obtain pruning sample data. For example, pruning may be performed from the end position to the start position of valid sample data such that the sample length of the pruned sample data and the sample length of the valid sample data satisfy a predetermined pruning coefficient. Alternatively, pruning may be performed from the start position to the end position of valid sample data such that the sample length of the pruned sample data and the sample length of the valid sample data satisfy a predetermined pruning coefficient.
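A minimal sketch of the end-to-start pruning described above, where the kept length satisfies the predetermined pruning coefficient (the 0.5 default and the function name are illustrative):

```python
import math
import numpy as np

def prune_tail(valid: np.ndarray, coeff: float = 0.5) -> np.ndarray:
    """Drop object data from the end of the valid sample data so that the
    pruned sample length is ceil(coeff * valid sample length)."""
    keep = max(1, math.ceil(coeff * len(valid)))
    return valid[:keep]
```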
Fig. 4 schematically shows a flowchart for pruning valid sample data to obtain pruned sample data according to an embodiment of the present disclosure.
As shown in fig. 4, the method 400 is a further limitation of operation S220 in fig. 2, and the method 400 may include operations S421 to S422.
In operation S421, a weight coefficient of object data in valid sample data is determined.
In operation S422, effective sample data is pruned based on a predetermined pruning coefficient and a weight coefficient of object data in the effective sample data, so as to obtain pruning sample data.
According to the embodiment of the present disclosure, the weight coefficient may characterize the importance degree of the object data in the valid sample data. The relationship between the weight coefficient and the importance degree may be configured according to the actual service requirement, and is not limited herein. For example, the larger the weight coefficient, the higher the degree of importance. Alternatively, the larger the weight coefficient, the lower the degree of importance.
According to the embodiments of the present disclosure, the weight coefficient of each object data in the valid sample data can be determined according to an attention strategy. The object data can be sorted according to the weight coefficients of the object data in the valid sample data to obtain a sorting result, and the pruning sample data can be determined from the valid sample data according to the sorting result and the predetermined pruning coefficient. Alternatively, for each object data in the valid sample data, the object data may be determined to be object data in the pruning sample data if its weight coefficient is greater than or equal to a predetermined weight coefficient threshold. The predetermined weight coefficient threshold may be configured according to actual service requirements, and is not limited herein. For example, the predetermined weight coefficient threshold may be determined based on a model evaluation index value, which may be obtained by evaluating a deep learning model trained with historical sample data.
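The following sketch illustrates pruning by weight coefficients: the positions with the largest weights are kept, in their original order, until the predetermined pruning coefficient is satisfied. How the weights are produced (e.g., by an attention strategy) is outside this sketch, and all names are illustrative.

```python
import math
import numpy as np

def prune_by_weight(valid: np.ndarray, weights: np.ndarray,
                    coeff: float = 0.5) -> np.ndarray:
    """Keep the ceil(coeff * N) object data with the largest weight
    coefficients, preserving their original order in the valid sample data."""
    keep = max(1, math.ceil(coeff * len(valid)))
    kept_idx = np.sort(np.argsort(weights)[::-1][:keep])  # top-k indices, in order
    return valid[kept_idx]
```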
According to an embodiment of the present disclosure, operations S421 to S422 may be performed by an electronic device. The electronic device may be a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be terminal device 101, terminal device 102, and terminal device 103 in fig. 1.
Fig. 5 schematically shows a flowchart for filling pruning sample data to obtain intermediate sample data according to an embodiment of the present disclosure.
As shown in fig. 5, the method 500 is a further limitation of operation S230 in fig. 2, and the method 500 may include operations S531-S533.
In operation S531, it is determined whether the sample length of the pruning sample data is smaller than the expected sample length. If yes, operation S532 is performed; if not, operation S533 is performed.
In operation S532, second predetermined data is filled at the end of the pruning sample data to obtain intermediate sample data.
In operation S533, pruning sample data is determined as intermediate sample data.
According to an embodiment of the present disclosure, the expected sample length is a maximum sample length or a predetermined sample length, the maximum sample length being a maximum value of sample lengths of each of a plurality of pruning sample data in a pruning sample data sequence corresponding to the target sample data.
According to the embodiment of the present disclosure, the predetermined sample length may be configured according to an actual service requirement, and is not limited herein. For example, the predetermined sample length may be greater than or equal to the maximum sample length. The maximum sample length may be determined in the following manner. That is, the plurality of pruning sample data included in the pruning sample data sequence may be traversed, and the respective sample lengths of the plurality of pruning sample data may be determined. And determining the maximum value of the sample length according to the respective sample lengths of the plurality of pruning sample data. The maximum value of the sample length is determined as the maximum sample length.
According to embodiments of the present disclosure, it may be determined whether the sample length of the pruning sample data is less than an expected sample length. If the sample length of the pruning sample data is determined to be smaller than the expected sample length, second preset data can be filled after the last object data of the pruning sample data, and intermediate sample data is obtained. The second predetermined data may be configured according to an actual service requirement, which is not limited herein. For example, the second preset data may include "0".
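A sketch of this padding step, appending the second predetermined data (assumed here to be 0) at the end of the pruning sample data until the expected sample length is reached; the function name is illustrative:

```python
import numpy as np

def pad_to_expected(pruned: np.ndarray, expected_len: int,
                    pad_value: float = 0.0) -> np.ndarray:
    """Pad the (N, C) pruning sample data at the end with pad_value rows
    so that its sample length equals expected_len."""
    n, c = pruned.shape
    if n >= expected_len:
        return pruned
    padding = np.full((expected_len - n, c), pad_value, dtype=pruned.dtype)
    return np.concatenate([pruned, padding], axis=0)
```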
Operations S531-S533 may be performed by an electronic device, according to an embodiment of the present disclosure. The electronic device may be a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be terminal device 101, terminal device 102, and terminal device 103 in fig. 1.
Fig. 6 schematically shows a flowchart of masking filler data in intermediate sample data to obtain target sample data according to an embodiment of the present disclosure.
As shown in fig. 6, the method 600 is further limited to operation S240 in fig. 2, and the method 600 may include operations S641 to S644.
In operation S641, it is determined whether the intermediate sample data includes the filled second predetermined data. If yes, operations S642 to S643 are performed; if not, operation S644 is performed.
In operation S642, the second predetermined data is masked, resulting in masked data.
In operation S643, target sample data is obtained according to valid sample data in the mask data and the intermediate sample data.
In operation S644, the intermediate sample data is determined as the target sample data.
According to an embodiment of the present disclosure, if it is determined that the intermediate sample data includes the second predetermined data that is filled, in order to reduce an influence of the second predetermined data on other data than the second predetermined data in the intermediate sample data, the second predetermined data may be masked, resulting in masked data.
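One common way to realize such a mask is an additive attention mask that assigns a large negative value to the padded positions so that they contribute (almost) nothing after a softmax. The additive convention and the -1e9 constant are assumptions for illustration; the disclosure does not fix a particular masking scheme.

```python
import numpy as np

def padding_mask(valid_len: int, expected_len: int) -> np.ndarray:
    """Additive mask over a padded sample: 0 for unfilled positions,
    -1e9 for the positions holding the filled second predetermined data."""
    mask = np.zeros(expected_len, dtype=np.float32)
    mask[valid_len:] = -1e9
    return mask
```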
Operations S641 through S644 may be performed by an electronic device according to an embodiment of the present disclosure. The electronic device may be a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be terminal device 101, terminal device 102, and terminal device 103 in fig. 1.
According to an embodiment of the present disclosure, the deep learning model may include a Transformer. The Transformer model is a high-performance deep learning model which can process data sequences.
According to the embodiments of the present disclosure, the Transformer model obtained by training according to the embodiments of the present disclosure can enjoy the benefits of both the "variable length" mode and sparsification, and the inference process of the Transformer model can be accelerated by using the "variable length" mode and sparsification, thereby reducing the GPU memory (video memory) footprint.
Fig. 7 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure.
As shown in FIG. 7, the method 700 may include operations S710-S720.
In operation S710, a data sequence to be processed is pruned to obtain a pruned data sequence.
In operation S720, the pruned data sequence is input to the data processing model to obtain a data processing result.
According to an embodiment of the present disclosure, the data processing model may be obtained by training using a training method of the data processing model according to an embodiment of the present disclosure. The data to be processed may include one of: text data to be processed, audio data to be processed, and image data to be processed.
For example, the data processing model may be a retrieval model. The data to be processed can comprise a search word to be processed and a search text to be processed. The retrieval text to be processed can be input into the retrieval model, and a text vector to be processed corresponding to the retrieval text to be processed is obtained. And inputting the search word to be processed into the search model to obtain a word vector to be processed corresponding to the search word to be processed. And determining the similarity between the text vector to be processed and the word vector to be processed. And determining the correlation between the search words to be processed and the search texts to be processed according to the similarity.
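The similarity step in this retrieval example could, for instance, be a cosine similarity between the to-be-processed word vector and the to-be-processed text vector; the choice of cosine similarity is an assumption, as the disclosure only refers to "similarity".

```python
import numpy as np

def relevance(word_vec: np.ndarray, text_vec: np.ndarray) -> float:
    """Cosine similarity between the search-word vector and the text vector,
    used as the correlation between the search word and the search text."""
    denom = np.linalg.norm(word_vec) * np.linalg.norm(text_vec) + 1e-12
    return float(np.dot(word_vec, text_vec) / denom)
```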
According to an embodiment of the present disclosure, pruning a data sequence to be processed may include: determining respective effective data to be processed in a plurality of data to be processed in the data sequence to be processed. And pruning effective data to be processed in the plurality of data to be processed to obtain a pruning data sequence.
According to the embodiment of the present disclosure, the valid data to be processed may be pruned in the same way as the pruning way for valid sample data described in the embodiment of the present disclosure, and details are not described herein again.
According to an embodiment of the present disclosure, operations S710 to S720 may be performed by an electronic device, which may be a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be terminal device 101, terminal device 102, and terminal device 103 in fig. 1.
According to the embodiments of the present disclosure, the data length of the data to be processed included in the data sequence to be processed is the effective data length, and the pruning data sequence is obtained by pruning the data sequence to be processed, so that the model inference stage and the model training stage use the same pruning mode. Therefore, the model inference stage is compatible with the model training stage, which improves the accuracy of the inference result of the data processing model obtained by training the deep learning model with the target sample data sequence.
FIG. 8 schematically illustrates an example schematic of a training method of a data processing model according to an embodiment of this disclosure.
As shown in fig. 8, in 800, a sample data sequence 801 may include a plurality of sample data. Unfilled data in the sample data is determined. From the unfilled data, valid sample data of each of the plurality of sample data is determined, and the plurality of valid sample data form a valid sample data sequence 802.
Each valid sample data in the valid sample data sequence 802 is pruned according to a predetermined pruning coefficient to obtain a pruning sample data sequence 803. The pruning sample data sequence 803 may include a plurality of pruning sample data. For example, the predetermined pruning coefficient may be 0.5 (i.e., 1/2).
And traversing the plurality of pruning sample data, determining respective sample lengths of the plurality of pruning sample data, and obtaining a plurality of sample lengths. From the plurality of sample lengths, a maximum sample length is determined. The maximum sample length is determined as the expected sample length.
The sample length of each pruning sample data in the pruning sample data sequence 803 is compared with the expected sample length, and second predetermined data is filled at the end of each pruning sample data shorter than the expected sample length, so that the sample lengths of all pruning sample data in the pruning sample data sequence 803 are the same, and an intermediate sample data sequence 804 is obtained. The second predetermined data may be "0". The intermediate sample data sequence 804 may comprise a plurality of intermediate sample data.
Masking second predetermined data of each of the plurality of intermediate sample data to obtain a target sample data sequence 805. For example, the mask may be set to "p". The target sample data sequence 805 may comprise a plurality of target sample data. The deep learning model 806 is trained using the target sample data sequence 805, resulting in a data processing model 807.
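Putting the steps of FIG. 8 together, the following sketch builds a target sample batch from raw samples. It reuses the illustrative helpers sketched earlier (valid_sample, prune_tail, pad_to_expected, padding_mask) and is not the disclosure's implementation.

```python
import numpy as np

def build_target_batch(samples, coeff=0.5):
    """samples: list of (N_i, C) arrays, possibly zero-padded (sequence 801).
    Returns the target batch (B, L, C) and its masks (B, L) (sequence 805)."""
    pruned = [prune_tail(valid_sample(s), coeff) for s in samples]    # 802 -> 803
    expected = max(len(p) for p in pruned)                            # expected sample length
    batch = np.stack([pad_to_expected(p, expected) for p in pruned])  # 804
    masks = np.stack([padding_mask(len(p), expected) for p in pruned])
    return batch, masks
```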
Fig. 9 schematically shows an example schematic diagram of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 9, in 900, a plurality of data to be processed are spliced into a data sequence 901, and the length of each data to be processed is an effective length. And pruning each to-be-processed data in the to-be-processed data sequence according to a preset pruning coefficient to obtain a pruning data sequence 902. The predetermined pruning factor may be 0.5 (i.e., 1/2). The pruned data sequence 902 may include a plurality of pruned data. A plurality of pieces of pruned data in pruned data sequence 902 are input to data processing model 903, resulting in data processing result 904.
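For comparison with the training side, the inference-side pruning in FIG. 9 can be sketched as applying the same predetermined pruning coefficient to each to-be-processed data at its effective length (again reusing the illustrative prune_tail helper from above):

```python
def prune_for_inference(to_process, coeff=0.5):
    """to_process: list of (N_i, C) arrays, each already at its effective length.
    Returns the pruned pieces that form the pruning data sequence 902."""
    return [prune_tail(x, coeff) for x in to_process]
```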
FIG. 10 schematically shows a block diagram of a training apparatus for a data processing model according to an embodiment of the present disclosure.
As shown in fig. 10, the training apparatus 1000 of the data processing model may include a determination module 1010, a first pruning module 1020, a padding module 1030, a masking module 1040, and a training module 1050.
The determining module 1010 is configured to determine valid sample data of the sample data, to obtain the valid sample data, where the length of the valid sample data is an effective sample length.
A first pruning module 1020, configured to prune the valid sample data to obtain pruned sample data.
And a filling module 1030, configured to fill the pruning sample data to obtain intermediate sample data.
The mask module 1040 is configured to perform a mask on the padding data in the intermediate sample data to obtain target sample data.
The training module 1050 is configured to train the deep learning model by using a target sample data sequence to obtain a data processing model, where the target sample data sequence includes a plurality of target sample data.
According to an embodiment of the present disclosure, the determining module 1010 may include a first determining submodule and a first obtaining submodule.
The first determining submodule is used for determining unfilled data in the sample data.
And the first obtaining submodule is used for obtaining effective sample data according to the unfilled data.
According to an embodiment of the present disclosure, the first determination submodule may include a first determination unit.
And the determining unit is used for determining unfilled data in the sample data according to a result obtained by multiplying the sample data by the first preset data.
According to an embodiment of the present disclosure, the first pruning module 1020 may include a second obtaining sub-module and a pruning sub-module.
And the second obtaining submodule is used for determining the weight coefficient of the object data in the effective sample data.
And the pruning submodule is used for pruning the effective sample data based on the preset pruning coefficient and the weight coefficient of the object data in the effective sample data to obtain the pruning sample data.
According to an embodiment of the present disclosure, the padding module 1030 may include a padding sub-module.
And the filling sub-module is used for filling second preset data at the tail of the pruning sample data to obtain intermediate sample data under the condition that the sample length of the pruning sample data is determined to be smaller than the expected sample length, wherein the expected sample length is the maximum sample length or the preset sample length, and the maximum sample length is the maximum value in the sample lengths of a plurality of pruning sample data in the pruning sample data sequence corresponding to the target sample data.
According to an embodiment of the present disclosure, the masking module 1040 may include a masking submodule and a third obtaining submodule.
And the mask submodule is used for masking the second predetermined data to obtain mask data under the condition that the intermediate sample data is determined to comprise the filled second predetermined data.
And the third obtaining submodule is used for obtaining target sample data according to the mask data and the effective sample data in the intermediate sample data.
According to an embodiment of the present disclosure, the sample data includes one of: sample text data and sample audio data.
Fig. 11 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 11, the data processing apparatus 1100 may include a second pruning module 1110 and an input module 1120.
The second pruning module 1110 is configured to prune the data sequence to be processed to obtain a pruned data sequence.
An input module 1120, configured to input the pruned data sequence into the data processing model, so as to obtain a data processing result.
According to an embodiment of the present disclosure, the data processing model may be obtained by training using a training apparatus of the data processing model according to an embodiment of the present disclosure.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as above.
According to an embodiment of the present disclosure, a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as above.
According to an embodiment of the disclosure, a computer program product comprising a computer program which, when executed by a processor, implements the method as above.
Fig. 12 schematically shows a block diagram of an electronic device adapted to implement a generation method of sample data, a training method of a data processing model and a data processing method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic device 1200 includes a computing unit 1201, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the electronic device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the electronic device 1200 are connected to the I/O interface 1205, including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 performs the respective methods and processes described above, such as the training method of the data processing model and the data processing method. For example, in some embodiments, the training method of the data processing model and the data processing method may be implemented as computer software programs tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the training method of the data processing model and the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the training method of the data processing model and the data processing method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described herein may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (21)

1. A method of training a data processing model, comprising:
determining effective sample data of the sample data to obtain the effective sample data, wherein the length of the effective sample data is an effective sample length;
pruning the effective sample data to obtain pruning sample data;
filling the pruning sample data to obtain intermediate sample data;
masking filling data in the intermediate sample data to obtain target sample data; and
training a deep learning model by using a target sample data sequence to obtain the data processing model, wherein the target sample data sequence comprises a plurality of target sample data.
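By way of a non-limiting illustration of claim 1, the sketch below assembles a target sample data sequence from raw samples: effective (non-padded) values are kept, pruned, filled to the maximum pruned length, and paired with a mask over the filled tail. The zero padding value, the magnitude-based pruning rule, and the helper name are assumptions introduced only to make the steps concrete, not the claimed implementation.

```python
import numpy as np

PAD_VALUE = 0.0  # assumed value of padded positions in raw samples and of the filling data

def build_target_sequence(samples, keep_ratio=0.5):
    """Prepare a target sample data sequence following the four preparation steps of claim 1."""
    pruned = []
    for sample in samples:
        sample = np.asarray(sample, dtype=np.float32)
        valid = sample[sample != PAD_VALUE]                      # effective sample data
        keep = max(1, int(len(valid) * keep_ratio))
        kept_idx = np.sort(np.argsort(np.abs(valid))[-keep:])
        pruned.append(valid[kept_idx])                           # pruning sample data
    expected_len = max(len(p) for p in pruned)                   # maximum sample length
    targets, masks = [], []
    for p in pruned:
        pad = np.full(expected_len - len(p), PAD_VALUE, dtype=np.float32)
        targets.append(np.concatenate([p, pad]))                 # intermediate sample data
        masks.append((np.arange(expected_len) < len(p)).astype(np.float32))
    return np.stack(targets), np.stack(masks)                    # target sample data sequence
```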
2. The method of claim 1, wherein the determining effective sample data of the sample data to obtain the effective sample data comprises:
for each sample data of a plurality of sample data included in a sample data sequence,
determining unfilled data in the sample data; and
obtaining the effective sample data according to the unfilled data.
3. The method of claim 2, wherein the determining unfilled data in the sample data comprises:
determining unfilled data in the sample data according to a result obtained by multiplying the sample data by first predetermined data.
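One possible reading of claim 3, assuming that padded positions carry the value zero and that the first predetermined data is an all-ones multiplier, is sketched below; under these assumptions the element-wise product is non-zero exactly at the unfilled positions. Both the reading and the names are illustrative only.

```python
import numpy as np

FIRST_PREDETERMINED = 1.0  # assumed first predetermined data (an all-ones multiplier)

def find_unfilled(sample):
    """Positions whose product with the first predetermined data is non-zero are treated as unfilled."""
    product = np.asarray(sample, dtype=np.float32) * FIRST_PREDETERMINED
    return np.flatnonzero(product != 0.0)

sample = np.array([0.7, 1.2, 0.4, 0.0, 0.0], dtype=np.float32)
effective = sample[find_unfilled(sample)]  # effective sample data -> [0.7, 1.2, 0.4]
```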
4. The method according to any one of claims 1 to 3, wherein the pruning the effective sample data to obtain pruning sample data comprises:
determining a weight coefficient of object data in the effective sample data; and
pruning the effective sample data based on a predetermined pruning coefficient and the weight coefficient of the object data in the effective sample data to obtain the pruning sample data.
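A minimal sketch of the pruning step of claim 4, assuming that the pruning coefficient is the fraction of object data to keep and that object data with the largest weight coefficients are retained; both assumptions are illustrative and other rules would fit the claim equally well.

```python
import numpy as np

def prune_by_weight(effective, weights, pruning_coefficient=0.5):
    """Keep the share of object data given by the pruning coefficient, ranked by weight coefficient."""
    effective = np.asarray(effective, dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    keep = max(1, int(round(len(effective) * pruning_coefficient)))
    kept_idx = np.sort(np.argsort(weights)[-keep:])  # highest-weight positions, original order kept
    return effective[kept_idx]

# Example: keep half of the object data, ranked by an (assumed) attention-style weight coefficient.
pruned = prune_by_weight([0.7, 1.2, 0.4, 0.9], weights=[0.1, 0.5, 0.05, 0.35])
# pruned -> [1.2, 0.9]
```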
5. The method according to any one of claims 1 to 4, wherein the filling the pruning sample data to obtain intermediate sample data comprises:
under the condition that the sample length of the pruning sample data is determined to be smaller than an expected sample length, filling second predetermined data at the tail of the pruning sample data to obtain the intermediate sample data, wherein the expected sample length is a maximum sample length or a predetermined sample length, and the maximum sample length is the maximum value among the sample lengths of the plurality of pruning sample data in the pruning sample data sequence corresponding to the sample data.
6. The method according to claim 5, wherein the masking filling data in the intermediate sample data to obtain target sample data comprises:
masking the second predetermined data to obtain mask data under the condition that the intermediate sample data is determined to comprise the filled second predetermined data; and
obtaining the target sample data according to the mask data and the effective sample data in the intermediate sample data.
7. The method of any of claims 1 to 6, wherein the sample data comprises one of: sample text data and sample audio data.
8. The method of any of claims 1-7, wherein the deep learning model comprises a Transformer.
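Claim 8 states that the deep learning model may comprise a Transformer. The sketch below shows, in PyTorch (a framework chosen here for illustration; the disclosure does not name one), how padded target samples and their masks could be fed to a Transformer encoder so that the filled positions are ignored during attention and excluded from the loss. Dimensions, loss, and optimizer are toy values.

```python
import torch
from torch import nn

d_model, nhead, seq_len, batch = 16, 4, 8, 4  # toy dimensions

encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=2)
head = nn.Linear(d_model, 1)
optimizer = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

# A batch of padded target samples; True in the mask marks filled (padded) positions.
inputs = torch.randn(batch, seq_len, d_model)
padding_mask = torch.zeros(batch, seq_len, dtype=torch.bool)
padding_mask[:, 5:] = True                       # the last three positions are padding in this toy batch
targets = torch.randn(batch, seq_len, 1)

for _ in range(3):                               # a few toy training steps
    optimizer.zero_grad()
    hidden = model(inputs, src_key_padding_mask=padding_mask)
    pred = head(hidden)
    valid = ~padding_mask.unsqueeze(-1)          # score only the unmasked positions
    loss = ((pred - targets) ** 2)[valid].mean()
    loss.backward()
    optimizer.step()
```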
9. A method of data processing, comprising:
pruning the data sequence to be processed to obtain a pruned data sequence; and
inputting the pruned data sequence into a data processing model to obtain a data processing result,
wherein the data processing model is trained by the method according to any one of claims 1 to 8.
10. A training apparatus for a data processing model, comprising:
a determining module, configured to determine effective sample data of the sample data to obtain the effective sample data, wherein the length of the effective sample data is an effective sample length;
a first pruning module, configured to prune the effective sample data to obtain pruning sample data;
a filling module, configured to fill the pruning sample data to obtain intermediate sample data;
a mask module, configured to mask filling data in the intermediate sample data to obtain target sample data; and
a training module, configured to train a deep learning model by using a target sample data sequence to obtain the data processing model.
11. The apparatus of claim 10, wherein the means for determining comprises:
a first determining submodule, configured to determine unfilled data in the sample data; and
a first obtaining submodule, configured to obtain the effective sample data according to the unfilled data.
12. The apparatus of claim 11, wherein the first determination submodule comprises:
a determining unit, configured to determine unfilled data in the sample data according to a result obtained by multiplying the sample data by first predetermined data.
13. The apparatus of any one of claims 10-12, wherein the first pruning module comprises:
a second obtaining submodule, configured to determine a weight coefficient of object data in the effective sample data; and
a pruning submodule, configured to prune the effective sample data based on a predetermined pruning coefficient and the weight coefficient of the object data in the effective sample data to obtain the pruning sample data.
14. The apparatus of any of claims 10-13, wherein the fill module comprises:
and a filling sub-module, configured to, when it is determined that the sample length of the pruning sample data is smaller than an expected sample length, fill second predetermined data at the end of the pruning sample data to obtain the intermediate sample data, where the expected sample length is a maximum sample length or a predetermined sample length, and the maximum sample length is a maximum value of respective sample lengths of a plurality of pruning sample data in a pruning sample data sequence corresponding to the target sample data.
15. The apparatus of claim 14, wherein the masking module comprises:
a mask submodule, configured to mask the second predetermined data to obtain mask data under the condition that the intermediate sample data is determined to comprise the filled second predetermined data; and
a third obtaining submodule, configured to obtain the target sample data according to the mask data and the effective sample data in the intermediate sample data.
16. The apparatus according to any of claims 10-15, wherein the sample data comprises one of: sample text data and sample audio data.
17. The apparatus of any of claims 10-16, wherein the deep learning model comprises a Transformer.
18. A data processing apparatus comprising:
a second pruning module, configured to prune the data sequence to be processed to obtain a pruned data sequence; and
an input module, configured to input the pruned data sequence into the data processing model to obtain a data processing result,
wherein the data processing model is trained by using the device according to any one of claims 10-17.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8 or 9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8 or 9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 8 or claim 9.
CN202210315455.7A 2022-03-29 2022-03-29 Model training method, data processing method, device, electronic device and medium Pending CN114662688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210315455.7A CN114662688A (en) 2022-03-29 2022-03-29 Model training method, data processing method, device, electronic device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210315455.7A CN114662688A (en) 2022-03-29 2022-03-29 Model training method, data processing method, device, electronic device and medium

Publications (1)

Publication Number Publication Date
CN114662688A (en) 2022-06-24

Family

ID=82033832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210315455.7A Pending CN114662688A (en) 2022-03-29 2022-03-29 Model training method, data processing method, device, electronic device and medium

Country Status (1)

Country Link
CN (1) CN114662688A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116127020A (en) * 2023-03-03 2023-05-16 北京百度网讯科技有限公司 Method for training generated large language model and searching method based on model


Similar Documents

Publication Publication Date Title
CN114120414B (en) Image processing method, image processing apparatus, electronic device, and medium
CN114792355B (en) Virtual image generation method and device, electronic equipment and storage medium
CN113407850A (en) Method and device for determining and acquiring virtual image and electronic equipment
CN112528641A (en) Method and device for establishing information extraction model, electronic equipment and readable storage medium
CN113887615A (en) Image processing method, apparatus, device and medium
CN114882321A (en) Deep learning model training method, target object detection method and device
CN113407851A (en) Method, device, equipment and medium for determining recommendation information based on double-tower model
CN114429633A (en) Text recognition method, model training method, device, electronic equipment and medium
CN114463551A (en) Image processing method, image processing device, storage medium and electronic equipment
CN114662688A (en) Model training method, data processing method, device, electronic device and medium
CN114494814A (en) Attention-based model training method and device and electronic equipment
CN110852057A (en) Method and device for calculating text similarity
CN112926298A (en) News content identification method, related device and computer program product
CN116308634A (en) Double-tower model recommendation method and device based on behavior sequence and weight sharing
CN114238611B (en) Method, apparatus, device and storage medium for outputting information
CN113761379B (en) Commodity recommendation method and device, electronic equipment and medium
CN116090438A (en) Theme processing method and device, electronic equipment and storage medium
CN115082598A (en) Text image generation method, text image training method, text image processing method and electronic equipment
CN110555204A (en) emotion judgment method and device
CN115116080A (en) Table analysis method and device, electronic equipment and storage medium
CN114792097A (en) Method and device for determining prompt vector of pre-training model and electronic equipment
CN114724144A (en) Text recognition method, model training method, device, equipment and medium
CN114065784A (en) Training method, translation method, device, electronic equipment and storage medium
CN113239259A (en) Method and device for determining similar stores
CN113361621A (en) Method and apparatus for training a model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination