WO2020143303A1

WO2020143303A1 - Method and device for training deep learning model, computer apparatus, and storage medium

Info

Publication number: WO2020143303A1
Application number: PCT/CN2019/117310
Authority: WO
Inventors: 金戈; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-01-10
Filing date: 2019-11-12
Publication date: 2020-07-16
Also published as: CN109886402A

Abstract

Disclosed in embodiments of the present invention are a method and device for training a deep learning model, a computer apparatus, and a storage medium. The method relates to artificial intelligence technology, and comprises: inputting an input sample set into an input layer of a deep learning model to be trained, and taking an output result of the input layer as a sample set to be adjusted; nonlinearly activating the sample set to obtain an activated sample set; performing batch normalization on the activated sample set to obtain a normalized sample set; taking the next layer of the deep learning model as a target layer, and inputting the normalized sample set into the target layer; and taking an output result of the target layer as a new sample set to be adjusted, and continuing to nonlinearly activate the sample set to obtain an activated sample set.

Description

Deep learning model training method, device, computer equipment and storage medium

This application requires the priority of the Chinese patent application submitted to the China Patent Office on January 10, 2019, with the application number 201910023779.1 and the application name "Deep Learning Model Training Methods, Devices, Computer Equipment, and Storage Media", all of which are approved by The reference is incorporated in this application.

Technical field

This application relates to the field of artificial intelligence technology, and in particular to a deep learning model training method, device, computer equipment, and storage medium.

Background technique

Deep learning is a new field in machine learning research. Its motivation is to establish and simulate a neural network for human brain analysis and learning. It mimics the mechanism of the human brain to interpret data, such as images, sounds, and text.

Deep learning models, for example, Convolutional Neural Networks (CNN) require a large amount of data training before they can be actually used. In the training process of deep learning models, most choose to use batch normalization (BN) method to process each layer of the deep learning model, so that the difference of samples in the process of network transmission in each layer is reduced. However, the existing processing methods do not have enough control over the next layer of the network, resulting in poor training of deep learning models.

Summary of the invention

Embodiments of the present application provide a deep learning model training method, device, computer equipment, and storage medium, which are intended to improve the training effect of deep learning models.

In a first aspect, an embodiment of the present application provides a deep learning model training method, which includes:

Input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;

Performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;

Batch-normalize the activated sample set to obtain a standard sample set;

Take the next layer of the deep learning model to be trained as the target layer, and input the standard sample set into the target layer;

Taking the output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.

In a second aspect, an embodiment of the present application further provides a deep learning model training device, which includes:

The first input unit is configured to input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;

A first activation unit, configured to perform a nonlinear activation process on the sample set to be adjusted to obtain an activated sample set;

In the first batch of standardization units, the user performs batch normalization processing on the activated sample set to obtain a standard sample set;

A second input unit, configured to use the next layer of the deep learning model to be trained as a target layer, and input the standard sample set into the target layer;

The notification unit is configured to use the output result of the target layer as a new sample set to be adjusted, and notify the activation unit to return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.

In a third aspect, an embodiment of the present application further provides a computer device, including a memory and a processor connected to the memory; the memory is used to store a computer program; the processor is used to run a computer stored in the memory Program to perform the following steps:

Batch-normalize the activated sample set to obtain a standard sample set;

According to a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, causes the processor to perform the following steps:

Batch-normalize the activated sample set to obtain a standard sample set;

BRIEF DESCRIPTION

In order to more clearly explain the technical solutions of the embodiments of the present application, the following will briefly introduce the drawings used in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present application. Ordinary technicians can obtain other drawings based on these drawings without creative work.

1 is a schematic flowchart of a deep learning model training method provided by an embodiment of the present application;

2 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application;

3 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application;

4 is a schematic flowchart of a deep learning model training method provided by another embodiment of this application;

5 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application;

6 is a schematic block diagram of a deep learning model training device provided by an embodiment of this application;

7 is a schematic block diagram of a first activation unit unit of a deep learning model training device provided by an embodiment of this application;

8 is a schematic block diagram of the first batch of standardized units of a deep learning model training device provided by an embodiment of this application;

9 is a schematic block diagram of an acquisition unit of a first batch of standardized units of a deep learning model training device provided by an embodiment of this application;

10 is a schematic block diagram of a deep learning model training device provided by another embodiment of this application;

11 is a schematic block diagram of a first word segmentation unit of a deep learning model training device according to another embodiment of this application; and

12 is a schematic block diagram of a computer device provided by an embodiment of the present application.

detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, but not all the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

It should be understood that when used in this specification and the appended claims, the terms "including" and "comprising" indicate the presence of described features, wholes, steps, operations, elements, and/or components, but do not exclude one or The presence or addition of multiple other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terminology used in the description of this application is for the purpose of describing particular embodiments only and is not intended to limit this application. As used in the specification of the present application and the appended claims, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms.

It should also be further understood that the term "and/or" used in the specification of the present application and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes these combinations .

As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determination" or "in response to detection" depending on the context . Similarly, the phrase "if determined" or "if [described condition or event] is detected" can be interpreted in the context to mean "once determined" or "in response to a determination" or "once detected [described condition or event ]" or "In response to detection of [the described condition or event]".

Please refer to FIG. 1, which is a schematic flowchart of a deep learning model training method provided by an embodiment of the present application. As shown in the figure, the method includes the following steps S1-S5:

S1. Input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted.

In the embodiment of the present application, the deep learning model to be trained is trained by inputting a sample set. The deep learning model to be trained includes an input layer, multiple hidden layers, and an output layer.

In specific implementation, the input sample set is input to the input layer of the deep learning model to be trained to train the input layer of the deep learning model to be trained.

In the embodiment of the present application, when the output result of the input layer is input, the output result of the input layer is used as the sample set to be adjusted, and the sample set to be adjusted is adjusted before being input to the next layer of the deep learning model to be trained.

S2. Perform a nonlinear activation process on the sample set to be adjusted to obtain an activated sample set.

In a specific implementation, a non-linear activation process is performed on the sample set to be adjusted to obtain an activated sample set. By performing a nonlinear activation process on the set of samples to be adjusted, the nonlinear factors of the deep learning model to be trained can be improved, and the expressiveness of the deep learning model to be trained can be improved.

In an embodiment, the above step S2 specifically includes the following steps:

Perform a nonlinear activation process on the sample set to be adjusted through a preset nonlinear activation function to obtain an activated sample set.

It should be noted that, in this application, the non-linear activation functions include: Sigmoid function, Tanh function, and ReLU (Rectified Linear Unit) function, which is not specifically limited in this application.

S3. Perform batch normalization processing on the activated sample set to obtain a standard sample set.

In specific implementation, a batch standardization process is performed on the activated sample set to obtain a standard sample set. The batch standardization process can reduce the difference of samples in the transmission of each layer of the deep learning model, thereby improving the training effect of the model.

In the embodiment of the present application, after performing non-linear activation processing on the sample set to be adjusted to obtain an activated sample set, batch normalization processing is performed on the activated sample set to obtain a standard sample set. By adjusting the batch normalization process to the non-linear activation process (the non-linear activation process will increase the sample difference), better control of the next layer of the network can be obtained, thereby improving the training effect of the deep learning model to be trained.

In an embodiment, referring to FIG. 2, the above step S3 includes the following steps S31-S32:

S31. Obtain the mean and variance of each sample in the activated sample set.

In a specific implementation, the batch normalization process of the activated sample set needs to use the mean and variance of each sample in the activated sample set. To this end, first calculate the mean and variance of each sample in the activated sample set.

In an embodiment, referring to FIG. 3, the above step S31 specifically includes the following steps S311-S312:

S311, by the following formula

Calculate the mean μ of each sample in the active sample set.

In the specific implementation, through the following formula

Calculate the mean μ of each sample in the active sample set, where i is the sample number, m is the number of samples, and x _i is the value of the sample.

S312, through the following formula

Calculate the variance σ of each sample in the active sample set.

In the specific implementation, through the formula

Calculate the variance σ of each sample in the active sample set.

Where i is the serial number of the sample, m is the number of samples, and x _i is the value of the sample.

S32. Perform batch processing on the activated sample set according to a preset batch normalization formula and the mean and variance of each sample in the activated sample set.

In specific implementation, after obtaining the mean and variance of each sample in the activated sample set, batch process the activated sample set according to a preset batch normalization formula and the mean and variance of each sample in the activated sample set .

In the embodiment of the present application, the batch normalization formula is

In the above formula, i is the serial number of the sample, x _i is the value of the sample in the active sample set, y _i is the value of the sample in the corresponding standard sample set, and μ is the average of each sample in the active sample set, σ is the variance of each sample in the active sample set, m is the number of samples in the active sample set, w, γ, β, and ε are the parameters of the deep learning model to be trained. The above random initialization is generated, and then iteratively updated during the training process These parameters.

S4. Use the next layer of the deep learning model to be trained as the target layer, and input the standard sample set into the target layer.

In a specific implementation, the next layer of the deep learning model to be trained is used as a target layer, and the standard sample set is input into the target layer to train the target layer.

S5. Use the output result of the target layer as a new sample set to be adjusted, and return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.

In the solution of the present application, the output result of the target layer is used as a new sample set to be adjusted, and the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set is returned, and then the activated sample set A batch standardization process is performed to obtain a standard sample set, and then the next layer of the deep learning model to be trained is used as a target layer, and the standard sample set is input into the target layer to train the target layer. And so on until the deep learning model to be trained outputs the result.

Through the above method, the output results of the previous layer of the deep learning model to be trained can be subjected to nonlinear activation processing and batch normalization processing, and then input to the next layer of the deep learning model to be trained until the deep learning model to be trained Output layer directly.

In the embodiment of the present application, by adjusting the batch normalization process to the non-linear activation process (the non-linear activation process will increase the difference of the samples), the batch normalization process directly acts on the next layer structure of the deep learning model, which can obtain The structure of the next layer has better control, which improves the training effect of the deep learning model to be trained.

FIG. 4 is a schematic flowchart of a deep learning model training method provided by another embodiment of the present application. As shown in FIG. 4, the deep learning model training method of this embodiment includes steps S41-S47. Steps S43-S47 are similar to steps S1-S5 in the above embodiment, and will not be repeated here. The steps S41-S42 added in this embodiment will be described in detail below.

S41: Perform word segmentation processing on the training text to obtain a word segmentation sample set, where the word segmentation sample set is a set of samples obtained after word segmentation is performed on the training text.

In this embodiment, the training text is the text pre-stored in the terminal, which can be directly called to obtain.

In the specific implementation, word segmentation refers to dividing a sequence of Chinese characters into individual words. Word segmentation is the process of recombining consecutive word sequences into word sequences according to certain specifications. Word segmentation is a basic step in text processing.

The word segmentation sample set is obtained by performing word segmentation processing on the training text, where the word segmentation sample set is a set composed of samples (words) obtained after word segmentation is performed on the training text.

In an embodiment, in an embodiment, referring to FIG. 5, the above step S41 specifically includes the following steps S411-S412:

S411: Perform word segmentation processing on the training text by a preset word segmentation tool to obtain an initial word segmentation sample set.

In the specific implementation, the commonly used word segmentation tool is the stammer word segmentation tool. In this embodiment, the stutter word segmentation tool is used to perform word segmentation processing on the training text to obtain an initial word segmentation sample set. The stammer word segmentation tool is suitable for the segmentation of Chinese text. The accuracy of the segmentation of Chinese text is extremely high, which can improve the accuracy of this program.

Or, in other embodiments, other word segmentation tools may be used to perform word segmentation processing on the training text, which is not specifically limited in this application.

S412: Remove the stop words in the initial word segmentation sample set to obtain the word segmentation sample set.

In a specific implementation, the stop words in the initial word segmentation sample set are removed to obtain a word segmentation sample set. It should be noted that stop words are often prepositions, adverbs or conjunctions. For example, "in", "inside", "also", "of", "it", "for", etc. are stop words.

S42: Perform word vector training on the samples in the word segmentation sample set by using a preset word vector tool to obtain the input sample set, where the input sample set is composed of word vectors of the samples in the word segmentation sample set set.

In specific implementation, word2vec is used as a word vector tool. word2vec is a natural language processing tool, and its function is to convert words in natural language into word vectors that can be understood by a computer.

Traditional word vectors are susceptible to dimensional disasters, and any two words are isolated and cannot reflect the relationship between words. Therefore, this embodiment uses word2vec to obtain the word vector, which can be calculated by calculating the vector The distance between them reflects the similarity between words and makes the training results more accurate.

In this embodiment, the word vector training is performed on the samples in the word segmentation sample set by word2vec to obtain the word vector of each sample. The word vectors of the samples in the word segmentation sample set are combined to obtain the input sample set.

Or, in other embodiments, other word vector tools may be used to perform word vector training on the samples in the word segmentation sample set, which is not specifically limited in this application.

FIG. 6 is a schematic block diagram of a deep learning model training device 60 provided by an embodiment of the present application. As shown in FIG. 6, corresponding to the above deep learning model training method, the present application further provides a deep learning model training device 60. The deep learning model training device 60 includes a unit for performing the above deep learning model training method, and the device may be configured in a desktop computer, a tablet computer, a laptop computer, and other terminals. Specifically, referring to FIG. 6, the deep learning model training device 60 includes a first input unit 61, a first activation unit 62, a first batch of normalization units 63, a second input unit 64, and a notification unit 65.

The first input unit 61 is used to input the input sample set to the input layer of the deep learning model to be trained, and the output result of the input layer is used as the sample set to be adjusted; the first activation unit 62 is used to Adjust the sample set to perform nonlinear activation processing to obtain the activated sample set; the first batch of normalization unit 63, the user performs batch normalization processing on the activated sample set to obtain the standard sample set; the second input unit 64 is used to convert the Train the next layer of the deep learning model as the target layer, and input the standard sample set into the target layer; the notification unit 65 is used to use the output result of the target layer as a new sample set to be adjusted, and The notification activation unit returns the step of performing a non-linear activation process on the sample set to be adjusted to obtain an activated sample set.

In an embodiment, referring to FIG. 7, the first activation unit 62 includes a second activation unit 621. The second activation unit 621 is configured to perform a nonlinear activation process on the sample set to be adjusted through a preset nonlinear activation function to obtain an activated sample set.

In an embodiment, referring to FIG. 8, the first batch of normalization units 63 includes an acquisition unit 631 and a second batch of normalization units 632.

The obtaining unit 631 is used to obtain the mean and variance of each sample in the activated sample set; the second batch of normalization unit 632 is used to obtain the mean and variance of each sample in the activated sample set according to a preset batch normalization formula The activated sample set is batch-processed.

In an embodiment, referring to FIG. 9, the acquisition unit 631 includes a first calculation unit 6311 and a second calculation unit 6312.

The first calculation unit 6211 is used to pass the following formula

Calculate the mean μ of each sample in the active sample set; the second calculation unit 6312 is used to pass the following formula

Calculate the variance σ of each sample in the active sample set; where i is the number of the sample, m is the number of samples, and x _i is the value of the sample.

FIG. 10 is a schematic block diagram of a deep learning model training device 60 provided by another embodiment of the present application. As shown in FIG. 10, the deep learning model training device 60 of this embodiment adds the first word segmentation unit 66 and the training unit 67 based on the above embodiment.

The first word segmentation unit 66 is used to perform word segmentation processing on the training text to obtain a word segmentation sample set, and the word segmentation sample set is a set composed of samples obtained after word segmentation is performed on the training text; the training unit 67 is used to pass a preset The word vector tool performs word vector training on the samples in the word segmentation sample set to obtain the input sample set, and the input sample set is a set of word vectors of the samples in the word segmentation sample set.

In an embodiment, referring to FIG. 11, the first word segmentation unit 66 includes a second word segmentation unit 661 and a removal unit 662.

The second word segmentation unit 661 is used to perform word segmentation processing on the training text through a preset word segmentation tool to obtain an initial word segmentation sample set; a removal unit 662 is used to remove stop words in the initial word segmentation sample set to obtain the word segmentation Sample collection.

It should be noted that those skilled in the art can clearly understand that the specific implementation process of the above deep learning model training device 60 and each unit can refer to the corresponding description in the foregoing method embodiments. For the convenience and conciseness of description, the This will not be repeated here.

The above deep learning model training device 60 may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 12.

Please refer to FIG. 12, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 is a terminal, where the terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, and other electronic devices with communication functions.

Referring to FIG. 12, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, it may cause the processor 502 to execute a deep learning model training method.

The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.

The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503. When the computer program 5032 is executed by the processor 502, the processor 502 can execute a deep learning model training method.

The network interface 505 is used for network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 12 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied. The specific computer device 500 may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.

The processor 502 is used to run the computer program 5032 stored in the memory to implement the deep learning model training method of the present application.

It should be understood that in the embodiment of the present application, the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), Application specific integrated circuit (Application Specific Integrated Circuit, ASIC), ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may be any conventional processor.

A person of ordinary skill in the art may understand that all or part of the processes in the method for implementing the foregoing embodiments may be completed by instructing relevant hardware through a computer program. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiments.

Therefore, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. When the computer program is executed by the processor, the processor is caused to execute the deep learning model training method of the present application.

The storage medium is a physical, non-transitory storage medium, for example, it can be a U disk, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disk and other various physical storages that can store program codes medium.

Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two, in order to clearly explain the hardware and software. Interchangeability, in the above description, the composition and steps of each example have been generally described according to function. Whether these functions are executed in hardware or software depends on the specific application of the technical solution and design constraints. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

The above is only the specific implementation of this application, but the scope of protection disclosed in this application is not limited to this, any person skilled in the art can easily think of various equivalents within the technical scope disclosed in this application Modifications or replacements, these modifications or replacements should be covered within the scope of protection of this application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

A deep learning model training method, including:

Input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;

Performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;

Batch-normalize the activated sample set to obtain a standard sample set;

Take the next layer of the deep learning model to be trained as the target layer, and input the standard sample set into the target layer;

Taking the output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
The method according to claim 1, wherein the performing non-linear activation processing on the sample set to be adjusted to obtain an activated sample set includes:

Perform a nonlinear activation process on the sample set to be adjusted through a preset nonlinear activation function to obtain an activated sample set.
The method according to claim 2, wherein the non-linear activation function includes a Sigmoid function, a Tanh function, and a ReLU function.
The method according to claim 1, wherein the batch normalizing the activated sample set to obtain a standard sample set includes:

Obtaining the mean and variance of each sample in the activated sample set;

Perform batch processing on the activated sample set according to a preset batch normalization formula and the mean and variance of each sample in the activated sample set.
The method according to claim 4, wherein the obtaining the mean and variance of each sample in the active sample set includes:

By the following formula
Calculate the mean μ of each sample in the active sample set;

By the following formula
Calculate the variance σ of each sample in the active sample set;

Where i is the serial number of the sample, m is the number of samples, and x i is the value of the sample.
The method according to claim 1, wherein before the input sample set is input to the input layer of the deep learning model to be trained, and the output result of the input layer is used as the sample set to be adjusted, the method further comprises :

Performing word segmentation processing on the training text to obtain a word segmentation sample set, where the word segmentation sample set is a set of samples obtained after word segmentation is performed on the training text;

The word vector training is performed on the samples in the word segmentation sample set by a preset word vector tool to obtain the input sample set, and the input sample set is a set of word vectors of the samples in the word segmentation sample set.
The method according to claim 6, wherein the word segmentation processing on the training text to obtain a word segmentation sample set includes:

Use the preset word segmentation tool to perform word segmentation on the training text to obtain the initial word segmentation sample set;

The stop words in the initial word segmentation sample set are removed to obtain the word segmentation sample set.
The method according to claim 6, wherein the word vector tool is word2vec.
The method according to claim 7, wherein the word segmentation tool is a stammer word segmentation tool.
A deep learning model training device, including:

The first input unit is configured to input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;

A first activation unit, configured to perform a nonlinear activation process on the sample set to be adjusted to obtain an activated sample set;

In the first batch of standardization units, the user performs batch normalization processing on the activated sample set to obtain a standard sample set;

A second input unit, configured to use the next layer of the deep learning model to be trained as a target layer, and input the standard sample set into the target layer;

The notification unit is configured to use the output result of the target layer as a new sample set to be adjusted, and notify the activation unit to return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
A computer device includes a memory and a processor connected to the memory; wherein the memory is used to store a computer program; the processor is used to run the computer program stored in the memory to perform the following steps:

Input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;

Performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;

Batch-normalize the activated sample set to obtain a standard sample set;

Take the next layer of the deep learning model to be trained as the target layer, and input the standard sample set into the target layer;

Taking the output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
The computer device according to claim 11, wherein the step of performing a nonlinear activation process on the sample set to be adjusted to obtain an activated sample set includes:

Perform a nonlinear activation process on the sample set to be adjusted through a preset nonlinear activation function to obtain an activated sample set.
The computer device according to claim 12, wherein the non-linear activation function includes a Sigmoid function, a Tanh function, and a ReLU function.
The computer device according to claim 11, wherein the step of batch-normalizing the activated sample set to obtain a standard sample set includes:

Obtaining the mean and variance of each sample in the activated sample set;

Perform batch processing on the activated sample set according to a preset batch normalization formula and the mean and variance of each sample in the activated sample set.
The computer device according to claim 14, wherein the step of obtaining the mean and variance of each sample in the active sample set includes:

By the following formula
Calculate the mean μ of each sample in the active sample set;

By the following formula
Calculate the variance σ of each sample in the active sample set;

Where i is the serial number of the sample, m is the number of samples, and x i is the value of the sample.
The computer device according to claim 11, wherein before the input sample set is input to the input layer of the deep learning model to be trained, and the output result of the input layer is used as the sample set to be adjusted, the processor Also perform the following steps:

Performing word segmentation processing on the training text to obtain a word segmentation sample set, where the word segmentation sample set is a set of samples obtained after word segmentation is performed on the training text;

The word vector training is performed on the samples in the word segmentation sample set by a preset word vector tool to obtain the input sample set, and the input sample set is a set of word vectors of the samples in the word segmentation sample set.
The computer device according to claim 16, wherein the step of performing word segmentation processing on the training text to obtain a word segmentation sample set includes:

Use the preset word segmentation tool to perform word segmentation on the training text to obtain the initial word segmentation sample set;

The stop words in the initial word segmentation sample set are removed to obtain the word segmentation sample set.
The computer device according to claim 16, wherein the word vector tool is word2vec.
The computer device according to claim 17, wherein the word segmentation tool is a stammer word segmentation tool.
A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the processor causes the processor to perform the following steps:

Input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;

Performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;

Batch-normalize the activated sample set to obtain a standard sample set;

Take the next layer of the deep learning model to be trained as the target layer, and input the standard sample set into the target layer;

Taking the output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.