WO2020143303A1 - Method and device for training deep learning model, computer apparatus, and storage medium - Google Patents

Method and device for training deep learning model, computer apparatus, and storage medium

Info

Publication number
WO2020143303A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample set
input
word segmentation
adjusted
activated
Prior art date
Application number
PCT/CN2019/117310
Other languages
French (fr)
Chinese (zh)
Inventor
金戈 (Jin Ge)
徐亮 (Xu Liang)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910023779.1A external-priority patent/CN109886402B/en
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2020143303A1 publication Critical patent/WO2020143303A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a deep learning model training method, device, computer equipment, and storage medium.
  • Deep learning is a new field in machine learning research. Its motivation is to establish and simulate a neural network for human brain analysis and learning. It mimics the mechanism of the human brain to interpret data, such as images, sounds, and text.
  • Deep learning models, for example convolutional neural networks (Convolutional Neural Network, CNN), require training on a large amount of data before they can actually be used. During training, each layer of the deep learning model is usually processed with batch normalization (Batch Normalization, BN) so that the difference between samples is reduced as they are passed from layer to layer; however, existing processing methods do not provide enough control over the next layer of the network, resulting in a poor training effect.
  • Embodiments of the present application provide a deep learning model training method, device, computer equipment, and storage medium, which are intended to improve the training effect of deep learning models.
  • an embodiment of the present application provides a deep learning model training method, which includes:
  • an embodiment of the present application further provides a deep learning model training device, which includes:
  • the first input unit is configured to input the input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;
  • a first activation unit configured to perform a nonlinear activation process on the sample set to be adjusted to obtain an activated sample set
  • a first batch normalization unit, configured to perform batch normalization processing on the activated sample set to obtain a standard sample set;
  • a second input unit configured to use the next layer of the deep learning model to be trained as a target layer, and input the standard sample set into the target layer;
  • the notification unit is configured to use the output result of the target layer as a new sample set to be adjusted, and notify the activation unit to return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
  • an embodiment of the present application further provides a computer device, including a memory and a processor connected to the memory; the memory is used to store a computer program, and the processor is used to run the computer program stored in the memory to perform the following steps:
  • an embodiment of the present application further provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, causes the processor to perform the following steps:
  • FIG. 1 is a schematic flowchart of a deep learning model training method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a deep learning model training method provided by another embodiment of this application.
  • FIG. 5 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a deep learning model training device provided by an embodiment of this application.
  • FIG. 7 is a schematic block diagram of a first activation unit of a deep learning model training device provided by an embodiment of this application;
  • FIG. 8 is a schematic block diagram of a first batch normalization unit of a deep learning model training device provided by an embodiment of this application;
  • FIG. 9 is a schematic block diagram of an acquisition unit of a first batch normalization unit of a deep learning model training device provided by an embodiment of this application;
  • FIG. 10 is a schematic block diagram of a deep learning model training device provided by another embodiment of this application.
  • FIG. 11 is a schematic block diagram of a first word segmentation unit of a deep learning model training device according to another embodiment of this application.
  • FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • the term “if” may be interpreted as “when”, “once”, “in response to determination”, or “in response to detection”, depending on the context.
  • the phrase “if determined” or “if [the described condition or event] is detected” can be interpreted, depending on the context, to mean “once determined”, “in response to a determination”, “once [the described condition or event] is detected”, or “in response to detection of [the described condition or event]”.
  • FIG. 1 is a schematic flowchart of a deep learning model training method provided by an embodiment of the present application. As shown in the figure, the method includes the following steps S1-S5:
  • the deep learning model to be trained is trained by inputting a sample set.
  • the deep learning model to be trained includes an input layer, multiple hidden layers, and an output layer.
  • the input sample set is input to the input layer of the deep learning model to be trained to train the input layer of the deep learning model to be trained.
  • the output result of the input layer is used as the sample set to be adjusted, and the sample set to be adjusted is adjusted before being input to the next layer of the deep learning model to be trained.
  • a non-linear activation process is performed on the sample set to be adjusted to obtain an activated sample set.
  • step S2 specifically includes the following steps:
  • non-linear activation functions include the Sigmoid function, the Tanh function, and the ReLU (Rectified Linear Unit) function; the specific choice is not limited in this application.
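As an illustration not taken from the application itself, the three activation functions named above can be sketched in Python; applying one element-wise turns a "sample set to be adjusted" into an "activated sample set":

```python
import math

def sigmoid(x):
    # Sigmoid: squashes x into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Tanh: squashes x into (-1, 1)
    return math.tanh(x)

def relu(x):
    # ReLU: passes positive values through, zeroes out negatives
    return max(0.0, x)

# Element-wise activation of a small sample set
to_adjust = [-2.0, -0.5, 0.0, 1.5]
activated = [relu(x) for x in to_adjust]
print(activated)  # → [0.0, 0.0, 0.0, 1.5]
```

ReLU is shown for the final application only because it is the cheapest of the three; any of the listed functions could be substituted.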
  • a batch normalization process is performed on the activated sample set to obtain a standard sample set.
  • the batch normalization process can reduce the difference between samples as they are passed through each layer of the deep learning model, thereby improving the training effect of the model.
  • batch normalization processing is performed on the activated sample set to obtain a standard sample set.
  • step S3 includes the following steps S31-S32:
  • the batch normalization process of the activated sample set needs to use the mean and variance of each sample in the activated sample set. To this end, first calculate the mean and variance of each sample in the activated sample set.
  • step S31 specifically includes the following steps S311-S312:
  • S311: the mean μ of the samples in the activated sample set is calculated by the formula μ = (1/m)·Σ x_i, for i = 1..m.
  • S312: the variance σ² of the samples in the activated sample set is calculated by the formula σ² = (1/m)·Σ (x_i − μ)², for i = 1..m.
  • in the above formulas, i is the serial number of a sample, m is the number of samples, and x_i is the value of sample i.
  • the batch normalization formula is y_i = γ·(x_i − μ)/√(σ² + ε) + β. In the above formula, i is the serial number of the sample, x_i is the value of a sample in the activated sample set, y_i is the value of the corresponding sample in the standard sample set, μ is the mean of the samples in the activated sample set, σ² is the variance of the samples in the activated sample set, m is the number of samples in the activated sample set, and γ, β, and ε are parameters of the deep learning model to be trained.
  • these parameters are generated by random initialization and then iteratively updated during the training process.
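The mean, variance, and batch normalization steps described above can be sketched as a minimal single-feature example. The learnable parameters gamma and beta (randomly initialized, then updated during training) are passed in explicitly, and the small constant eps for numerical stability is an assumed default, not a value stated in the application:

```python
import math

def batch_normalize(xs, gamma, beta, eps=1e-5):
    # Mean of the activated sample set: mu = (1/m) * sum(x_i)
    m = len(xs)
    mu = sum(xs) / m
    # Variance: sigma^2 = (1/m) * sum((x_i - mu)^2)
    var = sum((x - mu) ** 2 for x in xs) / m
    # Normalize, then scale and shift with the trainable parameters
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]

activated = [1.0, 2.0, 3.0, 4.0]
standard = batch_normalize(activated, gamma=1.0, beta=0.0)
# With gamma=1 and beta=0 the standard sample set has (approximately) zero mean
print(sum(standard) / len(standard))
```

In a real model, gamma and beta would be per-feature tensors updated by the optimizer alongside the network weights.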
  • the next layer of the deep learning model to be trained is used as a target layer, and the standard sample set is input into the target layer to train the target layer.
  • the output result of the target layer is used as a new sample set to be adjusted, and the method returns to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set; batch normalization is then performed on the activated sample set to obtain a standard sample set, the next layer of the deep learning model to be trained is taken as the target layer, and the standard sample set is input into the target layer to train it. This continues until the deep learning model to be trained outputs a result.
  • the output results of the previous layer of the deep learning model to be trained can be subjected to nonlinear activation processing and batch normalization processing, and then input to the next layer, repeating until the output layer of the deep learning model to be trained is reached.
  • because the batch normalization process acts directly on the next layer of the deep learning model, better control over the structure of the next layer is obtained, which improves the training effect of the deep learning model to be trained.
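The layer-by-layer procedure above (activate, batch-normalize, feed into the next target layer) can be sketched as a loop. The `ScaleLayer` class and the simplified `center` normalizer are hypothetical stand-ins for real network layers and full batch normalization, used only to make the control flow concrete:

```python
class ScaleLayer:
    """Hypothetical stand-in for a network layer: multiplies each input by a weight."""
    def __init__(self, w):
        self.w = w

    def forward(self, xs):
        return [self.w * x for x in xs]

def relu(x):
    return max(0.0, x)

def center(xs):
    # Simplified stand-in for batch normalization: shift the batch to zero mean
    mu = sum(xs) / len(xs)
    return [x - mu for x in xs]

def forward_pass(layers, input_samples, activate, batch_norm):
    # The input layer produces the initial sample set to be adjusted
    to_adjust = layers[0].forward(input_samples)
    # Each subsequent layer becomes the target layer in turn
    for target_layer in layers[1:]:
        activated = [activate(x) for x in to_adjust]  # non-linear activation
        standard = batch_norm(activated)              # batch normalization
        to_adjust = target_layer.forward(standard)    # output is the new sample set
    return to_adjust

out = forward_pass([ScaleLayer(2.0), ScaleLayer(1.0)], [1.0, -1.0], relu, center)
print(out)  # → [1.0, -1.0]
```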
  • FIG. 4 is a schematic flowchart of a deep learning model training method provided by another embodiment of the present application.
  • the deep learning model training method of this embodiment includes steps S41-S47. Steps S43-S47 are similar to steps S1-S5 in the above embodiment, and will not be repeated here. The steps S41-S42 added in this embodiment will be described in detail below.
  • S41 Perform word segmentation processing on the training text to obtain a word segmentation sample set, where the word segmentation sample set is a set of samples obtained after word segmentation is performed on the training text.
  • the training text is text pre-stored in the terminal, which can be retrieved directly.
  • word segmentation refers to dividing a sequence of Chinese characters into individual words. Word segmentation is the process of recombining consecutive word sequences into word sequences according to certain specifications. Word segmentation is a basic step in text processing.
  • the word segmentation sample set is obtained by performing word segmentation processing on the training text, where the word segmentation sample set is a set composed of samples (words) obtained after word segmentation is performed on the training text.
  • step S41 specifically includes the following steps S411-S412:
  • S411 Perform word segmentation processing on the training text by a preset word segmentation tool to obtain an initial word segmentation sample set.
  • a commonly used word segmentation tool is the jieba ("stammer") word segmentation tool.
  • the jieba word segmentation tool is used to perform word segmentation processing on the training text to obtain an initial word segmentation sample set.
  • the jieba word segmentation tool is well suited to segmenting Chinese text and achieves very high segmentation accuracy, which improves the accuracy of this scheme.
  • other word segmentation tools may also be used to perform word segmentation processing on the training text, which is not specifically limited in this application.
  • S412 Remove the stop words in the initial word segmentation sample set to obtain the word segmentation sample set.
  • stop words in the initial word segmentation sample set are removed to obtain a word segmentation sample set.
  • stop words are often prepositions, adverbs or conjunctions. For example, “in”, “inside”, “also”, “of”, “it”, “for”, etc. are stop words.
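Steps S411–S412 can be illustrated as follows. To keep the example self-contained, the segmentation output is represented by a pre-tokenized list (standing in for the output of a tool such as jieba), and the stop-word list is a hypothetical example, not the one used by the application:

```python
# Hypothetical stop-word list (prepositions, adverbs, conjunctions, etc.)
STOP_WORDS = {"in", "inside", "also", "of", "it", "for"}

def remove_stop_words(initial_samples):
    # S412: remove stop words from the initial word segmentation sample set
    return [w for w in initial_samples if w not in STOP_WORDS]

# Stand-in for S411: the output of a word segmentation tool on the training text
initial_segmentation = ["training", "of", "deep", "learning", "models", "in", "practice"]
word_samples = remove_stop_words(initial_segmentation)
print(word_samples)  # → ['training', 'deep', 'learning', 'models', 'practice']
```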
  • S42: Perform word vector training on the samples in the word segmentation sample set by using a preset word vector tool to obtain the input sample set, where the input sample set is a set composed of the word vectors of the samples in the word segmentation sample set.
  • word2vec is used as a word vector tool.
  • word2vec is a natural language processing tool, and its function is to convert words in natural language into word vectors that can be understood by a computer.
  • the word vector training is performed on the samples in the word segmentation sample set by word2vec to obtain the word vector of each sample.
  • the word vectors of the samples in the word segmentation sample set are combined to obtain the input sample set.
  • other word vector tools may also be used to perform word vector training on the samples in the word segmentation sample set, which is not specifically limited in this application.
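Step S42 maps each segmented word to a vector. A real implementation would train these vectors with a word2vec tool (e.g. the gensim library); the sketch below substitutes a tiny hand-written lookup table so the example stays self-contained — the vectors, dimension, and zero-vector fallback are illustrative assumptions, not details from the application:

```python
# Toy embedding table standing in for trained word2vec vectors (dimension 3)
EMBEDDINGS = {
    "deep":     [0.1, 0.3, -0.2],
    "learning": [0.4, -0.1, 0.0],
    "model":    [-0.3, 0.2, 0.5],
}

def build_input_sample_set(word_samples, embeddings, dim=3):
    # The input sample set is the collection of word vectors of the segmentation
    # samples; unknown words fall back to a zero vector (an illustrative choice)
    return [embeddings.get(w, [0.0] * dim) for w in word_samples]

input_sample_set = build_input_sample_set(["deep", "learning", "model"], EMBEDDINGS)
print(len(input_sample_set), len(input_sample_set[0]))  # → 3 3
```

The resulting list of vectors is what would be fed to the input layer in step S43 of the method.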
  • FIG. 6 is a schematic block diagram of a deep learning model training device 60 provided by an embodiment of the present application. As shown in FIG. 6, corresponding to the above deep learning model training method, the present application further provides a deep learning model training device 60.
  • the deep learning model training device 60 includes units for performing the above deep learning model training method, and the device may be configured in a desktop computer, a tablet computer, a laptop computer, or another terminal. Specifically, referring to FIG. 6, the deep learning model training device 60 includes a first input unit 61, a first activation unit 62, a first batch normalization unit 63, a second input unit 64, and a notification unit 65.
  • the first input unit 61 is used to input the input sample set to the input layer of the deep learning model to be trained, and to use the output result of the input layer as the sample set to be adjusted;
  • the first activation unit 62 is used to perform nonlinear activation processing on the sample set to be adjusted to obtain the activated sample set;
  • the first batch normalization unit 63 is used to perform batch normalization processing on the activated sample set to obtain the standard sample set;
  • the second input unit 64 is used to take the next layer of the deep learning model to be trained as the target layer, and to input the standard sample set into the target layer;
  • the notification unit 65 is used to take the output result of the target layer as a new sample set to be adjusted, and to notify the activation unit to return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
  • the first activation unit 62 includes a second activation unit 621.
  • the second activation unit 621 is configured to perform a nonlinear activation process on the sample set to be adjusted through a preset nonlinear activation function to obtain an activated sample set.
  • the first batch normalization unit 63 includes an acquisition unit 631 and a second batch normalization unit 632.
  • the acquisition unit 631 is used to obtain the mean and variance of each sample in the activated sample set; the second batch normalization unit 632 is used to perform batch normalization processing on the activated sample set according to a preset batch normalization formula and the obtained mean and variance.
  • the acquisition unit 631 includes a first calculation unit 6311 and a second calculation unit 6312.
  • the first calculation unit 6311 is used to calculate the mean μ of the samples in the activated sample set by the formula μ = (1/m)·Σ x_i;
  • the second calculation unit 6312 is used to calculate the variance σ² of the samples in the activated sample set by the formula σ² = (1/m)·Σ (x_i − μ)²; where i is the serial number of a sample, m is the number of samples, and x_i is the value of sample i.
  • FIG. 10 is a schematic block diagram of a deep learning model training device 60 provided by another embodiment of the present application. As shown in FIG. 10, the deep learning model training device 60 of this embodiment adds the first word segmentation unit 66 and the training unit 67 based on the above embodiment.
  • the first word segmentation unit 66 is used to perform word segmentation processing on the training text to obtain a word segmentation sample set, where the word segmentation sample set is a set composed of samples obtained after word segmentation is performed on the training text; the training unit 67 is used to perform word vector training on the samples in the word segmentation sample set through a preset word vector tool to obtain the input sample set, where the input sample set is a set composed of the word vectors of the samples in the word segmentation sample set.
  • the first word segmentation unit 66 includes a second word segmentation unit 661 and a removal unit 662.
  • the second word segmentation unit 661 is used to perform word segmentation processing on the training text through a preset word segmentation tool to obtain an initial word segmentation sample set; the removal unit 662 is used to remove stop words from the initial word segmentation sample set to obtain the word segmentation sample set.
  • the above deep learning model training device 60 may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 12.
  • FIG. 12 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 is a terminal, where the terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, and other electronic devices with communication functions.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • when executed, the computer program 5032 may cause the processor 502 to execute a deep learning model training method.
  • the processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503.
  • the processor 502 can execute a deep learning model training method.
  • the network interface 505 is used for network communication with other devices.
  • FIG. 12 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.
  • the processor 502 is used to run the computer program 5032 stored in the memory to implement the deep learning model training method of the present application.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor.
  • a person of ordinary skill in the art may understand that all or part of the processes in the method for implementing the foregoing embodiments may be completed by instructing relevant hardware through a computer program.
  • the computer program may be stored in a storage medium, which is a computer-readable storage medium.
  • the computer program is executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiments.
  • the present application also provides a storage medium.
  • the storage medium may be a computer-readable storage medium.
  • the storage medium stores a computer program.
  • the processor is caused to execute the deep learning model training method of the present application.
  • the storage medium is a physical, non-transitory storage medium, for example, a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code.

Abstract

Disclosed in embodiments of the present invention are a method and device for training a deep learning model, a computer apparatus, and a storage medium. The method relates to artificial intelligence technology, and comprises: inputting an input sample set into an input layer of a deep learning model to be trained, and taking an output result of the input layer as a sample set to be adjusted; nonlinearly activating the sample set to obtain an activated sample set; performing batch normalization on the activated sample set to obtain a normalized sample set; taking the next layer of the deep learning model as a target layer, and inputting the normalized sample set into the target layer; and taking an output result of the target layer as a new sample set to be adjusted, and continuing to nonlinearly activate the sample set to obtain an activated sample set.

Description

Deep learning model training method, device, computer equipment and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on January 10, 2019, with application number 201910023779.1 and titled "Deep learning model training method, device, computer equipment and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence technology, and in particular to a deep learning model training method, device, computer equipment, and storage medium.
Background
Deep learning is a new field in machine learning research. Its motivation is to establish and simulate a neural network for human-brain-style analysis and learning; it mimics the mechanism of the human brain to interpret data such as images, sounds, and text.
Deep learning models, for example convolutional neural networks (Convolutional Neural Network, CNN), require training on a large amount of data before they can actually be used. In the training process of deep learning models, batch normalization (Batch Normalization, BN) is mostly used to process each layer of the deep learning model, so that the difference between samples is reduced as they are passed through each layer of the network. However, existing processing methods do not provide enough control over the next layer of the network, resulting in a poor training effect for deep learning models.
Summary of the invention
Embodiments of the present application provide a deep learning model training method, device, computer equipment, and storage medium, which are intended to improve the training effect of deep learning models.
In a first aspect, an embodiment of the present application provides a deep learning model training method, which includes:
inputting an input sample set to the input layer of the deep learning model to be trained, and using the output result of the input layer as the sample set to be adjusted;
performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
performing batch normalization processing on the activated sample set to obtain a standard sample set;
taking the next layer of the deep learning model to be trained as the target layer, and inputting the standard sample set into the target layer; and
taking the output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
In a second aspect, an embodiment of the present application further provides a deep learning model training device, which includes:
a first input unit, configured to input an input sample set to the input layer of the deep learning model to be trained, and use the output result of the input layer as the sample set to be adjusted;
a first activation unit, configured to perform nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
a first batch normalization unit, configured to perform batch normalization processing on the activated sample set to obtain a standard sample set;
a second input unit, configured to take the next layer of the deep learning model to be trained as the target layer, and input the standard sample set into the target layer; and
a notification unit, configured to take the output result of the target layer as a new sample set to be adjusted, and notify the activation unit to return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory and a processor connected to the memory; the memory is used to store a computer program, and the processor is used to run the computer program stored in the memory to perform the following steps:
inputting an input sample set to the input layer of the deep learning model to be trained, and using the output result of the input layer as the sample set to be adjusted;
performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
performing batch normalization processing on the activated sample set to obtain a standard sample set;
taking the next layer of the deep learning model to be trained as the target layer, and inputting the standard sample set into the target layer; and
taking the output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the following steps:
inputting an input sample set to the input layer of the deep learning model to be trained, and using the output result of the input layer as the sample set to be adjusted;
performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
performing batch normalization processing on the activated sample set to obtain a standard sample set;
taking the next layer of the deep learning model to be trained as the target layer, and inputting the standard sample set into the target layer; and
taking the output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description illustrate some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic flowchart of a deep learning model training method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a deep learning model training method provided by another embodiment of the present application;
FIG. 5 is a schematic diagram of a sub-process of a deep learning model training method provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of a deep learning model training device provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of a first activation unit of a deep learning model training device provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of a first batch normalization unit of a deep learning model training device provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of an acquisition unit of a first batch normalization unit of a deep learning model training device provided by an embodiment of the present application;
FIG. 10 is a schematic block diagram of a deep learning model training device provided by another embodiment of the present application;
FIG. 11 is a schematic block diagram of a first word segmentation unit of a deep learning model training device provided by another embodiment of the present application; and
FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "including" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in the specification of the present application is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the specification of the present application and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
Please refer to FIG. 1, which is a schematic flowchart of a deep learning model training method provided by an embodiment of the present application. As shown in the figure, the method includes the following steps S1-S5.
S1: Input an input sample set to the input layer of a deep learning model to be trained, and take the output of the input layer as a sample set to be adjusted.
In the embodiment of the present application, the deep learning model to be trained is trained with an input sample set. The deep learning model to be trained includes an input layer, multiple hidden layers, and an output layer.
In specific implementation, the input sample set is input to the input layer of the deep learning model to be trained, so as to train the input layer.
In the embodiment of the present application, when the input layer produces its output, that output is taken as the sample set to be adjusted, and the sample set to be adjusted is processed before being input to the next layer of the deep learning model to be trained.
S2: Perform nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
In specific implementation, nonlinear activation processing is performed on the sample set to be adjusted to obtain an activated sample set. Applying nonlinear activation to the sample set to be adjusted introduces nonlinearity into the deep learning model to be trained and improves its expressive power.
In an embodiment, the above step S2 specifically includes the following step:
performing nonlinear activation processing on the sample set to be adjusted through a preset nonlinear activation function to obtain the activated sample set.
It should be noted that, in the present application, the nonlinear activation function includes the Sigmoid function, the Tanh function, and the ReLU (Rectified Linear Unit) function, which is not specifically limited in the present application.
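As an illustrative sketch (not itself part of the claimed method), the three activation functions named above can be applied element-wise to a sample set as follows; the function names and the toy sample values are assumptions for illustration only:

```python
import math

def sigmoid(x):
    # Sigmoid squashes any real value into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Tanh squashes any real value into (-1, 1)
    return math.tanh(x)

def relu(x):
    # ReLU (Rectified Linear Unit) zeroes out negative values
    return max(0.0, x)

def activate(samples, fn):
    # Apply a nonlinear activation function element-wise to a sample set
    return [fn(x) for x in samples]

to_adjust = [-2.0, -0.5, 0.0, 1.5]     # hypothetical sample set to be adjusted
activated = activate(to_adjust, relu)  # -> [0.0, 0.0, 0.0, 1.5]
```

Any of the three functions can be passed to `activate`; ReLU is shown because it is the most common choice in practice.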
S3: Perform batch normalization processing on the activated sample set to obtain a standard sample set.
In specific implementation, batch normalization is performed on the activated sample set to obtain a standard sample set. Batch normalization reduces the variability of the samples as they are passed between the layers of the deep learning model, thereby improving the training effect of the model.
In the embodiment of the present application, nonlinear activation is first performed on the sample set to be adjusted to obtain the activated sample set, and batch normalization is then performed on the activated sample set to obtain the standard sample set. By moving batch normalization to after the nonlinear activation (nonlinear activation increases the variability of the samples), better control over the next layer of the network is obtained, thereby improving the training effect of the deep learning model to be trained.
In an embodiment, referring to FIG. 2, the above step S3 includes the following steps S31-S32.
S31: Obtain the mean and variance of the samples in the activated sample set.
In specific implementation, batch normalization of the activated sample set requires the mean and variance of the samples in the activated sample set, so the mean and variance of the samples in the activated sample set are calculated first.
In an embodiment, referring to FIG. 3, the above step S31 specifically includes the following steps S311-S312.
S311: Calculate the mean μ of the samples in the activated sample set by the following formula:

μ = (1/m) Σ_{i=1}^{m} x_i

In specific implementation, the mean μ of the samples in the activated sample set is calculated by the above formula, where i is the index of a sample, m is the number of samples, and x_i is the value of the i-th sample.
S312: Calculate the variance σ of the samples in the activated sample set by the following formula:

σ = (1/m) Σ_{i=1}^{m} (x_i − μ)²

In specific implementation, the variance σ of the samples in the activated sample set is calculated by the above formula, where i is the index of a sample, m is the number of samples, and x_i is the value of the i-th sample.
S32: Perform batch normalization on the activated sample set according to a preset batch normalization formula and the mean and variance of the samples in the activated sample set.
In specific implementation, after the mean and variance of the samples in the activated sample set have been obtained, batch normalization is performed on the activated sample set according to the preset batch normalization formula together with that mean and variance.
In the embodiment of the present application, the batch normalization formula is

y_i = γ · (x_i − μ)/√(σ + ε) · w + β

where i is the index of a sample, x_i is the value of a sample in the activated sample set, y_i is the value of the corresponding sample in the standard sample set, μ is the mean of the samples in the activated sample set, σ is the variance of the samples in the activated sample set, m is the number of samples in the activated sample set, and w, γ, β, and ε are parameters of the deep learning model to be trained; these parameters are generated by random initialization and then updated iteratively during the training process.
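A minimal sketch of the mean, variance, and normalization computations of steps S31-S32; the values chosen for w, γ, β, and ε here are illustrative assumptions (in the method they would be randomly initialized and learned during training):

```python
def batch_normalize(activated, w=1.0, gamma=1.0, beta=0.0, eps=1e-5):
    # Batch-normalize an activated sample set into a standard sample set.
    m = len(activated)
    mu = sum(activated) / m                             # mean of the samples (S311)
    sigma = sum((x - mu) ** 2 for x in activated) / m   # variance of the samples (S312)
    # Normalize each sample, then scale and shift with the learnable parameters (S32)
    return [gamma * (x - mu) / (sigma + eps) ** 0.5 * w + beta for x in activated]

activated = [0.0, 0.0, 1.5, 2.5]       # hypothetical activated sample set
standard = batch_normalize(activated)  # standard sample set, mean close to 0
```

With γ = 1, β = 0, and w = 1, the output is simply the samples centered and rescaled by the standard deviation; nonzero β shifts the whole set, and γ and w rescale it.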
S4: Take the next layer of the deep learning model to be trained as a target layer, and input the standard sample set into the target layer.
In specific implementation, the next layer of the deep learning model to be trained is taken as the target layer, and the standard sample set is input into the target layer so as to train the target layer.
S5: Take the output of the target layer as a new sample set to be adjusted, and return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
In the solution of the present application, the output of the target layer is taken as a new sample set to be adjusted, and the method returns to the step of performing nonlinear activation on the sample set to be adjusted to obtain an activated sample set; batch normalization is then performed on the activated sample set to obtain a standard sample set, the next layer of the deep learning model to be trained is again taken as the target layer, and the standard sample set is input into that target layer to train it. This continues until the deep learning model to be trained produces its output.
In this way, the output of each layer of the deep learning model to be trained undergoes nonlinear activation followed by batch normalization before being input into the next layer; when the output layer of the model is reached, the result is output directly.
In the embodiment of the present application, by moving batch normalization to after the nonlinear activation (nonlinear activation increases the variability of the samples), the batch normalization acts directly on the next layer of the deep learning model, so that better control over the next layer is obtained and the training effect of the deep learning model to be trained is improved.
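The per-layer flow of steps S1-S5 can be sketched as follows; the layer, activation, and normalization functions here are hypothetical stand-ins for the model's actual layers and the processing described above:

```python
def run_layers(input_samples, layers, activate, batch_normalize):
    # S1: feed the input sample set through the input layer;
    # its output is the first "sample set to be adjusted"
    to_adjust = layers[0](input_samples)
    # S2-S5: for each remaining (target) layer, activate, then batch-normalize,
    # then feed the standard sample set into the target layer
    for target_layer in layers[1:]:
        activated = activate(to_adjust)        # S2: nonlinear activation
        standard = batch_normalize(activated)  # S3: batch normalization
        to_adjust = target_layer(standard)     # S4/S5: target layer's output
    return to_adjust  # output of the final (output) layer

# Toy stand-ins: simple "layers", ReLU activation, mean-centering "normalization"
layers = [lambda xs: xs, lambda xs: [2 * x for x in xs], lambda xs: xs]
relu = lambda xs: [max(0.0, x) for x in xs]
center = lambda xs: [x - sum(xs) / len(xs) for x in xs]
result = run_layers([-1.0, 1.0, 3.0], layers, relu, center)
```

The key point of the ordering is visible in the loop body: normalization is applied to the already-activated samples, so each target layer receives a standardized input.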
FIG. 4 is a schematic flowchart of a deep learning model training method provided by another embodiment of the present application. As shown in FIG. 4, the deep learning model training method of this embodiment includes steps S41-S47. Steps S43-S47 are similar to steps S1-S5 of the above embodiment and will not be repeated here. The steps S41-S42 added in this embodiment are described in detail below.
S41: Perform word segmentation processing on a training text to obtain a word segmentation sample set, where the word segmentation sample set is a set of samples obtained by segmenting the training text.
In this embodiment, the training text is text pre-stored in the terminal and can be retrieved directly.
In specific implementation, word segmentation refers to splitting a sequence of Chinese characters into individual words, that is, recombining a continuous character sequence into a word sequence according to certain specifications. Word segmentation is a basic step in text processing.
Word segmentation processing is performed on the training text to obtain the word segmentation sample set, which is the set of samples (words) obtained by segmenting the training text.
In an embodiment, referring to FIG. 5, the above step S41 specifically includes the following steps S411-S412.
S411: Perform word segmentation processing on the training text with a preset word segmentation tool to obtain an initial word segmentation sample set.
In specific implementation, a commonly used word segmentation tool is the jieba word segmentation tool. In this embodiment, the jieba tool is used to segment the training text to obtain the initial word segmentation sample set. The jieba tool is well suited to segmenting Chinese text and does so with high accuracy, which improves the accuracy of this solution.
Alternatively, in other embodiments, other word segmentation tools may be used to segment the training text, which is not specifically limited in the present application.
S412: Remove the stop words from the initial word segmentation sample set to obtain the word segmentation sample set.
In specific implementation, the stop words in the initial word segmentation sample set are removed to obtain the word segmentation sample set. It should be noted that stop words are often prepositions, adverbs, or conjunctions; for example, "在", "里面", "也", "的", "它", and "为" are all stop words.
S42: Perform word vector training on the samples in the word segmentation sample set with a preset word vector tool to obtain the input sample set, where the input sample set is the set of word vectors of the samples in the word segmentation sample set.
In specific implementation, word2vec is used as the word vector tool. word2vec is a natural language processing tool whose function is to convert words of natural language into word vectors that a computer can understand.
Traditional word vectors are prone to the curse of dimensionality, and any two words are isolated from each other, so the relationship between words cannot be captured. This embodiment therefore uses word2vec to obtain word vectors: the similarity between words can be reflected by computing the distance between their vectors, which makes the training results more accurate.
In this embodiment, word vector training is performed on the samples in the word segmentation sample set with word2vec to obtain the word vector of each sample, and the word vectors of the samples in the word segmentation sample set are combined into the input sample set.
Alternatively, in other embodiments, other word vector tools may be used to perform word vector training on the samples in the word segmentation sample set, which is not specifically limited in the present application.
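As a sketch of the idea that distance between word vectors reflects word similarity, cosine similarity can be computed between any two vectors as below. The three toy vectors are assumptions for illustration only; a real system would obtain them by training word2vec (e.g. with the gensim library) on the word segmentation sample set:

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity: close to 1.0 means the vectors point the same way
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical word vectors (in practice produced by word2vec training)
vectors = {
    "模型": [0.9, 0.1, 0.2],
    "网络": [0.8, 0.2, 0.3],
    "苹果": [0.1, 0.9, 0.1],
}
# Related words should score higher than unrelated ones
sim_related = cosine_similarity(vectors["模型"], vectors["网络"])
sim_unrelated = cosine_similarity(vectors["模型"], vectors["苹果"])
```

This is the property the embodiment relies on: words used in similar contexts end up with nearby vectors, unlike traditional one-hot representations, where every pair of words is equidistant.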
FIG. 6 is a schematic block diagram of a deep learning model training device 60 provided by an embodiment of the present application. As shown in FIG. 6, corresponding to the above deep learning model training method, the present application further provides a deep learning model training device 60. The deep learning model training device 60 includes units for performing the above deep learning model training method, and the device may be configured in a desktop computer, tablet computer, laptop computer, or other terminal. Specifically, referring to FIG. 6, the deep learning model training device 60 includes a first input unit 61, a first activation unit 62, a first batch normalization unit 63, a second input unit 64, and a notification unit 65.
The first input unit 61 is configured to input an input sample set to the input layer of a deep learning model to be trained and take the output of the input layer as a sample set to be adjusted; the first activation unit 62 is configured to perform nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set; the first batch normalization unit 63 is configured to perform batch normalization processing on the activated sample set to obtain a standard sample set; the second input unit 64 is configured to take the next layer of the deep learning model to be trained as a target layer and input the standard sample set into the target layer; and the notification unit 65 is configured to take the output of the target layer as a new sample set to be adjusted and notify the activation unit to return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
In an embodiment, referring to FIG. 7, the first activation unit 62 includes a second activation unit 621. The second activation unit 621 is configured to perform nonlinear activation processing on the sample set to be adjusted through a preset nonlinear activation function to obtain the activated sample set.
In an embodiment, referring to FIG. 8, the first batch normalization unit 63 includes an acquisition unit 631 and a second batch normalization unit 632.
The acquisition unit 631 is configured to obtain the mean and variance of the samples in the activated sample set; the second batch normalization unit 632 is configured to perform batch normalization on the activated sample set according to a preset batch normalization formula and the mean and variance of the samples in the activated sample set.
In an embodiment, referring to FIG. 9, the acquisition unit 631 includes a first calculation unit 6311 and a second calculation unit 6312.
The first calculation unit 6311 is configured to calculate the mean μ of the samples in the activated sample set by the formula

μ = (1/m) Σ_{i=1}^{m} x_i;

the second calculation unit 6312 is configured to calculate the variance σ of the samples in the activated sample set by the formula

σ = (1/m) Σ_{i=1}^{m} (x_i − μ)²;

where i is the index of a sample, m is the number of samples, and x_i is the value of the i-th sample.
FIG. 10 is a schematic block diagram of a deep learning model training device 60 provided by another embodiment of the present application. As shown in FIG. 10, the deep learning model training device 60 of this embodiment adds a first word segmentation unit 66 and a training unit 67 to the above embodiment.
The first word segmentation unit 66 is configured to perform word segmentation processing on a training text to obtain a word segmentation sample set, which is a set of samples obtained by segmenting the training text; the training unit 67 is configured to perform word vector training on the samples in the word segmentation sample set with a preset word vector tool to obtain the input sample set, which is a set composed of the word vectors of the samples in the word segmentation sample set.
In an embodiment, referring to FIG. 11, the first word segmentation unit 66 includes a second word segmentation unit 661 and a removal unit 662.
The second word segmentation unit 661 is configured to perform word segmentation processing on the training text with a preset word segmentation tool to obtain an initial word segmentation sample set; the removal unit 662 is configured to remove the stop words from the initial word segmentation sample set to obtain the word segmentation sample set.
It should be noted that those skilled in the art can clearly understand that, for the specific implementation of the above deep learning model training device 60 and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above deep learning model training device 60 may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 12.
Please refer to FIG. 12, which is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 500 is a terminal, where the terminal may be an electronic device with a communication function, such as a smart phone, tablet computer, notebook computer, desktop computer, personal digital assistant, or wearable device.
Referring to FIG. 12, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. When the computer program 5032 is executed, it may cause the processor 502 to execute a deep learning model training method.
The processor 502 is used to provide computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 may be caused to execute a deep learning model training method.
The network interface 505 is used for network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 12 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution is applied; the specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is used to run the computer program 5032 stored in the memory to implement the deep learning model training method of the present application.
It should be understood that, in the embodiment of the present application, the processor 502 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be completed by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Therefore, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium and stores a computer program. When the computer program is executed by a processor, the processor is caused to execute the deep learning model training method of the present application.
The storage medium is a physical, non-transitory storage medium, for example, a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code.
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions using different methods for each particular application, but such implementation should not be considered beyond the scope of the present application.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and such modifications or replacements shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. A deep learning model training method, comprising:
    inputting an input sample set to the input layer of a deep learning model to be trained, and taking the output of the input layer as a sample set to be adjusted;
    performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
    performing batch normalization processing on the activated sample set to obtain a standard sample set;
    taking the next layer of the deep learning model to be trained as a target layer, and inputting the standard sample set into the target layer; and
    taking the output of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
  2. The method according to claim 1, wherein the performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set comprises:
    performing nonlinear activation processing on the sample set to be adjusted through a preset nonlinear activation function to obtain the activated sample set.
  3. 根据权利要求2所述的方法,其中,所述非线性激活函数包括Sigmoid函数、Tanh函数以及ReLU函数。The method according to claim 2, wherein the non-linear activation function includes a Sigmoid function, a Tanh function, and a ReLU function.
  4. The method according to claim 1, wherein performing batch normalization processing on the activated sample set to obtain a standard sample set comprises:
    obtaining a mean and a variance of the samples in the activated sample set; and
    performing batch normalization processing on the activated sample set according to a preset batch normalization formula and the mean and the variance of the samples in the activated sample set.
  5. The method according to claim 4, wherein obtaining the mean and the variance of the samples in the activated sample set comprises:
    calculating the mean μ of the samples in the activated sample set by the formula
    μ = (1/m) Σ_{i=1}^{m} x_i;
    calculating the variance σ² of the samples in the activated sample set by the formula
    σ² = (1/m) Σ_{i=1}^{m} (x_i − μ)²;
    where i is the index of a sample, m is the number of samples, and x_i is the value of the i-th sample.
  6. The method according to claim 1, wherein before inputting the input sample set into the input layer of the deep learning model to be trained and taking the output result of the input layer as the sample set to be adjusted, the method further comprises:
    performing word segmentation processing on a training text to obtain a word segmentation sample set, the word segmentation sample set being a set of the samples obtained by segmenting the training text; and
    performing word vector training on the samples in the word segmentation sample set through a preset word vector tool to obtain the input sample set, the input sample set being a set of the word vectors of the samples in the word segmentation sample set.
  7. The method according to claim 6, wherein performing word segmentation processing on the training text to obtain the word segmentation sample set comprises:
    performing word segmentation processing on the training text through a preset word segmentation tool to obtain an initial word segmentation sample set; and
    removing the stop words from the initial word segmentation sample set to obtain the word segmentation sample set.
  8. The method according to claim 6, wherein the word vector tool is word2vec.
  9. The method according to claim 7, wherein the word segmentation tool is the jieba (结巴) word segmentation tool.
  10. A device for training a deep learning model, comprising:
    a first input unit, configured to input an input sample set into an input layer of a deep learning model to be trained, and take an output result of the input layer as a sample set to be adjusted;
    a first activation unit, configured to perform nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
    a first batch normalization unit, configured to perform batch normalization processing on the activated sample set to obtain a standard sample set;
    a second input unit, configured to take a next layer of the deep learning model to be trained as a target layer, and input the standard sample set into the target layer; and
    a notification unit, configured to take an output result of the target layer as a new sample set to be adjusted, and notify the first activation unit to return to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
  11. A computer device, comprising a memory and a processor connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory to perform the following steps:
    inputting an input sample set into an input layer of a deep learning model to be trained, and taking an output result of the input layer as a sample set to be adjusted;
    performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
    performing batch normalization processing on the activated sample set to obtain a standard sample set;
    taking a next layer of the deep learning model to be trained as a target layer, and inputting the standard sample set into the target layer; and
    taking an output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
  12. The computer device according to claim 11, wherein the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set comprises:
    performing nonlinear activation processing on the sample set to be adjusted through a preset nonlinear activation function to obtain the activated sample set.
  13. The computer device according to claim 12, wherein the nonlinear activation function includes a Sigmoid function, a Tanh function, and a ReLU function.
  14. The computer device according to claim 11, wherein the step of performing batch normalization processing on the activated sample set to obtain a standard sample set comprises:
    obtaining a mean and a variance of the samples in the activated sample set; and
    performing batch normalization processing on the activated sample set according to a preset batch normalization formula and the mean and the variance of the samples in the activated sample set.
  15. The computer device according to claim 14, wherein the step of obtaining the mean and the variance of the samples in the activated sample set comprises:
    calculating the mean μ of the samples in the activated sample set by the formula
    μ = (1/m) Σ_{i=1}^{m} x_i;
    calculating the variance σ² of the samples in the activated sample set by the formula
    σ² = (1/m) Σ_{i=1}^{m} (x_i − μ)²;
    where i is the index of a sample, m is the number of samples, and x_i is the value of the i-th sample.
  16. The computer device according to claim 11, wherein before inputting the input sample set into the input layer of the deep learning model to be trained and taking the output result of the input layer as the sample set to be adjusted, the processor further performs the following steps:
    performing word segmentation processing on a training text to obtain a word segmentation sample set, the word segmentation sample set being a set of the samples obtained by segmenting the training text; and
    performing word vector training on the samples in the word segmentation sample set through a preset word vector tool to obtain the input sample set, the input sample set being a set of the word vectors of the samples in the word segmentation sample set.
  17. The computer device according to claim 16, wherein the step of performing word segmentation processing on the training text to obtain the word segmentation sample set comprises:
    performing word segmentation processing on the training text through a preset word segmentation tool to obtain an initial word segmentation sample set; and
    removing the stop words from the initial word segmentation sample set to obtain the word segmentation sample set.
  18. The computer device according to claim 16, wherein the word vector tool is word2vec.
  19. The computer device according to claim 17, wherein the word segmentation tool is the jieba (结巴) word segmentation tool.
  20. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the following steps:
    inputting an input sample set into an input layer of a deep learning model to be trained, and taking an output result of the input layer as a sample set to be adjusted;
    performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set;
    performing batch normalization processing on the activated sample set to obtain a standard sample set;
    taking a next layer of the deep learning model to be trained as a target layer, and inputting the standard sample set into the target layer; and
    taking an output result of the target layer as a new sample set to be adjusted, and returning to the step of performing nonlinear activation processing on the sample set to be adjusted to obtain an activated sample set.
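As an illustrative aside rather than part of the claims: the layer-wise flow recited in claims 1, 11, and 20 — take the input layer's output, apply a nonlinear activation, batch-normalize the activated set, feed the next (target) layer, and repeat — can be sketched in NumPy. The ReLU choice, layer sizes, and random weights below are assumptions for the sketch only; the claims leave them open.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # One of the activation choices named in claim 3; an assumption here.
    return np.maximum(0.0, x)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch to zero mean, unit variance.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Hypothetical two-layer model: input layer (8 -> 16), one further layer (16 -> 4).
layers = [rng.standard_normal((8, 16)), rng.standard_normal((16, 4))]

samples = rng.standard_normal((32, 8))   # input sample set
to_adjust = samples @ layers[0]          # output of the input layer
for w in layers[1:]:
    activated = relu(to_adjust)          # nonlinear activation processing
    standard = batch_norm(activated)     # batch normalization processing
    to_adjust = standard @ w             # output of the target layer
print(to_adjust.shape)                   # (32, 4)
```

Note that the claimed order (activation first, then batch normalization) differs from the common practice of normalizing before the activation; the sketch follows the claims.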
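The per-sample statistics of claims 5 and 15 (whose formula figures are image placeholders in this text) are, on the standard batch-normalization reading, μ = (1/m) Σ x_i and σ² = (1/m) Σ (x_i − μ)². A small numeric check of that reading, with a made-up four-sample batch:

```python
# Numeric check of the claim-5 statistics under the standard
# batch-normalization reading (the sample values are made up).
x = [1.0, 2.0, 3.0, 4.0]
m = len(x)

mu = sum(x) / m                            # mean: (1/m) * sum of x_i
var = sum((xi - mu) ** 2 for xi in x) / m  # variance about that mean

print(mu, var)  # 2.5 1.25
```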
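The preprocessing of claims 6-9 (and 16-19) — segment the training text, remove stop words, then obtain word vectors for the remaining tokens — can be sketched end to end. The claims name jieba for segmentation and word2vec for the vectors; to keep this sketch dependency-free, a whitespace tokenizer and a fixed random-vector lookup stand in for both, and the stop-word list and sentence are made up.

```python
import random

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop-word list

def segment(text):
    # Stand-in for jieba.lcut(text); English text splits on spaces.
    return text.split()

def to_vectors(tokens, dim=4, seed=0):
    # Stand-in for a trained word2vec model: one fixed random
    # vector per distinct token.
    rng = random.Random(seed)
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1, 1) for _ in range(dim)]
    return [table[tok] for tok in tokens]

initial = segment("the model of a deep network")    # initial word segmentation sample set
kept = [t for t in initial if t not in STOP_WORDS]  # stop words removed
vectors = to_vectors(kept)                          # input sample set
print(kept)             # ['model', 'deep', 'network']
print(len(vectors[0]))  # 4
```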
PCT/CN2019/117310 2019-01-10 2019-11-12 Method and device for training deep learning model, computer apparatus, and storage medium WO2020143303A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910023779.1 2019-01-10
CN201910023779.1A CN109886402B (en) 2019-01-10 Deep learning model training method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020143303A1 true WO2020143303A1 (en) 2020-07-16

Family

ID=66925884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117310 WO2020143303A1 (en) 2019-01-10 2019-11-12 Method and device for training deep learning model, computer apparatus, and storage medium

Country Status (1)

Country Link
WO (1) WO2020143303A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010095960A (en) * 2000-04-14 2001-11-07 유인균 Neuro-controller for the Implementation of Artificial Intelligence Apartment Building
CN108334943A (en) * 2018-01-03 2018-07-27 浙江大学 The semi-supervised soft-measuring modeling method of industrial process based on Active Learning neural network model
CN108734193A (en) * 2018-03-27 2018-11-02 合肥麟图信息科技有限公司 A kind of training method and device of deep learning model
CN108898218A (en) * 2018-05-24 2018-11-27 阿里巴巴集团控股有限公司 A kind of training method of neural network model, device and computer equipment
CN108959265A (en) * 2018-07-13 2018-12-07 深圳市牛鼎丰科技有限公司 Cross-domain texts sensibility classification method, device, computer equipment and storage medium
CN109886402A (en) * 2019-01-10 2019-06-14 平安科技(深圳)有限公司 Deep learning model training method, device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN109886402A (en) 2019-06-14


Legal Events

121 — Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19908891; Country of ref document: EP; Kind code of ref document: A1)
NENP — Non-entry into the national phase (Ref country code: DE)
122 — Ep: PCT application non-entry in European phase (Ref document number: 19908891; Country of ref document: EP; Kind code of ref document: A1)