WO2023231887A1 - Tensor-based continuous learning method and device - Google Patents

Tensor-based continuous learning method and device

Info

Publication number
WO2023231887A1
Authority
WO
WIPO (PCT)
Prior art keywords
tensor
neural network
training
task
layers
Prior art date
Application number
PCT/CN2023/096249
Other languages
English (en)
French (fr)
Inventor
李银川
邵云峰
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023231887A1 publication Critical patent/WO2023231887A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • This application relates to the technical field of artificial intelligence (AI) in big data, and in particular to a tensor-based continuous learning method and device.
  • Artificial intelligence AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the nature of intelligence and produce a new class of intelligent machines that can respond in a manner similar to human intelligence.
  • Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Embodiments of the present application provide a tensor-based continuous learning method and device, which can effectively improve the model's resistance to forgetting while increasing the model size only slightly, thereby effectively saving storage and communication overhead.
  • This application provides a tensor-based continuous learning method. The method includes: obtaining input data, wherein the input data includes one or more of video, image, text, or voice; and inputting the input data into a first neural network to obtain data processing results, wherein the first neural network is obtained through training on m tasks.
  • After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • The parameters outside the newly added tensor cores and/or tensor layers can remain unchanged; that is, the neural network is updated along the dimensions of tensor cores and/or tensor layers.
  • In this way, the knowledge learned from previous tasks is effectively retained, while the new tensor cores and/or tensor layers store the knowledge learned from new tasks, giving the neural network higher-precision resistance to forgetting.
  • The parameters in some or all of the A tensor cores remain unchanged, or the parameters in some or all of the B tensor layers remain unchanged.
  • the method further includes: after completing the training of the i-th task, performing tensor combination on the A tensor cores to obtain one or more tensors; and Before performing the training of the i+1th task, the one or more tensors are tensor decomposed to obtain the A tensor cores.
  • The neural network used for continuous learning can be designed to contain only tensors and no tensor cores.
  • In that case, the tensors are first decomposed into tensor cores before training, so that the parameters in the decomposed tensor cores and/or tensor layers can be updated more precisely; similarly, the neural network can also contain tensor cores/tensor layers at design time, in which case no tensor decomposition or tensor combination is needed during the training of a task.
  • The neural network can therefore be designed in different ways and the method in this application still applies, giving it good versatility.
  • The training process of the i+1-th task includes: training the i-th backup tensor network using the i+1-th batch of data sets to obtain the trained i-th backup tensor network; and training the first neural network using the i+1-th batch of data sets; wherein the loss function of the first neural network includes the degree of difference between the output of the first neural network and the output of the trained j-th backup tensor network.
  • the backup tensor network trained on the previous task will be used to constrain the update of the neural network.
  • Combined with the tensor-core/tensor-layer update method, this gives the neural network better anti-forgetting ability after it is trained on new tasks.
  • the first neural network is one of multiple neural networks, the multiple neural networks are respectively located on different user equipment, and the first neural network is located on the first user equipment.
  • The method further includes: after the training of each task is completed, the first user equipment sends the model parameters of the first neural network to the server, so that the server updates the second neural network on the server according to the model parameters of each of the multiple neural networks; wherein the model parameters of the first neural network include the tensor cores included in the first neural network.
  • The continuous learning method of this application can be applied to a federated learning architecture: the neural network parameters are updated by performing the continuous learning method of the aforementioned embodiments on different user devices, so that each neural network maintains good anti-forgetting ability while its scale does not change significantly; at the same time, only the tensor cores of the neural network on each user device are sent to the server for model aggregation, which effectively saves communication overhead in the federated learning process.
  • the task includes image recognition, target detection, image segmentation or speech semantic recognition.
  • the continuous learning method in this application is not limited by task categories, can be applied to various types of application scenarios, and has strong versatility.
  • embodiments of the present application provide a tensor-based continuous learning method.
  • The method includes: obtaining input data, wherein the input data includes one or more of video, image, text, or voice; and inputting the input data into the first neural network to obtain data processing results; wherein the first neural network is obtained through training on m tasks, and after the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated.
  • embodiments of the present application provide a tensor-based continuous learning method.
  • The method includes: receiving multiple model parameters respectively sent by multiple user equipments, wherein the multiple user equipments include a first user equipment, and the model parameters sent by the first user equipment include the tensor cores included in the first neural network; updating the second neural network based on the multiple model parameters, and processing the data to be processed based on the updated second neural network to obtain data processing results; wherein the data to be processed includes one or more of pictures, videos, voices, or text; wherein the first neural network is obtained through training on m tasks, and after the training of the i-th task is completed, the first neural network contains A tensor cores, and the A tensor cores are divided into B tensor layers.
  • Each of the B tensor layers contains the data of each of the A tensor cores in the same dimension; after the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • The parameters in some or all of the A tensor cores remain unchanged, or the parameters in some or all of the B tensor layers remain unchanged.
  • Embodiments of the present application provide a tensor-based continuous learning method. The method includes: receiving multiple model parameters respectively sent by multiple user equipments, wherein the multiple user equipments include a first user equipment, and the model parameters sent by the first user equipment include the tensor cores included in the first neural network; updating the second neural network based on the multiple model parameters, and processing the data to be processed based on the updated second neural network to obtain data processing results; wherein the data to be processed includes one or more of pictures, videos, voices, or text.
  • The first neural network is obtained through training on m tasks. After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated.
  • Embodiments of the present application provide a tensor-based continuous learning device.
  • The device includes: an acquisition unit, configured to acquire input data, wherein the input data includes one or more of video, image, text, or voice; and a processing unit, configured to process the input data based on the first neural network to obtain data processing results; wherein the first neural network is obtained through training on m tasks, and after the training of the i-th task is completed, the first neural network contains A tensor cores, and the A tensor cores are divided into B tensor layers.
  • The parameters in some or all of the A tensor cores remain unchanged, or the parameters in some or all of the B tensor layers remain unchanged.
  • the processing unit is further configured to: after the training of the i-th task is completed, perform tensor combination on the A tensor cores to obtain one or more tensors; And before performing the training of the i+1th task, perform tensor decomposition on the one or more tensors to obtain the A tensor cores.
  • The processing unit is specifically configured to: train the i-th backup tensor network using the i+1-th batch of data sets to obtain the trained i-th backup tensor network; and train the first neural network using the i+1-th batch of data sets; wherein the loss function of the first neural network includes the degree of difference between the output of the first neural network and the output of the trained j-th backup tensor network, or the loss function of the first neural network includes the degree of difference between the model parameters of the first neural network and the model parameters of the trained j-th backup tensor network.
  • the first neural network is one of multiple neural networks, the multiple neural networks are respectively located on different user equipment, and the first neural network is located on the first user equipment.
  • The device further includes: a sending unit, configured to send the model parameters of the first neural network to the server after the training of each task is completed, so that the server updates the second neural network on the server according to the model parameters of each of the multiple neural networks; wherein the model parameters of the first neural network include the tensor cores included in the first neural network.
  • the task includes image recognition, target detection, image segmentation or speech semantic recognition.
  • Embodiments of the present application provide a tensor-based continuous learning device.
  • the device includes: an acquisition unit for acquiring input data; wherein the input data includes video, image, text or voice.
  • After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated.
  • Embodiments of the present application provide a tensor-based continuous learning device.
  • The device includes: a receiving unit, configured to receive multiple model parameters respectively sent by multiple user equipments, wherein the multiple user equipments include a first user equipment, and the model parameters sent by the first user equipment include the tensor cores included in the first neural network; and a processing unit, configured to update the second neural network based on the multiple model parameters and to process the data to be processed based on the updated second neural network to obtain data processing results, wherein the data to be processed includes one or more of pictures, videos, voices, or text.
  • The first neural network is obtained through training on m tasks. After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension. After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • The parameters in some or all of the A tensor cores remain unchanged, or the parameters in some or all of the B tensor layers remain unchanged.
  • Embodiments of the present application provide a tensor-based continuous learning device.
  • The device includes: a receiving unit, configured to receive multiple model parameters respectively sent by multiple user equipments, wherein the multiple user equipments include a first user equipment, and the model parameters sent by the first user equipment include the tensor cores included in the first neural network; and a processing unit, configured to update the second neural network based on the multiple model parameters and to process the data to be processed based on the updated second neural network to obtain data processing results, wherein the data to be processed includes one or more of pictures, videos, voices, or text.
  • The first neural network is obtained through training on m tasks. After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension. After the training of the i+1-th task ends, the parameters in some or all of the A tensor cores remain unchanged, or the parameters in some of the B tensor layers remain unchanged, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated.
  • Embodiments of the present application provide an electronic device.
  • the electronic device includes at least one processor, a memory, and an interface circuit.
  • the memory, the interface circuit, and the at least one processor are interconnected through lines. Instructions are stored in the at least one memory; when the instructions are executed by the processor, the method described in any one of the first to fourth aspects is implemented.
  • embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • When the computer program is executed, the method described in any one of the above first to fourth aspects is implemented.
  • embodiments of the present application provide a computer program.
  • the computer program includes instructions.
  • the computer program is executed, the method described in any one of the above first to fourth aspects is implemented.
  • Figures 1a-1c are schematic diagrams of several system architectures that can execute the tensor-based continuous learning method in this application provided by embodiments of the present application;
  • FIG. 2 is a schematic diagram of another system architecture provided by an embodiment of the present application.
  • Figure 3 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of another convolutional neural network provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the relationship between a tensor core and a tensor layer provided by an embodiment of the present application.
  • Figures 8a-8c are schematic diagrams of how the tensor structure is changed according to the embodiment of the present application.
  • Figures 9a-9b are schematic diagrams of the update method of the tensor structure provided by the embodiment of the present application.
  • Figure 11 is a schematic diagram of an alternating training process provided by an embodiment of the present application.
  • Figure 12 is a schematic diagram of a continuous learning process based on a federated learning architecture provided by an embodiment of the present application
  • Figure 13 is a schematic flow chart of another continuous learning method provided by an embodiment of the present application.
  • Figure 14 is a schematic flow chart of another continuous learning method provided by the embodiment of the present application.
  • Figure 15 is a schematic flow chart of yet another continuous learning method provided by an embodiment of the present application.
  • Figure 16 is a schematic structural diagram of a continuous learning device provided by an embodiment of the present application.
  • Figure 17 is a schematic structural diagram of another continuous learning device provided by an embodiment of the present application.
  • Figure 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Tensor: a multi-dimensional array.
  • A scalar can be viewed as a 0-dimensional tensor, a vector as a 1-dimensional tensor, and a matrix as a 2-dimensional tensor. In actual use, data whose dimension is greater than the third order is usually called a tensor.
  • Tensor decomposition: essentially a higher-order generalization of matrix decomposition. It is usually used for dimensionality reduction, missing-data filling (or "sparse data filling"), and implicit relationship mining. Commonly used tensor decomposition methods include CP decomposition, Tucker decomposition, t-SVD decomposition, etc.
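  • As a brief illustration (standard textbook forms, not reproduced from this application), the CP and Tucker decompositions of a third-order tensor can be written as:

```latex
% CP decomposition: a sum of rank-one terms
\mathcal{X} \;\approx\; \sum_{r=1}^{R} a_r \circ b_r \circ c_r
% Tucker decomposition: a core tensor contracted with one factor matrix per mode
\mathcal{X} \;\approx\; \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}
```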
  • Tensor layer: a collection of the data of different tensor cores in the same dimension.
  • Continuous learning (also called life-long learning): when learning new tasks, the model can use experience from previous tasks to learn the new tasks quickly while maintaining its memory of old tasks; it is a form of deep learning with resistance to forgetting.
  • Machine vision (computer vision, CV): the continuous learning method in this application can be used to train models for various tasks in the field of machine vision (such as target detection, image segmentation, image classification, etc.), and then use the trained models for inference.
  • the trained model can learn knowledge from the training data of new tasks and retain knowledge from previous tasks.
  • Natural language processing (NLP): the continuous learning method in this application can be used to train models for various tasks in the field of natural language processing (such as speech semantic recognition, virtual human video generation, etc.), and then use the trained models for inference.
  • the trained model can learn knowledge from the training data of new tasks and retain knowledge from previous tasks.
  • Figures 1a to 1c are schematic diagrams of several system architectures that can execute the tensor-based continuous learning method in this application provided by embodiments of this application.
  • the system shown in Figure 1a includes a user device 110 and a data processing device 120 (server).
  • the user equipment 110 includes smart terminals such as mobile phones, personal computers, vehicle-mounted terminals, or information processing centers.
  • the user equipment 110 is the initiator of data processing, and usually the user initiates a request through the user equipment 110 .
  • the data processing device 120 may be a cloud server, a network server, an application server, a management server, and other devices or servers with data processing functions.
  • The data processing device 120 receives query statement/voice/text requests from the user device 110 through an interactive interface, and then performs data processing such as machine learning, deep learning, search, reasoning, and decision-making by means of a memory for storing data and a processor for data processing.
  • In this application, the data processing device 120 executes the continuous learning method, then uses the neural network trained by the continuous learning method for inference, and finally transmits the data processing results obtained by inference to the user device 110 through the network.
  • the memory may be a general term, including local storage and a database that stores historical data.
  • the database may be on a data processing device or on another network server.
  • the user equipment 110 in the system shown in Figure 1b directly serves as a data processing device, directly receiving input from the user and processing it directly by the hardware of the user equipment 110 itself.
  • The specific process is similar to that of Figure 1a; refer to the above description, which is not repeated here.
  • the system shown in Figure 1c includes at least one local device (such as local device 301 and local device 302), an execution device 210 and a data storage system 250.
  • the local device is equivalent to the user device 110 in Figure 1a and Figure 1b
  • the execution device 210 is equivalent to the data processing device 120.
  • The data storage system 250 can be integrated on the execution device 210, or can be set on the cloud or on other network servers.
  • FIG 2 is a schematic diagram of another system architecture provided by an embodiment of the present application.
  • the data collection device 260 is used to collect voice, text, image, video and other data and store them in the database 230.
  • The training device 220 trains the neural network 201 (i.e., the first neural network in this application) based on the image and text data maintained in the database 230. The process of how the training device 220 continuously learns the neural network 201 is described in detail below in the method embodiment shown in Figure 6.
  • The trained neural network 201 can process input data consisting of one or more of text, speech, images, and video to generate data processing results corresponding to target tasks (such as target detection, image recognition, speech semantic recognition, etc.).
  • the input data is based on user requests sent by client device 240.
  • Figure 2 is also a functional module diagram during the execution of the continuous learning method.
  • The client device 240 can be the user equipment 110 or the local device in Figures 1a to 1c; the execution device 210 and the data storage system 250 can be integrated into the user equipment 110 or the local device when the data processing capability of the user equipment 110 is relatively strong.
  • the execution device 210 and the data storage system 250 can also be integrated on the data processing device 120 in Figure 1a.
  • the database 230, the training device 220 and the data collection device 260 can be integrated correspondingly on the data processing device 120 in Figure 1a, or set up on other servers on the cloud or the network, which is not limited by this application.
  • the data collection device 260 may be a terminal device, or an input and output interface of a server or cloud, which is an interaction layer (interface) used to obtain query statements and return reply statements.
  • the architecture of a deep learning model can be a deep neural network.
  • The work of each layer in a deep neural network can be described by the mathematical expression y = a(Wx + b). From the physical level, the work of each layer can be understood as completing the transformation from the input space to the output space (that is, from the row space of the matrix to the column space) through five operations on the input space (a set of input vectors). These five operations include: 1. dimension raising/reducing; 2. zooming in/out; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by Wx, operation 4 is performed by +b, and operation 5 is implemented by a().
  • space refers to the collection of all individuals of this type of thing.
  • W is a weight vector, and each value in the vector represents the weight value of a neuron in the neural network of this layer.
  • This vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how to transform the space.
  • the purpose of training a deep neural network is to finally obtain the weight matrix of all layers of the trained neural network (a weight matrix formed by the vectors W of many layers). Therefore, the training process of neural network is essentially to learn how to control spatial transformation, and more specifically, to learn the weight matrix.
  • During training, the predicted value of the network is compared with the truly desired target value, and the weight vector of each layer of the network is then updated according to the difference between them (of course, there is usually an initialization process before the first update, that is, pre-configuring parameters for each layer in the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and adjustment continues until the neural network can predict the truly desired target value. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function or objective function, which are important equations used to measure this difference.
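  • As a minimal sketch of the expression above (hypothetical shapes and activation, not taken from this application), a single layer and a simple loss could look like this:

```python
import numpy as np

# One layer's work, informally y = a(W x + b): W and b transform the input
# space, and the activation a(.) "bends" it.
def layer(x, W, b):
    return np.tanh(W @ x + b)

# A toy loss comparing the prediction with the desired target value.
def squared_error(y_pred, y_true):
    return float(np.sum((y_pred - y_true) ** 2))

W, b = np.random.randn(3, 4), np.random.randn(3)
x, y_true = np.random.randn(4), np.random.randn(3)
print(squared_error(layer(x, W, b), y_true))
```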
  • the neural network 201 obtained by training the device 220 can be applied in different systems or devices.
  • the execution device 210 is configured with an I/O interface 212 for data interaction with external devices.
  • The "user" can input data to the I/O interface 212 through the client device 240, that is, user requests, including user voice or text information input by the user, or input images, video information, etc.
  • the execution device 210 can call data, codes, etc. in the data storage system 250, and can also store data, instructions, etc. in the data storage system 250.
  • the computing module 211 uses the neural network 201 to process input data (ie, user requests), thereby generating data processing results corresponding to the target task.
  • the I/O interface 212 returns the data processing results to the client device 240 and presents them to the user on the client device 240.
  • the training device 220 can train the neural network 201 suitable for new tasks based on different task requirements according to different scenario requirements, while maintaining compatible processing capabilities for old tasks to provide users with better results.
  • the user can manually specify the data to be input into the execution device 210 , for example, by operating in the interface provided by the I/O interface 212 .
  • the client device 240 can automatically input data to the I/O interface 212 and obtain the results. If the client device 240 automatically inputs data and requires the user's authorization, the user can set corresponding permissions in the client device 240 .
  • the user can view the results output by the execution device 210 on the client device 240, and the specific presentation form may be display, sound, action, etc.
  • the client device 240 can also serve as a data collection terminal to store the collected video, image, voice and text data in the database 230 for use in the training process.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present invention.
  • The positional relationship between the devices, components, modules, etc. shown in Figure 2 does not constitute any limitation.
  • For example, in Figure 2 the data storage system 250 is an external memory relative to the execution device 210; in other cases, the data storage system 250 can also be placed in the execution device 210.
  • Figure 3 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application. It serves as an example to characterize the relevant internal structure of the neural network in the present application, but is not limiting.
  • CNN is a deep neural network with a convolutional structure and a deep learning architecture.
  • Deep learning architecture refers to multiple levels of learning at different levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network in which individual neurons respond to overlapping areas in the image input into it.
  • a convolutional neural network (CNN) 100 may include an input layer 110 , a convolutional/pooling layer 120 , where the pooling layer is optional, and a neural network layer 130 .
  • As an example, the convolutional layer/pooling layer 120 may include layers 121-126. In one implementation, layer 121 is a convolution layer, layer 122 is a pooling layer, layer 123 is a convolution layer, layer 124 is a pooling layer, layer 125 is a convolution layer, and layer 126 is a pooling layer. In another implementation, 121 and 122 are convolution layers, 123 is a pooling layer, 124 and 125 are convolution layers, and 126 is a pooling layer. That is, the output of a convolution layer can be used as the input of a subsequent pooling layer, or as the input of another convolution layer to continue the convolution operation.
  • the convolution layer 121 may include many convolution operators.
  • the convolution operator is also called a kernel. Its role in this application is equivalent to extracting specific information from the input speech or semantic information.
  • the convolution operator can essentially be a weight matrix, which is usually predefined.
  • weight values in these weight matrices require a lot of training in practical applications.
  • Each weight matrix formed by the weight values obtained through training can extract information from the input video/image/voice/text, thereby helping the convolutional neural network 100 Make correct predictions.
  • The layers 121-126 shown at 120 in Figure 3 can be one convolution layer followed by one pooling layer, or multiple convolution layers followed by one or more pooling layers. In the process of natural language data processing, the only purpose of the pooling layer is to reduce the spatial size of the data.
  • After being processed by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input data. To generate the final output information (the required class information or other related information), the convolutional neural network 100 needs the neural network layer 130 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in Figure 3) and an output layer 140. The parameters contained in the multiple hidden layers may be pre-trained based on relevant training data of a specific task type; for example, the task type can include speech or semantic recognition, classification, generation, etc.
  • After the multiple hidden layers in the neural network layer 130, that is, as the last layer of the entire convolutional neural network 100, comes the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy, specifically used to calculate the prediction error.
  • The convolutional neural network 100 shown in Figure 3 is only an example of a convolutional neural network. In specific applications, the convolutional neural network can also exist in the form of other network models, for example, with multiple parallel convolutional layers/pooling layers as shown in Figure 4, whose extracted features are all input to the neural network layer 130 for processing.
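  • As an illustrative sketch only (hypothetical layer sizes, not the network of this application), a serial structure of the kind described for Figure 3 could be written as:

```python
import torch
import torch.nn as nn

# input -> convolutional/pooling layers (120) -> hidden layers (130) -> output layer (140)
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 64), nn.ReLU(),   # hidden layers 131..13n
    nn.Linear(64, 10),                      # output layer 140 (class scores)
)
logits = cnn(torch.randn(1, 3, 32, 32))
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))  # cross-entropy prediction error
```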
  • The first neural network with the structure shown in Figure 3 or Figure 4 can be trained on tasks using the continuous learning method in this application, learning the knowledge in new tasks while retaining the knowledge learned from old tasks.
  • This gives the first neural network better anti-forgetting ability, and it can be applied to inference for different tasks, such as image recognition, target detection, speech semantic recognition, etc.
  • FIG. 5 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • a neural network processor (Neural-Networks Processing Unit, NPU) 50 is mounted on the main CPU (Host CPU) as a co-processor, and the Host CPU allocates tasks.
  • the core part of the NPU is the arithmetic circuit 503.
  • the controller 504 controls the arithmetic circuit 503 to extract data in the memory (weight memory or input memory) and perform operations.
  • the computing circuit 503 internally includes multiple processing units (Process Engine, PE).
  • arithmetic circuit 503 is a two-dimensional systolic array.
  • the arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • arithmetic circuit 503 is a general-purpose matrix processor.
  • the arithmetic circuit obtains the corresponding data of matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit.
  • the operation circuit takes matrix A data and matrix B from the input memory 501 to perform matrix operations, and the partial result or final result of the matrix is stored in the accumulator 508 accumulator.
  • the vector calculation unit 507 can further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • the vector calculation unit 507 can be used for network calculations of non-convolutional/non-FC layers in neural networks, such as pooling, batch normalization, local response normalization, etc. .
  • vector calculation unit 507 can store the processed output vectors to unified memory 506 .
  • the vector calculation unit 507 may apply a nonlinear function to the output of the operation circuit 503, such as a vector of accumulated values, to generate an activation value.
  • vector calculation unit 507 generates normalized values, merged values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, such as for use in a subsequent layer in a neural network.
  • the unified memory 506 is used to store input data and output data.
  • The storage unit access controller (Direct Memory Access Controller, DMAC) 505 transfers input data in the external memory to the input memory 501 and/or the unified memory 506, stores weight data in the external memory into the weight memory 502, and stores data in the unified memory 506 into the external memory.
  • Bus Interface Unit (BIU) 510 is used to realize interaction between the main CPU, DMAC and fetch memory 509 through the bus.
  • An instruction fetch buffer (Instruction Fetch Buffer) 509 connected to the controller 504 is used to store instructions used by the controller 504.
  • the controller 504 is used to call instructions cached in the fetch memory 509 to control the working process of the computing accelerator.
  • the unified memory 506, the input memory 501, the weight memory 502 and the instruction memory 509 are all on-chip memories, and the external memory is a memory external to the NPU.
  • The external memory can be Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), High Bandwidth Memory (HBM), or other readable and writable memory.
  • Figure 6 is a schematic flow chart of a tensor-based continuous learning method provided by an embodiment of the present application. As shown in Figure 6, the method includes step S610 and step S620, in which:
  • Step S610 Obtain input data; wherein the input data includes one or more of video, image, text or voice.
  • the input data obtained in the above steps includes but is not limited to video, image, text, voice and other data that can be processed using neural networks, and this application will not be exhaustive.
  • Step S620 Input the input data into the first neural network to obtain data processing results.
  • the first neural network is obtained through training of m tasks.
  • After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains data in the same dimension of each of the A tensor cores.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • the above A, B, C and D are positive integers.
  • the input data input into the first neural network may be one or more of video, image, text, voice or other data that can be processed by the neural network.
  • each tensor core may contain multiple levels, each level including at least one dimension.
  • Each tensor layer contains data in the same dimension for each tensor core.
  • For example, a 7*3*4 tensor core contains three levels: the first level contains 7 dimensions, the second level contains 3 dimensions, and the third level contains 4 dimensions.
  • Figure 7 shows a specific example of the relationship between the tensor core and the tensor layer obtained after tensor decomposition.
  • tensor core A can be divided into two dimensions of data at a certain level, namely A1 and A2.
  • Tensor core B can be divided into two dimensions of data at this level: B1 and B2.
  • Tensor core C can be divided into two dimensions of data at this level: C1 and C2.
  • their data in the same dimension constitute a tensor layer.
  • the three tensor cores can be divided into two tensor layers: tensor layer 1 and tensor layer 2.
  • Tensor layer 1 contains data A1, B1, and C1
  • tensor layer 2 contains data A2, B2, and C2.
  • For example, tensor cores A, B, and C are all of size 5*4*2: the first level contains 5 dimensions, the second level contains 4 dimensions, and the third level contains 2 dimensions.
  • At the third level, these three tensor cores can be divided into 2 tensor layers, and each tensor layer contains one dimension of data of each tensor core at the third level, that is, a 5*4*1 array per core.
  • Figure 7 is only a specific example used in this application to describe the relationship between tensor cores and tensor layers, and does not constitute a limitation on the number of tensor cores and the number of tensor layers.
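  • As an illustrative sketch of this relationship (hypothetical values matching the 5*4*2 example above), slicing three tensor cores into tensor layers along the third level could look like this:

```python
import numpy as np

# Three hypothetical tensor cores of shape 5x4x2.
core_a = np.random.rand(5, 4, 2)
core_b = np.random.rand(5, 4, 2)
core_c = np.random.rand(5, 4, 2)
cores = [core_a, core_b, core_c]

# Tensor layer k collects the data of every core in the k-th dimension of the
# third level, i.e. one 5x4x1 slice per core.
tensor_layers = [
    [core[:, :, k:k + 1] for core in cores]   # layer k: (A_k, B_k, C_k)
    for k in range(cores[0].shape[2])
]
print(len(tensor_layers), tensor_layers[0][0].shape)  # 2 layers, slices of shape (5, 4, 1)
```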
  • For any two adjacent tasks among the above m tasks (taking the i-th and i+1-th tasks as examples), during the training process of the i+1-th task, tensor layers and/or tensor cores are added to the first neural network, and the parameters in the added tensor layers and/or tensor cores are updated to learn new knowledge during the training of the i+1-th task. That is, the newly added tensor cores/tensor layers are used to save the knowledge learned from the i+1-th task, ensuring the continuous learning ability of the first neural network.
  • the changes to the tensor structure in the first neural network include three ways: (1) adding at least one tensor core; (2) adding at least one tensor layer ; (3) Add at least one tensor core and at least one tensor layer.
  • In other words, C tensor cores and/or D tensor layers are added, specifically in one of three ways: (1) during the training of the i+1-th task, add C tensor cores; (2) during the training of the i+1-th task, add D tensor layers; (3) during the training of the i+1-th task, add C tensor cores and D tensor layers. Here, C and D are positive integers.
  • the above three ways of changing the tensor structure in the first neural network can specifically correspond to the three examples shown in Figures 8a to 8c.
  • Figure 8a corresponds to the first method mentioned above.
  • the first neural network contains three tensor cores and two tensor layers.
  • a tensor layer (A3, B3+C3) was added, and the parameters in the added tensor layer were updated, as shown in the shaded part in Figure 8a.
  • Figure 8b corresponds to the second method mentioned above.
  • the first neural network contains three tensor cores and two tensor layers.
  • a tensor core (D1+D2) is added, and the parameters in the added tensor core are updated, as shown in the shaded part in Figure 8b.
  • Figure 8c corresponds to the third method mentioned above.
  • In the i-th task, the first neural network contains three tensor cores and two tensor layers.
  • In the i+1-th task, a tensor core (D1+D2+D3) and a tensor layer (A3, B3, C3+D3) were added, and the parameters in the added tensor core and tensor layer are updated, as shown in the shaded part in Figure 8c.
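  • As an illustrative sketch of these growth operations (hypothetical shapes; the application does not prescribe an implementation), adding a tensor layer or a tensor core could look like this:

```python
import numpy as np

# Hypothetical cores of the first neural network after task i:
# three 5x4x2 cores, i.e. two tensor layers along the last mode.
cores = [np.random.rand(5, 4, 2) for _ in range(3)]

def add_tensor_layer(cores):
    # Append one new slice to every existing core along the shared (layer)
    # mode; only these new slices would be trained on the new task.
    return [np.concatenate([c, np.random.rand(5, 4, 1)], axis=2) for c in cores]

def add_tensor_core(cores):
    # Append a brand-new core with the same number of layers as the existing
    # cores; only this new core would be trained on the new task.
    return cores + [np.random.rand(5, 4, cores[0].shape[2])]

cores = add_tensor_layer(cores)   # now 5x4x3 cores -> three tensor layers
cores = add_tensor_core(cores)    # now four cores
print([c.shape for c in cores])
```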
  • In one example, the losses during the training of the i-th task and the i+1-th task take the following forms (the formulas are not reproduced here): for the i-th task, the loss function yields a value L1 computed from the network output and the label Y; for the i+1-th task, the loss function yields a value L2, likewise computed from the network output and the label Y.
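  • A plausible generic reading of these two losses, assuming a standard supervised loss l (hypothetical notation; the original formulas are not reproduced here):

```latex
L_1 = l\big(f(X_i;\,\Theta),\; Y\big), \qquad
L_2 = l\big(f(X_{i+1};\,\Theta \cup \Delta\Theta),\; Y\big)
% \Theta: parameters after task i; \Delta\Theta: newly added tensor cores and/or
% tensor layers, the only parameters updated during task i+1.
```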
  • the continuous learning method in this application includes two implementation methods:
  • In the first implementation, the three ways described in the preceding embodiments can be used to add new tensor cores and/or tensor layers during the training of the i+1-th task and to update the parameters in the added tensor cores and/or tensor layers; this is not described again here.
  • In the second implementation, during the training of the i+1-th task, the parameters in all or some of the existing tensor cores remain unchanged, or the parameters in all or some of the existing tensor layers remain unchanged.
  • That is, after the training of the i+1-th task ends, the parameters in some or all of the A tensor cores remain unchanged, or the parameters in some or all of the B tensor layers remain unchanged.
  • the tensor layer or tensor core can be used as the update object to update part of the tensor core or update the parameters in part of the tensor layer.
  • Figure 9a describes the way in which the tensor layer is the update object.
  • the first neural network contains three tensor cores and two tensor layers.
  • During the training of the i+1-th task, the parameters in tensor layer 2 (that is, the shaded part in Figure 9a) are updated, while the parameters in tensor layer 1 remain unchanged. Tensor layer 2 is used to save the knowledge learned from the i+1-th task.
  • Figure 9b describes the way in which the tensor core is the update object.
  • the first neural network contains three tensor cores and two tensor layers.
  • During the training of the i+1-th task, the parameters in tensor core A (that is, the shaded part in Figure 9b; the updated tensor core A includes A3+A4) are updated, while the parameters in tensor core B and tensor core C remain unchanged. Tensor core A is used to save the knowledge learned from the i+1-th task.
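  • A minimal sketch of this partial-update idea in PyTorch (hypothetical module and shapes, not the application's implementation): the old tensor cores are frozen and only the core chosen as the update object receives gradients.

```python
import torch
import torch.nn as nn

class TensorCoreNetwork(nn.Module):
    """Hypothetical container: each tensor core is a trainable parameter."""
    def __init__(self, core_shapes):
        super().__init__()
        self.cores = nn.ParameterList(
            [nn.Parameter(torch.randn(*s)) for s in core_shapes])

net = TensorCoreNetwork([(5, 4, 2), (5, 4, 2), (5, 4, 2)])

# Before task i+1: keep cores B and C fixed, update only core A (index 0),
# as in the Figure 9b example.
for idx, core in enumerate(net.cores):
    core.requires_grad_(idx == 0)

optimizer = torch.optim.SGD(
    [p for p in net.parameters() if p.requires_grad], lr=0.01)
```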
  • the first neural network targeted by the continuous learning method in this application can be designed in two ways:
  • In the first way, the first neural network as designed contains tensors that can be tensor decomposed.
  • In this case, before the training of each task, tensor decomposition is performed on the tensors in the first neural network to obtain multiple tensor cores/tensor layers; the training data is then used to update the parameters in some or all of the tensor cores, or the parameters in some or all of the tensor layers; after the training of each task, the tensor cores are combined back into tensors.
  • the specific tensor decomposition and tensor combination processes will not be expanded on in this application.
  • That is, after the training of the i-th task is completed, the A tensor cores are tensor-combined to obtain one or more tensors; and before the training of the i+1-th task is performed, the one or more tensors are tensor-decomposed to obtain the A tensor cores.
  • the above-mentioned tensor decomposition methods include: CP decomposition, Tucker decomposition, etc., which is not limited in this application.
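  • A minimal sketch of this decompose-train-recombine cycle, using a plain higher-order SVD in NumPy (Tucker-style; the application does not prescribe a specific algorithm, so this is only illustrative):

```python
import numpy as np

def unfold(tensor, mode):
    # Mode-n unfolding: move axis `mode` to the front and flatten the rest.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

def hosvd(tensor, ranks):
    # Tensor decomposition: one factor matrix per mode, then project to get the core.
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = fold(u.T @ unfold(core, mode), mode,
                    core.shape[:mode] + (u.shape[1],) + core.shape[mode + 1:])
    return core, factors

def recompose(core, factors):
    # Tensor combination: contract the core with every factor matrix.
    tensor = core
    for mode, u in enumerate(factors):
        shape = tensor.shape[:mode] + (u.shape[0],) + tensor.shape[mode + 1:]
        tensor = fold(u @ unfold(tensor, mode), mode, shape)
    return tensor

X = np.random.rand(8, 6, 4)                 # a tensor in the network (hypothetical)
core, factors = hosvd(X, ranks=(4, 3, 2))   # decompose before training a task
X_hat = recompose(core, factors)            # combine back after training
print(X_hat.shape, np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```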
  • In the second way, the first neural network as designed contains at least one tensor core and does not contain tensors that can be tensor decomposed.
  • That is, the first neural network replaces tensors with tensor-core/tensor-layer structures at design time, so during the training of a task there is no need to perform tensor decomposition or tensor combination operations.
  • For an example, see the structure of such a first neural network shown in Figure 10.
  • Since the network does not contain tensors that can be tensor decomposed, there is no need to perform tensor decomposition or tensor combination operations.
  • As shown in Figure 10, the structure of the first neural network includes two deep neural networks (Deep Neural Network, DNN) and a tensor network.
  • the tensor network contains n tensor layers.
  • the function of the deep neural network at the input end is to transform the data dimensions to adapt to the data processing needs of the tensor network; the deep neural network at the output end is used to aggregate the data processed by each tensor core.
  • the deep neural network at the output end can also be replaced by other network structures, such as GateNet, etc. This application is not limited to this.
  • When training new tasks, an alternating update method can be used.
  • the above-mentioned first neural network is the main model, that is, the model used for reasoning.
  • a backup tensor network is trained using the training data of the new task, and the parameters of the trained backup tensor network are stored.
  • the training process of the i+1th task includes: training the ith backup tensor network using the i+1th batch of data sets to obtain the trained ith backup tensor network; and using the ith backup tensor network.
  • i+1 batch data set trains the first neural network; wherein, the loss function of the first neural network includes the difference between the output of the first neural network and the output of the trained jth backup tensor network
  • the i+1th backup tensor network will first be trained using the training data of the i+1th task. At this time, after learning the first i tasks, i-1 backup tensor networks have been trained. In the process of training the main model (i.e., the first neural network) using the training data of the i+1th task, the already trained i-1 backup tensor networks are used to constrain the update of parameters in the main model.
  • One possible specific form of the first loss function mentioned above is as follows (the formula is not reproduced here): L is the value of the loss function; f(·) is the output of the main model (i.e., the first neural network); l is the difference between the main model output and the label; rank is the rank; A, B, C and F are tensor cores; f_1 and f_{i-1} are the outputs of the 1st backup tensor network and the (i-1)-th backup tensor network; and D is the training data of the i+1-th task.
  • The second loss function mentioned above takes a similar form, except that the constraint term is computed from the difference between the model parameters of the first neural network and the model parameters of the trained backup tensor networks (the formula is not reproduced here).
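  • A minimal sketch of the first, output-difference form of the constraint in PyTorch (hypothetical model objects and weighting; not the application's exact loss):

```python
import torch
import torch.nn.functional as F

def constrained_loss(main_model, backup_models, x, y, lam=0.1):
    """Task loss on the new batch plus a penalty on the difference between the
    main model's output and the outputs of the already-trained backup tensor
    networks (hypothetical weighting lam)."""
    out = main_model(x)
    loss = F.cross_entropy(out, y)              # l(f(x), Y): error on the new task
    for backup in backup_models:                # f_1 ... f_{i-1}
        with torch.no_grad():
            ref = backup(x)
        loss = loss + lam * F.mse_loss(out, ref)  # constrain drift from old knowledge
    return loss
```

  • The second, parameter-difference form would replace the output penalty with a penalty on the distance between the main model's parameters and those of the backup tensor networks, as described above.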
  • the above continuous learning method can be applied in a federated learning architecture.
  • the first neural network is one of multiple neural networks, the multiple neural networks are respectively located on different user equipment, and the first neural network is located on the first user equipment. That is, the training of the above-mentioned first neural network is a process on the user device.
  • After the training of each task is completed, the first user equipment sends the model parameters of the first neural network to the server, so that the server updates the second neural network on the server according to the model parameters of each of the multiple neural networks; wherein the model parameters of the first neural network include the tensor cores included in the first neural network.
  • the federated learning architecture includes a server and s user devices.
  • the continuous learning process on each user device may be the same as described in the foregoing embodiments, and will not be described again here.
  • Each user device sends its updated model parameters (i.e., model parameter 1, model parameter 2, ..., model parameter s) to the server, and the server uses all the received model parameters to update the neural network on the server (i.e., the second neural network in the above embodiment).
  • The loss function used when aggregating models on the server has the following form (the formula is not reproduced here): L is the total loss, the i-th term involves the model parameters on the i-th user equipment, and a constraint-strength parameter controls the weight of the constraint.
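  • A minimal sketch of the server-side aggregation step (hypothetical helper using simple weighted averaging of the uploaded tensor cores; the application's aggregation rule is defined by the loss above and may differ):

```python
import torch

def aggregate_tensor_cores(client_cores, weights=None):
    """Average the tensor cores uploaded by each user device. Only tensor cores
    are exchanged, which keeps the communication payload small compared with
    sending full weight tensors."""
    n = len(client_cores)
    weights = weights or [1.0 / n] * n
    aggregated = []
    for core_idx in range(len(client_cores[0])):
        core = sum(w * c[core_idx] for w, c in zip(weights, client_cores))
        aggregated.append(core)
    return aggregated

# Example: three clients, each uploading two 5x4x2 tensor cores.
clients = [[torch.randn(5, 4, 2) for _ in range(2)] for _ in range(3)]
global_cores = aggregate_tensor_cores(clients)
print([c.shape for c in global_cores])
```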
  • the tasks include image recognition, target detection, image segmentation or speech semantic recognition.
  • the continuous learning method in this application is used for learning tasks including but not limited to image recognition, target detection, image segmentation or speech semantic recognition, etc.
  • FIG. 13 is a schematic flow chart of another continuous learning method provided by an embodiment of the present application.
  • The method includes step S1310 and step S1320, in which:
  • Step S1310 Obtain input data; wherein the input data includes one or more of video, image, text or voice.
  • Step S1320 Input the input data into the first neural network to obtain data processing results.
  • the first neural network is obtained through training of m tasks.
  • After the training of the i-th task is completed, the first neural network contains A tensor cores, and the A tensor cores are divided into B tensor layers; each of the B tensor layers contains data in the same dimension of each tensor core in the A tensor cores.
  • After the training of the i+1-th task ends, the parameters in some or all of the A tensor cores remain unchanged, or the parameters in some of the B tensor layers remain unchanged, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated.
  • FIG 14 is a schematic flow chart of another continuous learning method provided by an embodiment of the present application.
  • The method includes step S1410 and step S1420, in which:
  • Step S1420 Update the second neural network based on the multiple model parameters, and process the data to be processed based on the updated second neural network to obtain data processing results; wherein the data to be processed includes one or more of pictures, videos, voices, or text.
  • the first neural network is obtained through training of m tasks.
  • After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains data in the same dimension of each of the A tensor cores.
  • After the training of the i+1-th task ends, the first neural network has added C tensor cores and/or D tensor layers, and during the training of the i+1-th task, the parameters in the C tensor cores and/or the D tensor layers are updated, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • The parameters in some or all of the A tensor cores remain unchanged, or the parameters in some or all of the B tensor layers remain unchanged.
  • Step S1510 Receive multiple model parameters respectively sent by multiple user equipments, wherein the multiple user equipments include a first user equipment, and the model parameters sent by the first user equipment include the tensor cores included in the first neural network.
  • Step S1520 Update the second neural network based on the multiple model parameters, and process the data to be processed based on the updated second neural network to obtain data processing results; wherein the data to be processed includes one or more of pictures, videos, voices, or text.
  • the first neural network is obtained through training of m tasks.
  • After the training of the i-th task is completed, the first neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains data in the same dimension of each of the A tensor cores. After the training of the i+1-th task ends, the parameters in some or all of the A tensor cores remain unchanged, or the parameters in some of the B tensor layers remain unchanged, where m is a positive integer and i is a positive integer less than or equal to m-1.
  • Figure 16 is a schematic structural diagram of a continuous learning device provided by an embodiment of the present application.
  • The device includes an acquisition unit 1610 and a processing unit 1620, in which:
  • The acquisition unit 1610 is used to acquire input data, wherein the input data includes one or more of video, image, text, or voice; the processing unit 1620 is used to process the input data based on the first neural network to obtain data processing results, wherein the first neural network is obtained through training on m tasks.
  • the parameters in some or all of the A tensor cores remain unchanged, or the B tensor cores remain unchanged. Parameters in some or all tensor layers remain unchanged.
  • the processing unit is further configured to: after the training of the i-th task is completed, perform tensor combination on the A tensor cores to obtain one or more tensors; And before performing the training of the i+1th task, perform tensor decomposition on the one or more tensors to obtain the A tensor cores.
  • the processing unit is specifically configured to: train the ith backup tensor network using the i+1th batch of data sets to obtain the training A good i-th backup tensor network; and training the first neural network using the i+1 batch of data sets; wherein the loss function of the first neural network includes the output sum of the first neural network
  • the degree of difference between the outputs of the trained jth backup tensor network, or the loss function of the first neural network includes the model parameters of the first neural network and the trained jth backup tensor network.
  • the first neural network is one of multiple neural networks, the multiple neural networks are respectively located on different user equipment, and the first neural network is located on the first user equipment.
  • the device further includes: a sending unit configured to send the model parameters of the first neural network to the server after the training of each task is completed, so that the server can The model parameters of each neural network in the neural network update the second neural network on the server; wherein the model parameters of the first neural network include the tensor core included in the first neural network.
  • the task includes image recognition, target detection, image segmentation or speech semantic recognition.
  • The continuous learning device can also be used to perform the method in the embodiment of Figure 13, as follows:
  • The acquisition unit 1610 is used to acquire input data, where the input data includes one or more of video, image, text, or speech; the processing unit 1620 is used to input the input data into the first neural network to obtain a data processing result. The first neural network is obtained through training on m tasks; after the training of the i-th task, the neural network contains A tensor cores, the A tensor cores include B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension; after the training of the (i+1)-th task, the parameters in some or all of the A tensor cores remain unchanged, or the parameters in some of the B tensor layers remain unchanged; m is a positive integer, and i is a positive integer less than or equal to m-1.
  • In a feasible implementation, after the training of the (i+1)-th task, the first neural network has added C tensor cores and/or D tensor layers, and in the training of the (i+1)-th task, the parameters in the C tensor cores and/or the D tensor layers are updated.
  • Figure 17 is a schematic structural diagram of another continuous learning device provided by an embodiment of the present application.
  • As shown in Figure 17, the device includes a receiving unit 1710 and a processing unit 1720.
  • The continuous learning device can be used to perform the method in the embodiment of Figure 14, as follows:
  • The receiving unit 1710 is configured to receive multiple model parameters respectively sent by multiple user equipments, where the multiple user equipments include a first user equipment, and the model parameters sent by the first user equipment include the tensor cores contained in the first neural network; the processing unit 1720 is configured to update the second neural network based on the multiple model parameters, and to process the data to be processed based on the updated second neural network to obtain a data processing result, where the data to be processed includes one or more of pictures, videos, speech, or text. The first neural network is obtained through training on m tasks; after the training of the i-th task, the neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension; after the training of the (i+1)-th task, the first neural network has added C tensor cores and/or D tensor layers, and in the training of the (i+1)-th task, the parameters in the C tensor cores and/or the D tensor layers are updated; m is a positive integer, and i is a positive integer less than or equal to m-1.
  • The continuous learning device can also be used to perform the method in the embodiment of Figure 15, as follows:
  • The receiving unit 1710 is configured to receive multiple model parameters respectively sent by multiple user equipments, where the multiple user equipments include a first user equipment, and the model parameters sent by the first user equipment include the tensor cores contained in the first neural network; the processing unit 1720 is configured to update the second neural network based on the multiple model parameters, and to process the data to be processed based on the updated second neural network to obtain a data processing result, where the data to be processed includes one or more of pictures, videos, speech, or text. The first neural network is obtained through training on m tasks; after the training of the i-th task, the neural network contains A tensor cores, the A tensor cores are divided into B tensor layers, and each of the B tensor layers contains the data of each of the A tensor cores in the same dimension; after the training of the (i+1)-th task, the parameters in some or all of the A tensor cores remain unchanged, or the parameters in some of the B tensor layers remain unchanged; m is a positive integer, and i is a positive integer less than or equal to m-1.
  • In a feasible implementation, after the training of the (i+1)-th task, the first neural network has added C tensor cores and/or D tensor layers, and in the training of the (i+1)-th task, the parameters in the C tensor cores and/or the D tensor layers are updated.
  • FIG. 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in Figure 18, the device includes a processor 1801, a memory 1802, an interface circuit 1803 and a bus 1804.
  • When the electronic device serves as a user equipment: the interface circuit 1803 is used to obtain input data; the processor 1801 is configured to process the input data based on the first neural network to obtain a data processing result; the memory 1802 is used to store the data processing result. The processor 1801, the memory 1802 and the interface circuit 1803 are interconnected through the bus 1804.
  • When the electronic device serves as a server: the interface circuit 1803 is configured to receive multiple model parameters respectively sent by multiple user equipments; the processor 1801 is configured to update the second neural network based on the multiple model parameters, and to process the data to be processed based on the updated second neural network to obtain a data processing result; the memory 1802 is used to store the data processing result. The processor 1801, the memory 1802 and the interface circuit 1803 are interconnected through the bus 1804.
  • An embodiment of the present application provides a chip system. The chip system includes at least one processor, a memory and an interface circuit; the memory, the interface circuit and the at least one processor are interconnected through lines, and instructions are stored in the at least one memory. When the instructions are executed by the processor, some or all of the steps described in any of the above method embodiments are implemented.
  • An embodiment of the present application provides a computer storage medium. The computer storage medium stores a computer program; when the computer program is executed, some or all of the steps described in any of the above method embodiments are implemented.
  • An embodiment of the present application provides a computer program. The computer program includes instructions; when the computer program is executed by a processor, some or all of the steps described in any of the above method embodiments are implemented.
  • In the several embodiments provided by this application, it should be understood that the disclosed device may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the above units is only a logical function division, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. The coupling or direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical or other forms.
  • The units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Machine Translation (AREA)

Abstract

本申请公开了一种基于张量的持续学习方法和装置,该方法包括:获取输入数据;输入数据包括视频、图像、文本或语音中的一种或多种;将输入数据输入第一神经网络中,得到数据处理结果;其中,第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,神经网络中包含A个张量核,A个张量核被划分为B个张量层,B个张量层中的每个张量层包含A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练中,第一神经网络增加了C个张量核和/或D个张量层,且C个张量核和/或D个张量层中的参数被更新。通过本申请,可以有效提升模型的抗遗忘能力,且模型的规模增加较小,从而有效节省存储和通信开销。

Description

基于张量的持续学习方法和装置
本申请要求于2022年06月01日提交中国专利局、申请号为202210618700.1、申请名称为“基于张量的持续学习方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及大数据中的人工智能(Artificial Intelligence,AI)技术领域,尤其涉及一种基于张量的持续学习方法和装置。
背景技术
人工智能AI是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个分支,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式作出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
随着大数据、人工智能、物联网等互联网技术的快速发展,各行各业都在逐渐实现数字化和智能化,以助于提升服务效率和服务质量。其中,在金融、电商、医疗、教育、多媒体等各领域中逐渐出现了数字人、虚拟人等交互方式。
传统机器学习(Machine Leaning,ML)要求训练样本之间是独立同分布(Independent and Identically Distributed,IID)数据,为了达到这个目的往往都会随机打乱训练数据,但如果要让模型处理分布在变化的连续数据,且不做任何处理依然按传统方法来训练,就会出现灾难性遗忘,因为新的数据会对模型造成干扰。模型会调整它学到的关于旧数据的参数以适应新任务,这样在旧数据上学到的知识就会被遗忘。
因而,持续学习应运而生。现有的持续学习方法主要是通过在训练新任务的过程中,限制已有模型参数的更新或者增加新的子模型来实现持续学习的目的。
然而,现有的持续学习方法存在抗遗忘能力较差,或者在经过多次持续学习后,模型的规模较大,从而会增加存储和通信开销。
发明内容
本申请实施例提供了一种基于张量的持续学习方法和装置,可以有效提升模型的抗遗忘能力,且模型的规模增加较小,从而有效节省存储和通信开销。
第一方面,本申请提供了一种基于张量的持续学习方法,所述方法包括:获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;将所述输入数据输入第一神经网络中,得到数据处理结果;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
从技术效果上看,在持续学习过程中(即学习完旧任务的神经网络学习新任务的过程), 通过更新新增的张量核和/或张量层中的参数,并可以使得原始张量核中参数保持不变,即在张量核和/或张量层的维度来更新神经网络,相比于现有技术在训练新任务时直接更新神经网络中的整个张量(张量核是由张量进行张量分解得到的)而言,既可以有效地保留从之前任务中学到的知识,也可以从利用新增的张量核和/或张量层来保存从新任务学到的知识,进而使得神经网络具有较高精度的抗遗忘能力。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
从技术效果上看,对于相邻的两个任务而言,在第二个任务的训练过程中,通过保持神经网络中已有的张量核/张量层中的部分或全部参数保持不变,来使得持续学习过程中,神经网络依然保留从之前任务中学习到的知识,而无须对模型结构和规模进行较大调整。
在一种可行的实施方式中,所述方法还包括:在所述第i个任务的训练结束后,将所述A个张量核进行张量组合,得到一个或多个张量;以及在执行所述第i+1个任务的训练之前,将所述一个或多个张量进行张量分解,得到所述A个张量核。
从技术效果上看,用于进行持续学习的神经网络在设计时可以是只包含张量,不包含张量核的神经网络,只是在任务的训练过程中,先对张量进行张量分解,而后更精确地更新分解得到的张量核/或张量层中的参数;同理,神经网络也可以在设计时即为包含张量核/张量层的神经网络,在进行任务的训练过程中无需进行张量分解和张量组合,既可以采用不同方式来设计神经网络,都可以适用本申请中的方法,通用性好。
在一种可行的实施方式中,所述第i+1个任务的训练过程,包括:利用第i+1批数据集训练第i个备份张量网络,得到训练好的第i个备份张量网络;以及利用所述第i+1批数据集训练所述第一神经网络;其中,所述第一神经网络的损失函数包含所述第一神经网络的输出和训练好的第j个备份张量网络的输出之间的差异程度,或者所述第一神经网络的损失函数包含所述第一神经网络的模型参数与训练好的第j个备份张量网络的模型参数之间的差异程度,j=1至i-1,j为小于或等于i-1的正整数。
从技术效果上看,在神经网络新任务的训练过程中,会利用之前任务训练好的备份张量网络来约束神经网络的更新,通过此种方式结合前述实施例中神经网络的张量核/张量层更新方式,来使得新任务训练后,神经网络具有更好的抗遗忘能力。
在一种可行的实施方式中,所述第一神经网络为多个神经网络中的一个,所述多个神经网络分别位于不同的用户设备上,所述第一神经网络位于第一用户设备上,所述方法还包括:在每个任务的训练结束后,所述第一用户设备向服务器发送所述第一神经网络的模型参数,以使所述服务器根据所述多个神经网络中每个神经网络的模型参数更新所述服务器上的第二神经网络;其中,所述第一神经网络的模型参数包括所述第一神经网络所包含的张量核。
从技术效果上看,本申请持续学习方法可以应用于联邦学习架构中,通过在不同的用户设备上进行前述实施例中的持续学习方法更新神经网络参数,使神经网络规模不发生较大改变的情况下保持较好的抗遗忘能力;同时,只将每个用户设备上神经网络的张量核发送到服务器端进行模型聚合,可以有效节省联邦学习过程中的通信开销。
在一种可行的实施方式中,所述任务包括图像识别、目标检测、图像分割或语音语义识别。
从技术效果上看,本申请中的持续学习方法可以不受任务类别的限制,可以应用于各种类型的应用场景中,通用性强。
第二方面,本申请实施例提供了一种基于张量的持续学习方法,所述方法包括:获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;将所述输入数据输入第一神经网络中,得到数据处理结果;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核包含B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
从技术效果上看,上述实施例的技术效果可以参见前述第一方面中对应实施例的描述,此处不再赘述。
第三方面,本申请实施例提供了一种基于张量的持续学习方法,所述方法包括:接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
从技术效果上看,上述实施例的技术效果可以参见前述第一方面中对应实施例的描述,此处不再赘述。
第四方面,本申请实施例提供了一种基于张量的持续学习方法,其特征在于,所述方法包括:接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所 述D个张量层中的参数被更新。
从技术效果上看,上述实施例的技术效果可以参见前述第一方面中对应实施例的描述,此处不再赘述。
第五方面,本申请实施例提供了一种基于张量的持续学习装置,所述装置包括:获取单元,用于获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;处理单元,用于基于第一神经网络对所述输入数据进行处理,得到数据处理结果;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
在一种可行的实施方式中,所述处理单元还用于:在所述第i个任务的训练结束后,将所述A个张量核进行张量组合,得到一个或多个张量;以及在执行所述第i+1个任务的训练之前,将所述一个或多个张量进行张量分解,得到所述A个张量核。
在一种可行的实施方式中,在所述第i+1个任务的训练过程中,所述处理单元具体用于:利用第i+1批数据集训练第i个备份张量网络,得到训练好的第i个备份张量网络;以及利用所述第i+1批数据集训练所述第一神经网络;其中,所述第一神经网络的损失函数包含所述第一神经网络的输出和训练好的第j个备份张量网络的输出之间的差异程度,或者所述第一神经网络的损失函数包含所述第一神经网络的模型参数与训练好的第j个备份张量网络的模型参数之间的差异程度,j=1至i-1,j为小于或等于i-1的正整数。
在一种可行的实施方式中,所述第一神经网络为多个神经网络中的一个,所述多个神经网络分别位于不同的用户设备上,所述第一神经网络位于第一用户设备上,所述装置还包括:发送单元,用于在每个任务的训练结束后,所述第一用户设备向服务器发送所述第一神经网络的模型参数,以使所述服务器根据所述多个神经网络中每个神经网络的模型参数更新所述服务器上的第二神经网络;其中,所述第一神经网络的模型参数包括所述第一神经网络所包含的张量核。
在一种可行的实施方式中,所述任务包括图像识别、目标检测、图像分割或语音语义识别。
第六方面,本申请实施例提供了一种基于张量的持续学习装置,所述装置包括:获取单元,用于获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;处理单元,用于将所述输入数据输入第一神经网络中,得到数据处理结果;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核包含B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述第一神经网络增加 了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
第七方面,本申请实施例提供了一种基于张量的持续学习装置,所述装置包括:接收单元,用于接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;处理单元,用于基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
第八方面,本申请实施例提供了一种基于张量的持续学习装置,所述装置包括:接收单元,用于接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;处理单元,用于基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
第九方面,本申请实施例提供了一种电子设备,所述电子设备包括至少一个处理器,存储器和接口电路,所述存储器、所述接口电路和所述至少一个处理器通过线路互联,所述至少一个存储器中存储有指令;所述指令被所述处理器执行时,上述第一方面到第四方面中任一所述的方法得以实现。
第十方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,该计算机程序被执行时,上述第一方面到第四方面中任意一项所述的方法得以实现。
第十一方面,本申请实施例提供了一种计算机程序,该计算机程序包括指令,当该计算机程序被执行时,上述第一方面到第四方面中任意一项所述的方法得以实现。
附图说明
以下对本申请实施例用到的附图进行介绍。
图1a-图1c为本申请实施例提供的几种可以执行本申请中基于张量的持续学习方法的系统架构示意图;
图2为本申请实施例提供的另一种系统架构示意图;
图3为本申请实施例提供的一种卷积神经网络的结构示意图;
图4为本申请实施例提供的另一种卷积神经网络的结构示意图;
图5为本申请实施例提供的一种芯片硬件结构示意图;
图6为本申请实施例提供的一种基于张量的持续学习方法的流程示意图;
图7为本申请实施例提供的一种张量核与张量层之间关系示意图;
图8a-图8c为本申请实施例提供的张量结构的改变方式示意图;
图9a-图9b为本申请实施例提供的张量结构的更新方式示意图;
图10为本申请实施例提供的一种第一神经网络的结构;
图11为本申请实施例提供的一种交替训练的过程示意图;
图12为本申请实施例提供的一种基于联邦学习架构的持续学习过程示意图;
图13为本申请实施例提供的另一种持续学习方法流程示意图;
图14为本申请实施例提供的又一种持续学习方法流程示意图;
图15为本申请实施例提供的再一种持续学习方法流程示意图;
图16为本申请实施例提供的一种持续学习装置结构示意图;
图17为本申请实施例提供的另一种持续学习装置结构示意图;
图18为本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;文本中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
本申请的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
首先对本申请中的专业术语进行相应介绍和描述
(1)张量Tensor:即一个多维数组。标量可以看作是0维张量,向量可以看作1维张量,矩阵可以看作是二维张量。实际使用过程中,通常在数据维度大于三阶的情况下,才将这个 数据称为张量。
(2)张量分解:其从本质上来说是矩阵分解的高阶泛化。通常用于进行降维处理、缺失数据填补(或者说成“稀疏数据填补”)和隐性关系挖掘等。常用的张量分解方法包括:CP分解、Tucker分解、t-SVD分解等。
(3)张量核Tensor Core:通过张量分解得到原张量的因子矩阵为张量核。
(4)张量层:不同张量核在同一维度上的数据集合。
(5)联邦学习(Federated Learning):在进行机器学习的过程中,各参与方可借助其他方数据进行联合建模。各方无需共享数据资源,即数据不出本地的情况下,进行数据联合训练,建立共享的机器学习模型。
(6)持续学习(Continual Learning/Life-Long Learning):在学习新任务的时候,能将之前任务的经验用上,快速学到新任务,同时保持对老任务的记忆,是一种具有抗遗忘能力的深度学习。
下面举例介绍本申请中持续学习方法所适用的两类应用场景,应当理解,其不构成对本申请中方法所适用场景范围的限定。
(1)机器视觉(Computer Vision,CV):本申请中的持续学习方法可以用于对机器视觉领域的各种任务(例如,目标检测、图像分割、图像分类等)的模型进行训练,然后利用训练好的模型进行推理。训练好的模型既可以从新任务的训练数据中学到知识,也可以保留之前任务中的知识。
(2)自然语言处理(Natural Language Processing,NLP):本申请中的持续学习方法可以用于对自然语言处理领域的各种任务(例如,语音语义识别,虚拟人视频生成等场景)的模型进行训练,然后利用训练好的模型进行推理。训练好的模型既可以从新任务的训练数据中学到知识,也可以保留之前任务中的知识。
请参见图1a-图1c,图1a-图1c为本申请实施例提供的几种可以执行本申请中基于张量的持续学习方法的系统架构示意图。
其中,图1a所示的系统包括用户设备110以及数据处理设备120(服务器)。所述用户设备110包括手机、个人电脑、车载终端或者信息处理中心等智能终端。所述用户设备110为数据处理的发起端,通常用户通过用户设备110发起请求。
所述数据处理设备120可以是云服务器、网络服务器、应用服务器以及管理服务器等具有数据处理功能的设备或服务器。所述数据处理设备120通过所述交互接口接收来自用户设备110的查询语句/语音/文本等请求，再通过存储数据的存储器以及数据处理的处理器环节进行机器学习、深度学习、搜索、推理、决策等方式的数据处理，来执行本申请中的持续学习方法，然后利用持续学习方法训练得到的神经网络进行推理，最后将推理得到的数据处理结果通过网络传递到用户设备110上。所述存储器可以是一个统称，包括本地存储以及存储历史数据的数据库，所述数据库可以在数据处理设备上，也可以在其它网络服务器上。
图1b所示的系统中的用户设备110直接作为数据处理设备,直接接收来自用户的输入并直接由用户设备110本身的硬件进行处理,具体过程与图1a相似,可参考上面的描述,在此不再赘述。
图1c所示的系统包括至少一个本地设备(如本地设备301和本地设备302)、执行设备210和数据存储系统250。其中,本地设备相当于图1a和图1b中的用户设备110,执行设备210相当于数据处理设备120,数据存储系统250可以集成在执行设备210上,也可以设置在 云上或其它网络服务器上。
请参见图2，图2为本申请实施例提供的另一种系统架构示意图。如图2所示，数据采集设备260用于采集语音、文本、图像、视频等数据并存入数据库230，训练设备220基于数据库230中维护的图像和文本数据训练得到神经网络201（即本申请中的第一神经网络）。下面在图6所示的方法实施例中将详细地描述训练设备220如何对神经网络201进行持续学习的过程，训练好的神经网络201能够对文本、语音、图像、视频中一种或多种输入数据进行处理，生成与目标任务（如目标检测、图像识别、语音语义识别等）对应的数据处理结果。输入数据是基于客户设备240发送的用户请求所生成的。
图2也是持续学习方法执行过程中的功能模块图,在其对应图1a-图1c中的系统(即实际应用场景图)时,客户设备240可以是图1a-图1c中的用户设备110或本地设备,执行设备210以及数据存储系统250在用户设备110数据处理能力比较强大时,可以集成在用户设备110或本地设备内。在一些实施例中,也可以将执行设备210以及数据存储系统250集成在图1a中的数据处理设备120上。数据库230、训练设备220以及数据采集设备260可以对应集成在图1a中的数据处理设备120上,或设置在云上或网络上的其它服务器上,本申请对此不限定。
其中,数据采集设备260可以是终端设备,也可以是服务器或者云的输入输出接口,用于获取查询语句以及返回答复语句的交互层(interface)。
下面将简要介绍本申请中深度学习模型的训练和推理原理。
深度学习模型的架构可以是深度神经网络。深度神经网络中的每一层的工作可以用数学表达式 y=a(W·x+b) 来描述：从物理层面看，深度神经网络中的每一层的工作可以理解为通过五种对输入空间（输入向量的集合）的操作，完成输入空间到输出空间的变换（即矩阵的行空间到列空间），这五种操作包括：1、升维/降维；2、放大/缩小；3、旋转；4、平移；5、“弯曲”。其中1、2、3的操作由W·x完成，4的操作由+b完成，5的操作则由a()来实现。这里之所以用“空间”二字来表述是因为被分类的对象并不是单个事物，而是一类事物，空间是指这类事物所有个体的集合。其中，W是权重向量，该向量中的每一个值表示该层神经网络中的一个神经元的权重值。该向量W决定着上文所述的输入空间到输出空间的空间变换，即每一层的权重W控制着如何变换空间。训练深度神经网络的目的，也就是最终得到训练好的神经网络的所有层的权重矩阵（由很多层的向量W形成的权重矩阵）。因此，神经网络的训练过程本质上就是学习控制空间变换的方式，更具体的就是学习权重矩阵。
因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到神经网络能够预测出真正想要的目标值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
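为便于理解上述“比较预测值与目标值、并据此更新各层权重”的训练过程，下面给出一个极简的梯度下降示意代码（其中的网络结构、数据和超参数均为示例性假设，并非本申请限定的实现）：

```python
import torch
import torch.nn as nn

# 示例网络：两层全连接（仅作演示，非本申请限定的结构）
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()                        # 损失函数：衡量预测值与目标值的差异
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

X, Y = torch.randn(64, 4), torch.randn(64, 1)  # 随机构造的训练数据
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)               # loss 越高表示预测值与目标值差异越大
    loss.backward()                            # 根据差异计算各层权重的梯度
    optimizer.step()                           # 更新权重，使 loss 尽可能缩小
```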
在图2中,训练设备220得到的神经网络201可以应用不同的系统或设备中。执行设备210配置有I/O接口212,与外部设备进行数据交互,“用户”可以通过客户设备240向I/O接口212输入数据,即用户请求,包括用户语音或者用户输入的文本信息、或者用于输入的图 像、视频信息等。
执行设备210可以调用数据存储系统250中的数据、代码等,也可以将数据、指令等存入数据存储系统250中。
计算模块211使用神经网络201对输入数据(即用户请求)进行处理,从而生成与目标任务对应的数据处理结果。
最后,I/O接口212将数据处理结果返回给客户设备240,并在客户设备240上呈现给用户。
更深层地,训练设备220可以针对不同的场景需求,基于不同的任务训练得到适用新任务的神经网络201,并同时保持对旧任务的兼容处理能力,以给用户提供更佳的结果。
在图2中所示情况下,用户可以手动指定输入执行设备210中的数据,例如,在I/O接口212提供的界面中操作。另一种情况下,客户设备240可以自动地向I/O接口212输入数据并获得结果,如果客户设备240自动输入数据需要获得用户的授权,用户可以在客户设备240中设置相应权限。用户可以在客户设备240查看执行设备210输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备240也可以作为数据采集端将采集到视频、图像、语音和文本数据存入数据库230中供训练过程使用。
值得注意的,图2仅是本发明实施例提供的一种系统架构的示意图,图2中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图2中,数据存储系统250相对执行设备210是外部存储器,在其它情况下,也可以将数据存储系统250置于执行设备210中。
请参见图3，图3为本申请实施例提供的一种卷积神经网络的结构示意图，作为一种表征本申请中神经网络的相关内部结构的示例，但不构成限定。
卷积神经网络(Convolutional Neural Network,CNN)是一种带有卷积结构的深度神经网络,是一种深度学习(deep learning)架构。深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,CNN是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元对输入其中的图像中的重叠区域作出响应。
如图3所示,卷积神经网络(CNN)100可以包括输入层110,卷积层/池化层120,其中池化层为可选的,以及神经网络层130。
卷积层:
如图3所示,卷积层/池化层120可以包括如示例121-126层。在一种实现中,121层为卷积层,122层为池化层,123层为卷积层,124层为池化层,125为卷积层,126为池化层;在另一种实现方式中,121、122为卷积层,123为池化层,124、125为卷积层,126为池化层。即卷积层的输出可以作为随后的池化层的输入,也可以作为另一个卷积层的输入以继续进行卷积操作。
以卷积层121为例,卷积层121可以包括很多个卷积算子,卷积算子也称为核,其在本申请中的作用相当于一个从输入的语音或语义信息中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以从输入视频/图像/语音/文本中提取信息,从而帮助卷积神经网络100进行正确的预测。
当卷积神经网络100有多个卷积层的时候,初始的卷积层(例如121)往往提取较多的 一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络100深度的加深,越往后的卷积层(例如125)提取到的特征越来越复杂,比如高级别的图像和语义之类的特征,语义越高的特征越适用于待解决的问题。
池化层:
由于常常需要减少训练参数的数量,因此卷积层之后常常需要周期性的引入池化层,即如图3中120所示例的121-126各层,可以是一层卷积层后面跟一层池化层,也可以是多层卷积层后面接一层或多层池化层。在自然语言数据处理过程中,池化层的唯一目的就是减少数据的空间大小。
神经网络层130:
在经过卷积层/池化层120的处理后,卷积神经网络100还不足以输出所需要的输出信息。因为如前所述,卷积层/池化层120只会提取特征,并减少输入数据带来的参数。然而为了生成最终的输出信息(所需要的类信息或别的相关信息),卷积神经网络100需要利用神经网络层130来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层130中可以包括多层隐含层(如图3所示的131、132至13n)以及输出层140,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括语音或语义识别、分类或生成等等。
在神经网络层130中的多层隐含层之后,也就是整个卷积神经网络100的最后层为输出层140,该输出层140具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络100的前向传播(如图3由110至140的传播为前向传播)完成,反向传播(如图3由140至110的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络100的损失及卷积神经网络100通过输出层输出的结果和理想结果之间的误差。
需要说明的是,如图3所示的卷积神经网络100仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,如图4所示的多个卷积层/池化层并行,将分别提取的特征均输入给神经网络层130进行处理。
在本方案中,具有图3和图4所示结构的第一神经网络可以采用本申请中的持续学习方式进行任务的训练,在保留从旧任务中学到知识的情况下,学习新任务中的知识,使得第一神经网络具有较好的抗遗忘能力,且能适用于不同的任务推理,如图像识别、目标检测、语音语义识别等。
请参见图5,图5为本申请实施例提供的一种芯片硬件结构示意图。如图5所示,神经网络处理器(Neural-Networks Processing Unit,NPU)50作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路503,控制器504控制运算电路503提取存储器(权重存储器或输入存储器)中的数据并进行运算。
在一些实现中,运算电路503内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路503是二维脉动阵列。运算电路503还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路503是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器502中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器501中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器508  accumulator中。
向量计算单元507可以对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。例如,向量计算单元507可以用于神经网络中非卷积/非FC层的网络计算,如池化(Pooling),批归一化(Batch Normalization),局部响应归一化(Local Response Normalization)等。
在一些实现中，向量计算单元507能将经处理的输出的向量存储到统一存储器506。例如，向量计算单元507可以将非线性函数应用到运算电路503的输出，例如累加值的向量，用以生成激活值。在一些实现中，向量计算单元507生成归一化的值、合并值，或二者均有。在一些实现中，处理过的输出的向量能够用作运算电路503的激活输入，例如用于在神经网络中的后续层中的使用。
统一存储器506用于存放输入数据以及输出数据。
存储单元访问控制器505(Direct Memory Access Controller,DMAC)将外部存储器中的输入数据搬运到输入存储器501和/或统一存储器506、将外部存储器中的权重数据存入权重存储器502,以及将统一存储器506中的数据存入外部存储器。
总线接口单元(Bus Interface Unit,BIU)510,用于通过总线实现主CPU、DMAC和取指存储器509之间进行交互。
与控制器504连接的取指存储器(Instruction Fetch Buffer)509,用于存储控制器504使用的指令。
控制器504,用于调用取指存储器509中缓存的指令,实现控制该运算加速器的工作过程。
一般地,统一存储器506,输入存储器501,权重存储器502以及取指存储器509均为片上(On-Chip)存储器,外部存储器为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(Double Data Rate Synchronous Dynamic Random Access Memory,简称DDR SDRAM)、高带宽存储器(High Bandwidth Memory,HBM)或其他可读可写的存储器。
请参见图6,图6为本申请实施例提供的一种基于张量的持续学习方法的流程示意图。如图6所示,该方法包括步骤S610和步骤S620。其中,
步骤S610:获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种。
具体地,上述步骤中所获取到的输入数据包括但不限于视频、图像、文本或者语音等可以利用神经网络进行处理的数据,本申请不做穷举。
步骤S620:将所述输入数据输入第一神经网络中,得到数据处理结果。其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
其中,上述A、B、C和D为正整数。
具体地,输入到第一神经网络中的输入数据可以是视频、图像、文本、语音或其它可以利用神经网络处理的数据中的一种或多种。
具体地,对于每个张量核而言,其可以包含多个层次,每个层次包括至少一个维度。每个张量层则包含每个张量核在同一维度上的数据。
例如,张量核为7*3*4的矩阵,则其包含三个层次,第一个层次包含7个维度,第二个层次包含3个维度,第三个层次包含4个维度。
下面将结合图7的示例详细描述张量核和张量层的具体关系。图7所示可以作为张量分解后,得到的张量核与张量层关系的具体示例。
在图7中,对张量进行张量分解后,得到了三个张量核:张量核A、张量核B和张量核C。张量核A在某一层次上可以分为两个维度的数据,即A1和A2。张量核B在该层次上可以划分为两个维度的数据:B1和B2。张量核C在该层次上可以分为两个维度的数据:C1和C2。对于上述三个张量核而言,其在同一维度上的数据构成了一个张量层。如图7所示,该三个张量核可以划分为两个张量层:张量层1和张量层2。张量层1中包含数据A1、B1和C1,张量层2中包含数据A2、B2和C2。
举例来说，张量核A、B和C同为5*4*2的矩阵，其第一层次包含5个维度，第二层次包含4个维度，第三层次包含2个维度。则此时在第三个层次上，这三个张量核可以划分为2个张量层，每个张量层中包含每个张量核在第三个层次上一个维度的数据，即5*4*1的矩阵。
应当理解,图7只是本申请用于描述张量核与张量层关系的具体示例,并不构成张量核的个数和张量层的层数的限定。
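为便于理解上述张量核与张量层的对应关系，下面给出一个示意性的NumPy代码片段，沿用上文5*4*2的例子（其中的变量名与数值均为示例性假设，不构成对本申请的限定）：

```python
import numpy as np

# 示例：三个形状均为 5*4*2 的张量核 A、B、C（随机初始化，仅作演示）
A = np.random.rand(5, 4, 2)
B = np.random.rand(5, 4, 2)
C = np.random.rand(5, 4, 2)
cores = [A, B, C]

# 按第三个层次（最后一个维度）划分张量层：
# 张量层 k 由每个张量核在该层次上第 k 个维度的数据组成，即三个 5*4*1 的切片
def split_into_tensor_layers(cores, axis=-1):
    num_layers = cores[0].shape[axis]
    layers = []
    for k in range(num_layers):
        layer = [np.take(core, indices=[k], axis=axis) for core in cores]
        layers.append(layer)
    return layers

layers = split_into_tensor_layers(cores)
print(len(layers))           # 2 个张量层
print(layers[0][0].shape)    # 每个切片的形状为 (5, 4, 1)
```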
对于上述m个任务中任意两个相邻的任务(以第i个和第i+1个任务为例)而言,通过在第i+1个任务的训练过程中,在第一神经网络中增加张量层和/或张量核,并更新增加的张量层和/或张量核中的参数,来从第i+1个任务的训练过程中学习新的知识。即利用新增加的张量核/张量层来保存从第i+1个任务中学到的知识,保证第一神经网络的持续学习能力。
具体地,在第i+1个任务的训练过程中,对第一神经网络中张量结构的改变包括三种方式:(1)增加至少一个张量核;(2)增加至少一个张量层;(3)增加至少一个张量核和至少一个张量层。
进一步地,上述在第i+1个任务的训练过程中,增加C个张量核和/或D个张量层,具体包括三种方式:(1)在第i+1个任务的训练过程中,增加C个张量核;(2)在第i+1个任务的训练过程中,增加D个张量层;(3)在第i+1个任务的训练过程中,增加C个张量核和D个张量层。其中,C和D为正整数。
举例来说,上述对第一神经网络中张量结构改变的三种方式可以具体对应参见图8a-图8c所示的三个示例。
图8a对应上述第一种方式。第i个任务的训练结束后,第一神经网络中包含三个张量核和两个张量层。第i+1个任务的训练过程中,增加了一个张量层(A3、B3+C3),且该增加的张量层中的参数被更新,即图8a中阴影部分所示。
图8b对应上述第二种方式。第i个任务的训练结束后,第一神经网络中包含三个张量核和两个张量层。第i+1个任务的训练过程中,增加了一个张量核(D1+D2),且该增加的张量核中的参数被更新,即图8b中阴影部分所示。
图8c对应上述第三种方式。第i个任务的训练结束后,第一神经网络中包含三个张量核和两个张量层。第i+1个任务的训练过程中,增加了一个张量核(D1+D2+D3)和一个张量层 (A3、B3、C3+D3),且该增加的张量核和张量层中的参数被更新,即图8c中阴影部分所示。
可选地,在采用增加张量核的方式(即上述第二种方式)来进行第i+1个任务的训练过程中,上述第i个任务和第i+1个任务训练过程中的损失函数分别如下:
第i个任务的损失函数为:
其中,L1为损失函数的数值;Y为标签;X为输入数据;f为第一神经网络的输出;Rank为秩;A、B和C为张量核;⊙表示张量内积。
第i+1个任务的损失函数为:
其中,L2为损失函数的数值;Y为标签;X为输入数据;f为第一神经网络的输出;Rank为秩;A、B、C和F为张量核;⊙表示张量内积。
张量网络的权重参数为:
W=A⊙B⊙C...⊙F
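需要说明的是，上述两个损失函数在原文中以公式图片形式给出，此处未予复现。下面给出一个示意性的PyTorch片段，以均方误差作为损失的假设形式、以逐元素相乘近似表示张量内积⊙，说明“只更新新增张量核F、原有张量核A/B/C保持不变”的训练方式（仅作示意，不构成对本申请的限定）：

```python
import torch

torch.manual_seed(0)

# 第 i 个任务训练结束后已有的张量核 A、B、C（示例形状，训练新任务时保持冻结）
A = torch.randn(8, 8, requires_grad=False)
B = torch.randn(8, 8, requires_grad=False)
C = torch.randn(8, 8, requires_grad=False)

# 第 i+1 个任务新增的张量核 F，只有它参与更新
F = torch.randn(8, 8, requires_grad=True)

def weight(A, B, C, F):
    # 假设：用逐元素相乘近似表示张量内积 W = A⊙B⊙C⊙F
    return A * B * C * F

optimizer = torch.optim.SGD([F], lr=1e-2)

# 第 i+1 批数据集（随机数据，仅作演示）
X = torch.randn(32, 8)
Y = torch.randn(32, 8)

for _ in range(100):
    optimizer.zero_grad()
    W = weight(A, B, C, F)
    pred = X @ W                        # f(X)：第一神经网络的输出（简化为线性映射）
    loss = torch.mean((pred - Y) ** 2)  # 假设损失 l 为均方误差
    loss.backward()                     # 梯度只会传到 F，A/B/C 中的参数保持不变
    optimizer.step()
```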
本申请中的持续学习方法包括两种实现方式:
(1)在新任务的训练过程中增加新的张量核和/或张量层,并更新增加的张量核和/或张量层中的参数
具体地,可以采用前述实施例中的三种方式在第i+1个任务的训练过程中,增加新的张量核和/或张量层,并更新增加的张量核和/或张量层中的参数,此处不再赘述。
进一步,此种方式下,新任务学习过程中,对于非新增的张量核/张量层而言,全部张量核或者部分张量核中的参数保持不变,或者全部张量层或者部分张量层中的参数保持不变。
即在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
具体地,若部分张量核/张量层中的参数保持不变,则可采用两种方式进行更新:(1)以张量核为更新对象,更新部分张量核中的参数:(2)以张量层为更新对象,更新部分张量层中的参数。
（2）在新任务的训练过程中不增加新的张量核和/或张量层，更新已有的部分张量核/张量层中的参数
具体地,在新任务的训练过程中,可以以张量层或者张量核为更新对象,更新部分张量核或者更新部分张量层中的参数。
例如,如图9a-图9b所示,描述了分别以张量核和张量层作为更新对象的两种更新方式。
图9a描述了以张量层为更新对象的方式。如图9a所示,第i个任务的训练结束后,第一神经网络中包含三个张量核和两个张量层。第i+1个任务的训练过程中,更新张量层2(即图9a中阴影部分)中的参数,张量层1中的参数保持不变,利用张量层2来保存从第i+1个任务中学到的知识。
图9b描述了以张量核为更新对象的方式。如图9b所示,第i个任务的训练结束后,第一神经网络中包含三个张量核和两个张量层。第i+1个任务的训练过程中,更新张量核A(即图9b中阴影部分,更新后的张量核A包括A3+A4)中的参数,张量核B和张量核C中的参数保持不变,利用张量核A来保存从第i+1个任务中学到的知识。
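上述分别以张量层和张量核为更新对象的两种方式，可以通过对梯度进行屏蔽来实现。下面给出一个示意性的代码片段（张量核的个数、形状及损失均为示例性假设，不构成对本申请的限定）：

```python
import torch

# 三个张量核，每个核按最后一维划分为两个张量层
cores = [torch.randn(5, 4, 2, requires_grad=True) for _ in range(3)]
optimizer = torch.optim.SGD(cores, lr=1e-2)

def mask_gradients(update_object="layer", layer_index=1, core_index=0):
    # 方式一：只更新第 layer_index 个张量层，其余张量层的梯度置零（对应图9a）
    # 方式二：只更新第 core_index 个张量核，其余张量核的梯度置零（对应图9b）
    for idx, core in enumerate(cores):
        if core.grad is None:
            continue
        if update_object == "layer":
            keep = torch.zeros_like(core.grad)
            keep[..., layer_index] = 1.0
            core.grad *= keep
        else:  # update_object == "core"
            if idx != core_index:
                core.grad.zero_()

# 训练一步（损失仅作演示）
loss = sum((c ** 2).sum() for c in cores)
loss.backward()
mask_gradients(update_object="layer", layer_index=1)
optimizer.step()   # 只有张量层2（索引为1）中的参数发生变化，张量层1保持不变
```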
可选地,本申请中持续学习方法所针对的第一神经网络可以采用两种方式进行设计:
(1)设计的第一神经网络中包含可以进行张量分解的张量
具体地,在进行每个任务的训练之前,对第一神经网络中的张量进行张量分解,得到多个张量核/张量层;然后利用训练数据进行训练,以更新部分或全部张量核中的参数,或者更新部分或全部张量层中的参数;在每个任务的训练结束后,将张量核进行张量组合。具体的张量分解和张量组合过程在本申请中不进行展开。
进一步,基于前述实施例,在所述第i个任务的训练结束后,将所述A个张量核进行张量组合,得到一个或多个张量;以及在执行所述第i+1个任务的训练之前,将所述一个或多个张量进行张量分解,得到所述A个张量核。
可选地,上述进行张量分解的方式包括:CP分解、Tucker分解等,本申请对此不限定。
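作为“任务训练前进行张量分解、任务训练后进行张量组合”这一流程的示意，下面给出一个基于CP分解的代码片段（此处选用tensorly库以及rank=3等超参数仅为举例假设，并非本申请限定的实现）：

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# 第一神经网络中的一个张量（示例形状与数值）
W = np.random.rand(6, 5, 4)

# 第 i+1 个任务训练之前：张量分解，得到若干张量核（CP 分解的因子矩阵）
cp = parafac(tl.tensor(W), rank=3)   # rank=3 为假设的秩
factors = cp.factors                 # 因子矩阵列表，即本申请意义上的张量核

# ……在第 i+1 个任务的训练中更新（部分）张量核中的参数……

# 第 i+1 个任务训练之后：张量组合，把张量核重新组合成张量
W_rebuilt = tl.cp_to_tensor(cp)
print(W_rebuilt.shape)               # (6, 5, 4)
```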
(2)设计的第一神经网络中包含至少一个张量核,而不包含可以进行张量分解的张量
具体地,第一神经网络在设计时即把张量用张量核/张量层的结构来替代,因此,在任务的训练过程中,无需进行张量分解和张量组合的操作。
举例来说,可以参见图10所示的一种第一神经网络的结构。在该结构所表征的第一神经网络而言,在每次任务的训练过程中,由于网络中不包含可以进行张量分解的张量,因而无需进行张量分解和张量组合操作。
如图10所示,第一神经网络的结构包括两个深度神经网络(Deep Neural Network,DNN)和张量网络。其中,张量网络中包含n个张量层。输入端的深度神经网络的作用在于对数据进行维度变换,以适应张量网络的数据处理需求;输出端的深度神经网络用于对每个张量核处理得到的数据进行聚合。
可选地,输出端的深度神经网络也可以采用其它网络结构进行替代,如GateNet等,本申请对此不限定。
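图10所示“输入端深度神经网络—张量网络—输出端深度神经网络”的结构，可以用下面的示意性代码来近似表达（其中各层维度、张量核个数均为示例性假设，输出端的聚合简化为拼接后接一层全连接，不构成对本申请的限定）：

```python
import torch
import torch.nn as nn

class TensorizedNet(nn.Module):
    def __init__(self, in_dim=16, core_dim=8, num_cores=4, out_dim=10):
        super().__init__()
        # 输入端的深度神经网络：对数据进行维度变换，以适应张量网络的数据处理需求
        self.encoder = nn.Sequential(nn.Linear(in_dim, core_dim), nn.ReLU())
        # 张量网络：用若干张量核（此处简化为可训练矩阵）分别处理特征
        self.cores = nn.ParameterList(
            [nn.Parameter(torch.randn(core_dim, core_dim)) for _ in range(num_cores)]
        )
        # 输出端的深度神经网络：对每个张量核处理得到的数据进行聚合
        self.decoder = nn.Linear(core_dim * num_cores, out_dim)

    def forward(self, x):
        h = self.encoder(x)
        outs = [h @ core for core in self.cores]
        return self.decoder(torch.cat(outs, dim=-1))

net = TensorizedNet()
y = net(torch.randn(2, 16))
print(y.shape)   # torch.Size([2, 10])
```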
可选地,可以采用交替更新方式,来进行新任务的训练。在交替训练过程中,上述第一神经网络为主模型,即用于进行推理的模型。在每次进行新任务的训练过程中,利用新任务的训练数据训练一个备份张量网络,并存储训练好的备份张量网络的参数。
进一步,所述第i+1个任务的训练过程,包括:利用第i+1批数据集训练第i个备份张量网络,得到训练好的第i个备份张量网络;以及利用所述第i+1批数据集训练所述第一神经网络;其中,所述第一神经网络的损失函数包含所述第一神经网络的输出和训练好的第j个备份张量网络的输出之间的差异程度,或者所述第一神经网络的损失函数包含所述第一神经网络的模型参数与训练好的第j个备份张量网络的模型参数之间的差异程度,j=1至i-1,j为小于或等于i-1的正整数。
具体地,可参见图11,在进行第i+1个任务的训练过程中,会首先利用第i+1个任务的训练数据训练好第i个备份张量网络。此时,经过前i个任务的学习,已经训练好了i-1个备份张量网络。在利用第i+1个任务的训练数据训练主模型(即第一神经网络)的过程中,利用已经训练好的i-1个备份张量网络约束主模型中参数的更新。
进一步,可以采用两种损失函数来进行主模型的训练:(1)损失函数中包含主模型的输出与训练好的第j个备份张量网络的输出之间的差值,j=1至i-1;(2)损失函数中包含主模型中模型参数与训练好的第j个备份张量网络的模型参数之间的差异。
可选地,在第i+1个任务的训练过程中,上述第一种损失函数的具体形式可以如下:
其中,L为损失函数的数值;θ为主模型;l为主模型输出与标签的差异;f(θ)为主模型输出;rank为秩;A、B、C和F为张量核;f1和fi-1为第1个备份张量网络和第i-1个备份张量网络的输出;D为第i+1个任务的训练数据。
可选地,在第i+1个任务的训练过程中,上述第二种损失函数的具体形式可以如下:
其中,各参数的物理意义可参见前述描述,此处不再赘述。
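上述两种损失函数在原文中以公式图片形式给出，此处未复现其精确形式。下面给出一个示意性的代码片段，分别对应“输出差异约束”和“参数差异约束”两种形式（差异度量假设为平方误差，约束系数lam为假设值，不构成对本申请的限定）：

```python
import torch

def continual_loss_by_output(main_model, backup_models, x, y, task_loss_fn, lam=0.1):
    # 形式一：损失 = 新任务损失 + 主模型输出与各备份张量网络输出之间的差异
    out = main_model(x)
    loss = task_loss_fn(out, y)
    for backup in backup_models:                 # 第1至第i-1个已训练好的备份张量网络
        with torch.no_grad():
            ref = backup(x)
        loss = loss + lam * torch.mean((out - ref) ** 2)
    return loss

def continual_loss_by_params(main_model, backup_models, x, y, task_loss_fn, lam=0.1):
    # 形式二：损失 = 新任务损失 + 主模型参数与各备份张量网络参数之间的差异
    loss = task_loss_fn(main_model(x), y)
    for backup in backup_models:
        for p, q in zip(main_model.parameters(), backup.parameters()):
            if p.shape == q.shape:               # 仅对形状一致的参数计算差异（示例性处理）
                loss = loss + lam * torch.sum((p - q.detach()) ** 2)
    return loss
```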
可选地,上述持续学习方法可以应用于联邦学习架构中。
具体地,所述第一神经网络为多个神经网络中的一个,所述多个神经网络分别位于不同的用户设备上,所述第一神经网络位于第一用户设备上。即,上述第一神经网络的训练为用户设备上的过程。
进一步地,在每个任务的训练结束后,所述第一用户设备向服务器发送所述第一神经网络的模型参数,以使所述服务器根据所述多个神经网络中每个神经网络的模型参数更新所述服务器上的第二神经网络;其中,所述第一神经网络的模型参数包括所述第一神经网络所包含的张量核。
具体地,如图12所示,在联邦学习架构中,包括服务器和s个用户设备。其中,每个用户设备上的持续学习过程可以与前述实施例中的描述相同,此处不再赘述。在每个用户设备执行完新任务的训练后,将更新后的模型参数(即模型参数1、模型参数2、…、模型参数s)分别发送到服务器上,服务器则利用接收的所有模型参数进行更新服务器上的神经网络(即上述实施例中的第二神经网络)。
可选地,服务器上模型聚合时所使用的损失函数如下:
其中,L为总损失;i为第i个用户设备上的模型参数;λ为约束强度参数;其它变量可以参见前述公式中的描述,此处不再赘述。
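联邦学习架构下“用户设备只上传张量核、服务器聚合后更新第二神经网络”的流程，可以用下面的示意性代码来概括（此处沿用前文TensorizedNet草图中的cores属性，并假设聚合方式为简单平均；原文中的聚合损失公式未在此复现，不构成对本申请的限定）：

```python
import torch

def client_upload(first_neural_network):
    # 用户设备侧：只上传第一神经网络所包含的张量核（而非全部参数），以节省通信开销
    return [core.detach().clone() for core in first_neural_network.cores]

def server_aggregate(second_neural_network, uploads):
    # 服务器侧：对 s 个用户设备上传的张量核进行聚合（此处假设为逐位置平均）
    with torch.no_grad():
        for k, core in enumerate(second_neural_network.cores):
            stacked = torch.stack([u[k] for u in uploads], dim=0)
            core.copy_(stacked.mean(dim=0))
    return second_neural_network
```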
可选地,所述任务包括图像识别、目标检测、图像分割或语音语义识别。
具体地,本申请中的持续学习方法用于学习的任务包括但不限于图像识别、目标检测、图像分割或语音语义识别等。
请参见图13,图13为本申请实施例提供的另一种持续学习方法流程示意图。该方法包括步骤S1310和步骤S1320。其中,
步骤S1310:获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种。
步骤S1320:将所述输入数据输入第一神经网络中,得到数据处理结果。
其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核包含B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结 束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
可选地,在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
具体地,图13所示实施例的具体实现过程可以参见前述图6实施例中的描述,此处不再赘述。
请参见图14,图14为本申请实施例提供的又一种持续学习方法流程示意图。该方法包括步骤S1410和步骤S1420。其中,
步骤S1410:接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核。
步骤S1420:基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种。
其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
可选地,在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
具体地,图14所示实施例的具体实现过程可以参见前述图6实施例中的描述,此处不再赘述。
请参见图15,图15为本申请实施例提供的再一种持续学习方法流程示意图。该方法包括步骤S1510和步骤S1520。其中,
步骤S1510:接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核。
步骤S1520:基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种。
其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
具体地,图15所示实施例的具体实现过程可以参见前述图6实施例中的描述,此处不再赘述。
请参见图16,图16为本申请实施例提供的一种持续学习装置结构示意图。如图16所示,该装置包括获取单元1610和处理单元1620。其中,
该持续学习装置可以用于执行上述图6实施例中的方法,具体如下:
获取单元1610用于获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;处理单元1620用于基于第一神经网络对所述输入数据进行处理,得到数据处理结果;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
在一种可行的实施方式中,所述处理单元还用于:在所述第i个任务的训练结束后,将所述A个张量核进行张量组合,得到一个或多个张量;以及在执行所述第i+1个任务的训练之前,将所述一个或多个张量进行张量分解,得到所述A个张量核。
在一种可行的实施方式中,在所述第i+1个任务的训练过程中,所述处理单元具体用于:利用第i+1批数据集训练第i个备份张量网络,得到训练好的第i个备份张量网络;以及利用所述第i+1批数据集训练所述第一神经网络;其中,所述第一神经网络的损失函数包含所述第一神经网络的输出和训练好的第j个备份张量网络的输出之间的差异程度,或者所述第一神经网络的损失函数包含所述第一神经网络的模型参数与训练好的第j个备份张量网络的模型参数之间的差异程度,j=1至i-1,j为小于或等于i-1的正整数。
在一种可行的实施方式中,所述第一神经网络为多个神经网络中的一个,所述多个神经网络分别位于不同的用户设备上,所述第一神经网络位于第一用户设备上,所述装置还包括:发送单元,用于在每个任务的训练结束后,所述第一用户设备向服务器发送所述第一神经网络的模型参数,以使所述服务器根据所述多个神经网络中每个神经网络的模型参数更新所述服务器上的第二神经网络;其中,所述第一神经网络的模型参数包括所述第一神经网络所包含的张量核。
在一种可行的实施方式中,所述任务包括图像识别、目标检测、图像分割或语音语义识别。
该持续学习装置还可以用于执行上述图13实施例中的方法,具体如下:
获取单元1610用于获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;处理单元1620用于将所述输入数据输入第一神经网络中,得到数据处理结果;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核包含B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述第一神经网络增加 了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
具体地,图16所示装置具体运行过程可以参见前述对应的方法实施例,此处不再赘述。
请参见图17,图17为本申请实施例提供的一种持续学习装置结构示意图。如图16所示,该装置包括接收单元1710和处理单元1720。其中,
该持续学习装置可以用于执行上述图14实施例中的方法,具体如下:
接收单元1710用于接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;处理单元1720用于基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
该持续学习装置还可以用于执行上述图15实施例中的方法,具体如下:
接收单元1710用于接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;处理单元1720用于基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
在一种可行的实施方式中,在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
具体地,图17所示装置具体运行过程可以参见前述对应的方法实施例,此处不再赘述。
请参见图18,图18为本申请实施例提供的一种电子设备的结构示意图。如图18所示,该设备包括处理器1801、存储器1802、接口电路1803和总线1804。
当该电子设备作为用户设备时:
接口电路1803用于获取输入数据。处理器1801用于基于第一神经网络对所述输入数据 进行处理,得到数据处理结果。存储器1802用于存储上述数据处理结果。其中,处理器1801、存储器1802和接口电路1803通过总线1804互连。
当该电子设备作为服务器时:
接口电路1803用于接收多个用户设备分别发送的多个模型参数;处理器1801用于基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果。存储器1802用于存储上述数据处理结果。其中,处理器1801、存储器1802和接口电路1803通过总线1804互连。
应当理解,本申请实施例中电子设备上处理器和存储器的具体运行过程可以参见前述方法实施例中的对应过程,此处不再赘述。
本申请实施例提供了一种芯片系统,所述芯片系统包括至少一个处理器,存储器和接口电路,所述存储器、所述接口电路和所述至少一个处理器通过线路互联,所述至少一个存储器中存储有指令;所述指令被所述处理器执行时,上述方法实施例中记载的任意一种的部分或全部步骤得以实现。
本申请实施例提供了一种计算机存储介质,所述计算机存储介质存储有计算机程序,该计算机程序被执行时,使得上述方法实施例中记载的任意一种的部分或全部步骤得以实现。
本申请实施例提供了一种计算机程序,该计算机程序包括指令,当该计算机程序被处理器执行时,使得上述方法实施例中记载的任意一种的部分或全部步骤得以实现。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可能可以采用其它顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (26)

  1. 一种基于张量的持续学习方法,其特征在于,所述方法包括:
    获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;
    将所述输入数据输入第一神经网络中,得到数据处理结果;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
  2. 根据权利要求1所述的方法,其特征在于,
    在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    在所述第i个任务的训练结束后,将所述A个张量核进行张量组合,得到一个或多个张量;以及
    在执行所述第i+1个任务的训练之前,将所述一个或多个张量进行张量分解,得到所述A个张量核。
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,所述第i+1个任务的训练过程,包括:
    利用第i+1批数据集训练第i个备份张量网络,得到训练好的第i个备份张量网络;以及
    利用所述第i+1批数据集训练所述第一神经网络;其中,所述第一神经网络的损失函数包含所述第一神经网络的输出和训练好的第j个备份张量网络的输出之间的差异程度,或者所述第一神经网络的损失函数包含所述第一神经网络的模型参数与训练好的第j个备份张量网络的模型参数之间的差异程度,j=1至i-1,j为小于或等于i-1的正整数。
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,所述第一神经网络为多个神经网络中的一个,所述多个神经网络分别位于不同的用户设备上,所述第一神经网络位于第一用户设备上,所述方法还包括:
    在每个任务的训练结束后,所述第一用户设备向服务器发送所述第一神经网络的模型参数,以使所述服务器根据所述多个神经网络中每个神经网络的模型参数更新所述服务器上的第二神经网络;其中,所述第一神经网络的模型参数包括所述第一神经网络所包含的张量核。
  6. 根据权利要求1-5中任一项所述的方法,其特征在于,
    所述任务包括图像识别、目标检测、图像分割或语音语义识别。
  7. 一种基于张量的持续学习方法,其特征在于,所述方法包括:
    获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;
    将所述输入数据输入第一神经网络中,得到数据处理结果;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核包含B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
  8. 根据权利要求7所述的方法,其特征在于,
    在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
  9. 一种基于张量的持续学习方法,其特征在于,所述方法包括:
    接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;
    基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
  10. 根据权利要求9所述的方法,其特征在于,
    在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
  11. 一种基于张量的持续学习方法,其特征在于,所述方法包括:
    接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;
    基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
  12. 根据权利要求11所述的方法,其特征在于,
    在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
  13. 一种基于张量的持续学习装置,其特征在于,所述装置包括:
    获取单元,用于获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;
    处理单元,用于基于第一神经网络对所述输入数据进行处理,得到数据处理结果;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
  14. 根据权利要求13所述的装置,其特征在于,
    在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
  15. 根据权利要求13或14所述的装置,其特征在于,所述处理单元还用于:
    在所述第i个任务的训练结束后,将所述A个张量核进行张量组合,得到一个或多个张量;以及
    在执行所述第i+1个任务的训练之前,将所述一个或多个张量进行张量分解,得到所述A个张量核。
  16. 根据权利要求13-15中任一项所述的装置,其特征在于,在所述第i+1个任务的训练过程中,所述处理单元具体用于:
    利用第i+1批数据集训练第i个备份张量网络,得到训练好的第i个备份张量网络;以及
    利用所述第i+1批数据集训练所述第一神经网络;其中,所述第一神经网络的损失函数包含所述第一神经网络的输出和训练好的第j个备份张量网络的输出之间的差异程度,或者所述第一神经网络的损失函数包含所述第一神经网络的模型参数与训练好的第j个备份张量网络的模型参数之间的差异程度,j=1至i-1,j为小于或等于i-1的正整数。
  17. 根据权利要求13-16中任一项所述的装置,其特征在于,所述第一神经网络为多个神经网络中的一个,所述多个神经网络分别位于不同的用户设备上,所述第一神经网络位于第一用户设备上,所述装置还包括:
    发送单元,用于在每个任务的训练结束后,所述第一用户设备向服务器发送所述第一神经网络的模型参数,以使所述服务器根据所述多个神经网络中每个神经网络的模型参数更新所述服务器上的第二神经网络;
    其中,所述第一神经网络的模型参数包括所述第一神经网络所包含的张量核。
  18. 根据权利要求13-17中任一项所述的装置,其特征在于,所述任务包括图像识别、目标检测、图像分割或语音语义识别。
  19. 一种基于张量的持续学习装置,其特征在于,所述装置包括:
    获取单元,用于获取输入数据;其中,所述输入数据包括视频、图像、文本或语音中的一种或多种;
    处理单元,用于将所述输入数据输入第一神经网络中,得到数据处理结果;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核包含B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
  20. 根据权利要求19所述的装置,其特征在于,
    在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
  21. 一种基于张量的持续学习装置,其特征在于,所述装置包括:
    接收单元,用于接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;
    处理单元,用于基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新,m为正整数,i为小于或等于m-1的正整数。
  22. 根据权利要求21所述的装置,其特征在于,
    在所述第i+1个任务的训练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分或全部张量层中的参数保持不变。
  23. 一种基于张量的持续学习装置,其特征在于,所述装置包括:
    接收单元,用于接收多个用户设备分别发送的多个模型参数,其中,所述多个用户设备包括第一用户设备,所述第一用户设备发送的模型参数包括第一神经网络所包含的张量核;
    处理单元,用于基于所述多个模型参数更新第二神经网络,以及基于更新后的所述第二神经网络处理待处理数据,得到数据处理结果;其中,所述待处理数据包括图片、视频、语音或文本中的一种或多种;
    其中,所述第一神经网络是经过m个任务训练得到的,在第i个任务的训练结束后,所述神经网络中包含A个张量核,所述A个张量核被划分为B个张量层,所述B个张量层中的每个张量层包含所述A个张量核中每个张量核在同一维度上的数据;在第i+1个任务的训 练结束后,所述A个张量核中的部分或全部张量核中的参数保持不变,或者所述B个张量层中部分张量层中的参数保持不变,m为正整数,i为小于或等于m-1的正整数。
  24. 根据权利要求23所述的装置,其特征在于,
    在所述第i+1个任务的训练结束后,所述第一神经网络增加了C个张量核和/或D个张量层,且在所述第i+1个任务的训练中,所述C个张量核和/或所述D个张量层中的参数被更新。
  25. 一种电子设备,其特征在于,所述电子设备包括至少一个处理器,存储器和接口电路,所述存储器、所述接口电路和所述至少一个处理器通过线路互联,所述至少一个存储器中存储有指令;所述指令被所述处理器执行时,权利要求1-12中任一所述的方法得以实现。
  26. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,该计算机程序被执行时,权利要求1-12中任意一项所述的方法得以实现。
PCT/CN2023/096249 2022-06-01 2023-05-25 基于张量的持续学习方法和装置 WO2023231887A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210618700.1A CN115169548A (zh) 2022-06-01 2022-06-01 基于张量的持续学习方法和装置
CN202210618700.1 2022-06-01

Publications (1)

Publication Number Publication Date
WO2023231887A1 true WO2023231887A1 (zh) 2023-12-07

Family

ID=83483156

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/096249 WO2023231887A1 (zh) 2022-06-01 2023-05-25 基于张量的持续学习方法和装置

Country Status (2)

Country Link
CN (1) CN115169548A (zh)
WO (1) WO2023231887A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115169548A (zh) * 2022-06-01 2022-10-11 华为技术有限公司 基于张量的持续学习方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190080222A1 (en) * 2017-09-12 2019-03-14 Yonatan Glesner Per kernel kmeans compression for neural networks
CN113095486A (zh) * 2021-04-22 2021-07-09 清华大学 图像处理方法、装置、电子设备及存储介质
CN113792874A (zh) * 2021-09-08 2021-12-14 清华大学 基于先天知识的持续学习方法及装置
CN114463605A (zh) * 2022-04-13 2022-05-10 中山大学 基于深度学习的持续学习图像分类方法及装置
CN115169548A (zh) * 2022-06-01 2022-10-11 华为技术有限公司 基于张量的持续学习方法和装置

Also Published As

Publication number Publication date
CN115169548A (zh) 2022-10-11

Similar Documents

Publication Publication Date Title
WO2021159714A1 (zh) 一种数据处理方法及相关设备
WO2021047286A1 (zh) 文本处理模型的训练方法、文本处理方法及装置
US10248664B1 (en) Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
US20230229898A1 (en) Data processing method and related device
WO2022001805A1 (zh) 一种神经网络蒸馏方法及装置
WO2020147142A1 (zh) 一种深度学习模型的训练方法、系统
WO2022253074A1 (zh) 一种数据处理方法及相关设备
EP4152212A1 (en) Data processing method and device
WO2023231887A1 (zh) 基于张量的持续学习方法和装置
WO2021129668A1 (zh) 训练神经网络的方法和装置
WO2023020613A1 (zh) 一种模型蒸馏方法及相关设备
WO2020062299A1 (zh) 一种神经网络处理器、数据处理方法及相关设备
CN113505883A (zh) 一种神经网络训练方法以及装置
WO2021169453A1 (zh) 用于文本处理的方法和装置
WO2022222854A1 (zh) 一种数据处理方法及相关设备
CN114329029A (zh) 对象检索方法、装置、设备及计算机存储介质
WO2024067884A1 (zh) 一种数据处理方法及相关装置
CN115238909A (zh) 一种基于联邦学习的数据价值评估方法及其相关设备
WO2024067779A1 (zh) 一种数据处理方法及相关装置
WO2020192523A1 (zh) 译文质量检测方法、装置、机器翻译系统和存储介质
WO2023197857A1 (zh) 一种模型切分方法及其相关设备
CN113128285A (zh) 一种处理视频的方法及装置
WO2022227024A1 (zh) 神经网络模型的运算方法、训练方法及装置
EP3923199A1 (en) Method and system for compressing a neural network
CN114357963A (zh) 问诊模板生成方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23815070

Country of ref document: EP

Kind code of ref document: A1