CN113656333B - Method for accelerating deep learning training task data loading - Google Patents


Info

Publication number
CN113656333B
CN113656333B · Application CN202111221953.7A
Authority
CN
China
Prior art keywords: data, cache, training, cur, period
Legal status: Active
Application number: CN202111221953.7A
Other languages: Chinese (zh)
Other versions: CN113656333A
Inventors: 朱春节, 银燕龙, 何水兵, 曾令仿, 秦亦, 周方
Current Assignee: Zhejiang Lab
Original Assignee: Zhejiang Lab
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2022-03-18
Application filed by Zhejiang Lab
Priority to CN202111221953.7A
Publication of CN113656333A
Application granted
Publication of CN113656333B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0893: Caches characterised by their organisation or structure
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10: Providing a specific technical effect
    • G06F 2212/1016: Performance improvement

Abstract

The invention discloses a method for accelerating data loading for deep learning training tasks. Using a double-random-sequence scheme, the random sequence of the next epoch is computed in advance at the start of each training epoch, and a separate region of memory is requested to cache, ahead of time, the data needed at the start of the next epoch. While data are fed to the neural network in the order given by the current epoch's random sequence, the data needed at the start of the next epoch are copied from memory into the cache in time, guided by the next epoch's random sequence, so that all data needed at the start of the next epoch can be served from the cache. The method requires no modification of existing deep learning frameworks, is simple to implement, and introduces little computational cost; the cached data are always hit and can be reused across epochs, reducing reads from the back-end storage system, and the more training epochs there are, the more pronounced the acceleration.

Description

Method for accelerating deep learning training task data loading
Technical Field
The invention relates to the field of deep learning, in particular to a method for accelerating the loading of deep learning training task data.
Background
Deep learning is a branch of machine learning: an approach to representation learning on data based on artificial neural networks, widely applied in computer vision, speech recognition, natural language processing and other fields. The training process of a deep learning task is executed over many epochs, producing a converged model through repeated training. Each epoch can be divided into three stages: data loading, data enhancement, and neural network model training. The data loading stage must accomplish two things: read the training set from a back-end storage system into memory, and randomly shuffle the training set. The data enhancement stage performs operations such as flipping, rotation, scaling, cropping, shifting and color jittering on the training data in memory, enlarging the sample space covered by the training set.
In the neural network model training stage, the enhanced data are used to train a neural network model containing a large number of parameters.
The data loading stage is I/O-intensive, while the other two stages are compute-intensive. In recent years computing power has grown far faster than storage-side I/O performance, so the share of total training time taken by the data loading stage keeps increasing, and data loading has gradually become one of the bottlenecks of deep learning training.
Traditional methods for accelerating data loading focus on optimizing how the training set is organized and accessed in the back-end storage system. For example, the small files of the training set can be packed into bundles and loaded into memory bundle by bundle, avoiding poorly performing random reads of small files; or the small files can be loaded sequentially in storage order and then shuffled locally in memory, converting slow random reads into fast sequential reads. These methods use the I/O bandwidth of the back-end storage system effectively and speed up loading the training set into memory, but their acceleration of data loading has almost reached its limit.
To avoid overfitting in deep learning, the training set normally has to be shuffled globally and randomly during the data loading stage. Because the training set is too large, the global shuffle cannot be performed directly in memory; instead, a random sequence is computed in each training epoch and the training data are loaded into memory one by one in that order. Since data enhancement modifies the original data in memory in place, each sample can be used only once after being loaded from the back-end storage system and must be loaded again in the next epoch, leaving the back-end storage system with a heavy I/O burden. An effective solution to this problem is currently lacking.
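For concreteness, the conventional pattern just described can be sketched as follows; read_from_storage and train_step are hypothetical stand-ins for back-end I/O and the enhancement plus training stages, not code from any framework:

    import random

    def baseline_epoch(dataset_size, read_from_storage, train_step):
        # A fresh global permutation of sample IDs is drawn for every epoch.
        order = list(range(dataset_size))
        random.shuffle(order)
        for sample_id in order:
            # Every sample is fetched from back-end storage again, because the
            # in-memory copy was mutated by the previous epoch's enhancement.
            sample = read_from_storage(sample_id)
            train_step(sample)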
Disclosure of Invention
In order to overcome the shortcomings of the prior art and achieve the goals of reducing data reads from the back-end storage system, increasing data loading speed, and delivering greater acceleration as the number of training epochs grows, the invention adopts the following technical scheme:
a method for accelerating deep learning training task data loading comprises the following steps:
S1, when the deep learning training task is initialized, carving a region out of the memory occupied by the task to serve as a cache, denoted Cache_next, which both supplies the data the deep learning training task needs during the current epoch and caches, in advance, the data needed at the start of the next epoch;
S2, constructing a double-random-sequence scheme to determine the order in which training set data enter the neural network, wherein the elements of each random sequence correspond one-to-one with the training set data, and during each training epoch two distinct, mutually independent random sequences (old and new) exist simultaneously;
S3, before the first training epoch, generating a random sequence S_next. At the beginning of every training epoch, the existing sequence S_next is assigned to S_cur, which determines the order in which the current epoch's data enter the neural network, and a new random sequence, again denoted S_next, is then generated to determine the order for the next epoch. S_next contains a prefix subsequence S_next_prefix covering the training data to be used at the start of the next epoch. As the data loading stage executes epoch by epoch, each epoch traverses S_cur; for each element S_cur[i], the corresponding training sample is fetched from Cache_next or from the back-end storage system into memory, after which Cache_next is updated with reference to S_next_prefix, as follows:
S31, when S_cur[i] hits in the front segment curList of Cache_next, the corresponding training sample is copied from curList into memory and the entry for S_cur[i] is deleted from curList; if S_cur[i] also appears in S_next_prefix, the sample is inserted into the rear segment nexList of Cache_next;
S32, when S_cur[i] misses in the front segment curList of Cache_next, the corresponding training sample is read from the back-end storage system into memory; if S_cur[i] appears in S_next_prefix, the sample is inserted into the rear segment nexList of Cache_next;
S33, when the traversal of S_cur finishes, S_cur is cleared, leaving only the random sequence S_next;
S4, the current epoch completes; if the number of completed epochs is less than the preset number N, training returns to S3 to start the next epoch; once all training epochs are complete, the deep learning training task is finished. A minimal code sketch of these steps follows.
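The sketch below illustrates S1 to S4 under stated assumptions: read_from_storage and train_step are hypothetical stand-ins for back-end I/O and the enhancement plus training stages, and a plain dict replaces the doubly linked Cache_next described later, trading the constant-time head-of-list hit test for a hash lookup:

    import copy
    import random

    def run_training(dataset_size, cache_capacity, read_from_storage,
                     train_step, num_epochs):
        s_next = random.sample(range(dataset_size), dataset_size)  # before epoch 0
        cache = {}  # Cache_next contents prepared for the coming epoch

        for epoch in range(num_epochs):
            # Rotate sequences: the old S_next becomes S_cur (S3).
            s_cur, s_next = s_next, random.sample(range(dataset_size), dataset_size)
            prefix = set(s_next[:cache_capacity])  # S_next_prefix
            cur_list, nex_list = cache, {}         # nexList rolls over into curList

            for sample_id in s_cur:
                if sample_id in cur_list:                  # S31: hit in curList
                    data = cur_list.pop(sample_id)
                else:                                      # S32: miss, read back end
                    data = read_from_storage(sample_id)
                if sample_id in prefix:                    # pre-cache for next epoch;
                    nex_list[sample_id] = copy.copy(data)  # copy, since enhancement
                                                           # mutates data in place
                train_step(data)                           # enhancement + training

            cache = nex_list  # S33: S_cur is discarded, only S_next survives

    # For ImageNet-scale training this might be invoked as, e.g.:
    # run_training(dataset_size=1_281_167, cache_capacity=50_000,
    #              read_from_storage=..., train_step=..., num_epochs=90)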
Further, S2 comprises the following steps:
S21, before the first epoch of the deep learning training task starts, a random sequence is generated, denoted S_next;
S22, at the beginning of each epoch, S_next is assigned to S_cur, and S_cur determines the order in which the current epoch's data enter the neural network;
S23, another random sequence is generated with a new random seed and assigned to S_next, and S_next determines the order in which the next epoch's data enter the neural network, so that two distinct, mutually independent random sequences exist in the system simultaneously;
S24, when an epoch ends, S_cur is cleared and S_next is retained.
Further, the Cache_next of S1 is logically divided into curList, which caches data to be used in the current training epoch, and nexList, which caches data to be used in the next training epoch, maintained through the following steps (a code sketch follows the list):
S11, before the first training epoch begins, Cache_next is empty, so curList and nexList are also empty;
S12, during the first training epoch, curList remains empty, and all data inserted into Cache_next are located in nexList;
S13, when a non-first training epoch starts, all data of nexList are transferred into curList, and nexList becomes empty;
S14, during a non-first training epoch, data hit in curList are removed, so curList gradually shortens, while data newly inserted into Cache_next all enter nexList, which gradually lengthens;
S15, when a training epoch ends, the length of curList is zero and the length of nexList equals the length of Cache_next;
S16, the order of data in nexList is kept consistent with the order of their IDs in S_next_prefix.
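Under these invariants, the epoch-boundary handling of the two segments can be sketched as follows (an assumed structure, using Python deques in place of the linked list of S1):

    from collections import deque

    class CacheNextSegments:
        def __init__(self):
            # S11: before the first epoch, both segments are empty.
            self.cur_list = deque()  # data the current epoch will hit
            self.nex_list = deque()  # data cached for the next epoch,
                                     # kept in S_next_prefix order (S16)

        def start_epoch(self):
            # S13: at an epoch boundary, nexList becomes curList.
            # S15 guarantees curList has already drained to length zero.
            assert len(self.cur_list) == 0
            self.cur_list, self.nex_list = self.nex_list, deque()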
Further, in S1, Cache_next is organized as a linked list and logically divided into curList and nexList.
Further, in S1, the capacity of Cache_next is set by the developer according to the memory actually available in the system.
Further, in S2, the elements of the random sequences are the IDs of the training set data.
Further, in S2, the random sequences are generated by a random function, and the random seed required by the random function is initialized from the computer clock.
Further, in S2, the elements of each random sequence correspond one-to-one with the training set data, and the sequence length equals the total number of samples in the training set.
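Taken together, these clauses amount to the following generator, a sketch assuming Python's standard PRNG:

    import random
    import time

    def new_random_sequence(num_samples):
        # Seed the PRNG from the computer clock, then emit a permutation of all
        # sample IDs: one element per training sample, length == dataset size.
        rng = random.Random(time.time_ns())
        sequence = list(range(num_samples))
        rng.shuffle(sequence)
        return sequence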
Further, in S3, the length of S_next_prefix equals the number of nodes Cache_next can hold; since the capacity of Cache_next is predefined, the number of elements in S_next_prefix is set before the deep learning training task begins.
Further, in S3, after a training sample enters memory it passes through the data enhancement stage, whose operations modify the original data in memory in place; the enhanced sample then forms a batch together with other enhanced samples and enters the neural network model training stage.
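As an illustration of this stage, standard torchvision transforms can stand in for the crop, scale, flip and rotate operations named above; the sketch below assumes PIL images already loaded in memory, and the concrete transform list is an assumption rather than anything prescribed here:

    import torch
    from torchvision import transforms

    # A stand-in enhancement pipeline (assumed, not prescribed by this patent):
    # crop + scale, flip, and rotate, ending with conversion to a tensor.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ToTensor(),
    ])

    def make_batch(pil_images):
        # Each loaded image is enhanced in memory; the enhanced samples are
        # then stacked into a single batch for the model training stage.
        return torch.stack([augment(img) for img in pil_images])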
The invention has the advantages and beneficial effects that:
the invention additionally occupies a memory as a cache, and when each training period of the deep learning training task starts, the random sequence required by shuffling in the next period is calculated in advance, so that the data loading stage of each period has double random sequences. When the training set data is loaded into the memory according to the random sequence of the current period, the data to be used in the initial stage of the next period is cached in sequence by referring to another random sequence, so that the required data can be quickly read from the cache in the initial stage of the next period in the data loading stage without being read to a back-end storage system, the time overhead of the data loading stage is obviously reduced, and the I/O bottleneck of a deep learning training task is eliminated. The memory space occupied by the cache is configurable, the larger the configured memory is, the better the acceleration effect of the algorithm is, and in addition, the more the number of cycles executed by the deep learning training task is, the better the acceleration effect of the algorithm is. Finally, the situation that the data can only be used once after being loaded into the memory from the back-end storage system every time and needs to be loaded again in the next period is avoided, and the heavy I/O burden of the back-end storage system is relieved.
Drawings
FIG. 1 is a diagram of a working framework for accelerating deep learning training tasks using the method of the present invention.
FIG. 2 is a schematic diagram of the node design of Cache_next in the present invention.
FIG. 3 is a schematic diagram of the organization structure of Cache_next in the present invention.
FIG. 4 is a diagram of the double random sequence in the present invention.
FIG. 5 is a schematic diagram of the logical partitioning of the random sequence S_next in the present invention.
FIG. 6 is a flow chart of the method of the present invention.
FIG. 7 is a flow chart of the Cache_next update process in the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in FIG. 6, a method for accelerating data loading of a deep learning training task is provided, aiming to significantly reduce the time overhead of the data loading stage and eliminate the I/O bottleneck of deep learning training. The algorithm occupies an additional, configurable amount of memory; the more memory is configured, the better the acceleration, and the more epochs the training task runs, the better the acceleration. The algorithm computes the random sequence required for the next epoch in advance, so that the data loading stage of each epoch holds a double random sequence. While the training set is loaded in the order of the current epoch's sequence, the data to be used at the start of the next epoch are cached in order, guided by the other sequence. At the start of the next epoch, the data loading stage can then read the required data quickly from the cache rather than from the back-end storage system.
The method adopts double random sequences in the data loading stage: the two sequences indicate, respectively, the order in which the training set enters the neural network in the current epoch and in the next epoch, and only one of them needs to be regenerated per epoch. Correspondingly, the algorithm requests a separate memory region as Cache_next, whose purpose is to supply the data needed in the current epoch while caching, in time, the data to be used at the start of the next epoch.
Training a ResNet model with the ImageNet data set on the deep learning platform PyTorch using the method comprises the following steps:
1. As shown in FIG. 1, the device of the present invention is deployed as a component on the deep learning platform PyTorch.
1.1, when a ResNet model training task is initialized, the component requests a cache region for the task, denoted Cache_next; at this point Cache_next holds nothing. The capacity of Cache_next is preset by the user according to the memory available in the system; for example, if the system has 2 GB of free memory, the capacity of Cache_next cannot exceed 2 GB.
1.2, Cache_next uses a doubly linked list as its data structure, shown in FIG. 3. Cache_next maintains a pointer that divides the linked list into two parts: the part near the head of the list caches the pictures PyTorch will reference during epoch_m, denoted curList, and the part near the tail caches the pictures PyTorch will reference during epoch_(m+1), denoted nexList; the pointer points to the first node of nexList. The node design is shown in FIG. 2: each node has three members, pre, nex and data, where pre points to the node's predecessor in the list, nex points to its successor, and data points to the node's data field; here the data field of one node caches one picture of the ImageNet training set.
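The node and list layout of 1.2 might be expressed as follows; this is a sketch of FIG. 2 and FIG. 3 in Python rather than the actual implementation:

    class Node:
        __slots__ = ("pre", "nex", "data")

        def __init__(self, data):
            self.pre = None   # predecessor node in the linked list
            self.nex = None   # successor node in the linked list
            self.data = data  # data field: one picture of the training set

    class CacheNext:
        def __init__(self):
            self.head = None     # first node of curList (list head)
            self.tail = None     # last node of nexList (list tail)
            self.pointer = None  # first node of nexList; NULL while nexList is empty

        def append_tail(self, data):
            # New pictures always enter at the tail, joining (or starting)
            # nexList, which preserves S_next_prefix order (see step 3.2).
            node = Node(data)
            if self.tail is None:
                self.head = self.tail = node
            else:
                node.pre = self.tail
                self.tail.nex = node
                self.tail = node
            if self.pointer is None:
                self.pointer = node
            return node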
1.3, the ImageNet training set consists of a series of pictures, each referenced exactly once per epoch. PyTorch uses a shuffle function to generate a random sequence whose length equals the total number of pictures in the ImageNet training set; each element of the sequence corresponds to one picture, and PyTorch feeds pictures into the ResNet model in the order given by this random sequence.
2. The training process of the ResNet model is executed over N epochs.
2.1, before epoch_0 starts, S_cur is null. PyTorch uses the shuffle function to generate a random sequence, denoted S_next, which determines the order in which pictures of the ImageNet training set enter the ResNet model during epoch_0.
2.2, at the beginning of epoch_m (0 <= m < N-1), S_next is first assigned to S_cur, and S_cur determines the order in which pictures enter ResNet training during epoch_m; PyTorch then uses the shuffle function to generate a new random sequence and assigns it to S_next, which determines the order in which pictures enter ResNet training during epoch_(m+1). At this point PyTorch holds two mutually independent, distinct random sequences, as shown in FIG. 4.
2.3, once the two random sequences S_cur and S_next are ready, PyTorch, during epoch_m, loads pictures in order from Cache_next or the back-end storage system according to S_cur, and then updates Cache_next according to S_next. The process is shown in FIG. 6; the specific steps are as follows.
2.3.1, if S_cur[i] hits in Cache_next, PyTorch takes the corresponding picture from Cache_next into memory and deletes it from Cache_next; otherwise, PyTorch reads the picture from the back-end storage system into memory.
2.3.2, as shown in FIG. 5, S_next is logically divided into a front part and a rear part, denoted S_next_prefix and S_next_suffix, where the length of S_next_prefix is determined by the capacity of Cache_next; the pictures covered by S_next_prefix are inserted into Cache_next one by one during epoch_m.
If S_cur[i] appears in S_next_prefix, the picture corresponding to S_cur[i] is copied in memory and inserted into Cache_next, even if S_cur[i] was just removed from Cache_next in step 2.3.1 above; otherwise Cache_next is not updated.
2.4, now that the picture for S_cur[i] is ready in memory, it passes through the data enhancement stage, forms a batch together with other enhanced pictures, and enters ResNet model training.
2.5, when S_cur[i] is the last element of S_cur, data loading for epoch_m is entirely finished and S_cur is cleared.
2.6, when training on all pictures of the ImageNet training set has completed, epoch_m ends.
3. Cache_next supplies PyTorch with the data needed during epoch_m while caching the data needed at the start of epoch_(m+1), and the update of Cache_next executes concurrently with the deep learning training task. The update process of Cache_next, shown in FIG. 7, comprises the following steps.
3.1, when epoch_0 begins, Cache_next is empty, so curList and nexList are also empty; when epoch_m (0 <= m < N) begins, all nodes of nexList are transferred into curList, nexList becomes empty, and pointer points to NULL.
3.2, in step 2.3.2 above, inserting the picture corresponding to S_cur[i] into Cache_next proceeds as follows: if pointer is NULL, the picture is inserted at the tail of the Cache_next linked list and becomes the first node of nexList, and pointer is then set to this new node; otherwise the picture is inserted at the tail of Cache_next, lengthening nexList, while ensuring that the order of nexList's nodes stays consistent with the positions of their corresponding elements in S_next_prefix. At the end of an epoch, nexList is equivalent to Cache_next, and all pictures covered by S_next_prefix have been inserted into Cache_next.
3.3, in step 2.3.1 above, judging whether S_cur[i] hits in Cache_next only requires checking whether S_cur[i] is held by the first node of curList: if so, S_cur[i] hits in Cache_next; otherwise it misses. When S_cur[i] hits in Cache_next, it is removed from curList and curList shortens; at the end of an epoch, curList is empty.
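Continuing the CacheNext sketch from 1.2, the constant-time hit test of step 3.3 might look like the following; id_of is a hypothetical accessor mapping a node's data field to its sample ID:

    def lookup(cache, sample_id, id_of):
        # A hit can only ever be the head of curList: nexList was filled in
        # S_next_prefix order last epoch, and this epoch's S_cur is last
        # epoch's S_next, so curList is consumed strictly front to back.
        head = cache.head
        if head is not None and head is not cache.pointer \
                and id_of(head.data) == sample_id:
            # Hit: unlink the head node, shortening curList by one.
            cache.head = head.nex
            if cache.head is not None:
                cache.head.pre = None
            else:
                cache.tail = None
            return head.data
        return None  # miss: the caller reads from the back-end storage system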
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for accelerating deep learning training task data loading is characterized by comprising the following steps:
S1, dividing a region of memory to serve as a cache, denoted Cache_next;
S2, constructing a double-random-sequence scheme to determine the order in which training set data enter the neural network, wherein the elements of each random sequence correspond one-to-one with the training set data, and during each training epoch two distinct, mutually independent random sequences (old and new) exist simultaneously;
S3, before the first training epoch, generating a random sequence S_next; at the beginning of every training epoch, assigning the existing sequence S_next to S_cur, which determines the order in which the current epoch's data enter the neural network, and then generating a new random sequence, again denoted S_next, to determine the order for the next epoch, S_next containing a prefix subsequence S_next_prefix that covers the training data to be used at the start of the next epoch; as the data loading stage executes epoch by epoch, traversing S_cur in each epoch and, for each element S_cur[i], fetching the corresponding training sample from Cache_next or from the back-end storage system into memory, then updating Cache_next with reference to S_next_prefix, comprising:
S31, when S_cur[i] hits in the front segment curList of Cache_next, copying the corresponding training sample from curList into memory and deleting the entry for S_cur[i] from curList; if S_cur[i] also appears in S_next_prefix, inserting the sample into the rear segment nexList of Cache_next; Cache_next is logically divided into curList, which caches data to be used in the current training epoch, and nexList, which caches data to be used in the next training epoch;
S32, when S_cur[i] misses in the front segment curList of Cache_next, reading the corresponding training sample from the back-end storage system into memory; if S_cur[i] appears in S_next_prefix, inserting the sample into the rear segment nexList of Cache_next;
S33, when the traversal of S_cur finishes, clearing S_cur and leaving only the random sequence S_next;
S4, completing the current epoch; if the number of completed epochs is less than the preset number N, returning to S3 to start the next epoch; once all training epochs are complete, the deep learning training task ends.
2. The method for accelerating the loading of deep learning training task data according to claim 1, wherein said S2 comprises the following steps:
S21, before the first epoch of the deep learning training task starts, generating a random sequence denoted S_next;
S22, at the beginning of each epoch, assigning S_next to S_cur, where S_cur determines the order in which the current epoch's data enter the neural network;
S23, generating another random sequence with a new random seed and assigning it to S_next, where S_next determines the order in which the next epoch's data enter the neural network, so that two distinct, mutually independent random sequences exist in the system simultaneously;
S24, when an epoch ends, clearing S_cur and retaining S_next.
3. The method for accelerating the loading of deep learning training task data according to claim 1, wherein said S1 comprises the following steps:
S11, before the first training epoch begins, Cache_next is empty, so curList and nexList are also empty;
S12, during the first training epoch, curList remains empty, and all data inserted into Cache_next are located in nexList;
S13, when a non-first training epoch starts, all data of nexList are transferred into curList, and nexList becomes empty;
S14, during a non-first training epoch, data hit in curList are removed, so curList gradually shortens, while data newly inserted into Cache_next all enter nexList, which gradually lengthens;
S15, when a training epoch ends, the length of curList is zero and the length of nexList equals the length of Cache_next;
S16, the order of data in nexList is kept consistent with the order of their IDs in S_next_prefix.
4. The method for accelerating deep learning training task data loading according to claim 1, wherein in S1, Cache_next is organized as a linked list and logically divided into curList and nexList.
5. The method for accelerating deep learning training task data loading according to claim 1, wherein in S1, the capacity of Cache_next is determined according to the memory actually available in the system.
6. The method for accelerating deep learning training task data loading according to claim 1, wherein in S2, the elements of the random sequences are the IDs of the training set data.
7. The method for accelerating deep learning training task data loading according to claim 1, wherein in S2, the random sequences are generated by a random function, and the random seed required by the random function is initialized from the computer clock.
8. The method for accelerating deep learning training task data loading according to claim 1, wherein in S2, the elements of each random sequence correspond one-to-one with the training set data, and the sequence length equals the total number of samples in the training set.
9. The method for accelerating deep learning training task data loading according to claim 1, wherein in S3, the length of S_next_prefix equals the number of nodes Cache_next can hold; since the capacity of Cache_next is predefined, the number of elements in S_next_prefix is set before the deep learning training task begins.
10. The method for accelerating deep learning training task data loading according to claim 1, wherein in S3, after a training sample enters memory it passes through the data enhancement stage, whose operations modify the original data in memory in place; the enhanced sample forms a batch together with other enhanced samples and then enters the neural network model training stage.
CN202111221953.7A · Priority date: 2021-10-20 · Filing date: 2021-10-20 · Method for accelerating deep learning training task data loading · Active · CN113656333B

Priority Applications (1)

Application Number: CN202111221953.7A · Priority Date: 2021-10-20 · Filing Date: 2021-10-20 · Title: Method for accelerating deep learning training task data loading


Publications (2)

Publication Number · Publication Date
CN113656333A · 2021-11-16
CN113656333B · 2022-03-18

Family

ID=78494740

Family Applications (1)

Application Number: CN202111221953.7A · Title: Method for accelerating deep learning training task data loading · Priority Date: 2021-10-20 · Filing Date: 2021-10-20 · Status: Active

Country Status (1)

Country: CN · CN113656333B

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110892484A (en) * 2018-07-11 2020-03-17 因美纳有限公司 Deep learning-based framework for identifying sequence patterns causing sequence-specific errors (SSEs)
CN111917474B (en) * 2020-07-22 2022-07-29 北京理工大学 Implicit triple neural network and optical fiber nonlinear damage balancing method
CN111858072B (en) * 2020-08-06 2024-02-09 华中科技大学 Resource management method and system for large-scale distributed deep learning
DE202020107550U1 (en) * 2020-12-23 2021-03-11 Ever Health Bio Medical International Co., Ltd. Personalized nutritional interpretation system based on transfer learning

Also Published As

Publication Number · Publication Date
CN113656333A · 2021-11-16

Similar Documents

Publication Publication Date Title
KR102465896B1 (en) Modification of machine learning models to improve locality
CN104050092B (en) A kind of data buffering system and method
JP2004514147A (en) Streaming architecture for waveform processing
EP0221358A2 (en) Sort string generation in a staged storage system
CN105739951B (en) A kind of L1 minimization problem fast solution methods based on GPU
CN114968588A (en) Data caching method and device for multi-concurrent deep learning training task
Helman et al. Designing practical efficient algorithms for symmetric multiprocessors
US20160224581A1 (en) Recursive Multi-Threaded File System Scanner For Serializing File System Metadata Exoskeleton
Chen et al. moDNN: Memory optimal DNN training on GPUs
CN113656333B (en) Method for accelerating deep learning training task data loading
CN115712583B (en) Method, device and medium for improving distributed cache cross-node access performance
CN110795042A (en) Method for writing and flushing metadata of full flash memory storage system and related components
CN102413170A (en) Graphic data client buffer memory method based on FLEX
CN116107754A (en) Memory management method and system for deep neural network
CN116205273A (en) Multi-agent reinforcement learning method for optimizing experience storage and experience reuse
CN107273310A (en) A kind of read method of multi-medium data, device, medium and equipment
US20230394307A1 (en) Data caching method and apparatus for multiple concurrent deep learning training tasks
CN117215973A (en) Processing method of cache data, deep learning training method and system
CN112561038A (en) Batch data set construction method and device, electronic equipment and storage medium
KR20200092900A (en) Method for overcoming catastrophic forgetting by neuron-level plasticity control and computing system performing the same
Wen et al. A swap dominated tensor re-generation strategy for training deep learning models
CN114528111B (en) FPGA chip for data recall and data recall method
CN111191774A (en) Simplified convolutional neural network-oriented low-cost accelerator architecture and processing method thereof
US11455533B2 (en) Information processing apparatus, control method, and non-transitory computer-readable storage medium for storing information processing program
CN108228801B (en) Jump table multithreading optimization method and device based on multi-core processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant