CN103049421A - Method and device for data transmission between central processing unit (CPU) and co-processors - Google Patents

Method and device for data transmission between central processing unit (CPU) and co-processors

Info

Publication number
CN103049421A
CN103049421A · CN2012105322924A · CN201210532292A
Authority
CN
China
Prior art keywords
coprocessor
data
cpu
data slicer
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105322924A
Other languages
Chinese (zh)
Other versions
CN103049421B (en
Inventor
欧阳剑
王勇
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210532292.4A priority Critical patent/CN103049421B/en
Publication of CN103049421A publication Critical patent/CN103049421A/en
Application granted granted Critical
Publication of CN103049421B publication Critical patent/CN103049421B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Advance Control (AREA)
  • Multi Processors (AREA)

Abstract

The invention provides a method and a device for data transmission between a central processing unit (CPU) and co-processors. The method comprises controlling the data transmission of N co-processors in parallel according to N threads generated by the CPU, N being an integer no less than 2. Under this control, a co-processor either receives data sent by the CPU in slice form, or, while receiving and storing the data slice of the current moment sent by the CPU or by the preceding co-processor, sends the stored data slice of the previous moment to the next co-processor. The method and device make full use of the buses between the CPU and the co-processors and among the co-processors themselves, improving transmission efficiency both when the CPU sends data to multiple co-processors and when one co-processor sends data to the remaining co-processors.

Description

Method and device for data transmission between a CPU and coprocessors
[technical field]
The present invention relates to processor data transmission technology, and in particular to a method and device for data transmission between a CPU and coprocessors.
[background technology]
Coprocessors, typified by the GPU (graphics processing unit), now offer increasingly powerful computing capability. In many fields that require high-performance computing, calculation tasks are carried out by multiple coprocessors cooperating with a CPU. This process frequently requires data transmission between the CPU and the coprocessors and among the coprocessors themselves, and the efficiency of that transmission directly affects how efficiently the calculation task executes.
In existing methods, transmission efficiency is very low both when data is transferred from the CPU to multiple coprocessors and when data is broadcast from one coprocessor to the others:
When a block of data is transferred from the CPU to multiple coprocessors, the CPU typically transmits to the coprocessors one after another: only after the whole block has been transferred to one coprocessor does the CPU begin transferring it to the next. While the CPU is transmitting to one coprocessor, the buses of all the other coprocessors sit idle, so overall bus utilization is very low.
When one coprocessor must transmit a block of data to the remaining coprocessors, existing methods either first copy the data into CPU memory and then transfer it from CPU memory to each of the other coprocessors in turn, or use a transfer function provided by the coprocessor vendor to send the data to the remaining coprocessors one by one. As in the CPU-to-coprocessors case, both approaches leave the buses of the coprocessors not currently involved in the transfer idle, and overall bus utilization is very low.
These problems make data transmission between the CPU and coprocessors, and among coprocessors, very inefficient, which directly reduces the computing capability of the whole system. For example, in the training process of speech recognition, multiple GPUs cooperate with the CPU and each GPU must hold a copy of the training data; because the data-transfer overhead is so large, training with multiple GPUs can end up no faster than training with a single GPU.
[summary of the invention]
In view of this, the present invention provides a method and device for data transmission between a CPU and coprocessors, which improve transmission efficiency both when the CPU sends data to multiple coprocessors and when one coprocessor sends data to the remaining coprocessors.
The concrete technical scheme is as follows:
A method for data transmission between a CPU and coprocessors, the method comprising:
controlling the data transmission of N coprocessors in parallel according to N threads generated by the CPU, N being an integer greater than or equal to 2;
wherein the control comprises: a coprocessor receiving data sent by the CPU in data-slice form; or, while a coprocessor receives and stores the data slice of the current moment sent by the CPU or by the preceding coprocessor, sending the stored data slice of the previous moment to the next coprocessor.
According to a preferred embodiment of the present invention, when the method is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in slice form to one of the target coprocessors and, through the corresponding thread, controls that target coprocessor so that, while it receives and stores the data slice of the current moment sent by the CPU, it sends the stored data slice of the previous moment to the next target coprocessor.
According to a preferred embodiment of the present invention, when the method is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in slice form to all N target coprocessors and, through the corresponding threads, controls the N target coprocessors to receive and store the data slices sent by the CPU simultaneously.
According to a preferred embodiment of the present invention, when the method is used to transfer data from a source coprocessor to the other N-1 target coprocessors, the CPU, through the corresponding thread, controls the source coprocessor to send the data to the CPU in slice form; while the CPU receives and stores the data slice of the current moment sent by the source coprocessor, it sends the stored data slice of the previous moment to one of the target coprocessors and, through the corresponding thread, controls that target coprocessor so that, while it receives and stores the data slice of the current moment sent by the CPU, it sends the stored data slice of the previous moment to the next target coprocessor.
According to a preferred embodiment of the present invention, if said next target coprocessor is the last target coprocessor, the last target coprocessor is controlled through the corresponding thread to receive and store the arriving data slice; otherwise, said next target coprocessor is controlled through the corresponding thread so that, while receiving the data slice of the current moment sent by the preceding target coprocessor, it sends the stored data slice of the previous moment to the next target coprocessor, and so on until the last target coprocessor.
According to a preferred embodiment of the present invention, when the method is used to transfer data from a source coprocessor to the other N-1 target coprocessors, the CPU, through the corresponding thread, notifies the source coprocessor to send the data to the CPU in slice form; while the CPU receives and stores the data slice of the current moment sent by the source coprocessor, it sends the stored data slice of the previous moment to the N-1 target coprocessors and, through the corresponding threads, controls the N-1 target coprocessors to receive and store the data slices sent by the CPU simultaneously.
A device for data transmission between a CPU and coprocessors, the device being arranged in the CPU and comprising:
a thread control unit, configured to generate N threads;
a transmission control unit, configured to control the data transmission of N coprocessors in parallel according to the N threads, N being an integer greater than or equal to 2;
wherein the control comprises: a coprocessor receiving data sent by the CPU in data-slice form; or, while a coprocessor receives and stores the data slice of the current moment sent by the CPU or by the preceding coprocessor, sending the stored data slice of the previous moment to the next coprocessor.
According to a preferred embodiment of the present invention, when the device is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in slice form to one of the target coprocessors, and the transmission control unit, through the corresponding thread, controls that target coprocessor so that, while it receives and stores the data slice of the current moment sent by the CPU, it sends the stored data slice of the previous moment to the next target coprocessor.
According to a preferred embodiment of the present invention, when the device is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in slice form to all N target coprocessors, and the transmission control unit, through the corresponding threads, controls the N target coprocessors to receive and store the data slices sent by the CPU simultaneously.
According to a preferred embodiment of the present invention, when the device is used to transfer data from a source coprocessor to the other N-1 target coprocessors, the transmission control unit, through the corresponding thread, controls the source coprocessor to send the data to the CPU in slice form; while the CPU receives and stores the data slice of the current moment sent by the source coprocessor, it sends the stored data slice of the previous moment to one of the target coprocessors, and the transmission control unit, through the corresponding thread, controls that target coprocessor so that, while it receives and stores the data slice of the current moment sent by the CPU, it sends the stored data slice of the previous moment to the next target coprocessor.
According to a preferred embodiment of the present invention, if said next target coprocessor is the last target coprocessor, the transmission control unit controls the last target coprocessor through the corresponding thread to receive and store the arriving data slice; otherwise, the transmission control unit controls said next target coprocessor through the corresponding thread so that, while receiving the data slice of the current moment sent by the preceding target coprocessor, it sends the stored data slice of the previous moment to the next target coprocessor, and so on until the last target coprocessor.
According to a preferred embodiment of the present invention, when the device is used to transfer data from a source coprocessor to the other N-1 target coprocessors, the transmission control unit, through the corresponding thread, controls the source coprocessor to send the data to the CPU in slice form; while the CPU receives and stores the data slice of the current moment sent by the source coprocessor, it sends the stored data slice of the previous moment to the N-1 target coprocessors, and the transmission control unit, through the corresponding threads, controls the N-1 target coprocessors to receive and store the data slices sent by the CPU simultaneously.
As can be seen from the above technical scheme, the present invention generates multiple threads to control the coprocessors to transmit data in slice form, and each thread, in parallel, controls its corresponding coprocessor to perform the receive or send operation for the corresponding data slice. The invention thus makes full use of the buses between each coprocessor and the CPU, and of the buses among the coprocessors, significantly improving transmission efficiency both when the CPU sends data to multiple coprocessors and when one coprocessor sends data to the remaining coprocessors.
[description of drawings]
Fig. 1 is an illustration of method A, in which data is transferred from the CPU to multiple coprocessors, provided by embodiment one of the invention;
Fig. 2 is an illustration of method B, in which data is transferred from the CPU to multiple coprocessors, provided by embodiment one of the invention;
Fig. 3 is an illustration of method C, in which data is transferred from one coprocessor to multiple coprocessors, provided by embodiment one of the invention;
Fig. 4 is an illustration of method D, in which data is transferred from one coprocessor to multiple coprocessors, provided by embodiment one of the invention;
Fig. 5 is a schematic diagram of the data transmission device between a CPU and coprocessors provided by embodiment two of the invention.
[embodiment]
To make the purpose, technical scheme and advantages of the present invention clearer, the invention is described below with reference to the drawings and specific embodiments.
In existing methods, when data is transferred from the CPU to multiple coprocessors, or from one coprocessor to the remaining coprocessors, each transfer can take place only between the CPU and one coprocessor, or between two coprocessors, while the buses of all the other coprocessors sit idle. If a method allowed transmission to proceed simultaneously between the CPU and multiple coprocessors, or among multiple coprocessors, transmission efficiency would improve markedly. The present invention has the CPU generate multiple threads that control multiple coprocessors to transmit data in slice form, making full use of the bus bandwidth of each processor and thereby improving transmission efficiency.
Embodiment one
Embodiment one of the invention provides a method for data transmission between a CPU and coprocessors. The method covers: transferring data from the CPU to multiple coprocessors; and transferring data from one coprocessor to multiple coprocessors.
The method improves transmission efficiency both when the CPU sends data to multiple coprocessors and when one coprocessor sends data to the remaining coprocessors; the two cases are described in turn below.
1. Transferring data from the CPU to multiple coprocessors. This can be realized in two ways, method A and method B:
Method A: the CPU generates N threads that respectively control N coprocessors, where N is the number of coprocessors. The CPU sends the data slices one by one to coprocessor 1; thread 1 controls coprocessor 1 so that, while it receives and stores a data slice sent by the CPU, it forwards the slice it has already stored to coprocessor 2; thread 2 controls coprocessor 2 so that, while it receives and stores a data slice sent by coprocessor 1, it forwards the slice it has already stored to coprocessor 3; and so on until the data reaches all coprocessors.
To better understand how method A transfers data from the CPU to multiple coprocessors, method A is described below with reference to the example shown in Fig. 1. As shown in Fig. 1, the CPU needs to transmit a block of data to 4 coprocessors. The CPU generates 4 threads to control the 4 coprocessors; for ease of description the threads are numbered thread 1 through thread 4, controlling coprocessor 1 through coprocessor 4 respectively. When transmission begins, the CPU sends each slice of the data in turn from CPU memory to coprocessor 1. Thread 1 controls coprocessor 1 to receive each slice sent by the CPU and store it in coprocessor 1's memory while forwarding the slices it has already stored, one by one, to coprocessor 2. Thread 2 controls coprocessor 2 to receive the slices sent by coprocessor 1 and store them in coprocessor 2's memory while forwarding its stored slices to coprocessor 3; likewise, thread 3 controls coprocessor 3 to receive the slices sent by coprocessor 2 and store them in coprocessor 3's memory while forwarding its stored slices to coprocessor 4; and thread 4 controls coprocessor 4 to receive the slices sent by coprocessor 3 and store them in coprocessor 4's memory.
This method makes full use of the buses between the coprocessors: in each transfer step, while the CPU is sending a data slice to one coprocessor, data slices are also being transferred between the remaining coprocessors. For example, as shown in Fig. 1, at some moment during transmission the following transfers happen simultaneously: the CPU sends slice Slice_x to coprocessor 1; coprocessor 1 sends its stored slice Slice_x-1 to coprocessor 2; coprocessor 2 receives and stores Slice_x-1 from coprocessor 1 while sending its stored slice Slice_x-2 to coprocessor 3; coprocessor 3 receives and stores Slice_x-2 from coprocessor 2 while sending its stored slice Slice_x-3 to coprocessor 4; and coprocessor 4 receives and stores Slice_x-3 from coprocessor 3. Threads 1 through 4 each control the receive or send operation of their corresponding coprocessor and run in parallel, so all the coprocessors can transmit at the same time. Compared with the prior art, the method makes full use of the inter-coprocessor buses throughout the transmission and achieves higher transmission efficiency.
The above describes method A, the transfer of data from the CPU to multiple coprocessors, with reference to Fig. 1.
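As a rough illustration only (not part of the patent's disclosure), the pipelined relay of method A can be simulated in Python, with one thread per coprocessor and queues standing in for the buses; the function and variable names here are invented for the sketch:

```python
import queue
import threading

def pipeline_broadcast(data, slice_size, num_coprocs):
    """Simulate method A: the CPU feeds slices to coprocessor 1, and each
    coprocessor stores a received slice while relaying it to the next one."""
    # Cut the data into fixed-size slices (the patent suggests 4 KB pages).
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]

    links = [queue.Queue() for _ in range(num_coprocs)]  # bus into each coprocessor
    memories = [[] for _ in range(num_coprocs)]          # each coprocessor's memory

    def coproc(idx):
        # Thread idx controls coprocessor idx: receive a slice, store it,
        # and forward it down the chain; None marks end of data.
        while True:
            s = links[idx].get()
            if s is None:
                if idx + 1 < num_coprocs:
                    links[idx + 1].put(None)
                return
            memories[idx].append(s)
            if idx + 1 < num_coprocs:
                links[idx + 1].put(s)

    threads = [threading.Thread(target=coproc, args=(i,)) for i in range(num_coprocs)]
    for t in threads:
        t.start()
    for s in slices:              # the CPU sends slice after slice to coprocessor 1
        links[0].put(s)
    links[0].put(None)
    for t in threads:
        t.join()
    return [b"".join(m) for m in memories]
```

Because each queue decouples its producer and consumer, slice x can enter coprocessor 1 while slice x-1 moves from coprocessor 1 to coprocessor 2, matching the overlap the patent describes.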
Method B: the CPU generates N threads that respectively control N coprocessors, where N is the number of coprocessors. The CPU sends the data slices in turn to all N coprocessors at once, and each thread controls its corresponding coprocessor to receive and store the slices sent by the CPU.
To better understand how method B transfers data from the CPU to multiple coprocessors, method B is described below with reference to the example shown in Fig. 2. As shown in Fig. 2, the CPU needs to transmit a block of data to 4 coprocessors. The CPU generates 4 threads to control the 4 coprocessors; for ease of description the threads are numbered thread 1 through thread 4, controlling coprocessor 1 through coprocessor 4 respectively. When transmission begins, the CPU sends each slice of the data in turn from CPU memory to the 4 coprocessors simultaneously, and threads 1 through 4 each control their corresponding coprocessor to receive the slices sent by the CPU.
This method makes full use of the buses between the CPU and each coprocessor: in each transfer step the CPU sends one data slice to all coprocessors at the same time. For example, as shown in Fig. 2, at some moment during transmission the CPU sends slice Slice_x from CPU memory to the 4 coprocessors simultaneously, while threads 1 through 4 control their corresponding coprocessors to receive it. The 4 threads run in parallel, so all the coprocessors receive at the same time. Compared with the prior art, the method makes full use of the CPU-coprocessor buses throughout the transmission and achieves higher transmission efficiency.
The above describes method B, the transfer of data from the CPU to multiple coprocessors, with reference to Fig. 2.
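Method B's simultaneous broadcast can be sketched the same way, purely as an illustration with invented names; the CPU's main loop puts each slice on every coprocessor's bus in the same step:

```python
import queue
import threading

def parallel_broadcast(data, slice_size, num_coprocs):
    """Simulate method B: the CPU pushes every slice to all N coprocessors
    at once, and thread i only receives on behalf of coprocessor i."""
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]
    links = [queue.Queue() for _ in range(num_coprocs)]  # CPU-to-coprocessor buses
    memories = [[] for _ in range(num_coprocs)]

    def coproc(idx):
        # Thread idx controls coprocessor idx: receive and store each slice.
        while True:
            s = links[idx].get()
            if s is None:
                return
            memories[idx].append(s)

    threads = [threading.Thread(target=coproc, args=(i,)) for i in range(num_coprocs)]
    for t in threads:
        t.start()
    for s in slices:
        for q in links:           # one slice goes out on every bus in the same step
            q.put(s)
    for q in links:
        q.put(None)               # end-of-data marker
    for t in threads:
        t.join()
    return [b"".join(m) for m in memories]
```

The contrast with method A is visible in the main loop: here the CPU drives every bus directly, which is why the scheme depends on CPU memory bandwidth rather than on the inter-coprocessor links.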
Both method A and method B transfer data from the CPU to multiple coprocessors in slice form. A data slice in both methods is a data block of preset size: in each step, the thread-controlled transfer between the CPU and a coprocessor, or between two coprocessors, moves one such block. The slice size can be set according to actual requirements; if the slice is too large, transmission latency becomes excessive, and if it is too small, efficiency suffers. As a preferred implementation, the invention sets the data slice size to one page, i.e. 4 KB.
In methods A and B, the CPU learns the state of each coprocessor through its thread and has the thread control the coprocessor to complete the corresponding operation. A self-defined data structure in CPU memory can record each coprocessor's state (idle, receiving data, or sending data) and the transfer status of the data slices (which slices have been sent, which are still pending), so that the CPU can control and schedule the threads; for example, when two coprocessors are both idle, the CPU can control them through their threads to transfer a data slice. This part is prior art and is not elaborated here.
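The "self-defined data structure" mentioned above might, as one hypothetical sketch with illustrative field names, look like a small per-coprocessor record tracking the device state and the sent/pending slice sets:

```python
from dataclasses import dataclass, field

@dataclass
class CoprocState:
    """Hypothetical per-coprocessor bookkeeping record kept in CPU memory:
    the device's current state plus which slices have been sent and which
    are still pending. All names here are illustrative, not the patent's."""
    state: str = "idle"                      # "idle", "receiving" or "sending"
    sent: set = field(default_factory=set)   # slice ids already delivered
    pending: set = field(default_factory=set)  # slice ids still to deliver

    def mark_sent(self, slice_id):
        # Move a slice from pending to sent once its transfer completes.
        self.pending.discard(slice_id)
        self.sent.add(slice_id)

    def is_idle(self):
        return self.state == "idle"
```

A scheduler could then pair up two coprocessors for a slice transfer whenever `is_idle()` holds for both, as the example in the text suggests.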
As between methods A and B: because the bus bandwidth between a coprocessor and the CPU is usually higher than the bus bandwidth between coprocessors, method B achieves higher transfer efficiency than method A in practice. However, method B is limited by CPU memory bandwidth, so it suits situations where CPU memory bandwidth is high, such as multi-CPU systems; when CPU memory bandwidth is insufficient, method A is the better choice.
2. Transferring data from one coprocessor to multiple coprocessors. This can be realized in two ways, method C and method D:
Method C: the CPU generates N threads that respectively control N coprocessors, where N is the number of coprocessors. Thread 1 controls coprocessor 1 to send the data slices one by one to the CPU; the CPU forwards each received slice in turn to coprocessor 2; thread 2 controls coprocessor 2 so that, while it receives and stores a slice sent by the CPU, it forwards its stored slice to coprocessor 3; thread 3 controls coprocessor 3 so that, while it receives and stores a slice sent by coprocessor 2, it forwards its stored slice to coprocessor 4; and so on until the data reaches all target coprocessors.
To better understand how method C transfers data from one coprocessor to multiple coprocessors, method C is described below with reference to the example shown in Fig. 3. As shown in Fig. 3, coprocessor 1 needs to transmit a block of data to the other 3 coprocessors. The CPU generates 4 threads to control the 4 coprocessors; for ease of description the threads are numbered thread 1 through thread 4, controlling coprocessor 1 through coprocessor 4 respectively. When transmission begins, thread 1 controls coprocessor 1 to send the data, slice by slice, to the CPU; the CPU receives and stores each slice sent by coprocessor 1 while forwarding the slices it has already stored, one by one, to coprocessor 2. Thread 2 controls coprocessor 2 to receive the slices sent by the CPU and store them in coprocessor 2's memory while forwarding its stored slices to coprocessor 3; thread 3 controls coprocessor 3 to receive the slices sent by coprocessor 2 and store them in coprocessor 3's memory while forwarding its stored slices to coprocessor 4; and thread 4 controls coprocessor 4 to receive the slices sent by coprocessor 3 and store them in coprocessor 4's memory.
This method makes full use of the buses between the coprocessors: in each transfer step, while one coprocessor sends a data slice to the CPU and another receives a slice from the CPU, data slices are also being transferred between the remaining coprocessors. For example, as shown in Fig. 3, at some moment during transmission the following transfers happen simultaneously: coprocessor 1 sends slice Slice_x to the CPU; the CPU receives and stores Slice_x while sending its stored slice Slice_x-1 to coprocessor 2; coprocessor 2 receives and stores Slice_x-1 from the CPU while sending its stored slice Slice_x-2 to coprocessor 3; coprocessor 3 receives and stores Slice_x-2 from coprocessor 2 while sending its stored slice Slice_x-3 to coprocessor 4; and coprocessor 4 receives and stores Slice_x-3 from coprocessor 3. Threads 1 through 4 each control the receive and send operations of their corresponding coprocessor and run in parallel, so all the coprocessors can transmit at the same time. Compared with the prior art, the method makes full use of the inter-coprocessor buses throughout the transmission and achieves higher transmission efficiency.
The above describes method C, the transfer of data from one coprocessor to multiple coprocessors, with reference to Fig. 3.
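Method C can likewise be simulated, again purely as an illustration with invented names: the source coprocessor streams slices to the CPU, which stores and forwards them into the same kind of relay chain as method A:

```python
import queue
import threading

def relay_via_cpu(data, slice_size, num_targets):
    """Simulate method C: coprocessor 1 streams slices to the CPU; the CPU
    forwards each stored slice to target 1, which relays to target 2, etc."""
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]
    # links[0] is the source-to-CPU bus; links[i + 1] feeds target i.
    links = [queue.Queue() for _ in range(num_targets + 1)]
    memories = [[] for _ in range(num_targets)]

    def relay(idx):
        # idx 0 plays the CPU (store-and-forward only); idx >= 1 is a target
        # coprocessor that also keeps a copy of every slice in its memory.
        while True:
            s = links[idx].get()
            if s is None:
                if idx < num_targets:
                    links[idx + 1].put(None)
                return
            if idx >= 1:
                memories[idx - 1].append(s)
            if idx < num_targets:
                links[idx + 1].put(s)

    threads = [threading.Thread(target=relay, args=(i,)) for i in range(num_targets + 1)]
    for t in threads:
        t.start()
    for s in slices:              # the source coprocessor sends slice after slice
        links[0].put(s)
    links[0].put(None)
    for t in threads:
        t.join()
    return [b"".join(m) for m in memories]
```

The CPU node here behaves exactly like a chain link that stores nothing permanently, which is how the patent distinguishes its role from the target coprocessors.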
Method D: the CPU generates N threads that respectively control N coprocessors, where N is the number of coprocessors. Thread 1 controls coprocessor 1 to send the data slices one by one to the CPU; the CPU forwards each received slice in turn to all the remaining coprocessors, and threads 2 through N each control their corresponding coprocessor to receive the slices sent by the CPU.
To better understand how method D transfers data from one coprocessor to multiple coprocessors, method D is described below with reference to the example shown in Fig. 4. As shown in Fig. 4, coprocessor 1 needs to transmit a block of data to the other 3 coprocessors. The CPU generates 4 threads to control the 4 coprocessors; for ease of description the threads are numbered thread 1 through thread 4, controlling coprocessor 1 through coprocessor 4 respectively. When transmission begins, thread 1 controls coprocessor 1 to send the data, slice by slice, to the CPU; the CPU receives and stores each slice sent by coprocessor 1 while forwarding the slices it has already stored, one by one, to the other 3 coprocessors simultaneously, and threads 2 through 4 each control their corresponding coprocessor to receive and store the slices sent by the CPU.
This method makes full use of the buses between the CPU and each coprocessor: in each transfer step, while one coprocessor sends a data slice to the CPU, the CPU simultaneously sends a previously received slice to the remaining coprocessors. For example, at some moment during transmission, coprocessor 1 sends slice Slice_x to the CPU while the CPU sends the previously received slice Slice_x-1 to the remaining coprocessors, and threads 2 through 4 control their corresponding coprocessors to receive Slice_x-1. Threads 1 through 4 run in parallel, so all the coprocessors can transmit at the same time. Compared with the prior art, the method makes full use of the CPU-coprocessor buses throughout the transmission and achieves higher transmission efficiency.
The above describes method D, the transfer of data from one coprocessor to multiple coprocessors, with reference to Fig. 4.
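Method D's gather-then-broadcast pattern can be sketched the same way, as an illustration only with invented names: one queue carries slices from the source to the CPU, and a CPU thread fans each slice out to every target bus:

```python
import queue
import threading

def gather_then_broadcast(data, slice_size, num_targets):
    """Simulate method D: the source coprocessor streams slices to the CPU,
    and while slice x arrives the CPU forwards the stored slice x-1 to all
    remaining coprocessors in parallel."""
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]
    to_cpu = queue.Queue()                                 # source-to-CPU bus
    to_targets = [queue.Queue() for _ in range(num_targets)]  # CPU-to-target buses
    memories = [[] for _ in range(num_targets)]

    def cpu():
        # Receive from the source and fan each slice out to every target bus;
        # the None end-of-data marker is broadcast too.
        while True:
            s = to_cpu.get()
            for q in to_targets:
                q.put(s)
            if s is None:
                return

    def target(idx):
        while True:
            s = to_targets[idx].get()
            if s is None:
                return
            memories[idx].append(s)

    threads = [threading.Thread(target=cpu)]
    threads += [threading.Thread(target=target, args=(i,)) for i in range(num_targets)]
    for t in threads:
        t.start()
    for s in slices:              # thread 1: the source sends slice after slice
        to_cpu.put(s)
    to_cpu.put(None)
    for t in threads:
        t.join()
    return [b"".join(m) for m in memories]
```

As with method B, the fan-out loop in the CPU thread is the bandwidth-critical step, which is why the text below recommends method D only when CPU memory bandwidth is ample.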
Said method C and method D are data are transferred to a plurality of coprocessors by a coprocessor transmission method, two kinds of methods are all transmitted data in the mode of section, data slicer in two kinds of methods is the data block that presets size, be in the transmission course, the data volume of transmitting between the each coprocessor of Thread control and CPU or the coprocessor is a certain size data block, the size of section can be set according to the actual requirements, if but it is excessive to cut into slices, then transmission delay is excessive, too small if cut into slices, then efficient is lower, the invention provides a kind of preferred implementation data are cut into slices: the data slicer size is set as page, i.e. a 4KB.
In methods C and D, the CPU learns the state of each coprocessor through its thread and controls the coprocessors through the threads to complete the corresponding operations. A custom data structure in CPU memory can record the state of each coprocessor (e.g. idle, receiving data, or sending data) and the progress of the slice transfer (which data slices have been sent and which are still pending), so that the CPU can control and schedule the threads. For example, when two coprocessors are both idle, the CPU can control them through their threads to transfer a data slice. This part is prior art and is not elaborated here.
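The text leaves this custom data structure unspecified beyond its contents. One minimal sketch, with illustrative field and method names, might be:

```python
# Hypothetical per-coprocessor record kept in CPU memory: the current
# state (idle / receiving / sending) and which slice indices have
# already been delivered to that coprocessor.
from dataclasses import dataclass, field

@dataclass
class CoprocState:
    state: str = "idle"            # "idle" | "receiving" | "sending"
    sent_slices: set[int] = field(default_factory=set)

    def next_pending(self, total_slices: int):
        """Index of the first slice not yet sent, or None when done."""
        for i in range(total_slices):
            if i not in self.sent_slices:
                return i
        return None
```

A scheduler could poll `next_pending` for each idle coprocessor to decide which slice its thread should transfer next.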
For methods C and D: because the bus bandwidth between a coprocessor and the CPU is usually higher than the bus bandwidth between two coprocessors, method D achieves higher transfer efficiency than method C in practice. However, method D is limited by CPU memory bandwidth, so it suits systems with high CPU memory bandwidth, such as multi-CPU systems; when CPU memory bandwidth is insufficient, method C is the better choice.
The above describes the data transmission methods between a CPU and coprocessors provided by embodiment one of the present invention. As can be seen, the present invention generates multiple threads to control the coprocessors to transmit data in slices, and each thread can, in parallel, control its corresponding coprocessor to receive or send data slices. This makes full use of the bus between each coprocessor and the CPU, as well as the buses between the coprocessors, and can significantly improve data transmission efficiency both when the CPU sends data to a plurality of coprocessors and when one coprocessor sends data to the remaining coprocessors. The present invention is applicable to data transmission between a CPU and GPUs, as well as various GPU-like coprocessors such as ARM FPGAs and the Intel MIC (many-core processor).
Embodiment two
Fig. 5 is a schematic diagram of the data transmission device between a CPU and coprocessors provided by embodiment two of the present invention. The device is arranged in the CPU and, as shown in Fig. 5, comprises: a thread control unit 10 and a transmission control unit 20.
The thread control unit 10 is used to generate N threads;
The transmission control unit 20 is used to control, in parallel, the data transmission of N coprocessors according to the N threads, where N is an integer greater than or equal to 2.
Specifically, the transmission control unit 20 can be used to control a coprocessor to receive, in slice form, data sent by the CPU; or to control a coprocessor, while it receives and stores the data slice of the current moment sent by the CPU or by the preceding coprocessor, to send the data slice stored at the previous moment to the next coprocessor.
The transmission control unit 20 can also be used to control a coprocessor to send data to the CPU in data slice form.
The device provided by the present invention can improve data transmission efficiency both when the CPU sends data to a plurality of coprocessors and when one coprocessor sends data to the remaining coprocessors. The two cases are described below.
1. When data is transferred from the CPU to a plurality of coprocessors, the transmission control unit 20 can perform the following operation A or operation B:
Operation A: the N coprocessors are controlled respectively according to the N threads generated by the thread control unit 10, where N is the number of coprocessors. Data slices are sent in turn from the CPU to coprocessor 1 (the transfer out of the CPU itself can be completed by an existing data transmission unit of the CPU, here and in the following description; this data transmission unit is not shown in the figures). The transmission control unit 20, according to thread 1, controls coprocessor 1 to receive and store the data slices sent by the CPU while simultaneously sending the stored data slices in turn to coprocessor 2; according to thread 2, it controls coprocessor 2 to receive and store the data slices sent by coprocessor 1 while simultaneously sending the stored data slices in turn to coprocessor 3; and so on, until the data reaches all coprocessors.
For example, suppose the CPU needs to transmit a block of data to 4 coprocessors. The thread control unit 10 generates 4 threads by which the CPU controls the 4 coprocessors; for ease of description, the 4 threads are numbered thread 1 to thread 4 and control coprocessor 1 to coprocessor 4 respectively. When the transmission starts, the CPU sends each slice of the data in turn from CPU memory to coprocessor 1. The transmission control unit 20, according to thread 1, controls coprocessor 1 to receive the data slices sent by the CPU and store them in the memory of coprocessor 1 while sending the stored data slices in turn to coprocessor 2; according to thread 2, it controls coprocessor 2 to receive the data slices sent by coprocessor 1 and store them in the memory of coprocessor 2 while sending the stored data slices in turn to coprocessor 3; by analogy, according to thread 3, it controls coprocessor 3 to receive the data slices sent by coprocessor 2 and store them in the memory of coprocessor 3 while sending the stored data slices in turn to coprocessor 4; and according to thread 4, it controls coprocessor 4 to receive the data slices sent by coprocessor 3 and store them in the memory of coprocessor 4.
This operation makes full use of the buses between the coprocessors: in each transfer step, while the CPU sends a data slice to one coprocessor, data slices can also be transferred between the remaining coprocessors.
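Operation A can be sketched as a daisy-chain simulation under stated assumptions: queues stand in for the inter-coprocessor buses, each thread models one coprocessor that stores an arriving slice and forwards it down the chain. The function name `daisy_chain` is illustrative.

```python
# Illustrative model of operation A: the CPU feeds slices only to
# coprocessor 1; each coprocessor i stores every slice it receives and
# simultaneously forwards it to coprocessor i+1 (a pipelined chain).
import queue
import threading

def daisy_chain(slices: list[bytes], n: int) -> list[bytes]:
    links = [queue.Queue() for _ in range(n)]   # links[i] feeds coprocessor i+1
    stores = [bytearray() for _ in range(n)]    # each coprocessor's memory

    def coproc(i):                              # thread i controls coprocessor i+1
        while (s := links[i].get()) is not None:
            stores[i].extend(s)                 # save to local memory...
            if i + 1 < n:
                links[i + 1].put(s)             # ...and relay down the chain
        if i + 1 < n:
            links[i + 1].put(None)              # propagate end-of-stream marker

    threads = [threading.Thread(target=coproc, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for s in slices:                            # CPU sends only to coprocessor 1
        links[0].put(s)
    links[0].put(None)
    for t in threads:
        t.join()
    return [bytes(b) for b in stores]
```

Because each slice moves one hop per step, the last coprocessor finishes roughly N-1 slice-times after the first, while the CPU-side bus carries each slice only once.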
Operation B: the N coprocessors are controlled respectively according to the N threads generated by the thread control unit 10, where N is the number of coprocessors. Data slices are sent in turn from the CPU to all N coprocessors, and the transmission control unit 20 controls each of the N coprocessors, according to its corresponding thread, to receive and store the data slices sent by the CPU.
For example, suppose the CPU needs to transmit a block of data to 4 coprocessors. The thread control unit 10 generates 4 threads by which the CPU controls the 4 coprocessors; for ease of description, the 4 threads are numbered thread 1 to thread 4 and control coprocessor 1 to coprocessor 4 respectively. When the transmission starts, the CPU sends each slice of the data in turn from CPU memory to the 4 coprocessors simultaneously, and the transmission control unit 20, according to thread 1 to thread 4, controls each corresponding coprocessor to receive the data slices sent by the CPU.
This operation makes full use of the bus between each coprocessor and the CPU: in each transfer step, the CPU can send a data slice to all coprocessors simultaneously.
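Operation B can likewise be sketched as a broadcast simulation; queues again stand in for the per-coprocessor buses, and the name `broadcast` is an assumption of the sketch.

```python
# Illustrative model of operation B: the CPU pushes each slice onto
# every coprocessor's bus at once, and the N per-coprocessor threads
# drain their own queues in parallel.
import queue
import threading

def broadcast(slices: list[bytes], n: int) -> list[bytes]:
    buses = [queue.Queue() for _ in range(n)]   # one CPU<->coprocessor bus each
    stores = [bytearray() for _ in range(n)]

    def receiver(i):                            # thread i controls coprocessor i+1
        while (s := buses[i].get()) is not None:
            stores[i].extend(s)

    threads = [threading.Thread(target=receiver, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for s in slices:                            # one send step reaches all buses
        for q in buses:
            q.put(s)
    for q in buses:
        q.put(None)
    for t in threads:
        t.join()
    return [bytes(b) for b in stores]
```

Note that every slice is read from CPU memory N times here, which is why the text says this operation is bounded by CPU memory bandwidth.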
The data slices in operations A and B are data blocks of a preset size; that is, in each transfer step, the amount of data that a thread moves between the CPU and a coprocessor, or between two coprocessors, is a block of that size. The slice size can be set as required: if the slice is too large, the transmission delay becomes excessive; if it is too small, efficiency drops. The present invention provides a preferred implementation in which the data slice size is set to one memory page, i.e. 4 KB.
In operations A and B, the CPU learns the state of each coprocessor through its thread and controls the coprocessors through the threads to complete the corresponding operations. A custom data structure in CPU memory can record the state of each coprocessor (e.g. idle, receiving data, or sending data) and the progress of the slice transfer (which data slices have been sent and which are still pending), so that the CPU can control and schedule the threads. For example, when two coprocessors are both idle, the CPU can control them through their threads to transfer a data slice. This part is prior art and is not elaborated here.
Because the bus bandwidth between a coprocessor and the CPU is usually higher than the bus bandwidth between two coprocessors, operation B achieves higher transfer efficiency than operation A in practice. However, operation B is limited by CPU memory bandwidth, so it suits systems with high CPU memory bandwidth, such as multi-CPU systems; when CPU memory bandwidth is insufficient, operation A is the better choice.
2. When data is transferred from one coprocessor to a plurality of coprocessors, the transmission control unit 20 can perform the following operation C or operation D:
Operation C: the N coprocessors are controlled respectively according to the N threads generated by the thread control unit 10, where N is the number of coprocessors. The transmission control unit 20, according to thread 1, controls coprocessor 1 to send data slices in turn to the CPU; the CPU sends the received data slices in turn to coprocessor 2. The transmission control unit 20, according to thread 2, controls coprocessor 2 to receive and store the data slices sent by the CPU while simultaneously sending the stored data slices in turn to coprocessor 3; according to thread 3, it controls coprocessor 3 to receive and store the data slices sent by coprocessor 2 while simultaneously sending the stored data slices in turn to coprocessor 4; and so on, until the data reaches all target coprocessors.
For example, suppose coprocessor 1 needs to transmit a block of data to the other 3 coprocessors. The thread control unit 10 generates 4 threads by which the CPU controls the 4 coprocessors; for ease of description, the 4 threads are numbered thread 1 to thread 4 and control coprocessor 1 to coprocessor 4 respectively. When the transmission starts, the transmission control unit 20, according to thread 1, controls coprocessor 1 to send the data to the CPU slice by slice; the CPU receives and stores each data slice sent by coprocessor 1 and, at the same time, sends the stored data slices in turn to coprocessor 2. The transmission control unit 20, according to thread 2, controls coprocessor 2 to receive the data slices sent by the CPU and store them in the memory of coprocessor 2 while sending the stored data slices in turn to coprocessor 3; according to thread 3, it controls coprocessor 3 to receive the data slices sent by coprocessor 2 and store them in the memory of coprocessor 3 while sending the stored data slices in turn to coprocessor 4; and according to thread 4, it controls coprocessor 4 to receive the data slices sent by coprocessor 3 and store them in the memory of coprocessor 4.
This operation makes full use of the buses between the coprocessors: in each transfer step, while one coprocessor sends a data slice to the CPU and another coprocessor receives a data slice from the CPU, data slices can also be transferred between the remaining coprocessors.
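Operation C combines the two previous patterns, and can be sketched as follows under the same assumptions (queues as buses, illustrative names): the source coprocessor streams slices to the CPU, the CPU forwards each slice to the first target, and the targets relay down the chain.

```python
# Illustrative model of operation C: coprocessor 1 -> CPU -> coprocessor 2,
# then a daisy chain through coprocessors 3..N, all stages pipelined.
import queue
import threading

def relay_via_cpu(slices: list[bytes], n: int) -> list[bytes]:
    # hops[0]: coprocessor 1 -> CPU; hops[i>=1]: previous stage -> coprocessor i+1
    hops = [queue.Queue() for _ in range(n)]
    stores = [bytearray() for _ in range(n - 1)]  # memories of targets 2..N

    def source():                                 # thread 1: coprocessor 1 sends
        for s in slices:
            hops[0].put(s)
        hops[0].put(None)

    def cpu():                                    # CPU stores, then forwards to cp 2
        while (s := hops[0].get()) is not None:
            hops[1].put(s)
        hops[1].put(None)

    def target(i):                                # threads 2..N: store and relay
        while (s := hops[i].get()) is not None:
            stores[i - 1].extend(s)
            if i + 1 < n:
                hops[i + 1].put(s)
        if i + 1 < n:
            hops[i + 1].put(None)

    threads = [threading.Thread(target=source), threading.Thread(target=cpu)]
    threads += [threading.Thread(target=target, args=(i,)) for i in range(1, n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [bytes(b) for b in stores]
```

With `relay_via_cpu([...], 4)` all three targets end up with the full data, and each bus (source-to-CPU, CPU-to-target, target-to-target) carries every slice exactly once.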
Operation D: the N coprocessors are controlled respectively according to the N threads generated by the thread control unit 10, where N is the number of coprocessors. The transmission control unit 20, according to thread 1, controls coprocessor 1 to send data slices in turn to the CPU; the CPU sends the received data slices in turn to the remaining coprocessors, and the transmission control unit 20, according to thread 2 to thread N, controls each corresponding coprocessor to receive the data slices sent by the CPU.
For example, suppose coprocessor 1 needs to transmit a block of data to the other 3 coprocessors. The thread control unit 10 generates 4 threads by which the CPU controls the 4 coprocessors; for ease of description, the 4 threads are numbered thread 1 to thread 4 and control coprocessor 1 to coprocessor 4 respectively. When the transmission starts, the transmission control unit 20, according to thread 1, controls coprocessor 1 to send the data to the CPU slice by slice; the CPU receives and stores each data slice sent by coprocessor 1 and, at the same time, sends the stored data slices in turn to the other 3 coprocessors, while the transmission control unit 20, according to thread 2 to thread 4, controls each corresponding coprocessor to receive and store the data slices sent by the CPU.
This operation makes full use of the bus between each coprocessor and the CPU: in each transfer step, one coprocessor sends a data slice to the CPU while the CPU simultaneously sends a previously received data slice to the remaining coprocessors.
The data slices in operations C and D are data blocks of a preset size; that is, in each transfer step, the amount of data that a thread moves between a coprocessor and the CPU, or between two coprocessors, is a block of that size. The slice size can be set as required: if the slice is too large, the transmission delay becomes excessive; if it is too small, efficiency drops. The present invention provides a preferred implementation in which the data slice size is set to one memory page, i.e. 4 KB.
In operations C and D, the CPU learns the state of each coprocessor through its thread and controls the coprocessors through the threads to complete the corresponding operations. A custom data structure in CPU memory can record the state of each coprocessor (e.g. idle, receiving data, or sending data) and the progress of the slice transfer (which data slices have been sent and which are still pending), so that the CPU can control and schedule the threads. For example, when two coprocessors are both idle, the CPU can control them through their threads to transfer a data slice. This part is prior art and is not elaborated here.
In practice, operation D achieves higher transfer efficiency than operation C. However, operation D is limited by CPU memory bandwidth, so it suits systems with high CPU memory bandwidth, such as multi-CPU systems; when CPU memory bandwidth is insufficient, operation C is the better choice.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (12)

1. A data transmission method between a CPU and coprocessors, characterized in that the method comprises:
controlling, in parallel, the data transmission of N coprocessors according to N threads generated by the CPU, where N is an integer greater than or equal to 2;
wherein said controlling comprises: a coprocessor receiving, in data slice form, data sent by the CPU; or, a coprocessor, while receiving and storing the data slice of the current moment sent by the CPU or by the preceding coprocessor, sending the stored data slice of the previous moment to the next coprocessor.
2. The method according to claim 1, characterized in that, when the method is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in data slice form to one of the target coprocessors, and this target coprocessor, controlled by its corresponding thread, while receiving and storing the data slice of the current moment sent by the CPU, sends the stored data slice of the previous moment to the next target coprocessor.
3. The method according to claim 1, characterized in that, when the method is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in data slice form to the N target coprocessors, and the N target coprocessors, controlled by their corresponding threads, simultaneously receive and store the data slices sent by the CPU.
4. The method according to claim 1, characterized in that, when the method is used to transfer data from a source coprocessor to N-1 other target coprocessors, the CPU controls the source coprocessor through its corresponding thread to send the data in data slice form to the CPU; the CPU, while receiving and storing the data slice of the current moment sent by the source coprocessor, sends the stored data slice of the previous moment to one of the target coprocessors; and this target coprocessor, controlled by its corresponding thread, while receiving and storing the data slice of the current moment sent by the CPU, sends the stored data slice of the previous moment to the next target coprocessor.
5. The method according to claim 2 or 4, characterized in that, if said next target coprocessor is the last target coprocessor, the last target coprocessor, controlled by its corresponding thread, receives and stores the arriving data slices; otherwise, said next target coprocessor, controlled by its corresponding thread, while receiving the data slice of the current moment sent by the preceding target coprocessor, sends the stored data slice of the previous moment to the next target coprocessor, and so on until the last target coprocessor.
6. The method according to claim 1, characterized in that, when the method is used to transfer data from a source coprocessor to N-1 other target coprocessors, the CPU notifies the source coprocessor through its corresponding thread to send the data in data slice form from the source coprocessor to the CPU; the CPU, while receiving and storing the data slice of the current moment sent by the source coprocessor, sends the stored data slice of the previous moment to the N-1 target coprocessors; and the N-1 target coprocessors, controlled by their corresponding threads, simultaneously receive and store the data slices sent by the CPU.
7. A data transmission device between a CPU and coprocessors, the device being arranged in the CPU, characterized in that the device comprises:
a thread control unit, used to generate N threads; and
a transmission control unit, used to control, in parallel, the data transmission of N coprocessors according to the N threads, where N is an integer greater than or equal to 2;
wherein said control comprises: a coprocessor receiving, in data slice form, data sent by the CPU; or, a coprocessor, while receiving and storing the data slice of the current moment sent by the CPU or by the preceding coprocessor, sending the stored data slice of the previous moment to the next coprocessor.
8. The device according to claim 7, characterized in that, when the device is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in data slice form to one of the target coprocessors, and the transmission control unit controls this target coprocessor, according to its corresponding thread, while it receives and stores the data slice of the current moment sent by the CPU, to send the stored data slice of the previous moment to the next target coprocessor.
9. The device according to claim 7, characterized in that, when the device is used to transfer data from the CPU to N target coprocessors, the CPU sends the data in data slice form to the N target coprocessors, and the transmission control unit controls the N target coprocessors, according to their corresponding threads, to simultaneously receive and store the data slices sent by the CPU.
10. The device according to claim 7, characterized in that, when the device is used to transfer data from a source coprocessor to N-1 other target coprocessors, the transmission control unit controls the source coprocessor, according to its corresponding thread, to send the data in data slice form to the CPU; the CPU, while receiving and storing the data slice of the current moment sent by the source coprocessor, sends the stored data slice of the previous moment to one of the target coprocessors; and the transmission control unit controls this target coprocessor, according to its corresponding thread, while it receives and stores the data slice of the current moment sent by the CPU, to send the stored data slice of the previous moment to the next target coprocessor.
11. The device according to claim 8 or 10, characterized in that, if said next target coprocessor is the last target coprocessor, the transmission control unit controls the last target coprocessor, according to its corresponding thread, to receive and store the arriving data slices; otherwise, the transmission control unit controls said next target coprocessor, according to its corresponding thread, while it receives the data slice of the current moment sent by the preceding target coprocessor, to send the stored data slice of the previous moment to the next target coprocessor, and so on until the last target coprocessor.
12. The device according to claim 7, characterized in that, when the device is used to transfer data from a source coprocessor to N-1 other target coprocessors, the transmission control unit controls the source coprocessor, according to its corresponding thread, to send the data in data slice form to the CPU; the CPU, while receiving and storing the data slice of the current moment sent by the source coprocessor, sends the stored data slice of the previous moment to the N-1 target coprocessors; and the transmission control unit controls the N-1 target coprocessors, according to their corresponding threads, to simultaneously receive and store the data slices sent by the CPU.
CN201210532292.4A 2012-12-11 2012-12-11 Data transmission method and device between a kind of CPU and coprocessor Active CN103049421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210532292.4A CN103049421B (en) 2012-12-11 2012-12-11 Data transmission method and device between a kind of CPU and coprocessor


Publications (2)

Publication Number Publication Date
CN103049421A true CN103049421A (en) 2013-04-17
CN103049421B CN103049421B (en) 2019-08-27

Family

ID=48062066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210532292.4A Active CN103049421B (en) 2012-12-11 2012-12-11 Data transmission method and device between a kind of CPU and coprocessor

Country Status (1)

Country Link
CN (1) CN103049421B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104536936A (en) * 2015-01-28 2015-04-22 浪潮电子信息产业股份有限公司 Draw-bar box type programmable calculator device
WO2015192812A1 (en) * 2014-06-20 2015-12-23 Tencent Technology (Shenzhen) Company Limited Data parallel processing method and apparatus based on multiple graphic procesing units
CN106776455A (en) * 2016-12-13 2017-05-31 郑州云海信息技术有限公司 A kind of method and device of many GPU communications of unit
WO2017206591A1 (en) * 2016-06-01 2017-12-07 华为技术有限公司 Data processing system and data processing method
CN107846709A (en) * 2017-09-29 2018-03-27 深圳市亿兆互联技术有限公司 A kind of radio communication device and wireless communications method based on LoRa
CN110908805A (en) * 2019-11-29 2020-03-24 深圳前海达闼云端智能科技有限公司 Information distribution method, robot and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20020084963A (en) * 2001-05-03 2002-11-16 엘지전자 주식회사 Interrupt processing apparatus
US20080117220A1 (en) * 2006-09-25 2008-05-22 Neurala Llc Graphic Processor Based Accelerator System and Method
US20080120489A1 (en) * 2006-11-16 2008-05-22 Shinri Inamori Scalable Multi-Threaded Sequencing/Synchronizing Processor Architecture
US20080183912A1 (en) * 2007-01-31 2008-07-31 Monroe George T Coprocessor command synchronization using a dma channel
CN101526934A (en) * 2009-04-21 2009-09-09 浪潮电子信息产业股份有限公司 Construction method of GPU and CPU combined processor
US7602395B1 (en) * 2005-04-22 2009-10-13 Nvidia Corporation Programming multiple chips from a command buffer for stereo image generation
CN102007479A (en) * 2008-03-31 2011-04-06 先进微装置公司 Peer-to-peer special purpose processor architecture and method
CN102036043A (en) * 2010-12-15 2011-04-27 成都市华为赛门铁克科技有限公司 Video data processing method and device as well as video monitoring system
CN102135949A (en) * 2011-03-01 2011-07-27 浪潮(北京)电子信息产业有限公司 Computing network system, method and device based on graphic processing unit
CN102184397A (en) * 2011-04-25 2011-09-14 中国测绘科学研究院 Fast remote sensing image normal incidence correction method
CN102541804A (en) * 2011-12-26 2012-07-04 中国人民解放军信息工程大学 Multi-GPU (graphic processing unit) interconnection system structure in heterogeneous system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BEYOND071: "《http://bbs.csdn.net/topics/370097506》", 1 August 2011 *
LIU SHA等: "《SOLVERS FOR SYSTEMS OF LARGE SPARSE LINEAR AND NONLINEAR EQUATIONS BASED ON MULTI-GPUS》", 《TRANSACTIONS OF NANJING UNIVERSITY OF AERONAUTICS & ASTRONAUTICS》 *
刘伟峰等: "《基于多GPU的三维Kirchhoff积分法体偏移》", 《华中科技大学学报(自然科学版)》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015192812A1 (en) * 2014-06-20 2015-12-23 Tencent Technology (Shenzhen) Company Limited Data parallel processing method and apparatus based on multiple graphic procesing units
CN104035751B (en) * 2014-06-20 2016-10-12 深圳市腾讯计算机系统有限公司 Data parallel processing method based on multi-graphics processor and device
US10282809B2 (en) 2014-06-20 2019-05-07 Tencent Technology (Shenzhen) Company Limited Data parallel processing method and apparatus based on multiple graphic processing units
CN104536936A (en) * 2015-01-28 2015-04-22 浪潮电子信息产业股份有限公司 Draw-bar box type programmable calculator device
WO2017206591A1 (en) * 2016-06-01 2017-12-07 华为技术有限公司 Data processing system and data processing method
CN106776455A (en) * 2016-12-13 2017-05-31 郑州云海信息技术有限公司 A kind of method and device of many GPU communications of unit
CN107846709A (en) * 2017-09-29 2018-03-27 深圳市亿兆互联技术有限公司 A kind of radio communication device and wireless communications method based on LoRa
CN107846709B (en) * 2017-09-29 2021-08-24 深圳市亿兆互联技术有限公司 Wireless communication device and wireless communication method based on LoRa
CN110908805A (en) * 2019-11-29 2020-03-24 深圳前海达闼云端智能科技有限公司 Information distribution method, robot and storage medium

Also Published As

Publication number Publication date
CN103049421B (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN103049421A (en) Method and device for data transmission between central processing unit (CPU) and co-processors
CN101546276B (en) Method for achieving interrupt scheduling under multi-core environment and multi-core processor
JP6224244B2 (en) Power balancing to increase working density and improve energy efficiency
CN104796337A (en) Method and device for forwarding message
JP2015527681A5 (en)
CN105516024B (en) A kind of task flux monitoring method and system based on queue
CN102821164B (en) Efficient parallel-distribution type data processing system
WO2011150346A3 (en) Accelerator system for use with secure data storage
WO2013082069A3 (en) Method of power calculation for performance optimization
CN103516744A (en) A data processing method, an application server and an application server cluster
CN103064807A (en) Multi-channel direct memory access controller
CN105183549A (en) Automatic ticketing system based on task assignment
EP3054387A1 (en) Data compression method and storage system
CN107239342A (en) A kind of storage cluster task management method and device
CN104580503A (en) Efficient dynamic load balancing system and method for processing large-scale data
CN103607360A (en) Message processing method, line card and switching equipment
CN103916316A (en) Linear speed capturing method of network data packages
CN103888452B (en) For the order-preserving method and device of message compression
CN106126841B (en) A kind of method and apparatus based on hardware frequency conversion
CN103353750B (en) A kind of microwave metallurgical control method based on multibus
CN102446155A (en) Synchronizing device and method
CN104008067B (en) A kind of method and device of data storage
CN102780642A (en) Multichannel network message transmission method
EP3142333A1 (en) Data processing apparatus and data processing method
US11468127B1 (en) Data delivery

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant