WO2020156212A1 - Data processing method, apparatus and electronic device - Google Patents

Data processing method, apparatus and electronic device Download PDF

Info

Publication number
WO2020156212A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing
core
resource
many
data
Prior art date
Application number
PCT/CN2020/072503
Other languages
English (en)
French (fr)
Inventor
祝夭龙
何伟
冯杰
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2020156212A1 publication Critical patent/WO2020156212A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities

Definitions

  • the present invention relates to the field of communication technology, in particular to a data processing method, device and electronic equipment.
  • artificial intelligence technology has become more and more widely used.
  • the three major factors affecting artificial intelligence technology include data, algorithms and computing power.
  • computing power is provided by chips and is the core driving force for processing data and running algorithms. How to improve the effective computing power of a chip while saving cost and power consumption is a problem that currently needs to be solved.
  • the many-core chip in FIG. 1 contains nine homogeneous processing cores (Processing Core, PCore), namely PCore A, PCore B, PCore C, PCore D, PCore E, PCore F, PCore G, PCore H and PCore I; the cores can communicate with each other through a network on chip (Network on Chip, NoC)
  • PCore A is the processing core of the nth convolutional layer
  • PCore E is the processing core of the (n+1)th convolutional layer
  • the data is pooled in PCore A
  • the pooled data is input to PCore E
  • the pooling operation at the nth layer can reduce the amount of computation of the (n+1)th convolutional layer.
  • the present invention provides a data processing method, device, and electronic equipment to solve the problem of low effective computing power utilization in many-core chips in the prior art.
  • a data processing method, including: determining the resource demand for second processing of the data output, after a pooling operation, within a set period by a plurality of first processing cores performing first processing in a many-core chip; determining a resource balance quantity of the first processing cores according to the resource demand; and inputting the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
  • the second processing core can be made to work at full capacity, the effective computing power of the second processing core is improved, and the waste of resources is avoided.
  • the method further includes: setting the parameters of any first processing core in the many-core chip; and copying the parameters to the other first processing cores in the many-core chip.
  • the same parameters are used for the multiple homogeneous processing cores, and only the parameters of one first processing core need to be set, which saves parameter-configuration time.
  • if the first processing cores work at full load, the resource demand output by a first processing core after the pooling operation within a set period is one-Nth of the resource demand before the pooling operation, where N is the factor by which the resource demand is reduced after the pooling operation.
  • the determining of the resource balance quantity of the first processing cores according to the resource demand specifically includes: determining, according to the one-Nth, that the resource balance quantity of the first processing cores is N.
  • the inputting of the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing specifically includes: inputting the output data of N first processing cores into any second processing core in the many-core chip to execute the second processing, where the second processing core works at full load.
  • N first processing cores are configured for the second processing core, so that the second processing core works at full capacity, and the waste of resources of the second processing core is avoided.
  • the method further includes: classifying the first processing cores in the many-core chip; and setting the parameters of each class of first processing cores, wherein the parameters of first processing cores of the same class are the same.
  • a data processing apparatus, including: a first determining unit configured to determine the resource demand for second processing of the data output, after a pooling operation, within a set period by a plurality of first processing cores performing first processing in a many-core chip; a second determining unit configured to determine a resource balance quantity of the first processing cores according to the resource demand; and a transmission unit configured to input the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
  • an electronic device, including: a plurality of processing cores; and a network on chip configured to exchange data among the plurality of processing cores and with the outside;
  • the plurality of processing cores store instructions, and according to the instructions the electronic device executes the method described in the first aspect or any possible implementation of the first aspect.
  • a computer-readable storage medium having computer program instructions stored thereon, the computer program instructions, when executed by a processor, implementing the method described in the first aspect or any possible implementation of the first aspect.
  • a computer program product is provided; when the computer program product runs on a computer, the computer is caused to execute the method described in the first aspect or any possible implementation of the first aspect.
  • the beneficial effects of the embodiments of the present invention include: first determining the resource demand for second processing of the data output, after a pooling operation, within a set period by the multiple first processing cores performing first processing in the many-core chip; then determining the resource balance quantity of the first processing cores according to the resource demand; and finally inputting the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
  • the second processing core can simultaneously receive the pooled data from the resource-balance quantity of first processing cores. Since the resource balance quantity of the first processing cores is determined according to the resource demand, the output data of that quantity of first processing cores can make the second processing core work at full load, which improves the effective computing power of the second processing core and avoids waste of resources.
  • Figure 1 is a schematic diagram of a many-core chip layout in the prior art
  • FIG. 2 is a schematic diagram of resource demand allocation before and after a pooling operation in the prior art
  • Figure 3 is a schematic diagram of a two-dimensional pooling operation in the prior art
  • FIG. 4 is a schematic diagram of a three-dimensional pooling operation in the prior art
  • FIG. 5 is a schematic diagram of another resource demand allocation before and after a pooling operation in the prior art.
  • Figure 6 is a flow chart of a data processing method provided by an embodiment of the present invention.
  • FIG. 7 is a flowchart of another data processing method provided by an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a many-core chip layout provided by an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of resource demand allocation before and after a pooling operation provided by an embodiment of the present invention.
  • Figure 10 is a schematic diagram of a data processing device provided by an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • the many-core chip in Figure 1 is used to pool data in a convolutional neural network; pooling downsamples the feature vectors output by the convolution and increases the robustness of the network.
  • the data before pooling is a two-dimensional feature map, shown in Figure 3, containing 16 elements. With a 2X2 max-pooling operation, the two-dimensional feature map is downsampled from the original 16 elements to 4 elements, so the number of pixels in the feature map is reduced by a factor of 4. That is, PCore A's resource demand before pooling is full load, while PCore E's resource demand after pooling becomes one quarter of the demand before pooling, which means that PCore E is idle for three quarters of each time period, wasting PCore E's computing resources, that is, PCore E's computing power.
  • a pooling operation can also be performed on a three-dimensional feature map. Specifically, as shown in FIG. 4, a maximum pooling operation of M*N is performed for each picture.
  • the resource demand allocation before and after the pooling operation may follow the diagram shown in FIG. 2, or the diagram shown in FIG. 5; in the latter, the tasks of the first four time periods are combined and run in one time period, so the processing core works at full load in that time period but is idle in the other time periods, which still wastes PCore E's computing resources, that is, PCore E's computing power. Therefore, how to improve effective computing power in many-core chips is a problem that currently needs to be solved.
  • a data processing method provided by the present invention, as specifically shown in FIG. 6, includes:
  • Step S600 Determine the resource demand for second processing of the data output, after a pooling operation, within a set period by the multiple first processing cores performing the first processing in the many-core chip.
  • Step S601 Determine the resource balance quantity of the first processing core according to the resource demand.
  • Step S602 Input the output data of the first processing core of the resource-balanced quantity to any second processing core in the many-core chip to execute the second processing.
  • the second processing core can simultaneously receive the pooled data from the resource-balance quantity of first processing cores. Since the resource balance quantity of the first processing cores is determined according to the resource demand, the output data of that quantity of first processing cores can make the second processing core work at full load, which improves the effective computing power of the second processing core and avoids waste of resources.
  • the first processing cores in the many-core chip can be homogeneous processing cores or heterogeneous processing cores; homogeneous processing cores have the same computing power and the same set parameters, while heterogeneous processing cores have different computing power and different set parameters.
  • Step S700 Set parameters of any first processing core in the many-core chip, and copy the parameters to other first processing cores in the many-core chip.
  • in the convolutional neural network, the nth convolutional layer performs a pooling operation, and the processing cores used in the nth layer are called first processing cores.
  • setting any first processing core to perform a 2X2 pooling operation means the other first processing cores also perform 2X2 pooling operations.
  • Step S701 Determine the resource demand for second processing of the data output, after the pooling operation, within a set period by the multiple first processing cores performing the first processing in the many-core chip.
  • the resource requirements after the first processing core performs the pooling operation within a set period are the same.
  • for example, the resource demand after a first processing core performs the pooling operation within the set period is one quarter of the resource demand before pooling.
  • Step S702 Determine that the resource balance quantity of the first processing cores is N, based on the resource demand output by a first processing core after the pooling operation in the set period being one-Nth of the resource demand before the pooling operation, where N is the factor by which the resource demand is reduced after the pooling operation.
  • the resource demand after a first processing core performs the pooling operation within the set period is one quarter of the resource demand before pooling; since each first processing core's pooled resource demand is one quarter and the pooled data is output to the second processing core corresponding to the (n+1)th convolutional layer for processing, the second processing core needs four first processing cores to run at full load.
  • the number of resource balances for the first processing core is 4.
  • the four determined first processing cores are PCoreA, PCoreB, PCoreC, and PCoreD, respectively, and PCoreA, PCoreB, PCoreC, and PCoreD are all working at full capacity.
  • Step S703 Input the output data of the N first processing cores to any second processing core in the many-core chip to execute the second processing.
  • the second processing core is working at full load.
  • the data corresponding to the resource demand after the pooling operation of the four first processing cores is input to the second processing core PCoreE in the many-core chip, as shown in FIG. 8.
  • the resource demand distribution diagram before and after the pooling operation is shown in Figure 9.
  • at time period T1, PCoreA, PCoreB, PCoreC and PCoreD run at full load and perform 2X2 pooling operations, so their resource demand is one quarter of that before pooling.
  • the four processing cores PCoreA, PCoreB, PCoreC and PCoreD send their pooled outputs to PCoreE, so that at time period T2 PCoreE runs at full load; the other time periods follow by analogy and are not described in detail in the present invention.
  • Specific embodiment 2: if the many-core chip includes 9 processing cores and the 9 processing cores are heterogeneous, then, assuming the first processing cores work at full load, before determining the resource demand for second processing of the data output after the pooling operation within a set period by the multiple first processing cores performing the first processing in the many-core chip, the first processing cores in the many-core chip are classified, and the parameters of each class of first processing cores are set, where the parameters of first processing cores of the same class are the same.
  • the computing power of the first class of first processing cores is 100, that of the second class is 50, and that of the third class is 20; that is, within the set time, the computing power of a first-class first processing core is twice that of a second-class one and five times that of a third-class one.
  • the pooling parameters of different first processing cores also differ, and resource balancing is performed according to the different computing powers and parameters, as long as the pooled resource demand output by the multiple first processing cores keeps the second processing core running at full load.
  • Fig. 10 is a schematic diagram of a data processing device provided by an embodiment of the present invention.
  • the data processing apparatus of this embodiment includes: a first determining unit 1001, a second determining unit 1002, and a transmission unit 1003, wherein the first determining unit is configured to determine the resource demand for second processing of the data output, after a pooling operation, within a set period by the multiple first processing cores performing first processing in a many-core chip; the second determining unit is configured to determine the resource balance quantity of the first processing cores according to the resource demand; and the transmission unit is configured to input the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
  • FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in FIG. 11, the electronic device of this embodiment includes processing cores 11-1N and a network on chip 14. The processing cores 11-1N are all connected to the network on chip 14, which is used to exchange data among the N processing cores and with the outside.
  • the N processing cores store instructions, and according to the instructions the electronic device performs the following operations: determining the resource demand for second processing of the data output, after a pooling operation, within a set period by a plurality of first processing cores performing first processing in the many-core chip; determining a resource balance quantity of the first processing cores according to the resource demand; and inputting the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
  • various aspects of the embodiments of the present invention can be implemented as a system, a method, or a computer program product. Therefore, they may take the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, resident software, microcode, etc.), or an implementation combining software and hardware aspects that may generally be referred to herein as a "circuit", "module" or "system".
  • various aspects of the embodiments of the present invention may take the following form: a computer program product implemented in one or more computer-readable media, the computer-readable medium having computer-readable program code implemented thereon.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or device, or any appropriate combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that can contain or store a program used by an instruction execution system, device, or device or a program used in conjunction with an instruction execution system, device, or device.
  • the computer-readable signal medium may include a propagated data signal having computer-readable program code implemented therein, for example in baseband or as part of a carrier wave. Such a propagated signal can take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof.
  • the computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transmit the program used by or in conjunction with an instruction execution system, device, or apparatus.
  • Any suitable medium including but not limited to wireless, wired, fiber optic cable, RF, etc. or any appropriate combination of the foregoing can be used to transmit the program code implemented on the computer-readable medium.
  • the computer program code used to perform the operations of the various aspects of the embodiments of the present invention can be written in any combination of one or more programming languages, the programming languages including: object-oriented programming languages such as Java, Smalltalk, C++, etc.; And conventional process programming languages such as "C" programming language or similar programming languages.
  • the program code can be executed entirely on the user's computer as a stand-alone software package; partly on the user's computer; partly on the user's computer and partly on a remote computer; or entirely on a remote computer or server.
  • the remote computer can be connected to the user computer through any type of network including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, by using the Internet of an Internet service provider) .
  • these computer program instructions can also be stored in a computer-readable medium that can direct a computer, other programmable data processing equipment, or other devices to operate in a specific manner, so that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/actions specified in the flowchart and/or block-diagram block or blocks.
  • computer program instructions can also be loaded onto a computer, other programmable data processing equipment, or other devices, so that a series of operational steps are executed on them to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable devices provide processes for implementing the functions/actions specified in the flowchart and/or block-diagram block or blocks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Multi Processors (AREA)

Abstract

The present invention provides a data processing method, apparatus, and electronic device, used to solve the problem of low effective computing-power utilization in many-core chips in the prior art. The method includes: determining the resource demand for second processing of the data output, after a pooling operation, within a set period by a plurality of first processing cores performing first processing in a many-core chip; determining a resource balance quantity of the first processing cores according to the resource demand; and inputting the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.

Description

Data processing method, apparatus and electronic device
This application claims priority to Chinese patent application No. 201910080981.8, filed on January 28, 2019 and entitled "一种数据处理的方法、装置及电子设备" ("Data processing method, apparatus and electronic device"), the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of communication technology, and in particular to a data processing method, apparatus and electronic device.
Background
With the development of Internet applications, artificial intelligence technology is used more and more widely. The three major factors affecting artificial intelligence technology are data, algorithms and computing power. Computing power is provided by chips and is the core driving force for processing data and running algorithms; how to improve the effective computing power of a chip while saving cost and power consumption is a problem that currently needs to be solved.
In the prior art, many-core chips are used to increase effective computing power. Taking a pooling operation on data in a convolutional neural network as an example, assume the many-core chip layout shown in FIG. 1. FIG. 1 contains nine homogeneous processing cores (Processing Core, PCore), namely PCore A, PCore B, PCore C, PCore D, PCore E, PCore F, PCore G, PCore H and PCore I, and the cores can communicate with each other through a network on chip (Network on Chip, NoC). PCore A is the processing core of the nth convolutional layer and PCore E is the processing core of the (n+1)th convolutional layer; data is pooled in PCore A, the pooled data is input to the PCore E processing core, and performing the pooling operation at the nth layer can reduce the amount of computation of the (n+1)th convolutional layer. Assume that in each time period (Ti) PCore A runs at full load and uses a 2X2 pooling operation, so the amount of computation is reduced by a factor of 4. When the data is input to PCore E, since PCore E and PCore A are homogeneous processing cores with the same computing power and processing capacity, processing one quarter of the computation requires only one quarter of each time period, as shown in FIG. 2, a schematic diagram of the resource demand allocation of PCore A and PCore E before and after pooling. It can be seen that with the prior-art method, PCore E is idle for three quarters of each time period, which wastes PCore E's computing resources, that is, PCore E's computing power. Therefore, how to improve effective computing power in many-core chips is a problem that currently needs to be solved.
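The waste described above can be quantified with a small sketch. Treating one time period as 1.0 units of downstream-core capacity, and the function name itself, are illustrative assumptions rather than anything defined in the patent:

```python
def second_core_utilization(num_first_cores, pool_factor):
    """Fraction of each time period the downstream core is busy, given
    `num_first_cores` full-load producers each emitting 1/pool_factor of a load."""
    return min(1.0, num_first_cores / pool_factor)

# Prior art (FIG. 2): a single PCore A feeds PCore E through 2X2 pooling.
prior_art = second_core_utilization(num_first_cores=1, pool_factor=4)
print(prior_art)  # 0.25 -> PCore E is idle 75% of each time period

# With four first cores feeding PCore E, it runs at full load.
balanced = second_core_utilization(num_first_cores=4, pool_factor=4)
print(balanced)  # 1.0
```

This is the arithmetic behind the claim that PCore E's effective computing power is only one quarter used in the prior-art layout.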
Summary of the Invention
In view of this, the present invention provides a data processing method, apparatus and electronic device to solve the problem of low effective computing-power utilization in many-core chips in the prior art.
According to a first aspect of the embodiments of the present invention, a data processing method is provided, including: determining the resource demand for second processing of the data output, after a pooling operation, within a set period by a plurality of first processing cores performing first processing in a many-core chip; determining a resource balance quantity of the first processing cores according to the resource demand; and inputting the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
With the above method, the second processing core can work at full load, the effective computing power of the second processing core is improved, and waste of resources is avoided.
In one embodiment, if the processing cores in the many-core chip are homogeneous processing cores, before the determining of the resource demand for second processing of the data output, after a pooling operation, within a set period by the plurality of first processing cores performing first processing in the many-core chip, the method further includes: setting the parameters of any first processing core in the many-core chip; and copying the parameters to the other first processing cores in the many-core chip.
With this method, the same parameters are used for the multiple homogeneous processing cores and only the parameters of one first processing core need to be set, which saves parameter-configuration time.
In one embodiment, if the first processing cores work at full load, the resource demand output by a first processing core after the pooling operation within the set period is one-Nth of the resource demand before the pooling operation, where N is the factor by which the resource demand is reduced after the pooling operation.
With this method, the pooling operation reduces the resource demand and saves resources.
In one embodiment, the determining of the resource balance quantity of the first processing cores according to the resource demand specifically includes: determining, according to the one-Nth, that the resource balance quantity of the first processing cores is N.
In one embodiment, the inputting of the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing specifically includes: inputting the output data of N first processing cores into any second processing core in the many-core chip to execute the second processing, where the second processing core works at full load.
With this method, N first processing cores are configured for the second processing core so that the second processing core works at full load, which avoids wasting the second processing core's resources.
In one embodiment, if the processing cores in the many-core chip are heterogeneous processing cores, before the determining of the resource demand for second processing of the data output, after a pooling operation, within a set period by the plurality of first processing cores performing first processing in the many-core chip, the method further includes: classifying the first processing cores in the many-core chip; and setting the parameters of each class of first processing cores, where the parameters of first processing cores of the same class are the same.
With this method, the heterogeneous processing cores are classified and the same parameters are set for each class, which saves parameter-configuration time.
According to a second aspect of the embodiments of the present invention, a data processing apparatus is provided, including: a first determining unit, configured to determine the resource demand for second processing of the data output, after a pooling operation, within a set period by a plurality of first processing cores performing first processing in a many-core chip; a second determining unit, configured to determine a resource balance quantity of the first processing cores according to the resource demand; and a transmission unit, configured to input the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
According to a third aspect of the embodiments of the present invention, an electronic device is provided, including: a plurality of processing cores; and a network on chip, configured to exchange data among the plurality of processing cores and with the outside; the plurality of processing cores store instructions, and according to the instructions the electronic device executes the method described in the first aspect or any possible implementation of the first aspect.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, on which computer program instructions are stored; when executed by a processor, the computer program instructions implement the method described in the first aspect or any possible implementation of the first aspect.
According to a fifth aspect of the embodiments of the present invention, a computer program product is provided; when the computer program product runs on a computer, the computer is caused to execute the method described in the first aspect or any possible implementation of the first aspect.
The beneficial effects of the embodiments of the present invention include: first determining the resource demand for second processing of the data output, after a pooling operation, within a set period by the multiple first processing cores performing first processing in the many-core chip; then determining the resource balance quantity of the first processing cores according to the resource demand; and finally inputting the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing. With the above method, the second processing core can simultaneously receive the pooled data from the resource-balance quantity of first processing cores; since the resource balance quantity of the first processing cores is determined according to the resource demand, the output data of that quantity of first processing cores can make the second processing core work at full load, which improves the effective computing power of the second processing core and avoids waste of resources.
Brief Description of the Drawings
The above and other objects, features and advantages of the present invention will become clearer from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a many-core chip layout in the prior art;
FIG. 2 is a schematic diagram of resource demand allocation before and after a pooling operation in the prior art;
FIG. 3 is a schematic diagram of a two-dimensional pooling operation in the prior art;
FIG. 4 is a schematic diagram of a three-dimensional pooling operation in the prior art;
FIG. 5 is a schematic diagram of another resource demand allocation before and after a pooling operation in the prior art;
FIG. 6 is a flow chart of a data processing method provided by an embodiment of the present invention;
FIG. 7 is a flow chart of another data processing method provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of a many-core chip layout provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of resource demand allocation before and after a pooling operation provided by an embodiment of the present invention;
FIG. 10 is a schematic diagram of a data processing apparatus provided by an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The present invention is described below on the basis of embodiments, but the present invention is not limited to these embodiments. In the following detailed description of the present invention, some specific details are described exhaustively; those skilled in the art can fully understand the present application without the description of these details. In addition, those of ordinary skill in the art should understand that the drawings provided herein are for illustration purposes.
Unless the context clearly requires otherwise, words such as "include" and "comprise" throughout the application document should be interpreted in an inclusive sense rather than an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".
In the description of the present invention, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only, do not denote an order, and cannot be understood as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise stated, "a plurality of" means two or more.
In the prior art, the many-core chip in FIG. 1 is used to pool data in a convolutional neural network; pooling downsamples the feature vectors output by the convolution and increases the robustness of the network. Assume the data before pooling is a two-dimensional feature map, shown in FIG. 3, containing 16 elements. With a 2X2 max-pooling operation, the two-dimensional feature map is downsampled from the original 16 elements to 4 elements, so the number of pixels in the feature map is reduced by a factor of 4; that is, PCore A's resource demand before pooling is full load, and PCore E's resource demand after pooling becomes one quarter of the demand before pooling. In other words, PCore E is idle for three quarters of each time period, which wastes PCore E's computing resources, that is, PCore E's computing power.
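The 2X2 max-pooling downsampling described above can be sketched in plain Python. The 4x4 input map and its element values are illustrative assumptions, not the values in the patent's FIG. 3:

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map with non-overlapping 2x2 max pooling."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

# A 4x4 map (16 elements) shrinks to 2x2 (4 elements): a 4x reduction.
fmap = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
pooled = max_pool_2x2(fmap)
print(pooled)  # [[6, 8], [9, 6]]
```

The 4x drop in element count is exactly the 4x drop in downstream resource demand discussed in the text.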
In the prior art, a pooling operation can also be performed on a three-dimensional feature map; specifically, as shown in FIG. 4, an M*N max-pooling operation is performed on each picture. For the above 2X2 max-pooling operation, the resource demand allocation before and after pooling may follow the diagram shown in FIG. 2 or the diagram shown in FIG. 5; in the latter, the tasks of the first four time periods are combined and run in one time period, so the processing core works at full load in that time period but is idle in the other time periods, which still wastes PCore E's computing resources, that is, PCore E's computing power. Therefore, how to improve effective computing power in many-core chips is a problem that currently needs to be solved.
A data processing method provided by the present invention, as shown in FIG. 6, includes:
Step S600: Determine the resource demand for second processing of the data output, after a pooling operation, within a set period by a plurality of first processing cores performing first processing in a many-core chip.
Step S601: Determine a resource balance quantity of the first processing cores according to the resource demand.
Step S602: Input the output data of the resource-balance quantity of first processing cores into any second processing core in the many-core chip to execute the second processing.
In this embodiment of the present invention, the resource demand for second processing of the pooled data output within a set period by the multiple first processing cores performing first processing in the many-core chip is determined first, then the resource balance quantity of the first processing cores is determined according to the resource demand, and finally the output data of the resource-balance quantity of first processing cores is input into any second processing core in the many-core chip to execute the second processing. With this method, the second processing core can simultaneously receive the pooled data from the resource-balance quantity of first processing cores; since the resource balance quantity is determined according to the resource demand, the output data of that quantity of first processing cores can make the second processing core work at full load, which improves the effective computing power of the second processing core and avoids waste of resources.
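The three steps S600 to S602 can be sketched as follows. The function names and the representation of resource demand as a fraction of one core's full load are illustrative assumptions, not part of the patent:

```python
def determine_resource_demand(pool_factor):
    """S600: pooled output of one full-load first core, as a fraction of full load."""
    return 1.0 / pool_factor

def determine_balance_quantity(demand_per_core):
    """S601: number of first cores whose pooled outputs fill one second core."""
    return round(1.0 / demand_per_core)

def route_outputs(first_cores, balance_quantity, second_core):
    """S602: feed the outputs of `balance_quantity` first cores to one second core."""
    return {second_core: first_cores[:balance_quantity]}

demand = determine_resource_demand(pool_factor=4)   # 0.25 of full load
n = determine_balance_quantity(demand)              # resource balance quantity: 4
routing = route_outputs(["PCoreA", "PCoreB", "PCoreC", "PCoreD"], n, "PCoreE")
print(routing)  # {'PCoreE': ['PCoreA', 'PCoreB', 'PCoreC', 'PCoreD']}
```

With a 2X2 pooling operation the balance quantity comes out to 4, matching the example layout of FIG. 8.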
In this embodiment of the present invention, the first processing cores in the many-core chip can be homogeneous processing cores or heterogeneous processing cores; homogeneous processing cores have the same computing power and the same set parameters, while heterogeneous processing cores have different computing power and different set parameters. The two cases are described in detail through the following two specific embodiments.
Specific embodiment 1: if the many-core chip includes 9 processing cores and the 9 processing cores are homogeneous, then, assuming the first processing cores work at full load, the processing flow is as follows:
Step S700: Set the parameters of any first processing core in the many-core chip, and copy the parameters to the other first processing cores in the many-core chip.
For example, in a convolutional neural network the nth convolutional layer performs a pooling operation, and the processing cores used in the nth layer are called first processing cores; setting any first processing core to perform a 2X2 pooling operation means the other first processing cores also perform 2X2 pooling operations.
Step S701: Determine the resource demand for second processing of the data output, after the pooling operation, within a set period by the multiple first processing cores performing the first processing in the many-core chip.
Specifically, the resource demands of the first processing cores after the pooling operation within the set period are the same. For example, the resource demand after a first processing core performs the pooling operation within the set period is one quarter of the resource demand before pooling.
Step S702: Determine that the resource balance quantity of the first processing cores is N, based on the resource demand output by a first processing core after the pooling operation in the set period being one-Nth of the resource demand before the pooling operation, where N is the factor by which the resource demand is reduced after the pooling operation.
For example, the resource demand after a first processing core performs the pooling operation within the set period is one quarter of the resource demand before pooling; since each first processing core's pooled resource demand is one quarter and the pooled data is output to the second processing core corresponding to the (n+1)th convolutional layer for processing, the resource balance quantity of first processing cores needs to be 4 for the second processing core to run at full load. Assume the four determined first processing cores are PCoreA, PCoreB, PCoreC and PCoreD, all of which work at full load.
Step S703: Input the output data of the N first processing cores into any second processing core in the many-core chip to execute the second processing, where the second processing core works at full load.
For example, the data corresponding to the pooled resource demand of the above four first processing cores is input into the second processing core PCoreE in the many-core chip, as shown in FIG. 8.
In this embodiment of the present invention, for the above example, the resource demand allocation before and after the pooling operation is shown in FIG. 9. At time period T1, PCoreA, PCoreB, PCoreC and PCoreD run at full load and perform 2X2 pooling operations, so their resource demand is one quarter of that before pooling; the four processing cores PCoreA, PCoreB, PCoreC and PCoreD send their pooled outputs to PCoreE, so that at time period T2 PCoreE runs at full load. The other time periods follow by analogy and are not described in detail in the present invention.
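The time-period pipeline of this example can be sketched as a short simulation. The load bookkeeping (1.0 = full load) and the one-period delivery delay are illustrative assumptions based on the T1/T2 description above:

```python
POOL_FACTOR = 4  # a 2X2 pooling operation reduces resource demand by 4x

def pipeline_timeline(first_cores, periods):
    """Each period, every first core pools at full load; the second core
    receives the combined pooled outputs in the following period."""
    timeline = []
    for t in range(1, periods + 1):
        pcore_e_load = 0.0 if t == 1 else len(first_cores) / POOL_FACTOR
        timeline.append((f"T{t}", pcore_e_load))
    return timeline

tl = pipeline_timeline(["PCoreA", "PCoreB", "PCoreC", "PCoreD"], periods=3)
print(tl)  # [('T1', 0.0), ('T2', 1.0), ('T3', 1.0)]
```

From T2 onward the second core stays at full load, which is the steady-state behavior depicted in FIG. 9.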
Specific embodiment 2: if the many-core chip includes 9 processing cores and the 9 processing cores are heterogeneous, then, assuming the first processing cores work at full load, before determining the resource demand for second processing of the data output after the pooling operation within a set period by the multiple first processing cores performing the first processing in the many-core chip, the first processing cores in the many-core chip are classified, and the parameters of each class of first processing cores are set, where the parameters of first processing cores of the same class are the same.
For example, assume the first processing cores in the many-core chip are divided into three classes: the computing power of the first class of first processing cores is 100, that of the second class is 50, and that of the third class is 20; that is, within the set time, the computing power of a first-class first processing core is twice that of a second-class one and five times that of a third-class one. The pooling parameters of different first processing cores also differ, and resource balancing is performed according to the different computing powers and parameters, as long as the pooled resource demand output by the multiple first processing cores keeps the second processing core running at full load.
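For the heterogeneous case, balancing by class can be sketched greedily. The per-core pooled output values, the second core's capacity, and the largest-first selection are illustrative assumptions; the patent only requires that the combined pooled output keep the second core at full load:

```python
def balance_heterogeneous(cores, second_core_capacity):
    """Greedily pick first cores until their pooled outputs fill the second core.
    `cores` maps core name -> pooled output per period (computing-power units)."""
    chosen, total = [], 0
    for name, pooled_output in sorted(cores.items(), key=lambda kv: -kv[1]):
        if total >= second_core_capacity:
            break
        chosen.append(name)
        total += pooled_output
    return chosen, total

# Assumed pooled outputs per period for first cores drawn from the three classes.
cores = {"A1": 25, "A2": 25, "B1": 12, "B2": 13, "C1": 5}
selected, load = balance_heterogeneous(cores, second_core_capacity=80)
print(selected, load)  # load == 80: the second core runs at full capacity
```

A greedy largest-first fill is one simple policy; any mix of classes whose pooled outputs sum to the second core's capacity satisfies the condition stated in the text.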
图10是本发明实施例提供的一种数据处理的装置示意图。如图10所示,本实施例的数据处理的装置包括:第一确定单元1001、第二确定单元1002和传输单元1003,其中,所述第一确定单元,用于确定众核芯片中多个执行第一处理的第一处理核在设定周期内输出进行池化操作后的数据进行第二处理的资源需求量;第二确定单元,用于根据所述资源需求量确定所述第一处理核的资源配平数量;传输单元,用于将所述资源配平数量的第一处理核的输出数据输入到所述众核芯片中的任一个第二处理核执行所述第二处理。
图11是本发明实施例的电子设备的结构示意图。如图11所示,本实施例的电子设备包括处理核11-1N以及片上网络14。处理核11-1N均与片上网络14连接。片上网络14用于交互所述N个处理核间的数据和外部数据。所述N个处理核中存储指令,根据所述指令所述电子设备执行如下操作:确定众核芯片中多个执行第一处理的第一处理核在设定周期内输出进行池化操作后的数据进行第二处理的资源需求量;根据所述资源需求量确定所述第一处理核的资源配平数量;将所述资源配平数量的第一处理核的输出数据输入到所述众核芯片中的任一个第二处理核执行所述第二处理。
如本领域技术人员将意识到的,本发明实施例的各个方面可以被实现为系统、方法或计算机程序产品。因此,本发明实施例的各个方面可以采取如下形式:完全硬件实施方式、完全软件实施方式(包括固件、常驻软件、微代码等)或者在本文中通常可以都称为“电路”、“模块”或“系统”的将软件方面与硬件方面相结合的实施方式。此外,本发明实施例的各个方面可以采取如下形式:在一个或多个计算机可读介质中实现的计算机程序产品,计算机可读介质具有在其上实现的计算机可读程序代码。
可以利用一个或多个计算机可读介质的任意组合。计算机可读介质可以是计算机可读信号介质或计算机可读存储介质。计算机可读存储介质可以是如(但不限于)电子的、磁的、光学的、电磁的、红外的或半导体系统、设备或装置,或者前述的任意适当的组合。计算机可读存储介质的更具体的示例(非穷尽列举)将包括以下各项:具有一根或多根电线的电气连接、便携式计算机软盘、硬盘、随机存取存储器(RAM)、只 读存储器(ROM)、可擦除可编程只读存储器(EPROM或闪速存储器)、光纤、便携式光盘只读存储器(CD-ROM)、光存储装置、磁存储装置或前述的任意适当的组合。在本发明实施例的上下文中,计算机可读存储介质可以为能够包含或存储由指令执行系统、设备或装置使用的程序或结合指令执行系统、设备或装置使用的程序的任意有形介质。
计算机可读信号介质可以包括传播的数据信号,所述传播的数据信号具有在其中如在基带中或作为载波的一部分实现的计算机可读程序代码。这样的传播的信号可以采用多种形式中的任何形式,包括但不限于:电磁的、光学的或其任何适当的组合。计算机可读信号介质可以是以下任意计算机可读介质:不是计算机可读存储介质,并且可以对由指令执行系统、设备或装置使用的或结合指令执行系统、设备或装置使用的程序进行通信、传播或传输。
可以使用包括但不限于无线、有线、光纤电缆、RF等或前述的任意适当组合的任意合适的介质来传送实现在计算机可读介质上的程序代码。
用于执行针对本发明实施例各方面的操作的计算机程序代码可以以一种或多种编程语言的任意组合来编写,所述编程语言包括:面向对象的编程语言如Java、Smalltalk、C++等;以及常规过程编程语言如“C”编程语言或类似的编程语言。程序代码可以作为独立软件包完全地在用户计算机上、部分地在用户计算机上执行;部分地在用户计算机上且部分地在远程计算机上执行;或者完全地在远程计算机或服务器上执行。在后一种情况下,可以将远程计算机通过包括局域网(LAN)或广域网(WAN)的任意类型的网络连接至用户计算机,或者可以与外部计算机进行连接(例如通过使用因特网服务供应商的因特网)。
上述根据本发明实施例的方法、设备(系统)和计算机程序产品的流程图图例和/或框图描述了本发明实施例的各个方面。将要理解的是,流程图图例和/或框图的每个块以及流程图图例和/或框图中的块的组合可以由计算机程序指令来实现。这些计算机程序指令可以被提供至通用计算机、专用计算机或其它可编程数据处理设备的处理器,以产生机器,使得(经由计算机或其它可编程数据处理设备的处理器执行的)指令创建用于实现流程图和/或框图块或块中指定的功能/动作的装置。
还可以将这些计算机程序指令存储在可以指导计算机、其它可编程数据处理设备或其它装置以特定方式运行的计算机可读介质中,使得在计算机可读介质中存储的指令产生包括实现在流程图和/或框图块或块中指定的功能/动作的指令的制品。
计算机程序指令还可以被加载至计算机、其它可编程数据处理设备或其它装置上, 以使在计算机、其它可编程设备或其它装置上执行一系列可操作步骤来产生计算机实现的过程,使得在计算机或其它可编程设备上执行的指令提供用于实现在流程图和/或框图块或块中指定的功能/动作的过程。
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may be subject to various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A data processing method, characterized by comprising:
    determining a resource demand for performing second processing on pooled data output, within a set period, by a plurality of first processing cores in a many-core chip that perform first processing;
    determining, according to the resource demand, a resource balancing number of the first processing cores; and
    inputting the output data of the resource balancing number of first processing cores into any one second processing core in the many-core chip to perform the second processing.
  2. The method according to claim 1, characterized in that, if the processing cores in the many-core chip are homogeneous processing cores, before the determining of the resource demand for performing second processing on the pooled data output, within a set period, by the plurality of first processing cores in the many-core chip that perform first processing, the method further comprises:
    setting parameters of any one first processing core in the many-core chip; and
    copying the parameters to the other first processing cores in the many-core chip.
  3. The method according to claim 2, characterized in that, if the first processing cores operate at full load, the resource demand of a first processing core for the pooled data output within the set period is 1/N of the resource demand before the pooling operation, where N is the factor by which the pooling operation reduces the resource demand.
  4. The method according to claim 3, characterized in that the determining, according to the resource demand, of the resource balancing number of the first processing cores specifically comprises:
    determining, according to the factor 1/N, that the resource balancing number of the first processing cores is N.
  5. The method according to claim 4, characterized in that the inputting of the output data of the resource balancing number of first processing cores into any one second processing core in the many-core chip to perform the second processing specifically comprises:
    inputting the output data of N first processing cores into any one second processing core in the many-core chip to perform the second processing, wherein the second processing core operates at full load.
  6. The method according to claim 1, characterized in that, if the processing cores in the many-core chip are heterogeneous processing cores, before the determining of the resource demand for performing second processing on the pooled data output, within a set period, by the plurality of first processing cores in the many-core chip that perform first processing, the method further comprises:
    classifying the first processing cores in the many-core chip; and
    setting parameters for each class of the first processing cores, wherein first processing cores of a same class have the same parameters.
  7. A data processing apparatus, characterized by comprising:
    a first determining unit, configured to determine a resource demand for performing second processing on pooled data output, within a set period, by a plurality of first processing cores in a many-core chip that perform first processing;
    a second determining unit, configured to determine, according to the resource demand, a resource balancing number of the first processing cores; and
    a transmission unit, configured to input the output data of the resource balancing number of first processing cores into any one second processing core in the many-core chip to perform the second processing.
  8. An electronic device, characterized in that the electronic device comprises:
    a plurality of processing cores; and
    a network-on-chip configured to exchange data among the plurality of processing cores and with the outside;
    wherein instructions are stored in the plurality of processing cores, and according to the instructions the electronic device performs the method according to any one of claims 1-6.
  9. A computer-readable storage medium having computer program instructions stored thereon, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-6.
  10. A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1-6.
PCT/CN2020/072503 2019-01-28 2020-01-16 Data processing method and apparatus, and electronic device WO2020156212A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910080981.8A CN111488216B (zh) 2019-01-28 2019-01-28 Data processing method and apparatus, and electronic device
CN201910080981.8 2019-01-28

Publications (1)

Publication Number Publication Date
WO2020156212A1 true WO2020156212A1 (zh) 2020-08-06

Family

ID=71795880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/072503 WO2020156212A1 (zh) 2019-01-28 2020-01-16 Data processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN111488216B (zh)
WO (1) WO2020156212A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008013A (zh) * 2013-02-26 2014-08-27 Huawei Technologies Co., Ltd. Core resource allocation method and apparatus, and many-core system
CN106650699A (zh) * 2016-12-30 2017-05-10 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Face detection method and apparatus based on a convolutional neural network
CN108304925A (zh) * 2018-01-08 2018-07-20 Institute of Computing Technology, Chinese Academy of Sciences Pooling computation apparatus and method
CN108304923A (zh) * 2017-12-06 2018-07-20 Tencent Technology (Shenzhen) Co., Ltd. Convolution operation processing method and related product

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8291430B2 (en) * 2009-07-10 2012-10-16 International Business Machines Corporation Optimizing system performance using spare cores in a virtualized environment
US8458635B2 (en) * 2009-12-04 2013-06-04 Synopsys, Inc. Convolution computation for many-core processor architectures
US8949836B2 (en) * 2011-04-01 2015-02-03 International Business Machines Corporation Transferring architected state between cores
CN102855218A (zh) * 2012-05-14 2013-01-02 ZTE Corporation Data processing system, method and apparatus
TWI625622B (zh) * 2013-10-31 2018-06-01 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Multi-core processor system and computer-implemented method of operating a multi-core processor system
CN105678379B (zh) * 2016-01-12 2020-08-07 Tencent Technology (Shenzhen) Co., Ltd. CNN processing method and apparatus
CN106656780B (zh) * 2017-02-28 2020-07-28 China United Network Communications Group Co., Ltd. Data configuration method and apparatus for a virtual gateway
CN109034373B (zh) * 2018-07-02 2021-12-21 鼎视智慧(北京)科技有限公司 Parallel processor and processing method for a convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DENG, WENQI: "The Study of Many-core Deep Learning Accelerator Based on BWDSP", MASTER THESIS, 15 January 2019 (2019-01-15), pages 1 - 76, XP009522374, ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN111488216B (zh) 2024-04-30
CN111488216A (zh) 2020-08-04

Similar Documents

Publication Publication Date Title
CN107392308B (zh) Convolutional neural network acceleration method and system based on a programmable device
US11010313B2 (en) Method, apparatus, and system for an architecture for machine learning acceleration
CN111861793B (zh) Power distribution and utilization service allocation method and apparatus based on a cloud-edge collaborative computing architecture
CN111813526A (zh) Heterogeneous processing system, processor and task processing method for federated learning
WO2021115052A1 (zh) Task processing method and task processing apparatus for a heterogeneous chip, and electronic device
WO2021259098A1 (zh) Acceleration system and method based on a convolutional neural network, and storage medium
CN112527514B (zh) Multi-core secure chip processor based on logical expansion, and processing method thereof
TWI775210B (zh) Data partitioning method and processor for convolution operations
Du et al. Model parallelism optimization for distributed inference via decoupled CNN structure
CN111061763B (zh) Method and apparatus for generating a rule execution plan for a rule engine
Maqsood et al. Energy and communication aware task mapping for MPSoCs
CN116762080A (zh) Neural network generation device, neural network computing device, edge device, neural network control method, and software generation program
Lin et al. Latency-driven model placement for efficient edge intelligence service
WO2020156212A1 (zh) Data processing method and apparatus, and electronic device
CN112084023A (zh) Data parallel processing method, electronic device, and computer-readable storage medium
WO2023087227A1 (zh) Data processing apparatus and method
WO2021213075A1 (zh) Method and device for inter-node communication based on multiple processing nodes
WO2021213076A1 (zh) Method and device for constructing a communication topology based on multiple processing nodes
CN112801276A (zh) Data processing method, processor, and electronic device
CN112732634A (zh) ARM-FPGA collaborative local dynamic reconfiguration processing method for hardware resources, oriented to edge computing
CN117170986B (zh) Chip consistency processing system, and method, apparatus, device and medium thereof
WO2023115529A1 (zh) Data processing method within a chip, and chip
TWI798591B (zh) Convolutional neural network operation method and device
CN117519996B (zh) Data processing method, apparatus, device, and storage medium
WO2024067207A1 (zh) Scheduling method, scheduling apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20749624

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20749624

Country of ref document: EP

Kind code of ref document: A1