CN109739514A - Parameter processing method and related product - Google Patents


Info

Publication number
CN109739514A
Authority
CN
China
Prior art keywords
parameter
deep learning
container
learning framework
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811570061.6A
Other languages
Chinese (zh)
Other versions
CN109739514B (en)
Inventor
Not disclosed (the inventor requested non-publication)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Beijing Zhongke Cambrian Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd filed Critical Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201811570061.6A priority Critical patent/CN109739514B/en
Publication of CN109739514A publication Critical patent/CN109739514A/en
Priority to PCT/CN2019/087631 priority patent/WO2020124948A1/en
Application granted granted Critical
Publication of CN109739514B publication Critical patent/CN109739514B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a parameter processing method and related product, applied to an artificial intelligence chip. An upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip; the deep learning framework includes a container, which is a class or structure for storing parameters and is connected to the upper-layer language interface. The method includes: the upper-layer language interface writes a first parameter into the container; the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter into the container; and the upper-layer language interface obtains the second parameter from the container. By writing the first parameter into the container, the embodiments of the present application improve the parallel-computation effect in the deep learning framework; by counting and obtaining the second parameter, they improve the monitorability of parallel-computation performance.

Description

Parameter processing method and related product
Technical field
The present disclosure relates to the field of artificial intelligence, and in particular to a parameter processing method and related product.
Background technique
With the development of the artificial intelligence industry, more and more deep learning frameworks are being developed and used. When a deep learning framework is used together with its matching artificial intelligence chip, a user usually needs to set certain parameters of the framework to achieve a better computation effect, or to obtain certain parameters from the framework in order to monitor its running state.
At present, deep learning frameworks lack a parameter setting mechanism and method oriented to artificial intelligence chips, so users cannot set parameters for an artificial intelligence chip or obtain data related to chip operation. How to improve this situation has become an urgent problem to be solved.
Disclosure
In view of this, the present disclosure aims to provide a parameter processing method and related product. A container is newly added, a first parameter used to describe the degree of parallelism of the deep learning framework is written into the container, and the first parameter in the container is then combined with other modules of the deep learning framework to obtain a second parameter used to monitor parallel-computation performance. This improves the computation effect of the deep learning framework while increasing the monitorability of parallel-computation performance.
To solve the above technical problem, a first aspect of the embodiments of the present invention provides a parameter processing method, applied to an artificial intelligence chip.
An upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper-layer language interface. The method includes:
the upper-layer language interface injects a first parameter into the container, wherein the first parameter is used to describe the degree of parallelism of the deep learning framework;
the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter into the container, wherein the second parameter is used to monitor the parallel-computation performance of the deep learning framework described by the first parameter, and the container is a class or structure for storing parameters;
the upper-layer language interface obtains the second parameter from the container.
Optionally, before the upper-layer language interface writes the first parameter into the container, the method further includes:
the container includes a parameter data field, and the parameter data field is used to point to the first parameter and the second parameter.
Optionally, the first parameter includes a data parallelism degree and a model parallelism degree.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, interacting the first parameter with the module data of the deep learning framework to obtain the second parameter includes:
transmitting the data parallelism degree to a module of the deep learning framework for data interaction, and obtaining the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree, wherein the CET and the CETS are used to count the computation time of operators;
transmitting the model parallelism degree to a module of the deep learning framework for data interaction, and obtaining the corresponding CET and CETS.
Optionally, the deep learning framework is the MXNet deep learning framework.
Optionally, the deep learning framework further includes a carrier, and the method further includes:
performing, through the carrier, the parameter transfer and interaction between the container and the modules of the deep learning framework, where the parameters include the first parameter and the second parameter.
Optionally, the artificial intelligence chip further includes a bottom-layer library module, and the method further includes:
performing, through the carrier, the parameter transfer and interaction between the container and the bottom-layer library module, where the parameters include the first parameter and the second parameter.
Optionally, the container includes a native class or structure in the deep learning framework, or a class or structure created independently in the deep learning framework for the artificial intelligence chip.
A second aspect of the embodiments of the present invention provides a parameter processing apparatus, applied to an artificial intelligence chip. An upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper-layer language interface. The apparatus includes:
a writing module, configured to write a first parameter into the container through the upper-layer language interface, wherein the first parameter is used to describe the degree of parallelism of the deep learning framework;
a computing module, configured to obtain the first parameter from the container through the deep learning framework, interact the first parameter with the data of the modules of the deep learning framework to obtain a second parameter, and transmit the second parameter into the container, wherein the second parameter is used to monitor the performance of parallel computation, and the container is a class or structure for storing parameters;
an obtaining module, configured to obtain the second parameter from the container through the upper-layer language interface.
A third aspect of the embodiments of the present invention provides a chip, including the parameter processing apparatus provided in the second aspect.
A fourth aspect of the embodiments of the present invention provides a chip packaging structure, which includes the chip of the third aspect.
A fifth aspect of the embodiments of the present invention provides a board card, which includes the chip packaging structure of the fourth aspect.
In a sixth aspect, the embodiments of the present application provide an electronic device, which includes the chip packaging structure of the fourth aspect or the board card of the fifth aspect.
A seventh aspect of the embodiments of the present invention provides a storage medium for storing a computer program for electronic data interchange, wherein the computer program causes a computer to execute the steps of the method of the first aspect.
It can be seen that, in the parameter processing method disclosed in the embodiments of the present application, an upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper-layer language interface. First, the upper-layer language interface writes a first parameter into the container; then the deep learning framework obtains the first parameter from the container, combines the first parameter with module parameters of the deep learning framework to obtain a second parameter, and transmits the second parameter into the container; finally, the upper-layer language interface obtains the second parameter from the container and provides it to the user. Because the first parameter is used to describe the degree of parallelism of the deep learning framework and the second parameter is used to monitor the performance of parallel computation, this process improves the parallel-computation effect in the deep learning framework by writing the first parameter into the container, and improves the monitorability of parallel-computation performance by counting and obtaining the second parameter.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1A is a schematic diagram of an artificial intelligence chip provided by an embodiment of the present application.
Figure 1B is a flow diagram of a parameter processing method provided by an embodiment of the present application.
Figure 2 is a flow diagram of another parameter processing method provided by an embodiment of the present application.
Figure 3 is a flow diagram of another parameter processing method provided by an embodiment of the present application.
Figure 4 is a schematic diagram of a parameter processing apparatus provided by an embodiment of the present application.
Figure 5 is a schematic diagram of a combined processing device provided by an embodiment of the present application.
Figure 6 is a structural diagram of another combined processing device provided by an embodiment of the present application.
Figure 7 is a structural schematic diagram of a board card provided by an embodiment of the present application.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
The term "embodiment" used herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Please refer to Figure 1A, which shows an artificial intelligence chip provided by an embodiment of the present application. As shown in Figure 1A, the artificial intelligence chip 10 includes an upper-layer language interface 101 and a deep learning framework 100. The upper-layer language interface is used by a programming language to access the deep learning framework. The deep learning framework includes a container and other modules; the container can exchange data with the modules of the deep learning framework, and these modules include a graph executor module, the operator modules, an engine module, and so on. Optionally, the upper-layer language interface 101 may also be deployed on another chip or device; that chip or device is connected to the artificial intelligence chip, and information can still be exchanged between the two. In addition, the artificial intelligence chip 10 may also include a bottom-layer library module 102, which includes a bottom-layer runtime library, a driver module, and so on. The deep learning framework 100 further includes a carrier, which is used to transfer data between the container and the other modules of the deep learning framework or the bottom-layer library module.
Please refer to Figure 1B, which is a flow diagram of a parameter processing method disclosed in an embodiment of the present application. The parameter processing method is applied to the artificial intelligence chip shown in Figure 1A. As shown in Figure 1B, the method specifically includes the following steps:
111. The upper-layer language interface writes a first parameter into the container, wherein the first parameter is used to describe the degree of parallelism of the deep learning framework.
A deep learning framework is the code skeleton used to carry out a deep learning project; currently popular deep learning frameworks include TensorFlow, Caffe, Theano, MXNet, Torch, and PyTorch. An interface is the boundary across which two independent components of a system exchange information. The upper-layer language and the deep learning framework are two separate components, so an interface exists between them for exchanging information. The upper-layer language used for deep learning may be, for example, Python or R; normally, the upper-layer language interface is directly connected to the deep learning framework. However, this interface lacks a parameter setting mechanism, so the user cannot set parameters for the artificial intelligence chip or obtain parameters from it. Therefore, a container is newly added at the layer below the upper-layer language interface for setting parameters and obtaining related data. The parameter data field used in the container for parameter setting and parameter acquisition may be newly added in the container, or it may be added in another module while the container is designated as the location where parameters are set and obtained.
The container is a class or structure for storing data and belongs to a module in the deep learning framework. The container in the deep learning framework may be a native class or structure of the deep learning framework, such as the graphexecutor class, with fields newly added to that class or structure for parameter setting and parameter acquisition. Alternatively, the container in the deep learning framework may be a class or structure created independently by the user for the parameter processing method on the artificial intelligence chip, such as an mludevice device class, used solely for the fields for parameter setting and parameter acquisition.
Optionally, the method further includes: the container includes a parameter data field, and the parameter data field is used to point to the first parameter and the second parameter.
Specifically, before the parameter data field is created in the container, there is no data field related to the first parameter or the second parameter anywhere in the artificial intelligence chip, so the first parameter cannot be set and the second parameter cannot be obtained. Creating a parameter data field in the container that relates to the first parameter and the second parameter indicates how the first and second parameters are obtained, how they interact with other modules or interfaces, where their data is stored, and so on, and also makes the first and second parameters easier to manage. Alternatively, the parameter data field may be created elsewhere while the data is still stored through the container.
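As a concrete illustration only (an editorial sketch, not text from the patent; the names ParamContainer, data_parallelism, channel_elapsed_time and so on are assumptions), the parameter data fields described above could be grouped in a C++ structure roughly as follows:

```cpp
#include <vector>

// Hypothetical container sketch: a structure holding the newly added
// parameter data fields. All names are illustrative.
struct ParamContainer {
  // First parameter, written by the upper-layer language interface.
  int data_parallelism = 1;   // data parallelism degree (DP)
  int model_parallelism = 1;  // model parallelism degree (MP)

  // Second parameter, written back by the modules of the framework.
  std::vector<double> channel_elapsed_time;  // CET, one entry per parallel channel, in ms
  double channel_elapsed_time_sum = 0.0;     // CETS, sum over all channels, in ms
};
```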
Optionally, the first parameter includes a data parallelism degree and a model parallelism degree.
Optionally, the deep learning framework in this embodiment is the MXNet deep learning framework.
Data parallelism (DP) means that different kernels or processing units process data in parallel, and the data parallelism degree is the maximum number of parallel executions when data is processed in parallel. Model parallelism (MP) means that an operator or a model is processed in parallel on multiple kernels, and the model parallelism degree is the maximum number of parallel executions when a model or operator is processed in parallel. When the MXNet deep learning framework runs on the artificial intelligence chip, the amount of computation is huge; to reduce computation time and improve efficiency, DP or MP, or both kinds of parallel computation, need to be used. To achieve a better computation effect, the data parallelism degree and the model parallelism degree need to be configured: on the one hand, the configured parallelism parameters should match the hardware of the artificial intelligence chip; on the other hand, different parallelism parameters are also needed when the scale, sparsity, or other features of the input data differ. The configured data parallelism degree and/or model parallelism degree are written in the programming language and then injected into the container through the upper-layer language interface, which completes the setting of the first parameter.
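Continuing the hypothetical ParamContainer sketch above (again as an editorial illustration rather than an MXNet API), the injection performed through the upper-layer language interface might be exposed to the binding layer roughly as follows:

```cpp
// Hypothetical write path: the upper-layer language interface (for example a
// Python binding) would call a function like this to inject the first
// parameter into the container.
void InjectFirstParameter(ParamContainer& container,
                          int data_parallelism,
                          int model_parallelism) {
  container.data_parallelism = data_parallelism;
  container.model_parallelism = model_parallelism;
}

// Example: set DP = 4 and MP = 2 for the current run.
// InjectFirstParameter(container, /*data_parallelism=*/4, /*model_parallelism=*/2);
```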
MXNet is a deep learning framework that supports languages such as C++, Python, R, Scala, Julia, MATLAB, and JavaScript, supports both imperative and symbolic programming, and can run on any hardware, including artificial intelligence chips; it is one of the classic deep learning frameworks in use today. The MXNet deep learning framework therefore combines well with the method of the embodiments of the present application to complete the setting of the first parameter and the acquisition of the second parameter.
112. The deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter into the container, wherein the second parameter is used to monitor the parallel-computation performance of the deep learning framework described by the first parameter.
After the first parameter has been set and injected into the container, the modules of the deep learning framework obtain the first parameter from the container; these modules include the graph executor module, the operator modules, the engine module, and so on. For example, if an operator module needs to perform parallel computation, it obtains the first parameter and then combines it with other parameters in the operator module, such as the data size, to obtain the second parameter. The second parameter is a parameter used to monitor parallel-computation performance, and the obtained second parameter needs to be passed back into the container.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, interacting the first parameter with the module data of the deep learning framework to obtain the second parameter includes: transmitting the data parallelism degree to a module of the deep learning framework for data interaction, and obtaining the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree; and transmitting the model parallelism degree to a module of the deep learning framework for data interaction, and obtaining the corresponding CET and CETS, wherein the CET and the CETS are used to count the computation time of operators.
Specifically, when the deep learning framework uses DP or MP, there are multiple parallel channels. The channel elapsed time (Channel Elapsed Time, CET) and the channel elapsed time sum (Channel Elapsed Time Sum, CETS) are both performance parameters that describe the parallel computation carried out over these parallel channels and are used to count the computation time of operators. The second parameter obtained for a single module, or for the entire deep learning framework, according to the first parameter and the modules of the deep learning framework is transmitted into the container, which completes the acquisition of the second parameter.
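As an illustrative sketch only (reusing the hypothetical ParamContainer above, and not code from the patent or from MXNet), a module could record the per-channel elapsed time and its sum roughly like this:

```cpp
#include <chrono>
#include <functional>

// Hypothetical measurement sketch: run one unit of operator work per parallel
// channel and record each channel's elapsed time (CET) and their sum (CETS)
// in the container. A real framework would launch the channels concurrently;
// the sequential loop here only illustrates the bookkeeping.
void RunChannelsAndRecord(ParamContainer& container,
                          const std::function<void(int)>& channel_work) {
  container.channel_elapsed_time.clear();
  container.channel_elapsed_time_sum = 0.0;
  for (int ch = 0; ch < container.data_parallelism; ++ch) {
    const auto start = std::chrono::steady_clock::now();
    channel_work(ch);  // operator work for this parallel channel
    const auto end = std::chrono::steady_clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(end - start).count();
    container.channel_elapsed_time.push_back(ms);  // CET of this channel
    container.channel_elapsed_time_sum += ms;      // CETS accumulated over channels
  }
}
```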
113. The upper-layer language interface obtains the second parameter from the container.
The upper-layer language interface can obtain the second parameter from the container and expose it, so that the second parameter becomes visible to the user. The user can monitor the running performance of the deep learning framework through the second parameter, and can then adjust or improve the second parameter by modifying the first parameter or other parameters, thereby improving the computation effect of the deep learning framework.
Optionally, the deep learning framework further includes a carrier, and the method further includes: the container and the modules of the deep learning framework transfer data and interact through the carrier.
The carrier is a class or structure in the deep learning framework that is used for data transfer and interaction. The container is not directly connected to the other modules, so data is transferred through the carrier. For example, the carrier in the MXNet framework may be the operator context class OpContext. After the first parameter has been injected into the container, the first parameter can be assigned to the carrier, and the carrier then passes the first parameter to the modules of the deep learning framework. Likewise, the second parameter can also be transmitted from the modules of the deep learning framework to the container through the carrier.
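For illustration, such a carrier could be sketched as a lightweight context object that moves the parameters in both directions; the names below are assumptions, and this is not the actual MXNet OpContext interface:

```cpp
#include <vector>

// Hypothetical carrier sketch: ferries the first parameter from the container
// to a module and the second parameter back, so the container and the modules
// never touch each other directly.
struct ParamCarrier {
  int data_parallelism = 1;
  int model_parallelism = 1;

  // Copy the first parameter out of the container before the module runs.
  void LoadFirstParameter(const ParamContainer& container) {
    data_parallelism = container.data_parallelism;
    model_parallelism = container.model_parallelism;
  }

  // Write the second parameter produced by the module back into the container.
  void StoreSecondParameter(ParamContainer& container,
                            const std::vector<double>& cet,
                            double cets) const {
    container.channel_elapsed_time = cet;
    container.channel_elapsed_time_sum = cets;
  }
};
```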
Optionally, the artificial intelligence chip further includes a bottom-layer library module, and the method further includes: performing, through the carrier, the parameter transfer and interaction between the container and the bottom-layer library module, where the parameters include the first parameter and the second parameter.
Specifically, the bottom-layer library module includes the bottom-layer runtime library, the driver module, and so on. The parameters in these bottom-layer libraries may also affect the parallel performance or other performance of the deep learning framework, so the container can also exchange data with the bottom-layer library module through the carrier to obtain parallel-computation performance parameters or other performance parameters.
As it can be seen that in the embodiment of the present application, superstratum interface and deep learning frame are deployed in artificial intelligence chip, It include container in deep learning frame, container is connect with superstratum interface, and the first parameter is written superstratum interface first In container, then deep learning frame obtains the first parameter from container, in conjunction with the module of the first parameter and deep learning frame The second parameter of gain of parameter, and the second parameter is transmitted in container, last superstratum interface obtains the second ginseng from container It counts and is supplied to user.Because the first parameter is used to describe the degree of concurrence of deep learning frame, the second parameter is for monitoring simultaneously The performance of row operation, therefore this process is improved parallel in deep learning frame by the way that the first parameter is written into container Operation effect improves the monitoring property of concurrent operation performance by counting and obtaining the second parameter.
Consistent with the above, please refer to Figure 2, which is a flow diagram of another parameter processing method provided by an embodiment of the present application. As shown in Figure 2, the parameter processing method includes:
201. Create a parameter data field related to the artificial intelligence chip in the container, where the parameter data field relates to the first parameter and the second parameter;
202. The upper-layer language interface injects the first parameter into the container, wherein the first parameter is used to describe the degree of parallelism of the deep learning framework;
203. The deep learning framework further includes a carrier; the deep learning framework obtains the first parameter from the container and interacts the first parameter with module data of the deep learning framework through the carrier to obtain the second parameter;
204. The deep learning framework transmits the second parameter into the container through the carrier, wherein the second parameter is used to monitor the performance of parallel computation;
205. The artificial intelligence chip further includes a bottom-layer library module; the container and the bottom-layer library module transfer parameters and interact through the carrier, where the parameters include the first parameter and the second parameter.
For the specific description of the above steps 201-205, reference may be made to the corresponding description of the parameter processing method in steps 111-113, which is not repeated here.
It can be seen that, in the embodiments of the present application, a container is newly added in the deep learning framework, and the carrier then carries out the parameter interaction between the deep learning framework and the container and between the bottom-layer library module and the container. Because the first parameter is used to describe the degree of parallelism of the deep learning framework and the second parameter is used to monitor the performance of parallel computation, this process improves the parallel-computation effect in the deep learning framework by writing the first parameter into the container, and improves the monitorability of parallel-computation performance by counting and obtaining the second parameter.
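Purely as a usage illustration of the hypothetical sketches given above (not the patent's implementation), the overall flow of steps 201-205 could be exercised like this:

```cpp
#include <cstdio>

// Assumes the ParamContainer, InjectFirstParameter and RunChannelsAndRecord
// sketches from the earlier steps; in the carrier variant, a ParamCarrier
// would sit between the container and the module.
int main() {
  ParamContainer container;               // 201: container with parameter data fields
  InjectFirstParameter(container, 4, 2);  // 202: inject DP = 4, MP = 2 via the interface

  // 203/204: a module runs its parallel channels and records CET/CETS.
  RunChannelsAndRecord(container, [](int /*channel*/) {
    // operator work for one parallel channel would run here
  });

  // 205 and step 113: the upper-layer language interface reads the second parameter back.
  std::printf("CETS = %.3f ms over %zu channels\n",
              container.channel_elapsed_time_sum,
              container.channel_elapsed_time.size());
  return 0;
}
```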
Consistent with the above, please refer to Figure 3, which is a flow diagram of another parameter processing method provided by an embodiment of the present application. As shown in Figure 3, the parameter processing method includes:
301. Set the data parallelism degree, which is used to describe the maximum number of parallel executions when different kernels process different parts of the data;
302. Set the model parallelism degree, which is used to describe the maximum number of parallel executions when an operator or a model is computed on multiple kernels;
303. Inject the data parallelism degree and/or the model parallelism degree into the container through the upper-layer language interface;
304. Transmit the data parallelism degree to a module of the deep learning framework for data interaction, and obtain the CET and CETS corresponding to the data parallelism degree, wherein the CET and the CETS are used to count the computation time of operators;
305. Transmit the model parallelism degree to a module of the deep learning framework for data interaction, and obtain the corresponding CET and CETS;
306. Transmit the CET and CETS corresponding to the data parallelism degree and/or the model parallelism degree into the container;
307. The upper-layer language interface obtains, from the container, the CET and CETS corresponding to the data parallelism degree and/or the model parallelism degree.
For the specific description of the above steps 301-307, reference may be made to the corresponding description of the parameter processing method in steps 111-113, which is not repeated here.
It can be seen that, in the embodiments of the present application, a container is newly added in the deep learning framework, and the carrier then carries out the parameter interaction between the deep learning framework and the container and between the bottom-layer library module and the container. By setting the data parallelism degree and/or the model parallelism degree, the parallel-computation effect in the deep learning framework is improved; by counting and obtaining the second parameter, that is, by obtaining the CET and CETS, the monitorability of parallel-computation performance is improved.
Please refer to Figure 4, which shows a parameter processing apparatus provided by an embodiment of the present application, applied to the artificial intelligence chip shown in Figure 1A. As shown in Figure 4, the parameter processing apparatus 400 includes:
a writing module 401, configured to write the first parameter into the container through the upper-layer language interface, wherein the first parameter is used to describe the degree of parallelism of the deep learning framework;
a computing module 402, configured to obtain the first parameter from the container through the deep learning framework, interact the first parameter with the data of the modules of the deep learning framework to obtain the second parameter, and transmit the second parameter into the container, wherein the second parameter is used to monitor the performance of parallel computation;
an obtaining module 403, configured to obtain the second parameter from the container through the upper-layer language interface.
For the specific description of the above parameter processing apparatus, reference may be made to the corresponding description of the parameter processing method in steps 111-113, which is not repeated here.
It can be seen that, with the parameter processing apparatus of the embodiments of the present application, the upper-layer language interface first writes the first parameter into the container; the deep learning framework then obtains the first parameter from the container, combines the first parameter with the module parameters of the deep learning framework to obtain the second parameter, and transmits the second parameter into the container; finally, the upper-layer language interface obtains the second parameter from the container and provides it to the user. Because the first parameter is used to describe the degree of parallelism of the deep learning framework and the second parameter is used to monitor the performance of parallel computation, this process improves the parallel-computation effect in the deep learning framework by writing the first parameter into the container, and improves the monitorability of parallel-computation performance by counting and obtaining the second parameter.
In an optional embodiment, the writing module is further configured such that:
the container includes a parameter data field, and the parameter data field is used to point to the first parameter and the second parameter.
In an optional embodiment, the first parameter includes a data parallelism degree and a model parallelism degree.
In an optional embodiment, the second parameter is a channel elapsed time and a channel elapsed time sum.
In an optional embodiment, the computing module is specifically configured to:
transmit the data parallelism degree to a module of the deep learning framework for data interaction, and obtain the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree, wherein the CET and the CETS are used to count the computation time of operators;
transmit the model parallelism degree to a module of the deep learning framework for data interaction, and obtain the corresponding CET and CETS.
In an optional embodiment, the deep learning framework is the MXNet deep learning framework.
In an optional embodiment, the deep learning framework further includes a carrier, and the computing module is further configured to:
perform, through the carrier, the parameter transfer and interaction between the container and the modules of the deep learning framework, where the parameters include the first parameter and the second parameter.
In an optional embodiment, the artificial intelligence chip further includes a bottom-layer library module, and the computing module is further configured to:
perform, through the carrier, the parameter transfer and interaction between the container and the bottom-layer library module, where the parameters include the first parameter and the second parameter.
In an optional embodiment, the container includes a native class or structure in the deep learning framework, or a class or structure created independently in the deep learning framework for the artificial intelligence chip.
The present application also discloses a combined processing device, which includes the above parameter processing apparatus, a general interconnection interface, and other processing devices. The parameter processing apparatus interacts with the other processing devices to jointly complete the operation specified by the user. Figure 5 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the parameter processing apparatus and external data and control, including data transfer, and complete basic control of the parameter processing apparatus such as starting and stopping; the other processing devices may also cooperate with the parameter processing apparatus to jointly complete computation tasks.
The general interconnection interface is used to transmit data and control instructions between the parameter processing apparatus and the other processing devices. The parameter processing apparatus obtains the required input data from the other processing devices and writes it into the on-chip storage of the parameter processing apparatus; it may obtain control instructions from the other processing devices and write them into an on-chip control cache of the parameter processing apparatus; and it may also read the data in the storage module of the parameter processing apparatus and transmit it to the other processing devices.
Optionally, as shown in the structure of Figure 6, a storage device may also be included, and the storage device is connected to the parameter processing apparatus and the other processing devices respectively. The storage device is used to store data of the parameter processing apparatus and the other processing devices, and is particularly suitable for data required for computation that cannot be completely stored in the internal storage of the parameter processing apparatus or the other processing devices.
The combined processing device can serve as the on-chip system (SoC) of devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing the processing speed, and reducing the overall power consumption. In this case, the general interconnection interface of the combined processing device is connected to certain components of the device, such as a camera, a display, a mouse, a keyboard, a network card, or a WiFi interface.
In some embodiments, a chip is also claimed, which includes the above parameter processing apparatus.
In some embodiments, a chip packaging structure is claimed, which includes the above chip.
In some embodiments, a board card is claimed, which includes the above chip packaging structure. Referring to Figure 7, Figure 7 provides a board card; in addition to the above chip, the board card may also include other supporting components, including but not limited to: a storage device 710, an interface device 720, and a control device 730.
The storage device 710 is connected to the chip in the chip packaging structure through a bus and is used to store data. The storage device may include multiple groups of storage units 711. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without raising the clock frequency. DDR allows data to be read on both the rising edge and the falling edge of the clock pulse, so the speed of DDR is twice that of standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group of storage units may include multiple DDR4 particles (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 particles are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
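As a rough check of that figure (an editorial note, not part of the patent text): a DDR4-3200 interface performs 3200 MT/s, and each transfer carries 64 bits = 8 bytes of payload, so 3200 MT/s × 8 B = 25600 MB/s of theoretical bandwidth; the remaining 8 bits of the 72-bit interface carry ECC rather than payload.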
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the chip and is used to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used to realize data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface, and the data to be processed is transmitted from a server to the chip through the standard PCIe interface to realize the data transfer. Preferably, when PCIe 3.0 x16 is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present application does not limit the specific form of such other interfaces, as long as the interface unit can realize the transfer function. In addition, the computation result of the chip is still sent back to the external device (such as a server) by the interface device.
The control device is electrically connected to the chip. The control device is used to monitor the state of the chip. Specifically, the chip may be electrically connected to the control device through an SPI interface. The control device may include a micro controller unit (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits and can drive multiple loads, so the chip can be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is claimed, which includes the above board card.
The electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound scanner, and/or an electrocardiograph.
It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are optional embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the related hardware. The program may be stored in a computer-readable memory, and the memory may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only used to help understand the method of the present application and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementations and the application scope according to the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (12)

1. A parameter processing method, characterized in that the method is applied to an artificial intelligence chip, an upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is a class or structure for storing parameters and is connected to the upper-layer language interface, the method comprising:
the upper-layer language interface injecting a first parameter into the container, wherein the first parameter is used to describe the degree of parallelism of the deep learning framework;
the deep learning framework obtaining the first parameter from the container, interacting the first parameter with module data of the deep learning framework to obtain a second parameter, and transmitting the second parameter into the container, wherein the second parameter is used to monitor the parallel-computation performance of the deep learning framework described by the first parameter;
the upper-layer language interface obtaining the second parameter from the container.
2. The method according to claim 1, characterized in that the method further comprises:
the container includes a parameter data field, and the parameter data field is used to point to the first parameter and the second parameter.
3. The method according to claim 1 or 2, characterized in that the first parameter includes a data parallelism degree and a model parallelism degree.
4. The method according to claim 3, characterized in that the second parameter includes a channel elapsed time and a channel elapsed time sum.
5. The method according to claim 4, characterized in that interacting the first parameter with the module data of the deep learning framework to obtain the second parameter comprises:
transmitting the data parallelism degree to a module of the deep learning framework for data interaction, and obtaining the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree, wherein the CET and the CETS are used to count the computation time of operators;
transmitting the model parallelism degree to a module of the deep learning framework for data interaction, and obtaining the corresponding CET and CETS.
6. The method according to any one of claims 1-5, characterized in that the deep learning framework is the MXNet deep learning framework.
7. The method according to any one of claims 1-6, characterized in that the deep learning framework further includes a carrier, and the method further comprises:
performing, through the carrier, the parameter transfer and interaction between the container and the modules of the deep learning framework, wherein the parameters include the first parameter and the second parameter.
8. The method according to claim 7, characterized in that the artificial intelligence chip further includes a bottom-layer library module, and the method further comprises:
performing, through the carrier, the parameter transfer and interaction between the container and the bottom-layer library module, wherein the parameters include the first parameter and the second parameter.
9. The method according to any one of claims 1-8, characterized in that the container includes a native class or structure in the deep learning framework, or a class or structure created independently in the deep learning framework for the artificial intelligence chip.
10. A parameter processing apparatus, characterized in that the apparatus is applied to an artificial intelligence chip, an upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is a class or structure for storing parameters and is connected to the upper-layer language interface, the apparatus comprising:
a writing module, configured to write a first parameter into the container through the upper-layer language interface, wherein the first parameter is used to describe the degree of parallelism of the deep learning framework;
a computing module, configured to obtain the first parameter from the container through the deep learning framework, interact the first parameter with the data of the modules of the deep learning framework to obtain a second parameter, and transmit the second parameter into the container, wherein the second parameter is used to monitor the performance of parallel computation;
an obtaining module, configured to obtain the second parameter from the container through the upper-layer language interface.
11. An electronic device, characterized by comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for executing the steps in the method according to any one of claims 1-9.
12. A computer-readable storage medium, characterized in that it stores a computer program for electronic data interchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-9.
CN201811570061.6A 2018-12-21 2018-12-21 Parameter processing method and related product Active CN109739514B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811570061.6A CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product
PCT/CN2019/087631 WO2020124948A1 (en) 2018-12-21 2019-05-20 Network offline model processing method, artificial intelligence processing device, and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811570061.6A CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product

Publications (2)

Publication Number Publication Date
CN109739514A true CN109739514A (en) 2019-05-10
CN109739514B CN109739514B (en) 2021-03-02

Family

ID=66360837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811570061.6A Active CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product

Country Status (1)

Country Link
CN (1) CN109739514B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112114931A (en) * 2019-06-21 2020-12-22 鸿富锦精密电子(天津)有限公司 Deep learning program configuration method and device, electronic equipment and storage medium
CN112860424A (en) * 2019-11-28 2021-05-28 上海商汤智能科技有限公司 Task processing method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156851A (en) * 2016-06-24 2016-11-23 科大讯飞股份有限公司 The accelerator pursued one's vocational study towards the degree of depth and method
CN107480789A (en) * 2017-08-07 2017-12-15 北京中星微电子有限公司 The efficient conversion method and device of a kind of deep learning model
CN107844371A (en) * 2017-10-12 2018-03-27 北京京东尚科信息技术有限公司 Task processing method, system and electronic equipment
CN108229258A (en) * 2016-12-21 2018-06-29 田文洪 A kind of face parallelism recognition method based on deep learning and Spark
CN108921210A (en) * 2018-06-26 2018-11-30 南京信息工程大学 A kind of cloud classification method based on convolutional neural networks
CN109032671A (en) * 2018-06-25 2018-12-18 电子科技大学 A kind of distributed deep learning method and system based on data parallel strategy
CN109034386A (en) * 2018-06-26 2018-12-18 中国科学院计算机网络信息中心 A kind of deep learning system and method based on Resource Scheduler
US20180365562A1 (en) * 2017-06-20 2018-12-20 Battelle Memorial Institute Prediction of social media postings as trusted news or as types of suspicious news
CN110110621A (en) * 2019-04-23 2019-08-09 安徽大学 The oblique photograph point cloud classifications method of deep learning model is integrated based on multiple features

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156851A (en) * 2016-06-24 2016-11-23 科大讯飞股份有限公司 The accelerator pursued one's vocational study towards the degree of depth and method
CN108229258A (en) * 2016-12-21 2018-06-29 田文洪 A kind of face parallelism recognition method based on deep learning and Spark
US20180365562A1 (en) * 2017-06-20 2018-12-20 Battelle Memorial Institute Prediction of social media postings as trusted news or as types of suspicious news
CN107480789A (en) * 2017-08-07 2017-12-15 北京中星微电子有限公司 The efficient conversion method and device of a kind of deep learning model
CN107844371A (en) * 2017-10-12 2018-03-27 北京京东尚科信息技术有限公司 Task processing method, system and electronic equipment
CN109032671A (en) * 2018-06-25 2018-12-18 电子科技大学 A kind of distributed deep learning method and system based on data parallel strategy
CN108921210A (en) * 2018-06-26 2018-11-30 南京信息工程大学 A kind of cloud classification method based on convolutional neural networks
CN109034386A (en) * 2018-06-26 2018-12-18 中国科学院计算机网络信息中心 A kind of deep learning system and method based on Resource Scheduler
CN110110621A (en) * 2019-04-23 2019-08-09 安徽大学 The oblique photograph point cloud classifications method of deep learning model is integrated based on multiple features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PANDSU: "Summary of hyperparameter tuning for deep learning model training (深度学习 模型训练超参数调整总结)", 《HTTPS://BLOG.CSDN.NET/M0_37167788/ARTICLE/DETAILS/84059452?UTM_MEDIUM=DISTRIBUTE.PC_AGGPAGE_SEARCH_RESULT.NONE-TASK-BLOG-2~ALL~FIRST_RANK_V2~RANK_V25-3-84059452.NONECASE&UTM_TERM》 *
YANG, Nan: "Research on convolutional neural networks based on the Caffe deep learning framework (基于Caffe深度学习框架的卷积神经网络研究)", 《China Master's Theses Full-text Database, Information Science and Technology (中国优秀硕士论文全文数据库 信息科技辑)》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112114931A (en) * 2019-06-21 2020-12-22 鸿富锦精密电子(天津)有限公司 Deep learning program configuration method and device, electronic equipment and storage medium
CN112114931B (en) * 2019-06-21 2023-12-26 富联精密电子(天津)有限公司 Deep learning program configuration method and device, electronic equipment and storage medium
CN112860424A (en) * 2019-11-28 2021-05-28 上海商汤智能科技有限公司 Task processing method and system

Also Published As

Publication number Publication date
CN109739514B (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN109740751A (en) The framework fusion method and relevant apparatus of neural network model
CN106951926B (en) Deep learning method and device of hybrid architecture
CN109284815A (en) Neural network model algorithm Compilation Method, device and Related product
CN109543825A (en) Neural network model algorithm Compilation Method, device and Related product
CN109739514A (en) Parameter processing method and Related product
CN110147249A (en) A kind of calculation method and device of network model
CN110163349A (en) A kind of calculation method and device of network model
CN110059809A (en) A kind of computing device and Related product
CN109740746A (en) Operation method, device and Related product
CN111813449A (en) Operation method, device and related product
CN109740730A (en) Operation method, device and Related product
CN111949318B (en) Instruction processing method and device and related products
CN111382856B (en) Data processing device, method, chip and electronic equipment
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN112396169B (en) Operation method, device, computer equipment and storage medium
CN112396186B (en) Execution method, execution device and related product
CN111339060B (en) Operation method, device, computer equipment and storage medium
CN219304904U (en) Three-dimensional vision development board and vision perception equipment
CN111062469B (en) Computing device and related product
CN109543835A (en) Operation method, device and Related product
CN111723920A (en) Artificial intelligence computing device and related products
CN110020720A (en) Operator joining method and device
CN112394985B (en) Execution method, execution device and related product
US11983535B2 (en) Artificial intelligence computing device and related product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant