CN109739514A - Parameter processing method and related product - Google Patents
- Publication number: CN109739514A (application CN201811570061.6A)
- Authority: CN (China)
- Prior art keywords: parameter, deep learning framework, container, module
- Prior art date: 2018-12-21
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The disclosure provides a parameter processing method and a related product, applied to an artificial intelligence chip. An upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip. The deep learning framework includes a container; the container is a class or structure for storing parameters and is connected to the upper-layer language interface. The method includes: the upper-layer language interface writes a first parameter into the container; the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transfers the second parameter into the container; the upper-layer language interface obtains the second parameter from the container. By writing the first parameter into the container, the embodiments of the present application improve the parallel computing effect of the deep learning framework, and by computing and obtaining the second parameter, they improve the ability to monitor parallel computing performance.
Description
Technical field
The disclosure relates to the field of artificial intelligence, and in particular to a parameter processing method and a related product.
Background
With the development of the artificial intelligence industry, more and more deep learning frameworks are being developed and adopted. When developing with the artificial intelligence chips that accompany a deep learning framework, users usually need to set certain parameters of the framework to achieve a better computing effect, or to read certain parameters from the framework in order to monitor its operating status.
At present, deep learning frameworks provide no parameter-setting mechanism or channel aimed at artificial intelligence chips, so users cannot set parameters for, or obtain operation-related data from, an artificial intelligence chip. How to improve this situation has become an urgent problem to be solved.
Disclosure
In view of this, the disclosure aims to provide a parameter processing method and a related product. A container is newly added, a first parameter describing the degree of parallelism of a deep learning framework is written into the container, and the first parameter in the container is then combined with other modules of the deep learning framework to obtain a second parameter used to monitor parallel computing performance. This improves the computing effect of the deep learning framework while making parallel computing performance monitorable.
To solve the above technical problem, a first aspect of the embodiments of the present invention provides a parameter processing method applied to an artificial intelligence chip. An upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper-layer language interface. The method comprises:
the upper-layer language interface injects a first parameter into the container, wherein the first parameter describes the degree of parallelism of the deep learning framework;
the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transfers the second parameter into the container, wherein the second parameter is used to monitor the parallel computing performance of the deep learning framework described by the first parameter, and the container is a class or structure for storing parameters;
the upper-layer language interface obtains the second parameter from the container.
Optionally, before the upper-layer language interface writes the first parameter into the container, the method further includes: providing a parameter data field in the container, wherein the parameter data field points to the first parameter and the second parameter.
Optionally, the first parameter includes a data parallelism degree and a model parallelism degree.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, interacting the first parameter with the module data of the deep learning framework to obtain the second parameter comprises:
transferring the data parallelism degree to a module of the deep learning framework for data interaction, and obtaining the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree, wherein the CET and the CETS are used to measure the computation time of operators;
transferring the model parallelism degree to a module of the deep learning framework for data interaction, and obtaining the CET and CETS corresponding to the model parallelism degree.
Optionally, the deep learning framework is the MXNet deep learning framework.
Optionally, the deep learning framework further includes a carrier, and the method further includes:
transferring and interacting parameters between the container and the modules of the deep learning framework through the carrier, wherein the parameters include the first parameter and the second parameter.
Optionally, the artificial intelligence chip further includes a bottom-layer library module, and the method further includes:
transferring and interacting parameters between the container and the bottom-layer library module through the carrier, wherein the parameters include the first parameter and the second parameter.
Optionally, the container includes a native class or structure of the deep learning framework, or a class or structure created independently in the deep learning framework for the artificial intelligence chip.
A second aspect of the embodiments of the present invention provides a parameter processing apparatus applied to an artificial intelligence chip. An upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper-layer language interface. The apparatus includes:
a writing module, configured to write a first parameter into the container through the upper-layer language interface, wherein the first parameter describes the degree of parallelism of the deep learning framework;
a computing module, configured to obtain the first parameter from the container through the deep learning framework, interact the first parameter with the data of the modules of the deep learning framework to obtain a second parameter, and transfer the second parameter into the container, wherein the second parameter is used to monitor the performance of parallel computing, and the container is a class or structure for storing parameters;
an obtaining module, configured to obtain the second parameter from the container through the upper-layer language interface.
A third aspect of the embodiments of the present invention provides a chip including the parameter processing apparatus provided in the second aspect.
A fourth aspect of the embodiments of the present invention provides a chip package structure including the chip of the third aspect.
A fifth aspect of the embodiments of the present invention provides a board card including the chip package structure of the fourth aspect.
In a sixth aspect, the embodiments of the present application provide an electronic device including the chip package structure of the fourth aspect or the board card of the fifth aspect.
A seventh aspect of the embodiments of the present invention provides a storage medium for storing a computer program for electronic data interchange, wherein the computer program causes a computer to execute the steps of the method of the first aspect.
It can be seen that, in the parameter processing method disclosed in the embodiments of the present application, an upper-layer language interface and a deep learning framework are deployed on an artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper-layer language interface. First, the upper-layer language interface writes the first parameter into the container; then, the deep learning framework obtains the first parameter from the container, combines the first parameter with the module parameters of the deep learning framework to obtain the second parameter, and transfers the second parameter into the container; finally, the upper-layer language interface obtains the second parameter from the container and supplies it to the user. Because the first parameter describes the degree of parallelism of the deep learning framework and the second parameter is used to monitor the performance of parallel computing, this process improves the parallel computing effect of the deep learning framework by writing the first parameter into the container, and improves the ability to monitor parallel computing performance by computing and obtaining the second parameter.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Figure 1A shows an artificial intelligence chip provided by an embodiment of the present application.
Figure 1B is a schematic flowchart of a parameter processing method provided by an embodiment of the present application.
Figure 2 is a schematic flowchart of another parameter processing method provided by an embodiment of the present application.
Figure 3 is a schematic flowchart of another parameter processing method provided by an embodiment of the present application.
Figure 4 shows a parameter processing apparatus provided by an embodiment of the present application.
Figure 5 is a schematic diagram of a combined processing device provided by an embodiment of the present application.
Figure 6 is a structural diagram of another combined processing device provided by an embodiment of the present application.
Figure 7 is a structural schematic diagram of a board card provided by an embodiment of the present application.
Specific embodiments
To make the purposes, technical solutions and advantages of the disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
"Embodiment" herein means that a particular feature, structure or characteristic described in conjunction with an embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the description do not necessarily all refer to the same embodiment, nor to independent or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
Please refer to Figure 1A, which shows an artificial intelligence chip provided by an embodiment of the present application. As shown in Figure 1A, the artificial intelligence chip 10 includes an upper-layer language interface 101 and a deep learning framework 100. The upper-layer language interface is used by a programming language to access the deep learning framework. The deep learning framework includes a container and other framework modules; the container can exchange data with the modules of the deep learning framework, which include a graph executor module, the operator modules, an engine module, and so on. Optionally, the upper-layer language interface 101 may also be deployed on another chip or device, which is connected to the artificial intelligence chip and exchanges information with it. In addition, the artificial intelligence chip 10 may also include a bottom-layer library module 102, which includes a bottom-layer runtime library, a driver module, and so on. The deep learning framework 100 further includes a carrier for transferring data between the container and the other modules of the deep learning framework or the bottom-layer library module.
Please refer to Figure 1B, which is a schematic flowchart of a parameter processing method disclosed in an embodiment of the present application. The parameter processing method is applied to the artificial intelligence chip shown in Figure 1A. As shown in Figure 1B, the method specifically includes the following steps:
111. The upper-layer language interface writes a first parameter into the container, wherein the first parameter describes the degree of parallelism of the deep learning framework.
A deep learning framework is the code skeleton used to carry out a deep learning project; currently popular deep learning frameworks include TensorFlow, Caffe, Theano, MXNet, Torch and PyTorch. An interface is the boundary across which two independent components of a system exchange information. The upper-layer language and the deep learning framework are two separate components, so an interface exists between them for exchanging information. The upper layer of deep learning can use languages such as Python or R. Normally, the upper-layer language interface is connected directly to the deep learning framework, but this interface lacks a parameter-setting mechanism, so the user cannot set parameters on, or obtain parameters from, the artificial intelligence chip. Therefore, a container is newly added below the upper-layer language interface for setting parameters and obtaining related data. The parameter data fields used for parameter setting and parameter acquisition can be added inside the container, or they can be added in other modules while the container is designated as the location where parameters are set and obtained.
The container is a class or structure for storing data and is itself a module of the deep learning framework. The container in the deep learning framework can be a native class or structure of the deep learning framework, such as the graph executor class, to which fields for parameter setting and parameter acquisition are then added. Alternatively, the container in the deep learning framework can be a class or structure created independently by the user for the parameter processing method on the artificial intelligence chip, such as an mludevice device class dedicated to the fields for parameter setting and parameter acquisition.
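As a minimal illustrative sketch (the structure name MLUDeviceParams and its field names are assumptions made for illustration, not the actual framework or chip API), such a container could be a plain structure holding the newly added parameter data fields:

```cpp
#include <cstdint>

// Hypothetical container: a plain structure holding the newly added
// parameter data fields; all names here are illustrative assumptions.
struct MLUDeviceParams {
  // First parameter: written into the container by the upper-layer language interface.
  int data_parallelism  = 1;   // data parallelism degree (DP)
  int model_parallelism = 1;   // model parallelism degree (MP)

  // Second parameter: written back by the framework modules.
  int64_t channel_elapsed_time_us  = 0;  // CET of the most recent run, in microseconds
  int64_t channel_elapsed_time_sum = 0;  // CETS accumulated across runs
};
```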
Optionally, the method further includes: the container includes a parameter data field, and the parameter data field points to the first parameter and the second parameter.
Specifically, before the parameter data field is created in the container, no data field related to the first parameter and the second parameter exists anywhere on the artificial intelligence chip, so the first parameter cannot be set and the second parameter cannot be obtained. Creating a parameter data field related to the first parameter and the second parameter in the container indicates how the first parameter and the second parameter are obtained, how they interact with other modules or interfaces, where the data is stored, and so on, and also makes the first parameter and the second parameter easier to manage. Alternatively, the parameter data field can be created elsewhere while the data itself is still stored by the container.
Optionally, the first parameter includes a data parallelism degree and a model parallelism degree.
Optionally, the deep learning framework in this embodiment is the MXNet deep learning framework.
Data parallelism (DP) means that different kernels or processing units process data in parallel; the data parallelism degree is the maximum number of parallel executions when data is processed in parallel. Model parallelism (MP) means that one operator or model is processed in parallel on multiple kernels; the model parallelism degree is the maximum number of parallel executions when a model or operator is processed in parallel. When the MXNet deep learning framework runs on an artificial intelligence chip, the amount of computation is huge; to reduce computation time and improve efficiency, DP or MP, or both kinds of parallel computation, must be used. To achieve a better computing effect, the data parallelism degree and the model parallelism degree need to be configured: on the one hand, the configured parallelism parameters must match the hardware of the artificial intelligence chip; on the other hand, different parallelism parameters are needed when the scale, sparsity or other characteristics of the input data differ. The configured data parallelism degree and/or model parallelism degree are written through the programming language and then injected into the container through the upper-layer language interface, which completes the setting of the first parameter.
MXNet is a deep learning framework that supports languages such as C++, Python, R, Scala, Julia, MATLAB and JavaScript, supports both imperative and symbolic programming, and can run on a wide range of hardware including artificial intelligence chips; it is one of the classic deep learning frameworks in use today. The MXNet deep learning framework therefore combines well with the method of the embodiments of the present application to complete the setting of the first parameter and the acquisition of the second parameter.
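As a hedged sketch of step 111 (building on the hypothetical MLUDeviceParams container above; the setter name and its C-style signature are assumptions, not part of the real MXNet or chip API), an upper-layer language binding could write the first parameter into the container as follows:

```cpp
#include <memory>

// Hypothetical container instance owned by the framework.
std::shared_ptr<MLUDeviceParams> g_params = std::make_shared<MLUDeviceParams>();

// Hypothetical entry point that an upper-layer language binding (for example,
// a Python extension module) could call to write the first parameter.
extern "C" int MLUSetParallelism(int data_parallelism, int model_parallelism) {
  if (data_parallelism < 1 || model_parallelism < 1) return -1;  // reject invalid settings
  g_params->data_parallelism  = data_parallelism;
  g_params->model_parallelism = model_parallelism;
  return 0;  // the first parameter now resides in the container
}
```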
112. The deep learning framework obtains the first parameter from the container, interacts the first parameter with the module data of the deep learning framework to obtain a second parameter, and transfers the second parameter into the container, wherein the second parameter is used to monitor the parallel computing performance of the deep learning framework described by the first parameter.
After the first parameter is set and injected into the container, a module of the deep learning framework obtains the first parameter from the container. The modules of the deep learning framework include the graph executor module, the operator modules, the engine module, and so on. For example, if an operator module needs to compute in parallel, it obtains the first parameter and then combines the first parameter with other parameters in the operator module, such as the data size, to obtain the second parameter. The second parameter is a parameter for monitoring parallel computing performance, and the second parameter so obtained must be passed back into the container.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, interacting the first parameter with the module data of the deep learning framework to obtain the second parameter includes: transferring the data parallelism degree to a module of the deep learning framework for data interaction, and obtaining the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree; transferring the model parallelism degree to a module of the deep learning framework for data interaction, and obtaining the CET and CETS corresponding to the model parallelism degree, wherein the CET and the CETS are used to measure the computation time of operators.
Specifically, when the deep learning framework uses DP or MP, there are multiple parallel channels. The channel elapsed time (Channel Elapsed Time, CET) and the channel elapsed time sum (Channel Elapsed Time Sum, CETS) are both performance parameters describing how the multiple parallel channels perform parallel computation, and are used to measure the computation time of operators. Transferring into the container the second parameter obtained from the first parameter and the modules of the deep learning framework, whether for a single module or for the entire deep learning framework, completes the acquisition of the second parameter.
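The disclosure does not fix how CET and CETS are measured. As one hedged sketch (the helper below and its timing strategy are assumptions, built on the hypothetical MLUDeviceParams container above), an operator module could time each parallel channel and accumulate the result into the container:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper wrapped around the execution of one parallel channel:
// it records the channel elapsed time (CET) for this run and accumulates the
// channel elapsed time sum (CETS) in the container.
template <typename ChannelFn>
void RunChannelTimed(MLUDeviceParams* params, ChannelFn&& run_channel) {
  auto start = std::chrono::steady_clock::now();
  run_channel();  // execute the operator's work on this channel
  auto end = std::chrono::steady_clock::now();

  int64_t cet_us =
      std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
  params->channel_elapsed_time_us   = cet_us;   // CET: latest channel time
  params->channel_elapsed_time_sum += cet_us;   // CETS: running total
}
```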
113. The upper-layer language interface obtains the second parameter from the container.
The upper-layer language interface can obtain the second parameter from the container and expose it, so that the second parameter becomes visible to the user. The user can monitor the running performance of the deep learning framework through the second parameter, and can then adjust or improve the second parameter by modifying the first parameter or other parameters, improving the computing effect of the deep learning framework.
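A matching hedged sketch of step 113, mirroring the hypothetical MLUSetParallelism setter above (again, the function name and signature are assumptions rather than an existing API):

```cpp
#include <cstdint>

// Hypothetical getter exposed through the upper-layer language interface:
// it reads the second parameter back out of the container for the user.
extern "C" int MLUGetChannelTimes(int64_t* cet_us, int64_t* cets_us) {
  if (cet_us == nullptr || cets_us == nullptr) return -1;
  *cet_us  = g_params->channel_elapsed_time_us;   // CET read from the container
  *cets_us = g_params->channel_elapsed_time_sum;  // CETS read from the container
  return 0;  // the second parameter is now visible to the user
}
```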
Optionally, the deep learning framework further includes a carrier, and the method further includes: the container and the modules of the deep learning framework transfer data to each other through the carrier.
The carrier is the class or structure used inside the deep learning framework for data transfer. The container is not directly linked to the other modules of the deep learning framework and transfers data to them through the carrier. For example, the carrier in the MXNet framework can be the operator context class OpContext: after the first parameter is injected into the container, the container assigns the first parameter to the carrier, and the carrier passes the first parameter on to the modules of the deep learning framework. Likewise, the second parameter can be transferred by the carrier from the modules of the deep learning framework back to the container.
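As a hedged sketch of the carrier idea (a simplified stand-in named OpLikeContext is used instead of MXNet's real OpContext, and its single field is an assumption made only to illustrate the hand-off), the carrier can simply hold a reference to the container so that operator modules can read the first parameter and write the second parameter back:

```cpp
// Simplified stand-in for an operator context acting as the carrier; the
// params field is illustrative and not a field of MXNet's actual OpContext.
struct OpLikeContext {
  MLUDeviceParams* params = nullptr;  // the carrier carries a reference to the container
};

// Hypothetical operator entry point: it reads the first parameter through the
// carrier, runs the corresponding number of parallel channels, and lets
// RunChannelTimed (sketched above) write the second parameter back.
void RunOperator(const OpLikeContext& ctx) {
  int dp = ctx.params->data_parallelism;  // first parameter, read via the carrier
  for (int channel = 0; channel < dp; ++channel) {
    RunChannelTimed(ctx.params, [&] {
      // ... launch the operator's work for this parallel channel ...
    });
  }
  // CET and CETS (the second parameter) now sit in the container via ctx.params.
}
```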
Optionally, the artificial intelligence chip further includes a bottom-layer library module, and the method further includes: transferring and interacting parameters between the container and the bottom-layer library module through the carrier, wherein the parameters include the first parameter and the second parameter.
Specifically, the bottom-layer library module includes a bottom-layer runtime library, a driver module, and so on. Parameters in these bottom-layer libraries may also affect the parallel performance or other aspects of the performance of the deep learning framework, so the container can also exchange data with the bottom-layer library module through the carrier to obtain parallel computing performance parameters or other performance parameters.
It can be seen that, in the embodiments of the present application, an upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper-layer language interface. First, the upper-layer language interface writes the first parameter into the container; then, the deep learning framework obtains the first parameter from the container, combines the first parameter with the module parameters of the deep learning framework to obtain the second parameter, and transfers the second parameter into the container; finally, the upper-layer language interface obtains the second parameter from the container and supplies it to the user. Because the first parameter describes the degree of parallelism of the deep learning framework and the second parameter is used to monitor the performance of parallel computing, this process improves the parallel computing effect of the deep learning framework by writing the first parameter into the container, and improves the ability to monitor parallel computing performance by computing and obtaining the second parameter.
Consistent with the above, please refer to Figure 2, which is a schematic flowchart of another parameter processing method provided by an embodiment of the present application. As shown in Figure 2, the parameter processing method includes:
201. Create, in the container, the parameter data fields related to the artificial intelligence chip, wherein the parameter data fields relate to the first parameter and the second parameter;
202. The upper-layer language interface injects the first parameter into the container, wherein the first parameter describes the degree of parallelism of the deep learning framework;
203. The deep learning framework further includes a carrier; the deep learning framework obtains the first parameter from the container and interacts the first parameter with the module data of the deep learning framework through the carrier to obtain the second parameter;
204. The deep learning framework transfers the second parameter into the container through the carrier, wherein the second parameter is used to monitor the performance of parallel computing;
205. The artificial intelligence chip further includes a bottom-layer library module; the container and the bottom-layer library module transfer and interact parameters through the carrier, wherein the parameters include the first parameter and the second parameter.
The specific descriptions of steps 201 to 205 can refer to the corresponding descriptions of the parameter processing method in steps 111 to 113 and are not repeated here.
It can be seen that, in the embodiments of the present application, a container is newly added to the deep learning framework, and parameters are then exchanged through the carrier between the deep learning framework and the container as well as between the bottom-layer library module and the container. Because the first parameter describes the degree of parallelism of the deep learning framework and the second parameter is used to monitor the performance of parallel computing, this process improves the parallel computing effect of the deep learning framework by writing the first parameter into the container, and improves the ability to monitor parallel computing performance by computing and obtaining the second parameter.
Consistent with the above, please refer to Figure 3, which is a schematic flowchart of another parameter processing method provided by an embodiment of the present application. As shown in Figure 3, the parameter processing method includes:
301. Set the data parallelism degree, wherein the data parallelism degree describes the maximum number of parallel executions when different kernels process different parts of the data;
302. Set the model parallelism degree, wherein the model parallelism degree describes the maximum number of parallel executions when one operator or model is computed on multiple kernels;
303. Inject the data parallelism degree and/or the model parallelism degree into the container through the upper-layer language interface;
304. Transfer the data parallelism degree to a module of the deep learning framework for data interaction, and obtain the CET and CETS corresponding to the data parallelism degree, wherein the CET and the CETS are used to measure the computation time of operators;
305. Transfer the model parallelism degree to a module of the deep learning framework for data interaction, and obtain the CET and CETS corresponding to the model parallelism degree;
306. Transfer the CET and CETS corresponding to the data parallelism degree and/or the model parallelism degree into the container;
307. The upper-layer language interface obtains from the container the CET and CETS corresponding to the data parallelism degree and/or the model parallelism degree.
The specific descriptions of steps 301 to 307 can refer to the corresponding descriptions of the parameter processing method in steps 111 to 113 and are not repeated here.
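Tying steps 301 to 307 together, a minimal end-to-end sketch using the hypothetical helpers introduced in the earlier sketches (MLUSetParallelism, OpLikeContext, RunOperator and the g_params container; none of these names come from the actual chip or MXNet API):

```cpp
#include <cstdio>

int main() {
  // Steps 301-303: set the DP/MP degrees and inject the first parameter into the container.
  MLUSetParallelism(/*data_parallelism=*/4, /*model_parallelism=*/2);

  // Steps 304-306: a framework module reads the first parameter through the
  // carrier, runs the operator in parallel, and writes CET/CETS back.
  OpLikeContext ctx{g_params.get()};
  RunOperator(ctx);

  // Step 307: the upper-layer interface reads the second parameter from the container.
  std::printf("CET = %lld us, CETS = %lld us\n",
              static_cast<long long>(g_params->channel_elapsed_time_us),
              static_cast<long long>(g_params->channel_elapsed_time_sum));
  return 0;
}
```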
It can be seen that, in the embodiments of the present application, a container is newly added to the deep learning framework, and parameters are then exchanged through the carrier between the deep learning framework and the container as well as between the bottom-layer library module and the container. Setting the data parallelism degree and/or the model parallelism degree improves the parallel computing effect of the deep learning framework, and obtaining the CET and CETS as the second parameter improves the ability to monitor parallel computing performance.
Please refer to Figure 4, which shows a parameter processing apparatus provided by an embodiment of the present application, applied to the artificial intelligence chip shown in Figure 1A. As shown in Figure 4, the parameter processing apparatus 400 includes:
a writing module 401, configured to write the first parameter into the container through the upper-layer language interface, wherein the first parameter describes the degree of parallelism of the deep learning framework;
a computing module 402, configured to obtain the first parameter from the container through the deep learning framework, interact the first parameter with the data of the modules of the deep learning framework to obtain the second parameter, and transfer the second parameter into the container, wherein the second parameter is used to monitor the performance of parallel computing;
an obtaining module 403, configured to obtain the second parameter from the container through the upper-layer language interface.
The specific descriptions of the above parameter processing apparatus can refer to the corresponding descriptions of the parameter processing method in steps 111 to 113 and are not repeated here.
It can be seen that, with the parameter processing apparatus of the embodiments of the present application, the upper-layer language interface first writes the first parameter into the container; the deep learning framework then obtains the first parameter from the container, combines the first parameter with the module parameters of the deep learning framework to obtain the second parameter, and transfers the second parameter into the container; finally, the upper-layer language interface obtains the second parameter from the container and supplies it to the user. Because the first parameter describes the degree of parallelism of the deep learning framework and the second parameter is used to monitor the performance of parallel computing, this process improves the parallel computing effect of the deep learning framework by writing the first parameter into the container, and improves the ability to monitor parallel computing performance by computing and obtaining the second parameter.
In an alternative embodiment, the writing module is further configured to: provide a parameter data field in the container, wherein the parameter data field points to the first parameter and the second parameter.
In an alternative embodiment, the first parameter includes a data parallelism degree and a model parallelism degree.
In an alternative embodiment, the second parameter includes a channel elapsed time and a channel elapsed time sum.
In an alternative embodiment, the computing module is specifically configured to:
transfer the data parallelism degree to a module of the deep learning framework for data interaction and obtain the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree, wherein the CET and the CETS are used to measure the computation time of operators;
transfer the model parallelism degree to a module of the deep learning framework for data interaction and obtain the CET and CETS corresponding to the model parallelism degree.
In an alternative embodiment, the deep learning framework is the MXNet deep learning framework.
In an alternative embodiment, the deep learning framework further includes a carrier, and the computing module is further configured to:
transfer and interact the parameters between the container and the modules of the deep learning framework through the carrier, wherein the parameters include the first parameter and the second parameter.
In an alternative embodiment, the artificial intelligence chip further includes a bottom-layer library module, and the computing module is further configured to:
transfer and interact the parameters between the container and the bottom-layer library module through the carrier, wherein the parameters include the first parameter and the second parameter.
In an alternative embodiment, the container includes a native class or structure of the deep learning framework, or a class or structure created independently in the deep learning framework for the artificial intelligence chip.
The application also discloses a combined processing device, which includes the above parameter processing apparatus, a universal interconnection interface, and other processing devices. The parameter processing apparatus interacts with the other processing devices to jointly complete operations specified by the user. Figure 5 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors such as a central processing unit (CPU), a graphics processing unit (GPU) and a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the parameter processing apparatus and external data and control; they carry data and perform basic control of the parameter processing apparatus, such as starting and stopping, and they can also cooperate with the parameter processing apparatus to jointly complete computing tasks.
The universal interconnection interface is used to transmit data and control instructions between the parameter processing apparatus and the other processing devices. The parameter processing apparatus obtains the required input data from the other processing devices and writes them to the on-chip storage of the parameter processing apparatus; it can obtain control instructions from the other processing devices and write them to the on-chip control cache of the parameter processing apparatus; it can also read the data in a storage module of the parameter processing apparatus and transmit them to the other processing devices.
Optionally, as shown in Figure 6, the structure may also include a storage device, which is connected to the parameter processing apparatus and to the other processing devices respectively. The storage device is used to store data of the parameter processing apparatus and the other processing devices, and is particularly suitable for data whose required computations cannot be entirely held in the internal storage of the parameter processing apparatus or the other processing devices.
The combined processing device can serve as the system-on-chip (SoC) of devices such as mobile phones, robots, drones and video surveillance equipment, effectively reducing the die area of the control portion, improving processing speed and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the device, such as a camera, a display, a mouse, a keyboard, a network card or a Wi-Fi interface.
In some embodiments, a chip is also claimed, which includes the above parameter processing apparatus.
In some embodiments, a chip package structure is claimed, which includes the above chip.
In some embodiments, a board card is claimed, which includes the above chip package structure. Referring to Figure 7, Figure 7 provides a board card; in addition to the above chip, the board card may also include other supporting components, including but not limited to: a memory device 710, an interface device 720 and a control device 730.
The memory device 710 is connected by a bus to the chip in the chip package structure and is used for storing data. The memory device may include multiple groups of storage units 711, each group of which is connected to the chip by a bus. It can be understood that each group of storage units can be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR doubles the speed of SDRAM without raising the clock frequency: it allows data to be read on both the rising edge and the falling edge of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include four groups of storage units, and each group may include multiple DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in which 64 bits are used to transmit data and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s (3200 MT/s times 8 bytes of data per transfer).
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the chip and controls the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip package structure. The interface device is used to implement data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface: the data to be processed is transferred from a server to the chip through the standard PCIe interface to complete the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface, and the present application does not limit the specific form of that interface, as long as the interface unit can realize the transfer function. In addition, the computation results of the chip are transmitted back to the external device (such as the server) by the interface device.
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip can be electrically connected to the control device through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores or multiple processing circuits and can drive multiple loads, so the chip can be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is claimed, which includes the above board card.
The electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a drive recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance and/or a medical device.
The vehicle includes an airplane, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner and/or an electrocardiograph.
It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are alternative embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference can be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable memory, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc or the like.
The embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core ideas. At the same time, those skilled in the art may make changes to the specific implementations and application scope according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.
Claims (12)
1. A parameter processing method, characterized in that the method is applied to an artificial intelligence chip, an upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is a class or structure for storing parameters and is connected to the upper-layer language interface, the method comprising:
the upper-layer language interface injects a first parameter into the container, wherein the first parameter describes the degree of parallelism of the deep learning framework;
the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transfers the second parameter into the container, wherein the second parameter is used to monitor the parallel computing performance of the deep learning framework described by the first parameter;
the upper-layer language interface obtains the second parameter from the container.
2. The method according to claim 1, characterized in that the method further comprises:
the container includes a parameter data field, and the parameter data field points to the first parameter and the second parameter.
3. The method according to claim 1 or 2, characterized in that the first parameter includes a data parallelism degree and a model parallelism degree.
4. The method according to claim 3, characterized in that the second parameter includes a channel elapsed time and a channel elapsed time sum.
5. The method according to claim 4, characterized in that interacting the first parameter with the module data of the deep learning framework to obtain the second parameter comprises:
transferring the data parallelism degree to a module of the deep learning framework for data interaction, and obtaining the channel elapsed time (CET) and channel elapsed time sum (CETS) corresponding to the data parallelism degree, wherein the CET and the CETS are used to measure the computation time of operators;
transferring the model parallelism degree to a module of the deep learning framework for data interaction, and obtaining the CET and CETS corresponding to the model parallelism degree.
6. The method according to any one of claims 1-5, characterized in that the deep learning framework is the MXNet deep learning framework.
7. The method according to any one of claims 1-6, characterized in that the deep learning framework further includes a carrier, and the method further comprises:
transferring and interacting parameters between the container and the modules of the deep learning framework through the carrier, wherein the parameters include the first parameter and the second parameter.
8. The method according to claim 7, characterized in that the artificial intelligence chip further includes a bottom-layer library module, and the method further comprises:
transferring and interacting parameters between the container and the bottom-layer library module through the carrier, wherein the parameters include the first parameter and the second parameter.
9. The method according to any one of claims 1-8, characterized in that the container includes a native class or structure of the deep learning framework, or a class or structure created independently in the deep learning framework for the artificial intelligence chip.
10. A parameter processing apparatus, characterized in that the apparatus is applied to an artificial intelligence chip, an upper-layer language interface and a deep learning framework are deployed on the artificial intelligence chip, the deep learning framework includes a container, and the container is a class or structure for storing parameters and is connected to the upper-layer language interface, the apparatus comprising:
a writing module, configured to write a first parameter into the container through the upper-layer language interface, wherein the first parameter describes the degree of parallelism of the deep learning framework;
a computing module, configured to obtain the first parameter from the container through the deep learning framework, interact the first parameter with the data of the modules of the deep learning framework to obtain a second parameter, and transfer the second parameter into the container, wherein the second parameter is used to monitor the performance of parallel computing;
an obtaining module, configured to obtain the second parameter from the container through the upper-layer language interface.
11. An electronic device, characterized in that it includes a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the processor, and the programs include instructions for executing the steps of the method according to any one of claims 1-9.
12. A computer-readable storage medium, characterized in that it stores a computer program for electronic data interchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201811570061.6A | 2018-12-21 | 2018-12-21 | Parameter processing method and related product
PCT/CN2019/087631 | 2018-12-21 | 2019-05-20 | Network offline model processing method, artificial intelligence processing device, and related product
Publications (2)
Publication Number | Publication Date
---|---
CN109739514A | 2019-05-10
CN109739514B | 2021-03-02