CN116069340A - Automatic driving model deployment method, device, equipment and storage medium - Google Patents


Info

Publication number
CN116069340A
Authority
CN
China
Prior art keywords
deployment
model
algorithm model
operator
deployed
Prior art date
Legal status
Pending
Application number
CN202211114501.3A
Other languages
Chinese (zh)
Inventor
罗文彬
李梦真
苗迪
Current Assignee
Guoke Chushi Chongqing Software Co ltd
Original Assignee
Guoke Chushi Chongqing Software Co ltd
Priority date
Filing date
Publication date
Application filed by Guoke Chushi Chongqing Software Co., Ltd.
Priority to CN202211114501.3A
Publication of CN116069340A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 Graphs; Linked lists
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to an automatic driving model deployment method, device, equipment, and storage medium. A search space is constructed by acquiring a configuration file of the hardware platform to be deployed; for each computing node of the algorithm model to be deployed, its similarity to the corresponding correlation operators in the search space is determined; candidate operators are determined according to the similarity; and candidate operators corresponding to the computing nodes are selected and combined according to the computing node relationship graph to obtain a candidate deployment algorithm model. For a hardware platform on which an automatic driving algorithm model needs to be deployed, an automatic driving algorithm model adapted to that platform can be output automatically from the platform's hardware information. This addresses the problems in current model deployment work of high demands on the professional ability and experience of algorithm deployment engineers, low model deployment efficiency, and long cycles; it greatly reduces the automatic driving algorithm model deployment workload in the host factory's whole-vehicle development process and improves model deployment efficiency.

Description

Automatic driving model deployment method, device, equipment and storage medium
Technical Field
The disclosure relates to the field of automatic driving, and in particular to an automatic driving model deployment method, device, equipment, and storage medium.
Background
In the related art, vehicles, originally mere means of transportation, are gradually becoming intelligent; concepts such as the "intelligent cockpit" and "automatic driving" have been proposed one after another and are known to an ever-wider public. In the past, consumers buying a vehicle mainly weighed its hardware: power, chassis, suspension, electric control system, comfort, and the like. Today, a vehicle's level of intelligence is also an important factor in the purchase decision. Therefore, to better meet consumer demand, vehicle host manufacturers are each laying out their own automatic driving technologies to win consumer recognition and thereby improve the competitiveness and market share of their products.
When developing a whole-vehicle automatic driving scheme, a host manufacturer usually selects hardware according to its own requirements (such as supply chain and cost performance) as the hardware platform on which the automatic driving algorithm is deployed. Because hardware platforms differ in architecture and hardware conditions, the algorithm models they support may also differ. For an automatic driving algorithm model to adapt well to the hardware and achieve a good deployment effect, an algorithm deployment engineer must therefore be thoroughly familiar with the characteristics and strengths of the hardware platform, optimize the algorithm model accordingly, and deploy it onto the specific platform. This is very time-consuming and labor-intensive, and model deployment efficiency is low. A scheme is therefore needed that, for different hardware platforms, automatically adapts the automatic driving algorithm model to the corresponding platform, reducing development difficulty and improving model deployment efficiency.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides an automatic driving model deployment method, apparatus, device, and storage medium.
According to a first aspect of an embodiment of the present disclosure, there is provided an autopilot model deployment method, including:
acquiring a configuration file of a hardware platform to be deployed, wherein the configuration file comprises hardware information;
based on the hardware information, acquiring at least one matched correlation operator; constructing a search space based on the correlation operator;
acquiring a computing node relationship graph of an algorithm model to be deployed, wherein the algorithm model to be deployed is formed by combining a plurality of computing nodes and is used for realizing automatic driving of the vehicle;
determining the similarity of the computing nodes and corresponding correlation operators in the search space aiming at the computing nodes in each algorithm model to be deployed;
selecting the correlation operator with similarity meeting a preset condition as a candidate operator of a corresponding computing node;
and selecting candidate operators corresponding to the computing nodes, and combining according to the computing node relation graph of the algorithm model to be deployed to obtain at least one candidate deployment algorithm model.
Optionally, the automatic driving model deployment method further includes:
after at least one candidate deployment algorithm model is obtained, testing each candidate deployment algorithm model by utilizing a test data set to obtain a test result, and determining a target deployment algorithm model according to the test result.
Optionally, the determining the target deployment algorithm model according to the test result includes:
the test result comprises a test index value of the candidate deployment algorithm model;
screening at least one candidate deployment algorithm model meeting the preset deployment requirement as the target deployment algorithm model based on the test index value and the expected target value of each candidate deployment algorithm model.
Optionally, the test index value includes at least one of an accuracy rate, a frame rate, a time delay, and a power consumption.
Optionally, the hardware information includes first parameter information;
the at least one matched correlation operator is obtained based on the hardware information; constructing a search space based on the correlation operator includes:
and matching a first correlation operator according to the first parameter information, and constructing the search space based on the matched first correlation operator set.
Optionally, the first parameter information includes at least one of architecture type, dominant frequency, number of cores, instruction set.
Optionally, the hardware information further includes second parameter information, and the model deployment method further includes:
matching a second correlation operator according to the second parameter information, and taking the intersection of the second correlation operator set and the first correlation operator set to construct the search space.
Optionally, the determining the similarity of the computing node and the corresponding correlation operator in the search space includes:
the first parameter of the computing node corresponds to the second parameter of the correlation operator;
and calculating the similarity between the first parameter and the second parameter as the similarity between the computing node and the correlation operator.
According to a second aspect of embodiments of the present disclosure, there is provided an autopilot model deployment apparatus comprising:
the first acquisition module is used for acquiring a configuration file of the hardware platform to be deployed, wherein the configuration file comprises hardware information;
the correlation search module is used for acquiring at least one matched correlation operator based on the hardware information and constructing a search space based on the correlation operator;
the second acquisition module is used for acquiring a computing node relationship graph of the algorithm model to be deployed, wherein the algorithm model to be deployed is formed by combining a plurality of computing nodes and is used for realizing automatic driving of the vehicle;
the computing and screening module is used for determining the similarity of the computing nodes and corresponding correlation operators in the search space for the computing nodes of each algorithm model to be deployed; selecting the correlation operator with similarity meeting a preset condition as a candidate operator of a corresponding computing node;
the model generation module is used for selecting candidate operators corresponding to the computing nodes, and combining according to the computing node relation graphs of the algorithm model to be deployed to obtain at least one candidate deployment algorithm model.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the steps of the autopilot model deployment method provided in the first aspect of the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps of the method for deployment of an autopilot model provided in the first aspect of the present disclosure.
The technical scheme provided by the embodiments of the disclosure can have the following beneficial effects: given the differences between hardware platforms in architecture, instruction set, and other hardware respects, the scheme can, based on the hardware information of each platform, automatically output an automatic driving algorithm model adapted to that platform, thereby improving model deployment efficiency on hardware platforms and reducing the difficulty of the host factory's whole-vehicle development.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating a method of model deployment, according to an exemplary embodiment.
FIG. 2 is a schematic diagram of a model computing node relationship graph, according to an example embodiment.
FIG. 3 is a flowchart illustrating a target deployment algorithm model screening method, according to an example embodiment.
FIG. 4 is a flowchart illustrating a search space construction method according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating a search space screening optimization method, according to an exemplary embodiment.
FIG. 6 is a schematic diagram of another model computing node relationship graph, shown according to an example embodiment.
FIG. 7 is a block diagram illustrating a model deployment apparatus, according to an example embodiment.
FIG. 8 is a block diagram illustrating another model deployment apparatus, according to an example embodiment.
FIG. 9 is a block diagram illustrating yet another model deployment apparatus, according to an example embodiment.
Fig. 10 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
Exemplary embodiments will be described in detail below with reference to the accompanying drawings.
It should be noted that the related embodiments and the drawings describe only exemplary embodiments provided by the present disclosure, not all of its embodiments, and the disclosure should not be construed as limited by these exemplary embodiments.
It should be noted that the terms "first," "second," and the like, as used in this disclosure, are used merely to distinguish between different steps, devices, or modules; they carry no particular technical meaning and indicate no order of, or interdependence between, the items they modify.
It should be noted that the modifications of the terms "one", "a plurality", "at least one" as used in this disclosure are intended to be illustrative rather than limiting. Unless the context clearly indicates otherwise, it should be understood as "one or more".
It should be noted that the term "and/or" is used in this disclosure to describe an association between associated objects and generally indicates three possible relationships. For example, "A and/or B" may represent: A exists alone, A and B exist simultaneously, or B exists alone.
It should be noted that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. The scope of the present disclosure is not limited by the order of description of the steps in the related embodiments unless specifically stated.
It should be noted that, all actions for acquiring signals, information or data in the present disclosure are performed under the condition of conforming to the corresponding data protection rule policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
Exemplary method 1
FIG. 1 is a flow chart of a model deployment method according to an exemplary embodiment. As shown in FIG. 1, the method is mainly applied to a computer device or server (hereinafter, the model deployment platform) and is mainly used to automatically adapt an algorithm model to a hardware platform on which an automatic driving algorithm is to be deployed. The method comprises the following steps.
In step S110, a configuration file of the hardware platform to be deployed is obtained, where the configuration file includes hardware information.
The hardware platform to be deployed is a platform on which an automatic driving algorithm needs to be deployed, including but not limited to the vehicle end, a host factory's server end that provides automatic driving services for vehicles, and other hardware platforms required to implement, or assist in implementing, automatic driving.
The model deployment platform is communicatively connected to the hardware platform to be deployed, in a wired or wireless manner (for example via 4G, fourth-generation mobile communication technology, or 5G, fifth-generation mobile communication technology), so as to obtain the configuration file.
The configuration file contains at least the hardware information of the hardware platform to be deployed and should be identifiable and readable by the model deployment platform. The hardware information specifically includes, but is not limited to, processor information and memory information, and may be given as specific hardware models or as specific parameters.
In step S120, based on the hardware information, obtaining at least one matched correlation operator; a search space is constructed based on the relevance operator.
The hardware information is matched against the operators in an operator database to obtain the correlation operators that match it, and the resulting set of correlation operators is taken as the search space. That is, a correlation operator is an operator that has been optimized for particular hardware and can match the performance of that hardware.
It should be appreciated that the operator database may be any existing operator database, such as a set of operators optimized for specific hardware, and the disclosure is not limited in this respect. The operator database contains a large number of implemented operators, each carrying description information corresponding to the hardware information; this description information records the operator's restrictions on, or applicable range of, hardware, so that matched correlation operators can be retrieved by a correlation search against the hardware information of the hardware platform to be deployed. No single operator is applicable to all hardware; this example obtains, from the hardware information of the hardware platform to be deployed, the correlation operators that satisfy the usage requirements and restrictions, and constructs the search space from the resulting set.
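As a minimal sketch of this correlation search (step S120), the following Python fragment filters an operator database down to a search space using the hardware information from the configuration file. The record layout, field names, and operator names are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch of step S120: each operator in the database carries
# description information ("hardware") stating which platforms it applies to;
# an operator joins the search space only if every declared constraint is
# satisfied by the platform's hardware information.

def build_search_space(hardware_info, operator_db):
    """Return the list of correlation operators matching the hardware information."""
    search_space = []
    for op in operator_db:
        constraints = op["hardware"]  # the operator's applicable-range description
        if all(hardware_info.get(key) in allowed
               for key, allowed in constraints.items()):
            search_space.append(op)
    return search_space

# Illustrative operator database (names and fields are assumptions).
operator_db = [
    {"name": "Conv_arm_neon", "hardware": {"arch": {"ARM"}, "isa": {"NEON"}}},
    {"name": "Conv_x86_avx2", "hardware": {"arch": {"X86"}, "isa": {"AVX2"}}},
    {"name": "Pool_generic",  "hardware": {"arch": {"ARM", "X86"}, "isa": {"NEON", "AVX2"}}},
]
platform = {"arch": "ARM", "isa": "NEON"}  # hardware info from the configuration file
space = build_search_space(platform, operator_db)
print([op["name"] for op in space])  # ['Conv_arm_neon', 'Pool_generic']
```

The X86-only operator is excluded, mirroring the statement above that no single operator applies to all hardware.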
In step S130, a computing node relationship graph of the algorithm model to be deployed is obtained; the algorithm model to be deployed is formed by combining a plurality of computing nodes and is used for realizing automatic driving of the vehicle.
The algorithm model to be deployed may be any existing neural network model suitable for the automatic driving field; the host factory may choose flexibly based on the architecture, performance, and so on of its own hardware platform, and the disclosure imposes no limitation. Examples include, but are not limited to, convolutional neural networks (Convolutional Neural Network, CNN), recurrent neural networks (Recurrent Neural Network, RNN), long short-term memory networks (Long Short-Term Memory, LSTM), and support vector machines (Support Vector Machine, SVM).
The algorithm model to be deployed may serve as a carrier of an autopilot algorithm, including but not limited to a visual perception algorithm, a radar perception algorithm, a planning algorithm, etc., to assist the vehicle in achieving autopilot, without limitation to the present disclosure.
It should be understood that the algorithm model to be deployed can be regarded as being composed of individual computing nodes, which may be called operators (OP for short). In a network model, an operator corresponds to the computation logic of a layer; for example, the convolution layer (Convolution Layer) is an operator, the pooling layer (Pooling Layer) is an operator, and the weighted-summation process in the fully-connected layer (FC Layer) is an operator.
The computational node relationship graph can be used to describe a model computational reasoning process, including the input-output relationships between all computational nodes of the model.
For ease of understanding, this example gives one feasible computing node relationship graph for an algorithm model to be deployed; it should be understood that the scheme applies equally to other algorithm models to be deployed. Referring to fig. 2, assume the algorithm model to be deployed contains three computing nodes: a convolution layer Conv, a pooling layer Pool, and a fully-connected layer FC. Input data x first undergoes one convolution by the computing node Conv; the convolution result x1 is input to the computing node Pool for pooling; the pooling result x2 is input to the fully-connected layer FC for weighted summation; and finally the inference result y is output.
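The Fig. 2 example can be encoded as a small directed graph whose edges are the input-output relationships between computing nodes; a topological sort then recovers the inference order x → x1 → x2 → y described above. The adjacency-dictionary representation is an assumption for illustration; node names follow the example.

```python
# Illustrative encoding of the Fig. 2 relationship graph: Conv -> Pool -> FC.
compute_graph = {
    "Conv": ["Pool"],   # convolution result x1 feeds the pooling layer
    "Pool": ["FC"],     # pooling result x2 feeds the fully-connected layer
    "FC":   [],         # FC emits the final inference result y
}

def execution_order(graph):
    """Topologically sort the computing nodes (Kahn's algorithm)."""
    indegree = {n: 0 for n in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for s in graph[node]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

print(execution_order(compute_graph))  # ['Conv', 'Pool', 'FC']
```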
In step S140, for the computing nodes in each algorithm model to be deployed, the similarity of the computing nodes to the corresponding correlation operators in the search space is determined.
For example, a similarity between a first parameter of a computing node and a second parameter of an associative operator is determined as the similarity of the computing node to the associative operator, wherein the first parameter of the computing node corresponds to the second parameter of the associative operator.
There may be one or more algorithm models to be deployed. A computing node in an algorithm model to be deployed is, in essence, itself an operator, so its similarity to the correlation operators in the search space can be determined. For example, the similarity may be computed over dimensions such as tensor shape and size between the computing node and the correlation operator, using the Euclidean distance, the mean of the differences, or the like, which the disclosure does not limit. In other examples of the disclosure, any manner of calculating the similarity between operators may be adopted.
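One possible realization of steps S140 and S150, under the Euclidean-distance suggestion above: compare a computing node and a correlation operator on shape-like parameter vectors, map the distance into a similarity in (0, 1], and keep operators whose similarity exceeds a preset threshold. The parameter vectors, the distance-to-similarity mapping, and the threshold value are all assumptions.

```python
import math

def similarity(node_params, operator_params):
    """Similarity between corresponding parameter vectors; 1.0 means identical."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(node_params, operator_params)))
    return 1.0 / (1.0 + dist)  # assumed mapping: distance 0 -> similarity 1

def candidates_for(node_params, operators, threshold=0.9):
    """Step S150 sketch: keep operators whose similarity meets the preset condition."""
    return [name for name, params in operators.items()
            if similarity(node_params, params) >= threshold]

node_conv = (3, 3, 64)  # e.g. kernel height, kernel width, channel count (assumed)
ops = {"op_exact": (3, 3, 64), "op_far": (3, 3, 32)}

print(similarity(node_conv, ops["op_exact"]))  # 1.0 for an exact match
print(candidates_for(node_conv, ops))          # ['op_exact']
```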
In step S150, an association operator whose similarity satisfies a preset condition is selected as a candidate operator of the corresponding computing node.
Whether the preset condition is met is judged from the computed similarity; if so, the corresponding correlation operator is taken as a candidate operator of the computing node; if not, no action is taken.
The preset condition can be set flexibly according to the actual situation, for example as a similarity threshold: the similarity must be greater than the set threshold.
In step S160, candidate operators corresponding to the computing nodes are selected, and are combined according to the computing node relation graph of the algorithm model to be deployed, so as to obtain at least one candidate deployment algorithm model.
Each computing node of the algorithm model to be deployed has at least one candidate operator. Each time a candidate deployment algorithm model is generated, one candidate operator is selected for every computing node of the algorithm model to be deployed, and the selected candidate operators are then combined according to the corresponding computing node relationship graph to obtain the candidate deployment algorithm model.
Assuming the algorithm model to be deployed contains 3 computing nodes and each has 2 candidate operators, 8 (2 × 2 × 2) candidate deployment algorithm models will be generated.
Taking the algorithm model to be deployed shown in fig. 2 as an example, assume the first computing node Conv has two candidate operators, Conv1 and Conv2; the second computing node Pool has two candidate operators, Pool1 and Pool2; and the third computing node FC has two candidate operators, FC1 and FC2. Each computing node selects a corresponding candidate operator, and the selections are combined according to the computing node relationship graph of the algorithm model to be deployed, giving 8 candidate deployment algorithm models; see table 1 below:
TABLE 1
Candidate model  Conv node  Pool node  FC node
1                Conv1      Pool1      FC1
2                Conv1      Pool1      FC2
3                Conv1      Pool2      FC1
4                Conv1      Pool2      FC2
5                Conv2      Pool1      FC1
6                Conv2      Pool1      FC2
7                Conv2      Pool2      FC1
8                Conv2      Pool2      FC2
It should be appreciated that the candidate operators corresponding to different computing nodes may be the same or different; for example, different node positions in the algorithm model to be deployed may require similar operators and end up selecting the same operator from the operator database, i.e., a shared candidate operator. For example, assume an algorithm model to be deployed contains 4 computing nodes, where the first computing node has two candidate operators, Conv11 and Conv12; the second computing node has two candidate operators, Pool11 and Pool12; and the third computing node has two candidate operators, Conv11 and Conv14. Because the first and third computing nodes both need to perform convolution, the same candidate operator Conv11 is screened out for both.
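The combination in step S160 for the Fig. 2 example amounts to taking the Cartesian product of each node's candidate set, one choice per node, in the order given by the relationship graph. A minimal sketch (operator names as in Table 1; the dictionary layout is an assumption):

```python
from itertools import product

# Candidate operators per computing node, as in the Fig. 2 / Table 1 example.
candidate_operators = {
    "Conv": ["Conv1", "Conv2"],
    "Pool": ["Pool1", "Pool2"],
    "FC":   ["FC1", "FC2"],
}

node_order = ["Conv", "Pool", "FC"]  # taken from the relationship graph

# Every combination of one candidate per node is a candidate deployment model.
candidate_models = [
    dict(zip(node_order, choice))
    for choice in product(*(candidate_operators[n] for n in node_order))
]

print(len(candidate_models))  # 8, i.e. 2 x 2 x 2
print(candidate_models[0])    # {'Conv': 'Conv1', 'Pool': 'Pool1', 'FC': 'FC1'}
```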
With the model deployment method provided by the disclosure, candidate deployment algorithm models adapted to the hardware platform to be deployed can be generated automatically from the platform's hardware information and the computing node information of the algorithm model to be deployed. This solves the problems that current model deployment work depends on the algorithm deployment engineer's familiarity with the characteristics and strengths of the hardware platform, places high demands on the engineer's professional ability and experience, and consequently suffers from low deployment efficiency and long cycles. The method is applicable to many kinds of hardware platform to be deployed, automatically achieving fast, optimized deployment of the automatic driving algorithm model; it helps shorten the host factory's whole-vehicle development cycle, reduces development cost, and promotes the development of automatic driving technology, which is of significant theoretical and practical importance.
Exemplary method 2
On the basis of the above-mentioned exemplary method 1, after the candidate deployment algorithm model is obtained, the present example further screens to obtain the target deployment algorithm model, so as to better meet the deployment requirements of the host factory.
Referring to fig. 3, the screening process specifically includes:
in step S310, the candidate deployment algorithm model is tested by using the test data set, so as to obtain a test result corresponding to the model.
The test results should contain the test index values of the candidate deployment algorithm models, including but not limited to at least one of accuracy, frame rate, time delay, and power consumption. It should be understood that the process of testing a model with a test data set to obtain test results is not the focus of the disclosure; any existing method may be adopted, and details are not repeated here.
In step S320, a target deployment algorithm model is determined according to the test result.
A final target deployment algorithm model is determined from the test results, so that an optimal, or better-suited, algorithm model is identified for the host factory's requirements. Whether a deployment algorithm model is optimal or meets the host factory's requirements depends on the host factory's expectations for the deployment; this example therefore obtains the test result of each candidate deployment algorithm model, compares it comprehensively against the host factory's expected target values, and screens out at least one candidate deployment algorithm model meeting the preset deployment requirements as the target deployment algorithm model.
It should be appreciated that, when comprehensively comparing the test index values with the expected target values, a candidate deployment algorithm model may be screened as the target deployment algorithm model only if its test index values reach, or are better than, the expected target values.
For example, the tested accuracy and frame rate of a candidate deployment algorithm model should be greater than or equal to the expected accuracy and frame rate, and its tested time delay and power consumption should be smaller than the expected time delay and power consumption, for it to be screened out; a candidate deployment algorithm model that fails any expected target value is eliminated.
Among the candidate deployment algorithm models that meet the expected target values, the one with the highest accuracy is taken as the target deployment algorithm model and deployed onto the hardware platform to be deployed to realize automatic driving.
It should be understood that the preset deployment requirement may also be flexibly set according to the requirements of the host factory, for example, selecting the candidate deployment algorithm model with the highest frame rate as the target deployment algorithm model, or selecting the candidate deployment algorithm model with the smallest power consumption as the target deployment algorithm model.
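The screening in steps S310 and S320 can be sketched as follows: a candidate passes when every test index value reaches or is better than its expected target ("higher is better" for accuracy and frame rate, "lower is better" for time delay and power consumption), and among the passing candidates the one with the highest accuracy is selected. The metric names, numbers, and tie-break rule are illustrative assumptions.

```python
# Hedged sketch of target-model screening; field names are assumptions.
HIGHER_IS_BETTER = {"accuracy", "frame_rate"}
LOWER_IS_BETTER = {"latency_ms", "power_w"}

def meets_targets(metrics, targets):
    """True iff every test index value reaches or beats its expected target."""
    for key, target in targets.items():
        value = metrics[key]
        if key in HIGHER_IS_BETTER and value < target:
            return False
        if key in LOWER_IS_BETTER and value > target:
            return False
    return True

def select_target_model(results, targets):
    """Among passing candidates, pick the one with the highest accuracy."""
    passing = {m: v for m, v in results.items() if meets_targets(v, targets)}
    if not passing:
        return None  # every candidate failed some expected target value
    return max(passing, key=lambda m: passing[m]["accuracy"])

# Illustrative test results for three candidate deployment algorithm models.
results = {
    "model_A": {"accuracy": 0.91, "frame_rate": 35, "latency_ms": 28, "power_w": 9},
    "model_B": {"accuracy": 0.95, "frame_rate": 31, "latency_ms": 33, "power_w": 8},
    "model_C": {"accuracy": 0.97, "frame_rate": 25, "latency_ms": 41, "power_w": 7},
}
targets = {"accuracy": 0.90, "frame_rate": 30, "latency_ms": 35, "power_w": 10}

# model_C has the best accuracy but misses the frame-rate target;
# of the passing models A and B, B has the higher accuracy.
print(select_target_model(results, targets))  # model_B
```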
With the model deployment method provided by the disclosure, the hardware resources of the hardware platform to be deployed can be fully utilized and an optimal deployment scheme meeting the host factory's requirements obtained, greatly shortening the customer's deployment cycle for automatic driving algorithm models on different hardware platforms and improving development efficiency.
Exemplary method 3
On the basis of the above example, the hardware information of the hardware platform to be deployed in the present example includes the first parameter information, and the search space is constructed based on the first parameter information.
Referring to fig. 4, the process of constructing the search space includes:
in step S410, the first correlation operator is matched according to the first parameter information.
An association search is performed against the operators in the operator database according to the first parameter information, to obtain the first correlation operators matched with the first parameter information.
It should be appreciated that the operator database may be any existing operator database, such as an operator set optimized for specific hardware, which is not limited by the present disclosure. The operator database contains a large number of implemented operators, and each operator carries description information corresponding to the first parameter information, describing the operator's limitation or application range with respect to the corresponding parameter information; therefore, the matched first correlation operators can be obtained through an association search based on the first parameter information in the hardware information of the hardware platform to be deployed.
Wherein the first parameter information comprises parameter information of the processor, such as at least one of architecture type, dominant frequency, number of cores, instruction set.
The architecture types herein are processor architectures, including but not limited to the X86 architecture, the ARM (Advanced RISC Machine) architecture, the MIPS (Microprocessor without Interlocked Pipelined Stages) architecture, and the RISC-V (Reduced Instruction Set Computer - Five, the fifth-generation reduced instruction set computer) architecture.
The dominant frequency is the clock frequency (clock speed) at which the processor core operates; for example, 2.0 GHz means that the processor generates 2 billion clock signals per second, each clock signal having a period of 0.5 nanoseconds.
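The quoted figures follow directly from the reciprocal relation between clock frequency $f$ and clock period $T$:

```latex
T = \frac{1}{f} = \frac{1}{2.0\ \mathrm{GHz}} = \frac{1}{2.0 \times 10^{9}\ \mathrm{Hz}} = 0.5\ \mathrm{ns}
```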
The number of cores refers to how many physical cores the processor has; for example, a dual-core processor is composed of 2 relatively independent CPU core units, and a quad-core processor includes 4 relatively independent CPU core units.
The instruction set, i.e., the set of instructions used by a processor to compute and control a computer system, is largely divided into two categories: the reduced instruction set RISC (Reduced Instruction Set Computer) and the complex instruction set CISC (Complex Instruction Set Computing). Reduced instruction sets include, but are not limited to, ARM, MIPS, RISC-V, etc., and complex instruction sets include, but are not limited to, X86, etc.
In some examples of the disclosure, to determine the first correlation operators matching the first parameter information, a fuzzy search algorithm may be used: in the database of implemented operators, an association search is performed by operator name against each item of first parameter information (for example, processor architecture, dominant frequency, number of cores, cache size, instruction set, etc.), the first correlation operators applicable to the first parameter information are obtained, and these operators are put into a cache pool to form the first correlation operator set, thereby obtaining the search space.
Specifically, the parameter information of the processor is compared with the limit or application range that the corresponding operator in the operator database declares for that parameter information, to determine whether the two match. For example, if an operator is suitable for running on a processor of the ARM architecture (assuming no other constraints are present), and the first parameter information indicates that the architecture type is ARM, then the operator can be used as a first correlation operator. It should be appreciated that if even one item of the processor's parameter information does not meet the operator's limit or application range for that item, the operator cannot be taken as a first correlation operator.
In some specific application scenarios, the hardware platform to be deployed may adopt a heterogeneous architecture of CPU (Central Processing Unit) + XPU, where the XPU may be one or more of a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit), an ASIC (Application-Specific Integrated Circuit), and an FPGA (Field-Programmable Gate Array). Which specific hardware and which heterogeneous architecture the hardware platform to be deployed adopts is determined mainly by the actual requirements of the host factory, such as supply chain, price, and so on. That is, the hardware platform to be deployed generally includes a plurality of processors; for each processor, the corresponding first correlation operators may be obtained according to the first parameter information of that processor, and the first correlation operator set is then constructed from the first correlation operators corresponding to all the processors and used as the search space.
For example, assume the hardware platform to be deployed comprises a processor 1 and a processor 2: operator m1 is obtained as the first correlation operator matched according to the parameter information of processor 1, and operator m2 is obtained as the first correlation operator matched according to the parameter information of processor 2. The resulting first correlation operator set is then {m1 (processor 1), m2 (processor 2)}, which is taken as the search space.
In step S420, a search space is constructed based on the matched first set of correlation operators.
A first correlation operator set is constructed from all the matched first correlation operators, and this set is taken as the search space.
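Steps S410–S420 can be sketched as a constraint check over (operator, processor) pairs. This is an illustrative Python sketch only: the operator-description schema (a mapping from parameter name to the set of allowed values) is an assumed format, not the patent's actual database layout:

```python
def operator_applies(op_desc, proc_params):
    """An operator matches a processor only if EVERY processor parameter
    satisfies the operator's declared limit/application range for that
    parameter; a single mismatch disqualifies the operator."""
    for param, allowed in op_desc.items():
        if param in proc_params and proc_params[param] not in allowed:
            return False
    return True

def build_search_space(operator_db, processors):
    """Collect matched (operator, processor) pairs: the first correlation
    operator set, used as the search space."""
    space = []
    for proc_name, proc_params in processors.items():
        for op_name, op_desc in operator_db.items():
            if operator_applies(op_desc, proc_params):
                space.append((op_name, proc_name))
    return space
```

With a database where m1 is restricted to ARM and m2 to X86, and processor 1 being ARM and processor 2 X86, this reproduces the {m1 (processor 1), m2 (processor 2)} example from the text.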
Exemplary method 4
On the basis of the above exemplary method, in this example the hardware information of the hardware platform to be deployed further includes second parameter information, and the search space constructed in exemplary method 3 is screened and optimized based on the second parameter information. Referring to fig. 5, the specific process mainly includes:
in step S510, the second correlation operator is matched according to the second parameter information in the hardware information of the hardware platform to be deployed.
In step S520, the intersection between the second set of correlation operators and the first set of correlation operators is taken, and a search space is constructed.
The first correlation operators in the search space are screened and updated by the second parameter information, so that the correlation operators in the search space satisfy not only the hardware limitations expressed by the first parameter information but also those expressed by the second parameter information.
In existing computer system architectures, the processor is one of the main hardware components affecting computer performance; besides the processor, in some specific application scenarios the hardware platform may also include memory, buses, communication modules, and the like. It should be understood that the present disclosure does not limit which specific hardware configuration or heterogeneous architecture the hardware platform to be deployed adopts; the model deployment method provided by the present disclosure is applicable to any existing hardware platform. However, the more complex the architecture of the hardware platform to be deployed — the more processors it has, and the better their performance and compatibility — the larger the data volume of the constructed search space generally becomes, and the larger the workload of generating candidate deployment algorithm models and determining the target deployment algorithm model, which necessarily affects deployment efficiency. Moreover, interaction between the processors and the outside generally relies on the memory for reading data, so the deployment effect of a candidate deployment algorithm model depends not only on the adaptation between the model and the processors of the hardware platform to be deployed, but also on whether the memory parameter information meets the operators' application requirements.
In this regard, in some examples of the present disclosure, the second parameter information includes memory parameter information, including but not limited to memory type, capacity, operating frequency, and read rate. The second correlation operators are matched according to the memory parameter information, and the intersection between the second correlation operator set and the first correlation operator set is taken to construct the search space. In this way, the search space constructed from the processor parameter information is screened and optimized, improving deployment efficiency while guaranteeing the quality of the candidate deployment algorithm models.
Taking a platform to be deployed with a CPU+ASIC heterogeneous structure as an example, assume the first correlation operator set obtained based on the processor information is {a1 (CPU), a2 (CPU), a3 (CPU), a4 (CPU), a1 (ASIC), a2 (ASIC), a5 (ASIC)}, and the second correlation operator set obtained according to the memory parameter information is {a1, a2, a3, a4}. Taking the intersection between the two sets yields {a1 (CPU), a2 (CPU), a3 (CPU), a4 (CPU), a1 (ASIC), a2 (ASIC)}, eliminating the correlation operator a5 (ASIC). No similarity calculation is subsequently performed for a5 (ASIC) and no candidate deployment algorithm model is generated from it, which improves deployment efficiency. Meanwhile, since a5 is not suitable for running under the given memory parameters, an algorithm model built on it would most likely deploy poorly; removing it therefore does not cause any algorithm model with good deployment effect to be missed, and does not harm the quality of the candidate deployment algorithm models.
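The worked CPU+ASIC example above can be reproduced in a few lines; representing the sets as (operator, processor) pairs is an assumption of this sketch:

```python
# First correlation operator set, matched from processor parameter information.
first_set = [("a1", "CPU"), ("a2", "CPU"), ("a3", "CPU"), ("a4", "CPU"),
             ("a1", "ASIC"), ("a2", "ASIC"), ("a5", "ASIC")]

# Second correlation operator set, matched from memory parameter information.
second_set = {"a1", "a2", "a3", "a4"}

# Intersection: keep only pairs whose operator also satisfies the memory
# constraints; a5 (ASIC) is eliminated and six pairs remain.
search_space = [(op, dev) for (op, dev) in first_set if op in second_set]
```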
Exemplary method 5
On the basis of the above-mentioned exemplary method, in order to better meet the personalized requirements of the host factory for model deployment, in this example, the configuration file of the hardware platform to be deployed may be provided by the host factory and written according to a fixed format required by the model deployment platform, so that the configuration file can be acquired and identified.
The configuration file may include hardware information of the hardware platform to be deployed and constraint information about model deployment. The hardware information is mainly used for describing the characteristics of hardware of the hardware platform to be deployed, and the constraint information is mainly used for describing the deployment requirement/limitation of a host factory on the algorithm model.
Specifically, the hardware information includes a processor model, memory information, and the like, which form a hardware platform to be deployed; constraint information includes an algorithm model to be deployed, a test dataset, and desired target values for model deployment by the host factory.
In some examples of the present disclosure, the form of the configuration file is shown in table 2 below:
table 2 configuration file
[Table 2 contents: hardware information — processor model(s) and memory model; constraint information — algorithm model to be deployed, test data set, and expected target values]
Among them, processors include, but are not limited to CPU, GPU, NPU, ASIC, FPGA, etc.; memory parameter information includes, but is not limited to, memory type, capacity, operating frequency, read rate, etc.
In some examples of the disclosure, the first parameter information in the hardware information may be derived from the corresponding processor model, and the second parameter information from the corresponding memory model. Optionally, according to the processor model (for example, a specific model of CPU, GPU, NPU, ASIC, or FPGA) provided in the hardware information, detailed information of the processor, such as the processor architecture, dominant frequency, number of cores, and instruction set, can be queried in the software framework by table lookup; similarly, according to the memory model provided in the hardware information, parameter information such as memory type, capacity, operating frequency, and read rate can be obtained by existing query means.
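The table-lookup idea can be sketched as follows; the model numbers ("cpu-x100", "mem-d4") and the parameter fields are invented for illustration and are not real product identifiers or the patent's actual tables:

```python
# Hypothetical lookup tables; in practice these would be vendor-supplied.
PROCESSOR_TABLE = {
    "cpu-x100": {"architecture": "ARM", "clock_ghz": 2.0,
                 "cores": 4, "instruction_set": "ARMv8"},
}
MEMORY_TABLE = {
    "mem-d4": {"type": "DDR4", "capacity_gb": 8,
               "frequency_mhz": 3200, "read_rate_gbps": 25.6},
}

def expand_hardware_info(config):
    """Resolve the model numbers named in a configuration file into the
    full first/second parameter information via table lookup."""
    return {
        "first_parameters": [PROCESSOR_TABLE[m] for m in config["processors"]],
        "second_parameters": MEMORY_TABLE[config["memory"]],
    }
```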
In some examples of the disclosure, the first correlation operator set may be determined according to the processor parameter information of the hardware platform to be deployed. If the hardware platform to be deployed is a CPU+ASIC heterogeneous architecture, matched first correlation operators may be obtained separately according to the parameter information of the CPU and of the ASIC chip. Suppose the first correlation operators matched to the CPU's parameter information are operators a1, a2, a3, and a4, and those matched to the ASIC's parameter information are operators a1, a2, and a5; the resulting first correlation operator set is then: {a1 (CPU), a2 (CPU), a3 (CPU), a4 (CPU), a1 (ASIC), a2 (ASIC), a5 (ASIC)}.
It should be understood that a1 (CPU) means that operator a1 is associated with the CPU and its data processing is executed by the CPU; that is, the execution body of a1 (CPU) is the CPU. Likewise for a1 (ASIC): the operator itself is the same, but the execution body differs, and the data processing of a1 (ASIC) is executed by the ASIC.
In some examples of the present disclosure, in generating the candidate deployment algorithm model, the operators a1 (CPU) and a1 (ASIC) are considered as two different operators, i.e., the operators a1 (CPU) and a1 (ASIC) will act as two different candidate operators for constructing the different candidate deployment algorithm models.
In this way, more candidate deployment algorithm models satisfying the conditions are obtained, avoiding the omission of important candidate deployment schemes. For example, if the first correlation operator set were only {a1 (CPU), a2 (CPU), a1 (ASIC), a2 (ASIC)}, some candidate deployment schemes would be missed. It also avoids keeping unnecessary correlation operators in the search space, which would harm deployment efficiency; for example, with a first correlation operator set of {a1 (CPU), a2 (CPU), a3 (CPU), a4 (CPU), a5 (CPU), a1 (ASIC), a2 (ASIC), a3 (ASIC), a4 (ASIC), a5 (ASIC)}, deployment efficiency would suffer.
Exemplary method 6
Based on the above exemplary methods, the present example provides a method of generating candidate deployment algorithm models from candidate operators.
Referring to fig. 6, which shows an optional computing node relationship graph of the algorithm model to be deployed, the algorithm model to be deployed has 4 computing nodes: computing node 31, computing node 32, computing node 33, and computing node 34. Assume the search space is {a1 (CPU), a2 (CPU), a3 (CPU), a4 (CPU), a1 (ASIC), a2 (ASIC)}. Through similarity calculation, the candidate operator corresponding to computing node 31 of the algorithm model to be deployed is a3, that corresponding to computing node 32 is a4, that corresponding to computing node 33 is a1, and that corresponding to computing node 34 is a2. Since candidate operator a1 has two implementations, a1 (CPU) and a1 (ASIC), and candidate operator a2 likewise has two implementations, a2 (CPU) and a2 (ASIC), 4 candidate deployment algorithm models can be generated, as shown in Table 3 below:
Table 3
Candidate deployment algorithm model 1: node 31 → a3 (CPU); node 32 → a4 (CPU); node 33 → a1 (CPU); node 34 → a2 (CPU)
Candidate deployment algorithm model 2: node 31 → a3 (CPU); node 32 → a4 (CPU); node 33 → a1 (CPU); node 34 → a2 (ASIC)
Candidate deployment algorithm model 3: node 31 → a3 (CPU); node 32 → a4 (CPU); node 33 → a1 (ASIC); node 34 → a2 (CPU)
Candidate deployment algorithm model 4: node 31 → a3 (CPU); node 32 → a4 (CPU); node 33 → a1 (ASIC); node 34 → a2 (ASIC)
That is, when the same operator corresponds to different executing processors, in some examples of the disclosure it may be used to generate different candidate deployment algorithm models. Candidate deployment algorithm models with the same operators but different execution bodies may differ in model deployment effect, so this approach is beneficial for obtaining a target deployment algorithm model with a better deployment effect.
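The enumeration behind Table 3 is a Cartesian product over each node's candidate implementations. A minimal sketch, with the node names and the (operator, processor) encoding assumed for illustration:

```python
from itertools import product

# Per-node candidate implementations from the fig. 6 example: a3 and a4 have
# only a CPU implementation; a1 and a2 each have CPU and ASIC implementations.
node_candidates = {
    "node31": [("a3", "CPU")],
    "node32": [("a4", "CPU")],
    "node33": [("a1", "CPU"), ("a1", "ASIC")],
    "node34": [("a2", "CPU"), ("a2", "ASIC")],
}

# Cartesian product over the nodes yields 1 * 1 * 2 * 2 = 4 candidate
# deployment algorithm models, matching Table 3.
nodes = list(node_candidates)
candidate_models = [dict(zip(nodes, combo))
                    for combo in product(*node_candidates.values())]
```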
Exemplary apparatus
FIG. 7 is a block diagram of a model deployment device, according to an example embodiment. Referring to fig. 7, the apparatus 700 includes a first acquisition module 710, an association search module 720, a second acquisition module 730, a computational screening module 740, and a model generation module 750, wherein:
the first obtaining module 710 is configured to obtain a configuration file of a hardware platform to be deployed, where the configuration file includes hardware information;
the association search module 720 is configured to obtain at least one matched association operator based on the hardware information, and construct a search space based on the association operator;
the second obtaining module 730 is configured to obtain a computing node relationship graph of the algorithm model to be deployed; the algorithm model to be deployed is an algorithm model formed by combining a plurality of computing nodes and used for realizing automatic driving of the vehicle;
The calculation screening module 740 is configured to determine, for each computing node of the algorithm model to be deployed, a similarity between the computing node and a corresponding correlation operator in the search space, and to select a correlation operator whose similarity meets a preset condition as the candidate operator of the corresponding computing node;
the model generating module 750 is configured to select candidate operators corresponding to the computing nodes, and combine the candidate operators according to the computing node relationship graphs of the algorithm model to be deployed to obtain at least one candidate deployment algorithm model.
In some examples of the present disclosure, the hardware information includes first parameter information, and the association search module 720 is configured to match a first association operator according to the first parameter information, and construct a search space based on the matched first association operator set.
Wherein the first parameter information includes at least one of architecture type, dominant frequency, number of cores, instruction set.
In some examples of the present disclosure, the hardware information further includes second parameter information, and the association search module 720 is configured to match second correlation operators according to the second parameter information and take the intersection between the second correlation operator set and the first correlation operator set to construct the search space. This realizes screening and updating of the search space, improving model deployment efficiency without adversely affecting the quality of the generated candidate deployment algorithm models.
In some examples of the disclosure, the first parameter of the computing node corresponds to the second parameter of the correlation operator, and the calculation screening module 740 is configured to calculate the similarity between the first parameter and the second parameter as the similarity between the computing node and the correlation operator. A computing node in the algorithm model to be deployed can itself be regarded as an operator, so its similarity to a correlation operator in the search space can be determined, for example, based on dimensions such as tensor shape and size; the similarity may be computed using the Euclidean distance, the mean of differences, and the like, which is not limited by the present disclosure. In other examples of the present disclosure, any manner of calculating the similarity between operators may be employed.
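One of the similarity measures mentioned above — Euclidean distance over tensor-shape dimensions — can be sketched as follows; the shape encoding as a tuple and the threshold semantics are assumptions of this illustration, not the patent's prescribed method:

```python
import math

def shape_distance(node_shape, op_shape):
    """Euclidean distance between two equal-length shape vectors;
    a smaller distance means the node and operator are more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(node_shape, op_shape)))

def matching_operators(node_shape, operators, threshold):
    """Keep operators whose distance to the node meets the preset condition
    (here: distance at most `threshold`), as candidate operators."""
    return [name for name, shape in operators.items()
            if shape_distance(node_shape, shape) <= threshold]
```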
Referring to fig. 8, in some examples of the present disclosure, the model deployment apparatus 700 further includes a model test module 760 and a model screening module 770, where after obtaining candidate deployment algorithm models based on the model generating module 750, the model test module 760 is configured to test each candidate deployment algorithm model by using a test data set to obtain a test result; the model screening module 770 is configured to determine a target deployment algorithm model based on the test results.
In some examples of the present disclosure, the test results include test index values of the candidate deployment algorithm model, such as at least one of accuracy, frame rate, time delay, power consumption. The model screening module 770 is configured to screen at least one candidate deployment algorithm model satisfying a preset deployment requirement as a target deployment algorithm model based on the test index value and the expected target value of each candidate deployment algorithm model.
It should be understood that, when the model screening module 770 evaluates the test index values against the expected target values, a candidate deployment algorithm model may be screened as the target deployment algorithm model only if each of its test index values reaches or is better than the corresponding expected target value. For example, the test accuracy and test frame rate of a candidate deployment algorithm model should be greater than or equal to the expected accuracy and expected frame rate, and its test time delay value and test power consumption value should be smaller than the expected time delay value and expected power consumption value, in order for the candidate to be retained; if any expected target value is not met, the corresponding candidate deployment algorithm model is eliminated.
Among the candidate deployment algorithm models meeting the expected target values, the model screening module 770 may take the one with the highest accuracy as the target deployment algorithm model to be deployed into the hardware platform to be deployed, so as to realize automatic driving.
It should be understood that the preset deployment requirement may also be flexibly set according to the requirements of the host factory; for example, the model screening module 770 may select, as the target deployment algorithm model, the candidate deployment algorithm model with the highest frame rate, or the one with the smallest power consumption.
Referring to fig. 9, in some examples of the present disclosure, the model deployment apparatus 700 further includes a model deployment module 780 configured to automatically deploy the target deployment algorithm model into the hardware platform to be deployed, so as to implement vehicle autopilot.
In some examples of the present disclosure, model deployment module 780 may employ OTA (Over-the-Air Technology) Technology to implement model deployment.
According to the model deployment device provided by the present disclosure, the hardware resources of the hardware platform to be deployed can be fully utilized, an optimal deployment scheme meeting the requirements of the host factory is obtained, the period for customers to deploy the automatic driving algorithm model on different hardware platforms is greatly shortened, and development efficiency is improved.
Exemplary electronic device
Fig. 10 is a block diagram of an electronic device 100, shown in accordance with an exemplary embodiment. The electronic device 100 may be a third party platform, server, computer, or other type of electronic device that provides model deployment services.
Referring to fig. 10, an electronic device 100 may include at least one processor 110 and a memory 120. Processor 110 may execute instructions stored in memory 120. The processor 110 is communicatively coupled to the memory 120 via a data bus. In addition to memory 120, processor 110 may also be communicatively coupled with input device 130, output device 140, and communication device 150 via a data bus.
The processor 110 may be any conventional processor, such as a commercially available CPU. The processor may also include, for example, a graphics processing unit (GPU), a field-programmable gate array (FPGA), a system on chip (SOC), an application-specific integrated circuit (ASIC), or a combination thereof.
The memory 120 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
In the embodiment of the present disclosure, the memory 120 stores executable instructions, and the processor 110 may read the executable instructions from the memory 120 and execute the instructions to implement all or part of the steps of the method for deploying an autopilot model according to any one of the above-described exemplary embodiments.
Exemplary computer-readable storage Medium
In addition to the methods and apparatus described above, exemplary embodiments of the present disclosure may also take the form of a computer program product or a computer-readable storage medium storing the computer program product. The computer program product comprises computer program instructions executable by a processor to perform all or part of the steps of the automatic driving model deployment method described in any one of the above exemplary embodiments.
The computer program product may include program code for performing the operations of embodiments of the present application, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, conventional procedural programming languages such as the "C" language, and scripting languages (e.g., Python). The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the readable storage medium include: a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic or optical disk, an electrical connection having one or more conductors, or any suitable combination of the foregoing.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An autopilot model deployment method, comprising:
acquiring a configuration file of a hardware platform to be deployed, wherein the configuration file comprises hardware information;
based on the hardware information, acquiring at least one matched correlation operator; constructing a search space based on the correlation operator;
acquiring a computing node relationship graph of an algorithm model to be deployed; the algorithm model to be deployed is an algorithm model formed by combining a plurality of computing nodes and used for realizing automatic driving of a vehicle;
determining, for each computing node of the algorithm model to be deployed, a similarity between the computing node and a corresponding correlation operator in the search space;
selecting the correlation operator with similarity meeting a preset condition as a candidate operator of a corresponding computing node;
and selecting candidate operators corresponding to the computing nodes, and combining according to the computing node relation graph of the algorithm model to be deployed to obtain at least one candidate deployment algorithm model.
2. The autopilot model deployment method of claim 1 wherein the model deployment method further comprises:
after at least one candidate deployment algorithm model is obtained, testing each candidate deployment algorithm model by utilizing a test data set to obtain a test result, and determining a target deployment algorithm model according to the test result.
3. The method of automated driving model deployment of claim 2, wherein the determining a target deployment algorithm model from the test results comprises:
the test result comprises a test index value of the candidate deployment algorithm model;
screening at least one candidate deployment algorithm model meeting the preset deployment requirement as the target deployment algorithm model based on the test index value and the expected target value of each candidate deployment algorithm model.
4. The autopilot model deployment method of claim 1 wherein the hardware information includes first parameter information;
the at least one matched correlation operator is obtained based on the hardware information; constructing a search space based on the correlation operator includes:
and matching a first correlation operator according to the first parameter information, and constructing the search space based on the matched first correlation operator set.
5. The method of automatic driving model deployment according to claim 4, wherein the first parameter information comprises at least one of architecture type, dominant frequency, number of cores, instruction set.
6. The automatic driving model deployment method of claim 4, wherein the hardware information further comprises second parameter information, and the model deployment method further comprises:
matching a second correlation operator according to the second parameter information, and taking the intersection of the second correlation operator set and the first correlation operator set to construct the search space.
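The search-space construction of claims 4 to 6 may be sketched as follows; the operator fields (architecture, instruction set, memory footprint) and the particular matching predicates are assumptions chosen to illustrate the set intersection, not limitations of the claims:

```python
from collections import namedtuple

# Illustrative operator record: name, architecture type, instruction set, memory footprint.
Operator = namedtuple("Operator", ["name", "arch", "isa", "mem_kb"])

def build_search_space(operators, first_params, second_params=None):
    """Match operators on the first parameter information; when second parameter
    information is present, take the intersection of both matched sets."""
    first_set = {op for op in operators
                 if op.arch == first_params["arch"] and op.isa == first_params["isa"]}
    if second_params is None:
        return first_set
    second_set = {op for op in operators if op.mem_kb <= second_params["max_mem_kb"]}
    return first_set & second_set
```

Using namedtuples keeps the operator records hashable, so the two matched sets intersect directly with the `&` operator.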
7. The automatic driving model deployment method of any one of claims 1-6, wherein the determining a similarity between the computing node and a corresponding correlation operator in the search space comprises:
obtaining a first parameter of the computing node and a corresponding second parameter of the correlation operator;
and calculating a similarity between the first parameter and the second parameter as the similarity between the computing node and the correlation operator.
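Claim 7 does not fix a particular similarity metric; as one illustrative assumption, the first and second parameters could be encoded as numeric vectors and compared by cosine similarity:

```python
import math

def parameter_similarity(first_param, second_param):
    """Cosine similarity between the computing node's first parameter vector and the
    correlation operator's second parameter vector (the metric is an assumed choice)."""
    dot = sum(a * b for a, b in zip(first_param, second_param))
    norm_a = math.sqrt(sum(a * a for a in first_param))
    norm_b = math.sqrt(sum(b * b for b in second_param))
    # Degenerate zero vectors are treated as dissimilar.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```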
8. An automatic driving model deployment apparatus, comprising:
a first acquisition module, configured to acquire a configuration file of a hardware platform to be deployed, wherein the configuration file comprises hardware information;
an association searching module, configured to acquire at least one matched correlation operator based on the hardware information and construct a search space based on the correlation operator;
a second acquisition module, configured to acquire a computing node relation graph of an algorithm model to be deployed, wherein the algorithm model to be deployed is formed by combining a plurality of computing nodes and is used for realizing automatic driving of a vehicle;
a computing and screening module, configured to determine, for each computing node in the algorithm model to be deployed, a similarity between the computing node and corresponding correlation operators in the search space, and to select a correlation operator whose similarity meets a preset condition as a candidate operator of the corresponding computing node;
and a model generation module, configured to select the candidate operators corresponding to the computing nodes and combine them according to the computing node relation graph of the algorithm model to be deployed, to obtain at least one candidate deployment algorithm model.
9. An electronic device, comprising a processor, a memory, and a communication bus, wherein:
the communication bus is used for realizing connection communication between the processor and the memory;
and the processor is configured to execute one or more programs stored in the memory to implement the steps of the automatic driving model deployment method of any one of claims 1 to 7.
10. A computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps of the automatic driving model deployment method of any one of claims 1 to 7.
CN202211114501.3A 2022-09-14 2022-09-14 Automatic driving model deployment method, device, equipment and storage medium Pending CN116069340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211114501.3A CN116069340A (en) 2022-09-14 2022-09-14 Automatic driving model deployment method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116069340A true CN116069340A (en) 2023-05-05

Family

ID=86170576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211114501.3A Pending CN116069340A (en) 2022-09-14 2022-09-14 Automatic driving model deployment method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116069340A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306856A (en) * 2023-05-17 2023-06-23 之江实验室 Deep learning model deployment method and device based on search
CN116306856B (en) * 2023-05-17 2023-09-05 之江实验室 Deep learning model deployment method and device based on search

Similar Documents

Publication Publication Date Title
US20240112090A1 (en) Concurrent optimization of machine learning model performance
AU2024200810A1 (en) Training tree-based machine-learning modeling algorithms for predicting outputs and generating explanatory data
EP3516600B1 (en) Method and apparatus for automated decision making
US20200401891A1 (en) Methods and apparatus for hardware-aware machine learning model training
KR20180048930A (en) Enforced scarcity for classification
JP2018092614A (en) Determination device and determination method for convolutional neural network model for database
CN109635990B (en) Training method, prediction method, device, electronic equipment and storage medium
CN111316308A (en) System and method for identifying wrong order requests
US20220147680A1 (en) Method for co-design of hardware and neural network architectures using coarse-to-fine search, two-phased block distillation and neural hardware predictor
CN116069340A (en) Automatic driving model deployment method, device, equipment and storage medium
US10902312B2 (en) Tracking axes during model conversion
US11556798B2 (en) Optimizing machine learning model performance
CN112860736A (en) Big data query optimization method and device and readable storage medium
US20240161474A1 (en) Neural Network Inference Acceleration Method, Target Detection Method, Device, and Storage Medium
EP4339843A1 (en) Neural network optimization method and apparatus
KR102375880B1 (en) Estimate and blueprint prediction system in manufacturing process based on artificial intelligence model
US20150356455A1 (en) Systems and methods associated with an auto-tuning support vector machine
US11961099B2 (en) Utilizing machine learning for optimization of planning and value realization for private networks
CN115473822A (en) 5G intelligent gateway data transmission method and system and cloud platform
US20210168195A1 (en) Server and method for controlling server
CN117561502A (en) Method and device for determining failure reason
CN111667060B (en) Deep learning algorithm compiling method and device and related products
CN115480743A (en) Compiling method and compiler for neural network and related product
CN114548463A (en) Line information prediction method, line information prediction device, computer equipment and storage medium
CN111461310A (en) Neural network device, neural network system and method for processing neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination