CN108614703B - Embedded platform-based algorithm transplanting system and algorithm transplanting method thereof - Google Patents


Info

Publication number
CN108614703B
CN108614703B (application CN201611256319.6A)
Authority
CN
China
Prior art keywords
algorithm
core
platform
embedded platform
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611256319.6A
Other languages
Chinese (zh)
Other versions
CN108614703A (en
Inventor
陈立刚
周劲蕾
赵俊能
胡进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Sunny Optical Intelligent Technology Co Ltd
Original Assignee
Zhejiang Sunny Optical Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sunny Optical Intelligent Technology Co Ltd filed Critical Zhejiang Sunny Optical Intelligent Technology Co Ltd
Priority to CN201611256319.6A priority Critical patent/CN108614703B/en
Publication of CN108614703A publication Critical patent/CN108614703A/en
Application granted granted Critical
Publication of CN108614703B publication Critical patent/CN108614703B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/76Adapting program code to run in a different environment; Porting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/53Decompilation; Disassembly
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • G06F8/61Installation

Abstract

The algorithm transplanting system comprises an acquisition and evaluation unit for acquiring and evaluating an algorithm; an algorithm flow adjusting unit for adjusting the algorithm flow; a multi-core allocation unit for allocating multiple cores to process the algorithm flow; a framework integration unit for performing framework integration on the algorithm flow after multi-core processing; and a recording unit for recording the algorithm into the embedded platform, so that an algorithm designed on the PC side is transplanted to the embedded platform.

Description

Embedded platform-based algorithm transplanting system and algorithm transplanting method thereof
Technical Field
The invention relates to algorithm transplantation, in particular to an algorithm transplantation system based on an embedded platform and an algorithm transplantation method thereof.
Background
Embedded mobile devices are now in ever wider use, and accordingly more and more PC (Personal Computer) algorithms are expected to be migrated to and applied on embedded platforms.
CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) are the two architectures of existing CPUs; because their design concepts and methods differ, each has its own advantages, disadvantages, and range of application.
In a CISC processor, the core of the microprocessor is the circuitry that executes instructions, and a single instruction may take multiple steps to accomplish a task, such as transferring a value to a register or performing an addition. The CISC instruction set is rich and provides special instructions for specific functions, which is why most PCs adopt the CISC architecture, and why, for most application programs, it is more convenient to complete the preliminary design on a CISC computer.
In RISC, all instructions share a consistent format and the same cycle count, and pipelining is used. This design idea reduces the number of instructions and addressing modes, making the hardware easier to implement, improving the degree of parallel instruction execution, and raising compiler efficiency. However, less common functions are typically realized on RISC by combining simpler instructions, so implementing special functions on RISC may be less efficient. Therefore, the basic design of most application programs is carried out on CISC rather than RISC; but once the program design is stable and moves to practical application, the program needs to be migrated to target devices, which are usually embedded devices built on the RISC architecture.
Because the basic design ideas of RISC and CISC differ, when an algorithm designed for CISC is to be transplanted to an embedded device with a RISC framework, the basic flow of the algorithm must be adjusted and the multiple cores of the Digital Signal Processor (DSP) must be started, so that the algorithm can run on a RISC-architecture device based on an embedded platform.
Disclosure of Invention
An object of the present invention is to provide an algorithm migration system based on an embedded platform and an algorithm migration method thereof, wherein the method adjusts an algorithm based on CISC design to be migrated to an embedded device of a DSP multi-core RISC framework.
An object of the present invention is to provide an algorithm transplantation system based on an embedded platform and an algorithm transplantation method thereof, wherein the algorithm to be transplanted is designed according to characteristics of the algorithm to be transplanted, so that the algorithm to be transplanted maximally utilizes multi-core resources.
The invention aims to provide an algorithm transplanting system based on an embedded platform and an algorithm transplanting method thereof, wherein the method systematically integrates an algorithm flow framework.
The invention aims to provide an algorithm transplanting system based on an embedded platform and an algorithm transplanting method thereof, wherein after the method is used for frame integration, the algorithm can be further optimized in a plurality of different modes, and the running efficiency of the transplanted algorithm at an embedded end is improved.
An object of the present invention is to provide an algorithm transplantation system based on an embedded platform and an algorithm transplantation method thereof, wherein the algorithm transplantation mode is suitable for transplantation of an algorithm applied to a three-dimensional detection technology.
The invention aims to provide an algorithm transplanting system based on an embedded platform and an algorithm transplanting method thereof, wherein the algorithm transplanting method is suitable for being applied to transplanting of a 3D structured light algorithm, and 3D structured light algorithm flow at a PC end is transplanted to the embedded platform of a DSP multi-core RISC framework.
In order to achieve at least one of the above objects, the present invention provides an embedded platform-based algorithm transplantation system, which includes: an obtaining and evaluating unit for obtaining an evaluating algorithm; an algorithm flow adjusting unit for adjusting the algorithm flow; a multi-core allocation unit, which is used for allocating multi-core to the algorithm process for processing; the framework integration unit is used for carrying out framework integration on the algorithm process subjected to multi-core processing; and the recording unit is used for recording the algorithm into the embedded platform so as to transplant the algorithm to the embedded platform.
According to some embodiments, after the algorithm is obtained, the obtaining and evaluating unit in the algorithm transplantation system evaluates it on a DSP core by performing cycle (clock cycle) counts at the maximized frequency using assembly language.
According to some embodiments, the algorithm evaluation unit in the algorithm transplanting system evaluates whether the performance of the algorithm meets a desired standard, and when the performance of the algorithm meets the standard, the next step can be carried out; and when the evaluation does not reach the expected standard, performing algorithm theory optimization and evaluating again.
According to some embodiments, the algorithm flow adjusting unit in the algorithm transplanting system determines whether the adjustment of the processing flow is needed, and when the adjustment of the processing flow is determined to be needed, the flow adjustment of the algorithm is performed; and when the adjustment of the processing flow is not needed, the flow adjustment is not carried out on the algorithm flow, and the multi-core processing is directly started.
According to some embodiments, the algorithm migration system divides the algorithm flow into a platform frame part and a non-platform frame part when the algorithm flow adjustment unit adjusts the algorithm flow.
According to some embodiments, the multi-core distribution unit in the algorithm migration system separately initiates multi-core processing for the platform frame portion.
According to some embodiments, the multi-core allocation unit in the algorithm transplanting system determines whether multi-core needs to be started or not for the non-platform frame part, and allocates a single core when the multi-core does not need to be started; and when the multi-core needs to be started, distributing the multi-core.
According to some embodiments, the multi-core allocation unit in the algorithm transplanting system determines the inter-row adhesion of a part of the non-platform frame part, where multi-core needs to be started, and when the inter-row adhesion is large, horizontal division is adopted, and when the inter-row adhesion is small, vertical division is adopted.
According to some embodiments, the algorithm migration system includes a memory adjustment unit, and the memory adjustment unit is configured to adjust memory allocation after the algorithm is subjected to the multi-core allocation processing, so as to improve operation efficiency.
According to some embodiments, the memory adjustment unit in the algorithm migration system performs overflow judgment on the DSP memory, and when the overflow is judged, the memory of the DSP core is divided into a code segment and a data segment, and the code segment is linked to the memory segment with a low speed, and the data segment is linked to the memory segment with a high speed.
According to some embodiments, the frame-integrated unit in the algorithm transplanting system performs frame integration on the algorithm after DSP multi-core processing on a RISC control core.
According to some embodiments, the algorithm migration system comprises an algorithm optimization unit for optimizing the algorithm integrated by the framework integration unit.
According to some embodiments, the optimization method of the algorithm optimization unit in the algorithm transplantation system is selected from the following methods: putting the correlation calculation under the multi-core condition as much as possible for processing; multiplexing the time sequence space of the thread; optimizing the register level by using a bottom language; preprocessing by utilizing an inline mode in a compiling stage; and replacing part of software filtering operators in the algorithm with hardware filtering.
According to some embodiments, the algorithm transplanting system comprises an operation evaluation unit for evaluating the operation effect of the algorithm transplanted to the embedded platform, and after the evaluation is passed, the logging unit logs the algorithm into the embedded platform.
According to some embodiments, the logging-in unit in the algorithm transplanting system utilizes JTAG to burn the flash file into the embedded platform.
According to some embodiments, the algorithm transplanted by the algorithm migration system is a 3D structured light algorithm.
Another aspect of the present invention provides an algorithm transplantation method based on an embedded platform, which includes the steps of:
(A) evaluating an algorithm;
(B) allocating the DSP cores;
(C) integrating the framework; and
(D) entering the algorithm into the embedded platform.
According to some embodiments, the embedded platform-based algorithm transplanting method, wherein the step (a) comprises the steps of: after the algorithm is obtained, evaluation is performed on the DSP core in a manner that utilizes assembly language to perform frequency-maximized Cycle calculations.
According to some embodiments, the embedded platform-based algorithm transplanting method, wherein the step (a) comprises the steps of: evaluating whether the performance of the algorithm meets a desired criterion, and when the criterion is met, proceeding to the next step; and when the evaluation does not reach the expected standard, performing algorithm theory optimization and evaluating again.
According to some embodiments, the embedded platform-based algorithm transplanting method, wherein the step (B) comprises the steps of: judging whether the adjustment of the processing flow needs to be carried out or not, and carrying out flow adjustment on the algorithm when the adjustment of the processing flow needs to be carried out; and when the adjustment of the processing flow is not needed, the flow adjustment is not carried out on the algorithm flow, and the multi-core processing is directly started.
According to some embodiments, the embedded platform-based algorithm transplanting method, wherein the step (B) comprises the steps of: judging whether the platform frame of the embedded platform can be utilized or not, and dividing the algorithm flow steps into a platform frame part when the platform frame is judged to be utilized; and when the platform framework of the embedded platform cannot be utilized, dividing the algorithm flow steps into a non-platform framework part.
According to some embodiments, the embedded platform-based algorithm transplanting method, wherein the step (B) further comprises the steps of: multi-core processing is separately initiated for the platform frame portion.
According to some embodiments, in the embedded platform-based algorithm transplanting method, the step (B) further comprises the steps of: judging, for the non-platform framework part, whether multiple cores need to be started, and allocating a single core when they do not; and allocating multiple cores when they do.
According to some embodiments, in the embedded platform-based algorithm transplanting method, the algorithm is a 3D structured light algorithm, wherein the normalization step is allocated single-core processing, and the feature point filtering and the Node point filtering start multiple cores.
According to some embodiments, the embedded platform-based algorithm transplanting method, wherein the step (B) further comprises the steps of: and judging the adhesion between lines of the part needing opening the multi-core in the non-platform frame part, adopting horizontal segmentation when the adhesion between the lines is large, and vertically segmenting when the adhesion between the lines is small.
According to some embodiments, the algorithm transplantation method based on the embedded platform is a 3D structured light algorithm, wherein the feature point filtering adopts horizontal segmentation, and the Node point filtering adopts vertical segmentation.
According to some embodiments, the embedded platform-based algorithm transplantation method, wherein the algorithm is a 3D structured light algorithm, wherein image signal processing, smoothing filtering and convolution filtering are divided into a platform frame part; the normalization, feature point filtering, and Node point filtering are divided into non-platform frame portions.
According to some embodiments, the embedded platform-based algorithm transplanting method includes, after the step (B), a step (E): adjusting the DSP multi-core memory.
According to some embodiments, in the embedded platform-based algorithm transplanting method, an overflow judgment is performed on the DSP memory in step (E), and when the overflow is judged, the memory of the DSP core is divided into a code segment and a data segment, and the code segment is linked to a memory segment with a low speed, and the data segment is linked to a memory segment with a high speed.
According to some embodiments, in the embedded platform-based algorithm transplanting method, in the step (C), the algorithm after DSP multi-core processing is frame-integrated on a RISC control core.
According to some embodiments, in the embedded platform-based algorithm transplanting method, the step (D) is followed by a step (F): optimizing the algorithm.
According to some embodiments, in the embedded platform-based algorithm transplanting method, the optimization method of step (F) is selected from the following: placing correlation calculations under the multi-core condition for processing wherever possible; multiplexing the timing space of the threads; optimizing at the register level with a low-level language; preprocessing at the compiling stage in an inline manner; and replacing part of the software filtering operators in the algorithm with hardware filtering.
According to some embodiments, the embedded platform-based algorithm transplanting method comprises, before the step (D), the step of: evaluating the effect of the transplantation, and performing the step (D) when the evaluation is passed.
According to some embodiments, in the embedded platform-based algorithm transplanting method, in the step (D), JTAG is used to burn the flash file into the embedded platform.
According to some embodiments, in the embedded platform-based algorithm transplanting method, the algorithm is a 3D structured light algorithm.
Another aspect of the present invention provides a 3D structured light algorithm transplanting method, wherein the 3D structured light algorithm flow includes smooth filtering, normalization, feature point filtering, and Node point filtering, and wherein the 3D structured light algorithm is transplanted to an embedded platform by the algorithm transplanting method.
Drawings
Fig. 1 is an embedded platform-based algorithm transplantation system according to a first preferred embodiment of the present invention.
Fig. 2 is a schematic diagram of an execution flow of the embedded platform-based algorithm transplantation system according to the first preferred embodiment of the present invention.
Fig. 3 is a block diagram of an embedded platform-based algorithm migration method according to a second preferred embodiment of the present invention.
Fig. 4 is a flowchart of an embedded platform-based algorithm migration method according to a second preferred embodiment of the present invention.
Fig. 5 is a flowchart of 3D structured light at the PC end in the embedded platform-based 3D structured light algorithm transplanting method according to a specific implementation manner of the above preferred embodiment of the present invention.
Fig. 6 is a flowchart of a 3D structured light algorithm transplanting method based on an embedded platform according to a specific implementation manner of the above preferred embodiment of the present invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be understood by those skilled in the art that in the present disclosure, the terms "longitudinal," "lateral," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for ease of description and simplicity of description, and do not indicate or imply that the referenced devices or components must be constructed and operated in a particular orientation and thus are not to be considered limiting.
Referring to fig. 1 and 2, the present invention provides an embedded platform-based algorithm migration system 100, wherein the algorithm migration system 100 can migrate an algorithm designed or run on a PC side to an embedded platform, such as an embedded platform of a mobile device, so that the algorithm can be run by means of the embedded platform, thereby expanding the application range of the algorithm by using the advantages of the embedded platform or the advantages of the mobile device. In particular, the embedded platform based algorithm migration system 100 is adapted to migrate algorithms under a predetermined framework, such as, but not limited to, multi-core DSP, RISC. The algorithm transplanting system 100 can be applied to a depth algorithm transplanting process for machine vision, intelligent robots, binocular sensing and the like.
Specifically, according to this embodiment of the present invention, the embedded platform-based algorithm migration system 100 includes an obtaining and evaluating unit 101, where the obtaining and evaluating unit 101 is configured to obtain an algorithm and evaluate the performance of the algorithm. That is, the acquisition evaluation unit 101 is configured to preliminarily evaluate whether the acquired algorithm can be applied to the algorithm migration system 100.
Further, the acquisition and evaluation unit 101 evaluates algorithm performance on the DSP core, for example by cycle (clock cycle) counting at the maximized frequency using assembly language.
Further, the evaluation criterion applied by the acquisition and evaluation unit 101 after acquiring the algorithm is: if the evaluated performance is far from the expected standard, the algorithm needs to be optimized and adjusted theoretically; if the evaluated performance meets the expected standard, the next step can proceed.
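As an illustration of this kind of cycle-based evaluation, the following sketch checks whether a measured per-frame cycle count fits a real-time budget; the 1 GHz core frequency and 30 fps target used in the usage example are assumed figures for illustration, not values from the patent.

```c
#include <stdint.h>

/* Hedged sketch: decide whether an algorithm whose inner loops cost
 * cycles_per_frame DSP cycles (e.g. read off an assembly listing)
 * can sustain target_fps on a core clocked at core_hz. */
int meets_cycle_budget(uint64_t cycles_per_frame,
                       uint64_t core_hz, double target_fps)
{
    double frame_time_s = (double)cycles_per_frame / (double)core_hz;
    return frame_time_s <= 1.0 / target_fps;
}
```

A result far over budget corresponds to the "optimize the algorithm theoretically and evaluate again" branch described above; for example, 20 million cycles per frame on a 1 GHz core fits a 30 fps budget, while 50 million cycles does not.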
Due to the limitation of hardware resources, the embedded platform usually processes images row by row, while some steps of the PC algorithm usually proceed frame by frame, for example when traversing every pixel of an image to find an extremum. Therefore, according to this embodiment of the present invention, the embedded platform-based algorithm migration system 100 includes an algorithm flow adjustment unit 102 configured to adjust the flow of the algorithm, converting the flow designed for the PC side into one suited to the embedded platform, for example converting a flow designed to run frame by frame on the PC into a row-by-row flow for the embedded platform. Of course, when the flow designed on the PC side already matches the flow requirements of the embedded platform, no adjustment is needed and the next step can proceed.
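A minimal sketch of this frame-to-line adjustment, using the extremum example (the function names are illustrative, not from the patent): the whole-frame search is refactored into a per-line update, so the embedded platform only ever needs one row resident at a time.

```c
#include <stdint.h>

/* Per-frame style (PC side): scan the entire image for its maximum. */
uint8_t frame_max(const uint8_t *img, int w, int h)
{
    uint8_t m = 0;
    for (int i = 0; i < w * h; ++i)
        if (img[i] > m) m = img[i];
    return m;
}

/* Row-by-row style (embedded side): fold one line at a time into a
 * running maximum; feeding every row yields the same result as above. */
uint8_t line_max_update(uint8_t running, const uint8_t *line, int w)
{
    for (int i = 0; i < w; ++i)
        if (line[i] > running) running = line[i];
    return running;
}
```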
Of course, before the adjustment, the algorithm flow adjustment unit 102 needs to determine whether the adjustment of the processing flow needs to be performed, and when the adjustment of the processing flow needs to be performed, the algorithm flow adjustment is performed; and when the adjustment of the processing flow is not needed, the algorithm flow is not adjusted, and the DSP multi-core processing can be started by directly utilizing the embedded platform framework. That is, when the obtained algorithm flow conforms to the framework system of the embedded platform, no algorithm flow adjustment is performed.
More specifically, the adjustment performed by the algorithm flow adjustment unit 102 is, for example and without limitation, to divide the algorithm flow into a platform framework part and a non-platform framework part according to the characteristics of the embedded platform. The platform framework is the embedded platform's existing modular, line-based processing framework. Such frameworks process quickly and are easy to adjust modularly, but their drawback is also obvious: they give the user too little freedom to change how the data is processed, which is why a non-platform framework also needs to be built.
In other words, the algorithm flow adjusting unit 102 may adjust the algorithm into two parts, namely, a platform frame and a non-platform frame during the adjustment process. That is, algorithms that are consistent with the platform-embedded frame are placed into the platform-frame portion, while algorithms that are inconsistent with the platform-embedded frame are placed into the non-platform-frame portion, so that processing can be performed separately for the platform-frame portion and the non-platform-frame portion.
Further, the embedded platform-based algorithm transplanting system 100 includes a multi-core allocation unit 103 configured to allocate the DSP's multi-core processing mode, for example and without limitation deciding, for each step of the algorithm flow, whether it is processed on a single core or cooperatively on multiple cores. In some cases the platform framework handles inter-core allocation of the DSP by itself, so the platform framework part is directly suited to multi-core processing, while the non-platform framework part requires a specific decision.
Further, when the non-platform framework part is allocated multiple cores, the multi-core allocation unit 103 needs to determine the multi-core division direction of the DSP, namely horizontal division or vertical division. When the inter-row adhesion of the image processed by the algorithm is large, the image is divided horizontally; when it is small, the image is divided vertically. Adhesion here refers to the degree of correlation, within the algorithm's computation, between a pixel and the pixels in the rows above and below it.
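The patent does not give a formula for adhesion; as a purely illustrative stand-in (an assumption, not the patent's definition), inter-row correlation could be estimated from the mean absolute difference between vertically adjacent pixels, scoring identical rows 1.0 and maximally different rows 0.0.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative adhesion estimate: 1.0 when adjacent rows are
 * identical, 0.0 when every vertically adjacent pixel pair differs
 * by the full 8-bit range. */
double row_adhesion(const uint8_t *img, int w, int h)
{
    long sum = 0;
    for (int y = 0; y + 1 < h; ++y)
        for (int x = 0; x < w; ++x)
            sum += labs((long)img[y * w + x] - (long)img[(y + 1) * w + x]);
    return 1.0 - (double)sum / (255.0 * (double)w * (double)(h - 1));
}
```

Under this stand-in metric, a high score would suggest horizontal division and a low score vertical division, following the rule stated above.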
For example, in some embodiments, after the algorithm flow adjustment unit adjusts the algorithm, the steps of the flow are divided into a platform framework part and a non-platform framework part, and the multi-core allocation unit 103 starts multiple cores for the platform framework part, that is, the platform framework part is processed by the DSP's multiple cores. For the non-platform framework part, it is judged whether multiple cores can be started; if not, a single core is allocated and the operator is written for it. If multiple cores can be started, the direction of multi-core division must then be judged: portions with high pixel adhesion are divided horizontally, and portions with low pixel adhesion are divided vertically. In this way the utilization of the DSP's multi-core resources is maximized.
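Whichever direction is chosen, the rows (horizontal division) or columns (vertical division) still have to be spread across the cores; a common even-split sketch (names are illustrative, not from the patent) is:

```c
/* Evenly partition `total` rows or columns among n_cores cores;
 * returns the start index and length of core k's slice. The first
 * total % n_cores cores each take one extra element. */
typedef struct { int start, len; } slice_t;

slice_t core_slice(int total, int n_cores, int k)
{
    slice_t s;
    int base = total / n_cores, extra = total % n_cores;
    s.start = k * base + (k < extra ? k : extra);
    s.len = base + (k < extra ? 1 : 0);
    return s;
}
```

For example, 10 image rows over 3 cores split as 4 + 3 + 3, so no core is left idle while another finishes a larger slice.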
Further, the embedded platform based algorithm migration system 100 includes a memory adjustment unit 104 configured to adjust memory segments. More specifically, the memory adjustment unit 104 judges whether the DSP memory overflows; if overflow is judged, the memory adjustment unit 104 adjusts the memory of the DSP core by separating code segments from data segments, linking the code segments to the lower-rate memory segments and the data segments to the higher-rate memory segments. When no overflow is judged, the next step can proceed directly. Since code segments are usually loaded only once while data segments are read and refreshed continuously as the algorithm runs, the memory adjustment unit allocates them to different memory segments, thereby using the memory more efficiently.
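On TI-style DSP toolchains, this code/data separation is typically expressed in a linker command file. The fragment below is an assumed sketch for illustration only: the memory-region names `DDR3` and `L2SRAM` and the section placement depend on the actual device and are not specified by the patent.

```
SECTIONS
{
    .text   > DDR3    /* code segment: loaded once, can live in slower external memory */
    .const  > DDR3
    .mydata > L2SRAM  /* data segment: read and refreshed continuously, keep in fast on-chip RAM */
}
```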
Further, the embedded platform based algorithm transplantation system 100 includes a framework integration unit 105, and the framework integration unit 105 is configured to integrate the interfaces of the algorithm flow framework on the RISC control core.
In other words, the algorithm process adjusted by the algorithm process unit is integrated by the frame integration unit 105, for example, the platform frame part and the non-platform frame part are integrated after being respectively processed by the DSP, so that the algorithm process frame system is integrated into the RISC frame.
Further, the embedded platform-based algorithm transplanting system 100 includes an algorithm optimizing unit 106, where the algorithm optimizing unit 106 is configured to optimize details of the algorithm, so as to further improve the operating efficiency of the algorithm on the embedded platform framework.
Specifically, the algorithm optimization unit 106 may optimize in the following ways, for example and without limitation:
1. Placing correlation calculations under the multi-core condition for processing wherever possible.
2. Multiplexing the timing space of the threads.
3. Performing register-level optimization using a low-level language (e.g., assembly).
4. Preprocessing at the compiling stage by means of inlining and the like.
5. Replacing part of the software filtering operators in the algorithm with hardware filtering.
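As a small illustration of optimization mode 4 (inline preprocessing at the compiling stage), a frequently called operator can be marked `static inline` so the compiler substitutes its body at each call site, removing call overhead in inner loops. The operator below is a hypothetical example, not one named by the patent.

```c
/* Hypothetical per-pixel operator: clamp an intermediate result into
 * the 8-bit range. static inline lets the compiler expand it in
 * place inside hot loops instead of emitting a function call. */
static inline int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```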
Further, the embedded platform based algorithm migration system 100 includes an operation evaluation unit 107, where the operation evaluation unit 107 is configured to perform operation evaluation on the algorithm operation effect after the algorithm is migrated to the embedded platform.
In other words, the operation evaluation unit 107 is used to evaluate whether the algorithm is successfully migrated and the migration effect.
Further, the embedded platform-based algorithm transplanting system 100 includes an entry unit 108 configured to enter the transplanted algorithm into the embedded platform from the host computer, for example by burning the flash file with JTAG.
Referring to fig. 3 and 4, according to the above preferred embodiment of the present invention, the present invention provides an embedded platform-based algorithm transplantation method 1000, the algorithm transplantation method 1000 is suitable for transplanting an algorithm designed or run by a PC end to an embedded platform, and the algorithm transplantation method 1000 can be applied to a deep algorithm transplantation process for machine vision, intelligent robot, binocular sensing and the like. The algorithm migration method 1000 includes the steps of:
1001: evaluating an algorithm;
1002: adjusting an algorithm flow;
1003: allocating the DSP;
1004: adjusting the memory;
1005: integrating the framework;
1006: optimizing an algorithm;
1007: evaluating the transplantation effect; and
1008: and (4) inputting an algorithm into the embedded platform.
In the step 1001, performance evaluation is performed on the DSP core, for example by calculating the clock cycles (Cycle) consumed at the maximized frequency using assembly language.
The evaluation criterion in the step 1001 is: when the evaluated performance of the algorithm is far from the expected standard, the algorithm is optimized and adjusted and then evaluated again; this is repeated until the performance of the algorithm reaches the standard, after which step 1002 is performed.
Due to hardware resource limitations, the embedded platform often processes images row by row, while some steps in the PC algorithm need to be performed frame by frame, for example traversing every pixel of an image to obtain an extremum; the two processing modes are therefore inconsistent, and the algorithm flow needs to be adjusted. In step 1002, the algorithm flow is adjusted so that the PC-side algorithm fits the execution flow of the embedded platform. Before adjustment, it must be judged whether the processing flow needs to be adjusted: when no adjustment is needed, the embedded platform framework can be adopted directly to start DSP multi-core processing; when adjustment is needed, the algorithm flow is adjusted.
Specifically, during adjustment, the flow of the algorithm is divided into a platform framework part and a non-platform framework part according to the characteristics of the embedded platform. The platform framework is the embedded platform's existing line-processing modular packaging framework. Such frameworks are typically fast and convenient to adjust modularly, but their drawback is also obvious: they give the user too little freedom to adjust the data processing mode, so a non-platform framework also needs to be built.
In other words, during the adjustment process, the algorithm flow is divided into two parts, the platform framework and the non-platform framework. That is, algorithm steps consistent with the platform's built-in framework are placed into the platform framework part, while steps inconsistent with it are placed into the non-platform framework part, so that the two parts can be processed separately.
Thus, said step 1002 comprises the step 10021: and judging whether to adjust the processing flow.
Said step 10021 further comprises the steps of:
100211: when the adjustment of the processing flow is needed, the algorithm flow is adjusted;
100212: and when the adjustment of the processing flow is not needed, starting the DSP multi-core by adopting the platform framework.
Wherein said step 100211 further comprises the steps of:
1002111: judging whether a platform frame of the embedded platform can be utilized or not;
1002112: when the platform frame is judged to be available, dividing the algorithm flow steps into a platform frame part;
1002113: and when the platform framework of the embedded platform cannot be utilized, dividing the algorithm flow steps into a non-platform framework part.
In the step 1003, the multi-core processing manner of the multi-core DSP is allocated, for example, whether single-core or multi-core processing is adopted for each step in the algorithm flow, and which segmentation manner is adopted when multi-core processing is used.
In some cases, the task of inter-core allocation of the DSP is handled internally by the platform framework, so the platform framework part can be assigned directly to multi-core processing, while the non-platform framework part requires further case-by-case determination.
Further, when the non-platform framework part is allocated to start multi-core processing, the direction of the DSP multi-core segmentation needs to be determined, for example horizontal segmentation or vertical segmentation: when the inter-row adhesion of the image processed by the algorithm is large, the image needs to be segmented horizontally; when the inter-row adhesion is small, the image can be segmented vertically. Adhesion here refers to the correlation, as expressed in the algorithm's calculation, between pixels in adjacent rows of the image.
For example, in some embodiments, after the algorithm is adjusted, the steps in the algorithm flow are divided into a platform framework part and a non-platform framework part, and multi-core processing can be started for the platform framework part, that is, the platform framework part is processed by the multiple cores of the DSP. For the non-platform framework part, it is judged whether multi-core processing can be started; if not, a single core is allocated and the operators are written; if it can be started, the direction of the multi-core segmentation must then be judged: portions with high pixel adhesion are segmented horizontally, and portions with low pixel adhesion are segmented vertically. In this way, utilization of the DSP multi-core resources is maximized.
Thus, the step 1003 further comprises the steps of:
10031: starting DSP multi-core processing on the platform frame part;
10032: judging whether a multi-core can be started for the non-platform frame part;
said step 10032 further comprises the steps of:
100321: when multi-core processing cannot be started, allocate single-core processing and write the operators;
100322: when it is judged that multi-core processing can be started, further judge whether the inter-row adhesion of the image processed by the algorithm is large; when the inter-row adhesion of the processed image is large, adopt horizontal segmentation; when it is not large, adopt vertical segmentation.
In the step 1004, overflow judgment is performed on the DSP memory. If the memory is judged to overflow, the memory of the DSP core is adjusted by separating the code segments from the data segments: the code segments are linked to the low-rate memory segment, and the data segments are linked to the high-rate memory segment. When the memory is judged not to overflow, the next step can be performed directly. Since code segments are usually loaded only once, while data segments are continuously read and refreshed while the algorithm runs, allocating them to different memory segments makes more efficient use of the memory.
Thus, the step 1004 further comprises the steps of:
10041: judging whether the memory overflows or not;
10042: when memory overflow is judged, the code segment and the data segment of the algorithm are linked separately;
in step 10042, code segments are linked to a low-rate memory segment, and data segments are linked to a high-rate memory segment.
In step 1005, the interfaces of the algorithm flow framework are integrated on the RISC control core. For example, the platform framework part and the non-platform framework part are each integrated after DSP processing, so that the algorithm flow framework system is integrated into a RISC framework.
In step 1006, details of the algorithm are optimized, so as to further improve the operation efficiency of the algorithm on the embedded platform framework.
Specifically, the optimization can be performed in several ways:
1. Place correlation calculations under multi-core processing as far as possible.
2. Multiplex the timing space of the threads.
3. Perform register-level optimization using a low-level language (e.g., assembly).
4. Preprocess at the compilation stage by means such as inlining.
5. Replace some of the software filtering operators in the algorithm with hardware filtering.
In step 1007, the running effect of the algorithm after it has been transplanted to the embedded platform is evaluated. In other words, it is assessed whether the algorithm was successfully transplanted, and how well.
In step 1008, the transplanted algorithm is entered into the embedded platform upper computer, for example by burning the flash file via JTAG (the chip's built-in test and debug interface).
Fig. 5 is a flowchart of 3D structured light at the PC end in the embedded platform-based 3D structured light algorithm transplanting method according to a specific implementation manner of the above preferred embodiment of the present invention. Fig. 6 is a flowchart of a 3D structured light algorithm transplanting method based on an embedded platform according to a specific implementation manner of the above preferred embodiment of the present invention. In other words, the 3D structured light algorithm implemented at the PC side in fig. 5 is transplanted to the embedded platform by the method shown in fig. 6.
Specifically, referring to fig. 5 and 6, the present invention provides a 3D structured light algorithm transplantation method 2000, which is suitable for transplanting a 3D structured light algorithm designed or run on a PC to an embedded platform. The PC-side flow of the 3D structured light algorithm comprises the following steps: image input, smoothing filtering, convolution and normalization, feature point detection, feature point filtering, feature point network connection, Node point calculation, Node point connection, table lookup according to the list, and image output. The normalization requires the maximum and minimum values of the whole frame image; the feature point filtering requires multi-core segmentation and cannot be processed on a single core; and the Node point calculation re-imports the original image, which costs performance. The feature point detection, feature point filtering, and feature point network connection may be integrated as feature point filtering, and the Node point calculation and Node point connection may be integrated as Node point filtering. The 3D structured light algorithm transplantation method 2000 includes the following steps:
2001: evaluating an algorithm;
2002: adjusting an algorithm flow;
2003: allocating the DSP;
2004: adjusting the memory;
2005: integrating the framework;
2006: optimizing an algorithm;
2007: evaluating the transplantation effect; and
2008: and (4) inputting an algorithm into the embedded platform.
In the step 2001, performance evaluation is performed on the DSP core, for example by calculating the clock cycles (Cycle) consumed at the maximized frequency using assembly language.
More specifically, in the step 2001, the 3D structured light algorithm in fig. 5 is acquired, and the 3D structured light algorithm is evaluated.
The evaluation criterion in the step 2001 is: when the evaluated performance of the 3D structured light algorithm is far from the expected standard, the 3D structured light algorithm is optimized and adjusted and then evaluated again; this is repeated until its performance reaches the standard, after which step 2002 is performed.
Due to hardware resource limitations, the embedded platform often processes images row by row, while some steps in the PC algorithm need to be performed frame by frame, for example traversing every pixel of an image to obtain an extremum; the two processing modes are therefore inconsistent, and the algorithm flow needs to be adjusted. In the step 2002, the 3D structured light algorithm flow is adjusted so that the PC-side algorithm fits the execution flow of the embedded platform. Before adjustment, it must be judged whether the processing flow needs to be adjusted: when no adjustment is needed, the embedded platform framework can be adopted directly to start DSP multi-core processing; when adjustment is needed, the algorithm flow is adjusted.
Specifically, during adjustment, the flow of the 3D structured light algorithm is divided into a platform framework part and a non-platform framework part according to the characteristics of the embedded platform. The platform framework is the embedded platform's existing line-processing modular packaging framework. Such frameworks are typically fast and convenient to adjust modularly, but their drawback is also obvious: they give the user too little freedom to adjust the data processing mode, so a non-platform framework also needs to be built.
In other words, during the adjustment process, the 3D structured light algorithm flow is divided into two parts, the platform framework and the non-platform framework. That is, algorithm steps consistent with the platform's built-in framework are placed into the platform framework part, while steps inconsistent with it are placed into the non-platform framework part, so that the two parts can be processed separately.
Specifically, as can be seen from fig. 5, in the 3D structured light algorithm, the normalization step needs to process the whole frame image, so corresponding adjustment is needed when processing on the embedded platform. According to the method of this embodiment of the present invention, the adjusted 3D structured light algorithm flow is divided into a platform framework part, which includes: image signal processing, smoothing filtering, and convolution filtering; and a non-platform framework part, which includes: normalization, feature point filtering, and Node point filtering. The feature point filtering includes feature point detection, feature point filtering, and feature point network connection, and cannot be processed on a single core. The Node point filtering includes Node point calculation and Node point connection.
Thus, said step 2002 comprises the steps 20021: and judging whether to adjust the processing flow.
The step 20021 further comprises the steps of:
200211: when the adjustment of the processing flow is needed, the 3D structured light algorithm flow is adjusted;
200212: and when the adjustment of the processing flow is not needed, starting the DSP multi-core by adopting the platform framework.
Wherein said step 200211 further comprises the steps of:
2002111: judging whether a platform frame of the embedded platform can be utilized or not;
2002112: when the platform frame is judged to be available, dividing the algorithm flow into a platform frame part;
2002113: and when the platform framework of the embedded platform cannot be utilized, dividing the algorithm flow into a non-platform framework part.
The judgment in step 2002111 is based on: when a step of the 3D structured light algorithm conforms to the predetermined framework structure, the embedded platform framework can be utilized; when it does not conform to the predetermined framework structure, the embedded platform framework cannot be utilized.
That is, in the method of this embodiment of the present invention, in the step 2002112, the image signal processing, smoothing filtering, and convolution filtering steps of the 3D structured light algorithm are judged able to utilize the platform framework of the embedded platform and are divided into the platform framework part. In the step 2002113, the normalization, the feature point filtering, and the Node point filtering of the 3D structured light algorithm are judged unable to utilize the platform framework of the embedded platform and are divided into the non-platform framework part.
In the step 2003, the multi-core processing manner of the multi-core DSP is allocated, for example, whether single-core or multi-core processing is adopted for each step in the 3D structured light algorithm flow, and whether a vertical or horizontal segmentation manner is adopted when multi-core processing is used.
In some cases, the task of inter-core allocation of the DSP is handled internally by the platform framework, so the platform framework part can be assigned directly to multi-core processing, while the non-platform framework part requires further case-by-case determination.
Further, when the non-platform framework part is allocated to start multi-core processing, the direction of the DSP multi-core segmentation needs to be determined, for example horizontal segmentation or vertical segmentation: when the inter-row adhesion of the image processed by the 3D structured light algorithm is large, the image needs to be segmented horizontally; when the inter-row adhesion is small, the image can be segmented vertically. Adhesion here refers to the correlation, as expressed in the algorithm's calculation, between pixels in adjacent rows of the image.
For example, in some embodiments, after the 3D structured light algorithm is adjusted, the steps in its flow are divided into a platform framework part and a non-platform framework part, and multi-core processing can be started for the platform framework part, that is, the platform framework part is processed by the multiple cores of the DSP. For the non-platform framework part, it is judged whether multi-core processing can be started; if not, a single core is allocated and the operators are written; if it can be started, the direction of the multi-core segmentation must then be judged: portions with high pixel adhesion are segmented horizontally, and portions with low pixel adhesion are segmented vertically. In this way, utilization of the DSP multi-core resources is maximized.
Thus, said step 2003 further comprises the steps of:
20031: starting DSP multi-core processing on the platform frame part;
Specifically, in the step 20031, DSP multi-core processing is started for the platform framework part of the 3D structured light algorithm, that is, for the image signal processing, the smoothing filtering, and the convolution filtering.
20032: judging whether a multi-core can be started for the non-platform frame part;
the judgment in the step 20032 is based on, for example but not limited to, judging the magnitude of the correlation by calculating the maximum and minimum values of the image pixels processed by the algorithm, that is, judging the magnitude of the influence of the processing results between each other. When the correlation is large, the multi-core cannot be turned on, and when the correlation is small, the multi-core can be turned on. For example, when the latter processing result depends on the former processing result, the correlation is large. The correlation is small when the latter processing structure does not depend on the previous processing result. Of course, other criteria may be used.
The step 20032 further comprises the steps of:
200321: when the multi-core can not be started, single-core processing is adopted for allocation, and operators are written;
200322: when it is judged that multi-core processing can be started, further judge whether the inter-row adhesion of the image processed by the 3D structured light algorithm is large; when the inter-row adhesion of the processed image is large, adopt horizontal segmentation; when it is not large, adopt vertical segmentation.
Specifically, in the non-platform framework part of the adjusted 3D structured light algorithm, that is, the normalization, the feature point filtering, and the Node point filtering: in the feature point filtering step, the inter-row adhesion of the data is large when searching for neighboring points, so it needs to be segmented horizontally, while the Node point filtering can be segmented in the vertical direction.
In the step 2004, overflow judgment is performed on the DSP memory. If the memory is judged to overflow, the memory of the DSP core is adjusted by separating the code segments from the data segments: the code segments are linked to the low-rate memory segment, and the data segments are linked to the high-rate memory segment. When the memory is judged not to overflow, the next step can be performed directly. Since code segments are usually loaded only once, while data segments are continuously read and refreshed while the algorithm runs, allocating them to different memory segments makes more efficient use of the memory.
Thus, the step 2004 further comprises the steps of:
20041: judging whether the memory overflows or not;
20042: when memory overflow is judged, the code segment and the data segment of the algorithm are linked separately;
in step 20042, code segments are linked to the low-rate memory segments, and data segments are linked to the high-rate memory segments.
In step 2005, the interface of the 3D structured light algorithm flow framework is integrated on a RISC control core. For example, the platform frame part and the non-platform frame part are respectively integrated after being processed by the DSP, so that the 3D structured light algorithm flow frame system is integrated into a RISC frame.
In step 2006, details of the 3D structured light algorithm are optimized, so as to further improve the operating efficiency of the 3D structured light algorithm on the embedded platform frame.
Specifically, the optimization can be performed in several ways:
1. Place correlation calculations under multi-core processing as far as possible.
2. Multiplex the timing space of the threads.
3. Perform register-level optimization using a low-level language (e.g., assembly).
4. Preprocess at the compilation stage by means such as inlining.
5. Replace some of the software filtering operators in the algorithm with hardware filtering.
Specifically, in the 3D structured light algorithm, as can be seen from fig. 5, when the Node point calculation is performed, the original image data needs to be re-imported, which inevitably increases the measured time consumption of the algorithm and thus reduces the running performance.
In step 2007, the running effect of the 3D structured light algorithm after it has been transplanted to the embedded platform is evaluated. In other words, it is assessed whether the 3D structured light algorithm was successfully transplanted, and how well.
In the step 2008, the transplanted 3D structured light algorithm is entered into the embedded platform upper computer, for example by burning the flash file via JTAG (the chip's built-in test and debug interface).
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are given by way of example only and are not limiting of the invention. The objects of the invention have been fully and effectively accomplished. The functional and structural principles of the present invention have been shown and described in the examples, and any variations or modifications of the embodiments of the present invention may be made without departing from the principles.

Claims (28)

1. An algorithm migration system, comprising:
an acquisition evaluation unit for acquiring an algorithm;
an algorithm flow adjustment unit for performing flow adjustment on the obtained algorithm, wherein the algorithm flow adjustment unit is further configured to divide the algorithm into a platform frame portion and a non-platform frame portion in response to the obtained algorithm needing to be adjusted, wherein the platform frame portion is an algorithm portion that is consistent with a frame of the embedded platform; the non-platform frame part is an algorithm part inconsistent with the frame of the embedded platform;
a multi-core allocation unit, which is used for allocating multi-core to the algorithm for processing;
the framework integration unit is used for carrying out framework integration on the algorithm subjected to multi-core processing; and
the entry unit is used for entering the algorithm into the embedded platform so as to transplant the algorithm to the embedded platform.
2. The algorithm migration system of claim 1, wherein the acquisition evaluation unit performs algorithm performance evaluation on the DSP core by performing frequency-maximized clock cycle calculations using assembly language after acquiring the algorithm.
3. The algorithm transplantation system of claim 1, wherein the acquisition evaluation unit evaluates whether performance of the algorithm meets a desired criterion, and when the criterion is met, proceeds to a next step; and when the evaluation does not reach the expected standard, performing algorithm theory optimization and evaluating again.
4. The algorithm transplantation system according to claim 1, wherein the algorithm flow adjustment unit judges whether or not adjustment of the process flow is required, and when it is judged that the adjustment of the process flow is required, performs flow adjustment on the algorithm; and when the adjustment of the processing flow is not needed, the flow adjustment is not carried out on the algorithm, and the multi-core processing is directly started.
5. The algorithm migration system of claim 1, wherein the multi-core distribution unit initiates multi-core processing separately for the platform frame portion.
6. The algorithm transplantation system of claim 1, wherein the multi-core assignment unit determines whether multi-core needs to be turned on for the non-platform frame portion, and assigns a single core when it is determined that multi-core does not need to be turned on; and when the multi-core needs to be started, distributing the multi-core.
7. The algorithm transplantation system of claim 6, wherein the multi-core allocation unit determines the size of the row-to-row adhesivity for a portion of the non-platform frame portion requiring multi-core activation, and adopts horizontal division when the row-to-row adhesivity is large and vertical division when the row-to-row adhesivity is small.
8. The algorithm transplantation system of claim 1, wherein the algorithm transplantation system includes a memory adjustment unit for adjusting memory allocation after the algorithm is subjected to the multi-core allocation process, so as to improve operation efficiency.
9. The algorithm transplantation system of claim 8, wherein the memory adjustment unit performs an overflow judgment on the memory of the DSP core, and when the overflow is judged, divides the memory of the DSP core into a code segment and a data segment, and links the code segment to the memory segment with a low rate and links the data segment to the memory segment with a high rate.
10. The algorithm transfer system of claim 1, wherein said framework integration unit framework integrates said algorithms after DSP multi-core processing on a RISC control core.
11. The algorithm transplantation system of claim 1, wherein the algorithm transplantation system includes an algorithm optimization unit for optimizing the algorithm after integration of the framework integration unit.
12. The algorithm transplantation system of claim 11, wherein the algorithm optimizing unit optimizes the algorithm by a method selected from the group consisting of: putting the correlation calculation under the multi-core condition as much as possible for processing; multiplexing the time sequence space of the thread; optimizing the register level by using a bottom language; preprocessing by utilizing an inline mode in a compiling stage; and replacing part of software filtering operators in the algorithm with hardware filtering.
13. The algorithm transplantation system according to claim 1, wherein the algorithm transplantation system includes an operation evaluation unit for evaluating an operation effect of the algorithm transplanted to the embedded platform, and the entry unit enters the algorithm into the embedded platform after the evaluation is passed.
14. The algorithm transplantation system of claim 1, wherein the entry unit burns a flash file to the embedded platform using JTAG.
15. The algorithm transplantation system of any one of claims 1 to 14, wherein the algorithm is a 3D structured light algorithm.
16. An algorithm transplanting method based on an embedded platform is characterized by comprising the following steps:
A. evaluating an algorithm;
B. allocating a DSP, comprising:
judging whether the algorithm needs to be subjected to flow adjustment, and when the algorithm needs to be subjected to flow adjustment, carrying out flow adjustment on the algorithm; when the adjustment of the processing flow is not needed, directly starting the multi-core processing, wherein the flow adjustment of the algorithm comprises the following steps: in response to the need to adjust the obtained algorithm, dividing the algorithm into a platform frame part and a non-platform frame part, wherein the platform frame part is an algorithm part consistent with the frame of the embedded platform; the non-platform frame part is an algorithm part inconsistent with the frame of the embedded platform;
C. integrating the framework; and
D. entering the algorithm into the embedded platform.
17. The embedded platform based algorithm transplanting method according to claim 16, wherein the step a comprises the steps of: after the algorithm is obtained, algorithm performance evaluation is performed on the DSP core in a mode of performing frequency-maximized clock cycle calculation by utilizing assembly language.
18. The embedded platform based algorithm transplanting method according to claim 16, wherein the step a comprises the steps of: evaluating whether the performance of the algorithm meets an expected standard, and entering the next step when the performance of the algorithm meets the expected standard; and when the evaluation does not reach the expected standard, performing algorithm theory optimization and evaluating again.
19. The embedded platform based algorithm transplantation method of claim 16, wherein said step B further comprises the steps of: multi-core processing is separately initiated for the platform frame portion.
20. The embedded platform based algorithm transplantation method of claim 19, wherein said step B further comprises the steps of: judging whether multi-core needs to be started or not for the non-platform frame part, and distributing single cores when the multi-core does not need to be started; and when the multi-core needs to be started, distributing the multi-core.
21. The embedded platform based algorithm transplantation method of claim 16, wherein said step B further comprises the steps of: and judging the adhesion between lines of the part needing opening the multi-core in the non-platform frame part, adopting horizontal segmentation when the adhesion between the lines is large, and vertically segmenting when the adhesion between the lines is small.
22. The embedded platform based algorithm transplantation method of claim 16, further comprising, after said step B: E. adjusting the multi-core memory of the DSP.
23. The embedded platform based algorithm transplanting method of claim 22, wherein in the step E an overflow judgment is performed on the memory of the DSP core; when overflow is judged, the memory of the DSP core is divided into a code segment and a data segment, the code segment being linked to a low-speed memory segment and the data segment being linked to a high-speed memory segment.
24. The embedded platform based algorithm transplanting method of claim 16, wherein in the step C, the algorithm after DSP multi-core processing is framework-integrated on a RISC control core.
25. The embedded platform based algorithm transplantation method of claim 16, wherein said step D is followed by: F. optimizing the algorithm.
26. The embedded platform based algorithm transplantation method of claim 25, wherein the method for optimizing said algorithm in said step F is selected from the group consisting of: placing correlated calculations under multi-core processing wherever possible; multiplexing the time-sequence space of threads; optimizing at the register level using a low-level language; preprocessing with inlining at the compilation stage; and replacing some software filtering operators in the algorithm with hardware filtering.
27. The embedded platform based algorithm transplantation method of claim 16, wherein the algorithm is a 3D structured light algorithm.
28. A method for migrating a 3D structured light algorithm, wherein the 3D structured light algorithm flow includes smoothing filtering, normalization, feature point filtering and Node point filtering, and wherein the 3D structured light algorithm is migrated to an embedded platform by the method of any one of claims 16 to 27.
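The code/data placement of claim 23 is typically expressed in a linker command file. The fragment below is a hedged TI-style sketch: the region names, origins, and lengths are illustrative assumptions, with off-chip DDR3 standing in for the low-speed memory segment and on-chip L2 SRAM for the high-speed one.

```
MEMORY
{
    DDR3   (RX) : origin = 0x80000000, length = 0x10000000  /* low-speed  */
    L2SRAM (RW) : origin = 0x00800000, length = 0x00080000  /* high-speed */
}

SECTIONS
{
    .text > DDR3     /* code segment linked to the low-speed memory  */
    .data > L2SRAM   /* data segment linked to the high-speed memory */
    .bss  > L2SRAM
}
```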
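The frequency-maximized cycle evaluation of claims 17 and 18 can be sketched as follows. This is an illustrative assumption, not the patent's implementation: on a real C6000-class DSP the cycle count would be read from the TSCL/TSCH time-stamp registers in assembly, while here a software value stands in so the pass/fail logic is portable. The helper names and the 600 MHz figure in the usage note are hypothetical.

```c
#include <stdint.h>

/* Convert a measured cycle count to milliseconds at the core's maximum
 * clock frequency (the "frequency-maximized" evaluation of claim 17). */
static double cycles_to_ms(uint64_t cycles, double core_hz)
{
    return (double)cycles / core_hz * 1000.0;
}

/* Claim 18's decision: returns 1 when the measured time meets the
 * expected standard, 0 when the algorithm must go back for theoretical
 * optimization and be evaluated again. */
static int meets_standard(uint64_t cycles, double core_hz, double budget_ms)
{
    return cycles_to_ms(cycles, core_hz) <= budget_ms;
}
```

For example, 6,000,000 cycles on a 600 MHz core is 10 ms, which would just meet a 10 ms budget, while 12,000,000 cycles would fail it and trigger re-optimization.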
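The segmentation rule of claim 21 can be sketched as a per-core region computation. This is a hedged interpretation, assuming "horizontal segmentation" means bands of whole lines and "vertical segmentation" means column strips; the struct and function names are illustrative, not from the patent.

```c
struct tile { int row0, row1, col0, col1; };  /* half-open ranges */

/* Compute the image region handled by core `id` out of `ncores`,
 * following claim 21: strong inter-line adhesion -> horizontal
 * segmentation (bands of whole lines); weak -> vertical segmentation
 * (strips of whole columns). */
static struct tile split_frame(int rows, int cols, int ncores, int id,
                               int strong_line_adhesion)
{
    struct tile t;
    if (strong_line_adhesion) {          /* horizontal bands */
        t.row0 = rows * id / ncores;
        t.row1 = rows * (id + 1) / ncores;
        t.col0 = 0;
        t.col1 = cols;
    } else {                             /* vertical strips */
        t.col0 = cols * id / ncores;
        t.col1 = cols * (id + 1) / ncores;
        t.row0 = 0;
        t.row1 = rows;
    }
    return t;
}
```

The integer arithmetic distributes any remainder across cores, so the bands cover the frame exactly with no overlap.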
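One of the optimizations listed in claim 26 is preprocessing with inlining at the compilation stage. A minimal sketch: marking a small, hot filter kernel `inline` lets the compiler substitute its body at each call site instead of paying a function call per pixel. The 3-tap box filter here is an illustrative stand-in, not the patent's filtering operator.

```c
#include <stdint.h>

/* Tiny hot kernel; `static inline` invites the compiler to expand it
 * at the call site during compilation. */
static inline uint8_t box3(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)(((unsigned)a + b + c) / 3u);
}

/* Apply the filter across one row; border pixels are copied through
 * unchanged.  The box3 call in the loop is what gets inlined. */
static void smooth_row(const uint8_t *in, uint8_t *out, int n)
{
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (int i = 1; i < n - 1; ++i)
        out[i] = box3(in[i - 1], in[i], in[i + 1]);
}
```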
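The four-stage flow named in claim 28 (smoothing filtering, normalization, feature point filtering, Node point filtering) can be sketched as a table of stage functions run in order over a shared buffer. This is a structural sketch only: three of the four stage bodies are placeholders, since the real operators belong to the patent; only the normalization stage is given a concrete (assumed) max-normalization body.

```c
typedef void (*stage_fn)(float *buf, int n);

/* Placeholder stages; the patent's actual operators go here. */
static void smoothing(float *buf, int n)      { (void)buf; (void)n; }
static void feature_filter(float *buf, int n) { (void)buf; (void)n; }
static void node_filter(float *buf, int n)    { (void)buf; (void)n; }

/* Illustrative normalization: scale the buffer by its maximum value. */
static void normalize(float *buf, int n)
{
    float max = 0.0f;
    for (int i = 0; i < n; ++i)
        if (buf[i] > max) max = buf[i];
    if (max > 0.0f)
        for (int i = 0; i < n; ++i) buf[i] /= max;
}

/* Run the claim-28 stages in their stated order. */
static void run_pipeline(float *buf, int n)
{
    static const stage_fn stages[] = { smoothing, normalize,
                                       feature_filter, node_filter };
    for (unsigned s = 0; s < sizeof stages / sizeof stages[0]; ++s)
        stages[s](buf, n);
}
```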
CN201611256319.6A 2016-12-30 2016-12-30 Embedded platform-based algorithm transplanting system and algorithm transplanting method thereof Active CN108614703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611256319.6A CN108614703B (en) 2016-12-30 2016-12-30 Embedded platform-based algorithm transplanting system and algorithm transplanting method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611256319.6A CN108614703B (en) 2016-12-30 2016-12-30 Embedded platform-based algorithm transplanting system and algorithm transplanting method thereof

Publications (2)

Publication Number Publication Date
CN108614703A CN108614703A (en) 2018-10-02
CN108614703B true CN108614703B (en) 2022-04-19

Family

ID=63658332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611256319.6A Active CN108614703B (en) 2016-12-30 2016-12-30 Embedded platform-based algorithm transplanting system and algorithm transplanting method thereof

Country Status (1)

Country Link
CN (1) CN108614703B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674935B (en) * 2019-09-24 2022-12-20 中国航空工业集团公司沈阳飞机设计研究所 Method for transplanting intelligent algorithm to airborne embedded platform and intelligent computing platform
CN112162797B (en) * 2020-10-14 2022-01-25 珠海格力电器股份有限公司 Data processing method, system, storage medium and electronic device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101118499A (en) * 2006-08-04 2008-02-06 深圳市研祥智能科技股份有限公司 System for software transplantation between isomerization hardware systems
CN103631632A (en) * 2013-11-29 2014-03-12 华为技术有限公司 Transplantation method and source to source compiler

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9465608B2 (en) * 2013-03-15 2016-10-11 ArtinSoft Corporation Code separation with semantic guarantees


Non-Patent Citations (1)

Title
Design of the DSP processing unit of a millimeter-wave imaging system and algorithm porting and implementation; Zhang Hai; China Masters' Theses Full-text Database, Information Science and Technology; 2016-03-15; pp. I138-5917 *

Also Published As

Publication number Publication date
CN108614703A (en) 2018-10-02

Similar Documents

Publication Publication Date Title
KR102606825B1 (en) Neural network system reshaping neural network model, Application processor having the same and Operating method of neural network system
KR20200022384A (en) Methods, systems, and apparatus for improving convolutional efficiency
CN108268869A (en) Object detection method, apparatus and system
JP2010527194A (en) Dynamic motion vector analysis method
CN106095583A Master-slave core cooperative computing and programming framework based on the new Sunway processor
CN108614703B (en) Embedded platform-based algorithm transplanting system and algorithm transplanting method thereof
CN111222561B (en) Image recognition neural network processing method, device and system
CN108776833B (en) Data processing method, system and computer readable storage medium
CN111968218A (en) Three-dimensional reconstruction algorithm parallelization method based on GPU cluster
CN108304925B (en) Pooling computing device and method
US11630983B2 (en) Graph conversion method
CN111191777B (en) Neural network processor and control method thereof
WO2024060810A1 (en) Heterogeneous computing device-oriented deep learning image classification
CN113127203B (en) Deep learning distributed compiler for cloud edge computing and construction method
CN106558083A Acceleration method, apparatus and system for the intra-frame prediction stage of the WebP compression algorithm
CN108961316A (en) Image processing method, device and server
CN108875914A (en) The method and apparatus that Neural Network Data is pre-processed and is post-processed
JP6394341B2 (en) Calculation device, calculation method, and calculation program
CN109144720A (en) A kind of multi-core processor task schedule selection method based on shared resource sensitivity
CN106095577B (en) The optimized treatment method and device of shared drive based on multichannel process device
KR102042495B1 (en) Dataflow optimization apparatus and method for low-power operation of multicore systems
CN109064382B (en) Image information processing method and server
CN106126342A (en) Multiprogram operation method and operating system towards intelligent robot
CN110648356A (en) Multi-target tracking processing optimization method based on visual digital signal processing
CN113792079B (en) Data query method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant