WO2021171897A1

WO2021171897A1 - Program parallelization method, program parallelization device, and electronic control device

Info

Publication number: WO2021171897A1
Application number: PCT/JP2021/003111
Authority: WO
Inventors: 泰輔植田; 茂規早瀬; 一芹沢
Original assignee: 日立Astemo株式会社 (Hitachi Astemo, Ltd.)
Priority date: 2020-02-26
Filing date: 2021-01-28
Publication date: 2021-09-02
Also published as: JP7323478B2; JP2021135664A

Abstract

This program parallelization method for generating a multi-core processor program from a single-core processor program includes: a pre-conversion step of executing pre-conversion for program parallelization processing of the single-core processor program; a parallelization processing step of subjecting the pre-converted single-core processor program to parallelization processing; and a reverse conversion step of executing reverse conversion of the pre-conversion in a region of the single-core processor program that was subjected to the parallelization processing but was not parallelized.

Description

Program parallelization method, program parallelization device, electronic control device

The present invention relates to a program parallelization method, a program parallelization device, and an electronic control device.

Technology development aimed at automated driving of vehicles is underway. Automated driving requires recognizing the surroundings and controlling the vehicle on behalf of the driver, which demands an enormous amount of information processing. To cope with this growing processing load, the use of multi-core processors is being studied, and because implementation becomes more complex as the number of cores increases, appropriate handling is required. In this context, there are expectations for program parallelization, which automatically creates parallel programs for multi-core processors from sequential programs for single-core processors, and the use of automatic parallelization tools for this purpose is also being considered.

Patent Document 1 discloses a parallel compilation method for generating, from a sequential program written to be processed by a single-core processor, a parallel program parallelized so that it can be processed by a multi-core processor. The method comprises: a classification procedure that classifies the processing groups constituting the sequential program into sequential processing, which runs on a single core of the multi-core processor, and parallel processing, which runs in parallel on a plurality of cores of the multi-core processor; an allocation procedure that executes a non-uniform allocation process, allocating the processes classified as parallel processing non-uniformly to the plurality of cores; and a generation procedure that generates the parallel program based on the classification result of the classification procedure and the allocation result of the allocation procedure.

Patent Document 1: Japanese Patent Application Laid-Open No. 2016-218503

The technology described in Patent Document 1 leaves room for improvement in the performance of the generated program.

The program parallelization method according to the first aspect of the present invention is a method, executed by a computer, for generating a multi-core processor program from a single-core processor program. It includes a pre-conversion step of executing pre-conversion of the single-core processor program for program parallelization processing, a parallelization processing step of subjecting the pre-converted single-core processor program to parallelization processing, and an inverse conversion step of executing the inverse of the pre-conversion in a region of the single-core processor program that was subjected to the parallelization processing but was not parallelized.
The program parallelizing device according to the second aspect of the present invention executes the above-mentioned program parallelizing method.
The electronic control device according to the third aspect of the present invention includes a storage unit that stores the multi-core processor program created by the above program parallelization method, and a multi-core processor that executes the multi-core processor program stored in the storage unit.

According to the present invention, the performance of the program can be improved.

FIG. 1: Overall configuration diagram of the program parallelizing device and the in-vehicle system
FIG. 2: Hardware configuration diagram of the program parallelizing device and the autonomous driving control device
FIG. 3: Functional configuration diagram of the autonomous driving control device
FIG. 4: Functional configuration diagram of the program parallelizing device
FIG. 5: Flowchart showing the processing of the program parallelizing device
FIG. 6: Diagram showing the relationship between each process and the programs
FIG. 7: Diagram showing an overview of program changes
FIG. 8: Sequence diagram showing information transmission from the program parallelizing device to the autonomous driving control device

-First Embodiment-
Hereinafter, the first embodiment of the program parallelizing device will be described with reference to FIGS. 1 to 8. In the present embodiment, a "program" describes processing that a computer can interpret and execute. This embodiment assumes a compiled programming language, but it can also be applied to an interpreted programming language. Further, in the present embodiment, "program" is used in the same sense as "source code", which humans can read and write, but "program" may also refer to "binary code", which is difficult for humans to understand directly but easy for computers to process.

<System configuration>
FIG. 1 is an overall configuration diagram of an in-vehicle system 1 including a program parallelizing device 111 and an autonomous traveling control device 2 using a program output by the program parallelizing device 111. The configuration and operation of the program parallelizing device 111 will be described in detail later.

The in-vehicle system 1 is mounted on the vehicle 100 and includes a camera information acquisition unit 101 that acquires the external conditions around the vehicle 100 using a camera, a radar information acquisition unit 102 that acquires them using a radar, a laser information acquisition unit 103 that acquires them using a laser, and a vehicle position information acquisition unit 104 that detects the position of the vehicle 100 using a satellite navigation system, for example a GPS (Global Positioning System) receiver. The in-vehicle system 1 further includes an automatic driving setting unit 105 for configuring automatic driving of the vehicle 100, and a wireless communication unit 106 for updating the information of the in-vehicle system 1 by OTA (Over-The-Air) updates. The wireless communication unit 106 is connected to the program parallelizing device 111 via, for example, a wireless network. The program parallelizing device 111 executes the program parallelization processing described later.

The in-vehicle system 1 further includes the autonomous driving control device 2, which is an electronic control device, an auxiliary control unit 107, a brake control unit 108, an engine control unit 109, and a power steering control unit 110. Each of the autonomous driving control device 2, the auxiliary control unit 107, the brake control unit 108, the engine control unit 109, and the power steering control unit 110 is, for example, an ECU (Electronic Control Unit).

The camera information acquisition unit 101, the radar information acquisition unit 102, the laser information acquisition unit 103, the vehicle position information acquisition unit 104, the automatic driving setting unit 105, the wireless communication unit 106, the autonomous driving control device 2, the auxiliary control unit 107, the brake control unit 108, the engine control unit 109, and the power steering control unit 110 are communicably connected to each other by an in-vehicle network such as CAN (Controller Area Network) or Ethernet (registered trademark).

The camera information acquisition unit 101, the radar information acquisition unit 102, the laser information acquisition unit 103, and the vehicle position information acquisition unit 104 each transmit the information received from their sensors or the like to the autonomous driving control device 2. The automatic driving setting unit 105 transmits setting information for automatic driving, such as the destination, route, and traveling speed, to the autonomous driving control device 2.

The autonomous driving control device 2 performs processing for automatic driving control and, based on the processing result, outputs control commands to the brake control unit 108, the engine control unit 109, and the power steering control unit 110. The auxiliary control unit 107 performs, as a backup, the same control as the autonomous driving control device 2. The brake control unit 108 controls the braking force of the vehicle 100. The engine control unit 109 controls the driving force of the vehicle 100. The power steering control unit 110 controls the steering of the vehicle 100.

When the autonomous driving control device 2 receives an automatic driving setting request from the automatic driving setting unit 105, it calculates the trajectory along which the vehicle 100 moves based on information about the outside world from the camera information acquisition unit 101, the radar information acquisition unit 102, the laser information acquisition unit 103, the vehicle position information acquisition unit 104, and the like. The autonomous driving control device 2 then outputs control commands for braking force, driving force, steering, and so on to the brake control unit 108, the engine control unit 109, and the power steering control unit 110 so that the vehicle 100 follows the calculated trajectory. On receiving these control commands from the autonomous driving control device 2, the brake control unit 108, the engine control unit 109, and the power steering control unit 110 each output operation signals to actuators (not shown).

<Hardware configuration>
FIG. 2 is a hardware configuration diagram of the program parallelizing device 111 and the autonomous driving control device 2. Because both share a common hardware configuration, the program parallelizing device 111 is described here as representative. The program parallelizing device 111 includes a CPU 251, a ROM 252, a RAM 253, a flash memory 254, and a communication interface 256. The flash memory 254 is a non-volatile storage area. The CPU 251 realizes the functions described later by loading a program stored in at least one of the ROM 252 and the flash memory 254 into the RAM 253 and executing it.

The CPU 251, the ROM 252, the RAM 253, the flash memory 254, and the communication interface 256 constituting the program parallelizing device 111 may each be configured as a plurality of devices. Conversely, the CPU 251, the ROM 252, the RAM 253, and the flash memory 254 may be integrated into a single device, such as a SoC (System on Chip).

The program stored in the flash memory 254 of the autonomous travel control device 2 may be a program received from the program parallelizing device 111. The communication interface 256 of the autonomous driving control device 2 is an interface for communicating with a predetermined protocol such as CAN. The communication interface 256 of the program parallelizing device 111 is, for example, a wireless communication module for communicating with the wireless communication unit 106 of the vehicle 100. The autonomous travel control device 2 may be composed of one ECU (Electronic Control Unit) or may be composed of a plurality of ECUs.

<Functional configuration of autonomous driving control device>
FIG. 3 is a functional configuration diagram of the autonomous driving control device 2. The autonomous driving control device 2 has a first communication interface 201-1, a second communication interface 201-2, a multi-core processor 202 including a first core 203-1 to an Nth core 203-N (N is any natural number of two or more), and a storage unit 204. Hereinafter, the first communication interface 201-1 and the second communication interface 201-2 are collectively referred to as the "communication interface 201". Likewise, the first core 203-1 to the Nth core 203-N are collectively referred to as the "cores 203".

The communication interface 201 is realized by the communication interface 256 of FIG. 2. The multi-core processor 202 is realized by the CPU 251, and the cores 203 are the multiple cores included in the CPU 251. The multi-core processor 202 may be realized by a SoC. The storage unit 204 may be provided within the CPU 251, or may be realized by the ROM 252, the RAM 253, or the flash memory 254. The storage unit 204 may also be regarded as a general term for the storage areas required when the multi-core processor 202 loads a program stored in the ROM 252 into the RAM 253 or the like and executes it.

Via the first communication interface 201-1, the autonomous driving control device 2 is connected to the camera information acquisition unit 101, the radar information acquisition unit 102, the laser information acquisition unit 103, the vehicle position information acquisition unit 104, the automatic driving setting unit 105, and the wireless communication unit 106 of FIG. 1. Via the second communication interface 201-2, it is connected to the auxiliary control unit 107, the brake control unit 108, the engine control unit 109, and the power steering control unit 110.

The autonomous driving control device 2 executes processing for automatic driving control on the multi-core processor 202. Through the first communication interface 201-1, the multi-core processor 202 receives sensor information from the camera information acquisition unit 101, the radar information acquisition unit 102, the laser information acquisition unit 103, and the vehicle position information acquisition unit 104, and automatic driving setting information from the automatic driving setting unit 105. It uses the acquired information to execute surrounding-recognition processing and trajectory calculation processing and, based on the calculation results, outputs control commands such as braking force and driving force from the second communication interface 201-2.

The program that the autonomous driving control device 2 executes for this processing is created by the program parallelizing device 111. The autonomous driving control device 2 may acquire the program via the wireless communication unit 106 and store it in the storage unit 204.

<Functional configuration of program parallelizer>
FIG. 4 is a functional configuration diagram of the program parallelizing device 111. The program parallelization device 111 includes a determination unit 31, a pre-processing unit 32, a parallelization unit 33, a post-processing unit 34, an integration unit 35, a compiler 39, and a device storage unit 37. The determination unit 31, the pre-processing unit 32, the parallelization unit 33, the post-processing unit 34, the integration unit 35, and the compiler 39 are realized by the CPU 251 executing a program stored in the ROM 252.

The device storage unit 37 stores the original program 51, the target program 52, the pre-processed program 53, the converted program 54, the inverse conversion addition program 55, the non-target program 56, and the integrated program 57. The device storage unit 37 is a concept including the RAM 253 and the flash memory 254 shown in FIG. 2; it may be realized by either of them or by a combination of both. However, the programs denoted by reference numerals 51 to 57 are listed only for the sake of explanation, and they need not all exist in the device storage unit 37 at the same time.

The original program 51 is a program code created in advance, and is created by, for example, a programmer or an automatic source code generation tool. The original program 51 does not include an explicit command regarding parallelization as described later. That is, since the original program 51 is not specialized in processing by the multi-core processor, the original program 51 can also be called a “single-core processor program”.

The target program 52 is the part of the original program 51 that the determination unit 31 determines to be a target of parallel processing. The non-target program 56 is the part of the original program 51 that the determination unit 31 determines not to be a target of parallel processing. Together, the target program 52 and the non-target program 56 make up the original program 51.

The pre-processed program 53 is the target program 52 after pre-processing by the pre-processing unit 32. The converted program 54 is the pre-processed program 53 after parallelization by the parallelization unit 33. The inverse conversion addition program 55 is the program output by the post-processing unit 34: the converted program 54 in which the portions that the parallelization unit 33 did not parallelize have been inversely converted. Details will be described later.

The integrated program 57 is a program that combines the inverse transformation addition program 55 and the non-target program 56. The non-target program 56 is a program obtained by removing the target program 52 from the original program 51, and the target program 52 is converted into an inverse transformation addition program 55 suitable for parallel processing by various processes. Therefore, the integrated program 57 can be said to be the original program 51 optimized for parallel processing.

The determination unit 31 takes the original program 51 as its processing target and outputs the target program 52 and the non-target program 56. It reads the original program 51, outputs the portions determined to be targets of parallel processing as the target program 52, and outputs the portions determined not to be targets as the non-target program 56. The determination unit 31 treats as a target of parallel processing any part of the original program 51 that satisfies both improvement possibility and processability. Improvement possibility means that parallelization is highly likely to improve the processing speed. Processability means that the parallelization unit 33 can parallelize the part.

Processability covers both the case where the parallelization unit 33 can parallelize a part as it is, without any special handling, and the case where it can do so after the pre-processing unit 32 performs description conversion. Conversely, processability is denied when the parallelization unit 33 cannot parallelize the part as it is and the pre-processing unit 32 cannot convert its description into a processable form.

The determination unit 31 profiles the original program 51 and judges as targets of parallel processing those functions that are highly likely to be sped up by parallelization and whose descriptions can be converted into a form the parallelization unit 33 can process. The determination unit 31 is realized using, for example, a profiler. It may also determine the parallelization targets by taking into account the measured execution time of each function and the dependencies between functions.
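The two criteria above can be sketched as a simple predicate over per-function profiling data. This is a minimal illustration, not the patented implementation: the `FuncProfile` record, its field names, and the 5% run-time threshold `MIN_TIME_RATIO` are all hypothetical assumptions, and a real profiler would supply far richer data (including inter-function dependencies).

```c
#include <stdbool.h>

/* Hypothetical per-function profile record; all field names are illustrative. */
typedef struct {
    const char *name;         /* function name */
    double exec_time_ratio;   /* assumed: share of total run time from the profiler (0.0-1.0) */
    bool has_parallelism;     /* assumed: analysis found independent work in the function */
    bool tool_accepts_as_is;  /* parallelization unit 33 can process it unchanged */
    bool convertible;         /* pre-processing unit 32 can rewrite it into an accepted form */
} FuncProfile;

/* Assumed threshold: functions below this share of run time are not worth parallelizing. */
#define MIN_TIME_RATIO 0.05

/* "Improvement possibility": parallelization is likely to improve speed. */
static bool improvement_possible(const FuncProfile *f) {
    return f->has_parallelism && f->exec_time_ratio >= MIN_TIME_RATIO;
}

/* "Processability": the tool accepts the code as-is or after description conversion. */
static bool processable(const FuncProfile *f) {
    return f->tool_accepts_as_is || f->convertible;
}

/* A function goes into the target program 52 only if both criteria hold;
 * otherwise it stays in the non-target program 56. */
bool is_parallel_target(const FuncProfile *f) {
    return improvement_possible(f) && processable(f);
}
```

For example, a function taking 40% of run time that the tool rejects but that can be converted is a target, while a hot function that can neither be processed nor converted stays in the non-target program.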

The pre-processing unit 32 takes the target program 52 as its processing target and outputs the pre-processed program 53. It rewrites the target program 52 so that it satisfies the known processing restrictions of the parallelization unit 33. This rewriting is also called "description conversion processing". Processing restrictions of the parallelization unit 33 include, for example, restrictions on variable types and on function calls. For example, if the parallelization unit 33 cannot handle the "double" type, a double-precision floating-point type, the pre-processing unit 32 rewrites such variables to the single-precision floating-point "float" type. Similarly, if the parallelization unit 33 cannot handle recursive function calls, the pre-processing unit 32 rewrites recursive functions into non-recursive ones.
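As an illustration of the recursive-to-non-recursive rewrite, the sketch below shows a recursive function and an equivalent loop. The function names are hypothetical, and the text does not say how the pre-processing unit 32 removes recursion; turning a simple linear recursion into an accumulating loop is just one common technique, assumed here for illustration.

```c
/* Before pre-processing: a recursive function, assumed to be rejected
 * by the parallelization tool. Computes 1 + 2 + ... + n. */
long sum_recursive(long n) {
    return n <= 0 ? 0 : n + sum_recursive(n - 1);
}

/* After pre-processing: the same computation expressed as a loop,
 * with no recursive call left for the tool to analyze. */
long sum_iterative(long n) {
    long total = 0;
    for (long i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Both versions must return the same result for every input; otherwise the description conversion would change the program's behavior.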

The pre-processing unit 32 records the contents of the pre-processing so that the post-processing unit 34 can perform the inverse conversion. The record may be written as a comment in a specific format in the pre-processed program 53, embedded as a specific character string in the pre-processed program 53, or written to an intermediate processing file (not shown). A record written as a comment in the pre-processed program 53 is, for example, "// # preconv32 # double value1 >> float value1". The leading "# preconv32 #" indicates that the comment was written by the pre-processing unit 32, and the remainder indicates that "double value1" has been rewritten to "float value1".
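A minimal sketch of how a declaration rewrite and its comment record might be produced together. The helper name `preconv_double_to_float` and its buffer-based interface are hypothetical; only the record format "// # preconv32 # double value1 >> float value1" comes from the text, and the sketch handles only a bare "double <name>" declaration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: rewrite "double <name>" to "float <name>" and build
 * the record comment in the format described above.
 * Returns 0 on success, -1 if `decl` does not start with "double ". */
int preconv_double_to_float(const char *decl, char *out_decl, size_t out_len,
                            char *record, size_t rec_len) {
    const char *prefix = "double ";
    size_t plen = strlen(prefix);
    if (strncmp(decl, prefix, plen) != 0) {
        return -1;  /* not a double declaration: nothing to convert */
    }
    const char *name = decl + plen;  /* e.g. "value1" */
    snprintf(out_decl, out_len, "float %s", name);
    /* Keep both the original and the rewritten declaration so that the
     * post-processing unit 34 can undo the change later. */
    snprintf(record, rec_len, "// # preconv32 # %s >> %s", decl, out_decl);
    return 0;
}
```

Calling it on "double value1" yields the declaration "float value1" and the record "// # preconv32 # double value1 >> float value1", which would then be written into the pre-processed program 53 alongside the declaration.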

One way of embedding a specific character string in the pre-processed program 53 is to write "typedef float _preconv32_double_float" and "_preconv32_double_float value1" in the pre-processed program 53. Starting the new type name defined by "typedef" with "_preconv32_" indicates that the change was made by the pre-processing unit 32, and "double_float" indicates that the "double" type has been changed to the "float" type.
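Written out as compilable C, the typedef-based marker could look like the fragment below. The initial value 0.5f is an arbitrary illustration. Because the alias is just `float`, the compiler and the parallelization tool see an ordinary single-precision variable, while the type name itself carries the record.

```c
/* Marker typedef: the "_preconv32_" prefix says the pre-processing unit 32
 * made this change, and "double_float" records the double -> float rewrite. */
typedef float _preconv32_double_float;

/* Originally declared as "double value1" in the target program. */
_preconv32_double_float value1 = 0.5f;
```

One design caveat: the C standard reserves identifiers that begin with an underscore for use at file scope, so a production tool might prefer a marker prefix without the leading underscore.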

The parallelization unit 33 takes the pre-processed program 53 as its processing target and outputs the converted program 54. The parallelization unit 33 is a known parallelization tool that performs source-to-source conversion; that is, it is not a compiler but a program that rewrites source code. It rewrites the pre-processed program 53 to insert explicit parallelization instructions for the compiler 39. At each location it rewrites, a specific parallel processing instruction is inserted after a specific character string such as "#pragma parallel".

The parallelization unit 33 does not delete at least the comments written by the pre-processing unit 32, leaving them unchanged in the converted program 54. If the characteristics of those comments are specified to the parallelization unit 33 in advance, it may automatically keep only the comments written by the pre-processing unit 32 and delete all other comments. Alternatively, the operator may select, through an operation option, an operating mode in which the parallelization unit 33 does not delete comments.

The post-processing unit 34 takes the converted program 54 as its processing target and outputs the inverse conversion addition program 55. It reads the converted program 54 and identifies the parts that were not rewritten by the parallelization unit 33, for example functions that are not immediately preceded by a specific character string such as "#pragma parallel". The post-processing unit 34 then restores those parts to their state before pre-processing by the pre-processing unit 32. In other words, whereas the pre-processing unit 32 converts the program into a form suitable for parallelization, the post-processing unit 34 converts the non-parallelized parts back into their original form.
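Identifying a non-parallelized function can be sketched as a line-based check for the marker immediately above the function header. This is a simplifying assumption: the function `is_parallelized` is hypothetical, blank lines between the marker and the header are skipped, and a real tool would parse the source rather than scan raw lines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: `lines` is the converted program split into lines and
 * `header_idx` is the index of a function's header line. Returns true if the
 * nearest non-blank line above it starts with "#pragma parallel". */
bool is_parallelized(const char *const lines[], size_t header_idx) {
    const char *marker = "#pragma parallel";
    for (size_t i = header_idx; i > 0; --i) {
        const char *line = lines[i - 1];
        while (*line == ' ' || *line == '\t') {
            ++line;  /* skip indentation */
        }
        if (*line == '\0') {
            continue;  /* blank line: keep looking upward */
        }
        return strncmp(line, marker, strlen(marker)) == 0;
    }
    return false;  /* reached the top of the file without finding a marker */
}
```

Functions for which this returns false would be the candidates for inverse conversion.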

The post-processing unit 34 performs the inverse conversion by referring to the record of processing contents that the pre-processing unit 32 leaves as described above. For example, if the converted program 54 contains the comment "// # preconv32 # double value1 >> float value1" and no specific character string such as "#pragma parallel" appears immediately before the function containing the declaration "float value1", the post-processing unit 34 rewrites "float value1" to "double value1".
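The record-driven inverse conversion can be sketched as follows: parse the comment record, then substitute the original declaration back into a line of a function that was not parallelized. The function `inverse_convert` is hypothetical and assumes the record always has the form "<type> <name> >> <type> <name>". Deciding whether the enclosing function was parallelized is left to the caller, and the plain substring match is a simplification (a real tool would match tokens, so that "value1" does not also match "value10").

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical inverse conversion: parse a record such as
 * "// # preconv32 # double value1 >> float value1" and, if `line` contains the
 * converted declaration, write the line with the original declaration restored
 * into `out` (which must not overlap `line`). Returns 1 if a rewrite happened,
 * 0 otherwise; in the 0 case `out` receives an unchanged copy of `line`. */
int inverse_convert(const char *record, const char *line, char *out, size_t out_len) {
    char orig_type[32], orig_name[64], new_type[32], new_name[64];
    if (sscanf(record, "// # preconv32 # %31s %63s >> %31s %63s",
               orig_type, orig_name, new_type, new_name) != 4) {
        snprintf(out, out_len, "%s", line);  /* not a pre-processing record */
        return 0;
    }
    char converted[128], original[128];
    snprintf(converted, sizeof converted, "%s %s", new_type, new_name);
    snprintf(original, sizeof original, "%s %s", orig_type, orig_name);

    const char *hit = strstr(line, converted);  /* simple substring match */
    if (hit == NULL) {
        snprintf(out, out_len, "%s", line);
        return 0;
    }
    /* Text before the match, then the restored declaration, then the rest. */
    snprintf(out, out_len, "%.*s%s%s", (int)(hit - line), line, original,
             hit + strlen(converted));
    return 1;
}
```

For instance, with the record above, the line "    float value1 = 0;" becomes "    double value1 = 0;", while lines without the converted declaration pass through unchanged.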

The integration unit 35 targets the inverse transformation addition program 55 and the non-target program 56, and outputs the integrated program 57. That is, the integration unit 35 creates the integrated program 57 by combining the description contents of the inverse transformation addition program 55 and the description contents of the non-target program 56.

<Operation flowchart>
FIG. 5 is a flowchart showing the processing of the program parallelizing device 111. The program parallelizing device 111 executes the operation shown in this flowchart, for example, when the original program 51 is stored in the device storage unit 37 or when it receives a processing execution command from outside. The determination unit 31 executes steps S401 to S404 described below, the pre-processing unit 32 executes step S405, the parallelization unit 33 executes step S406, the post-processing unit 34 executes step S407, the integration unit 35 executes step S408, and the compiler 39 executes step S409. Hereinafter, the method by which the program parallelizing device 111 creates the integrated program 57 from the original program 51 is referred to as the "program parallelization method".

In step S401, the program parallelizer 111 profiles the original program 51. Specifically, the program parallelizing device 111 extracts a part that can be speeded up by changing from sequential processing to parallel processing.

In step S402, based on the results extracted in step S401, the program parallelizing device 111 evaluates the improvement possibility described above, that is, whether parallel processing may speed up the program. Specifically, if the determination unit 31 determines that at least part of the original program 51 is likely to be sped up by parallel processing, the process proceeds to step S403. If the determination unit 31 determines that no part of the original program 51 can be sped up by parallel processing, the process proceeds to step S409.

In step S403, the determination unit 31 of the program parallelizing device 111 evaluates the processability described above, in other words, whether description conversion can be performed on the parts expected in step S402 to be sped up. The description conversion here is the program conversion required for program parallelization: it converts the description into one that the parallelization unit 33 can analyze. It generally involves adjusting the data types and the bit sizes to be processed. Specific conversion examples include converting double precision to single precision and, depending on the data model adopted by the OS (Operating System), converting a long bit-size type such as the long long type into a shorter bit-size type such as the int type.
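The type adjustments mentioned above can be pictured as a small lookup table consulted during description conversion. The table contents mirror the two examples in the text; the names `TypeRule`, `kRules`, and `narrow_type` are hypothetical. Whether a narrowing such as long long to int is safe depends on the values the program actually uses and on the target data model, which this sketch does not check.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical conversion rule: a type the parallelization tool is assumed
 * to reject, mapped to a narrower type it accepts. */
typedef struct {
    const char *from;
    const char *to;
} TypeRule;

static const TypeRule kRules[] = {
    {"double", "float"},   /* double precision -> single precision */
    {"long long", "int"},  /* long bit-size type -> shorter bit-size type */
};

/* Returns the replacement type name, or NULL when no rule applies
 * (the type is then left unchanged). */
const char *narrow_type(const char *type_name) {
    for (size_t i = 0; i < sizeof kRules / sizeof kRules[0]; ++i) {
        if (strcmp(type_name, kRules[i].from) == 0) {
            return kRules[i].to;
        }
    }
    return NULL;
}
```

Keeping the rules in a table makes it straightforward to record, for each rewrite, which rule was applied, which is what the later inverse conversion relies on.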

In step S403, if the determination unit 31 determines that the parts expected to be sped up can be description-converted, the process proceeds to step S404. If it determines that none of the parts expected to be sped up can be description-converted, that is, that the parallelization unit 33 cannot analyze them, the process proceeds to step S409. In step S404, the determination unit 31 of the program parallelizing device 111 extracts the parts that are expected to be sped up by parallel processing and are capable of description conversion. Specifically, the determination unit 31 saves the portions on which program parallelization for the multi-core processor is to be performed as the target program 52, and saves the portions to be used as they are for the single-core processor as the non-target program 56.

In step S405, the pre-processing unit 32 of the program parallelizing device 111 executes description conversion on the extracted program. Specifically, it applies description conversion to the target program 52 saved in step S404 and outputs the pre-processed program 53. In step S406, the parallelization unit 33 of the program parallelizing device 111 parallelizes the pre-processed program 53 output in step S405 and outputs the converted program 54. In other words, in step S406, the converted program 54, which is a parallel program for a multi-core processor, is created from the pre-processed program 53, which is a sequential program for a single-core processor.

In step S407, the post-processing unit 34 of the program parallelizing device 111 performs description inverse conversion on the parts that the parallelization unit 33 did not parallelize in step S406. In those parts, the description conversion executed by the pre-processing unit 32 in step S405 is reversed so that they return to the original program description input at the start of processing.

In step S408, the integration unit 35 of the program parallelizing device 111 generates the integrated program 57 by combining the inverse conversion addition program 55, which underwent description inverse conversion in step S407, with the non-target program 56 separated in step S404, which did not undergo description conversion. In the following step S409, the compiler 39 compiles the integrated program 57 to generate the binary code 59, and the processing shown in FIG. 5 ends. However, when the process reaches step S409 directly from step S402 or step S403, the compiler 39 compiles the original program 51 instead of the integrated program 57. This concludes the description of FIG. 5.
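The branching of the flowchart can be summarized as a small decision function that reports which program the compiler 39 receives in step S409. The `Assessment` struct and `CompileInput` enum are hypothetical names introduced only for this sketch; steps S404 to S408 are represented by a comment rather than implemented.

```c
#include <stdbool.h>

/* Which program the compiler 39 compiles in step S409. */
typedef enum {
    COMPILE_ORIGINAL,   /* original program 51 (reached from S402 or S403) */
    COMPILE_INTEGRATED  /* integrated program 57 (after S404 to S408) */
} CompileInput;

/* Hypothetical summary of the determinations made in S402 and S403. */
typedef struct {
    bool any_speedup_candidate;  /* S402: some part may be sped up by parallel processing */
    bool any_convertible;        /* S403: some candidate can be description-converted */
} Assessment;

CompileInput choose_compile_input(Assessment a) {
    if (!a.any_speedup_candidate) {
        return COMPILE_ORIGINAL;  /* S402 "no" -> S409 */
    }
    if (!a.any_convertible) {
        return COMPILE_ORIGINAL;  /* S403 "no" -> S409 */
    }
    /* S404 extract, S405 pre-convert, S406 parallelize,
     * S407 inverse-convert non-parallelized parts, S408 integrate. */
    return COMPILE_INTEGRATED;
}
```

Only when both determinations succeed does the integrated program 57 reach the compiler; in every other case the original program 51 is compiled unchanged.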

<Relationship between each process and program>
FIG. 6 is a diagram showing the relationship between the programs and each process leading up to the creation of the integrated program 57 in the program parallelizing device 111. It is described below together with the flowchart shown in FIG. 5. The original program 51 is the sequential program for a single-core processor input to the program parallelizing device 111. The label "C" indicates that the program is written in, for example, the C language.

The original program 51 is divided into the target program 52 and the non-target program 56 in the parallel extraction step S501. The parallel extraction step S501 corresponds to steps S401 to S404 shown in FIG. 5. The target program 52 is the program extracted in step S404 as the portion on which program parallelization for the multi-core processor is performed. The non-target program 56 is the program not extracted in step S404, that is, the original program 51 minus the target program 52. If it is determined in step S402 or step S403 that the process proceeds to step S409, the target program 52 may be regarded as empty in the parallel extraction step S501.

Next, the target program 52 is converted into the pre-processed program 53 in the description conversion step S502. The description conversion step S502 corresponds to step S405 shown in FIG. 5. Next, the pre-processed program 53 is converted into the converted program 54 in the parallel processing step S503. The parallel processing step S503 corresponds to step S406 shown in FIG. 5.

Next, the converted program 54 is converted into the inverse conversion addition program 55 in the description inverse conversion step S504, which corresponds to step S407 shown in FIG. 5. Finally, the inverse conversion addition program 55 and the non-target program 56 are combined into the integrated program 57 in the joining step S505, which corresponds to step S408 shown in FIG. 5.
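The data flow of FIG. 6 described above can be captured in a small lookup table. This is only a descriptive sketch: the `PipelineStage` type and `find_stage` function are illustrative names, and the strings merely record the step numbers and reference numerals used in the text.

```c
#include <stddef.h>
#include <string.h>

/* One row of FIG. 6: a step, the matching FIG. 5 step(s), and the programs
 * it consumes and produces (reference numerals as in the text). */
typedef struct {
    const char *step;
    const char *flow_steps;
    const char *inputs;
    const char *outputs;
} PipelineStage;

static const PipelineStage kStages[] = {
    {"S501 parallel extraction",             "S401-S404", "51",     "52, 56"},
    {"S502 description conversion",          "S405",      "52",     "53"},
    {"S503 parallelization processing",      "S406",      "53",     "54"},
    {"S504 description inverse conversion",  "S407",      "54",     "55"},
    {"S505 joining",                         "S408",      "55, 56", "57"},
};

/* Returns the first stage whose label starts with `id` (e.g. "S503"),
 * or NULL if there is none. */
const PipelineStage *find_stage(const char *id)
{
    size_t n = sizeof kStages / sizeof kStages[0];
    for (size_t i = 0; i < n; i++)
        if (strncmp(kStages[i].step, id, strlen(id)) == 0)
            return &kStages[i];
    return NULL;
}
```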

<Outline of program changes>
FIG. 7 is a diagram outlining how the program changes inside the program parallelizing device 111. In FIG. 7, the names of the source code making up each program are written inside it, and the horizontal length of each block represents the processing time assuming the source code is compiled and executed as is. In the following description of FIG. 7, "the processing time when source code X is assumed to be compiled and executed as is" is abbreviated to "the processing time of source code X". FIG. 7 illustrates an example using a multi-core processor 202 with N = 3, that is, with three cores 203.

The original program 51 shown in FIG. 7 is a sequential program that executes the processes described in source codes A to C one after another, that is, processes A to C in sequence. In the parallel extraction step S501, source codes A and B were judged to be suitable for speed-up and amenable to description conversion and became the target program 52, while the remaining source code C became the non-target program 56. Because source codes A and B in the target program 52 are unchanged from the original program 51, their processing times, shown as horizontal lengths in the figure, do not change.

Through the description conversion step S502 shown in FIG. 6, the pre-processed program 53 is configured as a program that executes the processes described in source codes A1 and B1 in sequence. Source code A1 is source code A after a description conversion has been applied so that the parallelization unit 33 can process it; likewise, source code B1 is source code B after description conversion. In FIG. 7, hatching marks source code to which the description conversion has been applied, such as source codes A1 and B1.

Here, the pre-processed program 53 has a longer processing time than the target program 52, for the following reason. When the description conversion adjusts variable types and the bit sizes being processed so that they fall within the range the parallelization unit 33 can analyze, workarounds are needed so that the same processing can still be executed under those restrictions. A binary obtained by compiling such description-converted source code as is therefore generally suffers disadvantages such as a longer processing time.
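As a hypothetical illustration of such a workaround (the text does not name the ones the pre-processing unit 32 applies): if the conversion narrows a double-precision accumulation to single precision, one common way to limit the accuracy loss is compensated (Kahan) summation, which is a swapped-in example here. It performs four floating-point operations per element instead of one, which is the kind of overhead that makes source code A1 slower than source code A. The function names are illustrative.

```c
#include <stddef.h>

/* Source code A (assumed form): plain double-precision sum. */
double sum_double(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Source code A1 (hypothetical pre-converted form): values narrowed to
 * float, with Kahan compensation added so that the single-precision result
 * stays closer to the original. Must be compiled without value-unsafe
 * optimizations such as -ffast-math, which may remove the compensation. */
float sum_float_kahan(const float *x, size_t n)
{
    float s = 0.0f;
    float c = 0.0f; /* running compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        float y = x[i] - c;
        float t = s + y;
        c = (t - s) - y;
        s = t;
    }
    return s;
}
```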

In the converted program 54, the processing of source code A1 is split three ways by the parallelization processing step S503 shown in FIG. 6: processes A2-1, A2-2, and A2-3 are executed in parallel, and the sequential processing of source code B2 is executed afterwards. That is, in the example shown in FIG. 7, the parallelization unit 33 parallelized source code A1 but did not parallelize source code B1. Because the processing of source code A1 is split into processes A2-1 to A2-3 in the converted program 54, its execution time, shown as the horizontal length in the figure, is shorter not only than that of source code A1 but also than that of source code A. This is because, even with the overhead of description conversion and parallelization, the gain from parallelization is large enough to shorten the processing time.
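A hypothetical C rendering of this three-way split, assuming POSIX threads and a simple summation loop standing in for source code A1. The text does not show the code the parallelization unit 33 actually generates, and an in-vehicle target may use an RTOS rather than pthreads. Each thread plays the role of one of processes A2-1 to A2-3, and `pthread_join` is the synchronization point after which the sequential part (B2) may run.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_CORES 3 /* N = 3, as in the FIG. 7 example */

/* Work description for one parallel piece (A2-1, A2-2 or A2-3). */
typedef struct {
    const float *x;
    size_t begin, end; /* half-open index range [begin, end) */
    float partial;     /* partial sum produced by this piece */
} Chunk;

static void *run_chunk(void *arg)
{
    Chunk *c = arg;
    float s = 0.0f;
    for (size_t i = c->begin; i < c->end; i++)
        s += c->x[i];
    c->partial = s;
    return NULL;
}

/* Splits the index range into NUM_CORES contiguous chunks, runs each on its
 * own thread and combines the partial sums after joining. Note that the
 * changed summation order can alter floating-point rounding in general.
 * Returns 0 on success, -1 if a thread could not be created. */
int parallel_sum(const float *x, size_t n, float *result)
{
    pthread_t th[NUM_CORES];
    Chunk ch[NUM_CORES];
    size_t created = 0;
    int rc = 0;
    for (size_t k = 0; k < NUM_CORES; k++) {
        ch[k].x = x;
        ch[k].begin = n * k / NUM_CORES;
        ch[k].end = n * (k + 1) / NUM_CORES;
        ch[k].partial = 0.0f;
        if (pthread_create(&th[k], NULL, run_chunk, &ch[k]) != 0) {
            rc = -1;
            break;
        }
        created++;
    }
    float s = 0.0f;
    for (size_t k = 0; k < created; k++) {
        pthread_join(th[k], NULL); /* wait: dependency on each piece */
        s += ch[k].partial;
    }
    *result = s;
    return rc;
}
```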

Source code B1, which the parallelization unit 33 did not parallelize, is nevertheless rewritten into source code B2 by the parallelization unit 33. This reflects the parallelization unit 33 rewriting the description so that it is easier to compile. The processing times of source code B1 and source code B2 are substantially the same.

Through the description inverse conversion step S504 shown in FIG. 6, the inverse conversion addition program 55 is configured as a program that first executes the three-way parallel program (processes A2-1 to A2-3) and then the sequential program of process B.
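The horizontal lengths in FIG. 7 follow a simple model: sequential parts add up, and a part split across cores lasts as long as its slowest piece. A minimal sketch of that model, assuming hypothetical piece times (the text gives no numbers) and ignoring synchronization overhead:

```c
#include <stddef.h>

/* Time of a region split into parallel pieces: the slowest piece, since the
 * following sequential part can start only after all pieces have finished. */
double parallel_region_time(const double *piece_times, size_t n)
{
    double t = 0.0;
    for (size_t i = 0; i < n; i++)
        if (piece_times[i] > t)
            t = piece_times[i];
    return t;
}

/* Model of program 55: parallel region (A2-1..A2-3) followed by sequential B. */
double program_time(const double *piece_times, size_t n, double sequential_time)
{
    return parallel_region_time(piece_times, n) + sequential_time;
}
```

For illustrative piece times of 3.0, 4.0 and 3.5 and a sequential part of 5.0, the model gives 4.0 + 5.0 = 9.0 time units.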

Here, the processing time of the inverse conversion addition program 55 is shorter than that of the converted program 54. Applying the description inverse conversion to source code B2, which the parallelization unit 33 did not parallelize, restores the adjusted variable types and processed bit sizes to their state before the description conversion, so that the original processing is performed again and the processing time is shortened accordingly. The integrated program 57 is obtained by combining the inverse conversion addition program 55 with the non-target program 56 in the joining step S505 shown in FIG. 6.
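A hypothetical, token-level sketch of the description inverse conversion applied to one non-parallelized region such as source code B2: every whole-word occurrence of a narrowed type name is replaced with the original wide one. The function name `widen_types` is illustrative; a real post-processing unit 34 would presumably work on a syntax tree and would also have to handle literal suffixes (such as `0.5f`) and single-precision library calls, which this sketch leaves untouched.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool is_ident_char(char ch)
{
    return isalnum((unsigned char)ch) || ch == '_';
}

/* Copies `src` to `out`, replacing whole-word `narrow` (e.g. "float") with
 * `wide` (e.g. "double"). Returns false if `out` is too small or `narrow`
 * is empty. */
bool widen_types(const char *src, const char *narrow, const char *wide,
                 char *out, size_t out_size)
{
    size_t nlen = strlen(narrow), wlen = strlen(wide), o = 0;
    if (out_size == 0 || nlen == 0)
        return false;
    for (size_t i = 0; src[i] != '\0';) {
        bool at_word_start = (i == 0 || !is_ident_char(src[i - 1]));
        if (at_word_start && strncmp(src + i, narrow, nlen) == 0 &&
            !is_ident_char(src[i + nlen])) {
            if (o + wlen >= out_size)
                return false;
            memcpy(out + o, wide, wlen);
            o += wlen;
            i += nlen;
        } else {
            if (o + 1 >= out_size)
                return false;
            out[o++] = src[i++];
        }
    }
    out[o] = '\0';
    return true;
}
```

Identifiers that merely contain the type name, such as `floaty`, are left alone because of the word-boundary checks.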

<Example of information reception sequence>
FIG. 8 is a sequence diagram showing how the autonomous driving control device 2 receives information from the program parallelizing device 111. FIG. 8 shows an example in which, when the autonomous driving control device 2 detects an abnormality in the multi-core processor 202, it notifies the program parallelizing device 111, installed in the cloud or elsewhere, and receives new program information over the air (OTA) via a wireless network.

First, when the autonomous driving control device 2 detects a failure of the multi-core processor 202 (S701), it transfers the detection information to the wireless communication unit 106 of the in-vehicle system 1 (S702). Next, the wireless communication unit 106 transfers the received detection information to the program parallelizing device 111 via the wireless network (S703).

The program parallelizing device 111 that has received the detection information reconfigures the program for the multi-core processor 202 (S704). Specifically, for example, it performs the program parallelization processing based on the detection information, targeting the number of cores 203 that remain usable because they are unaffected by the failure. Next, the program parallelizing device 111 transfers the information of the reconfigured program to the wireless communication unit 106 (S705), and the wireless communication unit 106 transfers the received program information to the autonomous driving control device 2 (S706). After that, the autonomous driving control device 2 may start operating with the new program (S707) in accordance with the update timing and method used in the in-vehicle system 1, which completes the process.
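The core-count-dependent re-parallelization in S704 can be sketched as re-splitting a loop's iterations over the cores that are still usable. Both the bitmask encoding of the detection information and the even-split policy are assumptions made for illustration; `usable_cores` and `resplit` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the detection information: bit k of
 * `failed_mask` is set when core k of the multi-core processor is
 * reported as failed (S701). At most 32 cores are considered. */
unsigned usable_cores(unsigned total_cores, uint32_t failed_mask)
{
    unsigned usable = 0;
    for (unsigned k = 0; k < total_cores && k < 32; k++)
        if (!(failed_mask & (UINT32_C(1) << k)))
            usable++;
    return usable;
}

/* Re-splits `n` loop iterations evenly over the usable cores (S704 sketch).
 * begin[j] and end[j] receive the half-open range for the j-th usable core;
 * mapping j to a physical core ID is left out. Returns the number of ranges
 * written, 0 if no core is usable. */
unsigned resplit(size_t n, unsigned total_cores, uint32_t failed_mask,
                 size_t begin[], size_t end[])
{
    unsigned m = usable_cores(total_cores, failed_mask);
    for (unsigned j = 0; j < m; j++) {
        begin[j] = n * j / m;
        end[j] = n * (j + 1) / m;
    }
    return m;
}
```

For example, with three cores and core 1 reported as failed (mask `0x2`), ten iterations are re-split into the ranges [0, 5) and [5, 10).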

According to this embodiment, the processing time of the program produced by program parallelization can be shortened by combining the description conversion adapted to the automatic parallelization tool with the description inverse conversion. This in turn improves the performance of a multi-core processor running the parallel program created through program parallelization. Furthermore, according to this embodiment, the program can be reconfigured according to the state of the multi-core processor that executes it.

According to the first embodiment described above, the following effects can be obtained.
(1) The program parallelizing device 111 executes a program parallelization method for generating the integrated program 57, which is a program for a multi-core processor, from the original program 51, which is a program for a single-core processor. The program parallelization method includes: a pre-conversion step (step S405 in FIG. 5) of executing pre-conversion for program parallelization processing of the single-core processor program; a parallelization processing step (step S406 in FIG. 5) in which the parallelization unit 33 parallelizes the pre-processed program 53, that is, the pre-converted single-core processor program; and an inverse conversion step (step S407 in FIG. 5) of executing the inverse of the pre-conversion in a region of the parallelization-processed program that was not parallelized, for example source code B2 in the example shown in FIG. 7. Therefore, the disadvantages of the description conversion can be eliminated by inversely converting the description of program portions that the parallelization unit 33 did not parallelize. Examples of such disadvantages are reduced computational accuracy, caused by reducing the number of bits of variables so that the parallelization unit 33 can process them, and longer computation time, caused by rewriting code to avoid recursive expressions. That is, the program parallelizing device 111 can improve the performance of the output program by having the post-processing unit 34 perform the inverse conversion.
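The second disadvantage, the extra computation time caused by rewriting code to avoid recursion, can be illustrated with a hypothetical pair of functions. The tree type, the fixed-size explicit stack (assumed because embedded parallelizing tools often disallow dynamic allocation), and all names are assumptions for illustration; the push/pop bookkeeping in the second version is the kind of overhead that the inverse conversion can remove again where no parallelization took place.

```c
#include <stddef.h>

typedef struct Node {
    int value;
    const struct Node *left, *right;
} Node;

/* Original form (assumed): natural recursive description. */
int tree_sum_recursive(const Node *n)
{
    if (n == NULL)
        return 0;
    return n->value + tree_sum_recursive(n->left) + tree_sum_recursive(n->right);
}

#define MAX_STACK 64 /* hypothetical fixed capacity for pending nodes */

/* Hypothetical pre-converted form for a tool that cannot analyze recursion:
 * the same computation with an explicit stack. Stores the sum in *sum and
 * returns 0, or returns -1 if more nodes are pending than the stack holds. */
int tree_sum_iterative(const Node *root, int *sum)
{
    const Node *stack[MAX_STACK];
    size_t top = 0;
    int s = 0;
    if (root != NULL)
        stack[top++] = root;
    while (top > 0) {
        const Node *n = stack[--top];
        s += n->value;
        if (n->left != NULL) {
            if (top == MAX_STACK)
                return -1;
            stack[top++] = n->left;
        }
        if (n->right != NULL) {
            if (top == MAX_STACK)
                return -1;
            stack[top++] = n->right;
        }
    }
    *sum = s;
    return 0;
}
```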

(2) The program parallelization method includes an extraction step (steps S402 and S403 in FIG. 5) of extracting the target program 52, which is a first program area of the single-core processor program that is expected to be sped up by the program parallelization processing and that can be pre-converted for that processing. In the pre-conversion step, the pre-conversion is executed on the target program 52, that is, on the first program area. Therefore, by limiting the pre-conversion to a part of the original program 51 rather than the whole, the processing load in steps S404 to S407 can be reduced.
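The text does not say how the expected speed-up is judged. One conventional estimate, used here purely as an assumed stand-in, is Amdahl's law: if a fraction p of a region's run time can be spread over N cores, the region's speed-up is at most 1 / ((1 - p) + p / N). An extraction predicate might combine that estimate with the convertibility check and a threshold that also covers conversion and parallelization overhead; the threshold and function names are hypothetical.

```c
#include <stdbool.h>

/* Amdahl's law: upper bound on speed-up when a fraction p (0..1) of the run
 * time is parallelizable over n_cores cores (n_cores >= 1). */
double amdahl_speedup(double p, unsigned n_cores)
{
    return 1.0 / ((1.0 - p) + p / (double)n_cores);
}

/* Hypothetical extraction criterion for the first program area: the region
 * must be pre-convertible and its estimated speed-up must exceed a
 * threshold chosen to cover conversion and parallelization overhead. */
bool is_extraction_candidate(double p, unsigned n_cores, bool convertible,
                             double min_speedup)
{
    return convertible && amdahl_speedup(p, n_cores) > min_speedup;
}
```

With three cores, a region that is 90% parallelizable has an estimated bound of 2.5, whereas one that is only 10% parallelizable stays below about 1.08.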

(3) The program parallelization method also includes a joining step (step S408 in FIG. 5) of joining the non-target program 56, which is a second program area of the original program 51 other than the target program 52, with the inverse conversion addition program 55, which is the first program area after the pre-conversion, parallelization processing, and inverse conversion have been executed, to obtain the integrated program 57. Therefore, by separating portions unsuitable for parallelization beforehand and joining them back after parallelization, the compiler 39 can process the entire original program 51 at once.

(4) The pre-conversion process (step S405 in FIG. 5) includes changing the description of a large-bit-size type to a small-bit-size type, for example from a double-precision type to a single-precision type. The inverse conversion process (step S407 in FIG. 5) includes changing the description of a small-bit-size type to a large-bit-size type, for example from a single-precision type back to a double-precision type. Therefore, the parallelization processing can be executed with a parallelization unit 33 that can handle variables of only limited bit size.
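The accuracy side of this type change can be seen with a hypothetical pair of functions: the form written for the single-core processor uses `double` (64 bits on typical targets), and the pre-converted form uses `float` (32 bits). A value such as 2^24 + 1 = 16777217 is exact in `double` but rounds to 16777216 in `float`; the inverse conversion restores the `double` form in regions that were not parallelized. The function names are illustrative.

```c
/* Source code as written for the single-core processor (assumed example). */
double scale_double(double x, double k)
{
    return x * k;
}

/* Hypothetical output of the pre-conversion (S405): the large-bit-size type
 * double narrowed to the small-bit-size type float. The inverse conversion
 * (S407) would turn this back into scale_double where nothing was
 * parallelized. */
float scale_float(float x, float k)
{
    return x * k;
}
```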

(5) The program parallelizing device 111 executes the program parallelization method described above. Therefore, the program parallelizing device 111 can output the integrated program 57, a program in which the disadvantages of the description conversion have been eliminated for the non-parallelized portion.

(6) The autonomous driving control device 2 includes the multi-core processor 202 and the storage unit 204. The storage unit 204 stores the binary code 59 obtained by compiling the integrated program 57, which is a program for a multi-core processor created by the program parallelization method described above. Therefore, the autonomous driving control device 2 can execute binary code 59 that benefits from the parallelization performed by the parallelization unit 33 and in which, for the portions that were not parallelized, the disadvantages of the description conversion have been eliminated by the post-processing unit 34.

(Modification example 1)
In the embodiment described above, the program parallelizing device 111 performs the description inverse conversion only on source code B2 in FIG. 7, which the parallelization unit 33 did not parallelize. However, the post-processing unit 34 may also perform the description inverse conversion on source code A2, which the parallelization unit 33 parallelized, as long as synchronization between the cores can be ensured without breaking the dependencies of the parallel processing order.

(Modification example 2)
In the embodiment described above, the program parallelizing device 111 includes the compiler 39. However, the program parallelizing device 111 need not include the compiler 39 and may instead use a compiler provided in another device connected via a network or the like.

(Modification example 3)
The program parallelizing device 111 need not include the discrimination unit 31, in which case the pre-processing unit 32 may process the entire original program 51. In this case, since no non-target program 56 exists, the program parallelizing device 111 need not include the integration unit 35 either.

(Modification example 4)
The program created by the program parallelizing device 111 may be stored in advance in the ROM 252 of the autonomous driving control device 2.

In each of the embodiments and modifications described above, the configuration of the functional blocks is merely an example. Several functional components shown as separate functional blocks may be configured as one, or a configuration represented by one functional block may be divided into two or more functions. Furthermore, part of the functions of each functional block may be provided in another functional block.

The present invention is not limited to the above-mentioned examples and modifications, but includes various modifications and equivalent configurations within the scope of the attached claims. For example, the above-described examples and modifications have been described in detail in order to explain the present invention in an easy-to-understand manner, and the present invention is not necessarily limited to those having all the described configurations. In addition, the control lines and information lines indicate those that are considered necessary for explanation, and do not necessarily indicate all the control lines and information lines necessary for implementation. In practice, it can be considered that almost all configurations are interconnected.

The above-described embodiments and modifications may be combined. Although various embodiments and modifications have been described above, the present invention is not limited to these contents. Other aspects conceivable within the scope of the technical idea of the present invention are also included within the scope of the present invention.

1 ... In-vehicle system
2 ... Autonomous driving control device
31 ... Discrimination unit
32 ... Pre-processing unit
33 ... Parallelization unit
34 ... Post-processing unit
35 ... Integration unit
39 ... Compiler
51 ... Original program
52 ... Target program
53 ... Pre-processed program
54 ... Converted program
55 ... Inverse conversion addition program
56 ... Non-target program
57 ... Integrated program
59 ... Binary code
111 ... Program parallelizing device
202 ... Multi-core processor
204 ... Storage unit

Claims

1. A program parallelization method executed by a computer to generate a program for a multi-core processor from a program for a single-core processor, the method comprising:
a pre-conversion step of executing pre-conversion for program parallelization processing of the program for the single-core processor;
a parallelization processing step of subjecting the pre-converted program for the single-core processor to parallelization processing; and
an inverse conversion step of executing an inverse conversion of the pre-conversion in a region of the program for the single-core processor that was subjected to the parallelization processing but was not parallelized.
2. The program parallelization method according to claim 1, further comprising
an extraction step of extracting, from the program for the single-core processor, a first program area that is expected to be sped up by the program parallelization processing and on which the pre-conversion for the program parallelization processing can be performed,
wherein the pre-conversion step executes the pre-conversion in the first program area.
3. The program parallelization method according to claim 2, further comprising
a joining step of joining a second program area of the program for the single-core processor, other than the first program area, with the first program area in which the pre-conversion, the parallelization processing, and the inverse conversion have been executed.
4. The program parallelization method according to claim 1, wherein
the pre-conversion includes a process of changing a description of a large-bit-size type to a small-bit-size type, and
the inverse conversion includes a process of changing a description of a small-bit-size type to a large-bit-size type.
5. A program parallelizing device that executes the program parallelization method according to claim 1.
6. An electronic control device comprising:
a storage unit that stores a program for a multi-core processor created by the program parallelization method according to claim 1; and
a multi-core processor that executes the program for the multi-core processor stored in the storage unit.