CN112270232A - Method and device for classifying vulnerable traffic participants around a vehicle - Google Patents

Method and device for classifying vulnerable traffic participants around a vehicle

Info

Publication number
CN112270232A
Authority
CN
China
Prior art keywords
image
neural network
sub-image
vehicle
vulnerable traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011121248.5A
Other languages
Chinese (zh)
Inventor
郭子杰
支蓉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mercedes Benz Group AG
Original Assignee
Daimler AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daimler AG filed Critical Daimler AG
Priority to CN202011121248.5A priority Critical patent/CN112270232A/en
Publication of CN112270232A publication Critical patent/CN112270232A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle


Abstract

The invention discloses a method and a device for classifying vulnerable traffic participants around a vehicle. The method includes: acquiring an environment image of the surroundings of a vehicle; segmenting at least one sub-image from the environment image, wherein each of the at least one sub-image contains a single vulnerable traffic participant, a vulnerable traffic participant being a person exposed to the vehicle surroundings; classifying all vulnerable traffic participants in the at least one sub-image through a pre-trained neural network; and determining the occupation types of all the vulnerable traffic participants according to the classification result. With the method and the device, the occupation types of the vulnerable traffic participants around a vehicle can be determined from the environment image of the vehicle surroundings, so that more accurate information about the vulnerable traffic participants can be provided for manual driving, automatic driving, intelligent driving, assisted driving and the like.

Description

Method and device for classifying vulnerable traffic participants around a vehicle
Technical Field
The present invention relates to the field of image processing, and in particular, to a method and apparatus for classifying vulnerable traffic participants around a vehicle.
Background
With the increasing popularity of vehicles and of functions such as automatic driving, intelligent driving and assisted driving, information about the environment around a vehicle is of great importance while the vehicle is being driven. Among such environmental information, information about persons in the surroundings of the vehicle is particularly important, as it influences manual driving, automatic driving, intelligent driving, assisted driving and the like.
In general, information about persons in the vehicle surroundings is acquired as follows: while the vehicle is traveling, an image of the surroundings of the vehicle is captured, and the persons in the image are then classified to determine whether they belong to a specific type, for example persons who, being on the road rather than inside a vehicle, are more likely to be accidentally injured by a vehicle.
However, this manner of acquiring information about persons in the vehicle surroundings, i.e., classifying the persons in the image, only divides them into two classes: persons of a specific type and persons not of that type. With such a binary classification, the driver still needs to make a further judgment on how to control the vehicle; and in automatic driving, intelligent driving and the like, the vehicle may be unable to determine the correct control action because accurate information about the person of the specific type is missing, which can lead to traffic congestion or even traffic accidents.
Thus, existing ways of classifying people in the vehicle surroundings do not meet driving demands.
Disclosure of Invention
The invention aims to provide a method and a device for classifying vulnerable traffic participants around a vehicle.
According to an aspect of the present invention, there is provided a method of classifying participants in vulnerable traffic around a vehicle, the method comprising: acquiring an environment image around a vehicle; segmenting at least one sub-image from the environment image, wherein each sub-image of the at least one sub-image comprises a single vulnerable traffic participant, wherein a vulnerable traffic participant represents a person exposed to the vehicle surroundings; classifying all vulnerable traffic participants in the at least one sub-image through a pre-trained neural network; and determining the occupation types of all the vulnerable traffic participants according to the classification result.
Optionally, the step of classifying all vulnerable traffic participants in the at least one sub-image by a pre-trained neural network comprises, for each of the at least one sub-image as input data of the neural network: acquiring pixel information of the sub-image through a pixel information extraction layer of the neural network; extracting features of the sub-image from the acquired pixel information through a feature extraction layer of the neural network; and determining, through a classification layer of the neural network, output data as the classification result according to the extracted features, wherein the output data of the neural network represent the probability that the vulnerable traffic participant in the sub-image is classified into each of all the occupation types distinguished by the neural network. The step of determining the occupation types of all the vulnerable traffic participants according to the classification result then includes: determining the occupation type with the highest probability in the output data as the occupation type of the vulnerable traffic participant.
Optionally, the step of acquiring the pixel information of the sub-image includes: adjusting the resolution of the sub-image according to the resolution of the feature extraction layer to obtain an adjusted sub-image; acquiring a pixel value vector of each color channel of the adjusted sub-image; and normalizing each element of all the obtained pixel value vectors to obtain, as the pixel information, normalized vectors corresponding to the respective pixel value vectors. The feature extraction layer of the neural network extracts the features of the sub-image, expressed as feature vectors, from the obtained normalized vectors through a convolution function, and the classification layer of the neural network determines the output data from the extracted feature vectors and the weight vector of the classification layer.
Optionally, the neural network is trained by: inputting training data into a predetermined neural network as input data, wherein the training data comprise training images and corresponding labeled occupation types, the training images being images each containing a single vulnerable traffic participant; acquiring output data of the predetermined neural network, wherein the output data represent the probability, determined by the predetermined neural network, that the vulnerable traffic participant in the training image is classified into each of all the occupation types; determining, through a loss function, the difference between the output data of the predetermined neural network and the labeled occupation type in order to adjust the convolution kernel of the feature extraction layer and the weight vector of the classification layer of the predetermined neural network; and repeating the above steps with the next training data and the predetermined neural network with the adjusted convolution kernel and weight vector until a predetermined stopping condition is met, wherein the predetermined stopping condition is determined according to at least one of: the accuracy of the classification by the predetermined neural network, the recall of the classification by the predetermined neural network, whether the vulnerable traffic participants in the training images are occluded, and the number of training iterations.
Optionally, the classification layer is a softmax classifier layer, wherein the output data is represented by the following equation:

A = softmax(Z) = softmax(W^T · X + b)

wherein A is a vector representing the output data, each element of A representing the probability that the vulnerable traffic participant is classified as the occupation type corresponding to that element position; W is the weight vector; X is the vector corresponding to the feature vector; b is a predetermined offset vector; and Z is the vector W^T · X + b, having the same dimension as A. The jth element A_j of A is calculated by the following equation:

A_j = exp(Z_j) / Σ_{i=1}^{m} exp(Z_i)

wherein Z_j represents the jth element of the vector Z, j ≤ m, m represents the number of all occupation types, and j and m are natural numbers.
Optionally, the loss function is a cross-entropy loss function represented by the following equation:

loss_CE = −Σ_{j=1}^{m} Â_j · log(A_j)

wherein loss_CE is the output of the cross-entropy loss function representing the difference, and Â_j is the value corresponding to A_j in the labeled occupation type.
Optionally, the convolution function is represented by the following equation:

H(h, k) = (f * I)(h, k) = Σ_p Σ_q f(p, q) · I(h − p, k − q)

wherein H(h, k) is a feature vector of h × k dimensions as the output of the convolution function, f(p, q) is a convolution kernel of p × q dimensions, and I is a normalized vector of h × k dimensions representing the pixel information, where h, k, p and q are natural numbers, and p < h, q < k.
According to another aspect of the present invention, there is provided an apparatus for classifying participants in vulnerable traffic around a vehicle, the apparatus comprising: an image acquisition unit configured to be able to acquire an environmental image around a vehicle; an image segmentation unit configured to be able to segment at least one sub-image from the environment image, wherein each sub-image of the at least one sub-image comprises a single vulnerable traffic participant; a classification unit configured to classify all vulnerable traffic participants in the at least one sub-image by a pre-trained neural network; a type determination unit configured to be able to determine the occupation types of all the vulnerable traffic participants according to the classification result.
According to another aspect of the invention, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the method of classifying participants in vulnerable traffic around a vehicle according to the invention.
According to another aspect of the present invention, there is provided a system for classifying participants in vulnerable traffic around a vehicle, the system comprising: a processor; a memory storing a computer program which, when executed by the processor, causes the processor to implement the method of classifying participants in vulnerable traffic around a vehicle according to the invention.
With the method and the device for classifying vulnerable traffic participants around a vehicle, the occupation types of the vulnerable traffic participants around a vehicle can be determined from the environment image of the vehicle surroundings, so that more accurate information about the vulnerable traffic participants can be provided for manual driving, automatic driving, intelligent driving, assisted driving and the like.
Drawings
The foregoing and other aspects of the invention will be more fully understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
fig. 1 shows a flow chart of a method of classifying vulnerable traffic participants in the surroundings of a vehicle according to an exemplary embodiment of the present invention.
Fig. 2 shows a flow chart of the step of classifying the vulnerable traffic participants in each sub-image in a method of classifying vulnerable traffic participants around a vehicle according to an exemplary embodiment of the present invention.
FIG. 3 shows a schematic diagram of a trained neural network according to an exemplary embodiment of the present invention.
Fig. 4 shows a schematic diagram of classifying vulnerable traffic participants around a vehicle according to an exemplary embodiment of the present invention.
Fig. 5 shows a block diagram of an apparatus for classifying vulnerable traffic participants around a vehicle according to an exemplary embodiment of the present invention.
Detailed Description
In the following, some exemplary embodiments of the invention will be described in more detail with reference to the accompanying drawings in order to better understand the basic ideas and advantages of the invention.
Fig. 1 shows a flow chart of a method of classifying vulnerable traffic participants in the surroundings of a vehicle according to an exemplary embodiment of the present invention.
Referring to fig. 1, in step S1, an environmental image around the vehicle is acquired.
Here, the environment image may be an image of the environment around the vehicle captured by an image acquisition device on the vehicle, such as a camera, and the image may be an image captured from any orientation of the vehicle or may be a composite image of images captured from a plurality of orientations.
At step S2, at least one sub-image is segmented from the environment image, wherein each of the at least one sub-image comprises a single vulnerable traffic participant, wherein the vulnerable traffic participant represents a person exposed to the vehicle surroundings.
Here, bounding boxes each enclosing a single vulnerable traffic participant may be drawn in the acquired environment image, and the at least one sub-image may then be segmented from the environment image according to these bounding boxes, as in the sketch below.
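As a minimal, non-authoritative sketch of this step (the patent specifies no code), the cropping could look as follows in Python; the bounding boxes are assumed to come from a separate detector, which the patent leaves open:

```python
import numpy as np

def crop_sub_images(environment_image: np.ndarray,
                    bounding_boxes: list[tuple[int, int, int, int]]) -> list[np.ndarray]:
    """Cut one sub-image per (x, y, w, h) bounding box, each assumed to
    contain a single vulnerable traffic participant."""
    sub_images = []
    for x, y, w, h in bounding_boxes:
        sub_images.append(environment_image[y:y + h, x:x + w].copy())
    return sub_images
```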
It should be understood that the above manner of dividing the sub-image is only an example, and any image dividing manner capable of dividing the sub-image including a single person (vulnerable traffic participant) from the environment image may be used according to actual needs.
Here, the person exposed to the vehicle surroundings represented by a vulnerable traffic participant may be anyone not inside a vehicle, e.g. a person walking, cycling, sitting in a wheelchair, sitting in a baby carriage and/or staying in place.
In step S3, all the vulnerable traffic participants in the at least one sub-image are classified by a pre-trained neural network.
Here, the pre-trained neural network may be a neural network trained to classify the occupation types of vulnerable traffic participants, and it may distinguish a variety of occupation types. For example, the occupation types may primarily cover occupations exercised outdoors, such as police officers, security guards, sanitation workers, road builders, construction workers, and the like; vulnerable traffic participants outside these occupations may be collectively classified under an "other" occupation type.
For example, in step S3, each sub-image may be input as input data to the pre-trained neural network, which may determine the probability that the vulnerable traffic participant in each sub-image corresponds to each occupation type. Continuing the above example, the probability may be determined that the vulnerable traffic participant corresponds to each of the police officer, security guard, sanitation worker, road builder, construction worker and "other" occupation types.
It should be understood that the above occupation types are merely examples, and that any occupation types may be used depending on actual needs.
In step S4, the occupation types of all the vulnerable traffic participants are determined according to the classification result.
For example, the occupation type of a vulnerable traffic participant may be determined according to the probabilities, determined in step S3, that the participant corresponds to the respective occupation types. For example, when it is determined in step S3 that the probability of the police occupation type is the highest, the occupation type of the vulnerable traffic participant may be determined as police in step S4; when the probability of the "other" occupation type is the highest, the occupation type of the vulnerable traffic participant may be determined as "other" in step S4, and so on.
Fig. 2 shows a flow chart of the step of classifying the vulnerable traffic participants in each sub-image in a method of classifying vulnerable traffic participants around a vehicle according to an exemplary embodiment of the present invention.
As an example, in step S3 of fig. 1, steps S31 to S34 of fig. 2 may be performed for each of the at least one sub-image as input data of the neural network.
Referring to fig. 2, in step S31, pixel information of the sub-image may be acquired through a pixel information extraction layer of the neural network.
Here, in order to acquire pixel information adapted to the neural network, as an example, step S31 may include: adjusting the resolution of the sub-image according to the resolution of the feature extraction layer to obtain an adjusted sub-image; acquiring a pixel value vector of each color channel of the adjusted sub-image; each element in all the obtained pixel value vectors is normalized to obtain normalized vectors corresponding to all the pixel value vectors as pixel information, respectively.
Here, in order to adapt the sub-image input to the neural network to the resolution of the feature extraction layer of the neural network, the resolution of the sub-image may first be adjusted to be the same as the resolution of the feature extraction layer. Then, a pixel value vector of each color channel (e.g., the R/G/B channels or other color channels) of the resolution-adjusted sub-image may be obtained; the pixel value vector may be a two-dimensional vector, each element of which may be the color value (or any other value capable of representing pixel information, such as a gray value) of the pixel at the corresponding position in the resolution-adjusted sub-image. Then, for ease of processing, each element of each pixel value vector may be normalized so that the elements of the normalized two-dimensional vector lie within the range [−1, 1]. For example, each element of the original two-dimensional vector (pixel value vector) can be normalized by the following equation: x_norm = x / 2^(n−1) − 1, where x is the original pixel value in the pixel value vector, x_norm is the normalized pixel value, and the value range of the original pixel values is [0, 2^n − 1].
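As an illustrative sketch only (the (224, 224) target resolution and the 8-bit pixel depth are assumptions, not values from the patent), this preprocessing might be written as:

```python
import numpy as np
from PIL import Image

def extract_pixel_information(sub_image: np.ndarray,
                              feature_layer_resolution: tuple[int, int] = (224, 224),
                              bit_depth: int = 8) -> list[np.ndarray]:
    # Adjust the sub-image to the resolution of the feature extraction layer.
    resized = np.asarray(
        Image.fromarray(sub_image).resize(feature_layer_resolution),
        dtype=np.float32)
    # x_norm = x / 2^(n-1) - 1 maps pixel values from [0, 2^n - 1] into [-1, 1].
    normalized = resized / 2 ** (bit_depth - 1) - 1.0
    # Return one normalized two-dimensional vector per color channel.
    return [normalized[..., c] for c in range(normalized.shape[-1])]
```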
In step S32, the feature of the sub-image may be extracted from the acquired pixel information by the feature extraction layer of the neural network.
Here, the feature extraction layer of the neural network may be any form of feature extraction layer, that is, the feature extraction layer of the neural network may be any form of feature extractor capable of extracting features of an image.
For example, the feature extraction layer of the neural network may be a convolutional layer. As an example, after obtaining the normalized vector as the pixel information at step S31, the feature extraction layer of the neural network may extract the features of the sub-image expressed in the feature vector through a convolution function according to the obtained normalized vector at step S32.
As an example, the convolution function may be represented by the following equation:

H(h, k) = (f * I)(h, k) = Σ_p Σ_q f(p, q) · I(h − p, k − q)

wherein H(h, k) is a feature vector of h × k dimensions as the output of the convolution function, f(p, q) is a convolution kernel of p × q dimensions, and I is a normalized vector of h × k dimensions representing the pixel information, where h, k, p and q are natural numbers, and p < h, q < k.
It should be understood that the above feature extraction layers are only examples, and any form of feature extraction layer capable of extracting image features may be used according to actual needs.
In step S33, output data as the classification result may be determined from the extracted features through a classification layer of the neural network, wherein the output data of the neural network may represent the probability that the vulnerable traffic participant in the sub-image is classified into each of all the occupation types distinguished by the neural network.
As an example, after obtaining a normalized vector as pixel information at step S31 and acquiring features represented by feature vectors at step S32, the classification layer of the neural network may determine output data according to the extracted feature vectors and weight vectors of the classification layer at step S33.
Here, any form of classification layer capable of performing classification according to the feature vector may be used. The classification layer may be, for example, a softmax classifier layer, in which the output data (the output data of the trained neural network, or of the neural network during training) may be represented by the following equation:

A = softmax(Z) = softmax(W^T · X + b)

wherein A is a vector representing the output data, each element of A representing the probability that the vulnerable traffic participant is classified as the occupation type corresponding to that element position; W is the weight vector; X is the vector corresponding to the feature vector; b is a predetermined offset vector; and Z is the vector W^T · X + b, having the same dimension as A. The jth element A_j of A is calculated by the following equation:

A_j = exp(Z_j) / Σ_{i=1}^{m} exp(Z_i)

wherein Z_j represents the jth element of the vector Z, j ≤ m, m represents the number of all occupation types, and j and m are natural numbers.
In the case of fig. 2, as an example, step S4 in fig. 1 may include the following step: determining the occupation type with the highest probability in the output data A as the occupation type of the vulnerable traffic participant, as in the sketch below.
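A minimal sketch of this classification and decision step, assuming the occupation labels listed earlier as examples (subtracting max(Z) before exponentiation is mathematically equivalent to the equation above and merely numerically stable):

```python
import numpy as np

# Example occupation types from the description; the actual set is configurable.
OCCUPATION_TYPES = ["police", "security guard", "sanitation worker",
                    "road builder", "construction worker", "other"]

def classify(X: np.ndarray, W: np.ndarray, b: np.ndarray) -> tuple[str, np.ndarray]:
    """Return the most probable occupation type and the probability vector A."""
    Z = W.T @ X + b                      # Z = W^T X + b
    e = np.exp(Z - Z.max())              # stable, equivalent softmax numerator
    A = e / e.sum()                      # A_j = exp(Z_j) / sum_i exp(Z_i)
    return OCCUPATION_TYPES[int(np.argmax(A))], A
```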
It should be understood that the pixel information extraction layer, the feature extraction layer, and the classification layer of the neural network may each include at least one sub-layer.
FIG. 3 shows a schematic diagram of a trained neural network according to an exemplary embodiment of the present invention.
Referring to fig. 3, in order to obtain training images for training the neural network, images of the environment around a vehicle that are as diverse as possible may be collected by driving the vehicle through various traffic environments, various weather conditions and various geographical locations (on-site acquisition, upper left of fig. 3). For example, the environment images around the vehicle may be acquired at a fixed frequency during driving, or at time intervals that vary with the vehicle speed.
Here, the various traffic environments may include congested urban environments, suburban environments, expressway environments, and the like. The various weather conditions may include normal weather (e.g., clear weather, or light wind, rain, snow and/or fog) and inclement weather (e.g., strong wind, or heavy rain, snow and/or fog). The various geographical locations may include multiple cities at different latitudes and longitudes.
After the images have been acquired, with reference to the dashed box at the upper side of fig. 3, each image may be segmented so that at least one sub-image, each containing a single vulnerable traffic participant, is cut from the acquired environment image. Labels may then be added to the segmented sub-images, i.e., each sub-image is labeled with an occupation type.
Then, referring to the dashed box at the upper right side of fig. 3, the at least one divided sub-image and the corresponding labeled occupation type thereof may be input to a predetermined neural network as training data, and the neural network may be trained by a training device.
Referring to the dashed box at the lower side of fig. 3, during training a training image may be input to the neural network; the pixel information of the training image is extracted through the pixel information extraction layer of the neural network (not shown in fig. 3, since this layer does not need to be tuned during training), and the classification result of the training image is then output through the feature extraction layer (shown in fig. 3 as a feature extractor for ease of understanding) and the classification layer (shown in fig. 3 as a classifier).
The feature extraction layer and the classification layer of the neural network can then be adapted according to the difference between the output data of the classification layer (the classifier in fig. 3), i.e., the classification result, and the corresponding labeled occupation type, until the classification results of the neural network reach an acceptable range, i.e., until a predetermined stopping condition is met.
An exemplary process of training a neural network is illustrated above with reference to fig. 3, and in particular, the neural network may be trained, by way of example, as follows:
First, training data may be input as input data to a predetermined neural network, wherein the training data include training images and corresponding labeled occupation types, the training images being images each containing a single vulnerable traffic participant.
For example, the training image may be a segmented sub-image as shown in fig. 3.
Thereafter, output data of the predetermined neural network may be obtained, wherein the output data of the predetermined neural network may represent a probability that the vulnerable traffic participant in the training image determined by the predetermined neural network is classified into each of all occupation types.
Then, a difference between the output data of the predetermined neural network and the labeled occupation type may be determined by a loss function to adjust a convolution kernel of a feature extraction layer and a weight vector of a classification layer of the predetermined neural network.
In the case where the classification layer of the neural network is a softmax classifier layer, as an example, the loss function may be a cross-entropy loss function represented by the following equation:

loss_CE = −Σ_{j=1}^{m} Â_j · log(A_j)

wherein loss_CE is the output of the cross-entropy loss function representing the difference, and Â_j is the value corresponding to A_j in the labeled occupation type.
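A numeric sketch of this loss, reading the labeled occupation type as a one-hot vector Â (a standard interpretation, stated here as an assumption):

```python
import numpy as np

def cross_entropy_loss(A: np.ndarray, label_index: int) -> float:
    """loss_CE = -sum_j Â_j * log(A_j), with Â one-hot at the labeled type."""
    A_hat = np.zeros_like(A)
    A_hat[label_index] = 1.0
    return float(-np.sum(A_hat * np.log(A + 1e-12)))  # epsilon guards log(0)
```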
Next, after the convolution kernel and the weight vector have been adjusted, the above steps may be repeated using the next training data and the predetermined neural network with the adjusted convolution kernel and weight vector, until a predetermined stopping condition is satisfied.
As an example, the predetermined stopping condition may be determined according to at least one of: the accuracy of the classification by the predetermined neural network, the recall of the classification by the predetermined neural network, whether the vulnerable traffic participants in the training images are occluded, and the number of training iterations.
As an example, the predetermined stopping condition may be determined according to the accuracy and the recall of the classification by the predetermined neural network, for instance through the following index:

F1 = 2 · Precision · Recall / (Precision + Recall)

Here, F1 is the index used to decide whether the predetermined stopping condition is satisfied, Precision is the accuracy of the classification by the predetermined neural network, and Recall is the recall rate of the classification by the predetermined neural network. For example, when F1 reaches a predetermined threshold, the training of the neural network may be stopped. The accuracy and recall may be obtained by any existing means. Further, whether the predetermined stopping condition is satisfied may also be decided from the average of F1 over several evaluations instead of a single F1 value.
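A short sketch of this stopping check; the threshold value 0.9 is an arbitrary assumption for illustration:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = 2 * Precision * Recall / (Precision + Recall)."""
    return 2.0 * precision * recall / (precision + recall)

def stop_training(precision: float, recall: float, threshold: float = 0.9) -> bool:
    """Stop once the F1 index reaches the predetermined threshold."""
    return f1_score(precision, recall) >= threshold
```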
Further, the predetermined stopping condition may also be determined by combining the above F1 with other conditions.
As another example, F1 may be combined with whether the vulnerable traffic participants in the training images are occluded in order to decide whether the training of the neural network should be stopped. When F1 does not reach the predetermined threshold, it may be checked whether the training images responsible for this contain occluded vulnerable traffic participants; if many of these training images contain occluded participants, those images may be excluded and F1 recalculated, or the F1 threshold may be adjusted, and the decision whether to stop the training is then based on the recalculated F1 and/or the adjusted threshold.
As another example, F1 may be combined with the number of training iterations to decide whether the training of the neural network should be stopped. For example, when the number of training iterations reaches a predetermined upper limit and F1 has still not reached the predetermined threshold, this may indicate that the training of the neural network has reached its limit and the network is not suitable for this classification task; the training may then be stopped and the neural network debugged or replaced before retraining.
It should be understood that the above determination of the stopping condition for stopping the training of the neural network is only an example, and different stopping conditions may be determined according to actual use requirements.
Fig. 4 shows a schematic diagram of classifying vulnerable traffic participants around a vehicle according to an exemplary embodiment of the present invention.
Referring to fig. 4, after the neural network has been trained as described with reference to fig. 3, an image of the environment around the vehicle may be acquired while the vehicle is being driven (on-site acquisition in fig. 4). For example, the environment image around the vehicle may be acquired at a fixed frequency, or at time intervals that vary with the vehicle speed. The acquired image may be a complete picture.
In order to cut the image portions (sub-images) each containing a single vulnerable traffic participant out of the complete picture, after the environment image has been acquired, bounding boxes each enclosing a single vulnerable traffic participant may be drawn on the complete picture by a predetermined processing unit, for example the intelligent traffic driving system shown in fig. 4, so that at least one sub-image, each containing a single vulnerable traffic participant (the segmented pictures shown in fig. 4), is cut from the complete picture according to the bounding boxes.
The segmented pictures, i.e., the sub-images, may then be input as input data into the trained neural network described with reference to fig. 3. The pixel information extraction layer of the neural network extracts the pixel information of the input sub-image (not shown in fig. 4), the feature extraction layer (shown in fig. 4 as a feature extractor) extracts the features of the sub-image from the pixel information, and the classification layer (shown in fig. 4 as a classifier) outputs data as the classification result according to the extracted features.
In the manner described above, the vulnerable traffic participants in the input image (sub-image) can be classified accurately, yielding the probability that each vulnerable traffic participant belongs to each occupation type; the specific occupation type is then determined from the classification result of the neural network, and the determined occupation type provides more accurate information for deciding the next operation during driving.
For example, in automatic driving, intelligent driving and the like, if the occupation type determined by the method for classifying vulnerable traffic participants around a vehicle is police, the relevant processing unit of the vehicle (e.g., an intelligent driving system) may further analyze the police officer's gestures, so as to perform corresponding operations such as turning or stopping according to those gestures.
With the method for classifying vulnerable traffic participants around a vehicle, the occupation types of the vulnerable traffic participants around a vehicle can be determined from the environment image of the vehicle surroundings, so that more accurate information about the vulnerable traffic participants can be provided for manual driving, automatic driving, intelligent driving and/or assisted driving, and the like.
Fig. 5 shows a block diagram of an apparatus for classifying vulnerable traffic participants around a vehicle according to an exemplary embodiment of the present invention.
Referring to fig. 5, an apparatus for classifying vulnerable traffic participants around a vehicle according to an exemplary embodiment of the present invention includes: an image acquisition unit 1, an image segmentation unit 2, a classification unit 3 and a type determination unit 4.
The image acquisition unit 1 is configured to be able to acquire an environmental image around the vehicle.
The image segmentation unit 2 is configured to be able to segment at least one sub-image from the environment image, wherein each sub-image of the at least one sub-image comprises a single vulnerable traffic participant.
The classification unit 3 is configured to be able to classify all the vulnerable traffic participants in the at least one sub-image by means of a pre-trained neural network.
The type determination unit 4 is configured to be able to determine the occupation types of all the vulnerable traffic participants according to the classification result.
Here, the acquisition and segmentation of the environment image, the classification of the vulnerable traffic participants, and the determination of the occupation type have been described in detail with reference to fig. 1 to 4 above, and are not described in detail here.
With the apparatus for classifying the participants in the vulnerable traffic around the vehicle according to the present invention, the occupation type of the participants in the vulnerable traffic around the vehicle can be determined through the image of the environment around the vehicle, so that more accurate information on the participants in the vulnerable traffic can be provided for driver driving, automatic driving, smart driving, and/or assisted driving, etc.
There is also provided in accordance with an exemplary embodiment of the present invention a system for classifying vulnerable traffic participants around a vehicle. The system for classifying participants in vulnerable traffic around a vehicle includes a processor and a memory. The memory is configured to be capable of storing a computer program. The computer program, when being executed by a processor, causes the processor to carry out the method of classifying vulnerable traffic participants in the surroundings of a vehicle according to the invention.
There is also provided in accordance with an exemplary embodiment of the present invention a computer-readable recording medium having a computer program stored thereon, wherein the computer program, when executed by a processor, causes the processor to implement the method of classifying vulnerable traffic participants around a vehicle according to the present invention. The computer readable recording medium is any data storage device that can store data read by a computer system. Examples of the computer-readable recording medium include: read-only memory, random access memory, read-only optical disks, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the internet via wired or wireless transmission paths). The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion. In addition, functional programs, codes, and code segments for accomplishing the present invention can be easily construed by programmers of ordinary skill in the art to which the present invention pertains within the scope of the present invention.
Furthermore, each unit in the above-described apparatuses and devices according to exemplary embodiments of the present invention may be implemented as a hardware component or a software module. Further, the respective units may be implemented by using, for example, a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), or a processor according to the processing performed by the respective units defined by those skilled in the art.
Although the invention is illustrated and described herein with reference to specific embodiments, the invention is not intended to be limited to the details shown. Rather, various modifications may be made in the details within the scope of the invention.
List of reference numerals
S1 obtaining an environmental image around the vehicle
S2 segmenting at least one sub-image from the ambient image
S3 classifying all vulnerable traffic participants in the at least one sub-image by a pre-trained neural network
S4 determining occupation types of all the vulnerable traffic participants according to the classification result
S31 obtaining the pixel information of the sub-image through the pixel information extraction layer of the neural network
S32 extracting the feature of the sub-image according to the acquired pixel information through the feature extraction layer of the neural network
S33 determining output data as classification result according to the extracted features through the classification layer of the neural network
1 image acquisition unit
2 image segmentation unit
3 classification unit
4 type determination unit

Claims (10)

1. A method of classifying vulnerable traffic participants around a vehicle, the method comprising:
acquiring an environment image around a vehicle;
segmenting at least one sub-image from the environment image, wherein each sub-image of the at least one sub-image comprises a single vulnerable traffic participant, wherein a vulnerable traffic participant represents a person exposed to the vehicle surroundings;
classifying all vulnerable traffic participants in the at least one sub-image through a pre-trained neural network;
and determining the occupation types of all the vulnerable traffic participants according to the classification result.
2. The method of claim 1, wherein the step of classifying all vulnerable traffic participants in the at least one sub-image by a pre-trained neural network comprises:
for each of the at least one sub-image as input data to the neural network:
acquiring pixel information of the sub-image through a pixel information extraction layer of the neural network;
extracting the features of the sub-images according to the acquired pixel information through a feature extraction layer of the neural network;
determining, by a classification layer of the neural network, output data as a result of the classification from the extracted features, wherein the output data of the neural network represents a probability that the vulnerable traffic participant in the sub-image belongs to each of all occupation types classified by the neural network,
wherein the step of determining the occupation types of all the vulnerable traffic participants according to the classification result comprises the following steps:
and determining the occupation type with the highest probability in the output data as the occupation type of the vulnerable traffic participant.
3. The method of claim 2, wherein the step of obtaining pixel information for the sub-image comprises:
adjusting the resolution of the sub-image according to the resolution of the feature extraction layer to obtain an adjusted sub-image;
acquiring a pixel value vector of each color channel of the adjusted sub-image;
normalizing each element of the obtained all pixel value vectors to obtain normalized vectors corresponding to all pixel value vectors as pixel information, respectively,
wherein the feature extraction layer of the neural network extracts features of the sub-image expressed by feature vectors through a convolution function according to the obtained normalized vectors,
wherein the classification layer of the neural network determines output data according to the extracted feature vectors and the weight vectors of the classification layer.
4. The method of claim 3, wherein the neural network is trained by:
inputting training data into a predetermined neural network as input data, wherein the training data comprises training images and corresponding labeled occupation types, and the training images are images comprising single vulnerable traffic participants;
acquiring output data of the predetermined neural network, wherein the output data of the predetermined neural network represents the probability that the vulnerable traffic participants in the training images determined by the predetermined neural network are classified into each of all occupation types;
determining a difference between the output data of the predetermined neural network and the labeled occupational type through a loss function to adjust a convolution kernel of a feature extraction layer and a weight vector of a classification layer of the predetermined neural network;
repeating the above steps using the next training data and the predetermined neural network with the adjusted convolution kernel and weight vector until a predetermined stopping condition is met,
wherein the predetermined stopping condition is determined according to at least one of: the accuracy of the classification by the predetermined neural network, the recall of the classification by the predetermined neural network, whether the vulnerable traffic participants in the training images are occluded, and the number of training iterations.
5. The method of claim 4, wherein the classification layer is a softmax classifier layer, wherein the output data is represented by the following equation:
A = softmax(Z) = softmax(W^T · X + b)

wherein A is a vector representing the output data, each element of A representing the probability that the vulnerable traffic participant is classified as the occupation type corresponding to that element position, W is a weight vector, X is a vector corresponding to the feature vector, b is a predetermined offset vector, and Z is the vector W^T · X + b, having the same dimension as A, wherein the jth element A_j of A is calculated by the following equation:

A_j = exp(Z_j) / Σ_{i=1}^{m} exp(Z_i)

wherein Z_j represents the jth element of the vector Z, j ≤ m, m represents the number of all occupation types, and j and m are natural numbers.
6. The method of claim 5, wherein the loss function is a cross-entropy loss function represented by the equation:
loss_CE = −Σ_{j=1}^{m} Â_j · log(A_j)

wherein loss_CE is the output of the cross-entropy loss function representing the difference, and Â_j is the value corresponding to A_j in the labeled occupation type.
7. The method of any of claims 3 to 6, wherein the convolution function is represented by the equation:
H(h, k) = (f * I)(h, k) = Σ_p Σ_q f(p, q) · I(h − p, k − q)

wherein H(h, k) is a feature vector of h × k dimensions as the output of the convolution function, f(p, q) is a convolution kernel of p × q dimensions, and I is a normalized vector of h × k dimensions representing the pixel information, where h, k, p and q are natural numbers, and p < h, q < k.
8. An apparatus for classifying vulnerable traffic participants around a vehicle, the apparatus comprising:
an image acquisition unit configured to be able to acquire an environmental image around a vehicle;
an image segmentation unit configured to be able to segment at least one sub-image from the environment image, wherein each sub-image of the at least one sub-image comprises a single vulnerable traffic participant;
a classification unit configured to classify all vulnerable traffic participants in the at least one sub-image by a pre-trained neural network;
a type determination unit configured to be able to determine the occupation types of all the vulnerable traffic participants according to the classification result.
9. A computer-readable recording medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to implement the method of any one of claims 1 to 7.
10. A system for classifying vulnerable traffic participants around a vehicle, the system comprising:
a processor;
a memory storing a computer program that, when executed by the processor, causes the processor to implement the method of any one of claims 1 to 7.
CN202011121248.5A 2020-10-19 2020-10-19 Method and device for classifying vulnerable traffic participants around a vehicle Pending CN112270232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011121248.5A CN112270232A (en) Method and device for classifying vulnerable traffic participants around a vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011121248.5A CN112270232A (en) Method and device for classifying vulnerable traffic participants around a vehicle

Publications (1)

Publication Number Publication Date
CN112270232A true CN112270232A (en) 2021-01-26

Family

ID=74338744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011121248.5A Pending CN112270232A (en) Method and device for classifying vulnerable traffic participants around a vehicle

Country Status (1)

Country Link
CN (1) CN112270232A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115148025A (en) * 2022-06-28 2022-10-04 重庆长安汽车股份有限公司 Traffic target track prediction method and readable storage medium
CN115148025B (en) * 2022-06-28 2023-10-20 重庆长安汽车股份有限公司 Traffic target track prediction method and readable storage medium

Similar Documents

Publication Publication Date Title
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
Mahaur et al. Road object detection: a comparative study of deep learning-based algorithms
CN106599832A (en) Method for detecting and recognizing various types of obstacles based on convolution neural network
CN112347933A (en) Traffic scene understanding method and device based on video stream
CN111160205B (en) Method for uniformly detecting multiple embedded types of targets in traffic scene end-to-end
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN111931683B (en) Image recognition method, device and computer readable storage medium
CN110781980B (en) Training method of target detection model, target detection method and device
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
Zang et al. Traffic lane detection using fully convolutional neural network
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
CN117152513A (en) Vehicle boundary positioning method for night scene
CN114419603A (en) Automatic driving vehicle control method and system and automatic driving vehicle
CN114898243A (en) Traffic scene analysis method and device based on video stream
CN117218622A (en) Road condition detection method, electronic equipment and storage medium
CN112270232A (en) Method and device for classifying vulnerable traffic participants around a vehicle
CN117593685A (en) Method and device for constructing true value data and storage medium
Rahaman et al. Lane detection for autonomous vehicle management: PHT approach
CN112288702A (en) Road image detection method based on Internet of vehicles
CN115147450B (en) Moving target detection method and detection device based on motion frame difference image
CN114550094A (en) Method and system for flow statistics and manned judgment of tricycle
CN113269088A (en) Scene description information determining method and device based on scene feature extraction
Ng et al. Real-Time Detection of Objects on Roads for Autonomous Vehicles Using Deep Learning
CN118397403B (en) Training method, device, equipment and medium for low-illumination vehicle image detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination