CN110287782A - Pedestrian segmentation model training method and device - Google Patents
Pedestrian segmentation model training method and device
- Publication number: CN110287782A
- Application number: CN201910414408.6A
- Authority: CN (China)
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
Abstract
The present disclosure provides a pedestrian segmentation model training method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring a training sample set composed of a plurality of sample images annotated with segmentation regions, wherein the segmentation regions marked with first-class labels in the sample images fall into N classes in total and the segmentation regions marked with second-class labels fall into M classes in total; inputting the training sample set into a first convolutional neural network comprising at least one sub-convolutional neural network, one sub-convolutional neural network corresponding to one pedestrian segmentation branch model; and training each pedestrian segmentation branch model in parallel to obtain a pedestrian segmentation model. By annotating sample images with segmentation regions of different classes and training on the annotated sample images, the disclosure enables the pedestrian segmentation model to distinguish between different classes of segmentation regions when segmenting, to segment regions that are difficult to separate, and to improve pedestrian segmentation accuracy.
Description
Technical Field
The present disclosure relates to the field of pedestrian detection technologies, and in particular, to a method and an apparatus for training a pedestrian segmentation model, an electronic device, and a computer-readable storage medium.
Background
In many applications of video structuring, pedestrian analysis is of great importance; in particular, it plays a core role in human identity recognition across many fields such as security and video retrieval.
Pedestrian segmentation refers to the technology of separating regions of a human body image such as the jacket, lower garment, shoes, hat, hair, bag, skin, and background. In many practical scenes pedestrian image quality is poor and resolution is low, and lighting, posture, and similar factors have a large influence, so pedestrian segmentation remains a difficult point in video structuring. This is especially true for hard samples such as regions that are adjacent and poorly separated, for example a hat and hair.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a pedestrian segmentation model training method, apparatus, electronic device, and computer-readable storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a pedestrian segmentation model training method, including:
acquiring a training sample set; the training sample set consists of a plurality of sample images annotated with segmentation regions, wherein the segmentation regions marked with first-class labels in the plurality of sample images fall into N classes in total, and the segmentation regions marked with second-class labels fall into M classes in total, where M and N are positive integers;
inputting the set of training samples into a first convolutional neural network; the first convolutional neural network comprises at least one sub-convolutional neural network, and one sub-convolutional neural network corresponds to one pedestrian segmentation branch model;
each pedestrian segmentation branch model is trained in parallel according to the training sample set until a preset convergence condition is met, and a pedestrian segmentation model comprising at least one pedestrian segmentation branch model is obtained; wherein the pedestrian segmentation model is used for segmenting segmentation regions in a pedestrian image.
Further, the first convolutional neural network comprises a first sub-convolutional neural network, wherein the first sub-convolutional neural network corresponds to a first pedestrian segmentation branch model;
correspondingly, each pedestrian segmentation branch model is trained in parallel according to the training sample set until a preset convergence condition is met, so that a pedestrian segmentation model comprising at least one pedestrian segmentation branch model is obtained, and the method comprises the following steps:
and the first pedestrian segmentation branch model takes the M classes of segmentation regions as M classes of training samples respectively, groups the N classes of segmentation regions into one class of training samples, and trains on the resulting M+1 classes of training samples using the first sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a second sub-convolutional neural network in parallel with the first sub-convolutional neural network, wherein the second sub-convolutional neural network corresponds to a second pedestrian segmentation branch model;
correspondingly, the step of obtaining a pedestrian segmentation model further includes:
and the second pedestrian segmentation branch model groups the M classes of segmentation regions into one class of training samples, takes the N classes of segmentation regions as N classes of training samples respectively, and trains on the resulting N+1 classes of training samples using the second sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a third sub-convolutional neural network in parallel with the first sub-convolutional neural network and the second sub-convolutional neural network, wherein the third sub-convolutional neural network corresponds to a third pedestrian segmentation branch model;
correspondingly, the step of obtaining a pedestrian segmentation model further includes:
and the third pedestrian segmentation branch model takes the N+M classes of segmentation regions as N+M classes of training samples respectively, and trains on the N+M classes of training samples using the third sub-convolutional neural network until a preset convergence condition is met.
Further, the method further comprises:
calculating a loss function of the first pedestrian segmentation branch model in the training process of the first pedestrian segmentation branch model;
calculating a loss function of the second pedestrian segmentation branch model in the training process of the second pedestrian segmentation branch model;
calculating a loss function of the third pedestrian segmentation branch model in the training process of the third pedestrian segmentation branch model;
weighting the loss function of the first pedestrian segmentation branch model, the loss function of the second pedestrian segmentation branch model, and the loss function of the third pedestrian segmentation branch model;
and taking the weighted loss function as the loss function of the pedestrian segmentation model, and taking the convergence condition of the loss function of the pedestrian segmentation model as the preset convergence condition.
Further, the inputting the training sample set into a first convolutional neural network includes:
inputting the training sample set into a second convolutional neural network, and respectively performing feature extraction on sample images in the training sample set through the second convolutional neural network to obtain a feature map set containing feature information;
inputting the feature map set into a first convolutional neural network.
According to a second aspect of the embodiments of the present disclosure, there is provided a pedestrian segmentation method including:
acquiring a pedestrian image;
inputting the pedestrian image into a pedestrian segmentation model obtained by training with the pedestrian segmentation model training method described above;
and segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region.
Further, the segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region includes:
and segmenting the pedestrian image through a third pedestrian segmentation branch model of the pedestrian segmentation model to obtain a segmentation region.
Further, the segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region includes:
inputting the pedestrian image into a second convolutional neural network, and performing feature extraction on the image through the second convolutional neural network to obtain a feature map containing feature information;
and inputting the feature map into the pedestrian segmentation model to obtain a segmentation region.
According to a third aspect of the embodiments of the present disclosure, there is provided a pedestrian segmentation model training device, including:
the sample acquisition module is used for acquiring a training sample set; the training sample set consists of a plurality of sample images annotated with segmentation regions, wherein the segmentation regions marked with first-class labels in the plurality of sample images fall into N classes in total, and the segmentation regions marked with second-class labels fall into M classes in total, where M and N are positive integers;
a sample input module for inputting the training sample set into a first convolutional neural network; the first convolutional neural network comprises at least one sub-convolutional neural network, and one sub-convolutional neural network corresponds to one pedestrian segmentation branch model;
the model training module is used for performing parallel training on each pedestrian segmentation branch model according to the training sample set until a preset convergence condition is met to obtain a pedestrian segmentation model comprising at least one pedestrian segmentation branch model; wherein the pedestrian segmentation model is used for segmenting segmentation regions in a pedestrian image.
Further, the first convolutional neural network comprises a first sub-convolutional neural network, wherein the first sub-convolutional neural network corresponds to a first pedestrian segmentation branch model;
correspondingly, the model training module is specifically configured to: the first pedestrian segmentation branch model takes the M classes of segmentation regions as M classes of training samples respectively, groups the N classes of segmentation regions into one class of training samples, and trains on the resulting M+1 classes of training samples using the first sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a second sub-convolutional neural network in parallel with the first sub-convolutional neural network, wherein the second sub-convolutional neural network corresponds to a second pedestrian segmentation branch model;
correspondingly, the model training module is specifically configured to: the second pedestrian segmentation branch model groups the M classes of segmentation regions into one class of training samples, takes the N classes of segmentation regions as N classes of training samples respectively, and trains on the resulting N+1 classes of training samples using the second sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a third sub-convolutional neural network in parallel with the first sub-convolutional neural network and the second sub-convolutional neural network, wherein the third sub-convolutional neural network corresponds to a third pedestrian segmentation branch model;
correspondingly, the model training module is specifically configured to: the third pedestrian segmentation branch model takes the N+M classes of segmentation regions as N+M classes of training samples respectively, and trains on the N+M classes of training samples using the third sub-convolutional neural network until a preset convergence condition is met.
Further, the apparatus further comprises:
the loss function calculation module is used for calculating a loss function of the first pedestrian segmentation branch model in the training process of the first pedestrian segmentation branch model; calculating a loss function of the second pedestrian segmentation branch model in the training process of the second pedestrian segmentation branch model; calculating a loss function of the third pedestrian segmentation branch model in the training process of the third pedestrian segmentation branch model; weighting the loss function of the first pedestrian segmentation branch model, the loss function of the second pedestrian segmentation branch model, and the loss function of the third pedestrian segmentation branch model; and taking the weighted loss function as the loss function of the pedestrian segmentation model, and taking the convergence condition of the loss function of the pedestrian segmentation model as the preset convergence condition.
Further, the sample input module is specifically configured to: inputting the training sample set into a second convolutional neural network, and respectively performing feature extraction on sample images in the training sample set through the second convolutional neural network to obtain a feature map set containing feature information; inputting the feature map set into a first convolutional neural network.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a pedestrian dividing device including:
the image acquisition module is used for acquiring a pedestrian image;
the image input module is used for inputting the pedestrian image into a pedestrian segmentation model obtained by training with the pedestrian segmentation model training method described above;
and the image segmentation module is used for segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region.
Further, the image segmentation module is specifically configured to: and segmenting the pedestrian image through a third pedestrian segmentation branch model of the pedestrian segmentation model to obtain a segmentation region.
Further, the image segmentation module is specifically configured to: input the pedestrian image into a second convolutional neural network, and perform feature extraction on the image through the second convolutional neural network to obtain a feature map containing feature information; and input the feature map into the pedestrian segmentation model to obtain a segmentation region.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to execute any one of the above pedestrian segmentation model training methods or any one of the above pedestrian segmentation methods.
According to a sixth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform any one of the above-mentioned pedestrian segmentation model training methods, or perform any one of the above-mentioned pedestrian segmentation methods.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects: by annotating sample images with segmentation regions of different classes and training on the annotated sample images to obtain the pedestrian segmentation model, the pedestrian segmentation model can distinguish between different classes of segmentation regions when segmenting, can be used to segment regions that are difficult to separate, and improves pedestrian segmentation accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1a is a flowchart of a pedestrian segmentation model training method according to an embodiment of the present disclosure.
Fig. 1b is a schematic diagram of the convolution process of a convolutional layer in a pedestrian segmentation model training method according to an embodiment of the present disclosure.
Fig. 1c is a schematic diagram of the convolution result of a convolutional layer in a pedestrian segmentation model training method according to an embodiment of the present disclosure.
fig. 2 is a flowchart of a pedestrian segmentation model training method provided in the second embodiment of the present disclosure.
Fig. 3 is a block diagram of a pedestrian segmentation model training device according to a third embodiment of the present disclosure.
Fig. 4 is a block diagram of a pedestrian segmentation model training device according to a fourth embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device according to a fifth embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Example one
Fig. 1a is a flowchart of a pedestrian segmentation model training method according to the first embodiment of the present disclosure. The execution subject of the method may be a pedestrian segmentation model training device, which may be integrated in a mobile terminal (e.g., a smart phone or a tablet computer), a notebook computer, or a fixed terminal (e.g., a desktop computer), and which may be implemented in hardware or software. As shown in fig. 1a, the method comprises the following steps:
step S11, acquiring a training sample set; the training sample set is composed of a plurality of sample images marked with segmentation areas, wherein the segmentation areas marked with first-class marks in the plurality of sample images share N classes, and the segmentation areas marked with second-class marks share M classes, wherein M and N are positive integers.
The labels of the segmentation regions include, but are not limited to, at least one of the following: jacket, lower garment, shoes, hat, hair, bag, skin, and background.
The sample image may be a pedestrian image. The segmentation regions with first-class labels may be image regions that are relatively easy to segment, such as the pedestrian and the background; the segmentation regions with second-class labels may be adjacent, hard-to-segment image regions, such as hair and a hat.
The values of N and M are determined by the segmentation regions contained in all the sample images. Each sample image is divided into different regions; on any single sample image, the regions carrying first-class labels cover at most N classes and the regions carrying second-class labels cover at most M classes, while across the whole set there are exactly N first-class and M second-class region classes.
For example, suppose sample image 1 contains a hat, hair, and a jacket; sample image 2 contains a lower garment, shoes, and a bag; and sample image 3 contains a lower garment and a bag. The sample images then collectively contain 6 classes of segmentation regions: hat, hair, jacket, lower garment, shoes, and bag. According to segmentation difficulty, the jacket, lower garment, shoes, and bag can be treated as segmentation regions of the first-class label, and the hat and hair as segmentation regions of the second-class label; in this case the value of N is 4 and the value of M is 2.
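Purely for illustration (the disclosure does not prescribe any data format), this grouping of classes could be recorded as follows; the class names and integer ids here are hypothetical:

```python
# Hypothetical label taxonomy matching the example above (N = 4, M = 2).
# First-class labels: regions that are comparatively easy to segment.
FIRST_CLASS = {"jacket": 0, "lower_garment": 1, "shoes": 2, "bag": 3}  # N = 4
# Second-class labels: adjacent, hard-to-separate regions.
SECOND_CLASS = {"hat": 4, "hair": 5}                                   # M = 2

N = len(FIRST_CLASS)   # 4
M = len(SECOND_CLASS)  # 2
```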
Step S12, inputting the training sample set into a first convolution neural network; the first convolutional neural network comprises at least one sub-convolutional neural network, and one sub-convolutional neural network corresponds to one pedestrian segmentation branch model.
A convolutional neural network (CNN) is a type of feed-forward neural network that performs convolution operations and has a deep structure; it mainly comprises an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer, and may include multiple convolutional layers. Herein, the convolutional neural network may be a plain (non-branching) convolutional neural network or a deep-learning convolutional neural network, which is not specifically limited.
A convolutional layer contains convolution kernels. A convolution kernel may be a matrix used to convolve the input image; the calculation multiplies, position by position, the elements of each local patch of the input image with the elements of the kernel matrix and then sums the products. In this context, each training channel corresponds to a different convolution kernel.
For example, as shown in FIG. 1b, the input is a two-dimensional 3 x 4 matrix and the convolution kernel is a 2 x 2 matrix. Assuming the convolution shifts by one pixel at a time, the 2 x 2 patch in the upper left corner of the input is first convolved with the kernel: the elements at each position are multiplied and then summed, giving element S00 of the output matrix S, whose value is aw + bx + ey + fz. The input patch is then shifted one pixel to the right, so the matrix of the four elements (b, c, f, g) is convolved with the kernel, giving element S01 of the output matrix S; in the same way, the elements S02, S10, S11, and S12 of the output matrix S can be obtained. As shown in fig. 1c, the resulting convolution output is a 2 x 3 matrix S.
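A minimal numeric check of this convolution example, with the symbolic entries replaced by numbers and assuming stride 1 and no padding (NumPy is used purely for illustration):

```python
import numpy as np

# 3 x 4 input matrix (numbers standing in for the symbolic entries a..l).
X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
# 2 x 2 convolution kernel (standing in for w, x, y, z).
K = np.array([[1, 0],
              [0, 1]])

# Slide the kernel one pixel at a time: output is (3-2+1) x (4-2+1) = 2 x 3.
S = np.zeros((2, 3), dtype=X.dtype)
for i in range(2):
    for j in range(3):
        # Element-wise product of the local patch with the kernel, then sum,
        # exactly as S00 = aw + bx + ey + fz in the text.
        S[i, j] = (X[i:i+2, j:j+2] * K).sum()

print(S)        # [[ 7  9 11], [15 17 19]]
print(S.shape)  # (2, 3), the 2 x 3 output matrix S of fig. 1c
```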
The parameters include the parameters of the convolution kernels of the convolutional layers, for example the size of the convolution matrix, which may be set to 3 × 3; different convolutional layers may use different convolution kernels. Parameters of the pooling layers, such as the size of the pooling matrix (for example 3 × 3), and parameters of the output layer, such as a linear coefficient matrix and a bias vector, may also be included.
Herein, the first convolutional neural network is an overall network comprising at least one sub-convolutional neural network, the sub-convolutional neural network being a branch of the overall network.
Step S13, each pedestrian segmentation branch model is trained in parallel according to the training sample set until a preset convergence condition is met, and a pedestrian segmentation model comprising at least one pedestrian segmentation branch model is obtained; wherein the pedestrian segmentation model is used for segmenting segmentation regions in a pedestrian image.
Each pedestrian segmentation branch model is processed in parallel, with its own convergence condition or a shared convergence condition used to end the training process, yielding at least one trained pedestrian segmentation branch model; the pedestrian segmentation model is composed of the at least one pedestrian segmentation branch model. When performing pedestrian segmentation, one pedestrian segmentation branch model can be selected to carry out the segmentation.
In this embodiment, sample images are annotated with segmentation regions of different classes, and the pedestrian segmentation model is trained on the annotated sample images, so that when segmenting, the model can distinguish between different classes of segmentation regions, can be used to segment regions that are difficult to separate, and improves pedestrian segmentation accuracy.
In an optional embodiment, the first convolutional neural network comprises a first sub-convolutional neural network, wherein the first sub-convolutional neural network corresponds to a first pedestrian segmentation branch model;
accordingly, step S13 includes:
and the first pedestrian segmentation branch model takes the M classes of segmentation regions as M classes of training samples respectively, groups the N classes of segmentation regions into one class of training samples, and trains on the resulting M+1 classes of training samples using the first sub-convolutional neural network until a preset convergence condition is met.
In this embodiment, the M classes of segmentation regions are used as M separate classes of training samples while the N classes of segmentation regions are grouped into a single class of training samples. Training in this way lets the first pedestrian segmentation branch model learn how to better segment the M classes of segmentation regions without having to distinguish among the N classes.
For example, if the M classes of segmentation regions are two adjacent regions that are difficult to separate, such as hair and a hat, then hair and hat are used as two separate classes of training samples and the remaining N classes of segmentation regions are grouped into one class. The first pedestrian segmentation branch model thus focuses on learning how to separate hair from hat, and can better handle the more complicated, hard-to-separate segmentation regions, i.e., the difficult regions.
In an optional embodiment, the first convolutional neural network further comprises a second sub-convolutional neural network in parallel with the first sub-convolutional neural network, wherein the second sub-convolutional neural network corresponds to a second pedestrian segmentation branch model;
correspondingly, step S13 further includes:
and the second pedestrian segmentation branch model groups the M classes of segmentation regions into one class of training samples, takes the N classes of segmentation regions as N classes of training samples respectively, and trains on the resulting N+1 classes of training samples using the second sub-convolutional neural network until a preset convergence condition is met.
In this embodiment, the M classes and the N classes of segmentation regions are trained separately: the second pedestrian segmentation branch model no longer needs to consider the relationships among the M classes; it treats them as one coarse class and focuses on learning the segmentation among the N classes. Combined with the first pedestrian segmentation branch model, both the segmentation among the M classes and the segmentation among the N classes are covered, so each pedestrian segmentation branch model has low complexity and a small computation cost, and branch models with better segmentation performance can be obtained.
As an example, if M is 2 and the corresponding 2 classes of segmentation regions are two adjacent, hard-to-separate classes such as hair and hat, then in this embodiment hair and hat together form one class of training samples while the remaining N classes of segmentation regions are used as N separate classes. The model therefore does not need to consider the relationship between hair and hat; it first treats the hair-and-hat coarse class as a whole and focuses on learning the segmentation among the N classes. Combined with the first pedestrian segmentation branch model, both the segmentation among the M classes and the segmentation among the N classes are covered, so each branch model has low complexity and a small computation cost and can better handle the more complicated, hard-to-separate regions, i.e., the difficult segmentation regions.
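Purely as a sketch, and assuming the annotations are stored as integer masks with the hypothetical class ids from the earlier example, the ground-truth re-grouping for the first branch (M+1 classes) and the second branch (N+1 classes) could look like this:

```python
import numpy as np

FIRST_IDS = (0, 1, 2, 3)  # N = 4 first-class region ids (hypothetical)
SECOND_IDS = (4, 5)       # M = 2 second-class region ids (hat, hair)

def relabel_branch1(mask: np.ndarray) -> np.ndarray:
    """Branch 1: keep the M hard classes distinct, merge the N easy classes.

    Output ids 0..M-1 are the hard classes; id M is the merged easy class,
    giving M + 1 training classes. Assumes every pixel id is in 0..5.
    """
    out = np.full_like(mask, len(SECOND_IDS))  # default: merged easy class
    for new_id, old_id in enumerate(SECOND_IDS):
        out[mask == old_id] = new_id
    return out

def relabel_branch2(mask: np.ndarray) -> np.ndarray:
    """Branch 2: keep the N easy classes distinct, merge the M hard classes,
    giving N + 1 training classes."""
    out = np.full_like(mask, len(FIRST_IDS))   # default: merged hard class
    for new_id, old_id in enumerate(FIRST_IDS):
        out[mask == old_id] = new_id
    return out
```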
In an optional embodiment, the first convolutional neural network further comprises a third sub-convolutional neural network in parallel with the first sub-convolutional neural network and the second sub-convolutional neural network, wherein the third sub-convolutional neural network corresponds to a third pedestrian segmentation branch model;
accordingly, step S13 further includes:
and the third pedestrian segmentation branch model takes the N+M classes of segmentation regions as N+M classes of training samples respectively, and trains on the N+M classes of training samples using the third sub-convolutional neural network until a preset convergence condition is met.
In this embodiment, the first pedestrian segmentation branch model uses the M classes of segmentation regions as M separate classes of training samples and groups the N classes into one class, training them independently; this lets the first branch learn how to better segment the M classes and handle the more complicated, hard-to-separate regions, i.e., the difficult segmentation regions. The second pedestrian segmentation branch model groups the M classes into one class of training samples and uses the N classes as N separate classes for discriminative training; it no longer needs to consider the relationships among the M classes, first treats them as a coarse class, and focuses on learning the segmentation among the N classes. Together these two branches cover the segmentation among the M classes and among the N classes, so each branch has low complexity and a small computation cost and yields better segmentation. The third pedestrian segmentation branch model additionally trains on all classes. The 3-branch pedestrian segmentation model is thus obtained by training coarse classes and fine subclasses together, which can improve the precision of the pedestrian segmentation model on the difficult classes.
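The three parallel sub-convolutional neural networks could be realized as follows; this is a minimal PyTorch sketch assuming simple convolutional heads over a shared feature map, with layer sizes chosen for illustration rather than taken from the disclosure:

```python
import torch
import torch.nn as nn

class SegmentationBranch(nn.Module):
    """One sub-convolutional neural network producing per-pixel class scores."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, kernel_size=1),  # per-pixel logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x)

class FirstConvNet(nn.Module):
    """The 'first convolutional neural network': three parallel branches."""
    def __init__(self, in_channels: int, n: int, m: int):
        super().__init__()
        self.branch1 = SegmentationBranch(in_channels, m + 1)  # M+1 classes
        self.branch2 = SegmentationBranch(in_channels, n + 1)  # N+1 classes
        self.branch3 = SegmentationBranch(in_channels, n + m)  # N+M classes

    def forward(self, feats: torch.Tensor):
        # All branches see the same feature map and run independently.
        return self.branch1(feats), self.branch2(feats), self.branch3(feats)
```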
In an optional embodiment, the method further comprises:
calculating a loss function of the first pedestrian segmentation branch model in the training process of the first pedestrian segmentation branch model;
calculating a loss function of the second pedestrian segmentation branch model in the training process of the second pedestrian segmentation branch model;
calculating a loss function of the third pedestrian segmentation branch model in the training process of the third pedestrian segmentation branch model;
weighting the loss function of the first pedestrian segmentation branch model, the loss function of the second pedestrian segmentation branch model, and the loss function of the third pedestrian segmentation branch model;
and taking the weighted loss function as the loss function of the pedestrian segmentation model, and taking the convergence condition of the loss function of the pedestrian segmentation model as the preset convergence condition.
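A minimal sketch of this weighted joint loss, assuming per-pixel cross-entropy for each branch; the weights w1, w2, w3 are hypothetical, since the disclosure does not specify the weighting scheme:

```python
import torch.nn.functional as F

def joint_loss(logits1, logits2, logits3, target1, target2, target3,
               w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of the three branch losses; w1..w3 are hypothetical weights.

    Each target holds the per-pixel class ids relabeled for its branch
    (M+1, N+1, and N+M classes respectively).
    """
    loss1 = F.cross_entropy(logits1, target1)  # first branch
    loss2 = F.cross_entropy(logits2, target2)  # second branch
    loss3 = F.cross_entropy(logits3, target3)  # third branch
    return w1 * loss1 + w2 * loss2 + w3 * loss3  # loss of the overall model
```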
In an alternative embodiment, step S12 includes:
step S121: inputting the training sample set into a second convolutional neural network, and respectively performing feature extraction on sample images in the training sample set through the second convolutional neural network to obtain a feature map set containing feature information;
the second convolutional neural network is shared by the first pedestrian segmentation branch model, the second pedestrian segmentation branch model and the third pedestrian segmentation branch model.
The second convolutional neural network may use a classical network structure (e.g., GoogLeNet, VGG, or ResNet) as the underlying convolutional neural network. A sample image is first input into this underlying network, whose parameters are initialized from a pretrained model of the same network.
Step S122: inputting the feature map set into a first convolutional neural network.
Specifically, the training sample set can be input into the second convolutional neural network to extract the corresponding feature information, which makes the features of the segmentation regions in the sample images more salient; the result, as the feature map set, is then input into the first convolutional neural network for training. The pedestrian segmentation model obtained in this way is more accurate, further improving segmentation precision.
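As a concrete illustration of the shared second convolutional neural network, the sketch below uses a pretrained ResNet-18 from torchvision; the choice of ResNet-18 and of torchvision is an assumption, since the disclosure only names GoogLeNet, VGG, or ResNet as candidate structures:

```python
import torch
import torch.nn as nn
from torchvision import models

class Backbone(nn.Module):
    """Second convolutional neural network: a shared feature extractor."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Drop the average-pool and fully connected layers, keeping the
        # spatial feature maps that the branch models consume.
        self.features = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.features(images)

backbone = Backbone()
feats = backbone(torch.randn(2, 3, 224, 224))  # a 2-image "feature map set"
print(feats.shape)  # torch.Size([2, 512, 7, 7])
```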
Example two
Fig. 2 is a flowchart of a pedestrian segmentation method according to the second embodiment of the present disclosure. The execution subject of the pedestrian segmentation method may be a pedestrian segmentation device, which may be integrated in a mobile terminal (e.g., a smart phone or a tablet computer), a notebook computer, or a fixed terminal (e.g., a desktop computer), and which may be implemented in hardware or software. As shown in fig. 2, the method specifically includes:
in step S21, a pedestrian image is acquired.
Specifically, the pedestrian image can be acquired in real time through a camera, or a pre-stored pedestrian image can be retrieved from a local database.
In step S22, the pedestrian image is input into a pedestrian segmentation model.
The pedestrian segmentation model is obtained by training with the pedestrian segmentation model training method of the first embodiment.
And step S23, segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region.
The pedestrian segmentation model is trained on sample images annotated with segmentation regions of different classes, so when segmenting regions it can distinguish between the different classes, can segment regions that are difficult to separate, and improves pedestrian segmentation accuracy.
In an optional embodiment, step S23 specifically includes:
and segmenting the pedestrian image through a third pedestrian segmentation branch model of the pedestrian segmentation model to obtain a segmentation region.
The pedestrian segmentation model includes three pedestrian segmentation branch models: a first pedestrian segmentation branch model, a second pedestrian segmentation branch model, and a third pedestrian segmentation branch model. The definitions of the three branch models are given in the first embodiment and are not repeated here.
In this embodiment, only the third pedestrian segmentation branch model needs to be extracted and used for segmentation, so the method improves precision without increasing the complexity of the deployed pedestrian segmentation model.
In an optional embodiment, step S23 specifically includes:
inputting the pedestrian image into a second convolutional neural network, and performing feature extraction on the image through the second convolutional neural network to obtain a feature map containing feature information; and inputting the feature map into the pedestrian segmentation model to obtain a segmentation region. Specifically, extracting the corresponding feature information with the second convolutional neural network makes the features of the segmentation regions in the pedestrian image more salient, further improving segmentation precision. The pedestrian segmentation model of this embodiment can obtain both the segmentation regions with first-class labels and the adjacent, hard-to-segment segmentation regions with second-class labels.
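Putting the pieces together for inference, a sketch under the same assumptions as the earlier ones: the shared backbone extracts the feature map, only the third branch is used, and a per-pixel argmax over its N+M logits yields the segmentation regions.

```python
import torch

@torch.no_grad()
def segment_pedestrian(image: torch.Tensor, backbone, model) -> torch.Tensor:
    """image: (3, H, W) pedestrian image tensor, preprocessed as in training.

    Returns a per-pixel map of region ids over the N + M classes, at the
    feature-map resolution (upsampling would be needed for full resolution).
    """
    feats = backbone(image.unsqueeze(0))     # shared feature extraction
    _, _, logits3 = model(feats)             # only the third branch is used
    return logits3.argmax(dim=1).squeeze(0)  # per-pixel class decision
```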
EXAMPLE III
Fig. 3 is a block diagram of a pedestrian segmentation model training device provided in the third embodiment of the present disclosure. The device can be integrated in a mobile terminal (e.g., a smart phone, a tablet computer, etc.), a notebook computer or a fixed terminal (desktop computer), and the pedestrian segmentation model training device can be implemented by hardware or software. Referring to fig. 3, the apparatus includes a sample acquisition module 31, a sample input module 32, and a model training module 33; wherein,
the sample obtaining module 31 is configured to obtain a training sample set; the training sample set consists of a plurality of sample images annotated with segmentation regions, wherein the segmentation regions marked with first-class labels in the plurality of sample images fall into N classes in total, and the segmentation regions marked with second-class labels fall into M classes in total, where M and N are positive integers;
the sample input module 32 is configured to input the training sample set into a first convolutional neural network; the first convolutional neural network comprises at least one sub-convolutional neural network, and one sub-convolutional neural network corresponds to one pedestrian segmentation branch model;
the model training module 33 is used for performing parallel training on each pedestrian segmentation branch model according to the training sample set until a preset convergence condition is met, so as to obtain a pedestrian segmentation model comprising at least one pedestrian segmentation branch model; wherein the pedestrian segmentation model is used for segmenting segmentation regions in a pedestrian image.
Further, the first convolutional neural network comprises a first sub-convolutional neural network, wherein the first sub-convolutional neural network corresponds to a first pedestrian segmentation branch model;
accordingly, the model training module 33 is specifically configured to: the first pedestrian segmentation branch model takes the M classes of segmentation regions as M classes of training samples respectively, groups the N classes of segmentation regions into one class of training samples, and trains on the resulting M+1 classes of training samples using the first sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a second sub-convolutional neural network in parallel with the first sub-convolutional neural network, wherein the second sub-convolutional neural network corresponds to a second pedestrian segmentation branch model;
accordingly, the model training module 33 is specifically configured to: the second pedestrian segmentation branch model groups the M classes of segmentation regions into one class of training samples, takes the N classes of segmentation regions as N classes of training samples respectively, and trains on the resulting N+1 classes of training samples using the second sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a third sub-convolutional neural network in parallel with the first sub-convolutional neural network and the second sub-convolutional neural network, wherein the third sub-convolutional neural network corresponds to a third pedestrian segmentation branch model;
accordingly, the model training module 33 is specifically configured to: the third pedestrian segmentation branch model takes the N+M classes of segmentation regions as N+M classes of training samples respectively, and trains on the N+M classes of training samples using the third sub-convolutional neural network until a preset convergence condition is met.
Further, the apparatus further comprises: a loss function calculation module 34; wherein,
the loss function calculation module 34 is configured to calculate a loss function of the first pedestrian segmentation branch model in the training process of the first pedestrian segmentation branch model; calculate a loss function of the second pedestrian segmentation branch model in the training process of the second pedestrian segmentation branch model; calculate a loss function of the third pedestrian segmentation branch model in the training process of the third pedestrian segmentation branch model; weight the loss function of the first pedestrian segmentation branch model, the loss function of the second pedestrian segmentation branch model, and the loss function of the third pedestrian segmentation branch model; and take the weighted loss function as the loss function of the pedestrian segmentation model, with the convergence condition of the loss function of the pedestrian segmentation model as the preset convergence condition.
Further, the sample input module 32 is specifically configured to: inputting the training sample set into a second convolutional neural network, and respectively performing feature extraction on sample images in the training sample set through the second convolutional neural network to obtain a feature map set containing feature information; inputting the feature map set into a first convolutional neural network.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Example four
Fig. 4 is a block diagram of a pedestrian segmentation apparatus according to a fourth embodiment of the present disclosure. The device can be integrated in a mobile terminal (e.g., a smart phone, a tablet computer, etc.), a notebook computer, or a fixed terminal (desktop computer), and the pedestrian segmentation device can be implemented by hardware or software. Referring to fig. 4, the apparatus includes an image acquisition module 41, an image input module 42, and an image segmentation module 43; wherein,
the image acquisition module 41 is used for acquiring a pedestrian image;
the image input module 42 is configured to input the pedestrian image into a pedestrian segmentation model trained by using the pedestrian segmentation model training method described in any one of the above;
the image segmentation module 43 is configured to segment the pedestrian image through the pedestrian segmentation model to obtain a segmentation region.
Further, the image segmentation module 43 is specifically configured to: and segmenting the pedestrian image through a third pedestrian segmentation branch model of the pedestrian segmentation model to obtain a segmentation region.
Further, the image segmentation module 43 is specifically configured to: input the pedestrian image into a second convolutional neural network, and perform feature extraction on the image through the second convolutional neural network to obtain a feature map containing feature information; and input the feature map into the pedestrian segmentation model to obtain a segmentation region.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
EXAMPLE five
An embodiment of the present disclosure provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
acquiring a training sample set; the training sample set consists of a plurality of sample images annotated with segmentation regions, wherein the segmentation regions marked with first-class labels in the plurality of sample images fall into N classes in total, and the segmentation regions marked with second-class labels fall into M classes in total, where M and N are positive integers;
inputting the set of training samples into a first convolutional neural network; the first convolutional neural network comprises at least one sub-convolutional neural network, and one sub-convolutional neural network corresponds to one pedestrian segmentation branch model;
each pedestrian segmentation branch model is trained in parallel according to the training sample set until a preset convergence condition is met, and a pedestrian segmentation model comprising at least one pedestrian segmentation branch model is obtained; wherein the pedestrian segmentation model is used for segmenting segmentation regions in a pedestrian image.
Further, the first convolutional neural network comprises a first sub-convolutional neural network, wherein the first sub-convolutional neural network corresponds to a first pedestrian segmentation branch model;
correspondingly, each pedestrian segmentation branch model is trained in parallel according to the training sample set until a preset convergence condition is met, so that a pedestrian segmentation model comprising at least one pedestrian segmentation branch model is obtained, and the method comprises the following steps:
and the first pedestrian segmentation branch model takes the M classes of segmentation regions as M classes of training samples respectively, groups the N classes of segmentation regions into one class of training samples, and trains on the resulting M+1 classes of training samples using the first sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a second sub-convolutional neural network in parallel with the first sub-convolutional neural network, wherein the second sub-convolutional neural network corresponds to a second pedestrian segmentation branch model;
correspondingly, the step of obtaining a pedestrian segmentation model further includes:
and the second pedestrian segmentation branch model groups the M classes of segmentation regions into one class of training samples, takes the N classes of segmentation regions as N classes of training samples respectively, and trains on the resulting N+1 classes of training samples using the second sub-convolutional neural network until a preset convergence condition is met.
Further, the first convolutional neural network further comprises a third sub-convolutional neural network in parallel with the first sub-convolutional neural network and the second sub-convolutional neural network, wherein the third sub-convolutional neural network corresponds to a third pedestrian segmentation branch model;
correspondingly, the step of obtaining a pedestrian segmentation model further includes:
and the third pedestrian segmentation branch model takes the N+M classes of segmentation regions as N+M classes of training samples respectively, and trains on the N+M classes of training samples using the third sub-convolutional neural network until a preset convergence condition is met.
Further, the method further comprises:
calculating a loss function of the first pedestrian segmentation branch model in the training process of the first pedestrian segmentation branch model;
calculating a loss function of the second pedestrian segmentation branch model in the training process of the second pedestrian segmentation branch model;
calculating a loss function of the third pedestrian segmentation branch model in the training process of the third pedestrian segmentation branch model;
weighting the loss function of the first pedestrian segmentation branch model, the loss function of the second pedestrian segmentation branch model, and the loss function of the third pedestrian segmentation branch model;
and taking the weighted loss function as the loss function of the pedestrian segmentation model, and taking the convergence condition of the loss function of the pedestrian segmentation model as the preset convergence condition.
Further, the inputting the training sample set into a first convolutional neural network includes:
inputting the training sample set into a second convolutional neural network, and respectively performing feature extraction on sample images in the training sample set through the second convolutional neural network to obtain a feature map set containing feature information;
inputting the feature map set into a first convolutional neural network.
Fig. 5 is a block diagram of an electronic device provided in an embodiment of the present disclosure. For example, the electronic device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like. Referring to fig. 5, the electronic device may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 506 provides power to the various components of the electronic device. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device.
The multimedia component 508 includes a screen that provides an output interface between the electronic device and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device is in an operating mode, such as a shooting mode or a video mode. Each front and rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the electronic device. For example, the sensor assembly 514 may detect an open/closed state of the electronic device and the relative positioning of components, such as a display and keypad of the electronic device; it may also detect a change in position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, the orientation or acceleration/deceleration of the electronic device, and a change in the temperature of the electronic device. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the electronic device to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, an application program is also provided, comprising instructions, such as those included in the memory 504, executable by the processor 520 of the electronic device to perform the above-described method.
Example Six
An embodiment of the present disclosure provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
acquiring a pedestrian image;
inputting the pedestrian image into a pedestrian segmentation model obtained by training with the pedestrian segmentation model training method described above;
and segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region.
Further, the segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region includes:
and segmenting the pedestrian image through a third pedestrian segmentation branch model of the pedestrian segmentation model to obtain a segmentation region.
Further, the segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region includes:
inputting the pedestrian image into a second convolutional neural network, and performing feature extraction on the image through the second convolutional neural network to obtain a feature map containing feature information; and inputting the feature map into the pedestrian segmentation model to obtain a segmentation region.
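A brief illustrative inference flow, reusing the hypothetical PedestrianSegmentationNet sketch given with the training method description (n = 5 and m = 8 are arbitrary placeholders, and the model weights are assumed already trained):

```python
import torch

model = PedestrianSegmentationNet(n=5, m=8)  # hypothetical trained model
model.eval()
image = torch.rand(1, 3, 256, 128)           # stand-in for one RGB pedestrian image
with torch.no_grad():
    _, _, logits3 = model(image)             # third branch covers all N+M region types
    regions = logits3.argmax(dim=1)          # per-pixel segmentation region ids
```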
For the structural block diagram of the electronic device of this embodiment, refer to the fifth embodiment; details are not repeated here.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (13)
1. A pedestrian segmentation model training method is characterized by comprising the following steps:
acquiring a training sample set; the training sample set consists of a plurality of sample images marked with segmentation regions, wherein the segmentation regions marked with a first-class mark in the plurality of sample images comprise N classes in total, and the segmentation regions marked with a second-class mark in the plurality of sample images comprise M classes in total, wherein M and N are positive integers;
inputting the training sample set into a first convolutional neural network; the first convolutional neural network comprises at least one sub-convolutional neural network, and one sub-convolutional neural network corresponds to one pedestrian segmentation branch model;
each pedestrian segmentation branch model is trained in parallel according to the training sample set until a preset convergence condition is met, and a pedestrian segmentation model comprising at least one pedestrian segmentation branch model is obtained; wherein the pedestrian segmentation model is used for segmenting segmentation regions in a pedestrian image.
2. The pedestrian segmentation model training method of claim 1, wherein the first convolutional neural network comprises a first sub-convolutional neural network, wherein the first sub-convolutional neural network corresponds to a first pedestrian segmentation branch model;
correspondingly, each pedestrian segmentation branch model is trained in parallel according to the training sample set until a preset convergence condition is met, so that a pedestrian segmentation model comprising at least one pedestrian segmentation branch model is obtained, and the method comprises the following steps:
and the first pedestrian segmentation branch model takes the M types of segmentation regions as M types of training samples respectively, classifies the N types of segmentation regions as one type of training samples, and trains the M+1 types of training samples by adopting the first sub-convolutional neural network until a preset convergence condition is met.
3. The pedestrian segmentation model training method according to claim 2, wherein the first convolutional neural network further comprises a second sub-convolutional neural network in parallel with the first sub-convolutional neural network, wherein the second sub-convolutional neural network corresponds to a second pedestrian segmentation branch model;
correspondingly, the step of obtaining a pedestrian segmentation model further includes:
and the second pedestrian segmentation branch model classifies the M types of segmentation regions into one type of training samples, takes the N types of segmentation regions as N types of training samples respectively, and trains the N+1 types of training samples by adopting the second sub-convolutional neural network until a preset convergence condition is met.
4. The pedestrian segmentation model training method of claim 3, wherein the first convolutional neural network further comprises a third sub-convolutional neural network in parallel with the first sub-convolutional neural network and the second sub-convolutional neural network, wherein the third sub-convolutional neural network corresponds to a third pedestrian segmentation branch model;
correspondingly, the step of obtaining a pedestrian segmentation model further includes:
and the third pedestrian segmentation branch model takes the N+M types of segmentation regions as N+M types of training samples respectively, and trains the N+M types of training samples by adopting the third sub-convolutional neural network until a preset convergence condition is met.
5. The pedestrian segmentation model training method of claim 4, wherein the method further comprises:
calculating a loss function of the first pedestrian segmentation branch model in the training process of the first pedestrian segmentation branch model;
calculating a loss function of the second pedestrian segmentation branch model in the training process of the second pedestrian segmentation branch model;
calculating a loss function of the third pedestrian segmentation branch model in the training process of the third pedestrian segmentation branch model;
weighting the loss function of the first pedestrian segmentation branch model, the loss function of the second pedestrian segmentation branch model, and the loss function of the third pedestrian segmentation branch model;
and taking the weighted loss function as the loss function of the pedestrian segmentation model, and taking the convergence condition of the loss function of the pedestrian segmentation model as the preset convergence condition.
6. The pedestrian segmentation model training method according to any one of claims 1 to 5, wherein the inputting the training sample set into a first convolutional neural network comprises:
inputting the training sample set into a second convolutional neural network, and respectively performing feature extraction on sample images in the training sample set through the second convolutional neural network to obtain a feature map set containing feature information;
inputting the feature map set into a first convolutional neural network.
7. A pedestrian segmentation method, comprising:
acquiring a pedestrian image;
inputting the pedestrian image into a pedestrian segmentation model obtained by training with the pedestrian segmentation model training method of any one of claims 1 to 6;
and segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region.
8. The pedestrian segmentation method according to claim 7, wherein the segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region comprises:
and segmenting the pedestrian image through a third pedestrian segmentation branch model of the pedestrian segmentation model to obtain a segmentation region.
9. The pedestrian segmentation method according to claim 7 or 8, wherein the segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region comprises: inputting the pedestrian image into a second convolutional neural network, and performing feature extraction on the image through the second convolutional neural network to obtain a feature map containing feature information;
and inputting the feature map into the pedestrian segmentation model to obtain a segmentation region.
10. A pedestrian segmentation model training device, comprising:
the sample acquisition module is used for acquiring a training sample set; the training sample set consists of a plurality of sample images marked with segmentation regions, wherein the segmentation regions marked with a first-class mark in the plurality of sample images comprise N classes in total, and the segmentation regions marked with a second-class mark in the plurality of sample images comprise M classes in total, wherein M and N are positive integers;
a sample input module for inputting the training sample set into a first convolutional neural network; the first convolutional neural network comprises at least one sub-convolutional neural network, and one sub-convolutional neural network corresponds to one pedestrian segmentation branch model;
the model training module is used for performing parallel training on each pedestrian segmentation branch model according to the training sample set until a preset convergence condition is met to obtain a pedestrian segmentation model comprising at least one pedestrian segmentation branch model; wherein the pedestrian segmentation model is used for segmenting segmentation regions in a pedestrian image.
11. A pedestrian segmentation apparatus, comprising:
the image acquisition module is used for acquiring a pedestrian image;
an image input module, configured to input the pedestrian image into a pedestrian segmentation model trained by the pedestrian segmentation model training method according to any one of claims 1 to 6;
and the image segmentation module is used for segmenting the pedestrian image through the pedestrian segmentation model to obtain a segmentation region.
12. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to perform the pedestrian segmentation model training method of any one of claims 1 to 6 or to perform the pedestrian segmentation method of any one of claims 7 to 9.
13. A non-transitory computer readable storage medium having instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the pedestrian segmentation model training method of any one of claims 1 to 6 or to perform the pedestrian segmentation method of any one of claims 7 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910414408.6A CN110287782A (en) | 2019-05-17 | 2019-05-17 | Pedestrian's parted pattern training method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110287782A (en) | 2019-09-27 |
Family ID: 68002126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910414408.6A (CN110287782A, pending) | Pedestrian's parted pattern training method and device | 2019-05-17 | 2019-05-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287782A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541928A (en) * | 2020-12-18 | 2021-03-23 | 上海商汤智能科技有限公司 | Network training method and device, image segmentation method and device and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107784282A (en) * | 2017-10-24 | 2018-03-09 | 北京旷视科技有限公司 | The recognition methods of object properties, apparatus and system |
CN107909580A (en) * | 2017-11-01 | 2018-04-13 | 深圳市深网视界科技有限公司 | A kind of pedestrian wears color identification method, electronic equipment and storage medium clothes |
CN108764065A (en) * | 2018-05-04 | 2018-11-06 | 华中科技大学 | A kind of method of pedestrian's weight identification feature fusion assisted learning |
CN108921054A (en) * | 2018-06-15 | 2018-11-30 | 华中科技大学 | A kind of more attribute recognition approaches of pedestrian based on semantic segmentation |
CN109598184A (en) * | 2017-09-30 | 2019-04-09 | 北京图森未来科技有限公司 | A kind for the treatment of method and apparatus of multi-split task |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190927 |