CN115376064A - Method and system for generating image sequence of pedestrian invasion of railway based on posture migration - Google Patents


Info

Publication number: CN115376064A
Application number: CN202210961141.4A
Authority: CN (China)
Prior art keywords: pedestrian, image, railway, posture, target
Legal status: Pending (assumed by Google; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 郭保青, 余祖俊, 朱力强, 阮涛, 王尧, 王耀东
Applicant and current assignee: Beijing Jiaotong University
Application filed by Beijing Jiaotong University; priority to CN202210961141.4A; publication of CN115376064A

Classifications

    • G06V 20/53 — Recognition of crowd images, e.g. recognition of crowd congestion (under G06V 20/52, surveillance or monitoring of activities)
    • G06N 3/08 — Learning methods (neural networks; computing arrangements based on biological models)
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 — Fusion of extracted features (combining data at the sensor, preprocessing, feature-extraction or classification level)
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • G06V 20/49 — Segmenting video sequences, e.g. parsing or cutting the sequence into shots or scenes
    • G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a method and a system for generating image sequences of pedestrians intruding on a railway based on posture migration. The method comprises: acquiring a pedestrian posture sequence in a non-railway scene as a target posture sequence; establishing and training a posture-migration pedestrian generation model, which comprises a generator and a discriminator, the discriminator consisting of an appearance discriminator and a posture discriminator; inputting a preset pedestrian appearance image and the target posture sequence into the model to obtain a plurality of target pedestrian images; and compositing the target pedestrian images into railway scene images to generate railway pedestrian intrusion images. The intrusion images are labeled automatically, and the railway pedestrian intrusion images containing the target action sequence are assembled into a pedestrian intrusion video sequence for the railway scene.

Description

Method and system for generating image sequence of pedestrian invasion of railway based on posture migration
Technical Field
The invention relates to the technical field of railway operation safety detection, and in particular to a method and a system for generating image sequences of railway pedestrian intrusion based on posture migration.
Background
As railway train speeds continue to rise, foreign objects intruding into the railway clearance have become a leading cause of railway safety accidents and pose a serious threat to railway operation. At present, pedestrians illegally entering the railway remain the main cause of casualties in railway traffic accidents; such trespassing severely threatens railway transportation safety and causes huge losses to the national economy and to people's lives and property.
Current means of detecting foreign objects in the railway clearance are either contact-based or non-contact. Common contact methods include the protective-net (dual electric net) monitoring technology and fiber Bragg grating technology. The protective-net technique judges whether a foreign object is present by detecting impacts on the net; it accurately detects sudden objects such as falling rocks landing on the net, but cannot judge other kinds of intrusion. Moreover, the net must be installed over large areas and maintained regularly, so it cannot cover the diverse foreign-object intrusion conditions on a railway. Non-contact methods include infrared, millimeter-wave radar, laser, and video analysis. Among these, intrusion-target recognition based on video analysis yields relatively intuitive detection results and is therefore widely adopted in railway security systems.
With the development of deep learning, intrusion-monitoring methods based on deep learning have gradually matured. Video-analysis-based intrusion detection can use a deep learning model to classify the intruding target with high accuracy. However, the accuracy of such a model depends heavily on the variety and number of intrusion image samples available for training: the more samples, the higher the achievable recognition accuracy.
Because a high-speed railway is a closed operating environment, intrusion-target image samples in high-speed railway scenes are very difficult to obtain; even when samples can be obtained, they cover only a few railway scenes, and the intrusion scenarios are not rich enough. Meanwhile, constructing an annotated dataset for deep learning requires a large amount of manual image labeling, which consumes considerable manpower and may introduce human labeling errors.
Disclosure of Invention
The embodiment of the invention provides a method and a system for generating an image sequence of railway pedestrian intrusion based on posture migration, which aim to overcome the defects of the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
In one aspect, the invention provides a method for generating an image sequence of railway pedestrian intrusion based on posture migration, which comprises the following steps:
acquiring a pedestrian posture sequence in a non-railway scene as a target posture sequence;
establishing and training a model to obtain a posture-migration pedestrian generation model, wherein the model comprises a generator and a discriminator, and the discriminator comprises an appearance discriminator and a posture discriminator;
inputting a preset pedestrian appearance image and the target posture sequence into the posture-migration pedestrian generation model to obtain a plurality of target pedestrian images; and
compositing the target pedestrian images into a railway scene image to generate a railway pedestrian intrusion image.
Optionally, the method further comprises:
generating a pedestrian intrusion video sequence for the railway scene from the plurality of railway pedestrian intrusion images according to the target posture sequence.
Optionally, the method further comprises:
automatically labeling the railway pedestrian intrusion images to generate annotated images of pedestrian intrusion in the railway scene.
Optionally, after a target pedestrian image is composited into the railway scene image, the position coordinates of the pedestrian in the railway pedestrian intrusion image and the height and width of the pedestrian bounding box are recorded, yielding an annotated image of pedestrian intrusion in the railway scene.
Optionally, pedestrian posture data are extracted from a public dataset, and the corresponding pedestrian images together with the extracted posture data are used as the dataset for training the model, yielding a posture-migration pedestrian generation model with optimal parameters.
Optionally, the training method comprises:
extracting pedestrian image samples in a training set and the pedestrian postures corresponding to them;
inputting the extracted pedestrian image samples and their corresponding postures into the generator, which generates a pedestrian posture-migration image having the appearance of the input pedestrian image and a preset target posture;
discriminating the pedestrian posture-migration image with the discriminator; and
obtaining the posture-migration pedestrian generation model with optimal parameters once a preset loss value is reached.
Optionally, the appearance discriminator is configured to judge whether the appearance of the generated pedestrian image is consistent with that of the input pedestrian image, and the posture discriminator is configured to judge whether the generated pedestrian posture is consistent with the target posture.
The appearance discriminator is structured as follows:
a concatenation layer, which concatenates the input pedestrian posture-migration image with the target pedestrian posture, and the target image with the target pedestrian posture, to obtain concatenated vectors;
a plurality of downsampling modules, which downsample the concatenated vectors to obtain downsampled feature vectors; and
a plurality of residual modules, which extract features from the downsampled feature vectors to obtain discrimination features,
wherein the appearance discriminator completes its judgment based on the discrimination features.
In a second aspect, the invention further provides a system for generating an image sequence of pedestrian intrusion on a railway based on posture migration, comprising:
a posture extraction module, which acquires a pedestrian posture sequence in a non-railway scene to obtain a target posture sequence, extracts pedestrian posture data from the DeepFashion public dataset, and builds a dataset from the pedestrian images corresponding to the posture data;
a pedestrian posture-migration generation module, which establishes a model, trains it on the dataset of pedestrian images and corresponding posture data to obtain a posture-migration pedestrian generation model with optimal parameters, and feeds a preset pedestrian appearance image and the target posture sequence into the model to obtain a plurality of target pedestrian images; and
a railway-scene intruding-pedestrian image synthesis module, which inserts the target pedestrian images into railway scene images to generate railway pedestrian intrusion images.
Optionally, the system further comprises:
a railway-scene intrusion video generation module, which generates a railway-scene intrusion video sequence from the railway pedestrian intrusion images produced by the image synthesis module, according to the target posture sequence.
Optionally, the system further comprises:
an automatic intrusion-image labeling module, which automatically labels the railway pedestrian intrusion images and generates annotated images of pedestrian intrusion in the railway scene.
The invention has the following beneficial effects: after the posture-migration pedestrian generation model is obtained, a pedestrian appearance image and a target posture sequence are input into it to obtain a plurality of target pedestrian images, which are inserted into railway scene images to generate railway pedestrian intrusion images, yielding a railway-scene pedestrian intrusion image sequence rich in pedestrian appearances and railway scenes. In the generated intrusion image data, pedestrian appearance textures are sharp, postures are distinct, and realism is high, which alleviates the problem that railway-scene pedestrian intrusion images are scarce and hard to obtain.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a block diagram of an image sequence generation system for pedestrian intrusion on a railway based on gesture migration according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of an image sequence generation method for pedestrian intrusion on a railway based on gesture migration according to an embodiment of the present invention;
FIG. 3 is a block diagram of a generator according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a spatial attention mechanism feature fusion module according to an embodiment of the present invention;
FIG. 5 is a frame diagram of a pedestrian pose extraction module according to an embodiment of the present invention;
FIG. 6 is a sequence diagram of a pedestrian image and corresponding extracted poses provided by an embodiment of the present invention;
FIG. 7 is a block diagram of a pedestrian-generating process for posture migration according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an appearance discriminator according to an embodiment of the present invention;
fig. 9 is a comparison of image generation results between the GSGAN algorithm and the PATN algorithm, provided by an embodiment of the present invention;
FIG. 10 is a diagram illustrating an effect of generating a sequence of images of pedestrian intrusion on a railway according to an embodiment of the present invention;
fig. 11 is a schematic diagram of an annotated image of a railway scene of pedestrian intrusion according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are exemplary only for explaining the present invention and are not construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Interpretation of terms:
a railway scene: a scene in which high-speed trains pass or may pass under closed operation, so that images of pedestrian intrusion cannot be captured directly with a camera.
Non-railway scene: a scene in which a camera can be used to capture video of a pedestrian's action sequences in various postures.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Example 1
As shown in fig. 1, the present embodiment 1 provides an image sequence generation system for pedestrian intrusion on a railway based on gesture migration, including:
the posture extraction module is used for acquiring a pedestrian posture sequence in a non-railway scene to obtain a target posture sequence, extracting pedestrian posture data from the DeepFashion public dataset, and building a dataset from the pedestrian images corresponding to the posture data;
the pedestrian attitude transition generation module is used for establishing a model, training the model through a data set generated by pedestrian images corresponding to pedestrian attitude data to obtain an attitude transition pedestrian generation model with optimal parameters, and inputting a preset pedestrian appearance image and a target attitude sequence into the pedestrian attitude transition generation model to obtain a plurality of target pedestrian images; and
and the railway scene invading pedestrian image synthesis module is used for inserting the target pedestrian image into the railway scene image to generate a railway pedestrian invading image.
In embodiment 1, with the above system for generating an image sequence of pedestrian intrusion on a railway based on pose migration, a method for generating an image sequence of pedestrian intrusion on a railway based on pose migration is implemented, which includes:
in one aspect, the invention provides a method for generating an image sequence of pedestrian invasion of a railway based on posture migration, which comprises the following steps:
acquiring a pedestrian attitude sequence in a non-railway scene to obtain a target attitude sequence;
establishing a model and training to obtain a posture migration pedestrian generation model, wherein the posture migration pedestrian generation model comprises a generator and a discriminator, and the discriminator comprises an appearance discriminator and a posture discriminator;
inputting a preset pedestrian appearance image and a target posture sequence into a posture migration pedestrian generation model to obtain a plurality of target pedestrian images; and
and synthesizing the target pedestrian image into the railway scene image to generate a railway pedestrian invasion image.
Optionally, the method further comprises:
and generating a pedestrian invasion video sequence of the railway scene according to the target attitude sequence by using the plurality of railway pedestrian invasion images.
Optionally, the method further comprises:
and automatically labeling the railway pedestrian invasion image to generate a railway scene pedestrian invasion labeled image.
Optionally, after the target pedestrian image is synthesized into the railway scene image, the position coordinates of the pedestrian in the railway pedestrian invasion image and the height data and the width data of the pedestrian frame are recorded, and a labeling image of the railway scene pedestrian invasion is obtained.
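A minimal sketch of this composite-then-record step, with numpy arrays standing in for real images; the function and variable names are assumed for illustration, not taken from the patent:

```python
import numpy as np

def composite_and_label(scene, pedestrian, mask, x, y):
    """Paste a pedestrian cut-out into a railway scene image and record its
    bounding-box annotation (position plus box width/height), as the
    auto-labeling step describes."""
    h, w = pedestrian.shape[:2]
    out = scene.copy()
    region = out[y:y + h, x:x + w]
    # Keep the scene where mask == 0, the pedestrian where mask > 0.
    out[y:y + h, x:x + w] = np.where(mask[..., None] > 0, pedestrian, region)
    # The annotation is simply the paste position and the box size.
    label = {"x": x, "y": y, "width": w, "height": h}
    return out, label

scene = np.zeros((120, 160, 3), dtype=np.uint8)
ped = np.full((40, 20, 3), 200, dtype=np.uint8)
mask = np.ones((40, 20), dtype=np.uint8)
img, label = composite_and_label(scene, ped, mask, x=50, y=60)
# label → {'x': 50, 'y': 60, 'width': 20, 'height': 40}
```

A full pipeline would read real pedestrian images with an alpha matte and write the labels out in a detection format such as Pascal VOC or COCO.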
Optionally, pedestrian attitude data of the public data set is extracted, the corresponding pedestrian image and the extracted pedestrian attitude data are used as the data set to train the model, and the attitude transition pedestrian generation model with the optimal parameters is obtained.
Optionally, the training method comprises:
extracting pedestrian image samples in a training set and pedestrian postures corresponding to the pedestrian image samples;
inputting the extracted pedestrian image sample and the pedestrian posture corresponding to the pedestrian image sample into a generator, and generating a pedestrian posture transition image with the input pedestrian image appearance and target posture according to the preset target posture and the appearance of the pedestrian image sample;
the discriminator discriminates the pedestrian posture migration image; and
and when the preset loss value is reached, obtaining an attitude transition pedestrian generation model with the optimal parameters.
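The four training steps above amount to the following control flow. Every component here is a stand-in callable (the patent gives no code), so only the loop structure and the stopping criterion are meaningful:

```python
def train(generator, app_disc, pose_disc, samples, target_loss):
    """Control-flow sketch of the training procedure: for each (image,
    source pose, target pose) sample the generator makes a pose-migrated
    image, the two discriminators score it, and training stops once the
    combined loss reaches the preset value."""
    history = []
    for step, (img, src_pose, tgt_pose) in enumerate(samples):
        fake = generator(img, src_pose, tgt_pose)
        loss = app_disc(fake, img) + pose_disc(fake, tgt_pose)
        history.append(loss)
        if loss <= target_loss:           # "preset loss value" reached
            return step, history
    return len(samples), history

# Toy stand-ins whose loss shrinks each step, just to exercise the loop.
samples = [(None, None, None)] * 10
gen = lambda img, sp, tp: None
losses = iter([1.0 - 0.2 * i for i in range(10)])
app_d = lambda fake, img: next(losses)
pose_d = lambda fake, tp: 0.0
step, hist = train(gen, app_d, pose_d, samples, target_loss=0.5)
```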
Optionally, the appearance discriminator is configured to determine whether an appearance of the generated pedestrian image is consistent with a pedestrian appearance of the input pedestrian image, and the posture discriminator is configured to determine whether the generated pedestrian posture is consistent with the target posture;
the appearance discriminator is structured as follows:
the splicing layer is used for respectively splicing the input pedestrian posture migration image and the target pedestrian posture, and the target image and the target pedestrian posture to obtain a splicing vector;
the plurality of downsampling modules are used for downsampling the spliced vectors to obtain downsampled feature vectors; and
a plurality of residual error modules for extracting the features of the down-sampled feature vectors to obtain the distinguishing features,
wherein the appearance discriminator completes discrimination based on the discrimination characteristics.
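The concatenate → downsample → residual pipeline of the appearance discriminator can be illustrated with a shape-level numpy sketch. Average pooling and a toy residual branch stand in for the real convolutional modules, so only the data flow (not the learned behavior) is shown:

```python
import numpy as np

def avg_pool2(x):
    # 2x2 average pooling: halves spatial resolution (stand-in for a strided conv).
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def appearance_discriminator(image, pose, n_down=2, n_res=3):
    """Structural sketch: concatenate image and pose along the channel axis
    (the 'splicing layer'), downsample several times, then apply residual
    blocks before producing a scalar real/fake score."""
    x = np.concatenate([image, pose], axis=0)   # concatenation layer
    for _ in range(n_down):                     # downsampling modules
        x = avg_pool2(x)
    for _ in range(n_res):                      # residual modules
        x = x + 0.1 * np.tanh(x)                # placeholder residual branch
    return x.mean()                             # scalar score

img = np.random.rand(3, 64, 64)
pose = np.random.rand(18, 64, 64)   # one channel per pose keypoint (assumed encoding)
score = appearance_discriminator(img, pose)
```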
Example 2
In this embodiment, training is carried out on the PyTorch deep learning platform under the Ubuntu (7.5.0-3ubuntu1) operating system with an NVIDIA Tesla V100S GPU; the processor is a Hygon C86 7165, and the batch size is set to 16. Model parameters are updated with the Adam (Adaptive Moment Estimation) optimization algorithm. The initial learning rate is set to 0.0002 and, starting from the 499th epoch, decreases by 1×10⁻⁶ per epoch relative to the previous epoch; the schedule is shown in equation (1), and the two exponential decay rates are 0.5 and 0.999. The weight parameters of the three loss functions are set to λ_gan = 5, λ_l1 = 10, and λ_per = 10, and the feature-map channel count is C = 128.

l_r = max(l_r0 − 1×10⁻⁶ × max(epoch − 499, 0), 0)    (1)

where l_r is the current learning rate, l_r0 is the initial learning rate, max takes the maximum of its arguments, and epoch is the training epoch index.
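The schedule can be reproduced in a few lines. This is a plausible reading of the prose (constant rate for the first 500 epochs, then a fixed per-epoch decrement, clamped at zero), not code from the patent:

```python
def learning_rate(epoch, lr0=2e-4, start=499, step=1e-6):
    """Learning-rate schedule as described: constant at lr0 through epoch
    `start`, then reduced by `step` each epoch, never below zero. With
    these defaults the rate reaches zero around epoch 699."""
    return max(lr0 - step * max(0, epoch - start), 0.0)
```

In PyTorch such a per-epoch rule could be handed to `torch.optim.lr_scheduler.LambdaLR` as a multiplicative factor.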
As shown in fig. 2, the present embodiment 2 provides a method for generating an image sequence of pedestrian intrusion on a railway based on pose migration, which includes the following steps:
step 1: and acquiring a pedestrian attitude sequence in a non-railway scene as a target attitude sequence.
Specifically, in this embodiment, a railway scene a and a non-railway scene B are taken as examples, a pedestrian posture sequence in the scene B is extracted as a target posture, and a posture sequence is generated, and the posture sequence can be used as a target posture sequence for performing posture migration from the scene B to the scene a.
Step 2: and establishing a model and training to obtain a posture migration pedestrian generation model, wherein the posture migration pedestrian generation model comprises a generator and a discriminator, and the discriminator comprises an appearance discriminator and a posture discriminator.
Specifically, the pedestrian posture-migration generation model is based on GSGAN, whose structure comprises one generator and two discriminators. The generator's input has two parts, pedestrian appearance and pedestrian posture; its goal is to learn the mapping from the original pedestrian posture to the target pedestrian posture and to generate a pedestrian image in the target posture that deceives the discriminators. Given a desired pedestrian appearance image S and a sequence of desired target postures P_i (i = 1, …, N, where N is the length of the posture sequence) extracted from N video frames of scene B described above, the posture-migration pedestrian generation model carries out its internal adversarial game and produces N pedestrian images with appearance S and posture P_i. The discriminator comprises an appearance discriminator and a posture discriminator: the appearance discriminator takes a real sample pair and a generated sample pair as input, and the posture discriminator takes a real posture pair and a generated posture pair. By computing the loss between the predicted and true labels of the input samples, the two discriminators improve their own judgment through gradient propagation, so that authenticity is judged better, which in turn supervises the generator to raise the quality of the generated pedestrian samples.
In conventional approaches, the appearance texture and skeletal posture of pedestrian samples generated by GAN-framework network models are generally not sharp. The quality of the generated appearance texture and of the posture migration depends substantially on the algorithm's network structure, so the generator in this embodiment adopts the structure shown in fig. 3: it contains N layers (N = 9 in this embodiment) of units, each consisting of a GCN module and a spatial-attention feature fusion module, and can generate pedestrian images in a target posture. This generator addresses both the sharpness of appearance texture and the accuracy of posture migration, and can generate a pedestrian image sample sequence with clear appearance texture performing a specified action according to a specified posture sequence.
Specifically, the GCN module is a posture-migration module based on a graph convolutional network. It aggregates the features of the nodes near a given node, learning that node's features by weighted aggregation to support the prediction task. The pedestrian posture graph comprises 18 nodes, each with its own features; the node features form a feature matrix, and the relations between nodes form an adjacency matrix. After the target posture and the source posture corresponding to the input image are fed into the GCN module, layer-by-layer propagation models the node relations of the source posture in a spatial mapping and then maps it back to image space to obtain an intermediate posture, which serves as one input of the spatial-attention feature fusion module.
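To make the aggregation concrete, here is a minimal numpy sketch of one graph-convolution step over the 18-node posture graph. The mean-aggregation rule and all names are illustrative assumptions, not the patent's exact layer:

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step: each node's new feature is a weighted
    aggregation of its neighbours' (and its own) features,
    X' = D^-1 (A + I) X W, i.e. mean aggregation followed by a linear map."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)
    return D_inv * (A_hat @ X @ W)               # normalise, aggregate, transform

n, f_in, f_out = 18, 2, 8                        # 18 keypoints, (x, y) input features
X = np.random.rand(n, f_in)                      # node feature matrix
A = np.zeros((n, n)); A[0, 1] = A[1, 0] = 1      # one skeleton edge, for illustration
W = np.random.rand(f_in, f_out)                  # learned weights (random here)
H = gcn_layer(X, A, W)
```

A real layer would add a nonlinearity and stack several such steps, with A encoding the full 18-joint skeleton.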
Specifically, as shown in fig. 4, the spatial attention feature fusion module has two inputs: the intermediate posture P generated by the GCN module of the previous layer, and the image appearance feature I (the appearance feature of each layer is the sum of the appearance feature input to the previous layer's fusion module and that layer's output appearance feature). The posture feature P passes through a convolution layer and an activation layer into the spatial attention module SA to produce a posture P', which is fused with the features generated from the appearance feature I by two convolution layers to produce the output appearance feature I'. This structure strengthens the fusion of appearance and posture features; by adding the spatial attention module SA, the positions where important information is concentrated can be located, thereby guiding the output appearance feature.
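The essence of the fusion step can be sketched as follows: a spatial attention map derived from the pose features reweights the appearance features, with a residual connection. This is a simplification under stated assumptions — the real module uses learned convolutions, whereas the single weight vector here is a stand-in.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention_fuse(pose_feat, app_feat, w_pose):
    """pose_feat, app_feat: (C, H, W) feature maps.
    Collapse pose channels into one (H, W) attention map, reweight the
    appearance features with it, and add a residual connection."""
    att = sigmoid(np.tensordot(w_pose, pose_feat, axes=([0], [0])))  # (H, W)
    return app_feat * att[None, :, :] + app_feat                     # residual

rng = np.random.default_rng(1)
P = rng.standard_normal((8, 16, 16))   # intermediate pose features from the GCN
I = rng.standard_normal((8, 16, 16))   # appearance features
I_out = spatial_attention_fuse(P, I, rng.standard_normal(8))
print(I_out.shape)                     # (8, 16, 16)
```

Because the attention map lies in (0, 1), positions the pose deems important are amplified up to twofold while unimportant positions pass through almost unchanged.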
In this embodiment, the DeepFashion public data set is used as the training set. The DeepFashion data set contains 48674 garment image samples covering 50 garment categories, grouped into upper-body garments, lower-body garments, and full-body garments. Posture skeletons are extracted from the 48674 model-clothing samples to generate 18 key-point coordinates per image; in the final posture coordinate data, the posture feature of each pedestrian image corresponds to the X and Y coordinates of its 18 key points.
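Before being fed to a pose-transfer network, the 18 (X, Y) coordinates are commonly rasterised into one Gaussian heatmap per joint. The patent does not state its exact encoding, so this is a sketch of the conventional approach:

```python
import numpy as np

def keypoints_to_heatmaps(kps, h, w, sigma=2.0):
    """kps: (18, 2) array of (x, y) coordinates -> (18, h, w) heatmaps,
    one Gaussian bump per key point."""
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.empty((len(kps), h, w))
    for k, (x, y) in enumerate(kps):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps

# Invented key-point coordinates for illustration.
kps = np.array([[10 + 2 * k, 20 + k] for k in range(18)], dtype=float)
hm = keypoints_to_heatmaps(kps, 64, 48)
print(hm.shape)   # (18, 64, 48)
```

Each channel peaks at its joint location, so convolutional layers can consume the posture alongside the RGB appearance channels.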
As shown in fig. 5, the posture extraction module is built on a convolutional neural network. It first represents the key-point association regions in an image by vector fields, determines the pedestrian key points from the extracted association regions, matches the key points belonging to the same pedestrian with the Hungarian algorithm, and finally outputs the pedestrian posture sequence of the video frames. Based on this algorithm, single-pedestrian posture coordinate data are first generated for the DeepFashion public data set and, together with the pedestrian image data, form the database for training the posture migration algorithm. In addition, the pedestrian posture sequences in other, non-railway scenes are obtained with the same posture extraction algorithm and provide the target postures for the posture migration algorithm; the effect of a generated pedestrian posture sequence is shown in fig. 6.
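The key-point matching step reduces to an optimal one-to-one assignment: given association scores between candidate parts (e.g. necks and shoulders), pick the pairing with the maximum total score. The scores below are invented; brute force suffices for this toy 3x3 case, while at real scale one would use a Hungarian solver such as scipy.optimize.linear_sum_assignment.

```python
import numpy as np
from itertools import permutations

# Toy association-score matrix: rows = candidate necks,
# columns = candidate shoulders (values invented for illustration).
scores = np.array([[0.9, 0.1, 0.2],
                   [0.2, 0.8, 0.3],
                   [0.1, 0.3, 0.7]])

def best_assignment(scores):
    """Maximum-score one-to-one matching (what the Hungarian algorithm
    computes); exhaustive search over permutations for a tiny matrix."""
    n = scores.shape[0]
    return max(permutations(range(n)),
               key=lambda p: sum(scores[i, p[i]] for i in range(n)))

print(best_assignment(scores))   # (0, 1, 2)
```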
As shown in fig. 7, in the training stage the algorithm framework trains the network with the DeepFashion public data set and its posture data; in the application stage, the optimal generator model is used to generate a pedestrian image sequence with the desired appearance and the target posture sequence.
The training steps of the posture migration pedestrian generation model are as follows:
step 2.1: extracting pedestrian image samples in a training set and pedestrian postures corresponding to the pedestrian image samples;
step 2.2: inputting the extracted pedestrian image sample, its corresponding pedestrian posture, and a preset target posture into the generator, which generates a pedestrian posture transition image having the appearance of the input pedestrian image and the target posture;
step 2.3: the discriminator discriminates the pedestrian posture migration image;
step 2.4: when the loss reaches the preset value, the posture transition pedestrian generation model with the optimal parameters is obtained.
In this embodiment, a character image sample from the DeepFashion data set, the posture corresponding to that sample, and the target posture are provided as input to the generator, whose output is the character image in the target posture. The source posture and the target posture are first concatenated and then input into the generator network together with the three RGB channels of the character image sample. To strengthen the correlation between the source posture of the image sample and the target posture, this embodiment extracts posture features with the graph convolution network, then uses the spatial attention mechanism to acquire the key posture features and combine them with the appearance features for output; after several rounds of graph convolution and feature fusion, the features are up-sampled to obtain the final generated pedestrian image. The discriminators are used to compute the losses and thus better supervise the generator. The posture discriminator receives a fake sample pair (generated image, target posture image) and a real sample pair (target image, target posture image); the appearance discriminator receives a fake sample pair (generated image, input image) and a real sample pair (target image, input image).
As shown in fig. 8, the network model of the appearance discriminator first concatenates the input (generated image, target posture) pair and the (target image, target posture) pair into vectors, down-samples them with two stride-2 convolutions, extracts features with 6 residual modules, and judges those features to perform the discrimination task. The network model of the posture discriminator is essentially the same as that of the appearance discriminator; this embodiment therefore describes only the appearance discriminator and omits the posture discriminator.
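The shape arithmetic of the two down-sampling convolutions can be checked directly. Kernel size 4 and padding 1 are assumptions (typical settings for stride-2 down-sampling); the patent only states the stride, and the 256-pixel input resolution is likewise illustrative.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial size after one convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

h = w = 256                  # assumed input resolution
for _ in range(2):           # the two stride-2 down-sampling convolutions
    h, w = conv_out(h), conv_out(w)
print(h, w)                  # 64 64 -- the 6 residual modules keep this size
```

Residual modules preserve the spatial size, so the feature map judged at the end of the discriminator is a quarter of the input resolution in each dimension.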
The loss function comprises three terms: the adversarial loss, the pixel-level L1 loss, and the perceptual loss. The adversarial loss is formed by the losses of the appearance discriminator and the posture discriminator, which respectively judge whether the two images in the appearance discriminator's input share the same appearance, and how well the pedestrian posture in the generated image matches the target pedestrian posture in the posture discriminator's input. Since the generator wants the generated image to be judged true by the discriminators, the two posture image pairs are input to the posture discriminator, and the loss between the discriminator output and the label value is computed with the BCE loss function of the PyTorch framework; likewise, for the appearance adversarial loss, the two image pairs are input to the appearance discriminator, and the loss between its output and the label is computed with the BCE loss function. The pixel-level L1 loss measures the difference between the generated image and the target image.
The perceptual loss aims to reduce posture deformation and make the generated image look more natural and smooth. The training process of the generative adversarial network alternately optimizes the generator and the discriminators: the generator is trained to minimize the objective function so that the generated data distribution approaches that of real images, while the discriminators are trained to maximize it.
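A minimal numerical sketch of the generator's loss terms described above, with invented toy values; the perceptual (VGG-feature) term and any weighting between terms are omitted because the patent does not specify them.

```python
import numpy as np

def bce(pred, label, eps=1e-7):
    """Binary cross-entropy, as used for both adversarial losses."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred))

def l1(generated, target):
    """Pixel-level L1 loss between generated and target images."""
    return np.mean(np.abs(generated - target))

d_app_on_fake, d_pose_on_fake = 0.6, 0.7     # discriminator outputs for fakes
gen = np.full((3, 8, 8), 0.4)                # toy generated image
tgt = np.full((3, 8, 8), 0.5)                # toy target image
g_loss = (bce(np.array([d_app_on_fake]), np.array([1.0]))    # fool appearance D
          + bce(np.array([d_pose_on_fake]), np.array([1.0]))  # fool posture D
          + l1(gen, tgt))
print(round(g_loss, 4))   # 0.9675
```

The generator's adversarial terms use label 1 on fake outputs (it wants the discriminators fooled), while the discriminators are trained with the opposite labels on the same pairs.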
In the training stage, the DeepFashion public data set and its posture data are used for network training. Since the inputs of GSGAN are two groups of posture and image data, a target posture and target image are set for each group of data, ensuring that the pedestrian appearance remains consistent while the network learns the posture migration; the two groups of posture data and pedestrian images are then input into the network in pairs for training.
The generator and the discriminators are trained adversarially with the above network structure and loss function. At the start of training the generator produces poor samples, so the appearance and posture discriminators output 0 for negative (generated) samples and 1 for positive (real) samples. With the discriminator parameters fixed, the loss is fed back to the generator, which updates its network weights accordingly and regenerates pedestrian images; this iteration repeats until the discriminators can no longer distinguish positive from negative samples. At that point the generator has learned the relevant features and its weights are frozen; the weight parameters of the discriminator networks are then updated, and iteration repeats until the discriminators again correctly distinguish positive and negative samples, after which the discriminator weights are frozen in turn. In this way the optimal generator and discriminator model parameters are obtained as the optimal model for the application stage.
Step 3: inputting the preset pedestrian appearance image and the target posture sequence into the posture migration pedestrian generation model to obtain a plurality of target pedestrian images.
Step 4: synthesizing the target pedestrian images into the railway scene image to generate railway pedestrian invasion images.
Specifically, during generation the size of the pedestrian image must be adjusted according to the pedestrian's position in the railway scene, ensuring that the generated pedestrian height is consistent with that of a real person at the same position. In this embodiment, the pedestrian height is determined by exploiting the fact that parallel straight lines in the real scene converge at a vanishing point when projected into image space: the straight lines of the two parallel rails are first extracted, and the vanishing point in the image is determined. Since the posture transition generation module can produce pedestrians of various appearances, invasion images of pedestrians with different appearances in different railway scenes can be obtained simply by inputting different railway scenes.
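The vanishing-point construction can be sketched as follows: intersect the two rail lines to find the vanishing point, then scale the pedestrian height linearly with the feet row's distance from it. All coordinates below are invented for illustration, and the linear scaling assumes a flat-ground pinhole camera model.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of line p1-p2 with line p3-p4 (image coordinates)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return px, py

def pedestrian_height(y_feet, y_vp, y_ref_feet, h_ref):
    """Image height of an upright person scales linearly with the feet row's
    distance from the vanishing point, so one reference calibrates the scene."""
    return h_ref * (y_feet - y_vp) / (y_ref_feet - y_vp)

# Two rail lines, e.g. extracted by a Hough transform (endpoints invented).
vp = line_intersection((800, 1080), (960, 200), (1120, 1080), (964, 200))
# Calibration: a 180-pixel-tall person whose feet sit at image row 900.
print(round(pedestrian_height(1000, vp[1], 900, 180), 1))   # 205.3
```

A pedestrian placed closer to the camera (feet at row 1000 rather than 900) thus receives a proportionally larger image height before compositing.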
In order to verify the effectiveness of this embodiment, an ablation experiment is performed, and the network model is evaluated with three indexes: IS, SSIM, and PCKh. The test results of the ablation experiments are shown in table 1.
Table 1 test results of ablation experiments
(The table content is provided as an image in the original patent publication and is not available as text.)
As can be seen from table 1, the original network GAN is used as the baseline for the comparative analysis. After the GCN module is added, the PCKh0.5 index improves from 0.9651 to 0.9702, indicating that the GCN module strengthens the relationship between the original posture and the target posture and improves the posture migration effect. After the SA module is further added, the SSIM and IS indexes, which evaluate image quality, improve by 0.0557 and 0.14 respectively, and PCKh0.5 improves by a further 0.0236, showing that the SA module plays an important role both in improving the posture migration effect and in the quality of the generated images.
Secondly, to verify the effectiveness of the proposed GSGAN algorithm, this embodiment compares it with the classical pedestrian posture transition algorithm PATN. The training data sets of the GSGAN and PATN algorithms are identical, and the relevant parameters of PATN are set to their optimum. As shown in fig. 9, compared with the GSGAN results in the second row, the PATN results in the first row are poorer in both the details of the pedestrian clothing and the facial features: the lower body in the first three images generated by PATN is very blurred, whereas the legs in the corresponding images generated by GSGAN are clearly visible. As shown in table 2, which compares the test results of the two algorithms, GSGAN achieves the higher score on all three indexes.
A PCKh0.5 of 0.9938 indicates that the pedestrian posture generated by the GSGAN algorithm of this embodiment is essentially indistinguishable from the target posture; it is 0.0284 higher than that of PATN, confirming that the method of this embodiment achieves the better posture migration effect.
TABLE 2 test results of two algorithms
(The table content is provided as an image in the original patent publication and is not available as text.)
The image synthesis algorithm of this embodiment composites the pedestrian, at the determined size, into the railway background based on the invariant ratio between the track gauge and the pedestrian size. Railway empty-scene images during normal operation periods are first collected from the surveillance videos of a plain-area railway section, the loop railway test field of the China Academy of Railway Sciences, a Guangzhou-Shenzhen high-speed railway section, and a Baoji-Lanzhou high-speed railway section. The images cover different railway locations, such as tunnel portals, turnouts, and cuttings, and different weather conditions, such as sunny, cloudy, and rainy days, at a resolution of 1920 x 1080. Each image is preprocessed to filter out noise and objects other than the rails, such as trees; the rail lines and the vanishing point are detected by the Hough transform; the lines between the rails are determined by projection to locate the sleeper pixels at different positions; and the pedestrian size is finally estimated from these. The pedestrian is then composited with the railway background; the resulting railway pedestrian invasion image sequences are shown in fig. 10. In total, pedestrian invasion image sequences covering 11 pedestrian appearances and 12 railway scenes are formed; each sequence contains 92 frames of continuous pedestrian invasion, yielding 132 converted videos and 12144 images of data. Furthermore, although deep learning algorithms are trained on intrusion images and their annotation data, manually annotating large amounts of image data is labour-intensive; the present method obtains the annotation data automatically while generating the invasion images, saving both time and manpower.
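The compositing step can be sketched as alpha-blending the pedestrian cut-out into the scene at the chosen position while recording the bounding box for later annotation. The image contents below are synthetic placeholders, not data from the patent.

```python
import numpy as np

def composite(scene, person_rgba, x0, y0):
    """Alpha-blend a pedestrian cut-out into the scene at (x0, y0) and return
    the composite plus the bounding box later used for auto-annotation."""
    h, w = person_rgba.shape[:2]
    out = scene.copy()
    alpha = person_rgba[..., 3:4] / 255.0            # (h, w, 1) opacity mask
    region = out[y0:y0 + h, x0:x0 + w, :]
    out[y0:y0 + h, x0:x0 + w, :] = (alpha * person_rgba[..., :3]
                                    + (1 - alpha) * region).astype(scene.dtype)
    return out, {"x": x0, "y": y0, "w": w, "h": h}

scene = np.zeros((1080, 1920, 3), dtype=np.uint8)    # stand-in empty scene
person = np.zeros((180, 60, 4), dtype=np.uint8)      # RGBA pedestrian patch
person[..., 3] = 255                                 # fully opaque silhouette
person[..., 0] = 200
img, box = composite(scene, person, 900, 700)
print(box)   # {'x': 900, 'y': 700, 'w': 60, 'h': 180}
```

Because the box is recorded at composite time, no separate manual annotation pass is needed.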
To verify whether the generated images meet the requirements of training samples, this embodiment repeatedly tests the image sequences with four widely used pre-trained network models, Yolov3, Faster R-CNN, SSD, and DSSD-321, and evaluates the results with Average Precision (AP). Table 3 shows the test results on the generated image sequences: every detection network scores highly on the generated sample library, with only a small gap to its score on the real database, indicating that the generated railway pedestrian invasion image sequences are close to real pedestrian invasion images and meet the requirements of training samples.
TABLE 3 Test evaluation of the target detection networks on generated and real samples
(The table content is provided as an image in the original patent publication and is not available as text.)
In addition, this embodiment ultimately provides training data for a foreign-object intrusion detection network. To verify the effectiveness of the generated pedestrian invasion images as training data, this embodiment measures how they improve the detection network model. Five data sets are constructed: the first contains only 672 real railway pedestrian invasion images with 966 invading pedestrians; each subsequent set adds 1000 generated railway pedestrian invasion images, and thus 1000 more invading pedestrians, to the previous one. The target detection network is trained on each of the five sets, and the five resulting models are tested to verify the validity of the generated data. Based on the Yolov3 network model and evaluated with AP, the results in table 4 show that 672 real images alone train Yolov3 to a detection precision of 71.6%, and that adding the successive batches of 1000 generated images improves the detection precision by 0.4%, 7.4%, 9.4%, and 12.7% respectively. This demonstrates that the generated intrusion data can improve the detection precision of a target detection network model, i.e. the generated image sequences can serve as training data for the research, development, and testing of detection networks.
TABLE 4 test results of different training set models
(The table content is provided as an image in the original patent publication and is not available as text.)
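The AP metric used in the experiments above is the area under the precision-recall curve over ranked detections. A minimal sketch with invented detections:

```python
def average_precision(detections, n_gt):
    """AP as the area under the precision-recall curve.
    detections: list of (confidence, is_true_positive); n_gt: ground truths."""
    detections = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    ap, last_recall = 0.0, 0.0
    for _, is_tp in detections:
        tp += is_tp
        fp += not is_tp
        recall, precision = tp / n_gt, tp / (tp + fp)
        ap += precision * (recall - last_recall)   # rectangle under the curve
        last_recall = recall
    return ap

dets = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
print(round(average_precision(dets, 4), 4))   # 0.6875
```

Published benchmarks often add 11-point or all-point interpolation on top of this basic definition; the sketch omits that for brevity.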
Step 5: generating a railway-scene invasion video sequence from the plurality of railway pedestrian invasion images according to the target posture sequence.
Specifically, the generated railway-scene pedestrian images are ordered according to the pedestrian posture sequence from non-railway scene B, producing in railway scene A an invasion video sequence with the same postures as scene B but a different appearance. By migrating pedestrians according to different pedestrian posture sequences from different scenes B and inputting the desired pedestrian appearance, railway-scene pedestrian invasion video sequences with different postures, different appearances, and continuous actions can be generated.
Step 6: automatically labeling the railway pedestrian invasion images to generate annotated railway-scene pedestrian invasion images.
Specifically, the automatic labeling runs alongside the railway-scene pedestrian image synthesis module and generates annotated pedestrian invasion samples in the annotation formats of different data sets. When the pedestrian images of different appearances and postures produced by the posture migration pedestrian generation module are composited into the railway scene, the pedestrian size in the image is tied to the pedestrian's position in the scene, and the actual annotation of a railway pedestrian image consists of the position of the pedestrian bounding box and its height and width in the image. Therefore, in this embodiment the position and size of the pedestrian in the composite image are recorded directly during synthesis; the composite images and these records form the automatic annotation data set of railway pedestrian invasion images, comprising each railway invasion image together with the position of the composited pedestrian target and the height and width of its bounding box.
As shown in fig. 8, box 1 in the figure represents the railway scene image, whose upper-left corner has pixel coordinates (0, 0); box 2 is the minimum enclosing rectangle of the pedestrian after compositing into the railway scene image, whose upper-left corner has pixel coordinates (x0, y0) in the railway scene image, with height H and width W. The annotation image in the railway scene is generated automatically from these parameters. The content and format of the automatically generated annotation data can be adjusted to the annotation format required by different data sets. As the generated pedestrian images occupy different positions in the railway scene images, the railway-scene pedestrian invasion image synthesis method produces pedestrian images with different heights H and widths W, and thus different railway pedestrian invasion annotation samples.
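As one example of adapting the recorded (x0, y0, W, H) parameters to a concrete data-set format, the box can be emitted as a YOLO-style normalised annotation line. The class id 0 = pedestrian and the 1920 x 1080 frame size are assumptions for illustration.

```python
def to_yolo(x0, y0, w, h, img_w=1920, img_h=1080):
    """Convert the recorded (x0, y0, W, H) box into a YOLO annotation line:
    'class cx cy w h' with centre and size normalised to the image."""
    cx, cy = x0 + w / 2, y0 + h / 2
    return f"0 {cx / img_w:.6f} {cy / img_h:.6f} {w / img_w:.6f} {h / img_h:.6f}"

print(to_yolo(900, 700, 60, 180))   # 0 0.484375 0.731481 0.031250 0.166667
```

Other formats (e.g. Pascal VOC XML or COCO JSON) would be produced from the same four recorded parameters in the same way.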
In summary, the embodiment of the invention obtains a target posture sequence from the pedestrian posture sequence in a non-railway scene, builds and trains a posture migration pedestrian generation model, inputs a preset pedestrian appearance image and the target posture sequence into the model to obtain a plurality of target pedestrian images, and composites these into railway scene images to generate railway pedestrian invasion images. The result is a railway-scene pedestrian invasion image sequence rich in pedestrian appearances and railway scenes; in the generated railway pedestrian invasion image data the pedestrian appearance texture is clear, the posture is distinct, and the realism is high, solving the problem that railway-scene pedestrian invasion images are rare and difficult to obtain.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
All the embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the system embodiments, being substantially similar to the method embodiments, are described relatively simply, and reference may be made to the description of the method embodiments for the relevant parts. The above-described method and system embodiments are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. Those of ordinary skill in the art can understand and implement the solution without inventive effort.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating an image sequence of railway pedestrian intrusion based on posture migration is characterized by comprising the following steps:
acquiring a pedestrian attitude sequence in a non-railway scene as a target attitude sequence;
establishing a model and training to obtain a posture migration pedestrian generation model, wherein the posture migration pedestrian generation model comprises a generator and a discriminator, and the discriminator comprises an appearance discriminator and a posture discriminator;
inputting a preset pedestrian appearance image and the target posture sequence into the posture migration pedestrian generation model to obtain a plurality of target pedestrian images; and
and synthesizing the target pedestrian image into a railway scene image to generate a railway pedestrian invasion image.
2. The method of claim 1, further comprising:
and generating a railway scene pedestrian invasion video sequence by the plurality of railway pedestrian invasion images according to the target attitude sequence.
3. The method of claim 1, further comprising:
and automatically labeling the railway pedestrian invasion image to generate a labeled image of railway scene pedestrian invasion.
4. The method according to claim 3, wherein after the target pedestrian image is synthesized into the railway scene image, position coordinates of pedestrians and height data and width data of pedestrian frames in the railway pedestrian invasion image are recorded, and the annotation image of the railway scene pedestrian invasion is obtained.
5. The method of claim 1, wherein pedestrian attitude data of a common data set is extracted, and the model is trained using a corresponding pedestrian image and the extracted pedestrian attitude data as a training set to obtain the attitude transition pedestrian generation model with optimal parameters.
6. The method of claim 5, wherein the training method comprises:
extracting pedestrian image samples in the training set and pedestrian postures corresponding to the pedestrian image samples;
inputting the extracted pedestrian image samples and pedestrian postures corresponding to the pedestrian image samples and a preset target posture into a generator, and generating a pedestrian posture transition image with the input pedestrian image appearance and target posture according to the preset target posture and the appearance of the pedestrian image samples;
the discriminator discriminates the pedestrian posture migration image; and
and when a preset loss value is reached, obtaining the attitude transition pedestrian generation model with the optimal parameters.
7. The method of claim 6,
the appearance discriminator is used for judging whether the appearance of the generated pedestrian image is consistent with the pedestrian appearance of the input pedestrian image or not, and the posture discriminator is used for judging whether the generated pedestrian posture is consistent with the target posture or not;
the appearance discriminator includes:
the splicing layer is used for respectively splicing the input pedestrian posture migration image and the target pedestrian posture, and the input target image and the input target pedestrian posture to obtain a splicing vector;
the plurality of downsampling modules are used for downsampling the spliced vector to obtain a downsampled feature vector; and
a plurality of residual error modules for extracting the characteristics of the down-sampled characteristic vectors to obtain distinguishing characteristics,
wherein the appearance discriminator completes discrimination based on the discrimination feature.
8. An image sequence generation system for railway pedestrian intrusion based on gesture migration, comprising:
the attitude extraction module is used for acquiring a pedestrian attitude sequence in a non-railway scene to obtain a target attitude sequence, extracting pedestrian attitude data of a public data set and generating a data set from a pedestrian image corresponding to the pedestrian attitude data;
the pedestrian attitude transition generation module is used for establishing a model, training the model through the data set generated by the pedestrian image corresponding to the pedestrian attitude data to obtain an attitude transition pedestrian generation model with optimal parameters, and inputting a preset pedestrian appearance image and the target attitude sequence into the attitude transition pedestrian generation model to obtain a plurality of target pedestrian images; and
and the railway scene invading pedestrian image synthesis module is used for inserting the target pedestrian image into the railway scene image to generate a railway pedestrian invading image.
9. The system of claim 8, further comprising:
and the railway scene invading video generating module is used for generating railway scene invading video sequences according to the railway pedestrian invading images generated by the railway scene invading image synthesizing module and the target attitude sequence.
10. The system of claim 8, further comprising:
and the automatic pedestrian invasion image labeling module is used for labeling the railway pedestrian invasion image and generating a labeling image of railway scene pedestrian invasion.
CN202210961141.4A 2022-08-11 2022-08-11 Method and system for generating image sequence of pedestrian invasion of railway based on posture migration Pending CN115376064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210961141.4A CN115376064A (en) 2022-08-11 2022-08-11 Method and system for generating image sequence of pedestrian invasion of railway based on posture migration

Publications (1)

Publication Number Publication Date
CN115376064A true CN115376064A (en) 2022-11-22

Family

ID=84065895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210961141.4A Pending CN115376064A (en) 2022-08-11 2022-08-11 Method and system for generating image sequence of pedestrian invasion of railway based on posture migration

Country Status (1)

Country Link
CN (1) CN115376064A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468743A (en) * 2023-06-19 2023-07-21 中国铁路北京局集团有限公司天津供电段 Method, system and equipment for identifying railway tree invasion limit
CN116468743B (en) * 2023-06-19 2023-08-18 中国铁路北京局集团有限公司天津供电段 Method, system and equipment for identifying railway tree invasion limit


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination