CN113158854A - Automatic monitoring train safety operation method based on multi-mode information fusion - Google Patents

Automatic monitoring train safety operation method based on multi-mode information fusion Download PDF

Info

Publication number
CN113158854A
CN113158854A CN202110376832.3A CN202110376832A CN113158854A
Authority
CN
China
Prior art keywords
train
image
illumination
sample
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110376832.3A
Other languages
Chinese (zh)
Other versions
CN113158854B (en)
Inventor
沙晓鹏
崔逸丰
曹加奇
赵玉良
陈若愚
李文超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University Qinhuangdao Branch
Original Assignee
Northeastern University Qinhuangdao Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University Qinhuangdao Branch filed Critical Northeastern University Qinhuangdao Branch
Priority to CN202110376832.3A priority Critical patent/CN113158854B/en
Publication of CN113158854A publication Critical patent/CN113158854A/en
Application granted granted Critical
Publication of CN113158854B publication Critical patent/CN113158854B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention provides an automatic train safety-operation monitoring method based on multi-modal information fusion, and relates to the technical fields of railway traffic management and information fusion. Using computer image processing and audio processing techniques, the method fuses and analyses the train running video frames and the wheel-rail impact audio signals collected in a monitoring interval, calculates the train running-state information, and automatically monitors the running state of the train. The system comprises a camera installed at the railway site and a multi-modal information fusion processing algorithm. It can automatically detect and analyse trains running under different illumination environments and effectively guarantees safe train operation.

Description

Automatic monitoring train safety operation method based on multi-mode information fusion
Technical Field
The invention relates to the technical fields of railway traffic management and information fusion, and in particular to an automatic train safety-operation monitoring method based on multi-modal information fusion.
Background
Among modern transportation modes, railway transportation has the advantages of large carrying capacity, low transportation cost and high running speed compared with other modes, so it plays an increasingly important role in the transportation field. At the same time, faced with ever-growing transportation demand, the load and speed of railway transportation keep increasing. How to guarantee the safe operation of trains under high-speed, heavy-load conditions has therefore become a major problem for railway train developers and researchers.
In recent years, with the continuous maturing of video image processing and audio processing technology, more and more researchers obtain train operation parameters by analysing signals such as the video and audio recorded during train operation, so as to supervise the train running state and effectively guarantee safe operation. However, existing train supervision systems cannot accurately identify trains at night or in harsh illumination environments, and suffer from a single detection parameter, a single detection modality and high cost, so a new method is needed to solve these problems.
By fusing and analysing the video frames of the train in the monitoring interval and the wheel-rail impact audio signals, the problem that a single video modality cannot accurately acquire the train running state under the three conditions of sufficient illumination, insufficient illumination and no illumination can be effectively solved. At the same time, only a camera device installed on site is used, which reduces the system cost.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an automatic train safety-operation monitoring method based on multi-modal information fusion, which identifies and positions trains in different illumination environments and acquires information such as the train running speed, train number, number of carriages and running time, thereby providing a guarantee for safer train operation.
The technical scheme adopted by the invention is as follows:
An automatic train safety-operation monitoring method based on multi-modal information fusion comprises the following steps:
Step 1: train video images and wheel-rail impact audio are collected all-weather in the train monitoring section by a camera installed at the railway site. Illumination intensity intervals are set, and the collected information is classified according to three different illumination intensities: a good illumination environment, recorded as A; an insufficient illumination environment, recorded as B; a no-illumination environment, recorded as C.

The good illumination environment means: the illumination intensity falls within the interval for a good illumination environment; the average gray value of the generated image frame sequence is larger than that under night illumination, and universally applicable identification features are available.

The insufficient illumination environment means: the illumination intensity falls within the interval for an insufficient illumination environment; the target object reflects or emits light at night, and this is identified in the image as the feature.

The no-illumination environment means: the illumination intensity falls within the interval for a no-illumination environment; the target object produces no high-gray-value region in the camera image at night, and no visual feature that can identify the target object exists in the image frame sequence. In this case the audio information of the same time period is collected, analysed and used for identification.
Step 2: design an illumination classifier. The audio-video image sequences acquired by the camera are classified by the illumination classifier and labelled with the illumination environment class A, B or C; the illumination classifier is designed by combining an illumination analysis algorithm and a feature analysis algorithm.

Step 2.1: distinguish A from B and C as a binary classification, i.e. A forms one class and B and C together form the other class, by means of the illumination analysis algorithm.
The illumination analysis algorithm is as follows:
Among the three classes A, B and C, the background-pixel average gray level of class A is greater than that of classes B and C. For this situation, an adaptive illumination analysis algorithm is proposed. First, two background ROIs S_roi1(x, y) and S_roi2(x, y) are selected. For the k-th sample, the coordinate point s(k) composed of the average gray values of the S_roi1(x, y) and S_roi2(x, y) regions is defined as the gray feature point, and the average gray value data a(k, p, l) and b(k, p, l) representing the sample are obtained by formulas (1) and (2):

a(k, p, l) = (1/(m·n)) Σ_(x,y)∈S_roi1 S_roi1(x, y)   (1)

b(k, p, l) = (1/(m·n)) Σ_(x,y)∈S_roi2 S_roi2(x, y)   (2)

where N is the number of samples, k_frame is the total number of frames of the image sequence of sample k, m and n denote the image size of S_roi1(x, y) and S_roi2(x, y), the row range of the S_roi1(x, y) region is (i1, m1) and its column range is (j1, n1), the row and column range of the S_roi2(x, y) region is (m2, n2), and P is the total number of samplings, i.e. P frames are randomly sampled from the k_frame frames. Then the average over the P samplings is computed for a(k, p, l) and b(k, p, l) by formulas (3) and (4):

a(k) = (1/P) Σ_(p=1)^(P) a(k, p, l)   (3)

b(k) = (1/P) Σ_(p=1)^(P) b(k, p, l)   (4)

where a(k) and b(k) are the average gray values of S_roi1(x, y) and S_roi2(x, y) in the k-th sample sequence, respectively. Finally, the distance |L(k)| from the gray feature point to the origin is obtained by formulas (5) and (6):

L(k) = (a(k), b(k)),  k∈(1, N)   (5)

|L(k)| = √(a(k)² + b(k)²),  k∈(1, N)   (6)

By judging the magnitude of |L(k)|, class A is distinguished from classes B and C.

The illumination analysis algorithm thus reduces the A versus B, C problem to a binary classification: A is separated from B and C by a set threshold on |L(k)|.
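For illustration, a minimal sketch of how this illumination analysis could be implemented is given below, assuming OpenCV and NumPy; the ROI coordinates, the number of sampling rounds and the class-A threshold are example values only and are not fixed by the method.

```python
# Illumination analysis sketch (cf. formulas (1)-(6)): sample frames at random,
# average the gray values of two background ROIs, and threshold the distance of
# the resulting gray feature point from the origin. Parameter values are examples.
import random
import numpy as np
import cv2

def roi_mean_gray(frame_gray, roi):
    """Average gray value of one background ROI; roi = (row0, row1, col0, col1)."""
    r0, r1, c0, c1 = roi
    return float(frame_gray[r0:r1, c0:c1].mean())

def illumination_distance(video_path, roi1, roi2, num_rounds=5, frames_per_round=20):
    """Estimate |L(k)| = sqrt(a(k)^2 + b(k)^2) for one sample sequence."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total == 0:
        raise ValueError("video contains no frames")
    a_vals, b_vals = [], []
    for _ in range(num_rounds):                              # P random samplings
        idxs = random.sample(range(total), min(frames_per_round, total))
        roi1_means, roi2_means = [], []
        for idx in idxs:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if not ok:
                continue
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            roi1_means.append(roi_mean_gray(gray, roi1))     # ROI mean, cf. formula (1)
            roi2_means.append(roi_mean_gray(gray, roi2))     # ROI mean, cf. formula (2)
        if roi1_means:
            a_vals.append(float(np.mean(roi1_means)))        # cf. formula (3)
            b_vals.append(float(np.mean(roi2_means)))        # cf. formula (4)
    cap.release()
    a_k, b_k = float(np.mean(a_vals)), float(np.mean(b_vals))  # gray feature point L(k)
    return float(np.hypot(a_k, b_k))                         # |L(k)|, cf. formula (6)

def classify_A_vs_BC(distance, threshold=120.0):
    """Class A (good illumination) when the gray feature point lies far from the origin."""
    return "A" if distance > threshold else "B_or_C"
```

In such a sketch the threshold separating class A from classes B and C would be calibrated from labelled day-time and night-time sequences.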
Step 2.2: the reclassification is performed using a feature analysis algorithm for B, C.
The characteristic analysis algorithm is
The emphasis of the classification of B and C is whether visual features exist in Sroi(x, y) but is influenced by the change in illumination, imaging effect, and nighttime flying insect interference on the frame, and exists in SroiFeatures in (x, y) do not accurately represent class C. Therefore, analysis of the image data suggests a 'proportion' statistical feature. The 'duty'd (k) is defined as the number of characteristic frames nfeatureAnd total number k of sample frame sequencessampleThe ratio of (a) to (b), namely:
Figure BDA0003011367260000031
wherein a sequence of sample frames knIs selected from kframeThe first n frames. The feature frame is defined as being in a binary image BWSroiIn (x, y), an image of white gradation values exists. By calculating a preselected background image biGenerating a background image b (x, y) by the average gray value of each pixel point in (x, y), namely:
Figure BDA0003011367260000032
the subsequent image processing procedure is as follows:
c=ksample(i)/b(x,y)i∈(1,ksample)(9)
ksample(i)=ksample(i)-c*(x,y)i∈(1,ksample)(10)
BSWroi=Fl(ksample(i)),i∈(1,ksample)(11)
wherein the function Fl(x) Representing the image processing procedure of feature extraction. And c is a brightness ratio, the brightness of the background image is adaptively adjusted according to the image brightness of the current ith frame image, and the image interference in the background subtraction process is reduced. Computing a binary image BWSrThe total number of feature frames in oi (x, y) is:
Figure BDA0003011367260000033
nsample(k)=∑sequencek (13)
sequencek(l) A 'fractional sequence' of samples, called k, nsampleIs the number of "1" in the fractional sequence. Classification is made to B, C by setting a threshold for the fractional feature analysis of B, C categories.
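The proportion statistic of formulas (7) to (13) could, for example, be computed as in the following sketch. Sample frames and background images are assumed to be grayscale NumPy arrays, and the binarization threshold and the B/C decision threshold are illustrative values.

```python
# Feature analysis sketch (cf. formulas (7)-(13)): build a mean background image,
# subtract a brightness-adapted background from each frame ROI, binarize, and take
# the ratio of frames that contain white pixels. Thresholds are example values.
import numpy as np

def build_background(background_frames):
    """Pixel-wise mean of the preselected background images b_i(x, y), cf. formula (8)."""
    return np.mean(np.stack(background_frames).astype(np.float32), axis=0)

def duty_ratio(sample_frames, background, roi, bin_thresh=40.0):
    """d(k) = n_feature / k_sample over the ROI of a sample frame sequence, cf. formula (7)."""
    r0, r1, c0, c1 = roi
    bg_roi = background[r0:r1, c0:c1]
    feature_sequence = []                                    # sequence_k, cf. formulas (12)-(13)
    for frame in sample_frames:
        f_roi = frame[r0:r1, c0:c1].astype(np.float32)
        c = f_roi.mean() / max(float(bg_roi.mean()), 1e-6)   # brightness ratio, cf. formula (9)
        diff = f_roi - c * bg_roi                            # adapted background subtraction, cf. (10)
        bw = diff > bin_thresh                               # binary feature image, cf. formula (11)
        feature_sequence.append(1 if bw.any() else 0)        # does the frame contain white pixels?
    n_feature = sum(feature_sequence)
    return n_feature / max(len(sample_frames), 1)

def classify_B_vs_C(d_k, threshold=0.05):
    """Class B (light features visible) when enough frames contain features, else class C."""
    return "B" if d_k > threshold else "C"
```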
Step 3: extract feature points from the class A, B and C audio-video information respectively.

For the class-A image sequence, under a sufficient illumination environment the gap between two carriages is selected as the visual feature and marked as C_1(x, y); a window J_1(x, y) is selected and the target feature points in the video sequence are extracted. The set relationship between C_1(x, y) and J_1(x, y) generates the data set D_object(x), namely:

D_object(x) = 1 if the feature C_1(x, y) lies within the window J_1(x, y) in frame x, and 0 otherwise   (14)

For the class-B image sequence, the image sequence of head-light imaging in the night period 0:00-1:00 is selected and marked as C_2(x, y); a window J_2(x, y) is designed, the set relationship between C_2(x, y) and J_2(x, y) is judged, and the data set D_object(x) is generated, namely:

D_object(x) = 1 if the feature C_2(x, y) lies within the window J_2(x, y) in frame x, and 0 otherwise   (15)

After D_object(x) is obtained, it is corrected with the gray-time data set G_statistics(x) generated from the gray statistics of each frame, so as to remove false-positive features caused by the locomotive head lights.
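One way to realise the set relationship behind formulas (14) and (15) is sketched below; template matching stands in for the carriage-gap or head-light detector, which the method itself does not prescribe, so the template and matching threshold are assumptions.

```python
# D_object(x) sketch (cf. formulas (14)-(15)): for every frame, test whether the
# selected visual feature appears inside the fixed window J(x, y) and record 0/1.
# The window must be at least as large as the template; parameters are examples.
import cv2

def build_d_object(frames_gray, feature_template, window, match_thresh=0.7):
    """Return D_object as a list of 0/1 values, one per grayscale frame."""
    r0, r1, c0, c1 = window                      # window J(x, y) in image coordinates
    d_object = []
    for frame in frames_gray:
        win = frame[r0:r1, c0:c1]
        score = cv2.matchTemplate(win, feature_template, cv2.TM_CCOEFF_NORMED).max()
        d_object.append(1 if score >= match_thresh else 0)
    return d_object

def count_feature_events(d_object):
    """Count rising edges of D_object; in class A each carriage gap produces one event."""
    return sum(1 for prev, cur in zip([0] + d_object[:-1], d_object) if cur == 1 and prev == 0)
```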
For the class-C image sequence, no visual feature exists in the image frames and the train operation parameters cannot be obtained visually, so train identification, carriage counting and acquisition of the running-time parameters are completed by processing and analysing the audio signal. The noise generated by the wheels striking the rail joints is periodic and is an obvious acoustic feature of the target train. The sound waveform is framed and windowed, the short-time energy is calculated, and a short-time energy-time curve E_short(t) is generated.
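The audio branch can be illustrated with the sketch below, which frames and windows the wheel-rail impact signal, computes the short-time energy curve E_short(t), and counts its peaks; the frame length, hop size and peak-detection parameters are illustrative assumptions.

```python
# Short-time energy sketch for class C: each energy peak corresponds to a wheel set
# striking a rail joint, so counting peaks supports carriage counting and timing.
import numpy as np
from scipy.io import wavfile
from scipy.signal import find_peaks

def short_time_energy(signal, frame_len=1024, hop=512):
    """E_short(t): sum of squared, Hamming-windowed samples per frame."""
    signal = signal.astype(np.float32) / (np.abs(signal).max() + 1e-9)
    window = np.hamming(frame_len)
    energies = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len] * window
        energies.append(float(np.sum(frame ** 2)))
    return np.array(energies)

def count_rail_impacts(wav_path, min_height_ratio=0.3, min_distance_frames=10):
    """Count periodic wheel/rail-joint impacts as peaks of the short-time energy curve."""
    rate, audio = wavfile.read(wav_path)
    if audio.ndim > 1:                          # keep one channel if the recording is stereo
        audio = audio[:, 0]
    energy = short_time_energy(audio)
    if energy.size == 0:
        return 0, energy
    peaks, _ = find_peaks(energy,
                          height=min_height_ratio * float(energy.max()),
                          distance=min_distance_frames)
    return len(peaks), energy
```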
Step 4: detect the train safety parameters, which specifically comprise train positioning, train running speed and train carriage counting.

Train positioning includes time positioning, position positioning and number identification.

Time positioning: when the train is detected entering the monitoring area, the time watermark in the image frame is extracted, and template matching and identification are completed after the target digits are obtained. For the target digit T_k(x, y) and the template digits (0-9) M_0(x, y)-M_9(x, y), the white pixel count h_k(l) is obtained by formulas (16) and (17):

H_l(x, y) = T_k(x, y) ⊕ M_l(x, y)   (16)

h_k(l) = Σ_(x,y) H_l(x, y)   (17)

where H_l(x, y) is the binary-image matching result. A matching similarity criterion function S_k(l) is then proposed to judge the accuracy, as in formulas (18) and (19):

t_k = min(h_k(l)),  l∈(0, 9),  k∈(1, 6)   (18)

S_k(l) = h_k(l) / t_k   (19)

With t_k as the evaluation basis, the ratio S_k(l) of the statistic h_k(l) to the minimum value t_k is defined as the recognition similarity; the template image M_l(x, y) with the greatest recognition similarity represents the matching result of the digit, and the matching results are combined to output the time positioning information.
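A minimal sketch of the exclusive-OR digit matching of formulas (16) to (19) follows; the target digit and the templates are assumed to be binarized 0/1 images of equal size, and the acceptance margin is an illustrative value that the method does not specify.

```python
# XOR digit matching sketch (cf. formulas (16)-(19)): XOR the binarized target digit
# against each template 0-9, count differing (white) pixels, and keep the template
# with the fewest differences when it is clearly better than the runner-up.
import numpy as np

def xor_match_digit(target_bw, templates_bw, margin=1.2):
    """target_bw: HxW 0/1 array; templates_bw: list of ten HxW 0/1 arrays for digits 0-9."""
    h = []                                           # h_k(l) for l = 0..9
    for tpl in templates_bw:
        diff = np.logical_xor(target_bw, tpl)        # H_l(x, y), cf. formula (16)
        h.append(int(diff.sum()))                    # white-pixel count, cf. formula (17)
    t_k = min(h)                                     # cf. formula (18)
    best = int(np.argmin(h))
    runner_up = sorted(h)[1]
    # S_k(l) = h_k(l) / t_k, cf. formula (19): accept only if the runner-up is clearly worse.
    if t_k == 0 or runner_up / max(t_k, 1) >= margin:
        return best
    return None

def read_number(digit_images, templates_bw):
    """Recognize a multi-digit number (e.g. the time watermark) digit by digit."""
    digits = [xor_match_digit(d, templates_bw) for d in digit_images]
    if any(d is None for d in digits):
        return None
    return int("".join(str(d) for d in digits))
```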
Position positioning and number identification: train number identification under class-A and class-B conditions. The region N(x, y) is located through the feature C_1(x, y); N(x, y) is segmented and rotated to form a binary digit image, and the similarity between the target digit and the template is evaluated with the exclusive-OR summation algorithm, the similarity criterion being

t_k = min(h_k(l)),  l∈(0, 9),  k∈(1, 6)   (20)

S_k(l) = h_k(l) / t_k   (21)

The digit information is identified and combined to obtain the train number.
Train running speed: when the locomotive model and the carriage model are known, the length l_object of the target train is also determined, and the frame difference within D_object(x) can be converted into the running time t_object of the target. When the train is detected entering the monitoring area, the running speed v is calculated by formula (22):

v = l_object / t_object   (22)
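Formula (22) simply divides the known train length by the passing time obtained from the frame difference in D_object(x); a small sketch follows, in which the frame rate and train length are example values.

```python
# Speed sketch (cf. formula (22)): v = l_object / t_object, with t_object derived from
# the span of frames in which D_object(x) detects the train.
def train_speed(d_object, frame_rate_hz, train_length_m):
    """Return the running speed in m/s, or None if the train was not detected."""
    detections = [i for i, flag in enumerate(d_object) if flag == 1]
    if len(detections) < 2:
        return None
    frame_span = detections[-1] - detections[0]
    t_object = frame_span / frame_rate_hz        # running time of the target, in seconds
    return train_length_m / t_object

# Example: features spanning 600 frames at 24 fps for a 650 m train give 26 m/s (about 94 km/h).
```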
the train car count includes: counting visual features and counting audio modes;
wherein in classes a and B by visual feature counting, by visual means: a feature count of a selected car in the sequence of images; class C, without visual features, by audio: the train carriages are counted by two modes of counting and analyzing the audio peak value generated when the train impacts the track.
The beneficial effects of adopting this technical method are as follows:

The invention provides an automatic train safety-operation monitoring method based on multi-modal information fusion which has high accuracy and high running speed and does not require manual extraction of train image features. Practical experiments verify that the method achieves high accuracy in carriage-number identification, train entry and exit time detection, real-time running speed and accurate carriage positioning, and guarantees safe train operation in both good and extreme illumination environments. In addition, the invention has the following advantages:

(1) It is not constrained by hardware and computing resources and can be deployed in as many working environments as possible.

(2) Richer railway-safety-related parameter information can be output in real time.

(3) By fusing video and sound information, the system can be used around the clock and is highly robust to illumination changes.
Drawings
FIG. 1 is a flow chart of a method for automatically monitoring safe operation of a train by selecting characteristics under different illumination environments in an embodiment of the invention;
FIG. 2 is a diagram illustrating feature selection under different illumination environments according to an embodiment of the present invention;
wherein, the graph (a) is under the condition of good illumination intensity; the graph (b) shows the case of insufficient light; panel (c) is in the absence of illumination;
FIG. 3 is a drawing for extracting the characteristics of train lights under insufficient illumination in the embodiment of the present invention;
FIG. 4 is a sound waveform characteristic diagram of a single section of a target car in an embodiment of the present invention;
wherein, the diagram (a) is a front view of a single carriage, the diagram (b) is a waveform of the single carriage after filtering, the diagram (c) is a short-time energy curve of the waveform, and the diagram (d) is the envelope analysis and the maximum value of the short-time energy curve;
FIG. 5 is a diagram illustrating a process for extracting time information in the positioning system according to an embodiment of the present invention;
FIG. 6 is a graph of the relationship between the number of carriages and the audio time, measured from the data set D_object(x) in the absence of illumination, in an embodiment of the present invention;
fig. 7 is a flowchart of digital image acquisition of train car numbers in the embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
A multi-modal information fusion method for automatically monitoring safe operation of a train is shown in figure 1 and comprises the following steps:
Step 1: train video images and wheel-rail impact audio are collected by a camera installed at the railway site. The monitoring video parameters in this embodiment are: video size 1920 × 1080, frame rate 24 fps, audio sampling rate 16 kHz. All-weather information acquisition is carried out on the train monitoring section, illumination intensity intervals are set, and the collected information is classified according to three different illumination intensities: a good illumination environment, recorded as A; an insufficient illumination environment, recorded as B; a no-illumination environment, recorded as C.

The good illumination environment means: the illumination intensity falls within the interval for a good illumination environment; the average gray value of the generated image frame sequence is larger than that under night illumination, and universally applicable identification features are available, such as the background ROI between the couplers in FIG. 2(a), which indicates that a new object has entered the image.

The insufficient illumination environment means: the illumination intensity falls within the interval for an insufficient illumination environment; the target object reflects or emits light at night and is identified in the image as the feature, as shown in FIG. 2(b) and FIG. 3.

The no-illumination environment means: the illumination intensity falls within the interval for a no-illumination environment; the target object produces no high-gray-value region in the camera image at night, and no visual feature that can identify the target object exists in the image frame sequence, as shown in FIG. 2(c). In this case the audio information of the same time period is collected and analysed for identification, as shown in FIG. 4.
Step 2: designing an illumination classifier; through an illumination classifier, performing illumination classification on the audio and video image sequence acquired by the camera, marking the audio and video image sequence as an illumination environment classification A, B, C, and designing the illumination classifier by combining an illumination analysis algorithm and a characteristic analysis algorithm;
step 2.1: a is distinguished from B, C in two categories, namely A is classified into one category and B and C are classified into one category through a light analysis algorithm.
The illumination analysis algorithm is as follows:
Among the three classes A, B and C, the background-pixel average gray level of class A is greater than that of classes B and C. For this situation, an adaptive illumination analysis algorithm is proposed. First, two background ROIs S_roi1(x, y) and S_roi2(x, y) are selected. For the k-th sample, the coordinate point s(k) composed of the average gray values of the S_roi1(x, y) and S_roi2(x, y) regions is defined as the gray feature point, and the average gray value data a(k, p, l) and b(k, p, l) representing the sample are obtained by formulas (1) and (2):

a(k, p, l) = (1/(m·n)) Σ_(x,y)∈S_roi1 S_roi1(x, y)   (1)

b(k, p, l) = (1/(m·n)) Σ_(x,y)∈S_roi2 S_roi2(x, y)   (2)

where N is the number of samples, k_frame is the total number of frames of the image sequence of sample k, m and n denote the image size of S_roi1(x, y) and S_roi2(x, y), the row range of the S_roi1(x, y) region is (i1, m1) and its column range is (j1, n1), the row and column range of the S_roi2(x, y) region is (m2, n2), and P is the total number of samplings, i.e. P frames are randomly sampled from the k_frame frames. Then the average over the P samplings is computed for a(k, p, l) and b(k, p, l) by formulas (3) and (4):

a(k) = (1/P) Σ_(p=1)^(P) a(k, p, l)   (3)

b(k) = (1/P) Σ_(p=1)^(P) b(k, p, l)   (4)

where a(k) and b(k) are the average gray values of S_roi1(x, y) and S_roi2(x, y) in the k-th sample sequence, respectively. Finally, the distance |L(k)| from the gray feature point to the origin is obtained by formulas (5) and (6):

L(k) = (a(k), b(k)),  k∈(1, N)   (5)

|L(k)| = √(a(k)² + b(k)²),  k∈(1, N)   (6)

By judging the magnitude of |L(k)|, class A is distinguished from classes B and C.

The illumination analysis algorithm thus reduces the A versus B, C problem to a binary classification: A is separated from B and C by a set threshold on |L(k)|.
Step 2.2: the reclassification is performed using a feature analysis algorithm for B, C.
The characteristic analysis algorithm is
The emphasis of the classification of B and C is whether visual features exist in Sroi(x, y) but is influenced by the change in illumination, imaging effect, and nighttime flying insect interference on the frame, and exists in SroiFeatures in (x, y) do not accurately represent class C. Therefore, analysis of the image data suggests a 'proportion' statistical feature. The 'duty'd (k) is defined as the number of characteristic frames nfeatureAnd total number k of sample frame sequencessampleThe ratio of (a) to (b), namely:
Figure BDA0003011367260000076
wherein a sequence of sample frames knIs selected from kframeThe first n frames. The feature frame is defined as being in twoValued image BWSroiIn (x, y), an image of white gradation values exists. By calculating a preselected background image biGenerating a background image b (x, y) by the average gray value of each pixel point in (x, y), namely:
Figure BDA0003011367260000077
the subsequent image processing procedure is as follows:
c=ksample(i)/b(x,y)i∈(1,ksample)(9)
ksample(i)=ksample(i)-c*(x,y)i∈(1,ksample)(10)
BSWroi=Fl(ksample(i)),i∈(1,ksample)(11)
wherein the function Fl(x) Representing the image processing procedure of feature extraction. And c is a brightness ratio, the brightness of the background image is adaptively adjusted according to the image brightness of the current ith frame image, and the image interference in the background subtraction process is reduced. Computing a binary image BWSroiThe total number of feature frames in (x, y) is:
Figure BDA0003011367260000081
nsample(k)=∑sequencek (13)
sequencek(l) A 'fractional sequence' of samples, called k, nsampleIs the number of "1" in the fractional sequence. Classification is made to B, C by setting a threshold for the fractional feature analysis of B, C categories.
Step 3: extract feature points from the class A, B and C audio-video information respectively.

For the class-A image sequence, under a sufficient illumination environment the gap between two carriages is selected as the visual feature, as shown in FIG. 2(a), and marked as C_1(x, y); a window J_1(x, y) is selected and the target feature points in the video sequence are extracted. The set relationship between C_1(x, y) and J_1(x, y) generates the data set D_object(x), namely:

D_object(x) = 1 if the feature C_1(x, y) lies within the window J_1(x, y) in frame x, and 0 otherwise   (14)

For the class-B image sequence, the image sequence of head-light imaging in the night period 0:00-1:00 is selected, as shown in FIG. 3, and marked as C_2(x, y); a window J_2(x, y) is designed, the set relationship between C_2(x, y) and J_2(x, y) is judged, and the data set D_object(x) is generated, namely:

D_object(x) = 1 if the feature C_2(x, y) lies within the window J_2(x, y) in frame x, and 0 otherwise   (15)

After D_object(x) is obtained, it is corrected with the gray-time data set G_statistics(x) generated from the gray statistics of each frame, so as to remove false-positive features caused by the locomotive head lights.

For the class-C image sequence, no visual feature exists in the image frames and the train operation parameters cannot be obtained visually, so train identification, carriage counting and acquisition of the running-time parameters are completed by processing and analysing the audio signal. The noise generated by the wheels striking the rail joints is periodic and is an obvious acoustic feature of the target train. The sound waveform is framed and windowed, the short-time energy is calculated, and a short-time energy-time curve E_short(t) is generated; as shown in FIG. 6, the short-time energy peak points are selected as the identification feature for class C.
Step 4: detect the train safety parameters, which specifically comprise train positioning, train running speed and train carriage counting.

Train positioning includes time positioning, position positioning and number identification.

Time positioning: when the train is detected entering the monitoring area, the time watermark in the image frame is extracted, and template matching and identification are completed after the target digits are obtained, as shown in FIG. 5. For the target digit T_k(x, y) and the template digits (0-9) M_0(x, y)-M_9(x, y), the white pixel count h_k(l) is obtained by formulas (16) and (17):

H_l(x, y) = T_k(x, y) ⊕ M_l(x, y)   (16)

h_k(l) = Σ_(x,y) H_l(x, y)   (17)

where H_l(x, y) is the binary-image matching result. A matching similarity criterion function S_k(l) is then proposed to judge the accuracy, as in formulas (18) and (19):

t_k = min(h_k(l)),  l∈(0, 9),  k∈(1, 6)   (18)

S_k(l) = h_k(l) / t_k   (19)

With t_k as the evaluation basis, the ratio S_k(l) of the statistic h_k(l) to the minimum value t_k is defined as the recognition similarity; the template image M_l(x, y) with the greatest recognition similarity represents the matching result of the digit, and the matching results are combined to output the time positioning information.
Position positioning and number identification: train number identification under class-A and class-B conditions. The train number acquisition process is shown in FIG. 7. The region N(x, y) is located through the feature C_1(x, y); N(x, y) is segmented and rotated to form a binary digit image, and the similarity between the target digit and the template is evaluated with the exclusive-OR summation algorithm, the similarity criterion being

t_k = min(h_k(l)),  l∈(0, 9),  k∈(1, 6)   (20)

S_k(l) = h_k(l) / t_k   (21)

The digit information is identified and combined to obtain the train number.
Train running speed: when the locomotive model and the carriage model are known, the length l_object of the target train is also determined, and the frame difference within D_object(x) can be converted into the running time t_object of the target. When the train is detected entering the monitoring area, the running speed v is calculated by formula (22):

v = l_object / t_object   (22)
Train carriage counting includes counting by visual features and counting by the audio modality.

In classes A and B the counting is visual: the selected carriage feature is counted in the image sequence. In class C, where no visual feature exists, the counting is acoustic: the carriages are counted by detecting and analysing the audio peaks generated when the train wheels strike the track.

Claims (3)

1. An automatic train safety-operation monitoring method based on multi-modal information fusion, characterised by comprising the following steps:

Step 1: train video images and wheel-rail impact audio are collected all-weather in the train monitoring section by a camera installed at the railway site; illumination intensity intervals are set, and the collected information is classified according to three different illumination intensities: a good illumination environment, recorded as A; an insufficient illumination environment, recorded as B; a no-illumination environment, recorded as C;

the good illumination environment means: the illumination intensity falls within the interval for a good illumination environment; the average gray value of the generated image frame sequence is larger than that under night illumination, and universally applicable identification features are available;

the insufficient illumination environment means: the illumination intensity falls within the interval for an insufficient illumination environment; the target object reflects or emits light at night and is identified in the image as the feature;

the no-illumination environment means: the illumination intensity falls within the interval for a no-illumination environment; the target object produces no high-gray-value region in the camera image at night, and no visual feature that can identify the target object exists in the image frame sequence; in this case the audio information of the same time period is collected, analysed and used for identification;
Step 2: design an illumination classifier; the audio-video image sequences acquired by the camera are classified by the illumination classifier and labelled with the illumination environment class A, B or C; the illumination classifier is designed by combining an illumination analysis algorithm and a feature analysis algorithm;
and step 3: respectively extracting characteristic points of A, B, C-class audio and video information;
wherein, for the class-A image sequence, under a sufficient illumination environment the gap between two carriages is selected as the visual feature and marked as C_1(x, y); a window J_1(x, y) is selected and the target feature points in the video sequence are extracted; the set relationship between C_1(x, y) and J_1(x, y) generates the data set D_object(x), namely:

D_object(x) = 1 if the feature C_1(x, y) lies within the window J_1(x, y) in frame x, and 0 otherwise   (14)

for the class-B image sequence, the image sequence of head-light imaging in the night period 0:00-1:00 is selected and marked as C_2(x, y); a window J_2(x, y) is designed, the set relationship between C_2(x, y) and J_2(x, y) is judged, and the data set D_object(x) is generated, namely:

D_object(x) = 1 if the feature C_2(x, y) lies within the window J_2(x, y) in frame x, and 0 otherwise   (15)

after D_object(x) is obtained, it is corrected with the gray-time data set G_statistics(x) generated from the gray statistics of each frame, so as to remove false-positive features caused by the locomotive head lights;

for the class-C image sequence, no visual feature exists in the image frames and the train operation parameters cannot be obtained visually, so train identification, carriage counting and acquisition of the running-time parameters are completed by processing and analysing the audio signal; the noise generated by the wheels striking the rail joints is periodic and is an obvious acoustic feature of the target train; the sound waveform is framed and windowed, the short-time energy is calculated, and a short-time energy-time curve E_short(t) is generated;
Step 4: detect the train safety parameters, which specifically comprise train positioning, train running speed and train carriage counting.
2. The method for automatically monitoring safe train operation based on multi-modal information fusion according to claim 1, wherein step 2 specifically comprises:
Step 2.1: distinguish A from B and C, i.e. A forms one class and B and C together form the other class, by means of the illumination analysis algorithm;

the illumination analysis algorithm is as follows: among the three classes A, B and C, the background-pixel average gray level of class A is greater than that of classes B and C; for this situation, an adaptive illumination analysis algorithm is proposed; first, two background ROIs S_roi1(x, y) and S_roi2(x, y) are selected; for the k-th sample, the coordinate point s(k) composed of the average gray values of the S_roi1(x, y) and S_roi2(x, y) regions is defined as the gray feature point, and the average gray value data a(k, p, l) and b(k, p, l) representing the sample are obtained by formulas (1) and (2);

a(k, p, l) = (1/(m·n)) Σ_(x,y)∈S_roi1 S_roi1(x, y)   (1)

b(k, p, l) = (1/(m·n)) Σ_(x,y)∈S_roi2 S_roi2(x, y)   (2)

where N is the number of samples, k_frame is the total number of frames of the image sequence of sample k, m and n denote the image size of S_roi1(x, y) and S_roi2(x, y), the row range of the S_roi1(x, y) region is (i1, m1) and its column range is (j1, n1), the row and column range of the S_roi2(x, y) region is (m2, n2), and P is the total number of samplings, i.e. P frames are randomly sampled from the k_frame frames; then the average over the P samplings is computed for a(k, p, l) and b(k, p, l) by formulas (3) and (4):

a(k) = (1/P) Σ_(p=1)^(P) a(k, p, l)   (3)

b(k) = (1/P) Σ_(p=1)^(P) b(k, p, l)   (4)

where a(k) and b(k) are the average gray values of S_roi1(x, y) and S_roi2(x, y) in the k-th sample sequence, respectively; finally, the distance |L(k)| from the gray feature point to the origin is obtained by formulas (5) and (6):

L(k) = (a(k), b(k)),  k∈(1, N)   (5)

|L(k)| = √(a(k)² + b(k)²),  k∈(1, N)   (6)

class A is distinguished from classes B and C by judging the magnitude of |L(k)|;

the illumination analysis algorithm thus reduces the A versus B, C problem to a binary classification, and A is separated from B and C by a set threshold;
Step 2.2: B and C are classified again using the feature analysis algorithm;

the feature analysis algorithm is as follows:

the key to separating B from C is whether visual features exist in S_roi(x, y); however, because of illumination changes, imaging effects and night-time flying-insect interference on individual frames, a feature appearing in S_roi(x, y) does not by itself accurately indicate that a sample is not of class C; therefore, from analysis of the image data, a 'proportion' statistical feature is proposed; the proportion d(k) is defined as the ratio of the number of feature frames n_feature to the total number of frames k_sample of the sample frame sequence, namely:

d(k) = n_feature / k_sample   (7)

where the sample frame sequence k_n is selected as the first n frames of k_frame; a feature frame is defined as a frame whose binary image BW_Sroi(x, y) contains white gray values; a background image b(x, y) is generated by computing, at each pixel, the average gray value of the preselected background images b_i(x, y), namely:

b(x, y) = (1/N_b) Σ_(i=1)^(N_b) b_i(x, y)   (8)

where N_b denotes the number of preselected background images; the subsequent image processing is as follows:

c = k_sample(i) / b(x, y),  i∈(1, k_sample)   (9)

k_sample(i) = k_sample(i) - c·b(x, y),  i∈(1, k_sample)   (10)

BW_Sroi = F_l(k_sample(i)),  i∈(1, k_sample)   (11)

where the function F_l(x) represents the image processing procedure of feature extraction; c is a brightness ratio that adaptively adjusts the brightness of the background image according to the brightness of the current i-th frame, reducing image interference during background subtraction; the total number of feature frames in the binary images BW_Sroi(x, y) is computed as:

sequence_k(i) = 1 if BW_Sroi(x, y) of frame i contains white pixels, and 0 otherwise,  i∈(1, k_sample)   (12)

n_sample(k) = Σ sequence_k   (13)

where sequence_k is called the 'proportion sequence' of sample k and n_sample is the number of '1' entries in the proportion sequence; B and C are separated by setting a threshold on this proportion feature.
3. The method for automatically monitoring safe train operation based on multi-modal information fusion according to claim 1, wherein the train positioning in step 4 comprises: time positioning, position positioning and number identification;

the time positioning is: when the train is detected entering the monitoring area, the time watermark in the image frame is extracted, and template matching and identification are completed after the target digits are obtained; for the target digit T_k(x, y) and the template digits (0-9) M_0(x, y)-M_9(x, y), the white pixel count h_k(l) is obtained by formulas (16) and (17),

H_l(x, y) = T_k(x, y) ⊕ M_l(x, y)   (16)

h_k(l) = Σ_(x,y) H_l(x, y)   (17)

where H_l(x, y) is the binary-image matching result; a matching similarity criterion function S_k(l) is then proposed to judge the accuracy, as in formulas (18) and (19):

t_k = min(h_k(l)),  l∈(0, 9),  k∈(1, 6)   (18)

S_k(l) = h_k(l) / t_k   (19)

with t_k as the evaluation basis, the ratio S_k(l) of the statistic h_k(l) to the minimum value t_k is defined as the recognition similarity; the template image M_l(x, y) with the greatest recognition similarity represents the matching result of the digit, and the matching results are combined to output the time positioning information;
the position positioning and number identification is: train number identification under class-A and class-B conditions; the region N(x, y) is located through the feature C_1(x, y); N(x, y) is segmented and rotated to form a binary digit image, and the similarity between the target digit and the template is evaluated with the exclusive-OR summation algorithm, the similarity criterion being

t_k = min(h_k(l)),  l∈(0, 9),  k∈(1, 6)   (20)

S_k(l) = h_k(l) / t_k   (21)

the digit information is identified and combined to obtain the train number;

the train running speed is: when the locomotive model and the carriage model are known, the length l_object of the target train is also determined, and the frame difference within D_object(x) can be converted into the running time t_object of the target; when the train is detected entering the monitoring area, the running speed v is calculated by formula (22):

v = l_object / t_object   (22)

the train carriage counting includes: counting by visual features and counting by the audio modality;

in classes A and B the counting is visual, i.e. the selected carriage feature is counted in the image sequence; in class C, where no visual feature exists, the counting is acoustic, i.e. the carriages are counted by detecting and analysing the audio peaks generated when the train wheels strike the track.
CN202110376832.3A 2021-04-08 2021-04-08 Automatic monitoring train safety operation method based on multi-mode information fusion Active CN113158854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110376832.3A CN113158854B (en) 2021-04-08 2021-04-08 Automatic monitoring train safety operation method based on multi-mode information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110376832.3A CN113158854B (en) 2021-04-08 2021-04-08 Automatic monitoring train safety operation method based on multi-mode information fusion

Publications (2)

Publication Number Publication Date
CN113158854A true CN113158854A (en) 2021-07-23
CN113158854B CN113158854B (en) 2022-03-22

Family

ID=76889170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110376832.3A Active CN113158854B (en) 2021-04-08 2021-04-08 Automatic monitoring train safety operation method based on multi-mode information fusion

Country Status (1)

Country Link
CN (1) CN113158854B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839085A (en) * 2014-03-14 2014-06-04 中国科学院自动化研究所 Train carriage abnormal crowd density detection method
CN105868694A (en) * 2016-03-24 2016-08-17 中国地质大学(武汉) Dual-mode emotion identification method and system based on facial expression and eyeball movement
CN109814718A (en) * 2019-01-30 2019-05-28 天津大学 A kind of multi-modal information acquisition system based on Kinect V2
CN111881884A (en) * 2020-08-11 2020-11-03 中国科学院自动化研究所 Cross-modal transformation assistance-based face anti-counterfeiting detection method, system and device
CN112230772A (en) * 2020-10-14 2021-01-15 华中师范大学 Virtual-actual fused teaching aid automatic generation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
刘强等: "Data-driven multi-modal operation monitoring and fault diagnosis of high-speed train bearings", Science China *
奠雨洁等: "Audio-visual correlated multi-modal concept detection", Journal of Computer Research and Development *
袁理等: "Multi-modal face recognition based on a small number of feature points", Computer Engineering and Applications *

Also Published As

Publication number Publication date
CN113158854B (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN108596129B (en) Vehicle line-crossing detection method based on intelligent video analysis technology
CN104751634B (en) The integrated application method of freeway tunnel driving image acquisition information
CN109948582B (en) Intelligent vehicle reverse running detection method based on tracking trajectory analysis
CN102968625B (en) Ship distinguishing and tracking method based on trail
CN101510356B (en) Video detection system and data processing device thereof, video detection method
Sina et al. Vehicle counting and speed measurement using headlight detection
CN106446926A (en) Transformer station worker helmet wear detection method based on video analysis
KR20030080285A (en) Apparatus and method for queue length of vehicle to measure
CN109896386B (en) Method and system for detecting repeated opening and closing of elevator door based on computer vision technology
CN111126171A (en) Vehicle reverse running detection method and system
CN113160575A (en) Traffic violation detection method and system for non-motor vehicles and drivers
CN109145708A (en) A kind of people flow rate statistical method based on the fusion of RGB and D information
Chen et al. Indoor and outdoor people detection and shadow suppression by exploiting HSV color information
CN112528861A (en) Foreign matter detection method and device applied to track bed in railway tunnel
CN101996307A (en) Intelligent video human body identification method
CN113657305B (en) Video-based intelligent detection method for black smoke vehicle and ringeman blackness level
CN113158854B (en) Automatic monitoring train safety operation method based on multi-mode information fusion
CN112800974A (en) Subway rail obstacle detection system and method based on machine vision
Kumar et al. A Novel Approach for Speed Estimation along with Vehicle Detection Counting
CN202854837U (en) Vehicle lane mark line detection device
KR20220075999A (en) Pothole detection device and method based on deep learning
CN113487878A (en) Motor vehicle illegal line pressing running detection method and system
CN114005097A (en) Train operation environment real-time detection method and system based on image semantic segmentation
CN116342966B (en) Rail inspection method, device, equipment and medium based on deep learning
CN116485799B (en) Method and system for detecting foreign matter coverage of railway track

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant