CN110168625A

CN110168625A - Method and device for recognizing emergency vehicles

Info

Publication number: CN110168625A
Application number: CN201780082732.1A
Authority: CN
Inventors: 宋风龙; 刘浏; 汪涛
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2017-05-03
Filing date: 2017-05-03
Publication date: 2019-08-23
Also published as: WO2018201349A1

Abstract

A method and device for recognizing emergency vehicles. The method comprises: obtaining a sound signal and a video image of the environment in which an emergency vehicle is located; analyzing the sound signal to obtain an auditory code, and analyzing the video image to obtain a visual code; and determining the type and state of the emergency vehicle according to the auditory code and the visual code. Using the auditory code, the visual code, or both in combination, the method looks up and matches the type and active state of the emergency vehicle to be recognized in the standard correspondence information that relates the emergency vehicle types and states defined by existing standards to siren sound frequency, tone-change period, warning lamp color, and warning lamp flash pattern. Emergency vehicles can thereby be recognized accurately in a variety of scenarios.

Description

Method and device for recognizing emergency vehicles

Technical field

The present invention relates to the field of autonomous driving, and in particular to a method and device for recognizing emergency vehicles.

Background

With the development of the Internet, autonomous driving has become one of the application scenarios of cognitive computing. At present, the mainstream vehicle manufacturers, whose strength is automotive engineering, are developing hardware for autonomous driving, while the mainstream Internet companies, whose strength is machine learning and cognitive computing, are developing learning and control technologies for autonomous driving.

Autonomous driving technology as a whole is developing well, but fully autonomous driving, even in familiar scenarios, has not yet achieved a failure rate below 1% of total mileage. One of the main causes of autonomous driving failure is recognition error. Recognition is an important part of autonomous vehicle technology. It generally collects data about the vehicle's surroundings through cameras, radar, and other sensors, interprets and assesses the surroundings using rule-based or learning-based methods, and then predicts the actions of surrounding objects.

At present, recognition of emergency vehicles (such as police cars, ambulances, fire trucks, and engineering rescue vehicles that are carrying out a task) relies mainly on visual data processing: the type and active state of an emergency vehicle are judged by whether its warning lamp is switched on. In many cases, however, an emergency vehicle with its lamp flashing is not necessarily active, that is, not necessarily carrying out a task, and in such cases the emergency vehicle is recognized incorrectly.

Summary of the invention

The embodiments of the present invention provide a method and device for recognizing emergency vehicles, so that emergency vehicles can be recognized accurately.

In one aspect, an embodiment of the present invention provides a method for recognizing emergency vehicles, the method comprising the following steps:

Obtaining a sound signal and a video image of the environment in which an emergency vehicle is located;

Analyzing the sound signal to obtain an auditory category code of the sound signal, and analyzing the video image to obtain a visual category code of the video image;

Determining the type and state of the emergency vehicle according to the auditory category code of the sound signal and/or the visual category code of the video image.

In a possible implementation, analyzing the sound signal to obtain the auditory category code of the sound signal specifically comprises:

Dividing the sound signal into frames;

Extracting acoustic features from the framed sound signal to obtain a sound sequence of the sound signal;

Performing detection and recognition on the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
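The source does not say which acoustic features are extracted. As a minimal sketch under that caveat, the Python below turns a list of frames into a sound sequence of per-frame frequency estimates obtained from zero crossings. The function names `zero_crossing_frequency` and `sound_sequence`, and the choice of zero-crossing counting as the feature, are assumptions made only for illustration.

```python
import math


def zero_crossing_frequency(frame, sample_rate):
    """Estimate the dominant frequency (Hz) of one frame from its sign changes.

    A pure tone of frequency f crosses zero about 2*f times per second, so the
    crossing count divided by twice the frame duration approximates f.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0.0) != (b < 0.0)
    )
    duration_s = (len(frame) - 1) / sample_rate
    return crossings / (2.0 * duration_s)


def sound_sequence(frames, sample_rate):
    """Map each frame to one feature value, giving a sequence over time."""
    return [zero_crossing_frequency(frame, sample_rate) for frame in frames]


# Hypothetical usage: one 25 ms frame of a 1 kHz tone sampled at 16 kHz.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / 16000.0 + 0.1) for n in range(400)]
estimate = zero_crossing_frequency(tone, 16000)
```

Zero-crossing counting is only reliable for signals dominated by a single tone; a real siren detector would more likely use spectral features, which the source leaves unspecified.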

In a possible implementation, performing detection and recognition on the sound sequence of the sound signal to obtain the auditory category code of the sound signal specifically comprises:

Constructing a sound recognition model;

Training the sound recognition model to obtain type identification parameters of the sound recognition model;

Performing, by the sound recognition model and according to the type identification parameters, detection and recognition on the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
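The source does not name the kind of sound recognition model. Purely to illustrate the construct, train, recognize sequence above, the following sketch substitutes a nearest-centroid classifier: the learned centroids play the role of the type identification parameters. The function names, the two-element feature vectors, and the codes `A01` and `A02` are hypothetical.

```python
def train_centroids(samples):
    """Average the feature vectors of each auditory category code.

    samples: list of (feature_vector, auditory_code) pairs.
    Returns {auditory_code: centroid}, standing in for the trained parameters.
    """
    sums = {}
    counts = {}
    for features, code in samples:
        acc = sums.setdefault(code, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[code] = counts.get(code, 0) + 1
    return {code: [v / counts[code] for v in acc] for code, acc in sums.items()}


def classify(params, features):
    """Return the auditory category code whose centroid is closest."""
    def squared_distance(centroid):
        return sum((c - f) ** 2 for c, f in zip(centroid, features))

    return min(params, key=lambda code: squared_distance(params[code]))


# Hypothetical training data: (mean frequency in Hz, tone-change period in s).
params = train_centroids([
    ([700.0, 0.4], "A01"),
    ([720.0, 0.6], "A01"),
    ([900.0, 3.0], "A02"),
])
code = classify(params, [705.0, 0.5])
```

In practice the features would need scaling so that frequency does not dominate the distance, and a sequence model may suit a sound sequence better; both choices are outside what the source specifies.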

In a possible implementation, training the sound recognition model specifically comprises:

Training the sound recognition model with the sound signals in a first classification information dictionary; the first classification information dictionary includes at least correspondence information between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles, and correspondence information between the sounding of emergency vehicle sirens and the states of emergency vehicles.

In a possible implementation, before the sound recognition model is trained with the sound signals in the first classification information dictionary, the method further comprises:

Constructing the first classification information dictionary according to the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles and the correspondence between the sounding of emergency vehicle sirens and the states of emergency vehicles, and storing the first classification information dictionary; the sound characteristics of an emergency vehicle siren include the audio frequency range and the tone-change period.
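To compare a heard siren with dictionary entries of this kind, its tone-change period has to be measured. The source does not describe how; one possible approach, sketched here as an assumption, is to autocorrelate a per-frame frequency track and take the lag of the strongest repeat. The name `estimate_period_s`, the parameter `hop_s` (time between successive frames), and the synthetic track are hypothetical.

```python
import math


def estimate_period_s(freq_track, hop_s):
    """Estimate the repetition period (s) of a per-frame frequency track.

    Uses the unnormalised autocorrelation of the mean-removed track and
    returns the lag of its highest value after the initial lobe at lag 0.
    Returns None when no periodicity can be found.
    """
    n = len(freq_track)
    if n < 4:
        return None
    mean = sum(freq_track) / n
    centered = [f - mean for f in freq_track]
    if all(c == 0.0 for c in centered):
        return None  # a constant tone has no tone-change period

    def autocorr(lag):
        return sum(centered[i] * centered[i + lag] for i in range(n - lag))

    max_lag = n // 2
    # Skip the lobe around lag 0: advance until the correlation stops being positive.
    lag = 1
    while lag <= max_lag and autocorr(lag) > 0.0:
        lag += 1
    if lag > max_lag:
        return None
    best_lag = max(range(lag, max_lag + 1), key=autocorr)
    return best_lag * hop_s


# Hypothetical track: frequency swinging around 800 Hz every 0.5 s, one value per 10 ms.
track = [800.0 + 200.0 * math.sin(2.0 * math.pi * i * 0.01 / 0.5) for i in range(200)]
period = estimate_period_s(track, 0.01)
```

The resolution of the estimate is one hop, and the track must span at least two full periods for the peak to fall inside the searched lags.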

In a possible implementation, analyzing the video image to obtain the visual category code of the video image specifically comprises:

Detecting the video image to obtain the video frames that contain the emergency vehicle;

Performing semantic image segmentation on the video frames to obtain the warning lamp region of the emergency vehicle;

Performing video semantic analysis on the warning lamp region of the emergency vehicle to obtain the color and flash pattern of the warning lamp;

Generating the visual category code of the video image according to the color and flash pattern of the warning lamp.
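The last step above, turning a color and a flash pattern into a code, can be sketched as a small mapping. The source does not define the code format; the `V-<colors>-<pattern>` layout, the tables `COLOR_CODES` and `PATTERN_CODES`, and the function `make_visual_code` are all hypothetical.

```python
# Hypothetical code tables; the source does not define a concrete code format.
COLOR_CODES = {"red": "R", "blue": "B", "yellow": "Y"}
PATTERN_CODES = {"steady": "0", "slow_flash": "1", "fast_flash": "2", "alternating": "3"}


def make_visual_code(colors, pattern):
    """Build a visual category code from lamp colors and a flash pattern.

    Colors are sorted so that the same set of colors always yields the same
    code, regardless of the order in which they were detected.
    """
    color_part = "".join(sorted(COLOR_CODES[c] for c in colors))
    return "V-" + color_part + "-" + PATTERN_CODES[pattern]


code = make_visual_code(["red", "blue"], "alternating")
```

Unknown colors or patterns raise `KeyError` in this sketch, which makes an unrecognized lamp explicit rather than silently producing a code that matches nothing.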

In a possible implementation, before the video image is analyzed to obtain the visual category code of the video image, the method further comprises:

Constructing a second classification information dictionary according to the correspondence between the light-emitting characteristics of emergency vehicle warning lamps and the types of emergency vehicles and the correspondence between the flash patterns of warning lamps and the states of emergency vehicles, and storing the second classification information dictionary; the light-emitting characteristics of a warning lamp include the color of the warning lamp and the flash pattern of the warning lamp.

In a possible implementation, determining the type and state of the emergency vehicle according to the auditory category code of the sound signal and/or the visual category code of the video image specifically comprises:

Generating an access index according to the auditory category code and/or the visual category code;

Matching the type and state of the emergency vehicle from the first classification information dictionary and/or the second classification information dictionary according to the access index.

In a possible implementation,

When the access index includes the auditory category code, the type and state of the emergency vehicle corresponding to the auditory category code are looked up and matched in the first classification information dictionary according to the access index;

When the access index includes the visual category code, the type and state of the emergency vehicle corresponding to the visual category code are looked up and matched in the second classification information dictionary according to the access index;

When the access index includes both the auditory category code and the visual category code, the type and state of the emergency vehicle that jointly correspond to the auditory category code and the visual category code are looked up and matched in the first classification information dictionary and the second classification information dictionary according to the access index.

In another aspect, an embodiment of the present invention provides a device for recognizing emergency vehicles, used to execute the emergency vehicle recognition method provided in the embodiments of the present invention. The device includes:

A sound sensor, used to obtain a sound signal of the environment in which an emergency vehicle is located;

An optical sensor, used to obtain a video image of the environment in which the emergency vehicle is located;

A processor, used to analyze the sound signal to obtain an auditory category code of the sound signal, to analyze the video image to obtain a visual category code of the video image, and to determine the type and state of the emergency vehicle according to the auditory category code and/or the visual category code;

A memory, used to store the type and state information of emergency vehicles.

In a possible implementation, the processor is specifically configured to:

Divide the sound signal into frames;

In a possible implementation, the processor is further specifically configured to:

Construct a sound recognition model;

Perform, by the sound recognition model and according to the type identification parameters, detection and recognition on the sound sequence of the sound signal to obtain the auditory category code of the sound signal.

In a possible implementation, the processor reads the sound signals in the first classification information dictionary to train the sound recognition model; the first classification information dictionary includes at least correspondence information between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles, and correspondence information between the sounding of emergency vehicle sirens and the states of emergency vehicles.

In a possible implementation, before the sound recognition model is trained, the processor constructs the first classification information dictionary according to the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles and the correspondence between the sounding of emergency vehicle sirens and the states of emergency vehicles, and writes the first classification information dictionary into the memory; the sound characteristics of an emergency vehicle siren include the audio frequency range and the tone-change period.

In a possible implementation, the processor is further specifically configured to:

In a possible implementation, before the video image is analyzed, the processor constructs the second classification information dictionary according to the correspondence between the light-emitting characteristics of emergency vehicle warning lamps and the types of emergency vehicles and the correspondence between the flash patterns of warning lamps and the states of emergency vehicles, and writes the second classification information dictionary into the memory; the light-emitting characteristics of a warning lamp include the color of the warning lamp and the flash pattern of the warning lamp.

In a possible implementation, the processor is further specifically configured to:

In another aspect, an embodiment of the present invention further provides a device for recognizing emergency vehicles, used to execute the emergency vehicle recognition method provided in the embodiments of the present invention. The device includes:

A data receiving unit, used to obtain a sound signal and a video image of the environment in which an emergency vehicle is located;

A multi-modal perception unit, used to analyze the sound signal to obtain an auditory category code of the sound signal, and to analyze the video image to obtain a visual category code of the video image;

A matching unit, used to determine the type and state of the emergency vehicle according to the auditory category code of the sound signal and/or the visual category code of the video image.

In a possible implementation, the multi-modal perception unit includes an auditory processing module; the auditory processing module is specifically configured to:

Divide the sound signal into frames;

In a possible implementation, the device further includes:

In a possible implementation, the auditory processing module is further specifically configured to:

Construct a sound recognition model;

In a possible implementation, the auditory processing module reads the sound signals in the first classification information dictionary to train the sound recognition model; the first classification information dictionary includes at least correspondence information between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles, and correspondence information between the sounding of emergency vehicle sirens and the states of emergency vehicles.

In a possible implementation, before the sound recognition model is trained, the auditory processing module constructs the first classification information dictionary according to the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles and the correspondence between the sounding of emergency vehicle sirens and the states of emergency vehicles, and stores the first classification information dictionary; the sound characteristics of an emergency vehicle siren include the audio frequency range and the tone-change period.

In a possible implementation, the multi-modal perception unit further includes a vision processing module; the vision processing module is configured to:

In a possible implementation, before the video image is analyzed, the vision processing module constructs the second classification information dictionary according to the correspondence between the light-emitting characteristics of emergency vehicle warning lamps and the types of emergency vehicles and the correspondence between the flash patterns of warning lamps and the states of emergency vehicles, and stores the second classification information dictionary; the light-emitting characteristics of a warning lamp include the color of the warning lamp and the flash pattern of the warning lamp.

In a possible implementation, the multi-modal perception unit further includes an access index generation module;

The access index generation module is used to generate an access index according to the auditory category code and/or the visual category code;

The matching unit matches the type and state of the emergency vehicle from the first classification information dictionary and/or the second classification information dictionary according to the access index.

In a possible implementation,

When the access index includes the auditory category code, the matching unit looks up and matches, in the first classification information dictionary and according to the access index, the type and state of the emergency vehicle corresponding to the auditory category code;

When the access index includes the visual category code, the matching unit looks up and matches, in the second classification information dictionary and according to the access index, the type and state of the emergency vehicle corresponding to the visual category code;

When the access index includes both the auditory category code and the visual category code, the matching unit looks up and matches, in the first classification information dictionary and the second classification information dictionary and according to the access index, the type and state of the emergency vehicle that jointly correspond to the auditory category code and the visual category code.

In yet another aspect, an embodiment of the present invention further provides a vehicle that includes any of the emergency vehicle recognition devices provided in the embodiments of the present invention, the recognition device being used to recognize the type and state of emergency vehicles.

The method for recognizing emergency vehicles provided in the embodiments of the present invention obtains multi-modal information, namely the sound signal and the video image of the environment in which the emergency vehicle is located, and analyzes the sound signal and the video image separately to obtain the auditory category code of the sound signal and the visual category code of the video image. Using the auditory category code, the visual category code, or both in combination, the method looks up and matches the type and active state of the emergency vehicle to be recognized in the standard correspondence information that relates the emergency vehicle types and states defined by existing standards to siren sound frequency, tone-change period, warning lamp color, and warning lamp flash pattern. Emergency vehicles can thereby be recognized accurately in the following scenarios:

When different emergency vehicles have different sirens, the emergency vehicle can be detected and recognized with the auditory category code alone;

When different emergency vehicles have different warning lamp colors and flash patterns, the emergency vehicle can be detected and recognized with the visual category code alone;

When different emergency vehicles may have the same siren, or different emergency vehicles may have the same warning lamp color and flash pattern, the auditory category code and the visual category code can be combined to detect and recognize the emergency vehicle, which improves recognition accuracy.

Brief description of the drawings

Fig. 1 is a schematic structural diagram of an emergency vehicle recognition device provided in an embodiment of the present invention;

Fig. 2 is a flow chart of an emergency vehicle recognition method provided in an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of another emergency vehicle recognition device provided in an embodiment of the present invention;

Fig. 4 is a schematic diagram of dividing a sound signal into frames in an embodiment of the present invention;

Fig. 5 shows the observable sound sequence obtained in an embodiment of the present invention by extracting acoustic features from the framed sound signal shown in Fig. 4;

Fig. 6 is a schematic diagram of how the sound recognition model provided in an embodiment of the present invention recognizes the sound sequence shown in Fig. 5;

Fig. 7 is a schematic diagram of performing semantic image segmentation on a video frame of an emergency vehicle in an embodiment of the present invention;

Fig. 8 is a schematic diagram of performing video semantic analysis on the warning lamp region of an emergency vehicle in an embodiment of the present invention.

Detailed description of embodiments

The technical solutions of the embodiments of the present invention are described in further detail below with reference to the drawings and specific embodiments.

Fig. 1 is a schematic structural diagram of an emergency vehicle recognition device provided in an embodiment of the present invention. As shown in Fig. 1, the device includes a sound sensor 101, an optical sensor 102, a processor 103, and a memory 104.

The sound sensor 101 is used to obtain the sound signal of the environment in which the emergency vehicle is located.

The optical sensor 102 is used to obtain the video image of the environment in which the emergency vehicle is located.

Optionally, the optical sensor 102 may be a video camera or an optical scanner based on lidar.

The processor 103 is used to analyze the sound signal to obtain the auditory category code of the sound signal, to analyze the video image to obtain the visual category code of the video image, and to match the type and state of the emergency vehicle from the memory 104 according to the auditory category code and the visual category code.

The memory 104 is used to store the types and all possible state information of emergency vehicles.

Fig. 2 is a flow chart of an emergency vehicle recognition method provided in an embodiment of the present invention. As shown in Fig. 2, the method is executed by a device capable of recognizing emergency vehicles, which may be the device shown in Fig. 1. The specific implementation of this method embodiment is as follows:

Step S200: obtaining the sound signal and video image of the environment in which the emergency vehicle is located.

While driving, an emergency vehicle generally switches on its siren and warning lamp. The sirens of different emergency vehicles (such as police cars, ambulances, engineering rescue vehicles, and fire trucks) sound at their own frequencies and with their own tone-change patterns, and their warning lamps light up in their own colors and flash patterns. Because of environmental factors, however, the siren of a moving emergency vehicle is mixed with various other sounds in the environment. In autonomous driving, therefore, recognizing emergency vehicles around the autonomous vehicle requires obtaining the sound signal and video image of the surrounding environment, so that subsequent steps can analyze them and identify the emergency vehicle together with its type and active state, allowing the autonomous vehicle to adjust its driving plan and control.

Step S210: analyzing the sound signal to obtain the auditory category code of the sound signal, and analyzing the video image to obtain the visual category code of the video image.

Specifically, analyzing the sound signal to obtain the auditory category code of the sound signal includes:

Step S2101: dividing the sound signal into frames.

Step S2102: extracting acoustic features from the framed sound signal to obtain the sound sequence of the sound signal.

Step S2103: performing detection and recognition on the sound sequence of the sound signal to obtain the auditory category code of the sound signal. This specifically includes:

First, a first classification information dictionary is constructed and stored. The first classification information dictionary may be constructed according to the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles, and the correspondence between the sounding of emergency vehicle sirens and the states of emergency vehicles. That is, the first classification information dictionary includes at least these two kinds of correspondence information, and the sound characteristics of an emergency vehicle siren include the audio frequency range and the tone-change period. For example, the first classification information dictionary may be constructed from the standard correspondence between emergency vehicle types and the siren audio frequency range and tone-change period specified in the Chinese national standard GB8108-1999. The sound emitted by the siren of each kind of emergency vehicle has its own frequency range, and the sound in that frequency range has its own tone-change period. The state of an emergency vehicle is either active or inactive: a sounding siren indicates active, and a silent siren indicates inactive. Therefore, as soon as the sound signal of a siren is detected, the emergency vehicle can be considered active.

Second, a sound recognition model is constructed;

Third, the sound recognition model may be trained with the sound signals defined in the first classification information dictionary to obtain the type identification parameters of the sound recognition model;

Finally, the sound recognition model performs detection and recognition on the sound sequence of the sound signal according to the type identification parameters to obtain the auditory category code of the sound signal.

Specifically, analyzing the video image to obtain the visual category code of the video image includes:

Step S2104: constructing a second classification information dictionary and storing it. Specifically, the second classification information dictionary may be constructed according to the correspondence between the light-emitting characteristics of emergency vehicle warning lamps and emergency vehicle types, and the correspondence between flash patterns and states, where the light-emitting characteristics of a warning lamp include its color and its flash pattern. That is, the second classification information dictionary includes at least the correspondence between the color and flash pattern of emergency vehicle warning lamps and emergency vehicle types, and the correspondence between flash patterns and states. When lit, the warning lamp of each kind of emergency vehicle has its own color changes and flash pattern. The state of an emergency vehicle can be divided into an active state and an inactive state, and the two states can be indicated by different flash patterns.

Step S2105: detecting the video image to determine the video frames of the emergency vehicle in the video image;

Step S2106: performing semantic image segmentation on the video frames to obtain the warning lamp region of the emergency vehicle;

Step S2107: performing video semantic analysis on the warning lamp region of the emergency vehicle to obtain the color and flash pattern of the warning lamp;

Step S2108: generating the visual category code of the video image according to the color and flash pattern of the warning lamp.
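The source does not detail how the flash pattern is derived in step S2107. As a minimal sketch, assuming the segmented warning lamp region has already been reduced to one brightness value per video frame, the flash rate can be estimated by counting dark-to-lit transitions. The function names, the normalized brightness scale, and the threshold of 0.5 are hypothetical.

```python
def flash_rate_hz(brightness, fps, threshold=0.5):
    """Count dark-to-lit transitions and convert them to flashes per second.

    brightness: one value per video frame for the warning lamp region,
    assumed normalised to the range 0..1.
    """
    if not brightness:
        return 0.0
    lit = [b >= threshold for b in brightness]
    rising_edges = sum(1 for prev, cur in zip(lit, lit[1:]) if cur and not prev)
    return rising_edges * fps / len(brightness)


def describe_lamp(brightness, threshold=0.5):
    """Coarse lamp description: 'off', 'steady', or 'flashing'."""
    lit = [b >= threshold for b in brightness]
    if not any(lit):
        return "off"
    if all(lit):
        return "steady"
    return "flashing"


# Hypothetical 2 s clip at 30 frames per second: 5 dark frames, then 5 lit frames, repeated.
clip = ([0.0] * 5 + [1.0] * 5) * 6
rate = flash_rate_hz(clip, fps=30)
```

For this clip the lamp goes from dark to lit six times in two seconds, so `rate` is 3.0. A lamp that is already lit in the first frame contributes no rising edge, so short clips slightly underestimate the rate.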

Step S220: determining the type and state of the emergency vehicle according to the auditory category code of the sound signal and/or the visual category code of the video image.

Specifically, an access index is generated according to the auditory category code and/or the visual category code. When the access index consists only of the auditory category code, the type and state of the emergency vehicle corresponding to the auditory category code are looked up and matched in the first classification information dictionary according to the access index. When the access index consists only of the visual category code, the type and state of the emergency vehicle corresponding to the visual category code are looked up and matched in the second classification information dictionary according to the access index. When the access index consists of both the auditory category code and the visual category code, the type and state of the emergency vehicle that jointly correspond to the auditory category code and the visual category code are looked up and matched in the first classification information dictionary and the second classification information dictionary according to the access index.
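The three lookup cases can be sketched as follows. Representing each dictionary entry as a set of candidate (type, state) pairs and treating "jointly correspond" as the intersection of those sets are assumptions of this sketch, as are the class `AccessIndex`, the function `match`, and every code and entry shown.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical dictionary contents: code -> set of candidate (type, state) pairs.
FIRST_DICT = {
    "A01": {("police car", "active"), ("ambulance", "active")},  # shared siren
    "A02": {("fire truck", "active")},
}
SECOND_DICT = {
    "V-BR-3": {("police car", "active")},
    "V-B-1": {("ambulance", "active")},
}


@dataclass(frozen=True)
class AccessIndex:
    """Access index holding an auditory code, a visual code, or both."""
    auditory: Optional[str] = None
    visual: Optional[str] = None


def match(index):
    """Return the (type, state) candidates selected by the access index."""
    if index.auditory is not None and index.visual is not None:
        # Both codes present: keep only candidates found in both dictionaries.
        return (FIRST_DICT.get(index.auditory, set())
                & SECOND_DICT.get(index.visual, set()))
    if index.auditory is not None:
        return set(FIRST_DICT.get(index.auditory, set()))
    if index.visual is not None:
        return set(SECOND_DICT.get(index.visual, set()))
    return set()


by_sound = match(AccessIndex(auditory="A01"))
combined = match(AccessIndex(auditory="A01", visual="V-B-1"))
```

With these made-up entries the siren code alone leaves two candidates, and adding the visual code narrows the result to the ambulance, which is the situation the combined lookup is meant to resolve.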

In other words, the type and state of the emergency vehicle may be matched in the first classification information dictionary according to the auditory category code, in the second classification information dictionary according to the visual category code, or in both dictionaries by combining the auditory category code and the visual category code, in which case the matched type and state jointly correspond to both codes.

In the embodiment of the present invention, the sound signal and the video image of the environment in which the emergency vehicle is located are analyzed separately. The resulting auditory category code or visual category code can each be used on its own to identify the type and state of the emergency vehicle, or the auditory category code and the visual category code can be combined to identify the type and state of the emergency vehicle, which further improves recognition accuracy.

Above-mentioned combination Fig. 2 has carried out detailed elaboration to the recognition methods of emergency vehicles provided in an embodiment of the present invention, and the device of emergency vehicles can be identified below with reference to Fig. 3 detailed description.

Fig. 3 is the structural schematic diagram of the identification device of another emergency vehicles provided in an embodiment of the present invention.The device is the device that can recognize emergency vehicles, can be used for executing method as shown in Figure 2 provided in this embodiment.As shown in figure 3, the device includes: data receipt unit 301, multi-modal sension unit 302, matching unit 303, storage unit 304.

Wherein, data receipt unit 301 includes the audible signal reception module 3011 for acquiring autonomous driving vehicle peripheral sound signal, and the video image receiving module 3012 for acquiring autonomous driving vehicle periphery video image.

Multi-modal sension unit 302 includes auditory processing module 3021, vision processing module 3022, access index generation module 3023.Collected voice signal is carried out sub-frame processing by auditory processing module 3021, and the voice signal after sub-frame processing is subjected to acoustic feature extraction, then by establishing acoustics identification model, the acoustic feature of the voice signal is identified, obtains the sense of hearing classification coding of voice signal.Vision processing module 3022 detects collected video image, confirm the video frame of emergency vehicles in the video image, and the video frame is subjected to image, semantic segmentation, obtain the video flowing in emergency vehicles warning lamp region, then Video Semantic Analysis is carried out to the video flowing in the warning lamp region, the color and flash pattern of warning lamp are obtained, and then obtains the visual category coding of the emergency vehicles.Access index generation module 3023 generates access index in conjunction with sense of hearing classification coding and visual category coding.

Optionally, when processing the sound signal and the video image, the auditory processing module 3021 and the visual processing module 3022 may be two different processors (or hardware functional devices), two dedicated processing modules within the same processor, or two different program modules called by one processor, one implementing the sound signal processing and the other implementing the video image processing.

The matching unit 303 accesses the storage unit 304 and, according to the access index, matches the type and state of the emergency vehicle from the storage unit 304.

The storage unit 304 stores a first classification information dictionary and a second classification information dictionary, which cover the emergency vehicle types and all of their possible states. The first classification information dictionary holds the correspondence that defines emergency vehicle type and state by the sound frequency and modulation period of the siren. The second classification information dictionary holds the correspondence that defines emergency vehicle type and state by the color and flash pattern of the warning light.

Specifically, the auditory processing module 3021 is implemented as follows:

As shown in Fig. 4, the auditory processing module 3021 applies a window function with the required frame length and frame shift to divide the received sound signal into frames, that is, to cut it into multiple overlapping frames. For example, the signal may be cut with a frame length of 25 ms and a frame shift of 15 ms, where the frame shift is calculated as 25 ms - 10 ms = 15 ms; consecutive frames therefore start 10 ms apart.

The window function used in the embodiment of the present invention is a truncation function that cuts the sound signal into segments. The frame shift is the overlap between two adjacent frames, that is, between the tail of the previous frame and the head of the next frame. Because of this overlap, each frame contains part of the previous frame and part of the next frame, which prevents discontinuities between frames, keeps adjacent frames correlated, and brings the result closer to the actual sound.
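The framing step can be illustrated with a minimal sketch. It assumes a 16 kHz sample rate and a Hamming window, neither of which is specified in the text, and the function name `frame_signal` is hypothetical. Following the definition above, the 15 ms frame shift is treated as the overlap, so adjacent frames start 10 ms apart.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=25.0, overlap_ms=15.0):
    """Cut a 1-D signal into overlapping frames and apply a window to each."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    overlap = int(round(sample_rate * overlap_ms / 1000.0))
    hop = frame_len - overlap  # samples between the starts of adjacent frames
    if frame_len < 2 or hop <= 0:
        raise ValueError("frame must hold at least 2 samples and exceed the overlap")
    # Hamming window (an assumption; the text only says "a window function").
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# One second of a 700 Hz tone sampled at 16 kHz (illustrative input only).
signal = [math.sin(2.0 * math.pi * 700.0 * t / 16000.0) for t in range(16000)]
frames = frame_signal(signal, 16000)
print(len(frames), len(frames[0]))
```

In this sketch, trailing samples that do not fill a complete frame are dropped; padding the last frame would be an equally reasonable choice.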

As shown in Fig. 5, the auditory processing module 3021 extracts acoustic features from the framed sound signal by computing Mel-frequency cepstral coefficients (MFCC). Specifically, following the physiological characteristics of the human ear, the auditory processing module 3021 extracts M MFCC features from the waveform of each frame of the sound signal and transforms the frame into an M-dimensional vector. The M-dimensional vector of each frame contains the content information of that frame of the sound signal. The M-dimensional vectors of all frames are arranged into a matrix of M rows and N columns, giving an observable sound sequence that characterizes the sound signal, where N is the total number of frames. For example, the embodiment of the present invention takes M = 12 and obtains the observable sound sequence of 12 rows by N columns shown in Fig. 5, in which each frame is represented by a 12-dimensional vector and the shade of each color block indicates the magnitude of the corresponding vector value.

Specifically, the MFCC features of each frame of the sound signal are extracted as follows: a fast Fourier transform (FFT) is applied to each short-time analysis window to obtain the corresponding spectrum; the spectrum is passed through a Mel filter bank to obtain the Mel spectrum; cepstral analysis is performed on the Mel spectrum by taking the logarithm and then applying an inverse transform, which is implemented with the discrete cosine transform (DCT); and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients, which are the features of that frame of sound.
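The extraction chain above (spectrum, Mel filter bank, logarithm, DCT, coefficients 2 to 13) can be sketched in plain Python as follows. This is an illustration under assumptions, not the patented implementation: the 26-filter bank, the common 2595·log10(1 + f/700) Mel formula, and all function names are choices made here, and a naive DFT stands in for the FFT for readability.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):    # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge, peak at the center bin
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def mfcc(frame, sample_rate, num_filters=26, first=2, last=13):
    """Return DCT coefficients first..last (1-based) of the log Mel energies."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty filters.
    log_mel = [math.log(sum(w * p for w, p in zip(filt, spec)) + 1e-10) for filt in bank]
    coeffs = [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                  for j, e in enumerate(log_mel))
              for c in range(num_filters)]  # DCT-II
    return coeffs[first - 1:last]

# One 256-sample frame of a 1 kHz tone at 16 kHz (illustrative input only).
frame = [math.sin(2.0 * math.pi * 1000.0 * t / 16000.0) for t in range(256)]
features = mfcc(frame, 16000)
print(len(features))
```

Stacking the 12-element vector of every frame column by column would give the 12-row by N-column observable sound sequence described above.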

Further, when extracting acoustic features from the sound waveform, the auditory processing module 3021 is not limited to Mel-frequency cepstral coefficients (MFCC); it may also extract the acoustic features of the sound signal by means of deep learning.

Further, as shown in Fig. 6, the auditory processing module 3021 may build a sound recognition model based on a hidden Markov model (HMM), use the model to recognize the sound sequence and obtain its classification information, and then encode and output that classification information as the auditory category code of the sound sequence. Specifically, after building the sound recognition model, the auditory processing module 3021 may train it with the sound signals in the first classification information dictionary to obtain the type recognition parameters of the model. The sound recognition model then recognizes the sound sequence produced by MFCC feature extraction according to the type recognition parameters and obtains the auditory category code of the sound sequence. The auditory category is the category of the sound signal, which can be distinguished by a sound frequency and modulation period that conform to a standard or convention.
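As a rough illustration of HMM-based recognition, the sketch below scores an observation sequence against one HMM per category with the forward algorithm and returns the code of the best-scoring model. Several simplifications are assumptions made here: observations are discrete symbols (0 for a low tone, 1 for a high tone) rather than MFCC vectors, the model parameters are set by hand instead of being trained, and the category codes 0 and 1 are hypothetical.

```python
import math

def log_forward(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM
    (forward algorithm with per-step scaling to avoid underflow)."""
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                     for s in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik

def classify(obs, models):
    """Return the category code whose HMM gives the highest likelihood."""
    return max(models, key=lambda code: log_forward(obs, *models[code]))

# Hand-set two-state models (illustrative values, not trained parameters).
START = [0.5, 0.5]
EMIT = [[0.9, 0.1], [0.1, 0.9]]          # state 0 mostly emits 0, state 1 mostly emits 1
FAST = [[0.2, 0.8], [0.8, 0.2]]          # tone changes often
SLOW = [[0.9, 0.1], [0.1, 0.9]]          # tone changes rarely
MODELS = {0: (START, FAST, EMIT), 1: (START, SLOW, EMIT)}

print(classify([0, 1, 0, 1, 0, 1, 0, 1], MODELS))
print(classify([0, 0, 0, 0, 1, 1, 1, 1], MODELS))
```

A practical system would more likely use continuous emission densities over the MFCC vectors and estimate the parameters from labeled recordings; the scoring and arg-max structure stays the same.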

Further, the sound signals in the first classification information dictionary are a large number of labeled road-environment sound signals, which may be standard correspondence data defined for sirens according to the sound characteristics (namely sound frequency and modulation period) specified in the Chinese national standard GB8108-1999. For example, as shown in Table 1, a sound signal whose sound frequency falls within the specified range (values not reproduced here) and whose modulation period is between 0.333 and 0.385 is an urgent frequency-modulated tone, and the corresponding vehicle is a police car.

Table 1 Sound frequency and modulation period

In practical applications, the auditory processing module 3021 provided by the embodiment of the present invention can also be used to detect and recognize other objects, provided that the one-to-one correspondence between the sound characteristics emitted by an object and that object is stored as the first classification information dictionary.

Taking the above national standard as an example, different emergency vehicles have different siren sounds, so in this case the auditory processing module 3021 alone is sufficient to detect and recognize active emergency vehicles.

However, the same siren sound is sometimes used by different emergency vehicles, so the visual processing module 3022 needs to be combined in order to detect and recognize active emergency vehicles more accurately.

The visual processing module 3022 is implemented as follows:

Specifically, the visual processing module 3022 may use a multi-target detection algorithm to detect whether the video image contains an emergency vehicle and to obtain the video frames of the emergency vehicle. The multi-target detection algorithm may be a real-time multi-target detection algorithm such as SSD or YOLO, or R-CNN, Fast R-CNN, or Faster R-CNN.

Further, as shown in Fig. 7, the visual processing module 3022 may use a convolutional neural network (CNN) combined with a deep convolutional neural network (DCNN) to perform image semantic segmentation on the video frame and cut out the warning light region. Specifically, the CNN first extracts the features of each pixel in the image, and the DCNN then restores the position of each pixel in the original image, achieving pixel-level recognition and outputting the segmentation of the warning light region.

Further, as shown in Fig. 8, the visual processing module 3022 performs video semantic analysis on the video stream of the segmented warning light region, detects and recognizes the color and flash pattern of the warning light, and thereby obtains the visual category code of the video image. Specifically, a CNN may be combined with a recurrent neural network (RNN) algorithm and a long short-term memory (LSTM) algorithm. For the region of interest (ROI) segmented from each frame of the multi-frame video stream, namely the warning light region, the CNN extracts the features of the ROI in each frame, including the different color blocks and the shape of the warning light in the current frame. From the extracted features, an RNN implemented with a two-layer LSTM extracts the temporal relationship between the frames of the video and outputs the color change features of the warning light, giving whether the warning light is flashing and the color of its color blocks, and the visual category code is output. The visual category is the category distinguished by a warning light color and color change (that is, flash pattern) that conform to a standard or convention. The visual category code is the code of the visual category.
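The final step, turning the per-frame observations of the warning light region into a visual category code, can be illustrated with a deliberately simplified stand-in. The sketch below does not implement the CNN and LSTM networks described above; it assumes that each frame of the ROI has already been reduced to a color label, replaces the learned temporal model with a hand-written rule, and uses a hypothetical code book.

```python
def summarize_warning_light(frame_colors, min_changes=2):
    """Reduce per-frame ROI color labels to (colors seen, flashing?).

    frame_colors holds one label per frame, e.g. "red", "blue" or "off".
    The light counts as flashing when the label changes at least
    min_changes times across the clip (a hand-written rule, not a learned one).
    """
    lit_colors = tuple(sorted({c for c in frame_colors if c != "off"}))
    changes = sum(1 for a, b in zip(frame_colors, frame_colors[1:]) if a != b)
    return lit_colors, changes >= min_changes

# Hypothetical code book: (colors, flashing) -> visual category code.
VISUAL_CODES = {
    (("blue", "red"), True): 0,
    (("red",), True): 1,
    (("blue",), True): 2,
}

def visual_category_code(frame_colors):
    """Return the visual category code, or None if the pattern is not in the code book."""
    return VISUAL_CODES.get(summarize_warning_light(frame_colors))

print(visual_category_code(["red", "off", "blue", "off", "red", "off"]))
print(visual_category_code(["off"] * 6))
```

The point of the sketch is the interface: whatever model analyses the video stream, its output is a color set plus a flash pattern, which is then mapped to a code.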

In the embodiment of the present invention, the visual processing module 3022 processes streaming video and, through target detection and ROI segmentation, limits the region subject to video semantic analysis to the confined area of the warning light. This reduces the amount of computation and increases recognition speed, making the approach suitable for autonomous vehicles with limited computing power.

In addition, when different emergency vehicles have different warning lights and flash patterns, the processing and recognition of video images by the visual processing module 3022 provided by the embodiment of the present invention is sufficient on its own to detect and recognize active emergency vehicles.

In practical applications, the visual processing module 3022 provided by the embodiment of the present invention can also be used to detect and recognize other objects, provided that the one-to-one correspondence between the visual characteristics of an object and that object is stored as the second classification information dictionary.

Further, after the auditory category code and the visual category code are obtained, the access index generation module 3023 generates an access index from the auditory category code and/or the visual category code.

Further, the matching unit 303 performs matching according to the access index. When the access index consists only of the auditory category code, the matching unit 303 looks up, in the first classification information dictionary, the type and state of the emergency vehicle corresponding to the auditory category code. When the access index consists only of the visual category code, the matching unit 303 looks up, in the second classification information dictionary, the type and state of the emergency vehicle corresponding to the visual category code. When the access index consists of both the auditory category code and the visual category code, the matching unit 303 looks up, in the first and second classification information dictionaries, the type and state of the emergency vehicle that correspond to both codes.
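The three matching cases can be sketched as a small lookup function. The dictionary contents, the category codes, and the use of set intersection to find the entry "corresponding to both codes" are assumptions made for illustration; the text does not prescribe a data structure.

```python
def match(access_index, first_dict, second_dict):
    """Return the set of (type, state) pairs consistent with the access index.

    access_index is a pair (auditory_code, visual_code); either part may be
    None when that modality is not available.
    """
    auditory_code, visual_code = access_index
    if auditory_code is not None and visual_code is not None:
        # Both codes present: keep only entries that both dictionaries agree on.
        return first_dict.get(auditory_code, set()) & second_dict.get(visual_code, set())
    if auditory_code is not None:
        return set(first_dict.get(auditory_code, set()))
    if visual_code is not None:
        return set(second_dict.get(visual_code, set()))
    return set()

# Hypothetical dictionary contents, for illustration only.
FIRST_DICT = {
    0: {("police car", "active")},
    1: {("fire fighting truck", "active"), ("ambulance", "active")},
}
SECOND_DICT = {
    0: {("police car", "active")},
    1: {("ambulance", "active")},
    2: {("fire fighting truck", "active")},
}

print(match((1, None), FIRST_DICT, SECOND_DICT))  # siren alone is ambiguous here
print(match((1, 1), FIRST_DICT, SECOND_DICT))     # adding the visual code resolves it
```

In this made-up data, auditory code 1 is shared by two vehicle types, which mirrors the situation where the same siren sound is used by different emergency vehicles and the visual code is needed to disambiguate.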

Further, the indexing mode used when the matching unit 303 accesses the storage unit 304 depends on how the storage unit 304 stores the emergency vehicle types and states, for example:

The storage unit 304 may use enumeration to store the vehicle types and all of their possible states in advance.

As shown in Table 2, the vehicle type and state data of emergency vehicles may be stored as a two-dimensional table. Table 2 has N rows and 2 columns; each row corresponds to one emergency vehicle and its state, with one column giving the type of the emergency vehicle and the other giving its state.

Table 2 A storage table for emergency vehicles

Vehicle	State
Police car	Inactive
Police car	Active
Fire fighting truck	Inactive
···	···

When the storage unit 304 uses the storage mode shown in Table 2, the access index generation module 3023 concatenates the visual category code and the auditory category code into an index code and uses it to access the storage unit 304, thereby determining the type and state of the emergency vehicle. Specifically, assuming there are N classes of visual category information and M classes of auditory category information, the visual category code and the auditory category code occupy log2 N and log2 M bits respectively, so the index used to access the storage unit has (log2 N + log2 M) bits in total.
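A minimal sketch of the index concatenation follows. It assumes that the visual code occupies the high-order bits and the auditory code the low-order bits, and that the bit widths are rounded up when N or M is not a power of two; neither detail is stated in the text, and the function names are hypothetical.

```python
def code_bits(num_classes):
    """Bits needed to number num_classes categories: log2, rounded up."""
    if num_classes < 1:
        raise ValueError("need at least one class")
    return (num_classes - 1).bit_length()

def splice_index(visual_code, auditory_code, n_visual, m_auditory):
    """Concatenate the two codes into one index (visual code in the high bits)."""
    if not 0 <= visual_code < n_visual:
        raise ValueError("visual code out of range")
    if not 0 <= auditory_code < m_auditory:
        raise ValueError("auditory code out of range")
    return (visual_code << code_bits(m_auditory)) | auditory_code

def split_index(index, m_auditory):
    """Recover (visual_code, auditory_code) from a spliced index."""
    width = code_bits(m_auditory)
    return index >> width, index & ((1 << width) - 1)

# N = 8 visual classes and M = 4 auditory classes give a 3 + 2 = 5 bit index.
index = splice_index(5, 2, n_visual=8, m_auditory=4)
print(index, code_bits(8) + code_bits(4), split_index(index, 4))
```

With this layout, a table with 2^(log2 N + log2 M) rows can be addressed directly by the spliced index, one row per combination of visual and auditory category.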

Optionally, as shown in Table 3, the storage unit 304 may also store the vehicle type and state data of emergency vehicles as a two-dimensional table in which each entry is a two-dimensional vector whose two components are the vehicle type of the emergency vehicle and its state.

Table 3 Another storage table for emergency vehicles

<police car, inactive>	<fire fighting truck, active>
<fire fighting truck, inactive>	<police car, active>
<ambulance, inactive>	<breakdown lorry, active>
···	···
<breakdown lorry, inactive>	<ambulance, active>

When the storage unit 304 uses the storage mode shown in Table 3, the access index generation module 3023 uses the visual category code and the auditory category code as the row index and the column index respectively to access the storage unit 304, thereby determining the type and state of the emergency vehicle. Specifically, assuming there are N classes of visual category information and M classes of auditory category information, the visual and auditory category codes occupy log2 N and log2 M bits respectively.
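The row and column access can be sketched with a nested list. The entries below are the first three rows of Table 3; which visual code selects which row and which auditory code selects which column is an assumption made for illustration, as is the function name `lookup`.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (vehicle type, state)

# First three rows of Table 3; the mapping of codes to rows and columns is assumed.
TABLE_3: List[List[Entry]] = [
    [("police car", "inactive"), ("fire fighting truck", "active")],
    [("fire fighting truck", "inactive"), ("police car", "active")],
    [("ambulance", "inactive"), ("breakdown lorry", "active")],
]

def lookup(table: List[List[Entry]], visual_code: int, auditory_code: int) -> Entry:
    """Use the visual code as the row index and the auditory code as the column index."""
    if not 0 <= visual_code < len(table):
        raise IndexError("visual category code out of range")
    row = table[visual_code]
    if not 0 <= auditory_code < len(row):
        raise IndexError("auditory category code out of range")
    return row[auditory_code]

vehicle_type, state = lookup(TABLE_3, visual_code=1, auditory_code=1)
print(vehicle_type, state)
```

Compared with the spliced index used for Table 2, this layout keeps the two codes separate, so adding a visual or auditory category means adding a row or a column.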

In the embodiment of the present invention, the two-dimensional table storage modes shown in Table 2 and Table 3 used by the storage unit 304 make it easy to extend the stored emergency vehicle types and states, which can be done simply by extending the contents of the storage unit.

The emergency vehicle recognition devices of Fig. 1 and Fig. 3 provided by the embodiment of the present invention can be applied to autonomous vehicles and other transportation equipment to recognize nearby emergency vehicles in time. They can also be applied to recognizing other moving objects by sound and video, so that the emergency vehicles and such other objects can be avoided in time.

In the embodiment of the present invention, when different emergency vehicles may have the same siren, or when different emergency vehicles may have the same warning light and the same flash pattern, the emergency vehicle recognition device provided by the embodiment of the present invention can combine the processing of multi-modal data by the auditory processing module 3021 and the visual processing module 3022 to recognize the type of the emergency vehicle and its active or inactive state efficiently and accurately, supporting the planning and control of autonomous driving.

The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

A method for recognizing emergency vehicles, characterized in that the method comprises:

obtaining a sound signal and a video image of the environment in which an emergency vehicle is located;

analyzing and processing the sound signal to obtain an auditory category code of the sound signal, and analyzing and processing the video image to obtain a visual category code of the video image;

determining the type and state of the emergency vehicle according to the auditory category code of the sound signal and/or the visual category code of the video image.
The method according to claim 1, characterized in that analyzing and processing the sound signal to obtain the auditory category code of the sound signal specifically comprises:

framing the sound signal;

extracting acoustic features from the framed sound signal to obtain a sound sequence of the sound signal;

detecting and recognizing the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
The method according to claim 2, characterized in that detecting and recognizing the sound sequence of the sound signal to obtain the auditory category code of the sound signal specifically comprises:

building a sound recognition model;

training the sound recognition model to obtain type recognition parameters of the sound recognition model;

detecting and recognizing, by the sound recognition model according to the type recognition parameters, the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
The method according to any one of claims 1 to 3, characterized in that training the sound recognition model specifically comprises:

training the sound recognition model using the sound signals in a first classification information dictionary; the first classification information dictionary includes at least the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles, and the correspondence between the sound of emergency vehicle sirens and the states of emergency vehicles.
The method according to claim 4, characterized in that, before the sound recognition model is trained using the sound signals in the first classification information dictionary, the method further comprises:

building the first classification information dictionary according to the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles and the correspondence between the sound of emergency vehicle sirens and the states of emergency vehicles, and storing the first classification information dictionary; the sound characteristics of the emergency vehicle siren include the audio frequency range and the modulation period.
The method according to any one of claims 1 to 5, characterized in that analyzing and processing the video image to obtain the visual category code of the video image specifically comprises:

detecting the video image to obtain the video frames of the emergency vehicle;

performing image semantic segmentation on the video frames to obtain the warning light region of the emergency vehicle;

performing video semantic analysis on the warning light region of the emergency vehicle to obtain the color and flash pattern of the warning light;

generating the visual category code of the video image according to the color and flash pattern of the warning light.
The method according to claim 6, characterized in that, before the video image is analyzed and processed to obtain the visual category code of the video image, the method further comprises:

building a second classification information dictionary according to the correspondence between the luminous characteristics of emergency vehicle warning lights and the types of emergency vehicles and the correspondence between warning light flash patterns and the states of emergency vehicles, and storing the second classification information dictionary; the luminous characteristics of the warning light include the color of the warning light and the flash pattern of the warning light.
The method according to claim 5 or 7, characterized in that determining the type and state of the emergency vehicle according to the auditory category code of the sound signal and/or the visual category code of the video image specifically comprises:

generating an access index according to the auditory category code and/or the visual category code;

matching, according to the access index, the type and state of the emergency vehicle from the first classification information dictionary and/or the second classification information dictionary.
The method according to claim 8, characterized in that:

when the access index includes the auditory category code, the type and state of the emergency vehicle corresponding to the auditory category code are looked up and matched in the first classification information dictionary according to the access index;

when the access index includes the visual category code, the type and state of the emergency vehicle corresponding to the visual category code are looked up and matched in the second classification information dictionary according to the access index;

when the access index includes the auditory category code and the visual category code, the type and state of the emergency vehicle corresponding to both the auditory category code and the visual category code are looked up and matched in the first classification information dictionary and the second classification information dictionary according to the access index.
A device for recognizing emergency vehicles, characterized by comprising:

a sound sensor, configured to obtain a sound signal of the environment in which an emergency vehicle is located;

an optical sensor, configured to obtain a video image of the environment in which the emergency vehicle is located;

a processor, configured to analyze and process the sound signal to obtain an auditory category code of the sound signal, to analyze and process the video image to obtain a visual category code of the video image, and to determine the type and state of the emergency vehicle according to the auditory category code and/or the visual category code;

a memory, configured to store the type and state information of emergency vehicles.
The device according to claim 10, characterized in that the processor is specifically configured to:

frame the sound signal;

extract acoustic features from the framed sound signal to obtain a sound sequence of the sound signal;

detect and recognize the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
The device according to claim 11, characterized in that the processor is further specifically configured to:

build a sound recognition model;

train the sound recognition model to obtain type recognition parameters of the sound recognition model;

detect and recognize, by the sound recognition model according to the type recognition parameters, the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
The device according to any one of claims 10 to 12, characterized in that the processor reads the sound signals in a first classification information dictionary to train the sound recognition model; the first classification information dictionary includes at least the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles, and the correspondence between the sound of emergency vehicle sirens and the states of emergency vehicles.
The device according to claim 13, characterized in that, before the sound recognition model is trained, the processor builds the first classification information dictionary according to the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles and the correspondence between the sound of emergency vehicle sirens and the states of emergency vehicles, and writes the first classification information dictionary into the memory; the sound characteristics of the emergency vehicle siren include the audio frequency range and the modulation period.
The device according to any one of claims 10 to 14, characterized in that the processor is further specifically configured to:

detect the video image to obtain the video frames of the emergency vehicle;

perform image semantic segmentation on the video frames to obtain the warning light region of the emergency vehicle;

perform video semantic analysis on the warning light region of the emergency vehicle to obtain the color and flash pattern of the warning light;

generate the visual category code of the video image according to the color and flash pattern of the warning light.
The device according to claim 15, characterized in that, before the video image is analyzed and processed, the processor builds a second classification information dictionary according to the correspondence between the luminous characteristics of emergency vehicle warning lights and the types of emergency vehicles and the correspondence between warning light flash patterns and the states of emergency vehicles, and writes the second classification information dictionary into the memory; the luminous characteristics of the warning light include the color of the warning light and the flash pattern of the warning light.
The device according to claim 14 or 16, characterized in that the processor is further specifically configured to:

generate an access index according to the auditory category code and/or the visual category code;

match, according to the access index, the type and state of the emergency vehicle from the first classification information dictionary and/or the second classification information dictionary.
The device according to claim 17, characterized in that the processor is further specifically configured to:

when the access index includes the auditory category code, look up and match, in the first classification information dictionary according to the access index, the type and state of the emergency vehicle corresponding to the auditory category code;

when the access index includes the visual category code, look up and match, in the second classification information dictionary according to the access index, the type and state of the emergency vehicle corresponding to the visual category code;

when the access index includes the auditory category code and the visual category code, look up and match, in the first classification information dictionary and the second classification information dictionary according to the access index, the type and state of the emergency vehicle corresponding to both the auditory category code and the visual category code.
A device for recognizing emergency vehicles, characterized by comprising:

a data receiving unit, configured to obtain a sound signal and a video image of the environment in which an emergency vehicle is located;

a multi-modal perception unit, configured to analyze and process the sound signal to obtain an auditory category code of the sound signal, and to analyze and process the video image to obtain a visual category code of the video image;

a matching unit, configured to determine the type and state of the emergency vehicle according to the auditory category code of the sound signal and/or the visual category code of the video image.
The device according to claim 19, characterized in that the multi-modal perception unit includes an auditory processing module; the auditory processing module is specifically configured to:

frame the sound signal;

extract acoustic features from the framed sound signal to obtain a sound sequence of the sound signal;

detect and recognize the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
The device according to claim 20, characterized in that the auditory processing module is further specifically configured to:

build a sound recognition model;

train the sound recognition model to obtain type recognition parameters of the sound recognition model;

detect and recognize, by the sound recognition model according to the type recognition parameters, the sound sequence of the sound signal to obtain the auditory category code of the sound signal.
The device according to claim 21, characterized in that the auditory processing module reads the sound signals in a first classification information dictionary to train the sound recognition model; the first classification information dictionary includes at least the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles, and the correspondence between the sound of emergency vehicle sirens and the states of emergency vehicles.
The device according to claim 22, characterized in that, before the sound recognition model is trained, the auditory processing module builds the first classification information dictionary according to the correspondence between the sound characteristics of emergency vehicle sirens and the types of emergency vehicles and the correspondence between the sound of emergency vehicle sirens and the states of emergency vehicles, and stores the first classification information dictionary; the sound characteristics of the emergency vehicle siren include the audio frequency range and the modulation period.
The device according to any one of claims 19 to 23, characterized in that the multi-modal perception unit further includes a visual processing module; the visual processing module is configured to:

detect the video image to obtain the video frames of the emergency vehicle;

perform image semantic segmentation on the video frames to obtain the warning light region of the emergency vehicle;

perform video semantic analysis on the warning light region of the emergency vehicle to obtain the color and flash pattern of the warning light;

generate the visual category code of the video image according to the color and flash pattern of the warning light.
The device according to claim 24, characterized in that, before the video image is analyzed and processed, the visual processing module builds a second classification information dictionary according to the correspondence between the luminous characteristics of emergency vehicle warning lights and the types of emergency vehicles and the correspondence between warning light flash patterns and the states of emergency vehicles, and stores the second classification information dictionary; the luminous characteristics of the warning light include the color of the warning light and the flash pattern of the warning light.
The device according to claim 23 or 25, characterized in that the multi-modal perception unit further includes an access index generation module;

the access index generation module is configured to generate an access index according to the auditory category code and/or the visual category code;

the matching unit matches, according to the access index, the type and state of the emergency vehicle from the first classification information dictionary and/or the second classification information dictionary.
The device according to claim 26, characterized in that:

when the access index includes the auditory category code, the matching unit looks up and matches, in the first classification information dictionary according to the access index, the type and state of the emergency vehicle corresponding to the auditory category code;

when the access index includes the visual category code, the matching unit looks up and matches, in the second classification information dictionary according to the access index, the type and state of the emergency vehicle corresponding to the visual category code;

when the access index includes the auditory category code and the visual category code, the matching unit looks up and matches, in the first classification information dictionary and the second classification information dictionary according to the access index, the type and state of the emergency vehicle corresponding to both the auditory category code and the visual category code.
A vehicle, characterized by comprising the emergency vehicle recognition device according to any one of claims 10 to 18 or 19 to 27, the emergency vehicle recognition device being used to recognize the type and state of emergency vehicles.