CN114944152A - Vehicle whistling sound identification method - Google Patents

Vehicle whistling sound identification method

Info

Publication number
CN114944152A
Authority
CN
China
Prior art keywords
audio
image
vehicle
segment
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210854185.7A
Other languages
Chinese (zh)
Inventor
Wang Dan (王丹)
Cui Yangyang (崔洋洋)
Yang Dengzhou (杨登舟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Weina Perception Computing Technology Co ltd
Original Assignee
Shenzhen Weina Perception Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Weina Perception Computing Technology Co ltd
Priority to CN202210854185.7A
Publication of CN114944152A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles


Abstract

The application provides a vehicle whistle recognition method comprising the following steps: acquiring a first audio, where the first audio contains the whistle sound of a vehicle; segmenting the first audio to obtain a first audio segment; performing feature extraction on the first audio segment to obtain a feature vector of the first audio segment; and determining the category of the vehicle according to the feature vector of the first audio segment. By splitting a long vehicle whistle into several segments before feature extraction, the method effectively reduces the adverse effect of extraneous factors on feature extraction and improves its accuracy. In addition, intercepting segments reduces the amount of audio data passed to subsequent processing and improves processing efficiency. The scheme can therefore obtain audio features that fully reflect the characteristics of the whistle sounds of different vehicles, and classifying on these features effectively improves classification accuracy.

Description

Vehicle whistling sound identification method
Technical Field
The application belongs to the technical field of vehicle whistle recognition, and particularly relates to a vehicle whistle recognition method.
Background
Sound consists of sound waves generated by the vibration of an object; the object that initially emits the vibration is called the sound source. Sound propagates through a medium as a wave and can be perceived by human or animal auditory organs. Different objects produce different sounds, and a vehicle whistle is one such sound. To automatically detect whistling and, in scenes where whistling is prohibited, penalize the offending vehicles, the category of the whistling vehicle needs to be identified.
In conventional schemes, the vehicle category is judged directly from the whole whistle recording. Because the recording contains many interference factors, such as mixed-in environmental noise and possibly the whistles of other vehicles, the judged vehicle category may be wrong.
Therefore, how to improve the accuracy of vehicle whistle recognition is an urgent technical problem to be solved.
Disclosure of Invention
The embodiment of the application provides a vehicle whistle recognition method, which can improve the accuracy of vehicle whistle recognition.
In a first aspect, an embodiment of the present application provides a vehicle whistle recognition method, including: acquiring a first audio, wherein the first audio comprises the whistle sound of a vehicle; segmenting the first audio to obtain a first audio segment; performing feature extraction on the first audio segment to obtain a feature vector of the first audio segment; and determining the category of the vehicle according to the feature vector of the first audio segment.
In a possible implementation manner of the first aspect, when performing feature extraction on the first audio segment to obtain the feature vector of the first audio segment, the feature extraction may be implemented with a feature extraction model, that is, the first audio segment is fed through the feature extraction model.
In a possible implementation manner of the first aspect, when performing feature extraction on the first audio segment to obtain a feature vector of the first audio segment, the following operations may be performed: preprocessing the first audio clip to obtain a second audio clip, wherein the second audio clip is used for representing the preprocessed first audio clip; and performing feature extraction on the second audio segment to obtain a feature vector of the second audio segment.
In a possible implementation manner of the first aspect, when the first audio segment is preprocessed to obtain the second audio segment, the following operations may be performed: extracting and eliminating fuzzy segments from the first audio segment to obtain the second audio segment; and/or removing noise from the first audio segment to obtain the second audio segment; and/or filtering the first audio segment to obtain the second audio segment.
In a possible implementation manner of the first aspect, when determining the category of the vehicle according to the feature vector of the first audio segment, the feature vector of the first audio segment may be processed by using the first deep learning model, and the category (classification result) of the vehicle output by the first deep learning model may be used as the category of the vehicle.
In a possible implementation manner of the first aspect, the method may further include: acquiring a first image, wherein the first image comprises the vehicle; and performing feature extraction on the first image to obtain a feature vector of the first image, wherein the feature vector of the first image comprises a category parameter of the vehicle. The determining the category of the vehicle according to the feature vector of the first audio segment may include: determining the category of the vehicle according to the feature vector of the first audio segment and the feature vector of the first image.
In one possible implementation manner of the first aspect, when determining the category of the vehicle according to the feature vector of the first audio segment and the feature vector of the first image, the following operations may be performed: determining a first candidate category of the vehicle according to the feature vector of the first audio segment; determining a second candidate category of the vehicle according to the feature vector of the first image; and when the first candidate category is the same as the second candidate category, determining the first candidate category or the second candidate category as the category of the vehicle.
In a possible implementation manner of the first aspect, when determining the category of the vehicle according to the feature vector of the first audio segment and the feature vector of the first image, the feature vector of the first audio segment may be processed by using a first deep learning model, and the category (classification result) of the vehicle output by the first deep learning model is taken as the first candidate category; the feature vector of the first image is processed by the second deep learning model, and the vehicle category (classification result) output by the second deep learning model is taken as the second candidate category.
In a possible implementation manner of the first aspect, when performing feature extraction on the first image to obtain a feature vector of the first image, the following operations may be performed: preprocessing the first image to obtain a second image, wherein the second image is used for representing the preprocessed first image; and performing feature extraction on the second image to obtain a feature vector of the second image.
In a possible implementation manner of the first aspect, when the first image is preprocessed to obtain the second image, the following operations may be performed: adjusting at least one of the contrast, sharpness, or pixel parameters of the first image to obtain the second image; and/or performing binarization processing on the first image to obtain the second image.
In a second aspect, the present application provides a vehicle whistle recognition apparatus, which includes means for implementing the method of the first aspect and any one of the implementation manners thereof.
In one possible implementation manner of the second aspect, the vehicle whistle recognition device includes an acquisition unit and a processing unit, where the acquisition unit is configured to acquire a first audio containing the whistle sound of a vehicle, and the processing unit is configured to segment the first audio to obtain a first audio segment, perform feature extraction on the first audio segment to obtain a feature vector of the first audio segment, and determine the category of the vehicle according to the feature vector of the first audio segment.
In one possible implementation manner of the second aspect, the processing unit includes: the segmentation module is used for segmenting the first audio to obtain a first audio segment; and the audio feature extraction module is used for extracting features of the first audio segment to obtain a feature vector of the first audio segment.
In one possible implementation manner of the second aspect, the audio feature extraction module includes: the first preprocessing submodule is used for preprocessing the first audio clip to obtain a second audio clip, and the second audio clip is used for representing the preprocessed first audio clip; and the first extraction submodule is used for extracting the features of the second audio segment to obtain the feature vector of the second audio segment.
In a possible implementation manner of the second aspect, the first preprocessing submodule is specifically configured to: extract and eliminate fuzzy segments from the first audio segment to obtain the second audio segment; and/or remove noise from the first audio segment to obtain the second audio segment; and/or filter the first audio segment to obtain the second audio segment.
In a possible implementation manner of the second aspect, the obtaining unit is further configured to obtain a first image, where the first image includes the vehicle; the processing unit further includes: the image feature extraction module is used for extracting features of the first image to obtain a feature vector of the first image, wherein the feature vector of the first image comprises the category parameters of the vehicle; and the classification module is used for determining the category of the vehicle according to the feature vector of the first audio fragment and the feature vector of the first image.
In one possible implementation manner of the second aspect, the classification module includes: the audio classification submodule is used for determining a first candidate category of the vehicle according to the feature vector of the first audio fragment; the image classification submodule is used for determining a second candidate category of the vehicle according to the feature vector of the first image; a category determination sub-module for determining the first candidate category or the second candidate category as a category of the vehicle when the first candidate category is the same as the second candidate category.
In one possible implementation manner of the second aspect, the image feature extraction module includes: the second preprocessing submodule is used for preprocessing the first image to obtain a second image, and the second image is used for representing the preprocessed first image; and the second extraction submodule is used for extracting the features of the second image to obtain the feature vector of the second image.
In a possible implementation manner of the second aspect, the second preprocessing submodule is specifically configured to: adjust at least one of the contrast, sharpness, or pixel parameters of the first image to obtain the second image; and/or perform binarization processing on the first image to obtain the second image.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method of the first aspect and any implementation manner thereof is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method of the first aspect and any one of the implementation manners thereof.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when running on a computer device, enables the computer device to implement the method of the first aspect and any implementation manner thereof.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Compared with the prior art, the embodiments of the application have the following advantages: the first audio containing the whistle sound is first segmented, feature extraction is then performed on the resulting first audio segment to obtain its feature vector, and the vehicle is classified according to that feature vector. Segmentation filters out part of the interference, so effective audio features are easier to extract during feature extraction. In other words, splitting a long vehicle whistle into several segments before feature extraction effectively reduces the adverse effect of extraneous factors on feature extraction and improves its accuracy. In addition, intercepting segments reduces the amount of audio data passed to subsequent processing and improves processing efficiency. The scheme can therefore obtain audio features that fully reflect the characteristics of the whistle sounds of different vehicles, and classifying on these features effectively improves classification accuracy.
Drawings
Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a vehicle whistle recognition method according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of one example of step S202.
Fig. 4 is a schematic flow chart of one implementation of step S302.
FIG. 5 is a schematic flow chart diagram of another vehicle whistle recognition method according to the embodiment of the application.
Fig. 6 is a schematic diagram of a vehicle whistle recognition device according to an embodiment of the present application.
Fig. 7 is a schematic view of another vehicle whistle recognition device according to the embodiment of the present application.
Fig. 8 is a schematic view of still another vehicle whistle recognition device according to the embodiment of the present application.
Fig. 9 is a schematic diagram of a whistling audio learning module according to an embodiment of the application.
FIG. 10 is a schematic diagram of an automobile type learning module of an embodiment of the present application.
FIG. 11 is a schematic diagram of an automobile image processing and analyzing module according to an embodiment of the present application.
Fig. 12 is a schematic diagram of an automobile whistle audio processing and analyzing module according to an embodiment of the application.
Fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used to distinguish between descriptions and are not to be understood as indicating or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The vehicle whistle sound identification method provided by the embodiments of the application can be applied to computer devices such as vehicle-mounted devices, Internet-of-Vehicles terminals, road monitoring devices, mobile phones, tablet computers, wearable devices, Augmented Reality (AR)/Virtual Reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, and Personal Digital Assistants (PDAs); the embodiments of the application place no limitation on the specific type of the computer device.
Optionally, the computer device may be a wearable device. A wearable device applies wearable technology to the intelligent design of everyday wearables, such as glasses or helmets. It may be worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not merely hardware; it provides powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include full-featured, larger devices that can realize all or part of their functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on a single application function and must be used together with other devices such as a smartphone, for example various smart bracelets and smart jewelry for monitoring vital signs.
In one example where the wearable device is a helmet, a module disposed in the helmet can be used to perform the vehicle whistle recognition method of the embodiments of the present application. For example, the helmet's audio acquisition device acquires a whistle audio signal, and the processor built into the helmet then performs feature extraction and other operations on the acquired signal to obtain the category of the vehicle. A user wearing the helmet can thus know what type of vehicle whistled.
Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present application. As shown in fig. 1, the scene includes a vehicle a and a vehicle whistle recognition device B, where the vehicle whistle recognition device B collects an audio signal and/or an image signal of the vehicle a, processes the collected signal, and outputs a category of the vehicle a.
It should be understood that, in the embodiments of the present application, the vehicle may be any vehicle capable of whistling, such as a conventional automobile, an electric automobile, a new energy automobile, an intelligent automobile, an electric bicycle, or a motorcycle, including, for example, buses, fire trucks, ambulances, trucks, passenger cars, and private cars, to name but a few.
The audio signal of the vehicle A includes the whistle sound of the vehicle A. In one example, the vehicle A is the vehicle to be classified (i.e., the vehicle to be detected), and the audio signal of the vehicle A is the first audio described herein.
The image signal of the vehicle A is a captured image containing the vehicle A. In one example, the vehicle A is the vehicle to be classified, and the image signal of the vehicle A is the first image described herein.
The vehicle whistle recognition device B in fig. 1 may be provided in any of the above-described computer devices, or may be any of the above-described computer devices.
Fig. 2 is a schematic flow chart of a vehicle whistle recognition method according to an embodiment of the present application. As shown in fig. 2, the method includes:
s201, obtaining a first audio, wherein the first audio comprises a whistling sound of a vehicle.
It should be understood that in step S201 the first audio may be acquired in real time by a microphone or another audio acquisition device capable of capturing an audio signal; it may be read from a storage device; or it may be obtained over a network through a communication interface. The embodiments of the present application place no limitation on this.
In one implementation, a Sound Source Localization (SSL) module may be further used to locate the vehicle in whistle, and then an audio acquisition module is used to acquire the first audio, and a data transmission module is used to transmit the first audio to the processing unit for further processing, for example, the processing in steps S202 and S203.
The sound source positioning module locates a sound source using sound source localization technology: multiple microphones measure the sound signal at different positions in the environment, and because the signal reaches each microphone with a different delay, an algorithm processes the measured signals to obtain the direction of arrival (azimuth and pitch angles) and the distance of the sound source relative to the microphones.
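The patent does not give an algorithm for this module; purely to illustrate the delay-based principle just described, the following Python sketch estimates the time difference of arrival (TDOA) between two microphones using the classical GCC-PHAT method. All function and variable names here are illustrative and are not part of the claimed method.

```python
import numpy as np

def gcc_phat_tdoa(sig_a, sig_b, fs, max_tau=None):
    """Estimate how much later a sound reaches microphone B than A.

    sig_a, sig_b: 1-D numpy arrays with the two microphone signals.
    fs: sampling rate in Hz. Returns the estimated delay in seconds.
    """
    n = len(sig_a) + len(sig_b)
    # Cross-power spectrum, whitened by its magnitude (PHAT weighting).
    spec = np.fft.rfft(sig_a, n=n) * np.conj(np.fft.rfft(sig_b, n=n))
    cc = np.fft.irfft(spec / (np.abs(spec) + 1e-12), n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Rearrange so that zero delay sits at the center of the window.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

Given two or more such pairwise delays and known microphone positions, the azimuth, pitch angle, and, with enough microphones, the distance of the whistling vehicle can be solved for geometrically.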
Optionally, when the first audio is collected, a silencing device may be further utilized to remove the influence of environmental noise and improve the quality of the first audio.
S202, extracting the characteristics of the first audio to obtain the characteristic vector of the first audio.
It should be noted that feature extraction here means extracting the audio features in the audio signal through a feature extraction model, using machine learning or deep learning. Unlike traditional methods that estimate a single parameter of the audio signal, extraction through a feature extraction model yields a much richer set of features reflecting the characteristics of the audio.
The feature extraction model may be a convolutional neural network model, a deep neural network model, or other non-neural network model.
In one implementation, the audio features include audio frequency, audio pitch, and audio intensity.
That is, step S202 mainly extracts the audio features of the first audio by means of feature extraction, and these audio features are included in the feature vector of the first audio.
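As a rough, hand-crafted stand-in for the three audio features named above (audio frequency, audio pitch, and audio sound intensity), a numpy sketch might look as follows. The patent extracts features through a learned feature extraction model, so this fixed computation is only an illustration of what the feature vector could contain.

```python
import numpy as np

def simple_whistle_features(audio, fs):
    """Return crude proxies for frequency, pitch, and sound intensity.

    audio: 1-D numpy array of samples; fs: sampling rate in Hz.
    A learned feature extraction model would replace this entirely.
    """
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / fs)
    dominant_freq = freqs[np.argmax(spectrum)]               # strongest component
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)   # pitch proxy
    rms_intensity = np.sqrt(np.mean(audio ** 2))             # intensity proxy
    return np.array([dominant_freq, centroid, rms_intensity])
```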
In one implementation, step S202 may include the steps shown in fig. 3, that is, the first audio may be segmented, and then feature extraction may be performed on the obtained first audio segment, and the details will be expanded in detail in fig. 3, and will not be repeated here.
And S203, determining the category of the vehicle according to the feature vector of the first audio.
Alternatively, step S203 may be performed by constructing and training a neural network model, which may be referred to as a deep learning model, using a deep learning method. That is, the feature vector of the first audio may be classified using a deep learning model, resulting in a vehicle class. The convolutional neural network model is taken as an example below.
Assuming that step S203 is executed using a convolutional neural network model, the feature vector of the first audio is input into the model, which processes it and outputs the corresponding vehicle category. The training data of the convolutional neural network model includes audio samples of the whistle sounds of different vehicles and a vehicle-category label for each audio sample (which can be understood as the actual vehicle category corresponding to the sample). During training, an audio sample is input into the model to obtain an estimated vehicle category and a confidence; the estimate may or may not match the label, and the model's weight parameters are adjusted according to a loss function constructed from the difference between the estimate and the label, so that the model acquires increasingly accurate classification capability.
The audio sample may be obtained by causing a vehicle of a known vehicle category to whistle in a noise-free environment and then collecting an audio signal as the audio sample, where the known vehicle category is a label of the audio sample.
It should be understood that the above takes a convolutional neural network model only as an example; those skilled in the art may also use other deep learning models, such as a recurrent neural network model, without limitation.
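The training loop described above can be sketched in PyTorch as follows; the feature dimension, number of vehicle categories, and network shape are assumptions made for illustration and are not the patent's actual model.

```python
import torch
import torch.nn as nn

NUM_FEATURES, NUM_CLASSES = 64, 5        # assumed sizes, not from the patent

# A small 1-D CNN standing in for the unspecified classifier.
model = nn.Sequential(
    nn.Unflatten(1, (1, NUM_FEATURES)),  # (batch, 64) -> (batch, 1, 64)
    nn.Conv1d(1, 16, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(16, NUM_CLASSES),
)
loss_fn = nn.CrossEntropyLoss()          # measures estimate-vs-label difference
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(features, labels):
    """One weight update from a batch of whistle feature vectors."""
    optimizer.zero_grad()
    logits = model(features)             # estimated class scores
    loss = loss_fn(logits, labels)       # loss from estimate/label difference
    loss.backward()                      # gradients of the loss
    optimizer.step()                     # adjust the weight parameters
    return loss.item()

# One step on a random stand-in batch of 8 samples.
loss = train_step(torch.randn(8, NUM_FEATURES),
                  torch.randint(0, NUM_CLASSES, (8,)))
```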
It should be noted that, since step S202 may include the steps shown in fig. 3, the feature vector of the first audio segment is extracted; in this case, step S203 determines the category of the vehicle specifically according to the feature vector of the first audio segment. That is, only the feature vector input into the deep learning model differs: it is the feature vector of the first audio when no segmentation is performed and the feature vector of the first audio segment when segmentation is performed.
In the method shown in fig. 2, the first audio containing the whistle sound is segmented, feature extraction is performed on the resulting first audio segment to obtain its feature vector, and the vehicle is classified according to that feature vector. Segmentation filters out part of the interference, so effective audio features are easier to extract and classification accuracy is improved.
Fig. 3 is a schematic flowchart of one example of step S202. As shown in fig. 3, step S202 includes:
s301, segmenting the first audio to obtain a first audio segment.
The first audio segment may be one or more segments; that is, one or more segments may be cut from the first audio. The time length of the first audio segment is less than or equal to that of the first audio. It should be understood that if the time length of the first audio segment equals that of the first audio, this is equivalent to no segmentation, so the segment's time length is generally chosen to be less than that of the first audio. It should also be understood that if the first audio segment is too short, it may contain too few audio features or they may be difficult to extract; therefore, a first audio segment of suitable time length should be cut.
In one implementation, the first audio piece has only one piece, that is, one audio piece is cut from the entire audio signal of the first audio.
Alternatively, the first audio piece may be a piece having an appropriate length located at an intermediate period of the entire piece of the audio signal of the first audio. In general, the audio is stable in the middle part, so that the interference is small when the segment of the middle time period is intercepted.
In one example, assuming the time length of the first audio is A, the time length of the first audio segment is B% × A, where A and B are positive real numbers and B is greater than 0 and less than 100. For example, with A = 10 seconds (s) and B = 30, 40, or 50, the time length of the first audio segment is 3 s, 4 s, or 5 s. In this example, the first audio segment may be cut from between the 2nd and the 8th second. It should be understood that this example only illustrates that an audio segment of suitable length is preferably cut from the middle period; the specific values are not limited, and those skilled in the art can set them according to the actual situation.
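A minimal sketch of this middle-period interception, assuming the first audio is held in a numpy array; the 10-second duration and 40% fraction follow the example values above.

```python
import numpy as np

def cut_middle_segment(audio, fs, fraction=0.4):
    """Cut a segment of fraction * total length from the middle of the audio.

    audio: 1-D numpy array; fs: sampling rate in Hz. The head and tail,
    where interference is most likely, are discarded symmetrically.
    """
    seg_len = int(len(audio) * fraction)
    start = (len(audio) - seg_len) // 2
    return audio[start:start + seg_len]

fs = 16000                                   # assumed sampling rate
first_audio = np.random.randn(10 * fs)       # stand-in for a 10 s recording
first_segment = cut_middle_segment(first_audio, fs)  # keeps the 3rd to 7th second
```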
In another implementation, the first audio segment includes a plurality of segments, that is, a plurality of audio segments are cut from the entire audio signal of the first audio. For each audio segment, reference may be made to the related description of one audio segment, and details are not repeated.
S302, feature extraction is carried out on the first audio segment to obtain a feature vector of the first audio segment.
Step S302 may refer to the related introduction of feature extraction on the first audio in step S202. For example, the first audio segment may be feature-extracted using a feature extraction model, which may likewise be a convolutional neural network model, a deep neural network model, or another non-neural-network model, except that the audio samples used to train the model are audio segments.
In the conventional scheme, the whole audio is processed; taking the first audio as an example, audio parameters are computed over the entire first audio, and because many interference factors are present, the computed parameters are inaccurate. In the scheme of fig. 3, a suitable segment is intercepted for processing, which filters out part of the interference, in particular the head and tail of the audio where interference is most likely, so effective audio features are easier to extract during feature extraction. In other words, splitting a long vehicle whistle into several segments before feature extraction effectively reduces the adverse effect of extraneous factors on feature extraction and improves its accuracy. In addition, intercepting segments reduces the amount of audio data passed to subsequent processing and improves processing efficiency.
Fig. 4 is a schematic flow chart of one implementation of step S302. As shown in fig. 4, step S302 includes:
s401, preprocessing the first audio clip to obtain a second audio clip, wherein the second audio clip is used for representing the preprocessed first audio clip.
Preprocessing the first audio segment mainly removes interference components from it, so that audio features can be extracted better.
In one implementation, step S401 may include performing the following operations: extracting and eliminating fuzzy segments from the first audio segment to obtain the second audio segment; and/or removing noise from the first audio segment to obtain the second audio segment; and/or filtering the first audio segment to obtain the second audio segment.
That is, the preprocessing may include at least one of the three preprocessing methods.
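A sketch of the noise-removal and filtering options, assuming scipy is available; the pass band is an illustrative guess at where car-horn energy concentrates and is not specified by the patent.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_segment(segment, fs, band=(300.0, 3500.0), gate_ratio=0.1):
    """Denoise and band-pass filter a first audio segment.

    segment: 1-D numpy array; fs: sampling rate in Hz.
    band: assumed pass band in Hz for whistle energy.
    """
    # Simple noise gate: zero out samples far below the segment's peak.
    gated = np.where(np.abs(segment) < gate_ratio * np.abs(segment).max(),
                     0.0, segment)
    # Zero-phase Butterworth band-pass filter.
    b, a = butter(4, [band[0], band[1]], btype="bandpass", fs=fs)
    return filtfilt(b, a, gated)
```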
S402, extracting the features of the second audio segment to obtain the feature vector of the second audio segment.
In step S402, reference may be made to the related introduction of feature extraction performed on the first audio segment in step S302, which is not described in detail.
It should be further noted that the methods shown in fig. 3 and fig. 4 are further refinements that achieve better technical effects, and they may be used together: the first audio is segmented, and the resulting audio segment is then preprocessed and feature-extracted. In this case the accuracy of vehicle whistle recognition is further improved, and cutting a single segment also improves processing efficiency. Alternatively, in some cases the steps shown in fig. 3 may be skipped and only the preprocessing steps performed, that is, the first audio is directly preprocessed and feature-extracted; this still improves recognition accuracy to some extent but forgoes the technical effect of segmentation. In still other cases, the first audio may be preprocessed first and the preprocessed audio then segmented, which further improves recognition accuracy while the single-segment interception improves processing efficiency.
Fig. 5 is a schematic flow chart of another vehicle whistle recognition method according to an embodiment of the present application. Fig. 5 can be regarded as an example of the method of fig. 2. As shown in fig. 5, the method includes:
s501, obtaining a first audio, wherein the first audio comprises a vehicle whistling sound.
Step S501 may refer to the related description of step S201 and will not be repeated.
S502, extracting the characteristics of the first audio to obtain the characteristic vector of the first audio.
Step S502 may refer to step S202 and the related descriptions of fig. 3 and 4, and will not be repeated.
S503, acquiring a first image, wherein the first image comprises the vehicle.
In step S503 the first image may be acquired in real time by an image acquisition device, such as a camera or video camera; it may be read from a storage device; or it may be obtained over a network through a communication interface. The embodiments of the present application place no limitation on this.
It should be understood that the execution order of step S501 and step S503 is not limited, and may be executed simultaneously or not simultaneously, and the order is not affected.
S504, feature extraction is carried out on the first image to obtain a feature vector of the first image.
Wherein the feature vector of the first image comprises a category parameter of the vehicle. The category parameter of the vehicle refers to a parameter that can distinguish a category of the vehicle, for example, a shape, a contour, a volume, a structure, and the like of the vehicle.
The feature extraction of the first image can also extract features of the vehicle in the first image through a feature extraction model by using a machine learning or deep learning method. The feature extraction model may be a convolutional neural network model, a deep neural network model, or other non-neural network model.
And S505, determining the category of the vehicle according to the feature vector of the first audio and the feature vector of the first image.
Step S505 may be regarded as an example of one specific implementation of step S203.
In one possible implementation, when determining the category of the vehicle from the feature vector of the first audio and the feature vector of the first image, the following operations may be performed: determining a first candidate category of the vehicle according to the feature vector of the first audio; determining a second candidate category of the vehicle according to the feature vector of the first image; when the first candidate category is the same as the second candidate category, the first candidate category or the second candidate category is determined as a category of the vehicle.
It should be understood that step S505 may determine the category of the vehicle according to the feature vector of the first audio and the feature vector of the first image, or according to the feature vector of the first audio segment and the feature vector of the first image. For simplicity of description, only the former case is described below; for the latter, the feature vector of the first audio merely needs to be replaced by the feature vector of the first audio segment.
In one example, a feature vector of a first audio may be classified using a first deep learning model to obtain a first candidate class, and a feature vector of a first image may be classified using a second deep learning model to obtain a second candidate class. Since the classification of the feature vector for the first audio has been described above, the classification of the feature vector for the first image is mainly described here.
If the step of obtaining the second candidate category is executed by using the second deep learning model, the feature vector of the first image is input into the second deep learning model, and the vehicle category corresponding to the feature vector is output after the second deep learning model performs processing. The training data of the second deep learning model includes image samples of different vehicles and labels of vehicle classes corresponding to each image sample (which can be understood as actual vehicle classes corresponding to the image samples). During training, the image sample is input into the second deep learning model to obtain an estimated value and confidence degree of the vehicle category, the estimated value may be the same as or different from the label, and the weight parameter of the second deep learning model is adjusted according to a loss function constructed by the difference between the estimated value and the label, so that the second deep learning model can have more and more accurate classification capability.
The image sample can be obtained by taking clear images of vehicles of known vehicle classes, which are labels of the image sample, and then using the clear images as the image sample.
It should be noted that a person skilled in the art may select different deep learning models according to the different characteristics of audio and image signals. For example, audio is generally correlated over time, so a recurrent neural network such as a long short-term memory network may be selected as the first deep learning model, while images may require more convolution operations, so a convolutional neural network may be selected as the second deep learning model.
By comparing whether the two candidate categories are consistent, and outputting the category of the vehicle only when they are, the accuracy of vehicle category identification can be further improved. If they are not consistent, steps S502, S504, and S505 may be re-executed, or steps S501 to S505 may be re-executed.
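A minimal sketch of this consistency check; the two category strings stand in for the outputs of the first and second deep learning models, which are assumed to share one label set.

```python
from typing import Optional

def fuse_candidates(audio_category: str, image_category: str) -> Optional[str]:
    """Return the vehicle category only when both branches agree.

    audio_category: first candidate, from the audio feature vector.
    image_category: second candidate, from the image feature vector.
    Returns None on disagreement, signalling that classification
    (or the whole acquisition) should be re-executed.
    """
    if audio_category == image_category:
        return audio_category
    return None

category = fuse_candidates("fire truck", "fire truck")  # -> "fire truck"
```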
In a possible implementation manner, when feature extraction is performed on the first image to obtain a feature vector of the first image, the following operations may be performed: preprocessing the first image to obtain a second image, wherein the second image is used for representing the preprocessed first image; and performing feature extraction on the second image to obtain a feature vector of the second image.
In a possible implementation manner, when the first image is preprocessed to obtain the second image, the following operations may be performed: adjusting at least one of the contrast, sharpness, or pixel parameters of the first image to obtain the second image; and/or performing binarization processing on the first image to obtain the second image.
The preprocessing of the first image is intended to make the features of the vehicle in the first image more prominent. Since the category of a vehicle generally has little correlation with its color, and vehicles of the same category may come in many colors, eliminating color as an image feature (for example through binarization) does not affect the accuracy of subsequent classification.
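A sketch of the image preprocessing options above, assuming OpenCV (cv2) is available; the contrast gain, target size, and Otsu binarization are illustrative choices rather than values from the patent.

```python
import cv2

def preprocess_vehicle_image(first_image):
    """Contrast-adjust, resize, and binarize a BGR vehicle image."""
    # Contrast/brightness adjustment: pixel' = alpha * pixel + beta.
    adjusted = cv2.convertScaleAbs(first_image, alpha=1.3, beta=10)
    # Pixel (resolution) adjustment to a fixed input size.
    resized = cv2.resize(adjusted, (224, 224))
    # Binarization discards color, which carries little category information.
    gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
    _, second_image = cv2.threshold(gray, 0, 255,
                                    cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return second_image
```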
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 6 is a schematic view of a vehicle whistle recognition apparatus according to the embodiment of the present application, corresponding to the vehicle whistle recognition method according to the above embodiment. For convenience of explanation, only portions related to the embodiments of the present application are shown. Referring to fig. 6, the apparatus 1000 includes: an acquisition unit 1001 and a processing unit 1002.
The apparatus 1000 can be used to perform any of the steps of the vehicle whistle recognition method described above. For example, the acquisition unit 1001 may be used to perform step S201; the processing unit 1002 may be configured to perform steps S202 and S203. For another example, the processing unit 1002 may be configured to perform steps S301, S302, S401, and S402. For another example, the acquisition unit 1001 may be configured to perform steps S501 and S503; the processing unit 1002 may be configured to perform steps S502, S504, and S505.
Fig. 7 is a schematic view of another vehicle whistle recognition device according to the embodiment of the present application. Fig. 7 can be regarded as an example of the device shown in fig. 6. The apparatus 1000 as shown in fig. 7 comprises an acquisition unit 1001 and a processing unit 1002.
In one possible implementation, the processing unit 1002 includes: a segmenting module 210, configured to segment the first audio to obtain a first audio segment; the audio feature extraction module 220 is configured to perform feature extraction on the first audio segment to obtain a feature vector of the first audio segment.
In one possible implementation, the audio feature extraction module 220 includes: the first preprocessing submodule 221 is configured to preprocess the first audio segment to obtain a second audio segment, where the second audio segment is used to represent the preprocessed first audio segment; the first extracting sub-module 222 is configured to perform feature extraction on the second audio segment to obtain a feature vector of the second audio segment.
In a possible implementation manner, the first preprocessing sub-module 221 is specifically configured to: extracting and eliminating fuzzy fragments in the first audio fragment to obtain a second audio fragment; and/or, removing noise of the first audio segment, thereby obtaining a second audio segment; and/or filtering the first audio segment to obtain a second audio segment.
In a possible implementation manner, the obtaining unit 1001 is further configured to obtain a first image, where the first image includes a vehicle; the processing unit 1002 further includes: an image feature extraction module 230, configured to perform feature extraction on the first image to obtain a feature vector of the first image, where the feature vector of the first image includes the category parameter of the vehicle; and the classification module 240 is used for determining the category of the vehicle according to the feature vector of the first audio and the feature vector of the first image.
In one possible implementation, the classification module 240 includes: the audio classification submodule 241 is used for determining a first candidate category of the vehicle according to the feature vector of the first audio; an image classification sub-module 242 for determining a second candidate category of the vehicle based on the feature vector of the first image; a category determination sub-module 243 for determining the first candidate category or the second candidate category as the category of the vehicle when the first candidate category is the same as the second candidate category.
In one possible implementation, the image feature extraction module 230 includes: a second preprocessing submodule 231, configured to preprocess the first image to obtain a second image, where the second image is used to represent the preprocessed first image; the second extraction submodule 232 is configured to perform feature extraction on the second image to obtain a feature vector of the second image.
In a possible implementation manner, the second preprocessing submodule 231 is specifically configured to: adjusting at least one parameter of contrast, definition or pixels of the first image to obtain a second image; and/or carrying out binarization processing on the first image to obtain a second image.
Fig. 8 is a schematic view of still another vehicle whistle recognition device according to the embodiment of the present application. Fig. 8 may be regarded as an example of the apparatus shown in fig. 6 or 7.
As shown in fig. 8, the vehicle whistle sound recognition apparatus includes a whistle audio learning module 1, an automobile type learning module 2, an automobile whistle sound source localization module 8, and an automobile appearance image acquisition module 12. The outputs of the whistle audio learning module 1 and the automobile type learning module 2 are connected to the input of the machine learning module 5, the output of the machine learning module 5 is electrically connected to the input of the data storage module 6, and the output of the data storage module 6 is connected to the input of the database 7.
It should be understood that in this example, the machine learning module 5, the data storage module 6, and the database 7 are not part of the vehicle whistle recognition device.
The vehicle whistle recognition device further includes an environmental noise rejection module 9, a car whistle audio acquisition module 10, a data transmission module 11, and a car whistle audio processing and analyzing module 3. The output of the car whistle sound source localization module 8 is connected to the input of the environmental noise rejection module 9, the output of the environmental noise rejection module 9 is connected to the input of the car whistle audio acquisition module 10, the output of the car whistle audio acquisition module 10 is connected to the input of the data transmission module 11, and the output of the data transmission module 11 is connected to the input of the car whistle audio processing and analyzing module 3.
The vehicle whistle recognition device further comprises an automobile image processing and analyzing module 4, an automobile appearance image acquiring module 12, an information comparison and analysis module 13 and an automobile category judging module 14. The output end of the automobile appearance image acquisition module 12 is connected with the input end of the automobile image processing and analyzing module 4, the output ends of the automobile whistle audio processing and analyzing module 3 and the automobile image processing and analyzing module 4 are both connected with the input end of the information comparison and analysis module 13, the information comparison and analysis module 13 is in two-way connection with the database 7, and the output end of the information comparison and analysis module 13 is connected with the input end of the automobile type judging module 14.
In one example, as shown in fig. 9, the whistle audio learning module 1 includes a car type classification module 101, a whistle audio classification module 102, a whistle audio acquisition module 103, a whistle audio feature extraction module 104, and a deep learning module 105. The output end of the automobile type dividing module 101 is connected with the input end of the whistle audio classification module 102, the output end of the whistle audio classification module 102 is connected with the input end of the whistle audio acquisition module 103, the output end of the whistle audio acquisition module 103 is connected with the input end of the whistle audio feature extraction module 104, the output end of the whistle audio feature extraction module 104 is connected with the input end of the deep learning module 105, and the extracted audio features include audio frequency, audio tone and audio sound intensity.
In one example, as shown in fig. 10, the automobile type learning module 2 includes an automobile type dividing module 201, an automobile outline image obtaining module 202, an image feature extracting module 203, an interference feature eliminating module 204, and a deep learning module 205. The output end of the automobile type dividing module 201 is connected with the input end of the automobile outline image obtaining module 202, the output end of the automobile outline image obtaining module 202 is connected with the input end of the image feature extracting module 203, the output end of the image feature extracting module 203 is connected with the input end of the interference feature eliminating module 204, and the output end of the interference feature eliminating module 204 is connected with the input end of the deep learning module 205.
By providing the deep learning module 105, the different whistle sounds of different cars can be studied and their whistle features extracted, so that subsequent comparisons can be performed quickly and system efficiency is improved. The deep learning module 205 likewise learns car appearances and types. When the car type is subsequently judged, the type is first determined from the whistle sound, and the judged result is then verified against the acquired image information; only when both judgments agree is the result taken as final. This bidirectional judgment design greatly improves judgment precision.
In one example, as shown in fig. 11, the automobile image processing and analyzing module 4 includes an image contrast adjustment module 401, an image sharpness adjustment module 402, an image binarization processing module 403, and an image pixel adjustment module 404. The output end of the image contrast adjustment module 401 is connected to the input end of the image sharpness adjustment module 402, the output end of the image sharpness adjustment module 402 is connected to the input end of the image binarization processing module 403, and the output end of the image binarization processing module 403 is connected to the input end of the image pixel adjustment module 404. The acquired image information is thus effectively processed, ensuring the accuracy of subsequent image analysis.
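As a hedged sketch of the module 401-404 pipeline using OpenCV (the patent names the steps but not a library; all parameter values below are assumptions):

```python
import cv2
import numpy as np

def preprocess_vehicle_image(image: np.ndarray, size=(224, 224)) -> np.ndarray:
    """image: BGR color image, e.g. as produced by cv2.imread (an assumption)."""
    contrasted = cv2.convertScaleAbs(image, alpha=1.3, beta=0)     # module 401: contrast
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    sharpened = cv2.filter2D(contrasted, -1, kernel)               # module 402: sharpness
    gray = cv2.cvtColor(sharpened, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU) # module 403: binarization
    return cv2.resize(binary, size)                                # module 404: pixel adjustment
```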
In one example, as shown in fig. 12, the car whistle audio processing and analyzing module 3 includes a car whistle audio segmentation module 301, a car whistle audio fuzzy segment extraction module 302, a fuzzy segment rejection module 303, a car whistle audio denoising module 304, a car whistle audio filtering processing module 305, and a car whistle audio feature extraction module 306. The output end of the car whistle audio segmentation module 301 is connected to the input end of the car whistle audio fuzzy segment extraction module 302, the output end of the car whistle audio fuzzy segment extraction module 302 is connected to the input end of the fuzzy segment rejection module 303, and the output end of the fuzzy segment rejection module 303 is connected to the input end of the car whistle audio denoising module 304. The segmentation period of the car whistle audio segmentation module 301 may be, for example, 3 s, 4 s, or 5 s. The output end of the car whistle audio denoising module 304 is connected to the input end of the car whistle audio filtering processing module 305, and the output end of the car whistle audio filtering processing module 305 is connected to the input end of the car whistle audio feature extraction module 306.
By providing the car whistle audio processing and analyzing module 3, a long whistle sound can be split into multiple segments before feature extraction. This effectively reduces the adverse effect of extraneous factors on feature extraction and improves its accuracy; at the same time, fuzzy whistle segments produced by the splitting can be rejected so that they do not degrade the result.
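The segmentation-and-cleanup flow of modules 301-305 might look like the following sketch, under assumed parameters (3 s segments, an energy threshold standing in for fuzzy-segment detection, and a band-pass filter standing in for denoising and filtering; none of these values are specified by the patent):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def segment_and_clean(audio: np.ndarray, sr: int = 16000,
                      seg_seconds: float = 3.0, energy_thresh: float = 1e-4):
    """Split a long whistle recording into fixed-length segments, reject
    'fuzzy' (low-energy) ones, and band-pass filter the survivors."""
    seg_len = int(seg_seconds * sr)
    segments = [audio[i:i + seg_len] for i in range(0, len(audio), seg_len)]
    clear = [s for s in segments
             if len(s) == seg_len and np.mean(s.astype(float) ** 2) > energy_thresh]
    b, a = butter(4, [300, 3000], btype="band", fs=sr)  # stand-in for modules 304-305
    return [filtfilt(b, a, s) for s in clear]
```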
In one example, the acquisition unit 1001 includes a car whistle audio acquisition module 10 and a data transmission module 11, where the car whistle audio acquisition module 10 is configured to collect an audio signal of a whistle (for ease of understanding, the first audio is taken as an example), and the data transmission module 11 is configured to transmit the first audio to the car whistle audio processing and analyzing module 3. The car whistle audio acquisition module 10 may be a device such as a microphone capable of acquiring an audio signal, and the data transmission module 11 may be a device such as a communication interface or an interface circuit capable of transmitting data. The acquisition unit 1001 may further include an automobile whistle sound source positioning module 8 and an environmental noise rejection module 9, which are used, respectively, for locating the whistling vehicle and for eliminating environmental noise.
In one example, the car whistle audio acquisition module 10 and the data transmission module 11 are configured to perform steps S201 and/or S501.
In one example, the car whistle sound source positioning module 8, the environmental noise rejection module 9, the car whistle audio acquisition module 10, and the data transmission module 11 are configured to perform steps S201 and/or S501.
In one example, the acquisition unit 1001 includes a car outline image acquisition module 12 for acquiring an image (for ease of understanding, the first image is taken as an example), where the car outline image acquisition module 12 may be an image capturing device such as a camera or a video camera. The car outline image acquisition module 12 transmits the acquired first image to the automobile image processing and analyzing module 4.
In one example, the car outline image obtaining module 12 is configured to execute step S503.
In one example, the processing unit 1002 includes a car whistle audio processing and analyzing module 3 for executing step S202 or S502. The car whistle audio processing and analyzing module 3 can also be used for executing the steps in fig. 3 or fig. 4.
In one example, the processing unit 1002 includes a car image processing and analyzing module 4 for executing step S504. The automobile image processing and analyzing module 4 may also be used for performing operations such as feature extraction and preprocessing on the first image. For example, when feature extraction is performed on the first image to obtain a feature vector of the first image, the following operations may be performed: preprocessing the first image to obtain a second image, wherein the second image is used for representing the preprocessed first image; and performing feature extraction on the second image to obtain a feature vector of the second image. For another example, when the first image is preprocessed to obtain the second image, the following operations may be performed: adjusting at least one parameter of contrast, definition or pixels of the first image to obtain a second image; and/or carrying out binarization processing on the first image to obtain a second image.
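For the feature-extraction step on the preprocessed (second) image, one illustrative choice is a normalized intensity histogram; the patent leaves the concrete image feature open, so this sketch is purely an example:

```python
import numpy as np

def image_feature_vector(second_image: np.ndarray, bins: int = 64) -> np.ndarray:
    """Map the preprocessed image to a fixed-length feature vector."""
    hist, _ = np.histogram(second_image, bins=bins, range=(0, 255))
    return hist / (hist.sum() + 1e-12)  # normalize so images of any size compare
```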
In one example, the processing unit 1002 includes the information comparison and analysis module 13 and the automobile type determination module 14, which are combined to execute step S203 or S505.
In one example, the first deep learning model is obtained using the whistling audio learning module 1, and the second deep learning model is obtained using the car type learning module 2. It should be understood that the whistle audio learning module 1 and the car type learning module 2 perform training of the deep learning model in a training phase (learning phase), so that the trained deep learning model has the capability of classifying the vehicle according to the audio signal or classifying the vehicle according to the image.
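A generic sketch of this training (learning) phase follows; PyTorch and the tiny classifier architecture are assumptions, since the patent only requires that some deep learning model learn to map audio or image features to a vehicle category:

```python
import torch
import torch.nn as nn

def train_category_model(features: torch.Tensor, labels: torch.Tensor,
                         num_classes: int, epochs: int = 20) -> nn.Module:
    """features: (N, D) float tensor; labels: (N,) long tensor of categories."""
    model = nn.Sequential(nn.Linear(features.shape[1], 64), nn.ReLU(),
                          nn.Linear(64, num_classes))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss_fn(model(features), labels).backward()
        optimizer.step()
    return model
```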
In one example, the car whistle audio processing and analyzing module 3 includes the car whistle audio segmentation module 301, which may be regarded as an example of the segmentation module 210.
In one example, the car whistle audio processing and analyzing module 3 further includes an audio feature extraction module 220, the audio feature extraction module 220 includes a first preprocessing sub-module 221 and a first extraction sub-module 222, the first preprocessing sub-module 221 includes a car whistle audio fuzzy segment extraction module 302, a fuzzy segment rejection module 303, a car whistle audio denoising module 304 and a car whistle audio filtering processing module 305, and the first extraction sub-module 222 includes a car whistle audio feature extraction module 306.
In one example, the image feature extraction module 230 includes an automobile image processing analysis module 4, and the automobile image processing analysis module 4 can be regarded as an example of the second preprocessing submodule 231, that is, the second preprocessing submodule 231 includes an image contrast adjustment module 401, an image sharpness adjustment module 402, an image binarization processing module 403, and an image pixel adjustment module 404.
It should be noted that fig. 6 to fig. 12 describe the vehicle whistle recognition device of the embodiments of the present application using two different divisions into modules or units: fig. 6 and fig. 7 show one division, and fig. 8 to fig. 12 show another. Although the two divisions differ, each functional module or unit, when executing the steps of the vehicle whistle recognition method of the embodiments of the present application, can implement the functions of that method. In an actual scenario, how the functional modules are divided does not affect the technical effect of the method in the embodiments of the present application, and a person skilled in the art may adopt either of the two divisions or any other division.
Fig. 13 is a schematic structural diagram of a vehicle whistle recognition device/computer device according to an embodiment of the present application. As shown in fig. 13, the vehicle whistle recognition device/computer device 600 of this embodiment comprises: at least one processor 60 (only one is shown in fig. 13), a memory 61, and a computer program 62 stored in the memory 61 and operable on the at least one processor 60. The processor 60 implements the steps in any of the vehicle whistle recognition method embodiments described above when executing the computer program 62.
The vehicle whistle recognition device/computer device 600 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or other computing device. The vehicle whistle recognition device/computer device may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will appreciate that fig. 13 is merely an example of the vehicle whistle recognition device/computer device 600 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components, such as input/output devices and network access devices.
The processor 60 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 61 may, in some embodiments, be an internal storage unit of the vehicle whistle recognition device/computer device 600, such as a hard disk or memory of the vehicle whistle recognition device/computer device 600. In other embodiments, the memory 61 may be an external storage device of the vehicle whistle recognition device/computer device 600, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the vehicle whistle recognition device/computer device 600. Further, the memory 61 may include both an internal storage unit and an external storage device of the vehicle whistle recognition device/computer device 600. The memory 61 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 61 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides a computer device, where the computer device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product which, when run on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by instructing relevant hardware by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the vehicle whistle recognition device/computer device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the above-described apparatus/computer device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A vehicle whistle recognition method is characterized by comprising the following steps:
acquiring a first audio, wherein the first audio comprises a whistle sound of a vehicle;
segmenting the first audio to obtain a first audio segment;
extracting features of the first audio segment to obtain a feature vector of the first audio segment;
determining the category of the vehicle according to the feature vector of the first audio segment.
2. The method as claimed in claim 1, wherein said extracting features of the first audio segment to obtain a feature vector of the first audio segment comprises:
preprocessing the first audio segment to obtain a second audio segment, wherein the second audio segment is used for representing the preprocessed first audio segment;
and extracting features of the second audio segment to obtain a feature vector of the second audio segment.
3. The method of claim 2, wherein the pre-processing the first audio segment to obtain a second audio segment comprises:
extracting and eliminating fuzzy segments in the first audio segment, thereby obtaining the second audio segment; and/or
removing noise from the first audio segment, thereby obtaining the second audio segment; and/or
filtering the first audio segment, thereby obtaining the second audio segment.
4. The method of any of claims 1 to 3, further comprising:
acquiring a first image, wherein the first image comprises the vehicle;
performing feature extraction on the first image to obtain a feature vector of the first image, wherein the feature vector of the first image comprises the category parameters of the vehicle;
the determining the category of the vehicle according to the feature vector of the first audio segment comprises:
determining the category of the vehicle according to the feature vector of the first audio segment and the feature vector of the first image.
5. The method of claim 4, wherein the determining the category of the vehicle from the feature vector of the first audio segment and the feature vector of the first image comprises:
determining a first candidate category of the vehicle according to the feature vector of the first audio segment;
determining a second candidate category of the vehicle according to the feature vector of the first image;
determining the first candidate category or the second candidate category as a category of the vehicle when the first candidate category is the same as the second candidate category.
6. The method of claim 5, wherein said extracting features from said first image to obtain a feature vector of said first image comprises:
preprocessing the first image to obtain a second image, wherein the second image is used for representing the preprocessed first image;
and performing feature extraction on the second image to obtain a feature vector of the second image.
7. The method of claim 6, wherein pre-processing the first image to obtain a second image comprises:
adjusting at least one parameter of contrast, definition or pixels of the first image to obtain a second image; and/or
And carrying out binarization processing on the first image to obtain the second image.
8. A vehicle whistle recognition device characterized by comprising:
the device comprises an acquisition unit and a processing unit, wherein the acquisition unit is used for acquiring a first audio, and the first audio comprises a whistle sound of a vehicle;
the processing unit is used for segmenting the first audio to obtain a first audio segment; extracting features of the first audio segment to obtain a feature vector of the first audio segment; and determining the category of the vehicle according to the feature vector of the first audio segment.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210854185.7A 2022-07-20 2022-07-20 Vehicle whistling sound identification method Pending CN114944152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210854185.7A CN114944152A (en) 2022-07-20 2022-07-20 Vehicle whistling sound identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210854185.7A CN114944152A (en) 2022-07-20 2022-07-20 Vehicle whistling sound identification method

Publications (1)

Publication Number Publication Date
CN114944152A (en) 2022-08-26

Family

ID=82910975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210854185.7A Pending CN114944152A (en) 2022-07-20 2022-07-20 Vehicle whistling sound identification method

Country Status (1)

Country Link
CN (1) CN114944152A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065627A (en) * 2012-12-17 2013-04-24 中南大学 Identification method for horn of special vehicle based on dynamic time warping (DTW) and hidden markov model (HMM) evidence integration
CN106448183A (en) * 2016-11-19 2017-02-22 郑州玄机器人有限公司 An automobile horn-blowing monitor system, apparatus and method
CN109816987A (en) * 2019-01-24 2019-05-28 苏州清听声学科技有限公司 A kind of automobile whistle electronic police enforces the law capturing system and its grasp shoot method
CN110765837A (en) * 2019-08-30 2020-02-07 深圳市元征科技股份有限公司 Method and device for identifying vehicle whistling against regulations and terminal equipment
CN111739542A (en) * 2020-05-13 2020-10-02 深圳市微纳感知计算技术有限公司 Method, device and equipment for detecting characteristic sound
CN112532941A (en) * 2020-11-30 2021-03-19 南京中科声势智能科技有限公司 Vehicle source intensity monitoring method and device, electronic equipment and storage medium
CN113420178A (en) * 2021-07-14 2021-09-21 腾讯音乐娱乐科技(深圳)有限公司 Data processing method and equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115116232A (en) * 2022-08-29 2022-09-27 深圳市微纳感知计算技术有限公司 Voiceprint comparison method, device and equipment for automobile whistling and storage medium
CN115116232B (en) * 2022-08-29 2022-12-09 深圳市微纳感知计算技术有限公司 Voiceprint comparison method, device and equipment for automobile whistling and storage medium

Similar Documents

Publication Publication Date Title
EP3839942A1 (en) Quality inspection method, apparatus, device and computer storage medium for insurance recording
CN111477250B (en) Audio scene recognition method, training method and device for audio scene recognition model
CN110781784A (en) Face recognition method, device and equipment based on double-path attention mechanism
CN110718235B (en) Abnormal sound detection method, electronic device and storage medium
CN108335694B (en) Far-field environment noise processing method, device, equipment and storage medium
CN108172213A (en) Tender asthma audio identification methods, device, equipment and computer-readable medium
CN111639596A (en) Anti-glasses-shielding face recognition method based on attention mechanism and residual error network
CN114418998A (en) Vehicle part defect detection method and system and electronic equipment
CN115116232B (en) Voiceprint comparison method, device and equipment for automobile whistling and storage medium
CN114387977B (en) Voice cutting trace positioning method based on double-domain depth feature and attention mechanism
CN111899760A (en) Audio event detection method and device, electronic equipment and storage medium
CN111091845A (en) Audio processing method and device, terminal equipment and computer storage medium
CN114612987A (en) Expression recognition method and device
CN114944152A (en) Vehicle whistling sound identification method
CN111860253A (en) Multitask attribute identification method, multitask attribute identification device, multitask attribute identification medium and multitask attribute identification equipment for driving scene
CN111539341A (en) Target positioning method, device, electronic equipment and medium
CN111444788A (en) Behavior recognition method and device and computer storage medium
CN113158773B (en) Training method and training device for living body detection model
CN110659631A (en) License plate recognition method and terminal equipment
CN112562727A (en) Audio scene classification method, device and equipment applied to audio monitoring
CN116844567A (en) Depth synthesis audio detection method and system based on multi-feature reconstruction fusion
CN115719428A (en) Face image clustering method, device, equipment and medium based on classification model
CN113761269B (en) Audio recognition method, apparatus and computer readable storage medium
CN117063229A (en) Interactive voice signal processing method, related equipment and system
CN113139561B (en) Garbage classification method, garbage classification device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220826

RJ01 Rejection of invention patent application after publication