CN110706700A - In-vehicle disturbance prevention alarm method and device, server and storage medium - Google Patents

In-vehicle disturbance prevention alarm method and device, server and storage medium

Info

Publication number
CN110706700A
CN110706700A (application CN201910932287.4A; granted publication CN110706700B)
Authority
CN
China
Prior art keywords
vehicle
voice
image
contour
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910932287.4A
Other languages
Chinese (zh)
Other versions
CN110706700B (en)
Inventor
刘均 (Liu Jun)
邹鹏 (Zou Peng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Launch Technology Co Ltd
Original Assignee
Shenzhen Launch Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Launch Technology Co Ltd filed Critical Shenzhen Launch Technology Co Ltd
Priority to CN201910932287.4A priority Critical patent/CN110706700B/en
Publication of CN110706700A publication Critical patent/CN110706700A/en
Application granted granted Critical
Publication of CN110706700B publication Critical patent/CN110706700B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/02 Alarms for ensuring the safety of persons
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Abstract

The embodiments of this application apply to in-vehicle disturbance prevention and alarm, and disclose an in-vehicle disturbance prevention alarm method and device, a server and a storage medium. The method comprises the following steps: the server collects the dialogue content in the vehicle to obtain an in-vehicle voice signal and analyzes it to obtain an in-vehicle voice recognition result; the server acquires and analyzes the in-vehicle image information to obtain an in-vehicle image recognition result; the server determines the in-vehicle disturbance level according to the in-vehicle voice recognition result and the in-vehicle image recognition result; and the server executes a corresponding early-warning operation according to the in-vehicle disturbance level. By adopting the method and device, the situation inside a ride-hailing vehicle can be monitored and judged in real time, harmful behaviors can be stopped in time, and passenger safety is effectively safeguarded.

Description

In-vehicle disturbance prevention alarm method and device, server and storage medium
Technical Field
The application relates to the technical field of voice recognition and image recognition, in particular to a method and a device for preventing and alarming disturbance in a vehicle, a server and a storage medium.
Background
With the rapid development of the mobile internet, people's travel options have become increasingly diverse, and ride-hailing has gradually become a routine part of daily travel. At the same time, many problems in ride-hailing operations have gradually come to light, the most important of which is safety.
At present, ride-hailing vehicles can video-monitor the behavior of the driver and passengers through a camera, but the video data serves only as evidence after an incident has occurred. Ride-hailing vehicles therefore cannot take preventive measures against harassment of drivers or passengers before an incident happens.
Disclosure of Invention
The embodiment of the application provides an in-vehicle disturbance prevention alarm method and device, a server and a storage medium, so that adverse behaviors are prevented from occurring in time, and riding safety is improved.
In a first aspect, an embodiment of the present application provides an in-vehicle disturbance prevention alarm method, including:
acquiring conversation content in the vehicle to obtain a voice signal in the vehicle, and analyzing to obtain a voice recognition result in the vehicle;
obtaining and analyzing image information in the vehicle to obtain an image identification result in the vehicle;
determining the disturbance level in the vehicle according to the speech recognition result in the vehicle and the image recognition result in the vehicle;
and executing corresponding early warning operation according to the disturbance level in the vehicle.
Optionally, the acquiring of the in-vehicle dialogue content obtains an in-vehicle speech signal and analyzes the in-vehicle speech signal to obtain an in-vehicle speech recognition result, and the method specifically includes:
recording the conversation content in the vehicle through a voice acquisition module to obtain a voice signal in the vehicle;
carrying out feature extraction on the collected in-vehicle voice signal to obtain in-vehicle voice features;
inputting the voice features in the vehicle into an acoustic database, wherein the acoustic database comprises the voice features of at least one sensitive word, and judging whether the voice signals in the vehicle contain the sensitive word or not by comparing the voice features with the voice features of at least one sensitive word in the acoustic database;
and determining the in-vehicle voice recognition result according to the judgment result.
Optionally, the in-vehicle speech features include a feature vector sequence composed of feature vectors corresponding to a plurality of speech frames;
the determining whether the in-vehicle voice signal contains the sensitive word by comparing the voice feature with the voice feature of at least one sensitive word in the acoustic database includes:
inputting the feature vector corresponding to each voice frame into a preset sensitive word judgment model to obtain the probability of the existence of the sensitive words in the corresponding voice frame output by the sensitive word judgment model;
and if the in-vehicle voice feature contains a first voice frame, determining that the in-vehicle voice signal contains sensitive words, wherein the probability of the first voice frame containing the sensitive words is greater than or equal to a preset threshold value.
Optionally, the extracting the features of the collected in-vehicle voice signal to obtain the in-vehicle voice features includes:
framing the in-vehicle voice signal, and performing discrete Fourier transform on each voice frame obtained by framing to obtain the frequency spectrum of each voice frame;
calculating the frequency spectrum of each voice frame to obtain an energy spectrum of each voice frame, and filtering the energy spectrum through M Mel band-pass filters to obtain output power spectrums of the M Mel band-pass filters;
and obtaining the static characteristics of each voice frame based on the output power spectrum, calculating a first order difference parameter and a second order difference parameter of the static characteristics to obtain the dynamic characteristics of each voice frame, and calculating the sum of the static characteristics and the dynamic characteristics to obtain a characteristic vector corresponding to each voice frame.
Optionally, the obtaining and analyzing of the image information in the vehicle to obtain the result of the image recognition in the vehicle specifically includes:
acquiring an in-vehicle image;
processing the acquired in-vehicle image by adopting a background subtraction method to obtain a first object contour and a second object contour;
and calculating the distance between the first object contour and the second object contour, and obtaining the in-vehicle image recognition result according to the distance.
Optionally, the processing the in-vehicle image acquired by the vehicle-mounted camera by using the background subtraction method to obtain the first object contour and the second object contour includes:
respectively carrying out gray processing and Gaussian blur smoothing processing on the in-vehicle image and the in-vehicle background image which does not contain the character object to obtain a first in-vehicle image and a first in-vehicle background image;
processing the first vehicle interior image and the first vehicle interior background image by adopting a background subtraction method to obtain a plurality of contour images; obtaining coordinates of contour points and coordinates of feature points on the contour image through a pixel distribution histogram and a pixel gradient distribution diagram of the contour image, wherein the feature points comprise the leftmost side of the head, the leftmost side of the shoulders and the rightmost side of the shoulders;
and calculating a head-shoulder ratio according to coordinates of feature points on the contour image, if the head-shoulder ratio is within a preset range, confirming that the contour image is a human-shaped contour, marking the contour image positioned in a specific area as the first object contour, and marking the rest human-shaped contours as the second object contour.
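The background-subtraction and head-shoulder-ratio steps above can be sketched as follows. This is a minimal toy illustration, not the patent's implementation: the grey-level difference threshold and the head-shoulder-ratio bounds are assumptions, since the patent does not publish its preset values, and real systems would use an image library rather than nested lists.

```python
def background_subtract(frame, background, thresh=30):
    """Toy background subtraction: pixels of the current in-vehicle frame that
    differ from the empty-cabin background by more than `thresh` grey levels
    are marked as foreground (1). The threshold value is an assumption."""
    return [[1 if abs(f - b) > thresh else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def is_human_contour(head_width, shoulder_width, lo=0.4, hi=0.8):
    """Head-shoulder ratio test: a contour is treated as human-shaped when
    head width / shoulder width falls inside a preset range. The bounds
    `lo` and `hi` are illustrative, not taken from the patent."""
    return lo <= head_width / shoulder_width <= hi
```

Contours judged human-shaped would then be labeled first object contour (driver-seat area) or second object contour, as described above.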
Optionally, the calculating a distance between the first object contour and the second object contour, and obtaining the in-vehicle image recognition result according to the distance includes:
within a preset time period, if the number of occurrences of a second in-vehicle image acquired by the vehicle-mounted camera is greater than or equal to a preset number of times, the in-vehicle image recognition result is that limb contact exists, wherein the second in-vehicle image comprises a first object contour and a second object contour whose minimum distance from each other is less than or equal to a preset threshold value.
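The temporal rule in this claim can be sketched as a simple counter over per-frame contour distances. The distance threshold and required frame count below are assumed values for illustration only; the patent leaves both presets unspecified.

```python
def detect_limb_contact(min_distances, dist_thresh=20.0, count_thresh=5):
    """Temporal contact rule sketched from the claim: within the observation
    window, count the frames whose minimum distance between the first and
    second object contours is at or below `dist_thresh`; limb contact is
    reported once that count reaches `count_thresh`. Both parameter values
    are assumptions."""
    close_frames = sum(1 for d in min_distances if d <= dist_thresh)
    return close_frames >= count_thresh
```

Requiring repeated close frames, rather than a single one, avoids flagging incidental contact such as handing over change.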
Optionally, the in-vehicle disturbance levels include very serious, serious, slight and general;
the executing of the corresponding early warning operation according to the disturbance level in the vehicle specifically comprises the following steps:
when the disturbance level in the vehicle is very serious or serious, voice warning is carried out and an alarm is given to a rescue service system;
when the disturbance level in the vehicle is slight, voice warning is carried out;
and when the disturbance level in the vehicle is general, not processing.
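The level-to-operation mapping above can be expressed as a small dispatch function. The level names follow the text; the returned action strings are illustrative placeholders, not a real API.

```python
def warning_action(level):
    """Maps the in-vehicle disturbance level to the early-warning operation
    listed above. Action names are hypothetical labels for: issuing a voice
    warning in the cabin and alerting the rescue service system."""
    if level in ("very serious", "serious"):
        return ["voice_warning", "alarm_rescue_service"]
    if level == "slight":
        return ["voice_warning"]
    return []  # "general": no processing
```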
In a second aspect, an in-vehicle disturbance prevention alarm device is provided for an embodiment of the present application, including:
the in-vehicle voice recognition unit is used for acquiring in-vehicle conversation content to obtain in-vehicle voice signals and analyzing the in-vehicle voice signals to obtain in-vehicle voice recognition results;
the in-vehicle image recognition unit is used for acquiring and analyzing in-vehicle image information to obtain an in-vehicle image recognition result;
the in-vehicle disturbance grade determining unit is used for determining an in-vehicle disturbance grade according to the in-vehicle voice recognition result and the in-vehicle image recognition result;
and the early warning operation unit is used for executing corresponding early warning operation according to the disturbance level in the vehicle.
Optionally, the in-vehicle speech recognition unit acquires and analyzes in-vehicle speech signals by collecting in-vehicle dialogue content, and obtains in-vehicle speech recognition results, which specifically includes:
the voice acquisition subunit is used for recording the conversation content in the vehicle through the in-vehicle voice acquisition module to obtain an in-vehicle voice signal;
the voice feature extraction subunit is used for performing feature extraction on the collected in-vehicle voice signals to obtain in-vehicle voice features; the in-vehicle voice features comprise a feature vector sequence consisting of feature vectors corresponding to a plurality of voice frames;
the voice recognition subunit is used for inputting the in-vehicle voice features into an acoustic database, the acoustic database comprises the voice features of at least one sensitive word, and whether the in-vehicle voice signals contain the sensitive words or not is judged by comparing the voice features with the voice features of at least one sensitive word in the acoustic database;
and the recognition result determining subunit is used for determining the in-vehicle voice recognition result according to the judgment result.
Optionally, the speech recognition subunit is specifically configured to input the feature vector corresponding to each speech frame into a preset sensitive word judgment model, so as to obtain a probability that a corresponding speech frame output by the sensitive word judgment model has a sensitive word;
the voice recognition subunit is specifically configured to, if the in-vehicle voice feature includes a first voice frame, determine that the in-vehicle voice signal includes a sensitive word, where a probability of the first voice frame having the sensitive word is greater than or equal to a preset threshold.
Optionally, the speech feature extraction subunit is specifically configured to perform framing on the in-vehicle speech signal, and perform discrete fourier transform on each speech frame obtained through framing to obtain a frequency spectrum of each speech frame;
the speech feature extraction subunit is specifically configured to calculate a frequency spectrum of each speech frame to obtain an energy spectrum of each speech frame, and filter the energy spectrum through M Mel band-pass filters to obtain output power spectra of the M Mel band-pass filters;
the speech feature extraction subunit is specifically configured to obtain a static feature of each speech frame based on the output power spectrum, calculate a first order difference parameter and a second order difference parameter of the static feature to obtain a dynamic feature of each speech frame, and calculate a sum of the static feature and the dynamic feature to obtain a feature vector corresponding to each speech frame.
Optionally, the in-vehicle image recognition unit specifically includes, in terms of obtaining and analyzing in-vehicle image information and obtaining an in-vehicle image recognition result:
the image acquisition subunit is used for acquiring images in the vehicle;
the object contour acquisition subunit is used for processing the acquired in-vehicle image by adopting a background subtraction method to obtain a first object contour and a second object contour;
and the image recognition result subunit is used for calculating the distance between the first object contour and the second object contour and obtaining the in-vehicle image recognition result according to the distance.
Optionally, the object contour obtaining subunit is specifically configured to perform gray processing and Gaussian blur smoothing processing on the in-vehicle image and the in-vehicle background image that does not include the person object, respectively, to obtain a first in-vehicle image and a first in-vehicle background image;
the object contour obtaining subunit is specifically configured to process the first in-vehicle image and the first in-vehicle background image by using a background subtraction method to obtain a plurality of contour images; obtaining coordinates of contour points and coordinates of feature points on the contour image through a pixel distribution histogram and a pixel gradient distribution diagram of the contour image, wherein the feature points comprise the leftmost side of the head, the leftmost side of the shoulders and the rightmost side of the shoulders;
the object contour obtaining subunit is specifically configured to calculate a head-shoulder ratio according to coordinates of feature points on the contour image, and if the head-shoulder ratio is within a preset range, determine that the contour image is a human-shaped contour, mark the contour image located in a specific area as the first object contour, and mark the remaining human-shaped contours as the second object contour.
Optionally, the image recognition result subunit is specifically configured to: within a preset time period, if the number of occurrences of a second in-vehicle image acquired by the vehicle-mounted camera is greater than or equal to a preset number of times, determine that the in-vehicle image recognition result is that limb contact exists, wherein the second in-vehicle image comprises a first object contour and a second object contour whose minimum distance from each other is less than or equal to a preset threshold value.
Optionally, the in-vehicle disturbance levels include very serious, serious, slight and general.
The early warning operation unit executes corresponding early warning operation according to the disturbance level in the vehicle, and specifically comprises the following steps:
the first early warning subunit is used for carrying out voice warning and giving an alarm to a rescue service system when the disturbance level in the vehicle is very serious or serious;
the second early warning subunit is used for carrying out voice warning when the disturbance level in the vehicle is slight;
and the third early warning subunit is used for not processing the disturbance level in the vehicle when the disturbance level in the vehicle is general.
In a third aspect, an embodiment of the present application provides a server, including a processor, a memory and a transceiver connected to one another, where the memory is used to store a computer program that supports the server in executing the in-vehicle disturbance prevention alarm method, the computer program including program instructions; the processor is configured to call the program instructions to execute the in-vehicle disturbance prevention alarm method as described in the first aspect of the embodiments of the present application.
In a fourth aspect, a storage medium is provided for embodiments of the present application, the storage medium storing a computer program, the computer program comprising program instructions; the program instructions, when executed by a processor, cause the processor to perform a method of in-vehicle disturbance prevention warning as described in an aspect of an embodiment of the present application.
In the embodiments of the application, the server collects the in-vehicle dialogue content to obtain an in-vehicle voice signal and analyzes it to obtain an in-vehicle voice recognition result; the server acquires and analyzes the in-vehicle image information to obtain an in-vehicle image recognition result; the server determines the in-vehicle disturbance level according to the in-vehicle voice recognition result and the in-vehicle image recognition result; and the server executes the corresponding early-warning operation according to the in-vehicle disturbance level. Because the server determines the in-vehicle disturbance level by analyzing the in-vehicle voice signals and images in real time, and then performs the corresponding in-vehicle disturbance prevention alarm operation in time according to the determined level, harmful behaviors can be effectively prevented and passenger safety improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a scene schematic diagram of an in-vehicle disturbance prevention alarm method provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of an in-vehicle disturbance prevention alarm method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an in-vehicle speech recognition method according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of an in-vehicle image recognition method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a first object contour and a second object contour obtained by processing an in-vehicle image according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an in-vehicle disturbance prevention alarm device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Please refer to fig. 1, which is a scene schematic diagram of an in-vehicle disturbance prevention alarm method according to an embodiment of the present application. As shown in fig. 1, a voice acquisition module 101 (such as a microphone) in the vehicle-mounted device end 100 records the dialogue between the passenger and the driver, while a camera 102 in the vehicle acquires in-vehicle image information. The vehicle-mounted device end 100 uploads the collected in-vehicle voice signal and in-vehicle image information to the server 103. The server 103 recognizes the in-vehicle voice signal and the in-vehicle image information, determines the in-vehicle disturbance level according to whether the in-vehicle voice signal contains a sensitive word and whether the in-vehicle image recognition result shows limb contact, and then executes the corresponding early-warning operation according to the determined in-vehicle disturbance level.
For example, if the recognition result is that the in-vehicle voice signal contains a sensitive word but the in-vehicle image shows no limb contact, the server 103 determines the in-vehicle disturbance level as slight and sends a voice warning instruction to the vehicle-mounted device end 100, so that the vehicle-mounted device end 100 issues a voice warning such as "please mind your language" to the occupants. If the recognition result is that the in-vehicle voice signal contains no sensitive words but the in-vehicle image information shows limb contact, the server 103 determines the in-vehicle disturbance level as serious, sends a message that limb contact exists to the ride-hailing monitoring platform 104, and sends a voice warning instruction to the vehicle-mounted device end 100, so that the vehicle-mounted device end 100 issues a voice warning such as "please mind your behavior" to the occupants and dials 110 (the police emergency number), thereby effectively preventing and alarming against harassment or violent behavior in the vehicle.
Please refer to fig. 2, which is a schematic flow chart of an in-vehicle disturbance prevention alarm method according to an embodiment of the present application. As shown in fig. 2, the method embodiment comprises the following steps:
s101, the server collects conversation content in the vehicle to obtain voice signals in the vehicle and analyzes the voice signals to obtain voice recognition results in the vehicle.
Specifically, a specific implementation manner of analyzing the in-vehicle voice signal by the server to obtain the in-vehicle voice recognition result may refer to fig. 3, and fig. 3 is a flowchart of a method for in-vehicle voice recognition provided in the embodiment of the present application. As shown in fig. 3, the embodiment of the method specifically includes the following steps:
s201, recording conversation contents in the vehicle by the server through the voice acquisition module in the vehicle to obtain voice signals in the vehicle.
For example, the in-vehicle voice acquisition module records and samples the dialogue between the passengers and the driver at a frequency of 8 kHz, that is, 8,000 sample points are acquired per second, and uploads the in-vehicle voice signal collected in real time to the server.
S202, the server extracts the features of the collected voice signals in the vehicle to obtain the voice features in the vehicle.
In one possible implementation manner, the server obtains the in-vehicle voice feature of the in-vehicle voice signal by the following steps:
firstly, the server frames the in-vehicle voice signal and performs discrete Fourier transform on each voice frame obtained through framing to obtain the frequency spectrum of each voice frame.
Specifically, since the speech signal has short-term stationarity, i.e., its characteristics remain relatively unchanged over a short time, the server needs to frame the in-vehicle speech signal, dividing it into segments whose characteristic parameters can be analyzed. The server can intercept the in-vehicle speech signal with a finite-length window function to form an analysis frame; the window function zeroes the sample points outside the region to be processed, yielding the current speech frame.
Alternatively, the window function in the embodiment of the present application may be a Hamming window, that is,

ω(n) = 0.54 - 0.46 × cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1

where N is the frame length, typically 256 or 512.
The windowed in-vehicle speech signal Sω(n) corresponding to the n-th sample is then obtained as

Sω(n) = S(n) × ω(n)

where S(n) is the in-vehicle speech signal at time n, i.e., the speech sample value at time n.
Specifically, because of the radiation effect of the lips during human pronunciation, the voice signal loses high-frequency components, and the signal is further attenuated during transmission as the signal rate increases. To obtain a better signal waveform, the attenuated signal needs to be compensated, so pre-emphasis is applied to the windowed in-vehicle speech signal Sω(n) using y(n) = x(n) - a × x(n-1), where x(n) is the windowed speech sample value at time n, a is the pre-emphasis coefficient with a value between 0.9 and 1 (illustratively, a = 0.9375), and y(n) is the pre-emphasized signal. The pre-emphasis can be understood as passing the voice signal through a high-pass filter to compensate the high-frequency components, thereby reducing the high-frequency loss caused by lip pronunciation or microphone recording.
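The windowing and pre-emphasis steps described above can be sketched in a few lines. This is a minimal illustration of the formulas in the text; the hop size and the choice to pass the first sample through unchanged are assumptions not specified by the patent.

```python
import math

def hamming_window(N):
    # Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), n = 0..N-1
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def pre_emphasis(x, a=0.9375):
    # y(n) = x(n) - a * x(n - 1); the first sample is passed through unchanged
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def frame_signal(x, frame_len=256, hop=128):
    # Split the signal into overlapping frames and apply the Hamming window
    w = hamming_window(frame_len)
    return [[x[i + n] * w[n] for n in range(frame_len)]
            for i in range(0, len(x) - frame_len + 1, hop)]
```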
Specifically, because the characteristics of a signal are difficult to observe in the time domain, the signal needs to be transformed into an energy distribution in the frequency domain, where different energy distributions represent the characteristics of different speech signals. Therefore, after windowing and pre-emphasis are performed on the in-vehicle voice signal to obtain each speech frame, a discrete Fourier transform (in practice computed as a fast Fourier transform) must also be performed on each speech frame to obtain its frequency spectrum. Illustratively, the spectrum of each speech frame can be obtained by performing a discrete Fourier transform according to the following formula.
X(k) = Σ_{n=0}^{N-1} x(n) × e^{-j2πnk/N}, k = 0, 1, …, N - 1

where x(n) is the in-vehicle voice signal after windowing and pre-emphasis processing, and N represents the number of Fourier transform points.
Secondly, the server calculates the frequency spectrum of each voice frame to obtain an energy spectrum of each voice frame, and filters the energy spectrum through M Mel band-pass filters to obtain output power spectrums of the M Mel band-pass filters.
Specifically, the server squares the spectral magnitude of each speech frame to obtain its energy spectrum. Because the cochlea acts as a filter bank when the human ear distinguishes speech, filtering it in the logarithmic domain, the Mel frequency f_Mel = 2595 × log10(1 + f/700) is closer to the human auditory mechanism than the linear frequency f. The energy spectrum of each speech frame is therefore passed through a group of Mel-frequency filters (M Mel band-pass filters) to obtain the output power spectra of the M Mel band-pass filters.
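The Mel-frequency mapping above, and a common way to place the M band-pass filters, can be sketched as follows. Uniform spacing on the Mel scale is a standard construction that the patent does not spell out, so treat `mel_filter_centers` as an assumption.

```python
import math

def hz_to_mel(f):
    # f_Mel = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of the mapping above
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(M, f_low, f_high):
    """Center frequencies (Hz) of M triangular Mel band-pass filters spaced
    uniformly on the Mel scale between f_low and f_high (assumed layout)."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    return [mel_to_hz(lo + (hi - lo) * (i + 1) / (M + 1)) for i in range(M)]
```

Note that the centers crowd together at low frequencies and spread out at high frequencies, mirroring the ear's resolution.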
And thirdly, obtaining the static characteristics of each voice frame based on the output power spectrum, calculating a first order difference parameter and a second order difference parameter of the static characteristics by the server to obtain the dynamic characteristics of each voice frame, and calculating the sum of the static characteristics and the dynamic characteristics to obtain a characteristic vector corresponding to each voice frame.
Specifically, the server takes the logarithm of the output power spectrum and then applies a discrete cosine transform to obtain a number of MFCCs (Mel-Frequency Cepstral Coefficients), i.e., the static features, typically 12 to 16 coefficients. The static features can be calculated by the following formula:
C(i) = sqrt(2/M) × Σ_{k=1}^{M} log X(k) × cos(i × (k - 0.5) × π / M), i = 1, 2, …, L

where X(k) is the output power spectrum of the k-th Mel band-pass filter, M is the number of filters, L is the number of cepstral coefficients, and C0 is the spectral energy.
And then, the server performs first-order and second-order difference on the static characteristics and the spectral energy to obtain dynamic characteristics, and sums the static characteristics and the dynamic characteristics to obtain a characteristic vector corresponding to each voice frame.
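The first-order difference step can be sketched with the common regression formula over a window of ±K frames. K = 2 and the edge-clamping behavior are assumptions; the text only states that first- and second-order difference parameters are computed.

```python
def delta(static, K=2):
    """First-order difference (dynamic) features for a sequence of per-frame
    static feature values, using the standard regression formula; frames
    beyond the sequence edges are clamped to the nearest valid frame."""
    T = len(static)
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    return [sum(k * (static[min(t + k, T - 1)] - static[max(t - k, 0)])
                for k in range(1, K + 1)) / denom
            for t in range(T)]
```

The second-order difference parameters are then simply `delta(delta(static))`, applied per coefficient.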
S203, the server inputs the in-vehicle voice features into an acoustic database, the acoustic database comprises voice features of at least one sensitive word, and whether the in-vehicle voice signal contains the sensitive word or not is judged by comparing the voice features with the voice features of the at least one sensitive word in the acoustic database.
In a possible implementation manner, the in-vehicle speech feature includes a feature vector sequence composed of feature vectors corresponding to a plurality of speech frames;
the determining whether the in-vehicle voice signal contains the sensitive word by comparing the voice feature with the voice feature of at least one sensitive word in the acoustic database includes:
inputting the feature vector corresponding to each voice frame into a preset sensitive word judgment model to obtain the probability of the existence of the sensitive words in the corresponding voice frame output by the sensitive word judgment model;
and if the in-vehicle voice feature contains a first voice frame, determining that the in-vehicle voice signal contains sensitive words, wherein the probability of the first voice frame containing the sensitive words is greater than or equal to a preset threshold value.
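The per-frame thresholding decision described above reduces to a simple check; the following sketch assumes the judgment model's per-frame probabilities are already available:

```python
def contains_sensitive_word(frame_probs, threshold=0.6):
    """Flag the in-vehicle voice signal as soon as any frame's
    sensitive-word probability (output of the judgment model)
    reaches the preset threshold."""
    return any(p >= threshold for p in frame_probs)

# Third frame's probability 0.7 reaches the threshold 0.6.
flagged = contains_sensitive_word([0.2, 0.4, 0.7])
```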
The acoustic database includes at least one sensitive word, for example abusive terms or phrases such as "kiss", "harassment", or "follow you", together with the speech features corresponding to each sensitive word.
Specifically, the server trains a Hidden Markov Model (HMM) on the speech features of each fixed sensitive word to obtain an HMM for that sensitive word, and then connects the per-word HMMs globally to form a global HMM model, i.e. the acoustic database.
In a possible implementation manner, the server inputs the feature vector of each speech frame of the in-vehicle speech signal into the global HMM model, obtains the probability of a sensitive word being present in the corresponding speech frame through the Viterbi algorithm, and determines that the in-vehicle speech signal contains a sensitive word if the probability for any speech frame is greater than or equal to a preset threshold. For example, the server processes the in-vehicle speech signal into a feature vector sequence of 100 speech frames, inputs the sequence into the global HMM model, and obtains per-frame sensitive-word probabilities of 0.2, 0.4, 0.7, … through the Viterbi algorithm; since the probability 0.7 of the third speech frame is greater than the preset threshold 0.6, it is determined that the in-vehicle speech signal contains a sensitive word.
Further, in another possible implementation manner, the server inputs the feature vectors of each speech frame of the in-vehicle speech signal into the global HMM model, finds the optimal word output generated by each speech frame through the Viterbi algorithm, and searches out an optimal state path. If the optimal state path contains a subsequence such that every state in the subsequence is a state in some sensitive word's HMM, the server confirms that the in-vehicle speech signal contains a sensitive word and obtains the content of that sensitive word. Step S204 is then performed.
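The Viterbi search mentioned above can be illustrated with a minimal decoder for a discrete HMM. The toy two-state model below is purely hypothetical (it is not the patent's trained sensitive-word HMMs); it only shows how the most likely state path is recovered:

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Minimal Viterbi decoder: returns the most likely state path
    for an observation sequence under a discrete HMM."""
    n_states = trans_p.shape[0]
    T = len(obs)
    delta = np.zeros((T, n_states))           # best path probability so far
    psi = np.zeros((T, n_states), dtype=int)  # backpointers
    delta[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * trans_p[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * emit_p[j, obs[t]]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):             # backtrack along the pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Hypothetical 2-state model: state 1 (a "sensitive-word" state) prefers observation 1.
start = np.array([0.8, 0.2])
trans = np.array([[0.7, 0.3], [0.4, 0.6]])
emit = np.array([[0.9, 0.1], [0.2, 0.8]])
best_path = viterbi([0, 1, 1], start, trans, emit)   # -> [0, 1, 1]
```

In the patent's scheme the states would come from the concatenated per-word HMMs, and the presence of a sensitive word is read off from whether the decoded path passes through that word's states.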
And S204, the server determines the in-vehicle voice recognition result according to the judgment result.
Specifically, if the judgment result in step S203 is that a sensitive word exists, the server obtains the in-vehicle voice recognition result that a sensitive word is present; if the judgment result in step S203 is that no sensitive word exists, the server obtains the in-vehicle voice recognition result that no sensitive word is present.
And S102, the server acquires and analyzes the image information in the vehicle to obtain the image identification result in the vehicle.
Specifically, a specific implementation manner of analyzing the in-vehicle image information by the server to obtain the in-vehicle image recognition result may refer to fig. 4, and fig. 4 is a flowchart of a method for recognizing the in-vehicle image according to an embodiment of the present application. As shown in fig. 4, the embodiment of the method specifically includes the following steps:
s301, the server collects images in the vehicle.
Specifically, the server collects scenes in the vehicle at a certain frequency through a vehicle-mounted camera in the vehicle-mounted equipment terminal, so as to obtain images in the vehicle. For example, the vehicle-mounted camera acquires an in-vehicle scene every 5 seconds to obtain a corresponding in-vehicle image.
S302, the server processes the acquired in-vehicle image by adopting a background subtraction method to obtain a first object outline and a second object outline.
In a possible implementation manner, the server performs gray processing and gaussian fuzzy smoothing processing on the in-vehicle image and the in-vehicle background image which does not contain the person object respectively to obtain a first in-vehicle image and a first in-vehicle background image;
processing the first in-vehicle image and the first in-vehicle background image by a background subtraction method to obtain a plurality of contour images; obtaining the coordinates of contour points and of feature points on the contour image through the pixel distribution histogram and the pixel gradient distribution diagram of the contour image, wherein the feature points comprise the leftmost and rightmost sides of the head and the leftmost and rightmost sides of the shoulders;
and calculating a head-shoulder ratio according to coordinates of feature points on the contour image, if the head-shoulder ratio is within a preset range, confirming that the contour image is a human-shaped contour, marking the contour image positioned in a specific area as the first object contour, and marking the rest human-shaped contours as the second object contour.
Wherein the first object contour is a driver contour and the second object contour is a passenger contour.
Specifically, the server selects an in-vehicle scene image containing no person as the in-vehicle background image, and then performs grayscale processing on the in-vehicle background image and on the in-vehicle image acquired by the vehicle-mounted camera. Since image pixel information can be represented by RGB information, both images can be converted using the formula Gray = 0.3R + 0.6G + 0.1B, where Gray is the gray value of a pixel and R, G, and B are the pixel's RGB components. The server then performs Gaussian blur smoothing on the gray-processed in-vehicle image and in-vehicle background image to obtain a first in-vehicle image I_i and a first in-vehicle background image I_0, which effectively denoises both images.
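The grayscale conversion and Gaussian smoothing described above might be sketched as follows; the kernel radius and sigma are illustrative choices (in practice a library routine such as OpenCV's GaussianBlur would typically be used):

```python
import numpy as np

def to_gray(rgb):
    """Grayscale conversion with the weights given in the text:
    Gray = 0.3 R + 0.6 G + 0.1 B."""
    return 0.3 * rgb[..., 0] + 0.6 * rgb[..., 1] + 0.1 * rgb[..., 2]

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian smoothing (minimal NumPy sketch)."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()                      # normalize the 1-D kernel
    pad = np.pad(img, radius, mode='edge')
    # Convolve rows, then columns (separability of the Gaussian).
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'valid'), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, 'valid'), 0, rows)

img = np.random.randint(0, 256, (48, 64, 3)).astype(float)  # illustrative RGB frame
gray = to_gray(img)          # gray-processed image
smooth = gaussian_blur(gray) # first in-vehicle image I_i (after denoising)
```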
The server processes the first in-vehicle image I_i and the first in-vehicle background image I_0 by the background subtraction method to obtain a contour image I = |I_i − I_0| containing a plurality of contours, where I_i is the i-th in-vehicle image acquired by the vehicle-mounted camera after grayscale processing and Gaussian smoothing. Then the number of pixels with gray value 255 is counted along the horizontal and vertical directions of the contour image I to obtain pixel distribution histograms in both directions, and the pixel gradient distributions in both directions are computed, yielding the coordinates of the contour points and of the feature points on the contour image I, where the feature points comprise the leftmost and rightmost sides of the head and the leftmost and rightmost sides of the shoulders.
Then the server calculates the head-shoulder ratio with the formula y = y1/y2, where y1 may be the distance between the leftmost and rightmost points of the head, and y2 the distance between the leftmost and rightmost points of the shoulders. If the calculated head-shoulder ratio falls within the preset humanoid-contour range, the contour is determined to be a human-shaped contour; the human-shaped contour located in a specific area is marked as the driver contour, and the remaining human-shaped contours in the contour image are marked as passenger contours. For example, if the vehicle-mounted camera is positioned in front of the driver and the passenger, the human-shaped contour on the right side of the contour image is the driver contour and the remaining human-shaped contours are passenger contours.
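The background subtraction I = |I_i − I_0| and the head-shoulder ratio test y = y1/y2 can be sketched as below; the binarization threshold and the "preset range" for the ratio are hypothetical placeholders, since the text does not give their values:

```python
import numpy as np

def foreground_mask(frame_gray, background_gray, thresh=30):
    """Background subtraction as in the text: I = |I_i - I_0|,
    then binarize (255 = foreground pixel)."""
    diff = np.abs(frame_gray - background_gray)
    return np.where(diff > thresh, 255, 0)

def is_humanoid(head_left_x, head_right_x, shoulder_left_x, shoulder_right_x,
                ratio_range=(0.4, 0.8)):
    """Head-shoulder ratio test: y = y1 / y2, with y1 the head width and
    y2 the shoulder width. ratio_range stands in for the text's
    unspecified 'preset range'."""
    y1 = head_right_x - head_left_x
    y2 = shoulder_right_x - shoulder_left_x
    return ratio_range[0] <= y1 / y2 <= ratio_range[1]

mask = foreground_mask(np.full((4, 4), 200.0), np.full((4, 4), 50.0))
human = is_humanoid(10, 22, 5, 29)   # head width 12, shoulder width 24, ratio 0.5
```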
S303, the server calculates the distance between the first object outline and the second object outline, and obtains an image recognition result according to the distance.
Wherein the image recognition result comprises the existence of limb contact and the nonexistence of limb contact.
In a possible implementation manner, within a preset time, if the number of occurrences of a second in-vehicle image acquired by the vehicle-mounted camera is greater than or equal to a preset number, the image recognition result is that limb contact exists, where a second in-vehicle image is one in which the minimum distance between the first object contour and the second object contour is less than or equal to a preset threshold.
For example, after the server performs step S302, a first object contour and a second object contour are obtained; please refer to fig. 5, which is a schematic diagram of the first object contour and the second object contour obtained by processing the in-vehicle image. As shown in fig. 5, A may be the first object contour, i.e., the driver contour, and B the second object contour, i.e., the passenger contour. For horizontal lines at different heights, the server calculates the distances between the points a(i), a(i+1) where a line intersects contour A and the points b(i), b(i+1) where the same line intersects contour B. If the minimum such distance, here between point a(9) and point b(8), is 0 and thus no greater than the preset threshold, the in-vehicle image is determined to be a second in-vehicle image; and if within the preset time of 20 seconds the second in-vehicle image appears 3 times, which is greater than the preset number 2, the image recognition result is that limb contact exists. It can be understood that, among the t·f in-vehicle images acquired by the camera at frequency f within the preset time t, the server counts the images in which the minimum distance between the first object contour and the second object contour is no greater than the preset threshold; if this count reaches the preset number, the image recognition result is that limb contact exists. This effectively filters out inadvertent limb contact between people in the vehicle, reduces misjudgments, and improves the correct recognition rate of limb contact.
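The counting rule above can be sketched directly; the distance threshold and the preset number are illustrative, and the per-frame minimum contour distances are assumed to be precomputed:

```python
def limb_contact_detected(min_distances, dist_thresh=0.0, min_count=2):
    """min_distances: for each image captured within the preset time, the
    minimum distance between the two contours. An image counts as a
    'second in-vehicle image' when its distance is at or below the
    threshold; limb contact is reported once enough such images occur."""
    hits = sum(1 for d in min_distances if d <= dist_thresh)
    return hits >= min_count

# 20 s window at one frame per 5 s -> 4 frames; 3 contact frames > preset number 2.
result = limb_contact_detected([0.0, 5.0, 0.0, 0.0])
```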
S103, the server determines the disturbance level in the vehicle according to the in-vehicle voice recognition result and the in-vehicle image recognition result.
The in-vehicle disturbance level comprises very serious, serious, slight, and general.
Specifically, the server compares the in-vehicle voice recognition result and the in-vehicle image recognition result with a preset in-vehicle disturbance level comparison table (shown in table 1), and then determines the in-vehicle disturbance level.
TABLE 1 comparison table of disturbance levels in vehicle
In-vehicle voice recognition result | In-vehicle image recognition result | In-vehicle disturbance level
Sensitive word present             | Limb contact present                | Very serious
No sensitive word                  | Limb contact present                | Serious
Sensitive word present             | No limb contact                     | Slight
No sensitive word                  | No limb contact                     | General
For example, if the server performs step S101 and step S102 to obtain that the in-vehicle speech recognition result indicates that there is no sensitive word and the in-vehicle image recognition result indicates that there is a body contact, the in-vehicle disturbance level is obtained as serious by comparing a preset in-vehicle disturbance level comparison table (shown in table 1).
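The lookup in Table 1 can be expressed as a small mapping from the two recognition results to the disturbance level, reconstructed from the four cases given in the examples:

```python
# (sensitive word present?, limb contact present?) -> in-vehicle disturbance level
LEVELS = {
    (True,  True):  "very serious",
    (False, True):  "serious",
    (True,  False): "slight",
    (False, False): "general",
}

def disturbance_level(has_sensitive_word, has_contact):
    """Table-1 lookup combining the voice and image recognition results."""
    return LEVELS[(has_sensitive_word, has_contact)]

level = disturbance_level(False, True)   # matches the example: "serious"
```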
And S104, the server executes corresponding early warning operation according to the disturbance level in the vehicle.
In a possible implementation manner, the server executes a corresponding early warning operation according to the in-vehicle disturbance level, and specifically includes:
when the disturbance level in the vehicle is very serious or serious, voice warning is carried out and an alarm is given to a rescue service system;
when the disturbance level in the vehicle is slight, voice warning is carried out;
and when the disturbance level in the vehicle is general, not processing.
For example, if the in-vehicle disturbance level obtained in step S103 is very serious, that is, the in-vehicle voice recognition result is that a sensitive word exists and the in-vehicle image recognition result is that limb contact exists, the server sends the message "a sensitive word and limb contact exist in the vehicle" to the online ride-hailing monitoring platform and simultaneously sends a voice warning instruction to the vehicle-mounted device end, so that the vehicle-mounted device end issues the voice warning "please mind your words and behavior" to the personnel in the vehicle while dialing 110.
For another example, if the in-vehicle disturbance level obtained in step S103 is serious, that is, the in-vehicle voice recognition result is that no sensitive word exists and the in-vehicle image recognition result is that limb contact exists, the server sends the message "limb contact exists in the vehicle" to the online ride-hailing monitoring platform and simultaneously sends a voice warning instruction to the vehicle-mounted device end, so that the vehicle-mounted device end issues the voice warning "please mind your behavior" to the personnel in the vehicle while dialing 110.
For another example, if the in-vehicle disturbance level obtained in step S103 is slight, that is, the in-vehicle voice recognition result is that a sensitive word exists and the in-vehicle image recognition result is that no limb contact exists, the server sends a voice warning instruction to the vehicle-mounted device end, so that the vehicle-mounted device end issues the voice warning "please use civil language" to the personnel in the vehicle.
For another example, if the in-vehicle disturbance level obtained in step S103 is general, that is, the in-vehicle voice recognition result is that no sensitive word exists and the in-vehicle image recognition result is that no limb contact exists, the server performs no operation.
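The per-level early-warning dispatch described in the examples above can be sketched as follows; the action strings and the alarm hook are illustrative placeholders, not the patent's actual platform interfaces:

```python
def warn(level):
    """Return the early-warning actions for a disturbance level.
    'alarm rescue service' stands in for alerting the rescue service
    system (e.g. dialing 110)."""
    actions = []
    if level in ("very serious", "serious"):
        actions.append("voice warning")
        actions.append("alarm rescue service")
    elif level == "slight":
        actions.append("voice warning")
    return actions   # "general" -> no processing

acts = warn("serious")   # voice warning plus rescue-service alarm
```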
In the embodiment of the application, the server collects the in-vehicle conversation content to obtain an in-vehicle voice signal and analyzes it to obtain an in-vehicle voice recognition result; the server acquires and analyzes the in-vehicle image information to obtain an in-vehicle image recognition result; the server determines the in-vehicle disturbance level according to the in-vehicle voice recognition result and the in-vehicle image recognition result; and the server executes the corresponding early warning operation according to the in-vehicle disturbance level. Because the server determines the in-vehicle disturbance level by analyzing the in-vehicle voice signals and images in real time, and then performs the corresponding disturbance prevention alarm operation in time, bad behaviors can be effectively prevented and the safety of passengers is improved.
Please refer to fig. 6, which is a schematic structural diagram of an in-vehicle disturbance prevention alarm device according to an embodiment of the present application. As shown in fig. 6, the in-vehicle disturbance prevention alarm device includes an in-vehicle voice recognition unit 601, an in-vehicle image recognition unit 602, an in-vehicle disturbance level determination unit 603, and an early warning operation unit 604.
The in-vehicle voice recognition unit 601 is used for acquiring in-vehicle conversation content to obtain in-vehicle voice signals and analyzing the in-vehicle voice signals to obtain in-vehicle voice recognition results;
the in-vehicle image recognition unit 602 is configured to obtain and analyze in-vehicle image information to obtain an in-vehicle image recognition result;
an in-vehicle disturbance level determination unit 603, configured to determine an in-vehicle disturbance level according to the in-vehicle voice recognition result and the in-vehicle image recognition result;
and the early warning operation unit 604 is configured to execute a corresponding early warning operation according to the in-vehicle disturbance level.
Optionally, the in-vehicle speech recognition unit 601 acquires and analyzes an in-vehicle speech signal by collecting in-vehicle dialogue content, and specifically includes:
a voice acquisition subunit 6011, configured to record the in-vehicle conversation content through the in-vehicle voice acquisition module, so as to obtain an in-vehicle voice signal;
a voice feature extraction subunit 6012, configured to perform feature extraction on the collected in-vehicle voice signal to obtain an in-vehicle voice feature; the in-vehicle voice features comprise a feature vector sequence consisting of feature vectors corresponding to a plurality of voice frames;
a voice recognition subunit 6013, configured to input the in-vehicle voice feature into an acoustic database, where the acoustic database includes a voice feature of at least one sensitive word, and determine whether the in-vehicle voice signal includes the sensitive word by comparing the voice feature with the voice feature of the at least one sensitive word in the acoustic database;
a recognition result determining subunit 6014, configured to determine the in-vehicle speech recognition result according to the determination result.
Optionally, the speech recognition subunit 6013 is specifically configured to input the feature vector corresponding to each speech frame into a preset sensitive word judgment model, so as to obtain a probability that a corresponding speech frame output by the sensitive word judgment model has a sensitive word;
the speech recognition subunit 6013 is specifically configured to, if the in-vehicle speech feature includes a first speech frame, determine that the in-vehicle speech signal includes a sensitive word, where a probability that the sensitive word exists in the first speech frame is greater than or equal to a preset threshold.
Optionally, the speech feature extraction subunit 6012 is specifically configured to perform framing on the in-vehicle speech signal, and perform discrete fourier transform on each speech frame obtained through framing to obtain a frequency spectrum of each speech frame;
the speech feature extraction subunit 6012 is specifically configured to calculate a frequency spectrum of each speech frame to obtain an energy spectrum of each speech frame, and filter the energy spectrum through M Mel band-pass filters to obtain output power spectra of the M Mel band-pass filters;
the speech feature extraction subunit 6012 is specifically configured to obtain a static feature of each speech frame based on the output power spectrum, calculate a first order difference parameter and a second order difference parameter of the static feature to obtain a dynamic feature of each speech frame, and calculate a sum of the static feature and the dynamic feature to obtain a feature vector corresponding to each speech frame.
Optionally, the in-vehicle image recognition unit 602 specifically includes, in terms of obtaining and analyzing in-vehicle image information to obtain an in-vehicle image recognition result:
the image acquisition subunit 6021 is used for acquiring images in the vehicle;
an object contour acquisition subunit 6022, configured to process the acquired in-vehicle image by using a background subtraction method to obtain a first object contour and a second object contour;
an image recognition result subunit 6023, configured to calculate a distance between the first object contour and the second object contour, and obtain the in-vehicle image recognition result according to the distance.
Optionally, the object contour obtaining subunit 6022 is specifically configured to perform gray processing and gaussian blur smoothing processing on the in-vehicle image and the in-vehicle background image that does not include the human object, respectively, to obtain a first in-vehicle image and a first in-vehicle background image;
the object contour obtaining subunit 6022 is specifically configured to process the first in-vehicle image and the first in-vehicle background image by a background subtraction method to obtain a plurality of contour images, and to obtain the coordinates of contour points and of feature points on the contour image through the pixel distribution histogram and the pixel gradient distribution diagram of the contour image, wherein the feature points comprise the leftmost and rightmost sides of the head and the leftmost and rightmost sides of the shoulders;
the object contour acquiring subunit 6022 is specifically configured to calculate a head-shoulder ratio according to coordinates of feature points on the contour image, and if the head-shoulder ratio is within a preset range, determine that the contour image is a human-shaped contour, mark the contour image located in a specific area as the first object contour, and mark the remaining human-shaped contours as the second object contour.
Optionally, the image recognition result subunit 6023 is specifically configured to: within a preset time, if the number of occurrences of a second in-vehicle image acquired by the vehicle-mounted camera is greater than or equal to a preset number, the in-vehicle image recognition result is that limb contact exists, where a second in-vehicle image is one in which the minimum distance between the first object contour and the second object contour is less than or equal to a preset threshold.
Optionally, the in-vehicle disturbance level includes very serious, serious, slight, and general.
The early warning operation unit 604 specifically includes, in terms of executing corresponding early warning operation according to the in-vehicle disturbance level:
a first warning subunit 6041, configured to, when the level of the in-vehicle disturbance is very serious or serious, perform a voice warning and alarm a rescue service system;
a second warning subunit 6042, configured to perform a voice warning when the level of the in-vehicle disturbance is slight;
and a third warning subunit 6043, configured to perform no processing when the in-vehicle disturbance level is general.

It is understood that the in-vehicle disturbance prevention alarm device 600 is used to implement the steps executed by the server in the embodiment of fig. 2. For the specific implementation and the corresponding beneficial effects of the functional blocks included in the in-vehicle disturbance prevention alarm device 600 in fig. 6, reference may be made to the specific description of the embodiment in fig. 2, which is not repeated here.
The in-vehicle disturbance prevention alarm device 600 in the embodiment shown in fig. 6 described above may be implemented by the server 700 shown in fig. 7. Please refer to fig. 7, which provides a schematic structural diagram of a server according to an embodiment of the present application. As shown in fig. 7, the server 700 may include: one or more processors 701, a memory 702, and a transceiver 703. The processor 701, the memory 702, and the transceiver 703 are connected by a bus 704. The transceiver 703 is configured to acquire an in-vehicle audio signal and an in-vehicle image or send an audio warning command, and the memory 702 is configured to store a computer program, where the computer program includes a program command; the processor 701 is configured to execute the program instructions stored in the memory 702 to perform the following operations:
acquiring conversation content in the vehicle to obtain a voice signal in the vehicle, and analyzing to obtain a voice recognition result in the vehicle;
obtaining and analyzing image information in the vehicle to obtain an image identification result in the vehicle;
determining the disturbance level in the vehicle according to the speech recognition result in the vehicle and the image recognition result in the vehicle;
and executing corresponding early warning operation according to the disturbance level in the vehicle.
Optionally, the processor 701 acquires and analyzes in-vehicle dialogue content to obtain an in-vehicle speech signal, and obtains an in-vehicle speech recognition result, and specifically performs the following operations:
recording the conversation content in the vehicle through a voice acquisition module to obtain a voice signal in the vehicle;
carrying out feature extraction on the collected in-vehicle voice signal to obtain in-vehicle voice features;
inputting the voice features in the vehicle into an acoustic database, wherein the acoustic database comprises the voice features of at least one sensitive word, and judging whether the voice signals in the vehicle contain the sensitive word or not by comparing the voice features with the voice features of at least one sensitive word in the acoustic database;
and determining the in-vehicle voice recognition result according to the judgment result. Optionally, the in-vehicle speech features include a feature vector sequence composed of feature vectors corresponding to a plurality of speech frames. Optionally, the processor 701 compares the voice feature with a voice feature of at least one sensitive word in the acoustic database to determine whether the in-vehicle voice signal contains the sensitive word, and specifically performs the following operations:
inputting the feature vector corresponding to each voice frame into a preset sensitive word judgment model to obtain the probability of the existence of the sensitive words in the corresponding voice frame output by the sensitive word judgment model;
and if the in-vehicle voice feature contains a first voice frame, determining that the in-vehicle voice signal contains sensitive words, wherein the probability of the first voice frame containing the sensitive words is greater than or equal to a preset threshold value.
Optionally, the processor 701 performs feature extraction on the collected in-vehicle voice signal to obtain in-vehicle voice features, and specifically performs the following operations:
framing the in-vehicle voice signal, and performing discrete Fourier transform on each voice frame obtained by framing to obtain the frequency spectrum of each voice frame;
calculating the frequency spectrum of each voice frame to obtain an energy spectrum of each voice frame, and filtering the energy spectrum through M Mel band-pass filters to obtain output power spectrums of the M Mel band-pass filters;
and obtaining the static characteristics of each voice frame based on the output power spectrum, calculating a first order difference parameter and a second order difference parameter of the static characteristics to obtain the dynamic characteristics of each voice frame, and calculating the sum of the static characteristics and the dynamic characteristics to obtain a characteristic vector corresponding to each voice frame.
Optionally, the processor 701 obtains and analyzes in-vehicle image information to obtain an in-vehicle image recognition result, and specifically performs the following operations:
acquiring an in-vehicle image;
processing an in-vehicle image acquired by a vehicle-mounted camera by adopting a background subtraction method to obtain a first object contour and a second object contour;
and calculating the distance between the first object contour and the second object contour, and obtaining the in-vehicle image recognition result according to the distance.

Optionally, the processor 701 processes an in-vehicle image acquired by the vehicle-mounted camera by a background subtraction method to obtain a first object contour and a second object contour, specifically performing the following operations:
respectively carrying out gray processing and Gaussian blur smoothing processing on the in-vehicle image and the in-vehicle background image which does not contain the character object to obtain a first in-vehicle image and a first in-vehicle background image;
processing the first in-vehicle image and the first in-vehicle background image by a background subtraction method to obtain a plurality of contour images; obtaining the coordinates of contour points and of feature points on the contour image through the pixel distribution histogram and the pixel gradient distribution diagram of the contour image, wherein the feature points comprise the leftmost and rightmost sides of the head and the leftmost and rightmost sides of the shoulders;
and calculating a head-shoulder ratio according to coordinates of feature points on the contour image, if the head-shoulder ratio is within a preset range, confirming that the contour image is a human-shaped contour, marking the contour image positioned in a specific area as the first object contour, and marking the rest human-shaped contours as the second object contour.
Optionally, the processor 701 calculates the distance between the first object contour and the second object contour and obtains the in-vehicle image recognition result according to the distance, specifically performing the following operation: within a preset time, if the number of occurrences of a second in-vehicle image acquired by the vehicle-mounted camera is greater than or equal to a preset number, the in-vehicle image recognition result is that limb contact exists, where a second in-vehicle image is one in which the minimum distance between the first object contour and the second object contour is less than or equal to a preset threshold.
Optionally, the in-vehicle disturbance level includes very serious, serious, slight, and general;
the processor 701 executes the corresponding early warning operation according to the in-vehicle disturbance level, specifically performing the following operations:
when the disturbance level in the vehicle is very serious or serious, voice warning is carried out and an alarm is given to a rescue service system;
when the disturbance level in the vehicle is slight, voice warning is carried out;
and when the disturbance level in the vehicle is general, not processing.
In the embodiment of the present application, a computer storage medium may be provided, which may be used to store computer software instructions for the in-vehicle disturbance prevention alarm apparatus in the embodiment shown in fig. 6, and includes a program designed to execute the in-vehicle disturbance prevention alarm apparatus in the embodiment described above. The storage medium includes, but is not limited to, flash memory, hard disk, solid state disk.
An embodiment of the present application also provides a computer program product which, when executed by a computing device, implements the functions of the in-vehicle disturbance prevention alarm apparatus designed in the embodiment shown in fig. 6.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
In the present application, "A and/or B" means one of the following cases: A, B, or A and B. "At least one of ..." refers to any one or any combination of the listed items; for example, "at least one of A, B and C" refers to any one of the following seven cases: A, B, C, A and B, B and C, A and C, or A, B and C.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowchart and/or the structural diagram of the method provided by the embodiments of the present application, and each flow and/or block of the flowchart and/or the structural diagram of the method, and the combination of the flow and/or block in the flowchart and/or the block diagram can be specifically implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block or blocks.
The above disclosure describes only preferred embodiments of the present application and is of course not intended to limit the scope of the claims of the present application; equivalent variations and modifications made according to the claims of the present application shall therefore remain within the scope of the present application.

Claims (10)

1. An in-vehicle disturbance prevention alarm method is characterized by comprising the following steps:
acquiring in-vehicle conversation content to obtain an in-vehicle voice signal, and analyzing the in-vehicle voice signal to obtain an in-vehicle voice recognition result;
obtaining and analyzing image information in the vehicle to obtain an image identification result in the vehicle;
determining the disturbance level in the vehicle according to the speech recognition result in the vehicle and the image recognition result in the vehicle;
and executing corresponding early warning operation according to the disturbance level in the vehicle.
2. The method according to claim 1, wherein the acquiring of the in-vehicle dialogue content obtains an in-vehicle speech signal and analyzes the in-vehicle speech signal to obtain an in-vehicle speech recognition result, specifically comprising:
recording the conversation content in the vehicle through a voice acquisition module to obtain a voice signal in the vehicle;
carrying out feature extraction on the collected in-vehicle voice signal to obtain in-vehicle voice features;
inputting the in-vehicle voice features into an acoustic database, wherein the acoustic database comprises the voice features of at least one sensitive word, and judging whether the in-vehicle voice signal contains a sensitive word by comparing the in-vehicle voice features with the voice features of the at least one sensitive word in the acoustic database;
and determining the in-vehicle voice recognition result according to the judgment result.
3. The method of claim 2, wherein the in-vehicle speech features comprise a sequence of feature vectors comprising feature vectors corresponding to a plurality of speech frames;
the determining whether the in-vehicle voice signal contains the sensitive word by comparing the voice feature with the voice feature of at least one sensitive word in the acoustic database includes:
inputting the feature vector corresponding to each voice frame into a preset sensitive word judgment model to obtain the probability of the existence of the sensitive words in the corresponding voice frame output by the sensitive word judgment model;
and if the in-vehicle voice feature contains a first voice frame, determining that the in-vehicle voice signal contains sensitive words, wherein the probability of the first voice frame containing the sensitive words is greater than or equal to a preset threshold value.
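The per-frame decision in claim 3 can be sketched as follows. The sensitive word judgment model itself is not specified here; the sketch assumes it has already produced a probability for each speech frame, and the probability threshold is an illustrative placeholder.

```python
# Hypothetical sketch of claim 3's decision rule: the in-vehicle voice
# signal is flagged as containing a sensitive word if any speech frame's
# model-output probability reaches the preset threshold.

def contains_sensitive_word(frame_probs, prob_threshold=0.9):
    """frame_probs: sensitive-word probability output per speech frame."""
    return any(p >= prob_threshold for p in frame_probs)
```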
4. The method according to claim 2, wherein the performing feature extraction on the collected in-vehicle voice signal to obtain in-vehicle voice features comprises:
framing the in-vehicle voice signal, and performing discrete Fourier transform on each voice frame obtained by framing to obtain the frequency spectrum of each voice frame;
calculating the frequency spectrum of each voice frame to obtain an energy spectrum of each voice frame, and filtering the energy spectrum through M Mel band-pass filters to obtain output power spectrums of the M Mel band-pass filters;
and obtaining the static characteristics of each voice frame based on the output power spectrum, calculating a first order difference parameter and a second order difference parameter of the static characteristics to obtain the dynamic characteristics of each voice frame, and calculating the sum of the static characteristics and the dynamic characteristics to obtain a characteristic vector corresponding to each voice frame.
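Claim 4 describes a standard MFCC-style front end. Below is a rough numpy sketch under illustrative assumptions (16 kHz audio, 25 ms frames with a 10 ms hop, M = 26 Mel filters, a Hamming window, and a simple one-step difference for the delta parameters). Note that the final step sums the static and dynamic features, following the claim's wording, whereas typical MFCC pipelines concatenate them instead.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def mel_filterbank(M, n_fft, sr):
    """Build M triangular Mel band-pass filters over the rFFT bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(0.0), mel(sr / 2.0), M + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((M, n_fft // 2 + 1))
    for m in range(1, M + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def delta(feat):
    """Simple first-order difference along the frame axis."""
    d = np.zeros_like(feat)
    d[1:] = feat[1:] - feat[:-1]
    return d

def extract_features(x, sr=16000, M=26, n_fft=512):
    frames = frame_signal(x)
    spec = np.fft.rfft(frames * np.hamming(frames.shape[1]), n=n_fft)
    energy = np.abs(spec) ** 2                        # energy spectrum per frame
    power = energy @ mel_filterbank(M, n_fft, sr).T   # M filter output powers
    static = np.log(power + 1e-10)                    # static features
    d1 = delta(static)                                # first-order difference
    d2 = delta(d1)                                    # second-order difference
    return static + d1 + d2                           # claim wording: sum of static and dynamic
```

For a one-second 16 kHz signal this yields 98 frames of M = 26 coefficients each; each row is the feature vector of one speech frame.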
5. The method according to claim 1, wherein the obtaining and analyzing of the in-vehicle image information to obtain the in-vehicle image recognition result specifically comprises:
acquiring an in-vehicle image;
processing the acquired in-vehicle image by adopting a background subtraction method to obtain a first object contour and a second object contour;
and calculating the distance between the first object contour and the second object contour, and obtaining the in-vehicle image recognition result according to the distance.
6. The method of claim 5, wherein the processing the acquired in-vehicle image using background subtraction to obtain the first object contour and the second object contour comprises:
respectively carrying out gray processing and Gaussian blur smoothing processing on the in-vehicle image and the in-vehicle background image which does not contain the character object to obtain a first in-vehicle image and a first in-vehicle background image;
processing the first vehicle interior image and the first vehicle interior background image by adopting a background subtraction method to obtain a plurality of contour images; obtaining coordinates of contour points and coordinates of feature points on the contour image through a pixel distribution histogram and a pixel gradient distribution diagram of the contour image, wherein the feature points comprise the leftmost side of the head, the leftmost side of the shoulders and the rightmost side of the shoulders;
and calculating a head-shoulder ratio according to coordinates of feature points on the contour image, if the head-shoulder ratio is within a preset range, confirming that the contour image is a human-shaped contour, marking the contour image positioned in a specific area as the first object contour, and marking the rest human-shaped contours as the second object contour.
7. The method of claim 5, wherein calculating a distance between the first object contour and the second object contour, and wherein deriving the in-vehicle image recognition result from the distance comprises:
within a preset time, if the number of times the collected second in-vehicle image appears is greater than or equal to a preset number, the in-vehicle image recognition result is that limb contact exists, where the second in-vehicle image is an image in which the minimum distance between the first object contour and the second object contour is less than or equal to the preset threshold value.
8. The method of any of claims 1-7, wherein the in-vehicle disturbance levels include very serious, serious, slight, and general;
the executing of the corresponding early warning operation according to the in-vehicle disturbance level specifically comprises:
when the in-vehicle disturbance level is very serious or serious, performing a voice warning and reporting an alarm to a rescue service system;
when the in-vehicle disturbance level is slight, performing a voice warning;
and when the in-vehicle disturbance level is general, performing no processing.
9. An in-vehicle disturbance prevention alarm device, characterized by comprising:
the in-vehicle voice recognition unit is used for acquiring in-vehicle conversation content to obtain in-vehicle voice signals and analyzing the in-vehicle voice signals to obtain in-vehicle voice recognition results;
the in-vehicle image recognition unit is used for acquiring and analyzing in-vehicle image information to obtain an in-vehicle image recognition result;
the in-vehicle disturbance level determining unit is used for determining an in-vehicle disturbance level according to the in-vehicle voice recognition result and the in-vehicle image recognition result;
and the early warning operation unit is used for executing a corresponding early warning operation according to the in-vehicle disturbance level.
10. A server comprising a processor, a memory, and a transceiver;
the memory is used for storing a computer program supporting the server to execute the in-vehicle disturbance prevention alarm method, and the computer program comprises program instructions;
the processor is configured to call the program instructions to execute the in-vehicle disturbance prevention alarm method according to any one of claims 1 to 8.
CN201910932287.4A 2019-09-29 2019-09-29 In-vehicle disturbance prevention alarm method and device, server and storage medium Active CN110706700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910932287.4A CN110706700B (en) 2019-09-29 2019-09-29 In-vehicle disturbance prevention alarm method and device, server and storage medium

Publications (2)

Publication Number Publication Date
CN110706700A true CN110706700A (en) 2020-01-17
CN110706700B CN110706700B (en) 2022-06-14

Family

ID=69197658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910932287.4A Active CN110706700B (en) 2019-09-29 2019-09-29 In-vehicle disturbance prevention alarm method and device, server and storage medium

Country Status (1)

Country Link
CN (1) CN110706700B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862580A (en) * 2020-06-14 2020-10-30 李建华 Network car booking safety protection system based on intelligent identification and big data
CN111933145A (en) * 2020-10-09 2020-11-13 广州宸祺出行科技有限公司 Intelligent stroke safety monitoring method and system based on voice AI
CN112002102A (en) * 2020-09-04 2020-11-27 北京伟杰东博信息科技有限公司 Safety monitoring method and system
CN113011358A (en) * 2021-03-26 2021-06-22 昆山宝创新能源科技有限公司 Control method and apparatus
CN113053127A (en) * 2020-11-26 2021-06-29 泰州芯源半导体科技有限公司 Intelligent real-time state detection system and method
CN113393643A (en) * 2021-06-10 2021-09-14 上海安亭地平线智能交通技术有限公司 Abnormal behavior early warning method and device, vehicle-mounted terminal and medium
CN113592262A (en) * 2021-07-16 2021-11-02 深圳昌恩智能股份有限公司 Safety monitoring method and system for network appointment
CN113920576A (en) * 2020-07-07 2022-01-11 奥迪股份公司 Method, device, equipment and storage medium for identifying object loss behavior of personnel on vehicle
WO2022205400A1 (en) * 2021-04-02 2022-10-06 深圳市锐明技术股份有限公司 Voice recognition-based safety alerting method and apparatus, and terminal device
CN115175415A (en) * 2022-05-30 2022-10-11 青岛海尔科技有限公司 Digital twinning light adjusting method, device and system
EP4141813A1 (en) * 2021-08-31 2023-03-01 Intel Corporation Detection and mitigation of inappropriate behaviors of autonomous vehicle passengers

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100049515A1 (en) * 2006-12-28 2010-02-25 Yuki Sumiyoshi Vehicle-mounted voice recognition apparatus
CN102402984A (en) * 2011-09-21 2012-04-04 哈尔滨工业大学 Cutting method for keyword checkout system on basis of confidence
CN103770733A (en) * 2014-01-15 2014-05-07 中国人民解放军国防科学技术大学 Method and device for detecting safety driving states of driver
CN106530633A (en) * 2016-09-28 2017-03-22 中国人民解放军国防科学技术大学 Intelligent in-event disposal-based security protection method and system
US20170160703A1 (en) * 2015-12-04 2017-06-08 Samsung Electronics Co., Ltd Method of outputting alarm and electronic device supporting the same
CN107590440A (en) * 2017-08-21 2018-01-16 南京邮电大学 The method and system of Human detection under a kind of Intelligent household scene
CN107657236A (en) * 2017-09-29 2018-02-02 厦门知晓物联技术服务有限公司 Vehicle security drive method for early warning and vehicle-mounted early warning system
US9908530B1 (en) * 2014-04-17 2018-03-06 State Farm Mutual Automobile Insurance Company Advanced vehicle operator intelligence system
CN108351968A (en) * 2017-12-28 2018-07-31 深圳市锐明技术股份有限公司 It is a kind of for the alarm method of criminal activity, device, storage medium and server
CN108961669A (en) * 2018-07-19 2018-12-07 上海小蚁科技有限公司 The safe early warning method and device, storage medium, server of net about vehicle
CN108985530A (en) * 2017-05-31 2018-12-11 北京嘀嘀无限科技发展有限公司 Vehicle risk behavior management method and device
CN109359755A (en) * 2018-09-29 2019-02-19 百度在线网络技术(北京)有限公司 Event monitoring method, apparatus, equipment and storage medium
CN109961325A (en) * 2019-03-21 2019-07-02 刘昊洋 Advertisement recommended method, device, system and mobile TV based on character relation
CN110047286A (en) * 2019-04-20 2019-07-23 深圳市元征科技股份有限公司 A kind of analyzing vehicle accident method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Amila Jakubović et al.: "Image Feature Matching and Object Detection Using Brute-Force Matchers", 2018 International Symposium ELMAR *
Li Rongjie et al.: "A violence video classification method based on audio bag-of-words", Journal of Shanghai Jiao Tong University *
Han Xiaoying et al.: "Design and implementation of a distributed automatic detection system for violent incidents in public places", Computer & Telecommunication *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862580A (en) * 2020-06-14 2020-10-30 李建华 Network car booking safety protection system based on intelligent identification and big data
CN113920576A (en) * 2020-07-07 2022-01-11 奥迪股份公司 Method, device, equipment and storage medium for identifying object loss behavior of personnel on vehicle
CN112002102B (en) * 2020-09-04 2021-09-14 北京伟杰东博信息科技有限公司 Safety monitoring method and system
CN112002102A (en) * 2020-09-04 2020-11-27 北京伟杰东博信息科技有限公司 Safety monitoring method and system
CN111933145B (en) * 2020-10-09 2021-01-22 广州宸祺出行科技有限公司 Intelligent stroke safety monitoring method and system based on voice AI
CN111933145A (en) * 2020-10-09 2020-11-13 广州宸祺出行科技有限公司 Intelligent stroke safety monitoring method and system based on voice AI
CN113053127A (en) * 2020-11-26 2021-06-29 泰州芯源半导体科技有限公司 Intelligent real-time state detection system and method
CN113053127B (en) * 2020-11-26 2021-11-26 江苏奥都智能科技有限公司 Intelligent real-time state detection system and method
CN113011358A (en) * 2021-03-26 2021-06-22 昆山宝创新能源科技有限公司 Control method and apparatus
WO2022205400A1 (en) * 2021-04-02 2022-10-06 深圳市锐明技术股份有限公司 Voice recognition-based safety alerting method and apparatus, and terminal device
CN113393643A (en) * 2021-06-10 2021-09-14 上海安亭地平线智能交通技术有限公司 Abnormal behavior early warning method and device, vehicle-mounted terminal and medium
CN113592262A (en) * 2021-07-16 2021-11-02 深圳昌恩智能股份有限公司 Safety monitoring method and system for network appointment
CN113592262B (en) * 2021-07-16 2022-10-21 深圳昌恩智能股份有限公司 Safety monitoring method and system for network appointment
EP4141813A1 (en) * 2021-08-31 2023-03-01 Intel Corporation Detection and mitigation of inappropriate behaviors of autonomous vehicle passengers
CN115175415A (en) * 2022-05-30 2022-10-11 青岛海尔科技有限公司 Digital twinning light adjusting method, device and system

Also Published As

Publication number Publication date
CN110706700B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN110706700B (en) In-vehicle disturbance prevention alarm method and device, server and storage medium
CN109643552B (en) Robust noise estimation for speech enhancement in variable noise conditions
CN105632501B (en) A kind of automatic accent classification method and device based on depth learning technology
CN112435684B (en) Voice separation method and device, computer equipment and storage medium
US20080069364A1 (en) Sound signal processing method, sound signal processing apparatus and computer program
Tawari et al. Speech based emotion classification framework for driver assistance system
JP3584458B2 (en) Pattern recognition device and pattern recognition method
US9704495B2 (en) Modified mel filter bank structure using spectral characteristics for sound analysis
CN112053695A (en) Voiceprint recognition method and device, electronic equipment and storage medium
CN109147798B (en) Speech recognition method, device, electronic equipment and readable storage medium
CN112397065A (en) Voice interaction method and device, computer readable storage medium and electronic equipment
CN110942766A (en) Audio event detection method, system, mobile terminal and storage medium
CN112744174B (en) Vehicle collision monitoring method, device, equipment and computer readable storage medium
JP3298858B2 (en) Partition-based similarity method for low-complexity speech recognizers
CN111091044B (en) Network appointment-oriented in-vehicle dangerous scene identification method
CN109300479A (en) A kind of method for recognizing sound-groove of voice playback, device and storage medium
Kiktova et al. Comparison of different feature types for acoustic event detection system
CN110555346A (en) Driver emotion detection method and device, electronic equipment and storage medium
CN116563829A (en) Driver emotion recognition method and device, electronic equipment and storage medium
JP2007079389A (en) Speech analysis method and device therefor
CN113504891B (en) Volume adjusting method, device, equipment and storage medium
CN113421590B (en) Abnormal behavior detection method, device, equipment and storage medium
US11238289B1 (en) Automatic lie detection method and apparatus for interactive scenarios, device and medium
CN112053686B (en) Audio interruption method, device and computer readable storage medium
Dedeoglu et al. Surveillance using both video and audio

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant