CN116229987B - Campus voice recognition method, device and storage medium

Campus voice recognition method, device and storage medium

Info

Publication number
CN116229987B
CN116229987B (application CN202211592939.2A)
Authority
CN
China
Prior art keywords
voice
information
campus
violent
keyword
Prior art date
Legal status
Active
Application number
CN202211592939.2A
Other languages
Chinese (zh)
Other versions
CN116229987A (en)
Inventor
郑桂鹏
刘芝秉
李景恒
林弟
张常华
朱正辉
赵定金
Current Assignee
Guangdong Baolun Electronics Co ltd
Original Assignee
Guangdong Baolun Electronics Co ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Baolun Electronics Co ltd filed Critical Guangdong Baolun Electronics Co ltd
Priority to CN202211592939.2A priority Critical patent/CN116229987B/en
Publication of CN116229987A publication Critical patent/CN116229987A/en
Application granted granted Critical
Publication of CN116229987B publication Critical patent/CN116229987B/en


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification
    • G10L 17/04: Training, enrolment or model building
    • G10L 17/06: Decision making techniques; pattern matching strategies
    • G10L 17/18: Artificial neural networks; connectionist approaches
    • G10L 17/22: Interactive procedures; man-machine interfaces
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Abstract

The invention discloses a campus voice recognition method, device and storage medium. The method comprises the following steps: acquiring first audio signal data from a first campus voice device, and filtering the first audio signal data to obtain human voice information; inputting the voice information into a voice recognition model so that the voice recognition model judges whether the voice information contains preset violent keywords; if so, inputting the voice information into a voiceprint recognition model so that the voiceprint recognition model calculates the energy value of the voice information and determines the sound source information according to a voiceprint scale factor, the sound source information including the number of people producing the human voice information and their position and direction; and sending the first audio signal data, the position information of the first campus voice device and the sound source information to a management system, thereby recognizing and locating violent speech on the campus.

Description

Campus voice recognition method, device and storage medium
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a method, an apparatus, and a storage medium for campus speech recognition.
Background
Speech recognition converts the lexical content of input speech into corresponding text information. An existing speech recognition model first preprocesses the speech, then decodes it with an acoustic model, matches syllables against word lists to obtain word sequences, and finally obtains sentences with a language model.
Natural spoken dialogue conveys not only sound but also the speaker's emotional state, attitude and intention. The voice recognition functions of current smart campus equipment lack keyword retrieval for violent vocabulary and emotional speech recognition, cannot perform sound source localization on the captured speech, and thus recognize speech poorly and cannot comprehensively protect campus students through recognition of their speech.
Disclosure of Invention
The invention provides a campus voice recognition method, device and storage medium for recognizing and locating violent speech on a campus.
In order to recognize and locate violent speech on a campus, an embodiment of the invention provides a campus voice recognition method, device and storage medium, the method comprising the following steps: acquiring first audio signal data from a first campus voice device, and filtering the first audio signal data to obtain human voice information;
Inputting the voice information into a voice recognition model so that the voice recognition model judges whether the voice information contains preset violent keywords or not;
if yes, inputting the voice information into a voiceprint recognition model, so that the voiceprint recognition model calculates the energy value of the voice information and determines sound source information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human voice information, and their position distance and direction;
and transmitting the first audio signal data, the position information of the first campus voice device and the sound source information to a management system.
As a preferred scheme, feature extraction is performed on the first audio signal data of any voice device on the campus, the features are input into a voice recognition model for voice analysis, and whether violent speech exists in the first audio signal data is judged; if so, voiceprint analysis is performed on the violent speech to obtain its sound source information: the number of people producing the voice and their position distance and direction. The voice information of students is thus recorded on campus in real time, violent speech is detected, and the number, position distance and direction of the speakers are determined, so that sound source localization is performed.
As a preferred scheme, first audio signal data in a first voice device is obtained, and the first audio signal data is filtered to obtain voice information, specifically:
dividing the first audio signal data into a voice area and a mute area, removing noise of the voice area, and taking the voice area after noise removal as the voice information of the human voice.
As a preferred scheme, the voice information is first segmented and the characteristic information of the voice region is extracted, which reduces the computation spent on environmental sounds and improves the precision of voice analysis; keywords and voiceprints of the voice are extracted, the voice information of students is recorded on campus in real time, violent speech is detected, and the number of speakers and their position distance and direction are judged from the voiceprint features, so that sound source localization is performed.
As a preferred scheme, detecting whether the voice information includes a preset violence keyword or not specifically includes:
calling a unified API interface to acquire channel information of a first keyword of voice information;
matching and calculating the channel information of the first keyword and the channel information of the second keyword in the training voice information; wherein the second keywords are preset violent keywords;
If the channel information of the first keyword is identical to the channel information of the second keyword in a matching manner, the voice recognition model judges that the voice information of the voice contains a preset violent keyword.
As a preferred scheme, the invention judges whether the keywords of the voice information are violent words or words with negative emotion by matching the keyword characteristic information of the voice information with the keyword characteristic information of the training voice information, thereby realizing the real-time recording of the voice information of students in campus and detecting whether the voice information is violent voice.
As a preferred scheme, the energy value calculation is performed on the voice information, and the sound source information in the voice information is determined according to the voiceprint scale factor and the energy distribution of the voice information, specifically:
respectively inputting a plurality of pieces of voice information into a plurality of corresponding matrix units, and respectively calculating the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal; wherein the first campus voice device is provided with a plurality of audio acquisition terminals, and each piece of human voice information is obtained by filtering the first audio signal data collected by a different audio acquisition terminal;
Extracting voiceprint scale factors according to the energy value and the frequency domain energy distribution of each matrix unit, performing equalization processing on the voice information of the human voice, and outputting matrix energy distribution;
and determining the number of people and the direction of sound according to the matrix energy distribution and the positions of the plurality of audio acquisition terminals.
As a preferred scheme, the first campus voice equipment is provided with a plurality of audio acquisition terminals, the energy value and the frequency domain energy distribution of each piece of voice information are respectively calculated according to voice information after filtering processing of first audio signal data acquired by the plurality of audio acquisition terminals, voiceprint scale factors are extracted, equalization processing is carried out on the voice information, and matrix energy distribution is output; and determining the number of people and the direction of sound according to the matrix energy distribution and the positions of a plurality of audio acquisition terminals, thereby performing sound source localization.
Preferably, before inputting the voice information into the voice recognition model, the method further comprises:
acquiring a plurality of training audio data, and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
Dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
and respectively modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model.
Before the voice information is input into the voice recognition model, the voice recognition model is trained. Human voice containing violent words or emotion keywords and human voice without them are used as training audio data, so that the model learns to distinguish the various characteristic values with and without violent words or emotion keywords; these characteristic values are fused according to their characteristics, and the model established from the fused characteristic parameters can detect whether the voice information is violent speech and the emotion value it expresses.
Preferably, before inputting the voiceprint parameters into the voiceprint recognition model, the method further comprises:
acquiring a plurality of training audio data, and extracting first energy characteristic information of the training audio data; fusion calculation is carried out on the first energy characteristic information, and voiceprint characteristic parameters of the training audio data are obtained; and modeling the training audio data according to the voiceprint characteristic parameters to obtain a voiceprint recognition model.
Before inputting the voiceprint parameters into the voiceprint recognition model, the voiceprint recognition model is trained, the first energy characteristic information of the training audio data is extracted, the voiceprint characteristic parameters of the training audio data are obtained, and the voiceprint recognition model is trained according to the voiceprint characteristic parameters, so that the voiceprint recognition model can judge the number, the position distance and the direction of people sending violent voices, and sound source positioning is performed.
Preferably, before the first audio signal data, the location information of the first campus voice device, and the sound source information are sent to a management system, the method further includes:
playing the alarm information through broadcasting equipment; and if the violent voice is detected again within the preset time after the alarm information is played, transmitting the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
When violent speech is detected a second time within the preset time, the voice information, the position of the voice device that acquired it, and the speaker information are sent to a management system to inform an administrator of the violent speech content, the number of people and their position. The voice information of students is thus recorded on campus in real time, violent speech is detected and its sound source located, and the administrator is informed in time with the relevant content, comprehensively protecting the safety of students on campus.
Correspondingly, the invention also provides a device for campus voice recognition, which comprises: the device comprises an acquisition module, a violence detection module, a voiceprint positioning module and an information sending module;
the acquisition module is used for acquiring audio signal data in campus voice equipment, and extracting characteristics of the audio signal data to acquire voice information;
the violence detection module is used for inputting the voice information into a voice recognition model so that the voice recognition model can judge whether the voice information contains preset violence keywords or not;
the voiceprint positioning module is used for inputting the voice information into a voiceprint recognition model if the voice information contains a preset violent keyword, so that the voiceprint recognition model calculates the energy value of the voice information and determines sound source information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human voice information, and their position distance and direction;
the information sending module is used for sending the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
As a preferred scheme, the acquisition module of the campus voice recognition device acquires first audio signal data of any voice device on the campus and performs feature extraction on it to obtain voice information; the violence detection module inputs the voice information into a voice recognition model for voice analysis and judges whether violent speech exists in the first audio signal data; if so, the voiceprint positioning module performs voiceprint analysis on the violent speech and obtains its sound source information: the number of people producing the voice and their position distance and direction. The voice information of students is thus recorded on campus in real time, violent speech is detected, and the number, position distance and direction of the speakers are judged, so that sound source localization is performed. The information sending module feeds the sound source information of violent language back to the administrator in time.
As a preferred solution, the acquisition module includes a segmentation unit and a feature extraction unit;
the segmentation unit is used for segmenting the first audio signal data into a voice area and a mute area, and acquiring the voice area;
The characteristic extraction unit is used for extracting voice information of the voice area; the voice information comprises keyword characteristic information and voiceprint characteristic information.
As a preferred scheme, the segmentation unit segments out the voice region before the voice is detected, and the feature extraction unit extracts the characteristic information of the voice region, which reduces the computation spent on environmental sounds and improves the precision of voice analysis; keywords and voiceprints of the voice are extracted, the voice information of students is recorded on campus in real time, violent speech is detected, and the number of speakers and their position distance and direction are judged from the voiceprint features, so that sound source localization is performed.
As a preferred solution, the violence detection module comprises a training unit and a detection unit;
the training unit is used for acquiring a plurality of training audio data and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
Modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model;
the detection unit is used for extracting characteristic information of a first keyword of the voice information; calling a unified API interface to acquire the characteristic information of the first keyword; matching the characteristic information of the first keyword against the characteristic information of a second keyword in the training voice information, and judging whether the first keyword is a violent vocabulary; and if the second keyword is a violent vocabulary and the characteristic information of the first keyword matches that of the second keyword, judging that the first keyword is a violent vocabulary.
Before the voice information is input into the voice recognition model, the training unit first trains the voice recognition model. Human voice containing violent words or emotion keywords and human voice without them are used as training audio data, so that the model learns to distinguish the various characteristic values with and without violent words or emotion keywords; these are fused according to their characteristics, and the model established from the fused characteristic parameters can detect whether the voice information is violent speech and the emotion value it expresses. The detection unit judges whether the keywords of the voice information are violent words or words with negative emotion by matching the keyword characteristic information of the voice information against that of the training voice information, so that the voice information of students is recorded on campus in real time and violent speech is detected.
Accordingly, the present invention also provides a computer-readable storage medium including a stored computer program; the computer program controls the device where the computer readable storage medium is located to execute the campus voice recognition method according to the present invention when running.
Drawings
FIG. 1 is a flow chart of one embodiment of a method of campus voice recognition provided by the present invention;
fig. 2 is a schematic structural diagram of an embodiment of a campus voice recognition device provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1, a method for campus voice recognition according to an embodiment of the present invention includes steps S101 to S104:
Step S101: acquiring first audio signal data in first campus voice equipment, and filtering the first audio signal data to acquire voice information;
in this embodiment, first audio signal data in a first voice device is obtained, and filtering processing is performed on the first audio signal data to obtain voice information, specifically:
dividing the first audio signal data into a voice area and a mute area, removing noise of the voice area, and taking the voice area after noise removal as the voice information of the human voice.
In this embodiment, the first audio signal data is divided into a voice area and a mute area, noise in the voice area is removed, and the voice area after noise removal is used as the voice information of the voice, specifically:
the first audio signal data is segmented from the time domain to the frequency domain by applying a Hanning window and a short-time fast Fourier transform;
the segmented first audio signal data is input into an IIR filter, which weakens the frequency bands containing noise and enhances the audio containing human voice; the human voice information is finally obtained through an inverse Fourier transform back to the time domain.
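A minimal sketch of this segment-filter-reconstruct pipeline (Python, with scipy assumed available); a simple spectral mask stands in for the IIR filter described above, and the 300-3400 Hz voice band and gain values are illustrative assumptions, not values from the patent:

import numpy as np
from scipy.signal import stft, istft

def extract_voice(audio, sr=16000, voice_band=(300.0, 3400.0)):
    # Hanning-window STFT: time domain -> frequency domain, in segments
    f, t, Z = stft(audio, fs=sr, window='hann', nperseg=512)
    mask = (f >= voice_band[0]) & (f <= voice_band[1])
    Z[~mask, :] *= 0.1   # weaken noise-dominated frequency bands
    Z[mask, :] *= 1.5    # enhance bands containing human voice
    # inverse transform back to the time domain
    _, voice = istft(Z, fs=sr, window='hann', nperseg=512)
    return voice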
In this embodiment, the first audio signal data dividing and filtering process specifically includes:
the gain of the first audio signal data is randomly adjusted within 0.01-10 and the noise gain within 0.1-10, the gain increase being calculated frame by frame to obtain the gained audio signal data; the gained audio signal data is processed by a random second-order filter to obtain a speech signal and a noise signal;
the speech energy value of the speech signal is calculated, and 1 speech VAD feature point is derived from it; the energy spectrum of the noise signal is calculated to obtain 22 voiceprint feature points; the gained speech signal and noise signal are mixed to obtain a noisy speech signal, from which 44 mixed feature points are calculated;
and the ratio of the speech signal energy to the noisy speech energy is calculated, together with the VAD feature points and the silent speech signal, to obtain 22 gain feature points.
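The patent does not give the exact formulas for these feature points; the sketch below assembles a 67-dimensional frame vector (44 mixed + 22 gain + 1 VAD) under an assumed equal-band layout and an assumed VAD threshold:

import numpy as np

def band_energies(frame, n_bands):
    # per-band energies over an equal split of the FFT spectrum (assumed layout)
    spec = np.abs(np.fft.rfft(frame)) ** 2
    return np.array([b.sum() for b in np.array_split(spec, n_bands)])

def frame_features(speech, noise, vad_threshold=1e-4):
    noisy = speech + noise                                                 # mixed noisy speech
    mixed = band_energies(noisy, 44)                                       # 44 mixed feature points
    gain = band_energies(speech, 22) / (band_energies(noisy, 22) + 1e-12)  # 22 gain feature points
    vad = np.array([float(np.mean(speech ** 2) > vad_threshold)])          # 1 VAD feature point
    return np.concatenate([mixed, gain, vad])                              # 67-dimensional input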
In this embodiment, the deep neural network model is trained on the training data: the 44 mixed feature points, 22 gain feature points and 1 speech VAD feature point of the training data are extracted and input into the deep neural network model, which outputs the speech signal. 10% of the training data is used as a validation test set, and the remaining training data is divided into batches of 32 and trained 120 times.
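Reading that sentence as a 10% validation hold-out with the remainder trained in batches of 32 for 120 epochs, the loop might look like the following sketch; model.train_step and model.evaluate are hypothetical stand-ins for the deep neural network's interface:

import numpy as np

def train_denoiser(model, X, y, epochs=120, batch_size=32, val_frac=0.10):
    n_val = int(len(X) * val_frac)            # 10% held out for validation
    X_val, y_val = X[:n_val], y[:n_val]
    X_tr, y_tr = X[n_val:], y[n_val:]
    for epoch in range(epochs):
        perm = np.random.permutation(len(X_tr))
        for i in range(0, len(X_tr), batch_size):
            idx = perm[i:i + batch_size]
            model.train_step(X_tr[idx], y_tr[idx])     # hypothetical API
        print(epoch, model.evaluate(X_val, y_val))     # hypothetical API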
Step S102: inputting the voice information into a voice recognition model so that the voice recognition model judges whether the voice information contains preset violent keywords or not;
in this embodiment, the detecting and determining whether the voice information includes a preset violence keyword specifically includes:
calling a unified API interface to acquire channel information of a first keyword of voice information;
matching and calculating the channel information of the first keyword and the channel information of the second keyword in the training voice information; wherein the second keywords are preset violent keywords;
if the channel information of the first keyword is identical to the channel information of the second keyword in a matching manner, the voice recognition model judges that the voice information of the voice contains a preset violent keyword.
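If the recognizer's transcript is taken as the source of first keywords, the match reduces to a set lookup; the keyword list and transcript below are purely illustrative, not the patent's channel-information matching itself:

VIOLENT_KEYWORDS = {"打死", "揍你", "弄死"}   # hypothetical preset violent keywords

def contains_violent_keyword(transcript_words):
    # returns the matched violent keywords; an empty set means no match
    return VIOLENT_KEYWORDS & set(transcript_words)

hits = contains_violent_keyword(["今天", "揍你"])
if hits:
    print("violent keywords detected:", hits)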
In this embodiment, before inputting the voice information into the voice recognition model, the method further includes:
acquiring a plurality of training audio data, and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
Dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
and respectively modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model.
In this embodiment, according to the feature types of the voice region and the silence region, the feature information is fused and calculated, an initial voice recognition model is established by using a network structure of DenseNet-LSTM, the initial voice recognition model is trained by using a plurality of training audio data, and after the model accuracy is judged to be greater than 99.5% according to a test set, the voice recognition model is obtained.
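A sketch of the 99.5% acceptance gate on a held-out test set, with the DenseNet-LSTM left abstract (any torch.nn.Module producing class logits would slot in); the optimizer, loss and epoch cap are assumed, not taken from the patent:

import torch

def train_until_accurate(model, train_loader, test_loader, target=0.995, max_epochs=200):
    opt = torch.optim.Adam(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in test_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        if correct / total > target:   # the 99.5% gate from the text
            return model
    raise RuntimeError("accuracy gate not reached")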
In this embodiment, each time the voice recognition model obtains a piece of voice information, an SDK call is generated; a unified API interface is provided to the application program in the alsa-lib library to collect the channel information of the keywords of the voice information, which is matched against the channel information of the second keywords in the training voice information; the second keywords include, but are not limited to, preset violent words and preset emotion keywords; the emotion keywords are the key information, intelligently identified from the text of the human voice information, that most influences the overall emotion of the text.
In this embodiment, after a piece of voice information is obtained each time, the obtained voice information and the calculation result are used as training data, and the learning experience of the voice recognition model is accumulated.
In this embodiment, based on the emotion analysis engine, the emotion extremum and emotion expressed by a person's voice information can be fully analyzed; the computing capacity of the server is used to train and update the network models and parameters for violent text analysis, and the trained models are loaded into the voice recognizer during idle time.
Step S103: if yes, inputting the voice information into a voiceprint recognition model, so that the voiceprint recognition model calculates the energy value of the voice information and determines sound source information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human voice information, and their position distance and direction;
in this embodiment, energy value calculation is performed on the voice information, and sound source information in the voice information is determined according to a voiceprint scale factor and energy distribution of the voice information, which specifically includes:
Respectively inputting a plurality of pieces of voice information into a plurality of corresponding matrix units, and respectively calculating the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal; wherein the first campus voice device is provided with a plurality of audio acquisition terminals, and each piece of human voice information is obtained by filtering the first audio signal data collected by a different audio acquisition terminal;
extracting voiceprint scale factors according to the energy value and the frequency domain energy distribution of each matrix unit, performing equalization processing on the voice information of the human voice, and outputting matrix energy distribution;
and determining the number of people and the direction of sound according to the matrix energy distribution and the positions of the plurality of audio acquisition terminals.
In this embodiment, the number of people is determined by accumulating and comparing the voiceprints of the collected human voice information, comparing the differences between voiceprints and between frequencies.
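The comparison rule is not spelled out in the patent; a toy version that counts speakers by greedily clustering accumulated voiceprint vectors (the distance threshold is arbitrary) could look like:

import numpy as np

def count_speakers(voiceprints, threshold=0.3):
    # voiceprints: list of 1-D feature vectors (e.g. averaged spectra);
    # each cluster center stands for one speaker
    centers = []
    for v in voiceprints:
        v = v / (np.linalg.norm(v) + 1e-12)
        if all(np.linalg.norm(v - c) > threshold for c in centers):
            centers.append(v)
    return len(centers)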
In this embodiment, before inputting the voiceprint parameters into the voiceprint recognition model, the method further includes:
acquiring a plurality of training audio data, and extracting first energy characteristic information of the training audio data; fusion calculation is carried out on the first energy characteristic information, and voiceprint characteristic parameters of the training audio data are obtained; and modeling the training audio data according to the voiceprint characteristic parameters to obtain a voiceprint recognition model.
In this embodiment, a plurality of pieces of voice information are respectively input into 4 corresponding matrix units, and the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal are respectively calculated; the first campus voice device is provided with 4 unidirectional audio acquisition terminals facing different directions, the audio acquisition terminals including but not limited to microphone devices; each piece of human voice information is obtained by filtering the first audio signal data collected by a different audio acquisition terminal.
In this embodiment, when the first campus voice device operates, the 4 audio acquisition terminals work simultaneously, acquire audio, and input it into the 4 corresponding matrix units; the audio acquired by each unit has a different energy value, and the energy distribution over the frequency bands is inconsistent. The ratio of the total energies of the different matrix units and the energy distribution proportions of the different frequency bands together form the voiceprint scale factor: signals with a large proportion are enhanced, and signals with a small proportion are attenuated.
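A sketch of the per-unit energy and band-distribution computation and the resulting scale factors; the 22-band split mirrors the feature extraction earlier and is an assumption here:

import numpy as np

def matrix_unit_stats(signals, n_bands=22):
    # signals: one time-domain array per audio acquisition terminal (4 here)
    energies, bands = [], []
    for s in signals:
        spec = np.abs(np.fft.rfft(s)) ** 2
        energies.append(spec.sum())                 # total energy of the unit
        bands.append(np.array([b.sum() for b in np.array_split(spec, n_bands)]))
    return np.array(energies), np.array(bands)

def voiceprint_scale_factors(energies, bands):
    # proportion of each unit in the total energy, and of each band in its unit;
    # large proportions are later enhanced, small ones attenuated
    unit_ratio = energies / energies.sum()
    band_ratio = bands / bands.sum(axis=1, keepdims=True)
    return unit_ratio, band_ratio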
In this embodiment, a 31-band equalizer performs the enhancement and attenuation using a difference equation and a transfer function, applying the corresponding gain adjustment at the center frequency of each band; the magnitude of the gain adjustment is controlled by a matrix calculation factor. The equalizer employs a biquad filter.
The difference equation is:
y[n]=(b0/a0)*x[n]+(b1/a0)*x[n-1]+(b2/a0)*x[n-2]-(a1/a0)*y[n-1]-(a2/a0)*y[n-2];
wherein a0, a1, a2, b0, b1 and b2 are the coefficients of the biquad filter; y[n] is the current audio output, x[n] the current audio input, x[n-1] the previous audio input, y[n-1] the previous audio output and y[n-2] the output two samples earlier; y[n-1] and y[n-2] are the feedback values of the system.
The transfer function is:
H(z) = (b0 + b1*z^-1 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2);
wherein a1, a2, b0, b1 and b2 are the coefficients of the biquad filter (normalized by a0); H(z) is the Z-domain transfer function corresponding to the difference equation above; the z^-1 and z^-2 terms in the numerator are the Z-transforms of x[n-1] and x[n-2], and the z^-1 and z^-2 terms in the denominator are the Z-transforms of y[n-1] and y[n-2].
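The difference equation translates directly into code; the sketch below is equivalent to scipy.signal.lfilter with the same coefficients normalized by a0:

def biquad(x, b0, b1, b2, a0, a1, a2):
    # apply the biquad difference equation sample by sample
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = ((b0 / a0) * x[n]
                + (b1 / a0) * (x[n - 1] if n >= 1 else 0.0)
                + (b2 / a0) * (x[n - 2] if n >= 2 else 0.0)
                - (a1 / a0) * (y[n - 1] if n >= 1 else 0.0)
                - (a2 / a0) * (y[n - 2] if n >= 2 else 0.0))
    return y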
The voiceprint scale factors of the current different matrix units are stored, and are used as feedback signal processing for the next matrix calculation, so that the dynamic adjustment matrix calculation factors are realized.
In this embodiment, based on multi-feature fusion, the voiceprint feature information of the human voice features in the DenseNet-LSTM network structure is used; the farther away the sound, the smaller its received energy, and the nearer, the larger. From the energy distribution of the matrix, the direction and distance of the sound can be determined. The first campus voice device is provided with 4 unidirectional microphones facing different directions; the distance between the sound and the microphones is determined from the matrix unit energies detected by the four microphones, the ratio of the maximum to the minimum matrix energy value multiplied by a coefficient giving the distance.
The direction of the sound is determined from the ratios of the energy values of the four matrix units to one another and, after processing, is mapped into 0-3: 0 is the direction between microphone 0 and microphone 1, 1 the direction between microphone 1 and microphone 2, 2 the direction between microphone 2 and microphone 3, and 3 the direction between microphone 3 and microphone 0.
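The distance ratio and the four-sector mapping reduce to a few lines; the coefficient k is the unspecified multiplier from the text, and the adjacent-pair sector choice is one plausible reading of the unstated "processing":

import numpy as np

def locate(energies, k=1.0):
    # energies: matrix-unit energy values for microphones 0..3
    e = np.asarray(energies, dtype=float)
    distance = k * e.max() / max(e.min(), 1e-12)   # max/min energy ratio times k
    mic = int(e.argmax())
    nxt, prv = (mic + 1) % 4, (mic - 1) % 4
    neighbor = nxt if e[nxt] >= e[prv] else prv    # second-loudest adjacent microphone
    pair = {mic, neighbor}
    sector = 3 if pair == {0, 3} else min(pair)    # 0:mic0-1, 1:mic1-2, 2:mic2-3, 3:mic3-0
    return distance, sector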
Step S104: and transmitting the first audio signal data, the position information of the first campus voice device and the sound source information to a management system.
In this embodiment, the alarm information is played through the broadcasting device; and if the violent voice is detected again within the preset time after the alarm information is played, transmitting the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
In this embodiment, if it is determined that the voice information includes a preset violent keyword; transmitting the detected violent keywords of the first audio signal data to a background MySQL through an Ethernet module, and playing alarm information through broadcasting equipment; and if the violent voice is detected again within the preset time after the alarm information is played, transmitting the violent keyword, the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
In this embodiment, the campus voice system includes a plurality of campus voice devices and a management system, where the campus voice devices are used to collect voice and play alert information.
The campus voice system further comprises a network interface for connecting a control terminal. The control terminal can directly modify the IP, subnet mask, gateway and DHCP service through the netplan configuration tool and script-assisted configuration, and issues instructions over the TCP network protocol, sending broadcast packets; each campus voice device responds one by one and sends a heartbeat packet every five seconds to confirm that it is always online, and each campus voice device can be configured with a device tool to modify its IP, subnet mask, gateway and DHCP service, play music and adjust volume.
In this embodiment, from ten in the evening to one at night, the campus voice devices in the dormitory area detect the voice volume in real time; if a campus voice device detects voice in the 40-70 dB range, it automatically plays alarm information; if voice in the 40-70 dB range is detected again within the preset time after the alarm information is played, the position information of the campus voice device and the collected voice information are sent through the Ethernet module to the backend MySQL and the administrator is notified.
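A sketch of this dormitory-hours check and two-strike escalation; the dB estimate from frame RMS (with an assumed +94 dB calibration offset) and the get_frame/play_alarm/report callables are hypothetical stand-ins for the device's interfaces:

import time
import numpy as np

def db_level(frame):
    # rough sound level from frame RMS; +94 dB is an assumed mic calibration
    rms = np.sqrt(np.mean(np.square(frame)))
    return 20 * np.log10(max(rms, 1e-12)) + 94

def monitor(get_frame, play_alarm, report, window_s=60):
    deadline = 0.0
    while True:
        if 40 <= db_level(get_frame()) <= 70:
            if time.time() < deadline:
                report()                       # second detection within the window
            else:
                play_alarm()
                deadline = time.time() + window_s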
In this embodiment, the implementation of the embodiment of the present invention has the following effects:
The method extracts features from the first audio signal data of any voice device on the campus, inputs them into a voice recognition model for voice analysis, and judges whether violent speech exists in the first audio signal data; if so, voiceprint analysis is performed on the violent speech to obtain its sound source information: the number of people producing the voice and their position distance and direction. The voice information of students is thus recorded on campus in real time, violent speech is detected, and the number, position distance and direction of the speakers are determined, so that sound source localization is performed.
Example two
Referring to fig. 2, a device for campus voice recognition according to an embodiment of the present invention includes: the device comprises an acquisition module 201, a violence detection module 202, a voiceprint positioning module 203 and an information sending module 204;
the acquiring module 201 is configured to acquire audio signal data in campus voice equipment, perform feature extraction on the audio signal data, and obtain voice information;
The violence detection module 202 is configured to input the voice information into a voice recognition model, so that the voice recognition model determines whether the voice information contains a preset violence keyword;
The voiceprint positioning module 203 is configured to input the voice information into a voiceprint recognition model if the voice information includes a preset violent keyword, so that the voiceprint recognition model performs energy value calculation on the voice information and determines sound source information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human voice information, and their position distance and direction;
the information sending module 204 is configured to send the first audio signal data, the location information of the first campus voice device, and the sound source information to a management system.
The acquisition module 201 includes a segmentation unit and a feature extraction unit;
the segmentation unit is used for segmenting the first audio signal data into a voice area and a mute area, and acquiring the voice area;
the characteristic extraction unit is used for extracting voice information of the voice area; the voice information comprises keyword characteristic information and voiceprint characteristic information.
The violence detection module 202 comprises a training unit and a detection unit;
the training unit is used for acquiring a plurality of training audio data and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model;
the detection unit is used for extracting characteristic information of a first keyword of the voice information; calling a unified API interface to acquire the characteristic information of the first keyword; matching the characteristic information of the first keyword against the characteristic information of a second keyword in the training voice information, and judging whether the first keyword is a violent vocabulary; and if the second keyword is a violent vocabulary and the characteristic information of the first keyword matches that of the second keyword, judging that the first keyword is a violent vocabulary.
The campus voice recognition device can implement the campus voice recognition method of the method embodiment above. The options in the method embodiment also apply to this embodiment and are not described in detail here; for the rest, reference may be made to the content of the method embodiment above.
The implementation of the embodiment of the application has the following effects:
The acquisition module of the campus voice recognition device acquires first audio signal data of any voice device on the campus and performs feature extraction on it to obtain voice information; the violence detection module inputs the voice information into a voice recognition model for voice analysis and judges whether violent speech exists in the first audio signal data; if so, the voiceprint positioning module performs voiceprint analysis on the violent speech and obtains its sound source information: the number of people producing the voice and their position distance and direction. The voice information of students is thus recorded on campus in real time, violent speech is detected, and the number, position distance and direction of the speakers are judged, so that sound source localization is performed. The information sending module feeds the sound source information of violent language back to the administrator in time.
Example III
Correspondingly, the invention further provides a computer readable storage medium, which comprises a stored computer program, wherein the computer program is used for controlling equipment where the computer readable storage medium is located to execute the campus voice recognition method according to any embodiment.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specified functions, the instruction segments describing the execution of the computer program in the terminal device.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the terminal device, and which connects various parts of the entire terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer program and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like, and the data storage area may store data created according to the use of the mobile terminal, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, at least one disk storage device, flash memory device, or other non-volatile solid-state storage device.
Wherein the terminal device integrated modules/units may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as stand alone products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.

Claims (9)

1. A method for campus voice recognition, comprising:
acquiring first audio signal data in first campus voice equipment, and filtering the first audio signal data to acquire voice information; inputting the voice information into a voice recognition model so that the voice recognition model judges whether the voice information contains preset violent keywords or not;
if yes, inputting the voice information into a voiceprint recognition model, so that the voiceprint recognition model calculates the energy value of the voice information and determines sound source information in the voice information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human voice information, and their position distance and direction;
The step of calculating the energy value of the voice information, and determining the sound source information in the voice information according to the voiceprint scale factor and the energy distribution of the voice information, specifically comprises the following steps:
respectively inputting a plurality of pieces of voice information into a plurality of corresponding matrix units, and respectively calculating the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal; wherein the first campus voice device is provided with a plurality of audio acquisition terminals, and each piece of human voice information is obtained by filtering the first audio signal data collected by a different audio acquisition terminal;
extracting voiceprint scale factors according to the energy value and the frequency domain energy distribution of each matrix unit, performing equalization processing on the voice information of the human voice, and outputting matrix energy distribution;
determining the number of people and the direction of sound according to the matrix energy distribution and the positions of a plurality of audio acquisition terminals;
and transmitting the first audio signal data, the position information of the first campus voice device and the sound source information to a management system.
2. The method of campus voice recognition according to claim 1, wherein the obtaining first audio signal data in the first voice device, and performing filtering processing on the first audio signal data, obtains voice information, specifically includes:
Dividing the first audio signal data into a voice area and a mute area, removing noise of the voice area, and taking the voice area after noise removal as the voice information of the human voice.
3. The method of campus voice recognition according to claim 2, wherein the determining whether the voice information includes a preset violence keyword is specifically:
calling a unified API interface to acquire channel information of a first keyword of voice information;
matching and calculating the channel information of the first keyword and the channel information of the second keyword in the training voice information; wherein the second keywords are preset violent keywords;
if the channel information of the first keyword is identical to the channel information of the second keyword in a matching manner, the voice recognition model judges that the voice information of the voice contains a preset violent keyword.
4. The method of campus voice recognition of claim 1 wherein prior to entering the voice information into a voice recognition model, further comprising:
acquiring a plurality of training audio data, and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
Dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
and respectively modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model.
5. The method of campus voice recognition of claim 1, wherein before inputting the voiceprint parameters into the voiceprint recognition model, further comprising:
acquiring a plurality of training audio data, and extracting first energy characteristic information of the training audio data; fusion calculation is carried out on the first energy characteristic information, and voiceprint characteristic parameters of the training audio data are obtained; and modeling the training audio data according to the voiceprint characteristic parameters to obtain a voiceprint recognition model.
6. The method of campus voice recognition of claim 1, wherein before the transmitting the first audio signal data, the location information of the first campus voice device, and the sound source information to the management system, further comprising:
Playing the alarm information through broadcasting equipment; and if the violent voice is detected again within the preset time after the alarm information is played, transmitting the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
7. An apparatus for campus speech recognition, comprising: the device comprises an acquisition module, a violence detection module, a voiceprint positioning module and an information sending module;
the acquisition module is used for acquiring first audio signal data in first campus voice equipment, and filtering the first audio signal data to acquire voice information;
the violence detection module is used for inputting the voice information into a voice recognition model so that the voice recognition model can judge whether the voice information contains preset violence keywords or not;
the voiceprint positioning module is used for inputting the voice information into a voiceprint recognition model if the voice information contains a preset violent keyword, so that the voiceprint recognition model calculates the energy value of the voice information and determines sound source information in the voice information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human voice information, and their position distance and direction;
The step of calculating the energy value of the voice information, and determining the sound source information in the voice information according to the voiceprint scale factor and the energy distribution of the voice information, specifically comprises the following steps:
respectively inputting a plurality of voice information into a plurality of corresponding matrix units, and respectively calculating the energy value and the frequency domain energy distribution of the voice information acquired by each audio acquisition terminal; the first campus voice equipment is provided with a plurality of audio acquisition terminals; the voice information of the voice is respectively filtered and processed by the first audio signal data acquired by different audio acquisition terminals;
extracting voiceprint scale factors according to the energy value and the frequency domain energy distribution of each matrix unit, performing equalization processing on the voice information of the human voice, and outputting matrix energy distribution;
determining the number of people and the direction of sound according to the matrix energy distribution and the positions of a plurality of audio acquisition terminals;
the information sending module is used for sending the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
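A hedged sketch of the positioning calculation referenced in claim 7: per-terminal energies are equalized by a scale factor, stacked into a matrix energy distribution, and the direction is estimated as an energy-weighted combination of the known terminal positions. These formulas are one plausible reading, not the claim's prescribed mathematics:

```python
# Sketch of the claim-7 positioning step over multiple acquisition terminals.
import numpy as np

def matrix_energy_distribution(signals, n_fft=1024):
    """signals: (n_terminals, n_samples) filtered human voice signals."""
    spectra = np.abs(np.fft.rfft(signals, n=n_fft, axis=1)) ** 2
    energy = spectra.sum(axis=1)                       # per-terminal energy value
    scale = energy.mean() / np.maximum(energy, 1e-12)  # voiceprint scale factor (assumed form)
    return spectra * scale[:, None]                    # equalized matrix energy distribution

def locate_sound_source(signals, terminal_xy):
    """terminal_xy: (n_terminals, 2) known audio acquisition terminal positions."""
    dist = matrix_energy_distribution(signals)
    weights = dist.sum(axis=1)
    weights = weights / weights.sum()
    position = weights @ terminal_xy  # energy-weighted position estimate
    direction_deg = np.degrees(np.arctan2(position[1], position[0]))
    return position, direction_deg
```

An energy-weighted estimate is only the simplest way to turn a matrix energy distribution into a bearing; a production system would more likely combine it with time-difference-of-arrival methods.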
8. The apparatus for campus voice recognition of claim 7, wherein the acquisition module comprises a segmentation unit and a feature extraction unit;
the segmentation unit is configured to segment the first audio signal data into a voice area and a mute area and to acquire the voice area;
the feature extraction unit is configured to extract the voice information of the voice area, wherein the voice information comprises keyword characteristic information and voiceprint characteristic information.
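One way to realize the segmentation unit of claim 8 is with an off-the-shelf voice activity detector (py-webrtcvad); the claim does not prescribe a particular segmentation algorithm, so this is an illustrative substitute:

```python
# Voice/mute segmentation with WebRTC VAD (an assumed, off-the-shelf choice).
import webrtcvad

def voice_area_frames(pcm16: bytes, sr=16000, frame_ms=30, aggressiveness=2):
    """Yield the 30 ms 16-bit mono PCM frames classified as the voice area."""
    vad = webrtcvad.Vad(aggressiveness)
    frame_bytes = int(sr * frame_ms / 1000) * 2  # 2 bytes per 16-bit sample
    for i in range(0, len(pcm16) - frame_bytes + 1, frame_bytes):
        frame = pcm16[i:i + frame_bytes]
        if vad.is_speech(frame, sr):
            yield frame  # mute-area frames are simply skipped
```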
9. The apparatus of claim 7, wherein the violence detection module comprises a training unit and a detection unit;
the training unit is configured to acquire a plurality of training audio data and extract characteristic information of the training audio data, wherein the training audio data comprises human voice that contains violent vocabulary or emotional keywords and human voice that does not;
divide the training audio data into a voice area and a mute area according to the characteristic information, and perform fusion calculation on the characteristic information according to the characteristic types of the voice area and the mute area to obtain characteristic parameters of the training audio data;
and model the channels of the voice area and the mute area of the training audio data respectively according to the characteristic parameters to obtain the voice recognition model;
the detection unit is configured to extract characteristic information of a first keyword from the voice information, call a unified API to obtain the characteristic information of the first keyword, and perform a matching calculation between the characteristic information of the first keyword and the characteristic information of a second keyword in the training voice information to judge whether the first keyword is violent vocabulary; if the second keyword is violent vocabulary and the characteristic information of the first keyword matches the characteristic information of the second keyword, the first keyword is judged to be violent vocabulary.
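The detection unit's matching calculation can be sketched as below, comparing the first keyword's feature vector against the trained violent-vocabulary (second keyword) features; cosine similarity and the 0.85 threshold are assumptions, since the claim only requires that the two feature sets match:

```python
# Sketch of the claim-9 keyword matching calculation.
import numpy as np

def is_violent_keyword(first_kw_features, violent_kw_features, threshold=0.85):
    f = first_kw_features / np.linalg.norm(first_kw_features)
    for second in violent_kw_features:  # features of known violent vocabulary
        s = second / np.linalg.norm(second)
        if float(f @ s) >= threshold:
            return True  # first keyword judged to be violent vocabulary
    return False
```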
CN202211592939.2A 2022-12-13 2022-12-13 Campus voice recognition method, device and storage medium Active CN116229987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211592939.2A CN116229987B (en) 2022-12-13 2022-12-13 Campus voice recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211592939.2A CN116229987B (en) 2022-12-13 2022-12-13 Campus voice recognition method, device and storage medium

Publications (2)

Publication Number Publication Date
CN116229987A CN116229987A (en) 2023-06-06
CN116229987B true CN116229987B (en) 2023-11-21

Family

ID=86588111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211592939.2A Active CN116229987B (en) 2022-12-13 2022-12-13 Campus voice recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116229987B (en)

Citations (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5831936A (en) * 1995-02-21 1998-11-03 State Of Israel/Ministry Of Defense Armament Development Authority - Rafael System and method of noise detection
KR19990042393A (en) * 1997-11-26 1999-06-15 전주범 Character Substitution Method on TV
JPH11202890A (en) * 1998-01-20 1999-07-30 Ricoh Co Ltd Speech retrieval device
RU2008141557A (en) * 2008-10-20 2010-04-27 Федеральное государственное образовательное учреждение высшего профессионального образования "Чувашский государственный университе METHOD FOR RECOGNIZING KEY WORDS IN CONNECTED SPEECH
WO2011041977A1 (en) * 2009-10-10 2011-04-14 Xiong Dianyuan Cross monitoring method and system based on voiceprint identification and location tracking
KR101184012B1 (en) * 2011-03-31 2012-09-21 경남대학교 산학협력단 Intelligent robot for prevention of school violence and protection of children
CN104821882A (en) * 2015-05-08 2015-08-05 南京财经大学 Network security verification method based on voice biometric features
WO2015180447A1 (en) * 2014-05-28 2015-12-03 西安中兴新软件有限责任公司 Alarming method, terminal, and storage medium
CN105280183A (en) * 2015-09-10 2016-01-27 百度在线网络技术(北京)有限公司 Voice interaction method and system
CN106100777A (en) * 2016-05-27 2016-11-09 西华大学 Broadcast support method based on speech recognition technology
WO2017012496A1 (en) * 2015-07-23 2017-01-26 阿里巴巴集团控股有限公司 User voiceprint model construction method, apparatus, and system
CN107564530A (en) * 2017-08-18 2018-01-09 浙江大学 A kind of unmanned plane detection method based on vocal print energy feature
WO2018018906A1 (en) * 2016-07-27 2018-02-01 深圳市鹰硕音频科技有限公司 Voice access control and quiet environment monitoring method and system
CN109410521A (en) * 2018-12-28 2019-03-01 苏州思必驰信息科技有限公司 Voice monitoring alarm method and system
CN109635872A (en) * 2018-12-17 2019-04-16 上海观安信息技术股份有限公司 Personal identification method, electronic equipment and computer program product
CN110970049A (en) * 2019-12-06 2020-04-07 广州国音智能科技有限公司 Multi-person voice recognition method, device, equipment and readable storage medium
CN111508475A (en) * 2020-04-16 2020-08-07 五邑大学 Robot awakening voice keyword recognition method and device and storage medium
CN111540342A (en) * 2020-04-16 2020-08-14 浙江大华技术股份有限公司 Energy threshold adjusting method, device, equipment and medium
CN111971647A (en) * 2018-04-09 2020-11-20 麦克赛尔株式会社 Speech recognition apparatus, cooperation system of speech recognition apparatus, and cooperation method of speech recognition apparatus
WO2021093380A1 (en) * 2019-11-13 2021-05-20 苏宁云计算有限公司 Noise processing method and apparatus, and system
CN112887872A (en) * 2021-01-04 2021-06-01 深圳千岸科技股份有限公司 Playing method of earphone voice instruction, earphone and storage medium
CN113556313A (en) * 2021-01-27 2021-10-26 福建环宇通信息科技股份公司 Real-time talkback intervention and alarm platform based on AI technology
CN114492196A (en) * 2022-02-14 2022-05-13 瑶声科技(苏州)有限责任公司 Fault rapid detection method and system based on normal wave energy ratio theory
CN114694344A (en) * 2020-12-28 2022-07-01 深圳云天励飞技术股份有限公司 Campus violence monitoring method and device and electronic equipment
CN114743562A (en) * 2022-06-09 2022-07-12 成都凯天电子股份有限公司 Method and system for recognizing airplane voiceprint, electronic equipment and storage medium
CN115116437A (en) * 2022-04-07 2022-09-27 腾讯科技(深圳)有限公司 Speech recognition method, apparatus, computer device, storage medium and product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An improved feature extraction method for speech keywords; Wang Yaoming; Journal of Shanghai Dianji University (Issue 04); full text *
Indoor sound source localization method based on sound location fingerprints; Wang Shuopeng; Yang Peng; Sun Hao; Journal of Beijing University of Technology (Issue 02); full text *

Also Published As

Publication number Publication date
CN116229987A (en) 2023-06-06

Similar Documents

Publication Publication Date Title
CN111161752B (en) Echo cancellation method and device
US8438026B2 (en) Method and system for generating training data for an automatic speech recognizer
Kingsbury et al. Recognizing reverberant speech with RASTA-PLP
US10614827B1 (en) System and method for speech enhancement using dynamic noise profile estimation
CN111816218A (en) Voice endpoint detection method, device, equipment and storage medium
CN108091323B (en) Method and apparatus for emotion recognition from speech
CN108899047A (en) The masking threshold estimation method, apparatus and storage medium of audio signal
CN112382300A (en) Voiceprint identification method, model training method, device, equipment and storage medium
TWI523006B (en) Method for using voiceprint identification to operate voice recoginition and electronic device thereof
JPH11296192A (en) Speech feature value compensating method for speech recognition, speech recognizing method, device therefor, and recording medium recorded with speech recognision program
CN116229987B (en) Campus voice recognition method, device and storage medium
CN113658596A (en) Semantic identification method and semantic identification device
WO2021152566A1 (en) System and method for shielding speaker voice print in audio signals
CN105355206A (en) Voiceprint feature extraction method and electronic equipment
CN110661923A (en) Method and device for recording speech information in conference
Upadhyay et al. Robust recognition of English speech in noisy environments using frequency warped signal processing
CN110767238B (en) Blacklist identification method, device, equipment and storage medium based on address information
Dai et al. 2D Psychoacoustic modeling of equivalent masking for automatic speech recognition
Kim et al. Spectral distortion model for training phase-sensitive deep-neural networks for far-field speech recognition
Singh et al. A novel algorithm using MFCC and ERB gammatone filters in speech recognition
Prasanna Kumar et al. Supervised and unsupervised separation of convolutive speech mixtures using f 0 and formant frequencies
Wang et al. An ideal Wiener filter correction-based cIRM speech enhancement method using deep neural networks with skip connections
CN111833897B (en) Voice enhancement method for interactive education
CN117153185B (en) Call processing method, device, computer equipment and storage medium
Fan et al. Power-normalized PLP (PNPLP) feature for robust speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: No. 56 Nanli East Road, Shiqi Town, Panyu District, Guangzhou City, Guangdong Province, 510000
Applicant after: Guangdong Baolun Electronics Co.,Ltd.
Address before: No.19 Chuangyuan Road, Zhongcun street, Panyu District, Guangzhou, Guangdong 510000
Applicant before: GUANGZHOU ITC ELECTRONIC TECHNOLOGY Co.,Ltd.
GR01 Patent grant