CN116229987B - Campus voice recognition method, device and storage medium - Google Patents
- Publication number: CN116229987B (application CN202211592939.2A)
- Authority: CN (China)
- Prior art keywords: voice, information, campus, violent, keyword
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L17/04—Speaker identification or verification: training, enrolment or model building
- G10L17/06—Speaker identification or verification: decision-making techniques; pattern-matching strategies
- G10L17/18—Speaker identification or verification: artificial neural networks; connectionist approaches
- G10L17/22—Speaker identification or verification: interactive procedures; man-machine interfaces
- Y02D30/70—Reducing energy consumption in wireless communication networks
Abstract
The invention discloses a campus voice recognition method, device and storage medium. The method comprises the following steps: acquiring first audio signal data from a first campus voice device and filtering it to obtain human-voice information; inputting the voice information into a speech recognition model so that the model judges whether the voice information contains preset violent keywords; if so, inputting the voice information into a voiceprint recognition model so that the model calculates the energy value of the voice information and determines the sound source information according to a voiceprint scale factor, where the sound source information includes the number of people producing the voice and their position and direction; and sending the first audio signal data, the position information of the first campus voice device, and the sound source information to a management system, thereby recognizing and locating violent speech on the campus.
Description
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a method, an apparatus, and a storage medium for campus speech recognition.
Background
Speech recognition converts the lexical content of input speech into corresponding text. An existing speech recognition model first preprocesses the speech, decodes it with an acoustic model, matches syllables against a word list to obtain word sequences, and finally forms sentences with a language model.
Natural spoken dialogue conveys not only sound but also the speaker's emotional state, attitude, and intention. The voice recognition functions of current smart-campus equipment lack keyword retrieval for violent vocabulary and emotional speech recognition, and cannot localize the sound source of the acquired speech; as a result, recognition performance is poor and the safety of campus students cannot be comprehensively protected through recognition of their speech.
Disclosure of Invention
The invention provides a campus voice recognition method, a device and a storage medium, which are used for realizing recognition and positioning of violent voices in a campus.
To recognize and locate violent speech on a campus, an embodiment of the invention provides a campus voice recognition method, device and storage medium, comprising the following steps: acquiring first audio signal data from a first campus voice device, and filtering the first audio signal data to obtain human-voice information;

inputting the voice information into a speech recognition model so that the model judges whether the voice information contains preset violent keywords;

if so, inputting the voice information into a voiceprint recognition model so that the model calculates the energy value of the voice information and determines the sound source information according to a voiceprint scale factor and the energy distribution of the voice information, where the sound source information includes the number of people producing the voice and their distance and direction;

and sending the first audio signal data, the position information of the first campus voice device, and the sound source information to a management system.
As a preferred scheme, the method extracts features from the first audio signal data of any voice device on the campus, inputs them into a speech recognition model for analysis, and judges whether violent speech is present. If violent speech is detected, voiceprint analysis is then performed on it to obtain its sound source information: the number of people producing the voice and their distance and direction. In this way the speech of students is recorded on campus in real time, violent speech is detected, and the number, distance, and direction of the speakers are determined, thereby localizing the sound source.
As a preferred scheme, acquiring first audio signal data from the first voice device and filtering it to obtain the human-voice information specifically comprises:

dividing the first audio signal data into a voice region and a silent region, removing noise from the voice region, and taking the denoised voice region as the human-voice information.
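The split into a voice region and a silent region can be sketched with a simple short-time-energy detector. This is a minimal illustration, not the patent's actual segmentation; the frame length and energy threshold are assumed values:

```python
# Minimal energy-based voice/silence segmentation (illustrative only).
# Frame length and threshold are assumed values, not from the patent.

def segment_voice(samples, frame_len=160, threshold=0.01):
    """Return (voice_frames, silent_frames) split by short-time energy."""
    voice, silent = [], []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        (voice if energy > threshold else silent).append(frame)
    return voice, silent

# A loud burst followed by near-silence:
signal = [0.5, -0.5] * 80 + [0.001] * 160
voice, silent = segment_voice(signal)
```

In a real system the threshold would be adapted to the ambient noise level rather than fixed.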
As a preferred scheme, the voice information is first segmented and its features are extracted from the voice region, which reduces computation on environmental sounds and improves the precision of voice analysis. Keywords and voiceprints are then extracted, so that student speech is recorded on campus in real time, violent speech is detected, and the number, distance, and direction of the speakers are judged from the voiceprint features, thereby localizing the sound source.
As a preferred scheme, detecting whether the voice information contains a preset violent keyword specifically comprises:

calling a unified API interface to acquire the channel information of a first keyword in the voice information;

matching the channel information of the first keyword against the channel information of a second keyword in the training voice information, where the second keyword is a preset violent keyword;

if the channel information of the first keyword matches that of the second keyword, the speech recognition model judges that the voice information contains a preset violent keyword.
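The keyword check itself can be illustrated as a simple match of extracted keywords against a preset violent-vocabulary list. The word lists below are placeholder examples, not the patent's vocabulary:

```python
# Illustrative keyword screening: flag voice text whose keywords match
# a preset violent-vocabulary list. Word lists are placeholder examples.

VIOLENT_KEYWORDS = {"hit", "fight", "threaten"}  # assumed examples

def contains_violent_keyword(recognized_words):
    """Return the matched violent keywords (empty set if none)."""
    return set(recognized_words) & VIOLENT_KEYWORDS

matched = contains_violent_keyword(["let", "us", "fight", "now"])
clean = contains_violent_keyword(["good", "morning"])
```

The patent matches richer "channel information" per keyword rather than literal strings; this sketch only shows the gating logic.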
As a preferred scheme, the invention judges whether a keyword in the voice information is a violent word or a word with negative emotion by matching its keyword feature information against that of the training voice information, thereby recording student speech on campus in real time and detecting whether it is violent speech.
As a preferred scheme, calculating the energy value of the voice information and determining the sound source information according to the voiceprint scale factor and the energy distribution specifically comprises:

inputting the multiple pieces of voice information into corresponding matrix units, and calculating the energy value and frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal, where the first campus voice device is provided with a plurality of audio acquisition terminals and each piece of voice information is obtained by filtering the first audio signal data acquired by a different terminal;

extracting voiceprint scale factors from the energy value and frequency-domain energy distribution of each matrix unit, equalizing the human-voice information, and outputting the matrix energy distribution;

and determining the number of people and the direction of the sound from the matrix energy distribution and the positions of the audio acquisition terminals.
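As a rough sketch of the last step, the direction of the dominant source can be estimated from the energy measured at each terminal, for example as the energy-weighted circular mean of the terminal bearings. This is a simplification; the patent does not specify its matrix processing at this level of detail, and the bearings and energies below are invented examples:

```python
import math

# Energy-weighted direction estimate from several acquisition terminals.
# Terminal bearings (degrees around the device) and energies are examples.

def estimate_direction(bearings_deg, energies):
    """Energy-weighted circular mean of terminal bearings, in degrees."""
    x = sum(e * math.cos(math.radians(b)) for b, e in zip(bearings_deg, energies))
    y = sum(e * math.sin(math.radians(b)) for b, e in zip(bearings_deg, energies))
    return math.degrees(math.atan2(y, x)) % 360

# Four terminals at the compass points; the terminal facing 90 degrees
# receives by far the most energy, so the source lies that way.
direction = estimate_direction([0, 90, 180, 270], [0.1, 2.0, 0.1, 0.1])
```

Production systems typically use inter-microphone time differences as well as energy, which this sketch omits.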
As a preferred scheme, the first campus voice device is provided with a plurality of audio acquisition terminals. The energy value and frequency-domain energy distribution of each piece of voice information, obtained by filtering the first audio signal data acquired by the terminals, are calculated; voiceprint scale factors are extracted, the voice information is equalized, and the matrix energy distribution is output. The number of people and the direction of the sound are then determined from the matrix energy distribution and the terminal positions, thereby localizing the sound source.
Preferably, before inputting the voice information into the voice recognition model, the method further comprises:
acquiring a plurality of training audio data, and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
Dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
and respectively modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model.
Before the voice information is input into the speech recognition model, the model is trained. Speech containing violent vocabulary or emotion keywords and speech without them are used as training audio data, so the model learns to distinguish the feature values of each class. These features are fused, and the model built from the fused feature parameters can detect whether voice information is violent speech and the emotion value it expresses.
Preferably, before inputting the voiceprint parameters into the voiceprint recognition model, the method further comprises:
acquiring a plurality of training audio data, and extracting first energy characteristic information of the training audio data; fusion calculation is carried out on the first energy characteristic information, and voiceprint characteristic parameters of the training audio data are obtained; and modeling the training audio data according to the voiceprint characteristic parameters to obtain a voiceprint recognition model.
Before the voiceprint parameters are input into the voiceprint recognition model, the model is trained: the first energy feature information of the training audio data is extracted, the voiceprint feature parameters are obtained, and the model is trained on them, so that it can judge the number, distance, and direction of people producing violent speech, thereby localizing the sound source.
Preferably, before the first audio signal data, the location information of the first campus voice device, and the sound source information are sent to a management system, the method further includes:
playing the alarm information through broadcasting equipment; and if the violent voice is detected again within the preset time after the alarm information is played, transmitting the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
When violent speech is detected a second time within the preset time, the voice information, the position of the voice device that acquired it, and the speaker information are sent to the management system to inform an administrator of the violent speech content, the number of people, and their position. Student speech is thus recorded on campus in real time, violent speech is detected and localized, and the administrator is notified promptly with the relevant content, comprehensively protecting the safety of students on campus.
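The two-stage alarm described above can be sketched as a simple time-window check: broadcast a warning on the first detection, and escalate to the management system only if a second detection arrives within the preset window. The window length and return values here are placeholders:

```python
# Sketch of the two-stage alarm: play a warning on the first detection,
# escalate to the management system only if violence recurs within a
# preset window. Window length (seconds) is an assumed value.

class ViolenceAlarm:
    def __init__(self, window_s=60):
        self.window_s = window_s
        self.last_detection = None  # time of the previous detection

    def on_violent_speech(self, now_s):
        """Return the action to take for a detection at time now_s."""
        if self.last_detection is not None and now_s - self.last_detection <= self.window_s:
            self.last_detection = now_s
            return "notify_management"
        self.last_detection = now_s
        return "play_warning"

alarm = ViolenceAlarm(window_s=60)
first = alarm.on_violent_speech(0)    # first detection: warning broadcast
second = alarm.on_violent_speech(30)  # recurred within the window: escalate
late = ViolenceAlarm(60).on_violent_speech(0)
```

The "notify_management" branch is where the audio data, device position, and sound source information from the text would be transmitted.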
Correspondingly, the invention also provides a device for campus voice recognition, which comprises: the device comprises an acquisition module, a violence detection module, a voiceprint positioning module and an information sending module;
the acquisition module is used for acquiring audio signal data in campus voice equipment, and extracting characteristics of the audio signal data to acquire voice information;
the violence detection module is used for inputting the voice information into a voice recognition model so that the voice recognition model can judge whether the voice information contains preset violence keywords or not;
the voiceprint positioning module is used for inputting the voice information into a voiceprint recognition model if the voice information contains a preset violent keyword, so that the voiceprint recognition model calculates the energy value of the voice information, and determines sound source information in the voice information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people sending out the voice information of the voice and the position distance and direction of the people;
the information sending module is used for sending the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
As a preferred scheme, the acquisition module of the campus voice recognition device acquires the first audio signal data of any voice device on the campus and extracts features from it to obtain the voice information; the violence detection module inputs the voice information into the speech recognition model for analysis and judges whether violent speech is present; if so, the voiceprint positioning module performs voiceprint analysis on the violent speech and obtains its sound source information: the number of people producing the voice and their distance and direction. Student speech is thus recorded on campus in real time, violent speech is detected, and the number, distance, and direction of the speakers are judged, thereby localizing the sound source. The information sending module then feeds the sound source information of the violent speech back to the administrator in a timely manner.
As a preferred solution, the acquisition module includes a segmentation unit and a feature extraction unit;
the segmentation unit is used for segmenting the first audio signal data into a voice area and a mute area, and acquiring the voice area;
The characteristic extraction unit is used for extracting voice information of the voice area; the voice information comprises keyword characteristic information and voiceprint characteristic information.
As a preferred scheme, before detection the segmentation unit segments the voice region out of the audio, and the feature extraction unit extracts the feature information of the voice region, reducing computation on environmental sounds and improving the precision of voice analysis. Keywords and voiceprints are extracted, student speech is recorded on campus in real time, violent speech is detected, and the number, distance, and direction of the speakers are judged from the voiceprint features, thereby localizing the sound source.
As a preferred solution, the violence detection module comprises a training unit and a detection unit;
the training unit is used for acquiring a plurality of training audio data and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
Modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model;
the detection unit is used for extracting the feature information of a first keyword in the voice information: it calls a unified API interface to acquire the feature information of the first keyword, matches it against the feature information of a second keyword in the training voice information, and judges whether the first keyword is a violent word; if the second keyword is a violent word and the feature information of the first keyword matches it, the first keyword is judged to be a violent word.
Before the voice information is input into the speech recognition model, the training unit first trains the model. Speech containing violent vocabulary or emotion keywords and speech without them are used as training audio data, so the model learns to distinguish the feature values of each class; these are fused, and the model built from the fused feature parameters can detect whether voice information is violent speech and the emotion value it expresses. The detection unit judges whether a keyword in the voice information is a violent word or a word with negative emotion by matching its keyword feature information against that of the training voice information, thereby recording student speech on campus in real time and detecting violent speech.
Accordingly, the present invention also provides a computer-readable storage medium comprising a stored computer program; when run, the computer program controls the device in which the storage medium is located to execute the campus voice recognition method of the present invention.
Drawings
FIG. 1 is a flow chart of one embodiment of a method of campus voice recognition provided by the present invention;
fig. 2 is a schematic structural diagram of an embodiment of a campus voice recognition device provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1, a method for campus voice recognition according to an embodiment of the present invention includes steps S101 to S104:
Step S101: acquiring first audio signal data in first campus voice equipment, and filtering the first audio signal data to acquire voice information;
in this embodiment, first audio signal data in a first voice device is obtained, and filtering processing is performed on the first audio signal data to obtain voice information, specifically:
dividing the first audio signal data into a voice area and a mute area, removing noise of the voice area, and taking the voice area after noise removal as the voice information of the human voice.
In this embodiment, dividing the first audio signal data into a voice region and a silent region, removing the noise, and taking the denoised voice region as the human-voice information specifically comprises:

applying a Hanning window and a short-time fast Fourier transform to the first audio signal data to segment it from the time domain into the frequency domain;

inputting the segmented data into an IIR filter, attenuating the frequency bands that contain noise, enhancing the bands that contain human voice, and finally transforming back to the time domain by inverse Fourier transform to obtain the human-voice information.
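The window-and-transform step can be illustrated with a Hanning window and a naive DFT on one frame: bands flagged as noise are attenuated in the frequency domain and the frame is transformed back. This is a toy sketch; real implementations use an FFT, and the patent's IIR filter design is not reproduced here (the "noisy" bin indices are invented):

```python
import cmath, math

# Toy frequency-domain denoising of one frame: Hanning window, naive DFT,
# attenuate assumed noise bins, inverse DFT. Illustration only.

def hanning(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def denoise_frame(frame, noise_bins, atten=0.1):
    windowed = [s * w for s, w in zip(frame, hanning(len(frame)))]
    spec = dft(windowed)
    spec = [c * atten if k in noise_bins else c for k, c in enumerate(spec)]
    return idft(spec)

frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]  # tone in bin 4
out = denoise_frame(frame, noise_bins={10, 22})  # bins 10/22 assumed noisy
```

Overlap-add across successive windowed frames, omitted here, is needed to reconstruct a continuous signal.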
In this embodiment, the division and filtering of the first audio signal data specifically comprise:

randomly adjusting the gain of the first audio signal data between 0.01 and 10 and the noise gain between 0.1 and 10, computing the gain increase frame by frame to obtain the gained audio signal data, and processing it with a random second-order filter to obtain a speech signal and a noise signal;

calculating the energy value of the speech signal and deriving 1 voice-activity (VAD) feature point from it; calculating the energy spectrum of the noise signal to obtain 22 voiceprint feature points; mixing the gained speech and noise signals to obtain a noisy speech signal and computing 44 mixed feature points;

and calculating the ratio of the speech-signal energy to the noisy-speech energy, together with the VAD feature points and the silent speech signal, to obtain 22 gain feature points.
In this embodiment, a deep neural network model is trained on this data: the 44 mixed feature points, 22 gain feature points, and 1 VAD feature point of the training data are extracted and input into the network, which outputs the speech signal. 10% of the training data is held out as a validation set, and the remaining data is divided into batches of 32 and trained for 120 epochs.
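The data handling described here can be sketched as follows; the network training step itself is stubbed out, and only the 10% validation hold-out, batch division, and epoch loop from the text are shown:

```python
import random

# Sketch of the training-data handling described above: hold out 10% for
# validation, split the rest into batches of 32, loop for 120 epochs.
# The actual gradient step on each batch is stubbed out.

def split_and_batch(samples, val_frac=0.1, batch_size=32, seed=0):
    rng = random.Random(seed)
    data = samples[:]
    rng.shuffle(data)
    n_val = int(len(data) * val_frac)
    val, train = data[:n_val], data[n_val:]
    batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]
    return train, val, batches

train, val, batches = split_and_batch(list(range(1000)))
for epoch in range(120):
    for batch in batches:
        pass  # one training step on `batch` would go here
```

With 1000 samples this yields 100 validation samples and 29 batches (the last one partial).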
Step S102: inputting the voice information into a voice recognition model so that the voice recognition model judges whether the voice information contains preset violent keywords or not;
in this embodiment, detecting whether the voice information contains a preset violent keyword specifically comprises:

calling a unified API interface to acquire the channel information of a first keyword in the voice information;

matching the channel information of the first keyword against the channel information of a second keyword in the training voice information, where the second keyword is a preset violent keyword;

if the channel information of the first keyword matches that of the second keyword, the speech recognition model judges that the voice information contains a preset violent keyword.
In this embodiment, before inputting the voice information into the voice recognition model, the method further includes:
acquiring a plurality of training audio data, and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
Dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
and respectively modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model.
In this embodiment, the feature information is fused according to the feature types of the voice and silent regions, an initial speech recognition model is built with a DenseNet-LSTM network structure and trained on the training audio data, and the model is accepted once its accuracy on the test set exceeds 99.5%.
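The acceptance criterion (keep training until test-set accuracy exceeds 99.5%) can be expressed as a simple gate. The model and evaluation data below are stand-ins, not a DenseNet-LSTM:

```python
# Accuracy gate from the text: accept the model only once its accuracy on
# a held-out test set exceeds 99.5%. Model and evaluation are stand-ins.

def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def train_until_accepted(eval_rounds, threshold=0.995):
    """eval_rounds yields (predictions, labels) after each training round;
    return the 1-based round at which the model is accepted, else None."""
    for round_no, (preds, labels) in enumerate(eval_rounds, start=1):
        if accuracy(preds, labels) > threshold:
            return round_no
    return None

labels = [1] * 1000
rounds = [([1] * 900 + [0] * 100, labels),  # 90.0% accurate: keep training
          ([1] * 996 + [0] * 4, labels)]    # 99.6% accurate: accepted
accepted_at = train_until_accepted(iter(rounds))
```

In practice a maximum round count would also be enforced so training cannot loop forever below the threshold.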
In this embodiment, each time the speech recognition model obtains a piece of voice information, an SDK package is generated; a unified API interface from the alsa-lib library is provided to the application program to collect the channel information of the keywords, which is then matched against the channel information of the second keywords in the training voice information. The second keywords include, but are not limited to, preset violent words and preset emotion keywords, where emotion keywords are the pieces of key information, intelligently identified from the text of the voice information, that most influence the overall emotion of the text.
In this embodiment, after a piece of voice information is obtained each time, the obtained voice information and the calculation result are used as training data, and the learning experience of the voice recognition model is accumulated.
In this embodiment, an emotion analysis engine performs a full analysis of the emotion and emotional extremes expressed by the voice information; the network models and parameters for violent-text analysis are updated by training on the server, and the trained models are loaded into the recognizer during idle time.
Step S103: if so, inputting the voice information into a voiceprint recognition model so that the model calculates the energy value of the voice information and determines the sound source information according to a voiceprint scale factor and the energy distribution, where the sound source information includes the number of people producing the voice and their distance and direction;
in this embodiment, energy value calculation is performed on the voice information, and sound source information in the voice information is determined according to a voiceprint scale factor and energy distribution of the voice information, which specifically includes:
Respectively inputting a plurality of pieces of voice information into a plurality of corresponding matrix units, and respectively calculating the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal; the first campus voice equipment is provided with a plurality of audio acquisition terminals, and the human-voice information is obtained by respectively filtering the first audio signal data acquired by the different audio acquisition terminals;
extracting voiceprint scale factors according to the energy value and the frequency domain energy distribution of each matrix unit, performing equalization processing on the voice information of the human voice, and outputting matrix energy distribution;
and determining the number of people and the direction of sound according to the matrix energy distribution and the positions of the plurality of audio acquisition terminals.
In this embodiment, the number of people is determined by accumulating the voiceprints of the collected human-voice information and comparing the differences between voiceprints and between their frequencies.
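A minimal sketch of the accumulate-and-compare idea, assuming voiceprints are available as fixed-length embedding vectors and using cosine similarity with a hypothetical threshold:

```python
import math

def cosine(a, b):
    # Cosine similarity between two voiceprint embeddings.
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def count_speakers(embeddings, threshold=0.85):
    # Greedy clustering: a new speaker is counted whenever an embedding
    # matches no previously accumulated voiceprint above the threshold.
    seen = []
    for e in embeddings:
        if not any(cosine(e, s) >= threshold for s in seen):
            seen.append(e)
    return len(seen)
```

The embedding extractor and the 0.85 threshold are assumptions; the patent only states that voiceprint and frequency differences are compared.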
In this embodiment, before inputting the voiceprint parameters into the voiceprint recognition model, the method further includes:
acquiring a plurality of training audio data, and extracting first energy characteristic information of the training audio data; fusion calculation is carried out on the first energy characteristic information, and voiceprint characteristic parameters of the training audio data are obtained; and modeling the training audio data according to the voiceprint characteristic parameters to obtain a voiceprint recognition model.
In this embodiment, a plurality of pieces of voice information are respectively input into 4 corresponding matrix units, and the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal are respectively calculated; the first campus voice device is provided with 4 unidirectional audio acquisition terminals facing different directions, where the audio acquisition terminal includes, but is not limited to, a microphone device; the human-voice information is obtained by respectively filtering the first audio signal data acquired by the different audio acquisition terminals;
in this embodiment, when the first campus voice device operates, the 4 audio acquisition terminals work simultaneously to acquire audio and feed it into the 4 corresponding matrix units; the audio acquired by each unit has a different energy value, and the energy distribution across frequency bands in the frequency domain is inconsistent. The ratio of total energy between the different matrix units, together with the energy-distribution proportion across frequency bands, constitutes the voiceprint scale factor: signals with a large proportion are enhanced, and signals with a small proportion are attenuated.
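A simplified sketch of the scale-factor step, assuming the factor is each unit's share of the total energy (the per-band shares would be computed analogously from a frequency-domain split); the function names are illustrative:

```python
def unit_energies(unit_frames):
    # Total energy of the audio frame captured by each matrix unit.
    return [sum(s * s for s in frame) for frame in unit_frames]

def scale_factors(unit_frames):
    # Voiceprint scale factor, taken here as each unit's share of the
    # total energy across all units.
    totals = unit_energies(unit_frames)
    grand = sum(totals) or 1.0
    return [t / grand for t in totals]

def equalize(unit_frames, factors):
    # Enhance units whose share is above the mean, attenuate the rest.
    mean = sum(factors) / len(factors)
    return [[s * (1.0 + f - mean) for s in frame]
            for frame, f in zip(unit_frames, factors)]
```

The linear boost/cut rule in `equalize` is an assumption; the patent states only that large-proportion signals are enhanced and small-proportion signals attenuated.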
In this embodiment, a 31-band equalizer enhances and attenuates signals using a difference equation and a transfer function, applying the corresponding gain adjustment at the center frequency of each band of the signal; the magnitude of the gain adjustment is controlled by a matrix calculation factor. The equalizer employs a biquad filter.
The difference equation is:
y[n]=(b0/a0)*x[n]+(b1/a0)*x[n-1]+(b2/a0)*x[n-2]-(a1/a0)*y[n-1]-(a2/a0)*y[n-2];
wherein a0, a1, a2, b0, b1, b2 are the coefficients of the biquad filter; y[n] is the current audio output, x[n] is the current audio input, x[n-1] is the audio input at the previous moment, x[n-2] is the audio input two moments earlier, y[n-1] is the audio output at the previous moment, and y[n-2] is the audio output two moments earlier; y[n-1] and y[n-2] are the feedback values of the system.
The transfer function is:
H(z) = (b0 + b1*z^-1 + b2*z^-2) / (1 + a1*z^-1 + a2*z^-2);
wherein a1, a2, b0, b1, b2 are the (a0-normalized) coefficients of the biquad filter; H(z) is the ratio of the Z-transform of y[n] to the Z-transform of x[n] in the difference equation; the z^-1 and z^-2 terms in the numerator correspond to the Z-transforms of x[n-1] and x[n-2], and the z^-1 and z^-2 terms in the denominator correspond to the Z-transforms of y[n-1] and y[n-2].
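The difference equation above can be implemented directly as a direct-form I biquad; this sketch keeps the a0 normalization from the equation:

```python
class Biquad:
    """Direct-form I biquad implementing the difference equation above:
    y[n] = (b0/a0)*x[n] + (b1/a0)*x[n-1] + (b2/a0)*x[n-2]
           - (a1/a0)*y[n-1] - (a2/a0)*y[n-2]
    """

    def __init__(self, b0, b1, b2, a0, a1, a2):
        # Normalize all coefficients by a0, as in the equation above.
        self.b0, self.b1, self.b2 = b0 / a0, b1 / a0, b2 / a0
        self.a1, self.a2 = a1 / a0, a2 / a0
        self.x1 = self.x2 = 0.0  # x[n-1], x[n-2]
        self.y1 = self.y2 = 0.0  # y[n-1], y[n-2]: the feedback values

    def step(self, x):
        # One sample in, one sample out.
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y
```

Peaking-EQ coefficients for each of the 31 bands would typically come from a cookbook formula parameterized by center frequency, Q, and gain; those formulas are not given in the text, so they are omitted here.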
The current voiceprint scale factors of the different matrix units are stored and fed back into the next matrix calculation, so that the matrix calculation factors are adjusted dynamically.
In this embodiment, based on multi-feature fusion, the voiceprint feature information of the human-voice features is processed in the DenseNet-LSTM network structure; the more distant the sound, the smaller its energy. From the energy distribution of the matrix, the direction and distance of the sound can be determined. The first campus voice device is provided with 4 unidirectional microphones facing different directions; the distance between the sound and the microphones is determined from the matrix-unit energies detected by the four microphones, by multiplying the ratio of the maximum matrix energy value to the minimum matrix energy value by a coefficient.
The direction of the sound is determined by the mutual ratios of the energy values of the four matrix units and, after processing, is mapped to 0-3: 0 is the direction between microphone 0 and microphone 1, 1 is the direction between microphone 1 and microphone 2, 2 is the direction between microphone 2 and microphone 3, and 3 is the direction between microphone 3 and microphone 0.
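A hedged sketch of the distance and direction rules just described, with `k` standing in for the unspecified distance coefficient and the 0-3 sector mapping taken from the text; the tie-breaking between neighbouring microphones is an assumption:

```python
def locate(unit_energies, k=1.0):
    """Hypothetical sketch: unit_energies holds the matrix-unit energies
    for microphones 0-3; k is the unspecified distance coefficient."""
    e = list(unit_energies)
    loudest = max(range(4), key=lambda i: e[i])
    nxt, prev = (loudest + 1) % 4, (loudest - 1) % 4
    # Sector i lies between microphone i and microphone i+1 (mod 4):
    # pick the sector between the loudest mic and its louder neighbour.
    direction = loudest if e[nxt] >= e[prev] else prev
    # Distance: ratio of max to min matrix energy, times the coefficient.
    distance = k * (max(e) / max(min(e), 1e-9))
    return direction, distance
```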
Step S104: and transmitting the first audio signal data, the position information of the first campus voice device and the sound source information to a management system.
In this embodiment, the alarm information is played through the broadcasting device; if the violent voice is detected again within the preset time after the alarm information is played, the first audio signal data, the position information of the first campus voice device, and the sound source information are transmitted to a management system.
In this embodiment, if it is determined that the voice information contains a preset violent keyword, the violent keywords detected in the first audio signal data are transmitted to a backend MySQL database through an Ethernet module, and alarm information is played through the broadcasting equipment; if violent voice is detected again within the preset time after the alarm information is played, the violent keyword, the first audio signal data, the position information of the first campus voice equipment, and the sound source information are transmitted to the management system.
In this embodiment, the campus voice system includes a plurality of campus voice devices and a management system, where the campus voice devices are used to collect voice and play alert information.
The campus voice system further comprises: a network interface and a control terminal, wherein the network interface is used for connecting the control terminal. The control terminal can directly modify the IP address, subnet mask, gateway, and DHCP service through the netplan configuration tool with script-assisted configuration, and issues instructions over the TCP network protocol by sending broadcast packets; each campus voice device responds in turn and sends a heartbeat packet every five seconds to confirm that it is always online, and each campus voice device can, through a device-configuration tool, modify its IP address, subnet mask, gateway, and DHCP service, play music, and adjust the volume.
In this embodiment, from ten in the evening to one at night, the campus voice equipment in the dormitory area detects voice volume in real time; if a campus voice device detects a voice volume in the 40-70 decibel range, it automatically plays the alarm information. If voice in the 40-70 decibel range is detected again within the preset time after the alarm information is played, the campus voice device's position information and the collected voice information are sent to the backend MySQL database through the Ethernet module, and an administrator is notified.
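A rough sketch of the 40-70 dB volume check; the calibration offset converting the digital (dBFS) level to a sound-pressure-like figure is microphone-dependent and purely hypothetical here:

```python
import math

def level_db(samples):
    # RMS level of the frame in dB relative to full scale (dBFS).
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))

def in_alarm_band(samples, calibration_db=90.0, low=40.0, high=70.0):
    # True when the estimated sound level falls in the 40-70 dB band
    # described above; calibration_db is a hypothetical mic-dependent
    # offset from dBFS to an SPL-like figure.
    return low <= level_db(samples) + calibration_db <= high
```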
The implementation of this embodiment of the present invention has the following effects:
feature extraction is performed on the first audio signal data of any voice device on the campus, and the result is input into a voice recognition model for voice analysis to judge whether violent voice exists in the first audio signal data. If the obtained first audio signal data contains violent voice, voiceprint analysis is performed on it to obtain the sound source information of the violent voice: the number of people producing the human-voice information and the distance and direction of those people. This realizes real-time recording of students' voice information on campus, detection of whether it is violent voice, and judgment of the number, distance, and direction of the people producing the violent voice, thereby performing sound-source localization.
Example two
Referring to fig. 2, a device for campus voice recognition according to an embodiment of the present invention includes: the device comprises an acquisition module 201, a violence detection module 202, a voiceprint positioning module 203 and an information sending module 204;
the acquiring module 201 is configured to acquire audio signal data in campus voice equipment, perform feature extraction on the audio signal data, and obtain voice information;
The violence detection module 202 is configured to input the voice information into a voice recognition model, so that the voice recognition model determines whether the voice information contains a preset violence keyword;
the voiceprint positioning module 203 is configured to input the voice information into a voiceprint recognition model if the voice information contains a preset violent keyword, so that the voiceprint recognition model performs energy value calculation on the voice information and determines the sound source information in the voice information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human-voice information, and the distance and direction of those people;
the information sending module 204 is configured to send the first audio signal data, the location information of the first campus voice device, and the sound source information to a management system.
The acquisition module 201 includes a segmentation unit and a feature extraction unit;
the segmentation unit is used for segmenting the first audio signal data into a voice area and a mute area, and acquiring the voice area;
the characteristic extraction unit is used for extracting voice information of the voice area; the voice information comprises keyword characteristic information and voiceprint characteristic information.
The violence detection module 202 comprises a training unit and a detection unit;
the training unit is used for acquiring a plurality of training audio data and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model;
the detection unit is used for extracting characteristic information of a first keyword of the voice information; calling a unified API interface to acquire the characteristic information of the first keyword; matching the characteristic information of the first keyword against the characteristic information of the second keyword in the training voice information to judge whether the first keyword is a violent word; and if the second keyword is a violent word and the characteristic information of the first keyword matches the characteristic information of the second keyword, judging that the first keyword is a violent word.
The campus voice recognition device can implement the campus voice recognition method of the foregoing method embodiment. The options in the method embodiment above also apply to this embodiment and are not described in detail here; for the remainder, reference may be made to the content of the method embodiment above.
The implementation of the embodiment of the application has the following effects:
the acquisition module of the campus voice recognition device acquires the first audio signal data of any voice device on the campus and performs feature extraction on it to obtain human-voice information; the violence detection module inputs the voice information into a voice recognition model for voice analysis to judge whether violent voice exists in the first audio signal data. If it does, the voiceprint positioning module performs voiceprint analysis on the detected violent voice to obtain its sound source information: the number of people producing the human-voice information and the distance and direction of those people. This realizes real-time recording of students' voice information on campus, detection of whether it is violent voice, and judgment of the number, distance, and direction of the people producing the violent voice, thereby performing sound-source localization. The information sending module promptly feeds the sound source information of the violent language back to the administrator.
Example III
Correspondingly, the invention further provides a computer readable storage medium, which comprises a stored computer program, wherein the computer program is used for controlling equipment where the computer readable storage medium is located to execute the campus voice recognition method according to any embodiment.
The computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention, for example. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used for describing the execution of the computer program in the terminal device.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the terminal device, and which connects various parts of the entire terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or the module, and the processor may implement various functions of the terminal device by running or executing the computer program and/or the module stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the mobile terminal, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one disk storage device, a flash memory device, or other non-volatile solid-state storage device.
Wherein, the terminal device's integrated modules/units may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment, which may also be completed by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.
Claims (9)
1. A method for campus voice recognition, comprising:
acquiring first audio signal data in first campus voice equipment, and filtering the first audio signal data to acquire voice information; inputting the voice information into a voice recognition model so that the voice recognition model judges whether the voice information contains preset violent keywords or not;
if yes, inputting the voice information into a voiceprint recognition model, so that the voiceprint recognition model calculates the energy value of the voice information and determines sound source information in the voice information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human-voice information, and the distance and direction of those people;
The step of calculating the energy value of the voice information, and determining the sound source information in the voice information according to the voiceprint scale factor and the energy distribution of the voice information, specifically comprises the following steps:
respectively inputting a plurality of pieces of voice information into a plurality of corresponding matrix units, and respectively calculating the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal; the first campus voice equipment is provided with a plurality of audio acquisition terminals, and the human-voice information is obtained by respectively filtering the first audio signal data acquired by the different audio acquisition terminals;
extracting voiceprint scale factors according to the energy value and the frequency domain energy distribution of each matrix unit, performing equalization processing on the voice information of the human voice, and outputting matrix energy distribution;
determining the number of people and the direction of sound according to the matrix energy distribution and the positions of a plurality of audio acquisition terminals;
and transmitting the first audio signal data, the position information of the first campus voice device and the sound source information to a management system.
2. The method of campus voice recognition according to claim 1, wherein the obtaining first audio signal data in the first voice device, and performing filtering processing on the first audio signal data, obtains voice information, specifically includes:
Dividing the first audio signal data into a voice area and a mute area, removing noise of the voice area, and taking the voice area after noise removal as the voice information of the human voice.
3. The method of campus voice recognition according to claim 2, wherein the determining whether the voice information includes a preset violence keyword is specifically:
calling a unified API interface to acquire channel information of a first keyword of voice information;
matching and calculating the channel information of the first keyword and the channel information of the second keyword in the training voice information; wherein the second keywords are preset violent keywords;
if the channel information of the first keyword is identical to the channel information of the second keyword in a matching manner, the voice recognition model judges that the voice information of the voice contains a preset violent keyword.
4. The method of campus voice recognition of claim 1 wherein prior to entering the voice information into a voice recognition model, further comprising:
acquiring a plurality of training audio data, and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
Dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
and respectively modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model.
5. The method of campus voice recognition of claim 1, wherein before inputting the voiceprint parameters into the voiceprint recognition model, further comprising:
acquiring a plurality of training audio data, and extracting first energy characteristic information of the training audio data; fusion calculation is carried out on the first energy characteristic information, and voiceprint characteristic parameters of the training audio data are obtained; and modeling the training audio data according to the voiceprint characteristic parameters to obtain a voiceprint recognition model.
6. The method of campus voice recognition of claim 1, wherein before the transmitting the first audio signal data, the location information of the first campus voice device, and the sound source information to the management system, further comprising:
Playing the alarm information through broadcasting equipment; and if the violent voice is detected again within the preset time after the alarm information is played, transmitting the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
7. An apparatus for campus speech recognition, comprising: the device comprises an acquisition module, a violence detection module, a voiceprint positioning module and an information sending module;
the acquisition module is used for acquiring first audio signal data in first campus voice equipment, and filtering the first audio signal data to acquire voice information;
the violence detection module is used for inputting the voice information into a voice recognition model so that the voice recognition model can judge whether the voice information contains preset violence keywords or not;
the voiceprint positioning module is used for inputting the voice information into a voiceprint recognition model if the voice information contains a preset violent keyword, so that the voiceprint recognition model calculates the energy value of the voice information and determines sound source information in the voice information according to a voiceprint scale factor and the energy distribution of the voice information; wherein the sound source information includes: the number of people producing the human-voice information, and the distance and direction of those people;
The step of calculating the energy value of the voice information, and determining the sound source information in the voice information according to the voiceprint scale factor and the energy distribution of the voice information, specifically comprises the following steps:
respectively inputting a plurality of pieces of voice information into a plurality of corresponding matrix units, and respectively calculating the energy value and the frequency-domain energy distribution of the voice information acquired by each audio acquisition terminal; the first campus voice equipment is provided with a plurality of audio acquisition terminals, and the human-voice information is obtained by respectively filtering the first audio signal data acquired by the different audio acquisition terminals;
extracting voiceprint scale factors according to the energy value and the frequency domain energy distribution of each matrix unit, performing equalization processing on the voice information of the human voice, and outputting matrix energy distribution;
determining the number of people and the direction of sound according to the matrix energy distribution and the positions of a plurality of audio acquisition terminals;
the information sending module is used for sending the first audio signal data, the position information of the first campus voice equipment and the sound source information to a management system.
8. The apparatus for campus voice recognition of claim 7, wherein the acquisition module comprises a segmentation unit and a feature extraction unit;
The segmentation unit is used for segmenting the first audio signal data into a voice area and a mute area, and acquiring the voice area;
the characteristic extraction unit is used for extracting voice information of the voice area; the voice information comprises keyword characteristic information and voiceprint characteristic information.
9. The apparatus of claim 7, wherein the violence detection module comprises a training unit and a detection unit;
the training unit is used for acquiring a plurality of training audio data and extracting characteristic information of the training audio data; wherein the training audio data comprises voice sounds containing violent vocabulary or emotion keywords and voice sounds without violent vocabulary or emotion keywords;
dividing the training audio data into a voice area and a mute area according to the characteristic information; according to the characteristic types of the voice area and the mute area, carrying out fusion calculation on the characteristic information to obtain characteristic parameters of the training audio data;
modeling channels of a voice area and a mute area of the training audio data according to the characteristic parameters to obtain a voice recognition model;
The detection unit is used for extracting characteristic information of a first keyword of the voice information; calling a unified API interface to acquire the characteristic information of the first keyword; matching the characteristic information of the first keyword against the characteristic information of the second keyword in the training voice information to judge whether the first keyword is a violent word; and if the second keyword is a violent word and the characteristic information of the first keyword matches the characteristic information of the second keyword, judging that the first keyword is a violent word.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211592939.2A CN116229987B (en) | 2022-12-13 | 2022-12-13 | Campus voice recognition method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211592939.2A CN116229987B (en) | 2022-12-13 | 2022-12-13 | Campus voice recognition method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116229987A CN116229987A (en) | 2023-06-06 |
CN116229987B true CN116229987B (en) | 2023-11-21 |
Family
ID=86588111
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211592939.2A Active CN116229987B (en) | 2022-12-13 | 2022-12-13 | Campus voice recognition method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116229987B (en) |
Citations (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5831936A (en) * | 1995-02-21 | 1998-11-03 | State Of Israel/Ministry Of Defense Armament Development Authority - Rafael | System and method of noise detection |
KR19990042393A (en) * | 1997-11-26 | 1999-06-15 | 전주범 | Character Substitution Method on TV |
JPH11202890A (en) * | 1998-01-20 | 1999-07-30 | Ricoh Co Ltd | Speech retrieval device |
RU2008141557A (en) * | 2008-10-20 | 2010-04-27 | Федеральное государственное образовательное учреждение высшего профессионального образования "Чувашский государственный университе | METHOD FOR RECOGNIZING KEY WORDS IN CONNECTED SPEECH |
WO2011041977A1 (en) * | 2009-10-10 | 2011-04-14 | Xiong Dianyuan | Cross monitoring method and system based on voiceprint identification and location tracking |
KR101184012B1 (en) * | 2011-03-31 | 2012-09-21 | 경남대학교 산학협력단 | Intelligent robot for prevention of school violence and protection of children |
CN104821882A (en) * | 2015-05-08 | 2015-08-05 | 南京财经大学 | Network security verification method based on voice biometric features |
WO2015180447A1 (en) * | 2014-05-28 | 2015-12-03 | 西安中兴新软件有限责任公司 | Alarming method, terminal, and storage medium |
CN105280183A (en) * | 2015-09-10 | 2016-01-27 | 百度在线网络技术(北京)有限公司 | Voice interaction method and system |
CN106100777A (en) * | 2016-05-27 | 2016-11-09 | 西华大学 | Broadcast support method based on speech recognition technology |
WO2017012496A1 (en) * | 2015-07-23 | 2017-01-26 | 阿里巴巴集团控股有限公司 | User voiceprint model construction method, apparatus, and system |
CN107564530A (en) * | 2017-08-18 | 2018-01-09 | 浙江大学 | A kind of unmanned plane detection method based on vocal print energy feature |
WO2018018906A1 (en) * | 2016-07-27 | 2018-02-01 | 深圳市鹰硕音频科技有限公司 | Voice access control and quiet environment monitoring method and system |
CN109410521A (en) * | 2018-12-28 | 2019-03-01 | 苏州思必驰信息科技有限公司 | Voice monitoring alarm method and system |
CN109635872A (en) * | 2018-12-17 | 2019-04-16 | 上海观安信息技术股份有限公司 | Personal identification method, electronic equipment and computer program product |
CN110970049A (en) * | 2019-12-06 | 2020-04-07 | 广州国音智能科技有限公司 | Multi-person voice recognition method, device, equipment and readable storage medium |
CN111508475A (en) * | 2020-04-16 | 2020-08-07 | 五邑大学 | Robot awakening voice keyword recognition method and device and storage medium |
CN111540342A (en) * | 2020-04-16 | 2020-08-14 | 浙江大华技术股份有限公司 | Energy threshold adjusting method, device, equipment and medium |
CN111971647A (en) * | 2018-04-09 | 2020-11-20 | 麦克赛尔株式会社 | Speech recognition apparatus, cooperation system of speech recognition apparatus, and cooperation method of speech recognition apparatus |
WO2021093380A1 (en) * | 2019-11-13 | 2021-05-20 | 苏宁云计算有限公司 | Noise processing method and apparatus, and system |
CN112887872A (en) * | 2021-01-04 | 2021-06-01 | 深圳千岸科技股份有限公司 | Playing method of earphone voice instruction, earphone and storage medium |
CN113556313A (en) * | 2021-01-27 | 2021-10-26 | 福建环宇通信息科技股份公司 | Real-time talkback intervention and alarm platform based on AI technology |
CN114492196A (en) * | 2022-02-14 | 2022-05-13 | 瑶声科技(苏州)有限责任公司 | Fault rapid detection method and system based on normal wave energy ratio theory |
CN114694344A (en) * | 2020-12-28 | 2022-07-01 | 深圳云天励飞技术股份有限公司 | Campus violence monitoring method and device and electronic equipment |
CN114743562A (en) * | 2022-06-09 | 2022-07-12 | 成都凯天电子股份有限公司 | Method and system for recognizing airplane voiceprint, electronic equipment and storage medium |
CN115116437A (en) * | 2022-04-07 | 2022-09-27 | 腾讯科技(深圳)有限公司 | Speech recognition method, apparatus, computer device, storage medium and product |
- 2022-12-13 CN CN202211592939.2A patent/CN116229987B/en active Active
Non-Patent Citations (2)
Title |
---|
An improved feature extraction method for speech keywords; Wang Yaoming; Journal of Shanghai Dianji University (Issue 04); full text *
Indoor sound source localization method based on acoustic location fingerprints; Wang Shuopeng; Yang Peng; Sun Hao; Journal of Beijing University of Technology (Issue 02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN116229987A (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111161752B (en) | Echo cancellation method and device | |
US8438026B2 (en) | Method and system for generating training data for an automatic speech recognizer | |
Kingsbury et al. | Recognizing reverberant speech with RASTA-PLP | |
US10614827B1 (en) | System and method for speech enhancement using dynamic noise profile estimation | |
CN111816218A (en) | Voice endpoint detection method, device, equipment and storage medium | |
CN108091323B (en) | Method and apparatus for emotion recognition from speech | |
CN108899047A (en) | The masking threshold estimation method, apparatus and storage medium of audio signal | |
CN112382300A (en) | Voiceprint identification method, model training method, device, equipment and storage medium | |
TWI523006B (en) | Method for using voiceprint identification to operate voice recoginition and electronic device thereof | |
JPH11296192A (en) | Speech feature value compensating method for speech recognition, speech recognizing method, device therefor, and recording medium recorded with speech recognision program | |
CN116229987B (en) | Campus voice recognition method, device and storage medium | |
CN113658596A (en) | Semantic identification method and semantic identification device | |
WO2021152566A1 (en) | System and method for shielding speaker voice print in audio signals | |
CN105355206A (en) | Voiceprint feature extraction method and electronic equipment | |
CN110661923A (en) | Method and device for recording speech information in conference | |
Upadhyay et al. | Robust recognition of English speech in noisy environments using frequency warped signal processing | |
CN110767238B (en) | Blacklist identification method, device, equipment and storage medium based on address information | |
Dai et al. | 2D Psychoacoustic modeling of equivalent masking for automatic speech recognition | |
Kim et al. | Spectral distortion model for training phase-sensitive deep-neural networks for far-field speech recognition | |
Singh et al. | A novel algorithm using MFCC and ERB gammatone filters in speech recognition | |
Prasanna Kumar et al. | Supervised and unsupervised separation of convolutive speech mixtures using f 0 and formant frequencies | |
Wang et al. | An ideal Wiener filter correction-based cIRM speech enhancement method using deep neural networks with skip connections | |
CN111833897B (en) | Voice enhancement method for interactive education | |
CN117153185B (en) | Call processing method, device, computer equipment and storage medium | |
Fan et al. | Power-normalized PLP (PNPLP) feature for robust speech recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: No. 56 Nanli East Road, Shiqi Town, Panyu District, Guangzhou City, Guangdong Province, 510000
Applicant after: Guangdong Baolun Electronics Co.,Ltd.
Address before: No.19 Chuangyuan Road, Zhongcun street, Panyu District, Guangzhou, Guangdong 510000
Applicant before: GUANGZHOU ITC ELECTRONIC TECHNOLOGY Co.,Ltd.
GR01 | Patent grant | ||