CN109658953A - Infant cry recognition method, apparatus, and device - Google Patents
Infant cry recognition method, apparatus, and device
- Publication number: CN109658953A
- Application number: CN201910029052.4A
- Authority: CN (China)
- Prior art keywords: infant cry; characteristic sequence; feature vector; voice data
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications (all within G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING; G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00)
- G10L25/03—characterised by the type of extracted parameters
- G10L25/18—characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
- G10L25/21—characterised by the type of extracted parameters, the extracted parameters being power information
- G10L25/24—characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/30—characterised by the analysis technique, using neural networks
- G10L25/51—specially adapted for particular use, for comparison or discrimination
Abstract
An infant cry recognition method includes: collecting voice data and intercepting a voice data segment of a predetermined duration; computing two or more audio feature vectors contained in the voice data segment; and recognizing the audio feature vectors with a preset recognition model and sending the recognition result to a monitoring terminal. Because the recognition result is obtained from two or more audio feature vectors, it is more accurate and reliable, which helps improve the precision of infant cry recognition.
Description
Technical field
The present application belongs to the field of speech recognition, and in particular relates to an infant cry recognition method, apparatus, and device.
Background
A newborn infant generally conveys its emotional or physiological needs to the outside world by crying.
In real life, the work of looking after a newborn is largely entrusted to guardians such as the grandparents or the mother, and a guardian often takes on several duties at once. Thus, while the baby is asleep, the guardian may be busy with other matters and away from the child. Because of the physical distance, the guardian may not hear the baby's crying directly, and therefore cannot respond to the baby's needs in time.
To ensure that guardians are alerted in time, some infant cry reminder devices have appeared. They are based on data acquisition equipment such as cameras or wearable devices: the acquisition device is connected to a network, and the recognition work is completed in the cloud. Common recognition schemes decide whether to raise an alarm based on a single index such as decibel level, zero-crossing rate, or energy; when interfering sounds occur in the environment, this easily leads to a high false alarm rate.
Summary of the invention
In view of this, embodiments of the present application provide an infant cry recognition method, apparatus, and device, to solve the problem in the prior art that cry recognition methods are prone to a high false alarm rate under environmental interference.
A first aspect of the embodiments of the present application provides an infant cry recognition method, which includes:
collecting voice data and intercepting a voice data segment of a predetermined duration;
computing two or more audio feature vectors contained in the voice data segment;
recognizing the audio feature vectors with a preset recognition model, and sending the recognition result to a monitoring terminal.
With reference to the first aspect, in a first possible implementation of the first aspect, the step of computing two or more audio feature vectors contained in the voice data segment includes:
computing two or more of the zero-crossing-rate feature sequence, the energy feature sequence, the multi-order Mel-frequency cepstral coefficient feature sequence, and the spectral-centroid feature sequence of the voice data segment;
selecting two or more of those feature sequences to generate the audio feature vector.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the step of selecting two or more of the zero-crossing-rate, energy, multi-order Mel-frequency cepstral coefficient, and spectral-centroid feature sequences to generate the audio feature vector includes:
selecting two or more of those feature sequences and computing the mean of each selected feature sequence;
determining the audio feature vector from the computed means.
With reference to the first aspect, in a third possible implementation of the first aspect, the step of recognizing the audio feature vectors with a preset recognition model and sending the recognition result to a monitoring terminal includes:
judging whether the current network is in a connected state;
if the current network is connected, sending the audio feature vectors to a cloud server, so that the cloud server sends an in-application reminder message to the monitoring terminal according to the recognition result.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
if the current network is disconnected, recognizing the audio feature vectors with a locally stored neural network model;
when the recognition result is a predetermined alarm result, sending a short message to the monitoring terminal or dialing an alarm call.
With reference to the first aspect, in a fifth possible implementation of the first aspect, before the step of computing two or more audio feature vectors contained in the voice data segment, the method further includes:
performing one or more of pre-emphasis, framing, and windowing on the voice data segment.
A second aspect of the embodiments of the present application provides an infant cry recognition apparatus, which includes:
a voice data collection unit, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit, configured to compute two or more audio feature vectors contained in the voice data segment;
a recognition unit, configured to recognize the audio feature vectors with a preset recognition model and send the recognition result to a monitoring terminal.
With reference to the second aspect, in a first possible implementation of the second aspect, the audio feature vector computing unit includes:
a feature-sequence computing subunit, configured to compute two or more of the zero-crossing-rate, energy, multi-order Mel-frequency cepstral coefficient, and spectral-centroid feature sequences of the voice data segment;
a selecting subunit, configured to select two or more of those feature sequences to generate the audio feature vector.
A third aspect of the embodiments of the present application provides an infant cry recognition device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of the infant cry recognition method according to any implementation of the first aspect are realized.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the infant cry recognition method according to any implementation of the first aspect are realized.
Compared with the prior art, the embodiments of the present application have the following beneficial effect: voice data is collected, a voice data segment of a predetermined duration is intercepted, two or more audio feature vectors contained in the segment are computed, the audio feature vectors are recognized with a preset recognition model, and the recognition result is sent to a monitoring terminal. Because the recognition result is obtained from two or more audio feature vectors, it is more accurate and reliable, which helps improve the precision of infant cry recognition.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without any creative labor.
Fig. 1 is a schematic diagram of an implementation scenario of an infant cry recognition method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of an infant cry recognition method provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of another infant cry recognition method provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of an infant cry recognition apparatus provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of an infant cry recognition device provided by an embodiment of the present application.
Specific embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it will be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary details.
To illustrate the technical solutions described herein, specific embodiments are described below.
Fig. 1 is a schematic diagram of an implementation scenario of an infant cry recognition method provided by an embodiment of the present application. As shown in Fig. 1, the scenario includes an acquisition terminal, a monitoring terminal, and a cloud server. The acquisition terminal may be a smartphone, tablet computer, or the like. By installing an application on such a terminal instead of configuring additional hardware, the hardware cost of realizing infant cry recognition is reduced.
The acquisition terminal collects the baby's voice data and intercepts segments from the collected data according to a preset duration. For example, taking the current moment as the reference point, the voice data collected during the period of the predetermined duration before the current moment is intercepted to obtain a voice data segment. The predetermined duration may be, for example, 30 seconds. When the network is connected normally, the voice data segment can be sent to the cloud server; the cloud server processes the collected segment and, by means of a preset recognition model, recognizes the two or more audio feature vectors contained in the segment and generates an infant cry recognition result. Of course, when the acquisition terminal has no network connection, recognition can also be performed by the recognition model stored locally on the acquisition terminal. When the network connection is restored, the latest recognition model stored in the cloud server can be pushed to the acquisition terminal by comparing recognition-model version numbers. The monitoring terminal is a device carried by the caregiver, such as a smartphone; it can receive reminders through an installed application, or by short message or phone call.
Fig. 2 is a schematic flowchart of an infant cry recognition method provided by an embodiment of the present application, detailed as follows:
In step S201, voice data is collected and a voice data segment of a predetermined duration is intercepted.
Specifically, the infant cry recognition method described herein can be implemented on existing smart devices. By installing an application corresponding to the method on a smart device equipped with a microphone, the collected voice data can be effectively analyzed and processed to obtain an infant cry recognition result.
After the voice data is collected, it can be segmented according to the preset predetermined duration. The current moment may be taken as the end time of a voice data segment, with the voice data of the predetermined duration before the current moment taken to form the segment. In addition, the interception interval of the voice data segments can be set according to the predetermined segment duration, for example as a certain proportion of it. When the predetermined duration is 30 seconds, the interception interval may be, say, 5 seconds, so that the voice data can be analyzed dynamically with overlapping segments.
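The sliding interception described above (a 30-second window ending at the current moment, refreshed every 5 seconds) can be sketched as follows. This is an illustrative sketch only; the sample rate and the helper names are assumptions, not from the patent.

```python
import numpy as np

SR = 16000            # assumed sample rate, Hz
SEG_SECONDS = 30      # predetermined segment duration from the description
HOP_SECONDS = 5       # interception interval, a fraction of the duration

def latest_segment(buffer, sr=SR, seg_seconds=SEG_SECONDS):
    """Return the most recent seg_seconds of audio, treating 'now' as the
    segment's end time, or None until enough audio has accumulated."""
    n = seg_seconds * sr
    if len(buffer) < n:
        return None
    return buffer[-n:]

# Simulate a stream: every HOP_SECONDS of new audio, a fresh 30 s segment
# ending at the current moment is cut out for analysis.
stream = np.zeros(0, dtype=np.float32)
segments = 0
seg = None
for _ in range(8):  # eight 5-second chunks arrive
    chunk = np.random.randn(HOP_SECONDS * SR).astype(np.float32)
    stream = np.concatenate([stream, chunk])
    seg = latest_segment(stream)
    if seg is not None:
        segments += 1
```

With a 30 s window and a 5 s hop, consecutive segments overlap by 25 s, which is what allows the analysis to track the audio dynamically.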
In step S202, two or more audio feature vectors contained in the voice data segment are computed.
The audio features contained in the voice data segment may include two or more of a zero-crossing-rate feature sequence, an energy feature sequence, a multi-order Mel-frequency cepstral coefficient feature sequence, and a spectral-centroid feature sequence. The multi-order Mel-frequency cepstral coefficient feature sequence may, for example, be a 13-order MFCC feature sequence. By extracting two or more of the zero-crossing rate, energy, multi-order MFCC, and spectral-centroid features of the audio, an audio feature vector fusing two or more feature sequences is obtained. The zero-crossing rate of audio refers to the rate of sign changes of the audio signal, i.e., the signal going from positive to negative or from negative to positive. The energy feature may be a numerical measure of the variation of the signal's energy.
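The zero-crossing-rate and short-time-energy feature sequences can be computed frame by frame, for example as below. This is a sketch under assumed analysis parameters (16 kHz audio, 25 ms frames, 10 ms hop); the function names are illustrative.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # 25 ms frames with a 10 ms hop at 16 kHz (assumed analysis parameters).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    # Fraction of adjacent-sample pairs whose sign changes within each frame.
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def short_time_energy(frames):
    # Sum of squared samples per frame.
    return np.sum(frames ** 2, axis=1)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz test tone
frames = frame_signal(tone)
zcr = zero_crossing_rate(frames)
energy = short_time_energy(frames)
```

For the 440 Hz tone the mean ZCR comes out near 2 x 440 / 16000 = 0.055, i.e., two sign changes per cycle, which illustrates why the ZCR tracks the dominant frequency of the signal.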
The multi-order Mel-frequency cepstrum is a linear transform of the logarithmic energy spectrum on the nonlinear Mel scale of sound frequency. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up the Mel-frequency cepstrum; they are derived from the cepstrum of a speech segment.
Any two feature sequences may be chosen to compute the audio feature vector. A preferred embodiment, however, selects a multi-dimensional feature sequence comprising the zero-crossing-rate, energy, multi-order MFCC, and spectral-centroid feature sequences. For example, when the multi-order MFCC feature sequence has 13 orders, a 16-dimensional feature sequence can be chosen, which helps obtain a more accurate recognition result.
The audio feature vector may be obtained directly from the two or more selected feature sequences. Alternatively, the mean of each selected feature sequence may be computed, and the audio feature vector determined from the computed means.
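The mean-based 16-dimensional variant (mean ZCR, mean energy, 13 mean MFCCs, mean spectral centroid) can be sketched as follows. The filterbank size, frame parameters, and DCT formulation are common signal-processing choices assumed here; the patent does not specify them.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the Mel scale (HTK-style formula assumed).
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return fb

def feature_vector(segment, sr=16000, frame_len=400, hop=160, n_mfcc=13):
    """16-dim vector: mean ZCR, mean energy, 13 mean MFCCs, mean spectral centroid."""
    n = 1 + max(0, (len(segment) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    frames = segment[idx] * np.hamming(frame_len)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    energy = np.sum(frames ** 2, axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    centroid = (spec @ freqs) / (np.sum(spec, axis=1) + 1e-10)
    n_filters = 26
    logmel = np.log(spec @ mel_filterbank(n_filters, frame_len, sr).T + 1e-10)
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * k + 1) / (2.0 * n_filters)))
    mfcc = logmel @ dct.T   # type-II DCT of the log Mel energies
    return np.concatenate([[zcr.mean(), energy.mean()],
                           mfcc.mean(axis=0),
                           [centroid.mean()]])

vec = feature_vector(np.random.randn(16000))
```

Taking the mean of each frame-level sequence collapses a 30-second segment to a single fixed-size vector, which is what makes a small fixed-input classifier applicable regardless of segment length.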
It can also include to institute's predicate when calculating the voice data section in a kind of preferred embodiment
The step of sound data segment aggravated, one or more in framing and windowing process.Wherein:
In order to eliminate in voiced process, effect caused by vocal cords and lip, to compensate voice signal by articulatory system institute
Oppressive high frequency section highlights the formant of high frequency, is handled by exacerbation, and multiplied by a coefficient on frequency domain, this coefficient can
To be positively correlated with frequency, so that the amplitude of high frequency can be promoted.
Although voice signal macroscopically unstable, upper have short-term stationarity (can in 10---30ms microcosmic
To think that voice signal approximation is constant), according to microcosmic balance, voice signal can be divided into some short sections, i.e. framing comes
It is handled, each short section is known as a frame.
For the ease of carry out Fourier expansion, can also to voice data section carry out windowing process, i.e., by voice data with
One window function is multiplied, and keeps the overall situation more continuous, avoids the occurrence of Gibbs' effect.By windowing process, make originally without the period
The voice signal of property shows the Partial Feature of periodic function.
Moreover, it is noted that due to windowing process meeting so that the both ends of a frame signal are weakened, in framing
When, it is needed between frame and frame Chong Die.
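The three preprocessing steps above can be sketched together as follows. The pre-emphasis coefficient 0.97 and the Hamming window are conventional choices assumed here, not values taken from the patent.

```python
import numpy as np

def pre_emphasize(x, alpha=0.97):
    # Boost high frequencies: y[n] = x[n] - alpha * x[n-1]; in the frequency
    # domain this is a gain that grows with frequency (alpha=0.97 is customary).
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x, frame_len=400, hop=160):
    # 25 ms frames every 10 ms at 16 kHz: the hop is smaller than the frame
    # length, so frames overlap and the samples attenuated at a frame's edges
    # by the window are covered by the neighboring frames.
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    # The Hamming window tapers the frame edges, reducing spectral leakage
    # (the Gibbs effect) in the subsequent Fourier analysis.
    return x[idx] * np.hamming(frame_len)

x = np.random.randn(16000)          # one second of placeholder audio
frames = frame_and_window(pre_emphasize(x))
```

On a constant signal, pre-emphasis leaves only 1 - 0.97 = 0.03 of each sample after the first, which illustrates how it suppresses the low-frequency content while leaving rapid changes intact.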
In step S203, the audio feature vectors are recognized with the preset recognition model, and the recognition result is sent to the monitoring terminal.
The recognition model may be a neural network model. A large number of infant cry samples and noise samples can be collected, their audio feature vectors computed, and the neural network model trained on them; the trained model then performs recognition on the audio feature vectors. In an optional embodiment, crying data of the specific baby being monitored can be collected to train the recognition model, yielding a more reliable recognition result.
From the output of the recognition model, whether the current audio contains an infant cry is obtained, and the recognition result can be sent to the monitoring terminal, so that a guardian who is away can see the reminder message in time. Because the present application fuses multiple feature sequences into the audio feature vector, the recognition result is more accurate; and because the recognition application is installed on existing smart devices, infant cry recognition can be performed effectively without purchasing dedicated recognition equipment, which helps reduce system hardware cost.
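As a sketch of the recognition step, a minimal feed-forward network mapping the 16-dimensional feature vector to a cry probability might look like the following. The network shape is illustrative, and the weights below are random placeholders standing in for parameters learned from cry and noise samples.

```python
import numpy as np

rng = np.random.default_rng(0)

class CryClassifier:
    """Minimal MLP: 16-dim feature vector -> P(cry). The weights here are
    random placeholders; in practice they come from training on labeled
    cry/noise feature vectors."""
    def __init__(self, in_dim=16, hidden=32):
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def predict_proba(self, v):
        h = np.tanh(v @ self.w1 + self.b1)           # hidden layer
        z = h @ self.w2 + self.b2                    # single logit
        return (1.0 / (1.0 + np.exp(-z)))[0]         # sigmoid -> probability

model = CryClassifier()
p = model.predict_proba(rng.normal(size=16))
is_cry = p > 0.5   # thresholded decision sent on to the monitoring terminal
```

Fine-tuning such a model on the monitored baby's own crying, as the optional embodiment suggests, would amount to continuing training from these weights on the newly collected vectors.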
Fig. 3 is a schematic flowchart of another infant cry recognition method provided by an embodiment of the present application, detailed as follows:
In step S301, voice data is collected and a voice data segment of a predetermined duration is intercepted.
In step S302, two or more audio feature vectors contained in the voice data segment are computed.
Steps S301-S302 are essentially the same as steps S201-S202 in Fig. 2.
In step S303, it is judged whether the current network is in a connected state.
In the present application, when the acquisition device is a smartphone or similar device, it may find itself in different network scenarios. For example, the device may be in a scenario with a WIFI network, through which it can interact with the cloud server; or it may have no network connection but carry its own mobile communication module, such as a built-in SIM card; or it may both be able to connect to a network and have a built-in SIM card. The result-sending methods for these scenarios are discussed separately below.
In step S304, if the current network is connected, the audio feature vectors are sent to the cloud server, so that the cloud server sends an in-application reminder message to the monitoring terminal according to the recognition result.
When the acquisition device's network is connected, the device can interact with the cloud server: it can send the collected audio data segment to the server, or first compute the two or more audio feature vectors of the segment and send those to the cloud server, which then performs the cry recognition. If the cloud server recognizes that the audio data segment contains an infant cry, it can send a reminder message to the monitoring terminal over the network, or send a short message to the monitoring terminal, or dial a network phone call.
Of course, while the network is connected, the cloud server can also send the latest version number of the recognition model to the acquisition terminal; by comparison, if the recognition model on the acquisition terminal is not the latest version, the terminal can send an update request to the cloud server and download the latest recognition model.
In step S305, if the current network is disconnected, the audio feature vectors are recognized by the locally stored neural network model.
If the current network is disconnected, the audio feature vectors or voice data segments cannot be recognized by the server; instead, they are recognized locally by the recognition model stored on the device. Once the network is restored, the locally stored recognition model can also be updated.
In step S306, when the recognition result is a predetermined alarm result, a short message is sent to the monitoring terminal or an alarm call is dialed.
If the recognition result is a predetermined alarm result, for example an infant cry is recognized, a short message is sent to the monitoring terminal or an alarm call is dialed, prompting the caregiver to attend to the baby in time.
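The connected/disconnected routing of steps S303-S306 can be sketched as follows. All function names are hypothetical stand-ins for the application's real cloud client, local model, and messaging hooks, and the connectivity probe is one crude way to approximate the check.

```python
import socket

def network_connected(host="8.8.8.8", port=53, timeout=2.0):
    """Crude connectivity probe (assumption: reaching a public DNS resolver
    stands in for 'the cloud server is reachable')."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def dispatch(vec, connected, cloud_recognize, local_recognize,
             notify_app, notify_sms):
    # Connected: the cloud performs recognition and the in-app reminder is pushed.
    # Disconnected: fall back to the locally stored model and escalate a
    # recognized cry by short message or alarm call.
    if connected:
        result = cloud_recognize(vec)
        notify_app(result)
    else:
        result = local_recognize(vec)
        if result == "cry":
            notify_sms(result)
    return result

# Offline path demonstrated with stub recognizers and recorded notifications.
events = []
result = dispatch([0.0] * 16, connected=False,
                  cloud_recognize=lambda v: "cry",
                  local_recognize=lambda v: "cry",
                  notify_app=lambda r: events.append("app:" + r),
                  notify_sms=lambda r: events.append("sms:" + r))
```

Keeping the two recognizers behind the same call signature is what lets the device switch between the cloud and the local model without the rest of the pipeline noticing.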
Of course, as an optimized embodiment of the present application, the acquisition terminal can also collect the baby's audio feature vectors and receive baby-need results input by the user, thereby obtaining a recognition model that identifies the baby's needs. The baby's needs may include a feeding need, a warming need, a cooling need, a need for reassurance, and so on. When a collected audio feature vector is input into this recognition model, the baby's specific need is output and sent to the monitoring terminal, improving convenience for the guardian.
It should be understood that the serial numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 4 is a schematic structural diagram of an infant cry recognition apparatus provided by an embodiment of the present application, detailed as follows:
The infant cry recognition apparatus includes:
a voice data collection unit 401, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit 402, configured to compute two or more audio feature vectors contained in the voice data segment;
a recognition unit 403, configured to recognize the audio feature vectors with a preset recognition model and send the recognition result to a monitoring terminal.
Preferably, the audio feature vector computing unit includes:
a feature sequence computation subunit, configured to compute two or more of a zero-crossing rate feature sequence, an energy feature sequence, a multi-order Mel-frequency cepstral coefficient (MFCC) feature sequence, or a spectral centroid feature sequence of the voice data segment; and
a selection subunit, configured to select two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors.
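As a hedged illustration of the feature sequences named above, the sketch below computes per-frame zero-crossing-rate, short-time-energy, and spectral-centroid sequences and averages each into one component of a small audio feature vector. The frame length, hop size, and sample rate are common speech-processing defaults, not values taken from the patent, and the MFCC sequence is omitted for brevity:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes in each frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def spectral_centroid(frames, sr):
    """Magnitude-weighted mean frequency of each frame's spectrum."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return np.sum(mag * freqs, axis=1) / (np.sum(mag, axis=1) + 1e-10)

def feature_vector(x, sr=16000):
    """Average each feature sequence into one component of the vector."""
    f = frame_signal(x)
    return np.array([
        zero_crossing_rate(f).mean(),
        short_time_energy(f).mean(),
        spectral_centroid(f, sr).mean(),
    ])
```

A 2 kHz tone, for example, yields a markedly higher zero-crossing rate and spectral centroid than a 200 Hz tone, which is the kind of separation these features provide for cry detection.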
The infant cry recognition apparatus described in Fig. 4 corresponds to the infant cry recognition method described in Figs. 2-3.
Fig. 5 is a schematic diagram of an infant cry recognition device provided by an embodiment of the present application. As shown in Fig. 5, the infant cry recognition device 5 of this embodiment includes: a processor 50, a memory 51, and a computer program 52, such as an infant cry recognition program, stored in the memory 51 and executable on the processor 50. When executing the computer program 52, the processor 50 implements the steps in each of the above infant cry recognition method embodiments. Alternatively, when executing the computer program 52, the processor 50 implements the functions of each module/unit in each of the above apparatus embodiments.
Illustratively, the computer program 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 52 in the infant cry recognition device 5. For example, the computer program 52 may be divided into:
a voice data collection unit, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit, configured to compute two or more audio feature vectors contained in the voice data segment; and
a recognition unit, configured to recognize the audio feature vectors according to a preset identification model and send the recognition result to a monitoring terminal.
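The three modules just listed can be wired together as a small pipeline. The sketch below is an assumption about the control flow only; the feature extractor, the preset identification model, and the link to the monitoring terminal are injected as stand-in callables rather than real implementations:

```python
class CryRecognitionPipeline:
    """Illustrative wiring of the three program modules described above."""

    def __init__(self, extract, classify, notify, alarm_label="cry"):
        self.extract = extract        # audio feature vector computing unit
        self.classify = classify      # recognition unit (preset model)
        self.notify = notify          # channel to the monitoring terminal
        self.alarm_label = alarm_label

    def process(self, segment):
        # The voice data collection unit would hand over a
        # fixed-duration segment here.
        features = self.extract(segment)
        result = self.classify(features)
        self.notify(result)           # recognition result to the terminal
        return result

# Trivial stand-ins: energy-like feature, threshold "model",
# and a list standing in for the monitoring terminal.
sent = []
pipe = CryRecognitionPipeline(
    extract=lambda seg: [sum(abs(s) for s in seg)],
    classify=lambda feats: "cry" if feats[0] > 1.0 else "quiet",
    notify=sent.append,
)
```

Keeping the three responsibilities behind injected callables mirrors the module division above and makes each unit swappable (e.g. a cloud model versus a local model) without changing the pipeline.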
The infant cry recognition device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will appreciate that Fig. 5 is merely an example of the infant cry recognition device 5 and does not constitute a limitation on it; the device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the infant cry recognition device may also include input/output devices, network access devices, a bus, and the like.
The so-called processor 50 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may be an internal storage unit of the infant cry recognition device 5, such as a hard disk or memory of the device. The memory 51 may also be an external storage device of the infant cry recognition device 5, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the device. Further, the memory 51 may include both the internal storage unit and the external storage device of the infant cry recognition device 5. The memory 51 is used to store the computer program and other programs and data required by the infant cry recognition device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules above is merely illustrative. In practical applications, the functions above may be allocated to different functional units or modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis. For parts that are not detailed in a certain embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art may realize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled professional may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; the division of the modules or units is only a logical function division, and there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application may also be completed by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.
Claims (10)
1. An infant cry recognition method, characterized in that the infant cry recognition method comprises:
collecting voice data and intercepting a voice data segment of a predetermined duration;
computing two or more audio feature vectors contained in the voice data segment; and
recognizing the audio feature vectors according to a preset identification model, and sending the recognition result to a monitoring terminal.
2. The infant cry recognition method according to claim 1, characterized in that the step of computing two or more audio feature vectors contained in the voice data segment comprises:
computing two or more of a zero-crossing rate feature sequence, an energy feature sequence, a multi-order Mel-frequency cepstral coefficient (MFCC) feature sequence, or a spectral centroid feature sequence of the voice data segment; and
selecting two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors.
3. The infant cry recognition method according to claim 2, characterized in that the step of selecting two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors comprises:
selecting two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence, and computing the mean of each selected feature sequence; and
determining the audio feature vectors according to the computed means.
4. The infant cry recognition method according to claim 1, characterized in that the step of recognizing the audio feature vectors according to a preset identification model and sending the recognition result to a monitoring terminal comprises:
judging whether the current network is in a connected state; and
if the current network is in a connected state, sending the audio feature vectors to a cloud server, so that the cloud server sends an application reminder message to the monitoring terminal according to the recognition result.
5. The infant cry recognition method according to claim 4, characterized in that the method further comprises:
if the current network is in a disconnected state, recognizing the audio feature vectors by means of a locally stored neural network model; and
when the recognition result is a predetermined alarm result, sending a short message to the monitoring terminal or dialing an alarm call.
6. The infant cry recognition method according to claim 1, characterized in that before the step of computing two or more audio feature vectors contained in the voice data segment, the method further comprises:
performing one or more of pre-emphasis, framing, and windowing on the voice data segment.
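The pre-processing steps named in claim 6 can be sketched as follows; this is illustrative only, with the pre-emphasis coefficient, frame length, and hop size taken from common speech-processing practice rather than from the patent:

```python
import numpy as np

def preprocess(x, alpha=0.97, frame_len=400, hop=160):
    """Pre-emphasis, framing, and Hamming windowing of a voice segment."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    # Framing: split into overlapping fixed-length frames
    n = 1 + max(0, (len(y) - frame_len) // hop)
    frames = np.stack([y[i * hop:i * hop + frame_len] for i in range(n)])
    # Windowing: taper each frame to reduce spectral leakage
    return frames * np.hamming(frame_len)
```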
7. An infant cry recognition apparatus, characterized in that the infant cry recognition apparatus comprises:
a voice data collection unit, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit, configured to compute two or more audio feature vectors contained in the voice data segment; and
a recognition unit, configured to recognize the audio feature vectors according to a preset identification model and send the recognition result to a monitoring terminal.
8. The infant cry recognition apparatus according to claim 7, characterized in that the audio feature vector computing unit comprises:
a feature sequence computation subunit, configured to compute two or more of a zero-crossing rate feature sequence, an energy feature sequence, a multi-order MFCC feature sequence, or a spectral centroid feature sequence of the voice data segment; and
a selection subunit, configured to select two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors.
9. An infant cry recognition device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when executing the computer program, the processor implements the steps of the infant cry recognition method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, characterized in that when the computer program is executed by a processor, the steps of the infant cry recognition method according to any one of claims 1 to 6 are implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910029052.4A CN109658953A (en) | 2019-01-12 | 2019-01-12 | A kind of vagitus recognition methods, device and equipment |
PCT/CN2019/130824 WO2020143512A1 (en) | 2019-01-12 | 2019-12-31 | Infant crying recognition method, apparatus, and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910029052.4A CN109658953A (en) | 2019-01-12 | 2019-01-12 | A kind of vagitus recognition methods, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109658953A true CN109658953A (en) | 2019-04-19 |
Family
ID=66119244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910029052.4A Pending CN109658953A (en) | 2019-01-12 | 2019-01-12 | A kind of vagitus recognition methods, device and equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109658953A (en) |
WO (1) | WO2020143512A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415724A (en) * | 2019-08-08 | 2019-11-05 | 中南大学湘雅二医院 | Transmission method, device, system and the computer readable storage medium of alert data |
CN111128100A (en) * | 2019-12-20 | 2020-05-08 | 网易(杭州)网络有限公司 | Rhythm point detection method and device and electronic equipment |
CN111370025A (en) * | 2020-02-25 | 2020-07-03 | 广州酷狗计算机科技有限公司 | Audio recognition method and device and computer storage medium |
WO2020143512A1 (en) * | 2019-01-12 | 2020-07-16 | 深圳先进技术研究院 | Infant crying recognition method, apparatus, and device |
CN112270932A (en) * | 2020-10-22 | 2021-01-26 | 北京小米松果电子有限公司 | Alarm method and device for intelligent device, electronic device and storage medium |
CN112382302A (en) * | 2020-12-02 | 2021-02-19 | 漳州立达信光电子科技有限公司 | Baby cry identification method and terminal equipment |
CN112992136A (en) * | 2020-12-16 | 2021-06-18 | 呼唤(上海)云计算股份有限公司 | Intelligent infant monitoring system and method |
CN113436650A (en) * | 2021-08-25 | 2021-09-24 | 深圳市北科瑞声科技股份有限公司 | Baby cry identification method and device, electronic equipment and storage medium |
EP3940698A1 (en) | 2020-07-13 | 2022-01-19 | Zoundream AG | A computer-implemented method of providing data for an automated baby cry assessment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102163427A (en) * | 2010-12-20 | 2011-08-24 | 北京邮电大学 | Method for detecting audio exceptional event based on environmental model |
CN102246228A (en) * | 2008-12-15 | 2011-11-16 | 音频分析有限公司 | Sound identification systems |
KR20120107382A (en) * | 2011-03-21 | 2012-10-02 | 호서대학교 산학협력단 | Device for analyzing crying of infants |
CN103280220A (en) * | 2013-04-25 | 2013-09-04 | 北京大学深圳研究生院 | Real-time recognition method for baby cry |
CN103489282A (en) * | 2013-09-24 | 2014-01-01 | 华南理工大学 | Infant monitor capable of identifying infant crying sound and method for identifying infant crying sound |
CN107106027A (en) * | 2014-12-16 | 2017-08-29 | 皇家飞利浦有限公司 | Baby sleep monitor |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5278952B2 (en) * | 2009-03-09 | 2013-09-04 | 国立大学法人福井大学 | Infant emotion diagnosis apparatus and method |
CN107808658A (en) * | 2016-09-06 | 2018-03-16 | 深圳声联网科技有限公司 | Based on real-time baby's audio serial behavior detection method under domestic environment |
CN106653001B (en) * | 2016-11-17 | 2020-03-27 | 沈晓明 | Method and system for identifying baby crying |
CN107818779A (en) * | 2017-09-15 | 2018-03-20 | 北京理工大学 | A kind of infant's crying sound detection method, apparatus, equipment and medium |
CN109658953A (en) * | 2019-01-12 | 2019-04-19 | 深圳先进技术研究院 | A kind of vagitus recognition methods, device and equipment |
2019
- 2019-01-12 CN CN201910029052.4A patent/CN109658953A/en active Pending
- 2019-12-31 WO PCT/CN2019/130824 patent/WO2020143512A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102246228A (en) * | 2008-12-15 | 2011-11-16 | 音频分析有限公司 | Sound identification systems |
CN102163427A (en) * | 2010-12-20 | 2011-08-24 | 北京邮电大学 | Method for detecting audio exceptional event based on environmental model |
KR20120107382A (en) * | 2011-03-21 | 2012-10-02 | 호서대학교 산학협력단 | Device for analyzing crying of infants |
CN103280220A (en) * | 2013-04-25 | 2013-09-04 | 北京大学深圳研究生院 | Real-time recognition method for baby cry |
CN103489282A (en) * | 2013-09-24 | 2014-01-01 | 华南理工大学 | Infant monitor capable of identifying infant crying sound and method for identifying infant crying sound |
CN107106027A (en) * | 2014-12-16 | 2017-08-29 | 皇家飞利浦有限公司 | Baby sleep monitor |
Non-Patent Citations (1)
Title |
---|
Han Zhiyan: "Research on Speech Recognition and Speech Visualization Technology" (《语音识别及语音可视化技术研究》), 31 January 2017 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020143512A1 (en) * | 2019-01-12 | 2020-07-16 | 深圳先进技术研究院 | Infant crying recognition method, apparatus, and device |
CN110415724A (en) * | 2019-08-08 | 2019-11-05 | 中南大学湘雅二医院 | Transmission method, device, system and the computer readable storage medium of alert data |
CN111128100A (en) * | 2019-12-20 | 2020-05-08 | 网易(杭州)网络有限公司 | Rhythm point detection method and device and electronic equipment |
CN111128100B (en) * | 2019-12-20 | 2021-04-20 | 网易(杭州)网络有限公司 | Rhythm point detection method and device and electronic equipment |
CN111370025A (en) * | 2020-02-25 | 2020-07-03 | 广州酷狗计算机科技有限公司 | Audio recognition method and device and computer storage medium |
EP3940698A1 (en) | 2020-07-13 | 2022-01-19 | Zoundream AG | A computer-implemented method of providing data for an automated baby cry assessment |
WO2022012777A1 (en) | 2020-07-13 | 2022-01-20 | Zoundream Ag | A computer-implemented method of providing data for an automated baby cry assessment |
CN112270932A (en) * | 2020-10-22 | 2021-01-26 | 北京小米松果电子有限公司 | Alarm method and device for intelligent device, electronic device and storage medium |
CN112382302A (en) * | 2020-12-02 | 2021-02-19 | 漳州立达信光电子科技有限公司 | Baby cry identification method and terminal equipment |
CN112992136A (en) * | 2020-12-16 | 2021-06-18 | 呼唤(上海)云计算股份有限公司 | Intelligent infant monitoring system and method |
CN113436650A (en) * | 2021-08-25 | 2021-09-24 | 深圳市北科瑞声科技股份有限公司 | Baby cry identification method and device, electronic equipment and storage medium |
CN113436650B (en) * | 2021-08-25 | 2021-11-16 | 深圳市北科瑞声科技股份有限公司 | Baby cry identification method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020143512A1 (en) | 2020-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109658953A (en) | A kind of vagitus recognition methods, device and equipment | |
CN110459222A (en) | Sound control method, phonetic controller and terminal device | |
CN111862951B (en) | Voice endpoint detection method and device, storage medium and electronic equipment | |
CN110428824A (en) | A kind of exchange method of intelligent sound box, device and intelligent sound box | |
CN108847222B (en) | Speech recognition model generation method and device, storage medium and electronic equipment | |
CN111161728B (en) | Awakening method, awakening device, awakening equipment and awakening medium of intelligent equipment | |
US11133022B2 (en) | Method and device for audio recognition using sample audio and a voting matrix | |
CN110290280B (en) | Terminal state identification method and device and storage medium | |
CN110010125A (en) | A kind of control method of intelligent robot, device, terminal device and medium | |
CN113239872A (en) | Event identification method, device, equipment and storage medium | |
CN114356703A (en) | Root cause analysis method and device | |
CN113823313A (en) | Voice processing method, device, equipment and storage medium | |
CN108347531A (en) | Recourse method, recourse device, electronic equipment and computer readable storage medium | |
CN108231074A (en) | A kind of data processing method, voice assistant equipment and computer readable storage medium | |
CN112398952A (en) | Electronic resource pushing method, system, equipment and storage medium | |
CN112002339B (en) | Speech noise reduction method and device, computer-readable storage medium and electronic device | |
CN110263135A (en) | A kind of data exchange matching process, device, medium and electronic equipment | |
CN109767751A (en) | Production method, device, computer equipment and the storage device of baby comforting music | |
CN112331187B (en) | Multi-task speech recognition model training method and multi-task speech recognition method | |
CN108540858A (en) | A kind of method, apparatus and equipment that prevent user from indulging TV programme | |
CN109975795B (en) | Sound source tracking method and device | |
CN112489678A (en) | Scene recognition method and device based on channel characteristics | |
CN111899747A (en) | Method and apparatus for synthesizing audio | |
CN110534128A (en) | A kind of noise processing method, device, equipment and storage medium | |
CN113014460A (en) | Voice processing method, home master control device, voice system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190419 |