CN109658953A - Infant cry recognition method, apparatus, and device - Google Patents
Infant cry recognition method, apparatus, and device
- Publication number: CN109658953A
- Application number: CN201910029052.4A
- Authority: CN (China)
- Prior art keywords: infant cry; characteristic sequence; feature vector; voice data
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications (all within G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING; G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00)
- G10L25/03—characterised by the type of extracted parameters
- G10L25/18—characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
- G10L25/21—characterised by the type of extracted parameters, the extracted parameters being power information
- G10L25/24—characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/30—characterised by the analysis technique, using neural networks
- G10L25/51—specially adapted for particular use, for comparison or discrimination
Abstract
An infant cry recognition method includes: collecting voice data and intercepting a voice data segment of a predetermined duration; computing two or more audio feature vectors contained in the voice data segment; and recognizing the audio feature vectors with a preset recognition model and sending the recognition result to a monitoring terminal. Because the recognition result is obtained from two or more audio feature vectors, it is more accurate and reliable, which helps improve the precision of infant cry recognition.
Description
Technical field
The present application belongs to the field of speech recognition, and in particular relates to an infant cry recognition method, apparatus, and device.
Background
A newborn infant generally conveys its emotional or physiological needs to the outside world by crying.
In real life, the work of looking after a newborn is largely entrusted to guardians such as the grandparents or the mother, and a guardian often takes on several duties at once. Thus, while the baby is asleep, the guardian may be busy with other matters and away from the child. Because of the physical distance, the guardian may not hear the baby's crying directly, and therefore cannot respond to the baby's needs in time.
To ensure that guardians are alerted in time, some infant cry reminder devices have appeared. They are based on data acquisition equipment such as cameras or wearable devices: the acquisition device is connected to a network, and the recognition work is completed in the cloud. Common recognition schemes decide whether to raise an alarm based on a single index such as decibel level, zero-crossing rate, or energy; when interfering sounds occur in the environment, this easily leads to a high false alarm rate.
Summary of the invention
In view of this, embodiments of the present application provide an infant cry recognition method, apparatus, and device, to solve the problem in the prior art that cry recognition methods are prone to a high false alarm rate under environmental interference.
A first aspect of the embodiments of the present application provides an infant cry recognition method, which includes:
collecting voice data and intercepting a voice data segment of a predetermined duration;
computing two or more audio feature vectors contained in the voice data segment;
recognizing the audio feature vectors with a preset recognition model, and sending the recognition result to a monitoring terminal.
With reference to the first aspect, in a first possible implementation of the first aspect, the step of computing two or more audio feature vectors contained in the voice data segment includes:
computing two or more of the zero-crossing-rate feature sequence, the energy feature sequence, the multi-order Mel-frequency cepstral coefficient feature sequence, and the spectral-centroid feature sequence of the voice data segment;
selecting two or more of those feature sequences to generate the audio feature vector.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the step of selecting two or more of the zero-crossing-rate, energy, multi-order Mel-frequency cepstral coefficient, and spectral-centroid feature sequences to generate the audio feature vector includes:
selecting two or more of those feature sequences and computing the mean of each selected feature sequence;
determining the audio feature vector from the computed means.
With reference to the first aspect, in a third possible implementation of the first aspect, the step of recognizing the audio feature vectors with a preset recognition model and sending the recognition result to a monitoring terminal includes:
judging whether the current network is in a connected state;
if the current network is connected, sending the audio feature vectors to a cloud server, so that the cloud server sends an in-application reminder message to the monitoring terminal according to the recognition result.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
if the current network is disconnected, recognizing the audio feature vectors with a locally stored neural network model;
when the recognition result is a predetermined alarm result, sending a short message to the monitoring terminal or dialing an alarm call.
With reference to the first aspect, in a fifth possible implementation of the first aspect, before the step of computing two or more audio feature vectors contained in the voice data segment, the method further includes:
performing one or more of pre-emphasis, framing, and windowing on the voice data segment.
A second aspect of the embodiments of the present application provides an infant cry recognition apparatus, which includes:
a voice data collection unit, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit, configured to compute two or more audio feature vectors contained in the voice data segment;
a recognition unit, configured to recognize the audio feature vectors with a preset recognition model and send the recognition result to a monitoring terminal.
With reference to the second aspect, in a first possible implementation of the second aspect, the audio feature vector computing unit includes:
a feature-sequence computing subunit, configured to compute two or more of the zero-crossing-rate, energy, multi-order Mel-frequency cepstral coefficient, and spectral-centroid feature sequences of the voice data segment;
a selecting subunit, configured to select two or more of those feature sequences to generate the audio feature vector.
A third aspect of the embodiments of the present application provides an infant cry recognition device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of the infant cry recognition method according to any implementation of the first aspect are realized.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the infant cry recognition method according to any implementation of the first aspect are realized.
Compared with the prior art, the embodiments of the present application have the following beneficial effect: voice data is collected, a voice data segment of a predetermined duration is intercepted, two or more audio feature vectors contained in the segment are computed, the audio feature vectors are recognized with a preset recognition model, and the recognition result is sent to a monitoring terminal. Because the recognition result is obtained from two or more audio feature vectors, it is more accurate and reliable, which helps improve the precision of infant cry recognition.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without any creative labor.
Fig. 1 is a schematic diagram of an implementation scenario of an infant cry recognition method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of an infant cry recognition method provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of another infant cry recognition method provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of an infant cry recognition apparatus provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of an infant cry recognition device provided by an embodiment of the present application.
Specific embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it will be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary details.
To illustrate the technical solutions described herein, specific embodiments are described below.
Fig. 1 is a schematic diagram of an implementation scenario of an infant cry recognition method provided by an embodiment of the present application. As shown in Fig. 1, the scenario includes an acquisition terminal, a monitoring terminal, and a cloud server. The acquisition terminal may be a smartphone, tablet computer, or the like. By installing an application on such a terminal instead of configuring additional hardware, the hardware cost of realizing infant cry recognition is reduced.
The acquisition terminal collects the baby's voice data and intercepts segments from the collected data according to a preset duration. For example, taking the current moment as the reference point, the voice data collected during the period of the predetermined duration before the current moment is intercepted to obtain a voice data segment. The predetermined duration may be, for example, 30 seconds. When the network is connected normally, the voice data segment can be sent to the cloud server; the cloud server processes the collected segment and, by means of a preset recognition model, recognizes the two or more audio feature vectors contained in the segment and generates an infant cry recognition result. Of course, when the acquisition terminal has no network connection, recognition can also be performed by the recognition model stored locally on the acquisition terminal. When the network connection is restored, the latest recognition model stored in the cloud server can be pushed to the acquisition terminal by comparing recognition-model version numbers. The monitoring terminal is a device carried by the caregiver, such as a smartphone; it can receive reminders through an installed application, or by short message or phone call.
Fig. 2 is a schematic flowchart of an infant cry recognition method provided by an embodiment of the present application, detailed as follows:
In step S201, voice data is collected and a voice data segment of a predetermined duration is intercepted.
Specifically, the infant cry recognition method described herein can be implemented on existing smart devices. By installing an application corresponding to the method on a smart device equipped with a microphone, the collected voice data can be effectively analyzed and processed to obtain an infant cry recognition result.
After the voice data is collected, it can be segmented according to the preset predetermined duration. The current moment may be taken as the end time of a voice data segment, with the voice data of the predetermined duration before the current moment taken to form the segment. In addition, the interception interval of the voice data segments can be set according to the predetermined segment duration, for example as a certain proportion of it. When the predetermined duration is 30 seconds, the interception interval may be, say, 5 seconds, so that the voice data can be analyzed dynamically with overlapping segments.
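The sliding interception described above (a 30-second window ending at the current moment, refreshed every 5 seconds) can be sketched as follows. This is an illustrative sketch only; the sample rate and the helper names are assumptions, not from the patent.

```python
import numpy as np

SR = 16000            # assumed sample rate, Hz
SEG_SECONDS = 30      # predetermined segment duration from the description
HOP_SECONDS = 5       # interception interval, a fraction of the duration

def latest_segment(buffer, sr=SR, seg_seconds=SEG_SECONDS):
    """Return the most recent seg_seconds of audio, treating 'now' as the
    segment's end time, or None until enough audio has accumulated."""
    n = seg_seconds * sr
    if len(buffer) < n:
        return None
    return buffer[-n:]

# Simulate a stream: every HOP_SECONDS of new audio, a fresh 30 s segment
# ending at the current moment is cut out for analysis.
stream = np.zeros(0, dtype=np.float32)
segments = 0
seg = None
for _ in range(8):  # eight 5-second chunks arrive
    chunk = np.random.randn(HOP_SECONDS * SR).astype(np.float32)
    stream = np.concatenate([stream, chunk])
    seg = latest_segment(stream)
    if seg is not None:
        segments += 1
```

With a 30 s window and a 5 s hop, consecutive segments overlap by 25 s, which is what allows the analysis to track the audio dynamically.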
In step S202, two or more audio feature vectors contained in the voice data segment are computed.
The audio features contained in the voice data segment may include two or more of a zero-crossing-rate feature sequence, an energy feature sequence, a multi-order Mel-frequency cepstral coefficient feature sequence, and a spectral-centroid feature sequence. The multi-order Mel-frequency cepstral coefficient feature sequence may, for example, be a 13-order MFCC feature sequence. By extracting two or more of the zero-crossing rate, energy, multi-order MFCC, and spectral-centroid features of the audio, an audio feature vector fusing two or more feature sequences is obtained. The zero-crossing rate of audio refers to the rate of sign changes of the audio signal, i.e., the signal going from positive to negative or from negative to positive. The energy feature may be a numerical measure of the variation of the signal's energy.
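The zero-crossing-rate and short-time-energy feature sequences can be computed frame by frame, for example as below. This is a sketch under assumed analysis parameters (16 kHz audio, 25 ms frames, 10 ms hop); the function names are illustrative.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # 25 ms frames with a 10 ms hop at 16 kHz (assumed analysis parameters).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    # Fraction of adjacent-sample pairs whose sign changes within each frame.
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def short_time_energy(frames):
    # Sum of squared samples per frame.
    return np.sum(frames ** 2, axis=1)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz test tone
frames = frame_signal(tone)
zcr = zero_crossing_rate(frames)
energy = short_time_energy(frames)
```

For the 440 Hz tone the mean ZCR comes out near 2 x 440 / 16000 = 0.055, i.e., two sign changes per cycle, which illustrates why the ZCR tracks the dominant frequency of the signal.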
The multi-order Mel-frequency cepstrum is a linear transform of the logarithmic energy spectrum on the nonlinear Mel scale of sound frequency. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that make up the Mel-frequency cepstrum; they are derived from the cepstrum of a speech segment.
Any two feature sequences may be chosen to compute the audio feature vector. A preferred embodiment, however, selects a multi-dimensional feature sequence comprising the zero-crossing-rate, energy, multi-order MFCC, and spectral-centroid feature sequences. For example, when the multi-order MFCC feature sequence has 13 orders, a 16-dimensional feature sequence can be chosen, which helps obtain a more accurate recognition result.
The audio feature vector may be obtained directly from the two or more selected feature sequences. Alternatively, the mean of each selected feature sequence may be computed, and the audio feature vector determined from the computed means.
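The mean-based 16-dimensional variant (mean ZCR, mean energy, 13 mean MFCCs, mean spectral centroid) can be sketched as follows. The filterbank size, frame parameters, and DCT formulation are common signal-processing choices assumed here; the patent does not specify them.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters evenly spaced on the Mel scale (HTK-style formula assumed).
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fb[i, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return fb

def feature_vector(segment, sr=16000, frame_len=400, hop=160, n_mfcc=13):
    """16-dim vector: mean ZCR, mean energy, 13 mean MFCCs, mean spectral centroid."""
    n = 1 + max(0, (len(segment) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    frames = segment[idx] * np.hamming(frame_len)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    energy = np.sum(frames ** 2, axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    centroid = (spec @ freqs) / (np.sum(spec, axis=1) + 1e-10)
    n_filters = 26
    logmel = np.log(spec @ mel_filterbank(n_filters, frame_len, sr).T + 1e-10)
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * k + 1) / (2.0 * n_filters)))
    mfcc = logmel @ dct.T   # type-II DCT of the log Mel energies
    return np.concatenate([[zcr.mean(), energy.mean()],
                           mfcc.mean(axis=0),
                           [centroid.mean()]])

vec = feature_vector(np.random.randn(16000))
```

Taking the mean of each frame-level sequence collapses a 30-second segment to a single fixed-size vector, which is what makes a small fixed-input classifier applicable regardless of segment length.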
It can also include to institute's predicate when calculating the voice data section in a kind of preferred embodiment
The step of sound data segment aggravated, one or more in framing and windowing process.Wherein:
In order to eliminate in voiced process, effect caused by vocal cords and lip, to compensate voice signal by articulatory system institute
Oppressive high frequency section highlights the formant of high frequency, is handled by exacerbation, and multiplied by a coefficient on frequency domain, this coefficient can
To be positively correlated with frequency, so that the amplitude of high frequency can be promoted.
Although voice signal macroscopically unstable, upper have short-term stationarity (can in 10---30ms microcosmic
To think that voice signal approximation is constant), according to microcosmic balance, voice signal can be divided into some short sections, i.e. framing comes
It is handled, each short section is known as a frame.
For the ease of carry out Fourier expansion, can also to voice data section carry out windowing process, i.e., by voice data with
One window function is multiplied, and keeps the overall situation more continuous, avoids the occurrence of Gibbs' effect.By windowing process, make originally without the period
The voice signal of property shows the Partial Feature of periodic function.
Moreover, it is noted that due to windowing process meeting so that the both ends of a frame signal are weakened, in framing
When, it is needed between frame and frame Chong Die.
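The three preprocessing steps above can be sketched together as follows. The pre-emphasis coefficient 0.97 and the Hamming window are conventional choices assumed here, not values taken from the patent.

```python
import numpy as np

def pre_emphasize(x, alpha=0.97):
    # Boost high frequencies: y[n] = x[n] - alpha * x[n-1]; in the frequency
    # domain this is a gain that grows with frequency (alpha=0.97 is customary).
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x, frame_len=400, hop=160):
    # 25 ms frames every 10 ms at 16 kHz: the hop is smaller than the frame
    # length, so frames overlap and the samples attenuated at a frame's edges
    # by the window are covered by the neighboring frames.
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    # The Hamming window tapers the frame edges, reducing spectral leakage
    # (the Gibbs effect) in the subsequent Fourier analysis.
    return x[idx] * np.hamming(frame_len)

x = np.random.randn(16000)          # one second of placeholder audio
frames = frame_and_window(pre_emphasize(x))
```

On a constant signal, pre-emphasis leaves only 1 - 0.97 = 0.03 of each sample after the first, which illustrates how it suppresses the low-frequency content while leaving rapid changes intact.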
In step S203, the audio feature vectors are recognized with the preset recognition model, and the recognition result is sent to the monitoring terminal.
The recognition model may be a neural network model. A large number of infant cry samples and noise samples can be collected, their audio feature vectors computed, and the neural network model trained on them; the trained model then performs recognition on the audio feature vectors. In an optional embodiment, crying data of the specific baby being monitored can be collected to train the recognition model, yielding a more reliable recognition result.
From the output of the recognition model, whether the current audio contains an infant cry is obtained, and the recognition result can be sent to the monitoring terminal, so that a guardian who is away can see the reminder message in time. Because the present application fuses multiple feature sequences into the audio feature vector, the recognition result is more accurate; and because the recognition application is installed on existing smart devices, infant cry recognition can be performed effectively without purchasing dedicated recognition equipment, which helps reduce system hardware cost.
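As a sketch of the recognition step, a minimal feed-forward network mapping the 16-dimensional feature vector to a cry probability might look like the following. The network shape is illustrative, and the weights below are random placeholders standing in for parameters learned from cry and noise samples.

```python
import numpy as np

rng = np.random.default_rng(0)

class CryClassifier:
    """Minimal MLP: 16-dim feature vector -> P(cry). The weights here are
    random placeholders; in practice they come from training on labeled
    cry/noise feature vectors."""
    def __init__(self, in_dim=16, hidden=32):
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def predict_proba(self, v):
        h = np.tanh(v @ self.w1 + self.b1)           # hidden layer
        z = h @ self.w2 + self.b2                    # single logit
        return (1.0 / (1.0 + np.exp(-z)))[0]         # sigmoid -> probability

model = CryClassifier()
p = model.predict_proba(rng.normal(size=16))
is_cry = p > 0.5   # thresholded decision sent on to the monitoring terminal
```

Fine-tuning such a model on the monitored baby's own crying, as the optional embodiment suggests, would amount to continuing training from these weights on the newly collected vectors.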
Fig. 3 is a schematic flowchart of another infant cry recognition method provided by an embodiment of the present application, detailed as follows:
In step S301, voice data is collected and a voice data segment of a predetermined duration is intercepted.
In step S302, two or more audio feature vectors contained in the voice data segment are computed.
Steps S301-S302 are essentially the same as steps S201-S202 in Fig. 2.
In step S303, it is judged whether the current network is in a connected state.
In the present application, when the acquisition device is a smartphone or similar device, it may find itself in different network scenarios. For example, the device may be in a scenario with a WIFI network, through which it can interact with the cloud server; or it may have no network connection but carry its own mobile communication module, such as a built-in SIM card; or it may both be able to connect to a network and have a built-in SIM card. The result-sending methods for these scenarios are discussed separately below.
In step S304, if the current network is connected, the audio feature vectors are sent to the cloud server, so that the cloud server sends an in-application reminder message to the monitoring terminal according to the recognition result.
When the acquisition device's network is connected, the device can interact with the cloud server: it can send the collected audio data segment to the server, or first compute the two or more audio feature vectors of the segment and send those to the cloud server, which then performs the cry recognition. If the cloud server recognizes that the audio data segment contains an infant cry, it can send a reminder message to the monitoring terminal over the network, or send a short message to the monitoring terminal, or dial a network phone call.
Of course, while the network is connected, the cloud server can also send the latest version number of the recognition model to the acquisition terminal; by comparison, if the recognition model on the acquisition terminal is not the latest version, the terminal can send an update request to the cloud server and download the latest recognition model.
In step S305, if the current network is disconnected, the audio feature vectors are recognized by the locally stored neural network model.
If the current network is disconnected, the audio feature vectors or voice data segments cannot be recognized by the server; instead, they are recognized locally by the recognition model stored on the device. Once the network is restored, the locally stored recognition model can also be updated.
In step S306, when the recognition result is a predetermined alarm result, a short message is sent to the monitoring terminal or an alarm call is dialed.
If the recognition result is a predetermined alarm result, for example an infant cry is recognized, a short message is sent to the monitoring terminal or an alarm call is dialed, prompting the caregiver to attend to the baby in time.
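The connected/disconnected routing of steps S303-S306 can be sketched as follows. All function names are hypothetical stand-ins for the application's real cloud client, local model, and messaging hooks, and the connectivity probe is one crude way to approximate the check.

```python
import socket

def network_connected(host="8.8.8.8", port=53, timeout=2.0):
    """Crude connectivity probe (assumption: reaching a public DNS resolver
    stands in for 'the cloud server is reachable')."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def dispatch(vec, connected, cloud_recognize, local_recognize,
             notify_app, notify_sms):
    # Connected: the cloud performs recognition and the in-app reminder is pushed.
    # Disconnected: fall back to the locally stored model and escalate a
    # recognized cry by short message or alarm call.
    if connected:
        result = cloud_recognize(vec)
        notify_app(result)
    else:
        result = local_recognize(vec)
        if result == "cry":
            notify_sms(result)
    return result

# Offline path demonstrated with stub recognizers and recorded notifications.
events = []
result = dispatch([0.0] * 16, connected=False,
                  cloud_recognize=lambda v: "cry",
                  local_recognize=lambda v: "cry",
                  notify_app=lambda r: events.append("app:" + r),
                  notify_sms=lambda r: events.append("sms:" + r))
```

Keeping the two recognizers behind the same call signature is what lets the device switch between the cloud and the local model without the rest of the pipeline noticing.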
Of course, as an optimized embodiment of the present application, the acquisition terminal can also collect the baby's audio feature vectors and receive baby-need results input by the user, thereby obtaining a recognition model that identifies the baby's needs. The baby's needs may include a feeding need, a warming need, a cooling need, a need for reassurance, and so on. When a collected audio feature vector is input into this recognition model, the baby's specific need is output and sent to the monitoring terminal, improving convenience for the guardian.
It should be understood that the serial numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 4 is a schematic structural diagram of an infant cry recognition apparatus provided by an embodiment of the present application, detailed as follows:
The infant cry recognition apparatus includes:
a voice data collection unit 401, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit 402, configured to compute two or more audio feature vectors contained in the voice data segment;
a recognition unit 403, configured to recognize the audio feature vectors with a preset recognition model and send the recognition result to a monitoring terminal.
Preferably, the audio feature vector computing unit includes:
a feature sequence computation subunit, configured to compute two or more of a zero-crossing rate feature sequence, an energy feature sequence, a multi-order Mel-frequency cepstral coefficient (MFCC) feature sequence, or a spectral centroid feature sequence of the voice data segment; and
a selection subunit, configured to select two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors.
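As a hedged illustration of the feature sequences named above, the sketch below computes per-frame zero-crossing-rate, short-time-energy, and spectral-centroid sequences and averages each into one component of a small audio feature vector. The frame length, hop size, and sample rate are common speech-processing defaults, not values taken from the patent, and the MFCC sequence is omitted for brevity:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes in each frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def spectral_centroid(frames, sr):
    """Magnitude-weighted mean frequency of each frame's spectrum."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return np.sum(mag * freqs, axis=1) / (np.sum(mag, axis=1) + 1e-10)

def feature_vector(x, sr=16000):
    """Average each feature sequence into one component of the vector."""
    f = frame_signal(x)
    return np.array([
        zero_crossing_rate(f).mean(),
        short_time_energy(f).mean(),
        spectral_centroid(f, sr).mean(),
    ])
```

A 2 kHz tone, for example, yields a markedly higher zero-crossing rate and spectral centroid than a 200 Hz tone, which is the kind of separation these features provide for cry detection.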
The infant cry recognition apparatus described in Fig. 4 corresponds to the infant cry recognition method described in Figs. 2-3.
Fig. 5 is a schematic diagram of an infant cry recognition device provided by an embodiment of the present application. As shown in Fig. 5, the infant cry recognition device 5 of this embodiment includes: a processor 50, a memory 51, and a computer program 52, such as an infant cry recognition program, stored in the memory 51 and executable on the processor 50. When executing the computer program 52, the processor 50 implements the steps in each of the above infant cry recognition method embodiments. Alternatively, when executing the computer program 52, the processor 50 implements the functions of each module/unit in each of the above apparatus embodiments.
Illustratively, the computer program 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 52 in the infant cry recognition device 5. For example, the computer program 52 may be divided into:
a voice data collection unit, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit, configured to compute two or more audio feature vectors contained in the voice data segment; and
a recognition unit, configured to recognize the audio feature vectors according to a preset identification model and send the recognition result to a monitoring terminal.
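The three modules just listed can be wired together as a small pipeline. The sketch below is an assumption about the control flow only; the feature extractor, the preset identification model, and the link to the monitoring terminal are injected as stand-in callables rather than real implementations:

```python
class CryRecognitionPipeline:
    """Illustrative wiring of the three program modules described above."""

    def __init__(self, extract, classify, notify, alarm_label="cry"):
        self.extract = extract        # audio feature vector computing unit
        self.classify = classify      # recognition unit (preset model)
        self.notify = notify          # channel to the monitoring terminal
        self.alarm_label = alarm_label

    def process(self, segment):
        # The voice data collection unit would hand over a
        # fixed-duration segment here.
        features = self.extract(segment)
        result = self.classify(features)
        self.notify(result)           # recognition result to the terminal
        return result

# Trivial stand-ins: energy-like feature, threshold "model",
# and a list standing in for the monitoring terminal.
sent = []
pipe = CryRecognitionPipeline(
    extract=lambda seg: [sum(abs(s) for s in seg)],
    classify=lambda feats: "cry" if feats[0] > 1.0 else "quiet",
    notify=sent.append,
)
```

Keeping the three responsibilities behind injected callables mirrors the module division above and makes each unit swappable (e.g. a cloud model versus a local model) without changing the pipeline.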
The infant cry recognition device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will appreciate that Fig. 5 is merely an example of the infant cry recognition device 5 and does not constitute a limitation on it; the device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the infant cry recognition device may also include input/output devices, network access devices, a bus, and the like.
The so-called processor 50 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may be an internal storage unit of the infant cry recognition device 5, such as a hard disk or memory of the device. The memory 51 may also be an external storage device of the infant cry recognition device 5, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the device. Further, the memory 51 may include both the internal storage unit and the external storage device of the infant cry recognition device 5. The memory 51 is used to store the computer program and other programs and data required by the infant cry recognition device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules above is merely illustrative. In practical applications, the functions above may be allocated to different functional units or modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis. For parts that are not detailed in a certain embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art may realize that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled professional may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; the division of the modules or units is only a logical function division, and there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application may also be completed by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.
Claims (10)
1. An infant cry recognition method, characterized in that the infant cry recognition method comprises:
collecting voice data and intercepting a voice data segment of a predetermined duration;
computing two or more audio feature vectors contained in the voice data segment; and
recognizing the audio feature vectors according to a preset identification model, and sending the recognition result to a monitoring terminal.
2. The infant cry recognition method according to claim 1, characterized in that the step of computing two or more audio feature vectors contained in the voice data segment comprises:
computing two or more of a zero-crossing rate feature sequence, an energy feature sequence, a multi-order Mel-frequency cepstral coefficient (MFCC) feature sequence, or a spectral centroid feature sequence of the voice data segment; and
selecting two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors.
3. The infant cry recognition method according to claim 2, characterized in that the step of selecting two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors comprises:
selecting two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence, and computing the mean of each selected feature sequence; and
determining the audio feature vectors according to the computed means.
4. The infant cry recognition method according to claim 1, characterized in that the step of recognizing the audio feature vectors according to a preset identification model and sending the recognition result to a monitoring terminal comprises:
judging whether the current network is in a connected state; and
if the current network is in a connected state, sending the audio feature vectors to a cloud server, so that the cloud server sends an application reminder message to the monitoring terminal according to the recognition result.
5. The infant cry recognition method according to claim 4, characterized in that the method further comprises:
if the current network is in a disconnected state, recognizing the audio feature vectors by means of a locally stored neural network model; and
when the recognition result is a predetermined alarm result, sending a short message to the monitoring terminal or dialing an alarm call.
6. The infant cry recognition method according to claim 1, characterized in that before the step of computing two or more audio feature vectors contained in the voice data segment, the method further comprises:
performing one or more of pre-emphasis, framing, and windowing on the voice data segment.
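The pre-processing steps named in claim 6 can be sketched as follows; this is illustrative only, with the pre-emphasis coefficient, frame length, and hop size taken from common speech-processing practice rather than from the patent:

```python
import numpy as np

def preprocess(x, alpha=0.97, frame_len=400, hop=160):
    """Pre-emphasis, framing, and Hamming windowing of a voice segment."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    # Framing: split into overlapping fixed-length frames
    n = 1 + max(0, (len(y) - frame_len) // hop)
    frames = np.stack([y[i * hop:i * hop + frame_len] for i in range(n)])
    # Windowing: taper each frame to reduce spectral leakage
    return frames * np.hamming(frame_len)
```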
7. An infant cry recognition apparatus, characterized in that the infant cry recognition apparatus comprises:
a voice data collection unit, configured to collect voice data and intercept a voice data segment of a predetermined duration;
an audio feature vector computing unit, configured to compute two or more audio feature vectors contained in the voice data segment; and
a recognition unit, configured to recognize the audio feature vectors according to a preset identification model and send the recognition result to a monitoring terminal.
8. The infant cry recognition apparatus according to claim 7, characterized in that the audio feature vector computing unit comprises:
a feature sequence computation subunit, configured to compute two or more of a zero-crossing rate feature sequence, an energy feature sequence, a multi-order MFCC feature sequence, or a spectral centroid feature sequence of the voice data segment; and
a selection subunit, configured to select two or more of the zero-crossing rate feature sequence, the energy feature sequence, the multi-order MFCC feature sequence, or the spectral centroid feature sequence to generate the audio feature vectors.
9. An infant cry recognition device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when executing the computer program, the processor implements the steps of the infant cry recognition method according to any one of claims 1 to 6.
10. A computer-readable storage medium storing a computer program, characterized in that when the computer program is executed by a processor, the steps of the infant cry recognition method according to any one of claims 1 to 6 are implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910029052.4A CN109658953A (en) | 2019-01-12 | 2019-01-12 | A kind of vagitus recognition methods, device and equipment |
PCT/CN2019/130824 WO2020143512A1 (en) | 2019-01-12 | 2019-12-31 | Infant crying recognition method, apparatus, and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910029052.4A CN109658953A (en) | 2019-01-12 | 2019-01-12 | A kind of vagitus recognition methods, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109658953A true CN109658953A (en) | 2019-04-19 |
Family
ID=66119244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910029052.4A Pending CN109658953A (en) | 2019-01-12 | 2019-01-12 | A kind of vagitus recognition methods, device and equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109658953A (en) |
WO (1) | WO2020143512A1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415724A (en) * | 2019-08-08 | 2019-11-05 | 中南大学湘雅二医院 | Transmission method, device, system and the computer readable storage medium of alert data |
CN111128100A (en) * | 2019-12-20 | 2020-05-08 | 网易(杭州)网络有限公司 | Rhythm point detection method and device and electronic equipment |
CN111370025A (en) * | 2020-02-25 | 2020-07-03 | 广州酷狗计算机科技有限公司 | Audio recognition method and device and computer storage medium |
WO2020143512A1 (en) * | 2019-01-12 | 2020-07-16 | 深圳先进技术研究院 | Infant crying recognition method, apparatus, and device |
CN112270932A (en) * | 2020-10-22 | 2021-01-26 | 北京小米松果电子有限公司 | Alarm method and device for intelligent device, electronic device and storage medium |
CN112382302A (en) * | 2020-12-02 | 2021-02-19 | 漳州立达信光电子科技有限公司 | Baby cry identification method and terminal equipment |
CN112992136A (en) * | 2020-12-16 | 2021-06-18 | 呼唤(上海)云计算股份有限公司 | Intelligent infant monitoring system and method |
CN113436650A (en) * | 2021-08-25 | 2021-09-24 | 深圳市北科瑞声科技股份有限公司 | Baby cry identification method and device, electronic equipment and storage medium |
EP3940698A1 (en) | 2020-07-13 | 2022-01-19 | Zoundream AG | A computer-implemented method of providing data for an automated baby cry assessment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102163427A (en) * | 2010-12-20 | 2011-08-24 | 北京邮电大学 | Method for detecting audio exceptional event based on environmental model |
CN102246228A (en) * | 2008-12-15 | 2011-11-16 | 音频分析有限公司 | Sound identification systems |
KR20120107382A (en) * | 2011-03-21 | 2012-10-02 | 호서대학교 산학협력단 | Device for analyzing crying of infants |
CN103280220A (en) * | 2013-04-25 | 2013-09-04 | 北京大学深圳研究生院 | Real-time recognition method for baby cry |
CN103489282A (en) * | 2013-09-24 | 2014-01-01 | 华南理工大学 | Infant monitor capable of identifying infant crying sound and method for identifying infant crying sound |
CN107106027A (en) * | 2014-12-16 | 2017-08-29 | 皇家飞利浦有限公司 | Baby sleep monitor |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5278952B2 (en) * | 2009-03-09 | 2013-09-04 | 国立大学法人福井大学 | Infant emotion diagnosis apparatus and method |
CN107808658A (en) * | 2016-09-06 | 2018-03-16 | 深圳声联网科技有限公司 | Based on real-time baby's audio serial behavior detection method under domestic environment |
CN106653001B (en) * | 2016-11-17 | 2020-03-27 | 沈晓明 | Method and system for identifying baby crying |
CN107818779A (en) * | 2017-09-15 | 2018-03-20 | 北京理工大学 | A kind of infant's crying sound detection method, apparatus, equipment and medium |
CN109658953A (en) * | 2019-01-12 | 2019-04-19 | 深圳先进技术研究院 | A kind of vagitus recognition methods, device and equipment |
2019
- 2019-01-12 CN CN201910029052.4A patent/CN109658953A/en active Pending
- 2019-12-31 WO PCT/CN2019/130824 patent/WO2020143512A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102246228A (en) * | 2008-12-15 | 2011-11-16 | 音频分析有限公司 | Sound identification systems |
CN102163427A (en) * | 2010-12-20 | 2011-08-24 | 北京邮电大学 | Method for detecting audio exceptional event based on environmental model |
KR20120107382A (en) * | 2011-03-21 | 2012-10-02 | 호서대학교 산학협력단 | Device for analyzing crying of infants |
CN103280220A (en) * | 2013-04-25 | 2013-09-04 | 北京大学深圳研究生院 | Real-time recognition method for baby cry |
CN103489282A (en) * | 2013-09-24 | 2014-01-01 | 华南理工大学 | Infant monitor capable of identifying infant crying sound and method for identifying infant crying sound |
CN107106027A (en) * | 2014-12-16 | 2017-08-29 | 皇家飞利浦有限公司 | Baby sleep monitor |
Non-Patent Citations (1)
Title |
---|
Han Zhiyan: "Research on Speech Recognition and Speech Visualization Technology" (《语音识别及语音可视化技术研究》), 31 January 2017 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020143512A1 (en) * | 2019-01-12 | 2020-07-16 | 深圳先进技术研究院 | Infant crying recognition method, apparatus, and device |
CN110415724A (en) * | 2019-08-08 | 2019-11-05 | 中南大学湘雅二医院 | Transmission method, device, system and the computer readable storage medium of alert data |
CN111128100A (en) * | 2019-12-20 | 2020-05-08 | 网易(杭州)网络有限公司 | Rhythm point detection method and device and electronic equipment |
CN111128100B (en) * | 2019-12-20 | 2021-04-20 | 网易(杭州)网络有限公司 | Rhythm point detection method and device and electronic equipment |
CN111370025A (en) * | 2020-02-25 | 2020-07-03 | 广州酷狗计算机科技有限公司 | Audio recognition method and device and computer storage medium |
EP3940698A1 (en) | 2020-07-13 | 2022-01-19 | Zoundream AG | A computer-implemented method of providing data for an automated baby cry assessment |
WO2022012777A1 (en) | 2020-07-13 | 2022-01-20 | Zoundream Ag | A computer-implemented method of providing data for an automated baby cry assessment |
CN112270932A (en) * | 2020-10-22 | 2021-01-26 | 北京小米松果电子有限公司 | Alarm method and device for intelligent device, electronic device and storage medium |
CN112382302A (en) * | 2020-12-02 | 2021-02-19 | 漳州立达信光电子科技有限公司 | Baby cry identification method and terminal equipment |
CN112992136A (en) * | 2020-12-16 | 2021-06-18 | 呼唤(上海)云计算股份有限公司 | Intelligent infant monitoring system and method |
CN113436650A (en) * | 2021-08-25 | 2021-09-24 | 深圳市北科瑞声科技股份有限公司 | Baby cry identification method and device, electronic equipment and storage medium |
CN113436650B (en) * | 2021-08-25 | 2021-11-16 | 深圳市北科瑞声科技股份有限公司 | Baby cry identification method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020143512A1 (en) | 2020-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109658953A (en) | A kind of vagitus recognition methods, device and equipment | |
CN110459222A (en) | Sound control method, phonetic controller and terminal device | |
CN111862951B (en) | Voice endpoint detection method and device, storage medium and electronic equipment | |
CN110428824A (en) | A kind of exchange method of intelligent sound box, device and intelligent sound box | |
CN108847222B (en) | Speech recognition model generation method and device, storage medium and electronic equipment | |
CN111161728B (en) | Awakening method, awakening device, awakening equipment and awakening medium of intelligent equipment | |
US11133022B2 (en) | Method and device for audio recognition using sample audio and a voting matrix | |
CN110290280B (en) | Terminal state identification method and device and storage medium | |
CN110010125A (en) | A kind of control method of intelligent robot, device, terminal device and medium | |
CN113239872A (en) | Event identification method, device, equipment and storage medium | |
CN114356703A (en) | Root cause analysis method and device | |
CN113823313A (en) | Voice processing method, device, equipment and storage medium | |
CN108347531A (en) | Recourse method, recourse device, electronic equipment and computer readable storage medium | |
CN108231074A (en) | A kind of data processing method, voice assistant equipment and computer readable storage medium | |
CN112398952A (en) | Electronic resource pushing method, system, equipment and storage medium | |
CN112002339B (en) | Speech noise reduction method and device, computer-readable storage medium and electronic device | |
CN110263135A (en) | A kind of data exchange matching process, device, medium and electronic equipment | |
CN109767751A (en) | Production method, device, computer equipment and the storage device of baby comforting music | |
CN112331187B (en) | Multi-task speech recognition model training method and multi-task speech recognition method | |
CN108540858A (en) | A kind of method, apparatus and equipment that prevent user from indulging TV programme | |
CN109975795B (en) | Sound source tracking method and device | |
CN112489678A (en) | Scene recognition method and device based on channel characteristics | |
CN111899747A (en) | Method and apparatus for synthesizing audio | |
CN110534128A (en) | A kind of noise processing method, device, equipment and storage medium | |
CN113014460A (en) | Voice processing method, home master control device, voice system and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190419 |