CN112185364A

CN112185364A - Method and device for detecting baby crying

Info

Publication number: CN112185364A
Application number: CN202011039588.3A
Authority: CN
Inventors: 徐俊峰
Original assignee: AI Speech Ltd
Current assignee: AI Speech Ltd
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2021-01-05

Abstract

The invention discloses a method for detecting baby crying, comprising: in response to a received audio signal, outputting a confidence of an infant crying sound via an infant crying sound classifier, wherein the classifier is trained on at least one type of infant crying sound using a deep learning model; determining whether the confidence of the infant crying sound is less than a preset confidence threshold; and, if the confidence is not less than the preset confidence threshold, outputting an infant crying detection success signal. Enhancing the infant crying sound with a microphone array solves the problem of far-field recognition of infant crying. Training the classifier with a deep learning model on various types of infant crying sounds as positive examples improves its recognition performance, so that the confidence it outputs is highly accurate.

Description

Method and device for detecting baby crying

Technical Field

The invention belongs to the technical field of speech recognition, and in particular relates to a method and a device for detecting baby crying.

Background

Crying is an instinctive response of infants, especially those younger than two years of age. Because they cannot yet express themselves in speech, crying is their most important way of expressing feelings and responding to external stimuli, so a caretaker needs to attend to a crying infant promptly. In practice, however, a caretaker cannot watch the infant at all times. In particular, while the infant is asleep, the caretaker is often busy with other tasks such as housework or watching television. If the infant cries at that moment, the caretaker, especially an elderly one, often fails to hear it and respond in time; the infant may then be accidentally injured, bringing grief to the whole family.

Some techniques for detecting infant crying already exist. Their main principle relies on the fact that infant crying is loud and high-pitched: these characteristics of the external audio are accumulated statistically over a period of time to judge whether an infant is crying.
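The prior-art principle (loud plus high-pitched, counted over a time window) can be sketched as follows. This is a minimal illustration, assuming RMS energy as the loudness measure and spectral centroid as the "high frequency" measure; the function names and all thresholds are hypothetical and are not taken from any cited system. Note that any loud, bright sound, including some speech, passes this test, which is the weakness discussed next.

```python
import numpy as np


def frame_is_loud_and_high(frame, sample_rate, rms_threshold=0.1, centroid_threshold_hz=1000.0):
    """Return True if a frame is both loud (RMS) and high-pitched (spectral centroid).

    Both thresholds are hypothetical placeholders, not values from this document.
    """
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))  # loudness proxy
    magnitude = np.abs(np.fft.rfft(frame))  # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)
    # Spectral centroid: magnitude-weighted mean frequency, a proxy for "high audio frequency".
    centroid = np.sum(freqs * magnitude) / (np.sum(magnitude) + 1e-12)
    return bool(rms >= rms_threshold and centroid >= centroid_threshold_hz)


def prior_art_window_decision(frames, sample_rate, min_fraction=0.5):
    """Declare crying if enough frames in the window are loud and high-pitched."""
    flags = [frame_is_loud_and_high(f, sample_rate) for f in frames]
    return sum(flags) / len(flags) >= min_fraction
```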

In the process of implementing the present application, the inventor found that the prior-art solutions have at least the following problems: when the crying is quiet or the infant is some distance away, the recognition rate drops sharply; and ordinary speech also contains sounds resembling infant crying, which makes misrecognition more serious.

Disclosure of Invention

Embodiments of the present invention provide a method and a device for detecting baby crying, to solve at least one of the above technical problems.

In a first aspect, an embodiment of the present invention provides a method for detecting baby crying, including: in response to a received audio signal, outputting a confidence of an infant crying sound via an infant crying sound classifier, wherein the classifier is trained on at least one type of infant crying sound using a deep learning model; determining whether the confidence of the infant crying sound is less than a preset confidence threshold; and, if the confidence is not less than the preset confidence threshold, outputting an infant crying detection success signal.

In a second aspect, an embodiment of the present invention provides an apparatus for detecting baby crying, including: a first output module configured to output, in response to a received audio signal, a confidence of an infant crying sound via an infant crying sound classifier, wherein the classifier is trained on at least one type of infant crying sound using a deep learning model; a judging module configured to determine whether the confidence of the infant crying sound is less than a preset confidence threshold; and a second output module configured to output an infant crying detection success signal if the confidence is not less than the preset confidence threshold.

In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method for detecting baby crying according to any of the embodiments of the present invention.

In a fourth aspect, embodiments of the present invention also provide a computer program product including a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the steps of the infant crying detection method of any of the embodiments of the present invention.

In the solution provided by the method and device of these embodiments, enhancing the infant crying sound with a microphone array solves the problem of far-field recognition of infant crying, and a deep learning model trained on massive amounts of infant crying and of sounds similar to infant crying further improves the recognition performance of the infant crying model.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for detecting baby crying according to an embodiment of the present invention;

Fig. 2 is a flowchart of a method for detecting baby crying according to an embodiment of the present invention;

Fig. 3 is a block diagram of an apparatus for detecting baby crying according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that terms such as "first" and "second" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only insofar as a person skilled in the art can realize the combination; when combined technical solutions contradict each other or cannot be realized, the combination should be considered not to exist and is not within the protection scope of the present invention.

Please refer to Fig. 1, which shows a flowchart of an embodiment of the baby crying detection method according to the present application. The method of this embodiment may be applied to terminals with a language model or a real-time voice conversation function, such as smart TVs, smart speakers, and other existing intelligent terminals that support speech recognition.

As shown in Fig. 1, in step 101, in response to a received audio signal, a confidence of an infant crying sound is output via an infant crying sound classifier, wherein the classifier is trained on at least one type of infant crying sound using a deep learning model;

in step 102, it is determined whether the confidence of the infant crying sound is less than a preset confidence threshold;

in step 103, if the confidence of the infant crying sound is not less than the preset confidence threshold, an infant crying detection success signal is output.

In this embodiment, for step 101, the infant crying detection apparatus, in response to the received audio signal, outputs the confidence of the infant crying sound via the infant crying sound classifier, which is trained on at least one type of infant crying sound using the deep learning model. Then, in step 102, the apparatus determines whether the confidence is less than the preset confidence threshold. The confidence may be a value normalized to the range 0 to 1, where a higher value indicates a higher probability that the sound is infant crying. Finally, in step 103, if the confidence is not less than the preset confidence threshold, the apparatus outputs an infant crying detection success signal.
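Steps 101 to 103 reduce to a single comparison once the classifier has produced its confidence. The sketch below treats the classifier as an opaque callable; the function name and the default threshold of 0.8 are hypothetical, since the document only says the threshold is "preset". A confidence exactly equal to the threshold counts as success, matching the "not less than" wording.

```python
from typing import Callable, Sequence

# Hypothetical default; the document only says the threshold is "preset".
DEFAULT_THRESHOLD = 0.8


def detect_infant_crying(
    audio: Sequence[float],
    classifier: Callable[[Sequence[float]], float],
    threshold: float = DEFAULT_THRESHOLD,
) -> bool:
    """Steps 101-103: score the audio, compare with the threshold, report success."""
    confidence = classifier(audio)  # step 101: classifier outputs a confidence
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be normalized to [0, 1], got {confidence}")
    if confidence < threshold:  # step 102: below the threshold, no detection
        return False
    return True  # step 103: not less than the threshold, success signal
```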

In the solution of this embodiment, the infant crying sound classifier is trained with a deep learning model using various types of infant crying sounds as positive examples, which improves its recognition performance and makes the confidence it outputs highly accurate.

Further, the infant crying sound classifier is also trained with the deep learning model using at least one sound similar to infant crying as a counterexample. To this end, sounds similar to infant crying are analyzed, and a large amount of such audio is collected and added to model training as counterexamples, which reduces false recognition of these similar sounds by the model.

Specifically, counterexamples similar to infant crying may include animal sounds, such as a cat's meow, and background music whose melody has a similar frequency spectrum.
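One way to realize this positive/counterexample split is to label every training clip with a binary target before it reaches the model. The sketch below shows only that labeling step, assuming clips are organized by category; the category names and file names are hypothetical, and the document does not specify its dataset layout or model.

```python
from typing import Dict, Iterable, List, Tuple

CRY = 1      # positive label: infant crying
NOT_CRY = 0  # negative label: counterexample that resembles crying


def build_training_set(
    clips_by_category: Dict[str, List[str]],
    cry_categories: Iterable[str],
    counterexample_categories: Iterable[str],
) -> List[Tuple[str, int]]:
    """Pair each clip with a binary label for training a crying classifier."""
    samples: List[Tuple[str, int]] = []
    for category in cry_categories:
        samples.extend((clip, CRY) for clip in clips_by_category.get(category, []))
    for category in counterexample_categories:
        samples.extend((clip, NOT_CRY) for clip in clips_by_category.get(category, []))
    return samples


# Hypothetical categories: crying in different states, plus cry-like counterexamples.
clips = {
    "cry_hungry": ["cry_001.wav", "cry_002.wav"],
    "cry_sleepy": ["cry_101.wav"],
    "cat_meow": ["cat_001.wav"],
    "background_music": ["music_001.wav", "music_002.wav"],
}
dataset = build_training_set(clips, ["cry_hungry", "cry_sleepy"], ["cat_meow", "background_music"])
```

Without the counterexample categories every label would be 1 for cry-like audio, which is the situation the "Beta version" later in this description reports as having a high false recognition rate.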

In a preferred embodiment, in response to an audio signal acquired in real time, the infant crying detection apparatus may perform signal enhancement on the audio signal using a microphone array.

The microphone array in this embodiment may be an array formed by a plurality of microphones. Compared with a single microphone, a microphone array can capture the spatial information of sound and therefore perform spatial filtering: it suppresses directional noise and signals from undesired directions while retaining the signal of the target sound, thereby enhancing that signal.
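The document does not say which enhancement algorithm is used. Delay-and-sum beamforming is the textbook example of the spatial filtering described above, so the sketch below uses it as a stand-in, assuming a far-field plane wave and a linear array. Each channel is time-advanced (via a phase shift in the frequency domain, which makes the shift circular) so that sound from the look direction lines up and adds coherently, while sound from other directions partly cancels. The function name and the speed-of-sound constant are assumptions for illustration.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 degrees C


def delay_and_sum(signals, mic_x, angle_deg, sample_rate):
    """Steer a linear microphone array toward angle_deg and average the channels.

    signals: array of shape (num_mics, num_samples).
    mic_x: microphone positions along the array axis, in metres.
    angle_deg: look direction measured from broadside.
    Assumed model: a far-field plane wave reaches microphone m with delay
    tau_m = mic_x[m] * sin(angle) / c relative to the array origin.
    """
    signals = np.asarray(signals, dtype=float)
    num_samples = signals.shape[1]
    tau = np.asarray(mic_x, dtype=float) * np.sin(np.deg2rad(angle_deg)) / SPEED_OF_SOUND_M_S
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / sample_rate)
    spectra = np.fft.rfft(signals, axis=1)
    # Advancing channel m by tau_m multiplies its spectrum by exp(+j*2*pi*f*tau_m).
    steering = np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
    aligned = np.fft.irfft(spectra * steering, n=num_samples, axis=1)
    # Look-direction components now add coherently; other directions partly cancel.
    return aligned.mean(axis=0)
```

A production system would typically use adaptive beamformers and block-wise processing; this fixed beamformer only shows the principle.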

Examples of the microphone array include a dual-microphone array, a linear four-microphone array, a circular four-microphone array, and a circular six-microphone array.
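The four geometries differ only in where the microphones sit, which determines the arrival-time differences any beamformer exploits. The sketch below generates 2-D positions for each layout and the far-field delay of a plane wave at every microphone. The spacings and radii are hypothetical, since the document gives no dimensions; azimuth here is measured from the x-axis. A linear array cannot distinguish a source from its mirror image across the array axis, whereas a circular array covers the full 360 degrees of azimuth.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0


def linear_array(num_mics, spacing_m):
    """(num_mics, 2) x-y positions of a uniform linear array centred on the origin."""
    x = (np.arange(num_mics) - (num_mics - 1) / 2.0) * spacing_m
    return np.column_stack([x, np.zeros(num_mics)])


def circular_array(num_mics, radius_m):
    """(num_mics, 2) x-y positions of a uniform circular array centred on the origin."""
    angles = 2.0 * np.pi * np.arange(num_mics) / num_mics
    return np.column_stack([radius_m * np.cos(angles), radius_m * np.sin(angles)])


def far_field_delays(positions, azimuth_deg):
    """Arrival delay (s) of a plane wave from azimuth_deg at each mic, relative to the origin.

    Negative values mean the microphone hears the wave earlier than the origin would.
    """
    az = np.deg2rad(azimuth_deg)
    toward_source = np.array([np.cos(az), np.sin(az)])
    return -(positions @ toward_source) / SPEED_OF_SOUND_M_S


# Hypothetical spacings and radii, chosen only for illustration.
LAYOUTS = {
    "dual": linear_array(2, 0.05),
    "linear4": linear_array(4, 0.035),
    "circular4": circular_array(4, 0.035),
    "circular6": circular_array(6, 0.035),
}
```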

Specifically, in response to receiving the infant crying detection success signal, the system sends detection success information to the user, for example as a short message or a voice broadcast.

In one specific application scenario, a parent is cooking in the kitchen while a story machine keeps a child company in another room. When the child cries, the story machine receives the infant crying detection success signal and announces it by voice, alerting the parent in the kitchen that the child is crying.

In another specific application scenario, a parent has gone out briefly and a child in the room wakes up crying. After receiving the infant crying detection success signal, the smart speaker sends a message over the network to the parent's mobile phone, prompting the parent to return home as soon as possible.
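Both scenarios follow the same pattern: one success signal fans out to whichever notification channels the device has. A minimal sketch, assuming each channel is represented by a callable that takes the message text; the channel names ("voice", "sms") and the default message are hypothetical, and real SMS or text-to-speech back ends are out of scope here.

```python
from typing import Callable, Dict, Iterable, List

Notifier = Callable[[str], None]


def dispatch_detection_success(
    notifiers: Dict[str, Notifier],
    enabled_channels: Iterable[str],
    message: str = "Baby crying detected",
) -> List[str]:
    """Send the detection-success message on each enabled channel; return channels used."""
    used: List[str] = []
    for channel in enabled_channels:
        notify = notifiers.get(channel)
        if notify is None:
            continue  # channel not available on this device; skip it
        notify(message)
        used.append(channel)
    return used
```

A story machine would enable only the voice channel, while a networked smart speaker could enable a message to the parent's phone.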

In one embodiment, if the confidence of the infant crying sound is not less than the preset confidence threshold, the infant crying detection apparatus stops collecting the audio signal. This prevents the detection success information from being sent to the user repeatedly as the same crying continues to be collected.

Further, when infant crying needs to be detected again, the user can wake up the infant crying detection apparatus again.
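The stop-after-success and wake-up behavior amounts to a two-state detector. The sketch below is a minimal illustration, assuming the classifier is a callable returning a normalized confidence; the class name and the default threshold are hypothetical.

```python
from typing import Callable, Sequence


class CryDetectionSession:
    """Stops collecting after a successful detection until the user wakes it again."""

    def __init__(self, classifier: Callable[[Sequence[float]], float], threshold: float = 0.8):
        self.classifier = classifier
        self.threshold = threshold  # hypothetical preset threshold
        self.collecting = True

    def process(self, audio: Sequence[float]) -> bool:
        """Return True only on the frame that triggers a new detection."""
        if not self.collecting:
            return False  # audio is ignored while collection is stopped
        if self.classifier(audio) >= self.threshold:
            self.collecting = False  # stop collecting so the user is notified once
            return True
        return False

    def wake(self) -> None:
        """User-initiated wake-up: resume collecting and detecting."""
        self.collecting = True
```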

It should be noted that the above method steps are not intended to limit the execution order of the steps, and in fact, some steps may be executed simultaneously or in the reverse order of the steps, which is not limited herein.

To help those skilled in the art better understand the present disclosure, some problems the inventor encountered in implementing it, and one specific embodiment of the solution finally adopted, are described below.

In the process of implementing the present application, the inventor found that the defects of the prior art are mainly caused by the following: when the crying is quiet or the infant is some distance away, the recognition rate drops sharply; and ordinary speech also contains sounds similar to infant crying, so resistance to speech interference is poor and false recognition is serious.

The inventor also found that solving far-field recognition and improving recognition accuracy require not only a deep understanding of the characteristics of infant crying but also expertise in microphone-array signal enhancement and deep learning, and practitioners in the industry rarely have all of this knowledge at once.

The solution of the present application improves recognition accuracy, and solves the problems that crying cannot be recognized in the far field and that the recognition rate drops as distance increases, mainly through design and optimization in the following aspects:

(1) Enhancing the infant crying sound with a microphone array solves the problem of far-field recognition of infant crying.

(2) Training a deep learning model on massive amounts of infant crying and of sounds similar to infant crying further improves the recognition performance of the infant crying model.

Please refer to fig. 2, which shows a flow chart of the baby cry detection method of the present application.

As shown in Fig. 2, the first step: audio is collected by a multi-microphone array;

the second step: signal processing is applied to the audio collected by the multi-microphone array to enhance the infant crying sound;

the third step: the enhanced sound is input into a deep-learning-based infant crying sound classifier model, which outputs a confidence that the segment of sound is infant crying;

the fourth step: it is determined whether the confidence output by the model reaches a preset threshold. The confidence may be a value normalized to 0 to 1, where a higher value indicates a higher probability that the sound is infant crying. If the confidence is greater than or equal to the threshold, infant crying is successfully detected and the system sends a detection success signal; otherwise, the microphone array continues to acquire audio.
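The four steps of Fig. 2 form a capture, enhance, classify, compare loop that only exits on success. A minimal sketch, assuming the capture source is any iterable of multichannel frames and that enhancement and classification are supplied as callables; all names are hypothetical, and the real enhancement and model are not specified in the document.

```python
from typing import Callable, Iterable, Optional, Sequence

Frame = Sequence[Sequence[float]]  # one block of samples per microphone channel


def run_detection_loop(
    capture: Iterable[Frame],
    enhance: Callable[[Frame], Sequence[float]],
    classify: Callable[[Sequence[float]], float],
    threshold: float,
) -> Optional[int]:
    """Return the index of the frame that triggered detection, or None if capture ends."""
    for index, frame in enumerate(capture):  # first step: multi-microphone capture
        enhanced = enhance(frame)            # second step: array signal processing
        confidence = classify(enhanced)      # third step: classifier confidence in [0, 1]
        if confidence >= threshold:          # fourth step: compare with the preset threshold
            return index                     # success: caller sends the detection signal
        # otherwise keep acquiring audio
    return None
```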

The inventors have also adopted the following alternatives in the course of carrying out the present application and summarized the advantages and disadvantages of the alternatives.

Beta version: in the initial version, the deep learning model was trained on a large number of recordings of infants crying in various states. When it encountered sounds similar to infant crying, the false recognition rate was high; the system was usable, but its final performance was affected to some extent.

Referring to fig. 3, a block diagram of an apparatus for detecting baby crying according to an embodiment of the present invention is shown.

As shown in fig. 3, the apparatus 200 for detecting baby crying includes a first output module 210, a determining module 220 and a second output module 230.

The first output module 210 is configured to output, in response to a received audio signal, a confidence of an infant crying sound via an infant crying sound classifier, wherein the classifier is trained on at least one type of infant crying sound using a deep learning model; the determining module 220 is configured to determine whether the confidence of the infant crying sound is less than a preset confidence threshold; and the second output module 230 is configured to output an infant crying detection success signal if the confidence is not less than the preset confidence threshold.

It should be understood that the modules depicted in fig. 3 correspond to various steps in the method described with reference to fig. 1. Thus, the operations and features described above for the method and the corresponding technical effects are also applicable to the modules in fig. 3, and are not described again here.

It should be noted that the modules in the embodiments of the present disclosure are not intended to limit the solution of the present disclosure; for example, the first output module may also be described as a module that outputs the confidence of the infant crying sound via the infant crying sound classifier in response to a received audio signal. In addition, the related functional modules may also be implemented by a hardware processor; for example, the determining module may be implemented by a processor, which is not described again here.

In other embodiments, the present invention further provides a non-volatile computer storage medium storing computer-executable instructions for performing the method for detecting baby crying in any of the above method embodiments;

as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:

outputting a confidence of an infant crying sound via an infant crying sound classifier in response to a received audio signal, wherein the classifier is trained on at least one type of infant crying sound using a deep learning model;

determining whether the confidence of the infant crying sound is less than a preset confidence threshold, wherein the confidence is a value normalized to 0 to 1, and a higher value indicates a higher probability that the sound is infant crying;

and if the confidence of the infant crying sound is not less than the preset confidence threshold, outputting an infant crying detection success signal.
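The document states that the confidence is normalized to 0 to 1 but not how. A common choice for a classifier with a "not crying" output and a "crying" output is a softmax over the two logits, which for two classes equals a sigmoid of the logit difference. The sketch below assumes that choice; the function name and the class ordering (index 1 for crying) are hypothetical.

```python
import math
from typing import Sequence


def cry_confidence(logits: Sequence[float], cry_index: int = 1) -> float:
    """Softmax probability of the crying class, i.e. a confidence in [0, 1]."""
    largest = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    return exps[cry_index] / sum(exps)
```

For example, equal logits give a confidence of 0.5, and raising the crying logit relative to the other always raises the confidence, so comparing it with a fixed threshold is meaningful.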

The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the infant crying detection apparatus, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes memory remotely located from the processor, which may be connected to the infant crying detection device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any one of the above methods of detecting baby crying.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 4, the electronic device includes one or more processors 310 and a memory 320; one processor 310 is taken as an example in Fig. 4. The device for performing the baby crying detection method may further include an input device 330 and an output device 340. The processor 310, the memory 320, the input device 330, and the output device 340 may be connected by a bus or in other ways; a bus connection is taken as an example in Fig. 4. The memory 320 is the non-volatile computer-readable storage medium described above. By running the non-volatile software programs, instructions, and modules stored in the memory 320, the processor 310 executes the various functional applications and data processing of the server, thereby implementing the baby crying detection method of the above method embodiment. The input device 330 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the infant crying detection apparatus. The output device 340 may include a display device such as a display screen.

The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.

As an embodiment, the electronic device is applied to an infant crying detection apparatus, and is used for a client, and includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to:

The electronic device of the embodiments of the present application exists in various forms, including but not limited to:

(1) Mobile communication devices: such devices have mobile communication capabilities and are mainly intended to provide voice and data communications. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, and low-end phones, among others.

(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, among others.

(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players, handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.

(4) Servers: a server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, manageability, and the like.

(5) Other electronic devices with data interaction functions.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for detecting crying in an infant, comprising:

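outputting, in response to a received audio signal, a confidence of an infant crying sound via an infant crying sound classifier, wherein the infant crying sound classifier is trained on at least one infant crying sound based on a deep learning model;

determining whether the confidence of the infant crying sound is less than a preset confidence threshold;

and if the confidence of the infant crying sound is not less than the preset confidence threshold, outputting an infant crying detection success signal.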

2. The method of claim 1, wherein the infant crying sound classifier is further trained, based on the deep learning model, using at least one sound similar to infant crying as a counterexample.

3. The method of claim 1, wherein after the outputting of the infant crying detection success signal if the confidence of the infant crying sound is not less than the preset confidence threshold, the method further comprises:

in response to the received infant crying detection success signal, sending, by the system, detection success information to the user.

4. The method of claim 1, wherein after said determining whether the confidence of the infant crying sound is less than a preset confidence threshold, the method further comprises:

if the confidence of the infant crying sound is not less than the preset confidence threshold, stopping collection of the audio signal.

5. The method of any one of claims 1-4, wherein prior to said outputting, via an infant crying sound classifier, a confidence level of an infant crying sound in response to a received audio signal, the method further comprises:

performing, in response to an audio signal acquired in real time, signal enhancement on the audio signal based on a microphone array.

6. The method of claim 5, wherein the microphone array comprises: a dual-microphone array, a linear four-microphone array, a circular four-microphone array, and a circular six-microphone array.

7. An infant crying detection device comprising:

a first output module configured to output, in response to a received audio signal, a confidence of an infant crying sound via an infant crying sound classifier, wherein the infant crying sound classifier is trained on at least one infant crying sound based on a deep learning model;

a judging module configured to determine whether the confidence of the infant crying sound is less than a preset confidence threshold; and

a second output module configured to output an infant crying detection success signal if the confidence of the infant crying sound is not less than the preset confidence threshold.

8. The apparatus of claim 7, wherein the infant crying sound classifier is further trained, based on the deep learning model, using at least one sound similar to infant crying as a counterexample.

9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1 to 6.

10. A storage medium having stored thereon a computer program, characterized in that the program, when being executed by a processor, is adapted to carry out the steps of the method of any one of claims 1 to 6.