TW201442019A - Building method and using method for infant crying implication determining model - Google Patents
Description
The present invention relates to sound and image recognition, and in particular to a method for building a model for interpreting the meaning of an infant's crying, and a method for using that model.
Before an infant learns to speak, crying and facial expressions are its only means of expressing its various physiological, psychological, and emotional needs. To some extent, crying and facial expressions can therefore be regarded as an innate special language through which the infant communicates with the outside world.
In general, the causes of infant crying are environmental discomfort (too hot or too cold), a need for care (hunger or a wet diaper), and emotional dependence (a craving for attention). When caring for an infant, medical staff and parents mostly rely on experience to guess the reason behind the crying and lack an objective, reliable basis for judgment. Novice parents with little child-care experience, in particular, often feel helpless when their baby cries.
In view of this, an object of the present invention is to provide a method for building an infant-cry interpretation model, and a method for using it, that can accurately determine the reason an infant is crying and serve as a reference for the infant's caregiver.
To achieve the above object, the model-building method provided by the present invention is performed while an infant is crying for a specific reason, and comprises the following steps: a. record a cry signal while the infant is crying and extract a plurality of audio features from the cry signal; b. extract a plurality of image features from the infant's facial image while it is crying; c. repeat steps a and b, and use a classifier to integrate the audio features and the image features captured in each iteration of steps a and b into an audio model and an image model respectively; the audio model and the image model together constitute the interpretation model.
The cry-interpretation method provided by the present invention, which uses the aforementioned interpretation model, performs the following steps while an infant is crying: a. record a cry signal while the infant is crying and extract a plurality of audio features from the cry signal; b. extract a plurality of image features from the infant's facial image while it is crying; c. compare the audio features extracted in step a against the audio model, and the image features extracted in step b against the image model; if the similarity between the audio features and the audio model exceeds a first threshold, and the similarity between the image features and the image model exceeds a second threshold, the reason the infant is crying is judged to be the same as the specific reason for which the interpretation model was built.
In this way, the building method and use method of the infant-cry interpretation model provided by the present invention can reliably determine the reason for an infant's crying, so that medical staff or parents can care for the infant more appropriately.
FIG. 1 is a flowchart of the method of the present invention for building the infant-cry interpretation model; FIG. 2 is a schematic diagram of feature points marked on an infant's facial image using an Active Appearance Model; FIG. 3 is a flowchart of the method of the present invention for using the infant-cry interpretation model.
To describe the present invention more clearly, a preferred embodiment is described in detail below with reference to the drawings. Referring to FIG. 1, the preferred embodiment of the model-building method of the present invention is carried out while an infant is crying for a specific reason: a plurality of audio features of the cry and a plurality of image features of the facial expression are extracted separately and later integrated into the interpretation model. The audio features and image features are detailed as follows. The audio features are extracted from a cry signal composed of a plurality of frames. To make the feature extraction more accurate, this embodiment applies a Fast Fourier Transform to the cry signal and performs spectral analysis on each frame, and then extracts a plurality of audio features including intensity and timbre, where the timbre features include the spectral centroid, bandwidth, roll-off, octave-based features, and zero-crossing rate. It should be noted that the feature types listed above do not limit the present invention; in other embodiments, audio features of types not listed here may of course also be extracted.
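The frame-based spectral analysis above can be sketched in NumPy as follows. The frame length, hop size, sample rate, and the 85% roll-off fraction are illustrative assumptions, not values from the patent, and the octave-based features are omitted for brevity:

```python
import numpy as np

def frame_signal(signal, frame_len=512, hop=256):
    """Split a 1-D cry signal into overlapping frames."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def spectral_features(frame, sr=8000):
    """FFT-based features for one frame: intensity plus timbre
    descriptors (spectral centroid, bandwidth, roll-off, zero-crossing rate)."""
    spectrum = np.abs(np.fft.rfft(frame))              # fast Fourier transform
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    power = spectrum ** 2
    intensity = float(power.sum())
    # centroid: power-weighted mean frequency ("center of mass" of the spectrum)
    centroid = float((freqs * spectrum).sum() / (spectrum.sum() + 1e-12))
    # bandwidth: spread of the spectrum around the centroid
    bandwidth = float(np.sqrt(((freqs - centroid) ** 2 * spectrum).sum()
                              / (spectrum.sum() + 1e-12)))
    # roll-off: frequency below which 85% of the spectral power lies
    cumulative = np.cumsum(power)
    rolloff = float(freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])])
    # zero-crossing rate: fraction of adjacent samples with a sign change
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0)
    return {"intensity": intensity, "centroid": centroid,
            "bandwidth": bandwidth, "rolloff": rolloff, "zcr": zcr}
```

For a pure 1 kHz tone sampled at 8 kHz, the centroid and roll-off both land near 1000 Hz and the zero-crossing rate near 0.25, which is a quick sanity check for the implementation.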
As for the image features, this embodiment extracts them using the Active Appearance Models (AAM) technique, although the invention is not limited thereto. Before the AAM technique can be used, a plurality of feature points must first be marked on another facial image for training; once training is complete, the AAM technique can extract the corresponding feature points from the infant's facial image. The training and feature-point extraction methods involved here are conventional techniques in this field and are not described further. FIG. 2 shows an example of extracting the feature points; in this preferred embodiment, the feature points extracted by the AAM technique are located at the eyelids, the wings of the nose, and the lips in the facial image. In practice, certain feature points can be selected from among them, and the distances between the selected feature points and the sizes of the geometric shapes they enclose are used as the image features. The way the feature points are selected is not fixed, however, and different embodiments may select them differently.
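Once the landmarks are available, turning them into the distance and enclosed-area features described above is straightforward geometry. The sketch below assumes landmarks are given as (x, y) pixel coordinates; the choice of point pairs and polygon is up to the embodiment, as the text notes:

```python
import numpy as np

def landmark_features(points, pairs, polygon):
    """Geometric image features from facial landmarks (e.g. AAM points on
    the eyelids, nose wings and lips): distances between selected point
    pairs, plus the area of the polygon a selected subset encloses."""
    pts = np.asarray(points, dtype=float)
    dists = [float(np.linalg.norm(pts[i] - pts[j])) for i, j in pairs]
    ring = pts[list(polygon)]
    x, y = ring[:, 0], ring[:, 1]
    # shoelace formula for the area enclosed by the polygon vertices
    area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
    return np.array(dists + [float(area)])
```

For example, with four landmarks at the corners of a unit square, the diagonal distance comes out as √2 and the enclosed area as 1, which makes the feature vector easy to verify by hand.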
The model-building method of the present invention repeats the above audio-feature and image-feature extraction steps many times while the infant is crying for the specific reason. This embodiment uses a classifier, exemplified by Support Vector Regression, to integrate the audio features into an audio model and, likewise, the image features into an image model. Together, the audio model and the image model constitute the interpretation model for that specific reason.
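The patent names Support Vector Regression as its example classifier; as a dependency-free stand-in, the sketch below summarizes the repeatedly captured feature vectors with a per-dimension mean/std model and scores new captures by normalized distance from it. This swaps in a much simpler technique purely to illustrate the "integrate repeated captures into one model, then measure similarity" flow:

```python
import numpy as np

def build_model(feature_vectors):
    """Integrate feature vectors gathered over repeated crying episodes
    (for one specific reason) into a single model.  A mean/std summary
    stands in for the patent's SVR classifier."""
    X = np.asarray(feature_vectors, dtype=float)
    return {"mean": X.mean(axis=0), "std": X.std(axis=0) + 1e-9}

def similarity(model, features):
    """Similarity in (0, 1]: 1.0 when the features sit exactly on the
    model mean, decreasing with their normalized distance from it."""
    z = (np.asarray(features, dtype=float) - model["mean"]) / model["std"]
    return float(1.0 / (1.0 + np.linalg.norm(z) / len(z)))
```

The same `build_model` call is made twice per reason in this scheme: once over the audio feature vectors (yielding the audio model) and once over the image feature vectors (yielding the image model).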
Referring to FIG. 3, the cry-interpretation method of the present invention, which uses the aforementioned interpretation model, is applied while an infant is crying. First, a plurality of audio features of the cry and a plurality of image features of the facial image are extracted. As described above, the audio features are extracted from a cry signal that is likewise composed of a plurality of frames; for the same reason as before, to make the extracted audio features more accurate, the cry signal is first processed with a Fast Fourier Transform and each frame is subjected to spectral analysis. The types of audio features extracted are the same as those described for the model-building method of the present invention and are not repeated here.
Likewise, the image features are obtained by applying the Active Appearance Model technique to the facial image, and they must be extracted in the same way as described for the model-building method so that the subsequent comparison step can proceed.
After the audio features and the image features have been extracted, this embodiment uses a comparison method, exemplified by a multi-class Support Vector Machine, to compare the audio features against the audio model and the image features against the image model. If the similarity between the audio features and the audio model exceeds a first threshold, and the similarity between the image features and the image model exceeds a second threshold, the reason the infant is crying at that moment is judged to be the same as the specific reason for which the interpretation model was built; otherwise, the reason is judged to be different.
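The two-threshold decision rule reduces to a simple conjunction once the audio and image similarity scores are in hand. The threshold values below are illustrative only; the patent does not fix them:

```python
def same_reason(audio_similarity, image_similarity,
                first_threshold=0.6, second_threshold=0.6):
    """Attribute the cry to the model's specific reason only when the
    audio similarity clears the first threshold AND the image similarity
    clears the second; otherwise the reason is judged to be different."""
    return audio_similarity > first_threshold and image_similarity > second_threshold
```

Requiring both modalities to agree is what distinguishes this scheme from a cry-only classifier: a high audio score with a low facial-expression score is not enough to declare a match.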
In practice, multiple interpretation models can be built for multiple specific reasons. When the infant cries, the extracted audio features and image features are compared one by one against the audio model and image model of each interpretation model, to determine which model's specific reason the infant's crying most closely matches.
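Selecting among several reason-specific models can be sketched as follows. Combining the two similarity scores by summation and the threshold values are illustrative choices of this sketch; the patent only says the closest-matching model's reason wins:

```python
def best_matching_reason(similarities, first_threshold=0.6, second_threshold=0.6):
    """Given per-reason (audio_similarity, image_similarity) pairs, return
    the reason with the highest combined score among those that clear both
    thresholds, or None if no interpretation model matches."""
    candidates = {reason: a + i
                  for reason, (a, i) in similarities.items()
                  if a > first_threshold and i > second_threshold}
    return max(candidates, key=candidates.get) if candidates else None
```

For instance, scores of `{"hungry": (0.9, 0.8), "wet diaper": (0.7, 0.65), "too hot": (0.5, 0.9)}` would select "hungry": "too hot" fails the audio threshold, and "hungry" beats "wet diaper" on combined score.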
In summary, the building method and use method of the infant-cry interpretation model of the present invention provide a reliable and accurate basis for judgment, enabling an infant's caregiver to easily infer the cause of the crying and respond appropriately.
The above is merely a preferred and feasible embodiment of the present invention; all equivalent variations of the method made within the scope of the specification and claims of the present invention shall fall within the patent scope of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW102114614A (TWI571862B) | 2013-04-24 | 2013-04-24 | Building method and using method for infant crying implication determining model |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW201442019A | 2014-11-01 |
| TWI571862B | 2017-02-21 |
Family
ID=52422980
Cited By (3)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN105286799A | 2015-11-23 | 2016-02-03 | 金建设 | System and method for identifying state and desire of infants based on information fusion |
| CN105286799B | 2015-11-23 | 2018-07-24 | 金建设 | Infant state and desire identifying system and method based on information fusion |
| TWI597720B | 2017-01-04 | 2017-09-01 | 晨星半導體股份有限公司 | Baby cry detection circuit and associated detection method |