CN202855297U - Background music control device based on expression - Google Patents

Background music control device based on expression

Info

Publication number
CN202855297U
CN202855297U CN201220371686U
Authority
CN
China
Prior art keywords
expression
background music
image
micro
image acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201220371686
Other languages
Chinese (zh)
Inventor
郭雷 (Guo Lei)
陈智慧 (Chen Zhihui)
赵天云 (Zhao Tianyun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN 201220371686 priority Critical patent/CN202855297U/en
Application granted granted Critical
Publication of CN202855297U publication Critical patent/CN202855297U/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The utility model discloses a background music control device based on facial expression. The device comprises a power supply unit, an image acquisition unit, a main processor (DSP), a storage unit, and a background music adjusting unit. The main processor (DSP) is connected to the image acquisition unit, the storage unit, and the background music adjusting unit, and the power supply unit is connected to all of these units. The device identifies the user's facial expression and infers the user's mood from it, then adjusts the background-music mode accordingly, achieving interaction between the home environment and the user; this makes the home more humanized and improves the user experience. The system is simple to operate, convenient to use, and highly practical.

Description

Background music control device based on expression
Technical field
The present invention relates to the field of video image processing and control technology, and in particular to a background music control device based on facial expression.
Background technology
At present, to create a comfortable and warm living atmosphere, background music has become a very important part of the modern home, and background music control systems are widely installed; such systems can effectively mask environmental noise and create a relaxed, comfortable environment. However, current household background music control systems can only adjust the volume and select songs manually. They cannot automatically adjust the volume, select songs, or change the sound effect according to a person's mood, and therefore cannot match the user's emotional state.
Summary of the invention
To solve the above problem, the object of the present invention is to provide a background music control device based on facial expression that can automatically adjust the sound effect and select songs according to a person's mood.
A background music control device based on expression, characterized by comprising a power supply unit, an image acquisition unit, a main processor (DSP), a storage unit, a power amplifier, a sound effect processor, an MP3 decoder, and a micro-control unit (MCU). The main processor (DSP) is connected to the image acquisition unit, the storage unit, and the MCU respectively; the MCU is connected in sequence to the MP3 decoder, the sound effect processor, and the power amplifier, and is also connected directly to the sound effect processor. The power supply unit is connected to all of the above units and provides their working power.
The image acquisition unit employs a CMOS image sensor.
The main processor (DSP) and the MCU are connected via an RS-485 bus, with data transmitted at a baud rate of 9600 bps.
The present invention is simple to operate and easy to use, requiring no cumbersome operation. By detecting the user's facial expression, it can automatically adjust the music mode of the household background music, realizing humanized interaction in the background music control system. It is also extensible: multiple background music adjusting units can be controlled over the RS-485 bus. The device is highly practical.
Description of drawings
Fig. 1 is system hardware structure figure of the present invention.
Embodiment
As shown in Fig. 1, a background music control device based on expression according to the present invention is composed of a power supply unit, an image acquisition unit, a main processor (DSP), a storage unit, and a background music adjusting unit. The main processor (DSP) is connected to the image acquisition unit, the storage unit, and the background music adjusting unit respectively; the power supply unit is connected to the image acquisition unit, the main processor (DSP), the storage unit, and the background music adjusting unit.
The main processor (DSP) and the background music adjusting unit are connected via an RS-485 bus.
The main processor (DSP) uses TI's TMS320DM6467T chip.
The image acquisition unit uses a CMOS image sensor.
The storage unit consists of memory devices and chips, including a synchronous DRAM (SDRAM) and a flash chip; the SDRAM stores intermediate image data, and the flash chip stores the program. The background music adjusting unit consists of background-music control components and chips, including a micro-control unit (MCU), an MP3 decoder, a sound effect processor, a power amplifier, and speakers.
The control process is as follows:
The image acquisition unit captures images of the user's face, encodes the collected 24-bit true-color images in a given picture format, and stores the encoded images in the SDRAM. The main processor (DSP) first converts the captured image to grayscale according to the system program, using the weighted-average method:

f(i, j) = 0.30·R(i, j) + 0.59·G(i, j) + 0.11·B(i, j)

where f(i, j) is the gray value of the converted grayscale image at (i, j), and R(i, j), G(i, j), and B(i, j) are the R, G, and B component values of the original image at (i, j).
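The weighted-average graying step can be sketched as follows (a minimal illustration with NumPy; the function and variable names are ours, not from the patent):

```python
import numpy as np

def rgb_to_gray(img):
    """Weighted-average graying per the formula above:
    f(i,j) = 0.30*R(i,j) + 0.59*G(i,j) + 0.11*B(i,j).
    `img` is an H x W x 3 uint8 RGB array."""
    r = img[..., 0].astype(np.float64)
    g = img[..., 1].astype(np.float64)
    b = img[..., 2].astype(np.float64)
    gray = 0.30 * r + 0.59 * g + 0.11 * b
    return gray.astype(np.uint8)  # truncate back to 8-bit gray

# Example: a single pure-red pixel maps to 0.30 * 255 = 76 (truncated)
pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
```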
The integral-image algorithm is used to extract Haar features from the grayscale image, and a cascade classifier applied to these Haar features detects the face region in the image. The cascade classifier is obtained by training several simple classifiers on the face Haar features of sample images with the AdaBoost algorithm and then combining them into a cascade.
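The patent trains its own AdaBoost cascade; as a minimal sketch of the underlying mechanism only, the integral image below lets any rectangle sum, and hence any Haar-like feature, be evaluated from four corner lookups in constant time (function names are ours):

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with a zero row/column prepended, so
    rect_sum needs no boundary checks."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, h, w):
    """Sum of gray values in the h x w rectangle at (top, left),
    computed from 4 lookups of the integral image."""
    return (ii[top + h, left + w] - ii[top, left + w]
            - ii[top + h, left] + ii[top, left])

def haar_two_rect_vertical(ii, top, left, h, w):
    """A simple two-rectangle Haar-like feature: upper half minus
    lower half (h must be even)."""
    half = h // 2
    return (rect_sum(ii, top, left, half, w)
            - rect_sum(ii, top + half, left, half, w))
```

On a patch whose top half is bright and bottom half dark, this feature is large and positive; a cascade thresholds many such features in sequence.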
After the face region is detected, the approximate eye regions are estimated from the proportional structure of the face and framed with search windows. Let the height of the face image be h and its width be u, with the origin at the upper-left corner. In our experiments, the origin coordinates of the two windows are: left eye (formula given as image BDA00001949542300021, not reproduced); right eye (formula given as image BDA00001949542300022, not reproduced). The window size is given as image BDA00001949542300023 (not reproduced).
Then, using the fact that the pupils and eyebrows are the darkest features within the window, a histogram analysis is performed on the window region: the darkest 5% of pixels are kept, and the gray value of the remaining pixels is set to 255. After this thresholding step, the eyes and eyebrows are clearly segmented. Next, a horizontal projection of the window image is computed, with projection function

pv(y) = Σ_{x=1}^{N} I(x, y)

where I(x, y) is the gray value at point (x, y) and N is the number of projected pixels. This yields a one-dimensional curve with two obvious troughs, corresponding to the eyebrow region and the eye region respectively. The vertical coordinate of the eyes is obtained by one-dimensional signal processing. The gray value of the eyebrow region is then set to 255 to remove the eyebrows, and a vertical projection curve is computed with projection function

pv(x) = Σ_{y=1}^{N} I(x, y)

where I(x, y) and N are as above. This determines the horizontal coordinate of the eyes.
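The darkest-5% thresholding and projection-trough search above can be sketched as follows (NumPy, with names of our own choosing; a real eye-window image replaces the synthetic one):

```python
import numpy as np

def darkest_fraction_mask(window, frac=0.05):
    """Keep the darkest `frac` of pixels and set the rest to 255,
    as in the patent's histogram thresholding step."""
    thresh = np.percentile(window, frac * 100)
    out = np.full_like(window, 255)
    out[window <= thresh] = window[window <= thresh]
    return out

def horizontal_projection(window):
    """pv(y) = sum over x of I(x, y): one value per image row."""
    return window.sum(axis=1)

def trough_rows(window, k=2):
    """Rows with the k smallest projection values; dark features
    (eyebrows, eyes) produce troughs in the projection curve."""
    pv = horizontal_projection(window)
    return np.argsort(pv)[:k]
```

A vertical projection (`window.sum(axis=0)`) locates the horizontal eye coordinate the same way once the eyebrow rows are whitened out.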
Rotation correction: once the eye positions are determined, the midpoint of the line connecting the two eyes is computed and taken as the origin of a face-image coordinate system. The face image is rotated so that the eyes are level, straightening the face image.
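A minimal sketch of the rotation-correction geometry (the patent does not name a resampling routine; applying the rotation with, e.g., OpenCV's warpAffine around the midpoint is our assumption):

```python
import math

def eye_angle_degrees(left_eye, right_eye):
    """Angle of the line joining the eye centers, in degrees.
    Rotating the image by -angle about the midpoint levels the eyes."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def eye_midpoint(left_eye, right_eye):
    """Midpoint of the two eye centers: the rotation origin."""
    return ((left_eye[0] + right_eye[0]) / 2.0,
            (left_eye[1] + right_eye[1]) / 2.0)
```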
Scale normalization: after the distance between the two eye centers is obtained, the position of the mouth is determined, giving the vertical distance from the eye midpoint to the mouth. These two distances are used to cut the main part of the face out of the face-image coordinate system and scale-normalize it to a common size. The mouth position is first estimated roughly from facial proportions: a mouth window is taken from the face image, with origin coordinates given as image BDA00001949542300033 (not reproduced) and window size given as image BDA00001949542300034 (not reproduced). A horizontal projection of the window region is computed, and the minimum of the resulting curve gives the vertical coordinate of the mouth. Let the distance between the two eyes be w and the distance from the eye midpoint to the mouth be h. Centered on the eye midpoint, the crop extends left and right by an amount given as image BDA00001949542300035, upward by an amount given as image BDA00001949542300036, and downward by an amount given as image BDA00001949542300037 (none reproduced), cutting out the main part of the face. The cropped face image is scaled to a common size; in this system, 100 × 100 pixels.
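The crop-and-scale step can be sketched as follows; nearest-neighbor sampling stands in for the unspecified scaling method (an assumption), and the crop offsets would come from the eye/mouth geometry above:

```python
import numpy as np

def crop_and_resize(gray, top, left, height, width, out_size=100):
    """Cut the face region and scale it to out_size x out_size with
    nearest-neighbor sampling (stand-in for the unspecified scaler)."""
    face = gray[top:top + height, left:left + width]
    rows = np.arange(out_size) * face.shape[0] // out_size
    cols = np.arange(out_size) * face.shape[1] // out_size
    return face[np.ix_(rows, cols)]
```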
Gray-level normalization: regard the face image as a two-dimensional matrix M[w][h] of size w × h. The mean of the image is

μ = (1 / (w·h)) · Σ_{i=0}^{w−1} Σ_{j=0}^{h−1} M[i][j]

and the variance of the image is

σ² = (1 / (w·h)) · Σ_{i=0}^{w−1} Σ_{j=0}^{h−1} (M[i][j] − μ)²

The face image is then gray-normalized so that its mean and variance become preset values μ₀ and σ₀ (the normalization formula itself appears only as an image in the original and is not reproduced here). By this method, all normalized face images have the same mean and variance.
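Because the normalization formula is given only as an image in the original, the sketch below uses the standard linear mean/variance normalization, which is an assumption consistent with the surrounding text (μ₀ = 128 and σ₀ = 40 are illustrative values, not from the patent):

```python
import numpy as np

def gray_normalize(img, mu0=128.0, sigma0=40.0):
    """Map the image so its mean becomes mu0 and its standard
    deviation sigma0. Assumed formula (not from the patent):
    M'(i,j) = mu0 + (M(i,j) - mu) * sigma0 / sigma."""
    mu = img.mean()
    sigma = img.std()
    return mu0 + (img - mu) * (sigma0 / sigma)
```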
A two-dimensional discrete cosine transform (2D-DCT) is applied to the normalized face image to extract frequency-domain expression features; the low-frequency coefficients in the upper-left corner of the transform form the observation vectors.
A traversal method is used: a sampling window of width P and length L slides over the normalized face-image plane from left to right and top to bottom, producing sampled image blocks. Each image block is transformed with the following 2D-DCT formula:
C(u, v) = a(u) · a(v) · Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) · cos((2x+1)uπ / (2M)) · cos((2y+1)vπ / (2N))

(u = 0, 1, 2, ..., M−1; v = 0, 1, 2, ..., N−1)

where C(u, v) is the result of the 2D-DCT, i.e. the 2D-DCT coefficient, and a(u) and a(v) are defined as:

a(u) = √(1/M) for u = 0; a(u) = √(2/M) for u = 1, 2, ..., M−1
a(v) = √(1/N) for v = 0; a(v) = √(2/N) for v = 1, 2, ..., N−1
A sampling window of P × L = 16 × 16 with a sliding step of 4 × 4 is used, and M = N = 8, i.e. an 8 × 8 2D-DCT is applied; the 4 × 4 low-frequency block of the transform coefficients is taken as the observation vector for the embedded hidden Markov model (EHMM).
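The 8 × 8 2D-DCT and the 4 × 4 low-frequency observation vector can be sketched directly from the formula above (a naive O((MN)²) implementation, adequate for 8 × 8 blocks; names are ours):

```python
import numpy as np

def dct2(block):
    """Direct 2D-DCT per the formula above. Naive double loop:
    fine for 8 x 8 blocks, not meant for large images."""
    M, N = block.shape
    x = np.arange(M)
    y = np.arange(N)
    a_u = np.where(np.arange(M) == 0, np.sqrt(1.0 / M), np.sqrt(2.0 / M))
    a_v = np.where(np.arange(N) == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
    C = np.zeros((M, N))
    for u in range(M):
        for v in range(N):
            cu = np.cos((2 * x + 1) * u * np.pi / (2 * M))[:, None]
            cv = np.cos((2 * y + 1) * v * np.pi / (2 * N))[None, :]
            C[u, v] = a_u[u] * a_v[v] * (block * cu * cv).sum()
    return C

def observation_vector(block):
    """4 x 4 low-frequency corner of the 8 x 8 DCT, flattened:
    the EHMM observation vector described in the text."""
    return dct2(block)[:4, :4].ravel()
```

For a constant block, all energy lands in the DC coefficient C(0, 0), which is why the upper-left corner captures the coarse (low-frequency) appearance of each patch.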
After the observation vectors are extracted with the 2D-DCT, the likelihood of producing the observation sequence is computed under the embedded hidden Markov model (EHMM) of each of the three expressions, and the model with the highest probability is selected; this identifies the expression of the face in the captured image. After obtaining the expression recognition result, the main processor (DSP) sends the result signal over the RS-485 bus at 9600 bps to the micro-control unit (MCU) of the background music adjusting unit. According to the received result signal, the MCU calls the corresponding program module and adjusts the MP3 decoder and sound effect processor to obtain the desired music effect.
The system builds an embedded hidden Markov model (EHMM) for each of three expressions: "happy", "normal", and "sad". "Happy" covers the basic expressions of surprise and happiness; "sad" covers the basic expressions of anger and sadness; "normal" refers to the remaining expressions outside those definitions among the six basic expressions.
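The grouping of the six basic expressions into the three model classes can be written as a small lookup table; assigning fear and disgust to "normal" is our reading, since the text only says "normal" covers expressions outside "happy" and "sad":

```python
# Grouping of the six basic expressions into the three EHMM classes,
# as described in the text (label strings are our own).
EXPRESSION_CLASS = {
    "surprise": "happy",
    "happiness": "happy",
    "anger": "sad",
    "sadness": "sad",
    "fear": "normal",     # remaining basic expressions -> "normal"
    "disgust": "normal",  # (our reading of the text)
}

def classify_basic_expression(basic):
    """Map a basic-expression label to the model class it belongs to;
    anything unrecognized falls back to "normal"."""
    return EXPRESSION_CLASS.get(basic, "normal")
```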
The steps for training the embedded hidden Markov models of the three expressions "happy", "normal", and "sad" are as follows:
EHMM training for "happy": select sample images of the "happy" expression, where "happy" refers to the surprised and happy expressions among the six basic expressions. The number of super-states is 5, corresponding to the forehead, eyes, nose, mouth, and chin; the numbers of embedded states within the super-states are set to {3, 5, 3, 5, 3}. Train the EHMM of the "happy" expression.
EHMM training for "sad": select sample images of the "sad" expression, where "sad" refers to the angry and sad expressions among the six basic expressions. The number of super-states is 5, corresponding to the forehead, eyes, nose, mouth, and chin; the numbers of embedded states within the super-states are set to {3, 5, 3, 5, 3}. Train the EHMM of the "sad" expression.
EHMM training for "normal": select sample images of the "normal" expression, where "normal" refers to expressions not included in the "happy" and "sad" definitions. The number of super-states is 5, corresponding to the forehead, eyes, nose, mouth, and chin; the numbers of embedded states within the super-states are set to {3, 5, 3, 5, 3}. Train the EHMM of the "normal" expression.
The sample images of the three expressions are selected from the Cohn-Kanade expression database of Carnegie Mellon University.
When the recognized facial expression is "happy", the MCU calls the happy program module, which enhances the music mode; when the recognized expression is "normal", the MCU calls the gentle program module, which sets the music mode to intermediate values; when the recognized expression is "sad", the MCU calls the subdued program module, which softens the music mode.
The happy program module raises the background-music volume by 5 dB, boosts the super bass by 3 dB, raises the treble by 3 dB, and turns on surround sound.
The gentle program module reduces the background-music volume to 45 dB, the super bass to 30 dB, and the treble to 25 dB, and turns on surround sound.
The subdued program module reduces the background-music volume by 5 dB, turns off the super bass, reduces the treble to 20 dB, and turns off surround sound.
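The three program modules can be summarized as a lookup table (a host-side sketch; on the real device this logic runs on the MCU, and the relative/absolute encoding of the dB values is our own):

```python
# Sound-effect settings of the three program modules, transcribed from
# the text. "+5"/"-5" are relative dB changes, "=45" etc. are absolute
# dB targets, "off" disables the band (encoding is our own).
PROGRAM_MODULES = {
    "happy":  {"volume": "+5",  "super_bass": "+3",  "treble": "+3",  "surround": True},
    "normal": {"volume": "=45", "super_bass": "=30", "treble": "=25", "surround": True},
    "sad":    {"volume": "-5",  "super_bass": "off", "treble": "=20", "surround": False},
}

def module_for(expression):
    """Select the program module the MCU would call for a
    recognized expression."""
    return PROGRAM_MODULES[expression]
```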
In the present invention, enhancing the music mode includes but is not limited to: playing a happy-type playlist, raising the volume, boosting the super bass, raising the treble, and turning on surround sound. Setting the music mode to intermediate values includes but is not limited to: playing a gentle-type playlist, reducing the volume, super bass, and treble to suitable intermediate values, and turning on surround sound. Softening the music mode includes but is not limited to: playing a sad-type playlist, reducing the volume and treble to lower values, and turning off surround sound.

Claims (3)

1. A background music control device based on expression, characterized by comprising a power supply unit, an image acquisition unit, a main processor (DSP), a storage unit, a power amplifier, a sound effect processor, an MP3 decoder, and a micro-control unit (MCU); the main processor (DSP) is connected to the image acquisition unit, the storage unit, and the MCU respectively; the MCU is connected in sequence to the MP3 decoder, the sound effect processor, and the power amplifier, and is also connected directly to the sound effect processor; the power supply unit is connected to all of the above units and provides their working power.
2. The background music control device based on expression according to claim 1, characterized in that the image acquisition unit employs a CMOS image sensor.
3. The background music control device based on expression according to claim 1, characterized in that the main processor (DSP) and the MCU are connected via an RS-485 bus, with data transmitted at a baud rate of 9600 bps.
CN 201220371686 2012-07-30 2012-07-30 Background music control device based on expression Expired - Fee Related CN202855297U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201220371686 CN202855297U (en) 2012-07-30 2012-07-30 Background music control device based on expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201220371686 CN202855297U (en) 2012-07-30 2012-07-30 Background music control device based on expression

Publications (1)

Publication Number Publication Date
CN202855297U true CN202855297U (en) 2013-04-03

Family

ID=47986468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201220371686 Expired - Fee Related CN202855297U (en) 2012-07-30 2012-07-30 Background music control device based on expression

Country Status (1)

Country Link
CN (1) CN202855297U (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750964A (en) * 2012-07-30 2012-10-24 西北工业大学 Method and device used for controlling background music and based on facial expression
CN102750964B (en) * 2012-07-30 2014-10-29 西北工业大学 Method and device used for controlling background music based on facial expression
CN104864354A (en) * 2015-06-08 2015-08-26 浙江农林大学 LED mood passing lamp and method
CN104864354B (en) * 2015-06-08 2017-05-10 浙江农林大学 LED mood passing lamp and method
CN108242238A (en) * 2018-01-11 2018-07-03 广东小天才科技有限公司 A kind of audio file generation method and device, terminal device
CN108242238B (en) * 2018-01-11 2019-12-31 广东小天才科技有限公司 Audio file generation method and device and terminal equipment

Similar Documents

Publication Publication Date Title
CN102750964B (en) Method and device used for controlling background music based on facial expression
CN105976809B (en) Identification method and system based on speech and facial expression bimodal emotion fusion
Zeng et al. Image retrieval using spatiograms of colors quantized by gaussian mixture models
WO2022033150A1 (en) Image recognition method, apparatus, electronic device, and storage medium
CN105469065B (en) A kind of discrete emotion identification method based on recurrent neural network
CN109472198B (en) Gesture robust video smiling face recognition method
CN103810490B (en) A kind of method and apparatus for the attribute for determining facial image
CN102332095B (en) Face motion tracking method, face motion tracking system and method for enhancing reality
US8421885B2 (en) Image processing system, image processing method, and computer readable medium
CN104680121B (en) Method and device for processing face image
CN103020992B (en) A kind of video image conspicuousness detection method based on motion color-associations
CN108198130B (en) Image processing method, image processing device, storage medium and electronic equipment
CN101615245A (en) Expression recognition method based on AVR and enhancing LBP
CN104794693B (en) A kind of portrait optimization method of face key area automatic detection masking-out
CN107911643B (en) Method and device for showing scene special effect in video communication
Türkan et al. Human eye localization using edge projections.
CN110175526A (en) Dog Emotion identification model training method, device, computer equipment and storage medium
CN109389076B (en) Image segmentation method and device
CN104636097A (en) Font size adaptive adjustment method and mobile terminal based on eyes
CN105956570B (en) Smiling face's recognition methods based on lip feature and deep learning
WO2021203880A1 (en) Speech enhancement method, neural network training method, and related device
CN103077506A (en) Local and non-local combined self-adaption image denoising method
CN104008364A (en) Face recognition method
CN104881852B (en) Image partition method based on immune clone and fuzzy kernel clustering
CN111341350A (en) Man-machine interaction control method and system, intelligent robot and storage medium

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130403

Termination date: 20140730

EXPY Termination of patent right or utility model