CN110197677A - Playback control method and apparatus, and playback device - Google Patents

Playback control method and apparatus, and playback device

Info

Publication number
CN110197677A
CN110197677A (application CN201910407403.0A)
Authority
CN
China
Prior art keywords
emotional state
human body
feature templates
information
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910407403.0A
Other languages
Chinese (zh)
Inventor
付家源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201910407403.0A
Publication of CN110197677A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 Facial expression recognition
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 19/00 Driving, starting, stopping record carriers not specifically of filamentary or web form, or of supports therefor; Control thereof; Control of operating function; Driving both disc and head
    • G11B 19/02 Control of operating function, e.g. switching from recording to reproducing
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • G11B 20/10527 Audio or video recording; Data buffering arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The disclosure provides a playback control method, an apparatus, and a playback device. The method includes: collecting human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression; identifying the emotional state characterized by the human body feature information; and, if the emotional state characterized by the human body feature information is a specified emotional state, playing audio data or video data corresponding to the specified emotional state. The embodiments allow a playback device to adjust the audio or video it plays according to a person's real-time emotion or behavior, thereby soothing the person's mood, improving the intelligence of the playback device, and simplifying user operation.

Description

Playback control method and apparatus, and playback device
Technical field
The present disclosure relates to the field of data processing, and in particular to a playback control method, an apparatus, and a playback device.
Background art
Most existing infant toys guide children to interact through sound, light, vibration, and similar means, which can soothe an infant to some extent. However, these toys require the infant or a caregiver to press a key to manually trigger the relevant function; achieving the interactive effect is therefore cumbersome, and the soothing effect on the infant is poor.
Summary of the invention
To overcome the problems in the related art, the present disclosure provides a playback control method, an apparatus, and a playback device.
According to a first aspect of the embodiments of the present disclosure, a playback control method is provided. The method includes:
collecting human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression;
identifying the emotional state characterized by the human body feature information;
if the emotional state characterized by the human body feature information is a specified emotional state, playing audio data or video data corresponding to the specified emotional state.
Optionally, identifying the emotional state characterized by the human body feature information includes:
matching the human body feature information against a plurality of preset feature templates, where each feature template is either a default template or a template recorded in advance by the user, and each feature template has a corresponding emotional state;
taking the emotional state corresponding to the matched feature template as the emotional state characterized by the human body feature information.
Optionally, when the feature templates are templates recorded in advance, before identifying the emotional state characterized by the human body feature information, the method further includes:
receiving a feature-template setting instruction sent by an external device, the instruction carrying reference feature information input by the user through the external device and an emotional state identifier characterized by the reference feature information;
generating, according to the feature-template setting instruction, a feature template characterizing the emotional state identifier, and storing the feature template.
Optionally, when the human body feature information is the voice signal, the feature templates include a crying template and a laughter template; when the matched feature template is the crying template, the corresponding emotional state is a crying emotion, and when the matched feature template is the laughter template, the corresponding emotional state is a happy emotion.
Optionally, after playing the audio data or video data corresponding to the emotional state, the method further includes:
when the volume of the voice signal is detected to weaken, lowering the volume of the playing audio data or video data, or muting it.
Optionally, when the human body feature information is the voice signal, identifying the emotional state characterized by the human body feature information includes:
inputting the voice signal into a trained deep learning network, extracting audio features from the voice signal through the deep learning network, predicting from the audio features the probability that the voice signal corresponds to each of several emotional states, and outputting the emotional state with the largest probability as the emotional state characterized by the voice signal.
Optionally, when the human body feature information is the image, identifying the emotional state characterized by the human body feature information includes:
inputting the image into a trained deep learning network, extracting facial expression features from the image through the deep learning network, predicting from the facial expression features the probability that the image corresponds to each of several emotional states, and outputting the emotional state with the largest probability as the emotional state characterized by the image.
Optionally, identifying the emotional state characterized by the human body feature information includes:
when the emotional state determined from the voice signal is consistent with the emotional state determined from the image, determining that the emotional state characterized by the human body feature information is that consistent emotional state.
The present disclosure further provides a playback control apparatus. The apparatus includes:
a feature-information collecting module configured to collect human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression;
an emotion recognition module configured to identify the emotional state characterized by the human body feature information;
a playing module configured to play audio data or video data corresponding to a specified emotional state when the emotional state characterized by the human body feature information is the specified emotional state.
Optionally, the emotion recognition module includes:
a feature-template matching submodule configured to match the human body feature information against a plurality of preset feature templates, where each feature template is either a default template or a template recorded in advance by the user and each feature template has a corresponding emotional state, and to take the emotional state corresponding to the matched feature template as the emotional state characterized by the human body feature information.
Optionally, when the feature templates are templates recorded in advance, the apparatus further includes:
a feature-template setting instruction receiving module configured to receive a feature-template setting instruction sent by an external device, the instruction carrying reference feature information input by the user through the external device and an emotional state identifier characterized by the reference feature information;
a feature-template generating module configured to generate, according to the feature-template setting instruction, a feature template characterizing the emotional state identifier, and to store the feature template.
Optionally, when the human body feature information is the voice signal, the feature templates include a crying template and a laughter template; when the matched feature template is the crying template, the corresponding emotional state is a crying emotion, and when the matched feature template is the laughter template, the corresponding emotional state is a happy emotion.
Optionally, the apparatus further includes:
a volume adjusting module configured to lower the volume of the playing audio data or video data, or mute it, when the volume of the voice signal is detected to weaken.
Optionally, when the human body feature information is the voice signal, the emotion recognition module is specifically configured to:
input the voice signal into a trained deep learning network, extract audio features from the voice signal through the deep learning network, predict from the audio features the probability that the voice signal corresponds to each of several emotional states, and output the emotional state with the largest probability as the emotional state characterized by the voice signal.
Optionally, when the human body feature information is the image, the emotion recognition module is specifically configured to:
input the image into a trained deep learning network, extract facial expression features from the image through the deep learning network, predict from the facial expression features the probability that the image corresponds to each of several emotional states, and output the emotional state with the largest probability as the emotional state characterized by the image.
Optionally, the emotion recognition module is specifically configured to:
when the emotional state determined from the voice signal is consistent with the emotional state determined from the image, determine that the emotional state characterized by the human body feature information is that consistent emotional state.
The present disclosure further provides a playback device, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
collect human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression;
identify the emotional state characterized by the human body feature information;
if the emotional state characterized by the human body feature information is a specified emotional state, play audio data or video data corresponding to the specified emotional state.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
The embodiments of the present disclosure collect human body feature information of a target person and identify the emotional state characterized by that information; if the emotional state is a specified emotional state, audio data or video data corresponding to the specified emotional state is played. A playback device can thus adjust the audio or video it plays according to a person's real-time emotion or behavior, soothing the person's mood, improving the intelligence of the playback device, and simplifying user operation.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of the steps of a playback control method embodiment according to an exemplary embodiment of the present disclosure;
Fig. 2 is a schematic diagram of an applicable scenario according to an exemplary embodiment of the present disclosure;
Fig. 3 is a flowchart of the steps of another playback control method embodiment according to an exemplary embodiment of the present disclosure;
Fig. 4 is a flowchart of the steps of another playback control method embodiment according to an exemplary embodiment of the present disclosure;
Fig. 5 is a flowchart of the steps of another playback control method embodiment according to an exemplary embodiment of the present disclosure;
Fig. 6 is a block diagram of a playback control apparatus embodiment according to an exemplary embodiment of the present disclosure;
Fig. 7 is a block diagram of a playback device according to an exemplary embodiment of the present disclosure.
Detailed description of the embodiments
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the disclosure. The singular forms "a", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, third, and so on may be used in the present disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Referring to Fig. 1, a flowchart of the steps of a playback control method embodiment according to an exemplary embodiment of the present disclosure, the method may specifically include the following steps:
Step 101: collect human body feature information of a target person.
This embodiment can be applied to the scenario of soothing an infant's mood. In such a scenario, the target person may illustratively be an infant.
In one example, this embodiment can be applied in a playback device, and the steps of this embodiment can be executed after the playback device is powered on. In other examples, the playback device may also access a network to communicate with an external device. Of course, the playback device may also communicate with the external device by means such as Bluetooth.
The playback device may be any device that can play audio or video in order to soothe an infant's mood. For example, the playback device may include an infant toy capable of producing sound, a speaker, and the like, and may also include a device with a shooting function and/or a display function.
As an example, the human body feature information may include a voice signal and/or an image containing a facial expression. In one embodiment, the voice signal of the target person can be collected through a sound sensor of the playback device. When a device with a shooting function is installed in the playback device, the facial expression of the target person can be captured by that device to obtain an image containing the facial expression. Illustratively, the device with a shooting function may be a camera.
Step 102: identify the emotional state characterized by the human body feature information.
In this step, after the human body feature information is collected, the emotional state it characterizes can be further identified.
As an example, the specified emotional state may include, but is not limited to, a crying emotion, a happy emotion, a falling-asleep state, and the like.
In one possible implementation of this embodiment, step 102 may include the following sub-steps:
Sub-step S11: match the human body feature information against a plurality of preset feature templates.
In one example, a feature template may be a default template or a template recorded in advance by the user, and each feature template has a corresponding emotional state. For example, the device may store a plurality of default feature templates in advance, such as a laughter template and a crying template.
When the feature templates are recorded in advance, in one possible implementation the user can record them directly on the playback device through its setting function. For example, to record a crying template, the user triggers the crying-template recording button of the playback device and records crying, or a crying expression, as the crying template; to record a laughter template, the user triggers the laughter-template recording button and records laughter, or a laughing expression, as the laughter template. It should be noted that the application can record the crying or laughter template with the target person as the prototype, which can improve the accuracy of subsequent recognition.
In another possible implementation, when the feature templates are recorded in advance, this embodiment may further include the following steps before step 101:
receiving a feature-template setting instruction sent by an external device, the instruction carrying reference feature information input by the user through the external device and an emotional state identifier characterized by that reference feature information; and generating, according to the feature-template setting instruction, a feature template characterizing the emotional state identifier, and storing the feature template.
In this case, the user records the relevant information of a feature template through an external device, which sends it to the playback device. In one implementation, a control program for managing the playback device can be installed on the external device; after the playback device is connected to the network, it can be added in the control program, and the feature template can then be configured through the setting function of the control program. For example, once the playback device has been added, the user can use the template input function of the control program to select the emotional state identifier to be set and record reference feature information for that identifier, such as laughter or crying. After the recording is complete, the external device can generate a feature-template setting instruction from the recorded reference feature information and the emotional state identifier it characterizes, and send the instruction to the playback device over the network, which makes setup easy.
After receiving the feature-template setting instruction, the playback device parses it to obtain the corresponding reference feature information and emotional state identifier, and generates from them the feature template corresponding to that emotional state identifier.
After the feature template is generated, the playback device can store it locally. Alternatively, the playback device can also synchronize the feature template to an external storage device.
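As a rough sketch of this flow, the handler below parses such an instruction and stores the resulting template. The patent does not define a wire format for the instruction, so the JSON encoding and the field names (emotion_label, reference_feature, expires_at) are assumptions for illustration only.

    import json

    TEMPLATE_STORE = {}  # emotional state identifier -> stored feature template

    def handle_template_instruction(raw_message: bytes) -> None:
        # Parse a feature-template setting instruction received from the external
        # device and store the generated template (assumed JSON encoding).
        msg = json.loads(raw_message)
        emotion_label = msg["emotion_label"]   # e.g. "crying" or "laughter"
        reference = msg["reference_feature"]   # recorded crying/laughter or expression
        expires_at = msg.get("expires_at")     # optional validity period (see note below)
        TEMPLATE_STORE[emotion_label] = {"feature": reference, "expires_at": expires_at}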
It should be noted that, in other embodiments, the relevant information of a user-set feature template can also include the template's validity period. When the validity period expires, the external device may remind the user to re-record the information, and the playback device side can delete the feature template.
In one example, there may be multiple feature templates. After the feature templates are obtained, the human body feature information can be matched against each of them; in one implementation, this can be done by pattern matching.
Sub-step S12: take the emotional state corresponding to the matched feature template as the emotional state characterized by the human body feature information.
From the matching result, the feature template matched by the human body feature information is obtained, and the emotional state corresponding to that template is taken as the emotional state of the target person. For example, if the feature template matched by the voice signal is the crying template, it can be determined that the emotional state corresponding to the voice signal is a crying emotion.
In another possible implementation of this embodiment, step 102 may include the following step:
inputting the human body feature information into a trained deep learning network, predicting through the deep learning network the probability that the human body feature information corresponds to each of several emotional states, and outputting the emotional state with the largest probability as the emotional state characterized by the human body feature information.
In this implementation, the deep learning network can be trained in advance. During training, feature information of different emotional states can be used as the training set; for example, a variety of crying sounds (or crying expressions) and laughter (or laughing expressions) are used as the training set, each sample labeled as crying (or a crying expression) or laughter (or a laughing expression), and the network learns from the training set according to these labels. In the prediction stage, the collected human body feature information is input into the deep learning network, which, through convolution, pooling, and other operations, predicts the probability that the human body feature information corresponds to each emotional state, and the emotional state with the largest probability is output as the emotional state characterized by the human body feature information.
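A minimal inference sketch of this prediction stage follows, assuming a PyTorch classifier already trained on labeled crying and laughter samples; the network architecture and the two-state label set are assumptions, as the patent does not fix them.

    import torch

    EMOTIONS = ["crying", "happy"]  # assumed label set

    def predict_emotion(model: torch.nn.Module, features: torch.Tensor) -> str:
        # features: extracted human body feature information (audio or image features)
        model.eval()
        with torch.no_grad():
            logits = model(features.unsqueeze(0))   # add a batch dimension
            probs = torch.softmax(logits, dim=-1)   # probability per emotional state
        return EMOTIONS[int(probs.argmax())]        # state with the largest probability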
It should be understood that this embodiment is not limited to the above ways of identifying the emotional state characterized by the human body feature information; it is possible for those skilled in the art to use other ways.
Step 103: if the emotional state characterized by the human body feature information is a specified emotional state, play audio data or video data corresponding to the specified emotional state.
In this step, when the emotional state characterized by the currently collected human body feature information is identified as a specified emotional state, the audio data or video data corresponding to that state can be played; for example, when the current emotional state is identified as a crying emotion, audio or video for soothing crying is played.
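The playback decision itself can be as simple as a lookup from the specified emotional states to media items, as in the sketch below; the file names and the player interface are illustrative assumptions.

    PLAYLIST = {
        "crying": "soothing_lullaby.mp3",       # soothe the crying emotion
        "happy": "cheerful_nursery_rhyme.mp3",  # accompany the happy emotion
    }

    def on_emotion_detected(state, player):
        media = PLAYLIST.get(state)
        if media is not None:   # only specified emotional states trigger playback
            player.play(media)  # 'player' is an assumed playback interface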
For example, in the applicable scenario shown in Fig. 2, a user sets the relevant information of feature templates through a mobile phone P (such as recording the target person's crying, laughter, or images of the person crying or laughing) and sends this information to a playback device T. The playback device T generates the corresponding feature templates from the received information, such as a crying template, a laughter template, a crying-emotion template, or a happy-emotion template. In addition, the playback device T can collect the voice signal of a target infant C or an image of the infant's facial expression and identify the emotional state of the infant C from the voice signal and/or the image; if the identified emotional state is a crying emotion, audio for soothing crying is played, and if the identified emotional state is a happy emotion, audio related to the happy emotion is played.
The embodiments of the present disclosure collect human body feature information of a target person and identify the emotional state characterized by that information; if the emotional state is a specified emotional state, audio data or video data corresponding to the specified emotional state is played. A playback device can thus adjust the audio or video it plays according to a person's real-time emotion or behavior, soothing the person's mood, improving the intelligence of the playback device, and simplifying user operation.
To enable those skilled in the art to better understand the embodiments of the present disclosure, two application scenarios in the context of soothing an infant's mood are described below as examples. It should be understood that these scenario examples serve only to explain the embodiments of the present disclosure and cannot be used to limit the disclosure; all embodiments that conform to the idea of the present disclosure fall within its scope of protection.
Referring to Fig. 3, a flowchart of the steps of another playback control method embodiment according to an exemplary embodiment of the present disclosure. In this example, the playback device may be a device without a camera or display screen, and playback control is achieved by collecting the infant's sound. The method may specifically include the following steps:
Step 301: collect the voice signal of the target person.
In one scenario, this embodiment can be applied in a playback device, the playback device may be a toy device capable of producing sound, and the target person may be an infant.
In implementation, the voice signal of the target person can be collected through a sound sensor of the playback device.
Step 302: identify the emotional state characterized by the voice signal.
In this step, the collected voice signal can be recognized to determine the emotional state it characterizes, where the emotional state may include a happy emotion or a crying emotion: under the happy emotion the voice signal is laughter, and under the crying emotion the voice signal is crying.
In one possible implementation, step 302 may include the following sub-steps:
matching the voice signal against a preset crying template or laughter template; if the voice signal matches the crying template, determining that the emotional state characterized by the voice signal is a crying emotion; if the voice signal matches the laughter template, determining that the emotional state characterized by the voice signal is a happy emotion.
In one embodiment, the crying template or laughter template can be a default feature template, for example a crying or laughter template preconfigured when the playback device leaves the factory.
In another embodiment, the crying template or laughter template can also be generated from crying or laughter of the infant recorded in advance by the user. In that case, before step 302 this embodiment may further include the following steps:
receiving a feature-template setting instruction sent by an external device, the instruction carrying reference crying or reference laughter input by the user through the external device; and generating a crying template or a laughter template according to the feature-template setting instruction, and storing the crying template or laughter template.
For example, the external device may be the user's mobile phone, on which a control program for managing the playback device can be installed; after the playback device is connected to the network, it can be added in the control program, and the user then sets the relevant information of the crying or laughter template through the setting function of the control program. For instance, the user can tap the crying-template button in the control program, record the infant's crying, and send to the playback device a feature-template setting instruction generated from the crying and a crying-template identifier; the playback device recognizes from the identifier that the sound is crying and stores it as the crying template. Likewise, the user can tap the laughter-template button in the control program, record the infant's laughter, and send to the playback device a feature-template setting instruction generated from the laughter and a laughter-template identifier; the playback device recognizes from the identifier that the sound is laughter and stores it as the laughter template.
In one embodiment, the voice signal can be matched against the preset crying template and laughter template as follows: feature extraction is performed on the crying template and the laughter template respectively to obtain a corresponding crying feature vector sequence and laughter feature vector sequence; feature extraction is performed on the voice signal to obtain a corresponding speech feature vector sequence; the speech feature vector sequence is then matched frame by frame against the crying feature vector sequence and the laughter feature vector sequence, and for each frame the class with the greatest similarity to that frame's speech feature vector is taken as the recognition result for the frame. The numbers of frames recognized as the crying feature vector sequence and as the laughter feature vector sequence are counted respectively; if the number of frames recognized as crying exceeds the number recognized as laughter, it can be determined that the current voice signal matches the crying template, and if the number of frames recognized as crying is smaller than the number recognized as laughter, it can be determined that the current voice signal matches the laughter template.
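The sketch below illustrates this frame-by-frame vote, assuming 12-dimensional MFCC sequences (see the feature extraction described next) and cosine similarity as the per-frame measure; the patent does not name a specific similarity metric.

    import numpy as np

    def match_templates(speech, crying_tpl, laugh_tpl):
        # speech, crying_tpl, laugh_tpl: arrays of shape (num_frames, 12) of MFCCs.
        def best_similarity(frame, template):
            # Highest cosine similarity between one speech frame and any template frame.
            sims = template @ frame / (
                np.linalg.norm(template, axis=1) * np.linalg.norm(frame) + 1e-9)
            return sims.max()

        crying_votes = sum(
            best_similarity(f, crying_tpl) > best_similarity(f, laugh_tpl)
            for f in speech)
        return "crying" if crying_votes > len(speech) - crying_votes else "laughter"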
Illustratively, the above feature extraction can proceed as follows: the voice signal, crying template, or laughter template is pre-emphasized, framed, and windowed to obtain the time-domain signal x(n) of each speech frame; the time-domain signal is passed through a fast Fourier transform (FFT) to obtain the linear spectrum X(k); and the linear spectrum X(k) is passed through a Mel filter bank to obtain the Mel frequencies. The number of filters in the filter bank is typically between 24 and 40; this example takes M = 25. The logarithmic energy S(m) of the Mel filters is then computed, and applying a discrete cosine transform (DCT) to it yields the Mel-frequency cepstral coefficients C(n). In this embodiment, the feature vector is the 12-dimensional Mel-frequency cepstral coefficient (MFCC) vector.
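A compact NumPy sketch of this extraction pipeline follows. The sample rate, frame length, and hop size are assumed values, and the Mel filter bank is taken from librosa rather than built by hand.

    import numpy as np
    import librosa
    from scipy.fftpack import dct

    def mfcc(signal, sr=16000, frame_len=400, hop=160, n_mels=25, n_ceps=12):
        emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
        frames = np.lib.stride_tricks.sliding_window_view(emphasized, frame_len)[::hop]
        frames = frames * np.hamming(frame_len)                             # windowing
        power = np.abs(np.fft.rfft(frames, axis=1)) ** 2                    # linear spectrum X(k)
        mel_fb = librosa.filters.mel(sr=sr, n_fft=frame_len, n_mels=n_mels) # M = 25 filters
        log_energy = np.log(power @ mel_fb.T + 1e-10)                       # log energy S(m)
        return dct(log_energy, type=2, axis=1, norm="ortho")[:, 1:n_ceps + 1]  # 12-dim C(n)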
In another possible implementation of this embodiment, step 302 may include the following sub-step:
inputting the voice signal into a trained deep learning network, extracting audio features from the voice signal through the deep learning network, predicting from the audio features the probability that the voice signal corresponds to each of several emotional states, and outputting the emotional state with the largest probability as the emotional state characterized by the voice signal.
As an example, the audio features may include features such as sound amplitude.
Suppose the different emotional states include a happy emotion and a crying emotion. If the probability that the voice signal belongs to the crying emotion is larger, the voice signal is crying and the corresponding emotional state is the crying emotion; if the probability that the voice signal belongs to the happy emotion is larger, the voice signal is laughter and the corresponding emotional state is the happy emotion.
Step 303: if the emotional state characterized by the voice signal is a crying emotion, play audio data corresponding to the crying emotion; if the emotional state characterized by the voice signal is a happy emotion, play audio data corresponding to the happy emotion.
As an example, the audio data corresponding to the crying emotion may include, but is not limited to, one or a combination of the following: the sound of amniotic fluid, gentle speech recorded in advance by a parent, soothing nursery rhymes, or other music. The audio data corresponding to the happy emotion may include, but is not limited to, one or a combination of the following: cheerful nursery rhymes or other music, speech recorded in advance by a parent, and the like.
In one possible implementation, the method can further include the following step:
when the volume of the voice signal is detected to weaken, lowering the volume of the playing audio data, or muting it.
In one example, when the playback device detects that the infant's laughter or crying gradually weakens, for example when the infant is falling asleep, the volume of the playing audio data can be lowered or muted, such as by gradually reducing the volume until mute (the duration of the fade from the current volume to mute can be set by the user). When a crying emotion or a happy emotion of the infant is detected again, steps 302 and 303 are executed again.
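The fade-out can be a simple timed ramp, as sketched below; the player interface and the default duration are assumptions, with the duration meant to be user-configurable as described above.

    import time

    def fade_to_mute(player, duration_s=10.0, steps=20):
        # Gradually lower the playing volume to mute over duration_s seconds.
        start = player.get_volume()            # 'player' is an assumed interface
        for i in range(1, steps + 1):
            player.set_volume(start * (1 - i / steps))  # linear ramp toward 0
            time.sleep(duration_s / steps)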
In this embodiment, the playback device collects the infant's voice signal, identifies on its own whether the emotional state characterized by the voice signal is a crying emotion or a happy emotion, and plays audio data corresponding to the crying or happy emotion according to the recognition result. Suitable soothing audio is thus chosen according to the infant's real-time emotional state, helping the infant adjust its mood in time while also assisting the parent in caring for the infant.
Referring to Fig. 4, a flowchart of the steps of another playback control method embodiment according to an exemplary embodiment of the present disclosure. In this example, the playback device may be a device with a shooting device (such as a camera) and a display screen, and playback control is achieved by capturing the infant's facial expression. The method may specifically include the following steps:
Step 401: collect an image containing the facial expression of the target person.
In one scenario, the playback device may be a playback device with a camera, and the target person may be an infant.
In implementation, the image of the target person can be captured through the camera of the playback device.
Step 402: identify the emotional state characterized by the facial expression in the image.
In this step, emotion recognition can be performed on the captured image to identify whether the emotional state characterized by the facial expression in the image is a crying emotion or a happy emotion.
In one possible implementation of this embodiment, step 402 may include the following sub-step:
inputting the image into a trained deep learning network, extracting facial expression features from the image through the deep learning network, predicting from the facial expression features the probabilities that the image corresponds to the different emotional states, and outputting the emotional state with the largest probability as the emotional state characterized by the image.
For example, the probabilities that the emotional state characterized by the facial expression belongs to the crying emotion and to the happy emotion are predicted respectively; if the probability corresponding to the crying emotion is larger, the facial expression is determined to characterize a crying emotion, and if the probability corresponding to the happy emotion is larger, the facial expression is determined to characterize a happy emotion.
Step 403: if the emotional state is a crying emotion, play audio data or video data corresponding to the crying emotion; if it is a happy emotion, play audio data or video data corresponding to the happy emotion.
As an example, the audio data or video data corresponding to the crying emotion may include, but is not limited to, one or a combination of the following: the sound of amniotic fluid, audio or video of gentle speech recorded in advance by a parent, soothing nursery rhymes or other animated videos, and the like. The audio data or video data corresponding to the happy emotion may include, but is not limited to, one or a combination of the following: cheerful nursery rhymes or other animations, speech recorded in advance by a parent, and the like.
In one possible implementation, the method can further include the following step:
when it is detected from the image that the target person is falling asleep, lowering the volume of the playing audio data or video data, or muting it.
In one example, when a neural network detects that the target person is falling asleep, the volume of the playing audio data or video data can be lowered or muted, such as by gradually reducing the volume until mute (the duration of the fade from the current volume to mute can be set by the user). When a crying emotion or a happy emotion of the target person is detected again, steps 402 and 403 are executed again.
In this embodiment, the playback device captures an image of the infant's facial expression, identifies on its own whether the emotion characterized by the facial expression is a crying emotion or a happy emotion, and plays audio data or video data corresponding to the crying or happy emotion according to the recognition result. Suitable soothing audio is thus chosen according to the infant's real-time emotional state, helping the infant adjust its mood in time while also assisting the parent in caring for the infant.
Referring to Fig. 5, a flowchart of the steps of another playback control method embodiment according to an exemplary embodiment of the present disclosure. In this example, the playback device may be a device with a camera and a display screen, and playback control is achieved by combining the infant's facial expression and voice signal. The method may specifically include the following steps:
Step 501: collect the voice signal of the target person and an image containing the facial expression.
Step 502: identify whether the facial expression in the image characterizes a crying emotion or a happy emotion, and identify whether the voice signal is crying or laughter.
Step 503: if the facial expression characterizes a crying emotion and the voice signal is crying, play audio data or video data corresponding to the crying emotion; if the facial expression characterizes a happy emotion and the voice signal is laughter, play audio data or video data corresponding to the happy emotion.
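A sketch of this combined decision is given below: playback is triggered only when the image-based and voice-based results agree, and nothing is played otherwise. The label strings are the assumed ones used in the earlier sketches.

    def fused_emotion(image_result, voice_result):
        # image_result: "crying" or "happy"; voice_result: "crying" or "laughter".
        if image_result == "crying" and voice_result == "crying":
            return "crying"   # play audio/video corresponding to the crying emotion
        if image_result == "happy" and voice_result == "laughter":
            return "happy"    # play audio/video corresponding to the happy emotion
        return None           # inconsistent results: take no action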
In this embodiment, the playback device captures an image of the infant's facial expression together with the infant's voice signal, identifies on its own whether the infant's emotion is a crying emotion or a happy emotion, and plays audio data or video data corresponding to the crying or happy emotion according to the recognition result. Suitable soothing audio is thus chosen according to the infant's real-time emotional state, helping the infant adjust its mood in time while also assisting the parent in caring for the infant.
The various technical features in the above embodiments can be combined arbitrarily, as long as there is no conflict or contradiction between them; for reasons of space they are not described one by one, and any combination of these technical features also falls within the scope of this disclosure.
Corresponding to the foregoing embodiments of the playback control method, the present disclosure further provides embodiments of a playback control apparatus.
As shown in Fig. 6, Fig. 6 is a block diagram of a playback control apparatus embodiment according to an exemplary embodiment of the present disclosure, which may specifically include the following units:
a feature-information collecting module 601 configured to collect human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression;
an emotion recognition module 602 configured to identify the emotional state characterized by the human body feature information;
a playing module 603 configured to play audio data or video data corresponding to a specified emotional state when the emotional state characterized by the human body feature information is the specified emotional state.
In an optional embodiment of the present disclosure, the emotion recognition module 602 may include the following submodule:
a feature-template matching submodule configured to match the human body feature information against a plurality of preset feature templates, where each feature template is either a default template or a template recorded in advance by the user and each feature template has a corresponding emotional state, and to take the emotional state corresponding to the matched feature template as the emotional state characterized by the human body feature information.
In an optional embodiment of the present disclosure, when the feature templates are templates recorded in advance, the apparatus further includes:
a feature-template setting instruction receiving module configured to receive a feature-template setting instruction sent by an external device, the instruction carrying reference feature information input by the user through the external device and an emotional state identifier characterized by the reference feature information;
a feature-template generating module configured to generate, according to the feature-template setting instruction, a feature template characterizing the emotional state identifier, and to store the feature template.
In an optional embodiment of the present disclosure, when the human body feature information is the voice signal, the feature templates include a crying template and a laughter template; when the matched feature template is the crying template, the corresponding emotional state is a crying emotion, and when the matched feature template is the laughter template, the corresponding emotional state is a happy emotion.
In an optional embodiment of the present disclosure, the apparatus further includes:
a volume adjusting module configured to lower the volume of the playing audio data or video data, or mute it, when the volume of the voice signal is detected to weaken.
In an optional embodiment of the present disclosure, when the human body feature information is the voice signal, the emotion recognition module 602 is specifically configured to:
input the voice signal into a trained deep learning network, extract audio features from the voice signal through the deep learning network, predict from the audio features the probability that the voice signal corresponds to each of several emotional states, and output the emotional state with the largest probability as the emotional state characterized by the voice signal.
In an optional embodiment of the present disclosure, when the human body feature information is the image, the emotion recognition module 602 is specifically configured to:
input the image into a trained deep learning network, extract facial expression features from the image through the deep learning network, predict from the facial expression features the probability that the image corresponds to each of several emotional states, and output the emotional state with the largest probability as the emotional state characterized by the image.
In an optional embodiment of the present disclosure, the emotion recognition module 602 is specifically configured to:
when the emotional state determined from the voice signal is consistent with the emotional state determined from the image, determine that the emotional state characterized by the human body feature information is that consistent emotional state.
As can be seen from the above embodiments, the embodiments of the present disclosure collect human body feature information of a target person and identify the emotional state characterized by that information; if the emotional state is a specified emotional state, audio data or video data corresponding to the specified emotional state is played. A playback device can thus adjust the audio or video it plays according to a person's real-time emotion or behavior, soothing the person's mood, improving the intelligence of the playback device, and simplifying user operation.
The implementation details of the functions and effects of the units in the above apparatus are described in detail in the corresponding steps of the above method embodiments and are not repeated here.
Since the apparatus embodiments substantially correspond to the method embodiments, for relevant parts reference may be made to the description of the method embodiments. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected as actually needed to achieve the purpose of the disclosed solution. Those of ordinary skill in the art can understand and implement this without creative effort.
As shown in fig. 7, Fig. 7 is a kind of disclosure block diagram of playback equipment 700 shown according to an exemplary embodiment.
Referring to Fig. 7, equipment 700 may include following one or more components: processing component 702, memory 704, power supply Component 706, multimedia component 708, audio component 710, the interface 712 of input/output (I/O), sensor module 714, and Communication component 716.
The integrated operation of the usually control equipment 700 of processing component 702, processing component 702 may include one or more places Device 720 is managed to execute instruction, to perform all or part of the steps of the methods described above.In addition, processing component 702 may include one A or multiple modules, convenient for the interaction between processing component 702 and other assemblies.For example, processing component 702 may include more matchmakers Module, to facilitate the interaction between multimedia component 708 and processing component 702.
Memory 704 is configured as storing various types of data to support the operation in equipment 700.These data are shown Example includes the instruction of any application or method for operating in equipment 700.Memory 704 can be by any kind of Volatibility or non-volatile memory device or their combination are realized, such as static random access memory (SRAM), electrically erasable Except programmable read only memory (EEPROM), Erasable Programmable Read Only Memory EPROM (EPROM), programmable read only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, disk or CD.
Power supply module 706 provides electric power for the various assemblies of equipment 700.Power supply module 706 may include power management system System, one or more power supplys and other with for equipment 700 generate, manage, and distribute the associated component of electric power.
Multimedia component 708 includes the screen of one output interface of offer between the equipment 700 and user.One In a little embodiments, screen may include liquid crystal display (LCD) and touch panel (TP).If screen includes touch panel, screen Curtain may be implemented as touch screen, to receive input signal from the user.Touch panel includes one or more touch sensings Device is to sense the gesture on touch, slide, and touch panel.The touch sensor can not only sense touch or sliding action Boundary, but also detect duration and pressure associated with the touch or slide operation.Audio component 710 is configured as Output and/or input audio signal.For example, audio component 710 includes a microphone (MIC), when equipment 700 is in operation mould Formula, when such as call mode, recording mode, and voice recognition mode, microphone is configured as receiving external audio signal.It is received Audio signal can be further stored in memory 704 or via communication component 714 send.In some embodiments, sound Frequency component 710 further includes a loudspeaker, is used for output audio signal.
I/O interface 712 provides interface between processing component 702 and peripheral interface module, and above-mentioned peripheral interface module can To be keyboard, click wheel, button etc..These buttons may include, but are not limited to: home button, volume button, start button and lock Determine button.
Sensor module 714 includes one or more sensors, and the state for providing various aspects for equipment 700 is commented Estimate.For example, sensor module 714 can detecte the state that opens/closes of equipment 700, and the relative positioning of component, for example, it is described Component is the display and keypad of equipment 700, and sensor module 714 can be with a group in detection device 700 or equipment 700 The position change of part, the existence or non-existence that user contacts with equipment 700,700 orientation of equipment or acceleration/deceleration and equipment 700 Temperature change.Sensor module 414 may include proximity sensor, be configured to examine without any physical contact Survey presence of nearby objects.Sensor module 714 can also include that optical sensor is used for such as CMOS or ccd image sensor It is used in imaging applications.In some embodiments, which can also include acceleration transducer, and gyroscope passes Sensor, Magnetic Sensor, pressure sensor or temperature sensor.
Communication component 716 is configured to facilitate the communication of wired or wireless way between equipment 700 and other equipment.Equipment 700 can access the wireless network based on communication standard, such as WiFi, 2G or 4G or their combination.In an exemplary implementation In example, communication component 716 receives broadcast singal or broadcast related information from external broadcasting management system via broadcast channel. In one exemplary embodiment, the communication component 716 further includes near-field communication (NFC) module, to promote short range communication.Example Such as, NFC module can be based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band (UWB) technology, Bluetooth (BT) technology and other technologies are realized.
In an exemplary embodiment, device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as memory 704 including instructions, which are executable by processor 720 of device 700 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
When the instructions in the storage medium are executed by the processor, device 700 is enabled to perform a playback control method, comprising: acquiring human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression; identifying the emotional state characterized by the human body feature information; and, if the emotional state characterized by the human body feature information is a specified emotional state, playing audio data or video data corresponding to the specified emotional state.
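By way of non-limiting illustration only, the following is a minimal Python sketch of this control flow. Every name here (HumanFeatures, recognize_emotion, play, the state-to-media mapping) is a hypothetical placeholder, not part of the disclosure:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HumanFeatures:
    """Collected human body feature information: a voice signal and/or a facial-expression image."""
    voice: Optional[bytes] = None
    face_image: Optional[bytes] = None

# Assumed mapping from specified emotional states to media to play.
SPECIFIED_STATES = {"crying": "lullaby.mp3", "pleasure": "upbeat_song.mp3"}

def recognize_emotion(features: HumanFeatures) -> str:
    """Stand-in for the recognition step (template matching or a trained network)."""
    return "crying" if features.voice else "neutral"

def play(media_file: str) -> None:
    print(f"playing {media_file}")  # stand-in for the device's playback component

def control_playback(features: HumanFeatures) -> None:
    emotion = recognize_emotion(features)   # identify the characterized emotional state
    if emotion in SPECIFIED_STATES:         # react only to specified emotional states
        play(SPECIFIED_STATES[emotion])

control_playback(HumanFeatures(voice=b"\x00\x01"))  # prints: playing lullaby.mp3
```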
Those skilled in the art, after considering the specification and practicing the invention disclosed herein, will readily conceive of other embodiments of the present disclosure. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
The foregoing are merely preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.

Claims (17)

1. A playback control method, characterized in that the method comprises:
acquiring human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression;
identifying the emotional state characterized by the human body feature information;
if the emotional state characterized by the human body feature information is a specified emotional state, playing audio data or video data corresponding to the specified emotional state.
2. The method according to claim 1, characterized in that identifying the emotional state characterized by the human body feature information comprises:
matching the human body feature information against a plurality of preset feature templates, wherein the feature templates are default feature templates or feature templates recorded in advance by a user, and each feature template has a corresponding emotional state;
taking the emotional state corresponding to the matched feature template as the emotional state characterized by the human body feature information.
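(Illustrative sketch, not part of the claims.) The matching step of claim 2 can be pictured as scoring a feature vector against stored templates; the vectors, the Euclidean-distance criterion, and the labels below are all assumptions, since the claim does not fix a matching criterion:

```python
import numpy as np

# Assumed template store: each default or user-recorded template pairs a
# feature vector with its corresponding emotional state.
TEMPLATES = {
    "crying":   np.array([0.9, 0.1, 0.7]),
    "pleasure": np.array([0.2, 0.8, 0.3]),
}

def match_emotion(feature_vec: np.ndarray) -> str:
    """Return the emotional state of the closest feature template."""
    return min(TEMPLATES, key=lambda name: np.linalg.norm(TEMPLATES[name] - feature_vec))

print(match_emotion(np.array([0.85, 0.15, 0.6])))  # -> crying
```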
3. The method according to claim 2, characterized in that, when the feature templates are feature templates recorded in advance, before identifying the emotional state characterized by the human body feature information, the method further comprises:
receiving a feature template setting instruction sent by an external device, the feature template setting instruction including reference feature information input by the user through the external device and an emotional state identifier characterized by the reference feature information;
generating, according to the feature template setting instruction, a feature template characterizing the identified emotional state, and storing the feature template.
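(Illustrative sketch, not part of the claims.) A minimal handler for such a setting instruction, assuming a dict payload with hypothetical field names (emotion_id, features) that the claim does not prescribe:

```python
from typing import Dict, List

template_store: Dict[str, List[float]] = {}

def on_template_setting_instruction(instruction: dict) -> None:
    """Generate and store a user feature template keyed by its emotional state identifier."""
    emotion_id = instruction["emotion_id"]        # emotional state identifier chosen by the user
    reference_features = instruction["features"]  # reference feature information from the external device
    template_store[emotion_id] = reference_features

# Example: an external device registers a crying template.
on_template_setting_instruction({"emotion_id": "crying", "features": [0.9, 0.1, 0.7]})
```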
4. The method according to claim 2, characterized in that, when the human body feature information is the voice signal, the feature templates include a crying template and a laughter template; when the matched feature template is the crying template, the corresponding emotional state is a crying emotion; when the matched feature template is the laughter template, the corresponding emotional state is a pleasure emotion.
5. The method according to claim 4, characterized in that, after playing the audio data or video data corresponding to the emotional state, the method further comprises:
when detecting that the volume of the voice signal weakens, reducing the volume of the audio data or video data being played, or adjusting it to mute.
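(Illustrative sketch, not part of the claims.) One way to picture claim 5's volume follow-down: as the detected crying gets quieter, step the playback volume down, muting below a floor. The thresholds and step size are assumptions:

```python
def adjust_playback_volume(voice_level_db: float, current_volume: int) -> int:
    """Return a new playback volume (0-100) given the measured voice signal level."""
    MUTE_BELOW_DB = -50.0                 # assumed silence threshold
    if voice_level_db < MUTE_BELOW_DB:
        return 0                          # crying has stopped: mute
    return max(current_volume - 10, 10)   # crying weakened: step the volume down

print(adjust_playback_volume(-30.0, 60))  # -> 50
print(adjust_playback_volume(-60.0, 60))  # -> 0 (mute)
```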
6. The method according to any one of claims 1-5, characterized in that, when the human body feature information is the voice signal, identifying the emotional state characterized by the human body feature information comprises:
inputting the voice signal into a trained deep learning network, extracting audio features from the voice signal through the deep learning network, predicting, according to the audio features, the probabilities of the voice signal corresponding to different emotional states, and outputting the emotional state with the maximum probability as the emotional state characterized by the voice signal.
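(Illustrative sketch, not part of the claims.) A toy stand-in for such a network, assuming PyTorch, a 40-dimensional audio feature vector (e.g., MFCC-like), and three emotional states; the claim does not specify any architecture:

```python
import torch
import torch.nn as nn

EMOTIONS = ["crying", "pleasure", "neutral"]  # assumed label set

class EmotionNet(nn.Module):
    """Maps extracted audio features to a probability per emotional state."""
    def __init__(self, n_features: int = 40, n_emotions: int = len(EMOTIONS)):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_emotions)
        )

    def forward(self, audio_features: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.classifier(audio_features), dim=-1)

net = EmotionNet()                    # in practice this would be a trained network
probs = net(torch.randn(1, 40))       # probabilities over emotional states
print(EMOTIONS[int(probs.argmax())])  # output the maximum-probability emotional state
```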
7. The method according to any one of claims 1-3, characterized in that, when the human body feature information is the image, identifying the emotional state characterized by the human body feature information comprises:
inputting the image into a trained deep learning network, extracting facial expression features from the image through the deep learning network, predicting, according to the facial expression features, the probabilities of the image corresponding to different emotional states, and outputting the emotional state with the maximum probability as the emotional state characterized by the image.
8. The method according to claim 1, characterized in that identifying the emotional state characterized by the human body feature information comprises:
when the emotional state determined from the voice signal and the emotional state determined from the image are consistent, determining the emotional state characterized by the human body feature information to be the consistent emotional state.
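(Illustrative sketch, not part of the claims.) Claim 8's fusion rule reduces to accepting a result only when the voice-based and image-based recognitions agree; the function name and the None-for-no-decision convention are assumptions:

```python
from typing import Optional

def fuse_emotions(voice_emotion: str, image_emotion: str) -> Optional[str]:
    """Accept an emotional state only when voice- and image-based results are consistent."""
    return voice_emotion if voice_emotion == image_emotion else None

print(fuse_emotions("crying", "crying"))    # -> crying
print(fuse_emotions("crying", "pleasure"))  # -> None (no consistent state)
```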
9. A playback control apparatus, characterized in that the apparatus comprises:
a feature information acquisition module, configured to acquire human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression;
an emotion recognition module, configured to identify the emotional state characterized by the human body feature information;
a playing module, configured to play, when the emotional state characterized by the human body feature information is a specified emotional state, audio data or video data corresponding to the specified emotional state.
10. The apparatus according to claim 9, characterized in that the emotion recognition module comprises:
a feature template matching submodule, configured to match the human body feature information against a plurality of preset feature templates, wherein the feature templates are default feature templates or feature templates recorded in advance by a user, and each feature template has a corresponding emotional state; and to take the emotional state corresponding to the matched feature template as the emotional state characterized by the human body feature information.
11. The apparatus according to claim 10, characterized in that, when the feature templates are feature templates recorded in advance, the apparatus further comprises:
a feature template setting instruction receiving module, configured to receive a feature template setting instruction sent by an external device, the feature template setting instruction including reference feature information input by the user through the external device and an emotional state identifier characterized by the reference feature information;
a feature template generation module, configured to generate, according to the feature template setting instruction, a feature template characterizing the identified emotional state, and to store the feature template.
12. The apparatus according to claim 10, characterized in that, when the human body feature information is the voice signal, the feature templates include a crying template and a laughter template; when the matched feature template is the crying template, the corresponding emotional state is a crying emotion; when the matched feature template is the laughter template, the corresponding emotional state is a pleasure emotion.
13. The apparatus according to claim 12, characterized in that the apparatus further comprises:
a volume adjustment module, configured to reduce the volume of the audio data or video data being played, or adjust it to mute, when detecting that the volume of the voice signal weakens.
14. The apparatus according to any one of claims 9-13, characterized in that, when the human body feature information is the voice signal, the emotion recognition module is specifically configured to:
input the voice signal into a trained deep learning network, extract audio features from the voice signal through the deep learning network, predict, according to the audio features, the probabilities of the voice signal corresponding to different emotional states, and output the emotional state with the maximum probability as the emotional state characterized by the voice signal.
15. The apparatus according to any one of claims 9-11, characterized in that, when the human body feature information is the image, the emotion recognition module is specifically configured to:
input the image into a trained deep learning network, extract facial expression features from the image through the deep learning network, predict, according to the facial expression features, the probabilities of the image corresponding to different emotional states, and output the emotional state with the maximum probability as the emotional state characterized by the image.
16. The apparatus according to claim 9, characterized in that the emotion recognition module is specifically configured to:
determine, when the emotional state determined from the voice signal and the emotional state determined from the image are consistent, the emotional state characterized by the human body feature information to be the consistent emotional state.
17. A playback device, characterized by comprising:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquire human body feature information of a target person, the human body feature information including a voice signal and/or an image containing a facial expression;
identify the emotional state characterized by the human body feature information;
if the emotional state characterized by the human body feature information is a specified emotional state, play audio data or video data corresponding to the specified emotional state.
CN201910407403.0A 2019-05-16 2019-05-16 A kind of control method for playing back, device and playback equipment Pending CN110197677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910407403.0A CN110197677A (en) 2019-05-16 2019-05-16 A kind of control method for playing back, device and playback equipment

Publications (1)

Publication Number Publication Date
CN110197677A (en) 2019-09-03

Family

ID=67752756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910407403.0A Pending CN110197677A (en) 2019-05-16 2019-05-16 A kind of control method for playing back, device and playback equipment

Country Status (1)

Country Link
CN (1) CN110197677A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102566740A (en) * 2010-12-16 2012-07-11 富泰华工业(深圳)有限公司 Electronic device with emotion recognition function, and output control method of such electronic device
CN102780651A (en) * 2012-07-21 2012-11-14 上海量明科技发展有限公司 Method for inserting emotion data in instant messaging messages, client and system
CN106531162A (en) * 2016-10-28 2017-03-22 北京光年无限科技有限公司 Man-machine interaction method and device used for intelligent robot
CN106357927A (en) * 2016-10-31 2017-01-25 维沃移动通信有限公司 Playing control method and mobile terminal
CN107145326A (en) * 2017-03-28 2017-09-08 浙江大学 A kind of the music automatic playing system and method for collection of being expressed one's feelings based on target face
CN108960024A (en) * 2017-09-19 2018-12-07 炬大科技有限公司 A kind of Emotion identification method based on personal user
CN108039988A (en) * 2017-10-31 2018-05-15 珠海格力电器股份有限公司 Equipment control process method and device
CN107886950A (en) * 2017-12-06 2018-04-06 安徽省科普产品工程研究中心有限责任公司 A kind of children's video teaching method based on speech recognition
CN108434757A (en) * 2018-05-25 2018-08-24 深圳市零度智控科技有限公司 intelligent toy control method, intelligent toy and computer readable storage medium
CN108764169A (en) * 2018-05-31 2018-11-06 厦门大学 A kind of driver's Emotion identification based on machine learning and display device and method
CN108992079A (en) * 2018-06-12 2018-12-14 珠海格力电器股份有限公司 A kind of Infant behavior monitoring method based on emotion recognition and Application on Voiceprint Recognition

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112472950A (en) * 2019-09-11 2021-03-12 北京小米移动软件有限公司 System, wearable equipment and electronic equipment of music are switched to intelligence
CN111353401A (en) * 2020-02-24 2020-06-30 湘潭大学 Visual sense and emotion module of baby service robot
CN111353401B (en) * 2020-02-24 2021-07-02 湘潭大学 Visual and emotional system of baby service robot
CN113496586A (en) * 2020-04-08 2021-10-12 北京沃东天骏信息技术有限公司 Infant monitoring method, infant monitoring device, infant monitoring equipment and storage medium
WO2022029305A1 (en) 2020-08-07 2022-02-10 Lullaai Networks,Sl Smart learning method and apparatus for soothing and prolonging sleep of a baby
CN113126951A (en) * 2021-04-16 2021-07-16 深圳地平线机器人科技有限公司 Audio playing method and device, computer readable storage medium and electronic equipment
CN113126951B (en) * 2021-04-16 2024-05-17 深圳地平线机器人科技有限公司 Audio playing method and device, computer readable storage medium and electronic equipment
CN114999534A (en) * 2022-06-10 2022-09-02 中国第一汽车股份有限公司 Method, device and equipment for controlling playing of vehicle-mounted music and storage medium

Similar Documents

Publication Publication Date Title
CN106464939B (en) The method and device of play sound effect
CN110197677A (en) A kind of control method for playing back, device and playback equipment
KR101927706B1 (en) Method for recommending music for various situations and apparatus using the same
CN108231059A (en) Treating method and apparatus, the device for processing
CN104867506B (en) The method and apparatus for automatically controlling music
US10869154B2 (en) Location-based personal audio
CN110313152A (en) User's registration for intelligent assistant's computer
US20130211826A1 (en) Audio Signals as Buffered Streams of Audio Signals and Metadata
CN109618184A (en) Method for processing video frequency and device, electronic equipment and storage medium
CN107112014A (en) Application foci in voice-based system
CN105930035A (en) Interface background display method and apparatus
CN110097890A (en) A kind of method of speech processing, device and the device for speech processes
CN109151356A (en) video recording method and device
CN111508511A (en) Real-time sound changing method and device
CN109961787A (en) Determine the method and device of acquisition end time
CN110309327A (en) Audio generation method, device and the generating means for audio
CN107994879A (en) Volume control method and device
CN109147745A (en) Song editing and processing method, apparatus, electronic equipment and storage medium
CN110210310A (en) A kind of method for processing video frequency, device and the device for video processing
CN107562952A (en) The method, apparatus and terminal that music matching plays
CN110121083A (en) The generation method and device of barrage
CN110503968A (en) A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
US20180027090A1 (en) Information processing device, information processing method, and program
CN109033423A (en) Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system
CN109005352A (en) It is in step with the method and device of video

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190903)