CN106503127B - Music data processing method and system based on facial action identification - Google Patents


Info

Publication number
CN106503127B
CN106503127B CN201610912440.3A CN201610912440A
Authority
CN
China
Prior art keywords
data
music
foreground
measure
facial action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610912440.3A
Other languages
Chinese (zh)
Other versions
CN106503127A (en)
Inventor
简仁贤
何芳琳
赵伟翔
于庭婕
黄品瑞
廖健宏
陈智凯
孙廷伟
杨闵淳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intelligent Technology (shanghai) Co Ltd
Original Assignee
Intelligent Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intelligent Technology (shanghai) Co Ltd filed Critical Intelligent Technology (shanghai) Co Ltd
Priority to CN201610912440.3A priority Critical patent/CN106503127B/en
Publication of CN106503127A publication Critical patent/CN106503127A/en
Application granted granted Critical
Publication of CN106503127B publication Critical patent/CN106503127B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually, using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention provides a music data processing method and system based on facial action recognition. The method is as follows: background music data and foreground music data are obtained, and the foreground music data is divided into multiple measures by beat, each measure containing multiple beats; the facial movements of a person are detected during the foreground music time to obtain multiple pieces of facial action data, each facial action corresponding to the foreground music data of one measure's time span; the background music data is played continuously, each piece of facial action data is matched with the foreground music data of its corresponding measure, and the result is combined with the background music to generate new music. The music data processing method and system based on facial action recognition of the present invention use face key point recognition technology to combine music data with the real image, realizing interaction between a person and a corresponding scene without assistance from external devices; the implementation is simple, and user experience is improved.

Description

Music data processing method and system based on facial action identification
Technical field
The present invention relates to the field of data processing, and in particular to music data processing based on facial action recognition.
Background art
In the prior art, interaction between a person and a corresponding scene is mostly operated with a mouse, keyboard, joystick, touch screen, external sensors (such as the Wii or dance machine pedals), or human posture (such as Kinect). Posture-based operation requires depth information and therefore a specific device; the whole body must be detected, so a large space is needed to play. There are also problems such as low accuracy and high offset, which reduce the fun of the interaction; moreover, this kind of body-movement detection (e.g., raising a hand, kicking) is rarely combined with the real image.
At present, applications of face key point recognition mainly combine the face with image synthesis technology: turning the face into another animal, putting on different ornaments, or generating animation from facial movements.
There are also applications for facial exercises such as face slimming and facial rehabilitation: traditionally, a facial exercise is described as a textual procedure, or its effect is achieved by following the demonstration in an instructional film, but such facial exercise processes are not combined with the real image, and the experience is poor.
Therefore, the defect of the prior art is that interaction between a person and a corresponding scene must be assisted by external equipment, the means of implementation are limited, user experience is poor, and face key point recognition technology is not combined with the real image.
Summary of the invention
In view of the above technical problems, the present invention provides a music data processing method and system based on facial action recognition, which use face key point recognition technology to combine music data with the real image, realizing interaction between a person and a corresponding scene without assistance from external devices; the implementation is simple, and user experience is improved.
In order to solve the above technical problems, the present invention provides the following technical scheme:
In a first aspect, the present invention provides a music data processing method based on facial action recognition, comprising:
Step S1: obtaining background music data and foreground music data, the background music data and the foreground music data each being a piece of music several seconds to several minutes long;
Step S2: dividing the foreground music data into multiple measures by beat, each measure containing multiple beats;
Step S3: detecting the facial movements of a person during the foreground music time to obtain multiple pieces of facial action data, each facial action corresponding to the foreground music data of one measure's time span;
Step S4: playing the background music data continuously, matching each piece of facial action data with the foreground music data of its corresponding measure, and combining the result with the background music to generate new music.
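For concreteness, the S1-S4 flow can be sketched in a few lines of code. This is only an illustrative sketch under assumed conditions: a fixed tempo, and stand-in callables for the face recognizer and the audio mixer, none of which the patent specifies.

```python
# Illustrative sketch of steps S1-S4. All names, signatures and the fixed-tempo
# assumption are hypothetical; the patent does not specify formats or APIs.
from dataclasses import dataclass

@dataclass
class Measure:
    start: float     # seconds from the start of the foreground music
    duration: float  # time span of one measure
    audio: object    # the foreground music data for this measure

def _frange(start, stop, step):
    t = start
    while t < stop:
        yield t
        t += step

def split_into_measures(total_secs, bpm, beats_per_measure=4):
    """Step S2: divide the foreground music into measures by beat."""
    measure_len = beats_per_measure * 60.0 / bpm
    return [Measure(t, measure_len, None)
            for t in _frange(0.0, total_secs, measure_len)]

def generate_new_music(background, measures, detect_action, mix):
    """Steps S3-S4: the background plays continuously; a facial action detected
    during a measure selects that measure's foreground data, which is mixed in."""
    mixed = background
    for m in measures:                                    # step S3, per measure
        action = detect_action(m.start, m.start + m.duration)
        if action is not None:                            # step S4: match -> mix
            mixed = mix(mixed, m.audio, at=m.start)
    return mixed
```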
In the technical scheme of the present invention, background music data and foreground music data are first obtained, each being a piece of music several seconds to several minutes long; the foreground music data is then divided into multiple measures by beat, each measure containing multiple beats;
the facial movements of the person are then detected during the foreground music time to obtain multiple pieces of facial action data, each facial action corresponding to the foreground music data of one measure's time span; finally, the background music data is played continuously, each piece of facial action data is matched with the foreground music data of its corresponding measure, and the result is combined with the background music to generate new music.
The music data processing method based on facial action recognition of the present invention uses face key point recognition technology to combine music data with the real image, realizing interaction between a person and a corresponding scene without assistance from external devices; the implementation is simple, and user experience is improved.
Further, after the step S2, the method further comprises:
playing the background music data continuously and obtaining target facial action data, each piece of target facial action data corresponding to a unique measure of foreground music;
according to the target facial action data, obtaining the facial action data within the one-beat window before and after the start of each measure;
matching the facial action data against the target facial action data to decide whether the measure of foreground music is played:
when the facial action data matches the target facial action data, playing the measure of foreground music, the measure of foreground music being the foreground music uniquely corresponding to the target facial action data;
when the facial action data does not match the target facial action data, not playing the measure of foreground music corresponding to the facial action data.
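Read as pseudocode, this selection rule reduces to a single check per measure. The following sketch assumes a list of timestamped detections and a lookup table from target actions to their unique foreground measures, both hypothetical:

```python
def select_measure_playback(measure_start, beat_len, detections,
                            target_action, foreground_of):
    """Play a measure's foreground music only when a facial action detected
    within one beat before/after the measure start matches the target action.
    `detections` is assumed to be a list of (timestamp, action) pairs."""
    lo, hi = measure_start - beat_len, measure_start + beat_len
    hit = any(lo <= t <= hi and a == target_action for t, a in detections)
    # Each target action corresponds to exactly one measure of foreground music.
    return foreground_of[target_action] if hit else None
```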
Further, after the step S2, the method further comprises:
playing the background music data continuously and obtaining virtual scene data, the virtual scene data being virtual scene data of objects moving toward positions on the person's face;
according to the virtual scene data of an object moving toward a position on the face, obtaining the corresponding facial action data, the corresponding facial action data being acquired before the moving object reaches the position on the face;
matching the facial action data against the virtual scene data to process the corresponding moving object in the virtual scene data:
when the facial action data matches the virtual scene data, removing the corresponding moving object from the virtual scene data;
when the facial action data does not match the virtual scene data, leaving the corresponding moving object in the virtual scene data untouched;
when the one-beat window after the start of a measure has passed without any facial action matching the target facial action data, removing the corresponding moving object from the virtual scene data.
Further, after the corresponding moving object is removed from the virtual scene data, the method comprises:
obtaining effect data of the removal of the corresponding moving object from the virtual scene data;
according to the effect data of the removal of the corresponding moving object from the virtual scene data, evaluating the facial action data matched against the virtual scene data to obtain an evaluation result.
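A hedged sketch of this virtual-scene branch follows, with assumed object fields (the arrival time at a facial position and the action that eliminates the object) and an assumed hit-rate evaluation; none of these specifics come from the patent:

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    required_action: str  # facial action that eliminates the object
    arrival: float        # time the object reaches its facial position

def update_virtual_scene(objects, detections, now, beat_len):
    """Per-frame update (hypothetical layout): a matching facial action inside
    the one-beat window removes the object; once the window has passed with no
    match, the object is removed as a miss; otherwise it is left untouched."""
    outcomes = []
    for obj in objects:
        lo, hi = obj.arrival - beat_len, obj.arrival + beat_len
        matched = any(lo <= t <= hi and a == obj.required_action
                      for t, a in detections)
        if matched:
            outcomes.append((obj, "removed"))   # effect data: a hit
        elif now > hi:
            outcomes.append((obj, "expired"))   # effect data: a miss
    return outcomes

def evaluate(outcomes):
    """Assumed evaluation: the fraction of objects removed by a matching action."""
    return (sum(1 for _, o in outcomes if o == "removed") / len(outcomes)
            if outcomes else 0.0)
```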
Further, facial actions are recognized through face key point recognition and fuzzy control theory.
The music data processing method based on facial action recognition of the present invention is based on face key point recognition technology and combines music data with the real image, realizing interaction between a person and a corresponding scene: facial movements are recognized and matched against the music data of the corresponding scene, realizing the creation of music, the playing of music and the elimination of the corresponding virtual objects in the scene, all presented to the user as animation. No assistance from external devices is needed, the implementation is simple, and user experience is improved.
Further, the method further comprises:
playing the background music data continuously and obtaining target facial action data, each piece of target facial action data corresponding to one measure of foreground music, the foreground music being divided into first foreground music and second foreground music, the first foreground music matching the background music when played, and the second foreground music not matching the background music when played;
according to the target facial action data, obtaining the facial action data within the one-beat window before and after the start of each measure;
matching the facial action data against the target facial action data to choose how the measure of foreground music corresponding to the target facial action data is played:
when the facial action data matches the target facial action data, playing the measure of foreground music corresponding to the target facial action data, the measure being the first foreground music corresponding to the target facial action data;
when the facial action data does not match the target facial action data, playing the measure of foreground music corresponding to the target facial action data, the measure being the second foreground music corresponding to the target facial action data.
When the facial action data made by the user successfully matches the target facial action data, the measure of foreground music corresponding to that target facial action is played, and this first foreground music plays harmoniously with the background music; conversely, if the facial action data made by the user does not match the target facial action data, the second foreground music corresponding to that target facial action is played, which does not play harmoniously with the background music. In this way, the difference in the music being played tells the user whether the facial action they made successfully matched the target facial action data, improving user experience.
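The difference from the play-or-silence variant above is that every measure sounds; only the track changes. A minimal sketch, assuming each measure carries a hypothetical pair of pre-rendered tracks:

```python
def choose_foreground_tracks(measures, detections, targets, beat_len):
    """For each measure, pick the consonant first track on a successful match
    and the dissonant second track otherwise (track fields are assumptions)."""
    chosen = []
    for m, target in zip(measures, targets):
        lo, hi = m.start - beat_len, m.start + beat_len
        matched = any(lo <= t <= hi and a == target for t, a in detections)
        chosen.append(m.first_track if matched else m.second_track)
    return chosen
```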
In a second aspect, the present invention provides a music data processing system based on facial action recognition, comprising:
a music data acquisition module, for obtaining background music data and foreground music data, the background music data and the foreground music data each being a piece of music several seconds to several minutes long;
a music data processing module, for dividing the foreground music data into multiple measures by beat, each measure containing multiple beats;
a facial action acquisition module, for detecting the facial movements of a person during the foreground music time to obtain multiple pieces of facial action data during the foreground music time, each facial action corresponding to the foreground music data of one measure's time span;
a music creation module, for playing the background music data continuously, matching each piece of facial action data with the foreground music data of its corresponding measure, and combining the result with the background music to generate new music.
In the technical solution of the present invention, background music data and foreground music data are first obtained through the music data acquisition module, each being a piece of music several seconds to several minutes long; the foreground music data is then divided into multiple measures by beat through the music data processing module, each measure containing multiple beats;
the facial movements of the person are then detected during the foreground music time through the facial action acquisition module to obtain multiple pieces of facial action data during the foreground music time, each facial action corresponding to the foreground music data of one measure's time span; finally, through the music creation module, the background music data is played continuously, each piece of facial action data is matched with the foreground music data of its corresponding measure, and the result is combined with the background music to generate new music.
The music data processing system based on facial action recognition of the present invention uses face key point recognition technology to combine music data with the real image, realizing interaction between a person and a corresponding scene without assistance from external devices; the implementation is simple, and user experience is improved.
Further, after the music data processing module, the system further comprises a music playing selection module, configured to:
play the background music data continuously and obtain target facial action data, each piece of target facial action data corresponding to a unique measure of foreground music;
according to the target facial action data, obtain the facial action data within the one-beat window before and after the start of each measure;
match the facial action data against the target facial action data to decide whether the measure of foreground music is played:
when the facial action data matches the target facial action data, play the measure of foreground music, the measure of foreground music being the foreground music uniquely corresponding to the target facial action data;
when the facial action data does not match the target facial action data, do not play the measure of foreground music corresponding to the facial action data.
Further, after the music data processing module, the system further comprises a music virtual scene module, configured to:
play the background music data continuously and obtain virtual scene data, the virtual scene data being virtual scene data of objects moving toward positions on the person's face;
according to the virtual scene data of an object moving toward a position on the face, obtain the corresponding facial action data, the corresponding facial action data being acquired before the moving object reaches the position on the face;
match the facial action data against the virtual scene data to process the corresponding moving object in the virtual scene data:
when the facial action data matches the virtual scene data, remove the corresponding moving object from the virtual scene data;
when the facial action data does not match the virtual scene data, leave the corresponding moving object in the virtual scene data untouched;
when the one-beat window after the start of a measure has passed without any facial action matching the target facial action data, remove the corresponding moving object from the virtual scene data.
Further, the music virtual scene module includes an effect assessment submodule; after the corresponding moving object is removed from the virtual scene data, the effect assessment submodule is configured to:
obtain effect data of the removal of the corresponding moving object from the virtual scene data;
according to the effect data of the removal of the corresponding moving object from the virtual scene data, evaluate the facial action data matched against the virtual scene data to obtain an evaluation result.
Brief description of the drawings
In order to more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the specific embodiments or the prior art are briefly described below.
Fig. 1 shows a flowchart of a music data processing method based on facial action recognition provided by the first embodiment of the present invention;
Fig. 2 shows a first schematic diagram of the time axis in a music data processing method based on facial action recognition provided by the first embodiment of the present invention;
Fig. 3 shows a second schematic diagram of the time axis in a music data processing method based on facial action recognition provided by the first embodiment of the present invention;
Fig. 4 shows a schematic diagram of a music data processing system based on facial action recognition provided by the second embodiment of the present invention.
Specific embodiments
The technical solutions of the present invention are described in detail below in conjunction with the drawings and embodiments. The following embodiments are only used to clearly illustrate the technical solutions of the present invention; they serve only as examples and shall not be used to limit the protection scope of the present invention.
Embodiment one
Fig. 1 shows a flowchart of a music data processing method based on facial action recognition provided by the first embodiment of the present invention. As shown in Fig. 1, the first embodiment of the present invention provides a music data processing method based on facial action recognition, comprising:
Step S1: obtaining background music data and foreground music data, the background music data and the foreground music data each being a piece of music several seconds to several minutes long;
Step S2: dividing the foreground music data into multiple measures by beat, each measure containing multiple beats;
Step S3: detecting the facial movements of a person during the foreground music time to obtain multiple pieces of facial action data, each facial action corresponding to the foreground music data of one measure's time span;
Step S4: playing the background music data continuously, matching each piece of facial action data with the foreground music data of its corresponding measure, and combining the result with the background music to generate new music.
In the technical scheme of the present invention, background music data and foreground music data are first obtained, each being a piece of music several seconds to several minutes long; the foreground music data is then divided into multiple measures by beat, each measure containing multiple beats;
the facial movements of the person are then detected during the foreground music time to obtain multiple pieces of facial action data, each facial action corresponding to the foreground music data of one measure's time span; finally, the background music data is played continuously, each piece of facial action data is matched with the foreground music data of its corresponding measure, and the result is combined with the background music to generate new music.
With the music data processing method based on facial action recognition of the present invention, different scene information can be set, each with its own requirements to be completed within a specified time in the scene; by matching different scene information with different facial actions, different actions can be realized, and thus the interaction of various people with scenes.
The music data processing method based on facial action recognition of the present invention uses face key point recognition technology to combine music data with the real image, realizing interaction between a person and a corresponding scene without assistance from external devices; the implementation is simple, and user experience is improved.
Specifically, after step S2, the method further comprises:
playing the background music data continuously and obtaining target facial action data, each piece of target facial action data corresponding to a unique measure of foreground music;
according to the target facial action data, obtaining the facial action data within the one-beat window before and after the start of each measure;
matching the facial action data against the target facial action data to decide whether the measure of foreground music is played:
when the facial action data matches the target facial action data, playing the measure of foreground music, the measure of foreground music being the foreground music uniquely corresponding to the target facial action data;
when the facial action data does not match the target facial action data, not playing the measure of foreground music corresponding to the facial action data.
Specifically, multiple measures of foreground music are set, and each measure of foreground music corresponds to exactly one facial action. After the first facial action successfully matches the first target facial action, the next facial action is matched against the next target facial action; in this way, continuous playing of the foreground music is realized according to the successive target facial actions.
Specifically, after step S2, the method further comprises:
playing the background music data continuously and obtaining virtual scene data, the virtual scene data being virtual scene data of objects moving toward positions on the person's face;
according to the virtual scene data of an object moving toward a position on the face, obtaining the corresponding facial action data, the corresponding facial action data being acquired before the moving object reaches the position on the face;
matching the facial action data against the virtual scene data to process the corresponding moving object in the virtual scene data:
when the facial action data matches the virtual scene data, removing the corresponding moving object from the virtual scene data;
when the facial action data does not match the virtual scene data, leaving the corresponding moving object in the virtual scene data untouched;
when the one-beat window after the start of a measure has passed without any facial action matching the target facial action data, removing the corresponding moving object from the virtual scene data.
Combined with AR technology, facial positions are located in the real scene, and different virtual objects are set to move toward the face. The window of one beat before and after the moment a virtual object reaches its facial position is the action window; when the person performs the elimination action corresponding to a virtual object and the match succeeds, the next object follows, which adds fun. A successful or failed match also triggers a different sound effect and animation. After a period of time, the matching window shortens and the elimination actions must be performed faster.
Specifically, after the corresponding moving object is removed from the virtual scene data, the method comprises:
obtaining effect data of the removal of the corresponding moving object from the virtual scene data;
according to the effect data of the removal of the corresponding moving object from the virtual scene data, evaluating the facial action data matched against the virtual scene data to obtain an evaluation result.
As shown in Fig. 2, the horizontal axis is the time axis: the left represents earlier times, the right later times. The longer, thicker vertical lines are measure boundaries; the shorter, thinner ones are beat boundaries. In this example, the trigger range is the two beats before each measure starts, all of which count as correct triggering; the half beat before and after the beat preceding each measure's start counts as perfect, the part marked e in Fig. 2, while the half beat on either side of e counts as common triggering, the part marked f. The perfect and common trigger times and ranges can all be freely replaced. This method serves as the evaluation criterion for matching facial action data against the virtual scene data.
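The Fig. 2 windows translate directly into a grading function. The offsets below follow the example values in the figure (which the text says can be freely replaced); everything else is an assumption for illustration:

```python
def grade_trigger(action_time, measure_start, beat_len):
    """Grade a facial action against the Fig. 2 example windows. The trigger
    range is the two beats before the measure starts; +-0.5 beat around the
    beat preceding the downbeat grades 'perfect' (band e), and the remaining
    half-beat bands on either side grade 'common' (band f)."""
    offset = measure_start - action_time       # how far before the downbeat
    if not 0.0 <= offset <= 2.0 * beat_len:
        return "miss"
    if 0.5 * beat_len <= offset <= 1.5 * beat_len:
        return "perfect"
    return "common"

# e.g. at 120 BPM (beat_len = 0.5 s): an action 0.4 s before the downbeat
# grades 'perfect', one 0.1 s before the downbeat grades 'common'.
```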
Specifically, facial actions are recognized through face key point recognition and fuzzy control theory.
In the present invention, on top of the music data processing described above, a reliable face system is established based on face key point recognition, and fuzzy control theory is then combined with it to accurately recognize facial actions: blinking, crossing the eyes, raising the eyebrows, frowning, wrinkling the nose, sticking out the tongue, puckering the mouth, opening the mouth, twisting the mouth, licking the lips, pursing the lips, nodding, rotating the head left and right, rotating the head up and down, and so on. Face key point recognition technology itself is well-known prior art and is not described further here.
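The patent names only the combination of face key point recognition with fuzzy control theory. As one hedged illustration of what such a recognizer could look like, a blink can be scored with a fuzzy membership over the eye aspect ratio computed from six eye landmarks; the thresholds and the min-based fuzzy AND are assumptions, not values from the patent:

```python
import math

def eye_aspect_ratio(eye):
    """eye: six (x, y) key points around one eye in the common 6-point layout.
    EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); a small EAR means a closed eye."""
    d = math.dist
    return (d(eye[1], eye[5]) + d(eye[2], eye[4])) / (2.0 * d(eye[0], eye[3]))

def closed_membership(ear, lo=0.15, hi=0.30):
    """Fuzzy membership of 'eye closed', ramping linearly between two assumed
    thresholds: 1.0 at or below lo, 0.0 at or above hi."""
    if ear <= lo:
        return 1.0
    if ear >= hi:
        return 0.0
    return (hi - ear) / (hi - lo)

def is_blink(left_eye, right_eye, cutoff=0.7):
    """Fuzzy AND (min) of both eyes' 'closed' memberships, then a crisp cutoff
    as a toy stand-in for the defuzzification step of a fuzzy controller."""
    mu = min(closed_membership(eye_aspect_ratio(left_eye)),
             closed_membership(eye_aspect_ratio(right_eye)))
    return mu >= cutoff
```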
Specifically, the method further comprises:
playing the background music data continuously and obtaining target facial action data, each piece of target facial action data corresponding to one measure of foreground music, the foreground music being divided into first foreground music and second foreground music, the first foreground music matching the background music when played, and the second foreground music not matching the background music when played;
according to the target facial action data, obtaining the facial action data within the one-beat window before and after the start of each measure;
matching the facial action data against the target facial action data to choose how the measure of foreground music corresponding to the target facial action data is played:
when the facial action data matches the target facial action data, playing the measure of foreground music corresponding to the target facial action data, the measure being the first foreground music corresponding to the target facial action data;
when the facial action data does not match the target facial action data, playing the measure of foreground music corresponding to the target facial action data, the measure being the second foreground music corresponding to the target facial action data.
When the facial action data made by the user successfully matches the target facial action data, the measure of foreground music corresponding to that target facial action is played, and this first foreground music plays harmoniously with the background music; conversely, if the facial action data made by the user does not match the target facial action data, the second foreground music corresponding to that target facial action is played, which does not play harmoniously with the background music. In this way, the difference in the music being played tells the user whether the facial action they made successfully matched the target facial action data, improving user experience.
Specifically, the first foreground music may be set as music indicating success and the second foreground music as foreground music indicating failure, making the music more distinguishable.
As shown in Fig. 3, the horizontal axis is the time axis: the left represents earlier times, the right later times. The longer, thicker vertical lines are measure boundaries; the shorter, thinner ones are beat boundaries. In the example of Fig. 3, the range of c is one measure and the range of d is one beat. A is the detection time point; in Fig. 3 it represents the one beat before and after the start of each measure, and any facial action done at this time can be detected. B is the action range, usually in units of one measure; in this figure it represents that an expression done within the one-beat window before and after the start of a measure drives that whole measure to give feedback. The number of beats per measure, the detection time point and the action range can all be freely replaced.
The music data processing method based on facial action recognition of the present invention is based on face key point recognition technology and combines music data with the real image, realizing interaction between a person and a corresponding scene: facial movements are recognized and matched against the music data of the corresponding scene, realizing the creation of music, the playing of music and the elimination of the corresponding virtual objects in the scene, all presented to the user as animation. No assistance from external devices is needed, the implementation is simple, and user experience is improved.
Embodiment two
Fig. 4 shows a schematic diagram of a music data processing system based on facial action recognition provided by the second embodiment of the present invention. As shown in Fig. 4, the second embodiment of the present invention provides a music data processing system 10 based on facial action recognition, comprising:
a music data acquisition module 101, for obtaining background music data and foreground music data, the background music data and the foreground music data each being a piece of music several seconds to several minutes long;
a music data processing module 102, for dividing the foreground music data into multiple measures by beat, each measure containing multiple beats;
a facial action acquisition module 103, for detecting the facial movements of a person during the foreground music time to obtain multiple pieces of facial action data during the foreground music time, each facial action corresponding to the foreground music data of one measure's time span;
a music creation module 104, for playing the background music data continuously, matching each piece of facial action data with the foreground music data of its corresponding measure, and combining the result with the background music to generate new music.
In the technical solution of the present invention, background music data and foreground music data are first obtained through the music data acquisition module 101, each being a piece of music several seconds to several minutes long; the foreground music data is then divided into multiple measures by beat through the music data processing module 102, each measure containing multiple beats;
the facial movements of the person are then detected during the foreground music time through the facial action acquisition module 103 to obtain multiple pieces of facial action data during the foreground music time, each facial action corresponding to the foreground music data of one measure's time span; finally, through the music creation module 104, the background music data is played continuously, each piece of facial action data is matched with the foreground music data of its corresponding measure, and the result is combined with the background music to generate new music.
The music data processing system 10 based on facial action recognition of the present invention uses face key point recognition technology to combine music data with the real image, realizing interaction between a person and a corresponding scene without assistance from external devices; the implementation is simple, and user experience is improved.
Specifically, after the music data processing module 102, the system further comprises a music playing selection module, configured to:
play the background music data continuously and obtain target facial action data, each piece of target facial action data corresponding to a unique measure of foreground music;
according to the target facial action data, obtain the facial action data within the one-beat window before and after the start of each measure;
match the facial action data against the target facial action data to decide whether the measure of foreground music is played:
when the facial action data matches the target facial action data, play the measure of foreground music, the measure of foreground music being the foreground music uniquely corresponding to the target facial action data;
when the facial action data does not match the target facial action data, do not play the measure of foreground music corresponding to the facial action data.
Specifically, after the music data processing module 102, the system further comprises a music virtual scene module, configured to:
play the background music data continuously and obtain virtual scene data, the virtual scene data being virtual scene data of objects moving toward positions on the person's face;
according to the virtual scene data of an object moving toward a position on the face, obtain the corresponding facial action data, the corresponding facial action data being acquired before the moving object reaches the position on the face;
match the facial action data against the virtual scene data to process the corresponding moving object in the virtual scene data:
when the facial action data matches the virtual scene data, remove the corresponding moving object from the virtual scene data;
when the facial action data does not match the virtual scene data, leave the corresponding moving object in the virtual scene data untouched;
when the one-beat window after the start of a measure has passed without any facial action matching the target facial action data, remove the corresponding moving object from the virtual scene data.
Specifically, the music virtual scene module includes an effect assessment submodule; after the corresponding moving object is removed from the virtual scene data, the effect assessment submodule is configured to:
obtain effect data of the removal of the corresponding moving object from the virtual scene data;
according to the effect data of the removal of the corresponding moving object from the virtual scene data, evaluate the facial action data matched against the virtual scene data to obtain an evaluation result.
Specifically, the system further includes a facial action recognition module 100, for recognizing facial actions through face key point recognition and fuzzy control theory.
Specifically, the system further comprises a music playing selection module, configured to:
play the background music data continuously and obtain target facial action data, each piece of target facial action data corresponding to one measure of foreground music, the foreground music being divided into first foreground music and second foreground music, the first foreground music matching the background music when played, and the second foreground music not matching the background music when played;
according to the target facial action data, obtain the facial action data within the one-beat window before and after the start of each measure;
match the facial action data against the target facial action data to choose how the measure of foreground music corresponding to the target facial action data is played:
when the facial action data matches the target facial action data, play the measure of foreground music corresponding to the target facial action data, the measure being the first foreground music corresponding to the target facial action data;
when the facial action data does not match the target facial action data, play the measure of foreground music corresponding to the target facial action data, the measure being the second foreground music corresponding to the target facial action data.
When the facial action data made by the user successfully matches the target facial action data, the measure of foreground music corresponding to that target facial action is played, and this first foreground music plays harmoniously with the background music; conversely, if the facial action data made by the user does not match the target facial action data, the second foreground music corresponding to that target facial action is played, which does not play harmoniously with the background music. In this way, the difference in the music being played tells the user whether the facial action they made successfully matched the target facial action data, improving user experience.
Specifically, the first foreground music may be set as music indicating success and the second foreground music as foreground music indicating failure, making the music more distinguishable.
The music data processing system based on facial action recognition of the present invention is based on face key point recognition technology and combines music data with the real image, realizing interaction between a person and a corresponding scene: facial movements are recognized and matched against the music data of the corresponding scene, realizing the creation of music, the playing of music and the elimination of the corresponding virtual objects in the scene, all presented to the user as animation. No assistance from external devices is needed, the implementation is simple, and user experience is improved.
Embodiment three
Combining the music data processing method based on facial action recognition of the first embodiment of the present invention with the music data processing system based on facial action recognition of the second embodiment, an illustration is given using specific game scenes.
Scene one
Melody creation: the background music plays continuously, and each facial action corresponds to a segment of foreground music one measure long. A facial action detected within the one-beat window before and after the start of each measure drives that measure to play the corresponding foreground music. For example, a series of facial actions is set, such as blinking, raising the eyebrows, frowning and wrinkling the nose; the person's facial actions are then recognized, and different music is played according to which facial action is recognized. The measures of music played each time form a different piece, letting the user create different music according to their own mood.
Scene two
Music game: the background music and the foreground music are pieces of music several minutes long. The background music plays continuously, and the game still works in units of measures: in each measure the game randomly generates a specified facial action, and the player must complete the specified facial action within the trigger window (one beat before and after the start of each measure) for the foreground music of that measure to keep playing. If there is more than one foreground track, tracks can also be layered on top of one another after the correct expression has been performed multiple times. Combining rhythmic music with facial movement makes the game more challenging and fun.
Scene three
AR rhythm game: the background music plays continuously, and on the screen different objects move rhythmically toward positions on the face; the player must make the corresponding expression to remove an object when it reaches its position. For example: a mosquito flies toward the eyes, and the player must blink to squash it the moment it reaches the eyes. Success (graded as perfect or common) and failure each have different sound effects, animations and scores. After a period of time, the rhythm speeds up, increasing the game's difficulty.
The image captured by the camera is used directly for facial action recognition to operate the game, and interesting animation effects can be generated in real time in combination with the real facial image. This mode of operation requires no extra equipment and can discriminate facial actions accurately in real time; and since it does not require operating with the limbs, people with limited mobility can also benefit and enjoy the fun of the game.
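As an illustration of this camera-driven operation, a minimal capture loop might look as follows; OpenCV is assumed for capture and display, and `recognize_action` / `game_step` are hypothetical stand-ins for the key point recognizer and the game logic:

```python
import cv2  # assumed; the patent names no specific capture library

def run_game(recognize_action, game_step):
    """Feed each camera frame to the facial action recognizer and let the
    recognized action (if any) drive one game step drawn over the live image."""
    cap = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            action = recognize_action(frame)   # e.g. "blink", "frown", or None
            frame = game_step(frame, action)   # overlay animations on the real image
            cv2.imshow("face game", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```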
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, which shall all be covered within the scope of the claims and the description of the invention.

Claims (8)

1. A music data processing method based on facial action recognition, characterized by comprising:
Step S1: obtaining background music data and foreground music data, the background music data and the foreground music data each being a piece of music several seconds to several minutes long;
Step S2: dividing the foreground music data into multiple measures by beat, each measure containing multiple beats;
Step S3: detecting the facial movements of a person during the foreground music time to obtain multiple pieces of facial action data during the foreground music time, each facial action corresponding to the foreground music data of one measure's time span;
Step S4: playing the background music data continuously, matching each piece of facial action data with the foreground music data of its corresponding measure, and combining the result with the background music to generate new music;
after the step S2, further comprising:
playing the background music data continuously and obtaining target facial action data, each piece of target facial action data corresponding to a unique measure of foreground music;
according to the target facial action data, obtaining the facial action data within the one-beat window before and after the start of each measure;
matching the facial action data against the target facial action data to decide whether the measure of foreground music is played:
when the facial action data matches the target facial action data, playing the measure of foreground music, the measure of foreground music being the foreground music uniquely corresponding to the target facial action data;
when the facial action data does not match the target facial action data, not playing the measure of foreground music corresponding to the facial action data.
2. The music data processing method based on facial action recognition according to claim 1, characterized in that,
after the step S2, the method further comprises:
playing the background music data continuously and obtaining virtual scene data, the virtual scene data being virtual scene data of objects moving toward positions on the person's face;
according to the virtual scene data of an object moving toward a position on the face, obtaining the corresponding facial action data, the corresponding facial action data being acquired before the moving object reaches the position on the face;
matching the facial action data against the virtual scene data to process the corresponding moving object in the virtual scene data:
when the facial action data matches the virtual scene data, removing the corresponding moving object from the virtual scene data;
when the facial action data does not match the virtual scene data, leaving the corresponding moving object in the virtual scene data untouched;
when the one-beat window after the start of a measure has passed without any facial action matching the target facial action data, removing the corresponding moving object from the virtual scene data.
3. The music data processing method based on facial action recognition according to claim 2, characterized in that,
after the corresponding moving object is removed from the virtual scene data, the method comprises:
obtaining effect data of the removal of the corresponding moving object from the virtual scene data;
according to the effect data of the removal of the corresponding moving object from the virtual scene data, evaluating the facial action data matched against the virtual scene data to obtain an evaluation result.
4. The music data processing method based on facial action recognition according to claim 1, characterized in that
facial actions are recognized through face key point recognition and fuzzy control theory.
5. The music data processing method based on facial action recognition according to claim 1, characterized by further comprising:
playing the background music data continuously and obtaining target facial action data, each piece of target facial action data corresponding to one measure of foreground music, the foreground music being divided into first foreground music and second foreground music, the first foreground music matching the background music when played, and the second foreground music not matching the background music when played;
according to the target facial action data, obtaining the facial action data within the one-beat window before and after the start of each measure;
matching the facial action data against the target facial action data to choose how the measure of foreground music corresponding to the target facial action data is played:
when the facial action data matches the target facial action data, playing the measure of foreground music corresponding to the target facial action data, the measure being the first foreground music corresponding to the target facial action data;
when the facial action data does not match the target facial action data, playing the measure of foreground music corresponding to the target facial action data, the measure being the second foreground music corresponding to the target facial action data.
6. A music data processing system based on facial action recognition, characterized by comprising:
a music data acquisition module, for obtaining background music data and foreground music data, the background music data and the foreground music data each being a piece of music several seconds to several minutes long;
a music data processing module, for dividing the foreground music data into multiple measures by beat, each measure containing multiple beats;
a facial action acquisition module, for detecting the facial movements of a person during the foreground music time to obtain multiple pieces of facial action data during the foreground music time, each facial action corresponding to the foreground music data of one measure's time span;
a music creation module, for playing the background music data continuously, matching each piece of facial action data with the foreground music data of its corresponding measure, and combining the result with the background music to generate new music;
after the music data processing module, the system further comprising a music playing selection module, configured to:
play the background music data continuously and obtain target facial action data, each piece of target facial action data corresponding to a unique measure of foreground music;
according to the target facial action data, obtain the facial action data within the one-beat window before and after the start of each measure;
match the facial action data against the target facial action data to decide whether the measure of foreground music is played:
when the facial action data matches the target facial action data, play the measure of foreground music, the measure of foreground music being the foreground music uniquely corresponding to the target facial action data;
when the facial action data does not match the target facial action data, do not play the measure of foreground music corresponding to the facial action data.
7. The music data processing system based on facial action recognition according to claim 6, characterized in that,
after the music data processing module, the system further comprises a music virtual scene module, configured to:
play the background music data continuously and obtain virtual scene data, the virtual scene data being virtual scene data of objects moving toward positions on the person's face;
according to the virtual scene data of an object moving toward a position on the face, obtain the corresponding facial action data, the corresponding facial action data being acquired before the moving object reaches the position on the face;
match the facial action data against the virtual scene data to process the corresponding moving object in the virtual scene data:
when the facial action data matches the virtual scene data, remove the corresponding moving object from the virtual scene data;
when the facial action data does not match the virtual scene data, leave the corresponding moving object in the virtual scene data untouched;
when the one-beat window after the start of a measure has passed without any facial action matching the target facial action data, remove the corresponding moving object from the virtual scene data.
8. The music data processing system based on facial action recognition according to claim 7, characterized in that
the music virtual scene module includes an effect assessment submodule; after the corresponding moving object is removed from the virtual scene data, the effect assessment submodule is configured to:
obtain effect data of the removal of the corresponding moving object from the virtual scene data;
according to the effect data of the removal of the corresponding moving object from the virtual scene data, evaluate the facial action data matched against the virtual scene data to obtain an evaluation result.
CN201610912440.3A 2016-10-19 2016-10-19 Music data processing method and system based on facial action identification Active CN106503127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610912440.3A CN106503127B (en) 2016-10-19 2016-10-19 Music data processing method and system based on facial action identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610912440.3A CN106503127B (en) 2016-10-19 2016-10-19 Music data processing method and system based on facial action identification

Publications (2)

Publication Number Publication Date
CN106503127A CN106503127A (en) 2017-03-15
CN106503127B true CN106503127B (en) 2019-09-27

Family

ID=58294244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610912440.3A Active CN106503127B (en) 2016-10-19 2016-10-19 Music data processing method and system based on facial action identification

Country Status (1)

Country Link
CN (1) CN106503127B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108905193B (en) * 2018-07-03 2022-04-15 百度在线网络技术(北京)有限公司 Game manipulation processing method, device and storage medium
CN109343770B (en) * 2018-09-27 2021-07-20 腾讯科技(深圳)有限公司 Interactive feedback method, apparatus and recording medium
CN110047520B (en) * 2019-03-19 2021-09-17 北京字节跳动网络技术有限公司 Audio playing control method and device, electronic equipment and computer readable storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1764940A (en) * 2003-03-31 2006-04-26 索尼株式会社 Tempo analysis device and tempo analysis method
CN101836219A (en) * 2007-11-01 2010-09-15 索尼爱立信移动通讯有限公司 Generating music playlist based on facial expression
CN102640149A (en) * 2009-12-04 2012-08-15 索尼计算机娱乐公司 Music recommendation system, information processing device, and information processing method
CN102880388A (en) * 2012-09-06 2013-01-16 北京天宇朗通通信设备股份有限公司 Music processing method, music processing device and mobile terminal
CN102929476A (en) * 2012-09-06 2013-02-13 北京天宇朗通通信设备股份有限公司 Method and device for controlling main menu of terminal
CN103383694A (en) * 2012-12-14 2013-11-06 李博文 Method and system for organizing, managing and marking music document
CN105518783A (en) * 2013-08-19 2016-04-20 谷歌公司 Content-based video segmentation
CN104851435A (en) * 2015-06-06 2015-08-19 孔霞 Music intelligent playing method based on network communication

Also Published As

Publication number Publication date
CN106503127A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
JP7041763B2 (en) Technology for controlling a virtual image generation system using the user's emotional state
CN102822869B (en) Capture view and the motion of the performer performed in the scene for generating
Collins Playing with sound: a theory of interacting with sound and music in video games
CN102473320B (en) Bringing a visual representation to life via learned input from the user
Miller Playable bodies: dance games and intimate media
JP2019532374A5 (en)
TWI377055B (en) Interactive rehabilitation method and system for upper and lower extremities
CN106503127B (en) Music data processing method and system based on facial action identification
CN102129343A (en) Directed performance in motion capture system
JP2018011201A (en) Information processing apparatus, information processing method, and program
WO2009021124A2 (en) System and method for a motion sensing amusement device
Augello et al. Creative robot dance with variational encoder
Bishko Animation principles and Laban movement analysis: movement frameworks for creating empathic character performances
Guzman-Sanchez Underground Dance Masters: final history of a forgotten era
JP5399966B2 (en) GAME DEVICE, GAME DEVICE CONTROL METHOD, AND PROGRAM
JP2014023745A (en) Dance teaching device
Dower et al. Performing for motion capture: A guide for practitioners
CN114253393A (en) Information processing apparatus, terminal, method, and computer-readable recording medium
Miura et al. GoalBaural: A training application for goalball-related aural sense
Hugill et al. Audio only computer games–Papa Sangre
Tinwell et al. Survival horror games-an uncanny modality
Flash et al. The adventures of Grandmaster Flash: My life, my beats
TW201112186A Each basic pose corresponds to a musical note and is stored in a music mapping device for creating various music performances according to the motion of the kid, thereby helping the kid to create his/her own favorite music
Winchell Ventriloquism for fun and profit
Larsson Discerning emotion through movement: A study of body language in portraying emotion in animation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant