CN107509105A - Audio content filtering system and method - Google Patents
Audio content filtering system and method
- Publication number
- CN107509105A CN107509105A CN201710680396.2A CN201710680396A CN107509105A CN 107509105 A CN107509105 A CN 107509105A CN 201710680396 A CN201710680396 A CN 201710680396A CN 107509105 A CN107509105 A CN 107509105A
- Authority
- CN
- China
- Prior art keywords
- audio content
- doubtful
- audio
- sound
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/4508—Management of client data or end-user data
- H04N21/4532—Management of client data or end-user data involving end-user characteristics, e.g. viewer profile, preferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/454—Content or additional data filtering, e.g. blocking advertisements
- H04N21/4542—Blocking scenes or portions of the received content, e.g. censoring scenes
Abstract
The present invention relates to the field of video display technology, and in particular to an audio content filtering system and method. The system includes: an audio frame database, which performs cluster analysis on the target audio content to be played to obtain audio frame classes and stores each audio frame with its corresponding audio frame class; an emotion monitoring module, which monitors the viewer's emotion in real time while the target audio content plays, obtains suspected audio content when the viewer's emotion is the target emotion, and updates the suspected-audio-content record according to the suspected audio content; and a playback module, which, when the trigger count of a suspected-audio-content record exceeds a preset observation threshold, filters from the target audio content the audio frame class corresponding to that record's sound features. By monitoring the viewer's emotion in real time through the emotion monitoring module, actively learning the sound features of audio content that makes the viewer uncomfortable, and dynamically updating the suspected-audio-content records, audio content that may make the viewer uncomfortable is skipped before it is ever played, achieving truly intelligent audio content filtering.
Description
Technical field
The present invention relates to the field of intelligent playback technology, and in particular to an audio content filtering system and method.
Background technology
Many people enjoy watching horror films, which contain not only frightening scenes but also frightening sounds. When the most frightening scene or sound is played, some viewers may lack the courage to keep watching. Simply closing the video clearly defeats the viewer's intent, yet accurately skipping the video segments that contain frightening sounds has remained a difficult problem.
In addition, videos that are not horror films may still contain sounds that are frightening or distressing, especially to children. Because children cannot express themselves well and cannot properly operate video playback equipment, they are unable to screen audio content on their own.
The invention patent application with publication number CN 106454490 A, published on 22 February 2017, discloses a method and device for intelligently playing video. The method includes: monitoring an image of the user in real time; analyzing the user's body parts from the image using a preset human-body-image algorithm, the body parts including any one or more of the head, eyes, and hands; judging whether a hand covers the eyes; fast-forwarding the playing video while the hand covers the eyes; and, when a preset condition for resuming normal playback is met, stopping the fast-forward state and returning to normal playback of the video. When the user watches a video of a particular type, the method accurately understands the user's intent and fast-forwards the video automatically and intelligently, improving the user's viewing experience.
However, that technical scheme has the following shortcomings:
1. The scheme decides whether to fast-forward the playing video according to whether the user covers their eyes. Once the user has already covered their eyes with a hand, fast-forwarding the current content is of little benefit to the user.
2. Different users react differently to video that makes them uncomfortable: some users may scream, some may cover their eyes with their hands, and for some the reaction may be nothing more than a facial expression. Triggering the fast-forward operation only by the eye-covering action renders the scheme or device ineffective for users with other types of reactions.
3. The fast-forward operation is triggered only after the user has already watched the distressing video and covered their eyes with a hand; the scheme cannot predict in advance which audio content will cause the user discomfort.
Summary of the invention
To solve the above technical problems, the present invention proposes an audio content filtering system, characterized by comprising: an audio frame database, a playback module, and an emotion monitoring module;
the audio frame database is used to perform cluster analysis on the target audio content to be played, obtain audio frame classes, and store each audio frame with its corresponding audio frame class;
the emotion monitoring module is used to monitor the viewer's emotion in real time while the target audio content plays, obtain suspected audio content when the viewer's emotion is the target emotion, and update the suspected-audio-content record according to the suspected audio content, the record including the sound features and the trigger count of the suspected audio content;
the playback module is used to filter, when the trigger count of a suspected-audio-content record exceeds a preset observation threshold, the audio frame class corresponding to that record's sound features from the target audio content.
In the above technical scheme, the emotion monitoring module monitors the viewer's emotion in real time, actively learns the sound features of audio content that makes the viewer uncomfortable, and dynamically updates the suspected-audio-content records, so that audio content that may make the viewer uncomfortable is skipped before it is played, achieving truly intelligent audio content filtering.
Preferably, the emotion monitoring module includes a feature capture unit that collects the viewer's facial features and sound features, and the emotion monitoring module monitors the viewer's emotion according to the facial features and the sound features.
Preferably, the target emotion is a viewer emotion whose facial features and/or sound features meet preset emotion criteria.
Preferably, the sound features of the suspected audio content are the averages of the sound features of each audio frame of the suspected audio content; the pitch and loudness of the suspected audio content fall within preset pitch and loudness ranges.
Preferably, the suspected audio content is the one or more audio frames played by the playback module during the period from a moment before the target emotion is detected to a moment after the target emotion is detected.
The present invention also provides an audio content filtering method, characterized by comprising:
Step S-1: perform cluster analysis on the target audio content to be played, obtain audio frame classes, and store each audio frame with its corresponding audio frame class in an audio database;
Step S-2: monitor the viewer's emotion in real time while the target audio content plays, and obtain suspected audio content when the viewer's emotion is the target emotion;
Step S-3: update the suspected-audio-content record according to the suspected audio content, the record including the sound features and the trigger count of the suspected audio content;
Step S-4: when the trigger count of a suspected-audio-content record exceeds a preset observation threshold, filter from the target audio content the audio frame class corresponding to that record's sound features.
In the above technical scheme, the viewer's emotion is monitored in real time, the sound features of audio content that makes the viewer uncomfortable are actively learned, and the suspected-audio-content records are dynamically updated, so that audio content that may make the viewer uncomfortable is skipped before it is played, achieving truly intelligent audio content filtering.
Preferably, the viewer's emotion is monitored according to the viewer's collected facial features and sound data.
Preferably, the target emotion is a viewer emotion whose facial features and/or sound features meet preset emotion criteria.
Preferably, the sound features of the suspected audio content are the averages of the sound features of each audio frame of the suspected audio content; the pitch and loudness of the suspected audio content fall within preset pitch and loudness ranges.
Preferably, the suspected audio content is the one or more audio frames played by the playback module during the period from a moment before the target emotion is detected to a moment after the target emotion is detected.
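Steps S-1 to S-4 can be summarized as a single control loop. The sketch below is a hypothetical Python outline introduced for illustration, not the patent's implementation: the per-frame feature tuples, the `frame_class` mapping produced by clustering, and the emotion-detection and playback callbacks are all assumptions.

```python
from collections import defaultdict

OBSERVATION_THRESHOLD = 3  # preset observation count ("three" in the embodiment)

def filtering_loop(frames, frame_class, detect_target_emotion, play, skip):
    """Sketch of steps S-1..S-4.

    frames                -- list of (pitch, loudness) per audio frame (S-1 features)
    frame_class           -- dict: frame index -> audio frame class (from clustering)
    detect_target_emotion -- callable(i) -> bool, True if the viewer shows the
                             target emotion while frame i plays (step S-2)
    play / skip           -- playback-module callbacks taking a frame index (S-4)
    """
    observations = defaultdict(int)   # sound features -> trigger count (step S-3)
    filtered_classes = set()          # confirmed filtering targets

    for i, feats in enumerate(frames):
        if frame_class[i] in filtered_classes:
            skip(i)                   # step S-4: filter the matching frame class
            continue
        play(i)
        if detect_target_emotion(i):  # step S-2: target emotion triggered
            key = (round(feats[0]), round(feats[1]))  # coarsened sound features
            observations[key] += 1    # step S-3: update trigger count
            if observations[key] > OBSERVATION_THRESHOLD:
                filtered_classes.add(frame_class[i])
    return filtered_classes
```

The threshold of three observations follows the embodiment; coarsening the features with `round` is an assumed stand-in for the patent's "same sound features" test.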
Brief description of the drawings
Fig. 1 is a structural diagram of the system according to an embodiment of the present invention;
Fig. 2 is a flowchart of the method according to an embodiment of the present invention.
Detailed description of the embodiments
The following specific embodiments merely explain the present invention and do not limit it. After reading this specification, those skilled in the art may, as needed, make modifications to the embodiments that involve no creative contribution, and all such modifications are protected by patent law within the scope of the present claims.
Embodiment one
As shown in Fig. 1, an audio content filtering system includes: an audio frame database, a filter database, a playback module, and an emotion monitoring module. The emotion monitoring module monitors in real time the emotion of the viewer (the viewer emotion) while the target audio content (audio content that contains frightening sounds) plays. When it detects an emotion indicating that the viewer is uncomfortable (the target emotion, e.g. fear or fright), it triggers active learning on the audio content that produced the target emotion: the features of that audio content are extracted, and the suspected-audio-content record is updated according to the extracted audio content. When the trigger count of a suspected-audio-content record exceeds the preset observation threshold, the playback module filters from the target audio content the audio frame class corresponding to that record's sound features. Specifically, when the trigger count exceeds the threshold, the record's sound features are stored in the filter database. The playback module may be an audio playback device, or a video playback device with audio playback capability. While playing the target audio content, the playback module uses the sound features in the filter database to determine which specific audio content needs to be skipped, and performs the operation of skipping that content. By monitoring the viewer's emotion in real time through the emotion monitoring module, actively learning the sound features of audio content that makes the viewer uncomfortable, and dynamically updating the filter database, audio content whose features match the filter database is skipped before it is played and before it can distress the viewer, achieving truly intelligent audio content filtering.
Specifically, in this embodiment specific target audio content is labeled according to the sound features of the audio. Sound has three characteristics: pitch, loudness (i.e. volume), and timbre. Pitch is related to the frequency of the sound: the higher the frequency, the higher the pitch; frequency is measured in Hz (hertz). Loudness is related to the amplitude of the sound: the larger the amplitude, the greater the loudness; loudness is measured in dB (decibels). Timbre is related to the sounding material; differences in the sound waveform mainly account for differences in timbre. Any one or more of the pitch, loudness, and timbre features may be selected to label different sounds. In this embodiment, the pitch and loudness features are used to label audio content containing frightening sounds.
In this embodiment, the audio frame database stores the key features of each frame of audio data (each audio frame) of the target audio content: the pitch feature and the loudness feature. An index is built for each frame, saving the values of its pitch feature and loudness feature; for example, the Nth frame of audio data corresponds to [pitch N, loudness N]. All created indexes are stored in the audio frame database in the playback order of the target audio content. Taking the Nth entry in the audio frame database as an example, its content is the Nth audio frame with [pitch N, loudness N]. Cluster analysis is then performed on the audio frames in the audio frame database to obtain the audio frame class corresponding to each audio frame.
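As a concrete illustration of the per-frame [pitch N, loudness N] features and the cluster analysis, the sketch below estimates pitch from the zero-crossing rate and loudness as an RMS level in dB, then groups frames on a feature grid. Both the zero-crossing pitch estimate and the grid grouping are simple stand-ins chosen for illustration; the patent specifies neither the feature extractor nor the clustering algorithm.

```python
import math

def frame_features(samples, sample_rate):
    """Per-frame key features as in the embodiment: (pitch, loudness).

    Pitch is estimated from the zero-crossing rate (a crude stand-in for a
    real pitch tracker), in Hz; loudness is the RMS level in dB.
    """
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    duration = len(samples) / sample_rate
    pitch = crossings / (2 * duration)  # two zero crossings per cycle
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    loudness = 20 * math.log10(rms) if rms > 0 else float("-inf")
    return pitch, loudness

def cluster_frames(features, pitch_step=50.0, loudness_step=6.0):
    """Grid 'clustering': frames whose pitch and loudness fall into the same
    cell share an audio frame class (a simple stand-in for the patent's
    unspecified cluster analysis)."""
    return [(int(p // pitch_step), int(l // loudness_step)) for p, l in features]
```

A 440 Hz sine of unit amplitude, for example, yields a pitch estimate near 440 Hz and a loudness near -3 dB, and frames with nearby features land in the same class.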
The emotion monitoring module includes: a feature capture unit, an audio analysis unit, and an observation database.
The feature capture unit collects the viewer's facial features and sound features in real time while the target audio content plays. Specifically, it includes an image recording device for collecting the viewer's facial features in real time and a sound recording device for collecting the viewer's sound features in real time; the image recording device and the sound recording device may be separate devices, or a single multifunctional device combining image recording and sound recording. The emotion monitoring module monitors the viewer's emotion according to the facial features and sound features gathered by the feature capture unit and, when the viewer's emotion is the target emotion, enables the audio analysis unit to analyze the suspected audio content that triggered the target emotion. The emotion monitoring module contains a database mapping facial features and sound features to the target emotion (fear in this embodiment), for example a group of facial features corresponding to fear and a group of sound features corresponding to fear. The module compares the collected facial features and sound features against the database corresponding to the target emotion to judge whether the viewer's current emotion is the target emotion (here, whether a fear emotion has been triggered). Specifically, this embodiment takes judging whether the viewer's emotion is fear as an example. If the emotion analysis unit judges that the viewer's facial features meet the fear criteria (match the fear database), the viewer's fear level is marked 1; if it judges that the viewer's sound features meet the fear criteria, the fear level is marked 2; if it judges that the facial features and sound features both meet the fear criteria, the fear level is marked 3; if it judges that neither the facial features nor the sound features meet the fear criteria, the fear level is marked 0. A non-zero fear level indicates that viewer fear has been detected. If the emotion monitoring module judges that the viewer's emotion is not fear, the audio content currently played by the playback module causes the viewer no discomfort and is suitable; if it judges that the viewer's emotion is fear, the audio content currently played may have caused the viewer discomfort and triggered the viewer's target emotion (that is, frightened the viewer), so the current audio content may be unsuitable and may need to be suppressed (skipped during playback).
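The 0-3 fear-level grading described above maps directly onto a small function. This hypothetical sketch assumes boolean match results against the fear database:

```python
def fear_level(face_matches: bool, voice_matches: bool) -> int:
    """Fear grading used in the embodiment:
    0 = neither facial nor sound features match the fear database,
    1 = facial features match, 2 = sound features match, 3 = both match."""
    if face_matches and voice_matches:
        return 3
    if voice_matches:
        return 2
    if face_matches:
        return 1
    return 0

def target_emotion_detected(face_matches: bool, voice_matches: bool) -> bool:
    # Any non-zero fear level counts as a detected target emotion.
    return fear_level(face_matches, voice_matches) != 0
```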
To guarantee the viewer's viewing effect and viewing experience and reduce the system's false-positive rate, this embodiment employs a verification mechanism: the flagged audio content is first treated as suspected audio content and subjected to observational learning. For example, an observation database stores the suspected-audio-content records. The pitch and loudness of suspected audio content fall within preset pitch and loudness ranges. The sound features of suspected audio content are the averages of the sound features of each of its audio frames. A suspected-audio-content record includes the sound features and the trigger count of the suspected audio content, where the trigger count is the number of times the suspected audio content has triggered the target emotion.
After the emotion monitoring module detects that the currently playing audio content has triggered the viewer's target emotion, it enables the audio analysis unit, which first extracts the suspected audio content. The extraction accounts for the delay between the broadcast of a sound and the viewer's corresponding emotional response: the audio analysis unit collects the one or more audio frames that the playback module played between a moment before the current playback time (the playback time when the target emotion was detected), for example 2 seconds before, and a moment after the target emotion was detected, for example 2 seconds after. The audio analysis unit then analyzes the extracted suspected audio content. Specifically, it builds an index array for each collected audio frame; assuming 5 frames per second, the 4 seconds of suspected audio content contain 20 audio frames. (Given the continuity of sound, abrupt changes are unlikely.) The audio analysis unit computes the pitch and loudness of each collected audio frame, then averages the pitch and loudness across the 20 frames to obtain the pitch average and loudness average of the suspected audio content's audio data. These averages serve as the sound features of the suspected audio content, and the observation database is updated: if a record with the same sound features already exists in the observation database, its trigger count is incremented by 1; if no record with the same sound features exists, a new suspected-audio-content record is added, its sound features are set to those of the suspected audio content, and its trigger count is set to 1. When a record's trigger count in the observation database exceeds the preset observation threshold (three in this embodiment), the corresponding suspected audio content becomes a confirmed filtering target, and the filter range corresponding to the record's sound features is added to the filter database: assuming the record's sound features are pitch a and loudness b, with pitch error range c and loudness error range d, the audio frame class corresponding to the record is the set of audio frames whose sound features lie in the pitch range (a - c, a + c) and the loudness range (b - d, b + d).
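The observation-database update and the promotion of a record into the filter database can be sketched as follows. The threshold of three observations comes from the embodiment; the error ranges c and d (`PITCH_ERROR` and `LOUDNESS_ERROR` below) are left unspecified by the patent, so their values here are assumptions.

```python
OBSERVATION_THRESHOLD = 3   # preset observation count ("three" in this embodiment)
PITCH_ERROR = 10.0          # error range c, in Hz (assumed value)
LOUDNESS_ERROR = 3.0        # error range d, in dB (assumed value)

def update_observation_db(observations, filter_db, suspected_frames):
    """observations: dict mapping sound features -> trigger count.
    filter_db: set of ((pitch_lo, pitch_hi), (loudness_lo, loudness_hi)) ranges.
    suspected_frames: (pitch, loudness) of the ~20 frames captured around the
    moment the target emotion was detected."""
    n = len(suspected_frames)
    pitch_avg = sum(p for p, _ in suspected_frames) / n
    loudness_avg = sum(l for _, l in suspected_frames) / n
    key = (round(pitch_avg, 1), round(loudness_avg, 1))  # "same sound features"
    observations[key] = observations.get(key, 0) + 1
    if observations[key] > OBSERVATION_THRESHOLD:
        # Confirmed filtering target: store the (a-c, a+c) x (b-d, b+d) ranges.
        filter_db.add((
            (key[0] - PITCH_ERROR, key[0] + PITCH_ERROR),
            (key[1] - LOUDNESS_ERROR, key[1] + LOUDNESS_ERROR),
        ))
```

After four detections with the same averaged features, the fourth call pushes the count past the threshold and the filter range is recorded.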
When playing audio content, the playback module automatically skips the audio frame classes corresponding to the suspected-audio-content records.
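Given filter ranges of the form (a - c, a + c) x (b - d, b + d), the playback module's skip decision reduces to a range check. A minimal sketch, assuming the filter database is a set of ((pitch_lo, pitch_hi), (loudness_lo, loudness_hi)) range pairs:

```python
def should_skip(features, filter_db):
    """True if a frame's (pitch, loudness) falls inside any stored filter
    range, i.e. inside (a-c, a+c) x (b-d, b+d) for some confirmed target."""
    pitch, loudness = features
    return any(
        p_lo < pitch < p_hi and l_lo < loudness < l_hi
        for (p_lo, p_hi), (l_lo, l_hi) in filter_db
    )

def play_filtered(frames, filter_db, play):
    """Play the frames whose features match no filter range; the rest are
    skipped automatically, as the embodiment describes."""
    for i, feats in enumerate(frames):
        if not should_skip(feats, filter_db):
            play(i)
```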
The audio content filtering system of this embodiment monitors the viewer's emotion in real time through the emotion monitoring module, actively learns the sound features of audio content that makes the viewer uncomfortable, and dynamically updates the suspected-audio-content records, so that audio content that may make the viewer uncomfortable is skipped before it is played, achieving truly intelligent audio content filtering.
Embodiment two
An audio content filtering method based on the above audio content filtering system, as shown in Fig. 2, includes:
Step S-1: build the audio frame database, which stores the pitch feature and loudness feature of each audio frame of the audio content. An index is built for each frame, saving the values of its pitch feature and loudness feature; for example, the Nth frame of audio data corresponds to [pitch N, loudness N]. All created indexes are stored in the audio frame database in the playback order of the audio content. Taking the Nth entry in the audio frame database as an example, its content is the Nth audio frame with [pitch N, loudness N]. Cluster analysis is then performed on each audio frame in the audio frame database to obtain the audio frame class corresponding to that frame.
Step S-2: while the playback module plays the target audio content, the emotion monitoring module monitors the viewer's emotion in real time and obtains suspected audio content when the viewer's emotion is the target emotion. While the target audio content plays, the emotion monitoring module collects the viewer's facial features and sound features, analyzes the viewer's emotion according to them, and obtains suspected audio content when the emotion is the target emotion.
The emotion monitoring module analyzes the viewer's emotion according to the facial features and sound features gathered in real time by the feature capture unit and, when the viewer's emotion is the target emotion, enables the audio analysis unit to analyze the suspected audio content that triggered the target emotion. The module compares the collected facial features and sound features against the database corresponding to the target emotion to judge whether the viewer's current emotion is the target emotion (here, whether a fear emotion has been triggered). Specifically, this embodiment takes judging whether the viewer's emotion is fear as an example. If the emotion monitoring module judges that the viewer's facial features meet the fear criteria (match the fear database), the viewer's fear level is marked 1; if it judges that the viewer's sound features meet the fear criteria, the fear level is marked 2; if it judges that the facial features and sound features both meet the fear criteria, the fear level is marked 3; if it judges that neither the facial features nor the sound features meet the fear criteria, the fear level is marked 0. A non-zero fear level indicates that viewer fear has been detected. If the emotion monitoring module judges that the viewer's emotion is not fear, the audio content currently played by the playback module causes the viewer no discomfort and is suitable; if it judges that the viewer's emotion is fear, the audio content currently played may have caused the viewer discomfort and triggered the viewer's target emotion (that is, frightened the viewer), so the current audio content may be unsuitable and may need to be suppressed (skipped during playback).
Step S-3, mood monitoring modular are updated in the doubtful audio in observed data storehouse according to the doubtful audio content
Hold information, the doubtful audio content data includes the sound characteristic and triggering times of the doubtful audio content.Mood monitors
The audio analysis unit of module updates observed data, the doubtful audio content letter of observed data library storage according to doubtful audio content
Breath, doubtful audio content data include the sound characteristic and triggering times of doubtful audio content.Audio analysis unit extracts first
Doubtful audio content. The extraction of doubtful audio content accounts for the fact that the audience's emotional feedback lags the playback of a sound by some time. The audio analysis unit therefore collects the one or more audio frames that the playing module played in the period from a certain moment before the target emotion was detected (for example, 2 seconds before the current time, the current time being the playback time at which the audio analysis unit detected the target emotion) to a certain moment after the target emotion was detected (for example, 2 seconds after the current time). The audio analysis unit then analyzes the extracted doubtful audio content. Specifically, it builds an index array over the audio frames of the collected doubtful audio content; assuming 5 audio frames per second, the 4-second doubtful audio content contains 20 audio frames. Since sound is continuous, abrupt changes are unlikely. The audio analysis unit computes the pitch and loudness of each collected audio frame, then averages the pitch values and the loudness values of the 20 frames to obtain the pitch average and loudness average of the audio data of the doubtful audio content. The audio analysis unit then takes the pitch average and loudness average as the sound feature of the doubtful audio content and updates the observation database: if doubtful audio content information with the same sound feature already exists in the observation database, the trigger count of that doubtful audio content information is incremented by 1; if no doubtful audio content information with the same sound feature exists, new doubtful audio content information is added to the observation database, its sound feature is set to the sound feature of this doubtful audio content, and its trigger count is set to 1.
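The averaging and observation-database update described above can be sketched as follows. This is a minimal illustration only; all names (`ObservationDB`, `DoubtfulRecord`, the frame rate, and the tolerances that decide when two sound features count as "the same") are hypothetical, since the patent does not specify an implementation.

```python
from dataclasses import dataclass

FRAMES_PER_SECOND = 5      # assumed: a 4-second window yields 20 frames
PITCH_TOLERANCE = 5.0      # assumed tolerance for "same sound feature"
LOUDNESS_TOLERANCE = 3.0   # assumed tolerance for "same sound feature"


@dataclass
class DoubtfulRecord:
    pitch_avg: float      # average pitch of the doubtful audio content
    loudness_avg: float   # average loudness of the doubtful audio content
    trigger_count: int    # times this sound feature triggered the target emotion


class ObservationDB:
    """Toy observation database holding doubtful audio content information."""

    def __init__(self):
        self.records = []

    def update(self, pitches, loudnesses):
        """Average per-frame pitch/loudness, then insert or increment a record."""
        pitch_avg = sum(pitches) / len(pitches)
        loudness_avg = sum(loudnesses) / len(loudnesses)
        for rec in self.records:
            # "same sound feature": both averages within the assumed tolerances
            if (abs(rec.pitch_avg - pitch_avg) <= PITCH_TOLERANCE
                    and abs(rec.loudness_avg - loudness_avg) <= LOUDNESS_TOLERANCE):
                rec.trigger_count += 1
                return rec
        # no match: add new doubtful audio content information with count 1
        rec = DoubtfulRecord(pitch_avg, loudness_avg, trigger_count=1)
        self.records.append(rec)
        return rec
```

Under these assumptions, repeated detections of the same sound feature accumulate in one record, which is what lets Step S-4 later compare the trigger count against the preset observation count.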
Step S-4: when the trigger count of doubtful audio content information in the updated observation database exceeds the preset observation count, the playing module filters, from the target audio content, the audio frame class corresponding to the sound feature of that doubtful audio content information. The audio frame class refers to the audio data whose sound feature matches the sound feature of the doubtful audio content information. For example, when the trigger count of doubtful audio content information in the observation database exceeds the preset observation count, a filtering sound-feature range corresponding to the sound feature of that information is added to the filter database; the trigger count is the number of times the doubtful audio content has triggered the target emotion. When the trigger count of doubtful audio content information in the observation database exceeds the preset observation count (three in this embodiment), the doubtful audio content corresponding to that information is confirmed as a filtering target, and the filtering sound-feature range corresponding to its sound feature is added to the filter database. Suppose the sound feature of the doubtful audio content information is pitch a and loudness b, the allowed pitch error is c, and the allowed loudness error is d; the audio frame class corresponding to that information then consists of the audio frames whose pitch lies in the range (a - c, a + c) and whose loudness lies in the range (b - d, b + d). When playing audio content, the playing module will then automatically skip the audio frame class corresponding to that doubtful audio content information.
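The Step S-4 rule above can be sketched as follows: once the trigger count exceeds the preset observation count (three in this embodiment), frames whose pitch lies in (a - c, a + c) and whose loudness lies in (b - d, b + d) are skipped during playback. All function names and the dictionary layout here are hypothetical illustrations, not the patent's implementation.

```python
OBSERVATION_COUNT = 3   # preset observation count from this embodiment


def filter_ranges(records, pitch_err_c, loudness_err_d):
    """Build the filter database from confirmed doubtful audio content records."""
    return [
        ((r['pitch'] - pitch_err_c, r['pitch'] + pitch_err_c),
         (r['loudness'] - loudness_err_d, r['loudness'] + loudness_err_d))
        for r in records
        if r['triggers'] > OBSERVATION_COUNT   # confirmed filtering targets only
    ]


def keep_frame(frame, ranges):
    """True if the frame matches no filtering range and should be played."""
    for (p_lo, p_hi), (l_lo, l_hi) in ranges:
        if p_lo < frame['pitch'] < p_hi and l_lo < frame['loudness'] < l_hi:
            return False   # inside a filtering range: skip this frame
    return True


def play(frames, ranges):
    """Return only the frames the playing module would actually play."""
    return [f for f in frames if keep_frame(f, ranges)]
```

In this sketch the filter database is just a list of (pitch range, loudness range) pairs; the playing module consults it per frame, so newly confirmed filtering targets take effect on the next playback without re-analyzing the target audio content.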
The audio content filtering method of this embodiment can be implemented with the audio content filtering system of Embodiment 1. By monitoring the audience's mood in real time and actively learning the sound features of audio content that makes the audience uncomfortable, the method dynamically updates the doubtful audio content information, so that audio content likely to make the audience uncomfortable is skipped before it is ever played. This achieves truly intelligent filtering of audio content.
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Claims (10)
- 1. An audio content filtering system, characterized by comprising: an audio frame database, a playing module, and a mood monitoring module; the audio frame database is configured to perform cluster analysis on target audio content to be played, obtain audio frame classes, and store each audio frame together with its corresponding audio frame class; the mood monitoring module is configured to monitor the audience's mood in real time while the target audio content is played, to obtain doubtful audio content when the audience's mood is the target emotion, and to update doubtful audio content information according to the doubtful audio content, the doubtful audio content information comprising the sound feature and trigger count of the doubtful audio content; the playing module is configured to, when the trigger count of the doubtful audio content information exceeds a preset observation count, filter from the target audio content the audio frame class corresponding to the sound feature of the doubtful audio content information.
- 2. The audio content filtering system according to claim 1, characterized in that: the mood monitoring module comprises a feature capture unit that collects the audience's facial features and sound features, and the mood monitoring module monitors the audience's mood according to the facial features and the sound features.
- 3. The audio content filtering system according to claim 1, characterized in that: the target emotion refers to an audience mood whose facial features and/or sound features meet a preset mood requirement.
- 4. The audio content filtering system according to claim 1, characterized in that: the sound feature of the doubtful audio content refers to the average of the sound features of the audio frames of the doubtful audio content; the sound feature of the doubtful audio content satisfies that its pitch and loudness lie within a preset pitch range and a preset loudness range.
- 5. The audio content filtering system according to claim 1, characterized in that: the doubtful audio content refers to the one or more audio frames played by the playing module in the period between a certain moment before the target emotion is detected and a certain moment after the target emotion is detected.
- 6. An audio content filtering method, characterized by comprising: step S-1, performing cluster analysis on target audio content to be played, obtaining audio frame classes, and storing each audio frame together with its corresponding audio frame class in an audio database; step S-2, monitoring the audience's mood in real time while the target audio content is played, and obtaining doubtful audio content when the audience's mood is the target emotion; step S-3, updating doubtful audio content information according to the doubtful audio content, the doubtful audio content information comprising the sound feature and trigger count of the doubtful audio content; step S-4, when the trigger count of the doubtful audio content information exceeds a preset observation count, filtering from the target audio content the audio frame class corresponding to the sound feature of the doubtful audio content information.
- 7. The audio content filtering method according to claim 6, characterized in that: the audience's mood is monitored according to the collected facial features and sound data of the audience.
- 8. The audio content filtering method according to claim 6, characterized in that: the target emotion refers to an audience mood whose facial features and/or sound features meet a preset mood requirement.
- 9. The audio content filtering method according to claim 6, characterized in that: the sound feature of the doubtful audio content refers to the average of the sound features of the audio frames of the doubtful audio content; the sound feature of the doubtful audio content satisfies that its pitch and loudness lie within a preset pitch range and a preset loudness range.
- 10. The audio content filtering method according to claim 6, characterized in that: the doubtful audio content refers to the one or more audio frames played by the playing module in the period between a certain moment before the target emotion is detected and a certain moment after the target emotion is detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710680396.2A CN107509105B (en) | 2017-08-10 | 2017-08-10 | Audio content filtering system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107509105A true CN107509105A (en) | 2017-12-22 |
CN107509105B CN107509105B (en) | 2020-11-27 |
Family
ID=60690603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710680396.2A Active CN107509105B (en) | 2017-08-10 | 2017-08-10 | Audio content filtering system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107509105B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103473340A (en) * | 2013-09-23 | 2013-12-25 | 江苏刻维科技信息有限公司 | Classifying method for internet multimedia contents based on video image |
US20150181291A1 (en) * | 2013-12-20 | 2015-06-25 | United Video Properties, Inc. | Methods and systems for providing ancillary content in media assets |
CN106454490A (en) * | 2016-09-21 | 2017-02-22 | 天脉聚源(北京)传媒科技有限公司 | Method and device for smartly playing video |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108848411A (en) * | 2018-08-01 | 2018-11-20 | 夏颖 | The system and method for defining program boundaries and advertisement boundary based on audio signal waveform |
CN108848411B (en) * | 2018-08-01 | 2020-09-25 | 夏颖 | System and method for defining program boundaries and advertisement boundaries based on audio signal waveforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20201103. Address after: Room 201-3, 1028 Panyu Road, Xuhui District, Shanghai 200030. Applicant after: SHANGHAI MICROPHONE CULTURE MEDIA Co.,Ltd. Address before: 201616 Shanghai city Songjiang District Sixian Road No. 3666. Applicant before: Phicomm (Shanghai) Co.,Ltd. |
GR01 | Patent grant | ||