CN108922533A - Method and apparatus for determining whether a performer is genuinely singing - Google Patents
Method and apparatus for determining whether a performer is genuinely singing
- Publication number
- CN108922533A (application number CN201810833758.1A)
- Authority
- CN
- China
- Prior art keywords
- play time
- voice audio
- lip position
- acquisition
- live video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The present disclosure relates to a method and apparatus for determining whether a performer is genuinely singing, and belongs to the field of electronic technology. The method includes: while a target live-streaming room executes a karaoke (K song) function, obtaining live video frames captured at multiple preset playback time points; determining the distance between the upper-lip position and the lower-lip position in each captured live video frame; and determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state. With the present disclosure, a computer device can automatically judge whether an anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
Description
Technical field
The present disclosure relates to the field of electronic technology, and in particular to a method and apparatus for determining whether a performer is genuinely singing.
Background
With the development of science and technology, live streaming has gradually entered people's lives. An anchor can present his or her talents to the viewers of a live-streaming application through that application, and can increase his or her income while streaming.
In some cases, in order to retain more users, a live-streaming platform imposes requirements on the anchors' daily streaming duration. For example, an anchor may be required to stream for at least 6 hours per day and, during those 6 hours, must perform for or interact with the viewers rather than leaving the stream open while doing private things. In this way, more users can watch interesting streams and are willing to stay on the platform. To meet such a requirement, some anchors play the accompaniment of songs through the karaoke plug-in embedded in the live-streaming application and, at the same time, play a pre-recorded a cappella (vocals-only) recording of those songs into the microphone, without opening their mouths to sing; they only do other things in front of the camera, such as playing with their phones. This behavior of not actually singing is a violation: viewers see no real performance and cannot interact effectively with the anchor, which harms the streaming environment of the platform. Therefore, the live-streaming platform assigns dedicated inspectors to enter each live-streaming room and check that no anchor commits the above violation.
In the process of implementing the present disclosure, the inventors found at least the following problem:
Having only a few inspectors check each live-streaming room in turn is extremely inefficient, and since the number of live-streaming rooms is huge, it is difficult to discover anchors' violations in time through manual inspection.
Summary of the invention
In order to overcome the problems in the related art, the present disclosure provides the following technical solutions:
According to a first aspect of the embodiments of the present disclosure, a method for determining whether a performer is genuinely singing is provided. The method includes:
while a target live-streaming room executes a karaoke function, obtaining live video frames captured at multiple preset playback time points;
determining the distance between the upper-lip position and the lower-lip position in each captured live video frame;
determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state includes:
determining the voice audio amplitude corresponding to each of the multiple preset playback time points;
determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state includes:
for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio;
determining the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determining that the target live-streaming room is in an anchor lip-syncing state.
Optionally, the multiple preset playback time points are the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song.
Optionally, the method further includes:
if it is determined that the target live-streaming room is in an anchor genuine-singing state, obtaining reference voice audio amplitudes corresponding to the multiple preset playback time points;
determining, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points;
for each playback time point, determining an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude;
summing the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and issuing prompt information, where the prompt information includes the total karaoke score corresponding to the target song.
According to a second aspect of the embodiments of the present disclosure, a device for determining whether a performer is genuinely singing is provided. The device includes:
a first obtaining module, configured to obtain, while a target live-streaming room executes a karaoke function, live video frames captured at multiple preset playback time points;
a first determining module, configured to determine the distance between the upper-lip position and the lower-lip position in each captured live video frame;
a second determining module, configured to determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining module includes:
a first determining unit, configured to determine the voice audio amplitude corresponding to each of the multiple preset playback time points;
a second determining unit, configured to determine, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining unit is configured to:
for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; and
determine the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determine that the target live-streaming room is in an anchor lip-syncing state.
Optionally, the multiple preset playback time points are the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song.
Optionally, the device further includes:
a second obtaining module, configured to obtain, when it is determined that the target live-streaming room is in an anchor genuine-singing state, the reference voice audio amplitudes corresponding to the multiple preset playback time points;
a third determining module, configured to determine, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points;
a fourth determining module, configured to determine, for each playback time point, an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude;
a prompt module, configured to sum the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and to issue prompt information, where the prompt information includes the total karaoke score corresponding to the target song.
According to a third aspect of the embodiments of the present disclosure, a computer device is provided. The computer device includes a processor, a communication interface, a memory, and a communication bus, where:
the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to execute the program stored in the memory, so as to implement the above method for determining whether a performer is genuinely singing.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided. A computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, it implements the above method for determining whether a performer is genuinely singing.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
With the method provided by the embodiments of the present disclosure, the live video frames captured at the multiple preset playback time points can be recognized, the distance between the upper-lip position and the lower-lip position in each live video frame can be determined, and it can then be determined whether the target live-streaming room is in an anchor lip-syncing state. In this way, a computer device can automatically judge whether the anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The drawings herein are incorporated into and form a part of this specification, show embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. In the drawings:
Fig. 1 is a schematic flowchart of a method for determining whether a performer is genuinely singing, according to an exemplary embodiment;
Fig. 2 is a schematic diagram of facial feature points, according to an exemplary embodiment;
Fig. 3 is a schematic diagram of the correspondence between voice audio amplitude and the distance between the upper and lower lips, according to an exemplary embodiment;
Fig. 4 is a schematic structural diagram of a device for determining whether a performer is genuinely singing, according to an exemplary embodiment;
Fig. 5 is a schematic structural diagram of a terminal, according to an exemplary embodiment;
Fig. 6 is a schematic structural diagram of a server, according to an exemplary embodiment.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. These drawings and the accompanying text are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the present disclosure to those skilled in the art with reference to specific embodiments.
Detailed description of embodiments
Exemplary embodiments are described in detail here, and examples thereof are illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
The embodiments of the present disclosure provide a method for determining whether a performer is genuinely singing, and the method can be implemented by a terminal or a server. The terminal may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, or the like.
The terminal may include components such as a processor and a memory. The processor may be a CPU (Central Processing Unit) or the like, and is used for processing such as determining the distance between the upper-lip position and the lower-lip position in each captured live video frame. The memory may be a RAM (Random Access Memory), a flash memory, or the like, and may be used to store received data, data needed in the processing procedure, data generated in the processing procedure, and so on, such as the multiple preset playback time points.
The terminal may further include a transceiver, an input component, a display component, an audio output component, and the like. The transceiver may be used for data transmission with the server, and may include a Bluetooth component, a WiFi (Wireless Fidelity) component, an antenna, a matching circuit, a modem, and the like. The input component may be a touch screen, a keyboard, a mouse, or the like. The audio output component may be a speaker, an earphone, or the like.
A system program and application programs may be installed in the terminal. A user uses various application programs according to his or her own needs while using the terminal, and an application program with a live-streaming function may be installed in the terminal.
The server may include components such as a processor and a memory. The processor may be a CPU or the like, and may be used for processing such as determining the distance between the upper-lip position and the lower-lip position in each captured live video frame. The memory may be a RAM, a flash memory, or the like, and may be used to store received data, data needed in the processing procedure, data generated in the processing procedure, and so on, such as the multiple preset playback time points.
The server may further include a transceiver and the like. The transceiver may be used for data transmission with the terminal, and may include a Bluetooth component, a WiFi component, an antenna, a matching circuit, a modem, and the like.
The method provided by the embodiments of the present disclosure may be executed in the terminal or in the server. If it is executed in the terminal, the various data needed in the execution procedure, such as the multiple preset playback time points corresponding to the target song, need to be downloaded from the server. If it is executed in the server, the terminal needs to upload the various data needed in the execution procedure, such as the live video frames and the voice audio, to the server. In the following embodiments, execution of the method in the terminal is taken as an example; execution in the server is similar and is not described again here.
An exemplary embodiment of the present disclosure provides a method for determining whether a performer is genuinely singing. As shown in Fig. 1, the processing flow of the method may include the following steps.
Step S110: while the target live-streaming room executes the karaoke function, obtain the live video frames captured at the multiple preset playback time points.
In implementation, an anchor streams through a live-streaming application. During streaming, an image capture component of the terminal, such as a camera, captures video data of the anchor; the captured video data may include multiple live video frames. At the same time, an audio capture component of the terminal, such as a microphone, captures environmental audio, which may include audio played into the microphone by another audio playback device, or audio generated when the anchor speaks, sings, or plays an instrument.
During streaming, if the anchor wants to perform a song, he or she can request it through the karaoke plug-in in the live-streaming application, or through another karaoke application in the terminal that is associated with the live-streaming application. After the anchor has selected a target song, the terminal receives a karaoke instruction for the target song.
After receiving the karaoke instruction for the target song, the terminal can send to the server a request for obtaining the accompaniment audio of the target song. Meanwhile, the terminal can also send to the server a request for obtaining the multiple preset playback time points corresponding to the target song. The server can obtain, from a pre-stored correspondence between songs and time series, the time series corresponding to the target song, where the time series includes the multiple preset playback time points. Optionally, the multiple preset playback time points may be the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song. In practice, the time series corresponding to each song can be annotated manually.
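Purely as an illustration (not part of the claimed subject matter), the following is a minimal sketch of how such a time series might be derived from a manually annotated note list; the (start, duration) note format used here is an assumption and is not specified by the present disclosure.

```python
# Minimal sketch (assumption): the disclosure only states that the preset playback time points
# are the midpoints of the periods occupied by the notes the anchor must sing; the
# (start_seconds, duration_seconds) note format used here is hypothetical.

def note_midpoints(notes):
    """notes: list of (start_seconds, duration_seconds) for the notes to be sung."""
    return [start + duration / 2.0 for start, duration in notes]

# Example: three annotated notes of a target song's verse.
preset_play_times = note_midpoints([(12.0, 0.5), (12.5, 0.5), (13.0, 1.0)])
# -> [12.25, 12.75, 13.5]
```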
The terminal can play the accompaniment audio of the target song received from the server. Meanwhile, according to the multiple preset playback time points of the target song received from the server, the terminal can select, from the captured video data, the live video frames captured at the multiple preset playback time points. Facial feature extraction can then be performed on the selected live video frames.
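As a sketch of the frame selection just described, and assuming that captured frames carry timestamps relative to the start of the accompaniment (an assumption; the disclosure does not fix a timestamp convention):

```python
# Minimal sketch (assumption): for each preset playback time point, pick the captured frame
# whose timestamp lies closest to it; timestamps are assumed relative to accompaniment start.

def select_frames(frames, preset_play_times):
    """frames: list of (timestamp_seconds, frame_image) tuples from the captured video data."""
    return [min(frames, key=lambda f: abs(f[0] - t)) for t in preset_play_times]
```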
Step S120: determine the distance between the upper-lip position and the lower-lip position in each captured live video frame.
In implementation, multiple live video frames can be selected from the video data through step S110, and facial features can be extracted from these live video frames. If the anchor always faces the camera, facial features can be extracted from every live video frame. If the anchor sometimes does not face the camera, facial features cannot be extracted from every live video frame.
As shown in Fig. 2, if facial features are extracted from a certain live video frame among the multiple live video frames, the positions of the upper lip and the lower lip in that image can be determined, and the distance between the upper lip and the lower lip in that image can then be determined. If facial features are not extracted from a certain live video frame, the distance between the upper lip and the lower lip can be directly set to zero.
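The disclosure does not name a particular facial feature extractor; the following sketch assumes a 68-point facial landmark model (for example, dlib's pre-trained predictor) and uses the inner-lip center points, returning zero when no face is found, as described above. The choice of library, model file, and landmark indices is an illustrative assumption.

```python
# Minimal sketch (assumption): lip distance from a 68-point facial landmark model.
# dlib and landmark indices 62 (inner upper lip) / 66 (inner lower lip) are assumptions,
# not part of the disclosure.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # hypothetical model file

def lip_distance(frame_gray):
    faces = detector(frame_gray)
    if not faces:                      # anchor not facing the camera: treat the distance as zero
        return 0.0
    landmarks = predictor(frame_gray, faces[0])
    upper, lower = landmarks.part(62), landmarks.part(66)
    return ((upper.x - lower.x) ** 2 + (upper.y - lower.y) ** 2) ** 0.5
```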
Step S130: determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
In implementation, the distance between the upper-lip position and the lower-lip position can be determined in the live video frame captured at each time point, and it can then be determined, according to the distances between the upper-lip position and the lower-lip position in the captured live video frames, whether the target live-streaming room is in an anchor lip-syncing state.
The multiple preset playback time points may be the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song. Since the target song may include parts that the anchor does not need to sing, such as the prelude, and parts that the anchor does need to sing, such as the verse and the chorus, these notes can be limited to the notes of the parts that the anchor needs to sing. Each note occupies a certain duration in the target song, and the midpoint of each note's duration can be chosen as a playback time point. At the midpoint of a note's duration a singer most needs to exert force to produce the sound, so the distance between the upper lip and the lower lip is relatively large and easy to recognize in the image. In practical applications, if the anchor never opens his or her mouth during the karaoke performance, the distance between the upper-lip position and the lower-lip position in almost every captured live video frame is zero, or smaller than a small preset distance threshold, and the anchor will therefore be determined to be lip-syncing.
In another possible implementation, step S130 may include: determining the voice audio amplitude corresponding to each of the multiple preset playback time points; and determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
In implementation, the terminal can also obtain from the server the voice audio amplitudes of the target song corresponding to the multiple preset playback time points. The terminal can also capture voice audio through the microphone and determine, from the captured voice audio, the voice audio amplitudes corresponding to the multiple preset playback time points. Then, it can be determined, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
In practical applications, the higher the voice audio amplitude, the louder the sound that is heard and the stronger the stress. To produce a louder sound, a person has to open the lips wider, so the distance between the upper lip and the lower lip becomes larger. There is therefore an inherent relationship between the voice audio amplitude and the distance between the upper and lower lips. As shown in Fig. 3, there are 8 time points, and the voice audio amplitudes corresponding to these 8 time points are shown on the left side of Fig. 3. Taking playback time point A and playback time point B as an example, it can be seen from the figure that the voice audio amplitude corresponding to playback time point A is greater than that corresponding to playback time point B; therefore, if the anchor is genuinely singing, the distance between the upper and lower lips in the live video frame captured at playback time point A is greater than that in the live video frame captured at playback time point B. Accordingly, whether the target live-streaming room is in an anchor lip-syncing state can be determined from the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame. In this way, the determination procedure is closer to reality, and the result is more accurate.
Optionally, the step of determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state may include: for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; and determining the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determining that the target live-streaming room is in an anchor lip-syncing state.
In implementation, the voice audio amplitude corresponding to the n-th playback time point can be compared with the voice audio amplitude corresponding to either the (n-1)-th or the (n+1)-th playback time point. If the voice audio amplitude corresponding to the n-th playback time point is greater than that corresponding to the (n-1)-th playback time point, then in theory the distance between the upper-lip position and the lower-lip position in the live video frame captured at the n-th playback time point should also be greater than that in the live video frame captured at the (n-1)-th playback time point. If the distance in the live video frame captured at the n-th playback time point is indeed greater than the distance in the live video frame captured at the (n-1)-th playback time point, the voice audio corresponding to the n-th playback time point is genuine-singing audio. Similarly, if the voice audio amplitude corresponding to the n-th playback time point is less than that corresponding to the (n-1)-th playback time point, then in theory the distance in the live video frame captured at the n-th playback time point should also be less than that in the live video frame captured at the (n-1)-th playback time point; if it actually is less, the voice audio corresponding to the n-th playback time point is genuine-singing audio.
Finally, the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points can be determined. If the ratio of that number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, it can be determined that the target live-streaming room is in an anchor lip-syncing state. For example, if the genuine-singing audio accounts for 70% or more of the total, it can be determined that the target live-streaming room is in an anchor genuine-singing state; otherwise, it can be determined that the target live-streaming room is in an anchor lip-syncing state. A certain threshold margin is left so that an anchor who occasionally misses the rhythm, or is affected by other objective factors, is not mistakenly judged as not genuinely singing.
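Under the assumption that each time point is compared with the immediately preceding one (the disclosure allows comparison with either adjacent point), a minimal sketch of the classification and threshold check described above could look like this:

```python
# Minimal sketch (assumption): a time point counts as genuine-singing audio when the amplitude
# trend and the lip-distance trend relative to the previous time point agree (greater/less/equal
# in the same direction); the 70% threshold follows the example above. The denominator here is
# the number of compared points, a simplification of "the total number of the voice audio".

def is_lip_syncing(amplitudes, lip_distances, genuine_ratio_threshold=0.7):
    """amplitudes[i] and lip_distances[i] belong to the i-th preset playback time point."""
    def trend(a, b):
        return (a > b) - (a < b)       # +1, 0, or -1

    genuine = sum(
        1
        for n in range(1, len(amplitudes))
        if trend(amplitudes[n], amplitudes[n - 1])
        == trend(lip_distances[n], lip_distances[n - 1])
    )
    ratio = genuine / max(len(amplitudes) - 1, 1)
    return ratio < genuine_ratio_threshold   # below threshold -> anchor lip-syncing state
```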
Alternatively, if it is determined that the target live-streaming room is in an anchor lip-syncing state, first prompt information can be issued.
In implementation, if it is determined that the target live-streaming room is in an anchor lip-syncing state, the terminal can generate first prompt information, send it to the cache corresponding to the display interface, and display it on the display interface, so as to indicate that the anchor is currently not genuinely singing and to correct the anchor's violation in time. Meanwhile, the terminal can also send the first prompt information to the server, and the server then pushes it to the account that an inspector has logged in to; after receiving the prompt information, the inspector can enter the live-streaming room corresponding to the first prompt information to check it and take further action.
If the method provided in this embodiment is executed in the server, then after it is determined that the target live-streaming room is in an anchor lip-syncing state, the first prompt information can be generated and pushed to the account that the inspector has logged in to. Meanwhile, the server can also push the first prompt information to the account that the corresponding anchor has logged in to, so as to indicate that the anchor is currently not genuinely singing and to correct the anchor's violation in time.
Optionally, the method provided by the embodiments of the present disclosure may further include: if it is determined that the target live-streaming room is in an anchor genuine-singing state, obtaining the reference voice audio amplitudes corresponding to the multiple preset playback time points; determining, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points; for each playback time point, determining an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude; and summing the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and issuing second prompt information, where the second prompt information includes the total karaoke score corresponding to the target song.
In implementation, if it is determined that the target live-streaming room is in an anchor genuine-singing state, the terminal can obtain from the server the reference voice audio amplitudes of the target song corresponding to the multiple preset playback time points, where a reference voice audio amplitude can be the amplitude at which the target song should be sung at the corresponding time point. Then, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points can be determined from the genuine-singing voice audio captured by the microphone. For each time point, the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude are compared to determine an intermediate karaoke score. For example, if the genuine-singing voice audio amplitude corresponding to a time point is greater than or equal to the corresponding reference voice audio amplitude, 1 point is added; otherwise no point is scored. Finally, the intermediate karaoke scores corresponding to the time points are summed to obtain the total karaoke score corresponding to the target song, and second prompt information can then be issued.
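Following the example above (one point whenever the sung amplitude reaches the reference amplitude; this scoring rule is only the example given, not the only possible one), a sketch of the total karaoke score:

```python
# Minimal sketch: sum one intermediate point per preset playback time point at which the
# captured genuine-singing amplitude is at least the reference amplitude.

def karaoke_total_score(reference_amps, sung_amps):
    """reference_amps[i] and sung_amps[i] belong to the i-th preset playback time point."""
    return sum(1 for ref, sung in zip(reference_amps, sung_amps) if sung >= ref)
```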
If the method provided by the embodiments of the present disclosure is executed in the terminal, the second prompt information can be generated, sent to the cache corresponding to the display interface, and displayed on the display interface, so as to show the anchor how many points the song currently being sung can obtain, for reference. If the method is executed in the server, the second prompt information can be generated and pushed to the account that the corresponding anchor has logged in to, so as to show the anchor how many points the song currently being sung can obtain, for reference.
With the method provided by the embodiments of the present disclosure, the live video frames captured at the multiple preset playback time points can be recognized, the distance between the upper-lip position and the lower-lip position in each live video frame can be determined, and it can then be determined whether the target live-streaming room is in an anchor lip-syncing state. In this way, a computer device can automatically judge whether the anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
Another exemplary embodiment of the present disclosure provides a device for determining whether a performer is genuinely singing. As shown in Fig. 4, the device includes:
a first obtaining module 410, configured to obtain, while a target live-streaming room executes a karaoke function, the live video frames captured at multiple preset playback time points;
a first determining module 420, configured to determine the distance between the upper-lip position and the lower-lip position in each captured live video frame;
a second determining module 430, configured to determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining module 430 includes:
a first determining unit, configured to determine the voice audio amplitude corresponding to each of the multiple preset playback time points;
a second determining unit, configured to determine, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining unit is configured to:
for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; and
determine the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determine that the target live-streaming room is in an anchor lip-syncing state.
Optionally, the multiple preset playback time points are the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song.
Optionally, the device further includes:
a second obtaining module, configured to obtain, when it is determined that the target live-streaming room is in an anchor genuine-singing state, the reference voice audio amplitudes corresponding to the multiple preset playback time points;
a third determining module, configured to determine, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points;
a fourth determining module, configured to determine, for each playback time point, an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude;
a prompt module, configured to sum the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and to issue prompt information, where the prompt information includes the total karaoke score corresponding to the target song.
With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiment of the related method, and is not described in detail here.
With the device provided by the embodiments of the present disclosure, the live video frames captured at the multiple preset playback time points can be recognized, the distance between the upper-lip position and the lower-lip position in each live video frame can be determined, and it can then be determined whether the target live-streaming room is in an anchor lip-syncing state. In this way, a computer device can automatically judge whether the anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
It should be noted that when the device for determining whether a performer is genuinely singing provided by the above embodiment makes this determination, the division into the above functional modules is merely an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for determining whether a performer is genuinely singing provided by the above embodiment and the embodiment of the method for determining whether a performer is genuinely singing belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.
Fig. 5 shows a schematic structural diagram of a terminal 1800 provided by an exemplary embodiment of the present disclosure. The terminal 1800 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1800 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or another name.
Generally, the terminal 1800 includes a processor 1801 and a memory 1802.
The processor 1801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1801 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1801 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1801 may further include an AI (Artificial Intelligence) processor, which is used to handle computing operations related to machine learning.
The memory 1802 may include one or more computer-readable storage media, which may be non-transient. The memory 1802 may also include a high-speed random access memory and a non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transient computer-readable storage medium in the memory 1802 is used to store at least one instruction, which is executed by the processor 1801 to implement the method for determining whether a performer is genuinely singing provided by the method embodiments of the present application.
In some embodiments, the terminal 1800 optionally further includes a peripheral device interface 1803 and at least one peripheral device. The processor 1801, the memory 1802, and the peripheral device interface 1803 may be connected through a bus or a signal line. Each peripheral device may be connected to the peripheral device interface 1803 through a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1804, a touch display screen 1805, a camera 1806, an audio circuit 1807, a positioning component 1808, and a power supply 1809.
The peripheral device interface 1803 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1801 and the memory 1802. In some embodiments, the processor 1801, the memory 1802, and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1801, the memory 1802, and the peripheral device interface 1803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1804 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1804 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 1804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 1804 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: the World Wide Web, a metropolitan area network, an intranet, mobile communication networks of each generation (2G, 3G, 4G, and 5G), a wireless local area network, and/or a WiFi (Wireless Fidelity) network. In some embodiments, the radio frequency circuit 1804 may also include an NFC (Near Field Communication) related circuit, which is not limited in this application.
The display screen 1805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1805 is a touch display screen, the display screen 1805 also has the ability to capture touch signals on or above its surface. The touch signals may be input to the processor 1801 as control signals for processing. At this time, the display screen 1805 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1805, arranged on the front panel of the terminal 1800; in some other embodiments, there may be at least two display screens 1805, respectively arranged on different surfaces of the terminal 1800 or in a folded design; in still other embodiments, the display screen 1805 may be a flexible display screen, arranged on a curved or folded surface of the terminal 1800. The display screen 1805 may even be arranged in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 1805 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1806 is used to capture images or video. Optionally, the camera assembly 1806 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function through fusion of the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting functions through fusion of the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 1806 may further include a flash lamp. The flash lamp may be a single-color-temperature flash lamp or a dual-color-temperature flash lamp. A dual-color-temperature flash lamp refers to a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 1807 may include a microphone and a speaker. The microphone is used to capture sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1801 for processing, or input them to the radio frequency circuit 1804 to realize voice communication. For stereo capture or noise reduction purposes, there may be multiple microphones, respectively arranged at different parts of the terminal 1800. The microphone may also be an array microphone or an omnidirectional capture microphone. The speaker is used to convert electrical signals from the processor 1801 or the radio frequency circuit 1804 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1807 may further include a headphone jack.
The positioning component 1808 is used to locate the current geographic position of the terminal 1800 to realize navigation or LBS (Location Based Service). The positioning component 1808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
The power supply 1809 is used to supply power to the various components in the terminal 1800. The power supply 1809 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1809 includes a rechargeable battery, the rechargeable battery may be a wired charging battery or a wireless charging battery. A wired charging battery is charged through a wired line, and a wireless charging battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 1800 further includes one or more sensors 1810. The one or more sensors 1810 include, but are not limited to, an acceleration sensor 1811, a gyroscope sensor 1812, a pressure sensor 1813, a fingerprint sensor 1814, an optical sensor 1815, and a proximity sensor 1816.
The acceleration sensor 1811 can detect the magnitude of acceleration along the three coordinate axes of the coordinate system established with the terminal 1800. For example, the acceleration sensor 1811 can be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 1801 can control the touch display screen 1805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1811. The acceleration sensor 1811 can also be used to acquire game or user motion data.
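Purely as an illustration of the landscape/portrait decision described above (it is not part of the claimed method), the choice can be sketched as a comparison of the gravity components; the axis convention and the bare magnitude comparison are assumptions of this sketch.

```python
def choose_orientation(gx, gy):
    """Pick a UI orientation from the gravity components (m/s^2) along the
    device's x (short edge) and y (long edge) axes."""
    # Gravity mostly along the long edge means the device is held upright.
    return "portrait" if abs(gy) >= abs(gx) else "landscape"

print(choose_orientation(gx=0.8, gy=9.7))  # portrait
```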
The gyroscope sensor 1812 can detect the body direction and rotation angle of the terminal 1800, and can cooperate with the acceleration sensor 1811 to acquire the user's 3D motion on the terminal 1800. Based on the data acquired by the gyroscope sensor 1812, the processor 1801 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1813 may be arranged on the side frame of the terminal 1800 and/or beneath the touch display screen 1805. When the pressure sensor 1813 is arranged on the side frame of the terminal 1800, the user's grip signal on the terminal 1800 can be detected, and the processor 1801 performs left/right-hand recognition or shortcut operations according to the grip signal acquired by the pressure sensor 1813. When the pressure sensor 1813 is arranged beneath the touch display screen 1805, the processor 1801 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1805. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1814 is used to acquire the user's fingerprint. The processor 1801 identifies the user's identity according to the fingerprint acquired by the fingerprint sensor 1814, or the fingerprint sensor 1814 identifies the user's identity according to the acquired fingerprint. When the user's identity is identified as a trusted identity, the processor 1801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1814 may be arranged on the front, back, or side of the terminal 1800. When a physical button or a manufacturer logo is provided on the terminal 1800, the fingerprint sensor 1814 may be integrated with the physical button or the manufacturer logo.
The optical sensor 1815 is used to acquire the ambient light intensity. In one embodiment, the processor 1801 can control the display brightness of the touch display screen 1805 according to the ambient light intensity acquired by the optical sensor 1815. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1805 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 1805 is turned down. In another embodiment, the processor 1801 can also dynamically adjust the shooting parameters of the camera assembly 1806 according to the ambient light intensity acquired by the optical sensor 1815.
The proximity sensor 1816, also referred to as a distance sensor, is generally arranged on the front panel of the terminal 1800. The proximity sensor 1816 is used to acquire the distance between the user and the front of the terminal 1800. In one embodiment, when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 is gradually decreasing, the processor 1801 controls the touch display screen 1805 to switch from the screen-on state to the screen-off state; when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 is gradually increasing, the processor 1801 controls the touch display screen 1805 to switch from the screen-off state to the screen-on state.
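A minimal sketch of the screen-state switching described above, assuming the proximity readings arrive as a sequence of distances in centimetres; comparing consecutive samples without hysteresis is an illustrative simplification.

```python
def screen_states(distances_cm):
    """Yield 'screen_off' while the user is approaching (distance shrinking)
    and 'screen_on' once the user moves away again (distance growing)."""
    state, previous = "screen_on", None
    for d in distances_cm:
        if previous is not None:
            if d < previous:
                state = "screen_off"   # approaching the panel, e.g. during a call
            elif d > previous:
                state = "screen_on"    # moving away again
        previous = d
        yield state

print(list(screen_states([10.0, 6.0, 2.0, 2.0, 7.0])))
# ['screen_on', 'screen_off', 'screen_off', 'screen_off', 'screen_on']
```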
Those skilled in the art will understand that the structure shown in Fig. 5 does not constitute a limitation on the terminal 1800, which may include more or fewer components than illustrated, combine certain components, or adopt a different arrangement of components.
Fig. 6 shows a schematic structural diagram of a server 1900 provided by an exemplary embodiment of the present disclosure. The server 1900 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 1910 and one or more memories 1920. The memory 1920 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1910 to implement the method for determining whether singing is genuine described in the above embodiments.
Those skilled in the art will readily think of other embodiments of the present disclosure after considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional techniques in the technical field not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (12)
1. A method for determining whether singing is genuine, characterized in that the method comprises:
during the period in which a target live streaming room performs a karaoke (K song) function, acquiring the live video frames captured at multiple preset play time points;
determining the distance between the upper-lip position and the lower-lip position in each captured live video frame;
determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the anchor of the target live streaming room is in a lip-syncing state.
(An illustrative sketch of the lip-distance step is given after claim 12.)
2. The method according to claim 1, characterized in that determining whether the anchor of the target live streaming room is in a lip-syncing state according to the distance between the upper-lip position and the lower-lip position in each captured live video frame comprises:
determining the human-voice audio amplitude corresponding to each of the multiple preset play time points;
determining whether the anchor of the target live streaming room is in a lip-syncing state according to the human-voice audio amplitudes corresponding to the multiple preset play time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame.
(A sketch of one way to obtain the per-play-point amplitude is given after claim 12.)
3. The method according to claim 2, characterized in that determining whether the anchor of the target live streaming room is in a lip-syncing state according to the human-voice audio amplitudes corresponding to the multiple preset play time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame comprises:
for each play time point: if the human-voice audio amplitude corresponding to the play time point is greater than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is greater than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determining that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is less than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is less than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determining that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is equal to the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is equal to the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determining that the human-voice audio corresponding to the play time point is genuinely sung audio;
determining the number of genuinely sung audio instances among the human-voice audio corresponding to the multiple preset play time points, and if the ratio of that number to the total number of human-voice audio instances corresponding to the multiple preset play time points is less than a preset threshold, determining that the anchor of the target live streaming room is in a lip-syncing state.
(A compact sketch of this comparison is given after claim 12.)
4. The method according to claim 1, characterized in that the multiple preset play time points are the respective midpoints of the time periods occupied in the target song by the notes contained in the target song being performed. (A sketch of computing these midpoints is given after claim 12.)
5. The method according to claim 1, characterized in that the method further comprises:
if it is determined that the anchor of the target live streaming room is in a genuine-singing state, acquiring the reference human-voice audio amplitudes corresponding to the multiple preset play time points;
determining, in the captured genuinely sung human-voice audio, the genuinely sung human-voice audio amplitudes corresponding to the multiple preset play time points;
for each play time point, determining an intermediate karaoke score according to the corresponding reference human-voice audio amplitude and the corresponding genuinely sung human-voice audio amplitude;
summing the intermediate karaoke scores corresponding to the play time points to obtain the total karaoke score for the target song being performed, and issuing prompt information, wherein the prompt information includes the total karaoke score corresponding to the target song.
(A sketch of one possible scoring function is given after claim 12.)
6. An apparatus for determining whether singing is genuine, characterized in that the apparatus comprises:
a first acquisition module, configured to acquire, during the period in which a target live streaming room performs a karaoke function, the live video frames captured at multiple preset play time points;
a first determination module, configured to determine the distance between the upper-lip position and the lower-lip position in each captured live video frame;
a second determination module, configured to determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the anchor of the target live streaming room is in a lip-syncing state.
7. The apparatus according to claim 6, characterized in that the second determination module comprises:
a first determination unit, configured to determine the human-voice audio amplitude corresponding to each of the multiple preset play time points;
a second determination unit, configured to determine whether the anchor of the target live streaming room is in a lip-syncing state according to the human-voice audio amplitudes corresponding to the multiple preset play time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame.
8. The apparatus according to claim 7, characterized in that the second determination unit is configured to:
for each play time point: if the human-voice audio amplitude corresponding to the play time point is greater than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is greater than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determine that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is less than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is less than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determine that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is equal to the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is equal to the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determine that the human-voice audio corresponding to the play time point is genuinely sung audio;
determine the number of genuinely sung audio instances among the human-voice audio corresponding to the multiple preset play time points, and if the ratio of that number to the total number of human-voice audio instances corresponding to the multiple preset play time points is less than a preset threshold, determine that the anchor of the target live streaming room is in a lip-syncing state.
9. The apparatus according to claim 6, characterized in that the multiple preset play time points are the respective midpoints of the time periods occupied in the target song by the notes contained in the target song being performed.
10. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a second acquisition module, configured to acquire, when it is determined that the anchor of the target live streaming room is in a genuine-singing state, the reference human-voice audio amplitudes corresponding to the multiple preset play time points;
a third determination module, configured to determine, in the captured genuinely sung human-voice audio, the genuinely sung human-voice audio amplitudes corresponding to the multiple preset play time points;
a fourth determination module, configured to determine, for each play time point, an intermediate karaoke score according to the corresponding reference human-voice audio amplitude and the corresponding genuinely sung human-voice audio amplitude;
a prompt module, configured to sum the intermediate karaoke scores corresponding to the play time points to obtain the total karaoke score corresponding to the target song being performed, and to issue prompt information, wherein the prompt information includes the total karaoke score corresponding to the target song.
11. A computer device, characterized in that the computer device comprises a processor, a communication interface, a memory, and a communication bus, wherein:
the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to execute the program stored on the memory to implement the method steps of any one of claims 1-5.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps of any one of claims 1-5 are implemented.
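The following sketches are illustrative only and are not part of the claims; all Python code, helper names, and numeric values are assumptions of this description. For the lip-distance step of claim 1, the mouth opening at each preset play time point can be computed from any face-landmark detector that returns the centre points of the upper and lower lip (the landmark coordinates below are made up for the example):

```python
import numpy as np

def lip_distance(upper_lip, lower_lip):
    """Euclidean distance between the upper-lip and lower-lip positions
    detected in one captured live video frame; landmark detection itself
    is outside this sketch."""
    return float(np.linalg.norm(np.asarray(upper_lip) - np.asarray(lower_lip)))

# One (upper, lower) landmark pair per preset play time point:
landmarks = [((320, 410), (320, 442)), ((318, 408), (318, 420)), ((321, 409), (321, 447))]
mouth_openings = [lip_distance(u, l) for u, l in landmarks]
print(mouth_openings)  # [32.0, 12.0, 38.0]
```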
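For the per-play-point human-voice amplitude of claim 2, a minimal sketch is to take the RMS of the captured voice signal in a short window centred on each play time point; the 50 ms window and the use of RMS rather than peak amplitude are assumptions.

```python
import numpy as np

def amplitude_at(voice, sample_rate, t, window_s=0.05):
    """RMS amplitude of the captured human-voice audio around play time t (seconds)."""
    half = int(window_s * sample_rate / 2)
    centre = int(t * sample_rate)
    segment = voice[max(0, centre - half):centre + half]
    return float(np.sqrt(np.mean(segment ** 2))) if segment.size else 0.0

sample_rate = 16000
voice = 0.3 * np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)  # 1 s test tone
play_points = [0.25, 0.5, 0.75]
amplitudes = [amplitude_at(voice, sample_rate, t) for t in play_points]
```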
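For the comparison of claim 3, a play time point counts as genuinely sung when the amplitude and the mouth opening move in the same direction (both larger, both smaller, or both equal) relative to an adjacent play time point, and lip-syncing is flagged when the fraction of such points falls below a preset threshold. Pairing each point with its previous neighbour and the 0.6 threshold are assumptions of this sketch.

```python
def is_lip_syncing(amplitudes, mouth_openings, threshold=0.6):
    """amplitudes[i] and mouth_openings[i] belong to the i-th preset play time
    point; adjacent points are compared pairwise (i versus i - 1)."""
    def same_trend(a_prev, a_cur, d_prev, d_cur):
        if a_cur > a_prev:
            return d_cur > d_prev
        if a_cur < a_prev:
            return d_cur < d_prev
        return d_cur == d_prev            # equal amplitude, equal mouth opening

    genuine = sum(
        same_trend(amplitudes[i - 1], amplitudes[i],
                   mouth_openings[i - 1], mouth_openings[i])
        for i in range(1, len(amplitudes))
    )
    comparisons = len(amplitudes) - 1
    return comparisons > 0 and genuine / comparisons < threshold  # True => lip-syncing suspected

# The voice gets louder while the mouth barely moves -> lip-syncing suspected:
print(is_lip_syncing([0.1, 0.4, 0.2, 0.5], [20.0, 20.0, 21.0, 20.0]))  # True
```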
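For claim 4, the preset play time points can be derived from the song's note timing; the (start, end) tuple format below is an assumption about how note intervals are stored.

```python
def note_midpoints(notes):
    """notes: list of (start_s, end_s) intervals, one per note of the target song.
    Returns the midpoint of each note's time span, used as the preset play time points."""
    return [(start + end) / 2.0 for start, end in notes]

print(note_midpoints([(0.0, 0.8), (0.8, 1.2), (1.5, 2.5)]))  # [0.4, 1.0, 2.0]
```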
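Claim 5 leaves the per-point scoring function open; the sketch below assumes a simple relative-error similarity between the reference amplitude and the genuinely sung amplitude at each play time point, summed into the total karaoke score carried in the prompt information.

```python
def intermediate_score(reference, sung):
    """Score one play time point by how close the sung amplitude is to the reference
    amplitude (1.0 = identical, 0.0 = completely off); the relative-error form is an assumption."""
    if reference == 0.0:
        return 1.0 if sung == 0.0 else 0.0
    return max(0.0, 1.0 - abs(sung - reference) / reference)

def total_karaoke_score(reference_amps, sung_amps):
    """Sum of the intermediate scores over all preset play time points."""
    return sum(intermediate_score(r, s) for r, s in zip(reference_amps, sung_amps))

score = total_karaoke_score([0.5, 0.8, 0.3], [0.45, 0.9, 0.3])
print(f"Your karaoke score for this song: {score:.2f}")  # prompt information
```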
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810833758.1A CN108922533A (en) | 2018-07-26 | 2018-07-26 | Determine whether the method and apparatus sung in the real sense |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108922533A true CN108922533A (en) | 2018-11-30 |
Family
ID=64418527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810833758.1A (CN108922533A, Pending) | Determine whether the method and apparatus sung in the real sense | 2018-07-26 | 2018-07-26
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108922533A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0572531A4 (en) * | 1991-02-22 | 1995-03-22 | Seaway Technologies Inc | Acoustic method and apparatus for identifying human sonic sources. |
US20030212552A1 (en) * | 2002-05-09 | 2003-11-13 | Liang Lu Hong | Face recognition procedure useful for audiovisual speech recognition |
US20120004914A1 (en) * | 2006-06-21 | 2012-01-05 | Tell Me Networks c/o Microsoft Corporation | Audio human verification |
KR20140133056A (en) * | 2013-05-09 | 2014-11-19 | 중앙대학교기술지주 주식회사 | Apparatus and method for providing auto lip-synch in animation |
US20140368700A1 (en) * | 2013-06-12 | 2014-12-18 | Technion Research And Development Foundation Ltd. | Example-based cross-modal denoising |
JP6315677B2 (en) * | 2014-03-28 | 2018-04-25 | 株式会社エクシング | Performance device and program |
CN106599765A (en) * | 2015-10-20 | 2017-04-26 | 深圳市商汤科技有限公司 | Method and system for judging living body based on continuously pronouncing video-audio of object |
CN105788610A (en) * | 2016-02-29 | 2016-07-20 | 广州酷狗计算机科技有限公司 | Audio processing method and device |
US9699288B1 (en) * | 2016-03-31 | 2017-07-04 | Hon Hai Precision Industry Co., Ltd. | Communication device and method for disguising communication environment thereof |
CN105959723A (en) * | 2016-05-16 | 2016-09-21 | 浙江大学 | Lip-synch detection method based on combination of machine vision and voice signal processing |
CN107862093A (en) * | 2017-12-06 | 2018-03-30 | 广州酷狗计算机科技有限公司 | File attribute recognition methods and device |
Non-Patent Citations (1)
Title |
---|
YU XIAO: "Identification of Lip-Syncing (假唱的鉴别)", Audio Technology (《音响技术》) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984818A (en) * | 2019-05-23 | 2020-11-24 | 北京地平线机器人技术研发有限公司 | Singing following recognition method and device, storage medium and electronic equipment |
CN110232911A (en) * | 2019-06-13 | 2019-09-13 | 南京地平线集成电路有限公司 | With singing recognition methods, device, storage medium and electronic equipment |
CN110602529A (en) * | 2019-09-12 | 2019-12-20 | 广州虎牙科技有限公司 | Live broadcast monitoring method and device, electronic equipment and machine-readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110267055A (en) | Recommend the methods, devices and systems of direct broadcasting room | |
CN108401124A (en) | The method and apparatus of video record | |
CN109040297A (en) | User's portrait generation method and device | |
CN109618212A (en) | Information display method, device, terminal and storage medium | |
CN110290421A (en) | Frame per second method of adjustment, device, computer equipment and storage medium | |
CN109151593A (en) | Main broadcaster's recommended method, device storage medium | |
CN109300482A (en) | Audio recording method, apparatus, storage medium and terminal | |
CN110278464A (en) | The method and apparatus for showing list | |
CN109327608A (en) | Method, terminal, server and the system that song is shared | |
CN110290392B (en) | Live broadcast information display method, device, equipment and storage medium | |
CN109348247A (en) | Determine the method, apparatus and storage medium of audio and video playing timestamp | |
CN108848394A (en) | Net cast method, apparatus, terminal and storage medium | |
CN110491358A (en) | Carry out method, apparatus, equipment, system and the storage medium of audio recording | |
CN108897597A (en) | The method and apparatus of guidance configuration live streaming template | |
CN108965757A (en) | video recording method, device, terminal and storage medium | |
CN109448761A (en) | The method and apparatus for playing song | |
CN108900925A (en) | The method and apparatus of live streaming template are set | |
CN109922356A (en) | Video recommendation method, device and computer readable storage medium | |
CN109361930A (en) | Method for processing business, device and computer readable storage medium | |
CN110418152A (en) | It is broadcast live the method and device of prompt | |
CN109635133A (en) | Visualize audio frequency playing method, device, electronic equipment and storage medium | |
CN108831513A (en) | Method, terminal, server and the system of recording audio data | |
CN110266982A (en) | The method and system of song is provided in recorded video | |
CN111402844A (en) | Song chorusing method, device and system | |
CN109218751A (en) | The method, apparatus and system of recommendation of audio |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181130 |