CN108922533A - Method and apparatus for determining whether a performer is genuinely singing - Google Patents
Method and apparatus for determining whether a performer is genuinely singing
- Publication number
- CN108922533A (application number CN201810833758.1A)
- Authority
- CN
- China
- Prior art keywords
- play time
- voice audio
- lip position
- acquisition
- live video
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The present disclosure relates to a method and apparatus for determining whether a performer is genuinely singing, and belongs to the field of electronic technology. The method includes: while a target live-streaming room executes a karaoke (K song) function, obtaining live video frames captured at multiple preset playback time points; determining the distance between the upper-lip position and the lower-lip position in each captured live video frame; and determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state. With the present disclosure, a computer device can automatically judge whether an anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
Description
Technical field
The present disclosure relates to the field of electronic technology, and in particular to a method and apparatus for determining whether a performer is genuinely singing.
Background
With the development of science and technology, live streaming has gradually entered people's lives. An anchor can present his or her talents to the viewers of a live-streaming application through that application, and can increase his or her income while streaming.
In some cases, in order to retain more users, a live-streaming platform imposes requirements on the anchors' daily streaming duration. For example, an anchor may be required to stream for at least 6 hours per day and, during those 6 hours, must perform for or interact with the viewers rather than leaving the stream open while doing private things. In this way, more users can watch interesting streams and are willing to stay on the platform. To meet such a requirement, some anchors play the accompaniment of songs through the karaoke plug-in embedded in the live-streaming application and, at the same time, play a pre-recorded a cappella (vocals-only) recording of those songs into the microphone, without opening their mouths to sing; they only do other things in front of the camera, such as playing with their phones. This behavior of not actually singing is a violation: viewers see no real performance and cannot interact effectively with the anchor, which harms the streaming environment of the platform. Therefore, the live-streaming platform assigns dedicated inspectors to enter each live-streaming room and check that no anchor commits the above violation.
In the process of implementing the present disclosure, the inventors found at least the following problem:
Having only a few inspectors check each live-streaming room in turn is extremely inefficient, and since the number of live-streaming rooms is huge, it is difficult to discover anchors' violations in time through manual inspection.
Summary of the invention
In order to overcome the problems in the related art, the present disclosure provides the following technical solutions:
According to a first aspect of the embodiments of the present disclosure, a method for determining whether a performer is genuinely singing is provided. The method includes:
while a target live-streaming room executes a karaoke function, obtaining live video frames captured at multiple preset playback time points;
determining the distance between the upper-lip position and the lower-lip position in each captured live video frame;
determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state includes:
determining the voice audio amplitude corresponding to each of the multiple preset playback time points;
determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state includes:
for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio;
determining the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determining that the target live-streaming room is in an anchor lip-syncing state.
Optionally, the multiple preset playback time points are the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song.
Optionally, the method further includes:
if it is determined that the target live-streaming room is in an anchor genuine-singing state, obtaining reference voice audio amplitudes corresponding to the multiple preset playback time points;
determining, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points;
for each playback time point, determining an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude;
summing the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and issuing prompt information, where the prompt information includes the total karaoke score corresponding to the target song.
According to a second aspect of the embodiments of the present disclosure, a device for determining whether a performer is genuinely singing is provided. The device includes:
a first obtaining module, configured to obtain, while a target live-streaming room executes a karaoke function, live video frames captured at multiple preset playback time points;
a first determining module, configured to determine the distance between the upper-lip position and the lower-lip position in each captured live video frame;
a second determining module, configured to determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining module includes:
a first determining unit, configured to determine the voice audio amplitude corresponding to each of the multiple preset playback time points;
a second determining unit, configured to determine, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining unit is configured to:
for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; and
determine the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determine that the target live-streaming room is in an anchor lip-syncing state.
Optionally, the multiple preset playback time points are the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song.
Optionally, the device further includes:
a second obtaining module, configured to obtain, when it is determined that the target live-streaming room is in an anchor genuine-singing state, the reference voice audio amplitudes corresponding to the multiple preset playback time points;
a third determining module, configured to determine, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points;
a fourth determining module, configured to determine, for each playback time point, an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude;
a prompt module, configured to sum the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and to issue prompt information, where the prompt information includes the total karaoke score corresponding to the target song.
According to a third aspect of the embodiments of the present disclosure, a computer device is provided. The computer device includes a processor, a communication interface, a memory, and a communication bus, where:
the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to execute the program stored in the memory, so as to implement the above method for determining whether a performer is genuinely singing.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided. A computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, it implements the above method for determining whether a performer is genuinely singing.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
With the method provided by the embodiments of the present disclosure, the live video frames captured at the multiple preset playback time points can be recognized, the distance between the upper-lip position and the lower-lip position in each live video frame can be determined, and it can then be determined whether the target live-streaming room is in an anchor lip-syncing state. In this way, a computer device can automatically judge whether the anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The drawings herein are incorporated into and form a part of this specification, show embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. In the drawings:
Fig. 1 is a schematic flowchart of a method for determining whether a performer is genuinely singing, according to an exemplary embodiment;
Fig. 2 is a schematic diagram of facial feature points, according to an exemplary embodiment;
Fig. 3 is a schematic diagram of the correspondence between voice audio amplitude and the distance between the upper and lower lips, according to an exemplary embodiment;
Fig. 4 is a schematic structural diagram of a device for determining whether a performer is genuinely singing, according to an exemplary embodiment;
Fig. 5 is a schematic structural diagram of a terminal, according to an exemplary embodiment;
Fig. 6 is a schematic structural diagram of a server, according to an exemplary embodiment.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. These drawings and the accompanying text are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the present disclosure to those skilled in the art with reference to specific embodiments.
Detailed description of embodiments
Exemplary embodiments are described in detail here, and examples thereof are illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure, as detailed in the appended claims.
The embodiments of the present disclosure provide a method for determining whether a performer is genuinely singing, and the method can be implemented by a terminal or a server. The terminal may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, or the like.
The terminal may include components such as a processor and a memory. The processor may be a CPU (Central Processing Unit) or the like, and is used for processing such as determining the distance between the upper-lip position and the lower-lip position in each captured live video frame. The memory may be a RAM (Random Access Memory), a flash memory, or the like, and may be used to store received data, data needed in the processing procedure, data generated in the processing procedure, and so on, such as the multiple preset playback time points.
The terminal may further include a transceiver, an input component, a display component, an audio output component, and the like. The transceiver may be used for data transmission with the server, and may include a Bluetooth component, a WiFi (Wireless Fidelity) component, an antenna, a matching circuit, a modem, and the like. The input component may be a touch screen, a keyboard, a mouse, or the like. The audio output component may be a speaker, an earphone, or the like.
A system program and application programs may be installed in the terminal. A user uses various application programs according to his or her own needs while using the terminal, and an application program with a live-streaming function may be installed in the terminal.
The server may include components such as a processor and a memory. The processor may be a CPU or the like, and may be used for processing such as determining the distance between the upper-lip position and the lower-lip position in each captured live video frame. The memory may be a RAM, a flash memory, or the like, and may be used to store received data, data needed in the processing procedure, data generated in the processing procedure, and so on, such as the multiple preset playback time points.
The server may further include a transceiver and the like. The transceiver may be used for data transmission with the terminal, and may include a Bluetooth component, a WiFi component, an antenna, a matching circuit, a modem, and the like.
The method provided by the embodiments of the present disclosure may be executed in the terminal or in the server. If it is executed in the terminal, the various data needed in the execution procedure, such as the multiple preset playback time points corresponding to the target song, need to be downloaded from the server. If it is executed in the server, the terminal needs to upload the various data needed in the execution procedure, such as the live video frames and the voice audio, to the server. In the following embodiments, execution of the method in the terminal is taken as an example; execution in the server is similar and is not described again here.
An exemplary embodiment of the present disclosure provides a method for determining whether a performer is genuinely singing. As shown in Fig. 1, the processing flow of the method may include the following steps.
Step S110: while the target live-streaming room executes the karaoke function, obtain the live video frames captured at the multiple preset playback time points.
In implementation, an anchor streams through a live-streaming application. During streaming, an image capture component of the terminal, such as a camera, captures video data of the anchor; the captured video data may include multiple live video frames. At the same time, an audio capture component of the terminal, such as a microphone, captures environmental audio, which may include audio played into the microphone by another audio playback device, or audio generated when the anchor speaks, sings, or plays an instrument.
During streaming, if the anchor wants to perform a song, he or she can request it through the karaoke plug-in in the live-streaming application, or through another karaoke application in the terminal that is associated with the live-streaming application. After the anchor has selected a target song, the terminal receives a karaoke instruction for the target song.
After receiving the karaoke instruction for the target song, the terminal can send to the server a request for obtaining the accompaniment audio of the target song. Meanwhile, the terminal can also send to the server a request for obtaining the multiple preset playback time points corresponding to the target song. The server can obtain, from a pre-stored correspondence between songs and time series, the time series corresponding to the target song, where the time series includes the multiple preset playback time points. Optionally, the multiple preset playback time points may be the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song. In practice, the time series corresponding to each song can be annotated manually.
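Purely as an illustration (not part of the claimed subject matter), the following is a minimal sketch of how such a time series might be derived from a manually annotated note list; the (start, duration) note format used here is an assumption and is not specified by the present disclosure.

```python
# Minimal sketch (assumption): the disclosure only states that the preset playback time points
# are the midpoints of the periods occupied by the notes the anchor must sing; the
# (start_seconds, duration_seconds) note format used here is hypothetical.

def note_midpoints(notes):
    """notes: list of (start_seconds, duration_seconds) for the notes to be sung."""
    return [start + duration / 2.0 for start, duration in notes]

# Example: three annotated notes of a target song's verse.
preset_play_times = note_midpoints([(12.0, 0.5), (12.5, 0.5), (13.0, 1.0)])
# -> [12.25, 12.75, 13.5]
```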
The terminal can play the accompaniment audio of the target song received from the server. Meanwhile, according to the multiple preset playback time points of the target song received from the server, the terminal can select, from the captured video data, the live video frames captured at the multiple preset playback time points. Facial feature extraction can then be performed on the selected live video frames.
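As a sketch of the frame selection just described, and assuming that captured frames carry timestamps relative to the start of the accompaniment (an assumption; the disclosure does not fix a timestamp convention):

```python
# Minimal sketch (assumption): for each preset playback time point, pick the captured frame
# whose timestamp lies closest to it; timestamps are assumed relative to accompaniment start.

def select_frames(frames, preset_play_times):
    """frames: list of (timestamp_seconds, frame_image) tuples from the captured video data."""
    return [min(frames, key=lambda f: abs(f[0] - t)) for t in preset_play_times]
```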
Step S120: determine the distance between the upper-lip position and the lower-lip position in each captured live video frame.
In implementation, multiple live video frames can be selected from the video data through step S110, and facial features can be extracted from these live video frames. If the anchor always faces the camera, facial features can be extracted from every live video frame. If the anchor sometimes does not face the camera, facial features cannot be extracted from every live video frame.
As shown in Fig. 2, if facial features are extracted from a certain live video frame among the multiple live video frames, the positions of the upper lip and the lower lip in that image can be determined, and the distance between the upper lip and the lower lip in that image can then be determined. If facial features are not extracted from a certain live video frame, the distance between the upper lip and the lower lip can be directly set to zero.
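The disclosure does not name a particular facial feature extractor; the following sketch assumes a 68-point facial landmark model (for example, dlib's pre-trained predictor) and uses the inner-lip center points, returning zero when no face is found, as described above. The choice of library, model file, and landmark indices is an illustrative assumption.

```python
# Minimal sketch (assumption): lip distance from a 68-point facial landmark model.
# dlib and landmark indices 62 (inner upper lip) / 66 (inner lower lip) are assumptions,
# not part of the disclosure.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # hypothetical model file

def lip_distance(frame_gray):
    faces = detector(frame_gray)
    if not faces:                      # anchor not facing the camera: treat the distance as zero
        return 0.0
    landmarks = predictor(frame_gray, faces[0])
    upper, lower = landmarks.part(62), landmarks.part(66)
    return ((upper.x - lower.x) ** 2 + (upper.y - lower.y) ** 2) ** 0.5
```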
Step S130: determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
In implementation, the distance between the upper-lip position and the lower-lip position can be determined in the live video frame captured at each time point, and it can then be determined, according to the distances between the upper-lip position and the lower-lip position in the captured live video frames, whether the target live-streaming room is in an anchor lip-syncing state.
The multiple preset playback time points may be the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song. Since the target song may include parts that the anchor does not need to sing, such as the prelude, and parts that the anchor does need to sing, such as the verse and the chorus, these notes can be limited to the notes of the parts that the anchor needs to sing. Each note occupies a certain duration in the target song, and the midpoint of each note's duration can be chosen as a playback time point. At the midpoint of a note's duration a singer most needs to exert force to produce the sound, so the distance between the upper lip and the lower lip is relatively large and easy to recognize in the image. In practical applications, if the anchor never opens his or her mouth during the karaoke performance, the distance between the upper-lip position and the lower-lip position in almost every captured live video frame is zero, or smaller than a small preset distance threshold, and the anchor will therefore be determined to be lip-syncing.
In another possible implementation, step S130 may include: determining the voice audio amplitude corresponding to each of the multiple preset playback time points; and determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
In implementation, the terminal can also obtain from the server the voice audio amplitudes of the target song corresponding to the multiple preset playback time points. The terminal can also capture voice audio through the microphone and determine, from the captured voice audio, the voice audio amplitudes corresponding to the multiple preset playback time points. Then, it can be determined, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
In practical applications, the higher the voice audio amplitude, the louder the sound that is heard and the stronger the stress. To produce a louder sound, a person has to open the lips wider, so the distance between the upper lip and the lower lip becomes larger. There is therefore an inherent relationship between the voice audio amplitude and the distance between the upper and lower lips. As shown in Fig. 3, there are 8 time points, and the voice audio amplitudes corresponding to these 8 time points are shown on the left side of Fig. 3. Taking playback time point A and playback time point B as an example, it can be seen from the figure that the voice audio amplitude corresponding to playback time point A is greater than that corresponding to playback time point B; therefore, if the anchor is genuinely singing, the distance between the upper and lower lips in the live video frame captured at playback time point A is greater than that in the live video frame captured at playback time point B. Accordingly, whether the target live-streaming room is in an anchor lip-syncing state can be determined from the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame. In this way, the determination procedure is closer to reality, and the result is more accurate.
Optionally, the step of determining, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state may include: for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determining that the voice audio corresponding to the playback time point is genuine-singing audio; and determining the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determining that the target live-streaming room is in an anchor lip-syncing state.
In implementation, the voice audio amplitude corresponding to the n-th playback time point can be compared with the voice audio amplitude corresponding to either the (n-1)-th or the (n+1)-th playback time point. If the voice audio amplitude corresponding to the n-th playback time point is greater than that corresponding to the (n-1)-th playback time point, then in theory the distance between the upper-lip position and the lower-lip position in the live video frame captured at the n-th playback time point should also be greater than that in the live video frame captured at the (n-1)-th playback time point. If the distance in the live video frame captured at the n-th playback time point is indeed greater than the distance in the live video frame captured at the (n-1)-th playback time point, the voice audio corresponding to the n-th playback time point is genuine-singing audio. Similarly, if the voice audio amplitude corresponding to the n-th playback time point is less than that corresponding to the (n-1)-th playback time point, then in theory the distance in the live video frame captured at the n-th playback time point should also be less than that in the live video frame captured at the (n-1)-th playback time point; if it actually is less, the voice audio corresponding to the n-th playback time point is genuine-singing audio.
Finally, the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points can be determined. If the ratio of that number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, it can be determined that the target live-streaming room is in an anchor lip-syncing state. For example, if the genuine-singing audio accounts for 70% or more of the total, it can be determined that the target live-streaming room is in an anchor genuine-singing state; otherwise, it can be determined that the target live-streaming room is in an anchor lip-syncing state. A certain threshold margin is left so that an anchor who occasionally misses the rhythm, or is affected by other objective factors, is not mistakenly judged as not genuinely singing.
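Under the assumption that each time point is compared with the immediately preceding one (the disclosure allows comparison with either adjacent point), a minimal sketch of the classification and threshold check described above could look like this:

```python
# Minimal sketch (assumption): a time point counts as genuine-singing audio when the amplitude
# trend and the lip-distance trend relative to the previous time point agree (greater/less/equal
# in the same direction); the 70% threshold follows the example above. The denominator here is
# the number of compared points, a simplification of "the total number of the voice audio".

def is_lip_syncing(amplitudes, lip_distances, genuine_ratio_threshold=0.7):
    """amplitudes[i] and lip_distances[i] belong to the i-th preset playback time point."""
    def trend(a, b):
        return (a > b) - (a < b)       # +1, 0, or -1

    genuine = sum(
        1
        for n in range(1, len(amplitudes))
        if trend(amplitudes[n], amplitudes[n - 1])
        == trend(lip_distances[n], lip_distances[n - 1])
    )
    ratio = genuine / max(len(amplitudes) - 1, 1)
    return ratio < genuine_ratio_threshold   # below threshold -> anchor lip-syncing state
```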
Alternatively, if it is determined that the target live-streaming room is in an anchor lip-syncing state, first prompt information can be issued.
In implementation, if it is determined that the target live-streaming room is in an anchor lip-syncing state, the terminal can generate first prompt information, send it to the cache corresponding to the display interface, and display it on the display interface, so as to indicate that the anchor is currently not genuinely singing and to correct the anchor's violation in time. Meanwhile, the terminal can also send the first prompt information to the server, and the server then pushes it to the account that an inspector has logged in to; after receiving the prompt information, the inspector can enter the live-streaming room corresponding to the first prompt information to check it and take further action.
If the method provided in this embodiment is executed in the server, then after it is determined that the target live-streaming room is in an anchor lip-syncing state, the first prompt information can be generated and pushed to the account that the inspector has logged in to. Meanwhile, the server can also push the first prompt information to the account that the corresponding anchor has logged in to, so as to indicate that the anchor is currently not genuinely singing and to correct the anchor's violation in time.
Optionally, the method provided by the embodiments of the present disclosure may further include: if it is determined that the target live-streaming room is in an anchor genuine-singing state, obtaining the reference voice audio amplitudes corresponding to the multiple preset playback time points; determining, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points; for each playback time point, determining an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude; and summing the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and issuing second prompt information, where the second prompt information includes the total karaoke score corresponding to the target song.
In implementation, if it is determined that the target live-streaming room is in an anchor genuine-singing state, the terminal can obtain from the server the reference voice audio amplitudes of the target song corresponding to the multiple preset playback time points, where a reference voice audio amplitude can be the amplitude at which the target song should be sung at the corresponding time point. Then, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points can be determined from the genuine-singing voice audio captured by the microphone. For each time point, the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude are compared to determine an intermediate karaoke score. For example, if the genuine-singing voice audio amplitude corresponding to a time point is greater than or equal to the corresponding reference voice audio amplitude, 1 point is added; otherwise no point is scored. Finally, the intermediate karaoke scores corresponding to the time points are summed to obtain the total karaoke score corresponding to the target song, and second prompt information can then be issued.
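Following the example above (one point whenever the sung amplitude reaches the reference amplitude; this scoring rule is only the example given, not the only possible one), a sketch of the total karaoke score:

```python
# Minimal sketch: sum one intermediate point per preset playback time point at which the
# captured genuine-singing amplitude is at least the reference amplitude.

def karaoke_total_score(reference_amps, sung_amps):
    """reference_amps[i] and sung_amps[i] belong to the i-th preset playback time point."""
    return sum(1 for ref, sung in zip(reference_amps, sung_amps) if sung >= ref)
```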
If the method provided by the embodiments of the present disclosure is executed in the terminal, the second prompt information can be generated, sent to the cache corresponding to the display interface, and displayed on the display interface, so as to show the anchor how many points the song currently being sung can obtain, for reference. If the method is executed in the server, the second prompt information can be generated and pushed to the account that the corresponding anchor has logged in to, so as to show the anchor how many points the song currently being sung can obtain, for reference.
With the method provided by the embodiments of the present disclosure, the live video frames captured at the multiple preset playback time points can be recognized, the distance between the upper-lip position and the lower-lip position in each live video frame can be determined, and it can then be determined whether the target live-streaming room is in an anchor lip-syncing state. In this way, a computer device can automatically judge whether the anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
Another exemplary embodiment of the present disclosure provides a device for determining whether a performer is genuinely singing. As shown in Fig. 4, the device includes:
a first obtaining module 410, configured to obtain, while a target live-streaming room executes a karaoke function, the live video frames captured at multiple preset playback time points;
a first determining module 420, configured to determine the distance between the upper-lip position and the lower-lip position in each captured live video frame;
a second determining module 430, configured to determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining module 430 includes:
a first determining unit, configured to determine the voice audio amplitude corresponding to each of the multiple preset playback time points;
a second determining unit, configured to determine, according to the voice audio amplitudes corresponding to the multiple preset playback time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the target live-streaming room is in an anchor lip-syncing state.
Optionally, the second determining unit is configured to:
for each playback time point, if the voice audio amplitude corresponding to the playback time point is greater than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is greater than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is less than the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is less than that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; or, if the voice audio amplitude corresponding to the playback time point is equal to the voice audio amplitude corresponding to any playback time point adjacent to it, and at the same time the distance between the upper-lip position and the lower-lip position in the live video frame captured at the playback time point is equal to that in the live video frame captured at the adjacent playback time point, determine that the voice audio corresponding to the playback time point is genuine-singing audio; and
determine the number of genuine-singing audio items among the voice audio corresponding to the multiple preset playback time points, and if the ratio of the number to the total number of the voice audio corresponding to the multiple preset playback time points is less than a preset threshold, determine that the target live-streaming room is in an anchor lip-syncing state.
Optionally, the multiple preset playback time points are the midpoints of the time periods occupied, in the target song, by the multiple notes included in the target song.
Optionally, the device further includes:
a second obtaining module, configured to obtain, when it is determined that the target live-streaming room is in an anchor genuine-singing state, the reference voice audio amplitudes corresponding to the multiple preset playback time points;
a third determining module, configured to determine, from the captured genuine-singing voice audio, the genuine-singing voice audio amplitudes corresponding to the multiple preset playback time points;
a fourth determining module, configured to determine, for each playback time point, an intermediate karaoke score according to the corresponding reference voice audio amplitude and the corresponding genuine-singing voice audio amplitude;
a prompt module, configured to sum the intermediate karaoke scores corresponding to the playback time points to obtain a total karaoke score corresponding to the target song, and to issue prompt information, where the prompt information includes the total karaoke score corresponding to the target song.
With regard to the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiment of the related method, and is not described in detail here.
With the device provided by the embodiments of the present disclosure, the live video frames captured at the multiple preset playback time points can be recognized, the distance between the upper-lip position and the lower-lip position in each live video frame can be determined, and it can then be determined whether the target live-streaming room is in an anchor lip-syncing state. In this way, a computer device can automatically judge whether the anchor is lip-syncing; the judgment is efficient, and even with a huge number of live-streaming rooms, an anchor's violation can be discovered in time.
It should be noted that when the device for determining whether a performer is genuinely singing provided by the above embodiment makes this determination, the division into the above functional modules is merely an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for determining whether a performer is genuinely singing provided by the above embodiment and the embodiment of the method for determining whether a performer is genuinely singing belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.
Fig. 5 shows a schematic structural diagram of a terminal 1800 provided by an exemplary embodiment of the present disclosure. The terminal 1800 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1800 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or another name.
Generally, the terminal 1800 includes a processor 1801 and a memory 1802.
The processor 1801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1801 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1801 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1801 may further include an AI (Artificial Intelligence) processor, which is used to handle computing operations related to machine learning.
The memory 1802 may include one or more computer-readable storage media, which may be non-transient. The memory 1802 may also include a high-speed random access memory and a non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transient computer-readable storage medium in the memory 1802 is used to store at least one instruction, which is executed by the processor 1801 to implement the method for determining whether a performer is genuinely singing provided by the method embodiments of the present application.
In some embodiments, the terminal 1800 optionally further includes a peripheral device interface 1803 and at least one peripheral device. The processor 1801, the memory 1802, and the peripheral device interface 1803 may be connected through a bus or a signal line. Each peripheral device may be connected to the peripheral device interface 1803 through a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1804, a touch display screen 1805, a camera 1806, an audio circuit 1807, a positioning component 1808, and a power supply 1809.
The peripheral device interface 1803 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1801 and the memory 1802. In some embodiments, the processor 1801, the memory 1802, and the peripheral device interface 1803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1801, the memory 1802, and the peripheral device interface 1803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1804 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1804 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 1804 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 1804 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: the World Wide Web, a metropolitan area network, an intranet, mobile communication networks of each generation (2G, 3G, 4G, and 5G), a wireless local area network, and/or a WiFi (Wireless Fidelity) network. In some embodiments, the radio frequency circuit 1804 may also include an NFC (Near Field Communication) related circuit, which is not limited in this application.
The display screen 1805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1805 is a touch display screen, the display screen 1805 also has the ability to capture touch signals on or above its surface. The touch signals may be input to the processor 1801 as control signals for processing. At this time, the display screen 1805 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1805, arranged on the front panel of the terminal 1800; in some other embodiments, there may be at least two display screens 1805, respectively arranged on different surfaces of the terminal 1800 or in a folded design; in still other embodiments, the display screen 1805 may be a flexible display screen, arranged on a curved or folded surface of the terminal 1800. The display screen 1805 may even be arranged in a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 1805 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1806 is used to capture images or video. Optionally, the camera assembly 1806 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function through fusion of the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting functions through fusion of the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 1806 may further include a flash lamp. The flash lamp may be a single-color-temperature flash lamp or a dual-color-temperature flash lamp. A dual-color-temperature flash lamp refers to a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 1807 may include a microphone and a speaker. The microphone is used to capture sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1801 for processing, or input them to the radio frequency circuit 1804 to realize voice communication. For stereo capture or noise reduction purposes, there may be multiple microphones, respectively arranged at different parts of the terminal 1800. The microphone may also be an array microphone or an omnidirectional capture microphone. The speaker is used to convert electrical signals from the processor 1801 or the radio frequency circuit 1804 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 1807 may further include a headphone jack.
The positioning component 1808 is used to locate the current geographic position of the terminal 1800 to realize navigation or LBS (Location Based Service). The positioning component 1808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
The power supply 1809 is used to supply power to the various components in the terminal 1800. The power supply 1809 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1809 includes a rechargeable battery, the rechargeable battery may be a wired charging battery or a wireless charging battery. A wired charging battery is charged through a wired line, and a wireless charging battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 1800 further includes one or more sensors 1810. The one or more sensors 1810 include, but are not limited to, an acceleration sensor 1811, a gyroscope sensor 1812, a pressure sensor 1813, a fingerprint sensor 1814, an optical sensor 1815, and a proximity sensor 1816.
The acceleration sensor 1811 can detect the magnitude of acceleration along the three coordinate axes of the coordinate system established with the terminal 1800. For example, the acceleration sensor 1811 can be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 1801 can control the touch display screen 1805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1811. The acceleration sensor 1811 can also be used to acquire game or user motion data.
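Purely as an illustration of the landscape/portrait decision described above (it is not part of the claimed method), the choice can be sketched as a comparison of the gravity components; the axis convention and the bare magnitude comparison are assumptions of this sketch.

```python
def choose_orientation(gx, gy):
    """Pick a UI orientation from the gravity components (m/s^2) along the
    device's x (short edge) and y (long edge) axes."""
    # Gravity mostly along the long edge means the device is held upright.
    return "portrait" if abs(gy) >= abs(gx) else "landscape"

print(choose_orientation(gx=0.8, gy=9.7))  # portrait
```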
The gyroscope sensor 1812 can detect the body direction and rotation angle of the terminal 1800, and can cooperate with the acceleration sensor 1811 to acquire the user's 3D motion on the terminal 1800. Based on the data acquired by the gyroscope sensor 1812, the processor 1801 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1813 may be arranged on the side frame of the terminal 1800 and/or beneath the touch display screen 1805. When the pressure sensor 1813 is arranged on the side frame of the terminal 1800, the user's grip signal on the terminal 1800 can be detected, and the processor 1801 performs left/right-hand recognition or shortcut operations according to the grip signal acquired by the pressure sensor 1813. When the pressure sensor 1813 is arranged beneath the touch display screen 1805, the processor 1801 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1805. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1814 is used to acquire the user's fingerprint. The processor 1801 identifies the user's identity according to the fingerprint acquired by the fingerprint sensor 1814, or the fingerprint sensor 1814 identifies the user's identity according to the acquired fingerprint. When the user's identity is identified as a trusted identity, the processor 1801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1814 may be arranged on the front, back, or side of the terminal 1800. When a physical button or a manufacturer logo is provided on the terminal 1800, the fingerprint sensor 1814 may be integrated with the physical button or the manufacturer logo.
The optical sensor 1815 is used to acquire the ambient light intensity. In one embodiment, the processor 1801 can control the display brightness of the touch display screen 1805 according to the ambient light intensity acquired by the optical sensor 1815. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1805 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 1805 is turned down. In another embodiment, the processor 1801 can also dynamically adjust the shooting parameters of the camera assembly 1806 according to the ambient light intensity acquired by the optical sensor 1815.
The proximity sensor 1816, also referred to as a distance sensor, is generally arranged on the front panel of the terminal 1800. The proximity sensor 1816 is used to acquire the distance between the user and the front of the terminal 1800. In one embodiment, when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 is gradually decreasing, the processor 1801 controls the touch display screen 1805 to switch from the screen-on state to the screen-off state; when the proximity sensor 1816 detects that the distance between the user and the front of the terminal 1800 is gradually increasing, the processor 1801 controls the touch display screen 1805 to switch from the screen-off state to the screen-on state.
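A minimal sketch of the screen-state switching described above, assuming the proximity readings arrive as a sequence of distances in centimetres; comparing consecutive samples without hysteresis is an illustrative simplification.

```python
def screen_states(distances_cm):
    """Yield 'screen_off' while the user is approaching (distance shrinking)
    and 'screen_on' once the user moves away again (distance growing)."""
    state, previous = "screen_on", None
    for d in distances_cm:
        if previous is not None:
            if d < previous:
                state = "screen_off"   # approaching the panel, e.g. during a call
            elif d > previous:
                state = "screen_on"    # moving away again
        previous = d
        yield state

print(list(screen_states([10.0, 6.0, 2.0, 2.0, 7.0])))
# ['screen_on', 'screen_off', 'screen_off', 'screen_off', 'screen_on']
```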
Those skilled in the art will understand that the structure shown in Fig. 5 does not constitute a limitation on the terminal 1800, which may include more or fewer components than illustrated, combine certain components, or adopt a different arrangement of components.
Fig. 6 shows a schematic structural diagram of a server 1900 provided by an exemplary embodiment of the present disclosure. The server 1900 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 1910 and one or more memories 1920. The memory 1920 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 1910 to implement the method for determining whether singing is genuine described in the above embodiments.
Those skilled in the art will readily think of other embodiments of the present disclosure after considering the specification and practicing the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional techniques in the technical field not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (12)
1. A method for determining whether singing is genuine, characterized in that the method comprises:
during the period in which a target live streaming room performs a karaoke (K song) function, acquiring the live video frames captured at multiple preset play time points;
determining the distance between the upper-lip position and the lower-lip position in each captured live video frame;
determining, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the anchor of the target live streaming room is in a lip-syncing state.
(An illustrative sketch of the lip-distance step is given after claim 12.)
2. The method according to claim 1, characterized in that determining whether the anchor of the target live streaming room is in a lip-syncing state according to the distance between the upper-lip position and the lower-lip position in each captured live video frame comprises:
determining the human-voice audio amplitude corresponding to each of the multiple preset play time points;
determining whether the anchor of the target live streaming room is in a lip-syncing state according to the human-voice audio amplitudes corresponding to the multiple preset play time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame.
(A sketch of one way to obtain the per-play-point amplitude is given after claim 12.)
3. The method according to claim 2, characterized in that determining whether the anchor of the target live streaming room is in a lip-syncing state according to the human-voice audio amplitudes corresponding to the multiple preset play time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame comprises:
for each play time point: if the human-voice audio amplitude corresponding to the play time point is greater than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is greater than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determining that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is less than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is less than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determining that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is equal to the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is equal to the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determining that the human-voice audio corresponding to the play time point is genuinely sung audio;
determining the number of genuinely sung audio instances among the human-voice audio corresponding to the multiple preset play time points, and if the ratio of that number to the total number of human-voice audio instances corresponding to the multiple preset play time points is less than a preset threshold, determining that the anchor of the target live streaming room is in a lip-syncing state.
(A compact sketch of this comparison is given after claim 12.)
4. The method according to claim 1, characterized in that the multiple preset play time points are the respective midpoints of the time periods occupied in the target song by the notes contained in the target song being performed. (A sketch of computing these midpoints is given after claim 12.)
5. The method according to claim 1, characterized in that the method further comprises:
if it is determined that the anchor of the target live streaming room is in a genuine-singing state, acquiring the reference human-voice audio amplitudes corresponding to the multiple preset play time points;
determining, in the captured genuinely sung human-voice audio, the genuinely sung human-voice audio amplitudes corresponding to the multiple preset play time points;
for each play time point, determining an intermediate karaoke score according to the corresponding reference human-voice audio amplitude and the corresponding genuinely sung human-voice audio amplitude;
summing the intermediate karaoke scores corresponding to the play time points to obtain the total karaoke score for the target song being performed, and issuing prompt information, wherein the prompt information includes the total karaoke score corresponding to the target song.
(A sketch of one possible scoring function is given after claim 12.)
6. An apparatus for determining whether singing is genuine, characterized in that the apparatus comprises:
a first acquisition module, configured to acquire, during the period in which a target live streaming room performs a karaoke function, the live video frames captured at multiple preset play time points;
a first determination module, configured to determine the distance between the upper-lip position and the lower-lip position in each captured live video frame;
a second determination module, configured to determine, according to the distance between the upper-lip position and the lower-lip position in each captured live video frame, whether the anchor of the target live streaming room is in a lip-syncing state.
7. The apparatus according to claim 6, characterized in that the second determination module comprises:
a first determination unit, configured to determine the human-voice audio amplitude corresponding to each of the multiple preset play time points;
a second determination unit, configured to determine whether the anchor of the target live streaming room is in a lip-syncing state according to the human-voice audio amplitudes corresponding to the multiple preset play time points and the distance between the upper-lip position and the lower-lip position in each captured live video frame.
8. The apparatus according to claim 7, characterized in that the second determination unit is configured to:
for each play time point: if the human-voice audio amplitude corresponding to the play time point is greater than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is greater than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determine that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is less than the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is less than the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determine that the human-voice audio corresponding to the play time point is genuinely sung audio; or,
if the human-voice audio amplitude corresponding to the play time point is equal to the human-voice audio amplitude corresponding to any adjacent play time point, and the distance between the upper-lip position and the lower-lip position in the live video frame captured at the play time point is equal to the distance between the upper-lip position and the lower-lip position in the live video frame captured at that adjacent play time point, determine that the human-voice audio corresponding to the play time point is genuinely sung audio;
determine the number of genuinely sung audio instances among the human-voice audio corresponding to the multiple preset play time points, and if the ratio of that number to the total number of human-voice audio instances corresponding to the multiple preset play time points is less than a preset threshold, determine that the anchor of the target live streaming room is in a lip-syncing state.
9. The apparatus according to claim 6, characterized in that the multiple preset play time points are the respective midpoints of the time periods occupied in the target song by the notes contained in the target song being performed.
10. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a second acquisition module, configured to acquire, when it is determined that the anchor of the target live streaming room is in a genuine-singing state, the reference human-voice audio amplitudes corresponding to the multiple preset play time points;
a third determination module, configured to determine, in the captured genuinely sung human-voice audio, the genuinely sung human-voice audio amplitudes corresponding to the multiple preset play time points;
a fourth determination module, configured to determine, for each play time point, an intermediate karaoke score according to the corresponding reference human-voice audio amplitude and the corresponding genuinely sung human-voice audio amplitude;
a prompt module, configured to sum the intermediate karaoke scores corresponding to the play time points to obtain the total karaoke score corresponding to the target song being performed, and to issue prompt information, wherein the prompt information includes the total karaoke score corresponding to the target song.
11. A computer device, characterized in that the computer device comprises a processor, a communication interface, a memory, and a communication bus, wherein:
the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to execute the program stored on the memory to implement the method steps of any one of claims 1-5.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method steps of any one of claims 1-5 are implemented.
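The following sketches are illustrative only and are not part of the claims; all Python code, helper names, and numeric values are assumptions of this description. For the lip-distance step of claim 1, the mouth opening at each preset play time point can be computed from any face-landmark detector that returns the centre points of the upper and lower lip (the landmark coordinates below are made up for the example):

```python
import numpy as np

def lip_distance(upper_lip, lower_lip):
    """Euclidean distance between the upper-lip and lower-lip positions
    detected in one captured live video frame; landmark detection itself
    is outside this sketch."""
    return float(np.linalg.norm(np.asarray(upper_lip) - np.asarray(lower_lip)))

# One (upper, lower) landmark pair per preset play time point:
landmarks = [((320, 410), (320, 442)), ((318, 408), (318, 420)), ((321, 409), (321, 447))]
mouth_openings = [lip_distance(u, l) for u, l in landmarks]
print(mouth_openings)  # [32.0, 12.0, 38.0]
```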
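For the per-play-point human-voice amplitude of claim 2, a minimal sketch is to take the RMS of the captured voice signal in a short window centred on each play time point; the 50 ms window and the use of RMS rather than peak amplitude are assumptions.

```python
import numpy as np

def amplitude_at(voice, sample_rate, t, window_s=0.05):
    """RMS amplitude of the captured human-voice audio around play time t (seconds)."""
    half = int(window_s * sample_rate / 2)
    centre = int(t * sample_rate)
    segment = voice[max(0, centre - half):centre + half]
    return float(np.sqrt(np.mean(segment ** 2))) if segment.size else 0.0

sample_rate = 16000
voice = 0.3 * np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)  # 1 s test tone
play_points = [0.25, 0.5, 0.75]
amplitudes = [amplitude_at(voice, sample_rate, t) for t in play_points]
```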
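For the comparison of claim 3, a play time point counts as genuinely sung when the amplitude and the mouth opening move in the same direction (both larger, both smaller, or both equal) relative to an adjacent play time point, and lip-syncing is flagged when the fraction of such points falls below a preset threshold. Pairing each point with its previous neighbour and the 0.6 threshold are assumptions of this sketch.

```python
def is_lip_syncing(amplitudes, mouth_openings, threshold=0.6):
    """amplitudes[i] and mouth_openings[i] belong to the i-th preset play time
    point; adjacent points are compared pairwise (i versus i - 1)."""
    def same_trend(a_prev, a_cur, d_prev, d_cur):
        if a_cur > a_prev:
            return d_cur > d_prev
        if a_cur < a_prev:
            return d_cur < d_prev
        return d_cur == d_prev            # equal amplitude, equal mouth opening

    genuine = sum(
        same_trend(amplitudes[i - 1], amplitudes[i],
                   mouth_openings[i - 1], mouth_openings[i])
        for i in range(1, len(amplitudes))
    )
    comparisons = len(amplitudes) - 1
    return comparisons > 0 and genuine / comparisons < threshold  # True => lip-syncing suspected

# The voice gets louder while the mouth barely moves -> lip-syncing suspected:
print(is_lip_syncing([0.1, 0.4, 0.2, 0.5], [20.0, 20.0, 21.0, 20.0]))  # True
```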
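For claim 4, the preset play time points can be derived from the song's note timing; the (start, end) tuple format below is an assumption about how note intervals are stored.

```python
def note_midpoints(notes):
    """notes: list of (start_s, end_s) intervals, one per note of the target song.
    Returns the midpoint of each note's time span, used as the preset play time points."""
    return [(start + end) / 2.0 for start, end in notes]

print(note_midpoints([(0.0, 0.8), (0.8, 1.2), (1.5, 2.5)]))  # [0.4, 1.0, 2.0]
```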
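Claim 5 leaves the per-point scoring function open; the sketch below assumes a simple relative-error similarity between the reference amplitude and the genuinely sung amplitude at each play time point, summed into the total karaoke score carried in the prompt information.

```python
def intermediate_score(reference, sung):
    """Score one play time point by how close the sung amplitude is to the reference
    amplitude (1.0 = identical, 0.0 = completely off); the relative-error form is an assumption."""
    if reference == 0.0:
        return 1.0 if sung == 0.0 else 0.0
    return max(0.0, 1.0 - abs(sung - reference) / reference)

def total_karaoke_score(reference_amps, sung_amps):
    """Sum of the intermediate scores over all preset play time points."""
    return sum(intermediate_score(r, s) for r, s in zip(reference_amps, sung_amps))

score = total_karaoke_score([0.5, 0.8, 0.3], [0.45, 0.9, 0.3])
print(f"Your karaoke score for this song: {score:.2f}")  # prompt information
```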
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810833758.1A CN108922533A (en) | 2018-07-26 | 2018-07-26 | Determine whether the method and apparatus sung in the real sense |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108922533A true CN108922533A (en) | 2018-11-30 |
Family
ID=64418527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810833758.1A (CN108922533A, Pending) | Determine whether the method and apparatus sung in the real sense | 2018-07-26 | 2018-07-26
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108922533A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0572531A4 (en) * | 1991-02-22 | 1995-03-22 | Seaway Technologies Inc | Acoustic method and apparatus for identifying human sonic sources. |
US20030212552A1 (en) * | 2002-05-09 | 2003-11-13 | Liang Lu Hong | Face recognition procedure useful for audiovisual speech recognition |
US20120004914A1 (en) * | 2006-06-21 | 2012-01-05 | Tell Me Networks c/o Microsoft Corporation | Audio human verification |
KR20140133056A (en) * | 2013-05-09 | 2014-11-19 | 중앙대학교기술지주 주식회사 | Apparatus and method for providing auto lip-synch in animation |
US20140368700A1 (en) * | 2013-06-12 | 2014-12-18 | Technion Research And Development Foundation Ltd. | Example-based cross-modal denoising |
JP6315677B2 (en) * | 2014-03-28 | 2018-04-25 | 株式会社エクシング | Performance device and program |
CN106599765A (en) * | 2015-10-20 | 2017-04-26 | 深圳市商汤科技有限公司 | Method and system for judging living body based on continuously pronouncing video-audio of object |
CN105788610A (en) * | 2016-02-29 | 2016-07-20 | 广州酷狗计算机科技有限公司 | Audio processing method and device |
US9699288B1 (en) * | 2016-03-31 | 2017-07-04 | Hon Hai Precision Industry Co., Ltd. | Communication device and method for disguising communication environment thereof |
CN105959723A (en) * | 2016-05-16 | 2016-09-21 | 浙江大学 | Lip-synch detection method based on combination of machine vision and voice signal processing |
CN107862093A (en) * | 2017-12-06 | 2018-03-30 | 广州酷狗计算机科技有限公司 | File attribute recognition methods and device |
Non-Patent Citations (1)
Title |
---|
YU XIAO: "Identification of Lip-Syncing (假唱的鉴别)", Audio Technology (《音响技术》) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984818A (en) * | 2019-05-23 | 2020-11-24 | 北京地平线机器人技术研发有限公司 | Singing following recognition method and device, storage medium and electronic equipment |
CN110232911A (en) * | 2019-06-13 | 2019-09-13 | 南京地平线集成电路有限公司 | With singing recognition methods, device, storage medium and electronic equipment |
CN110602529A (en) * | 2019-09-12 | 2019-12-20 | 广州虎牙科技有限公司 | Live broadcast monitoring method and device, electronic equipment and machine-readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110267055A (en) | Recommend the methods, devices and systems of direct broadcasting room | |
CN108401124A (en) | The method and apparatus of video record | |
CN109040297A (en) | User's portrait generation method and device | |
CN109618212A (en) | Information display method, device, terminal and storage medium | |
CN110290421A (en) | Frame per second method of adjustment, device, computer equipment and storage medium | |
CN109151593A (en) | Main broadcaster's recommended method, device storage medium | |
CN109300482A (en) | Audio recording method, apparatus, storage medium and terminal | |
CN110278464A (en) | The method and apparatus for showing list | |
CN109327608A (en) | Method, terminal, server and the system that song is shared | |
CN110290392B (en) | Live broadcast information display method, device, equipment and storage medium | |
CN109348247A (en) | Determine the method, apparatus and storage medium of audio and video playing timestamp | |
CN108848394A (en) | Net cast method, apparatus, terminal and storage medium | |
CN110491358A (en) | Carry out method, apparatus, equipment, system and the storage medium of audio recording | |
CN108897597A (en) | The method and apparatus of guidance configuration live streaming template | |
CN108965757A (en) | video recording method, device, terminal and storage medium | |
CN109448761A (en) | The method and apparatus for playing song | |
CN108900925A (en) | The method and apparatus of live streaming template are set | |
CN109922356A (en) | Video recommendation method, device and computer readable storage medium | |
CN109361930A (en) | Method for processing business, device and computer readable storage medium | |
CN110418152A (en) | It is broadcast live the method and device of prompt | |
CN109635133A (en) | Visualize audio frequency playing method, device, electronic equipment and storage medium | |
CN108831513A (en) | Method, terminal, server and the system of recording audio data | |
CN110266982A (en) | The method and system of song is provided in recorded video | |
CN111402844A (en) | Song chorusing method, device and system | |
CN109218751A (en) | The method, apparatus and system of recommendation of audio |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181130 |