CN108377422B

CN108377422B - Multimedia content playing control method, device and storage medium

Info

Publication number: CN108377422B
Application number: CN201810157614.9A
Authority: CN
Inventors: 蒋伟
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-02-24
Filing date: 2018-02-24
Publication date: 2020-05-19
Anticipated expiration: 2038-02-24
Also published as: CN108377422A

Abstract

The invention relates to the multimedia field, and in particular to a playing control method, a device and a storage medium for multimedia content. The method comprises: acquiring a playing environment image of the multimedia content when the playing environment of the multimedia content is determined to meet a preset trigger condition; searching for a face image in the playing environment image; extracting preset feature information from the face image, and determining a user category corresponding to the feature information of the face image; and determining to stop playing the multimedia content when the user category is determined to be one that is prohibited from watching the multimedia content. In this way, the current user category can be judged automatically and accurately from the playing environment and the collected images, and playing control of the multimedia content can be realized for different user categories. This is more efficient, requires no manual operation by the user, improves the user's viewing experience, and ensures the protection of child users.

Description

Multimedia content playing control method, device and storage medium

Technical Field

The present invention relates to the multimedia field, and in particular, to a method, an apparatus, and a storage medium for controlling playing of multimedia content.

Background

At present, video platforms are far from intelligent when it comes to protecting juvenile users: if a juvenile user watches videos, content that is unsuitable for juvenile users, such as violence, cannot be effectively avoided.

In the prior art, an entry to a juvenile channel is usually provided on the video homepage; after clicking this entry, the user jumps to the corresponding juvenile content display interface, which shows video content suitable for juvenile users.

However, the prior-art method requires the user to actively and manually select the juvenile channel, so the display of juvenile video content depends largely on manual operation by the user. Juvenile users are hard to control: if no parent accompanies them, they can easily jump to other video content that is not suitable for them. The method is therefore inefficient, and the protection of juvenile users cannot be ensured.

Disclosure of Invention

Embodiments of the present invention provide a method and an apparatus for controlling playing of multimedia content, and a storage medium, so as to solve the problems in the prior art that efficiency is low and a protection effect for a juvenile user cannot be guaranteed due to a need of manual operation by a user.

The embodiment of the invention provides the following specific technical scheme:

according to a first aspect of the embodiments of the present invention, there is provided a method for controlling playing of multimedia content, the method including:

when the playing environment of the multimedia content is determined to meet a preset trigger condition, acquiring a playing environment image of the multimedia content;

searching a face image in the playing environment image;

extracting preset feature information in the face image, and determining a user category corresponding to the feature information of the face image;

and determining to stop playing the multimedia content when the user category is set to prohibit viewing of the multimedia content.

According to a second aspect of the embodiments of the present invention, there is provided a method for controlling playback of multimedia content, the method including:

acquiring user sound appearing in a playing environment of multimedia content;

confirming a user category corresponding to the voice characteristics of the user voice appearing in the playing environment;

According to a third aspect of the embodiments of the present invention, there is provided a method for controlling playback of multimedia content, the method including:

sending the playing environment image to a multimedia content server;

receiving a playing control instruction returned by the multimedia content server;

and stopping playing the multimedia content when the playing control instruction is confirmed to carry indication information for prohibiting playing the multimedia content, wherein the multimedia content server searches a face image in the playing environment image, extracts preset feature information in the face image, and carries the indication information in the playing control instruction when determining that the user category corresponding to the feature information of the face image is prohibited to watch the multimedia content.

According to a fourth aspect of the embodiments of the present invention, there is provided a play control method of multimedia content, the method including:

detecting a user sound occurring in a playing environment of the multimedia content;

transmitting the user voice to a multimedia content server;

and stopping playing the multimedia content when the playing control instruction is confirmed to carry indication information for prohibiting playing the multimedia content, wherein the multimedia content server confirms a user category corresponding to the voice feature of the user voice and carries the indication information in the playing control instruction when the user category is set to prohibit watching the multimedia content.

According to a fifth aspect of the embodiments of the present invention, there is provided a playback control apparatus for multimedia content, the apparatus including:

the acquisition module is used for acquiring a playing environment image of the multimedia content when the playing environment of the multimedia content is determined to meet a preset trigger condition;

the searching module is used for searching a face image in the playing environment image;

the user category determining module is used for extracting preset feature information in the face image and determining a user category corresponding to the feature information of the face image;

and the control module is used for determining to stop playing the multimedia content when the user category is set to be forbidden to watch the multimedia content.

According to a sixth aspect of the embodiments of the present invention, there is provided a playback control apparatus for multimedia content, the apparatus including:

the acquisition module is used for acquiring user sound appearing in the playing environment of the multimedia content;

the user category determining module is used for determining a user category corresponding to the voice feature of the user voice appearing in the playing environment;

According to a seventh aspect of the embodiments of the present invention, there is provided a playback control apparatus of multimedia content, the apparatus including:

the sending module is used for sending the playing environment image to a multimedia content server;

the receiving module is used for receiving a playing control instruction returned by the multimedia content server;

a processing module, configured to stop playing the multimedia content when it is determined that the play control instruction carries indication information for prohibiting playing the multimedia content, where: the multimedia content server searches a face image in the playing environment image, extracts preset characteristic information in the face image, and carries the indication information in the playing control instruction when determining that the user category corresponding to the characteristic information of the face image is forbidden to watch the multimedia content.

According to an eighth aspect of the embodiments of the present invention, there is provided a playback control apparatus for multimedia content, the apparatus including:

the acquisition module is used for detecting user voice appearing in the playing environment of the multimedia content;

the sending module is used for sending the user voice to a multimedia content server;

and the processing module is used for stopping playing the multimedia content when the playing control instruction carries indication information for prohibiting playing the multimedia content, wherein the multimedia content server confirms a user category corresponding to the voice feature of the user voice and carries the indication information in the playing control instruction when the user category is set as prohibiting watching the multimedia content.

A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of any of the above-described methods for controlling playback of multimedia content.

In the embodiment of the invention, when the playing environment of the multimedia content is determined to meet the preset trigger condition, the playing environment image of the multimedia content is acquired; a face image is searched for in the playing environment image; preset feature information is extracted from the face image, and the user category corresponding to the feature information of the face image is determined; and when the user category is determined to be prohibited from watching the multimedia content, it is determined to stop playing the multimedia content. In this way, the corresponding user category can be determined from the playing environment image, and playing control of the multimedia content can be realized according to the user category. The current user category is identified and playing is controlled automatically, without manual operation by the user, which improves efficiency, ensures the protection of child users, and improves the user's viewing experience. Moreover, acquisition of the playing environment image is triggered only when the playing environment is determined to meet the preset trigger condition, so the camera does not need to be called continuously to collect playing environment images, which avoids wasting resources and improves efficiency.

Drawings

Fig. 1A is an application scene architecture diagram of a playing control method of multimedia content according to an embodiment of the present invention;

fig. 1B is an application scene architecture diagram of a playing control method of multimedia content according to an embodiment of the present invention;

fig. 2 is a flowchart of a method for controlling playing of multimedia content according to an embodiment of the present invention;

FIG. 3 is a graph showing the trend of normalized syllable/second with age according to the embodiment of the present invention;

fig. 4 is a flowchart of another method for controlling playing of multimedia content according to an embodiment of the present invention;

fig. 5 is a flowchart of a method for controlling playing of multimedia content in a specific application scenario according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of a playing control apparatus for multimedia content according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of a playing control apparatus for multimedia content according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of a playing control apparatus for multimedia content according to an embodiment of the present invention;

fig. 9 is a schematic structural diagram of a playing control apparatus for multimedia content according to an embodiment of the present invention;

FIG. 10 is a diagram illustrating a server architecture according to an embodiment of the present invention;

fig. 11 is a schematic diagram of a terminal structure in an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

To facilitate an understanding of the embodiments of the present invention, a few concepts are briefly introduced below:

Face detection: for any given image, the image is searched using a certain strategy to determine whether it contains a human face; if so, the position, size and pose of the face are returned.

Smart television: a new type of television product that has a fully open platform and carries an operating system; while enjoying ordinary television content, users can install and uninstall various application software themselves and continuously extend and upgrade its functions.

Television box: a small computing terminal device. Once simply connected to a traditional television through a High Definition Multimedia Interface (HDMI), a component video cable or a similar technology, it enables web browsing, network video playback and application installation on the traditional television, can even project photos and videos from a mobile phone or tablet onto the large-screen television, and allows Internet content to be played on the television.

Mel-frequency cepstral coefficients (MFCC): the Mel frequency scale is derived from the auditory characteristics of the human ear and has a nonlinear correspondence with frequency in Hz. MFCCs are spectral features computed from the Hz spectrum using this relation between Mel frequency and Hz frequency; they are mainly used to extract features from voice data and to reduce the dimensionality of the computation.
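As an illustration of the nonlinear correspondence between Mel frequency and Hz frequency, the following minimal Python sketch uses one widely used form of the mapping, mel = 2595 * log10(1 + f / 700). The embodiment does not state which formula it uses, so this choice, and the function names hz_to_mel, mel_to_hz and mel_spaced_frequencies, are assumptions made for illustration only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (assumed 2595*log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(f_min: float, f_max: float, count: int) -> list:
    """Return `count` (>= 2) frequencies in Hz that are equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (count - 1)
    return [mel_to_hz(lo + i * step) for i in range(count)]


if __name__ == "__main__":
    # Equal steps in mel correspond to progressively wider steps in Hz.
    for f in mel_spaced_frequencies(0.0, 8000.0, 5):
        print(round(f, 1))
```

Points that are equally spaced in Mel become further apart in Hz as frequency rises, which is why filter banks built on this scale give finer resolution at low frequencies.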

Linear Prediction Cepstrum Coefficient (LPCC): the representation of the linear prediction coefficients (LPC) in the cepstral domain; the LPC order used in an experiment is a parameter of the linear prediction cepstrum.

Gaussian Mixture Model (GMM): also abbreviated MOG (mixture of Gaussians). A Gaussian model quantifies an object precisely with a Gaussian probability density function; a Gaussian mixture model decomposes one object into several components, each based on a Gaussian probability density function.
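The decomposition into Gaussian components can be sketched as follows for the one-dimensional case. This is only an illustrative sketch: the Component class, the function names and the example parameters are assumptions, and the embodiment does not specify the dimensionality or how the mixture parameters are obtained.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    weight: float    # mixing weight; the weights of a mixture sum to 1
    mean: float
    variance: float


def gaussian_pdf(x: float, mean: float, variance: float) -> float:
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def gmm_pdf(x: float, components: List[Component]) -> float:
    """Mixture density: the weighted sum of the component densities."""
    return sum(c.weight * gaussian_pdf(x, c.mean, c.variance) for c in components)


def responsibilities(x: float, components: List[Component]) -> List[float]:
    """Posterior probability that x was generated by each component."""
    weighted = [c.weight * gaussian_pdf(x, c.mean, c.variance) for c in components]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    mixture = [Component(0.5, 0.0, 1.0), Component(0.5, 10.0, 1.0)]
    print(gmm_pdf(0.0, mixture), responsibilities(0.0, mixture))
```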

Referring to fig. 1A and fig. 1B, application scene architecture diagrams of a playing control method of multimedia content according to an embodiment of the present invention at least include a terminal and a server.

The terminal can be any intelligent device such as a smart phone, a tablet computer, a portable personal computer and a smart television. Various Applications (APPs) can be installed on the terminal, for example, a video APP, and the terminal can provide video services to the user through the video APP.

The terminal and the server are connected through the Internet to realize mutual communication.

For example, in fig. 1A, the terminal is a smart television, the smart television may be connected to the television box through a preset interface, the smart television communicates with the server through the television box, or the smart television may directly communicate with the server.

For another example, in fig. 1B, the terminal is a smart phone, and the smart phone is connected to the server through the internet.

The server provides various network services for the terminal, and for different terminals or application programs on the terminals, the server can be regarded as a background server providing corresponding network services. For example, the server may send an update package to the terminal to implement the update of the APP.

The server may be one server, a server cluster formed by a plurality of servers, or a cloud computing center.

Referring to fig. 1A and fig. 1B, in a possible implementation, a video APP is installed and runs on the terminal, and accordingly the server is a multimedia content server. The terminal sends a video content acquisition request to the server through the video APP; after receiving the request, the server returns the corresponding video content to the terminal, so that the terminal displays the corresponding video content to the user through the video APP.

However, in the prior art, this process cannot automatically filter the displayed video content according to the category of the user currently watching. A juvenile user can therefore easily see video content that is unsuitable for his or her age range, which on the one hand is not conducive to the healthy growth of the juvenile user and on the other hand damages the image of the video APP operator.

In order to solve the prior-art problems that efficiency is low and the protection of child users cannot be guaranteed because manual operation by the user is needed, the embodiment of the invention provides a playing control method for multimedia content. In a possible implementation, the terminal acquires a playing environment image of the multimedia content and sends it to the multimedia content server. After receiving the playing environment image, the multimedia content server searches for a face image in it and determines the user category corresponding to the face image; when it determines that the user category corresponding to the feature information of the face image is prohibited from watching the multimedia content, it sends a playing control instruction to the terminal, and the playing control instruction carries indication information for prohibiting playing of the multimedia content. The terminal receives the playing control instruction returned by the multimedia content server and stops playing the multimedia content when it confirms that the instruction carries the indication information. In this way, the current user category can be identified from the playing environment image, and whether to stop playing the current multimedia content is decided according to the user category; the content watched by child users can be controlled without manual operation, and personalized content recommendation can be realized for different user categories.
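The exchange between the terminal and the multimedia content server described above can be sketched in Python as follows. This is a hedged sketch rather than the implementation of the embodiment: the names PlayControlInstruction, server_handle_image and Terminal are hypothetical, face search and classification are passed in as placeholder callables, and two simplifications are assumed, namely that the server always returns an instruction (with the indication set or not) and that playback is prohibited if any detected face belongs to a prohibited category.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class PlayControlInstruction:
    # The "indication information": True means playing the content is prohibited.
    prohibit_playback: bool


def server_handle_image(
    image: bytes,
    find_faces: Callable[[bytes], List[bytes]],   # placeholder face search
    classify_face: Callable[[bytes], str],        # placeholder category classifier
    prohibited: Set[str],                         # categories not allowed to watch
) -> PlayControlInstruction:
    """Server side: search faces, map each to a user category, build the instruction."""
    categories = {classify_face(face) for face in find_faces(image)}
    # Assumption: one prohibited viewer in the image is enough to prohibit playback.
    return PlayControlInstruction(prohibit_playback=bool(categories & prohibited))


class Terminal:
    """Terminal side: stops playing when the instruction carries the indication."""

    def __init__(self) -> None:
        self.playing = True

    def on_instruction(self, instruction: PlayControlInstruction) -> None:
        if instruction.prohibit_playback:
            self.playing = False


if __name__ == "__main__":
    # Toy stand-ins: the "image" bytes directly name the category of the single face.
    find = lambda img: [img] if img else []
    classify = lambda face: face.decode()
    tv = Terminal()
    tv.on_instruction(server_handle_image(b"child", find, classify, {"child"}))
    print(tv.playing)
```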

The embodiment of the invention also provides another possible implementation of the playing control method. The terminal detects a user sound appearing in the playing environment of the multimedia content and sends the user sound to the multimedia content server. After receiving the user sound, the multimedia content server determines the user category corresponding to the voice features of the user sound; when it determines that the user category is set as prohibited from watching the multimedia content, it sends a playing control instruction to the terminal, and the playing control instruction carries indication information for prohibiting playing of the multimedia content. The terminal receives the playing control instruction returned by the multimedia content server and stops playing the multimedia content when it confirms that the instruction carries the indication information, so that user category identification and playing control can be carried out directly on the basis of sound.

In another possible implementation manner of the embodiment of the present invention, all playing control functions for the multimedia content may also be integrated in the terminal. The terminal acquires a playing environment image of the multimedia content, searches for a face image in the playing environment image and determines the user category corresponding to the face image, or determines the user category corresponding to a user sound appearing in the playing environment; when the user category is determined to be one that is prohibited from watching the multimedia content, the terminal determines to stop playing the multimedia content.

That is to say, the method for controlling playing of multimedia content may be executed by a terminal, or may be executed by a server, and the embodiment of the present invention is not limited thereto.

Optionally, in the embodiment of the present invention, other implementation manners are possible after the user category is determined: switching to the video channel corresponding to the user category may be determined according to a preset mapping relationship between user categories and video channels; or multimedia content corresponding to the user category may be recommended according to the user category.

Optionally, in the embodiment of the present invention, to further improve efficiency, another possible implementation manner after the user category is determined is that the terminal directly switches to the corresponding video channel according to the user category and ensures that only the corresponding video channel is available, that is, the user is not allowed to switch to another video channel. The user can then only select video content on the corresponding video channel to watch, which ensures the protection of juvenile users. The user category does not need to be sent to the server, so interaction with the server is reduced and time and network resources are saved.
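The two optional behaviours above, switching by a preset category-to-channel mapping and keeping only the corresponding channel available, can be sketched on the terminal side as follows. The class name ChannelGuard, the category and channel names, and the idea of locking only the categories listed in locked_categories are assumptions made for illustration; the embodiment does not prescribe them.

```python
from typing import Dict, Optional, Set


class ChannelGuard:
    """Terminal-side sketch: switch channel by user category and optionally lock it."""

    def __init__(self, channel_by_category: Dict[str, str], locked_categories: Set[str]) -> None:
        self.channel_by_category = channel_by_category  # preset mapping relationship
        self.locked_categories = locked_categories      # categories confined to their channel
        self.current_channel: Optional[str] = None
        self.locked = False

    def apply_category(self, category: str) -> None:
        """Switch to the channel mapped to the recognised category, if there is one."""
        channel = self.channel_by_category.get(category)
        if channel is None:
            return  # unknown category: leave the current state unchanged
        self.current_channel = channel
        self.locked = category in self.locked_categories

    def request_switch(self, channel: str) -> bool:
        """Handle a manual switch request; refused while another channel is locked in."""
        if self.locked and channel != self.current_channel:
            return False
        self.current_channel = channel
        return True


if __name__ == "__main__":
    guard = ChannelGuard({"child": "kids", "adult": "general"}, locked_categories={"child"})
    guard.apply_category("child")
    print(guard.current_channel, guard.request_switch("general"))
```

Because the mapping and the lock both live on the terminal in this sketch, no round trip to the server is needed once the category is known, which matches the motivation given above.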

Optionally, the network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but can be any network, including but not limited to any combination of a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wireline or wireless network, a private network, or a virtual private network. In some embodiments, data exchanged over the network is represented using techniques and/or formats including HyperText Mark-up Language (HTML), Extensible Mark-up Language (XML), and so forth. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.

It should be noted that the application scenario architecture diagrams in the embodiment of the present invention are intended to illustrate the technical solution more clearly and do not limit it. The technical solution is not limited to the video field; it is merely described here in terms of user category identification and its application in the video field, and it is equally applicable to similar problems in other application scenario architectures and service applications.

In the embodiments of the present invention, a method for controlling playing of multimedia content is schematically illustrated by using a terminal and a server shown in fig. 1A or fig. 1B as an example.

In order to solve the prior-art problems that efficiency is low and the protection of juvenile users cannot be guaranteed because manual operation by the user is needed, the embodiment of the present invention can automatically identify the user category, for example whether the current user is a juvenile user, and can then realize playing control of the multimedia content according to the user category. Based on the foregoing embodiment, and taking the case in which the server is a multimedia content server as an example, fig. 2 shows a flowchart of a playing control method of multimedia content in the embodiment of the present invention, where the method includes:

Step 200: the terminal acquires a playing environment of the multimedia content.

In the embodiment of the present invention, in order for the terminal to obtain the required playing environment data and subsequently collect an image of the playing environment, corresponding hardware devices need to be provided on the terminal in advance, for example a camera, an optical sensor such as an ambient light sensor, and a microphone. The camera is used for collecting images of the playing environment, the optical sensor is used for acquiring the ambient brightness and detecting changes in the ambient brightness of the surrounding playing environment, and the microphone is used for collecting sound in the surrounding playing environment.

For example, if the terminal is a smart phone, these hardware devices are usually integrated in the smart phone at present, and may be used to support the technical solution provided by the embodiment of the present invention.

For another example, where the above hardware devices are not installed in current smart televisions, there may be several solutions:

1) An external device integrating a camera and an optical sensor can be provided for the smart television or for the television box of the smart television; the smart television is connected to the television box so that they can communicate with each other.

2) A free hardware device platform, such as the Penguin Aurora box, may be provided, on which hardware devices such as a camera and an optical sensor are integrated.

3) A microphone can be arranged on the remote controller of the smart television, and the user sound in the playing environment can be acquired through the remote controller.

That is, in the smart television application scenario, the terminal may be the smart television and/or the television box of the smart television.

Wherein, the playing environment at least comprises the environment brightness and the user sound. Of course, in the embodiment of the present invention, the data is not limited to the ambient brightness and the user sound, and other data of the playing environment that can be used for the user category identification may also be applied to the scheme provided in the embodiment of the present invention.

For example, after a user starts a video APP on the terminal, the playing environment around the terminal can be detected and acquired in real time while the video APP is normally playing a program.

Step 201: the terminal sends the playing environment to the multimedia content server.

Step 202: the multimedia content server receives the playing environment sent by the terminal.

For example, if the embodiment of the present invention implements personalized recommendation of video content for a certain video APP, the multimedia content server may be the background server corresponding to that video APP. After the video APP is started on the terminal, if data of the playing environment is detected while the video interface of the video APP is displayed, the multimedia content server receives the data of the playing environment sent by the terminal.

Step 203: if the multimedia content server determines that the playing environment meets the preset trigger condition, it triggers the terminal to collect the playing environment image.

When step 203 is executed, the following situations may be included:

In the first case: if the playing environment includes at least the ambient brightness, determining that the playing environment meets the preset trigger condition specifically includes:

First, judging whether the decrease in the intensity of the ambient brightness is not less than a preset threshold, and if so, determining the area in which the intensity of the ambient brightness has decreased.

In the embodiment of the present invention, the threshold is not limited, and may be set according to actual requirements.

That is to say, in the embodiment of the present invention, the ambient brightness can be detected by the optical sensor. When a human figure blocks the screen of the terminal, a decrease in the intensity of the ambient brightness is detected in part of the screen area, so it can be judged whether the intensity of the ambient brightness has decreased. To improve accuracy, a preset threshold is set: if the decrease in intensity is not less than the preset threshold, it can be determined that something is blocking the screen of the terminal. To make the triggering of user category identification more accurate, it is further judged whether the blocking object is a human figure.
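The threshold test on the brightness decrease can be sketched as follows. The sketch assumes, purely for illustration, that the optical sensor yields a grid of brightness readings over the screen area and that two successive readings are compared cell by cell; the grid representation and the name dimmed_region are not taken from the embodiment.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of one reading in the assumed sensor grid


def dimmed_region(
    previous: List[List[float]],
    current: List[List[float]],
    threshold: float,
) -> Set[Cell]:
    """Return the cells whose brightness fell by at least `threshold`.

    "Not less than the preset threshold" is implemented as `>=`.
    An empty result means no blocking object was detected.
    """
    region: Set[Cell] = set()
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (before, after) in enumerate(zip(prev_row, cur_row)):
            if before - after >= threshold:
                region.add((r, c))
    return region


if __name__ == "__main__":
    before = [[100, 100, 100], [100, 100, 100]]
    after = [[100, 60, 100], [95, 50, 100]]
    print(sorted(dimmed_region(before, after, threshold=40)))
```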

Then, according to the shape of the area and preset figure shapes, judging whether the shape of the area conforms to a figure shape, and if so, determining that the ambient brightness meets the preset trigger condition.

Specifically: if the similarity between the shape of the determined area and a preset figure shape is not less than a set value, the shape of the determined area is determined to conform to the figure shape.

In the embodiment of the invention, a plurality of different figure shapes can be preset and stored as a figure shape database, and characteristics of the figure shapes can also be determined, so that the shape of the area with reduced ambient brightness can be compared against them by pattern recognition and scenes in which a person is present can be accurately recognized.
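The comparison of the dimmed area against preset figure shapes can be sketched as follows. The embodiment does not name a similarity measure, so this sketch swaps in the Jaccard index (intersection over union) of two cell sets after shifting both to a common origin; this measure, the function names and the toy figure template are all assumptions, and a practical system would likely also need to handle scale and rotation.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def normalize(region: Set[Cell]) -> Set[Cell]:
    """Shift a region so that its bounding box starts at (0, 0)."""
    if not region:
        return set()
    min_r = min(r for r, _ in region)
    min_c = min(c for _, c in region)
    return {(r - min_r, c - min_c) for r, c in region}


def similarity(a: Set[Cell], b: Set[Cell]) -> float:
    """Jaccard index of two regions after translation to a common origin."""
    a, b = normalize(a), normalize(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conforms_to_figure(region: Set[Cell], templates: Iterable[Set[Cell]], min_similarity: float) -> bool:
    """True if the region is at least `min_similarity` similar to any preset shape."""
    return any(similarity(region, template) >= min_similarity for template in templates)


if __name__ == "__main__":
    # Toy template: head, arms and torso, body, two legs.
    figure = {(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (3, 0), (3, 2)}
    shifted = {(r + 5, c + 5) for r, c in figure}
    print(similarity(shifted, figure), conforms_to_figure(shifted, [figure], 0.8))
```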

In the second case: if the playing environment includes at least the user sound, determining that the playing environment meets the preset trigger condition specifically includes:

confirming that the user category corresponding to the voice features of the user sound appearing in the playing environment is a user category set to be prohibited from watching the multimedia content.

In the embodiment of the present invention, the user category is determined from voice because, in practice, a user's voice is correlated with age. Fig. 3 is a schematic diagram of how normalized syllables per second change with age; the abscissa is age and the ordinate is normalized syllables per second, which represents the speech rate. It can be seen that the speech rate differs between men and women, and their voices also differ, so men and women need to be distinguished when establishing the speech recognition model. For both men and women, the speech rate changes continuously with age. Therefore, in the embodiment of the present invention, the age can be determined from the voice, that is, the user category can be recognized and determined; to improve accuracy, other speech features may also be used to determine the user category.

The method specifically comprises the following steps: first, speech features of a user's voice in a playback environment are extracted.

Voice features that discriminate well between ages, such as MFCC (Mel-frequency cepstral coefficients) and LPCC (linear prediction cepstral coefficients), are preferred; other voice features may also be selected, and the embodiment of the present invention is not limited in this respect.

Then, a user category corresponding to a speech feature of a user's voice appearing in the playback environment is determined.

In the embodiment of the present invention, the corresponding user category is determined according to the user voice, and there may be the following two ways:

The first mode: the age corresponding to the voice features of the user voice is analyzed according to a pre-established speech recognition model and those voice features, and the user category corresponding to the user voice is determined.

In the embodiment of the present invention, the pre-established speech recognition model is used to recognize ages corresponding to different user voices, and the training mode of the speech recognition model is as follows:

1) Voice samples are obtained.

In the embodiment of the present invention, voice samples of all ages are obtained so that juvenile users can be recognized. Since the voices of men and women are usually clearly distinguishable, the samples may be divided into four intervals according to age and gender, for example, males aged 18 or older, males under 18, females under 18, and females aged 18 or older, so that four speech recognition models may be obtained accordingly.

2) The voice features of the voice samples are extracted.

For example, the voice features are MFCC or LPCC.

3) A speech recognition model is established based on a preset training model, according to the voice features of the voice samples and the ages corresponding to the voice samples.

For example, the preset training model is a GMM model, the voice features are MFCCs, MFCCs of each voice sample are extracted, and training learning is performed by using the GMM model according to the MFCCs and the corresponding ages, thereby establishing a voice recognition model.

The preset training model may also be a neural network or the like, and is not limited in the embodiment of the present invention.

Thus, in the embodiment of the present invention, the current user category may be determined according to the user voice, for example, whether there is a child user may be determined according to a speech recognition model of the child user.
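The training steps above can be sketched in a simplified form. The sketch below is not a full GMM: it fits a single diagonal Gaussian per age and gender group (effectively a one-component mixture) and classifies by maximum log-likelihood. The two-dimensional feature vectors are made-up stand-ins for MFCC vectors, and all names (`fit_gaussian`, `train`, `classify`, the group labels) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # per-dimension (means, variances)


def fit_gaussian(samples: Sequence[Sequence[float]]) -> Model:
    """Estimate per-dimension mean and variance for one group's feature vectors."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dim)]
    # A small floor keeps every variance strictly positive.
    variances = [max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
                 for d in range(dim)]
    return means, variances


def log_likelihood(x: Sequence[float], model: Model) -> float:
    """Log-density of x under a diagonal Gaussian."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))


def train(samples_by_group: Dict[str, List[List[float]]]) -> Dict[str, Model]:
    """One model per age/gender interval, as in the four-interval example."""
    return {group: fit_gaussian(s) for group, s in samples_by_group.items()}


def classify(x: Sequence[float], models: Dict[str, Model]) -> str:
    """Pick the group whose model gives the feature vector the highest likelihood."""
    return max(models, key=lambda g: log_likelihood(x, models[g]))


# Made-up 2-D features standing in for MFCC vectors of labeled voice samples.
training = {
    "male_under_18":   [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9]],
    "male_18_plus":    [[4.0, 1.0], [4.2, 1.1], [3.8, 0.9]],
    "female_under_18": [[1.0, 9.0], [1.1, 9.2], [0.9, 8.8]],
    "female_18_plus":  [[4.0, 6.0], [4.1, 6.2], [3.9, 5.8]],
}
models = train(training)
group = classify([1.1, 5.0], models)
print(group, "-> juvenile" if group.endswith("under_18") else "-> adult")
```

In practice the features would come from an audio front end and the model would typically use several mixture components per group; both are outside the scope of this sketch.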

The second mode: the embodiment of the present invention also provides another way of determining the user category from the user voice. A voice sample input by a user, together with the user category that the user sets for that voice sample, is received. The user voice appearing in the playing environment is compared with the voice sample, and if the similarity is greater than a first set value, the user category of the user voice appearing in the playing environment is determined according to the user category set for the voice sample.

That is to say, in the embodiment of the present invention, a voice sample input by the user and its corresponding user category may be received in advance. For example, an interface operation entry is provided through which the user uploads a voice sample of a juvenile user in the family, and the user category of the uploaded voice sample is set as juvenile user. After a user sound is obtained, it can then be compared directly with the voice sample, and if the similarity is greater than the first set value, the user category is determined to be a juvenile user. In this way, the voices of the juvenile users of different families can be identified directly, which increases recognition accuracy.
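The comparison against user-supplied voice samples can be sketched as follows. The patent does not name a similarity measure, so cosine similarity over feature vectors is an assumption here, and `cosine_similarity`, `category_from_samples`, and the toy vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def category_from_samples(features: Sequence[float],
                          enrolled: List[Tuple[List[float], str]],
                          first_set_value: float) -> Optional[str]:
    """Return the category of the best-matching enrolled sample whose similarity
    is greater than first_set_value, or None if no sample matches."""
    best_category, best_score = None, first_set_value
    for sample_features, category in enrolled:
        score = cosine_similarity(features, sample_features)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# One enrolled sample: made-up features for a child's voice uploaded by a parent.
enrolled = [([1.0, 0.0, 0.2], "juvenile")]

print(category_from_samples([0.9, 0.1, 0.2], enrolled, first_set_value=0.9))
print(category_from_samples([0.0, 1.0, 0.0], enrolled, first_set_value=0.9))
```

When no enrolled sample exceeds the set value, the function returns `None`, and a caller could fall back to the model-based first mode.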

Finally, the user category corresponding to the voice features of the user voice appearing in the playing environment is confirmed, and when that user category is set to be prohibited from watching the multimedia content, the playing environment is determined to meet the preset trigger condition.

This is because identifying the user category from the user voice may involve a certain error; for example, an adult's voice may sound like a child's. To improve accuracy, when the user voice indicates a user category that is prohibited from viewing the multimedia content, an image is captured to further determine whether the corresponding user category is indeed prohibited from viewing.

Step 204: The terminal collects the playing environment image.

For example, the terminal may capture a playing environment image of the multimedia content by calling a camera.

Step 205: The terminal sends the playing environment image to the multimedia content server.

Step 206: the multimedia content server searches the face image in the playing environment image and determines the user category corresponding to the feature information of the face image.

Step 206 specifically includes the following.

First, a face image is searched for in the playing environment image according to a preset face detection algorithm.

The embodiment of the present invention does not limit the preset face detection algorithm. If a face is detected, the algorithm yields information such as the position of the face, for example by outputting a coordinate sequence of a face frame.

Then, the preset feature information in the face image is extracted.

The preset feature information is, for example, feature information of the eyes and the nose, and is not limited in the embodiment of the present invention; preferably, it is feature information capable of distinguishing different ages.

Finally, the preset feature information in the face image is compared with a preset face model to determine the corresponding user category.

The preset face model represents face features corresponding to all age stages.

Specifically, in the embodiment of the present invention, face registration is performed according to the playing environment image and the coordinate sequence of the recognized face image, which outputs the coordinate sequence of the feature information in the face image. The face image is then processed according to that coordinate sequence, for example by rotating, scaling, and cropping, to adjust it to a preset size and shape. Finally, all the feature information is extracted, attribute analysis is performed, and the result is compared with the preset face model to identify the age corresponding to the face image and thereby determine the corresponding user category, that is, whether the user is a juvenile user.

Furthermore, in the embodiment of the present invention, an image age recognition model may be obtained by training on the feature information of faces of different ages, so that the age of the face image can be determined according to the image age recognition model and the feature information extracted from the face image.

Optionally, to improve accuracy and efficiency, the embodiment of the present invention provides another implementation manner of determining the user category corresponding to the feature information of the face image. An image sample input by a user, together with the user category that the user sets for that image sample, is received. The face image is compared with the image sample, and if the similarity is greater than a second set value, the user category corresponding to the face image is determined according to the user category set for the image sample.

For example, a user can upload a photo of a child in a family, the photo is used as an image sample, and the user category corresponding to the photo is set as the child user, so that the judgment can be directly performed according to the photo, and whether a face image in a currently acquired playing environment image is similar to the child in the family is judged, so that whether the image is the child user in the family can be quickly identified for different families, the efficiency is improved, and the pertinence is higher.

Optionally, the embodiment of the present invention provides yet another implementation manner of determining the user category: a voice sample and an image sample input by a user are received, the playing environment image is compared with the image sample, and the user sound appearing in the playing environment is compared with the voice sample, so as to determine the user category. This can further improve the accuracy of determining the user category and helps avoid misjudgment in cases such as a father and son who look alike.
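How the two comparisons are combined is not stated in the text. The sketch below assumes the simplest rule, namely that the category set for the samples is accepted only when both the voice similarity and the image similarity exceed their respective set values; the function name and parameters are hypothetical.

```python
from typing import Optional


def category_from_both(voice_similarity: float,
                       image_similarity: float,
                       sample_category: str,
                       first_set_value: float,
                       second_set_value: float) -> Optional[str]:
    """Return the enrolled category only when voice AND face both match.

    Requiring both matches is an assumption of this sketch, chosen because a
    face match alone could confuse relatives who look alike.
    """
    voice_ok = voice_similarity > first_set_value
    image_ok = image_similarity > second_set_value
    return sample_category if (voice_ok and image_ok) else None


# A face that resembles the enrolled child but a voice that does not: no match.
print(category_from_both(0.50, 0.95, "juvenile", 0.9, 0.9))
# Both similarities above their set values: the enrolled category is returned.
print(category_from_both(0.95, 0.92, "juvenile", 0.9, 0.9))
```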

Step 207: When the multimedia content server confirms that the user category is set to be prohibited from watching the multimedia content, the multimedia content server returns a play control instruction to the terminal.

The play control instruction carries indication information for prohibiting playing of the multimedia content.

Step 208: The terminal receives the play control instruction returned by the multimedia content server.

Step 209: When the terminal confirms that the play control instruction carries the indication information for prohibiting playing of the multimedia content, the terminal stops playing the multimedia content.
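The exchange in steps 204 to 209 can be sketched as follows. The patent does not define a message format, so the `PlayControlInstruction` structure, the `server_handle_image` function, and the `Terminal` class are hypothetical. As a simplification, the sketch always returns an instruction and encodes the prohibition as a flag, whereas the text describes the instruction being returned when viewing is prohibited. The image classifier is passed in as a stub.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class PlayControlInstruction:
    # Stands in for the "indication information for prohibiting playing".
    prohibit_play: bool


def server_handle_image(image: bytes,
                        classify_image: Callable[[bytes], str],
                        prohibited: Set[str]) -> PlayControlInstruction:
    """Steps 206-207: determine the user category and build the instruction."""
    category = classify_image(image)
    return PlayControlInstruction(prohibit_play=category in prohibited)


class Terminal:
    def __init__(self) -> None:
        self.playing = True

    def on_instruction(self, instruction: PlayControlInstruction) -> None:
        """Steps 208-209: stop playback if the instruction prohibits it."""
        if instruction.prohibit_play:
            self.playing = False


# Stub classifier that always reports a juvenile user (illustration only).
terminal = Terminal()
instruction = server_handle_image(b"fake-image", lambda _img: "juvenile", {"juvenile"})
terminal.on_instruction(instruction)
print(terminal.playing)
```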

It should be noted that fig. 2 only takes stopping playback as an example of implementing playing control of the multimedia content. Other manners may of course be adopted. For example, a switch to the video channel corresponding to the user category may be determined according to a preset mapping relationship between user categories and video channels. For another example, multimedia content corresponding to the user category may be recommended according to the user category.

For example, if the user category is determined to be a juvenile user, playback is switched to a juvenile channel and the content of that channel is displayed. This avoids showing juvenile users video content that is unsuitable for them and so protects them; it can also increase users' usage of the video APP and improve its brand image and competitiveness.
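The preset mapping relationship between user categories and video channels can be as simple as a lookup table. The category and channel names below are hypothetical, as is the fallback to a default channel for categories without an entry.

```python
# Hypothetical preset mapping between user categories and video channels.
CHANNEL_BY_CATEGORY = {
    "juvenile": "juvenile_channel",
}
DEFAULT_CHANNEL = "general_channel"  # assumed fallback, not specified in the text


def channel_for(category: str) -> str:
    """Look up the channel to switch to for a given user category."""
    return CHANNEL_BY_CATEGORY.get(category, DEFAULT_CHANNEL)


print(channel_for("juvenile"))
print(channel_for("adult"))
```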

Of course, in the embodiment of the present invention, the method for controlling playing of multimedia content is not limited to the video field, and may also be applied to other fields, for example, according to the determined user category, if it is determined that the user category is inconsistent with the user category of the preset information viewing right, the currently displayed information is closed, or other information corresponding to the user category is displayed.

In another possible embodiment of the present invention, acquisition of the playing environment image is not triggered according to the acquired playing environment; instead, other triggers are adopted for other usage scenarios. Specifically, the following cases are provided: 1) in the first case, it is determined that the application program has just been started; 2) in the second case, it is determined that the screen has been woken up after sleeping.

Optionally, in the embodiment of the present invention, a function key for turning on the play control method of the multimedia content may be further provided. The user can use the function key to choose whether to turn the function on, and playing control of the multimedia content is performed only when the user has turned it on, which improves flexibility and better meets user requirements. For example, if there are no children in the user's home, the user may choose not to turn the function on, so that it does not interfere with the user's playing of multimedia content.

Based on the foregoing embodiment, referring to fig. 4, a flowchart of another method for controlling playing of multimedia content according to an embodiment of the present invention is shown, where the method includes:

Step 400: A user sound appearing in the playing environment of the multimedia content is acquired.

Step 401: The user category corresponding to the voice features of the user sound appearing in the playing environment is confirmed.

Step 402: When the user category is set to be prohibited from viewing the multimedia content, it is determined to stop playing the multimedia content.

That is to say, in the embodiment of the present invention, the user category can be determined according to the user voice, and the playing control of the multimedia content is realized without calling a camera, so that the realization is simpler.

Based on the foregoing embodiment, playing control of multimedia content for juvenile users, that is, determining whether the user category of the current user is a juvenile user, is taken as an example and described with reference to fig. 5, which is a flowchart of another method for controlling the playing of multimedia content provided in the embodiment of the present invention. The method includes:

Step 500: The application starts.

Step 501: A playing environment image of the multimedia content is acquired, and the user category is detected according to the playing environment image.

For example, when a user opens a video APP on the terminal, the user category is actively determined once: the camera is called, the playing environment image is obtained through the camera, and user category detection is performed.

Step 502: Determine whether the user is a juvenile user; if so, execute step 503; otherwise, execute step 504.

Step 503: Display juvenile content.

For example, playback switches to the juvenile channel, and the content of the juvenile channel is presented.

For another example, juvenile content is recommended and presented directly on the current interface.

Step 504: Display normal content.

That is, without limitation, the content that the user wants to view can be normally presented.

Step 505: it is determined whether a sound is detected, if so, step 506 is performed, otherwise, step 507 is performed.

Step 506: Determine whether the user is a juvenile user; if so, execute step 503; otherwise, execute step 509.

Specifically, it is determined whether the user category corresponding to the voice feature of the user voice appearing in the playing environment is a juvenile user.

For example, it is determined whether the user is a child user based on the user's voice and a speech recognition model established in advance.

In the embodiment of the present invention, the method shown in fig. 5 provides another possible implementation of determining, when the playing environment includes at least a user sound, whether the user sound meets the preset trigger condition, that is, whether acquisition of the playing environment image is triggered so that the user category can be identified from the image. Identifying the user category from the user voice may involve a certain error. To improve accuracy, when the user category identified from the user voice is allowed to view the multimedia content, that is, the user is not identified as a juvenile user, it is still necessary to confirm that viewing is indeed allowed. In that case the trigger condition is determined to be met, acquisition of the playing environment image is triggered, and the user category is identified again from the playing environment image. This improves the accuracy of the determination and prevents misjudgment.

Step 507: detecting whether the ambient brightness changes, if so, executing step 508, otherwise, executing step 504.

Step 508: and judging whether the image is a shadow, if so, executing a step 509, otherwise, returning to execute the step 504.

Step 509: and acquiring a playing environment image of the multimedia content, performing user category detection according to the playing environment image, and returning to the step 502.
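One monitoring pass through steps 505 to 509 and back through steps 502 to 504 can be sketched as a single decision function. The `Observation` fields and the function name are hypothetical; each boolean stands in for the result of the corresponding detection, and `image_is_juvenile` represents what the image-based detection in step 509 would report if the camera were triggered.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    sound_detected: bool = False       # outcome of step 505
    voice_is_juvenile: bool = False    # outcome of step 506
    brightness_changed: bool = False   # outcome of step 507
    region_is_figure: bool = False     # outcome of step 508
    image_is_juvenile: bool = False    # outcome of steps 509 and 502, if triggered


def monitor_once(obs: Observation) -> str:
    """Return 'juvenile' (step 503) or 'normal' (step 504) for one pass."""
    if obs.sound_detected:                                          # step 505
        if obs.voice_is_juvenile:                                   # step 506
            return "juvenile"                                       # step 503
        capture = True                                              # go to step 509
    else:
        capture = obs.brightness_changed and obs.region_is_figure  # steps 507, 508
    # The camera result is consulted only when acquisition was triggered.
    if capture and obs.image_is_juvenile:                           # steps 509, 502
        return "juvenile"                                           # step 503
    return "normal"                                                 # step 504


# The voice sounds adult, but the captured image shows a juvenile user.
print(monitor_once(Observation(sound_detected=True, image_is_juvenile=True)))
# No sound and a brightness change that is not figure-shaped: camera not triggered.
print(monitor_once(Observation(brightness_changed=True, image_is_juvenile=True)))
```

The second call illustrates the resource-saving point of the trigger conditions: without a qualifying trigger, the image is never acquired, so its content cannot affect the outcome.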

In the embodiment of the present invention, when the playing environment of the multimedia content is determined to meet the preset trigger condition, the playing environment image of the multimedia content is obtained, a face image is searched for in the playing environment image, the preset feature information in the face image is extracted, and the user category corresponding to the feature information of the face image is determined. When the user category is set to be prohibited from watching the multimedia content, it is determined to stop playing the multimedia content. The current user category can thus be judged accurately, playing control of the multimedia content is performed automatically for different user categories, and personalized video content can be recommended. This is more efficient, requires no manual operation by the user, improves the viewing experience, and protects children. In addition, acquisition of the playing environment image is triggered only when the playing environment is determined to meet the preset trigger condition, so the camera does not need to be running all the time and is started only when the trigger condition is met, which avoids wasting resources and improves efficiency.

Based on the foregoing embodiments, referring to fig. 6, in an embodiment of the present invention, a multimedia content playing control apparatus at a multimedia content server side is implemented by hardware or a combination of hardware and software as all or a part of a multimedia content server, and specifically includes:

a first determining module 64, configured to determine whether a playing environment of the multimedia content meets a preset trigger condition;

an obtaining module 60, configured to obtain a playing environment image of the multimedia content when the first determining module 64 determines that the playing environment of the multimedia content meets a preset trigger condition;

a searching module 61, configured to search for a face image in the playing environment image;

a user category determining module 62, configured to extract preset feature information in the face image, and determine a user category corresponding to the feature information of the face image;

and the control module 63 is configured to determine to stop playing the multimedia content when the user category is set to prohibit viewing the multimedia content.

Optionally, the playing environment at least includes ambient brightness. In that case, when determining that the playing environment of the multimedia content meets the preset trigger condition, the first determining module 64 is configured to: determine whether the value by which the ambient brightness has decreased is not less than a preset threshold and, if so, determine the region in which the ambient brightness has decreased; and determine, according to the shape of the region and a preset figure shape, whether the shape of the region conforms to the figure shape and, if so, determine that the ambient brightness meets the preset trigger condition.

Optionally, if the playing environment at least includes a user sound, then when determining that the playing environment of the multimedia content meets the preset trigger condition, the first determining module 64 is configured to: confirm the user category corresponding to the voice features of the user sound appearing in the playing environment, where that user category is set to be prohibited from watching the multimedia content.

Optionally, the user category determining module 62 is specifically configured to: extracting preset characteristic information in the face image; and comparing the preset feature information in the face image with a preset face model to determine the corresponding user category.

The apparatus further comprises a receiving module 65 configured to: receiving a voice sample input by a user and a user category corresponding to the voice sample set by the user; and/or receiving an image sample input by a user and a user category corresponding to the image sample set by the user.

Optionally, the user category determining module 62 is further configured to: comparing the user voice appearing in the playing environment with the voice sample, and if the similarity is determined to be greater than a first set value, determining the user category corresponding to the user voice appearing in the playing environment according to the user category corresponding to the voice sample set by the user;

the user category determination module 62 is specifically configured to: and comparing the face image with the image sample, and if the similarity is determined to be greater than a second set value, determining the user type corresponding to the face image according to the user type corresponding to the image sample set by the user.

Optionally, after determining the user category corresponding to the feature information of the face image, the control module 63 is further configured to: determining to switch to a video channel corresponding to a user category according to a preset mapping relation between the user category and the video channel; or recommending the multimedia content corresponding to the user category according to the user category.

Based on the foregoing embodiments, referring to fig. 7, in an embodiment of the present invention, another apparatus for controlling playing of multimedia content at a multimedia content server side is further provided, where the apparatus is implemented by hardware or a combination of hardware and software as all or a part of a multimedia content server, and specifically includes:

an obtaining module 70, configured to obtain a user sound appearing in a playing environment of the multimedia content;

a user category determining module 71, configured to determine a user category corresponding to a voice feature of a user sound appearing in the playing environment;

the control module 72 is configured to determine to stop playing the multimedia content when the user category is set as prohibited from viewing the multimedia content.

Based on the foregoing embodiments, as shown in fig. 8, in an embodiment of the present invention, a device for controlling playing of multimedia content at a terminal side is provided, and the device is implemented by hardware or a combination of hardware and software to become all or a part of the terminal, and specifically includes:

an obtaining module 80, configured to obtain a playing environment image of the multimedia content when it is determined that the playing environment of the multimedia content meets a preset trigger condition;

a sending module 81, configured to send the playing environment image to a multimedia content server;

a receiving module 82, configured to receive a play control instruction returned by the multimedia content server;

a processing module 83, configured to stop playing the multimedia content when it is determined that the play control instruction carries indication information for prohibiting playing the multimedia content, where: the multimedia content server searches a face image in the playing environment image, extracts preset characteristic information in the face image, and carries the indication information in the playing control instruction when determining that the user category corresponding to the characteristic information of the face image is forbidden to watch the multimedia content.

Based on the foregoing embodiments, referring to fig. 9, in an embodiment of the present invention, another apparatus for controlling playing of multimedia content at a terminal side is provided, where the apparatus is implemented by hardware or a combination of hardware and software as all or a part of the terminal, and specifically includes:

an obtaining module 90, configured to detect a user sound occurring in a playing environment of the multimedia content;

a sending module 91, configured to send the user sound to a multimedia content server;

a receiving module 92, configured to receive a play control instruction returned by the multimedia content server;

a processing module 93, configured to stop playing the multimedia content when it is determined that the play control instruction carries indication information for prohibiting playing the multimedia content, where the multimedia content server determines a user category corresponding to a voice feature of the user sound, and when it is determined that the user category is set to prohibit viewing the multimedia content, carries the indication information in the play control instruction.

Based on the above embodiments, referring to fig. 10, a schematic structural diagram of a server in an embodiment of the present invention is shown.

Embodiments of the present invention provide a server, which may include a processor 1010 (CPU), a memory 1020, an input device 1030, an output device 1040, and the like, wherein the input device 1030 may include a keyboard, a mouse, a touch screen, and the like, and the output device 1040 may include a Display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), and the like.

Memory 1020 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides processor 1010 with program instructions and data stored in memory 1020. In the embodiment of the present invention, the memory 1020 may be used to store a program of a play control method of multimedia contents.

By calling the program instructions stored in the memory 1020, the processor 1010 is configured to perform the following steps according to the obtained program instructions:

searching a face image in the playing environment image;

Optionally, the playing environment at least includes ambient brightness. In that case, when determining that the playing environment of the multimedia content meets the preset trigger condition, the processor 1010 is configured to: determine whether the value by which the ambient brightness has decreased is not less than a preset threshold and, if so, determine the region in which the ambient brightness has decreased; and determine, according to the shape of the region and a preset figure shape, whether the shape of the region conforms to the figure shape and, if so, determine that the ambient brightness meets the preset trigger condition.

Optionally, the playing environment at least includes a user sound. In that case, when determining that the playing environment of the multimedia content meets the preset trigger condition, the processor 1010 is configured to: confirm the user category corresponding to the voice features of the user sound appearing in the playing environment, where that user category is set to be prohibited from watching the multimedia content.

Optionally, the processor 1010 is specifically configured to: extracting preset characteristic information in the face image; and comparing the preset feature information in the face image with a preset face model to determine the corresponding user category.

Optionally, the processor 1010 is further configured to: receiving a voice sample input by a user and a user category corresponding to the voice sample set by the user; and/or receiving an image sample input by a user and a user category corresponding to the image sample set by the user.

Optionally, the processor 1010 is further configured to: comparing the user voice appearing in the playing environment with the voice sample, and if the similarity is determined to be greater than a first set value, determining the user category corresponding to the user voice appearing in the playing environment according to the user category corresponding to the voice sample set by the user;

the processor 1010 is specifically configured to determine a user category corresponding to the feature information of the face image, and to:

and comparing the face image with the image sample, and if the similarity is determined to be greater than a second set value, determining the user type corresponding to the face image according to the user type corresponding to the image sample set by the user.

Optionally, the processor 1010 is further configured to:

determining to switch to a video channel corresponding to a user category according to a preset mapping relation between the user category and the video channel; or

and recommending the multimedia content corresponding to the user category according to the user category.

In another implementation manner of the embodiment of the present invention, by calling the program instruction stored in the memory 1020, the processor 1010 is further configured to execute, according to the obtained program instruction:

acquiring user sound appearing in a playing environment of multimedia content;

Referring to fig. 11, a schematic structural diagram of a terminal according to an embodiment of the present invention is shown.

The embodiment of the present invention provides a terminal, which may be, but is not limited to, a mobile phone, a tablet computer, a smart television, and the like. The terminal may include a memory 1110, an input module 1120, a sending module 1130, a receiving module 1140, an output module 1150, a wireless communication module 1160, a processor 1170, sensors 1180, an audio circuit 1190, and the like. Specifically:

The memory 1110 may include read-only memory (ROM) and random-access memory (RAM). It provides the processor 1170 with the program instructions and data stored in the memory 1110, and also stores the terminal's operating system, application programs (APPs) such as a video APP, and various data used by the modules and the terminal.

The input module 1120 may include a keyboard, a mouse, a touch screen, and the like, and is configured to receive numbers, character information, or touch operations input by a user and to generate key signal inputs related to user settings and function control of the terminal. For example, in the embodiment of the present invention, the input module 1120 may receive a click operation performed by the user on a video APP of the terminal, video keywords input when searching for a video, and the like. Specifically, the input module 1120 may include an image input device and other input devices. The image input device may be a camera or a photoelectric scanning device; for example, in the embodiment of the present invention, the playing environment image may be acquired by a camera. The other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.

The sending module 1130 may provide an interface between the terminal and the server.

The receiving module 1140 also provides an interface between the terminal and the server. For example, in an embodiment of the present invention, the receiving module 1140 is used for receiving personalized recommended video information returned by the server, and the like.

The output module 1150 may include a display module such as a liquid crystal display (LCD) or a cathode ray tube (CRT). The display module may be used to display information input by the user or provided to the user, as well as the various menus, user interfaces, and the like of the terminal or of social applications. For example, in the embodiment of the present invention, it may be used to display the provided video information to the user.

The wireless communication module 1160 includes, but is not limited to, a wireless fidelity (WiFi) module, a Bluetooth module, an infrared communication module, and the like. For example, in the embodiment of the present invention, the receiving module 1140 and the sending module 1130 in the terminal exchange information with the server through the WiFi module, so as to communicate with the server.
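The exchange between the terminal and the server can be illustrated with a minimal Python sketch of the terminal-side flow recited in the claims: the playing environment image is sent to the multimedia content server, and playback stops when the returned play control instruction carries the prohibition indication. This is a sketch under stated assumptions, not the implementation of the embodiment. The names `PlayControlInstruction`, `prohibit`, `Player`, and `handle_environment_image`, and the use of a plain callable in place of the sending and receiving modules, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlayControlInstruction:
    # Hypothetical field: True when the server indicates that the detected
    # user category is prohibited from watching the multimedia content.
    prohibit: bool


class Player:
    """Hypothetical stand-in for the terminal's playback component."""

    def __init__(self) -> None:
        self.playing = True

    def stop(self) -> None:
        self.playing = False


def handle_environment_image(
    image: bytes,
    send_to_server: Callable[[bytes], PlayControlInstruction],
    player: Player,
) -> bool:
    """Send the playing environment image and apply the returned instruction.

    Returns True if playback continues, False if it was stopped.
    """
    # Assumption: one call models both sending the image and receiving
    # the play control instruction from the server.
    instruction = send_to_server(image)
    if instruction.prohibit:
        player.stop()
    return player.playing
```

In this sketch the server-side face search and user category decision are hidden behind `send_to_server`, so the terminal only needs to react to the indication it receives.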

The processor 1170 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory 1110 and calling data stored in the memory 1110, thereby monitoring the terminal as a whole.

The sensors 1180 include, for example, light sensors, motion sensors, and other sensors. Specifically, the light sensors may include an ambient light sensor and a proximity sensor. The ambient light sensor may acquire the ambient brightness and may further adjust the brightness of the display panel of the terminal according to the brightness of the ambient light; the proximity sensor may turn off the display panel and/or backlight when the terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when the terminal is stationary, the magnitude and direction of gravity. It can be used for applications that recognize the attitude of the terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration), for vibration-recognition functions (such as pedometers and tap detection), and the like. In addition, the terminal may be further configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
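As an illustration of how the acquired ambient brightness could feed the brightness-based trigger condition recited in the claims (a brightness reduction not less than a preset threshold, followed by a shape check on the dimmed region), the following is a minimal Python sketch, not the implementation of the embodiment. It assumes the brightness is available as a small 2-D grid of samples, and it replaces the comparison against the preset figure shape with a crude heuristic (a tall, mostly filled bounding box). The names `dimmed_region`, `bounding_box`, `matches_figure`, and `brightness_trigger`, and the parameters `min_aspect` and `min_fill`, are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def dimmed_region(before: Grid, after: Grid, threshold: float) -> Mask:
    """Mark cells whose brightness dropped by at least `threshold`."""
    return [
        [(b - a) >= threshold for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the marked cells, or None."""
    cells = [(r, c) for r, row in enumerate(mask) for c, hit in enumerate(row) if hit]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def matches_figure(mask: Mask, min_aspect: float = 1.5, min_fill: float = 0.5) -> bool:
    """Assumed heuristic standing in for the preset figure shape comparison:
    the dimmed region must be taller than wide and mostly filled."""
    box = bounding_box(mask)
    if box is None:
        return False
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    filled = sum(hit for row in mask for hit in row)
    return height / width >= min_aspect and filled / (height * width) >= min_fill


def brightness_trigger(before: Grid, after: Grid, threshold: float) -> bool:
    """True when the dimmed region exists and passes the shape heuristic."""
    return matches_figure(dimmed_region(before, after, threshold))
```

A real system would likely compare the region against a stored silhouette rather than use an aspect-ratio test; the heuristic is only meant to show where that comparison sits in the flow.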

The audio circuit 1190, a speaker 1191, and a microphone 1192 may provide an audio interface between the user and the terminal. On the one hand, the audio circuit 1190 may convert received audio data into an electrical signal and transmit it to the speaker 1191, which converts it into a sound signal for output. On the other hand, the microphone 1192 converts the collected sound signal into an electrical signal, which the audio circuit 1190 receives and converts into audio data; the audio data is then output to the processor 1170 for processing and, for example, transmitted to another electronic device via the wireless communication module 1160, or output to the memory 1110 for further processing. The audio circuit 1190 may also include an earphone jack to allow peripheral headphones to communicate with the terminal. For example, in the embodiment of the present invention, sound may be collected by the microphone 1192.
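Once sound has been collected, the claims describe comparing the user sound with stored voice samples and adopting the sample's user category when the similarity exceeds a first set value. The following minimal Python sketch illustrates that comparison under stated assumptions: voices are represented as numeric feature vectors and similarity is measured by cosine similarity, neither of which is specified by the source. The names `cosine_similarity`, `classify_voice`, and `first_set_value` as a parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_voice(
    features: Sequence[float],
    samples: List[Tuple[Sequence[float], str]],
    first_set_value: float,
) -> Optional[str]:
    """Return the category of the best-matching voice sample, or None.

    A sample matches only if its similarity is strictly greater than
    `first_set_value`; among matches, the most similar sample wins.
    """
    best_category: Optional[str] = None
    best_score = first_set_value
    for sample_features, category in samples:
        score = cosine_similarity(features, sample_features)
        if score > best_score:
            best_category, best_score = category, score
    return best_category
```

Returning `None` when no sample clears the threshold leaves room for a fallback, such as the face-image path, without guessing a category.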

Of course, the structure of the terminal shown in fig. 11 is merely an example; the terminal may include more or fewer components than those shown, may combine some of the components, or may arrange the components differently.

Based on the above embodiments, in an embodiment of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a play control method of multimedia content in any of the above method embodiments.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims

1. A method for controlling playback of multimedia content, comprising:

searching a face image in the playing environment image;

determining to stop playing the multimedia content when the user category is set to prohibit viewing of the multimedia content;

wherein the playing environment at least comprises the ambient brightness or the user sound; determining that the playing environment of the multimedia content meets a preset trigger condition includes: judging whether the reduction in intensity of the ambient brightness is not less than a preset threshold value, and if so, determining the region in which the intensity of the ambient brightness is reduced; judging, according to the shape of the region and a preset figure shape, whether the shape of the region conforms to the figure shape, and if so, determining that the ambient brightness meets the preset trigger condition;

or confirming that the user category corresponding to the voice features of the user sound appearing in the playing environment is set to be prohibited from watching the multimedia content.

2. The method of claim 1, further comprising:

receiving a voice sample input by a user and a user category corresponding to the voice sample set by the user; and/or,

receiving an image sample input by a user and a user category corresponding to the image sample set by the user.

3. The method of claim 2, further comprising:

comparing the user voice appearing in the playing environment with the voice sample, and if the similarity is determined to be greater than a first set value, determining the user category corresponding to the user voice appearing in the playing environment according to the user category corresponding to the voice sample set by the user;

the determining the user category corresponding to the feature information of the face image specifically includes:

4. The method of claim 1, wherein after determining the user category corresponding to the feature information of the face image, the method further comprises:

5. A method for controlling playback of multimedia content, comprising:

sending the playing environment image to a multimedia content server;

stopping playing the multimedia content when the playing control instruction is confirmed to carry indication information for prohibiting playing the multimedia content, wherein the multimedia content server searches a face image in the playing environment image, extracts preset feature information in the face image, and carries the indication information in the playing control instruction when determining that a user category corresponding to the feature information of the face image is prohibited to watch the multimedia content;

6. A playback control apparatus for multimedia content, comprising:

the control module is used for determining to stop playing the multimedia content when the user category is set to be forbidden to watch the multimedia content;

wherein the playing environment at least includes the ambient brightness or the user sound, and the apparatus further includes a first determining module for: judging whether the reduction in intensity of the ambient brightness is not less than a preset threshold value, and if so, determining the region in which the intensity of the ambient brightness is reduced; judging, according to the shape of the region and a preset figure shape, whether the shape of the region conforms to the figure shape, and if so, determining that the ambient brightness meets a preset trigger condition;

7. A playback control apparatus for multimedia content, comprising:

a processing module, configured to stop playing the multimedia content when it is determined that the play control instruction carries indication information for prohibiting playing the multimedia content, where: the multimedia content server searches a face image in the playing environment image, extracts preset characteristic information in the face image, and carries the indication information in the playing control instruction when determining that the user category corresponding to the characteristic information of the face image is forbidden to watch the multimedia content;

wherein the playing environment at least comprises the ambient brightness or the user sound, and the first determining module is used for: judging whether the reduction in intensity of the ambient brightness is not less than a preset threshold value, and if so, determining the region in which the intensity of the ambient brightness is reduced; judging, according to the shape of the region and a preset figure shape, whether the shape of the region conforms to the figure shape, and if so, determining that the ambient brightness meets a preset trigger condition;

8. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-4 or of claim 5.