CN111629222A - Video processing method, device and storage medium - Google Patents

Video processing method, device and storage medium Download PDF

Info

Publication number
CN111629222A
CN111629222A CN202010482616.2A
Authority
CN
China
Prior art keywords
video
target
attention
audience
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010482616.2A
Other languages
Chinese (zh)
Other versions
CN111629222B (en)
Inventor
阳萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010482616.2A priority Critical patent/CN111629222B/en
Publication of CN111629222A publication Critical patent/CN111629222A/en
Application granted granted Critical
Publication of CN111629222B publication Critical patent/CN111629222B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N 21/25866 Management of end-user data
    • H04N 21/25891 Management of end-user data being end-user preferences
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N 21/4325 Content retrieval operation from a local storage medium by playing back content from the storage medium
    • H04N 21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N 21/4334 Recording operations
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44213 Monitoring of end-user related data
    • H04N 21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/47217 End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N 21/482 End-user interface for program selection
    • H04N 21/4825 End-user interface for program selection using a list of items to be played back in a given order, e.g. playlists

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Graphics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of this application disclose a video processing method, a video processing device, and a storage medium, applicable to scenarios such as cloud live streaming, cloud video, and cloud conferencing. The video processing method comprises the following steps: displaying a live broadcast page and playing a live video in the live broadcast page; monitoring the expression of a target audience of the live video during playing of the live video; when the target audience is monitored to generate a target expression, acquiring attribute data of the target expression; and determining the attention content of the target audience in the live video according to the attribute data of the target expression. With the method and device, the attention content of the target audience can be determined quickly in the live video while it plays, effectively improving the efficiency of determining attention content.

Description

Video processing method, device and storage medium
Technical Field
The present application relates to the field of internet technologies, in particular to live video technologies, and more specifically to a video processing method, a video processing device, and a computer-readable storage medium.
Background
With the development of internet technology, network live video has been adopted in fields such as education and finance as a new mode of video playing. For example, a teacher acts as the anchor and gives lessons through a network live broadcast, while students act as the audience and attend class by watching the live video. In another example, a merchant acts as the anchor to recommend and sell commodities through a network live broadcast, while buyers act as the audience and purchase commodities by watching the live video and clicking commodity links in it.
Currently, during the playing of a live video, audiences can record the content they pay attention to in the live video (for example, a hard-to-understand video clip in a teacher's live broadcast, or a clip showing a desired commodity in a merchant's live broadcast) so that this content can be located quickly in the playback video. Audiences generally record such attention content manually; for example, a student notes down in a notebook the playing time point or time period of a hard-to-understand clip, and a buyer notes down the playing time point or time period of a clip showing a desired commodity. However, manually recording attention content during the playing of a live video not only wastes time but also makes it easy to miss other content of interest in the live video.
Disclosure of Invention
The embodiment of the application provides a video processing method, video processing equipment and a storage medium, which can quickly determine the attention content of a target audience in a live video in the playing process of the live video and effectively improve the determination efficiency of the attention content.
In one aspect, an embodiment of the present application provides a video processing method, where the method includes:
displaying a live broadcast page, and playing a live broadcast video in the live broadcast page;
monitoring the expression of a target audience of a live video in the playing process of the live video;
when it is monitored that the target audience generates a target expression, acquiring attribute data of the target expression;
and determining the attention content of the target audience in the live video according to the attribute data of the target expression.
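The four steps above can be sketched minimally. Assuming a recognizer has already turned each monitored camera frame into a (monitoring time, expression label) event, collecting attribute data for a target expression reduces to filtering the event stream (the event shape and label names here are illustrative assumptions, not the patent's exact data model):

```python
from dataclasses import dataclass

@dataclass
class ExpressionEvent:
    monitor_time: float  # wall-clock time the expression image was captured, in seconds
    label: str           # expression class predicted by a recognizer, e.g. "confused"

def collect_attribute_data(events, target_labels):
    """Group the monitoring times of target expressions by label (the 'attribute data')."""
    attr = {}
    for ev in events:
        if ev.label in target_labels:
            attr.setdefault(ev.label, []).append(ev.monitor_time)
    return attr
```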
In another aspect, an embodiment of the present application provides a video processing method, where the method includes:
displaying the playback page, and playing the playback video in the playback page, wherein the playback video is generated by recording the live video;
obtaining the attention content of a target audience of the live video, wherein the attention content is obtained by processing according to the video processing method described above;
generating an attention mark of a playback video according to the attention content;
the focus mark is displayed during the playing of the playback video.
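As a sketch of this aspect: once the attention content's play-time points are known, generating attention marks for the playback video amounts to mapping each point to a position on the playback timeline (the field names below are hypothetical):

```python
def make_attention_marks(attention_points, video_duration):
    """Map attention play-time points (seconds) to fractional positions on a progress bar."""
    return [
        {"time": t, "position": t / video_duration}
        for t in sorted(attention_points)
        if 0 <= t <= video_duration  # drop points outside the playback video
    ]
```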
In another aspect, an embodiment of the present application provides a video processing method, where the method includes:
displaying the playback page, and playing the playback video in the playback page, wherein the playback video is generated by recording the live video;
obtaining the attention content of at least one audience of the live video, wherein the attention content of each audience is obtained by processing according to the video processing method described above;
generating statistical information of the played back video according to the attention content of at least one viewer;
and displaying the statistical information during the playing process of the playback video.
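A minimal sketch of the statistics step: given each viewer's attention play-time points, count how many distinct viewers attended to each point (the dict-of-lists input shape is an assumption):

```python
from collections import Counter

def attention_statistics(per_viewer_points):
    """per_viewer_points: {viewer_id: [play-time points]} -> sorted (time, viewer_count) pairs."""
    counts = Counter()
    for points in per_viewer_points.values():
        for t in set(points):  # count each viewer at most once per time point
            counts[t] += 1
    return sorted(counts.items())
```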
In another aspect, an embodiment of the present application provides a video processing method, where the method includes:
receiving expression images of at least one audience of a live video and monitoring time of the expression images;
recognizing the expression image;
when the target expression image is identified, acquiring attribute data of the target expression;
and determining the attention content of at least one viewer in the live video according to the attribute data of the target expression.
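Sketched server-side, with the recognizer injected as a plain function so the flow is testable without a real model (the target-label set and the return shape are illustrative):

```python
def recognize_and_collect(images_with_times, recognize, target_labels):
    """images_with_times: [(image, monitor_time)]; recognize: image -> label.
    Returns attribute data for the images recognized as a target expression."""
    attribute_data = []
    for image, monitor_time in images_with_times:
        label = recognize(image)
        if label in target_labels:
            attribute_data.append({"label": label, "monitor_time": monitor_time})
    return attribute_data
```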
On the other hand, an embodiment of the present application provides a video processing apparatus, where the video processing apparatus is provided in a video processing device, and the video processing apparatus includes:
the display unit is used for displaying the live broadcast page and playing a live broadcast video in the live broadcast page;
a processing unit to:
monitoring the expression of a target audience of a live video in the playing process of the live video;
when it is monitored that the target audience generates a target expression, acquiring attribute data of the target expression;
and determining the attention content of the target audience in the live video according to the attribute data of the target expression.
In one implementation, the attribute data of the target expression comprises a target expression image and monitoring time of the target expression image; the processing unit is specifically configured to:
acquiring a playing time point of a live video corresponding to the monitoring time of the target expression image;
and determining the video frame corresponding to the playing time point in the live video as the attention content of the target audience.
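The mapping in this implementation can be sketched as follows: subtract the live stream's start wall-clock time from the image's monitoring time to get the play-time point, then take the video frame at that point (the fixed frame rate is an assumption):

```python
def attention_frame_index(monitor_time, live_start_time, fps=25.0):
    """Monitoring (wall-clock) time of the target expression image -> index of the
    video frame at the corresponding play-time point of the live video."""
    play_time_point = monitor_time - live_start_time  # seconds into the live video
    return int(play_time_point * fps)
```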
In one implementation, the attribute data of the target expression comprises a plurality of target expression images and the monitoring time of each target expression image; the processing unit is specifically configured to:
sequencing the target expression images according to the sequence of the monitoring time of the target expression images;
acquiring a first playing time point of a live video corresponding to the monitoring time of a target expression image with the first order and acquiring a second playing time point of the live video corresponding to the monitoring time of a target expression image with the last order;
determining the time range from the first playing time point to the second playing time point as the duration of the target expression;
and determining a plurality of video frames corresponding to the duration of the target expression in the live video as the attention content of the target audience.
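A sketch of the multi-image case: sort the monitoring times, convert the earliest and latest to play-time points, and treat the frames between them as the attention content (again assuming a fixed frame rate):

```python
def attention_frame_range(monitor_times, live_start_time, fps=25.0):
    """Return the inclusive (first, last) frame-index range spanned by the
    monitoring times of the target expression images."""
    ordered = sorted(monitor_times)
    first_point = ordered[0] - live_start_time    # first playing time point
    second_point = ordered[-1] - live_start_time  # second (last) playing time point
    return int(first_point * fps), int(second_point * fps)
```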
In one implementation, the processing unit is specifically configured to call the image pickup device to shoot an expression image of a target audience, and record monitoring time of the expression image, where the monitoring time of the expression image is shooting time of the image pickup device for shooting the expression image; the processing unit is further configured to:
calling an expression recognition model to recognize the expression image, and determining that the target audience generates the target expression when the expression image is recognized as containing the target expression; alternatively,
and sending the expression images and the monitoring time of the expression images to a server so as to enable the server to identify the expression images, and receiving attribute data of the target expression returned by the server when the server identifies that the target expression images are contained.
On the other hand, an embodiment of the present application provides a video processing apparatus, where the video processing apparatus is provided in a video processing device, and the video processing apparatus includes:
the display unit is used for displaying the playback page and playing the playback video in the playback page, and the playback video is generated by recording the live video;
a processing unit to:
obtaining the attention content of a target audience of the live video, wherein the attention content is obtained by processing according to the video processing method described above;
generating an attention mark of a playback video according to the attention content;
and the display unit is also used for displaying the attention mark in the playing process of the playback video.
In one implementation, the playback page includes a play timeline for playback of the video; the display unit is specifically configured to:
acquiring a position area corresponding to the attention content in a playing time axis of a playback video;
displaying a focus mark in a position area corresponding to the focus content;
wherein the focus mark is distinctively displayed in a playback time axis of the playback video; the differential display includes: the attention mark is displayed in a first color, and other position areas except the attention mark are displayed in a second color; alternatively, the focus mark is displayed in a first shape, and the other position area than the focus mark is displayed in a second shape.
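The position-area mapping and differential display can be sketched in miniature as a text progress bar, where '#' stands in for the first color or shape (the attention mark) and '-' for the second:

```python
def render_timeline(duration, mark_points, width=20):
    """ASCII sketch of a playback timeline with attention marks displayed differentially."""
    bar = ["-"] * width
    for t in mark_points:
        index = min(width - 1, int(t / duration * width))  # position area of the mark
        bar[index] = "#"
    return "".join(bar)
```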
In one implementation, the display unit is specifically configured to display a focus list in the playback page and to display a focus mark in the focus list.
In one implementation, the display unit is further configured to jump to the content of interest identified by the attention mark and play the content of interest identified by the attention mark when the attention mark is triggered.
On the other hand, an embodiment of the present application provides a video processing apparatus, where the video processing apparatus is provided in a video processing device, and the video processing apparatus includes:
the display unit is used for displaying the playback page and playing the playback video in the playback page, and the playback video is generated by recording the live video;
a processing unit to:
obtaining the attention content of at least one audience of the live video, wherein the attention content of each audience is obtained by processing according to the video processing method described above;
generating statistical information of the played back video according to the attention content of at least one viewer;
and the display unit is also used for displaying the statistical information in the playing process of the playback video.
In one implementation, the statistical information includes at least one piece of attention content, the playing time point of each piece of attention content in the playback video, and the attention details of each piece of attention content, wherein the attention details include viewer identifiers and the number of viewers; the display unit is specifically configured to:
generating a statistical curve according to the statistical information;
and displaying the statistical curve during the playing process of the playback video.
In one implementation, the display unit is further configured to, when the statistical curve is triggered, acquire a triggered play time point, and display, in the playback page, the attention details of the attention content corresponding to the triggered play time point.
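The trigger lookup can be sketched by indexing the attention details by play-time point; the detail shape (viewer identifiers plus a count) follows the statistical information described above, while the field names are assumptions:

```python
def attention_details(per_viewer_points):
    """{viewer_id: [play-time points]} -> {point: {"viewers": [...], "count": n}},
    ready to be looked up when a play-time point on the statistical curve is triggered."""
    details = {}
    for viewer, points in per_viewer_points.items():
        for t in set(points):
            entry = details.setdefault(t, {"viewers": [], "count": 0})
            entry["viewers"].append(viewer)
            entry["count"] += 1
    for entry in details.values():
        entry["viewers"].sort()  # stable order for display
    return details
```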
On the other hand, an embodiment of the present application provides a video processing apparatus, where the video processing apparatus is provided in a video processing device, and the video processing apparatus includes:
the receiving unit is used for receiving the expression image of at least one viewer of the live video and the monitoring time of the expression image;
a processing unit to:
recognizing the expression image;
when the target expression image is identified, acquiring attribute data of the target expression;
and determining the attention content of at least one viewer in the live video according to the attribute data of the target expression.
In one implementation, the target audience is any of at least one audience; the processing unit is further configured to:
sending the attribute data of the target expression of the target audience to the target audience, so that the target audience determines the attention content of the target audience in the live video according to the attribute data of the target expression and generates an attention mark of a playback video according to the attention content; alternatively,
sending the attention content of the target audience to the target audience, so that the target audience generates an attention mark of a playback video according to the attention content; alternatively,
and generating statistical information of a playback video corresponding to the live video according to the attention content of at least one audience, and sending the statistical information to a main broadcast of the live video.
In another aspect, an embodiment of the present application provides a video processing apparatus, including:
a processor adapted to implement one or more instructions; and the number of the first and second groups,
a computer readable storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to execute the video processing method described above.
In another aspect, embodiments of the present application provide a computer-readable storage medium, which stores one or more instructions, where the one or more instructions are adapted to be loaded by a processor and execute the video processing method described above.
In the embodiments of this application, the expression of a target audience watching a live video can be monitored during the playing of the live video; when the target audience is monitored to generate a target expression, the attribute data of the target expression is acquired automatically, and the attention content of the target audience is determined in the live video according to that attribute data. By monitoring the expression changes of the target audience, the attention content can be determined quickly in the live video as soon as the target expression appears, without resorting to other means (such as manually recording the playing time point or time period of the attention content). This saves time and effectively improves the efficiency of determining attention content.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic diagram illustrating an architecture of a video live broadcast system according to an exemplary embodiment of the present application;
FIG. 1b is a block diagram illustrating an architecture of a video processing system according to an exemplary embodiment of the present application;
fig. 1c is a scene schematic diagram illustrating an expression monitoring method according to an exemplary embodiment of the present application;
FIG. 2a is a schematic diagram illustrating an interface of a student terminal provided by an exemplary embodiment of the present application;
FIG. 2b is a schematic diagram illustrating an interface of an instructor terminal according to an illustrative embodiment of the present application;
fig. 3 is a flow chart illustrating a video processing method according to an exemplary embodiment of the present application;
FIG. 4a is a diagram illustrating a method for determining attention content according to an exemplary embodiment of the present application;
FIG. 4b is a diagram illustrating a method for determining attention content according to another exemplary embodiment of the present application;
fig. 5 is a flow chart illustrating a video processing method according to another exemplary embodiment of the present application;
FIG. 6a is a schematic diagram illustrating a display interface of a focus mark provided by an exemplary embodiment of the present application;
FIG. 6b is a schematic illustration of a display interface for a focus mark provided by another exemplary embodiment of the present application;
fig. 7 is a flowchart illustrating a video processing method according to another exemplary embodiment of the present application;
fig. 8 is a flow chart illustrating a video processing method according to another exemplary embodiment of the present application;
fig. 9 is a flow chart illustrating a video processing method according to another exemplary embodiment of the present application;
fig. 10 is a flow chart illustrating a video processing method according to another exemplary embodiment of the present application;
fig. 11 is a schematic structural diagram of a video processing apparatus according to an exemplary embodiment of the present application;
fig. 12 is a schematic structural diagram of a video processing apparatus according to another exemplary embodiment of the present application;
fig. 13 shows a schematic structural diagram of a video processing device according to an exemplary embodiment of the present application.
Detailed description of the invention
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. It is a general term for the network, information, integration, management-platform, and application technologies applied on the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support for technical network systems, whose background services require large amounts of computing and storage resources, as with video websites, image websites, and web portals. As the internet industry develops further, each article may carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels are processed separately, and industrial data of all kinds require strong system background support, which can only be realized through cloud computing.
Cloud Computing is a computing model that distributes computing tasks over a resource pool formed by large numbers of computers, enabling various application systems to obtain computing power, storage space, and information services as needed. The network that provides the resources is referred to as the "cloud". To the user, resources in the "cloud" appear infinitely expandable, available at any time, usable on demand, and paid for per use. As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short), generally referred to as an IaaS (Infrastructure as a Service) platform, is established, and multiple types of virtual resources are deployed in the pool for selective use by external clients. The cloud computing resource pool mainly comprises computing devices (virtualized machines, including operating systems), storage devices, and network devices. By logical function, a Platform as a Service (PaaS) layer can be deployed on the IaaS layer, and a Software as a Service (SaaS) layer can be deployed on the PaaS layer; the SaaS layer can also be deployed directly on the IaaS layer. PaaS is a platform on which software (e.g., databases, web containers) runs. SaaS is the wide variety of business software (e.g., web portals, SMS services). Generally speaking, SaaS and PaaS are upper layers relative to IaaS.
Cloud computing can be applied to many fields of daily life, for example Cloud Computing Education (CCEDU) and cloud conferencing. Cloud education refers to educational platform services based on the cloud computing business model. On the cloud platform, education institutions, training institutions, enrollment service institutions, publicity institutions, industry associations, management institutions, industry media, legal institutions, and the like are integrated centrally into a resource pool in the cloud; all resources display, interact, and communicate with one another on demand, thereby reducing the cost of education and improving its efficiency. The cloud conference is an efficient, convenient, and low-cost conference form based on cloud computing technology. Through simple, easy-to-use operations on an internet interface, a user can quickly and efficiently share voice, data files, and video with teams and clients around the world, while complex technologies such as the transmission and processing of conference data are handled by the cloud conference service provider. At present, domestic cloud conferences mainly focus on service content in the SaaS mode, including telephone, network, video, and other service forms; a video conference based on cloud computing is called a cloud conference. In the cloud conference era, data transmission, processing, and storage are all handled by the computer resources of video conference vendors, so users can hold efficient teleconferences simply by opening a browser and logging in to the corresponding interface, without purchasing expensive hardware or installing cumbersome software.
The cloud conference system supports multi-server dynamic cluster deployment and provides multiple high-performance servers, greatly improving conference stability, security, and usability. In recent years, video conferencing has been popular with many users because it greatly improves communication efficiency, continuously reduces communication cost, and upgrades internal management; it has been widely applied in fields such as government, the military, transportation, finance, operators, education, and enterprises. Undoubtedly, with cloud computing, video conferencing becomes even more attractive in its convenience, speed, and ease of use, which will certainly stimulate a new wave of video conference applications.
The embodiment of the present application relates to a live video technology, and the principle of the live video technology will be briefly described below with reference to the live video system shown in fig. 1a. As shown in fig. 1a, the live video system comprises an anchor terminal 102, a server 103 and at least one viewer terminal 101. The anchor terminal 102 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The at least one viewer terminal 101 may likewise be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The server 103 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The anchor terminal 102, the server 103 and the at least one viewer terminal 101 may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In a live video system consisting of the anchor terminal 102, the server 103 and at least one viewer terminal 101, the principle of the live video technology is as follows: (1) the anchor terminal 102 collects video files and audio files through hardware devices (e.g., cameras, sound pickups, etc.); (2) the anchor terminal 102 processes the collected audio files and video files, for example, adding a watermark or a filter to the video file, or performing noise reduction on the audio file; (3) the anchor terminal 102 compresses and encodes the processed video files and audio files; (4) the anchor terminal 102 encapsulates the encoded video files and audio files into streaming media data packets; (5) the anchor terminal 102 transmits the encapsulated streaming media data packets to the server 103 through a streaming media transmission protocol such as RTMP (Real Time Messaging Protocol) or HLS (HTTP Live Streaming, where HTTP is the Hyper Text Transfer Protocol); (6) the server 103 distributes the streaming media data packets to each viewer terminal 101 through the streaming media transmission protocol; and (7) after receiving the streaming media data packets, each viewer terminal 101 decapsulates and decodes them, and plays the decoded video files and audio files (i.e., the live video).
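For illustration only, the seven steps above can be sketched as a toy data pipeline. The sketch below is not part of the embodiment and not a real streaming implementation: `MediaPacket`, the `zlib` compression and all function names are hypothetical placeholders for the actual processing, encoding (e.g., H.264/AAC) and RTMP/HLS encapsulation.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

@dataclass
class MediaPacket:
    """Hypothetical streaming-media data packet (step (4))."""
    payload: bytes
    timestamp_ms: int

def process(raw: bytes) -> bytes:
    # Step (2): watermarking / filtering / noise reduction would happen here.
    return raw

def encode(media: bytes) -> bytes:
    # Step (3): stands in for compression-encoding (a real pipeline uses H.264/AAC).
    return zlib.compress(media)

def encapsulate(encoded: bytes, ts: int) -> MediaPacket:
    # Step (4): wrap the encoded data into a streaming-media data packet.
    return MediaPacket(payload=encoded, timestamp_ms=ts)

def anchor_push(frames: Iterable[Tuple[int, bytes]],
                send: Callable[[MediaPacket], None]) -> None:
    # Steps (1)-(5): collect -> process -> encode -> encapsulate -> transmit.
    for ts, raw in frames:
        send(encapsulate(encode(process(raw)), ts))

def viewer_play(packet: MediaPacket) -> bytes:
    # Step (7): decapsulate and decode on the viewer terminal.
    return zlib.decompress(packet.payload)
```

In this sketch the `send` callback stands in for the transmission to the server 103 in step (5), and `viewer_play` mirrors the viewer terminal's decapsulation and decoding in step (7).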
In one implementation, in the process of playing the live video, the viewer terminal 101 may record the live video to generate a playback video of the live video; or, in the process of playing the live video, the server 103 may record the live video to generate a playback video of the live video, and the server 103 sends the playback video to each viewer terminal 101; alternatively, the anchor terminal 102 may record the live video to generate a playback video of the live video, and the anchor terminal 102 transmits the playback video to the server 103 and each of the viewer terminals 101.
Based on the above description, please refer to fig. 1b, which shows an architecture diagram of a video processing system according to an exemplary embodiment of the present application. As shown in fig. 1b, the video processing system comprises the anchor terminal 102, the server 103 and any one viewer terminal 101 of the live video system shown in fig. 1a. The embodiment of the present application is described by taking one viewer terminal 101 as an example; in an actual scene, the video processing system may include at least one viewer terminal 101.
In a video processing system composed of the anchor terminal 102, the server 103 and any one viewer terminal 101, in addition to the live video technology implemented by the live video system shown in fig. 1a, other methods can be implemented, specifically as follows:
(1) For the viewer terminal 101:
The viewer terminal 101 displays a live broadcast page and plays a live video in the live broadcast page, where the live video is transmitted from the server 103 to the viewer terminal 101 through the live video technique described above. During the playing of the live video, as shown in fig. 1c, which shows a scene schematic diagram of an expression monitoring method provided by an exemplary embodiment of the present application, the viewer terminal 101 invokes an image pickup device (e.g., a camera) to monitor the expression of a target viewer (i.e., the viewer user watching the live video on the viewer terminal 101); that is, the viewer terminal 101 invokes the image pickup device to shoot expression images of the target viewer. When the viewer terminal 101 calls an expression recognition model and recognizes that a target expression appears in an expression image of the target viewer, the viewer terminal 101 determines that the target expression of the target viewer has been monitored, and acquires attribute data of the target expression. Accordingly, the viewer terminal 101 may determine the attention content of the target viewer in the live video according to the attribute data of the target expression. A target expression is an expression that occurs when a particular attention emotion (e.g., an emotion of interest, confusion, doubt, etc.) is produced toward something; it may be an expression produced by facial movements, including but not limited to: dilated pupils, furrowed eyebrows, narrowed eyes, etc.
The attribute data of the target expression can comprise a target expression image and monitoring time of the target expression image; alternatively, the attribute data of the target expression may include a plurality of target expression images and the monitoring time of each target expression image. The monitoring time of the target expression image is a shooting time when the viewer terminal 101 calls the image pickup apparatus to shoot the target expression image. The expression recognition model may be a facial expression recognition model based on ANN (Artificial Neural Network), a facial expression recognition model based on CNN (Convolutional Neural Network), or the like.
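As an illustrative sketch of the attribute data described above (the class and field names are assumptions, not from the embodiment), the attribute data can be modeled as one or more target expression images, each paired with its monitoring time:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TargetExpressionRecord:
    """One target expression image together with its monitoring time."""
    image: bytes          # the shot target expression image
    monitored_at: float   # shooting time, seconds since the live video started

@dataclass
class TargetExpressionAttributes:
    """Attribute data of a target expression: one or more records."""
    records: List[TargetExpressionRecord]

    def monitoring_times(self) -> List[float]:
        """The monitoring time of each target expression image."""
        return [r.monitored_at for r in self.records]
```

A single-record instance corresponds to the first case above (one target expression image and its monitoring time); multiple records correspond to the second case.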
In one implementation, after the live video finishes playing, the viewer terminal 101 may display a playback page and play a playback video in the playback page, where the playback video is generated by the viewer terminal 101 recording the live video. After the viewer terminal 101 determines the attention content of the target viewer in the live video, it may generate an attention mark for the playback video according to the attention content and display the attention mark during the playing of the playback video. The target viewer can trigger the attention mark while watching the playback video to view the attention content of the target viewer in the live video. For example, the viewer terminal 101 may display the attention mark in a play time axis (e.g., a progress bar) of the playback video; alternatively, the viewer terminal 101 may display the attention mark in an attention list of the playback video.
(2) For anchor terminal 102:
After the live video is played, the anchor terminal 102 may display a playback page and play a playback video in the playback page, where the playback video is generated by the anchor terminal 102 recording the live video. The anchor terminal 102 may obtain the attention content of at least one viewer of the anchor terminal 102, where the attention content of each viewer is determined in the live video by the viewer terminal 101 corresponding to that viewer. The anchor terminal 102 may count the attention content of the at least one viewer, generate statistical information for the playback video, generate a statistical curve based on the statistical information, and display the statistical curve during the playing of the playback video.
The statistical information may include at least one content of interest, a playing time point of each content of interest in the playback video, and a detail of interest of each content of interest. The details of interest for each content of interest may include viewer identification and viewer number. An anchor (i.e., an anchor user who records a live video using anchor terminal 102) may trigger a statistical curve during viewing of a playback video to view attention details of an attention content corresponding to a triggered play time point.
(3) For the server 103:
if the audience terminal 101 does not have the ability to recognize the expression image, the audience terminal 101 may send the expression image of the target audience captured by the audience terminal 101 calling the image capturing device to the server 103, and the server 103 recognizes the expression image. The server 103 may receive the emoticons and the monitoring time of the emoticons transmitted by at least one viewer of the live video through the viewer terminal 101. The server 103 may call the expression recognition model to recognize the expression image, and when it is recognized that the target expression image is included, obtain attribute data of the target expression. The server 103 may determine the attention content of at least one viewer in the live video based on the attribute data of the target expression.
In one implementation, the server 103 may send attribute data of a target expression of a target viewer to the viewer terminal 101 of the target viewer, so that the viewer terminal 101 of the target viewer determines a content of interest of the target viewer in a live video according to the attribute data of the target expression, generates a mark of interest of a playback video according to the content of interest, and displays the mark of interest in a playing process of the playback video; alternatively, the server 103 may directly transmit the attention content of the target viewer to the viewer terminal 101 of the target viewer, so that the viewer terminal 101 of the target viewer generates an attention mark of the playback video according to the attention content and displays the attention mark during the playing of the playback video.
In one implementation, if the anchor terminal 102 does not have the capability of performing statistics on the content of interest of at least one viewer, the server 103 may perform statistics on the content of interest of at least one viewer to generate statistical information of a playback video corresponding to a live video. The server 103 sends the statistical information of the playback video to the anchor terminal 102, so that the anchor terminal 102 generates a statistical curve according to the statistical information and displays the statistical curve in the playing process of the playback video.
In the embodiment of the application, the audience terminal can monitor the expression of a target audience watching a live video during the playing of the live video, automatically acquire the attribute data of the target expression when it is monitored that the target audience produces the target expression, and determine the attention content of the target audience in the live video according to the attribute data of the target expression. By monitoring the expression changes of a target audience watching a live video, the attention content of the target audience in the live video is determined quickly, without resorting to other means (such as manually recording the playing time point and playing time period of the attention content); this saves time and can effectively improve the efficiency of determining the attention content. In addition, after the audience terminal determines the attention content of the target audience in the live video, an attention mark of the playback video can be generated according to the attention content of the target audience, and the attention mark is displayed during the playing of the playback video. Since the attention mark is highlighted in the playback video of the audience terminal, the target audience can trigger the attention mark while watching the playback video and view the attention content identified by the attention mark, without sliding or repeatedly clicking the play time axis of the playback video to search for the attention content; this saves time and can improve the efficiency of watching the attention content.
In addition, the anchor terminal can obtain the attention content determined by at least one audience through the corresponding audience terminal, generate a statistical curve of the playback video according to the attention content of the at least one audience, and display the statistical curve during the playing of the playback video. Since the statistical curve is highlighted in the playback video of the anchor terminal, the anchor can trigger the statistical curve while watching the playback video and view the attention details of the attention content corresponding to the triggered playing time point (namely the identifiers and the number of the audience members paying attention to that content). The statistical curve can reflect how all of the anchor's audience attend to the live video, and the anchor can adjust the live content according to the statistical curve (for example, appropriately expanding content the audience attends to and reducing content the audience does not attend to), which is beneficial to improving the anchor's live broadcast quality. In addition, when the audience terminal and the anchor terminal lack the capability to compute on large amounts of data, the server may provide the computing capability for them; for example, the server recognizes the expression images of at least one audience, determines the attention content of the at least one audience in the live video according to the attribute data of the target expression, and generates the statistical information of the playback video corresponding to the live video according to the attention content of the at least one audience, which is beneficial to improving the overall processing efficiency of the video processing system.
It is to be understood that the video processing system described in the embodiment of the present application is for more clearly illustrating the technical solution of the embodiment of the present application, and does not constitute a limitation to the technical solution provided in the embodiment of the present application, and as a person having ordinary skill in the art knows that as the system architecture evolves and new service scenes appear, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
For example, the video processing system described in the embodiment of the present application may be applied to a live-teaching scenario. A teacher serves as the anchor and students serve as the audience: the teacher broadcasts live through a teacher terminal, and the students watch the live video through student terminals.
For the student terminals, each student terminal displays a live broadcast page and plays the live video in the live broadcast page. During the playing of the live video, the student terminal calls the camera device to shoot expression images of the student watching the live video. When the student terminal calls the expression recognition model and recognizes that a target expression (such as pupil dilation, eye narrowing, eyebrow furrowing, etc.) appears in the expression image of the student, the student terminal can determine the attention content of the student in the live video (such as a video clip of the live video the student finds doubtful, a video clip the student does not understand, or a video frame the student finds doubtful, etc.). After the live video finishes playing, the student terminal displays a playback page and plays a playback video in the playback page, where the playback video is generated by the student terminal recording the live video. The student terminal may generate an attention mark of the playback video according to the attention content; as shown in fig. 2a, which shows an interface schematic diagram of a student terminal provided by an exemplary embodiment of the present application, the student terminal may display the attention mark in a progress bar of the playback video. The student can trigger the attention mark while watching the playback video to view the student's attention content in the live video.
In this way, the attention mark is highlighted in the playback video of the student terminal; the student can trigger the attention mark while watching the playback video and view the attention content identified by the attention mark, without sliding or repeatedly clicking the progress bar of the playback video to search for the attention content, which saves time and can improve the student's learning efficiency.
For the teacher terminal, after the live video is played, the teacher terminal can display a playback page and play a playback video in the playback page, where the playback video is generated by the teacher terminal recording the live video. The teacher terminal can obtain the attention content of at least one student of the teacher terminal, where the attention content of each student is determined in the live video by the student terminal corresponding to that student. The teacher terminal may count the attention content of the at least one student to generate a statistical curve of the playback video; as shown in fig. 2b, which shows an interface schematic diagram of the teacher terminal according to an exemplary embodiment of the present application, the teacher terminal may display the statistical curve during the playing of the playback video. The teacher may trigger the statistical curve while viewing the playback video to view the attention details of the attention content corresponding to the triggered playing time point (for example, the identifiers and the number of the students whose expression images showed the target expression at that playing time point). The statistical curve reflects how all of the teacher's students attend to the playback video, and the teacher can adjust the teaching scheme according to the statistical curve, thereby helping to improve teaching quality.
Based on the above description, please refer to fig. 3, fig. 3 shows a flowchart of a video processing method provided in an exemplary embodiment of the present application, which may be executed by the viewer terminal 101 in the embodiment shown in fig. 1b, and the video processing method includes the following steps S301 to S304:
step S301, displaying the live broadcast page, and playing the live broadcast video in the live broadcast page.
Step S302, in the playing process of the live video, the expression of the target audience of the live video is monitored.
In one implementation, during the playing of the live video, the audience terminal can call the camera device to shoot expression images of the target audience and record the monitoring time of each expression image. The monitoring time of an expression image is the shooting time at which the audience terminal shot that expression image.
Step S303, when it is monitored that the target audience generates the target expression, acquiring attribute data of the target expression.
In one implementation, the audience terminal can call an expression recognition model to recognize the expression image of the target audience; when the expression image of the target audience contains the target expression, the audience terminal determines that it has monitored the target audience producing the target expression. For example, in a live-teaching scene, when the audience terminal recognizes that the expression image of the target audience includes an expression such as pupil dilation, eye narrowing or eyebrow furrowing, the audience terminal determines that the target audience has produced the target expression.
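The recognition step can be sketched as follows; `classify_expression` is a trivial stand-in for the ANN- or CNN-based expression recognition model (a real model would infer a label from the image pixels), and the label set is only an example from the live-teaching scene:

```python
# Example target-expression labels from the live-teaching scene (illustrative).
TARGET_EXPRESSIONS = {"pupil_dilation", "eye_narrowing", "eyebrow_furrow"}

def classify_expression(image: bytes) -> str:
    """Stand-in for the ANN/CNN expression recognition model; here the
    label is simply encoded in the image bytes for demonstration."""
    return image.decode()

def is_target_expression(image: bytes) -> bool:
    # The terminal determines that the target expression is monitored when
    # the model's label falls in the target set.
    return classify_expression(image) in TARGET_EXPRESSIONS
```

Only when `is_target_expression` returns true does the terminal proceed to acquire the attribute data of the target expression in step S303.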
In one implementation, when it is monitored that a target audience generates a target expression, the audience terminal acquires attribute data of the target expression. If the target expression image identified by the audience terminal is one, the attribute data of the target expression can comprise one target expression image and the monitoring time of the target expression image; or, if the target expression images identified by the viewer terminal are multiple (two or more), the attribute data of the target expression may include multiple target expression images and the monitoring time of each target expression image.
Step S304, determining the attention content of the target audience in the live video according to the attribute data of the target expression.
In one implementation, the attribute data of the target expression may include a target expression image and a monitoring time of the target expression image. As shown in fig. 4a, fig. 4a is a schematic diagram illustrating a method for determining content of interest according to an exemplary embodiment of the present application, and a specific implementation manner of determining, by a viewer terminal, content of interest of a target viewer in a live video according to attribute data of a target expression may be: the audience terminal can acquire the playing time point of the live video corresponding to the monitoring time of the target expression image, and the video frame corresponding to the playing time point in the live video is determined as the attention content of the target audience.
In one implementation, the attribute data of the target expression may include a plurality of target expression images and a monitoring time of each target expression image. As shown in fig. 4b, fig. 4b is a schematic diagram illustrating a method for determining attention content according to another exemplary embodiment of the present application, and a specific implementation manner of determining, by a viewer terminal, attention content of a target viewer in a live video according to attribute data of a target expression may be: sequencing the target expression images by the audience terminal according to the sequence of the monitoring time of the target expression images; the method comprises the steps that a spectator terminal obtains a first playing time point of a live video corresponding to the monitoring time of a target expression image with the first sequence and obtains a second playing time point of the live video corresponding to the monitoring time of a target expression image with the last sequence; the audience terminal determines the time range from the first playing time point to the second playing time point as the duration of the target expression; and the audience terminal determines a plurality of video frames corresponding to the duration of the target expression in the live video as the attention content of the target audience.
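Both cases above — a single target expression image yielding one video frame, and multiple images yielding a duration — reduce to taking the earliest and latest play time points. A minimal sketch, assuming the monitoring times have already been mapped to play time points of the live video (the function name is illustrative):

```python
from typing import List, Tuple

def attention_span(play_points: List[float]) -> Tuple[float, float]:
    """Map the play time points of the target expression images to the
    attention content's time range.

    A single image yields a single video frame (a zero-length range);
    several images yield the duration from the first-ordered to the
    last-ordered play time point."""
    if not play_points:
        raise ValueError("no target expression images were monitored")
    ordered = sorted(play_points)       # order of the monitoring times
    return ordered[0], ordered[-1]      # first and second play time points
```

The video frames of the live video falling inside the returned range are then determined as the attention content of the target audience.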
In one implementation, if the audience terminal does not have the ability to recognize expression images, the audience terminal can also send the expression images shot by calling the camera device, together with their monitoring times, to the server; the server recognizes the expression images and, when a target expression image is recognized, acquires the attribute data of the target expression and sends it to the audience terminal. Alternatively, the server can determine the attention content of the target audience in the live video according to the attribute data of the target expression and send the attention content of the target audience to the audience terminal.
In one implementation, during the playing of the live video, the audience terminal may also call the camera device to monitor the limb movements of the target audience of the live video, that is, the audience terminal calls the camera device to shoot limb images of the target audience. When the audience terminal calls a limb action recognition model and recognizes that a target limb action appears in a limb image of the target audience, the audience terminal determines that the target audience has produced the target limb action, and acquires attribute data of the target limb action. Accordingly, the audience terminal can determine the attention content of the target audience in the live video according to the attribute data of the target limb action. A target limb action is an action made when a particular attention emotion (e.g., an emotion of interest, confusion, doubt, etc.) is produced toward something, and may include, but is not limited to, waving, raising a hand, etc.
In one implementation, during the playing of the live video, the audience terminal may also call a recording device to collect the voice of the target audience of the live video, that is, the audience terminal calls the recording device to collect a voice file of the target audience. When the audience terminal calls a voice recognition model and recognizes that the target voice appears in the voice file of the target audience, the audience terminal determines that the target voice has been monitored, and acquires attribute data of the target voice. Accordingly, the audience terminal can determine the attention content of the target audience in the live video according to the attribute data of the target voice. Target voice refers to sound that is emitted when a particular attention emotion (e.g., an emotion of interest, confusion, doubt, etc.) is produced toward something, and may include, but is not limited to, "I don't understand", "I can't do this", etc.
In the embodiment of the application, the audience terminal can monitor the expression of a target audience watching a live video during the playing of the live video, automatically acquire the attribute data of the target expression when it is monitored that the target audience produces the target expression, and determine the attention content of the target audience in the live video according to the attribute data of the target expression. By monitoring the expression changes of a target audience watching a live video, the attention content of the target audience in the live video is determined quickly, without resorting to other means (such as manually recording the playing time point and playing time period of the attention content); this saves time and can effectively improve the efficiency of determining the attention content.
Referring to fig. 5, fig. 5 is a flowchart illustrating a video processing method provided in another exemplary embodiment of the present application, which may be executed by the viewer terminal 101 in the embodiment illustrated in fig. 1b, and the video processing method includes the following steps S501 to S507:
step S501, displaying a live broadcast page, and playing a live broadcast video in the live broadcast page.
Step S502, in the playing process of the live video, the expression of the target audience of the live video is monitored.
Step S503, when it is monitored that the target audience generates the target expression, acquiring attribute data of the target expression.
And step S504, determining the attention content of the target audience in the live video according to the attribute data of the target expression.
In this embodiment of the application, steps S501 to S504 are performed in the same manner as steps S301 to S304, respectively, in the embodiment shown in fig. 3; for the specific execution process, refer to the description of the embodiment shown in fig. 3, which is not repeated here.
Step S505, a playback page is displayed, and a playback video is played in the playback page.
In one implementation, the playback video of the viewer terminal is generated by the viewer terminal recording the live video.
In step S506, an attention mark of the playback video is generated from the attention content.
In step S507, the attention mark is displayed during the playing of the playback video.
In one implementation, the playback page includes a play time axis of the playback video. The audience terminal can acquire the position area corresponding to the attention content in the play time axis of the playback video, and display the attention mark distinctively in that position area.
In one implementation, as shown in fig. 2a, in a play time axis of a playback video, a focus mark is displayed in a first color, and a position region other than the focus mark is displayed in a second color; alternatively, as shown in fig. 6a, fig. 6a is a schematic diagram of a display interface of a focus mark provided by an exemplary embodiment of the present application, in a play time axis of a playback video, the focus mark is displayed in a first shape (for example, an oval shape), and a position region other than the focus mark is displayed in a second shape (for example, a rectangle shape).
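Displaying the attention mark distinctively on the play time axis amounts to mapping the attention content's time range to a region of the progress bar; a sketch with illustrative names (the first color or shape would be rendered inside the returned pixel range, the second color or shape elsewhere):

```python
from typing import Tuple

def mark_region(start_s: float, end_s: float, video_length_s: float,
                bar_width_px: int) -> Tuple[int, int]:
    """Pixel range of the play time axis to render as the attention mark;
    the rest of the bar keeps the ordinary color/shape."""
    if not 0 <= start_s <= end_s <= video_length_s:
        raise ValueError("attention range must lie inside the video")
    left = round(start_s / video_length_s * bar_width_px)
    right = round(end_s / video_length_s * bar_width_px)
    return left, max(right, left + 1)   # at least 1 px, even for one frame
```

For example, attention content spanning 30 s to 60 s of a 120 s playback video occupies the middle-left quarter of a 400 px progress bar.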
In one implementation, as shown in fig. 6b, fig. 6b is a schematic diagram of a display interface of a focus mark according to another exemplary embodiment of the present application, a focus list may further be included in a playback page, and the viewer terminal may further display the focus mark in the focus list.
In one implementation, when the attention mark in the playback video is triggered by the target viewer, the viewer terminal may jump to the attention content identified by the attention mark and play the attention content identified by the attention mark.
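The trigger behavior can be sketched with a hypothetical minimal player interface — triggering an attention mark seeks playback to the first frame of the marked attention content (all names are assumptions for illustration):

```python
from typing import Tuple

class PlaybackPlayer:
    """Hypothetical minimal player: only the playback position is modeled."""
    def __init__(self) -> None:
        self.position_s = 0.0

    def seek(self, t: float) -> None:
        self.position_s = t

def on_mark_triggered(player: PlaybackPlayer,
                      mark_range: Tuple[float, float]) -> None:
    # Jump to the attention content identified by the triggered attention
    # mark and play it from its first frame.
    start_s, _end_s = mark_range
    player.seek(start_s)
```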
In the embodiment of the application, after the audience terminal determines the attention content of the target audience in the live video, the attention mark of the playback video can be generated according to the attention content of the target audience, and the attention mark is displayed in the playing process of the playback video; the attention mark is highlighted in the playback video of the audience terminal, the target audience can trigger the attention mark in the process of watching the playback video, the attention content identified by the attention mark is checked, the attention content is not required to be searched by sliding or clicking the playing time axis of the playback video for many times, the time cost is saved, and the watching efficiency of the attention content can be improved.
Referring to fig. 7, fig. 7 is a flowchart illustrating a video processing method provided in another exemplary embodiment of the present application, which may be executed by the anchor terminal 102 in the embodiment shown in fig. 1b, and the video processing method includes the following steps S701 to S704:
Step S701, display a playback page, and play the playback video in the playback page.
In one implementation, the playback video of the anchor terminal is generated by the anchor terminal recording the live video.
Step S702, the attention content of at least one audience of the live video is acquired.
In one implementation, the anchor terminal acquires the attention content of at least one viewer of the live video, where the attention content of each viewer is obtained by the viewer terminal through processing according to the video processing method shown in fig. 3; for the specific execution process, reference may be made to the description of the embodiment shown in fig. 3, which is not repeated here.
Step S703, generating statistical information of the playback video according to the attention content of at least one viewer.
In one implementation, the statistical information may include at least one content of interest, the playing time point of each content of interest in the playback video, and the attention details of each content of interest, where the attention details may include the identifiers and number of the viewers who paid attention to that content. A specific manner in which the anchor terminal presents the statistical information of the playback video may be as follows: the anchor terminal generates a statistical curve according to the statistical information and displays the statistical curve during the playing of the playback video.
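A minimal Python sketch of building such statistical information from per-viewer attention records is given below; the record layout (viewer identifier plus play time point) and the function name are assumptions for illustration:

```python
from collections import defaultdict

def build_statistics(attention_records):
    """Aggregate (viewer_id, play_time_point) attention records into
    statistical information: for each play time point of a content of
    interest, the identifiers and number of viewers who paid attention."""
    viewers_by_point = defaultdict(set)
    for viewer_id, play_time_point in attention_records:
        viewers_by_point[play_time_point].add(viewer_id)
    return {
        point: {"viewers": sorted(ids), "count": len(ids)}
        for point, ids in viewers_by_point.items()
    }

# Two viewers attended to the content at 30 s, one viewer at 75 s.
stats = build_statistics([("u1", 30), ("u2", 30), ("u1", 75)])
```

The resulting mapping carries exactly the attention details described above and can be fed directly into curve generation.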
Step S704, displaying the statistical information during the playing process of the playback video.
In one implementation, as shown in fig. 2b, when the statistical curve is triggered by the anchor, the anchor terminal obtains the playing time point at which the statistical curve was triggered and displays, in the playback page, the attention details of the attention content corresponding to the triggered playing time point (i.e., the identifiers and number of the viewers who paid attention to that content).
In the embodiment of the application, the anchor terminal can obtain the attention content determined by at least one viewer through the corresponding viewer terminal, generate a statistical curve of the playback video according to the attention content of the at least one viewer, and display the statistical curve during the playing of the playback video. The statistical curve is highlighted in the playback video of the anchor terminal, and the anchor can trigger the statistical curve while watching the playback video to view the attention details (the identifiers and number of viewers paying attention to the content) of the attention content corresponding to the triggered playing time point. Because the statistical curve reflects the attention of all the anchor's viewers to the live video, the anchor can adjust the live content according to the statistical curve, which helps improve the anchor's live broadcast quality.
Referring to fig. 8, fig. 8 is a flowchart illustrating a video processing method provided in another exemplary embodiment of the present application, which may be executed by the server 103 in the embodiment shown in fig. 1b, and the video processing method includes the following steps S801 to S804:
step S801, receiving an expression image of at least one viewer of a live video and monitoring time of the expression image.
In one implementation, the server receives an expression image of at least one viewer of the live video and the monitoring time of the expression image. The expression image of each viewer is obtained by the viewer terminal used by that viewer invoking the camera device to photograph the viewer's expression during the playing of the live video, and the monitoring time of the expression image is the time at which the viewer terminal invoked the camera device to capture it.
Step S802, recognizing the expression image.
In one implementation, the server may invoke an expression recognition model to recognize the expression image of a target audience (any viewer of the live video); when it is recognized that the expression image of the target audience contains the target expression image, the server determines that the target audience has produced the target expression. For example, in a live lecture scene, when the server recognizes that the expression image of the target audience contains features such as dilated pupils, narrowed eyes, or furrowed brows, the server determines that the target audience has produced the target expression. In this way, the server can recognize the expression image of each viewer.
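The recognition step can be illustrated with a deliberately simplified stand-in for the expression recognition model; the feature names below are hypothetical examples from the lecture scene, not the output of any actual model:

```python
# Hypothetical features associated with the target expression (concentration);
# a real system would obtain these from a trained expression recognition model.
TARGET_FEATURES = {"dilated_pupils", "narrowed_eyes", "furrowed_brows"}

def contains_target_expression(detected_features):
    """Return True when the recognition result for an expression image contains
    any feature of the target expression, in which case the server determines
    that the target audience has produced the target expression."""
    return bool(set(detected_features) & TARGET_FEATURES)
```

In practice the model call replaces `detected_features`; the decision logic (any target feature present → target expression produced) is what this sketch shows.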
In step S803, when it is recognized that the target expression image is included, the attribute data of the target expression is acquired.
Step S804, the attention content of at least one audience is determined in the live video according to the attribute data of the target expression.
In one implementation, the server may determine the content of interest of each viewer in the live video based on attribute data of the target expression of each viewer.
In one implementation, the server may send the attribute data of the target expression of the target audience to the target audience; the audience terminal used by the target audience then determines the attention content of the target audience in the live video according to the attribute data of the target expression, generates the attention mark of the playback video according to the attention content, and displays the attention mark during the playing of the playback video. Alternatively, the server may send the attention content of the target audience to the target audience, and the audience terminal used by the target audience generates the attention mark of the playback video according to the attention content and displays the attention mark during the playing of the playback video.
In one implementation, the server may generate statistical information of the playback video corresponding to the live video according to the attention content of at least one viewer; the server sends the statistical information to the anchor of the live video, and the anchor terminal used by the anchor displays the statistical information during the playing of the playback video.
In the embodiment of the application, when the audience terminal and the anchor terminal lack the capability to process large amounts of data, the server can provide computing capability for them: for example, the server recognizes the expression image of at least one viewer, determines the attention content of the at least one viewer in the live video according to the attribute data of the target expression, generates the statistical information of the playback video corresponding to the live video according to the attention content of the at least one viewer, and so on. Based on the computing capability of the server, the overall processing efficiency of the video processing system is improved.
Referring to fig. 9, fig. 9 is a flowchart illustrating a video processing method provided in another exemplary embodiment of the present application, where the video processing method may be implemented by the viewer terminal 101 and the server 103 in the embodiment illustrated in fig. 1b, and the video processing method includes the following steps S901 to S909:
step S901, the viewer terminal displays a live broadcast page, and plays a live broadcast video in the live broadcast page.
Step S902, the audience terminal monitors the expression of the target audience of the live video in the playing process of the live video.
Step S903, the audience terminal sends the expression image of the target audience and the monitoring time of the expression image to a server.
In step S904, the server identifies the expression image.
In step S905, when it is identified that the target expression image is included, the server acquires attribute data of the target expression.
In an implementation manner, the server may further send the obtained attribute data of the target expression to a viewer terminal used by the target viewer, and the viewer terminal determines the attention content of the target viewer in the live video according to the attribute data of the target expression, generates an attention mark of the playback video according to the attention content, and displays the attention mark in the playing process of the playback video.
Step S906, the server determines the attention content of the target audience in the live video according to the attribute data of the target expression.
In step S907, the server sends the attention content of the target audience in the live video to the audience terminal.
In step S908, the viewer terminal generates an attention mark of the playback video from the attention content.
In step S909, the viewer terminal displays the attention mark during the playing of the playback video.
The execution process of each step in the embodiment of the present application can refer to the description of the above embodiment, and is not described herein again.
In the embodiment of the application, the audience terminal can monitor the expression of the target audience watching the live video during its playing and send the monitored expression image and its monitoring time to the server; when the server recognizes that the target expression image appears in the expression image of the target audience, the server acquires the attribute data of the target expression, determines the attention content of the target audience in the live video according to that attribute data, and sends the attention content to the audience terminal. By monitoring changes in the expression of the target audience watching the live video, the attention content of the target audience in the live video is determined quickly, without resorting to other means (such as manually recording the playing time point or playing time period of the attention content); this saves time cost and effectively improves the efficiency of determining the attention content. In addition, after the attention content of the target audience in the live video is determined, the audience terminal can also generate the attention mark of the playback video according to the attention content of the target audience and display the attention mark during the playing of the playback video. Because the attention mark is highlighted in the playback video of the audience terminal, the target audience can trigger the attention mark while watching the playback video and view the attention content identified by the attention mark, without repeatedly sliding or clicking the play time axis of the playback video to search for the attention content; this saves time cost and improves the viewing efficiency of the attention content.
Referring to fig. 10, fig. 10 is a flowchart illustrating a video processing method provided in another exemplary embodiment of the present application, where the video processing method can be implemented interactively by at least one of the audience terminal 101, the anchor terminal 102 and the server 103 in the embodiment illustrated in fig. 1b, and the video processing method includes the following steps S1001 to S1011:
step S1001, each audience terminal monitors the expressions of audiences of the live video in the playing process of the live video.
Step S1002, each audience terminal sends the monitored expression images and the monitoring time of the expression images to a server.
In step S1003, the server identifies the expression image transmitted by at least one viewer terminal.
In step S1004, when it is recognized that the target expression image is included, the server acquires attribute data of the target expression.
In step S1005, the server determines the attention content of at least one viewer in the live video according to the attribute data of the target expression.
In step S1006, the server transmits the attention content of at least one viewer to the corresponding viewer terminal.
In step S1007, each viewer terminal generates a focus flag from the focus content.
In step S1008, each viewer terminal displays the attention mark during the playing of the playback video.
In step S1009, the server generates statistical information of the playback video corresponding to the live video according to the attention content of the at least one viewer.
In one implementation, the server may send the attention content of at least one viewer to the anchor terminal, and the anchor terminal may generate statistical information of a playback video corresponding to the live video according to the attention content of the at least one viewer, and display the statistical information in a playing process of the playback video.
Step S1010, the server sends the statistical information to the anchor terminal.
In step S1011, the anchor terminal displays the statistical information during the playing process of the playback video.
The execution process of each step in the embodiment of the present application can refer to the description of the above embodiment, and is not described herein again.
In the embodiment of the application, at least one audience terminal can monitor the expressions of viewers watching the live video during its playing and send the monitored expression images of the at least one viewer and their monitoring times to the server; when the server recognizes that a target expression image appears in an expression image, the server acquires the attribute data of the target expression, determines the attention content of the at least one viewer in the live video according to that attribute data, and sends the attention content to the corresponding audience terminal. By monitoring changes in the expressions of the viewers watching the live video, the attention content of each viewer in the live video is determined quickly, without resorting to other means (such as manually recording the playing time point or playing time period of the attention content); this saves time cost and effectively improves the efficiency of determining the attention content. In addition, after the attention content of at least one viewer in the live video is determined, each audience terminal can also generate the attention mark of the playback video according to that viewer's attention content and display the attention mark during the playing of the playback video. Because the attention mark is highlighted in the playback video of the audience terminal, the viewer can trigger the attention mark while watching the playback video and view the attention content identified by the attention mark, without repeatedly sliding or clicking the play time axis of the playback video to search for the attention content; this saves time cost and improves the viewing efficiency of the attention content.
In addition, the server can generate the statistical information of the playback video corresponding to the live video according to the attention content of at least one viewer and send it to the anchor terminal, which can display the statistical information during the playing of the playback video. The statistical information reflects the attention of all the anchor's viewers to the live video, and the anchor can adjust the live content accordingly (for example, appropriately adding content the viewers pay attention to and reducing content they do not), which helps improve the anchor's live broadcast quality.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a video processing apparatus according to an exemplary embodiment of the present application, where the video processing apparatus 110 may be a computer program (including program code) running in the viewer terminal 101, for example, may be an application software in the viewer terminal 101; the video processing apparatus 110 may be configured to perform corresponding steps in the methods shown in fig. 3, 5, 9 or 10. Referring to fig. 11, the video processing apparatus 110 includes the following units:
a display unit 1101, configured to display a live broadcast page, and play a live broadcast video in the live broadcast page;
a processing unit 1102 configured to:
monitoring the expression of a target audience of a live video in the playing process of the live video;
when it is monitored that target audience generates target expressions, acquiring attribute data of the target expressions;
and determining the attention content of the target audience in the live video according to the attribute data of the target expression.
In one implementation, the attribute data of the target expression comprises a target expression image and monitoring time of the target expression image; the processing unit 1102 is specifically configured to:
acquiring a playing time point of a live video corresponding to the monitoring time of the target expression image;
and determining the video frame corresponding to the playing time point in the live video as the attention content of the target audience.
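For a single target expression image, the mapping from monitoring time to the corresponding play time point can be sketched as follows (a minimal Python illustration; the function name, and the assumption that the live stream's start timestamp is available to the terminal, are ours):

```python
def play_time_point(monitor_time: float, live_start_time: float) -> float:
    """Convert the wall-clock monitoring time of a target expression image
    into a play time point (seconds offset) in the live video; the video
    frame at that offset is taken as the target audience's attention content."""
    return max(0.0, monitor_time - live_start_time)

# An expression captured 95 s after the stream began maps to the 95 s
# play time point of the live video.
```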
In one implementation, the attribute data of the target expression comprises a plurality of target expression images and the monitoring time of each target expression image; the processing unit 1102 is specifically configured to:
sequencing the target expression images according to the sequence of the monitoring time of the target expression images;
acquiring a first playing time point of a live video corresponding to the monitoring time of a target expression image with the first order and acquiring a second playing time point of the live video corresponding to the monitoring time of a target expression image with the last order;
determining the time range from the first playing time point to the second playing time point as the duration of the target expression;
and determining a plurality of video frames corresponding to the duration of the target expression in the live video as the attention content of the target audience.
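The sorting and duration determination described above for multiple target expression images can be sketched as follows (illustrative names; timestamps are assumed to be in seconds):

```python
def attention_duration(monitor_times, live_start_time):
    """Sort the monitoring times of the target expression images, map the
    earliest to the first play time point and the latest to the second play
    time point, and return that range as the duration of the target
    expression; the video frames within it form the attention content."""
    ordered = sorted(monitor_times)
    first = max(0.0, ordered[0] - live_start_time)
    second = max(0.0, ordered[-1] - live_start_time)
    return first, second

# Three images monitored at 1002 s, 1007 s and 1010 s, with the live video
# starting at 1000 s, give a duration from the 2 s to the 10 s play time point.
```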
In one implementation, the processing unit 1102 is specifically configured to invoke the image capturing device to capture an expression image of a target audience, and record monitoring time of the expression image, where the monitoring time of the expression image is shooting time of the image capturing device for shooting the expression image; the processing unit 1102 is further configured to:
calling an expression recognition model to recognize the expression image, and determining that the target audience has produced the target expression when it is recognized that the expression image contains the target expression image; or,
sending the expression image and the monitoring time of the expression image to a server so that the server recognizes the expression image, and receiving the attribute data of the target expression returned by the server when the server recognizes that the target expression image is contained.
In one implementation, the video processing apparatus 110 may be a computer program (including program code) running in the viewer terminal 101, such as an application software in the viewer terminal 101; the video processing apparatus 110 may be configured to perform the corresponding steps in the methods shown in fig. 5, fig. 9 or fig. 10. Referring to fig. 11, the video processing apparatus 110 includes the following units:
a display unit 1101 configured to display a playback page and play a playback video in the playback page, where the playback video is generated by recording a live video;
a processing unit 1102 configured to:
obtaining the attention content of a target audience of the live video, where the attention content is obtained through processing according to the video processing method described above;
generating an attention mark of a playback video according to the attention content;
the display unit 1101 is also used for displaying a focus mark during playing of the playback video.
In one implementation, the playback page includes a play timeline for playback of the video; the display unit 1101 is specifically configured to:
acquiring a position area corresponding to the attention content in a playing time axis of a playback video;
displaying a focus mark in a position area corresponding to the focus content;
wherein the focus mark is distinctively displayed in a playback time axis of the playback video; the differential display includes: the attention mark is displayed in a first color, and other position areas except the attention mark are displayed in a second color; alternatively, the focus mark is displayed in a first shape, and the other position area than the focus mark is displayed in a second shape.
In one implementation, the display unit 1101 is specifically configured to display a focus list in a playback page and a focus mark in the focus list.
In one implementation, the display unit 1101 is further configured to jump to the content of interest identified by the attention mark and play the content of interest identified by the attention mark when the attention mark is triggered.
In one implementation, the video processing apparatus 110 may also be a computer program (including program code) running in the anchor terminal 102, such as an application software in the anchor terminal 102; the video processing apparatus 110 may be configured to perform the corresponding steps in the methods shown in fig. 7 or fig. 10. Referring to fig. 11, the video processing apparatus 110 includes the following units:
a display unit 1101 configured to display a playback page and play a playback video in the playback page, where the playback video is generated by recording a live video;
a processing unit 1102 configured to:
obtaining the attention content of at least one viewer of the live video, where the attention content of each viewer is obtained through processing according to the video processing method described above;
generating statistical information of the played back video according to the attention content of at least one viewer;
the display unit 1101 is also used for displaying the statistical information during the playing of the playback video.
In one implementation, the statistical information includes at least one content of interest, the playing time point of each content of interest in the playback video, and the attention details of each content of interest, where the attention details include viewer identifiers and the number of viewers; the display unit 1101 is specifically configured to:
generating a statistical curve according to the statistical information;
and displaying the statistical curve during the playing process of the playback video.
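Generating a displayable curve from the statistical information can be sketched as below; the per-second sampling and the count-per-time-point input layout are assumptions for illustration:

```python
def statistics_curve(counts_by_point, duration_s):
    """Expand per-play-time-point viewer counts into one sample per second of
    the playback video, zero where no viewer marked attention; this series is
    what the anchor terminal plots as the statistical curve."""
    curve = [0] * (duration_s + 1)
    for point, count in counts_by_point.items():
        if 0 <= point <= duration_s:
            curve[point] = count
    return curve

# Five viewers attended to the content at the 3 s play time point of a
# 5 s playback video; every other second has no attention.
```

When the anchor triggers a point on the plotted curve, the index of the triggered sample is the playing time point whose attention details are shown.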
In one implementation, the display unit 1101 is further configured to, when the statistical curve is triggered, acquire a triggered play time point, and display an attention detail of an attention content corresponding to the triggered play time point in a playback page.
According to an embodiment of the present application, the units in the video processing apparatus 110 shown in fig. 11 may be separately or entirely combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units that achieve the same operation, without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the video processing apparatus 110 may also include other units; in practical applications, these functions may also be realized with the assistance of other units and by the cooperation of multiple units. According to another embodiment of the present application, the video processing apparatus 110 shown in fig. 11 may be constructed, and the video processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the methods shown in fig. 3, fig. 5, fig. 7, fig. 9, or fig. 10 on a general-purpose computing device, such as a computer, that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and executed in the viewer terminal 101 or the anchor terminal 102 via the computer-readable storage medium.
In the embodiment of the application, the audience terminal can monitor the expression of the target audience watching the live video during its playing, automatically acquire the attribute data of the target expression when it detects that the target audience has produced the target expression, and determine the attention content of the target audience in the live video according to that attribute data. By monitoring changes in the expression of the target audience watching the live video, the attention content of the target audience in the live video is determined quickly, without resorting to other means (such as manually recording the playing time point or playing time period of the attention content); this saves time cost and effectively improves the efficiency of determining the attention content. In addition, after the audience terminal determines the attention content of the target audience in the live video, it can generate the attention mark of the playback video according to the attention content of the target audience and display the attention mark during the playing of the playback video. Because the attention mark is highlighted in the playback video of the audience terminal, the target audience can trigger the attention mark while watching the playback video and view the attention content identified by the attention mark, without repeatedly sliding or clicking the play time axis of the playback video to search for the attention content; this saves time cost and improves the viewing efficiency of the attention content.
In addition, the anchor terminal can obtain the attention content determined by at least one viewer through the corresponding viewer terminal, generate a statistical curve of the playback video according to the attention content of the at least one viewer, and display the statistical curve during the playing of the playback video. The statistical curve is highlighted in the playback video of the anchor terminal, and the anchor can trigger the statistical curve while watching the playback video to view the attention details of the attention content corresponding to the triggered playing time point (i.e., the identifiers and number of the viewers who paid attention to that content). Because the statistical curve reflects the attention of all the anchor's viewers to the live video, the anchor can adjust the live content accordingly (for example, appropriately adding content the viewers pay attention to and reducing content they do not), which helps improve the anchor's live broadcast quality.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a video processing apparatus according to another exemplary embodiment of the present application, where the video processing apparatus 120 may be a computer program (including program code) running in the server 103, for example, may be an application software in the server 103; the video processing apparatus 120 may be configured to perform the corresponding steps in the methods shown in fig. 8, 9 or 10. Referring to fig. 12, the video processing apparatus 120 includes the following units:
a receiving unit 1201, configured to receive an expression image of at least one viewer of a live video and a monitoring time of the expression image;
a processing unit 1202 for:
recognizing the expression image;
when the target expression image is identified, acquiring attribute data of the target expression;
and determining the attention content of at least one viewer in the live video according to the attribute data of the target expression.
In one implementation, the target audience is any of at least one audience; the processing unit 1202 is further configured to:
sending the attribute data of the target expression of the target audience to the target audience, so that the target audience determines the attention content of the target audience in the live video according to the attribute data of the target expression and generates the attention mark of the playback video according to the attention content; or,
sending the attention content of the target audience to the target audience, so that the target audience generates the attention mark of the playback video according to the attention content; or,
generating statistical information of the playback video corresponding to the live video according to the attention content of the at least one viewer, and sending the statistical information to the anchor of the live video.
According to an embodiment of the present application, the units in the video processing apparatus 120 shown in fig. 12 may be separately or entirely combined into one or several other units, or one or more of the units may be further split into multiple functionally smaller units that achieve the same operation, without affecting the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the video processing apparatus 120 may also include other units; in practical applications, these functions may also be realized with the assistance of other units and by the cooperation of multiple units. According to another embodiment of the present application, the video processing apparatus 120 shown in fig. 12 may be constructed, and the video processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps involved in the methods shown in fig. 8, fig. 9, or fig. 10 on a general-purpose computing device, such as a computer, that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable storage medium, and loaded into and executed in the server 103 via the computer-readable storage medium.
In the embodiment of the application, when the audience terminal and the anchor terminal do not have the capability of computing a large amount of data, the server can provide the computing capability for the audience terminal and the anchor terminal; for example, the server recognizes the expression image of at least one audience, determines the attention content of the at least one audience in the live video according to the attribute data of the target expression, and generates the statistical information of the playback video corresponding to the live video according to the attention content of the at least one audience. Based on the computing capability of the server, the overall processing efficiency of the video processing system is improved.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a video processing device according to an exemplary embodiment of the present application, where the video processing device 130 includes at least a processor 1301 and a computer-readable storage medium 1302. The processor 1301 and the computer-readable storage medium 1302 may be connected by a bus or in other ways. The computer-readable storage medium 1302 is used for storing a computer program, the computer program comprising program instructions, and the processor 1301 is used for executing the program instructions stored in the computer-readable storage medium 1302. The processor 1301 (or CPU, Central Processing Unit) is the computing core and control core of the video processing device 130, and is adapted to implement one or more instructions, specifically to load and execute the one or more instructions so as to implement the corresponding method flows or corresponding functions.
Embodiments of the present application also provide a computer-readable storage medium (memory), which is a memory device in the video processing device 130 and is used for storing programs and data. It is understood that the computer-readable storage medium 1302 herein may include a built-in storage medium of the video processing device 130, and may also include an extended storage medium supported by the video processing device 130. The computer-readable storage medium provides a storage space that stores the operating system of the video processing device 130. Also stored in the storage space are one or more instructions, which may be one or more computer programs (including program code), suitable for loading and execution by the processor 1301. It should be noted that the computer-readable storage medium 1302 herein may be a high-speed RAM memory, or may be a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the aforementioned processor 1301.
In one implementation, the video processing device 130 may be the viewer terminal 101 shown in FIG. 1 b; the computer-readable storage medium 1302 has stored therein one or more first instructions; one or more first instructions stored in the computer-readable storage medium 1302 are loaded and executed by the processor 1301 to implement the corresponding steps in the above-described video processing method embodiments; in particular implementations, one or more first instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and perform the following steps:
displaying a live broadcast page, and playing a live broadcast video in the live broadcast page;
monitoring the expression of a target audience of a live video in the playing process of the live video;
when it is monitored that the target audience generates a target expression, acquiring attribute data of the target expression;
and determining the attention content of the target audience in the live video according to the attribute data of the target expression.
In one implementation, the attribute data of the target expression comprises a target expression image and monitoring time of the target expression image; one or more first instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and executed to determine the attention content of the target audience in the live video according to the attribute data of the target expression, and specifically execute the following steps:
acquiring a playing time point of a live video corresponding to the monitoring time of the target expression image;
and determining the video frame corresponding to the playing time point in the live video as the attention content of the target audience.
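As a rough illustration of the two steps above, the mapping from the monitoring time of a target expression image to a play time point and then to a video frame might look like the following sketch. The function name, the assumption that the terminal records the wall-clock time at which playback started, and the fixed frame rate are all hypothetical and not part of the application:

```python
def attention_frame(monitor_time: float, play_start_time: float, fps: float) -> int:
    """Map the monitoring time of a target expression image to a play
    time point of the live video, then to the index of the video frame
    that is taken as the target audience's attention content."""
    # Play time point: seconds elapsed in the live video when the
    # target expression image was captured.
    play_time_point = monitor_time - play_start_time
    # The video frame corresponding to that play time point.
    return int(play_time_point * fps)

# A target expression captured 125.5 s after playback began; at 25 fps
# this selects frame 3137 as the attention content.
frame = attention_frame(monitor_time=1125.5, play_start_time=1000.0, fps=25.0)
```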
In one implementation, the attribute data of the target expression comprises a plurality of target expression images and the monitoring time of each target expression image; one or more first instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and executed to determine the attention content of the target audience in the live video according to the attribute data of the target expression, and specifically execute the following steps:
sequencing the target expression images according to the sequence of the monitoring time of the target expression images;
acquiring a first playing time point of a live video corresponding to the monitoring time of a target expression image with the first order and acquiring a second playing time point of the live video corresponding to the monitoring time of a target expression image with the last order;
determining the time range from the first playing time point to the second playing time point as the duration of the target expression;
and determining a plurality of video frames corresponding to the duration of the target expression in the live video as the attention content of the target audience.
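The ordering-and-bounding logic above admits a similarly small sketch; again the names and the wall-clock-based timing are assumptions for illustration only:

```python
def attention_duration(monitor_times, play_start_time):
    """Given the monitoring times of a plurality of target expression
    images, return the (first, second) play time points bounding the
    duration of the target expression."""
    ordered = sorted(monitor_times)          # sequence by monitoring time
    first = ordered[0] - play_start_time     # first-ordered expression image
    second = ordered[-1] - play_start_time   # last-ordered expression image
    return first, second

# Three expression images captured at wall-clock times 1003 s, 1001 s
# and 1007 s; playback started at 1000 s, so the target expression
# lasted from play time point 1 s to play time point 7 s.
span = attention_duration([1003.0, 1001.0, 1007.0], 1000.0)
```

The video frames falling inside this range would then be taken together as the attention content of the target audience.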
In one implementation, one or more first instructions in the computer-readable storage medium 1302, when loaded and executed by the processor 1301, perform the following steps in particular:
calling camera equipment to shoot an expression image of a target audience, and recording the monitoring time of the expression image, wherein the monitoring time of the expression image is the shooting time of the camera equipment for shooting the expression image; the loading of the one or more first instructions in the computer-readable storage medium 1302 by the processor 1301 further performs the steps of:
calling an expression recognition model to recognize the expression image, and determining that the target audience generates the target expression when the expression image is recognized to contain the target expression image; alternatively,
and sending the expression images and the monitoring time of the expression images to a server so as to enable the server to identify the expression images, and receiving attribute data of the target expression returned by the server when the server identifies that the target expression images are contained.
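The two recognition branches above (on-device model versus server-side recognition) could be dispatched as in the following hypothetical sketch; `local_model` and `send_to_server` stand in for a real expression recognition model and a real network call, neither of which is specified here:

```python
def detect_target_expression(expression_image, local_model=None, send_to_server=None):
    """Decide locally or remotely whether an expression image contains
    the target expression, following the two branches described above."""
    if local_model is not None:
        # Branch 1: call the on-device expression recognition model.
        return local_model(expression_image) == "target"
    # Branch 2: delegate recognition to the server, which returns the
    # attribute data of the target expression when one is recognized,
    # and None otherwise.
    return send_to_server(expression_image) is not None

# Hypothetical stand-in for an on-device recognition model.
demo_model = lambda image: "target" if image == "smiling-face" else "other"
detected = detect_target_expression("smiling-face", local_model=demo_model)
```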
In one implementation, the video processing device 130 may be the viewer terminal 101 shown in FIG. 1 b; the computer-readable storage medium 1302 has one or more second instructions stored therein; one or more second instructions stored in the computer-readable storage medium 1302 are loaded and executed by the processor 1301 to implement the corresponding steps in the above-described video processing method embodiment; in particular implementations, one or more second instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and perform the following steps:
displaying the playback page, and playing the playback video in the playback page, wherein the playback video is generated by recording the live video;
obtaining the attention content of a target audience of a live video, wherein the attention content is obtained by processing according to the video processing method;
generating an attention mark of a playback video according to the attention content;
the focus mark is displayed during the playing of the playback video.
In one implementation, the playback page includes a playing time axis of the playback video; when the one or more second instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and perform displaying the attention mark during the playing of the playback video, the following steps are specifically performed:
acquiring a position area corresponding to the attention content in a playing time axis of a playback video;
displaying a focus mark in a position area corresponding to the focus content;
wherein the focus mark is distinctively displayed in a playback time axis of the playback video; the differential display includes: the attention mark is displayed in a first color, and other position areas except the attention mark are displayed in a second color; alternatively, the focus mark is displayed in a first shape, and the other position area than the focus mark is displayed in a second shape.
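For illustration, mapping a piece of attention content to the position area of the playing time axis where the attention mark is distinctively displayed might be sketched as follows; the pixel-based coordinates and the function name are assumptions, not part of the application:

```python
def mark_region(start_s: float, end_s: float, video_len_s: float, axis_px: int):
    """Map the play-time range of a piece of attention content to the
    pixel region of the playing time axis where the attention mark is
    displayed in a distinguishing color or shape."""
    left = round(start_s / video_len_s * axis_px)
    right = round(end_s / video_len_s * axis_px)
    return left, right

# Attention content spanning 60 s to 90 s of a 600 s playback video,
# drawn on a 300-px-wide playing time axis, occupies pixels 30..45;
# that region is rendered in the first color or shape, the rest of
# the axis in the second.
region = mark_region(60, 90, 600, 300)
```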
In one implementation, when the one or more second instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and perform displaying the attention mark during the playing of the playback video, the following steps are specifically performed:
an attention list is displayed in the playback page, and an attention mark is displayed in the attention list.
In one implementation, the one or more second instructions in the computer-readable storage medium 1302 being loaded by the processor 1301 further performs the steps of:
and when the attention mark is triggered, skipping to the attention content identified by the attention mark, and playing the attention content identified by the attention mark.
In one implementation, the video processing device 130 may be the anchor terminal 102 shown in FIG. 1 b; the computer-readable storage medium 1302 has stored therein one or more third instructions; one or more third instructions stored in the computer-readable storage medium 1302 are loaded and executed by the processor 1301 to implement the corresponding steps in the above-described video processing method embodiments; in particular implementations, one or more third instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and perform the following steps:
displaying the playback page, and playing the playback video in the playback page, wherein the playback video is generated by recording the live video;
obtaining the attention content of at least one audience of a live video, wherein the attention content of each audience is obtained by processing according to the video processing method;
generating statistical information of the played back video according to the attention content of at least one viewer;
and displaying the statistical information during the playing process of the playback video.
In one implementation, the statistical information includes at least one piece of attention content, a playing time point of each piece of attention content in the playback video, and attention details of each piece of attention content, wherein the attention details include a viewer identifier and a viewer number; when the one or more third instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and perform displaying the statistical information during the playing of the playback video, the following steps are specifically performed:
generating a statistical curve according to the statistical information;
and displaying the statistical curve during the playing process of the playback video.
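One hypothetical way to derive the points of such a statistical curve from per-audience attention records is sketched below; the record format (viewer identifier, play time point) is an assumption for illustration only:

```python
from collections import defaultdict

def statistics_curve(attention_records):
    """Aggregate per-audience attention content into the statistical
    information described above: for each play time point, the set of
    viewer identifiers, and curve points of the form
    (play time point, viewer number)."""
    details = defaultdict(set)
    for viewer_id, play_time_point in attention_records:
        details[play_time_point].add(viewer_id)
    curve = sorted((t, len(ids)) for t, ids in details.items())
    return curve, details

# Two audiences attend to play time point 120 s, one to 300 s.
records = [("viewer-A", 120), ("viewer-B", 120), ("viewer-A", 300)]
curve, details = statistics_curve(records)
```

Triggering the curve at a play time point would then look up `details` to show the viewer identifiers and viewer number for the attention content at that point.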
In one implementation, the one or more third instructions in the computer-readable storage medium 1302 being loaded by the processor 1301 further performs the steps of:
and when the statistical curve is triggered, acquiring the triggered playing time point, and displaying the attention details of the attention content corresponding to the triggered playing time point in a playback page.
In one implementation, the video processing device 130 may be the server 103 shown in FIG. 1 b; the computer-readable storage medium 1302 has one or more fourth instructions stored therein; one or more fourth instructions stored in the computer-readable storage medium 1302 are loaded and executed by the processor 1301 to implement the corresponding steps in the above-described video processing method embodiments; in particular implementations, one or more fourth instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 and perform the following steps:
receiving expression images of at least one audience of a live video and monitoring time of the expression images;
recognizing the expression image;
when the target expression image is identified, acquiring attribute data of the target expression;
and determining the attention content of at least one viewer in the live video according to the attribute data of the target expression.
In one implementation, the target audience is any one of the at least one audience; the one or more fourth instructions in the computer-readable storage medium 1302 are loaded by the processor 1301 to further perform the steps of: sending the attribute data of the target expression of the target audience to the target audience so that the target audience determines the attention content of the target audience in the live video according to the attribute data of the target expression and generates an attention mark of a playback video according to the attention content; alternatively,
sending the attention content of the target audience to the target audience so that the target audience generates an attention mark of a playback video according to the attention content; alternatively, the first and second electrodes may be,
and generating statistical information of a playback video corresponding to the live video according to the attention content of at least one audience, and sending the statistical information to a main broadcast of the live video.
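The server-side flow described in this section could be sketched end to end as follows; `recognize`, the submission format, and the wall-clock-based timing are all illustrative assumptions rather than details of the application:

```python
def server_process(submissions, play_start_time, recognize):
    """Server-side sketch: recognize each audience's expression image,
    determine each audience's attention content (as a play time point),
    and build the statistical information sent to the anchor of the
    live video."""
    stats = {}
    for viewer_id, (image, monitor_time) in submissions:
        if recognize(image):                    # target expression image found
            t = monitor_time - play_start_time  # attention play time point
            stats.setdefault(t, []).append(viewer_id)
    # Statistical information: play time point -> (viewer ids, viewer number)
    return {t: (ids, len(ids)) for t, ids in stats.items()}

# Hypothetical recognizer and submissions from three audiences.
is_smile = lambda image: image == "smile"
info = server_process(
    [("v1", ("smile", 1030.0)), ("v2", ("smile", 1030.0)), ("v3", ("neutral", 1050.0))],
    1000.0, is_smile)
```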
In the embodiment of the application, the audience terminal can monitor the expression of a target audience watching a live video during the playing of the live video, automatically acquire the attribute data of the target expression when it is monitored that the target audience generates the target expression, and determine the attention content of the target audience in the live video according to the attribute data of the target expression. By monitoring the expression changes of the target audience watching the live video, the attention content of the target audience in the live video is determined quickly, without resorting to other means (such as manually recording the playing time point and playing time period of the attention content); this saves time cost and can effectively improve the efficiency of determining the attention content. In addition, after the audience terminal determines the attention content of the target audience in the live video, it can generate an attention mark of the playback video according to the attention content and display the attention mark during the playing of the playback video. The attention mark is highlighted in the playback video on the audience terminal; the target audience can trigger the attention mark while watching the playback video and view the attention content identified by the attention mark, without sliding or repeatedly clicking the playing time axis of the playback video to search for the attention content, which saves time cost and can improve the efficiency of viewing the attention content.
In addition, the anchor terminal can obtain the attention content determined by at least one audience through the audience terminal, generate a statistical curve of the playback video according to the attention content of the at least one audience, and display the statistical curve during the playing of the playback video. The statistical curve is highlighted in the playback video on the anchor terminal; the anchor can trigger the statistical curve while watching the playback video and view the attention details of the attention content corresponding to the triggered playing time point (namely, the identifiers and number of the audiences paying attention to that content). Because the statistical curve reflects the attention of all of the anchor's audiences to the live video, the anchor can adjust the live content according to the statistical curve (for example, appropriately increasing the content the audiences pay attention to and appropriately reducing the content the audiences do not pay attention to), which is beneficial to improving the anchor's live broadcast quality. In addition, when the audience terminal and the anchor terminal do not have the capability of computing a large amount of data, the server may provide the computing capability for them: for example, the server recognizes the expression image of at least one audience, determines the attention content of the at least one audience in the live video according to the attribute data of the target expression, and generates the statistical information of the playback video corresponding to the live video according to the attention content of the at least one audience, which is beneficial to improving the overall processing efficiency of the video processing system.
The above description is only of specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

1. A method of video processing, the method comprising:
displaying a live broadcast page, and playing a live broadcast video in the live broadcast page;
monitoring the expression of a target audience of the live video in the playing process of the live video;
when it is monitored that the target audience generates the target expression, acquiring attribute data of the target expression;
and determining the attention content of the target audience in the live video according to the attribute data of the target expression.
2. The method of claim 1, wherein the attribute data of the target expression comprises a target expression image and a monitoring time of the target expression image;
the determining the attention content of the target audience in the live video according to the attribute data of the target expression comprises the following steps:
acquiring a playing time point of the live video corresponding to the monitoring time of the target expression image;
and determining the video frame corresponding to the playing time point in the live video as the attention content of the target audience.
3. The method according to claim 1, wherein the attribute data of the target expression comprises a plurality of target expression images and a monitoring time of each target expression image;
the determining the attention content of the target audience in the live video according to the attribute data of the target expression comprises the following steps:
sequencing the target expression images according to the sequence of the monitoring time of the target expression images;
acquiring a first playing time point of the live video corresponding to the monitoring time of a target expression image with the first order and acquiring a second playing time point of the live video corresponding to the monitoring time of a target expression image with the last order;
determining the time range from the first playing time point to the second playing time point as the duration of the target expression;
determining a plurality of video frames corresponding to the duration of the target expression in the live video as the attention content of the target audience.
4. The method of claim 1, wherein the monitoring of the target audience expression of the live video comprises: calling camera equipment to shoot the expression image of the target audience, and recording the monitoring time of the expression image, wherein the monitoring time of the expression image is the shooting time of the camera equipment for shooting the expression image;
the method further comprises the following steps:
calling an expression recognition model to recognize the expression image, and determining that the target audience generates the target expression when the expression image is recognized to contain the target expression image; alternatively,
and sending the expression image and the monitoring time of the expression image to a server so that the server identifies the expression image and receives the attribute data of the target expression returned by the server when the server identifies that the target expression image is included.
5. A method of video processing, the method comprising:
displaying a playback page, and playing a playback video in the playback page, wherein the playback video is generated by recording a live video;
acquiring attention content of a target audience of the live video, wherein the attention content is obtained by processing according to the method of any one of claims 1 to 4;
generating an attention mark of the playback video according to the attention content;
displaying the attention mark during playing of the playback video.
6. The method of claim 5, wherein the playback page includes a play timeline for the playback video; the displaying the attention mark in the playing process of the playback video comprises:
acquiring a position area corresponding to the attention content in a playing time axis of the playback video;
displaying the attention mark in a position area corresponding to the attention content;
wherein the focus mark is distinctively displayed in a play time axis of the playback video; the differential display includes: the attention mark is displayed in a first color, and other position areas except the attention mark are displayed in a second color; alternatively, the focus mark is displayed in a first shape, and a position region other than the focus mark is displayed in a second shape.
7. The method of claim 5, wherein displaying the attention mark during the playing of the playback video comprises:
displaying a focus list in the playback page, and displaying the focus mark in the focus list.
8. The method of claim 5, further comprising:
and when the attention mark is triggered, skipping to the attention content identified by the attention mark, and playing the attention content identified by the attention mark.
9. A method of video processing, the method comprising:
displaying a playback page, and playing a playback video in the playback page, wherein the playback video is generated by recording a live video;
acquiring attention content of at least one viewer of the live video, wherein the attention content of each viewer is obtained by processing according to the method of any one of claims 1 to 4;
generating statistical information of the playback video according to the attention content of the at least one viewer;
and displaying the statistical information in the playing process of the playback video.
10. The method of claim 9, wherein the statistical information comprises at least one content of interest, a playing time point of each content of interest in the playback video, and a content of interest detail of each content of interest, wherein the content of interest detail comprises a viewer identification and a viewer number;
the displaying the statistical information in the playing process of the playback video comprises:
generating a statistical curve according to the statistical information;
and displaying the statistical curve in the playing process of the playback video.
11. The method of claim 10, further comprising:
when the statistical curve is triggered, acquiring a triggered playing time point, and displaying the attention details of the attention content corresponding to the triggered playing time point in the playback page.
12. A method of video processing, the method comprising:
receiving an expression image of at least one viewer of a live video and monitoring time of the expression image;
recognizing the expression image;
when the target expression image is identified, acquiring attribute data of the target expression;
and determining the attention content of the at least one audience in the live video according to the attribute data of the target expression.
13. The method of claim 12, wherein the target viewer is any of the at least one viewer; the method further comprises the following steps:
sending the attribute data of the target expression of the target audience to the target audience so that the target audience determines the attention content of the target audience in the live video according to the attribute data of the target expression and generates an attention mark of a playback video according to the attention content; alternatively,
sending the attention content of the target audience to the target audience so that the target audience generates an attention mark of the playback video according to the attention content; alternatively, the first and second electrodes may be,
and generating statistical information of a playback video corresponding to the live video according to the attention content of the at least one audience, and sending the statistical information to a main broadcast of the live video.
14. A video processing apparatus, characterized in that the video processing apparatus comprises:
a processor adapted to implement one or more instructions; and the number of the first and second groups,
a computer readable storage medium having stored thereon one or more instructions adapted to be loaded by the processor and to execute the video processing method according to any of claims 1 to 13.
15. A computer-readable storage medium having stored thereon one or more instructions adapted to be loaded by a processor and to perform the video processing method of any of claims 1 to 13.
CN202010482616.2A 2020-05-29 2020-05-29 Video processing method, device and storage medium Active CN111629222B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010482616.2A CN111629222B (en) 2020-05-29 2020-05-29 Video processing method, device and storage medium


Publications (2)

Publication Number Publication Date
CN111629222A true CN111629222A (en) 2020-09-04
CN111629222B CN111629222B (en) 2022-12-20

Family

ID=72271183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010482616.2A Active CN111629222B (en) 2020-05-29 2020-05-29 Video processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111629222B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012238232A (en) * 2011-05-12 2012-12-06 Nippon Hoso Kyokai <Nhk> Interest section detection device, viewer interest information presentation device, and interest section detection program
CN102945624A (en) * 2012-11-14 2013-02-27 南京航空航天大学 Intelligent video teaching system based on cloud calculation model and expression information feedback
EP2688310A2 (en) * 2012-07-19 2014-01-22 Samsung Electronics Co., Ltd Apparatus, system, and method for controlling content playback
KR101617649B1 (en) * 2014-12-26 2016-05-03 세종대학교산학협력단 Recommendation system and method for video interesting section
CN109819325A (en) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 Hot video marks processing method, device, computer equipment and storage medium
CN110113639A (en) * 2019-05-14 2019-08-09 北京儒博科技有限公司 Video playing control method, device, terminal, server and storage medium
CN110536159A (en) * 2019-08-29 2019-12-03 广州虎牙科技有限公司 User classification method, device, computer equipment and storage medium


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112261420A (en) * 2020-09-30 2021-01-22 北京市商汤科技开发有限公司 Live video processing method and related device
CN113784195A (en) * 2021-08-20 2021-12-10 北京字跳网络技术有限公司 Video page display method and device, electronic equipment and storage medium
WO2023020325A1 (en) * 2021-08-20 2023-02-23 北京字跳网络技术有限公司 Video page display method and apparatus, and electronic device and storage medium
CN114915848A (en) * 2022-05-07 2022-08-16 上海哔哩哔哩科技有限公司 Live broadcast interaction method and device, anchor terminal, audience terminal and server terminal
CN114915848B (en) * 2022-05-07 2023-12-08 上海哔哩哔哩科技有限公司 Live interaction method, device and equipment

Also Published As

Publication number Publication date
CN111629222B (en) 2022-12-20

Similar Documents

Publication Publication Date Title
CN111629222B (en) Video processing method, device and storage medium
US11670015B2 (en) Method and apparatus for generating video
US11036469B2 (en) Parsing electronic conversations for presentation in an alternative interface
CN107316520B (en) Video teaching interaction method, device, equipment and storage medium
US12022136B2 (en) Techniques for providing interactive interfaces for live streaming events
CN112399249A (en) Multimedia file generation method and device, electronic equipment and storage medium
CN111556279A (en) Monitoring method and communication method of instant session
CN113301358B (en) Content providing and displaying method and device, electronic equipment and storage medium
CN110162667A (en) Video generation method, device and storage medium
CN113923462A (en) Video generation method, live broadcast processing method, video generation device, live broadcast processing device and readable medium
US10740618B1 (en) Tracking objects in live 360 video
CN112989112B (en) Online classroom content acquisition method and device
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
CN117632109A (en) Virtual digital assistant construction method, device, electronic equipment and storage medium
CN113301362B (en) Video element display method and device
CN113157241A (en) Interaction equipment, interaction device and interaction system
Cao When Documentaries Meet New Media: Interactive Documentary Projects in China and the West
CN113301352A (en) Automatic chat during video playback
CN109951719A (en) Application advertisement placement method, system and storage medium based on dynamic video poster
CN109905766A (en) A kind of dynamic video poster generation method, system, device and storage medium
US20240146979A1 (en) System, method and computer-readable medium for live streaming recommendation
CN110188712B (en) Method and apparatus for processing image
US20240233775A1 (en) Augmented performance replacement in a short-form video
Shen Constructing a user experience-based mobile learning environment: problems and solutions
Cao When Documentaries Meet New Media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant