WO2023207516A1 - Live video processing method and apparatus, electronic device, and storage medium - Google Patents

Live video processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023207516A1
WO2023207516A1 (PCT/CN2023/085650)
Authority
WO
WIPO (PCT)
Prior art keywords
target object
live broadcast
data
image size
video stream
Prior art date
Application number
PCT/CN2023/085650
Other languages
English (en)
French (fr)
Inventor
朱承丞
张雯
赵飞
Original Assignee
北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023207516A1 publication Critical patent/WO2023207516A1/zh


Classifications

    • H - ELECTRICITY
      • H04 - ELECTRIC COMMUNICATION TECHNIQUE
        • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N 21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N 21/21 - Server components or server architectures
                • H04N 21/218 - Source of audio or video content, e.g. local disk arrays
                  • H04N 21/2187 - Live feed
            • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N 21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
                  • H04N 21/4312 - involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
                    • H04N 21/4314 - for fitting data in a restricted space on the screen, e.g. EPG data in a rectangular grid
                • H04N 21/435 - Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
              • H04N 21/47 - End-user applications
                • H04N 21/478 - Supplemental services, e.g. displaying phone caller identification, shopping application
                  • H04N 21/4788 - communicating with other users, e.g. chatting

Definitions

  • the present application relates to the field of computer technology, and in particular to a live video processing method, device, electronic equipment and storage medium.
  • with the development of the Internet and smart terminals, online live broadcast has become one of the common scenarios for people's leisure and interaction in the Internet era.
  • users can also post comments while watching the live broadcast.
  • the purpose of this application is to propose a live video processing method, device, electronic equipment and storage medium.
  • this application provides a live video processing method, including:
  • the target object area data is added to the video stream data and sent, so that the barrage is rendered and displayed outside the area occupied by the target object during the live broadcast.
  • the method further includes: obtaining barrage data for the live broadcast; displaying the live broadcast picture according to the video stream data; and, according to the target object area data and the barrage data, rendering and displaying the barrage outside the area occupied by the target object in the live broadcast picture.
  • the video stream data includes a certain number of video frames; generating the target object area data according to the video stream data includes: performing contour recognition on the video frames to obtain a set of coordinate points corresponding to the recognized contour of the target object, and using the set of coordinate points as the target object area data.
  • adding the target object area data to the video stream data and sending it includes: adding the target object area data to the video stream data as supplemental enhancement information according to predetermined video stream encoding rules.
  • the method further includes determining the area occupied by the target object in the live broadcast picture by: for each coordinate point in the set of coordinate points, mapping the coordinate point to a target coordinate point according to a predetermined mapping relationship, where the mapping relationship is a mapping among the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; connecting the target coordinate points in sequence to obtain a closed curve; and determining the area bounded by the closed curve as the area occupied by the target object in the live broadcast picture.
  • mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a first scaling parameter based on the image size of the contour recognition output and the image size of the video frame; determining a second scaling parameter based on the image size of the video frame and the image size of the live broadcast picture; determining a cropping parameter based on the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; and mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the cropping parameter.
  • the coordinate point includes an abscissa and an ordinate; mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a movement parameter based on the image size of the live broadcast picture; mapping the abscissa according to the first scaling parameter, the second scaling parameter, and the cropping parameter; mapping the ordinate according to the first scaling parameter, the second scaling parameter, and the movement parameter; and obtaining the target coordinate point from the mapped abscissa and ordinate.
  • this application also provides a live video processing method, including:
  • Receive video stream data and barrage data for the live broadcast, where the video stream data carries target object area data, and the target object area data is used to determine the area occupied by the target object in the live broadcast picture;
  • the live broadcast picture is displayed according to the video stream data, and the barrage is rendered and displayed outside the area occupied by the target object in the live broadcast picture according to the target object area data and the barrage data.
  • the method further includes determining the area occupied by the target object in the live broadcast picture by: for each coordinate point in the set of coordinate points, mapping the coordinate point to a target coordinate point according to a predetermined mapping relationship, where the mapping relationship is a mapping among the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; connecting the target coordinate points in sequence to obtain a closed curve; and determining the area bounded by the closed curve as the area occupied by the target object in the live broadcast picture.
  • mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a first scaling parameter based on the image size of the contour recognition output and the image size of the video frame; determining a second scaling parameter based on the image size of the video frame and the image size of the live broadcast picture; determining a cropping parameter based on the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; and mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the cropping parameter.
  • the coordinate point includes an abscissa and an ordinate; mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a movement parameter based on the image size of the live broadcast picture; mapping the abscissa according to the first scaling parameter, the second scaling parameter, and the cropping parameter; mapping the ordinate according to the first scaling parameter, the second scaling parameter, and the movement parameter; and obtaining the target coordinate point from the mapped abscissa and ordinate.
  • this application also provides a live video processing device, including:
  • An acquisition module configured to acquire video stream data for live broadcast
  • a generation module configured to generate target object area data according to the video stream data
  • the sending module is configured to add the target object area data to the video stream data and send it, so that the barrage is rendered and displayed outside the area occupied by the target object during the live broadcast.
  • this application also provides a live video processing device, including:
  • the receiving module is configured to receive video stream data and barrage data for the live broadcast, where the video stream data carries target object area data, and the target object area data is used to characterize the area occupied by the target object in the live broadcast picture;
  • the display module is configured to display the live broadcast picture according to the video stream data, and render and display the barrage outside the area occupied by the target object in the live broadcast picture according to the target object area data and the barrage data.
  • the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements any of the above methods when executing the program.
  • the present application also provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to perform any of the methods described above.
  • Figure 1 is a schematic diagram of an application scenario according to an embodiment of the present application.
  • Figure 2 is a schematic flowchart of the live video processing method used on the anchor end or the server according to an embodiment of the present application.
  • Figure 3 is a schematic diagram of a live broadcast picture in which the barrage avoids the target object, as implemented in an embodiment of the present application.
  • Figure 4 is a schematic diagram of the mapping relationship in an embodiment of the present application.
  • Figure 5 is a schematic flowchart of the live video processing method used on the viewing end according to an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a live video processing device used on the anchor end or the server according to an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of a live video processing device used on the viewing end according to an embodiment of the present application.
  • Figure 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • a live video processing method, device, electronic device and storage medium are proposed.
  • during a live broadcast, viewers can turn on the barrage function.
  • with the barrage function on, the live video displays a large number of comments, in the form of subtitles (or pictures), sent by all viewers of the broadcast; a viewer can thus watch the current live video while also seeing other viewers' barrage interactions, which adds interest and interactivity to the live broadcast.
  • however, when many barrages are displayed at once they can block the target object in the live video; the target object can be, for example, an avatar (virtual anchor), or an information display window carrying specific information (such as advertisements) for viewers to read.
  • this application provides a live video processing solution.
  • target object area data used to characterize the area occupied by the target object in the live broadcast screen is generated.
  • the target object area data is added to the live video stream data and then sent to the viewing end.
  • on the anchor end and the viewing end, the area occupied by the target object in the live broadcast picture can be determined based on the target object area data carried in the video stream data, and barrages are rendered and displayed only in the part of the live broadcast picture outside that area, achieving the effect of barrages avoiding the target object during the live broadcast.
  • the solution of this application avoids the barrage blocking the target object in the live video.
  • Figure 1 is a schematic diagram of an application scenario of the live video processing method according to the embodiment of the present application.
  • This application scenario includes a client device 101, a client device 103, and a server 102.
  • the client device 101 and the client device 103 can each be connected to the server 102 through a network to implement data interaction.
  • the client device 101 and the client device 103 may be electronic devices with data transmission and multimedia input/output functions close to the user side, such as computers, tablets, smart phones, vehicle-mounted computers, wearable devices, etc.
  • the server 102 can be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server.
  • the client device 101 may be the client device used by the anchor; for clarity, it is referred to as the anchor end in the embodiments of the present application.
  • the client device 103 may be a client device used by a viewer; for clarity, it is referred to as the viewing end in the embodiments of this application. In general, there will be multiple viewing ends.
  • both the anchor end and the viewing end can communicate with the server 102 through the installed live broadcast client to use the online live broadcast service provided by the server 102.
  • embodiments of this application provide a live video processing method, which is applied to the anchor side.
  • the live video processing method of this embodiment may include the following steps:
  • Step S201 Obtain video stream data for live broadcast.
  • the video stream data used for live broadcast refers to the video data generated by the anchor end for online live broadcast to each viewing end; in general, it is captured by an image acquisition device (such as a camera) built into or externally connected to the anchor end, processed by streaming media software on the anchor end, and then sent to the server.
  • Step S202 Generate target object area data according to the video stream data.
  • the target object area data refers to data used to characterize the area occupied by the target object in the live broadcast picture.
  • the target object can be an object in the live broadcast picture that is meant to be shown to the audience and that should not be blocked by barrages in a way that degrades the audience experience, such as an avatar (virtual anchor) or part of one, a product on display, or a display element such as an information display box (product explanation card); accordingly, the area occupied by the target object may refer to the region enclosed by the outer contour of the target object in the live broadcast picture, and the picture content within this region is generally the live content that, in the solution of this application, should not be blocked by barrages.
  • the video stream data includes a certain number of video frames arranged in sequence according to the live playback time sequence, and each video frame is a frame of the live video.
  • when the target object area data is generated according to the video stream data, it may be generated correspondingly for each video frame.
  • the target object area data can be obtained by identifying coordinate points corresponding to the contour.
  • the method of generating target object area data based on video stream data may include: performing contour recognition on the video frame to obtain a coordinate point set corresponding to the recognized target object contour, and using the coordinate point set as the target object area data.
  • the Matting algorithm and the FindContour algorithm can be used to perform contour recognition on the video frames and to generate the coordinate points of the target object contour.
  • the Matting algorithm performs contour recognition on a video frame and identifies the contour of the target object included in the frame; based on that contour, the FindContour algorithm then outputs all coordinate points corresponding to the contour of the target object. These coordinate points constitute the set of coordinate points corresponding to the contour of the target object in the video frame, and this set can be used as the target object area data.
  • the Matting algorithm and the FindContour algorithm can be pre-deployed locally on the anchor end and invoked locally when needed; alternatively, they can be deployed on the network side, for example on an algorithm platform that exposes an open interface, in which case the anchor end accesses the platform, invokes the algorithms, and receives the computed results.
  • any other feasible related scheme can also be used to perform contour recognition on the video frames and generate the coordinate points corresponding to the contour of the target object; the specific method is not limited in the embodiments of this application.
  • note that a single video frame may include more than one target object, in which case the generated target object area data should correspond to the number of target objects in the frame, and the target object area data for each target object is generated and stored separately. For example, in the aforementioned approach of obtaining coordinate point sets through the FindContour algorithm, when a video frame includes multiple target objects, a corresponding set of coordinate points is generated for each target object.
  • Step S203 Add the target object area data to the video stream data and send it, so that the barrage is rendered and displayed outside the area occupied by the target object during the live broadcast.
  • the target object area data generated in the preceding step is added to the video stream data, so that the video stream data carries the target object area data when it is pushed to the live broadcast server; subsequently, when a viewing end pulls the video stream data from the live broadcast server, what it pulls is this video stream data carrying the target object area data.
  • the target object area data can be added to the video stream data as supplemental enhancement information (SEI).
  • specifically, adding the target object area data to the video stream data and sending it may include: adding the target object area data to the video stream data as supplemental enhancement information according to predetermined video stream encoding rules.
  • SEI is one of the features of the H.264/H.265 video coding and compression standards.
  • SEI can be added during the generation and transmission of video stream data. The data carried by SEI will be transmitted to the player of the video stream data together with the video stream data.
  • common SEI contents may include compression encoding parameters, camera parameters, etc.
  • in the embodiments of this application, the generated target object area data is added to the video stream data through SEI. This makes full use of SEI's characteristics while leaving the structure and transmission of the existing video stream data unchanged, so compatibility is excellent.
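As a concrete illustration of this step, the sketch below packs region data into an H.264 SEI NAL unit of payload type 5 (user_data_unregistered). The 16-byte UUID and the JSON payload layout are illustrative assumptions rather than anything the application specifies, and the emulation-prevention bytes required in a real Annex-B stream are omitted for brevity.

```kotlin
import java.io.ByteArrayOutputStream

// Pack target-object region data (e.g. a JSON-encoded coordinate point set)
// into an H.264 SEI NAL unit of payload type 5 (user_data_unregistered).
// The UUID and payload layout are assumptions; emulation-prevention bytes
// are omitted for brevity.
fun buildRegionSeiNal(regionJson: String): ByteArray {
    val uuid = ByteArray(16) { 0x42.toByte() }       // app-defined UUID (assumed)
    val payload = uuid + regionJson.toByteArray(Charsets.UTF_8)
    val out = ByteArrayOutputStream()
    out.write(byteArrayOf(0, 0, 0, 1))               // Annex-B start code
    out.write(0x06)                                  // NAL unit type 6 = SEI
    out.write(5)                                     // payloadType 5 = user_data_unregistered
    var size = payload.size                          // payloadSize, 0xFF-escaped per the spec
    while (size >= 255) { out.write(255); size -= 255 }
    out.write(size)
    out.write(payload)
    out.write(0x80)                                  // rbsp_trailing_bits
    return out.toByteArray()
}
```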
  • any other feasible related scheme can also be used to add the target object area data to the video stream data; the specific method is not limited in the embodiments of this application.
  • when the video stream data carrying the target object area data is pulled for playback, the viewing end can determine the area occupied by the target object in the live broadcast picture based on the target object area data carried by the video stream data and, when the barrage effect is on, render and display barrages only outside that area.
  • the specific method of determining the area occupied by the target object in the live broadcast screen based on the target object area data will be described in detail in the embodiments described later.
  • the method of the embodiment of the present application generates, on the anchor end, target object area data characterizing the area occupied by the target object in the live broadcast picture and sends it together with the video stream data; after a viewing end pulls the video stream data, it can directly obtain the target object area data and accordingly achieve the effect of barrages avoiding the target object.
  • the processing of the target object area data is performed only on the anchor end; neither the live broadcast server nor the viewing end needs many additional settings. The solution achieves barrage avoidance of the target object while remaining easy to implement and highly compatible.
  • besides sending the video stream data carrying the target object area data, the anchor end also needs to display the live broadcast picture locally, and the picture displayed on the anchor end likewise needs the barrage-avoidance effect. Therefore, the live video processing method applied to the anchor end may also include: obtaining barrage data for the live broadcast; displaying the live broadcast picture according to the video stream data; and, according to the target object area data and the barrage data, rendering and displaying the barrage outside the area occupied by the target object in the live broadcast picture.
  • the barrage data is obtained by the host from the live broadcast server, which includes the barrages sent by each viewer during the current live broadcast. Based on the obtained barrage data, and when the barrage function of the current live broadcast is turned on, the barrage can be displayed in the live broadcast screen based on the barrage data.
  • for example, the processing of video stream data and barrage data can be implemented through EffectSDK, a cross-platform audio/video effects library that provides rich audio and video effects, graphics and text editing and rendering, interactive features, and more.
  • generating and displaying barrages can be implemented using any feasible related technology.
  • the barrage can be generated and displayed in the live broadcast picture by means of a mask layer (also called an overlay or matte layer), which can be intuitively understood as an image layer displayed on top of the live broadcast picture; through transparency settings, the mask layer can partially cover the layer beneath it.
  • when the barrage is rendered through the mask layer, barrages are generated as text or pictures and displayed in the live broadcast picture to realize the barrage effect.
  • the area occupied by the target object in the live broadcast picture can be determined based on the target object area data.
  • taking mask-layer barrages as an example, based on the determined area occupied by the target object, the mask layer can be configured so that no barrage is rendered inside that area, so the original target object remains clearly and completely visible there; the final effect of the barrage avoiding the target object is shown in Figure 3, which as a whole shows a live broadcast picture in which barrage 301 avoids target object 302.
  • likewise, based on the area occupied by the target object determined from the target object area data in the embodiments of this application, any other feasible related technology can be used to suppress barrage rendering inside that area and render barrages only outside it.
  • when the target object area data is a set of coordinate points, the area occupied by the target object in the live broadcast picture can also be determined by the following method: for each coordinate point in the set of coordinate points, map the coordinate point to a target coordinate point according to a predetermined mapping relationship; connect the target coordinate points in sequence to obtain a closed curve; and determine the area bounded by the closed curve as the area occupied by the target object in the live broadcast picture.
  • in specific implementation, the image size on which the algorithm producing the target object area data is based differs from the image size used for the video frames of the video stream data, and the image size of the video frames in turn differs from the image size of the live broadcast picture during playback (on the anchor end or the viewing end). Because of these stage-by-stage differences in image size, each coordinate point in the set should first be mapped. Specifically, a mapping relationship can be set in advance among the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture.
  • according to this mapping relationship, each coordinate point in the set is first mapped into a representation based on the image size of the video frame, and then into a representation based on the image size of the live broadcast picture; the finally obtained coordinate point representation is called the target coordinate point in this embodiment.
  • the target coordinate points are connected in sequence to obtain a closed curve.
  • between any two adjacent target coordinate points, a Bezier curve can be drawn, so that all target coordinate points are connected in sequence to form a closed curve.
  • the area defined by the closed curve can be determined as the area occupied by the target object in the live broadcast screen.
  • any related technology can be used to draw the closed curve. For example, the Android graphics API Path.quadTo can be used to form it, after which the Android canvas tool (Canvas API) can draw the curve and, combined with the mask layer settings, the area inside the drawn closed curve is controlled not to render any barrage, so that the target object in the live broadcast picture is displayed clearly and completely and barrage avoidance is achieved.
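A minimal Android sketch of this idea follows, assuming the target coordinate points have already been mapped into the live-picture coordinate system. Canvas.clipOutPath plays the role of the mask-layer setting described above; the function and parameter names are illustrative.

```kotlin
import android.graphics.Canvas
import android.graphics.Path

// Build a closed curve through the mapped target coordinate points with
// quadratic Beziers (Path.quadTo), then exclude its interior while the
// barrage layer draws, so barrages render only outside the target object.
fun drawBarragesAvoidingTarget(
    canvas: Canvas,
    targetPoints: List<Pair<Float, Float>>,  // mapped target coordinate points
    drawBarrages: (Canvas) -> Unit           // the barrage layer's draw pass
) {
    if (targetPoints.size < 3) {
        drawBarrages(canvas)                 // no usable region; draw normally
        return
    }
    val outline = Path().apply {
        moveTo(targetPoints[0].first, targetPoints[0].second)
        for (i in 1 until targetPoints.size) {
            val (cx, cy) = targetPoints[i - 1]          // previous point as control point
            val (x, y) = targetPoints[i]
            quadTo(cx, cy, (cx + x) / 2, (cy + y) / 2)  // smooth segment between neighbors
        }
        close()                                         // close the curve around the object
    }
    canvas.save()
    canvas.clipOutPath(outline)  // API 26+: nothing is drawn inside the outline
    drawBarrages(canvas)
    canvas.restore()
}
```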
  • as an optional implementation, when the coordinate points are obtained with the FindContour algorithm, mapping a coordinate point to the target coordinate point according to the predetermined mapping relationship may include: determining a first scaling parameter based on the image size of the contour recognition output and the image size of the video frame; determining a second scaling parameter based on the image size of the video frame and the image size of the live broadcast picture; determining a cropping parameter based on the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; and mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the cropping parameter.
  • the output of the FindContour algorithm is referenced to a fixed-size image (128 × 224), the image size used for video frame transmission is generally 720 × 1080, and the size of the live broadcast picture depends on the software and hardware settings of the anchor end or the viewing end.
  • Figure 4 shows the mapping relationship between the above image sizes; depending on the software and hardware settings of the anchor end or the viewing end, the live broadcast picture is generally also cropped horizontally.
  • the first scaling parameter maps a coordinate point from the image size of the contour recognition output to the image size of the video frame; the second scaling parameter maps it from the image size of the video frame to the image size of the live broadcast picture; and the cropping parameter implements the horizontal cropping of the live broadcast picture and is derived from the above mapping relationship.
  • in specific implementation, the mapping from a coordinate point to the target coordinate point can be expressed by formulas that are reproduced as images in the original publication; they are defined over the following parameters:
  • x is the abscissa of the target coordinate point, and y is the ordinate of the target coordinate point;
  • originX is the abscissa of the original coordinate point, and originY is its ordinate;
  • EFFECT_OUTPUT_WIDTH and EFFECT_OUTPUT_HEIGHT are the width and height of the image output by the recognition algorithm;
  • STREAM_WIDTH and STREAM_HEIGHT are the width and height of the video frame image;
  • PLAYER_VIEW_WIDTH and PLAYER_VIEW_HEIGHT are the width and height of the live broadcast picture image;
  • C is the cropping parameter; in all of the above parameters, width refers to the horizontal scale of the image and height refers to the vertical scale.
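Because the publication reproduces the mapping formulas only as images, the sketch below implements one plausible reading that is consistent with the parameter definitions above: a first scale from the recognition output to the video frame, a uniform height-fit second scale from the frame to the player view, a centered horizontal crop C, and a TOP_MARGIN shift on the ordinate (the movement parameter introduced below). The crop model and the signs are assumptions, not the patent's exact formulas.

```kotlin
// One plausible reading of the coordinate mapping, consistent with the
// parameter definitions above. Height-fit scaling with a centered horizontal
// crop is an assumption; the exact formulas appear only as images in the
// original publication.
fun mapToTargetPoint(
    originX: Float, originY: Float,
    effectOutputWidth: Float, effectOutputHeight: Float,  // EFFECT_OUTPUT_*
    streamWidth: Float, streamHeight: Float,              // STREAM_*
    playerViewWidth: Float, playerViewHeight: Float,      // PLAYER_VIEW_*
    topMargin: Float                                      // TOP_MARGIN (movement parameter)
): Pair<Float, Float> {
    val scale1X = streamWidth / effectOutputWidth    // first scaling: recognition output -> frame
    val scale1Y = streamHeight / effectOutputHeight
    val scale2 = playerViewHeight / streamHeight     // second scaling: frame -> player (height fit)
    val c = (streamWidth * scale2 - playerViewWidth) / 2f  // cropping parameter C (assumed centered)
    val x = originX * scale1X * scale2 - c           // abscissa: two scalings plus horizontal crop
    val y = originY * scale1Y * scale2 - topMargin   // ordinate: two scalings plus barrage-area shift
    return x to y
}
```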
  • not all areas of the live broadcast screen are used to display barrages.
  • the upper part of the screen is often used to display some relevant information related to the live broadcast, such as the name of the live broadcast room, the name of the live broadcast, etc., and the area where the relevant information is displayed is not used to display barrages.
  • considering this layout of the live broadcast picture, in this embodiment, so that the finally determined area occupied by the target object corresponds to the area actually used to display barrages, the coordinate values are further adjusted during the mapping from the coordinate point to the target coordinate point.
  • the movement parameter is determined based on the image size of the live broadcast screen and the setting of the area used to display the barrage.
  • the movement parameter represents the distance from the area used to display the barrage to the upper edge of the live broadcast screen.
  • the abscissa of the coordinate point is mapped according to the first scaling parameter, the second scaling parameter and the cropping parameter.
  • for the specific calculation process, refer to the aforementioned formula for x.
  • the ordinate of the coordinate point is mapped according to the first scaling parameter, the second scaling parameter and the movement parameter.
  • this mapping can likewise be expressed by a formula, reproduced as an image in the original publication, in which TOP_MARGIN is the movement parameter.
  • in this way, the area occupied by the target object in the live broadcast picture, as determined from the resulting target coordinate points, is adapted to the picture output settings of the anchor end and the viewing end and corresponds to the area actually used to display barrages in the live picture.
  • the live video processing method in the embodiment of this application can also be used on a server.
  • This server refers to a live broadcast server based on streaming media transmission technology and used to realize online live broadcast of videos.
  • in this case, in step S201, the server obtains the video stream data uploaded by the anchor end for live broadcast; in steps S202 and S203, the execution subject of generating the target object area data, adding the target object area data to the video stream data, and sending the video stream data is the server.
  • the sending in step S203 means that the server sends the video stream data, with the target object area data added, to the viewing ends.
  • target object area data used to characterize the area occupied by the target object in the live broadcast screen is generated.
  • the target object area data is added to the live video stream data and sent to the viewing end; after the viewing end pulls the video stream data, it can directly obtain the target object area data and accordingly achieve the effect of barrages avoiding the target object.
  • the processing of the target object area data is performed only on the server; neither the anchor end nor the viewing end needs many additional settings. The solution achieves barrage avoidance of the target object while remaining easy to implement and highly compatible.
  • embodiments of the present application also provide a live video processing method, which is applied to the viewing end.
  • the live video processing method in this embodiment may include the following steps:
  • Step S501 Receive video stream data and barrage data for live broadcast; wherein the video stream data carries target object area data, and the target object area data is used to determine the area occupied by the target object in the live broadcast screen;
  • Step S502 Display the live broadcast picture according to the video stream data, and render and display the barrage outside the area occupied by the target object in the live broadcast picture according to the target object area data and the barrage data.
  • the viewing terminal obtains video stream data and barrage data for live broadcast from the live broadcast server to display the live broadcast screen and the barrage in the live broadcast screen.
  • the video stream data carries target object area data
  • the viewing end can determine the area occupied by the target object in the live broadcast picture based on the target object area data, and control the barrage so that it is rendered and displayed only outside that area and not within the area occupied by the target object.
  • in this way, the target object in the live broadcast picture is displayed clearly and completely, and the barrage avoids the target object.
  • the method in the embodiment of the present application can be executed by a single device, such as a computer or server.
  • the method of this embodiment can also be applied in a distributed scenario, and is completed by multiple devices cooperating with each other.
  • one device among the multiple devices may execute only one or more of the steps of the method of the embodiments of the present application, and the multiple devices interact with each other to complete the described method.
  • the live video processing device 600 includes:
  • the acquisition module 601 is configured to acquire video stream data for live broadcast
  • the generation module 602 is configured to generate target object area data according to the video stream data
  • the sending module 603 is configured to add the target object area data to the video stream data and send it, so that the barrage is rendered and displayed outside the area occupied by the target object during the live broadcast.
  • the device further includes a display module configured to obtain barrage data for the live broadcast, display the live broadcast picture according to the video stream data, and, according to the target object area data and the barrage data, render and display the barrage outside the area occupied by the target object in the live broadcast picture.
  • the video stream data includes a certain number of video frames; the generation module 602 is specifically configured to perform contour recognition on the video frames to obtain a set of coordinate points corresponding to the recognized contour of the target object, and to use the set of coordinate points as the target object area data.
  • the sending module 603 is specifically configured to add the target object area data as supplementary enhancement information to the video stream data according to predetermined video stream encoding rules.
  • the display module is specifically configured to, for each coordinate point in the set of coordinate points, map the coordinate point to a target coordinate point according to a predetermined mapping relationship, where the mapping relationship is a mapping among the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; connect the target coordinate points in sequence to obtain a closed curve; and determine the area bounded by the closed curve as the area occupied by the target object in the live broadcast picture.
  • the display module is specifically configured to determine a first scaling parameter based on the image size of the contour recognition output and the image size of the video frame; determine a second scaling parameter based on the image size of the video frame and the image size of the live broadcast picture; determine a cropping parameter based on the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; and map the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the cropping parameter.
  • the coordinate points include abscissas and ordinates; the display module is specifically configured to determine a movement parameter based on the image size of the live broadcast picture; map the abscissa according to the first scaling parameter, the second scaling parameter, and the cropping parameter; map the ordinate according to the first scaling parameter, the second scaling parameter, and the movement parameter; and obtain the target coordinate point from the mapped abscissa and ordinate.
  • the devices of the above embodiments are used to implement the corresponding live video processing methods in any of the foregoing embodiments applied to the anchor end or the server, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.
  • the live video processing device 700 includes:
  • the receiving module 701 is configured to receive video stream data and barrage data for the live broadcast, where the video stream data carries target object area data, and the target object area data is used to determine the area occupied by the target object in the live broadcast picture;
  • the display module 702 is configured to display the live broadcast picture according to the video stream data and, according to the target object area data and the barrage data, render and display the barrage outside the area occupied by the target object in the live broadcast picture.
  • the display module 702 is specifically configured to, for each coordinate point in the set of coordinate points, map the coordinate point to a target coordinate point according to a predetermined mapping relationship, where the mapping relationship is a mapping among the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; connect the target coordinate points in sequence to obtain a closed curve; and determine the area bounded by the closed curve as the area occupied by the target object in the live broadcast picture.
  • the display module 702 is specifically configured to determine a first scaling parameter based on the image size of the contour recognition output and the image size of the video frame; determine a second scaling parameter based on the image size of the video frame and the image size of the live broadcast picture; determine a cropping parameter based on the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; and map the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the cropping parameter.
  • the coordinate points include abscissas and ordinates; the display module 702 is specifically configured to determine a movement parameter based on the image size of the live broadcast picture; map the abscissa according to the first scaling parameter, the second scaling parameter, and the cropping parameter; map the ordinate according to the first scaling parameter, the second scaling parameter, and the movement parameter; and obtain the target coordinate point from the mapped abscissa and ordinate.
  • the devices of the above embodiments are used to implement the corresponding live video processing method in any of the foregoing embodiments applied to the viewing end, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.
  • embodiments of the present application also provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the live video processing method according to any of the above embodiments is implemented.
  • FIG. 8 shows a more specific hardware structure diagram of an electronic device provided by this embodiment.
  • the device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050.
  • the processor 1010, the memory 1020, the input/output interface 1030 and the communication interface 1040 implement communication connections between each other within the device through the bus 1050.
  • the processor 1010 can be implemented using a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the embodiments of this specification.
  • the memory 1020 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), static storage device, dynamic storage device, etc.
  • the memory 1020 can store operating systems and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 1020 and called and executed by the processor 1010 .
  • the input/output interface 1030 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • Input devices can include keyboards, mice, touch screens, microphones, various sensors, etc., and output devices can include monitors, speakers, vibrators, indicator lights, etc.
  • the communication interface 1040 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices.
  • the communication module can realize communication through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 1050 includes a path that carries information between various components of the device (eg, processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
  • although the above device shows only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050, in specific implementation the device may also include other components necessary for normal operation.
  • the above-mentioned device may only include components necessary to implement the embodiments of this specification, and does not necessarily include all components shown in the drawings.
  • the electronic devices of the above embodiments are used to implement the corresponding live video processing method in any of the foregoing embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.
  • embodiments of the present application also provide a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the live video processing method described in any of the above embodiments.
  • the computer-readable media in this embodiment include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology.
  • information may be computer-readable instructions, data structures, program modules, or other data.
  • examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • the computer instructions stored in the storage medium of the above embodiments are used to cause the computer to execute the live video processing method as described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which will not be described again here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

This application provides a live video processing method and apparatus, an electronic device, and a storage medium. The method includes: obtaining video stream data for live broadcast; generating target object area data according to the video stream data; and adding the target object area data to the video stream data and sending it, so that during the live broadcast the barrage is rendered and displayed outside the area occupied by the target object.

Description

Live video processing method and apparatus, electronic device, and storage medium
This application claims priority to Chinese invention patent application No. CN202210459338.8, entitled "Live video processing method and apparatus, electronic device and storage medium", filed on April 27, 2022.
Technical field
This application relates to the field of computer technology, and in particular to a live video processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of the Internet and smart terminals, online live broadcast has become one of the scenarios for people's leisure and interaction in the network era. During a live broadcast, users can post barrages while watching.
Summary of the invention
In view of this, the purpose of this application is to propose a live video processing method and apparatus, an electronic device, and a storage medium.
Based on the above purpose, this application provides a live video processing method, including:
obtaining video stream data for live broadcast;
generating target object area data according to the video stream data;
adding the target object area data to the video stream data and sending it, so that during the live broadcast the barrage is rendered and displayed outside the area occupied by the target object.
In some embodiments, the method further includes: obtaining barrage data for the live broadcast; displaying the live broadcast picture according to the video stream data; and, according to the target object area data and the barrage data, rendering and displaying the barrage outside the area occupied by the target object in the live broadcast picture.
In some embodiments, the video stream data includes a certain number of video frames; generating the target object area data according to the video stream data includes: performing contour recognition on the video frames to obtain a set of coordinate points corresponding to the recognized contour of the target object, and using the set of coordinate points as the target object area data.
In some embodiments, adding the target object area data to the video stream data and sending it includes: adding the target object area data to the video stream data as supplemental enhancement information according to predetermined video stream encoding rules.
In some embodiments, the method further includes determining the area occupied by the target object in the live broadcast picture by: for each coordinate point in the set of coordinate points, mapping the coordinate point to a target coordinate point according to a predetermined mapping relationship, where the mapping relationship is a mapping among the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; connecting the target coordinate points in sequence to obtain a closed curve; and determining the area bounded by the closed curve as the area occupied by the target object in the live broadcast picture.
In some embodiments, mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a first scaling parameter based on the image size of the contour recognition output and the image size of the video frame; determining a second scaling parameter based on the image size of the video frame and the image size of the live broadcast picture; determining a cropping parameter based on the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; and mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the cropping parameter.
In some embodiments, the coordinate point includes an abscissa and an ordinate; mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a movement parameter based on the image size of the live broadcast picture; mapping the abscissa according to the first scaling parameter, the second scaling parameter, and the cropping parameter; mapping the ordinate according to the first scaling parameter, the second scaling parameter, and the movement parameter; and obtaining the target coordinate point from the mapped abscissa and ordinate.
According to another aspect of the present disclosure, this application also provides a live video processing method, including:
receiving video stream data and barrage data for live broadcast, where the video stream data carries target object area data, and the target object area data is used to determine the area occupied by the target object in the live broadcast picture;
displaying the live broadcast picture according to the video stream data, and, according to the target object area data and the barrage data, rendering and displaying the barrage outside the area occupied by the target object in the live broadcast picture.
In some embodiments, the method further includes determining the area occupied by the target object in the live broadcast picture by: for each coordinate point in the set of coordinate points, mapping the coordinate point to a target coordinate point according to a predetermined mapping relationship, where the mapping relationship is a mapping among the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; connecting the target coordinate points in sequence to obtain a closed curve; and determining the area bounded by the closed curve as the area occupied by the target object in the live broadcast picture.
In some embodiments, mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a first scaling parameter based on the image size of the contour recognition output and the image size of the video frame; determining a second scaling parameter based on the image size of the video frame and the image size of the live broadcast picture; determining a cropping parameter based on the image size of the contour recognition output, the image size of the video frame, and the image size of the live broadcast picture; and mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the cropping parameter.
In some embodiments, the coordinate point includes an abscissa and an ordinate; mapping the coordinate point to a target coordinate point according to the predetermined mapping relationship includes: determining a movement parameter based on the image size of the live broadcast picture; mapping the abscissa according to the first scaling parameter, the second scaling parameter, and the cropping parameter; mapping the ordinate according to the first scaling parameter, the second scaling parameter, and the movement parameter; and obtaining the target coordinate point from the mapped abscissa and ordinate.
According to another aspect of the present disclosure, this application also provides a live video processing apparatus, including:
an acquisition module configured to obtain video stream data for live broadcast;
a generation module configured to generate target object area data according to the video stream data;
a sending module configured to add the target object area data to the video stream data and send it, so that during the live broadcast the barrage is rendered and displayed outside the area occupied by the target object.
According to another aspect of the present disclosure, this application also provides a live video processing apparatus, including:
a receiving module configured to receive video stream data and barrage data for live broadcast, where the video stream data carries target object area data, and the target object area data is used to characterize the area occupied by the target object in the live broadcast picture;
a display module configured to display the live broadcast picture according to the video stream data and, according to the target object area data and the barrage data, render and display the barrage outside the area occupied by the target object in the live broadcast picture.
According to another aspect of the present disclosure, this application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of any of the above items when executing the program.
According to another aspect of the present disclosure, this application also provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the method of any of the above items.
Brief description of the drawings
To explain the technical solutions of this application or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are only embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic diagram of an application scenario according to an embodiment of this application;
Figure 2 is a schematic flowchart of the live video processing method used on the anchor end or the server according to an embodiment of this application;
Figure 3 is a schematic diagram of a live broadcast picture in which the barrage avoids the target object, as implemented in an embodiment of this application;
Figure 4 is a schematic diagram of the mapping relationship in an embodiment of this application;
Figure 5 is a schematic flowchart of the live video processing method used on the viewing end according to an embodiment of this application;
Figure 6 is a schematic structural diagram of a live video processing apparatus used on the anchor end or the server according to an embodiment of this application;
Figure 7 is a schematic structural diagram of a live video processing apparatus used on the viewing end according to an embodiment of this application;
Figure 8 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed description
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to specific embodiments and the accompanying drawings.
The principles and spirit of this application are described below with reference to several exemplary embodiments. It should be understood that these embodiments are given only to enable those skilled in the art to better understand and implement this application, and do not limit the scope of this application in any way. Rather, these embodiments are provided to make this application more thorough and complete and to fully convey its scope to those skilled in the art.
According to embodiments of this application, a live video processing method and apparatus, an electronic device, and a storage medium are proposed.
In this document, it should be understood that any number of elements in the drawings is illustrative rather than limiting, and any naming is used only for distinction and has no limiting meaning.
The principles and spirit of this application are explained in detail below with reference to several representative embodiments.
During a live broadcast, viewers can turn on the barrage function. Once it is on, the live video displays a large number of comments, in the form of subtitles (or pictures), sent by all viewers of the broadcast, so that a viewer can watch the current live video while also seeing other viewers' barrage interactions, which adds interest and interactivity to the live broadcast. However, when a large number of barrages are displayed in the current live picture at the same time, they easily block the target object in the live video; for example, the target object may be an avatar (virtual anchor), or an information display window carrying specific information (such as advertisements) for viewers to read, so that viewers cannot see the target object in the live video clearly and completely, degrading the live broadcast effect. It can be understood that, in a live broadcast scenario, the target object is precisely the most important video content and the content viewers most want to see clearly; when many barrages block it, the live broadcast effect is significantly affected. Although some solutions for barrages blocking target objects in live video have appeared, they usually require additional configuration on the viewing end, which is hard to implement and costly.
In the related art, there are schemes for making barrages avoid target objects in videos. However, these schemes target video playback scenarios in which the played video is offline, and the offline video must be pre-processed to achieve the avoidance effect; such schemes cannot be applied to online live broadcast scenarios.
To address the above problems, this application provides a live video processing solution: on the anchor end or the server, target object area data characterizing the area occupied by the target object in the live broadcast picture is generated from the video stream data used for live broadcast, added to the live video stream data, and then sent to the viewing end. On the anchor end and the viewing end, the area occupied by the target object in the live broadcast picture can be determined from the target object area data carried in the video stream data, and barrages are rendered and displayed only in the part of the live broadcast picture outside that area, achieving the effect of barrages avoiding the target object during the live broadcast. The solution of this application prevents barrages from blocking the target object in the live video; the target object area data is computed only on the anchor end or the server and transmitted together with the video stream data, so only the anchor end or the server needs corresponding configuration, the viewing end needs no extra hardware setup, and the solution is highly compatible and easy to implement.
Refer to Figure 1, a schematic diagram of an application scenario of the live video processing method according to an embodiment of this application.
This application scenario includes a client device 101, a client device 103, and a server 102, where both client devices can connect to the server 102 through a network for data interaction.
Optionally, the client device 101 and the client device 103 may be user-side electronic devices with data transmission and multimedia input/output capabilities, such as computers, tablets, smartphones, vehicle-mounted computers, and wearable devices.
Optionally, the server 102 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server.
In the online live broadcast scenario of the embodiments of this application, the client device 101 may be the client device used by the anchor; for clarity, it is referred to as the anchor end. The client device 103 may be a client device used by a viewer; for clarity, it is referred to as the viewing end, and in general there are multiple viewing ends. Specifically, both the anchor end and the viewing end can communicate with the server 102 through an installed live broadcast client to use the online live broadcast service provided by the server 102.
The live video processing method according to exemplary embodiments of this application is described below with reference to the application scenario of Figure 1. Note that the above application scenario is shown only to facilitate understanding of the spirit and principles of this application; the embodiments of this application are not limited in this respect and can be applied in any applicable scenario.
First, an embodiment of this application provides a live video processing method applied to the anchor end.
Referring to Figure 2, the live video processing method of this embodiment may include the following steps:
Step S201: obtain video stream data for live broadcast.
In this embodiment, the video stream data used for live broadcast refers to the video data generated by the anchor end for online live broadcast to each viewing end; in general, it is captured by an image acquisition device (such as a camera) built into or externally connected to the anchor end, processed by streaming media software on the anchor end, and then sent to the server.
Step S202: generate target object area data according to the video stream data.
In this embodiment, the target object area data refers to data used to characterize the area occupied by the target object in the live broadcast picture. The target object may be an object in the live broadcast picture that is meant to be shown to the audience, such as an avatar (virtual anchor) or part of one, a product on display, or a display element in the live broadcast picture such as an information display box (product explanation card), especially an object that should not be blocked by barrages in a way that degrades the audience experience. Accordingly, the area occupied by the target object may refer to the region enclosed by the outer contour of the target object in the live broadcast picture; the picture content within this region is generally the live content that, in the solution of this application, should not be blocked by barrages.
In this embodiment, the video stream data includes a certain number of video frames arranged in the live playback order, each video frame being one frame of the live video. When the target object area data is generated according to the video stream data in this step, it may be generated correspondingly for each video frame.
As an optional implementation, the target object area data can be obtained by identifying the coordinate points corresponding to the contour. Specifically, generating the target object area data from the video stream data may include: performing contour recognition on a video frame to obtain a set of coordinate points corresponding to the recognized contour of the target object, and using the set of coordinate points as the target object area data.
Contour recognition on video frames and generation of the coordinate points of the target object contour can be implemented with the Matting algorithm and the FindContour algorithm. The Matting algorithm performs contour recognition on a video frame and identifies the contour of the target object in the frame; based on that contour, the FindContour algorithm outputs all coordinate points corresponding to the contour, which constitute the set of coordinate points corresponding to the target object contour in the video frame. This set can be used as the target object area data. The Matting and FindContour algorithms can be pre-deployed locally on the anchor end and invoked locally, or deployed on the network side, for example on an algorithm platform with an open interface, in which case the anchor end accesses the platform, invokes the algorithms, and receives the results.
It can be understood that, in specific implementation, besides the Matting and FindContour algorithms, any other feasible related scheme can be used for contour recognition and coordinate point generation; the specific method is not limited in the embodiments of this application.
In addition, it should be noted that a video frame may include more than one target object, in which case the generated target object area data should correspond to the number of target objects in the frame, and the target object area data for each target object is generated and stored separately. For example, in the approach of obtaining coordinate point sets through the FindContour algorithm, when a video frame includes multiple target objects, a corresponding set of coordinate points is generated for each target object.
步骤S203、将所述目标对象区域数据添加至所述视频流数据并发送,以使在直播过程中在目标对象所占的区域区域之外渲染显示弹幕。
本实施例中,对于前述步骤生成的目标对象区域数据,会将该目标对象区域数据相加至视频流数据,使视频流数据携带目标对象区域数据并推流至直播服务器,后续的,看播端从直播服务器拉取视频流数据时,其拉取到的即为上述携带有目标对象区域数据的视频流数据。
As an optional implementation, the target object region data may be added to the video stream data as Supplemental Enhancement Information (SEI). Specifically, adding the target object region data to the video stream data and sending it may include: adding the target object region data to the video stream data as supplemental enhancement information according to a predetermined video stream encoding rule.

SEI is a feature of the H.264/H.265 video compression standards. SEI can be inserted while the video stream data is generated and transmitted, and the data it carries is delivered to the playback end together with the video stream data. In the related art, common SEI content includes compression encoding parameters, camera parameters, and the like. In the embodiments of the present application, based on these characteristics, the generated target object region data is added to the video stream data by way of SEI. Carrying the target object region data in SEI makes full use of SEI's properties while leaving the structure and transmission of the existing video stream data unchanged, giving excellent compatibility.
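As a rough sketch of the container format only (not of any particular encoder integration), the following builds an H.264 user_data_unregistered SEI message (payload type 5) whose payload would be a serialized form of the target object region data. The 16-byte UUID and the payload serialization are assumptions, and start-code emulation prevention is deliberately omitted for brevity:

```kotlin
import java.io.ByteArrayOutputStream

// Sketch: wrap serialized target object region data in an H.264
// user_data_unregistered SEI NAL unit with Annex B framing.
// NOTE: emulation-prevention bytes (0x03 insertion) are omitted here;
// a real muxer must apply them to the RBSP.
fun buildSeiNal(uuid: ByteArray, regionPayload: ByteArray): ByteArray {
    require(uuid.size == 16) { "user_data_unregistered UUID must be 16 bytes" }
    val out = ByteArrayOutputStream()
    out.write(byteArrayOf(0, 0, 0, 1))        // Annex B start code
    out.write(0x06)                           // NAL header: type 6 = SEI
    out.write(0x05)                           // payloadType 5 = user_data_unregistered
    var size = uuid.size + regionPayload.size // payloadSize, 0xFF-chunked per the spec
    while (size >= 255) { out.write(0xFF); size -= 255 }
    out.write(size)
    out.write(uuid)
    out.write(regionPayload)
    out.write(0x80)                           // rbsp_trailing_bits
    return out.toByteArray()
}
```

In practice the streaming SDK or muxer would normally attach the SEI; the byte layout above only illustrates why the viewer client can recover the data without any extra transmission channel.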
Moreover, in specific implementations, besides adding the target object region data via SEI as described above, any other feasible related scheme may be used to add the target object region data to the video stream data; the embodiments of the present application do not limit the specific method used.

In this embodiment, when the video stream data carrying the target object region data is pushed to the live server and pulled by a viewer client for playback, the viewer client can determine, from the carried target object region data, the region occupied by the target object in the live view and, when the bullet comment effect is enabled, render bullet comments only outside that region. The specific manner of determining the region occupied by the target object from the target object region data is described in detail in later embodiments.

It can be seen that in the method of the embodiments of the present application, the anchor client generates the target object region data representing the region occupied by the target object in the live view, adds it to the video stream data, and sends them together; a viewer client that later pulls the video stream data can directly obtain the target object region data and accordingly achieve the bullet-comment avoidance effect. Since the processing of the target object region data takes place only at the anchor client, neither the live server nor the viewer clients need much additional configuration; the scheme achieves bullet-comment avoidance of the target object while remaining easy to implement and highly compatible.
In some optional embodiments, besides sending the video stream data carrying the target object region data, the anchor client also needs to display the live view locally, and the live view displayed on the anchor client likewise needs the bullet-comment avoidance effect. Hence the live video processing method applied to the anchor client may further include: acquiring bullet comment data for the live broadcast; displaying the live view according to the video stream data, and rendering bullet comments outside the region occupied by the target object in the live view according to the target object region data and the bullet comment data.

The bullet comment data is obtained by the anchor client from the live server and includes the bullet comments sent by the viewer clients in the current broadcast. Based on the acquired bullet comment data, and when the bullet comment function of the current broadcast is enabled, bullet comments can be displayed in the live view according to the bullet comment data. For example, the video stream data and the bullet comment data may be processed through EffectSDK, which provides a cross-platform audio/video effects library supporting rich audio/video effects, text and image editing and rendering, interactive features, and more.

In this embodiment, generating and displaying bullet comments may be implemented with any feasible related technique. For example, bullet comments may be generated and displayed in the live view by means of a mask layer. A mask layer, also called an overlay or masking layer, can be intuitively understood as an image layer displayed on top of the live view; through transparency settings, the mask layer can partially cover the image layer it overlays. When bullet comments are rendered and displayed via the mask layer, they are generated as text, images, or the like and shown in the live view to realize the bullet comment effect.
In this embodiment, based on the target object region data carried in the video stream data, the region occupied by the target object in the live view can be determined. When displaying bullet comments, they may be rendered only outside that region. Taking the mask-layer implementation as an example, based on the determined region occupied by the target object in the live view, the mask layer is configured so that no bullet comments are rendered inside that region; the target object in the original live view can then be seen clearly and completely within the region, achieving the bullet-comment avoidance effect. The resulting effect can be seen in Fig. 3, which shows the whole live view: the bullet comments 301 avoid the target object 302.
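On Android, one way to realize "render only outside the region" is to clip the target object region out of the canvas before drawing the bullet comment layer. The sketch below uses the standard Canvas.clipOutPath API (available from API level 26); the danmaku drawing callback is a hypothetical placeholder:

```kotlin
import android.graphics.Canvas
import android.graphics.Path

// Sketch: draw bullet comments everywhere except inside the target
// object region described by `avoidRegion` (a closed Path built from
// the mapped target coordinate points).
fun drawDanmakuAvoidingTarget(
    canvas: Canvas,
    avoidRegion: Path,
    drawDanmaku: (Canvas) -> Unit  // hypothetical danmaku renderer
) {
    val save = canvas.save()
    canvas.clipOutPath(avoidRegion)  // exclude the target object region
    drawDanmaku(canvas)              // comments render only outside it
    canvas.restoreToCount(save)
}
```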
In addition, based on the region occupied by the target object in the live view determined from the target object region data of the embodiments of the present application, any other feasible related technique may also be used to ensure that no bullet comments are rendered inside the region occupied by the target object and that they are rendered only outside it.

As an optional implementation, when the target object region data is a coordinate point set, the region occupied by the target object in the live view may also be determined as follows: for each coordinate point in the coordinate point set, map the coordinate point to a target coordinate point according to a predetermined mapping relationship; connect the target coordinate points in sequence to obtain a closed curve; and determine the region bounded by the closed curve as the region occupied by the target object in the live view.
In specific implementations, it should be considered that the image size on which the algorithm producing the target object region data operates differs from the image size used when the video frames of the video stream data are later formed, and the video frame image size in turn differs from the image size of the live view at playback time (on the anchor client or the viewer client). Because of these differences between stages, each coordinate point in the coordinate point set should first be mapped. Specifically, a mapping relationship may be preset among the image size output by contour recognition, the image size of the video frame, and the image size of the live view. According to this mapping relationship, each coordinate point in the set is first mapped to a representation based on the video frame image size and then to a representation based on the live view image size; the resulting representation is called the target coordinate point in this embodiment.

In specific implementations, the target coordinate points obtained through the mapping are connected in sequence to obtain a closed curve. Between any two adjacent target coordinate points, a Bezier curve may be drawn, so that all the target coordinate points are connected in turn into a closed curve. The region bounded by this closed curve can then be determined as the region occupied by the target object in the live view. Any related technique may be used to draw the closed curve; for example, the Android graphics API Path.quadTo can form the closed curve, after which the Android Canvas API can draw it, and, combined with the mask layer settings, no bullet comments are rendered inside the drawn closed curve, so that the target object in the live view is displayed clearly and completely, achieving the avoidance effect.
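A minimal sketch of building such a closed curve from the mapped target coordinate points with Path.quadTo; treating each point as the control point of a quadratic segment toward the midpoint of the next edge is one common smoothing choice and is assumed here for illustration:

```kotlin
import android.graphics.Path
import android.graphics.PointF

// Sketch: connect mapped target coordinate points into a smooth
// closed curve, usable as the clip region in the previous sketch.
fun buildClosedCurve(points: List<PointF>): Path {
    val path = Path()
    if (points.size < 3) return path
    // Start at the midpoint between the last and first points.
    val start = midpoint(points.last(), points.first())
    path.moveTo(start.x, start.y)
    for (i in points.indices) {
        val control = points[i]
        val next = points[(i + 1) % points.size]
        val mid = midpoint(control, next)
        // Each point acts as the control point of a quadratic segment.
        path.quadTo(control.x, control.y, mid.x, mid.y)
    }
    path.close()
    return path
}

private fun midpoint(a: PointF, b: PointF) =
    PointF((a.x + b.x) / 2f, (a.y + b.y) / 2f)
```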
As an optional implementation, when the coordinate points are obtained with the FindContour algorithm, mapping a coordinate point to a target coordinate point according to the predetermined mapping relationship may include: determining a first scaling parameter from the image size output by contour recognition and the image size of the video frame; determining a second scaling parameter from the image size of the video frame and the image size of the live view; determining a crop parameter from the image size output by contour recognition, the image size of the video frame, and the image size of the live view; and mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the crop parameter.

In this embodiment, the output of the FindContour algorithm is referenced to a fixed image size (128×224), whereas video frames are generally transmitted at an image size of (720×1080), and the size of the live view depends on the software and hardware configuration of the anchor client or the viewer client. Fig. 4 shows the mapping relationships among these image sizes. Depending on the software and hardware configuration of the anchor client or the viewer client, the live view is generally also cropped horizontally.

The first scaling parameter is used to map a coordinate point from the image size output by contour recognition to the image size of the video frame; the second scaling parameter is used to map the coordinate point from the image size of the video frame to the image size of the live view; and the crop parameter realizes the horizontal cropping of the live view and is derived from the above mapping relationships.
In specific implementations, the mapping from a coordinate point to a target coordinate point can be expressed by the following formulas:

x = originX × (STREAM_WIDTH / EFFECT_OUTPUT_WIDTH) × (PLAYER_VIEW_HEIGHT / STREAM_HEIGHT) − C

y = originY × (STREAM_HEIGHT / EFFECT_OUTPUT_HEIGHT) × (PLAYER_VIEW_HEIGHT / STREAM_HEIGHT)

In the above formulas, x is the abscissa of the target coordinate point, y is the ordinate of the target coordinate point, originX is the abscissa of the coordinate point, originY is the ordinate of the coordinate point, EFFECT_OUTPUT_WIDTH and EFFECT_OUTPUT_HEIGHT are the width and height of the image output by the recognition algorithm, STREAM_WIDTH and STREAM_HEIGHT are the width and height of the video frame image, PLAYER_VIEW_WIDTH and PLAYER_VIEW_HEIGHT are the width and height of the live view image, and C is the crop parameter. Among these parameters, width refers to the horizontal dimension of the image and height to the vertical dimension.

In the above formulas, the two terms STREAM_WIDTH / EFFECT_OUTPUT_WIDTH and STREAM_HEIGHT / EFFECT_OUTPUT_HEIGHT are the first scaling parameters, and PLAYER_VIEW_HEIGHT / STREAM_HEIGHT is the second scaling parameter.
In some implementations, not all of the live view is used for displaying bullet comments. For example, in a typical live view, the upper part is often used to display information related to the broadcast, such as the live room name and the broadcast title, and the area displaying such information is not used for bullet comments. Considering such concrete settings of the live view, in this embodiment, so that the finally determined region occupied by the target object in the live view corresponds to the area actually used for displaying bullet comments, the coordinate values are further adjusted during the mapping from coordinate point to target coordinate point.

Specifically, in this embodiment, a shift parameter is determined from the image size of the live view and the configuration of the area used for displaying bullet comments; the shift parameter represents the distance from the bullet comment display area to the upper edge of the live view.

In specific implementations, the abscissa of the coordinate point is mapped according to the first scaling parameter, the second scaling parameter, and the crop parameter; the specific calculation follows the formula for x given above.
In specific implementations, the ordinate of the coordinate point is mapped according to the first scaling parameter, the second scaling parameter, and the shift parameter; the mapping can be expressed by the following formula:

y = originY × (STREAM_HEIGHT / EFFECT_OUTPUT_HEIGHT) × (PLAYER_VIEW_HEIGHT / STREAM_HEIGHT) − TOP_MARGIN

In the above formula, TOP_MARGIN is the shift parameter.
Through the above mapping, the region occupied by the target object in the live view, determined from the resulting target coordinate points, matches the display output settings of the anchor client and the viewer client and corresponds to the area of the live view actually used for displaying bullet comments.
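Putting the above formulas together, a sketch of the complete point mapping might look as follows; the constant image sizes follow the values mentioned above, while the centered derivation of the crop parameter C from the three image sizes is an assumption consistent with, but not spelled out by, the description:

```kotlin
import android.graphics.PointF

// Image sizes at the first two stages (values as mentioned in the text).
const val EFFECT_OUTPUT_WIDTH = 128f
const val EFFECT_OUTPUT_HEIGHT = 224f
const val STREAM_WIDTH = 720f
const val STREAM_HEIGHT = 1080f

// Sketch: map a contour point (recognition-output coordinates) to a
// target point in live view coordinates, with horizontal crop C and
// vertical shift TOP_MARGIN.
fun mapToTargetPoint(
    origin: PointF,
    playerViewWidth: Float,
    playerViewHeight: Float,
    topMargin: Float
): PointF {
    val scale2 = playerViewHeight / STREAM_HEIGHT  // second scaling parameter
    // Assumed crop: centered horizontal overflow after height-fit scaling.
    val c = (STREAM_WIDTH * scale2 - playerViewWidth) / 2f
    val x = origin.x * (STREAM_WIDTH / EFFECT_OUTPUT_WIDTH) * scale2 - c
    val y = origin.y * (STREAM_HEIGHT / EFFECT_OUTPUT_HEIGHT) * scale2 - topMargin
    return PointF(x, y)
}
```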
As an optional implementation, the live video processing method of the embodiments of the present application may also be applied to a server, namely a live server that provides online video broadcasting based on streaming media transmission technology.

The steps of the live video processing method applied to the server can also be seen in Fig. 2.

For step S201, the server acquires the video stream data for the live broadcast uploaded by the anchor client. Further, for steps such as the generation and transmission of the video stream data that are here executed with the server as the executing entity, reference may be made to the foregoing method embodiments applied to the anchor client for their specific implementations, which are not repeated in this embodiment.

For steps S202 and S203, where steps such as generating the target object region data, adding the target object region data to the video stream data, and sending the video stream data are executed with the server as the executing entity, their specific implementations may likewise refer to the foregoing method embodiments applied to the anchor client and are not repeated in this embodiment.

In addition, the sending in step S203 means that the server sends the video stream data, with the target object region data added, to the viewer clients.

In the live video processing method of this embodiment, after the server receives the video stream data uploaded by the anchor client, it generates, from the video stream data used for the live broadcast, the target object region data representing the region occupied by the target object in the live view, adds the data to the video stream data of the live broadcast, and sends it to the viewer clients; a viewer client that then pulls the video stream data can directly obtain the target object region data and accordingly achieve the bullet-comment avoidance effect. Since the processing of the target object region data takes place only at the server, neither the anchor client nor the viewer clients need much additional configuration; the scheme achieves the avoidance effect while remaining easy to implement and highly compatible.
Based on the same technical concept, an embodiment of the present application further provides a live video processing method applied to the viewer client.

Referring to Fig. 5, the live video processing method of this embodiment may include the following steps:

Step S501: receive video stream data and bullet comment data for the live broadcast, where the video stream data carries target object region data used to determine the region occupied by the target object in the live view;

Step S502: display the live view according to the video stream data, and render bullet comments outside the region occupied by the target object in the live view according to the target object region data and the bullet comment data.
In this embodiment, the viewer client pulls the video stream data and the bullet comment data for the live broadcast from the live server in order to display the live view and the bullet comments in it. Since the video stream data carries the target object region data, the viewer client can determine from it the region occupied by the target object in the live view and render bullet comments only in the area outside that region, rendering none inside it, so that the target object in the live view is displayed clearly and completely and the bullet-comment avoidance effect is achieved.
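On the receiving side, the viewer client essentially inverts the anchor-side steps. A sketch, assuming the SEI payload layout of the earlier buildSeiNal example (payload type 5, a 16-byte UUID, then the serialized region data) and operating on an RBSP from which Annex B framing and emulation-prevention bytes have already been removed:

```kotlin
// Sketch: recover the region payload from a user_data_unregistered SEI
// RBSP. Returns null when the message is not the expected payload
// type or UUID. Bounds checking is omitted for brevity.
fun parseSeiRegionPayload(rbsp: ByteArray, expectedUuid: ByteArray): ByteArray? {
    var i = 0
    var payloadType = 0
    while ((rbsp[i].toInt() and 0xFF) == 0xFF) { payloadType += 255; i++ }
    payloadType += rbsp[i].toInt() and 0xFF; i++
    var payloadSize = 0
    while ((rbsp[i].toInt() and 0xFF) == 0xFF) { payloadSize += 255; i++ }
    payloadSize += rbsp[i].toInt() and 0xFF; i++
    if (payloadType != 5) return null  // not user_data_unregistered
    val uuid = rbsp.copyOfRange(i, i + 16)
    if (!uuid.contentEquals(expectedUuid)) return null
    return rbsp.copyOfRange(i + 16, i + payloadSize)
}
```

The recovered payload would then be deserialized into the per-frame region data and handed to the mapping and clipping steps described above.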
The specific implementations of displaying the live view, rendering bullet comments, determining the region occupied by the target object in the live view, and mapping coordinate points to target coordinate points according to the predetermined mapping relationship involved in this embodiment, together with the corresponding beneficial effects, have been described in detail in the foregoing method embodiments applied to the anchor client; reference may be made to any of those embodiments, and the details are not repeated here.

It should be noted that the method of the embodiments of the present application may be executed by a single device, such as a computer or a server. The method of this embodiment may also be applied in a distributed scenario and completed by multiple devices cooperating with one another. In such a distributed scenario, one of the multiple devices may perform only one or more of the steps of the method of the embodiments of the present application, and the devices interact with one another to complete the method.

It should be noted that some embodiments of the present application have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the above embodiments and still achieve the desired results. Moreover, the processes depicted in the accompanying drawings do not necessarily require the particular or consecutive order shown to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
Based on the same technical concept, an embodiment of the present application further provides a live video processing apparatus. Referring to Fig. 6, the live video processing apparatus 600 includes:

an acquisition module 601 configured to acquire video stream data for a live broadcast;

a generation module 602 configured to generate target object region data from the video stream data; and

a sending module 603 configured to add the target object region data to the video stream data and send it, so that during the live broadcast bullet comments are rendered outside the region occupied by the target object.
In some optional embodiments, the apparatus further includes a display module configured to acquire bullet comment data for the live broadcast, display the live view according to the video stream data, and render bullet comments outside the region occupied by the target object in the live view according to the target object region data and the bullet comment data.

In some optional embodiments, the video stream data includes a number of video frames, and the generation module 602 is specifically configured to perform contour recognition on a video frame to obtain a coordinate point set corresponding to the recognized target object contour and to use the coordinate point set as the target object region data.

In some optional embodiments, the sending module 603 is specifically configured to add the target object region data to the video stream data as supplemental enhancement information according to a predetermined video stream encoding rule.

In some optional embodiments, the display module is specifically configured to map each coordinate point in the coordinate point set to a target coordinate point according to a predetermined mapping relationship, the mapping relationship being among the image size output by contour recognition, the image size of the video frame, and the image size of the live view; to connect the target coordinate points in sequence to obtain a closed curve; and to determine the region bounded by the closed curve as the region occupied by the target object in the live view.

In some optional embodiments, the display module is specifically configured to determine a first scaling parameter from the image size output by contour recognition and the image size of the video frame; determine a second scaling parameter from the image size of the video frame and the image size of the live view; determine a crop parameter from the image size output by contour recognition, the image size of the video frame, and the image size of the live view; and map the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the crop parameter.

In some optional embodiments, the coordinate point includes an abscissa and an ordinate, and the display module is specifically configured to determine a shift parameter from the image size of the live view; map the abscissa according to the first scaling parameter, the second scaling parameter, and the crop parameter; map the ordinate according to the first scaling parameter, the second scaling parameter, and the shift parameter; and obtain the target coordinate point from the mapped abscissa and ordinate.
For convenience of description, the above apparatus is described in terms of various modules divided by function. Of course, when the present application is implemented, the functions of the modules may be realized in one or more pieces of software and/or hardware.

The apparatus of the above embodiment is used to implement the corresponding live video processing method in any of the foregoing embodiments applied to the anchor client or the server, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Based on the same technical concept, an embodiment of the present application further provides a live video processing apparatus. Referring to Fig. 7, the live video processing apparatus 700 includes:

a receiving module 701 configured to receive video stream data and bullet comment data for a live broadcast, where the video stream data carries target object region data used to determine the region occupied by the target object in the live view;

a display module 702 configured to display the live view according to the video stream data and to render bullet comments outside the region occupied by the target object in the live view according to the target object region data and the bullet comment data.
In some optional embodiments, the display module 702 is specifically configured to map each coordinate point in the coordinate point set to a target coordinate point according to a predetermined mapping relationship, the mapping relationship being among the image size output by contour recognition, the image size of the video frame, and the image size of the live view; to connect the target coordinate points in sequence to obtain a closed curve; and to determine the region bounded by the closed curve as the region occupied by the target object in the live view.

In some optional embodiments, the display module 702 is specifically configured to determine a first scaling parameter from the image size output by contour recognition and the image size of the video frame; determine a second scaling parameter from the image size of the video frame and the image size of the live view; determine a crop parameter from the image size output by contour recognition, the image size of the video frame, and the image size of the live view; and map the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the crop parameter.

In some optional embodiments, the coordinate point includes an abscissa and an ordinate, and the display module 702 is specifically configured to determine a shift parameter from the image size of the live view; map the abscissa according to the first scaling parameter, the second scaling parameter, and the crop parameter; map the ordinate according to the first scaling parameter, the second scaling parameter, and the shift parameter; and obtain the target coordinate point from the mapped abscissa and ordinate.
For convenience of description, the above apparatus is described in terms of various modules divided by function. Of course, when the present application is implemented, the functions of the modules may be realized in one or more pieces of software and/or hardware.

The apparatus of the above embodiment is used to implement the corresponding live video processing method in any of the foregoing embodiments applied to the viewer client, and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
Based on the same technical concept, an embodiment of the present application further provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the live video processing method of any of the above embodiments.
Fig. 8 shows a more specific schematic diagram of the hardware structure of an electronic device provided by this embodiment. The device may include a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050, where the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040 are communicatively connected to one another within the device through the bus 1050.

The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this specification.

The memory 1020 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided in the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1020 and invoked and executed by the processor 1010.

The input/output interface 1030 is used to connect input/output modules for information input and output. The input/output modules may be configured as components within the device (not shown in the figure) or externally connected to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.

The communication interface 1040 is used to connect a communication module (not shown in the figure) to enable communication and interaction between this device and other devices. The communication module may communicate in a wired manner (e.g., USB, network cable) or wirelessly (e.g., mobile network, WIFI, Bluetooth).

The bus 1050 includes a pathway that transfers information between the components of the device (e.g., the processor 1010, the memory 1020, the input/output interface 1030, and the communication interface 1040).

It should be noted that although only the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040, and the bus 1050 are shown for the above device, in specific implementations the device may also include other components necessary for normal operation. Moreover, those skilled in the art will understand that the device may contain only the components necessary to implement the solutions of the embodiments of this specification, without necessarily containing all the components shown in the figure.
The electronic device of the above embodiment is used to implement the corresponding live video processing method in any of the foregoing embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.

Based on the same technical concept, an embodiment of the present application further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause the computer to execute the live video processing method of any of the above embodiments.

The computer-readable medium of this embodiment includes permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The computer instructions stored in the storage medium of the above embodiment are used to cause the computer to execute the live video processing method of any of the above embodiments and have the beneficial effects of the corresponding method embodiments, which are not repeated here.
It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present application shall have the ordinary meanings understood by those with ordinary skill in the field to which the present application belongs. Words such as "first" and "second" used in the embodiments of the present application do not denote any order, quantity, or importance, but are used only to distinguish different components. Words such as "comprise" or "include" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections but may include electrical connections, whether direct or indirect. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

Those of ordinary skill in the art should understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the present application (including the claims) is limited to these examples. Under the concept of the present application, the technical features in the above embodiments or in different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present application as described above, which are not provided in detail for the sake of brevity.

The embodiments of the present application are intended to cover all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, and the like made within the spirit and principles of the embodiments of the present application shall be included within the protection scope of the present application.

Claims (15)

  1. A live video processing method, comprising:
    acquiring video stream data for a live broadcast;
    generating target object region data from the video stream data; and
    adding the target object region data to the video stream data and sending it, so that during the live broadcast bullet comments are rendered outside the region occupied by the target object.
  2. The method according to claim 1, further comprising:
    acquiring bullet comment data for the live broadcast; and
    displaying a live view according to the video stream data, and rendering bullet comments outside the region occupied by the target object in the live view according to the target object region data and the bullet comment data.
  3. The method according to claim 2, wherein the video stream data comprises at least one video frame; and
    generating the target object region data from the video stream data comprises:
    performing contour recognition on the video frame to obtain a coordinate point set corresponding to the recognized target object contour, and using the coordinate point set as the target object region data.
  4. The method according to claim 1, wherein adding the target object region data to the video stream data and sending it comprises:
    adding the target object region data to the video stream data as supplemental enhancement information according to a predetermined video stream encoding rule.
  5. The method according to claim 3, further comprising determining the region occupied by the target object in the live view by:
    for each coordinate point in the coordinate point set, mapping the coordinate point to a target coordinate point according to a predetermined mapping relationship, wherein the mapping relationship is among the image size output by the contour recognition, the image size of the video frame, and the image size of the live view;
    connecting the target coordinate points in sequence to obtain a closed curve; and
    determining the region bounded by the closed curve as the region occupied by the target object in the live view.
  6. The method according to claim 5, wherein mapping the coordinate point to the target coordinate point according to the predetermined mapping relationship comprises:
    determining a first scaling parameter from the image size output by the contour recognition and the image size of the video frame;
    determining a second scaling parameter from the image size of the video frame and the image size of the live view;
    determining a crop parameter from the image size output by the contour recognition, the image size of the video frame, and the image size of the live view; and
    mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the crop parameter.
  7. The method according to claim 6, wherein the coordinate point comprises an abscissa and an ordinate; and
    mapping the coordinate point to the target coordinate point according to the predetermined mapping relationship comprises:
    determining a shift parameter from the image size of the live view;
    mapping the abscissa according to the first scaling parameter, the second scaling parameter, and the crop parameter;
    mapping the ordinate according to the first scaling parameter, the second scaling parameter, and the shift parameter; and
    obtaining the target coordinate point from the mapped abscissa and ordinate.
  8. A live video processing method, comprising:
    receiving video stream data and bullet comment data for a live broadcast, wherein the video stream data carries target object region data used to determine the region occupied by a target object in a live view; and
    displaying the live view according to the video stream data, and rendering bullet comments outside the region occupied by the target object in the live view according to the target object region data and the bullet comment data.
  9. The method according to claim 8, further comprising determining the region occupied by the target object in the live view by:
    for each coordinate point in the coordinate point set, mapping the coordinate point to a target coordinate point according to a predetermined mapping relationship, wherein the mapping relationship is among the image size output by the contour recognition, the image size of the video frame, and the image size of the live view;
    connecting the target coordinate points in sequence to obtain a closed curve; and
    determining the region bounded by the closed curve as the region occupied by the target object in the live view.
  10. The method according to claim 9, wherein mapping the coordinate point to the target coordinate point according to the predetermined mapping relationship comprises:
    determining a first scaling parameter from the image size output by the contour recognition and the image size of the video frame;
    determining a second scaling parameter from the image size of the video frame and the image size of the live view;
    determining a crop parameter from the image size output by the contour recognition, the image size of the video frame, and the image size of the live view; and
    mapping the coordinate point to the target coordinate point according to the first scaling parameter, the second scaling parameter, and the crop parameter.
  11. The method according to claim 10, wherein the coordinate point comprises an abscissa and an ordinate; and
    mapping the coordinate point to the target coordinate point according to the predetermined mapping relationship comprises:
    determining a shift parameter from the image size of the live view;
    mapping the abscissa according to the first scaling parameter, the second scaling parameter, and the crop parameter;
    mapping the ordinate according to the first scaling parameter, the second scaling parameter, and the shift parameter; and
    obtaining the target coordinate point from the mapped abscissa and ordinate.
  12. A live video processing apparatus, comprising:
    an acquisition module configured to acquire video stream data for a live broadcast;
    a generation module configured to generate target object region data from the video stream data; and
    a sending module configured to add the target object region data to the video stream data and send it, so that during the live broadcast bullet comments are rendered outside the region occupied by the target object.
  13. A live video processing apparatus, comprising:
    a receiving module configured to receive video stream data and bullet comment data for a live broadcast, wherein the video stream data carries target object region data used to determine the region occupied by a target object in a live view; and
    a display module configured to display the live view according to the video stream data and to render bullet comments outside the region occupied by the target object in the live view according to the target object region data and the bullet comment data.
  14. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 11.
  15. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1 to 11.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210459338.8A 2022-04-27 2022-04-27 Live video processing method and apparatus, electronic device, and storage medium
CN202210459338.8 2022-04-27
