CN112261420B - Live video processing method and related device - Google Patents


Info

Publication number
CN112261420B
CN112261420B (application CN202011064701.3A)
Authority
CN
China
Prior art keywords
information
product
user
displayed
video frame
Prior art date
Legal status
Active
Application number
CN202011064701.3A
Other languages
Chinese (zh)
Other versions
CN112261420A (en)
Inventor
李治中
吴磊
于志兴
王元吉
董亚魁
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202011064701.3A
Publication of CN112261420A
Application granted
Publication of CN112261420B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present application provide a live video processing method and a related device. The method includes: processing at least one video frame in a first live video to obtain position information of an information display area in the at least one video frame; acquiring information of a product to be displayed; displaying the product information in the information display area based on the position information to obtain a second live video; and sending the second live video. Displaying product information directly in the live video improves the live broadcast effect.

Description

Live video processing method and related device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a live video processing method and a related apparatus.
Background
In recent years, live-streamed selling has become a mainstream product marketing mode: a broadcaster introduces product information in a live broadcast room, and users purchase goods through links provided by the broadcaster. However, users who are unfamiliar with a product often have to switch to other interfaces to look up its information, which degrades the live broadcast effect and the user experience.
Disclosure of Invention
The embodiment of the application provides a live video processing method and a related device.
In a first aspect of the embodiments of the present application, a live video processing method is provided, where the method includes:
processing at least one video frame in a first live video to obtain position information of an information display area in the at least one video frame;
acquiring information of a product to be displayed;
displaying the product information in the information display area based on the position information to obtain a second live video;
and sending the second live video.
In this example, the position information of the information display area in at least one video frame of the first live video is obtained, and the second live video is obtained by displaying the product information in that area based on the position information. Viewers do not need to switch to other interfaces to query product information: it can be read directly from the second live video, which improves the convenience of viewing product information and thus the live broadcast effect.
With reference to the first aspect, in a possible implementation manner, the acquiring information of a product to be displayed includes:
receiving an operation instruction, wherein the operation instruction comprises identification information of a product to be displayed;
and determining the product information to be displayed according to the product identification information.
In this example, the product information to be displayed is determined from the identification information carried in a received operation instruction, which improves the efficiency of obtaining the product information to be displayed.
With reference to the first aspect, in a possible implementation manner, the acquiring information of a product to be displayed includes:
acquiring an audio clip of a target user in a live broadcasting process;
performing voice recognition on the audio clip to obtain a voice recognition result;
and determining the product information to be displayed according to the voice recognition result.
In this example, the product information to be displayed is determined from an audio clip of the target user collected during the live broadcast, so the target user does not need to send an explicit instruction, which improves the convenience of obtaining the product information to be displayed.
With reference to the first aspect, in a possible implementation manner, the determining the product information to be displayed according to the voice recognition result includes:
extracting keywords from the voice recognition result to obtain a first keyword set;
and determining the product information to be displayed according to the first keyword set.
In this example, the product information to be displayed is determined through the first keyword set extracted from the voice recognition result, so the full speech text does not need to be analyzed, which improves the efficiency of obtaining the product information to be displayed.
With reference to the first aspect, in a possible implementation manner, the acquiring information of a product to be displayed includes:
and obtaining the product information to be displayed from the Internet.
With reference to the first aspect, in a possible implementation manner, the processing at least one video frame in the first live video to obtain the position information of the information display area in the at least one video frame includes:
performing foreground and background segmentation processing on each video frame in the at least one video frame to obtain the position of a background area of each video frame, wherein pixel points in the background area have the same pixel value;
determining at least a portion of the background area as the information presentation area.
In this example, the position of the background region is obtained by foreground/background segmentation of the video frame, and at least part of the background region is taken as the information display area. Because the information display area is selected from a region of uniform pixel value, it can be determined quickly.
With reference to the first aspect, in a possible implementation manner, the processing at least one video frame in the first live video to obtain the position information of the information display area in the at least one video frame includes:
performing area detection on each video frame in the first live video to obtain position information of at least one reference area of each video frame;
acquiring pixel information of at least one reference area of each video frame;
and determining an information display area of each video frame from at least one reference area of each video frame according to the pixel information.
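One plausible reading of this variant, selecting the information display area among detected reference regions according to their pixel information, is to pick the region whose pixels are most uniform. The function names and the variance criterion below are illustrative assumptions; the text does not specify the selection rule.

```python
import numpy as np

def pick_display_area(frame, candidate_boxes):
    """Among detected reference regions, choose the one whose pixels are
    most uniform (lowest variance) as the information display area.
    candidate_boxes: list of (top, left, bottom, right) tuples."""
    def variance(box):
        t, l, b, r = box
        return float(np.var(frame[t:b + 1, l:r + 1]))
    return min(candidate_boxes, key=variance)
```

A solid-color screen behind the broadcaster would score near-zero variance and be selected ahead of textured regions.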
With reference to the first aspect, in one possible implementation manner, the method further includes:
obtaining at least one first user image, wherein the at least one first user image comprises a first user watching the second live video;
processing the at least one first user image to obtain at least one of attention information, expression and action of the first user;
determining the purchase probability of the first user to the current product according to at least one of the attention information, the expression and the action;
and determining the simulated sales volume of the current product according to the purchase probability.
In this example, at least one of the attention information, expression and action of the first user is obtained from the first user image, the first user's purchase probability for the current product is determined from that information, and the simulated sales volume is determined from the purchase probability. Basing the simulated sales volume on the user's attention, expression and action improves the accuracy of the estimate.
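A minimal sketch of the estimation above, assuming each viewer is summarized by scores in [0, 1] for attention, expression positivity and action engagement; the field names, weights and threshold are illustrative assumptions, not a model specified by the text.

```python
def estimate_sales(viewers, threshold=0.5):
    """Count viewers whose combined purchase probability crosses a
    threshold; the count serves as the simulated sales volume.
    Weights are illustrative assumptions."""
    weights = {"attention": 0.5, "expression": 0.3, "action": 0.2}

    def purchase_probability(v):
        # Missing cues default to 0, matching "at least one of" in the text.
        return sum(weights[k] * v.get(k, 0.0) for k in weights)

    return sum(1 for v in viewers if purchase_probability(v) >= threshold)
```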
With reference to the first aspect, in one possible implementation manner, the method further includes:
acquiring at least one second user image, wherein the at least one second user image comprises a second user watching the second live video;
obtaining attention information of the second user by processing the at least one second user image;
under the condition that the attention information of the second user meets the amplification display triggering condition, amplifying and displaying the product information, or displaying the information display area in a pop-up window mode.
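The triggering condition itself is not specified. One plausible sketch, offered only as an assumption, is to enlarge the product information when a viewer's attention score stays above a threshold for several consecutive frames; both the threshold and the window length are illustrative.

```python
def should_enlarge(attention_scores, threshold=0.8, min_frames=3):
    """Return True once attention stays at or above `threshold` for
    `min_frames` consecutive frames (hypothetical trigger rule)."""
    run = 0
    for score in attention_scores:
        run = run + 1 if score >= threshold else 0
        if run >= min_frames:
            return True
    return False
```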
With reference to the first aspect, in one possible implementation manner, the product information to be displayed includes at least one of the following: product introduction, product purchase evaluation, product trial pictures or videos, product use share, product price, product purchase links or two-dimensional codes.
A second aspect of an embodiment of the present application provides a live video processing apparatus, including:
the first acquisition unit is used for processing at least one video frame in a first live video to obtain position information of an information display area in the at least one video frame;
the second acquisition unit is used for acquiring the information of the product to be displayed;
the replacing unit is used for displaying the product information in the information display area based on the position information to obtain a second live video;
and the sending unit is used for sending the second live broadcast video.
With reference to the second aspect, in a possible implementation manner, the second obtaining unit is configured to:
receiving an operation instruction, wherein the operation instruction comprises identification information of a product to be displayed;
and determining the product information to be displayed according to the product identification information.
With reference to the second aspect, in one possible implementation manner, the second obtaining unit is configured to:
acquiring an audio clip of a target user in a live broadcasting process;
performing voice recognition on the audio clip to obtain a voice recognition result;
and determining the product information to be displayed according to the voice recognition result.
With reference to the second aspect, in a possible implementation manner, in terms of determining the product information to be displayed according to the voice recognition result, the second obtaining unit is configured to:
extracting keywords from the voice recognition result to obtain a first keyword set;
and determining the product information to be displayed according to the first keyword set.
With reference to the second aspect, in one possible implementation manner, the second obtaining unit is configured to:
and obtaining the product information to be displayed from the Internet.
With reference to the second aspect, in a possible implementation manner, the first obtaining unit is configured to:
performing foreground and background segmentation processing on each video frame in the at least one video frame to obtain the position of a background area of each video frame, wherein pixel points in the background area have the same pixel value;
determining at least a portion of the background area as the information presentation area.
With reference to the second aspect, in one possible implementation manner, the first obtaining unit is configured to:
performing area detection on each video frame in the first live video to obtain position information of at least one reference area of each video frame;
acquiring pixel information of at least one reference area of each video frame;
and determining an information display area of each video frame from at least one reference area of each video frame according to the pixel information.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
obtaining at least one first user image, wherein the at least one first user image comprises a first user watching the second live video;
processing the at least one first user image to obtain at least one of attention information, expression and action of the first user;
determining the purchase probability of the first user to the current product according to at least one of the attention information, the expression and the action;
and determining the simulated sales volume of the current product according to the purchase probability.
With reference to the second aspect, in one possible implementation manner, the apparatus is further configured to:
acquiring at least one second user image, wherein the at least one second user image comprises a second user watching the second live video;
obtaining attention information of the second user by processing the at least one second user image;
and under the condition that the attention information of the second user meets an amplification display triggering condition, amplifying and displaying the product information, or displaying the information display area in a pop-up window mode.
With reference to the second aspect, in one possible implementation manner, the product information to be displayed includes at least one of the following: product introduction, product purchase evaluation, product trial pictures or videos, product use share, product price, product purchase links or two-dimensional codes.
A third aspect of embodiments of the present application provides a terminal, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the step instructions in the first aspect of embodiments of the present application.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program makes a computer perform part or all of the steps as described in the first aspect of embodiments of the present application.
A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the first aspect of embodiments of the present application. The computer program product may be a software installation package.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of a scene of video playing provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a live video processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a live video processing method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a live video processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a live video processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
To better understand the live video processing method provided by the embodiments of the present application, its application scenario is briefly introduced first. Referring to fig. 1, fig. 1 is a schematic view of a video playing scene according to an embodiment of the present application. As shown in fig. 1, in scenarios where a broadcaster sells or explains products live, the first live video may be a video recorded by the broadcaster that contains an information display area. After recording, information replacement is performed on the video frames of the first live video to obtain the second live video (only one frame is shown in fig. 1), which can then be sent and broadcast. Because video frames can be processed quickly into new frames and assembled into a live video, the live broadcast effect can be improved.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a live video processing method according to an embodiment of the present application. As shown in fig. 2, the live video processing method may be applied to processing of live video, for example, in a live cargo area scene, a live product explanation scene, and the like, and may be specifically applied to an electronic device for live broadcasting and the like, where the method includes:
201. and processing at least one video frame in the first live video to obtain the position information of the information display area in the at least one video frame.
The position information of the information display area in the first live video can be understood as the position of the information display area in each image frame of the first live video. If the position of the camera acquiring the first live video does not change, the position information may remain unchanged; if the camera's position changes (its spatial position, or equally its shooting angle), the position information may change accordingly.
For example, the camera may be mobile, the information presentation area may be a screen of a preset color or the like provided on a wall, a screen, or the like behind the target user, and the position information of the screen in the image frame may be determined as the position information of the information presentation area, for example, the information presentation area shown in fig. 1.
When the position information of the information display area is acquired, area detection can be performed on the video frame to obtain the position information.
202. And acquiring the information of the product to be displayed.
The product information to be displayed includes at least one of: product introduction, product purchase evaluation, product trial pictures or videos, product use sharing, product price, product purchase links or two-dimensional codes, and the like. The product information to be displayed may be determined from an operation instruction sent by a user, determined from voice information collected during the user's live broadcast, or acquired from the Internet. Of course, it may also be acquired in other manners, for example by manual input from the user.
203. And displaying the product information in the information display area based on the position information to obtain a second live video.
The information of the product to be displayed may be overlaid on the information display area: for example, an information image may be rendered from the product information and overlaid on the area, thereby replacing the area's original content with the product information. Other replacement approaches are of course possible.
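The overlay in step 203 can be sketched as below. The `(top, left)` position tuple and the assumption that the info image has already been resized upstream to fit the display area are illustrative simplifications.

```python
import numpy as np

def overlay_info(frame, info_image, position):
    """Cover the information display area with the rendered product-info
    image. `position` is (top, left); resizing is assumed done upstream."""
    top, left = position
    h, w = info_image.shape[:2]
    out = frame.copy()  # leave the source frame untouched
    out[top:top + h, left:left + w] = info_image
    return out
```

Applied frame by frame, this yields the second live video from the first.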
204. And sending the second live video.
The second live video can be sent to the live viewing end directly, or sent to a server which forwards it to the live viewing end. The live viewing end can be a mobile phone, a tablet computer, a computer or other device.
In this example, the position information of the information display area in at least one video frame of the first live video is obtained, and the second live video is obtained by displaying the product information in that area based on the position information. Viewers do not need to switch to other interfaces to query product information: it can be read directly from the second live video, which improves the convenience of viewing product information and thus the live broadcast effect.
In one possible implementation, a possible method for obtaining information of a product to be displayed includes:
a1, receiving an operation instruction, wherein the operation instruction comprises product identification information to be displayed;
a2, determining the information of the product to be displayed according to the product identification information.
The operation instruction may be received in a wired or wireless manner; for example, the target user sends it from another electronic device. It may also be input directly by the user, e.g. through the touch panel of the electronic device used for live broadcasting, or by voice. The product identification information to be displayed can be understood as an identity of the product to be displayed.
The product information to be displayed can be looked up in a database according to the product identification information, retrieved by searching the Internet, or found through a mapping table recording the mapping relation between product identification information and products to be displayed.
In this example, the product information to be displayed is determined from the identification information carried in a received operation instruction, which improves the efficiency of obtaining the product information to be displayed.
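Steps A1/A2 can be sketched as a lookup against a local mapping table. The table shape and the function name are assumptions standing in for the database or Internet lookup the text mentions.

```python
def lookup_product_info(product_id, mapping_table):
    """Resolve the identification carried in the operation instruction
    against a mapping table from product id to product info (sketch)."""
    info = mapping_table.get(product_id)
    if info is None:
        raise KeyError(f"no product info registered for id {product_id!r}")
    return info
```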
In one possible embodiment, another method for acquiring information of a product to be displayed includes:
b1, acquiring an audio clip of the target user in the live broadcasting process;
b2, carrying out voice recognition on the audio clip to obtain a voice recognition result;
and B3, determining the product information to be displayed according to the voice recognition result.
The audio of the target user during the live broadcast can be collected through a microphone, a microphone array or the like to obtain an audio clip. Voice recognition is then performed on the clip, for which an existing voice recognition algorithm can be adopted, yielding the voice recognition result.
The product information to be displayed can be determined according to keywords in the voice recognition result. A keyword may be, for example, the name of the product to be displayed, or a trigger phrase such as "the product to be displayed is" or "the next product to be displayed is". Other related keywords are of course also possible. The corresponding product information to be displayed is then determined according to the correspondence between keywords and product information.
The key information of the voice recognition result may also be extracted according to the voice recognition result, and the key information may be, for example, information describing a product, such as product characteristics, and the product information to be displayed is acquired from a database, the internet, and the like according to the product characteristics.
In one possible implementation, a possible method for determining information of a product to be displayed according to a speech recognition result includes:
c1, extracting keywords from the voice recognition result to obtain a first keyword set;
and C2, determining the product information to be displayed according to the first keyword set.
According to the first keyword set, the method for determining the product information to be displayed can be as follows: determining a target keyword according to the first keyword set and a preset keyword set, wherein the preset keyword set comprises keywords corresponding to product information to be displayed; and determining the product information to be displayed according to the target keywords.
The keyword extraction may be performed by keyword matching or by a keyword extraction algorithm; the first keyword set includes at least one keyword. A keyword may be the name of the product to be displayed, or a trigger phrase such as "the product to be displayed is" or "the next product to be displayed is".
The preset keyword set contains keywords corresponding to the product information to be displayed. For example, it may be built by reading the identification information of the products to be displayed from a database and deriving keywords from that identification information. Concretely: the products to be shown in the live broadcast can be set in advance, and a preset keyword set corresponding to those products is obtained. The target keyword can then be found by matching against this preset set, which improves the efficiency of determining the target keyword and helps reduce latency.
The product identification information to be displayed can be determined from the target keyword, and the product information then determined from that identification; alternatively, the product information can be determined directly through a mapping relation between the target keyword and the product information, or by keyword matching.
In this example, the product information to be displayed is determined through the first keyword set extracted from the voice recognition result, so that the voice text does not need to be completely analyzed, the product information to be displayed is obtained, and the efficiency of obtaining the product information to be displayed is improved.
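The matching of the first keyword set against the preset keyword set (steps C1/C2) can be sketched as follows. Representing the preset set as a dict from keyword to product identification is an assumption; the text only says the two are in correspondence.

```python
def match_product(speech_keywords, preset_keywords):
    """Intersect the keywords extracted from the voice recognition result
    with the preset set; the first hit is the target keyword, and its
    mapped value stands in for the product identification."""
    for word in speech_keywords:
        if word in preset_keywords:
            return preset_keywords[word]  # target keyword found
    return None  # no product mentioned in this utterance
```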
In one possible implementation, another method for acquiring information of a product to be displayed includes: and acquiring the product information to be displayed from the Internet. Specifically, the product information to be displayed can be acquired from the internet by the following method:
d1, acquiring product characteristic information;
d2, obtaining the product information to be displayed from the Internet according to the product characteristic information.
Product characteristics may be understood as characteristics that the product has. For example, within foods the characteristic may be "vegetable"; within vegetables, "fresh vegetable"; and within fresh vegetables, "rich in a certain element," e.g. rich in iron (spinach, etc.).
The information of the product to be displayed may be acquired from the internet as follows: the product information corresponding to the product characteristics is acquired from a vegetable-selling website or the like. Since one set of product characteristics can correspond to multiple products to be displayed, several new products to be displayed can be obtained from such a website.
In this example, the product information to be displayed is acquired from the associated selling website in the internet, so that convenience in acquiring the product information to be displayed can be improved.
In a possible implementation manner, a possible method for processing at least one video frame in a first live video to obtain location information of an information display area in the at least one video frame includes:
f1, performing foreground and background segmentation processing on each video frame in at least one video frame to obtain the position of a background area of each video frame, wherein pixel points in the background area have the same pixel value;
f2, determining at least one part of the background area as the information display area.
The foreground-background segmentation of a video frame may be performed with an image segmentation method to obtain the background region: such a method can separate the foreground image and the background image of the frame, or directly extract the background image. Alternatively, the video frame may be segmented by an image segmentation model, i.e. a pre-trained segmentation model for performing foreground-background segmentation on video frames, to obtain the position of the background region.
There may be a plurality of information display areas. An area of a preset size within the background area is determined as the information display area, where the preset size may be understood as a preset area, a preset shape, and the like.
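Since the patent states that pixel points in the background area share the same pixel value, steps F1-F2 can be sketched with a brute-force uniform-color scan. This is illustrative only; a real implementation would use an image segmentation model as described above, and the frame representation here (a 2-D list of scalar pixel values) is an assumption.

```python
# Minimal sketch of F1: find the uniform-value background region of a frame.
# The most frequent pixel value is treated as the background value, since the
# background pixels are stated to share the same pixel value.
from collections import Counter

def background_positions(frame):
    """Return the set of (row, col) positions whose pixel value equals the
    most frequent value in the frame (the uniform background)."""
    counts = Counter(px for row in frame for px in row)
    bg_value, _ = counts.most_common(1)[0]
    return {(r, c) for r, row in enumerate(frame)
            for c, px in enumerate(row) if px == bg_value}

frame = [
    [0, 0, 0, 0],
    [0, 7, 7, 0],   # value-7 pixels are foreground
    [0, 0, 0, 0],
]
bg = background_positions(frame)  # positions of the value-0 background
```

Step F2 would then select a preset-size sub-area of `bg` as the information display area.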
In a possible implementation manner, another possible method for processing at least one video frame in a first live video to obtain location information of an information display area in the at least one video frame includes:
g1, performing area detection on each video frame in the first live video to obtain the position information of at least one reference area of each video frame;
g2, acquiring pixel information of at least one reference area of each video frame;
and G3, determining the information display area of each video frame from at least one reference area of each video frame according to the pixel information.
Since the background of a video frame may contain several regions of different colors, for example an interfering color region and the information display region, multiple reference regions may be produced by the region detection. The information display area may be given a specific color, e.g. green, blue, or red, so the information display area of the video frame can be determined from the pixel information of each reference region, where the pixel information may be, for example, gray values or RGB values. If several candidate display areas match the pixel information, the final information display area can be chosen by size: because the display area has a preset size (which may be understood as its area, shape, and so on), the candidates can be matched against that preset size to obtain the final information display area.
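Steps G1-G3 can be sketched as a two-stage filter: first by the display area's designated color, then by closeness to the preset size. The region record format, target color, and preset area below are assumptions for illustration, not the patent's specification.

```python
# Hedged sketch of G2-G3: pick the information display area from candidate
# reference regions by pixel information (color), then by preset size.
TARGET_COLOR = (0, 255, 0)   # e.g. a green display area (assumed)
PRESET_AREA = 5000           # preset area of the display region, in pixels

def pick_display_area(reference_regions):
    """reference_regions: list of dicts with 'position', 'mean_rgb', 'area',
    as produced by the region detection of step G1."""
    color_matches = [r for r in reference_regions
                     if r["mean_rgb"] == TARGET_COLOR]
    if not color_matches:
        return None
    # If several regions match the color, prefer the one whose size is
    # closest to the preset size.
    return min(color_matches, key=lambda r: abs(r["area"] - PRESET_AREA))

regions = [
    {"position": (10, 10), "mean_rgb": (200, 30, 30), "area": 4000},
    {"position": (50, 80), "mean_rgb": (0, 255, 0), "area": 5200},
    {"position": (300, 40), "mean_rgb": (0, 255, 0), "area": 900},
]
area = pick_display_area(regions)  # region at (50, 80): right color and size
```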
The location information may be location coordinates or the like.
The embodiment of the application can also predict the simulated sales volume of the product to be displayed, and specifically can predict the simulated sales volume by the following method:
h1, acquiring at least one first user image, wherein the at least one first user image comprises a first user watching a second live video;
h2, processing at least one first user image to obtain at least one of attention information, expression and action of the first user;
h3, determining the purchase probability of the first user for the current product according to at least one of the attention information, the expression and the action;
h4, determining the simulated sales volume of the current product according to the purchase probability.
The first user image can be captured by the terminal device of a user (the first user) watching the live broadcast, and then sent to the live broadcast terminal (the electronic device performing the live broadcast). The second live video is played on the terminal device of that viewing user.
At least one of the attention information, expression information, and action information of the first user may be obtained by performing feature extraction on the first user image; for example, the attention information and the expression information, or the attention information and the action information, or all three may be determined. These combinations are merely examples.
During feature extraction, gray values and the like of the first user image may be extracted, and the attention information, expressions, and action information of the first user determined from them. The expressions may include facial expressions, eye expressions, and the like; the action information includes body actions, hand actions, and the like; and the attention information may be the content the first user is paying attention to.
The attention information may be a degree of attention to the product information, determined as follows: acquire the gaze direction of the first user, then determine the attention degree from the degree of association between the gaze direction and the information display area. Specifically, the gaze direction may be determined from the first user's face plane and eyeball position; for example, the outward direction perpendicular to the face plane at the eyeball position may be taken as the gaze direction, where "outward" means consistent with the face orientation. The degree of association between the gaze direction and the information display area may be understood as follows: consider where the intersection of the gaze direction and the information display area falls within that area; the closer this point is to the center of the area, the higher the association degree, and the farther it is from the center, the lower the association degree. The higher the association degree, the higher the attention degree, and the lower the association degree, the lower the attention degree.
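The center-distance rule above can be written as a simple scoring function. The linear mapping and the [0, 1] scaling are assumptions; the patent only requires that attention decrease monotonically with distance from the area center.

```python
# Illustrative attention-degree computation: the closer the gaze/display-area
# intersection lies to the center of the display area, the higher the
# association degree and thus the attention degree.
import math

def attention_degree(intersection, area_center, area_half_diag):
    """Map distance from the area center to an attention score in [0, 1].
    area_half_diag: distance from the center to the area's farthest corner."""
    dist = math.dist(intersection, area_center)
    degree = 1.0 - dist / area_half_diag
    return max(0.0, min(1.0, degree))  # clamp to [0, 1]

center = (100.0, 100.0)
half_diag = 50.0
high = attention_degree((100.0, 100.0), center, half_diag)  # at center
low = attention_degree((100.0, 150.0), center, half_diag)   # at the edge
```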
The specific purchase probability may be determined according to at least one of the attention information, the expression information, and the action information, for example, the purchase probability may be determined according to facial expressions and eye expressions, the purchase probability may be determined according to body actions and hand actions, the purchase probability may be determined according to attention, and the like.
The simulated sales volume can be determined by counting each first user whose purchase probability exceeds a preset probability value as a user who will purchase the product to be displayed; the preset probability value can be set from experience values or historical data.
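The final aggregation of steps H1-H4 is a threshold count, which can be sketched directly. The threshold value here is a hypothetical empirical setting.

```python
# Sketch of the simulated-sales count: each viewer whose purchase
# probability exceeds the preset probability value counts as one sale.
PRESET_PROBABILITY = 0.6  # assumed; set from experience or historical data

def simulated_sales(purchase_probabilities):
    """purchase_probabilities: one per first user watching the live video."""
    return sum(1 for p in purchase_probabilities if p > PRESET_PROBABILITY)

sales = simulated_sales([0.9, 0.3, 0.75, 0.5])  # two viewers exceed 0.6
```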
In this example, at least one of the attention information, actions, and expressions of the first user is determined from the first user image, the first user's purchase probability for the current product is determined from that information, and the simulated sales volume is determined from the purchase probability. Because the simulated sales volume is derived from the user's attention information, actions, and expressions, the accuracy of determining the simulated sales volume is improved.
In one possible implementation, a method for determining a probability of purchase of a current product by a first user based on at least one of interest information, expression, and motion includes:
j1, determining a first purchase probability according to the facial expression and the eye expression;
j2, determining a second purchase probability according to the body motion and the hand motion;
j3, determining a third purchase probability according to the attention information;
J4, determining a purchase probability of the current product based on at least one of the first purchase probability, the second purchase probability, and the third purchase probability.
A first mood of the first user may be determined from the facial expression, a second mood from the eye expression, and the first purchase probability from the first and second moods. Specifically, different facial expressions correspond to different first moods; for example, the facial expression may be smiling, laughing, a crying face, etc., and the corresponding first mood may be cheerful, happy, unhappy, etc. The eye expression may be normal blinking, continuous blinking, glaring (eyes opened wide), normal gaze (eyes opened to a normal degree), squinting (eyes opened narrowly), and the like. Different eye expressions correspond to different moods; for example, glaring may correspond to surprise, shock, and the like. The first purchase probability may be taken as the average of the purchase probabilities corresponding to the first mood and the second mood, or as the lower of those two values.
A purchase tendency value of the first user can be determined from the hand action, a purchase tendency correction value from the body action, and the second purchase probability from the purchase tendency value and the correction value. For example, the purchase tendency value is higher when the hand moves toward the live broadcast terminal and lower when the hand moves away from it. When the hand is stationary, the purchase tendency value can be determined from the distance between the palm and the live broadcast terminal: the smaller the distance, the higher the purchase tendency value, and the larger the distance, the lower the purchase tendency value.
The body action includes approaching the live broadcast terminal and moving away from it; the purchase tendency correction value is larger when the user approaches the terminal and smaller when the user moves away. The product of the purchase tendency value and the purchase tendency correction value may be taken as the second purchase probability.
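The tendency-times-correction rule for the second purchase probability can be sketched as below. The lookup tables and their numeric values are illustrative assumptions; the patent specifies only the direction of each effect, not the magnitudes.

```python
# Hedged sketch: second purchase probability = purchase tendency value
# (from the hand action) x purchase tendency correction value (from the
# body action), clamped to a valid probability.
HAND_TENDENCY = {               # hand moving toward / away from the terminal
    "toward_terminal": 0.8,
    "away_from_terminal": 0.2,
}
BODY_MODIFIER = {               # body approaching / leaving the terminal
    "approach_terminal": 1.2,
    "leave_terminal": 0.5,
}

def second_purchase_probability(hand_action, body_action):
    tendency = HAND_TENDENCY.get(hand_action, 0.5)  # neutral default
    modifier = BODY_MODIFIER.get(body_action, 1.0)
    return min(1.0, tendency * modifier)

p2 = second_purchase_probability("toward_terminal", "approach_terminal")
```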
When the attention information is the attention degree, the higher the attention degree is, the higher the third purchase probability is, and the lower the attention degree is, the lower the third purchase probability is.
In step J4 above, only those of steps J1-J3 that produce the purchase probabilities actually used need to be executed, and the others can be skipped. For example, if step J4 determines the purchase probability of the current product from only the first and second purchase probabilities, step J3 is not required.
The average of the first, second, and third purchase probabilities may be taken as the target purchase probability (the purchase probability of the current product). Alternatively, the target purchase probability may be taken as the minimum of the three. If the target purchase probability is determined from only the first and second purchase probabilities, the same averaging or minimum methods apply, and so on; this is not repeated here.
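Both combination strategies described above (mean and minimum) work over however many of the three probabilities are available, which can be sketched as:

```python
# Sketch of combining the available purchase probabilities (1 to 3 values,
# per steps J1-J3) into the target purchase probability for the product.
def target_purchase_probability(probs, strategy="mean"):
    """probs: the purchase probabilities actually computed."""
    if strategy == "mean":
        return sum(probs) / len(probs)
    return min(probs)  # conservative estimate: lowest available probability

p = target_purchase_probability([0.9, 0.6, 0.3])        # mean of all three
p_min = target_purchase_probability([0.9, 0.6], "min")  # only J1 and J2 ran
```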
In one possible implementation, the method further includes:
M1, acquiring at least one second user image, wherein the at least one second user image comprises a second user watching the second live video;
m2, obtaining attention information of a second user by processing at least one second user image;
M3, under the condition that the attention information of the second user meets the enlarged display trigger condition, displaying the product information enlarged, or displaying the information display area in a pop-up window manner.
The method for acquiring the second user image may refer to the method for acquiring the first user image in the foregoing embodiment, and details are not repeated here.
The attention information may be understood as the information the user is interested in, for example, the information in the area where the user's gaze currently dwells. The gaze direction may be obtained by the gaze acquisition method in the foregoing embodiment. The trigger condition for enlarged display may be: the time for which the user has watched the attended information exceeds a preset duration.
The attention information may also be a specific mark, such as a product mark, and when the attention information is the product mark, the product information is displayed in an enlarged manner, or the information display area is displayed in a pop-up manner.
The attention information may also be an attention degree, in which case the trigger condition for enlarged display may be that the attention degree is higher than a preset threshold; that is, when the attention degree exceeds the preset threshold, the product information is displayed enlarged, or the information display area is displayed in a pop-up window. The preset threshold is set from an empirical value or historical data.
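The two trigger variants described above (dwell time exceeding a preset duration, or attention degree exceeding a preset threshold) can be sketched together. The threshold and duration values are hypothetical empirical settings.

```python
# Illustrative enlarged-display trigger: fire when either the viewer's
# attention degree exceeds a preset threshold or their gaze has dwelt on
# the information display area longer than a preset duration.
PRESET_ATTENTION = 0.7   # attention-degree threshold (assumed empirical value)
PRESET_DWELL_SECS = 3.0  # dwell-time threshold (assumed empirical value)

def should_enlarge(attention_degree=None, dwell_seconds=None):
    if attention_degree is not None and attention_degree > PRESET_ATTENTION:
        return True
    if dwell_seconds is not None and dwell_seconds > PRESET_DWELL_SECS:
        return True
    return False

enlarge = should_enlarge(attention_degree=0.85)  # above threshold: trigger
no_enlarge = should_enlarge(dwell_seconds=1.2)   # dwell too short: no trigger
```

When the trigger fires, the terminal would either enlarge the product information in place or show the information display area in a pop-up window, per step M3.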
In one embodiment, the video processing method may be as follows: the live video stream is processed in real time; background recognition is performed on each frame of the stream (identifying the information display area); after the frame area (the information display area) is obtained, a product picture or video is substituted into that area, with the picture or video pulled from a server (or obtained elsewhere, such as from the internet); the broadcaster then publishes the replaced live video stream through the web backend or the live-broadcast application.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating another live video processing method according to an embodiment of the present application. As shown in fig. 3, the method includes:
301. processing at least one video frame in the first live video to obtain position information of an information display area in the at least one video frame;
302. acquiring an audio clip of a target user in a live broadcasting process;
303. performing voice recognition on the audio clip to obtain a voice recognition result;
304. determining the product information to be displayed according to the voice recognition result;
305. extracting keywords from the voice recognition result to obtain a first keyword set;
The preset keyword set may consist of keywords corresponding to the product information to be displayed; for example, it may be determined by obtaining identification information of a product to be displayed from a database and deriving the keywords from that identification information. In other words, the products to be demonstrated in the live broadcast can be set in advance, and the preset keyword set corresponding to those products can then be obtained.
306. Determining product information to be displayed according to the first keyword set;
307. displaying product information in an information display area based on the position information to obtain a second live video;
308. and sending the second live video.
In this example, the target keyword is determined from the keyword set extracted from the voice text, and the product information to be displayed is acquired through the target keyword. The voice text therefore does not need to be fully analyzed, which improves the efficiency of acquiring the information to be displayed.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating another live video processing method according to an embodiment of the present application. As shown in fig. 4, the method includes:
401. processing at least one video frame in the first live video to obtain position information of an information display area in the at least one video frame;
402. acquiring information of a product to be displayed;
403. displaying product information in an information display area based on the position information to obtain a second live video;
404. sending a second live video;
405. acquiring at least one first user image, wherein the at least one first user image comprises a first user watching a second live video;
406. processing at least one first user image to obtain at least one of attention information, expression and action of a first user;
407. determining the purchase probability of the first user to the current product according to at least one of the attention information, the expression and the action;
408. and determining the simulated sales volume of the current product according to the purchase probability.
A first user whose target purchase probability exceeds the preset probability value is counted as a user who will purchase the product to be displayed, which yields the simulated sales volume; the preset probability value can be set from experience values or historical data.
In this example, the target purchase probability of the user is determined from the expression information and action information extracted from the user's image, and the simulated sales volume is determined from that probability. Because the simulated sales volume is derived from the user's action and expression information, the accuracy of determining the simulated sales volume is improved.
In accordance with the foregoing embodiments, please refer to fig. 5, fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present application, and as shown in the figure, the terminal includes a processor and a memory, and the processor and the memory are connected to each other, where the memory is used for storing a computer program, the computer program includes program instructions, the processor is configured to call the program instructions, and the program includes instructions for executing the following steps;
processing at least one video frame in a first live video to obtain position information of an information display area in the at least one video frame;
acquiring information of a product to be displayed;
displaying the product information in the information display area based on the position information to obtain a second live video;
and sending the second live video.
The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It is understood that the terminal includes corresponding hardware structures and/or software modules for performing the respective functions in order to implement the above-described functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments provided herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the terminal may be divided into the functional units according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In accordance with the above, please refer to fig. 6, and fig. 6 is a schematic structural diagram of a live video processing apparatus according to an embodiment of the present application. As shown in fig. 6, the apparatus includes:
a first obtaining unit 601, configured to process at least one video frame in a first live video to obtain position information of an information display area in the at least one video frame;
a second obtaining unit 602, configured to obtain information of a product to be displayed;
a replacing unit 603, configured to display the product information in the information display area based on the location information, so as to obtain a second live video;
a sending unit 604, configured to send the second live video.
In a possible implementation manner, the second obtaining unit 602 is configured to:
receiving an operation instruction, wherein the operation instruction comprises identification information of a product to be displayed;
and determining the product information to be displayed according to the product identification information.
In a possible implementation manner, the second obtaining unit 602 is configured to:
acquiring an audio clip of a target user in a live broadcasting process;
performing voice recognition on the audio clip to obtain a voice recognition result;
and determining the product information to be displayed according to the voice recognition result.
In a possible implementation manner, in terms of determining the product information to be displayed according to the voice recognition result, the second obtaining unit 602 is configured to:
extracting keywords from the voice recognition result to obtain a first keyword set;
and determining the product information to be displayed according to the first keyword set.
In a possible implementation manner, the second obtaining unit 602 is configured to:
and obtaining the product information to be displayed from the Internet.
In one possible implementation manner, the first obtaining unit 601 is configured to:
performing foreground and background segmentation processing on each video frame in the at least one video frame to obtain the position of a background area of each video frame, wherein pixel points in the background area have the same pixel value;
determining at least a portion of the background area as the information presentation area.
In one possible implementation manner, the first obtaining unit 601 is configured to:
performing area detection on each video frame in the first live video to obtain position information of at least one reference area of each video frame;
acquiring pixel information of at least one reference area of each video frame;
and determining an information display area of each video frame from at least one reference area of each video frame according to the pixel information.
In one possible implementation, the apparatus is further configured to:
obtaining at least one first user image, wherein the at least one first user image comprises a first user watching the second live video;
processing the at least one first user image to obtain at least one of attention information, expression and action of the first user;
determining the purchase probability of the first user to the current product according to at least one of the attention information, the expression and the action;
and determining the simulated sales volume of the current product according to the purchase probability.
In one possible implementation, the apparatus is further configured to:
acquiring at least one second user image, wherein the at least one second user image comprises a second user watching the second live video;
obtaining attention information of the second user by processing the at least one second user image;
under the condition that the attention information of the second user meets the enlarged display trigger condition, displaying the product information enlarged, or displaying the information display area in a pop-up window manner.
In one possible implementation, the product information to be displayed includes at least one of: product introduction, product purchase evaluation, product trial pictures or videos, product use share, product price, product purchase links or two-dimensional codes.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the live video processing methods described in the above method embodiments.
Embodiments of the present application also provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, where the computer program causes a computer to execute some or all of the steps of any one of the live video processing methods described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the technical solutions of the present application, in essence or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned memory comprises: various media that can store program codes, such as a usb disk, a read-only memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, read-only memory, random access memory, magnetic or optical disks, and the like.
The foregoing embodiments have been described in detail, and specific examples are used herein to explain the principles and implementations of the present application, where the above description of the embodiments is only intended to help understand the method and its core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (11)

1. A live video processing method, characterized in that the method comprises:
processing at least one video frame in a first live video to obtain position information of an information display area in the at least one video frame;
acquiring information of a product to be displayed;
displaying the product information in the information display area based on the position information to obtain a second live video;
sending the second live video;
obtaining at least one first user image, wherein the at least one first user image comprises a first user watching the second live video;
processing the at least one first user image to obtain at least one of attention information, expression and action of the first user;
determining the purchase probability of the first user to the current product according to at least one of the attention information, the expression and the action;
and determining the simulated sales volume of the current product according to the purchase probability.
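The last two steps of claim 1 can be sketched as follows. The weighting of the attention, expression, and action signals and the use of a simple probability sum as the simulated sales volume are hypothetical illustrations; the patent does not fix a specific model for either step.

```python
def purchase_probability(attention, expression_score, action_score,
                         weights=(0.5, 0.3, 0.2)):
    """Combine per-viewer signals into a purchase probability.

    The signals and weights are illustrative stand-ins for the claimed
    'at least one of attention information, expression and action'.
    """
    w_att, w_exp, w_act = weights
    p = w_att * attention + w_exp * expression_score + w_act * action_score
    return min(max(p, 0.0), 1.0)  # clamp to a valid probability

def simulated_sales(probabilities):
    """Expected sales volume as the sum of per-viewer purchase probabilities."""
    return sum(probabilities)

viewers = [purchase_probability(0.9, 0.8, 0.7),   # engaged viewer
           purchase_probability(0.2, 0.1, 0.0)]   # distracted viewer
print(round(simulated_sales(viewers), 2))
```

In this sketch, a per-product expected sales figure is simply the sum of viewer-level probabilities; any calibrated model mapping viewer signals to probabilities could be substituted.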
2. The method of claim 1, wherein the obtaining information about the product to be displayed comprises:
receiving an operation instruction, wherein the operation instruction comprises identification information of a product to be displayed;
and determining the product information to be displayed according to the product identification information.
3. The method of claim 1, wherein the obtaining information about the product to be displayed comprises:
acquiring an audio clip of a target user in a live broadcasting process;
performing voice recognition on the audio clip to obtain a voice recognition result;
and determining the product information to be displayed according to the voice recognition result.
4. The method according to claim 3, wherein the determining the product information to be displayed according to the voice recognition result comprises:
extracting keywords from the voice recognition result to obtain a first keyword set;
and determining the product information to be displayed according to the first keyword set.
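Claims 3 and 4 can be illustrated with a minimal keyword-spotting sketch: catalog terms found in the speech-recognition transcript form the first keyword set, which is then matched against products. The catalog, the overlap-count matching rule, and all names here are hypothetical; the patent does not prescribe an extraction or matching algorithm.

```python
def extract_keywords(transcript, vocabulary):
    """Naive keyword extraction: keep catalog terms appearing in the transcript."""
    words = transcript.lower().split()
    return {w for w in words if w in vocabulary}

def match_product(keywords, catalog):
    """Return the product whose term set overlaps the keyword set most."""
    best, best_overlap = None, 0
    for product, terms in catalog.items():
        overlap = len(keywords & terms)
        if overlap > best_overlap:
            best, best_overlap = product, overlap
    return best

catalog = {"lipstick": {"lipstick", "matte", "red"},
           "cleanser": {"cleanser", "foam", "gentle"}}
kws = extract_keywords("this matte red lipstick lasts all day",
                       {"lipstick", "matte", "red", "cleanser", "foam"})
print(match_product(kws, catalog))
```

A production system would likely use a trained intent or entity model rather than exact word matching, but the claim only requires that product information be determined from the recognition result via a keyword set.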
5. The method according to any one of claims 1 to 4, wherein the processing at least one video frame in the first live video to obtain the position information of the information display area in the at least one video frame comprises:
performing foreground and background segmentation processing on each video frame in the at least one video frame to obtain the position of a background area of each video frame, wherein pixel points in the background area have the same pixel value;
determining at least a portion of the background area as the information presentation area.
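The segmentation in claim 5 hinges on the background pixels sharing the same value. A crude sketch of that idea, treating the most frequent pixel value as the background, is below; real foreground/background segmentation (e.g. a learned matting model) would be far more robust, and the patent leaves the method unspecified.

```python
import numpy as np

def uniform_background_mask(frame):
    """Mark pixels equal to the most frequent pixel value in the frame.

    Illustrative stand-in for the claimed segmentation: the claim states
    that background pixels share the same pixel value, so the dominant
    value is taken as background.
    """
    values, counts = np.unique(frame.reshape(-1, frame.shape[-1]),
                               axis=0, return_counts=True)
    bg_value = values[counts.argmax()]
    return np.all(frame == bg_value, axis=-1)

# 4x4 frame: uniform gray background with one white "foreground" pixel
frame = np.full((4, 4, 3), 128, dtype=np.uint8)
frame[1, 2] = [255, 255, 255]
mask = uniform_background_mask(frame)
print(int(mask.sum()))
```

Any sufficiently large connected portion of the resulting mask could then serve as the information display area.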
6. The method according to any one of claims 1 to 4, wherein the processing at least one video frame in the first live video to obtain the position information of the information display area in the at least one video frame comprises:
performing area detection on each video frame in the first live video to obtain position information of at least one reference area of each video frame;
acquiring pixel information of at least one reference area of each video frame;
and determining an information display area of each video frame from at least one reference area of each video frame according to the pixel information.
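Claim 6 selects a display area from candidate regions "according to the pixel information" without fixing the criterion. One plausible criterion, sketched below, is to prefer the candidate with the lowest pixel variance (the flattest, least cluttered area for an overlay); the region format and the variance rule are assumptions for illustration.

```python
import numpy as np

def pick_display_region(frame, regions):
    """Pick the (x, y, w, h) candidate with the lowest pixel variance.

    Low variance ~ visually flat area, a hypothetical reading of choosing
    a display area 'according to the pixel information'.
    """
    def variance(region):
        x, y, w, h = region
        return float(frame[y:y + h, x:x + w].var())
    return min(regions, key=variance)

# Left half uniform, right half textured
frame = np.zeros((10, 10), dtype=np.float64)
frame[:, 5:] = np.random.default_rng(0).uniform(0, 255, (10, 5))
flat = (0, 0, 5, 5)
busy = (5, 0, 5, 5)
print(pick_display_region(frame, [flat, busy]) == flat)
```

Other pixel statistics (mean brightness, edge density, saliency) could replace variance without changing the structure of the step.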
7. The method according to any one of claims 1 to 4, further comprising:
acquiring at least one second user image, wherein the at least one second user image comprises a second user watching the second live video;
obtaining attention information of the second user by processing the at least one second user image;
and under the condition that the attention information of the second user meets an amplification display triggering condition, amplifying and displaying the product information, or displaying the information display area in a pop-up window mode.
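The "amplification display triggering condition" in claim 7 is left open. A minimal sketch of one possible condition — enlarge when a sufficient share of recent frames shows high viewer attention — is below; the threshold values are purely illustrative.

```python
def should_enlarge(attention_scores, threshold=0.7, min_ratio=0.5):
    """Trigger enlarged display when enough recent attention scores are high.

    Hypothetical trigger: the patent only requires that the attention
    information meet *some* amplification display triggering condition.
    """
    high = sum(1 for a in attention_scores if a >= threshold)
    return high / len(attention_scores) >= min_ratio

print(should_enlarge([0.9, 0.8, 0.3, 0.95]))   # mostly attentive viewer
```

The same predicate could equally gate a pop-up window presentation of the information display area, the claim's alternative branch.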
8. The method according to any one of claims 1-4, wherein the product information to be displayed comprises at least one of: a product introduction, product purchase reviews, product trial pictures or videos, product usage sharing, a product price, and a product purchase link or two-dimensional code.
9. A live video processing apparatus, the apparatus comprising:
the device comprises a first acquisition unit, configured to process at least one video frame in a first live video to obtain position information of an information display area in the at least one video frame;
the second acquisition unit is used for acquiring the information of the product to be displayed;
the replacing unit is used for displaying the product information in the information display area based on the position information to obtain a second live video;
a sending unit, configured to send the second live video;
the apparatus is further configured to:
obtaining at least one first user image, the at least one first user image comprising a first user viewing the second live video;
processing the at least one first user image to obtain at least one of attention information, expression and action of the first user;
determining the purchase probability of the first user to the current product according to at least one of the attention information, the expression and the action;
and determining the simulated sales volume of the current product according to the purchase probability.
10. A terminal, characterized in that it comprises a processor and a memory, said processor and memory being connected to each other, wherein said memory is used for storing a computer program comprising program instructions, said processor being configured for invoking said program instructions for performing the method according to any of claims 1-8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to carry out the method according to any one of claims 1-8.
CN202011064701.3A 2020-09-30 2020-09-30 Live video processing method and related device Active CN112261420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011064701.3A CN112261420B (en) 2020-09-30 2020-09-30 Live video processing method and related device

Publications (2)

Publication Number Publication Date
CN112261420A CN112261420A (en) 2021-01-22
CN112261420B true CN112261420B (en) 2022-07-01

Family

ID=74234875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011064701.3A Active CN112261420B (en) 2020-09-30 2020-09-30 Live video processing method and related device

Country Status (1)

Country Link
CN (1) CN112261420B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113315986B (en) * 2021-05-25 2022-10-28 北京达佳互联信息技术有限公司 Live broadcast interaction method and device, product evaluation method and device, electronic equipment and storage medium
CN113382167B (en) * 2021-06-09 2022-08-05 杭州海康威视数字技术股份有限公司 Information superposition display method, device and system and image processing equipment
CN113613062B (en) * 2021-07-08 2024-01-23 广州云智达创科技有限公司 Video data processing method, device, equipment and storage medium
CN115690664A (en) * 2022-11-18 2023-02-03 北京字跳网络技术有限公司 Image processing method and device, electronic equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN108322788A (en) * 2018-02-09 2018-07-24 武汉斗鱼网络科技有限公司 Advertisement demonstration method and device in a kind of net cast

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US9955206B2 (en) * 2009-11-13 2018-04-24 The Relay Group Company Video synchronized merchandising systems and methods
CN111402010A (en) * 2020-04-15 2020-07-10 程宝龙 Trading system and trading method based on three-dimensional modeling display
CN111626813B (en) * 2020-04-22 2023-09-29 北京水滴科技集团有限公司 Product recommendation method and system
CN111586319B (en) * 2020-05-27 2024-04-09 北京百度网讯科技有限公司 Video processing method and device
CN111629222B (en) * 2020-05-29 2022-12-20 腾讯科技(深圳)有限公司 Video processing method, device and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant