CN118233664A - Video processing method and device and electronic equipment - Google Patents
Video processing method and device and electronic equipment
- Publication number
- CN118233664A (application number CN202211644421.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- voice
- interaction information
- user
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The disclosure provides a video processing method, a video processing apparatus, and an electronic device. The method includes: displaying a video interface for playing live video; acquiring first interaction information input by a first user on the video interface, and sending the first interaction information to a server; and receiving a video stream sent by the server and playing the live video on the video interface according to the video stream, wherein the live video includes an anchor video, first voice corresponding to the first interaction information, and second voice corresponding to second interaction information of at least one second user. In this way, the complexity of live interaction is reduced.
Description
Technical Field
Embodiments of the disclosure relate to the technical field of video processing, and in particular to a video processing method, a video processing apparatus, and an electronic device.
Background
When an anchor streams live, users watching the live broadcast can interact with the anchor, which improves the live effect.
Currently, the anchor may interact according to text sent by users. For example, while watching the anchor's live video, a user may send text in real time; the text is displayed in the frame of the live video, the anchor sees it there, and the two can interact. However, in some scenes (such as outdoor live broadcasts or concert live broadcasts), the anchor cannot hold the live-broadcast device and watch the live video at the same time. Users can then only hear the anchor's voice and cannot obtain the interaction information of multiple users from the live video, while the anchor must obtain that interaction information in other ways (for example, by being reminded by other users on site), so the complexity of live interaction is high.
Disclosure of Invention
The disclosure provides a video processing method, a video processing apparatus, and an electronic device, to solve the technical problem of high complexity of live interaction in the prior art.
In a first aspect, the present disclosure provides a video processing method, the method comprising:
displaying a video interface, wherein the video interface is used for playing live video;
acquiring first interaction information input by a first user on the video interface, and sending the first interaction information to a server;
and receiving a video stream sent by the server, and playing the live video on the video interface according to the video stream, wherein the live video includes an anchor video, first voice corresponding to the first interaction information, and second voice corresponding to second interaction information of at least one second user.
In a second aspect, the present disclosure provides another video processing method, the method comprising:
receiving a plurality of target interaction information sent by a plurality of electronic devices, wherein the target interaction information comprises text information and/or voice information;
Receiving an anchor video sent by anchor equipment;
Determining a video stream according to the anchor video and the target interaction information, wherein the video stream comprises an encoded stream of the anchor video and an encoded stream of voice associated with the target interaction information;
and sending the video stream to the anchor device and the plurality of electronic devices.
In a third aspect, the present disclosure provides a video processing apparatus, including a display module, an acquisition module, a transmission module, a reception module, and a playback module, wherein:
The display module is used for displaying a video interface, and the video interface is used for playing live video;
The acquisition module is used for acquiring first interaction information input by a first user on the video interface;
the sending module is used for sending the first interaction information to a server;
the receiving module is used for receiving the video stream sent by the server;
The playing module is used for playing the live video on the video interface according to the video stream, wherein the live video comprises an anchor video, first voice corresponding to the first interaction information, and second voice corresponding to second interaction information of at least one second user.
In a fourth aspect, the present disclosure provides another video processing apparatus, including a receiving module, a determining module, and a transmitting module, wherein:
The receiving module is used for receiving a plurality of target interaction information sent by a plurality of electronic devices, wherein the target interaction information comprises text information and/or voice information;
The receiving module is also used for receiving the anchor video sent by the anchor device;
the determining module is used for determining a video stream according to the anchor video and the target interaction information, wherein the video stream comprises an encoded stream of the anchor video and an encoded stream of voice associated with the target interaction information;
the sending module is used for sending the video streams to the anchor device and the plurality of electronic devices.
In a fifth aspect, the present disclosure provides an electronic device comprising: a processor and a memory;
The memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method according to the first aspect and the various possible designs of the first aspect.
In a sixth aspect, the present disclosure provides a server comprising: a processor and a memory;
The memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method according to the second aspect and the various possible designs of the second aspect.
In a seventh aspect, the present disclosure provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the video processing method according to the first aspect or the second aspect and their various possible designs.
In an eighth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the video processing method as described above in the first aspect and the various possible aspects of the first aspect, or implements the video processing method as described above in the second aspect and the various possible aspects of the second aspect.
The electronic device can display a video interface used for playing live video, acquire first interaction information input by a first user on the video interface, send the first interaction information to a server, receive a video stream sent by the server, and play the live video on the video interface according to the video stream, wherein the live video includes an anchor video, first voice corresponding to the first interaction information, and second voice corresponding to second interaction information of at least one second user. In this method, while the anchor streams live, the live video can include the voices corresponding to viewers' interaction information, so the anchor can hear users' interaction information without watching the live video, and users can obtain the voice interaction between multiple users and the anchor from the live video, thereby reducing the complexity of live interaction.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the description of the prior art, it being obvious that the drawings in the following description are some embodiments of the present disclosure, and that other drawings may be obtained from these drawings without inventive effort to a person of ordinary skill in the art.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a video processing method according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram of a process for displaying a video interface according to an embodiment of the disclosure;
Fig. 4A is a schematic diagram of a process for acquiring first interaction information according to an embodiment of the disclosure;
fig. 4B is a schematic diagram of another process for obtaining first interaction information according to an embodiment of the disclosure;
Fig. 5 is a schematic diagram of another process of obtaining first interaction information according to an embodiment of the disclosure;
Fig. 6 is a schematic diagram of a scenario for sending first interaction information according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of another video processing method according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram of a process for determining a target voice according to an embodiment of the disclosure;
Fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the disclosure;
Fig. 10 is a schematic structural diagram of another video processing apparatus according to an embodiment of the disclosure;
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In order to facilitate understanding, concepts related to the embodiments of the present disclosure are described below.
Electronic device: a device with wireless transceiving functions. An electronic device may be deployed on land (indoors or outdoors, hand-held, wearable, or vehicle-mounted) or on the water surface (such as on a ship). The electronic device may be a mobile phone, a tablet (Pad), a computer with wireless transceiving functions, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a vehicle-mounted device, a wireless terminal in self-driving, a wireless device in remote medicine, a wireless device in a smart grid, a wireless device in transportation safety, a wireless device in a smart city, a wireless device in a smart home, a wearable device, or the like. The electronic device in the embodiments of the present disclosure may also be referred to as a terminal, user equipment (UE), an access device, a vehicle-mounted terminal, an industrial control terminal, a UE unit, a UE station, a mobile station, a remote device, a mobile device, a wireless communication device, a UE agent, a UE apparatus, or the like. The electronic device may be stationary or mobile.
In the related art, users watching a live broadcast can interact with the anchor, which improves the anchor's live effect. Currently, the anchor may interact according to text sent by users. For example, while watching the anchor's live video, a user can send the text "hello" in real time; the text is displayed in the frame of the live video, and after the anchor sees the text "hello" there, the anchor can reply by voice, interacting with the watching user. However, in some scenes the anchor cannot hold the live-broadcast device and watch the live video, and users can only hear the anchor's voice without obtaining other users' interaction information from the live video. For example, in a concert live-broadcast scene, users watching the live video cannot feel the atmosphere of the live concert, and the anchor has to be reminded of the interaction information in the live video by other users on site before interacting, which makes live interaction highly complex.
To solve the above technical problem, an embodiment of the present disclosure provides a video processing method: an electronic device displays a video interface for playing live video, acquires first interaction information input by a first user on the video interface, sends the first interaction information to a server, receives a video stream sent by the server, obtains an anchor video stream and an audio stream from the video stream (the audio stream being determined based on the first voice corresponding to the first interaction information and the second voice corresponding to second interaction information), and obtains the live video from the anchor video stream and the audio stream. Because the live video includes the anchor video, the first voice corresponding to the first interaction information, and the second voice corresponding to the second interaction information of at least one second user, the anchor can hear users' interaction information without watching the live video, and users can obtain the voice interaction between multiple users and the anchor from the live video, which reduces the complexity of live interaction.
Next, an application scenario of the embodiment of the present disclosure will be described with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present disclosure. Referring to fig. 1, the system includes electronic devices 1 through N, an anchor device, and a server. Electronic devices 1 through N may send interaction information to the server, and the anchor device may send the anchor video to the server. The server generates a live video according to the voices corresponding to the interaction information and the anchor video, and transmits the live video to electronic devices 1 through N and to the anchor device.
Referring to fig. 1, in the above network structure, viewers can directly send text and cheers to the anchor, and viewers can hear the anchor's singing together with background sound that may include the cheers of multiple viewers. When multiple viewers send singing voices through their electronic devices, the anchor can hear the viewers singing, viewers can hear both the anchor and the other viewers singing, and viewers and the anchor can communicate directly by voice. Viewers and the anchor thus interact directly, which reduces the complexity of live interaction.
It should be noted that fig. 1 is only an exemplary illustration of the application scenario of the embodiments of the present disclosure, and is not limited to the application scenario of the embodiments of the present disclosure.
The following describes the technical solutions of the present disclosure and how the technical solutions of the present disclosure solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a video processing method according to an embodiment of the disclosure. Referring to fig. 2, the method may include:
S201, displaying a video interface.
The execution body of the embodiment of the disclosure may be an electronic device, or may be a video processing apparatus provided in the electronic device. Alternatively, the video processing apparatus may be implemented by software, and the video processing apparatus may be implemented by a combination of software and hardware.
Optionally, the video interface is used for playing live video. For example, the video interface may display live video including anchor video. Optionally, the electronic device may display a video interface in response to a user triggering operation of the live application. For example, when a user clicks on a live application, the electronic device may display a live page, where the live page may include multiple live links of a host, and when the user clicks on any one of the live links, the electronic device may display a video interface in which live video including the host video of the host may be played.
Next, a process of displaying a video interface will be described with reference to fig. 3.
Fig. 3 is a schematic diagram of a process for displaying a video interface according to an embodiment of the disclosure. Referring to fig. 3, an electronic device is shown. The display page of the electronic device includes a control of the live-broadcast application. When the user clicks the control of the live-broadcast application, the electronic device may display the page corresponding to the live-broadcast application, which includes live-video controls of anchor 1, anchor 2, and anchor 3. When the user clicks the live-video control of anchor 2, the electronic device can display the video interface and play the live video corresponding to anchor 2 in the video interface.
S202, acquiring first interaction information input by a first user on a video interface.
Optionally, the first interaction information may include text information and/or voice information. For example, a user may input the text "hello" on the video interface, and the electronic device may determine that text as the first interaction information; a user may input the voice "fueling" on the video interface, and the electronic device may determine that voice as the first interaction information.
Optionally, the electronic device may obtain the first interaction information input by the first user on the video interface based on the following two possible implementations:
One possible implementation:
In response to a triggering operation of a text editing control in the video interface, display a text window for inputting text. For example, when the video interface plays the live video, it may include a text editing control, and when the user clicks the text editing control, the video interface may display a text window for entering text. The text editing control may also be a text input field: when the user clicks the text input field, the user may enter the related text through the electronic device.
Acquire the text content edited in the text window, and determine the text content as the first interaction information. For example, when the user clicks the text editing control in the video interface, the video interface may pop up the text window; if the user enters the text "fueling" in the text window and clicks the confirmation control, the electronic device may determine the text "fueling" as the first interaction information. Likewise, when the user enters the text "fueling" in the text input field and clicks the send control, the electronic device may determine the text "fueling" as the first interaction information.
The process of acquiring the first interaction information in this implementation will be described below with reference to fig. 4A-4B.
Fig. 4A is a schematic diagram of a process for acquiring first interaction information according to an embodiment of the disclosure. Referring to fig. 4A, an electronic device is included. The display page of the electronic device is the video interface, and the video played in the video interface is the live video. The video interface includes a text editing control; when the user clicks the text editing control, the video interface pops up a text window. The user inputs the text "fueling" in the text input field of the text window and clicks the send control, and the electronic device determines the acquired first interaction information to be the text "fueling".
Fig. 4B is a schematic diagram of another process for obtaining first interaction information according to an embodiment of the disclosure. Referring to fig. 4B, an electronic device is included. The display page of the electronic device is the video interface, and the video played in the video interface is the live video. When the user clicks any point of the video interface, a text input field may pop up at the bottom of the video interface. The user inputs the text "fueling" in the text input field and clicks the send control, and the electronic device determines the acquired first interaction information to be the text "fueling".
Another possible implementation is:
In response to a triggering operation of a voice collection control in the video interface, collect the voice of the first user and determine the voice of the first user as the first interaction information. For example, when the electronic device plays the live video on the video interface, the video interface may include a voice collection control; when the user clicks the voice collection control, the electronic device may collect the first user's voice in real time and determine that voice as the first interaction information. For example, if the voice of the first user collected by the electronic device is the voice "fueling", the electronic device may determine the voice "fueling" as the first interaction information.
In the following, a process of acquiring the first interaction information in this possible implementation manner will be described with reference to fig. 5.
Fig. 5 is a schematic diagram of another process of obtaining first interaction information according to an embodiment of the disclosure. Referring to fig. 5, an electronic device is included. The display page of the electronic device is the video interface, and the video played in the video interface is the live video. The video interface includes a voice collection control. When the user long-presses the voice collection control, the electronic device may collect surrounding voice; if the voice the user inputs to the electronic device is the voice "fueling", then when the user stops touching the voice collection control, the electronic device determines the acquired first interaction information to be the voice "fueling".
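As a concrete illustration of the two acquisition paths above, the following is a minimal client-side sketch that wires callbacks for the text editing control and the voice collection control to a common handler. The class and callback names are assumptions for illustration; the disclosure does not prescribe any UI toolkit.

```python
from typing import Callable

class InteractionInput:
    """Collects first interaction information from the video interface."""

    def __init__(self, on_interaction: Callable[[dict], None]):
        self.on_interaction = on_interaction  # e.g. forwards to the server

    def on_text_confirmed(self, text: str) -> None:
        # The user edited text in the text window and clicked send (Figs. 4A/4B).
        self.on_interaction({"type": "text", "content": text})

    def on_voice_released(self, pcm: bytes) -> None:
        # The user released the voice collection control (Fig. 5); `pcm` holds
        # the voice collected while the control was pressed.
        self.on_interaction({"type": "voice", "content": pcm})

# Usage: InteractionInput(send_to_server).on_text_confirmed("fueling")
```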
S203, the first interaction information is sent to the server.
Optionally, after the electronic device acquires the first interaction information input by the first user, the electronic device may send the first interaction information to the server. The acquired first interaction information may be text information or voice information, and the electronic device sends whichever it acquired to the server. Note that when the first interaction information is text information, the electronic device may send the text information to the server directly, or may convert the text information into voice information and send that voice information to the server; the embodiments of the present disclosure do not limit this.
Next, a process of transmitting interactive information to the server will be described with reference to fig. 6.
Fig. 6 is a schematic diagram of a scenario for sending first interaction information according to an embodiment of the present disclosure. Referring to fig. 6, it includes viewer 1, viewer 2, viewer 3, and a server. While the three viewers watch the live video, viewer 1 may send text A to the server through the electronic device used (not shown in fig. 6), viewer 2 may send text B to the server through the electronic device used, and viewer 3 may send voice to the server through the electronic device used. The server thus receives the interaction information sent by the three viewers and then generates the live video stream according to that interaction information and the acquired anchor video.
In the embodiment shown in fig. 6, the text of viewer 1 and viewer 2 may be converted into voice on the electronic device, with the resulting voice sent to the server, or the text itself may be sent to the server and converted into voice by the server.
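To make the send path concrete, here is a minimal sketch of the client side. The endpoint URL, the JSON message schema, and the decision to leave text-to-speech conversion to the server are illustrative assumptions; as noted above, the conversion may happen on either side.

```python
import base64
import json
import urllib.request

SERVER_URL = "http://server.example/interactions"  # hypothetical endpoint

def send_interaction(user_id: str, text: str | None = None,
                     voice_pcm: bytes | None = None) -> None:
    """Send first interaction information (text and/or voice) to the server."""
    if text is None and voice_pcm is None:
        raise ValueError("interaction must carry text and/or voice")
    payload = {
        "user_id": user_id,
        "text": text,  # the server may convert this to voice information
        # raw voice samples are base64-encoded to survive the JSON transport
        "voice": base64.b64encode(voice_pcm).decode() if voice_pcm else None,
    }
    request = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # fire-and-forget, no retry for brevity

# Usage: send_interaction("viewer-1", text="fueling")
```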
S204, receiving the video stream sent by the server.
Optionally, after the electronic device sends the first interaction information to the server, the electronic device may receive the video stream sent by the server.
S205, playing the live video on the video interface according to the video stream.
Optionally, the live video includes the anchor video, the first voice corresponding to the first interaction information, and the second voice corresponding to the second interaction information of at least one second user. The second user may be any user other than the first user. In practice, the server may receive interaction information sent by multiple users, so the live video may include the voices corresponding to the interaction information of multiple users.
Optionally, the second interaction information may be interaction information input by the second user at the video interface. For example, the second interaction information may include text information and voice information, and the method for the second user to input the second interaction information to the video interface is the same as the method for the first user to input the first interaction information to the video interface, which is not described herein.
Optionally, the first voice may be the voice corresponding to the first interaction information. When the first interaction information is text information, the first voice may be the voice corresponding to that text; when the first interaction information is voice information, the electronic device may determine that voice information as the first voice. For example, if the first interaction information is the text "fueling", the first voice may be the voice "fueling"; if the first interaction information is the voice "fueling", the first voice may be that voice "fueling".
Optionally, the second voice may be the voice corresponding to the second interaction information. When the second interaction information is text information, the second voice may be the voice corresponding to that text; when the second interaction information is voice information, the electronic device may determine that voice information as the second voice. For example, if the second interaction information is the text "hello", the second voice may be the voice "hello"; if the second interaction information is the voice "hello", the second voice may be that voice "hello".
Optionally, the video stream may include an anchor video stream and an audio stream, and the electronic device may play the live video on the video interface in the following possible implementation: acquire the anchor video stream and the audio stream from the video stream. The anchor video stream may be derived from the anchor video; for example, the server encodes the anchor video to obtain the anchor video stream corresponding to it. The audio stream may be determined based on the first voice and the second voice; for example, when the server obtains the first voice and the second voice, it may generate a synthesized voice from them and encode the synthesized voice to obtain the audio stream.
Then obtain the live video according to the anchor video stream and the audio stream. For example, after receiving the video stream sent by the server, the electronic device may decode the anchor video stream and the audio stream in the video stream to obtain the live video, which may include the anchor video, the first voice, and the second voice.
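A minimal sketch of this playback step follows. The decoder and renderer objects are hypothetical stand-ins for a real client's media stack; only the split into an anchor video stream and an audio stream follows from the description above.

```python
from dataclasses import dataclass

@dataclass
class ReceivedVideoStream:
    anchor_video_es: bytes  # encoded stream of the anchor video
    audio_es: bytes         # encoded stream of the first and second voices

def play_live_video(stream: ReceivedVideoStream, video_decoder, audio_decoder,
                    video_interface) -> None:
    """Decode both elementary streams and play the live video (S205)."""
    frames = video_decoder.decode(stream.anchor_video_es)  # anchor pictures
    voices = audio_decoder.decode(stream.audio_es)         # mixed voices
    video_interface.render(frames, voices)                 # play on the interface
```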
Optionally, the live video may further include images of the first user and the second user. For example, when the first user inputs the first interaction information on the video interface, the electronic device may collect an image of the first user and send it to the server together with the first interaction information, and the server may add the image of the first user to the anchor video so that the live video also includes that image. The same applies to the second user's image. For example, when the live video is a concert live broadcast, the audience area of the live video received by the electronic device may include the images of the first user and the second user, improving the effect of the live video.
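This image step can be illustrated with a small sketch that pastes a viewer's image into a region of an anchor-video frame. The frame representation (nested rows of pixels) and the audience-area origin are assumptions for illustration.

```python
def paste_user_image(frame, user_image, top=0, left=0):
    """Return a copy of `frame` with `user_image` pasted at (top, left).

    Frames are nested lists of pixels; (top, left) marks the assumed origin
    of the audience area in the anchor video.
    """
    out = [row[:] for row in frame]
    for dy, row in enumerate(user_image):
        for dx, pixel in enumerate(row):
            if top + dy < len(out) and left + dx < len(out[top + dy]):
                out[top + dy][left + dx] = pixel
    return out
```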
The embodiment of the disclosure provides a video processing method: an electronic device displays a video interface for playing live video; in response to a triggering operation of a text editing control in the video interface, it displays a text window for inputting text, acquires the text content edited in the text window, and determines that text content as the first interaction information, or, in response to a triggering operation of a voice collection control in the video interface, it collects the first user's voice and determines that voice as the first interaction information; the electronic device then sends the first interaction information to a server, receives the video stream sent by the server, and plays the live video on the video interface according to the video stream. Because the live video includes the anchor video, the first voice corresponding to the first interaction information, and the second voice corresponding to the second interaction information of at least one second user, the anchor can hear users' interaction information without watching the live video, and users can obtain the voice interaction between multiple users and the anchor from the live video, which reduces the complexity of live interaction.
Based on the embodiment shown in fig. 2, another video processing method will be described below with reference to fig. 7.
Fig. 7 is a schematic diagram of another video processing method according to an embodiment of the disclosure. Referring to fig. 7, the method includes:
S701, receiving a plurality of target interaction information sent by a plurality of electronic devices.
The execution subject of the embodiments of the present disclosure may be a server, or may be a video processing apparatus provided in the server. Alternatively, the video processing apparatus may be implemented by software, and the video processing apparatus may also be implemented by a combination of software and hardware, which is not limited by the embodiments of the present disclosure.
Optionally, the target interaction information may include text information and/or voice information. For example, the target interaction information may be text information input by a user watching live broadcast, the target interaction information may be voice information input by a user watching live broadcast, and the target interaction information may be text information and voice information input by a user watching live broadcast. For example, when a user views live video, if the user inputs text "fueling" on the video interface, the target interaction information received by the server is text "fueling", and if the user inputs voice "fueling" on the video interface, the target interaction information received by the server is voice "fueling".
Alternatively, the electronic device may be the electronic device of the first user or the second user. For example, the electronic device may be an electronic device of a user watching live video, and when the user watches live video, the user may input text information and voice information into the electronic device, and the server may receive the text information and voice information sent by the electronic device.
S702, receiving an anchor video sent by anchor equipment.
Optionally, the anchor video may be a video including the anchor, and the anchor device may be the device that captures the anchor video. For example, in a live-broadcast scene, the anchor video within the live video is provided by the anchor device: after the anchor device shoots the video including the anchor, it may send that anchor video to the server, and the server may send the live video to the electronic devices of users watching the live broadcast so that they can watch it.
S703, determining a video stream according to the anchor video and the target interaction information.
Optionally, the video stream includes an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information. If the target interaction information is text information, the server may convert the text information into voice information; for example, the text "fueling" becomes the voice "fueling".
When the server receives text information sent by an electronic device, it can convert the text information into voice information. If the server can determine the user's timbre from voice previously sent by that electronic device, then when converting the text information into voice information, the timbre associated with the voice information can be that user's timbre.
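A hedged sketch of this conversion step follows. The `tts_engine` object and its `synthesize` method are hypothetical stand-ins; the disclosure does not name a concrete text-to-speech system, and the timbre is reused only when a reference voice sample from the same user is available.

```python
def interaction_to_voice(interaction: dict, tts_engine) -> bytes:
    """Return the voice information for one piece of target interaction information."""
    if interaction.get("voice") is not None:
        return interaction["voice"]              # already voice information
    reference = interaction.get("voice_sample")  # earlier voice from this user
    if reference is not None:
        # Render the text in the sending user's timbre, as described above.
        return tts_engine.synthesize(interaction["text"], timbre=reference)
    return tts_engine.synthesize(interaction["text"])  # default timbre
```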
Optionally, the server may generate the video stream in the following possible implementation: generate an audio stream according to the multiple target interaction information, and generate the video stream according to the anchor video and the audio stream. For example, the server encodes the anchor video to obtain the anchor video stream and combines the anchor video stream and the audio stream into the video stream, as sketched below. Optionally, the audio stream is obtained based on the voices corresponding to the multiple target interaction information; that is, the audio stream may be the encoded stream of the voices associated with the multiple target interaction information.
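The assembly of the video stream can be sketched as follows. The encoder objects and the two-field container are illustrative assumptions; a real server would mux the anchor video stream and the audio stream with its media pipeline.

```python
from dataclasses import dataclass

@dataclass
class LiveVideoStream:
    anchor_video_es: bytes  # encoded stream of the anchor video
    audio_es: bytes         # encoded stream of the interaction voices

def build_video_stream(anchor_frames, mixed_voice,
                       video_encoder, audio_encoder) -> LiveVideoStream:
    """Encode the anchor video and the mixed voices, then combine them (S703)."""
    return LiveVideoStream(
        anchor_video_es=video_encoder.encode(anchor_frames),
        audio_es=audio_encoder.encode(mixed_voice),
    )
```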
Optionally, the server may generate the audio stream in the following possible implementation: acquire the third voices associated with the multiple target interaction information, determine target voices among the third voices, and generate the audio stream according to the target voices. A third voice may be the voice corresponding to one piece of target interaction information; for example, if the target interaction information is the text "fueling", the third voice may be the voice "fueling", and if the target interaction information is the voice "hello", the third voice may be the voice "hello".
Optionally, the number of target voices is greater than or equal to a first threshold. In practice, each voice carries semantic information, so the server can divide the third voices into several classes according to their semantics and then determine the target voices according to the size of each class.
Optionally, the server determines the target voices among the third voices as follows: cluster the semantics of the third voices to obtain at least one class of third voices, obtain the number of voices in each class, and determine every class whose number of voices is greater than or equal to the first threshold as target voices. For example, suppose the server obtains 100 third voices and the first threshold is 20. If clustering the semantics of the 100 voices yields 30 voices with semantics A, 60 voices with semantics B, and 10 voices with semantics C, the server can determine the 30 voices of semantics A and the 60 voices of semantics B as target voices.
Next, a process of determining a target voice will be described with reference to fig. 8.
Fig. 8 is a schematic diagram of a process for determining target voices according to an embodiment of the disclosure. Referring to fig. 8, 100 third voices are included. The server (not shown in fig. 8) may cluster the 100 third voices by their semantics, obtaining two classes: one whose semantics is "fueling" and one whose semantics is "sounds good". The "fueling" class contains 30 voices and the "sounds good" class contains 70. If the first threshold is 50, the server determines that the target voices are the 70 third voices whose semantics is "sounds good".
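The selection logic can be sketched as follows. Real semantic clustering would involve speech recognition and embedding clustering; in this sketch each third voice is assumed to already carry a semantic label, so only the class counting and threshold comparison described above are shown.

```python
from collections import defaultdict

FIRST_THRESHOLD = 50  # illustrative value, as in the Fig. 8 example

def select_target_voices(third_voices: list[dict]) -> list[dict]:
    """Keep every class of third voices whose size reaches the first threshold."""
    classes: dict[str, list[dict]] = defaultdict(list)
    for voice in third_voices:
        classes[voice["semantics"]].append(voice)  # cluster by semantics
    targets: list[dict] = []
    for group in classes.values():
        if len(group) >= FIRST_THRESHOLD:
            targets.extend(group)
    return targets

# With 30 "fueling" voices and 70 "sounds good" voices, only the 70
# "sounds good" voices are returned, matching the Fig. 8 example.
```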
Optionally, the server generates the audio stream according to the target voices as follows: determine the voice volume corresponding to each class of target voices according to the number of voices in that class. The server may maintain a first preset relationship and determine the voice volume corresponding to a class of target voices according to that relationship and the class's number of voices. The first preset relationship may include at least one number of voices and the voice volume corresponding to each number. For example, the first preset relationship may be as shown in Table 1:
TABLE 1

| Number of voices | Voice volume |
| ---------------- | ------------ |
| Quantity 1       | Volume a     |
| Quantity 2       | Volume b     |
| Quantity 3       | Volume c     |
| ......           | ......       |
It should be noted that Table 1 merely illustrates the first preset relationship by way of example and does not limit it.
For example, if the server determines that the number of voices of a class of target voices is quantity 1, the corresponding voice volume is volume a; if the number is quantity 2, the voice volume is volume b; and if the number is quantity 3, the voice volume is volume c.
Then generate the audio stream according to the target voices and their corresponding voice volumes. For example, the server may generate a synthesized voice from a class of target voices; the synthesized voice has the same semantics as the target voices and can sound as if multiple users uttered it together. The server sets the volume of the synthesized voice to the voice volume corresponding to that class and encodes the synthesized voice to obtain the audio stream.
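A sketch of the volume mapping and mixing follows. The bucket boundaries and gain values stand in for Table 1's quantities and volumes, which the disclosure leaves unspecified, and the per-class overlay is one simple way to obtain the many-users effect described above.

```python
def volume_for_count(count: int) -> float:
    """Map a class's number of voices to a voice volume (stands in for Table 1)."""
    for threshold, gain in ((200, 1.0), (100, 0.8), (50, 0.6)):
        if count >= threshold:
            return gain
    return 0.4

def mix_target_class(voices: list[list[float]]) -> list[float]:
    """Overlay one class of target voices and apply the class's voice volume."""
    length = max(len(v) for v in voices)  # assumes a non-empty class
    gain = volume_for_count(len(voices)) / len(voices)  # normalize, then scale
    return [
        sum(v[i] for v in voices if i < len(v)) * gain
        for i in range(length)
    ]
```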
Optionally, the server may also receive users' images sent by the electronic devices and add at least one user's image to the anchor video. For example, in a concert live-broadcast scene, the server may add at least one user's image to the audience area of the obtained anchor video, thereby improving the effect of the live video.
S704, sending the video stream to the anchor device and the plurality of electronic devices.
Optionally, after obtaining the video stream, the server may send it to the anchor device and to the electronic devices of the users watching the live broadcast. The anchor can then obtain users' interaction information from the audio of the live video and reply to it, and users watching the live broadcast can feel on-site effects through the live video (such as many viewers singing in chorus or asking the anchor questions), which improves the effect of live interaction.
The embodiment of the disclosure provides a video processing method: receive a plurality of target interaction information sent by a plurality of electronic devices, receive the anchor video sent by the anchor device, determine a video stream according to the anchor video and the plurality of target interaction information, and send the video stream to the anchor device and the plurality of electronic devices. Thus the anchor can hear users' interaction information without watching the live video; moreover, the more interaction information there is, the better the mixing effect and therefore the live effect; and users can obtain the voice interaction between multiple users and the anchor from the live video, which reduces the complexity of live interaction.
Fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the disclosure. Referring to fig. 9, the video processing apparatus 10 includes a display module 11, an acquisition module 12, a transmission module 13, a reception module 14, and a playing module 15, wherein:
The display module 11 is configured to display a video interface, where the video interface is used to play live video;
The acquiring module 12 is configured to acquire first interaction information input by a first user at the video interface;
the sending module 13 is configured to send the first interaction information to a server;
The receiving module 14 is configured to receive a video stream sent by the server;
The playing module 15 is configured to play the live video on the video interface according to the video stream, where the live video includes an anchor video, first voice corresponding to the first interaction information, and second voice corresponding to the second interaction information of at least one second user.
In accordance with one or more embodiments of the present disclosure, the acquisition module 12 is specifically configured to:
responding to the triggering operation of a text editing control in the video interface, and displaying a text window for inputting text;
and acquiring the text content edited in the text window, and determining the text content as the first interaction information.
In accordance with one or more embodiments of the present disclosure, the acquisition module 12 is specifically configured to: responding to triggering operation of a voice acquisition control in the video interface, and acquiring voice of a first user;
and determining the voice of the first user as the first interaction information.
According to one or more embodiments of the present disclosure, the live video further includes images of the first user and the second user.
The video processing device provided in the embodiments of the present disclosure may be used to execute the technical solutions of the embodiments of the methods, and the implementation principle and the technical effects are similar, and are not repeated here.
Fig. 10 is a schematic structural diagram of another video processing apparatus according to an embodiment of the disclosure. Referring to fig. 10, the video processing apparatus 20 includes a receiving module 21, a determining module 22, and a transmitting module 23, wherein:
The receiving module 21 is configured to receive a plurality of target interaction information sent by a plurality of electronic devices, where the target interaction information includes text information and/or voice information;
The receiving module 21 is further configured to receive an anchor video sent by an anchor device;
The determining module 22 is configured to determine a video stream according to the anchor video and the target interaction information, where the video stream includes an encoded stream of the anchor video and an encoded stream of speech associated with the target interaction information;
The sending module 23 is configured to send the video streams to the anchor device and the plurality of electronic devices.
In accordance with one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
Generating an audio stream according to the target interaction information, wherein the audio stream is obtained based on voices corresponding to the target interaction information;
and generating the video stream according to the anchor video and the audio stream.
In accordance with one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
Acquiring a plurality of third voices associated with a plurality of target interaction information;
determining target voices in the third voices, wherein the number of the target voices is larger than or equal to a first threshold value;
and generating the audio stream according to the target voice.
In accordance with one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
Clustering the semantics of the plurality of third voices to obtain at least one type of third voices;
And acquiring the number of voices of each type of third voice, and determining the type of third voice with the number of voices being greater than or equal to a first threshold value as the target voice.
In accordance with one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
Determining the voice volume corresponding to each type of target voice according to the voice quantity of each type of target voice;
and generating the audio stream according to the target voice and the voice volume corresponding to the target voice.
According to one or more embodiments of the present disclosure, an image of at least one user is added to the anchor video.
The video processing device provided in the embodiments of the present disclosure may be used to execute the technical solutions of the embodiments of the methods, and the implementation principle and the technical effects are similar, and are not repeated here.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. Referring to fig. 11, a schematic diagram of an electronic device 1100 suitable for implementing embodiments of the present disclosure is shown; the electronic device 1100 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablets (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in fig. 11 is merely an example and should not impose any limitation on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 11, the electronic device 1100 may include a processing apparatus (e.g., a central processor or a graphics processor) 1101 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage apparatus 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores various programs and data necessary for the operation of the electronic device 1100. The processing apparatus 1101, the ROM 1102, and the RAM 1103 are connected to one another by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
In general, the following apparatuses may be connected to the I/O interface 1105: input apparatuses 1106 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output apparatuses 1107 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage apparatuses 1108 including, for example, magnetic tape and hard disks; and a communication apparatus 1109. The communication apparatus 1109 may allow the electronic device 1100 to communicate wirelessly or by wire with other devices to exchange data. While fig. 11 illustrates an electronic device 1100 with various apparatuses, it should be understood that not all illustrated apparatuses are required to be implemented or provided; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 1109, or installed from the storage device 1108, or from the ROM 1102. When the computer program is executed by the processing device 1101, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
Embodiments of the present disclosure further include a server, where the server may include a processor and a memory, where the memory stores computer-executable instructions, and where the processor executes the computer-executable instructions stored in the memory, so that the processor executes the video processing method according to any one of the foregoing embodiments.
The disclosed embodiments further include a computer readable storage medium having stored therein computer executable instructions that, when executed by a processor, implement a video processing method as in any of the embodiments described above.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical fiber cable, RF (radio frequency), or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not in any way limit the unit itself; for example, a first acquisition unit may also be described as "a unit that acquires at least two Internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that the modifiers "a", "an", and "a plurality of" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly inform the user that the requested operation will require acquiring and using the user's personal information. The user can thus autonomously decide, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application, server, or storage medium, that executes the operations of the technical solutions of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user by means of, for example, a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may carry a selection control for the user to choose whether to "agree" or "disagree" to provide personal information to the electronic device.
It will be appreciated that the above notification and user-authorization process is merely illustrative and does not limit the implementations of the present disclosure; other ways of satisfying relevant laws and regulations may also be applied to the implementations of the present disclosure.
It will be appreciated that the data involved in the present technical solution (including but not limited to the data itself and the acquisition or use of the data) should comply with the requirements of the corresponding laws, regulations, and relevant provisions. The data may include information, parameters, messages, and the like, such as stream-switching indication information.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Persons skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
In a first aspect, one or more embodiments of the present disclosure provide a video processing method, the method comprising:
displaying a video interface, wherein the video interface is used for playing live video;
acquiring first interaction information input by a first user on the video interface, and sending the first interaction information to a server;
and receiving a video stream sent by the server, and playing a live video on the video interface according to the video stream, wherein the live video includes an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
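For orientation, a minimal client-side sketch of this first-aspect flow follows. The server address, endpoint paths, payload shape, and chunked-HTTP transport are hypothetical (a real client would more likely use RTMP, HLS, or WebRTC); only the order of operations (send the first interaction information, then receive and play the video stream) reflects the method above.

```python
import requests  # third-party HTTP library, assumed available

SERVER = "https://example.com/live"  # placeholder server address

def send_first_interaction(room_id: str, user_id: str, text: str) -> None:
    """Send the first interaction information entered on the video interface
    to the server (endpoint and payload shape are assumptions)."""
    requests.post(f"{SERVER}/rooms/{room_id}/interactions",
                  json={"user": user_id, "type": "text", "content": text},
                  timeout=5)

def play_video_stream(room_id: str) -> None:
    """Receive the video stream from the server and hand chunks to a player."""
    with requests.get(f"{SERVER}/rooms/{room_id}/stream", stream=True,
                      timeout=30) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            render_chunk(chunk)

def render_chunk(chunk: bytes) -> None:
    ...  # decoding and display are outside the scope of this sketch
```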
According to one or more embodiments of the present disclosure, obtaining first interaction information input by a first user at a video interface includes:
in response to a triggering operation on a text editing control in the video interface, displaying a text window for inputting text;
and acquiring the text content edited in the text window, and determining the text content as the first interaction information.
According to one or more embodiments of the present disclosure, obtaining first interaction information input by a first user at a video interface includes:
in response to a triggering operation on a voice acquisition control in the video interface, acquiring a voice of the first user;
and determining the voice of the first user as the first interaction information.
According to one or more embodiments of the present disclosure, the live video further includes images of the first user and the second user.
In a second aspect, one or more embodiments of the present disclosure provide another video processing method, the method comprising:
receiving a plurality of target interaction information sent by a plurality of electronic devices, wherein the target interaction information includes text information and/or voice information;
receiving an anchor video sent by an anchor device;
determining a video stream according to the anchor video and the plurality of target interaction information, wherein the video stream includes an encoded stream of the anchor video and an encoded stream of voices associated with the target interaction information;
and sending the video stream to the anchor device and the plurality of electronic devices.
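A minimal server-side sketch of this second-aspect flow is given below. The one-second mixing window, the queue-based ingestion, and the byte concatenation standing in for real text-to-speech, mixing, and muxing are all illustrative assumptions; only the overall structure of receiving, combining, and fanning out mirrors the method.

```python
import queue
import time
from dataclasses import dataclass

@dataclass
class TargetInteraction:
    user_id: str
    kind: str       # "text" or "voice"
    payload: bytes  # UTF-8 text or encoded audio

interactions: "queue.Queue[TargetInteraction]" = queue.Queue()

def collect_window(window_s: float = 1.0) -> list[TargetInteraction]:
    """Drain the interactions received from the electronic devices during
    one mixing window (the window length is an illustrative choice)."""
    time.sleep(window_s)
    batch: list[TargetInteraction] = []
    while not interactions.empty():
        batch.append(interactions.get_nowait())
    return batch

def to_voice(item: TargetInteraction) -> bytes:
    # Placeholder: a real server would synthesize speech for text
    # interactions and decode audio for voice interactions.
    return item.payload

def build_video_stream(anchor_chunk: bytes,
                       batch: list[TargetInteraction]) -> bytes:
    """Combine the anchor video with voices derived from the interactions;
    byte concatenation stands in for real mixing and audio/video muxing."""
    audio = b"".join(to_voice(item) for item in batch)
    return anchor_chunk + audio

def serve_forever(anchor_device, viewer_devices, anchor_source) -> None:
    """Fan the resulting video stream out to the anchor device and the
    plurality of electronic devices (the transport layer is assumed)."""
    for anchor_chunk in anchor_source:
        stream_chunk = build_video_stream(anchor_chunk, collect_window())
        for device in (anchor_device, *viewer_devices):
            device.send(stream_chunk)
```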
According to one or more embodiments of the present disclosure, determining a video stream from the anchor video and the plurality of target interaction information includes:
generating an audio stream according to the plurality of target interaction information, wherein the audio stream is obtained based on voices corresponding to the plurality of target interaction information;
and generating the video stream according to the anchor video and the audio stream.
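Where the encoded streams are handled by an external tool, this final muxing step could look like the sketch below. It assumes the ffmpeg command-line tool is installed and that the anchor video and the generated audio are available as files; the codec choices are illustrative, not prescribed by the disclosure.

```python
import subprocess

def mux_audio_into_video(anchor_video: str, audio_track: str,
                         output: str = "live_out.mp4") -> None:
    """Wrap the anchor video's encoded stream and the generated audio
    stream in one container."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", anchor_video,  # anchor video from the anchor device
            "-i", audio_track,   # audio generated from the target voices
            "-map", "0:v:0",     # video from the first input
            "-map", "1:a:0",     # audio from the second input
            "-c:v", "copy",      # keep the existing video encoding
            "-c:a", "aac",       # encode the mixed audio
            output,
        ],
        check=True,
    )
```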
According to one or more embodiments of the present disclosure, generating an audio stream according to the plurality of target interaction information includes:
acquiring a plurality of third voices associated with the plurality of target interaction information;
determining a target voice among the plurality of third voices, wherein the voice count of the target voice is greater than or equal to a first threshold;
and generating the audio stream according to the target voice.
According to one or more embodiments of the present disclosure, determining the target voice among the plurality of third voices includes:
clustering the semantics of the plurality of third voices to obtain at least one class of third voices;
and acquiring the voice count of each class of third voices, and determining a class of third voices whose voice count is greater than or equal to the first threshold as the target voice.
According to one or more embodiments of the present disclosure, generating the audio stream according to the target voice includes:
determining the voice volume corresponding to each class of target voice according to the voice count of that class of target voice;
and generating the audio stream according to the target voices and the voice volumes corresponding to the target voices.
According to one or more embodiments of the present disclosure, the method further comprises:
adding an image of at least one user to the anchor video.
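One possible rendering of this step, sketched with the Pillow imaging library (an assumption; any compositing method would serve), pastes user images along the bottom edge of an anchor-video frame. The placement and thumbnail size are illustrative; a real pipeline would composite per frame inside the encoder.

```python
from PIL import Image  # Pillow, assumed available

def add_user_images(frame_path: str, user_image_paths: list[str],
                    out_path: str, thumb: int = 96) -> None:
    """Paste at least one user's image onto an anchor-video frame."""
    frame = Image.open(frame_path).convert("RGBA")
    for i, path in enumerate(user_image_paths):
        avatar = Image.open(path).convert("RGBA").resize((thumb, thumb))
        x = 8 + i * (thumb + 8)       # lay avatars out left to right
        y = frame.height - thumb - 8  # along the bottom edge
        frame.alpha_composite(avatar, dest=(x, y))
    frame.convert("RGB").save(out_path)
```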
In a third aspect, one or more embodiments of the present disclosure provide a video processing apparatus, including a display module, an acquisition module, a sending module, a receiving module, and a playing module, wherein:
The display module is used for displaying a video interface, and the video interface is used for playing live video;
The acquisition module is used for acquiring first interaction information input by a first user on the video interface;
the sending module is used for sending the first interaction information to a server;
the receiving module is used for receiving the video stream sent by the server;
The playing module is used for playing a live video on the video interface according to the video stream, wherein the live video includes an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
According to one or more embodiments of the present disclosure, the acquisition module is specifically configured to:
in response to a triggering operation on a text editing control in the video interface, display a text window for inputting text;
and acquire the text content edited in the text window, and determine the text content as the first interaction information.
According to one or more embodiments of the present disclosure, the acquisition module is specifically configured to:
in response to a triggering operation on a voice acquisition control in the video interface, acquire a voice of the first user;
and determine the voice of the first user as the first interaction information.
According to one or more embodiments of the present disclosure, the live video further includes images of the first user and the second user.
In a fourth aspect, one or more embodiments of the present disclosure provide another video processing apparatus, including a receiving module, a determining module, and a transmitting module, wherein:
The receiving module is used for receiving a plurality of target interaction information sent by a plurality of electronic devices, wherein the target interaction information comprises text information and/or voice information;
The receiving module is also used for receiving the anchor video sent by the anchor device;
the determining module is used for determining a video stream according to the anchor video and the plurality of target interaction information, wherein the video stream includes an encoded stream of the anchor video and an encoded stream of voices associated with the target interaction information;
the sending module is used for sending the video stream to the anchor device and the plurality of electronic devices.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
generate an audio stream according to the plurality of target interaction information, wherein the audio stream is obtained based on voices corresponding to the plurality of target interaction information;
and generate the video stream according to the anchor video and the audio stream.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
acquire a plurality of third voices associated with the plurality of target interaction information;
determine a target voice among the plurality of third voices, wherein the voice count of the target voice is greater than or equal to a first threshold;
and generate the audio stream according to the target voice.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
cluster the semantics of the plurality of third voices to obtain at least one class of third voices;
and acquire the voice count of each class of third voices, and determine a class of third voices whose voice count is greater than or equal to the first threshold as the target voice.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
determine the voice volume corresponding to each class of target voice according to the voice count of that class of target voice;
and generate the audio stream according to the target voices and the voice volumes corresponding to the target voices.
According to one or more embodiments of the present disclosure, an image of at least one user is added to the anchor video.
In a fifth aspect, the present disclosure provides an electronic device comprising: a processor and a memory;
The memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the video processing method according to the first aspect and the various possible designs of the first aspect.
In a sixth aspect, the present disclosure provides a server comprising: a processor and a memory;
The memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the video processing method according to the second aspect and the various possible designs of the second aspect.
In a seventh aspect, the present disclosure provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the video processing method according to the first aspect or the second aspect and their various possible designs.
In an eighth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the video processing method according to the first aspect and its various possible designs, or the video processing method according to the second aspect and its various possible designs.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Persons skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
Claims (15)
1. A video processing method, comprising:
displaying a video interface, wherein the video interface is used for playing live video;
acquiring first interaction information input by a first user on the video interface, and sending the first interaction information to a server;
and receiving a video stream sent by the server, and playing a live video on the video interface according to the video stream, wherein the live video comprises an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
2. The method of claim 1, wherein obtaining the first interaction information input by the first user at the video interface comprises:
in response to a triggering operation on a text editing control in the video interface, displaying a text window for inputting text;
and acquiring the text content edited in the text window, and determining the text content as the first interaction information.
3. The method of claim 1, wherein obtaining the first interaction information input by the first user at the video interface comprises:
in response to a triggering operation on a voice acquisition control in the video interface, acquiring a voice of the first user;
and determining the voice of the first user as the first interaction information.
4. A method according to any of claims 1-3, characterized in that the live video further comprises images of the first user and the second user.
5. A video processing method, comprising:
receiving a plurality of target interaction information sent by a plurality of electronic devices, wherein the target interaction information comprises text information and/or voice information;
receiving an anchor video sent by an anchor device;
determining a video stream according to the anchor video and the plurality of target interaction information, wherein the video stream comprises an encoded stream of the anchor video and an encoded stream of voices associated with the target interaction information;
and sending the video stream to the anchor device and the plurality of electronic devices.
6. The method of claim 5, wherein determining a video stream from the anchor video and the plurality of target interaction information comprises:
generating an audio stream according to the plurality of target interaction information, wherein the audio stream is obtained based on voices corresponding to the plurality of target interaction information;
and generating the video stream according to the anchor video and the audio stream.
7. The method of claim 6, wherein generating an audio stream from the plurality of target interaction information comprises:
acquiring a plurality of third voices associated with the plurality of target interaction information;
determining a target voice among the plurality of third voices, wherein the voice count of the target voice is greater than or equal to a first threshold;
and generating the audio stream according to the target voice.
8. The method of claim 7, wherein determining the target voice among the plurality of third voices comprises:
clustering the semantics of the plurality of third voices to obtain at least one class of third voices;
and acquiring the voice count of each class of third voices, and determining a class of third voices whose voice count is greater than or equal to the first threshold as the target voice.
9. The method according to claim 7 or 8, wherein generating the audio stream from the target speech comprises:
determining the voice volume corresponding to each class of target voice according to the voice count of that class of target voice;
and generating the audio stream according to the target voices and the voice volumes corresponding to the target voices.
10. The method according to any one of claims 5-9, further comprising:
adding an image of at least one user to the anchor video.
11. A video processing apparatus, characterized by comprising a display module, an acquisition module, a sending module, a receiving module, and a playing module, wherein:
The display module is used for displaying a video interface, and the video interface is used for playing live video;
The acquisition module is used for acquiring first interaction information input by a first user on the video interface;
the sending module is used for sending the first interaction information to a server;
the receiving module is used for receiving the video stream sent by the server;
The playing module is used for playing a live video on the video interface according to the video stream, wherein the live video comprises an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
12. A video processing apparatus, comprising a receiving module, a determining module, and a transmitting module, wherein:
The receiving module is used for receiving a plurality of target interaction information sent by a plurality of electronic devices, wherein the target interaction information comprises text information and/or voice information;
The receiving module is also used for receiving the anchor video sent by the anchor device;
the determining module is used for determining a video stream according to the anchor video and the plurality of target interaction information, wherein the video stream comprises an encoded stream of the anchor video and an encoded stream of voices associated with the target interaction information;
the sending module is used for sending the video stream to the anchor device and the plurality of electronic devices.
13. An electronic device, comprising: a processor and a memory;
The memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the video processing method of any one of claims 1 to 4.
14. A server, comprising: a processor and a memory;
The memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the video processing method of any one of claims 5 to 10.
15. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the video processing method of any one of claims 1 to 4 or the video processing method of any one of claims 5 to 10.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211644421.9A CN118233664A (en) | 2022-12-20 | 2022-12-20 | Video processing method and device and electronic equipment |
PCT/CN2023/137590 WO2024131576A1 (en) | 2022-12-20 | 2023-12-08 | Video processing method and apparatus, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211644421.9A CN118233664A (en) | 2022-12-20 | 2022-12-20 | Video processing method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118233664A true CN118233664A (en) | 2024-06-21 |
Family
ID=91507100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211644421.9A Pending CN118233664A (en) | 2022-12-20 | 2022-12-20 | Video processing method and device and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN118233664A (en) |
WO (1) | WO2024131576A1 (en) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107465959B * | 2017-07-14 | 2020-06-30 | Tencent Music Entertainment (Shenzhen) Co., Ltd. | Online interaction method, device and system |
US10924796B2 (en) * | 2018-04-10 | 2021-02-16 | Logitech Europe S.A. | System and methods for interactive filters in live streaming media |
CN112954378B (en) * | 2021-02-05 | 2023-03-28 | 广州方硅信息技术有限公司 | Method and device for playing voice barrage in live broadcast room, electronic equipment and medium |
CN113014935B (en) * | 2021-02-20 | 2023-05-09 | 北京达佳互联信息技术有限公司 | Interaction method and device of live broadcasting room, electronic equipment and storage medium |
CN113163221A (en) * | 2021-03-15 | 2021-07-23 | 北京城市网邻信息技术有限公司 | Interactive processing method and device, electronic equipment and storage medium |
CN113368489B (en) * | 2021-06-16 | 2023-12-29 | 广州博冠信息科技有限公司 | Live interaction method, system, device, electronic equipment and storage medium |
2022-12-20: CN application CN202211644421.9A filed (published as CN118233664A, status: pending)
2023-12-08: PCT application PCT/CN2023/137590 filed (published as WO2024131576A1)
Also Published As
Publication number | Publication date |
---|---|
WO2024131576A1 (en) | 2024-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109525853B (en) | Live broadcast room cover display method and device, terminal, server and readable medium | |
CN111163330A (en) | Live video rendering method, device, system, equipment and storage medium | |
CN108924661A (en) | Data interactive method, device, terminal and storage medium based on direct broadcasting room | |
CN112291502B (en) | Information interaction method, device and system and electronic equipment | |
CN110290398B (en) | Video issuing method and device, storage medium and electronic equipment | |
CN112337100B (en) | Live broadcast-based data processing method and device, electronic equipment and readable medium | |
CN112337101A (en) | Live broadcast-based data interaction method and device, electronic equipment and readable medium | |
CN111818383B (en) | Video data generation method, system, device, electronic equipment and storage medium | |
CN114095671A (en) | Cloud conference live broadcast system, method, device, equipment and medium | |
CN110149528B (en) | Process recording method, device, system, electronic equipment and storage medium | |
CN104135668B (en) | There is provided and obtain the method and device of digital information | |
US20240040191A1 (en) | Livestreaming audio processing method and device | |
CN114630157B (en) | Live broadcast start-up method, equipment and program product | |
CN113923530B (en) | Interactive information display method and device, electronic equipment and storage medium | |
CN118233664A (en) | Video processing method and device and electronic equipment | |
CN116339560A (en) | Information display method, information transmission device, information display apparatus, information display device, information transmission device, and storage medium | |
CN113542792B (en) | Audio merging method, audio uploading method, device and program product | |
CN112714331B (en) | Information prompting method and device, storage medium and electronic equipment | |
CN103596023A (en) | Acoustic wave communication method and advertisement playing method and system for broadcasting station or television station | |
CN115150631A (en) | Subtitle processing method, subtitle processing device, electronic equipment and storage medium | |
CN111770373B (en) | Content synchronization method, device and equipment based on live broadcast and storage medium | |
WO2023165580A1 (en) | Stream mixing method and device for co-streaming | |
CN115474065B (en) | Subtitle processing method and device, electronic equipment and storage medium | |
CN114501041B (en) | Special effect display method, device, equipment and storage medium | |
US11695488B2 (en) | ATSC over-the-air (OTA) broadcast of public volumetric augmented reality (AR) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||