WO2024131576A1 - Video processing method, apparatus and electronic device - Google Patents

Video processing method, apparatus and electronic device


Publication number
WO2024131576A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
voice
user
information
stream
Prior art date
Application number
PCT/CN2023/137590
Other languages
English (en)
French (fr)
Inventor
吴金泳
温大川
赖磊
李志威
曾芝健
唐立
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2024131576A1


Definitions

  • the embodiments of the present disclosure relate to the field of video processing technology, and in particular to a video processing method, device and electronic device.
  • the anchor can interact based on the text sent by the user. For example, when the user is watching the anchor's live video, the user can send text in real time, and the text can be displayed in the live video screen. The anchor can see the text in the live video and then interact with the user.
  • the anchor cannot hold the live broadcast device or watch the live video, and the viewing users can only hear the anchor's voice and cannot obtain the interactive information of multiple users in the live video.
  • the anchor also needs to obtain the interactive information of multiple users through other methods (such as user reminders at the live broadcast site, etc.), which leads to a high complexity of live interaction.
  • the present disclosure provides a video processing method, device and electronic device, which are used to solve the technical problem of high complexity of live broadcast interaction in the prior art.
  • the present disclosure provides a video processing method, the method comprising:
  • Video interface where the video interface is used to play live video
  • the present disclosure provides another video processing method, the method comprising:
  • Target interaction information includes text information and/or voice information
  • the video stream is sent to the anchor device and the multiple electronic devices.
  • the present disclosure provides a video processing device, the video processing device comprising a display module, an acquisition module, a sending module, a receiving module and a playing module, wherein:
  • the display module is used to display a video interface, and the video interface is used to play live video;
  • the acquisition module is used to acquire first interactive information input by a first user on the video interface
  • the sending module is used to send the first interactive information to the server
  • the receiving module is used to receive the video stream sent by the server
  • the playback module is used to play a live video on the video interface according to the video stream, wherein the live video includes a host video, a first voice corresponding to the first interactive information, and a second voice corresponding to second interactive information of at least one second user.
  • the present disclosure provides another video processing device, the video processing device comprising a receiving module, a determining module and a sending module, wherein:
  • the receiving module is used to receive multiple target interaction information sent by multiple electronic devices, and the target interaction information includes text information and/or voice information;
  • the receiving module is also used to receive the anchor video sent by the anchor device;
  • the determination module is used to determine a video stream according to the host video and the multiple target interaction information, where the video stream includes an encoded stream of the host video and an encoded stream of the voice associated with the target interaction information;
  • the sending module is used to send the video stream to the anchor device and the multiple electronic devices.
  • the present disclosure provides an electronic device, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the processor executes the video processing method as described in the first aspect and various possible aspects of the first aspect.
  • the present disclosure provides a server, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the processor executes the video processing method as described in the second aspect and various possible aspects of the second aspect.
  • the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored.
  • when a processor executes the computer-executable instructions, the video processing method as described in the first aspect and various possible aspects of the first aspect is implemented, or the video processing method as described in the second aspect and various possible aspects of the second aspect is implemented.
  • an embodiment of the present disclosure provides a computer program product, comprising a computer program, wherein when the computer program is executed by a processor, the computer program implements the video processing method as described in the first aspect and various possible aspects of the first aspect, or implements the video processing method as described in the second aspect and various possible aspects of the second aspect.
  • the present disclosure provides a video processing method, device and electronic device, the electronic device can obtain a video interface, the video interface is used to play a live video, obtain the first interactive information input by the first user on the video interface, send the first interactive information to the server, receive the video stream sent by the server, and play the live video on the video interface according to the video stream, wherein the live video includes the anchor video, the first voice corresponding to the first interactive information, and the second voice corresponding to the second interactive information of at least one second user.
  • when the anchor is broadcasting live, since the live video can include the voice corresponding to the interactive information of the viewing users, the anchor can hear the users' interactive information without watching the live video, and each user can also hear the voice interaction between multiple users and the anchor through the live video, thereby reducing the complexity of the live interaction.
  • FIG1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • FIG2 is a schematic diagram of a flow chart of a video processing method provided by an embodiment of the present disclosure
  • FIG3 is a schematic diagram of a process of displaying a video interface provided by an embodiment of the present disclosure
  • FIG4A is a schematic diagram of a process of obtaining first interactive information provided by an embodiment of the present disclosure
  • FIG4B is a schematic diagram of another process of obtaining first interactive information provided by an embodiment of the present disclosure.
  • FIG5 is a schematic diagram of another process of obtaining first interactive information provided by an embodiment of the present disclosure.
  • FIG6 is a schematic diagram of a scenario for sending first interactive information provided by an embodiment of the present disclosure.
  • FIG7 is a schematic diagram of another video processing method provided by an embodiment of the present disclosure.
  • FIG8 is a schematic diagram of a process for determining a target voice according to an embodiment of the present disclosure
  • FIG9 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present disclosure.
  • FIG10 is a schematic diagram of the structure of another video processing device provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present disclosure.
  • Electronic device: a device with a wireless transceiver function.
  • Electronic devices can be deployed on land, including indoors or outdoors, handheld, wearable or vehicle-mounted; they can also be deployed on the water (such as ships, etc.).
  • the electronic device can be a mobile phone, a tablet computer, a computer with wireless transceiver function, a virtual reality (VR) electronic device, an augmented reality (AR) electronic device, an industrial control (industrial control)
  • the wireless terminals in the embodiments of the present disclosure may also be referred to as wireless terminals in vehicle-mounted electronic devices, wireless terminals in self-driving, wireless electronic devices in remote medical, wireless electronic devices in smart grid, wireless electronic devices in transportation safety, wireless electronic devices in smart city, wireless electronic devices in smart home, wearable electronic devices, etc.
  • the electronic devices involved in the embodiments of the present disclosure may also be referred to as terminals, user equipment (UE), access electronic devices, vehicle-mounted terminals, industrial control terminals, UE units, UE stations, mobile stations, remote stations, remote electronic devices, mobile devices, UE electronic devices, wireless communication devices, UE agents or UE devices, etc.
  • the electronic devices may also be fixed or mobile.
  • the anchor can interact based on the text sent by the user. For example, when a user is watching the anchor's live video, the user can send the text "Hello" in real time, and the text can be displayed in the screen of the live video. After the anchor sees the text "Hello" in the live video, he can reply to the text by voice, thereby interacting with the users who watch the live broadcast.
  • the anchor cannot hold the live broadcast device or watch the live video, and the user can only hear the anchor's voice and cannot obtain the interactive information of multiple users in the live video.
  • the user watching the live video cannot feel the atmosphere of the live concert, and the anchor also needs other users on the scene to remind the interactive information in the live video to interact with the user, which leads to a high complexity of the live interaction.
  • the embodiment of the present disclosure provides a video processing method, in which an electronic device displays a video interface for playing a live video, obtains first interactive information input by a first user on the video interface, sends the first interactive information to a server, receives a video stream sent by the server, and obtains a host video stream and an audio stream in the video stream, wherein the audio stream is determined based on a first voice corresponding to the first interactive information and a second voice corresponding to the second interactive information, and a live video is obtained based on the host video stream and the audio stream.
  • the live video includes the host video, the first voice corresponding to the first interactive information, and the second voice corresponding to the second interactive information of at least one second user
  • the host can hear the user's interactive information without watching the live video, and the user can also obtain the voice interactive information between multiple users and the host through the live video, thereby reducing the complexity of the live interaction.
  • FIG1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • FIG1 includes electronic device 1, ..., electronic device N, an anchor device and a server.
  • electronic device 1, ..., electronic device N can send interactive information to the server
  • the anchor device can send the anchor video to the server.
  • the server generates a live video based on the anchor video and the voice corresponding to the interactive information, and sends the live video to electronic device 1, ..., electronic device N and the anchor device.
  • the audience can directly send text and cheers to the anchor, and the audience can hear the anchor's singing and background sound, and the background sound can include the cheers of multiple audiences.
  • the anchor can hear the audience chorus, and the audience can also hear the anchor's singing and the audience chorus, and the audience and the anchor can directly communicate by voice, so that the audience and the anchor can interact directly, reducing the complexity of live interaction.
  • FIG. 1 is only an illustrative example of an application scenario of an embodiment of the present disclosure, and is not intended to limit the application scenario of the embodiment of the present disclosure.
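The data flow of FIG1 can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the `Device` and `Server` classes, their method names, and the dictionary that stands in for the live video are all assumptions made for the example, and strings stand in for real audio and video data.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A viewer device or the anchor device; it only records what it receives."""
    name: str
    received: list = field(default_factory=list)

    def receive(self, live_video: dict) -> None:
        self.received.append(live_video)


@dataclass
class Server:
    """Collects interactive information and the anchor video, then fans out the result."""
    viewers: list
    anchor: Device
    pending: list = field(default_factory=list)

    def on_interaction(self, sender: str, info: str) -> None:
        # Interactive information sent by electronic device 1..N.
        self.pending.append((sender, info))

    def on_anchor_video(self, frame: str) -> dict:
        # Combine the anchor video with the voices for the pending interactions.
        live_video = {
            "anchor_video": frame,
            "voices": [info for _, info in self.pending],
        }
        self.pending = []
        # The live video goes to every viewer device and back to the anchor device.
        for device in [*self.viewers, self.anchor]:
            device.receive(live_video)
        return live_video
```

Sending the result back to the anchor device as well as to the viewers is what lets the anchor hear the audience without looking at the screen.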
  • FIG2 is a flow chart of a video processing method provided by an embodiment of the present disclosure. Referring to FIG2 , the method may include:
  • the execution subject of the embodiment of the present disclosure may be an electronic device, or a video processing device arranged in the electronic device.
  • the video processing device may be implemented by software, or the video processing device may be implemented by a combination of software and hardware.
  • the video interface is used to play live video.
  • the video interface can display live video including the host video.
  • the electronic device can display the video interface in response to the user's triggering operation on the live broadcast application. For example, when the user clicks on the live broadcast application, the electronic device can display a live broadcast page, which can include live broadcast links of multiple hosts, and when the user clicks on any live broadcast link, the electronic device can display a video interface, which can play live video including the host video of the host.
  • FIG3 is a schematic diagram of a process of displaying a video interface provided by an embodiment of the present disclosure.
  • the display page of the electronic device includes a control of a live broadcast application.
  • the electronic device can display a page corresponding to the live broadcast application, which includes a live broadcast video control of anchor 1, a live broadcast video control of anchor 2, and a live broadcast video control of anchor 3.
  • the electronic device can display a video interface and play the live broadcast video corresponding to anchor 2 in the video interface.
  • S202 Obtain first interactive information input by a first user on a video interface.
  • the first interactive information may include text information and/or voice information. For example, if a user inputs the text "Hello" on the video interface, the electronic device may determine the text as the first interactive information, and if a user inputs the voice "Come on" on the video interface, the electronic device may determine the voice as the first interactive information.
  • the electronic device may obtain the first interactive information input by the first user on the video interface based on the following two feasible implementation methods:
  • One possible implementation method:
  • a text window for inputting text is displayed.
  • the video interface may include a text editing control, and when a user clicks on the text editing control, the video interface may display a text window for inputting text.
  • the text editing control may be a text input bar, and when a user clicks on the text input bar, the user may enter relevant text in the text input bar through the electronic device.
  • the text content edited in the text window is obtained, and the text content is determined as the first interactive information. For example, when the user clicks the text editing control in the video interface, the video interface may pop up a text window. If the user enters the text "Come on" in the text window and clicks the confirmation control, the electronic device may determine the text "Come on" as the first interactive information. As another example, when the user enters the text "Come on" in the text input bar and clicks the send control, the electronic device may determine the text "Come on" as the first interactive information.
  • FIG4A is a schematic diagram of a process of obtaining first interactive information provided by an embodiment of the present disclosure.
  • an electronic device is included.
  • the display page of the electronic device is a video interface
  • the video played in the video interface is a live video.
  • the video interface includes a text editing control.
  • a text window pops up on the video interface.
  • the user enters the text "Come on" in the text input bar of the text window and clicks the send control.
  • the electronic device determines that the first interactive information obtained is the text "Come on".
  • FIG4B is another schematic diagram of a process for obtaining first interactive information provided by an embodiment of the present disclosure.
  • the display page of the electronic device is a video interface
  • the video played in the video interface is a live video.
  • a text input bar may pop up at the bottom of the video interface; the user enters the text "Come on" in the text input bar and clicks the send control, and the electronic device determines that the first interactive information obtained is the text "Come on".
  • the voice of the first user is acquired, and the voice of the first user is determined as the first interactive information.
  • the video interface may include a voice collection control.
  • the electronic device may collect the voice of the first user in real time and determine the voice as the first interactive information. For example, if the voice of the first user collected by the electronic device is the voice "Come on", the electronic device may determine the voice "Come on" as the first interactive information.
  • FIG5 is another schematic diagram of a process for obtaining first interactive information provided by an embodiment of the present disclosure.
  • the display page of the electronic device is a video interface
  • the video played in the video interface is a live video.
  • the video interface includes a voice acquisition control.
  • the electronic device can collect surrounding voices. If the voice input by the user to the electronic device is the voice "Come on", then when the user stops touching the voice acquisition control, the electronic device determines that the first interactive information obtained is the voice "Come on".
  • S203 Send first interactive information to the server.
  • the electronic device may send the first interactive information to the server.
  • the first interactive information obtained by the electronic device may be text information and voice information, and the electronic device may send the obtained text information and voice information to the server.
  • the electronic device may send the text information to the server, or convert the text information into voice information and then send the voice information to the server. This embodiment of the present disclosure is not limited to this.
  • FIG6 is a schematic diagram of a scenario for sending first interactive information provided by an embodiment of the present disclosure.
  • viewer 1 can send text A to the server through the electronic device used (not shown in FIG6)
  • viewer 2 can send text B to the server through the electronic device used
  • viewer 3 can send voice to the server through the electronic device used.
  • the server can receive the interactive information sent by the three viewers, and then generate a live video stream according to the interactive information sent by the three viewers and the acquired anchor video.
  • the text of viewer 1 and viewer 2 can be converted into voice in the electronic device, and the voice corresponding to the text can then be sent to the server.
  • the text can be sent to a server and the server converts the text into speech. This embodiment of the present disclosure is not limited to this.
  • S204 Receive the video stream sent by the server.
  • the electronic device may receive a video stream sent by the server.
  • S205 Play the live video on the video interface according to the video stream.
  • the live video includes the host video, the first voice corresponding to the first interactive information, and the second voice corresponding to the second interactive information of at least one second user.
  • the second user may be a user other than the first user.
  • the server may receive interactive information sent by multiple users, and therefore, the live video may include voices corresponding to the interactive information of multiple users.
  • the second interactive information may be interactive information input by the second user on the video interface.
  • the second interactive information may include text information and voice information.
  • the method by which the second user inputs the second interactive information to the video interface is the same as the method by which the first user inputs the first interactive information on the video interface, and the embodiments of the present disclosure will not be described in detail herein.
  • the first voice may be a voice corresponding to the first interactive information.
  • when the first interactive information is text information, the first voice may be a voice corresponding to the text information, and when the first interactive information is voice information, the electronic device may determine the voice information as the first voice. For example, if the first interactive information is the text "Come on", the first voice may be the voice "Come on". If the first interactive information is the voice "Come on", the first voice may be the voice "Come on".
  • the second voice may be a voice corresponding to the second interactive information.
  • when the second interactive information is text information, the second voice may be the voice corresponding to the text information
  • when the second interactive information is voice information, the electronic device may determine the voice information as the second voice. For example, if the second interactive information is the text "Hello", the second voice may be the voice "Hello", and if the second interactive information is the voice "Hello", the second voice may be the voice "Hello".
  • the video stream may include a host video stream and an audio stream
  • the electronic device may play the live video on the video interface according to the following feasible implementation method: obtain the host video stream and the audio stream in the video stream.
  • the host video stream can be obtained based on the host video.
  • the server encodes the host video and can obtain the host video stream corresponding to the host video.
  • the audio stream can be determined based on the first voice and the second voice. For example, when the server obtains the first voice and the second voice, it can generate a synthetic voice based on the first voice and the second voice, and encode the synthetic voice to obtain an audio stream.
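As a rough illustration of generating a synthetic voice from the first voice and the second voice, the sketch below mixes two sequences of 16-bit PCM samples by summing them and clipping the result. The source does not say how the voices are combined, so the function name `mix_voices`, the sample format and the sum-and-clip approach are assumptions; the encoding step that turns the mixed voice into an audio stream is left out.

```python
def mix_voices(first: list[int], second: list[int]) -> list[int]:
    """Mix two 16-bit PCM sample sequences by summing and clipping."""
    length = max(len(first), len(second))
    mixed = []
    for i in range(length):
        # Treat the shorter voice as silence once it has ended.
        a = first[i] if i < len(first) else 0
        b = second[i] if i < len(second) else 0
        # Clip to the signed 16-bit range to avoid overflow.
        mixed.append(max(-32768, min(32767, a + b)))
    return mixed
```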
  • a live video is obtained.
  • when the electronic device receives the video stream sent by the server, it can decode the host video stream and the audio stream in the video stream to obtain a live video, which may include the host video, the first voice and the second voice.
  • the live video may also include images of the first user and the second user.
  • the electronic device may obtain the image of the first user, and when the electronic device sends the first interactive information to the server, the electronic device may also send the image of the first user to the server, and the server may add the image of the first user to the host video, so that the live video also includes the image of the first user.
  • the electronic device may obtain the image of the second user, and when the electronic device sends the second interactive information to the server, the electronic device may also send the image of the second user to the server, and the server may add the image of the second user to the host video, so that the live video also includes the image of the second user.
  • the live video is a live video of a concert
  • the images of the first user and the second user may be included in the audience area of the live video, thereby improving the effect of the live video.
  • the embodiment of the present disclosure provides a video processing method, in which an electronic device displays a video interface for playing a live video, displays a text window for inputting text in response to a trigger operation on a text editing control in the video interface, obtains the text content edited in the text window, and determines the text content as first interactive information, or, in response to a trigger operation on a voice acquisition control in the video interface, obtains the voice of the first user and determines it as the first interactive information.
  • the electronic device can send the first interactive information to the server, receive the video stream sent by the server, and play the live video on the video interface according to the video stream.
  • the live video includes the host video, the first voice corresponding to the first interactive information, and the second voice corresponding to the second interactive information of at least one second user
  • the host can hear the user's interactive information without watching the live video, and the user can also obtain the voice interaction information of multiple users and the host through the live video, thereby reducing the complexity of the live interaction.
  • FIG7 is a schematic diagram of another video processing method provided by an embodiment of the present disclosure. Referring to FIG7 , the method flow includes:
  • S701 receiving multiple target interaction information sent by an electronic device.
  • the execution subject of the embodiment of the present disclosure may be a server, or a video processing device disposed in the server.
  • the video processing device may be implemented by software, or by a combination of software and hardware, which is not limited in the embodiment of the present disclosure.
  • the target interaction information may include text information and/or voice information.
  • the target interaction information may be text information input by the user watching the live broadcast, the target interaction information may also be voice information input by the user watching the live broadcast, or the target interaction information may also be text information and voice information input by the user watching the live broadcast.
  • the target interaction information received by the server is the text "Come on”
  • the target interaction information received by the server is the voice "Come on”.
  • the electronic device may be an electronic device of the first user or the second user.
  • the electronic device may be an electronic device of a user watching a live video.
  • the user may input text information and voice information into the electronic device, and the server may receive the text information and voice information sent by the electronic device.
  • S702 Receive a host video sent by a host device.
  • the anchor video may be a video including the anchor
  • the anchor device may be a device that shoots the anchor video.
  • the anchor video in the live broadcast video is provided by the anchor device.
  • after the anchor device shoots the video including the anchor, it can send the video including the anchor to the server.
  • the server can send the live video to the electronic devices of the users who watch the live video, so that the users can watch the live video.
  • S703 Determine a video stream according to the host video and multiple target interaction information.
  • the video stream includes an encoded stream of the host video and an encoded stream of the voice associated with the target interactive information.
  • the server can convert the text information into voice information.
  • the target interactive information is the text "Come on”
  • the server can convert the text "Come on" into the voice "Come on".
  • the server when the server receives text information sent by an electronic device, the server can convert the text information into voice information. If the server can determine the user's timbre through the voice sent by the electronic device, then when the server converts the text information into voice information, the timbre associated with the voice information can be the user's timbre.
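The timbre rule above can be sketched as follows. This is only an illustration under stated assumptions: `synthesize` is a hypothetical stand-in for a text-to-speech engine, `known_timbres` is an assumed mapping from user to a timbre previously determined from that user's voice, and the fallback to a default timbre is not described in the source.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    user_id: str
    content: str  # stands in for the audio data
    timbre: str


def synthesize(text: str, timbre: str, user_id: str) -> Voice:
    """Hypothetical stand-in for a text-to-speech engine."""
    return Voice(user_id=user_id, content=text, timbre=timbre)


def text_to_voice(user_id: str, text: str, known_timbres: dict[str, str],
                  default_timbre: str = "default") -> Voice:
    """Convert text to voice, using the user's own timbre when it is known."""
    # The default timbre for unknown users is an assumption of this sketch.
    timbre = known_timbres.get(user_id, default_timbre)
    return synthesize(text, timbre, user_id)
```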
  • the server may generate a video stream according to the following feasible implementation methods: generate an audio stream according to multiple target interaction information, and generate a video stream according to the host video and the audio stream.
  • the server encodes the host video to obtain the host video stream, and merges the host video stream and the audio stream to obtain the video stream.
  • the audio stream is obtained based on the voices corresponding to the multiple target interaction information.
  • the audio stream may be an encoded stream associated with the voices corresponding to the multiple target interaction information.
  • the server may generate an audio stream according to the following feasible implementation method: obtain multiple third voices associated with multiple target interaction information, determine the target voice in the third voices, and generate an audio stream according to the target voice.
  • the third voice may be the voice corresponding to the target interaction information. For example, if the target interaction information is the text "Come on", the third voice may be the voice "Come on", and if the target interaction information is the voice "Hello", the third voice may be the voice "Hello".
  • the number of target voices is greater than or equal to the first threshold.
  • each voice includes corresponding semantic information, so the server can classify the third voice into multiple types of voices according to the semantic information, and then determine the target voice according to the number of each type of voice.
  • the server determines the target voice among the third voices as follows: it clusters the plurality of third voices by their semantics to obtain at least one category of third voices, obtains the number of voices in each category, and determines each category of third voices whose number of voices is greater than or equal to the first threshold as the target voice.
  • the number of third voices acquired by the server is 100
  • the first threshold is 20. If the server performs clustering processing on the semantics of the 100 voices and obtains 30 voices with semantic A, 60 voices with semantic B, and 10 voices with semantic C, the server can determine the 30 voices corresponding to semantic A and the 60 voices corresponding to semantic B as target voices.
  • FIG8 is a schematic diagram of a process for determining a target voice provided by an embodiment of the present disclosure.
  • the server (not shown in FIG8 ) can perform semantic clustering processing on the 100 third voices according to the semantics of each third voice, and obtain two categories of third voices.
  • the semantics of one category of third voices is "Come on”, and the semantics of the other category of third voices is "Nice to hear”.
  • the number of third voices with the semantics of "Come on” is 30, and the number of third voices with the semantics of "Nice to hear” is 70. If the first threshold is 50, the server determines that the semantics of the target voice is "Nice to hear", that is, the 70 third voices with the semantics of "Nice to hear” are determined as the target voice.
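The threshold-based selection described above can be sketched as follows. This is a minimal illustration, assuming the semantic clustering step has already assigned one semantic label to each third voice; the function name `select_target_voices` is hypothetical and does not come from the disclosure.

```python
from collections import Counter


def select_target_voices(semantic_labels, first_threshold):
    """Return {semantic: count} for every category whose count meets the threshold.

    semantic_labels: one label per third voice, assumed to come from an
    upstream semantic clustering step that is not shown here.
    """
    counts = Counter(semantic_labels)
    return {
        semantic: count
        for semantic, count in counts.items()
        if count >= first_threshold
    }


# The FIG8 scenario: 30 voices meaning "Come on", 70 meaning "Nice to hear".
labels = ["Come on"] * 30 + ["Nice to hear"] * 70
targets = select_target_voices(labels, first_threshold=50)
```

With a first threshold of 50, only the "Nice to hear" category is kept; lowering the threshold to 20 would keep both categories.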
  • the server generates an audio stream according to the target voice, specifically: according to the number of voices of each type of target voice, the voice volume corresponding to each type of target voice is determined.
  • the server can obtain a first preset relationship, and determine the voice volume corresponding to the target voice according to the first preset relationship and the number of voices of the target voice.
  • the first preset relationship may include at least one voice number and the voice volume corresponding to each voice number.
  • the first preset relationship can be shown in Table 1:
  • Table 1 is only an example to illustrate the first preset relationship, and is not a limitation to the first preset relationship.
  • if the server determines that the number of target voices is number 1, the server determines that the voice volume corresponding to the target voice is volume a; if the server determines that the number of target voices is number 2, the server determines that the voice volume corresponding to the target voice is volume b; if the server determines that the number of target voices is number 3, the server determines that the voice volume corresponding to the target voice is volume c.
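A lookup against the first preset relationship might look like the sketch below. The contents of Table 1 are not reproduced here, so the entries in `FIRST_PRESET_RELATIONSHIP` are invented placeholder values, and treating each voice number as the lower bound of a range is an assumption made only for illustration.

```python
from bisect import bisect_right

# Hypothetical stand-in for Table 1: (minimum voice count, volume on a 0.0-1.0 scale).
# Entries must be sorted by voice count.
FIRST_PRESET_RELATIONSHIP = [(1, 0.2), (50, 0.5), (100, 0.8)]


def volume_for_count(voice_count, relationship=FIRST_PRESET_RELATIONSHIP):
    """Return the volume of the last entry whose voice count is <= voice_count."""
    lower_bounds = [lower for lower, _ in relationship]
    index = bisect_right(lower_bounds, voice_count) - 1
    if index < 0:
        raise ValueError("voice count is below the smallest entry in the relationship")
    return relationship[index][1]
```

Under these placeholder values, a target voice backed by 70 voices would be played at volume 0.5, so a larger category sounds louder than a smaller one.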
  • the device can generate a synthesized speech according to the target speech, wherein the semantics of the synthesized speech is the same as that of the target speech, and the synthesized speech can have the effect of multiple users speaking the target speech.
  • the server sets the volume of the synthesized speech to the speech volume corresponding to the target speech, and encodes the synthesized speech to obtain an audio stream.
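One simple way to approximate the "multiple users speaking" effect is to average sample-aligned waveforms and then scale the result by the chosen volume. The disclosure does not state how the synthesized speech is produced, so the averaging approach and the function name `mix_voices` below are assumptions, and the encoding step is omitted.

```python
def mix_voices(voices, volume):
    """Average several mono waveforms and scale the result by `volume`.

    voices: list of sample lists (floats in [-1.0, 1.0], same sample rate).
    Shorter waveforms are treated as silent once they end.
    """
    if not voices:
        return []
    length = max(len(voice) for voice in voices)
    mixed = []
    for i in range(length):
        total = sum(voice[i] for voice in voices if i < len(voice))
        sample = total / len(voices) * volume
        # Clip so the output stays within the valid sample range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

The resulting sample list would then be handed to an audio encoder to obtain the audio stream.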
  • the server may also receive an image of a user sent by the electronic device, and the server may also add at least one user's image to the host video.
  • the server may add at least one user's image to the audience area of the acquired host video, thereby improving the effect of the live video.
  • S704: Send the video stream to the anchor device and the electronic device.
  • the server can send the video stream to the anchor device and the electronic devices of multiple users watching the live broadcast.
  • the anchor can obtain the user's interactive information through the audio of the live video and reply to the interactive information.
  • Users watching the live broadcast can also feel the effect of the live broadcast through the live video (such as the effect of multiple audiences singing together, the effect of asking questions to the anchor on the spot, etc.), thereby improving the effect of the live broadcast interaction.
  • the disclosed embodiment provides a video processing method, which receives multiple target interaction information sent by multiple electronic devices, receives a host video sent by a host device, determines a video stream according to the host video and multiple target interaction information, and sends the video stream to the host device and multiple electronic devices.
  • the host can hear the user's interaction information without watching the live video, and the more interaction information there is, the better the chorus effect is, which improves the effect of the live broadcast.
  • the user can also obtain the voice interaction information of multiple users and the host through the live video, thereby reducing the complexity of the live broadcast interaction.
  • FIG9 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present disclosure.
  • the video processing device 10 includes a display module 11, an acquisition module 12, a sending module 13, a receiving module 14 and a playing module 15, wherein:
  • the display module 11 is used to display a video interface, and the video interface is used to play live video;
  • the acquisition module 12 is used to acquire first interactive information input by the first user on the video interface
  • the sending module 13 is used to send the first interactive information to the server;
  • the receiving module 14 is used to receive the video stream sent by the server;
  • the playing module 15 is used to play the live video on the video interface according to the video stream.
  • the live video includes a host video, a first voice corresponding to the first interactive information, and a second voice corresponding to the second interactive information of at least one second user.
  • the acquisition module 12 is specifically used for:
  • the text content edited in the text window is obtained, and the text content is determined as the first interactive information.
  • the acquisition module 12 is specifically used to: acquire the voice of the first user in response to a triggering operation on a voice acquisition control in the video interface;
  • the voice of the first user is determined as the first interactive information.
  • the live video also includes images of the first user and the second user.
  • the video processing device provided in the embodiment of the present disclosure may be used to execute the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
  • FIG10 is a schematic diagram of the structure of another video processing device provided by an embodiment of the present disclosure.
  • the video processing device 20 includes a receiving module 21, a determining module 22 and a sending module 23, wherein:
  • the receiving module 21 is used to receive multiple target interaction information sent by multiple electronic devices, and the target interaction information includes text information and/or voice information;
  • the receiving module 21 is also used to receive the anchor video sent by the anchor device;
  • the determination module 22 is used to determine a video stream according to the host video and the multiple target interaction information, where the video stream includes an encoded stream of the host video and an encoded stream of the voice associated with the target interaction information;
  • the sending module 23 is used to send the video stream to the anchor device and the multiple electronic devices.
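The division of work among the receiving, determining and sending modules can be illustrated with the toy sketch below. All class and method names are hypothetical, and both speech synthesis and encoding are replaced by trivial placeholders, so this shows only the data flow, not a real implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VideoStream:
    host_video_stream: bytes  # stands in for the encoded host video
    audio_stream: bytes       # stands in for the encoded interaction voice


@dataclass
class ServerVideoProcessor:
    interactions: list = field(default_factory=list)
    host_video: bytes = b""

    # Receiving module: collect interaction information and the host video.
    def receive_interaction(self, info: str) -> None:
        self.interactions.append(info)

    def receive_host_video(self, video: bytes) -> None:
        self.host_video = video

    # Determining module: placeholder "encoding" that joins the interaction text.
    def determine_video_stream(self) -> VideoStream:
        audio = "|".join(self.interactions).encode("utf-8")
        return VideoStream(self.host_video, audio)

    # Sending module: the same stream goes to the anchor device and every viewer.
    def send(self, stream: VideoStream, recipients: list) -> dict:
        return {recipient: stream for recipient in recipients}
```

A caller would feed in interaction information and host video as they arrive, build the stream, and pass it to `send` with the anchor device and all viewer devices as recipients.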
  • the determination module 22 is specifically configured to:
  • the video stream is generated according to the host video and the audio stream.
  • the determination module 22 is specifically configured to:
  • the audio stream is generated according to the target speech.
  • the determination module 22 is specifically configured to:
  • the number of voices of each type of third voice is obtained, and a type of third voice whose number of voices is greater than or equal to a first threshold is determined as the target voice.
  • the determination module 22 is specifically configured to:
  • the audio stream is generated according to the target voice and the voice volume corresponding to the target voice.
  • an image of at least one user is added to the host video.
  • the video processing device provided in the embodiment of the present disclosure may be used to execute the technical solution of the above method embodiment, and its implementation principle and technical effect are similar, which will not be described in detail in this embodiment.
  • FIG11 is a schematic diagram of the structure of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 1100 may be a terminal device or a server.
  • the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (Portable Media Players, PMPs), vehicle-mounted terminals (such as vehicle-mounted navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG11 is only an example and should not bring any limitations to the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 1100 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 1101, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage device 1108 to a random access memory (RAM) 1103.
  • Various programs and data required for the operation of the electronic device 1100 are also stored in the RAM 1103.
  • the processing device 1101, the ROM 1102, and the RAM 1103 are connected to each other via a bus 1104.
  • An input/output (I/O) interface 1105 is also connected to the bus 1104 .
  • the following devices may be connected to the I/O interface 1105: input devices 1106 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 1107 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 1108 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 1109.
  • the communication device 1109 may allow the electronic device 1100 to communicate wirelessly or wired with other devices to exchange data.
  • although FIG. 11 shows an electronic device 1100 having various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 1109, or installed from the storage device 1108, or installed from the ROM 1102.
  • when the computer program is executed by the processing device 1101, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the embodiment of the present disclosure also includes a server, which may include a processor and a memory, the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, so that the processor executes the video processing method described in any one of the above embodiments.
  • the embodiment of the present disclosure also includes a computer-readable storage medium, in which computer-executable instructions are stored.
  • when a processor executes the computer-executable instructions, the video processing method as described in any of the above embodiments is implemented.
  • the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program, which may be used by or in combination with an instruction execution system, apparatus or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries a computer-readable program code. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus or device.
  • the program code contained on the computer-readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • the computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the electronic device, the electronic device executes the method shown in the above embodiment.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
  • each box in the flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • in some alternative implementations, the functions marked in the boxes may also occur in a different order than that noted in the accompanying drawings. For example, two boxes shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the opposite order, depending on the functions involved.
  • each block in the block diagram and/or flow chart, and combinations of blocks in the block diagram and/or flow chart may be implemented with a dedicated hardware-based system that performs the specified functions or operations, or may be implemented with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
  • the name of a unit does not limit the unit itself in some cases.
  • for example, the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a prompt message is sent to the user to clearly prompt the user that the operation requested to be performed will require obtaining and using the user's personal information.
  • the user can autonomously choose whether to provide personal information to software or hardware such as an electronic device, application, server, or storage medium that performs the operation of the technical solution of the present disclosure according to the prompt message.
  • in response to receiving an active request from the user, the prompt information may be sent to the user in the form of a pop-up window, in which the prompt information may be presented in text form.
  • the pop-up window may also carry a selection control for the user to choose "agree” or “disagree” to provide personal information to the electronic device.
  • the data involved in this technical solution shall comply with the requirements of the relevant laws and regulations.
  • the data may include information, parameters and messages, such as flow switching indication information.
  • one or more embodiments of the present disclosure provide a video processing method, the method comprising:
  • displaying a video interface, where the video interface is used to play live video;
  • obtaining first interactive information input by a first user on a video interface includes:
  • the text content edited in the text window is obtained, and the text content is determined as the first interactive information.
  • obtaining first interactive information input by a first user on a video interface includes:
  • the voice of the first user is determined as the first interactive information.
  • the live video also includes images of the first user and the second user.
  • one or more embodiments of the present disclosure provide another video processing method, the method comprising:
  • receiving multiple target interaction information sent by multiple electronic devices, where the target interaction information includes text information and/or voice information;
  • the video stream is sent to the anchor device and the multiple electronic devices.
  • determining a video stream according to the host video and the plurality of target interaction information includes:
  • the video stream is generated according to the host video and the audio stream.
  • generating an audio stream according to the multiple target interaction information includes:
  • the audio stream is generated according to the target speech.
  • determining a target voice in the third voice includes:
  • the number of voices of each type of third voice is obtained, and a type of third voice whose number of voices is greater than or equal to a first threshold is determined as the target voice.
  • generating the audio stream according to the target speech includes:
  • the audio stream is generated according to the target voice and the voice volume corresponding to the target voice.
  • the method further includes:
  • An image of at least one user is added to the anchor video.
  • one or more embodiments of the present disclosure provide a video processing device, the video processing device comprising a display module, an acquisition module, a sending module, a receiving module and a playing module, wherein:
  • the display module is used to display a video interface, and the video interface is used to play live video;
  • the acquisition module is used to acquire the first interactive information input by the first user on the video interface;
  • the sending module is used to send the first interactive information to the server
  • the receiving module is used to receive the video stream sent by the server
  • the playing module is used to play a live video on the video interface according to the video stream, wherein the live video includes a host video, a first voice corresponding to the first interactive information, and a second voice corresponding to second interactive information of at least one second user.
  • the acquisition module is specifically used to:
  • the text content edited in the text window is obtained, and the text content is determined as the first interactive information.
  • the acquisition module is specifically used to:
  • the voice of the first user is determined as the first interactive information.
  • the live video also includes images of the first user and the second user.
  • one or more embodiments of the present disclosure provide another video processing device, the video processing device comprising a receiving module, a determining module and a sending module, wherein:
  • the receiving module is used to receive multiple target interaction information sent by multiple electronic devices, and the target interaction information includes text information and/or voice information;
  • the receiving module is also used to receive the anchor video sent by the anchor device;
  • the determination module is used to determine a video stream according to the host video and the multiple target interaction information, where the video stream includes an encoded stream of the host video and an encoded stream of the voice associated with the target interaction information;
  • the sending module is used to send the video stream to the anchor device and the multiple electronic devices.
  • the determining module is specifically used to:
  • the video stream is generated according to the host video and the audio stream.
  • the determining module is specifically used to:
  • the audio stream is generated according to the target speech.
  • the determining module is specifically used to:
  • the number of voices of each type of third voice is obtained, and a type of third voice whose number of voices is greater than or equal to a first threshold is determined as the target voice.
  • the determining module is specifically used to:
  • the audio stream is generated according to the target voice and the voice volume corresponding to the target voice.
  • an image of at least one user is added to the host video.
  • the present disclosure provides an electronic device, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the video processing method as described in the first aspect and various possible aspects of the first aspect.
  • the present disclosure provides a server, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the video processing method as described in the second aspect and various possible aspects of the second aspect.
  • the present disclosure provides a computer-readable storage medium, in which computer execution instructions are stored.
  • a processor executes the computer execution instructions, the video processing method as described in the first aspect and various possible aspects of the first aspect is implemented, or the video processing method as described in the second aspect and various possible aspects of the second aspect is implemented.
  • an embodiment of the present disclosure provides a computer program product, including a computer program; when the computer program is executed by a processor, the video processing method as described in the first aspect and various possible aspects of the first aspect is implemented, or the video processing method as described in the second aspect and various possible aspects of the second aspect is implemented.

Abstract

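The present disclosure provides a video processing method, apparatus and electronic device. The method includes: displaying a video interface (S201), where the video interface is used to play live video; acquiring first interactive information input by a first user on the video interface (S202), and sending the first interactive information to a server (S203); and receiving a video stream sent by the server (S204), and playing live video on the video interface according to the video stream (S205), where the live video includes an anchor video, a first voice corresponding to the first interactive information, and a second voice corresponding to second interactive information of at least one second user. This reduces the complexity of live broadcast interaction.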

Description

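Video processing method, apparatus and electronic device
This application claims priority to Chinese invention patent application No. 202211644421.9, filed on December 20, 2022 and entitled "Video processing method, apparatus and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of video processing technology, and in particular to a video processing method, apparatus and electronic device.
Background
When an anchor is live streaming, users watching the live broadcast can interact with the anchor, which improves the effect of the live broadcast.
Currently, the anchor can interact based on text sent by users. For example, while watching the anchor's live video, a user can send text in real time; the text is displayed in the live video picture, and the anchor can see the text in the live video and then interact with the user. However, in some scenarios (such as outdoor live broadcasts and concert live broadcasts), the anchor cannot hold the live broadcast device or watch the live video, while the viewing users can only hear the anchor's voice and cannot obtain the interactive information of multiple users from the live video. The anchor also has to obtain the interactive information of multiple users in other ways (for example, reminders from users at the live broadcast site), which makes live broadcast interaction highly complex.
Summary
The present disclosure provides a video processing method, apparatus and electronic device, which are used to solve the technical problem in the prior art that live broadcast interaction is highly complex.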
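In a first aspect, the present disclosure provides a video processing method, the method including:
displaying a video interface, where the video interface is used to play live video;
acquiring first interactive information input by a first user on the video interface, and sending the first interactive information to a server;
receiving a video stream sent by the server, and playing live video on the video interface according to the video stream, where the live video includes an anchor video, a first voice corresponding to the first interactive information, and a second voice corresponding to second interactive information of at least one second user.
In a second aspect, the present disclosure provides another video processing method, the method including:
receiving multiple pieces of target interaction information sent by multiple electronic devices, where the target interaction information includes text information and/or voice information;
receiving an anchor video sent by an anchor device;
determining a video stream according to the anchor video and the multiple pieces of target interaction information, where the video stream includes an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information;
sending the video stream to the anchor device and the multiple electronic devices.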
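In a third aspect, the present disclosure provides a video processing apparatus, including a display module, an acquisition module, a sending module, a receiving module and a playing module, where:
the display module is configured to display a video interface, where the video interface is used to play live video;
the acquisition module is configured to acquire first interactive information input by a first user on the video interface;
the sending module is configured to send the first interactive information to a server;
the receiving module is configured to receive a video stream sent by the server;
the playing module is configured to play live video on the video interface according to the video stream, where the live video includes an anchor video, a first voice corresponding to the first interactive information, and a second voice corresponding to second interactive information of at least one second user.
In a fourth aspect, the present disclosure provides another video processing apparatus, including a receiving module, a determining module and a sending module, where:
the receiving module is configured to receive multiple pieces of target interaction information sent by multiple electronic devices, where the target interaction information includes text information and/or voice information;
the receiving module is further configured to receive an anchor video sent by an anchor device;
the determining module is configured to determine a video stream according to the anchor video and the multiple pieces of target interaction information, where the video stream includes an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information;
the sending module is configured to send the video stream to the anchor device and the multiple electronic devices.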
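In a fifth aspect, the present disclosure provides an electronic device, including a processor and a memory;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method according to the first aspect and its various possible implementations.
In a sixth aspect, the present disclosure provides a server, including a processor and a memory;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method according to the second aspect and its various possible implementations.
In a seventh aspect, the present disclosure provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the video processing method according to the first aspect and its various possible implementations is implemented, or the video processing method according to the second aspect and its various possible implementations is implemented.
In an eighth aspect, an embodiment of the present disclosure provides a computer program product including a computer program; when the computer program is executed by a processor, the video processing method according to the first aspect and its various possible implementations is implemented, or the video processing method according to the second aspect and its various possible implementations is implemented.
The present disclosure provides a video processing method, apparatus and electronic device. The electronic device can display a video interface used to play live video, acquire first interactive information input by a first user on the video interface, send the first interactive information to a server, receive a video stream sent by the server, and play live video on the video interface according to the video stream, where the live video includes an anchor video, a first voice corresponding to the first interactive information, and a second voice corresponding to second interactive information of at least one second user. In this method, because the live video can include the voice corresponding to the interactive information of the viewing users, the anchor can hear the users' interactive information without watching the live video, and users can also obtain, through the live video, the voice interaction between multiple users and the anchor, which reduces the complexity of live broadcast interaction.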
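Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a process of displaying a video interface provided by an embodiment of the present disclosure;
FIG. 4A is a schematic diagram of a process of acquiring first interactive information provided by an embodiment of the present disclosure;
FIG. 4B is a schematic diagram of another process of acquiring first interactive information provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another process of acquiring first interactive information provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a scenario of sending first interactive information provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of another video processing method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a process of determining a target voice provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a video processing apparatus provided by an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of another video processing apparatus provided by an embodiment of the present disclosure; and
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.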
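Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
For ease of understanding, the concepts involved in the embodiments of the present disclosure are described below.
Electronic device: a device with wireless transceiver functions. An electronic device can be deployed on land, including indoors or outdoors, handheld, wearable or vehicle-mounted; it can also be deployed on water (for example, on a ship). The electronic device may be a mobile phone, a tablet computer (Pad), a computer with wireless transceiver functions, a virtual reality (VR) electronic device, an augmented reality (AR) electronic device, a wireless terminal in industrial control, a vehicle-mounted electronic device, a wireless terminal in self driving, a wireless electronic device in remote medical, a wireless electronic device in a smart grid, a wireless electronic device in transportation safety, a wireless electronic device in a smart city, a wireless electronic device in a smart home, a wearable electronic device, and so on. The electronic device involved in the embodiments of the present disclosure may also be called a terminal, user equipment (UE), an access electronic device, a vehicle-mounted terminal, an industrial control terminal, a UE unit, a UE station, a mobile station, a remote station, a remote electronic device, a mobile device, a UE electronic device, a wireless communication device, a UE agent or a UE apparatus. The electronic device may be fixed or mobile.
In the related art, users watching a live broadcast can interact with the anchor, which can improve the effect of the anchor's live broadcast. Currently, the anchor can interact based on text sent by users. For example, while watching the anchor's live video, a user can send the text "Hello" in real time, and the text is displayed in the live video picture. After seeing the text "Hello" in the live video, the anchor can reply to it by voice and thereby interact with the users watching the live broadcast. However, in some scenarios the anchor cannot hold the live broadcast device or watch the live video, while users can only hear the anchor's voice and cannot obtain the interactive information of multiple users from the live video. For example, in a concert live broadcast, users watching the live video cannot feel the atmosphere of the concert venue, and the anchor has to rely on other people at the venue to relay the interactive information in the live video before interacting with users, which makes live broadcast interaction highly complex.
To solve the above technical problem, an embodiment of the present disclosure provides a video processing method. The electronic device displays a video interface used to play live video, acquires first interactive information input by a first user on the video interface, sends the first interactive information to a server, receives a video stream sent by the server, and obtains an anchor video stream and an audio stream from the video stream, where the audio stream is determined based on a first voice corresponding to the first interactive information and a second voice corresponding to second interactive information; the live video is then obtained according to the anchor video stream and the audio stream. In this way, because the live video includes the anchor video, the first voice corresponding to the first interactive information, and the second voice corresponding to the second interactive information of at least one second user, the anchor can hear the users' interactive information without watching the live video, and users can also obtain, through the live video, the voice interaction between multiple users and the anchor, which reduces the complexity of live broadcast interaction.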
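The application scenario of the embodiments of the present disclosure is described below with reference to FIG. 1.
FIG. 1 is a schematic diagram of an application scenario provided by an embodiment of the present disclosure. Referring to FIG. 1, the scenario includes electronic device 1, ..., electronic device N, an anchor device and a server. Electronic device 1, ..., electronic device N can send interactive information to the server, and the anchor device can send the anchor video to the server. The server generates the live video according to the voice corresponding to the interactive information and the anchor video, and sends the live video to electronic device 1, ..., electronic device N and the anchor device.
Referring to FIG. 1, in the above network structure, the audience can send text and cheers directly to the anchor, and the audience can hear the anchor's singing and the background sound, where the background sound can include the cheers of multiple audience members. When multiple audience members sing through multiple electronic devices, the anchor can hear the audience chorus, the audience can hear both the anchor's singing and the audience chorus, and the audience and the anchor can talk to each other directly by voice. In this way, the audience and the anchor can interact directly, which reduces the complexity of live broadcast interaction.
It should be noted that FIG. 1 only illustrates an example application scenario of the embodiments of the present disclosure and does not limit the application scenarios of the embodiments of the present disclosure.
The technical solutions of the present disclosure, and how they solve the above technical problem, are described in detail below with specific embodiments. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present disclosure are described below with reference to the accompanying drawings.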
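FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure. Referring to FIG. 2, the method may include:
S201: Display a video interface.
The execution subject of the embodiments of the present disclosure may be an electronic device, or a video processing apparatus provided in the electronic device. Optionally, the video processing apparatus may be implemented in software, or in a combination of software and hardware.
Optionally, the video interface is used to play live video. For example, the video interface can display live video that includes the anchor video. Optionally, the electronic device can display the video interface in response to the user's triggering operation on a live broadcast application. For example, when the user taps the live broadcast application, the electronic device can display a live broadcast page that includes live-stream links of multiple anchors; when the user taps any one of the live-stream links, the electronic device can display the video interface, in which live video including that anchor's anchor video is played.
The process of displaying the video interface is described below with reference to FIG. 3.
FIG. 3 is a schematic diagram of a process of displaying a video interface provided by an embodiment of the present disclosure. Referring to FIG. 3, an electronic device is included. The display page of the electronic device includes a control of the live broadcast application. When the user taps the control of the live broadcast application, the electronic device can display the page corresponding to the live broadcast application, which includes a live video control of anchor 1, a live video control of anchor 2 and a live video control of anchor 3. When the user taps the live video control of anchor 2, the electronic device can display the video interface and play the live video corresponding to anchor 2 in the video interface.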
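S202: Acquire first interactive information input by a first user on the video interface.
Optionally, the first interactive information may include text information and/or voice information. For example, if the user inputs the text "Hello" on the video interface, the electronic device can determine that text as the first interactive information; if the user inputs the voice "Come on" on the video interface, the electronic device can determine that voice as the first interactive information.
Optionally, the electronic device can acquire the first interactive information input by the first user on the video interface based on the following two feasible implementations:
One feasible implementation:
In response to a triggering operation on a text editing control in the video interface, display a text window for inputting text. For example, when the electronic device plays live video on the video interface, the video interface may include a text editing control, and when the user taps the text editing control, the video interface can display a text window for inputting text. For example, the text editing control may be a text input bar; when the user taps the text input bar, the user can enter the relevant text in it through the electronic device.
Acquire the text content edited in the text window, and determine the text content as the first interactive information. For example, when the user taps the text editing control in the video interface, a text window can pop up on the video interface; if the user enters the text "Come on" in the text window and taps the confirmation control, the electronic device can determine the text "Come on" as the first interactive information. For example, when the user enters the text "Come on" in the text input bar and taps the send control, the electronic device can determine the text "Come on" as the first interactive information.
The process of acquiring the first interactive information in this implementation is described below with reference to FIG. 4A and FIG. 4B.
FIG. 4A is a schematic diagram of a process of acquiring first interactive information provided by an embodiment of the present disclosure. Referring to FIG. 4A, an electronic device is included. The display page of the electronic device is the video interface, and the video played in the video interface is live video. The video interface includes a text editing control. When the user taps the text editing control, a text window pops up on the video interface; the user enters the text "Come on" in the text input bar of the text window and taps the send control, and the electronic device determines that the acquired first interactive information is the text "Come on".
FIG. 4B is a schematic diagram of another process of acquiring first interactive information provided by an embodiment of the present disclosure. Referring to FIG. 4B, an electronic device is included. The display page of the electronic device is the video interface, and the video played in the video interface is live video. When the user taps any point on the video interface, a text input bar can pop up at the bottom of the video interface; the user enters the text "Come on" in the text input bar and taps the send control, and the electronic device determines that the acquired first interactive information is the text "Come on".
Another feasible implementation:
In response to a triggering operation on a voice acquisition control in the video interface, acquire the voice of the first user, and determine the voice of the first user as the first interactive information. For example, when the electronic device plays live video on the video interface, the video interface may include a voice acquisition control; when the user taps the voice acquisition control, the electronic device can capture the first user's voice in real time and determine that voice as the first interactive information. For example, if the first user's voice captured by the electronic device is the voice "Come on", the electronic device can determine the voice "Come on" as the first interactive information.
The process of acquiring the first interactive information in this feasible implementation is described below with reference to FIG. 5.
FIG. 5 is a schematic diagram of another process of acquiring first interactive information provided by an embodiment of the present disclosure. Referring to FIG. 5, an electronic device is included. The display page of the electronic device is the video interface, and the video played in the video interface is live video. The video interface includes a voice acquisition control. While the user long-presses the voice acquisition control, the electronic device can capture the surrounding sound; if the voice the user inputs to the electronic device is the voice "Come on", then when the user stops touching the voice acquisition control, the electronic device determines that the acquired first interactive information is the voice "Come on".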
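S203. Send the first interaction information to the server.
Optionally, after acquiring the first interaction information input by the first user, the electronic device may send it to the server. For example, the first interaction information acquired by the electronic device may be text information and/or voice information, and the electronic device may send the acquired text information and/or voice information to the server. It should be noted that when the first interaction information is text information, the electronic device may send the text information to the server as is, or may first convert the text information into voice information and then send the voice information to the server; the embodiments of the present disclosure do not limit this.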
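The two sending options in S203 (send text as is, or convert it to voice on the device first) can be sketched as follows. This is a minimal illustration only: the `InteractionInfo` type, the `build_payload` function, the payload fields, and the `fake_tts` stand-in are all hypothetical names introduced here, not anything the disclosure specifies.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InteractionInfo:
    """First interaction information entered in the video interface."""
    user_id: str
    text: Optional[str] = None     # set when the user typed text
    voice: Optional[bytes] = None  # set when the user recorded voice


def build_payload(info: InteractionInfo,
                  tts: Optional[Callable[[str], bytes]] = None) -> dict:
    """Build the message the electronic device sends to the server (S203).

    Text is sent as text unless a text-to-speech callable is supplied,
    in which case it is converted to voice on the device first.
    """
    if info.voice is not None:
        return {"user_id": info.user_id, "kind": "voice", "body": info.voice}
    if info.text is None:
        raise ValueError("interaction information is empty")
    if tts is not None:
        return {"user_id": info.user_id, "kind": "voice", "body": tts(info.text)}
    return {"user_id": info.user_id, "kind": "text", "body": info.text}


def fake_tts(text: str) -> bytes:
    """Stand-in for a real speech synthesizer: returns tagged bytes."""
    return b"VOICE:" + text.encode("utf-8")
```

Passing `tts=None` leaves the conversion to the server, which matches the alternative the text allows.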
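The process of sending interaction information to the server is described below with reference to FIG. 6.
FIG. 6 is a schematic diagram of a scenario of sending the first interaction information provided by an embodiment of the present disclosure. FIG. 6 shows viewer 1, viewer 2, viewer 3, and a server. While the three viewers watch the live video, viewer 1 may send text A to the server through the electronic device in use (not shown in FIG. 6), viewer 2 may send text B to the server through the electronic device in use, and viewer 3 may send voice to the server through the electronic device in use. The server thus receives the interaction information sent by the three viewers and generates the live video stream from that interaction information and the acquired anchor video.
It should be noted that, in the embodiment shown in FIG. 6, the text of viewer 1 and viewer 2 may be converted into voice on the electronic device, after which the corresponding voice is sent to the server; alternatively, the text may be sent to the server and converted into voice by the server. The embodiments of the present disclosure do not limit this.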
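S204. Receive the video stream sent by the server.
Optionally, after sending the first interaction information to the server, the electronic device may receive the video stream sent by the server.
S205. Play the live video in the video interface according to the video stream.
Optionally, the live video includes the anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user. Optionally, a second user may be any user other than the first user. For example, in practice the server may receive interaction information sent by multiple users, so the live video may include voices corresponding to the interaction information of multiple users.
Optionally, the second interaction information may be interaction information input by a second user in the video interface. For example, the second interaction information may include text information and/or voice information. A second user inputs the second interaction information in the video interface in the same way as the first user inputs the first interaction information, which is not repeated here.
Optionally, the first voice is the voice corresponding to the first interaction information. For example, when the first interaction information is text information, the first voice may be the voice corresponding to that text; when the first interaction information is voice information, that voice information may be determined as the first voice. For example, if the first interaction information is the text "Come on", the first voice may be the voice "Come on"; if the first interaction information is the voice "Come on", the first voice may likewise be the voice "Come on".
Optionally, the second voice is the voice corresponding to the second interaction information. For example, when the second interaction information is text information, the second voice may be the voice corresponding to that text; when the second interaction information is voice information, that voice information may be determined as the second voice. For example, if the second interaction information is the text "Hello", the second voice may be the voice "Hello"; if the second interaction information is the voice "Hello", the second voice may likewise be the voice "Hello".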
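Optionally, the video stream may include an anchor video stream and an audio stream, and the electronic device may play the live video in the video interface as follows: obtain the anchor video stream and the audio stream from the video stream. The anchor video stream may be obtained from the anchor video; for example, the server encodes the anchor video to obtain the corresponding anchor video stream. The audio stream may be determined from the first voice and the second voice; for example, after acquiring the first voice and the second voice, the server may generate a synthesized voice from them and encode the synthesized voice to obtain the audio stream.
Obtain the live video according to the anchor video stream and the audio stream. For example, after receiving the video stream sent by the server, the electronic device may decode the anchor video stream and the audio stream in it to obtain the live video, which may include the anchor video, the first voice, and the second voice.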
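The client-side step described above, splitting the received video stream into the anchor video stream and the audio stream before decoding, might look like the sketch below. The packet layout (a `(kind, payload)` tuple) and the placeholder `decode` function are assumptions made for illustration; a real client would use its container format and codec libraries instead.

```python
from typing import Iterable, List, Tuple

Packet = Tuple[str, bytes]  # (stream kind, encoded payload); hypothetical layout


def demux(packets: Iterable[Packet]) -> Tuple[List[bytes], List[bytes]]:
    """Separate the received stream into anchor video and audio payloads."""
    video: List[bytes] = []
    audio: List[bytes] = []
    for kind, payload in packets:
        if kind == "video":
            video.append(payload)
        elif kind == "audio":
            audio.append(payload)
        else:
            raise ValueError(f"unknown stream kind: {kind!r}")
    return video, audio


def decode(payloads: List[bytes]) -> List[str]:
    """Placeholder decoder: a real client would call its codec here."""
    return [p.decode("utf-8") for p in payloads]
```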
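Optionally, the live video may further include images of the first user and the second user. For example, when the first user inputs the first interaction information in the video interface, the electronic device may acquire an image of the first user; when sending the first interaction information to the server, the electronic device may also send the image of the first user, and the server may add that image to the anchor video so that the live video also includes the image of the first user. Similarly, when a second user inputs the second interaction information in the video interface, the second user's electronic device may acquire an image of the second user and send it to the server together with the second interaction information, and the server may add that image to the anchor video so that the live video also includes the image of the second user. For example, when the live video is a live video of a concert, the audience area of the live video received by the electronic device may include the images of the first user and the second user, which improves the effect of the live video.
An embodiment of the present disclosure provides a video processing method. The electronic device displays a video interface used to play a live video. In response to a trigger operation on a text editing control in the video interface, it displays a text window for entering text, acquires the text content edited in the text window, and determines the text content as the first interaction information; or, in response to a trigger operation on a voice collection control in the video interface, it acquires the voice of the first user and determines that voice as the first interaction information. The electronic device may then send the first interaction information to the server, receive the video stream sent by the server, and play the live video in the video interface according to the video stream. Because the live video includes the anchor video, the first voice corresponding to the first interaction information, and the second voice corresponding to the second interaction information of at least one second user, the anchor can hear the users' interaction information without watching the live video, and users can obtain the voice interaction between multiple users and the anchor through the live video, which reduces the complexity of live interaction.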
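On the basis of the embodiment shown in FIG. 2, another video processing method is described below with reference to FIG. 7.
FIG. 7 is a schematic diagram of another video processing method provided by an embodiment of the present disclosure. Referring to FIG. 7, the method includes:
S701. Receive a plurality of pieces of target interaction information sent by electronic devices.
This embodiment may be executed by a server, or by a video processing apparatus provided in the server. Optionally, the video processing apparatus may be implemented in software, or in a combination of software and hardware; the embodiments of the present disclosure do not limit this.
Optionally, the target interaction information may include text information and/or voice information. For example, the target interaction information may be text information input by a user watching the live broadcast, voice information input by that user, or both. For example, while a user watches the live video, if the user enters the text "Come on" in the video interface, the target interaction information received by the server is the text "Come on"; if the user inputs the voice "Come on" in the video interface, the target interaction information received by the server is the voice "Come on".
Optionally, the electronic device may be the electronic device of the first user or of a second user. For example, the electronic device may be that of a user watching the live video; while watching, the user may input text information and/or voice information into the electronic device, and the server may receive the text information and/or voice information sent by the electronic device.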
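S702. Receive the anchor video sent by the anchor device.
Optionally, the anchor video may be a video that includes the anchor, and the anchor device may be the device that captures the anchor video. For example, in a live streaming scenario, the anchor video in the live video is provided by the anchor device: after capturing a video that includes the anchor, the anchor device may send the live video containing that anchor video to the server, and the server may send the live video to the electronic devices of the users watching the live broadcast so that they can watch it.
S703. Determine a video stream according to the anchor video and the plurality of pieces of target interaction information.
Optionally, the video stream includes the encoded stream of the anchor video and the encoded stream of the voice associated with the target interaction information. For example, if the target interaction information is text information, the server may convert the text information into voice information; if the target interaction information is the text "Come on", the server may convert it into the voice "Come on".
It should be noted that, when the server receives text information sent by an electronic device and converts it into voice information, if the server can determine the user's timbre from voice that the same electronic device has already sent, the timbre of the resulting voice information may be that user's timbre.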
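One way to picture the timbre rule above is a small per-device registry on the server: timbres learned from earlier voice messages are stored, and text arriving later from the same device is synthesized with that timbre when it is known. Everything named below (`TimbreRegistry`, `DEFAULT_TIMBRE`, `text_to_voice`, and the representation of a timbre as a string label) is a hypothetical simplification; how the timbre is actually extracted or applied is not stated in the text.

```python
from typing import Dict, Tuple

DEFAULT_TIMBRE = "default"  # hypothetical fallback when no voice was seen yet


class TimbreRegistry:
    """Remembers the timbre detected in voice previously sent by each device."""

    def __init__(self) -> None:
        self._by_device: Dict[str, str] = {}

    def observe_voice(self, device_id: str, timbre: str) -> None:
        """Record the timbre found in a voice message from this device."""
        self._by_device[device_id] = timbre

    def timbre_for(self, device_id: str) -> str:
        """Return the known timbre for the device, or the fallback."""
        return self._by_device.get(device_id, DEFAULT_TIMBRE)


def text_to_voice(registry: TimbreRegistry, device_id: str, text: str) -> Tuple[str, str]:
    """Stand-in for speech synthesis: returns (timbre used, spoken text)."""
    return registry.timbre_for(device_id), text
```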
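Optionally, the server may generate the video stream as follows: generate an audio stream according to the plurality of pieces of target interaction information, and generate the video stream according to the anchor video and the audio stream. For example, the server encodes the anchor video to obtain the anchor video stream and merges the anchor video stream with the audio stream to obtain the video stream. Optionally, the audio stream is obtained from the voices corresponding to the plurality of pieces of target interaction information. For example, the audio stream may be the encoded stream associated with those voices.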
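Merging the anchor video stream with the audio stream can be illustrated as interleaving two timestamp-ordered packet lists. The sketch assumes each packet is a `(timestamp_ms, payload)` pair and that both inputs are already sorted by timestamp; these are illustrative assumptions, and the disclosure does not describe the container format. `heapq.merge` from the Python standard library does the ordered merge.

```python
import heapq
from typing import List, Tuple

TimedPacket = Tuple[int, bytes]         # (timestamp in ms, encoded payload)
MuxedPacket = Tuple[int, str, bytes]    # (timestamp in ms, stream kind, payload)


def mux(video_packets: List[TimedPacket],
        audio_packets: List[TimedPacket]) -> List[MuxedPacket]:
    """Interleave anchor video and audio packets by timestamp.

    Both inputs are assumed to be sorted by timestamp already.
    """
    tagged_video = ((ts, "video", p) for ts, p in video_packets)
    tagged_audio = ((ts, "audio", p) for ts, p in audio_packets)
    # Merge on the timestamp only, so payload bytes are never compared.
    return list(heapq.merge(tagged_video, tagged_audio, key=lambda pkt: pkt[0]))
```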
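Optionally, the server may generate the audio stream as follows: acquire a plurality of third voices associated with the plurality of pieces of target interaction information, determine target voices among the third voices, and generate the audio stream according to the target voices. Optionally, a third voice is the voice corresponding to a piece of target interaction information. For example, if the target interaction information is the text "Come on", the third voice may be the voice "Come on"; if the target interaction information is the voice "Hello", the third voice may be the voice "Hello".
Optionally, the number of target voices is greater than or equal to a first threshold. For example, in practice each voice carries semantic information, so the server can divide the third voices into multiple types according to their semantic information and then determine the target voices according to the number of voices of each type.
Optionally, the server determines the target voices among the third voices as follows: cluster the plurality of third voices by their semantics to obtain at least one class of third voices, obtain the number of voices in each class, and determine each class whose number of voices is greater than or equal to the first threshold as target voices. For example, suppose the server acquires 100 third voices and the first threshold is 20. If clustering the 100 voices by semantics yields 30 voices with semantics A, 60 voices with semantics B, and 10 voices with semantics C, the server may determine the 30 voices with semantics A and the 60 voices with semantics B as target voices.
The process of determining the target voices is described below with reference to FIG. 8.
FIG. 8 is a schematic diagram of a process of determining target voices provided by an embodiment of the present disclosure. FIG. 8 shows 100 third voices acquired by the server. The server (not shown in FIG. 8) may cluster the 100 third voices according to the semantics of each voice and obtain two classes of third voices: one class with the semantics "Come on" and the other with the semantics "Sounds great". There are 30 third voices with the semantics "Come on" and 70 with the semantics "Sounds great". If the first threshold is 50, the server determines that the semantics of the target voices is "Sounds great"; that is, the 70 third voices with the semantics "Sounds great" are determined as target voices.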
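The threshold step in the two worked examples above can be reproduced with a short sketch. It assumes the semantic clustering has already assigned each third voice a class label; how that clustering is performed is not specified here, and the function name `select_target_voices` is introduced only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def select_target_voices(labels: Iterable[str], first_threshold: int) -> Dict[str, int]:
    """Keep every semantic class whose voice count reaches the first threshold.

    `labels` holds one semantic class label per third voice (assumed to come
    from an upstream clustering step).
    """
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n >= first_threshold}


# Example from the text: 30 of A, 60 of B, 10 of C, first threshold 20.
example_a = select_target_voices(["A"] * 30 + ["B"] * 60 + ["C"] * 10, 20)

# Example from FIG. 8: 30 "Come on", 70 "Sounds great", first threshold 50.
example_b = select_target_voices(["Come on"] * 30 + ["Sounds great"] * 70, 50)
```

With these inputs, `example_a` keeps classes A and B, and `example_b` keeps only "Sounds great", matching the counts given in the description.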
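Optionally, the server generates the audio stream according to the target voices as follows: determine the voice volume corresponding to each class of target voices according to the number of voices in that class. Optionally, the server may obtain a first preset relationship and determine the voice volume corresponding to the target voices according to the first preset relationship and the number of target voices. The first preset relationship may include at least one number of voices and the voice volume corresponding to each number. For example, the first preset relationship may be as shown in Table 1:
Table 1
Number of voices | Voice volume
Number 1 | Volume a
Number 2 | Volume b
Number 3 | Volume c
It should be noted that Table 1 merely illustrates the first preset relationship by way of example and does not limit it.
For example, if the server determines that the number of target voices is number 1, it determines that the corresponding voice volume is volume a; if the number of target voices is number 2, the corresponding voice volume is volume b; if the number of target voices is number 3, the corresponding voice volume is volume c.
Generate the audio stream according to the target voices and the voice volume corresponding to the target voices. For example, the server may generate a synthesized voice from the target voices, where the synthesized voice has the same semantics as the target voices and may sound as if multiple users were uttering the target voice; the server sets the volume of the synthesized voice to the voice volume corresponding to the target voices and encodes the synthesized voice to obtain the audio stream.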
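A possible reading of the first preset relationship is a tiered lookup from voice count to volume, followed by scaling the synthesized samples. The tier boundaries and gain values in `PRESET` below are invented for illustration (the description gives only symbolic entries such as "number 1" and "volume a"), and treating the relationship as tiers rather than exact matches is also an assumption.

```python
import bisect
from typing import List, Tuple

# Hypothetical first preset relationship: (minimum voice count, volume gain).
PRESET: List[Tuple[int, float]] = [(20, 0.3), (50, 0.6), (100, 1.0)]


def volume_for(count: int) -> float:
    """Look up the volume of the highest tier whose minimum count is reached."""
    minimums = [m for m, _ in PRESET]
    index = bisect.bisect_right(minimums, count) - 1
    if index < 0:
        raise ValueError("count is below the lowest tier of the preset relationship")
    return PRESET[index][1]


def apply_volume(samples: List[int], volume: float) -> List[int]:
    """Scale PCM samples of the synthesized voice by the chosen volume."""
    return [int(s * volume) for s in samples]
```

The scaled samples would then be passed to an audio encoder to obtain the audio stream; encoding is outside the scope of this sketch.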
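Optionally, the server may also receive images of users sent by the electronic devices, and may add the image of at least one user to the anchor video. For example, in a concert live streaming scenario, the server may add the image of at least one user to the audience area of the acquired anchor video, which improves the effect of the live video.
S704. Send the video stream to the anchor device and the electronic devices.
Optionally, after obtaining the video stream, the server may send it to the anchor device and to the electronic devices of the multiple users watching the live broadcast. In this way, the anchor can obtain the users' interaction information through the audio of the live video and reply to it, and the users watching the live broadcast can experience an on-site effect through the live video (for example, multiple viewers singing in chorus, or asking the anchor questions as if on site), which improves the effect of live interaction.
An embodiment of the present disclosure provides a video processing method: receive a plurality of pieces of target interaction information sent by a plurality of electronic devices, receive the anchor video sent by the anchor device, determine a video stream according to the anchor video and the plurality of pieces of target interaction information, and send the video stream to the anchor device and the plurality of electronic devices. In this way, the anchor can hear the users' interaction information without watching the live video, and the more interaction information there is, the better the chorus effect, which improves the live broadcast. Users can also obtain the voice interaction between multiple users and the anchor through the live video, which reduces the complexity of live interaction.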
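FIG. 9 is a schematic structural diagram of a video processing apparatus provided by an embodiment of the present disclosure. Referring to FIG. 9, the video processing apparatus 10 includes a display module 11, an acquisition module 12, a sending module 13, a receiving module 14, and a playing module 15, wherein:
the display module 11 is configured to display a video interface, the video interface being used to play a live video;
the acquisition module 12 is configured to acquire first interaction information input by a first user in the video interface;
the sending module 13 is configured to send the first interaction information to a server;
the receiving module 14 is configured to receive a video stream sent by the server;
the playing module 15 is configured to play the live video in the video interface according to the video stream, the live video including an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
According to one or more embodiments of the present disclosure, the acquisition module 12 is specifically configured to:
in response to a trigger operation on a text editing control in the video interface, display a text window for entering text;
acquire the text content edited in the text window, and determine the text content as the first interaction information.
According to one or more embodiments of the present disclosure, the acquisition module 12 is specifically configured to:
in response to a trigger operation on a voice collection control in the video interface, acquire the voice of the first user;
determine the voice of the first user as the first interaction information.
According to one or more embodiments of the present disclosure, the live video further includes images of the first user and the second user.
The video processing apparatus provided by this embodiment of the present disclosure can be used to execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.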
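To show how the five modules of apparatus 10 line up with steps S201 to S205, here is a toy walk-through. `StubServer` and `ViewerApparatus` are hypothetical names, the "stream" is a plain dictionary, and the stub simply echoes submitted interactions back as voices; none of this reflects a real transport or codec.

```python
from typing import Dict, List


class StubServer:
    """Hypothetical server stand-in that echoes interactions back as voices."""

    def __init__(self) -> None:
        self.voices: List[str] = ["Hello"]  # a second user's voice already present

    def submit(self, user_id: str, info: str) -> None:
        self.voices.append(info)

    def fetch_stream(self) -> Dict[str, object]:
        return {"anchor_video": "anchor-frame", "voices": list(self.voices)}


class ViewerApparatus:
    """One method per module of the viewer-side apparatus."""

    def __init__(self, server: StubServer, user_id: str) -> None:
        self.server = server
        self.user_id = user_id
        self.interface_shown = False

    def display(self) -> None:                      # display module, S201
        self.interface_shown = True

    def acquire(self, raw_input: str) -> str:       # acquisition module, S202
        return raw_input.strip()

    def send(self, info: str) -> None:              # sending module, S203
        self.server.submit(self.user_id, info)

    def receive(self) -> Dict[str, object]:         # receiving module, S204
        return self.server.fetch_stream()

    def play(self, stream: Dict[str, object]) -> List[object]:  # playing module, S205
        return [stream["anchor_video"], *stream["voices"]]

    def run(self, raw_input: str) -> List[object]:
        self.display()
        self.send(self.acquire(raw_input))
        return self.play(self.receive())
```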
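FIG. 10 is a schematic structural diagram of another video processing apparatus provided by an embodiment of the present disclosure. Referring to FIG. 10, the video processing apparatus 20 includes a receiving module 21, a determining module 22, and a sending module 23, wherein:
the receiving module 21 is configured to receive a plurality of pieces of target interaction information sent by a plurality of electronic devices, the target interaction information including text information and/or voice information;
the receiving module 21 is further configured to receive an anchor video sent by an anchor device;
the determining module 22 is configured to determine a video stream according to the anchor video and the plurality of pieces of target interaction information, the video stream including an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information;
the sending module 23 is configured to send the video stream to the anchor device and the plurality of electronic devices.
According to one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
generate an audio stream according to the plurality of pieces of target interaction information, the audio stream being obtained from the voices corresponding to the plurality of pieces of target interaction information;
generate the video stream according to the anchor video and the audio stream.
According to one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
acquire a plurality of third voices associated with the plurality of pieces of target interaction information;
determine target voices among the third voices, the number of target voices being greater than or equal to a first threshold;
generate the audio stream according to the target voices.
According to one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
cluster the plurality of third voices by their semantics to obtain at least one class of third voices;
obtain the number of voices in each class of third voices, and determine a class of third voices whose number of voices is greater than or equal to the first threshold as the target voices.
According to one or more embodiments of the present disclosure, the determining module 22 is specifically configured to:
determine the voice volume corresponding to each class of target voices according to the number of voices in that class;
generate the audio stream according to the target voices and the voice volume corresponding to the target voices.
According to one or more embodiments of the present disclosure, the video processing apparatus 20 is further configured to add an image of at least one user to the anchor video.
The video processing apparatus provided by this embodiment of the present disclosure can be used to execute the technical solutions of the above method embodiments; its implementation principles and technical effects are similar and are not repeated here.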
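The server-side apparatus 20 can be pictured in the same toy style: interactions are collected (S701), combined with the anchor video into a stream (S702, S703), and the same stream is delivered to the anchor device and every viewer device (S704). `LiveServer`, its method names, and the use of plain callables as delivery targets are assumptions made only for this sketch; filtering by threshold, synthesis, and encoding are left out.

```python
from typing import Callable, Dict, List

Stream = Dict[str, object]


class LiveServer:
    """Minimal stand-in for the receiving, determining and sending modules."""

    def __init__(self) -> None:
        self.pending: List[str] = []                   # voices awaiting mixing
        self.sinks: List[Callable[[Stream], None]] = []  # anchor and viewer devices

    def register(self, sink: Callable[[Stream], None]) -> None:
        """Register a device that should receive every outgoing stream."""
        self.sinks.append(sink)

    def receive_interaction(self, voice: str) -> None:    # receiving module, S701
        self.pending.append(voice)

    def determine_stream(self, anchor_video: str) -> Stream:  # determining module, S702-S703
        stream: Stream = {"anchor_video": anchor_video, "voices": list(self.pending)}
        self.pending.clear()
        return stream

    def send(self, stream: Stream) -> None:                # sending module, S704
        for sink in self.sinks:
            sink(stream)
```

Registering the anchor device as just another sink reflects the point of S704: the anchor hears the same mixed audio as the viewers.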
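FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. FIG. 11 shows an electronic device 1100 suitable for implementing the embodiments of the present disclosure; the electronic device 1100 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer (Portable Android Device, PAD), a portable multimedia player (Portable Media Player, PMP), and a vehicle-mounted terminal (for example, a vehicle-mounted navigation terminal), as well as fixed terminals such as a digital TV and a desktop computer. The electronic device shown in FIG. 11 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 11, the electronic device 1100 may include a processing apparatus (for example, a central processing unit or a graphics processing unit) 1101, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage apparatus 1108 into a random access memory (RAM) 1103. The RAM 1103 also stores various programs and data required for the operation of the electronic device 1100. The processing apparatus 1101, the ROM 1102, and the RAM 1103 are connected to one another through a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
Generally, the following apparatuses may be connected to the I/O interface 1105: an input apparatus 1106 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 1107 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 1108 including, for example, a magnetic tape and a hard disk; and a communication apparatus 1109. The communication apparatus 1109 may allow the electronic device 1100 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 11 shows the electronic device 1100 with various apparatuses, it should be understood that not all of the illustrated apparatuses are required to be implemented or provided; more or fewer apparatuses may alternatively be implemented or provided.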
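In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 1109, installed from the storage apparatus 1108, or installed from the ROM 1102. When the computer program is executed by the processing apparatus 1101, the above functions defined in the methods of the embodiments of the present disclosure are performed.
An embodiment of the present disclosure further includes a server. The server may include a processor and a memory; the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory so that the processor performs the video processing method of any one of the above embodiments.
An embodiment of the present disclosure further includes a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video processing method of any one of the above embodiments.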
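It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to: a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.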
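Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.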
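The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases the name of a unit does not limit the unit itself; for example, a first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".
The functions described above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
It should be noted that the modifiers "one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of those messages or information.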
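It can be understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from the user, prompt information is sent to the user to state explicitly that the requested operation will require acquiring and using the user's personal information. The user can then decide, based on the prompt information, whether to provide personal information to the software or hardware that performs the operations of the technical solutions of the present disclosure, such as an electronic device, an application, a server, or a storage medium.
As an optional but non-limiting implementation, the prompt information may be sent to the user in response to the user's active request by means of, for example, a pop-up window, in which the prompt information may be presented as text. The pop-up window may also carry a selection control for the user to choose "agree" or "disagree" to providing personal information to the electronic device.
It can be understood that the above notification and authorization process is merely illustrative and does not limit the implementations of the present disclosure; other approaches that comply with relevant laws and regulations may also be applied to the implementations of the present disclosure.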
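The notice-and-authorization flow described above amounts to gating any collection of personal information on an explicit user choice. A minimal sketch, assuming the prompt is delivered by a caller-supplied function (for example, a pop-up with "agree" and "disagree" controls) and that `collect_with_consent` and `PROMPT` are names invented here:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

PROMPT = ("This operation needs to acquire and use your personal information. "
          "Do you agree?")


def collect_with_consent(ask_user: Callable[[str], bool],
                         collect: Callable[[], T]) -> Optional[T]:
    """Run `collect` only after the user agrees to the prompt.

    `ask_user` shows the prompt and returns True for "agree" and False for
    "disagree"; nothing is collected when the user disagrees.
    """
    if ask_user(PROMPT):
        return collect()
    return None
```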
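It can be understood that the data involved in this technical solution (including but not limited to the data itself and the acquisition or use of the data) should comply with the requirements of applicable laws, regulations, and related provisions. The data may include information, parameters, messages, and the like, such as stream switching indication information.
The above description covers only preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that the operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.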
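In a first aspect, one or more embodiments of the present disclosure provide a video processing method, the method including:
displaying a video interface, the video interface being used to play a live video;
acquiring first interaction information input by a first user in the video interface, and sending the first interaction information to a server;
receiving a video stream sent by the server, and playing the live video in the video interface according to the video stream, the live video including an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
According to one or more embodiments of the present disclosure, acquiring the first interaction information input by the first user in the video interface includes:
in response to a trigger operation on a text editing control in the video interface, displaying a text window for entering text;
acquiring the text content edited in the text window, and determining the text content as the first interaction information.
According to one or more embodiments of the present disclosure, acquiring the first interaction information input by the first user in the video interface includes:
in response to a trigger operation on a voice collection control in the video interface, acquiring the voice of the first user;
determining the voice of the first user as the first interaction information.
According to one or more embodiments of the present disclosure, the live video further includes images of the first user and the second user.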
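In a second aspect, one or more embodiments of the present disclosure provide another video processing method, the method including:
receiving a plurality of pieces of target interaction information sent by a plurality of electronic devices, the target interaction information including text information and/or voice information;
receiving an anchor video sent by an anchor device;
determining a video stream according to the anchor video and the plurality of pieces of target interaction information, the video stream including an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information;
sending the video stream to the anchor device and the plurality of electronic devices.
According to one or more embodiments of the present disclosure, determining the video stream according to the anchor video and the plurality of pieces of target interaction information includes:
generating an audio stream according to the plurality of pieces of target interaction information, the audio stream being obtained from the voices corresponding to the plurality of pieces of target interaction information;
generating the video stream according to the anchor video and the audio stream.
According to one or more embodiments of the present disclosure, generating the audio stream according to the plurality of pieces of target interaction information includes:
acquiring a plurality of third voices associated with the plurality of pieces of target interaction information;
determining target voices among the third voices, the number of target voices being greater than or equal to a first threshold;
generating the audio stream according to the target voices.
According to one or more embodiments of the present disclosure, determining the target voices among the third voices includes:
clustering the plurality of third voices by their semantics to obtain at least one class of third voices;
obtaining the number of voices in each class of third voices, and determining a class of third voices whose number of voices is greater than or equal to the first threshold as the target voices.
According to one or more embodiments of the present disclosure, generating the audio stream according to the target voices includes:
determining the voice volume corresponding to each class of target voices according to the number of voices in that class;
generating the audio stream according to the target voices and the voice volume corresponding to the target voices.
According to one or more embodiments of the present disclosure, the method further includes:
adding an image of at least one user to the anchor video.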
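In a third aspect, one or more embodiments of the present disclosure provide a video processing apparatus including a display module, an acquisition module, a sending module, a receiving module, and a playing module, wherein:
the display module is configured to display a video interface, the video interface being used to play a live video;
the acquisition module is configured to acquire first interaction information input by a first user in the video interface;
the sending module is configured to send the first interaction information to a server;
the receiving module is configured to receive a video stream sent by the server;
the playing module is configured to play the live video in the video interface according to the video stream, the live video including an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
According to one or more embodiments of the present disclosure, the acquisition module is specifically configured to:
in response to a trigger operation on a text editing control in the video interface, display a text window for entering text;
acquire the text content edited in the text window, and determine the text content as the first interaction information.
According to one or more embodiments of the present disclosure, the acquisition module is specifically configured to:
in response to a trigger operation on a voice collection control in the video interface, acquire the voice of the first user;
determine the voice of the first user as the first interaction information.
According to one or more embodiments of the present disclosure, the live video further includes images of the first user and the second user.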
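In a fourth aspect, one or more embodiments of the present disclosure provide another video processing apparatus including a receiving module, a determining module, and a sending module, wherein:
the receiving module is configured to receive a plurality of pieces of target interaction information sent by a plurality of electronic devices, the target interaction information including text information and/or voice information;
the receiving module is further configured to receive an anchor video sent by an anchor device;
the determining module is configured to determine a video stream according to the anchor video and the plurality of pieces of target interaction information, the video stream including an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information;
the sending module is configured to send the video stream to the anchor device and the plurality of electronic devices.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
generate an audio stream according to the plurality of pieces of target interaction information, the audio stream being obtained from the voices corresponding to the plurality of pieces of target interaction information;
generate the video stream according to the anchor video and the audio stream.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
acquire a plurality of third voices associated with the plurality of pieces of target interaction information;
determine target voices among the third voices, the number of target voices being greater than or equal to a first threshold;
generate the audio stream according to the target voices.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
cluster the plurality of third voices by their semantics to obtain at least one class of third voices;
obtain the number of voices in each class of third voices, and determine a class of third voices whose number of voices is greater than or equal to the first threshold as the target voices.
According to one or more embodiments of the present disclosure, the determining module is specifically configured to:
determine the voice volume corresponding to each class of target voices according to the number of voices in that class;
generate the audio stream according to the target voices and the voice volume corresponding to the target voices.
According to one or more embodiments of the present disclosure, the video processing apparatus is further configured to add an image of at least one user to the anchor video.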
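In a fifth aspect, the present disclosure provides an electronic device, including a processor and a memory;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method of the first aspect and its various possible implementations.
In a sixth aspect, the present disclosure provides a server, including a processor and a memory;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method of the second aspect and its various possible implementations.
In a seventh aspect, the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video processing method of the first aspect and its various possible implementations, or implement the video processing method of the second aspect and its various possible implementations.
In an eighth aspect, an embodiment of the present disclosure provides a computer program product including a computer program which, when executed by a processor, implements the video processing method of the first aspect and its various possible implementations, or implements the video processing method of the second aspect and its various possible implementations.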

Claims (15)

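  1. A video processing method, comprising:
    displaying a video interface, the video interface being used to play a live video;
    acquiring first interaction information input by a first user in the video interface, and sending the first interaction information to a server;
    receiving a video stream sent by the server, and playing the live video in the video interface according to the video stream, the live video including an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
  2. The method according to claim 1, wherein acquiring the first interaction information input by the first user in the video interface comprises:
    in response to a trigger operation on a text editing control in the video interface, displaying a text window for entering text;
    acquiring the text content edited in the text window, and determining the text content as the first interaction information.
  3. The method according to claim 1, wherein acquiring the first interaction information input by the first user in the video interface comprises:
    in response to a trigger operation on a voice collection control in the video interface, acquiring the voice of the first user;
    determining the voice of the first user as the first interaction information.
  4. The method according to any one of claims 1-3, wherein the live video further includes images of the first user and the second user.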
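  5. A video processing method, comprising:
    receiving a plurality of pieces of target interaction information sent by a plurality of electronic devices, the target interaction information including text information and/or voice information;
    receiving an anchor video sent by an anchor device;
    determining a video stream according to the anchor video and the plurality of pieces of target interaction information, the video stream including an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information;
    sending the video stream to the anchor device and the plurality of electronic devices.
  6. The method according to claim 5, wherein determining the video stream according to the anchor video and the plurality of pieces of target interaction information comprises:
    generating an audio stream according to the plurality of pieces of target interaction information, the audio stream being obtained from the voices corresponding to the plurality of pieces of target interaction information;
    generating the video stream according to the anchor video and the audio stream.
  7. The method according to claim 6, wherein generating the audio stream according to the plurality of pieces of target interaction information comprises:
    acquiring a plurality of third voices associated with the plurality of pieces of target interaction information;
    determining target voices among the third voices, the number of target voices being greater than or equal to a first threshold;
    generating the audio stream according to the target voices.
  8. The method according to claim 7, wherein determining the target voices among the third voices comprises:
    clustering the plurality of third voices by their semantics to obtain at least one class of third voices;
    obtaining the number of voices in each class of third voices, and determining a class of third voices whose number of voices is greater than or equal to the first threshold as the target voices.
  9. The method according to claim 7 or 8, wherein generating the audio stream according to the target voices comprises:
    determining the voice volume corresponding to each class of target voices according to the number of voices in that class;
    generating the audio stream according to the target voices and the voice volume corresponding to the target voices.
  10. The method according to any one of claims 5-9, wherein the method further comprises:
    adding an image of at least one user to the anchor video.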
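  11. A video processing apparatus, comprising a display module, an acquisition module, a sending module, a receiving module, and a playing module, wherein:
    the display module is configured to display a video interface, the video interface being used to play a live video;
    the acquisition module is configured to acquire first interaction information input by a first user in the video interface;
    the sending module is configured to send the first interaction information to a server;
    the receiving module is configured to receive a video stream sent by the server;
    the playing module is configured to play the live video in the video interface according to the video stream, the live video including an anchor video, a first voice corresponding to the first interaction information, and a second voice corresponding to second interaction information of at least one second user.
  12. A video processing apparatus, comprising a receiving module, a determining module, and a sending module, wherein:
    the receiving module is configured to receive a plurality of pieces of target interaction information sent by a plurality of electronic devices, the target interaction information including text information and/or voice information;
    the receiving module is further configured to receive an anchor video sent by an anchor device;
    the determining module is configured to determine a video stream according to the anchor video and the plurality of pieces of target interaction information, the video stream including an encoded stream of the anchor video and an encoded stream of the voice associated with the target interaction information;
    the sending module is configured to send the video stream to the anchor device and the plurality of electronic devices.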
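  13. An electronic device, comprising a processor and a memory;
    the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method according to any one of claims 1 to 4.
  14. A server, comprising a processor and a memory;
    the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory, so that the processor performs the video processing method according to any one of claims 5 to 10.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the video processing method according to any one of claims 1 to 4, or implement the video processing method according to any one of claims 5 to 10.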
PCT/CN2023/137590 2022-12-20 2023-12-08 Video processing method, apparatus and electronic device WO2024131576A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211644421.9A CN118233664A (zh) 2022-12-20 2022-12-20 Video processing method, apparatus and electronic device
CN202211644421.9 2022-12-20

Publications (1)

Publication Number Publication Date
WO2024131576A1 true WO2024131576A1 (zh) 2024-06-27

Family

ID=91507100

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/137590 WO2024131576A1 (zh) 2022-12-20 2023-12-08 Video processing method, apparatus and electronic device

Country Status (2)

Country Link
CN (1) CN118233664A (zh)
WO (1) WO2024131576A1 (zh)


Also Published As

Publication number Publication date
CN118233664A (zh) 2024-06-21
