WO2023284411A1 - Method for switching audio input and output applied to live broadcast, and live broadcast device - Google Patents

Method for switching audio input and output applied to live broadcast, and live broadcast device

Info

Publication number
WO2023284411A1
Authority
WO
WIPO (PCT)
Prior art keywords
live broadcast
scene
live
output
anchor
Application number
PCT/CN2022/094396
Other languages
English (en)
French (fr)
Inventor
陈映宜
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H04N21/2335 Processing of audio elementary streams involving reformatting operations of audio signals, e.g. by converting from one coding standard to another
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • Embodiments of the present disclosure relate to the technical field of computer and network communication, and in particular to a method for switching audio input and output applied to live broadcast, a live broadcast device, an electronic device, a readable storage medium, a computer program product, and a computer program.
  • Live broadcasting has become a new trend in the performing arts.
  • The performer in a live broadcast is called the anchor (host), and the equipment used for live broadcasting is called the live broadcast device; during the live broadcast, the anchor can also communicate with the audience through the live broadcast device.
  • Embodiments of the present disclosure provide a method for switching audio input and output applied to live broadcast, a live broadcast device, an electronic device, a readable storage medium, a computer program product, and a computer program, so as to overcome the cumbersome operation, poor timeliness, and low reliability of manual switching.
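  • In a first aspect, an embodiment of the present disclosure provides a method for switching audio input and output applied to live broadcast, including:
  • acquiring a live image of the anchor during the live broadcast, and determining the live scene of the anchor according to the live image, where the live scene includes a far-field scene and a near-field scene; and
  • in response to a change of the live scene, switching the audio input and output of the live broadcast device according to the change of the live scene.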
  • An embodiment of the present disclosure provides a live broadcast device, including:
  • a main control component, configured to acquire a live image of the anchor during the live broadcast and determine the live scene of the anchor according to the live image, where the live scene includes a far-field scene and a near-field scene;
  • the main control component is further configured to, in response to a change of the live scene, generate a switching instruction according to the change and transmit the switching instruction to an audio processor, where the switching instruction is used to instruct switching of the audio input and output of the live broadcast device; and
  • the audio processor, configured to switch the audio input and output of the live broadcast device according to the switching instruction.
  • An embodiment of the present disclosure provides an electronic device, including at least one processor and a memory;
  • the memory stores computer-executable instructions; and
  • the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method described in the first aspect and its various possible implementations.
  • An embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method described in the first aspect and its various possible implementations.
  • An embodiment of the present disclosure provides a computer program product including a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the electronic device performs the method described in the first aspect.
  • An embodiment of the present disclosure provides a device for switching audio input and output applied to live broadcast, including:
  • an acquisition unit, configured to acquire a live image of the anchor during the live broadcast;
  • a determining unit, configured to determine the live scene of the anchor according to the live image, where the live scene includes a far-field scene and a near-field scene; and
  • a switching unit, configured to, in response to a change of the live scene, switch the audio input and output of the live broadcast device according to the change.
  • An embodiment of the present disclosure further provides a computer program which, when executed by a processor, implements the method described in the first aspect and its various possible implementations.
  • In the method for switching audio input and output applied to live broadcast and the live broadcast device provided by the embodiments, a live image of the anchor is acquired during the live broadcast; the live scene of the anchor, which includes a far-field scene and a near-field scene, is determined according to the live image; and, in response to a change of the live scene, the audio input and output of the live broadcast device are switched according to the change. This embodiment introduces the technical feature of determining the live scene based on the live image and, when the live scene changes, switching the audio input and output based on that change.
  • This feature avoids the cumbersome operation in related technologies, where the anchor must manually switch the audio input and output of the live broadcast device when the live scene changes. It improves the automation of the live broadcast, satisfies the anchor's live broadcast experience, makes the overall live broadcast smoother and more reliable, and also satisfies the audience's viewing experience.
  • FIG. 1 is a schematic diagram of a scene of a method for switching input and output of audio applied to live broadcast according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of a method for switching input and output of audio applied to live broadcast according to an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of a method for switching input and output of audio applied to live broadcast according to another embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a method for switching input and output of audio applied to live broadcast according to another embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a live broadcast device according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a live broadcast device according to another embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of an input and output switching device applied to live audio according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of an input and output switching device applied to live audio according to another embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 1 is a schematic diagram of a live broadcast scene. As shown in FIG. 1, the anchor 101 carries out the live broadcast with the live broadcast device 102, which may be a mobile phone as shown in FIG. 1 or another electronic device; this embodiment does not limit it.
  • The live broadcast device 102 can be provided with a camera 103, which collects the live content of the anchor 101 and transmits it to the user equipment 105 of the audience 104, so that the audience 104 can watch the live content on the user equipment 105.
  • The user equipment 105 may be a mobile phone as shown in FIG. 1, or other electronic equipment, which is not limited in this embodiment.
  • According to the distance between the anchor and the live broadcast device, live broadcasts can be divided into two scenarios: a far-field scene and a near-field scene.
  • the far-field scene refers to a live broadcast scene in which the distance between the anchor and the live broadcast device is relatively long
  • the near-field scene refers to a live broadcast scene in which the distance between the anchor and the live broadcast device is relatively short.
  • For example, when the anchor is dancing, a far-field scene is more suitable for the live broadcast, so that the audience can see the anchor's complete dance movements, which satisfies the audience's viewing experience. When the anchor finishes dancing and starts interacting with the audience, a near-field scene is more suitable, because it shortens the distance between the anchor and the audience, strengthens the interaction, and satisfies the audience's interactive experience.
  • In related technologies, when the live scene is switched, the anchor needs to manually switch the audio input and output of the live broadcast device in order to keep the live broadcast reliable and satisfy the viewers' experience.
  • For example, in the far-field scene, the audio output of the live broadcast device needs to be set to its external speaker output, specifically the output of its speaker, so that the anchor can hear the music for the dance.
  • To do so, the anchor manually sets the audio output of the live broadcast device and selects its external speaker output.
  • In the near-field scene, the audio output of the live broadcast device needs to be set to the headphone output, to prevent the audience from hearing their own interactive audio re-recorded by the live broadcast device.
  • To do so, the anchor manually sets the audio output of the live broadcast device and selects the headphones connected to it.
  • Through creative work, the inventors of the present disclosure arrived at the inventive concept of the present disclosure: determine the live scene according to live images of the anchor during the live broadcast, and automatically switch the audio input and output of the live broadcast device based on changes in the live scene.
  • FIG. 2 is a schematic diagram of a method for switching input and output of live audio according to an embodiment of the present disclosure.
  • the method includes:
  • S201 Obtain a live broadcast image of the anchor during live broadcast, and determine a live broadcast scene of the anchor according to the live broadcast image, where the live broadcast scene includes a far-field scene and a near-field scene.
  • the executor of this embodiment may be a live broadcast device, which may be a device for implementing live broadcast, and this embodiment does not limit the type, style, shape, etc. of the live broadcast device.
  • the live image refers to the acquired image of the anchor during the live broadcast.
  • An image acquisition device may be provided on the live broadcast device.
  • The image acquisition device may be the camera shown in FIG. 1; it captures images of the anchor during the live broadcast to obtain live images containing the anchor.
  • S202 In response to the change of the live broadcast scene, switch the audio input and output of the live broadcast device according to the change of the live broadcast scene.
  • This step can be understood as follows: the live broadcast device determines, based on the determined live scene, whether the live scene has changed, and if so, switches its audio input and output based on the change.
  • When the live broadcast device determines that the live scene has changed, it may generate a switching instruction based on the change and switch its audio input and output according to the switching instruction.
  • For example, the live broadcast device can determine the live scene at a preset time interval and check whether the current live scene is the same as the previous one. A different scene indicates that the live scene has changed: if the current live scene is a far-field scene and the previous one was a near-field scene, the live broadcast device can generate a switching instruction to automatically switch its audio input and output.
  • If the live broadcast device detects that the current live scene is the same as the previous one, there is no need to switch its audio input and output.
  • the preset time interval may be determined by the live broadcast device based on requirements, historical records, and experiments, which is not limited in this embodiment.
  • As another example, the live broadcast device can determine the live scene in real time by detecting each frame of live image collected by the image acquisition device and comparing the live scene of the current frame with that of the previous frame. If the two frames correspond to different live scenes, the live broadcast device may generate a switching instruction to automatically switch its audio input and output.
  • If the live broadcast device detects that the live scene of the current frame is the same as that of the previous frame, there is no need to switch its audio input and output.
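The scene-change check described above (comparing each newly determined live scene with the previous one, either at a preset time interval or frame by frame) can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the names `LiveScene` and `SceneChangeDetector` are hypothetical, and treating the very first determination as "no change" is an assumption, since the text does not say what happens before a previous scene exists.

```python
from enum import Enum
from typing import Optional, Tuple


class LiveScene(Enum):
    FAR_FIELD = "far_field"
    NEAR_FIELD = "near_field"


class SceneChangeDetector:
    """Hypothetical helper: remembers the last determined live scene.

    Call update() once per preset time interval, or once per collected
    frame; the detector itself does not depend on the cadence.
    """

    def __init__(self) -> None:
        self._previous: Optional[LiveScene] = None

    def update(self, current: LiveScene) -> Optional[Tuple[LiveScene, LiveScene]]:
        """Return (previous, current) if the scene changed, otherwise None."""
        previous = self._previous
        self._previous = current
        # Assumption: with no earlier scene, or the same scene as before,
        # no switching instruction is generated.
        if previous is None or previous is current:
            return None
        return previous, current


if __name__ == "__main__":
    detector = SceneChangeDetector()
    scenes = [LiveScene.NEAR_FIELD, LiveScene.NEAR_FIELD,
              LiveScene.FAR_FIELD, LiveScene.FAR_FIELD, LiveScene.NEAR_FIELD]
    for scene in scenes:
        change = detector.update(scene)
        if change is not None:
            print(f"switch: {change[0].name} -> {change[1].name}")
```

The same `update()` call serves both cadences mentioned in the text; only how often the caller invokes it differs.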
  • The embodiment of the present disclosure provides a method for switching audio input and output applied to live broadcast, including: acquiring a live image of the anchor during the live broadcast; determining the live scene of the anchor according to the live image, where the live scene includes a far-field scene and a near-field scene; and, in response to a change of the live scene, switching the audio input and output of the live broadcast device according to the change. By introducing the technical feature of determining the live scene based on the live image and switching the audio input and output when the live scene changes, this embodiment avoids the cumbersome operation in related technologies, where the anchor must manually switch the audio input and output of the live broadcast device whenever the live scene changes.
  • This improves the automation of the live broadcast, satisfies the anchor's live broadcast experience, makes the overall live broadcast smoother and more reliable, and also satisfies the audience's viewing experience.
  • FIG. 3 is a schematic diagram of a method for switching input and output of live audio according to another embodiment of the present disclosure.
  • the method includes:
  • S302 Recognize the live image to obtain a first recognition result.
  • the first recognition result is used to characterize: the correlation between the anchor's first human body feature in the live image and the anchor's second human body feature in the real scene.
  • the first recognition result can be obtained by constructing a recognition model for recognizing human body features, and recognizing live images based on the recognition model, that is, obtaining the human body features of the anchor in the live image (that is, the first human body feature).
  • the first human body feature may be the first body area
  • the recognition model may identify the anchor's body area in the live image.
  • the second body area of the anchor in the real scene is stored in the live broadcast device, and the first recognition result represents the correlation between the first body area and the second body area.
  • Alternatively, the first human body feature can be a first body part of the anchor in the live image. For example, if the recognition model recognizes the live image and determines that it contains the anchor's head, the first recognition result represents the relationship between this first body part and the anchor's overall body parts in the real scene.
  • the association relationship may be the ratio between the first body area and the second body area, that is, the ratio of the body area of the anchor in the live image to the body area of the anchor in the real scene.
  • If the ratio is greater than or equal to the first threshold, the live scene is a far-field scene.
  • If the ratio is smaller than the first threshold, the live scene is a near-field scene.
  • The first threshold may be set by the live broadcast device based on requirements, historical records, and experiments, which is not limited in this embodiment.
  • For example, if the ratio is relatively small, that is, the first body area is relatively small and the anchor is relatively close to the live broadcast device, the live scene is determined to be a near-field scene.
  • If the ratio is relatively large, that is, the first body area is relatively large and the anchor is relatively far from the live broadcast device, the live scene is determined to be a far-field scene.
  • Determining the live scene from the ratio between the first body area and the second body area makes the determined live scene more reliable and accurate.
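A minimal Python sketch of the ratio test, assuming the first body area (recognized in the live image) and the stored second body area (in the real scene) are already on a comparable scale; the text does not say how the two are normalized. The function name and the threshold values used with it are hypothetical. Following the text, a ratio below the first threshold gives a near-field scene and any other ratio gives a far-field scene.

```python
from enum import Enum


class LiveScene(Enum):
    FAR_FIELD = "far_field"
    NEAR_FIELD = "near_field"


def scene_from_area_ratio(first_body_area: float,
                          second_body_area: float,
                          first_threshold: float) -> LiveScene:
    """Hypothetical classifier for the area-ratio variant.

    first_body_area:  anchor's body area recognized in the live image.
    second_body_area: anchor's body area in the real scene, stored on the device.
    Assumes both areas are expressed on a comparable scale.
    """
    if second_body_area <= 0:
        raise ValueError("second_body_area must be positive")
    ratio = first_body_area / second_body_area
    # Ratio below the first threshold -> near-field scene.
    if ratio < first_threshold:
        return LiveScene.NEAR_FIELD
    # Ratio at or above the first threshold -> far-field scene.
    return LiveScene.FAR_FIELD
```

For instance, with an assumed first threshold of 0.5, `scene_from_area_ratio(30.0, 100.0, 0.5)` yields a near-field scene.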
  • the association relationship may be an association relationship between the first body part and the whole body part.
  • For example, the association relationship may be that, of the anchor's overall body parts, the recognition result includes only the head.
  • If the first body part covers relatively many of the overall body parts, the live scene can be determined to be a far-field scene. Conversely, if the first body part covers relatively few of the overall body parts, the live scene can be determined to be a near-field scene.
  • For example, if the recognition model recognizes the live image and determines that, of the anchor's overall body parts, the live image includes only the head, the live scene is determined to be a near-field scene.
  • In this embodiment, the association relationship between the first human body feature of the anchor in the live image and the second human body feature of the anchor in the real scene is determined first.
  • The association relationship determined in this way is highly reliable and accurate, so determining the live scene based on it improves the validity and accuracy of the determined live scene.
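The body-part variant can be sketched the same way. This is a hypothetical illustration: the label set `OVERALL_BODY_PARTS` and the `far_field_fraction` cut-off for "relatively many" parts are assumptions, because the text only gives the qualitative rule and the head-only example.

```python
from enum import Enum
from typing import Iterable


class LiveScene(Enum):
    FAR_FIELD = "far_field"
    NEAR_FIELD = "near_field"


# Hypothetical label set standing in for the anchor's "overall body parts".
OVERALL_BODY_PARTS = frozenset({"head", "torso", "arms", "legs"})


def scene_from_body_parts(first_body_parts: Iterable[str],
                          far_field_fraction: float = 0.5) -> LiveScene:
    """Classify the live scene from the body parts recognized in the live image."""
    # Ignore any labels outside the assumed overall body-part set.
    visible = frozenset(first_body_parts) & OVERALL_BODY_PARTS
    # The text's own example: only the head is visible -> near-field scene.
    if visible == {"head"}:
        return LiveScene.NEAR_FIELD
    fraction = len(visible) / len(OVERALL_BODY_PARTS)
    # Relatively many parts visible -> far-field; relatively few -> near-field.
    if fraction >= far_field_fraction:
        return LiveScene.FAR_FIELD
    return LiveScene.NEAR_FIELD
```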
  • S304 In response to the change of the live broadcast scene, switch the audio input and output of the live broadcast device according to the change of the live broadcast scene.
  • S304 may include the following embodiments:
  • Embodiment 1 If the live broadcast scene changes from a near-field scene to a far-field scene, switch the audio input of the live broadcast device to the microphone input of the live broadcast device.
  • For example, if the audio output of the live broadcast device is the earphone output, then when the live broadcast device determines that the live scene has changed from a near-field scene to a far-field scene, it can switch its audio output to its external speaker output.
  • Because the live broadcast device automatically switches its audio output from the earphone output to its external speaker output, the anchor can clearly hear the dance music played by the live broadcast device. This makes it easier for the anchor to dance, avoids the cumbersome operation of manual switching by the anchor, saves time, and improves the effectiveness and reliability of the live broadcast.
  • Embodiment 2 If the live broadcast scene changes from a near-field scene to a far-field scene, switch the audio output of the live broadcast device to the external output of the live broadcast device.
  • For example, if the audio input of the live broadcast device is the microphone input of the headset, then when the live broadcast device determines that the live scene has changed from a near-field scene to a far-field scene, it can switch its audio input to its own microphone input.
  • Because the live broadcast device automatically switches its audio input from the headset microphone to its own microphone, the anchor's voice is picked up by the microphone of the live broadcast device and delivered to the audience. This avoids the cumbersome operation of manual switching by the anchor, saves time, and improves the effectiveness and reliability of the live broadcast.
  • Embodiment 1 and Embodiment 2 may be two separate embodiments, and Embodiment 1 and Embodiment 2 may also be combined into one embodiment, which is not limited in this embodiment.
  • Embodiment 3 If the live broadcast scene changes from a far-field scene to a near-field scene, switch the audio output of the live broadcast device to the earphone output.
  • For example, if the audio output of the live broadcast device is its external speaker output, then when the live broadcast device determines that the live scene has changed from a far-field scene to a near-field scene, it can switch its audio output from the external speaker output to the output of the headphones connected to it.
  • Because the live broadcast device automatically switches its audio output from the external speaker output to the headphone output, interaction between the anchor and the audience becomes easier, which satisfies the audience's interactive experience and improves the effectiveness and reliability of the live broadcast.
  • Embodiment 4 If the live broadcast scene changes from a far-field scene to a near-field scene, switch the audio input of the live broadcast device to the microphone input of the earphone connected to the live broadcast device.
  • For example, if the audio input of the live broadcast device is its own microphone input, then when the live broadcast device determines that the live scene has changed from a far-field scene to a near-field scene, it can switch its audio input from its own microphone to the microphone of the headset connected to it.
  • the audio information of the anchor can be recorded relatively completely and clearly by the microphone of the headset connected to the live broadcast device, so as to satisfy the interactive experience of the audience and improve the reliability and accuracy of the live broadcast.
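Taking Embodiments 1 to 4 together (which the text permits, since they may be combined), the switching decision reduces to a small lookup from the scene transition to a target audio input and output. The sketch below is a hypothetical Python encoding of that mapping; the enum and class names are illustrative and not taken from the disclosure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LiveScene(Enum):
    FAR_FIELD = "far_field"
    NEAR_FIELD = "near_field"


class AudioInput(Enum):
    DEVICE_MICROPHONE = "device_microphone"
    HEADSET_MICROPHONE = "headset_microphone"


class AudioOutput(Enum):
    EXTERNAL_SPEAKER = "external_speaker"
    HEADPHONE = "headphone"


@dataclass(frozen=True)
class SwitchingInstruction:
    audio_input: AudioInput
    audio_output: AudioOutput


def switching_instruction(previous: LiveScene,
                          current: LiveScene) -> Optional[SwitchingInstruction]:
    """Map a scene transition to the target routing (hypothetical encoding)."""
    if previous is current:
        return None  # no scene change, no switch
    if current is LiveScene.FAR_FIELD:
        # Near-field -> far-field: device microphone in, external speaker out.
        return SwitchingInstruction(AudioInput.DEVICE_MICROPHONE,
                                    AudioOutput.EXTERNAL_SPEAKER)
    # Far-field -> near-field: headset microphone in, headphones out.
    return SwitchingInstruction(AudioInput.HEADSET_MICROPHONE,
                                AudioOutput.HEADPHONE)
```

Returning both the input and the output in one instruction mirrors the combined form; a device implementing only one of the embodiments would simply ignore the other field.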
  • FIG. 4 is a schematic diagram of a method for switching input and output of live audio according to another embodiment of the present disclosure.
  • the method includes:
  • the second recognition result is used to characterize the relative distance between the host and the live broadcast device.
  • For example, sample images containing the anchor during live broadcasts can be collected and labeled with the distance between the anchor and the live broadcast device (that is, the predetermined real distance between them). A preset neural network model is trained on the sample images and the labeled distances to obtain a prediction model that predicts the relative distance between the anchor and the live broadcast device.
  • When the live broadcast device acquires a live image, it may input the live image into the prediction model to obtain the second recognition result representing the relative distance.
  • S403 Determine the live broadcast scene according to the relative distance.
  • Determining the live scene based on the relative distance improves the reliability and accuracy of the determined live scene, so that when the audio input and output of the live broadcast device are switched based on the live scene, the switching is accurate and reliable while also being automatic.
  • If the relative distance is smaller than the second threshold, the live scene is a near-field scene.
  • If the relative distance is greater than or equal to the second threshold, the live scene is a far-field scene.
  • the second threshold can be set by the live broadcast device based on requirements, historical records, and experiments, which is not limited in this embodiment.
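For the distance-based variant, here is a minimal sketch of the final classification step. It assumes the prediction model has already produced the relative distance in metres and that distances below the second threshold mean a near-field scene. The unit and the function name are assumptions; the trained prediction model itself is out of scope here.

```python
from enum import Enum


class LiveScene(Enum):
    FAR_FIELD = "far_field"
    NEAR_FIELD = "near_field"


def scene_from_relative_distance(relative_distance_m: float,
                                 second_threshold_m: float) -> LiveScene:
    """Map the predicted anchor-to-device distance (second recognition result) to a scene."""
    if relative_distance_m < 0:
        raise ValueError("relative distance cannot be negative")
    # Closer than the second threshold -> near-field scene.
    if relative_distance_m < second_threshold_m:
        return LiveScene.NEAR_FIELD
    # At or beyond the second threshold -> far-field scene.
    return LiveScene.FAR_FIELD
```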
  • S404 In response to the change of the live broadcast scene, switch the audio input and output of the live broadcast device according to the change of the live broadcast scene.
  • the embodiments of the present disclosure provide a live broadcast device.
  • FIG. 5 is a schematic diagram of a live broadcast device according to an embodiment of the present disclosure.
  • the live broadcast device 500 includes:
  • the main control component 501 is used to obtain the live image of the anchor during the live broadcast, and determine the live scene of the anchor according to the live image, and the live scene includes a far-field scene and a near-field scene.
  • the main control component 501 is also used to generate a switch instruction according to the change of the live broadcast scene in response to the change of the live broadcast scene, and transmit the switch instruction to the audio processor, wherein the switch instruction is used to instruct to switch the audio input and output of the live broadcast device.
  • the audio processor 502 is configured to switch the audio input and output of the live broadcast device according to the switching instruction.
  • FIG. 6 is a schematic diagram of a live broadcast device according to another embodiment of the present disclosure.
  • the live broadcast device 600 includes:
  • The image collection device 601 is configured to collect live images of the anchor during the live broadcast and transmit the collected live images to the main control component 602.
  • the image acquisition device 601 is a device with an image acquisition function, such as a camera.
  • the main control component 602 is used to obtain the live image of the anchor during the live broadcast, and determine the live broadcast scene of the anchor according to the live image.
  • the live broadcast scene includes a far-field scene and a near-field scene.
  • The main control component 602 is also configured to, in response to a change of the live scene, generate a switching instruction according to the change and transmit it to the audio processor 603, where the switching instruction is used to instruct switching of the audio input and output of the live broadcast device 600.
  • For example, if the live scene changes from a near-field scene to a far-field scene, the main control component 602 may generate a switching instruction for switching the audio input of the live broadcast device 600 to the input of the microphone 604 of the live broadcast device 600; and/or,
  • the main control component 602 may generate a switching instruction for switching the audio output of the live broadcast device 600 to the external speaker output of the live broadcast device 600.
  • the external speaker output of the live broadcast device 600 may specifically be the speaker 605 output as shown in FIG. 6 .
  • If the live scene changes from a far-field scene to a near-field scene, the main control component 602 may generate a switching instruction for switching the audio input of the live broadcast device 600 to the microphone input of the headset connected to the live broadcast device 600; and/or,
  • the main control component 602 may generate a switching instruction for instructing to switch the audio output of the live broadcast device 600 to the headphone output connected to the live broadcast device 600 .
  • the earphone connected to the live broadcast device 600 is an earphone worn by the host.
  • the audio processor 603 is configured to switch the audio input and output of the live broadcast device 600 according to the switching instruction.
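The division of work in FIG. 6, where the main control component 602 decides and the audio processor 603 applies the switching instruction, could be modelled roughly as below. Everything here is a hypothetical sketch: the route names, the initial headset routing, and the injected `classify` callback (standing in for any of the scene-determination methods described earlier) are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical route names standing in for microphone 604, speaker 605,
# and the headset worn by the anchor.
DEVICE_MIC = "microphone_604"
HEADSET_MIC = "headset_microphone"
SPEAKER = "speaker_605"
HEADPHONE = "headset_earphone"


@dataclass(frozen=True)
class SwitchingInstruction:
    audio_input: str
    audio_output: str


class AudioProcessor:
    """Stands in for audio processor 603: applies the routing it is told to use."""

    def __init__(self) -> None:
        # Assumed initial routing; the text does not specify one.
        self.audio_input = HEADSET_MIC
        self.audio_output = HEADPHONE

    def apply(self, instruction: SwitchingInstruction) -> None:
        self.audio_input = instruction.audio_input
        self.audio_output = instruction.audio_output


class MainControlComponent:
    """Stands in for main control component 602."""

    def __init__(self, classify: Callable[[object], str],
                 audio_processor: AudioProcessor) -> None:
        self._classify = classify  # maps a live image to "far" or "near"
        self._audio_processor = audio_processor
        self._previous_scene: Optional[str] = None

    def on_live_image(self, live_image: object) -> Optional[SwitchingInstruction]:
        scene = self._classify(live_image)
        previous, self._previous_scene = self._previous_scene, scene
        if previous is None or previous == scene:
            return None  # no scene change, nothing to switch
        if scene == "far":
            instruction = SwitchingInstruction(DEVICE_MIC, SPEAKER)
        else:
            instruction = SwitchingInstruction(HEADSET_MIC, HEADPHONE)
        # "Transmit" the instruction to the audio processor.
        self._audio_processor.apply(instruction)
        return instruction
```

Because switching happens only on a scene change, the routing in effect before the first change is whatever the device started with; the text does not specify an initial state.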
  • the embodiments of the present disclosure further provide an input and output switching device applied to live audio.
  • FIG. 7 is a schematic diagram of an input and output switching device applied to live audio according to an embodiment of the present disclosure.
  • the switching device 700 applied to the input and output of live audio includes:
  • the obtaining unit 701 is configured to obtain the live image of the anchor during the live broadcast.
  • the determining unit 702 is configured to determine the live broadcast scene of the anchor according to the live broadcast image, and the live broadcast scene includes a far field scene and a near field scene.
  • the switching unit 703 is configured to switch the audio input and output of the live broadcast device according to the change of the live broadcast scene in response to the change of the live broadcast scene.
  • FIG. 8 is a schematic diagram of an input and output switching device applied to live audio according to another embodiment of the present disclosure.
  • the switching device 800 applied to the input and output of live audio includes:
  • the obtaining unit 801 is configured to obtain the live image of the anchor during the live broadcast.
  • the determining unit 802 is configured to determine the live broadcast scene of the anchor according to the live broadcast image, and the live broadcast scene includes a far field scene and a near field scene.
  • the determining unit 802 includes:
  • the identification subunit 8021 is configured to identify the live image to obtain a first identification result, wherein the first identification result is used to characterize the association relationship between a first human body feature of the anchor in the live image and a second human body feature of the anchor in the real scene;
  • the determining subunit 8022 is configured to determine the live broadcast scene according to the association relationship.
  • the identification subunit 8021 is configured to identify the live image to obtain a second identification result, wherein the second identification result is used to characterize the relative distance between the anchor and the live broadcast device;
  • the determination subunit 8022 is configured to determine the live broadcast scene according to the relative distance.
  • the switching unit 803 is configured to, in response to a change in the live broadcast scene, switch the audio input and output of the live broadcast device according to that change.
  • the present disclosure also provides an electronic device and a readable storage medium.
  • the present disclosure also provides a computer program product, the program product including a computer program stored in a readable storage medium; at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the electronic device executes the solution provided by any of the foregoing embodiments.
  • the electronic device 900 may be a terminal device or a server.
  • the terminal equipment may include but not limited to mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA for short), tablet computers (Portable Android Device, PAD for short), portable multimedia players (Portable Media Player, referred to as PMP), mobile terminals such as vehicle-mounted terminals (such as vehicle-mounted navigation terminals), and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 9 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • an electronic device 900 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 901, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903.
  • the RAM 903 also stores various programs and data necessary for the operation of the electronic device 900.
  • the processing device 901, ROM 902, and RAM 903 are connected to each other through a bus 904.
  • An input/output (Input/Output, I/O for short) interface 905 is also connected to the bus 904 .
  • an input device 906 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 907 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.
  • a storage device 908 including, for example, a magnetic tape, a hard disk, etc.
  • the communication means 909 may allow the electronic device 900 to perform wireless or wired communication with other devices to exchange data. While FIG. 9 shows electronic device 900 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via communication means 909, or from storage means 908, or from ROM 902.
  • when the computer program is executed by the processing device 901, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the computer readable medium can be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, radio frequency (Radio Frequency, RF for short), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to execute the methods shown in the above-mentioned embodiments.
  • Computer program code for carrying out the operations of the present disclosure can be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SOCs), complex programmable logic devices (CPLDs), etc.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • a method for switching input and output of audio applied to live broadcast including:
  • the live scene includes a far-field scene and a near-field scene
  • the audio input and output of the live broadcast device are switched according to the change of the live broadcast scene.
  • determining the live broadcast scene of the anchor according to the live image includes:
  • recognizing the live image to obtain a first recognition result, wherein the first recognition result is used to characterize the association relationship between the first human body feature of the anchor in the live image and the second human body feature of the anchor in the real scene;
  • the live broadcast scene is determined according to the association relationship.
  • the association relationship represents: a ratio of the first human body characteristic to the second human body characteristic.
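  • if the ratio is greater than a preset first threshold, the live scene is a far-field scene;
  • if the ratio is less than the first threshold, the live scene is a near-field scene.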
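  • if the live broadcast scene changes from a near-field scene to a far-field scene, switching the audio input and output of the live broadcast device according to the change includes: switching the audio input to the microphone input of the live broadcast device, and switching the audio output to the external output of the live broadcast device.
  • if the live broadcast scene changes from a far-field scene to a near-field scene, switching the audio input and output of the live broadcast device according to the change includes: switching the audio input to the microphone input of the earphone connected to the live broadcast device, and switching the audio output to the earphone output.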
  • after the first default prompt information corresponding to the input box is displayed in a target area outside the input box, the method further includes: if it is detected that no information has been entered in the input box and the input box has lost focus, canceling the display of the first default prompt information in the target area, and displaying the preset prompt information at the position of the input box.
  • determining the live broadcast scene of the anchor according to the live image includes:
  • the live broadcast scene is determined according to the relative distance.
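  • if the relative distance is less than a preset second threshold, the live broadcast scene is a near-field scene;
  • if the relative distance is greater than the second threshold, the live broadcast scene is a far-field scene.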
  • a live broadcast device including:
  • the main control component is used to obtain the live image of the anchor during the live broadcast, and determine the live scene of the anchor according to the live image, and the live scene includes a far field scene and a near field scene;
  • the main control component is also configured to, in response to the change of the live broadcast scene, generate a switching instruction according to the change of the live broadcast scene, and transmit the switching instruction to the audio processor, wherein the switching instruction is used to indicate Switch the audio input and output of the live broadcast device;
  • the audio processor is configured to switch the audio input and output of the live broadcast device according to the switching instruction.
  • the image collection device is used to collect the live broadcast images of the host during the live broadcast, and transmit the collected live broadcast images to the main control component.
  • the main control component is configured to identify the live image to obtain a first identification result, wherein the first identification result is used to represent the association relationship between the first human body feature of the anchor in the live image and the second human body feature of the anchor in the real scene, and to determine the live scene according to the association relationship.
  • the association relationship represents: a ratio of the first human body characteristic to the second human body characteristic.
  • if the ratio is greater than a preset first threshold, the live scene is a far-field scene;
  • if the ratio is less than the first threshold, the live scene is a near-field scene.
  • if the live scene changes from a near-field scene to a far-field scene, the switching instruction is used to indicate: switch the audio input of the live broadcast device to the microphone input of the live broadcast device, and switch the audio output of the live broadcast device to the external output of the live broadcast device.
  • if the live scene changes from a far-field scene to a near-field scene, the switching instruction is used to indicate: switch the audio input of the live broadcast device to the microphone input of the earphone connected to the live broadcast device, and switch the audio output of the live broadcast device to the earphone output.
  • the main control component is configured to identify the live image to obtain a second identification result, wherein the second identification result is used to characterize the relative distance between the anchor and the live broadcast device, and to determine the live broadcast scene according to the relative distance.
  • if the relative distance is less than a preset second threshold, the live broadcast scene is a near-field scene;
  • if the relative distance is greater than the second threshold, the live broadcast scene is a far-field scene.
  • an electronic device including: at least one processor and a memory;
  • the memory stores computer-executable instructions
  • the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method of the first aspect and its various possible implementations.
  • a computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the method of the first aspect and its various possible implementations.
  • a computer program product is provided.
  • when the computer program is executed by a processor, the method of the first aspect and its various possible implementations is implemented.
  • an input and output switching device applied to live audio including:
  • the acquisition unit is used to acquire the live image of the anchor during the live broadcast
  • a determining unit configured to determine the live broadcast scene of the anchor according to the live broadcast image, and the live broadcast scene includes a far-field scene and a near-field scene;
  • the switching unit is configured to, in response to a change in the live broadcast scene, switch the audio input and output of the live broadcast device according to that change.
  • the determining unit includes:
  • the identification subunit is configured to identify the live image to obtain a first identification result, wherein the first identification result is used to characterize the association relationship between the first human body feature of the anchor in the live image and the second human body feature of the anchor in the real scene;
  • the determining subunit is configured to determine the live broadcast scene according to the association relationship.
  • the association relationship represents: a ratio of the first human body characteristic to the second human body characteristic.
  • if the ratio is greater than a preset first threshold, the live scene is a far-field scene;
  • if the ratio is less than the first threshold, the live scene is a near-field scene.
  • if the live scene changes from a near-field scene to a far-field scene, the switching unit is configured to switch the audio input of the live broadcast device to the microphone input of the live broadcast device, and switch the audio output of the live broadcast device to the external output of the live broadcast device.
  • if the live scene changes from a far-field scene to a near-field scene, the switching unit is configured to switch the audio input of the live broadcast device to the microphone input of the earphone connected to the live broadcast device, and switch the audio output of the live broadcast device to the earphone output.
  • the determining unit includes:
  • the identification subunit is configured to identify the live broadcast image to obtain a second identification result, wherein the second identification result is used to represent the relative distance between the anchor and the live broadcast device;
  • the determining subunit is configured to determine the live broadcast scene according to the relative distance.
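  • if the relative distance is less than a preset second threshold, the live broadcast scene is a near-field scene;
  • if the relative distance is greater than the second threshold, the live broadcast scene is a far-field scene.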
  • a computer program is provided.
  • when the computer program is executed by a processor, the method of the first aspect and its various possible implementations is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Studio Devices (AREA)

Abstract

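Embodiments of the present disclosure provide a method for switching the input and output of audio applied to live broadcast, and a live broadcast device. The method includes: acquiring a live image of an anchor during a live broadcast, and determining the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene; and, in response to a change in the live scene, switching the audio input and output of the live broadcast device according to that change. This avoids the drawback in the related art that, whenever the live scene changes, the anchor must manually switch the audio input and output of the live broadcast device, which is cumbersome. It improves the automation of live broadcasting, meets the anchor's broadcasting needs, makes the broadcast as a whole smoother and more reliable, and meets the viewing needs of the audience.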

Description

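Method for switching the input and output of audio applied to live broadcast, and live broadcast device
This application claims priority to Chinese patent application No. 202110791411.7, filed with the Chinese Patent Office on July 13, 2021 and entitled "Method for switching the input and output of audio applied to live broadcast, and live broadcast device", the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of computer and network communication technologies, and in particular to a method for switching the input and output of audio applied to live broadcast, a live broadcast device, an electronic device, a readable storage medium, a computer program product, and a computer program.
Background
With the development of the Internet, live broadcasting has become a new performance trend. The performer in a live broadcast is called the anchor, and the device used for live broadcasting is called the live broadcast device; during a broadcast, the anchor can also communicate with the audience through the live broadcast device.
During a live broadcast, the audio input and output need to be switched between a far-field scene and a near-field scene. For example, in a far-field scene the audio output must support loudspeaker playback so that both the anchor and the audience can hear it, whereas in a near-field scene loudspeaker playback must be stopped. In the prior art, the anchor switches the audio input and output manually.
However, manual switching has poor timeliness and reliability, and it becomes particularly cumbersome when the anchor moves frequently between the far field and the near field.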
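Summary
Embodiments of the present disclosure provide a method for switching the input and output of audio applied to live broadcast, a live broadcast device, an electronic device, a readable storage medium, a computer program product, and a computer program, so as to eliminate the cumbersome operation of manual switching and avoid its poor timeliness and reliability.
In a first aspect, an embodiment of the present disclosure provides a method for switching the input and output of audio applied to live broadcast, including:
acquiring a live image of an anchor during a live broadcast, and determining a live scene of the anchor according to the live image, where the live scene includes a far-field scene and a near-field scene;
in response to a change in the live scene, switching the audio input and output of a live broadcast device according to the change in the live scene.
In a second aspect, an embodiment of the present disclosure provides a live broadcast device, including:
a main control component, configured to acquire a live image of an anchor during a live broadcast and determine a live scene of the anchor according to the live image, where the live scene includes a far-field scene and a near-field scene;
the main control component is further configured to, in response to a change in the live scene, generate a switching instruction according to the change and transmit the switching instruction to an audio processor, where the switching instruction instructs switching of the audio input and output of the live broadcast device;
the audio processor is configured to switch the audio input and output of the live broadcast device according to the switching instruction.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the method of the first aspect and its various possible implementations.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of the first aspect and its various possible implementations.
According to a fifth aspect of the present disclosure, a computer program product is provided. The program product includes a computer program stored in a readable storage medium; at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the electronic device performs the method of the first aspect.
According to a sixth aspect of the present disclosure, an apparatus for switching the input and output of audio applied to live broadcast is provided, including:
an acquisition unit, configured to acquire a live image of an anchor during a live broadcast;
a determining unit, configured to determine a live scene of the anchor according to the live image, where the live scene includes a far-field scene and a near-field scene;
a switching unit, configured to, in response to a change in the live scene, switch the audio input and output of a live broadcast device according to the change in the live scene.
According to a seventh aspect of the present disclosure, a computer program is provided which, when executed by a processor, implements the method of the first aspect and its various possible implementations.
The method for switching the input and output of audio applied to live broadcast and the live broadcast device provided by this embodiment include: acquiring a live image of an anchor during a live broadcast, and determining the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene; and, in response to a change in the live scene, switching the audio input and output of the live broadcast device according to that change. This embodiment introduces the technical feature of determining the live scene from the live image and, when the live scene changes, switching the audio input and output based on that change. This avoids the drawback in the related art that the anchor must manually switch the audio input and output of the live broadcast device whenever the live scene changes, which is cumbersome; it improves the automation of live broadcasting, meets the anchor's broadcasting needs, makes the broadcast as a whole smoother and more reliable, and meets the viewing needs of the audience.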
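Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scene of the method for switching the input and output of audio applied to live broadcast according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a method for switching the input and output of audio applied to live broadcast according to one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a method for switching the input and output of audio applied to live broadcast according to another embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a method for switching the input and output of audio applied to live broadcast according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a live broadcast device according to one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a live broadcast device according to another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an apparatus for switching the input and output of audio applied to live broadcast according to one embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an apparatus for switching the input and output of audio applied to live broadcast according to another embodiment of the present disclosure;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present disclosure.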
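With the development of Internet technology, live broadcasting has become familiar to and popular with more and more people.
FIG. 1 is a schematic diagram of a live broadcast scene. As shown in FIG. 1, an anchor 101 can broadcast using a live broadcast device 102, which may be the mobile phone shown in FIG. 1 or another electronic device; this embodiment imposes no limitation.
The live broadcast device 102 may be provided with a camera 103, which captures the live content of the anchor 101 and transmits it to user equipment 105 of an audience member 104, so that the audience member 104 can view the live content through the user equipment 105.
Likewise, the user equipment 105 may be the mobile phone shown in FIG. 1 or another electronic device; this embodiment imposes no limitation.
It should be noted that the above example merely illustrates an application scenario to which the live broadcasting of this embodiment may apply and should not be understood as limiting the scenario.
Depending on the distance between the anchor and the live broadcast device during a broadcast, two scenes can be distinguished: a far-field scene and a near-field scene.
A far-field scene is a live scene in which the anchor is relatively far from the live broadcast device, and a near-field scene is a live scene in which the anchor is relatively close to it.
For example, while the anchor is dancing, far-field broadcasting is more suitable, so that viewers can see the anchor's complete dance moves. When the anchor finishes dancing and begins interacting with the audience, near-field broadcasting is more suitable, because it brings the anchor closer to the audience, strengthens the interaction, and improves the audience's interactive experience.
In the related art, when the live scene switches, the anchor has to switch the audio input and output of the live broadcast device manually to keep the broadcast reliable and meet the audience's expectations.
For example, continuing the dance example, when the live scene switches from the near-field scene (the anchor interacting with the audience) to the far-field scene (the anchor dancing), the audio output of the live broadcast device needs to be set to the device's external output, specifically its speaker, so that the anchor can hear the music for the dance; the anchor therefore manually sets the audio output to the device's external output.
When the anchor finishes dancing and the scene switches from far field to near field, the audio output needs to be set to headphone output, to prevent the audience from hearing their own interactive audio picked up again by the live broadcast device; the anchor therefore manually sets the audio output to the headphones connected to the live broadcast device.
It should be understood that the above example only uses the anchor's dancing (i.e., dance as the live content) to describe how audio input and output are switched in the related art and should not be understood as limiting the live content.
To solve at least one of the above problems in the related art, the inventors of the present disclosure arrived, through creative work, at the inventive concept of the present disclosure: determining the live scene from live images of the anchor during the broadcast, so that the audio input and output of the live broadcast device are switched automatically based on changes in the live scene.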
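Referring to FIG. 2, FIG. 2 is a schematic diagram of a method for switching the input and output of audio applied to live broadcast according to one embodiment of the present disclosure.
As shown in FIG. 2, the method includes:
S201: Acquire a live image of the anchor during the live broadcast, and determine the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene.
Illustratively, the executing entity of this embodiment may be a live broadcast device, i.e., a device used to carry out live broadcasting; this embodiment does not limit its type, style, or shape.
Here, a live image is an image of the anchor acquired during the live broadcast.
The live image may be acquired as follows:
An image acquisition apparatus may be provided on the live broadcast device. For example, when the method of this embodiment is applied to the scenario shown in FIG. 1, the image acquisition apparatus may be the camera shown in FIG. 1, which captures images of the anchor during the broadcast to obtain live images containing the anchor.
S202: In response to a change in the live scene, switch the audio input and output of the live broadcast device according to the change.
This step can be understood as follows: based on the determined live scene, the live broadcast device determines whether the live scene has changed and, if it has, switches its audio input and output based on that change.
Illustratively, when the live broadcast device determines that the live scene has changed, it may generate a switching instruction based on the change and switch its audio input and output according to the switching instruction.
Specifically, the live broadcast device may determine the live scene at a preset time interval and check whether the current live scene is the same as the previous one. If they differ, the live scene has changed. For example, if the current live scene is a far-field scene and the previous one was a near-field scene, the live broadcast device may generate a switching instruction so that its audio input and output are switched automatically.
Conversely, if the live broadcast device detects that the current live scene is the same as the previous one, its audio input and output do not need to be switched.
The preset time interval may be determined by the live broadcast device based on requirements, historical records, experiments, or the like; this embodiment imposes no limitation.
In other embodiments, the live broadcast device may determine the live scene in real time, for example by examining every frame of live image captured by the image acquisition apparatus and comparing the live scene of the current frame with that of the previous frame. If the two frames have different live scenes, the live broadcast device may generate a switching instruction so that its audio input and output are switched automatically.
Conversely, if the live broadcast device detects that the current frame and the previous frame have the same live scene, its audio input and output do not need to be switched.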
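Both the interval-based and the per-frame variant amount to remembering the last determined scene and reporting a change only when the new one differs. The Python sketch below illustrates that comparison under stated assumptions: the names `LiveScene`, `SceneChangeDetector`, and `update` are hypothetical, and the caller is assumed to invoke `update` once per preset interval or once per captured frame. The disclosure does not say what happens on the very first determination; this sketch simply treats it as "no change".

```python
from enum import Enum
from typing import Optional, Tuple


class LiveScene(Enum):
    NEAR_FIELD = "near_field"
    FAR_FIELD = "far_field"


class SceneChangeDetector:
    """Hypothetical helper: compares each newly determined scene with the previous one."""

    def __init__(self) -> None:
        self._previous: Optional[LiveScene] = None

    def update(self, current: LiveScene) -> Optional[Tuple[LiveScene, LiveScene]]:
        """Return (previous, current) if the scene changed, otherwise None.

        Intended to be called once per preset time interval or once per frame.
        """
        previous, self._previous = self._previous, current
        if previous is None or previous is current:
            # First determination, or same scene as last time: no switching needed.
            return None
        return previous, current
```

Feeding the sequence near, near, far, far, near yields `None`, `None`, `(NEAR_FIELD, FAR_FIELD)`, `None`, `(FAR_FIELD, NEAR_FIELD)`; only the two non-`None` results would lead to a switching instruction.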
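As the above analysis shows, an embodiment of the present disclosure provides a method for switching the input and output of audio applied to live broadcast, including: acquiring a live image of the anchor during the live broadcast, and determining the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene; and, in response to a change in the live scene, switching the audio input and output of the live broadcast device according to that change. This embodiment introduces the technical feature of determining the live scene from the live image and, when the live scene changes, switching the audio input and output based on that change. This avoids the drawback in the related art that the anchor must manually switch the audio input and output of the live broadcast device whenever the live scene changes, which is cumbersome; it improves the automation of live broadcasting, meets the anchor's broadcasting needs, makes the broadcast as a whole smoother and more reliable, and meets the viewing needs of the audience.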
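Referring to FIG. 3, FIG. 3 is a schematic diagram of a method for switching the input and output of audio applied to live broadcast according to another embodiment of the present disclosure.
As shown in FIG. 3, the method includes:
S301: Acquire a live image of the anchor during the live broadcast.
Illustratively, for the implementation principle of S301, refer to the above embodiment; details are not repeated here.
S302: Recognize the live image to obtain a first recognition result.
The first recognition result characterizes the association relationship between a first human body feature of the anchor in the live image and a second human body feature of the anchor in the real scene.
In some embodiments, a recognition model for recognizing human body features may be built and used to recognize the live image, yielding the first recognition result, i.e., the anchor's human body feature in the live image (the first human body feature).
In one example, the first human body feature may be a first body area; for instance, the recognition model may recognize the anchor's body area in the live image. The live broadcast device stores a second body area of the anchor in the real scene, and the first recognition result characterizes the association relationship between the first body area and the second body area.
In another example, the first human body feature may be a first body part of the anchor in the live image. For instance, the recognition model determines that the live image contains the anchor's head, and the first recognition result characterizes the association relationship between the first body part and the anchor's whole body in the real scene.
S303: Determine the live scene according to the association relationship.
Continuing the above examples, in one example the association relationship may be the ratio of the first body area to the second body area, i.e., the anchor's body area in the live image as a proportion of the anchor's body area in the real scene.
Illustratively, if the ratio is greater than a preset first threshold, the live scene is a far-field scene; conversely, if the ratio is less than the first threshold, the live scene is a near-field scene.
The first threshold may be set by the live broadcast device based on requirements, historical records, experiments, or the like; this embodiment imposes no limitation.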
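Generally, if the ratio is relatively small, i.e., the first body area is relatively small (for example, only part of the anchor's body fits in the frame), the anchor is relatively close to the live broadcast device, and the live scene is determined to be a near-field scene.
Conversely, if the ratio is relatively large, i.e., the first body area is relatively large (for example, the anchor's whole body fits in the frame), the anchor is relatively far from the live broadcast device, and the live scene is determined to be a far-field scene.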
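The ratio rule of S303 can be sketched in a few lines of Python. This assumes both body areas are already available as numbers in comparable units (for example, normalized pixel areas); the function name and the threshold value used below are hypothetical, since the disclosure leaves the first threshold to be set from requirements, history, or experiments.

```python
from enum import Enum


class LiveScene(Enum):
    NEAR_FIELD = "near_field"
    FAR_FIELD = "far_field"


def scene_from_body_area_ratio(image_body_area: float,
                               reference_body_area: float,
                               first_threshold: float) -> LiveScene:
    """Classify the live scene from the ratio of the first to the second body area.

    image_body_area: anchor's body area recognized in the live image (first feature).
    reference_body_area: stored body area of the anchor in the real scene (second
        feature), assumed to be in the same units as image_body_area.
    first_threshold: preset first threshold (hypothetical value chosen by the implementer).
    """
    if reference_body_area <= 0:
        raise ValueError("reference_body_area must be positive")
    ratio = image_body_area / reference_body_area
    # Ratio above the threshold -> far-field; otherwise near-field.
    # The disclosure leaves ratio == threshold open; this sketch treats it as near-field.
    return LiveScene.FAR_FIELD if ratio > first_threshold else LiveScene.NEAR_FIELD
```

With an illustrative threshold of 0.6, a ratio of 0.9 is classified as far-field and a ratio of 0.2 as near-field.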
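It should be noted that in this embodiment, determining the live scene from the ratio of the anchor's first human body feature in the live image to the anchor's second human body feature in the real scene gives the determined live scene high reliability and accuracy.
In another example, the association relationship may be the relationship between the first body part and the whole body; specifically, for instance, the recognition result contains the head out of all the parts of the whole body.
Generally, if the first body part covers relatively many of the parts of the whole body, the live scene may be determined to be a far-field scene; conversely, if it covers relatively few, the live scene may be determined to be a near-field scene.
For example, if the recognition model determines that the live image shows the anchor's head (rather than the whole body), the live scene is determined to be a near-field scene.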
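The body-part variant can likewise be sketched as counting how many of the whole body's parts are visible. Everything concrete here is an assumption for illustration: the part labels in `WHOLE_BODY_PARTS` are hypothetical outputs of a recognition model, and the cut-off `min_parts_for_far_field` stands in for the unspecified notion of "relatively many" parts.

```python
from enum import Enum
from typing import Iterable


class LiveScene(Enum):
    NEAR_FIELD = "near_field"
    FAR_FIELD = "far_field"


# Hypothetical part labels that a recognition model might emit for the whole body.
WHOLE_BODY_PARTS = frozenset({"head", "torso", "arms", "legs", "feet"})


def scene_from_visible_parts(visible_parts: Iterable[str],
                             min_parts_for_far_field: int = 3) -> LiveScene:
    """More of the whole body in frame -> far-field; fewer -> near-field."""
    # Ignore any labels that are not part of the assumed whole-body vocabulary.
    detected = WHOLE_BODY_PARTS.intersection(visible_parts)
    if len(detected) >= min_parts_for_far_field:
        return LiveScene.FAR_FIELD
    return LiveScene.NEAR_FIELD
```

A frame showing only `{"head"}` is classified as near-field, matching the example above, while a frame showing head, torso, arms, and legs is classified as far-field.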
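It should be noted that in this embodiment, using the first recognition result of the live image to determine the association relationship between the anchor's first human body feature in the live image and the anchor's second human body feature in the real scene, and then determining the live scene from that relationship, makes the determined association relationship highly reliable and accurate, which in turn improves the validity and accuracy of the live scene determined from it.
S304: In response to a change in the live scene, switch the audio input and output of the live broadcast device according to the change.
Illustratively, for a description of S304, refer to the above embodiment; details are not repeated here.
In some embodiments, S304 may include the following embodiments: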
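Embodiment 1: If the live scene changes from a near-field scene to a far-field scene, switch the audio output of the live broadcast device to the external output of the live broadcast device.
For example, if the live scene is a near-field scene and the audio output of the live broadcast device is the headphone output, then when the live broadcast device determines that the live scene has changed from near-field to far-field, it may switch its audio output to its external output.
Continuing the dance example, with this change of live scene the live broadcast device automatically switches its audio output from the headphone output to its external output, so the anchor can clearly hear the dance music played through the device's external output. This gives the anchor better conditions for dancing, avoids cumbersome manual switching, saves time, and improves the effectiveness and reliability of the broadcast.
Embodiment 2: If the live scene changes from a near-field scene to a far-field scene, switch the audio input of the live broadcast device to the microphone input of the live broadcast device.
As another example, if the live scene is a near-field scene and the audio input of the live broadcast device is the headphone microphone input, then when the live broadcast device determines that the live scene has changed from near-field to far-field, it may switch its audio input to its own microphone input.
Continuing the dance example, with this change of live scene the live broadcast device automatically switches its audio input from the headphone microphone to its own microphone, so that the anchor's voice reaches the audience through the device's microphone. This avoids cumbersome manual switching, saves time, and improves the effectiveness and reliability of the broadcast.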
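It should be noted that Embodiment 1 and Embodiment 2 may be two separate embodiments or may be combined into one embodiment; this embodiment imposes no limitation.
Embodiment 3: If the live scene changes from a far-field scene to a near-field scene, switch the audio output of the live broadcast device to the headphone output.
For example, if the live scene is a far-field scene and the audio output of the live broadcast device is its external output, then when the live broadcast device determines that the live scene has changed from far-field to near-field, it may switch its audio output from the external output to the output of the headphones connected to the live broadcast device.
Continuing the dance example, with this change of live scene the live broadcast device automatically switches its audio output from the external output to the headphone output, which facilitates interaction between the anchor and the audience, meets the audience's interactive needs, and improves the effectiveness and reliability of the broadcast.
Embodiment 4: If the live scene changes from a far-field scene to a near-field scene, switch the audio input of the live broadcast device to the microphone input of the headphones connected to the live broadcast device.
For example, if the live scene is a far-field scene and the audio input of the live broadcast device is its own microphone input, then when the live broadcast device determines that the live scene has changed from far-field to near-field, it may switch its audio input from its own microphone to the microphone of the headphones connected to the live broadcast device.
Likewise, with the solution of this embodiment, the anchor's audio can be captured relatively completely and clearly by the microphone of the headphones connected to the live broadcast device, meeting the audience's interactive needs and improving the reliability and accuracy of the broadcast.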
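Embodiments 1 to 4 together reduce to a lookup from the new scene to a target routing. The sketch below is one possible encoding, not the disclosure's implementation: the enum and function names are hypothetical, and it always switches input and output together, which corresponds to combining Embodiments 1 and 2 (and 3 and 4) rather than the "and/or" alternatives.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LiveScene(Enum):
    NEAR_FIELD = "near_field"
    FAR_FIELD = "far_field"


class AudioInput(Enum):
    DEVICE_MICROPHONE = "device_microphone"
    HEADSET_MICROPHONE = "headset_microphone"


class AudioOutput(Enum):
    DEVICE_SPEAKER = "device_speaker"  # the device's external output
    HEADSET = "headset"


@dataclass(frozen=True)
class AudioRoute:
    audio_input: AudioInput
    audio_output: AudioOutput


# Target routing per scene, following Embodiments 1 to 4.
ROUTE_FOR_SCENE = {
    LiveScene.FAR_FIELD: AudioRoute(AudioInput.DEVICE_MICROPHONE, AudioOutput.DEVICE_SPEAKER),
    LiveScene.NEAR_FIELD: AudioRoute(AudioInput.HEADSET_MICROPHONE, AudioOutput.HEADSET),
}


def route_for_change(old: LiveScene, new: LiveScene) -> Optional[AudioRoute]:
    """Return the route to switch to, or None if the scene did not change."""
    if old is new:
        return None
    return ROUTE_FOR_SCENE[new]
```

Keeping the mapping in a table rather than in branching code makes it easy to switch only the input or only the output, as the separate embodiments allow, by editing one entry.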
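Referring to FIG. 4, FIG. 4 is a schematic diagram of a method for switching the input and output of audio applied to live broadcast according to another embodiment of the present disclosure.
As shown in FIG. 4, the method includes:
S401: Acquire a live image of the anchor during the live broadcast.
Illustratively, for the implementation principle of S401, refer to the above embodiment; details are not repeated here.
S402: Recognize the live image to obtain a second recognition result.
Illustratively, the second recognition result characterizes the relative distance between the anchor and the live broadcast device.
In some embodiments, sample images containing the anchor during live broadcasts may be collected, and a preset neural network model may be trained on the sample images together with the labeled distances between the anchor and the live broadcast device (i.e., the predetermined true distances), yielding a prediction model for predicting the relative distance between the anchor and the live broadcast device.
Accordingly, in this embodiment, when the live broadcast device acquires a live image, it can input the live image into the prediction model to obtain the second recognition result characterizing the relative distance.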
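The disclosure trains a preset neural network on labeled sample images; a neural network does not fit in a short example, so the sketch below deliberately swaps in a much simpler technique to show the same train-then-predict flow: an ordinary least-squares fit of distance ≈ a / h + b. It assumes a hypothetical single image feature `h`, the anchor's bounding-box height as a fraction of the frame height, and relies on the pinhole-camera approximation that apparent height falls roughly as 1/distance. It is not the disclosure's model.

```python
from typing import Sequence, Tuple


def fit_inverse_height_model(box_heights: Sequence[float],
                             labeled_distances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of distance ~= a / h + b from labeled samples.

    box_heights: hypothetical feature per sample image (bounding-box height / frame height).
    labeled_distances: labeled true distances between anchor and device for those samples.
    """
    if len(box_heights) != len(labeled_distances) or len(box_heights) < 2:
        raise ValueError("need at least two paired samples")
    xs = [1.0 / h for h in box_heights]  # regress on 1/h (pinhole approximation)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(labeled_distances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("box heights must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, labeled_distances))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_distance(model: Tuple[float, float], box_height: float) -> float:
    """Predict the relative distance (second recognition result) for a new live image."""
    a, b = model
    return a / box_height + b
```

For instance, training on heights 0.8, 0.6, 0.4, 0.3 labeled 1.5, 2.0, 3.0, 4.0 recovers a ≈ 1.2 and b ≈ 0, so a box height of 0.5 predicts a distance of about 2.4 in the same unit as the labels.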
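S403: Determine the live scene according to the relative distance.
It should be noted that in this embodiment, determining the relative distance between the anchor and the live broadcast device from the live image and then determining the live scene from that distance improves the reliability and accuracy of the determined live scene, so that switching the device's audio input and output based on the live scene is automatic as well as accurate and reliable.
If the relative distance is less than a preset second threshold, the live scene is a near-field scene; if the relative distance is greater than the second threshold, the live scene is a far-field scene.
Likewise, the second threshold may be set by the live broadcast device based on requirements, historical records, experiments, or the like; this embodiment imposes no limitation.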
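The distance rule of S403 is a single comparison. In this minimal sketch the function name is hypothetical, both values are assumed to share a unit (for example, metres), and the case of distance equal to the second threshold, which the disclosure leaves open, is treated as far-field.

```python
from enum import Enum


class LiveScene(Enum):
    NEAR_FIELD = "near_field"
    FAR_FIELD = "far_field"


def scene_from_distance(relative_distance: float, second_threshold: float) -> LiveScene:
    """Relative distance below the second threshold -> near-field, otherwise far-field.

    Both arguments are assumed to use the same unit. Equality is treated as far-field,
    which is a choice of this sketch rather than something the disclosure specifies.
    """
    if relative_distance < 0:
        raise ValueError("relative_distance must be non-negative")
    if relative_distance < second_threshold:
        return LiveScene.NEAR_FIELD
    return LiveScene.FAR_FIELD
```

With an illustrative second threshold of 1.5, a predicted distance of 0.8 maps to the near-field scene and 3.0 to the far-field scene.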
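S404: In response to a change in the live scene, switch the audio input and output of the live broadcast device according to the change.
Illustratively, for the implementation principle of S404, refer to the above embodiment; details are not repeated here.
According to another aspect of the embodiments of the present disclosure, an embodiment of the present disclosure provides a live broadcast device.
Referring to FIG. 5, FIG. 5 is a schematic diagram of a live broadcast device according to one embodiment of the present disclosure.
As shown in FIG. 5, the live broadcast device 500 includes:
a main control component 501, configured to acquire a live image of the anchor during the live broadcast and determine the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene.
The main control component 501 is further configured to, in response to a change in the live scene, generate a switching instruction according to the change and transmit the switching instruction to the audio processor, where the switching instruction instructs switching of the audio input and output of the live broadcast device.
an audio processor 502, configured to switch the audio input and output of the live broadcast device according to the switching instruction.
Referring to FIG. 6, FIG. 6 is a schematic diagram of a live broadcast device according to another embodiment of the present disclosure.
As shown in FIG. 6, the live broadcast device 600 includes:
an image acquisition apparatus 601, configured to capture live images of the anchor during the live broadcast and transmit the captured live images to the main control component 602.
The image acquisition apparatus 601 is an apparatus with an image capture function, such as a camera.
a main control component 602, configured to acquire a live image of the anchor during the live broadcast and determine the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene.
For the principle by which the main control component 602 determines the live scene, refer to the description in the above embodiments; details are not repeated here.
The main control component 602 is further configured to, in response to a change in the live scene, generate a switching instruction according to the change and transmit the switching instruction to the audio processor 603, where the switching instruction instructs switching of the audio input and output of the live broadcast device 600.
In one example, if the main control component 602 determines that the live scene has changed from a near-field scene to a far-field scene, the main control component 602 may generate a switching instruction instructing that the audio input of the live broadcast device 600 be switched to the input of the microphone 604 of the live broadcast device 600; and/or,
the main control component 602 may generate a switching instruction instructing that the audio output of the live broadcast device 600 be switched to the external output of the live broadcast device 600. The external output of the live broadcast device 600 may specifically be the output of the speaker 605 shown in FIG. 6.
In another example, if the main control component 602 determines that the live scene has changed from a far-field scene to a near-field scene, the main control component 602 may generate a switching instruction instructing that the audio input of the live broadcast device 600 be switched to the microphone input of the headphones connected to the live broadcast device 600; and/or,
the main control component 602 may generate a switching instruction instructing that the audio output of the live broadcast device 600 be switched to the output of the headphones connected to the live broadcast device 600.
The headphones connected to the live broadcast device 600 are headphones worn by the anchor.
an audio processor 603, configured to switch the audio input and output of the live broadcast device 600 according to the switching instruction.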
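The division of labour in FIG. 6 (the main control component decides and emits a switching instruction, the audio processor applies it) can be sketched as two cooperating classes. All class and method names are hypothetical stand-ins for components 602 and 603; `determine_scene` is an assumed callable wrapping whichever scene-determination method is used, and the initial headset routing is an assumption the disclosure does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List, Optional


class LiveScene(Enum):
    NEAR_FIELD = "near_field"
    FAR_FIELD = "far_field"


class AudioInput(Enum):
    DEVICE_MICROPHONE = "device_microphone"  # e.g. microphone 604
    HEADSET_MICROPHONE = "headset_microphone"


class AudioOutput(Enum):
    DEVICE_SPEAKER = "device_speaker"  # e.g. speaker 605
    HEADSET = "headset"


@dataclass(frozen=True)
class SwitchInstruction:
    audio_input: AudioInput
    audio_output: AudioOutput


class AudioProcessor:
    """Stand-in for audio processor 603: applies switching instructions."""

    def __init__(self) -> None:
        # Assumed initial routing: headset (near-field) when the broadcast starts.
        self.audio_input = AudioInput.HEADSET_MICROPHONE
        self.audio_output = AudioOutput.HEADSET
        self.applied: List[SwitchInstruction] = []

    def apply(self, instruction: SwitchInstruction) -> None:
        self.audio_input = instruction.audio_input
        self.audio_output = instruction.audio_output
        self.applied.append(instruction)


class MainControlComponent:
    """Stand-in for main control component 602."""

    def __init__(self, audio_processor: AudioProcessor,
                 determine_scene: Callable[[Any], LiveScene]) -> None:
        self._audio_processor = audio_processor
        self._determine_scene = determine_scene  # live image -> LiveScene
        self._previous: Optional[LiveScene] = None

    def on_live_image(self, live_image: Any) -> Optional[SwitchInstruction]:
        """Handle one live image from the image acquisition apparatus (601)."""
        scene = self._determine_scene(live_image)
        previous, self._previous = self._previous, scene
        if previous is None or previous is scene:
            return None  # no change in live scene, so no instruction
        if scene is LiveScene.FAR_FIELD:
            instruction = SwitchInstruction(AudioInput.DEVICE_MICROPHONE, AudioOutput.DEVICE_SPEAKER)
        else:
            instruction = SwitchInstruction(AudioInput.HEADSET_MICROPHONE, AudioOutput.HEADSET)
        self._audio_processor.apply(instruction)
        return instruction
```

Because only changes trigger an instruction, the first determined scene leaves the assumed initial routing untouched; a real device would likely apply the route for the first scene explicitly.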
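According to another aspect of the embodiments of the present disclosure, an embodiment of the present disclosure further provides an apparatus for switching the input and output of audio applied to live broadcast.
Referring to FIG. 7, FIG. 7 is a schematic diagram of an apparatus for switching the input and output of audio applied to live broadcast according to one embodiment of the present disclosure.
As shown in FIG. 7, the apparatus 700 for switching the input and output of audio applied to live broadcast includes:
an obtaining unit 701, configured to acquire a live image of the anchor during the live broadcast;
a determining unit 702, configured to determine the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene;
a switching unit 703, configured to, in response to a change in the live scene, switch the audio input and output of the live broadcast device according to that change.
Referring to FIG. 8, FIG. 8 is a schematic diagram of an apparatus for switching the input and output of audio applied to live broadcast according to another embodiment of the present disclosure.
As shown in FIG. 8, the apparatus 800 for switching the input and output of audio applied to live broadcast includes:
an obtaining unit 801, configured to acquire a live image of the anchor during the live broadcast;
a determining unit 802, configured to determine the anchor's live scene according to the live image, where the live scene includes a far-field scene and a near-field scene.
As can be seen from FIG. 8, in some embodiments the determining unit 802 includes:
an identification subunit 8021, configured to recognize the live image to obtain a first recognition result, where the first recognition result characterizes the association relationship between the anchor's first human body feature in the live image and the anchor's second human body feature in the real scene;
a determining subunit 8022, configured to determine the live scene according to the association relationship.
In other embodiments, the identification subunit 8021 is configured to recognize the live image to obtain a second recognition result, where the second recognition result characterizes the relative distance between the anchor and the live broadcast device;
the determining subunit 8022 is configured to determine the live scene according to the relative distance.
a switching unit 803, configured to, in response to a change in the live scene, switch the audio input and output of the live broadcast device according to that change.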
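Because the determining unit 802 is split into an identification subunit and a determining subunit, both embodiments (association relationship and relative distance) can share one structure and differ only in the two plugged-in functions. The sketch below shows that composition; the class name is hypothetical, and the dictionaries passed as "live images" with precomputed fields, as well as the thresholds 0.6 and 1.5, are illustrative stand-ins for real recognition models and tuned thresholds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Generic, TypeVar

R = TypeVar("R")  # type of the recognition result


class LiveScene(Enum):
    NEAR_FIELD = "near_field"
    FAR_FIELD = "far_field"


@dataclass(frozen=True)
class DeterminingUnit(Generic[R]):
    """Determining unit 802 composed of subunits 8021 (identify) and 8022 (decide)."""

    identify: Callable[[Any], R]       # subunit 8021: live image -> recognition result
    decide: Callable[[R], LiveScene]   # subunit 8022: recognition result -> live scene

    def determine(self, live_image: Any) -> LiveScene:
        return self.decide(self.identify(live_image))


# Hypothetical stand-ins: a real system would run recognition models on the image.
ratio_based = DeterminingUnit(
    identify=lambda image: image["body_area"] / image["reference_area"],
    decide=lambda ratio: LiveScene.FAR_FIELD if ratio > 0.6 else LiveScene.NEAR_FIELD,
)
distance_based = DeterminingUnit(
    identify=lambda image: image["distance_m"],
    decide=lambda d: LiveScene.NEAR_FIELD if d < 1.5 else LiveScene.FAR_FIELD,
)
```

Either instance could be handed to a switching unit unchanged, which mirrors how FIG. 8 lets the same subunits serve both the first-recognition-result and second-recognition-result embodiments.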
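According to embodiments of the present disclosure, the present disclosure further provides an electronic device and a readable storage medium.
According to embodiments of the present disclosure, the present disclosure further provides a computer program product. The program product includes a computer program stored in a readable storage medium; at least one processor of an electronic device can read the computer program from the readable storage medium, and the at least one processor executes the computer program so that the electronic device performs the solution provided by any of the above embodiments.
Referring to FIG. 9, which shows a schematic structural diagram of an electronic device 900 suitable for implementing embodiments of the present disclosure, the electronic device 900 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (e.g., vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 9 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 9, the electronic device 900 may include a processing apparatus (e.g., a central processing unit or a graphics processing unit) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage apparatus 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device 900. The processing apparatus 901, the ROM 902, and the RAM 903 are connected to one another through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Generally, the following apparatuses may be connected to the I/O interface 905: an input apparatus 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; an output apparatus 907 including, for example, a liquid crystal display (LCD), speaker, and vibrator; a storage apparatus 908 including, for example, a magnetic tape and a hard disk; and a communication apparatus 909. The communication apparatus 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 9 shows the electronic device 900 with various apparatuses, it should be understood that it is not required to implement or have all the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 909, installed from the storage apparatus 908, or installed from the ROM 902. When the computer program is executed by the processing apparatus 901, the above functions defined in the methods of the embodiments of the present disclosure are performed.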
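It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program usable by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to an electrical wire, an optical cable, radio frequency (RF), or any suitable combination of the above.
The computer-readable medium may be included in the electronic device described above, or may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first obtaining unit may also be described as "a unit for obtaining at least two Internet Protocol addresses".
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the above. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.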
第一方面,根据本公开的一个或多个实施例,提供了一种应用于直播的音频的输入输出的切换方法,包括:
获取主播在直播时的直播图像,并根据所述直播图像确定所述主播的直播场景,所述直播场景包括远场场景和近场场景;
响应于所述直播场景的变化,根据所述直播场景的变化切换直播设备的音频的输入输出。
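According to one or more embodiments of the present disclosure, determining the live broadcast scene of the anchor according to the live image includes:
recognizing the live image to obtain a first recognition result, where the first recognition result represents an association between a first human-body feature of the anchor in the live image and a second human-body feature of the anchor in the real scene;
determining the live broadcast scene according to the association.
According to one or more embodiments of the present disclosure, the association represents the ratio of the first human-body feature to the second human-body feature.
According to one or more embodiments of the present disclosure, if the ratio is greater than a preset first threshold, the live broadcast scene is a far-field scene;
if the ratio is less than the first threshold, the live broadcast scene is a near-field scene.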
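One way to realize the ratio test is sketched below in Python, under stated assumptions: the "first human-body feature" is approximated by the number of body keypoints a pose estimator detects in the live image, and the "second human-body feature" by the 17 keypoints that make up a full body in the COCO keypoint format. The function name, this keypoint proxy, and the example threshold of 0.6 are illustrative only. The text does not specify the case where the ratio equals the first threshold, so the sketch returns `None` there and leaves the decision to the caller.

```python
from enum import Enum
from typing import Optional


class Scene(Enum):
    NEAR_FIELD = "near"
    FAR_FIELD = "far"


# Assumed proxy: the COCO pose format defines 17 body keypoints, so a fully
# visible body would yield 17 detected keypoints.
FULL_BODY_KEYPOINTS = 17


def classify_by_ratio(visible: float, full: float, first_threshold: float) -> Optional[Scene]:
    """Compare (body feature in image) / (body feature in reality) to a threshold."""
    if full <= 0:
        raise ValueError("full-body measure must be positive")
    ratio = visible / full
    if ratio > first_threshold:
        return Scene.FAR_FIELD   # most of the body is in frame: anchor is farther away
    if ratio < first_threshold:
        return Scene.NEAR_FIELD  # only part of the body is in frame: anchor is close
    return None  # ratio equals the threshold: behavior not specified in the text
```

For example, with 15 of 17 keypoints visible the ratio is about 0.88, which exceeds a threshold of 0.6 and yields a far-field scene; with only 5 visible (about 0.29) the result is near-field.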
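According to one or more embodiments of the present disclosure, if the live broadcast scene changes from a near-field scene to a far-field scene, switching the audio input and output of the live broadcast device according to the change in the live broadcast scene includes:
switching the audio input of the live broadcast device to the microphone input of the live broadcast device, and switching the audio output of the live broadcast device to the loudspeaker output of the live broadcast device.
According to one or more embodiments of the present disclosure, if the live broadcast scene changes from a far-field scene to a near-field scene, switching the audio input and output of the live broadcast device according to the change in the live broadcast scene includes:
switching the audio input of the live broadcast device to the microphone input of a headset connected to the live broadcast device, and switching the audio output of the live broadcast device to the headset output.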
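The two switching rules amount to a small mapping from scene transition to audio route, sketched here in Python. The endpoint strings (`device_microphone`, `headset_output`, and so on) and the `AudioRoute` type are hypothetical placeholders for whatever identifiers a device's audio stack actually uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Scene(Enum):
    NEAR_FIELD = "near"
    FAR_FIELD = "far"


@dataclass(frozen=True)
class AudioRoute:
    """Audio input source and output sink selected for the live broadcast device."""
    input_source: str
    output_sink: str


# Hypothetical endpoint identifiers.
DEVICE_MIC = "device_microphone"
DEVICE_SPEAKER = "device_speaker"
HEADSET_MIC = "headset_microphone"
HEADSET_OUT = "headset_output"


def route_for_change(old: Scene, new: Scene) -> Optional[AudioRoute]:
    """Return the route to switch to for a scene change, or None if unchanged."""
    if old is new:
        return None
    if old is Scene.NEAR_FIELD and new is Scene.FAR_FIELD:
        # Anchor moved away: use the device's own microphone and loudspeaker.
        return AudioRoute(DEVICE_MIC, DEVICE_SPEAKER)
    # Far-field to near-field: use the connected headset's microphone and output.
    return AudioRoute(HEADSET_MIC, HEADSET_OUT)
```

The sketch does not check whether a headset is actually connected; the far-to-near rule presupposes one.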
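According to one or more embodiments of the present disclosure, after the first default prompt information corresponding to the input box is displayed in a target area outside the input box, the method further includes: if it is detected that no information has been entered in the input box and the input box has lost focus, cancelling the display of the first default prompt information in the target area and displaying preset prompt information at the position of the input box.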
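According to one or more embodiments of the present disclosure, determining the live broadcast scene of the anchor according to the live image includes:
recognizing the live image to obtain a second recognition result, where the second recognition result represents the relative distance between the anchor and the live broadcast device;
determining the live broadcast scene according to the relative distance.
According to one or more embodiments of the present disclosure, if the relative distance is less than a preset second threshold, the live broadcast scene is a near-field scene;
if the relative distance is greater than the second threshold, the live broadcast scene is a far-field scene.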
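The text does not say how the relative distance is recognized from the image. As one possible approach, a pinhole-camera estimate based on the width of the anchor's face in pixels can be used, sketched below in Python. The assumed real face width of 0.15 m, the focal length expressed in pixels, and the function names are illustrative assumptions; equality with the second threshold is not specified above, so the sketch returns `None` in that case.

```python
from enum import Enum
from typing import Optional


class Scene(Enum):
    NEAR_FIELD = "near"
    FAR_FIELD = "far"


def estimate_distance_m(face_width_px: float, focal_length_px: float,
                        real_face_width_m: float = 0.15) -> float:
    """Pinhole-camera estimate: distance = focal length * real width / image width.

    real_face_width_m is an assumed average face width, not a measured value."""
    if face_width_px <= 0:
        raise ValueError("face width in pixels must be positive")
    return focal_length_px * real_face_width_m / face_width_px


def classify_by_distance(distance_m: float, second_threshold_m: float) -> Optional[Scene]:
    """Below the second threshold: near-field; above it: far-field."""
    if distance_m < second_threshold_m:
        return Scene.NEAR_FIELD
    if distance_m > second_threshold_m:
        return Scene.FAR_FIELD
    return None  # equal to the threshold: not specified in the text
```

For instance, with a focal length of 1000 px, a face 300 px wide is estimated at about 0.5 m, which is below a 1 m second threshold and gives a near-field scene.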
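In a second aspect, according to one or more embodiments of the present disclosure, a live broadcast device is provided, including:
a main control component, configured to acquire a live image of an anchor during a live broadcast and determine a live broadcast scene of the anchor according to the live image, the live broadcast scene including a far-field scene and a near-field scene;
the main control component is further configured to, in response to a change in the live broadcast scene, generate a switching instruction according to the change and transmit the switching instruction to an audio processor, where the switching instruction instructs switching of the audio input and output of the live broadcast device;
the audio processor is configured to switch the audio input and output of the live broadcast device according to the switching instruction.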
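According to one or more embodiments of the present disclosure, the live broadcast device further includes:
an image acquisition apparatus, configured to capture the live image of the anchor during the live broadcast and transmit the captured live image to the main control component.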
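The division of labor among the image acquisition apparatus, the main control component, and the audio processor can be modeled in Python as components exchanging switching instructions over a queue. This is a sketch under assumptions: the class names, the queue standing in for the internal link between components, the pluggable `classify` function, and the endpoint strings are all hypothetical. The first image also produces an instruction so that the initial routing is defined, which is likewise an assumption.

```python
import queue
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Scene(Enum):
    NEAR_FIELD = "near"
    FAR_FIELD = "far"


@dataclass(frozen=True)
class SwitchInstruction:
    """Instruction from the main control component to the audio processor."""
    input_source: str
    output_sink: str


class AudioProcessor:
    """Applies switching instructions received from the main control component."""

    def __init__(self, channel: "queue.Queue[SwitchInstruction]") -> None:
        self._channel = channel
        self.input_source: Optional[str] = None
        self.output_sink: Optional[str] = None

    def process_pending(self) -> int:
        """Apply all queued instructions in order; return how many were applied."""
        applied = 0
        while True:
            try:
                instruction = self._channel.get_nowait()
            except queue.Empty:
                return applied
            self.input_source = instruction.input_source
            self.output_sink = instruction.output_sink
            applied += 1


class MainController:
    """Determines the scene for each captured image and emits an instruction on change."""

    def __init__(self, classify: Callable[[object], Scene],
                 channel: "queue.Queue[SwitchInstruction]") -> None:
        self._classify = classify
        self._channel = channel
        self._scene: Optional[Scene] = None

    def on_image(self, image: object) -> None:
        """Called by the image acquisition apparatus with each captured live image."""
        scene = self._classify(image)
        if scene is self._scene:
            return  # scene unchanged: no instruction is generated
        self._scene = scene
        if scene is Scene.FAR_FIELD:
            instruction = SwitchInstruction("device_microphone", "device_speaker")
        else:
            instruction = SwitchInstruction("headset_microphone", "headset_output")
        self._channel.put(instruction)
```

Decoupling the two components through a queue keeps scene recognition independent of audio hardware timing; the audio processor drains and applies instructions in arrival order, so the last one wins.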
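According to one or more embodiments of the present disclosure, the main control component is configured to recognize the live image to obtain a first recognition result, where the first recognition result represents an association between a first human-body feature of the anchor in the live image and a second human-body feature of the anchor in the real scene, and to determine the live broadcast scene according to the association.
According to one or more embodiments of the present disclosure, the association represents the ratio of the first human-body feature to the second human-body feature.
According to one or more embodiments of the present disclosure, if the ratio is greater than a preset first threshold, the live broadcast scene is a far-field scene;
if the ratio is less than the first threshold, the live broadcast scene is a near-field scene.
According to one or more embodiments of the present disclosure, if the live broadcast scene changes from a near-field scene to a far-field scene, the switching instruction instructs: switching the audio input of the live broadcast device to the microphone input of the live broadcast device, and switching the audio output of the live broadcast device to the loudspeaker output of the live broadcast device.
According to one or more embodiments of the present disclosure, if the live broadcast scene changes from a far-field scene to a near-field scene, the switching instruction instructs: switching the audio input of the live broadcast device to the microphone input of a headset connected to the live broadcast device, and switching the audio output of the live broadcast device to the headset output.
According to one or more embodiments of the present disclosure, the main control component is configured to recognize the live image to obtain a second recognition result, where the second recognition result represents the relative distance between the anchor and the live broadcast device, and to determine the live broadcast scene according to the relative distance.
According to one or more embodiments of the present disclosure, if the relative distance is less than a preset second threshold, the live broadcast scene is a near-field scene;
if the relative distance is greater than the second threshold, the live broadcast scene is a far-field scene.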
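In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method of the first aspect and its various possible implementations.
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided, in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the method of the first aspect and its various possible implementations is implemented.
In a fifth aspect, according to one or more embodiments of the present disclosure, a computer program product is provided; the computer program, when executed by a processor, implements the method of the first aspect and its various possible implementations.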
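In a sixth aspect, according to one or more embodiments of the present disclosure, an apparatus for switching audio input and output applied to live broadcasting is provided, including:
an acquiring unit, configured to acquire a live image of an anchor during a live broadcast;
a determining unit, configured to determine a live broadcast scene of the anchor according to the live image, the live broadcast scene including a far-field scene and a near-field scene;
a switching unit, configured to, in response to a change in the live broadcast scene, switch the audio input and output of a live broadcast device according to the change in the live broadcast scene.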
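According to one or more embodiments of the present disclosure, the determining unit includes:
a recognition subunit, configured to recognize the live image to obtain a first recognition result, where the first recognition result represents an association between a first human-body feature of the anchor in the live image and a second human-body feature of the anchor in the real scene;
a determining subunit, configured to determine the live broadcast scene according to the association.
According to one or more embodiments of the present disclosure, the association represents the ratio of the first human-body feature to the second human-body feature.
According to one or more embodiments of the present disclosure, if the ratio is greater than a preset first threshold, the live broadcast scene is a far-field scene;
if the ratio is less than the first threshold, the live broadcast scene is a near-field scene.
According to one or more embodiments of the present disclosure, if the live broadcast scene changes from a near-field scene to a far-field scene, the switching unit is configured to switch the audio input of the live broadcast device to the microphone input of the live broadcast device, and switch the audio output of the live broadcast device to the loudspeaker output of the live broadcast device.
According to one or more embodiments of the present disclosure, if the live broadcast scene changes from a far-field scene to a near-field scene, the switching unit is configured to switch the audio input of the live broadcast device to the microphone input of a headset connected to the live broadcast device, and switch the audio output of the live broadcast device to the headset output.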
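According to one or more embodiments of the present disclosure, the determining unit includes:
a recognition subunit, configured to recognize the live image to obtain a second recognition result, where the second recognition result represents the relative distance between the anchor and the live broadcast device;
a determining subunit, configured to determine the live broadcast scene according to the relative distance.
According to one or more embodiments of the present disclosure, if the relative distance is less than a preset second threshold, the live broadcast scene is a near-field scene;
if the relative distance is greater than the second threshold, the live broadcast scene is a far-field scene.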
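In a seventh aspect, according to one or more embodiments of the present disclosure, a computer program is provided which, when executed by a processor, implements the method of the first aspect and its various possible implementations.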
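The above description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by substituting the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.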
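In addition, although the operations are depicted in a specific order, this should not be understood as requiring that they be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, they should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.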
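Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.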

Claims (15)

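  1. A method for switching audio input and output applied to live broadcasting, comprising:
    acquiring a live image of an anchor during a live broadcast, and determining a live broadcast scene of the anchor according to the live image, the live broadcast scene comprising a far-field scene and a near-field scene;
    in response to a change in the live broadcast scene, switching audio input and output of a live broadcast device according to the change in the live broadcast scene.
  2. The method according to claim 1, wherein determining the live broadcast scene of the anchor according to the live image comprises:
    recognizing the live image to obtain a first recognition result, wherein the first recognition result represents an association between a first human-body feature of the anchor in the live image and a second human-body feature of the anchor in the real scene;
    determining the live broadcast scene according to the association.
  3. The method according to claim 2, wherein the association represents a ratio of the first human-body feature to the second human-body feature.
  4. The method according to claim 3, wherein if the ratio is greater than a preset first threshold, the live broadcast scene is a far-field scene;
    if the ratio is less than the first threshold, the live broadcast scene is a near-field scene.
  5. The method according to any one of claims 1 to 4, wherein if the live broadcast scene changes from a near-field scene to a far-field scene, switching the audio input and output of the live broadcast device according to the change in the live broadcast scene comprises:
    switching the audio input of the live broadcast device to a microphone input of the live broadcast device, and switching the audio output of the live broadcast device to a loudspeaker output of the live broadcast device.
  6. The method according to any one of claims 1 to 4, wherein if the live broadcast scene changes from a far-field scene to a near-field scene, switching the audio input and output of the live broadcast device according to the change in the live broadcast scene comprises:
    switching the audio input of the live broadcast device to a microphone input of a headset connected to the live broadcast device, and switching the audio output of the live broadcast device to the headset output.
  7. The method according to claim 1, wherein determining the live broadcast scene of the anchor according to the live image comprises:
    recognizing the live image to obtain a second recognition result, wherein the second recognition result represents a relative distance between the anchor and the live broadcast device;
    determining the live broadcast scene according to the relative distance.
  8. The method according to claim 7, wherein if the relative distance is less than a preset second threshold, the live broadcast scene is a near-field scene;
    if the relative distance is greater than the second threshold, the live broadcast scene is a far-field scene.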
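  9. A live broadcast device, comprising:
    a main control component, configured to acquire a live image of an anchor during a live broadcast, and determine a live broadcast scene of the anchor according to the live image, the live broadcast scene comprising a far-field scene and a near-field scene;
    the main control component is further configured to, in response to a change in the live broadcast scene, generate a switching instruction according to the change in the live broadcast scene, and transmit the switching instruction to an audio processor, wherein the switching instruction instructs switching of audio input and output of the live broadcast device;
    the audio processor is configured to switch the audio input and output of the live broadcast device according to the switching instruction.
  10. The live broadcast device according to claim 9, further comprising:
    an image acquisition apparatus, configured to capture the live image of the anchor during the live broadcast and transmit the captured live image to the main control component.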
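  11. An electronic device, comprising: at least one processor and a memory;
    the memory stores computer-executable instructions;
    the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method according to any one of claims 1 to 8.
  12. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method according to any one of claims 1 to 8.
  13. A computer program product, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 8.
  14. An apparatus for switching audio input and output applied to live broadcasting, comprising:
    an acquiring unit, configured to acquire a live image of an anchor during a live broadcast;
    a determining unit, configured to determine a live broadcast scene of the anchor according to the live image, the live broadcast scene comprising a far-field scene and a near-field scene;
    a switching unit, configured to, in response to a change in the live broadcast scene, switch audio input and output of a live broadcast device according to the change in the live broadcast scene.
  15. A computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.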
PCT/CN2022/094396 2021-07-13 2022-05-23 Method for switching audio input and output applied to live broadcasting, and live broadcast device WO2023284411A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110791411.7A CN113542785B (zh) 2021-07-13 2021-07-13 Method for switching audio input and output applied to live broadcasting, and live broadcast device
CN202110791411.7 2021-07-13

Publications (1)

Publication Number Publication Date
WO2023284411A1 true WO2023284411A1 (zh) 2023-01-19

Family

ID=78098918

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/094396 WO2023284411A1 (zh) 2021-07-13 2022-05-23 Method for switching audio input and output applied to live broadcasting, and live broadcast device

Country Status (2)

Country Link
CN (1) CN113542785B (zh)
WO (1) WO2023284411A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
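CN113542785B (zh) * 2021-07-13 2023-04-07 Beijing ByteDance Network Technology Co., Ltd. Method for switching audio input and output applied to live broadcasting, and live broadcast device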

Citations (8)

Publication number Priority date Publication date Assignee Title
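JP2006229329A (ja) * 2005-02-15 2006-08-31 Canon Inc Imaging device
CN103997563A (zh) * 2013-02-19 2014-08-20 Samsung Electronics Co., Ltd. Method for controlling sound input and output, and electronic device thereof
CN106303565A (zh) * 2016-08-12 2017-01-04 Guangzhou Huaduo Network Technology Co., Ltd. Method and apparatus for optimizing image quality of live video
CN106375846A (zh) * 2016-09-19 2017-02-01 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for processing live audio
CN111026263A (zh) * 2019-11-26 2020-04-17 Vivo Mobile Communication Co., Ltd. Audio playback method and electronic device
CN111050269A (zh) * 2018-10-15 2020-04-21 Huawei Technologies Co., Ltd. Audio processing method and electronic device
CN111095408A (zh) * 2017-09-15 2020-05-01 Qualcomm Incorporated Connection to a remote Internet of Things (IoT) device based on the field of view of a camera
CN113542785A (zh) * 2021-07-13 2021-10-22 Beijing ByteDance Network Technology Co., Ltd. Method for switching audio input and output applied to live broadcasting, and live broadcast device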

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
WO2011022430A2 (en) * 2009-08-17 2011-02-24 Weigel Broadcasting Co. System and method for remote live audio-visual production
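CN203387645U (zh) * 2013-06-29 2014-01-08 Qingdao Goertek Acoustics Technology Co., Ltd. Automatic switching mechanism for headphone playback mode, and headphone
CN105872253B (zh) * 2016-05-31 2020-07-07 Tencent Technology (Shenzhen) Co., Ltd. Live broadcast sound processing method and mobile terminal
CN106470343B (zh) * 2016-09-29 2019-09-17 Guangzhou Huaduo Network Technology Co., Ltd. Method and apparatus for remote control of a live video stream
CN106792188B (zh) * 2016-12-06 2020-06-02 Tencent Digital (Tianjin) Co., Ltd. Data processing method, apparatus, and system for a live broadcast page, and storage medium
CN106658032B (zh) * 2017-01-19 2020-02-21 China Three Gorges University Multi-camera live broadcast method and system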
US20180338163A1 (en) * 2017-05-18 2018-11-22 International Business Machines Corporation Proxies for live events
EP3652950B1 (en) * 2017-07-13 2021-07-14 Dolby Laboratories Licensing Corporation Audio input and output device with streaming capabilities
US10506361B1 (en) * 2018-11-29 2019-12-10 Qualcomm Incorporated Immersive sound effects based on tracked position
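CN110460863A (zh) * 2019-07-15 2019-11-15 Beijing ByteDance Network Technology Co., Ltd. Audio and video processing method and apparatus based on display position, medium, and electronic device
CN110798726A (zh) * 2019-10-21 2020-02-14 Beijing Dajia Internet Information Technology Co., Ltd. Bullet-comment display method and apparatus, electronic device, and storage medium
CN112087659A (zh) * 2020-09-16 2020-12-15 Sichuan Changhong Electric Co., Ltd. Apparatus and method for multi-person intelligent voice calls in educational live broadcasts on a TV
CN111930341A (zh) * 2020-10-14 2020-11-13 Goertek Optical Technology Co., Ltd. Audio playback mode switching method and apparatus, and head-mounted device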

Non-Patent Citations (1)

Title
ANONYMOUS: "How can the anchor wear headphones to live broadcast without turning on the sound and dance?", 23 June 2015 (2015-06-23), CN, XP009542568, Retrieved from the Internet <URL:https://zhidao.baidu.com/question/305777625271958164.html> *

Also Published As

Publication number Publication date
CN113542785B (zh) 2023-04-07
CN113542785A (zh) 2021-10-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22841051

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18573325

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE