WO2018018482A1 - Method and device for playing sound effects - Google Patents

Method and device for playing sound effects

Info

Publication number
WO2018018482A1
WO2018018482A1 (PCT/CN2016/091996)
Authority
WO
WIPO (PCT)
Prior art keywords
sound effect
interactive
determining
interaction information
information
Prior art date
Application number
PCT/CN2016/091996
Other languages
English (en)
French (fr)
Inventor
汤晓
史大龙
Original Assignee
北京小米移动软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小米移动软件有限公司 filed Critical 北京小米移动软件有限公司
Priority to PCT/CN2016/091996 priority Critical patent/WO2018018482A1/zh
Priority to CN201680000631.0A priority patent/CN106464939B/zh
Publication of WO2018018482A1 publication Critical patent/WO2018018482A1/zh


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/40 Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/2355 Processing of additional data, e.g. scrambling of additional data or processing content descriptors involving reformatting operations of additional data, e.g. HTML pages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 Network streaming of media packets
    • H04L 65/75 Media network packet handling
    • H04L 65/762 Media network packet handling at the source
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/238 Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N 21/2387 Stream processing in response to a playback request from an end-user, e.g. for trick-play
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N 21/4346 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream involving stuffing data, e.g. packets or bytes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/4351 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream involving reassembling additional data, e.g. rebuilding an executable program from recovered modules

Definitions

  • The present disclosure relates to the field of live video streaming technologies, and in particular to a method and apparatus for playing sound effects.
  • During a live video broadcast, the anchor inserts jokes or expressive gestures that fit the live content in order to attract viewers and make the broadcast more entertaining.
  • In the related art, viewers interact with the anchor through text, and the anchor has to play sound effects manually so that both the anchor and the viewers can sense, through background sound effects, an atmosphere that matches the live content.
  • Because the related art requires the anchor to select a sound effect matching the live content and play it manually, the anchor's operation is cumbersome and the anchor's attention is easily distracted.
  • The embodiments of the present disclosure provide a method and a device for playing sound effects, which automatically play background sound effects adapted to the live content, thereby rendering the live atmosphere.
  • According to a first aspect, a method of playing sound effects includes: obtaining current interaction information in a live room; determining an interactive sound effect corresponding to the current interaction information; and playing the interactive sound effect.
  • In an embodiment, before the interactive sound effect is played, the method further includes: determining a corresponding sound effect level according to the current interaction information; the interactive sound effect is then played according to the sound effect level.
  • In an embodiment, determining the corresponding sound effect level according to the current interaction information includes: determining the sound effect level corresponding to the current interaction information according to a facial feature in the current interaction information, where the facial feature includes a facial-expression change feature.
  • In an embodiment, determining the corresponding sound effect level according to the current interaction information includes: determining the sound effect level corresponding to the current interaction information according to a voice feature in the current interaction information, where the voice feature includes the voice content and voice intensity of the anchor user.
  • In an embodiment, determining the sound effect level corresponding to the current interaction information according to the voice feature includes: obtaining the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity; and determining the sound effect level corresponding to the voice feature according to the keyword repetition count and the decibel level.
  • In an embodiment, when the current interaction information is barrage information, the content-related information includes the text content in the barrage information, and determining the corresponding sound effect level includes: determining the sound effect level corresponding to the current interaction information according to the barrage information, where the barrage information includes a keyword repetition count or an emoji repetition count.
  • In an embodiment, the sound effect level includes any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect.
  • According to a second aspect, an apparatus for playing sound effects includes:
  • an interaction information obtaining module configured to obtain current interaction information in a live room;
  • an interactive sound effect determining module configured to determine an interactive sound effect corresponding to the current interaction information acquired by the interaction information obtaining module; and
  • an interactive sound effect playing module configured to play the interactive sound effect determined by the interactive sound effect determining module.
  • In an embodiment, the apparatus further includes: a sound effect level determining module configured to determine a corresponding sound effect level according to the current interaction information before the interactive sound effect playing module plays the interactive sound effect; the interactive sound effect is then played according to the sound effect level.
  • In an embodiment, the sound effect level determining module includes: a first determining sub-module configured to determine the sound effect level corresponding to the current interaction information according to the facial feature in the current interaction information acquired by the interaction information obtaining module, where the facial feature includes a facial-expression change feature.
  • In an embodiment, the sound effect level determining module includes: a second determining sub-module configured to determine the sound effect level corresponding to the current interaction information according to the voice feature in the current interaction information acquired by the interaction information obtaining module, where the voice feature includes the voice content and voice intensity of the anchor user.
  • In an embodiment, the second determining sub-module is further configured to: obtain the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity; and determine the sound effect level corresponding to the voice feature according to the keyword repetition count and the decibel level.
  • In an embodiment, the sound effect level determining module includes: a fourth determining sub-module configured to determine the sound effect level corresponding to the current interaction information according to the barrage information in the current interaction information acquired by the interaction information obtaining module, where the barrage information includes a keyword repetition count or an emoji repetition count.
  • In an embodiment, the sound effect level includes any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect.
  • According to a third aspect, an apparatus for playing sound effects includes: a processor; and a memory for storing processor-executable instructions; where the processor is configured to: obtain current interaction information in a live room; determine an interactive sound effect corresponding to the current interaction information; and play the interactive sound effect.
  • By playing an interactive sound effect corresponding to the current interaction information in the live room, the played sound effect is adapted to the atmosphere of the live room, making it relaxed and enjoyable and rendering the live atmosphere, while also sparing the anchor user from manually playing sound effects that match the live video scene, which simplifies the anchor user's operation.
  • In addition, by determining a corresponding sound effect level from the current interaction information and playing the interactive sound effect at that level, the interactive sound effect can be kept consistent with the atmosphere expressed by the current interaction information, rendering the live atmosphere.
  • Because the interactive sound effect is kept consistent with how exaggerated the anchor user's expression is, the anchor user can control the interactive sound effect to be played simply through facial expressions, so the live atmosphere can be rendered very well.
  • By detecting the level of the anchor user's voice feature to determine the sound effect level corresponding to the current interaction information, interactive sound effects of different levels are played according to the different degrees of voice content and voice intensity reflected in the anchor user's voice, so the interactive sound effect can be controlled to match the voice feature and render the live atmosphere.
  • By playing the interactive sound effect according to the degree of interaction reflected in the audience users' barrage information, the interactive sound effect is controlled to match the audience users' degree of interaction, rendering the live atmosphere.
  • FIG. 1A is a flowchart of a method of playing sound effects, according to an exemplary embodiment.
  • FIG. 1B is a scene diagram of a method of playing sound effects, according to an exemplary embodiment.
  • FIG. 2 is a flowchart of a method of playing sound effects, according to a first exemplary embodiment.
  • FIG. 3 is a flowchart of a method of playing sound effects, according to a second exemplary embodiment.
  • FIG. 4 is a flowchart of a method of playing sound effects, according to a third exemplary embodiment.
  • FIG. 5 is a flowchart of a method of playing sound effects, according to a fourth exemplary embodiment.
  • FIG. 6 is a block diagram of an apparatus for playing sound effects, according to an exemplary embodiment.
  • FIG. 7 is a block diagram of another apparatus for playing sound effects, according to an exemplary embodiment.
  • FIG. 8 is a block diagram of an apparatus suitable for playing sound effects, according to an exemplary embodiment.
  • FIG. 1A is a flowchart of a method of playing sound effects according to an exemplary embodiment, and FIG. 1B is a scene diagram of the same method. The method can be applied to an electronic device (for example, a smartphone or a tablet) and includes the following steps 101-103:
  • In step 101, the current interaction information in the live room is obtained.
  • In an embodiment, the live room can be a video platform or video application used by the anchor user during the live broadcast, and the anchor user's real-time video scene can be captured in real time by the camera or imaging device on the electronic device.
  • In an embodiment, the current interaction information may include a facial feature of the anchor user, a voice feature of the anchor user, and barrage information that audience users send in response to the anchor user's live content. For example, the facial feature may be a funny-face expression of the anchor user; the voice feature may be the word "applause" in "there should be applause here" spoken by the anchor user after telling a story; and the barrage information may be the text message "applause" or a "rose" pattern that audience users send when the anchor user sings a song well.
  • In step 102, an interactive sound effect corresponding to the current interaction information is determined.
  • In an embodiment, the interactive sound effects may include laughter, applause, funny noises, and the like. Different interactive sound effects may be set according to the anchor user's own preferences.
  • In step 103, the interactive sound effect is played. A minimal sketch of these three steps follows.
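  • The three steps above amount to a simple pipeline: collect an interaction event, look up the matching effect, and play it. The Python sketch below illustrates that pipeline; the event labels, the effect table, and the play_clip helper are illustrative assumptions, not part of the patent.

      # Minimal sketch of steps 101-103 (hypothetical names and effect table).
      SOUND_EFFECTS = {
          "funny_face": "laughter.wav",   # facial feature detected by face recognition
          "applause":   "applause.wav",   # voice or barrage keyword
          "rose":       "cheer.wav",      # barrage emoticon
      }

      def play_clip(path: str) -> None:
          """Placeholder for the device's audio playback API."""
          print(f"playing {path}")

      def handle_interaction(event_label: str) -> None:
          clip = SOUND_EFFECTS.get(event_label)   # step 102: determine the effect
          if clip is not None:
              play_clip(clip)                     # step 103: play it

      handle_interaction("applause")  # step 101 would supply the label in real time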
  • In an exemplary scenario, as shown in FIG. 1B, the anchor user registers with the live broadcast application on electronic device 11 and obtains a live room, through which the anchor user broadcasts video. The camera 111 captures the anchor user's live video scene during the broadcast, electronic device 11 uploads the video captured by the camera 111 to the server 10, and the server 10 transmits the video collected in real time, as a video stream, to the electronic device 12 of audience user A and the electronic device 13 of audience user B. Electronic devices 12 and 13 each play the anchor user's live video scene through the live broadcast application.
  • During the broadcast, the anchor user's current interaction information can be monitored in real time. For example, when the anchor user makes a funny face, the expression can be detected by face recognition and treated as one piece of current interaction information; the interactive sound effect corresponding to the funny face is determined and then played. Or, after the anchor user tells a very interesting story and says "there should be applause here", that phrase can be detected by voice recognition and treated as one piece of current interaction information; once the interactive sound effect corresponding to "there should be applause here" is determined, the applause sound effect can be played. Or, audience user A logs into the anchor user's live room through the live broadcast application on electronic device 12; when audience user A finds the anchor user's story inspiring, audience user A sends the barrage message "applause applause" from the electronic device, and "applause applause" is displayed on the user interface of electronic device 11 on the anchor user's side. The text "applause applause" can be treated as one piece of current interaction information, and after it is recognized from the text, the applause sound effect is played.
  • In this embodiment, by playing an interactive sound effect corresponding to the current interaction information in the live room, the played sound effect is adapted to the atmosphere of the live room, making it relaxed and pleasant and rendering the live room, while also sparing the anchor user from manually playing sound effects consistent with the live video scene, which simplifies the anchor user's operation.
  • In an embodiment, determining the corresponding sound effect level according to the current interaction information includes: determining the sound effect level corresponding to the current interaction information according to a facial feature in the current interaction information, where the facial feature includes a facial-expression change feature.
  • In an embodiment, determining the corresponding sound effect level according to the current interaction information includes: determining the sound effect level corresponding to the current interaction information according to a voice feature in the current interaction information, where the voice feature includes the voice content and voice intensity of the anchor user.
  • In an embodiment, determining the sound effect level corresponding to the current interaction information according to the voice feature includes: obtaining the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity, and determining the sound effect level corresponding to the voice feature according to the keyword repetition count and the decibel level.
  • In an embodiment, when the current interaction information is barrage information, the content-related information includes the text content in the barrage information, and determining the corresponding sound effect level includes: determining the sound effect level corresponding to the current interaction information according to the barrage information, where the barrage information includes a keyword repetition count or an emoji repetition count.
  • In an embodiment, the sound effect level includes any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect.
  • For details of how the sound effects are played, refer to the embodiments that follow.
  • Thus, the method provided by the embodiments of the present disclosure ensures that the played sound effect is adapted to the atmosphere of the live room, makes the atmosphere relaxed and enjoyable, renders the live room, and spares the anchor user from manually playing sound effects consistent with the live video scene, simplifying the anchor user's operation.
  • FIG. 2 is a flowchart of a method of playing sound effects according to a first exemplary embodiment. This embodiment uses the method provided by the embodiments of the present disclosure to illustrate, with reference to FIG. 1B, how sound effects of different sound effect levels can be played. As shown in FIG. 2, the method includes the following steps:
  • In step 201, the current interaction information in the live room is obtained.
  • In step 202, an interactive sound effect corresponding to the current interaction information is determined.
  • For steps 201 and 202, refer to the related description of the embodiment shown in FIG. 1A, which is not repeated here.
  • In step 203, the corresponding sound effect level is determined according to the current interaction information.
  • In step 204, the interactive sound effect is played according to the sound effect level.
  • In an embodiment, the sound effect level may include any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect. The sound effect content may be applause, funny noises, laughter, and so on; the sound effect intensity corresponds to the loudness of the applause, funny noise, or laughter; and the number of voices may be one or more.
  • Taking a facial feature in the current interaction information as an example, different facial-expression change features may correspond to different sound effect levels. For example, if the detected facial feature is a grimace, the grimace can be treated as a facial-expression change feature and the sound effect level corresponding to it determined. If that sound effect level includes a decibel level of 60, sound content of "hehe" laughter, and one voice, then "hehe" laughter with one voice at an intensity of 60 decibels can be played; if it includes a decibel level of 65, sound content of "haha" laughter, and five voices, then "haha" laughter with five voices at an intensity of 65 decibels can be played. The sketch below makes this triple concrete.
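  • To make the notion of a sound effect level concrete, the sketch below models a level as the (decibel level, sound content, number of voices) triple used in the example above. The dataclass, the level table, and the playback placeholder are illustrative assumptions; the patent does not prescribe a data layout.

      # Hypothetical sketch: a sound effect level as the triple used in this
      # embodiment, plus a lookup from an expression label to a level.
      from dataclasses import dataclass

      @dataclass
      class SoundEffectLevel:
          decibel_level: int   # sound effect intensity
          content: str         # sound effect content, e.g. a laughter clip
          num_voices: int      # number of voices in the effect

      # Assumed mapping from recognized expression to a level (values from the text).
      LEVELS = {
          "grimace_mild":   SoundEffectLevel(60, "hehe_laughter.wav", 1),
          "grimace_strong": SoundEffectLevel(65, "haha_laughter.wav", 5),
      }

      def play_at_level(level: SoundEffectLevel) -> None:
          # Placeholder playback: a real device would set output gain from the
          # decibel level and mix num_voices copies of the clip.
          print(f"play {level.content} x{level.num_voices} at {level.decibel_level} dB")

      play_at_level(LEVELS["grimace_strong"])  # -> play haha_laughter.wav x5 at 65 dB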
  • On the basis of the embodiment shown in FIG. 1A, this embodiment determines a corresponding sound effect level from the current interaction information and plays the interactive sound effect at that level, so the interactive sound effect can be controlled to be consistent with the atmosphere expressed by the current interaction information, rendering the live atmosphere.
  • FIG. 3 is a flowchart of a method of playing sound effects according to a second exemplary embodiment. This embodiment uses the method provided by the embodiments of the present disclosure to illustrate, with reference to FIG. 1B, how the sound effect level corresponding to the current interaction information is determined from a facial feature when the current interaction information includes a facial feature in the live room. As shown in FIG. 3, the method includes the following steps:
  • In step 301, the current interaction information in the live room is obtained.
  • For step 301, refer to the related description of the embodiment shown in FIG. 1A, which is not repeated here.
  • In step 302, a facial feature is obtained from the current interaction information.
  • In an embodiment, the facial feature in the live room can be detected in real time by a face recognition method in the related art, which is not detailed in this disclosure.
  • In step 303, the facial feature is matched against the reference expression features in a first preset feature library, which stores the anchor user's reference expression features under different facial expressions.
  • In an embodiment, the camera can capture the anchor user's reference expression features under various expressions (for example, grimaces with different degrees of furrowed brows, crying faces with the corners of the mouth turned down and open to different degrees, and smiling faces with the corners of the mouth turned up and open to different degrees), and these reference expression features are stored in the first preset feature library. Because the first preset feature library stores only the facial expression features of the anchor user involved in the live video scene, the anchor user's facial features captured by the camera under various expressions can all be matched against the reference expression features in the library, improving the accuracy of expression recognition. A sketch of such an enrollment step follows.
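  • As a rough illustration of building the first preset feature library, the sketch below enrolls a few labeled expression feature vectors for one anchor user. The labels and vectors are made up; any face recognition method from the related art could supply the actual features.

      # Hypothetical enrollment of reference expression features for step 303.
      from typing import Dict, List

      # label -> reference expression feature vector (4-dim here to mirror the
      # [α1 α2 α3 α4]-style features used in the text).
      first_preset_feature_library: Dict[str, List[float]] = {}

      def enroll_expression(label: str, feature: List[float]) -> None:
          """Store one of the anchor user's reference expression features."""
          first_preset_feature_library[label] = feature

      # Example: enroll two expressions of the anchor user (made-up vectors).
      enroll_expression("grimace", [0.9, 0.1, 0.4, 0.7])
      enroll_expression("smile",   [0.2, 0.8, 0.6, 0.3])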
  • In step 304, the reference expression feature that matches the facial feature is determined as the facial-expression change feature.
  • In an embodiment, suppose the currently recognized facial feature is [α1 α2 α3 α4]. Similarity is computed between the facial feature [α1 α2 α3 α4] and the reference expression features in the first preset feature library, such as [β1 β2 β3 β4] and [χ1 χ2 χ3 χ4]. When the reference expression feature [β1 β2 β3 β4] is determined to be similar to the facial feature [α1 α2 α3 α4], the two can be considered to match, and the reference expression feature [β1 β2 β3 β4] matching the facial feature [α1 α2 α3 α4] is determined as the facial-expression change feature.
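  • The patent leaves the similarity computation to the related art; the sketch below uses cosine similarity with a threshold as one plausible choice. The threshold value and the inline library are assumptions for illustration.

      # One plausible similarity computation for step 304 (cosine similarity
      # is an assumption; the text only requires a "similarity calculation").
      import math
      from typing import Dict, List, Optional

      def cosine_similarity(a: List[float], b: List[float]) -> float:
          dot = sum(x * y for x, y in zip(a, b))
          norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
          return dot / norm if norm else 0.0

      def match_expression(face_feature: List[float],
                           library: Dict[str, List[float]],
                           threshold: float = 0.9) -> Optional[str]:
          # Return the label of the best-matching reference expression feature,
          # or None if nothing is similar enough to count as a match.
          best_label, best_score = None, threshold
          for label, reference in library.items():
              score = cosine_similarity(face_feature, reference)
              if score >= best_score:
                  best_label, best_score = label, score
          return best_label

      library = {"grimace": [0.9, 0.1, 0.4, 0.7], "smile": [0.2, 0.8, 0.6, 0.3]}
      print(match_expression([0.85, 0.15, 0.45, 0.65], library))  # -> grimace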
  • In step 305, the corresponding sound effect level is determined according to the facial-expression change feature.
  • In an embodiment, the sound effect levels corresponding to the different reference expression features may be preset in the first preset feature library.
  • In step 306, the interactive sound effect is played according to the sound effect level.
  • For example, if the sound effect level corresponding to the reference expression feature [β1 β2 β3 β4] includes a decibel level of 65, sound content of "haha" laughter, and five voices, then "haha" laughter with five voices at an intensity of 65 decibels can be played.
  • On the basis of the beneficial technical effects of the foregoing embodiments, this embodiment determines the reference expression feature matching the facial feature as the facial-expression change feature and determines the corresponding sound effect level from it, so the anchor user can control the interactive sound effect to be played simply through facial expressions. Because the interactive sound effect is consistent with how exaggerated the expression is, the live atmosphere can be rendered very well.
  • FIG. 4 is a flowchart of a method of playing sound effects according to a third exemplary embodiment. This embodiment uses the method provided by the embodiments of the present disclosure to illustrate, with reference to FIG. 1B, how the sound effect level corresponding to the current interaction information is determined from a voice feature when the current interaction information includes a voice feature in the live room. As shown in FIG. 4, the method includes the following steps:
  • In step 401, the current interaction information in the live room is obtained.
  • For step 401, refer to the related description of the embodiment shown in FIG. 1A, which is not repeated here.
  • In step 402, a voice feature is obtained from the current interaction information.
  • In an embodiment, the voice feature may include the voice content and voice intensity of the anchor user. The anchor user's voice content in the current interaction information may be recognized by a speech recognition method in the related art, which is not detailed in this disclosure. For example, if the anchor user says "there should be applause here" during the live broadcast, the voice content is "here", "should", "have applause". The anchor user's voice intensity can be detected by a sound sensor and expressed as a decibel level.
  • In step 403, an interactive sound effect corresponding to the voice feature is determined.
  • For example, after the anchor user tells a story and says "there should be applause here", speech recognition yields the voice content "here", "should", "applause", which is matched against the voice reference keywords in a second preset feature library. Once it is determined that a voice reference keyword stored in the second preset feature library is "applause", the interactive sound effect can be determined to be applause.
  • In an embodiment, electronic device 11 may collect the reference keywords with which the anchor user wants to trigger interactive sound effects during the live broadcast, for example storing "applause" and "music" as voice reference keywords in the second preset feature library. The second preset feature library thus stores only the voice reference keywords that this anchor user needs to trigger interactive sound effects during the live broadcast, ensuring that even though different anchor users have different wording habits, each can still trigger the interactive sound effects they need according to their own expression habits, making the control of interactive sound effects more targeted.
  • In step 404, the sound effect level corresponding to the current interaction information is determined according to the voice feature.
  • In an embodiment, the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity may be obtained, and the sound effect level corresponding to the voice feature is determined from the repetition count and the decibel level. For example, the recognized voice content of "there should be applause here" contains one "applause", while that of "there should be applause applause applause here" contains three; the sound effect level corresponding to the voice content with three "applause" is higher than that corresponding to the voice content with one. The anchor user's voice intensity can also be factored in: for example, voice content containing one "applause" at a voice intensity of 50 decibels and voice content containing three "applause" at a voice intensity of 40 decibels correspond to different sound effect levels. One assumed scoring rule is sketched below.
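  • A minimal way to turn (keyword repetitions, decibel level) into a sound effect level is a small scoring rule. The rule and thresholds below are illustrative assumptions, not the patent's formula, which is left unspecified.

      # Hypothetical rule for step 404: score a voice feature from the keyword
      # repetition count and the measured decibel level, then bucket the score
      # into a sound effect level (thresholds are illustrative).
      def voice_sound_effect_level(keyword_repetitions: int, decibels: float) -> int:
          score = keyword_repetitions * 10 + decibels / 10.0
          if score >= 34:
              return 3   # e.g. 10 voices of applause at 70 dB
          if score >= 16:
              return 2   # e.g. 5 voices of applause at 65 dB
          return 1       # a single, quieter applause clip

      # One "applause" at 50 dB and three "applause" at 40 dB map to different
      # levels, matching the comparison in the example above.
      print(voice_sound_effect_level(1, 50))  # -> 1
      print(voice_sound_effect_level(3, 40))  # -> 3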
  • In step 405, the interactive sound effect is played according to the sound effect level.
  • For example, if voice content containing one "applause" at a voice intensity of 50 decibels corresponds to a sound effect level with a decibel level of 65, sound content of applause, and five voices, then applause with five voices at an intensity of 65 decibels can be played; if voice content containing three "applause" at a voice intensity of 60 decibels corresponds to a sound effect level with a decibel level of 70, sound content of applause, and ten voices, then applause with ten voices at an intensity of 70 decibels can be played.
  • On the basis of the beneficial technical effects of the foregoing embodiments, this embodiment determines the sound effect level corresponding to the current interaction information by detecting the level of the anchor user's voice feature, playing interactive sound effects of different levels according to the different degrees of voice content and voice intensity reflected in the anchor user's voice, so the interactive sound effect can be controlled to match the voice feature and render the live atmosphere.
  • FIG. 5 is a flowchart of a method of playing sound effects according to a fourth exemplary embodiment. This embodiment uses the method provided by the embodiments of the present disclosure to illustrate, with reference to FIG. 1B, how the sound effect level corresponding to the current interaction information is determined from barrage information when the current interaction information includes barrage information in the live room. As shown in FIG. 5, the method includes the following steps:
  • In step 501, the current interaction information in the live room is obtained.
  • For step 501, refer to the related description of the embodiment shown in FIG. 1A, which is not repeated here.
  • In step 502, barrage information is obtained from the current interaction information.
  • In an embodiment, the barrage information may include text information and pattern emoticon information sent by audience users; the text information is, for example, plain text, and the pattern emoticon information is, for example, a rose, smiling faces of various degrees of happiness, a hug, and the like.
  • In step 503, an interactive sound effect corresponding to the barrage information is determined.
  • In an embodiment, the barrage information related to the live video scene may be recognized to obtain at least one text keyword, which is matched against the reference keywords in a third preset feature library; the third preset feature library stores audience users' reference keywords.
  • In an embodiment, the server 10 can collect, from a large number of audience users, the keywords that need to trigger interactive sound effects during live broadcasts, for example "applause" and "cheers", store these reference keywords in the third preset feature library, and deliver the third preset feature library to electronic device 11.
  • In an embodiment, the text keywords of audience users in the live video scene can be recognized by a semantic recognition method in the related art, which is not detailed in this disclosure. For example, audience user B sends "that sounds great, applause applause" from electronic device 12 to the anchor user's electronic device 11; the text keywords are "great" and "applause".
  • In step 504, the sound effect level corresponding to the current interaction information is determined according to the barrage information.
  • When the text information sent by audience users differs in density (for example, one message containing two "applause" keywords versus one, or one "rose" versus three), the corresponding sound effect level also differs. In an embodiment, the sound effect level corresponding to the barrage information can be determined by the number of times a keyword appears in the text information. For example, if the barrage message sent by audience user A is "that sounds great, applause applause applause", the corresponding sound effect level includes a decibel level of 65, sound content of applause, and five voices; if the barrage message sent by audience user B is three big smiling-face pattern emoticons, the corresponding sound effect level includes a decibel level of 55, sound content of music, and two voices.
  • In step 505, the interactive sound effect is played according to the sound effect level.
  • Corresponding to step 504, in response to audience user A's barrage message, applause with five voices at an intensity of 65 decibels can be played; in response to audience user B's barrage message, music with two voices at an intensity of 55 decibels can be played, and the music can be set by the anchor user.
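  • Counting keyword repetitions in a barrage message is straightforward; the sketch below shows one assumed implementation of steps 503-504, with a made-up keyword table and level values mirroring the examples above.

      # Hypothetical sketch for steps 503-504: match barrage text against the
      # third preset feature library's reference keywords and grade the level
      # by repetition count.
      THIRD_PRESET_KEYWORDS = {"applause": "applause.wav", "cheers": "cheer.wav"}

      def barrage_effect_and_level(message: str):
          for keyword, clip in THIRD_PRESET_KEYWORDS.items():
              repetitions = message.count(keyword)
              if repetitions:
                  # More repetitions -> higher sound effect level.
                  decibels = 65 if repetitions >= 3 else 55
                  voices = 5 if repetitions >= 3 else 2
                  return clip, decibels, voices
          return None

      print(barrage_effect_and_level("sounds great, applause applause applause"))
      # -> ('applause.wav', 65, 5)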
  • On the basis of the beneficial technical effects of the foregoing embodiments, this embodiment detects the sound effect level of audience users' barrage information and plays an interactive sound effect consistent with that level, so the interactive sound effect is played according to the degree of interaction reflected in the barrage information and is controlled to match the audience users' degree of interaction, rendering the live atmosphere.
  • Those skilled in the art will understand that various combinations of the embodiments shown in FIGS. 3-5 can form new embodiments; that is, interactive sound effects can be played by means of any one or any combination of facial features, voice features, and barrage information, as the routing sketch below illustrates.
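  • Since any combination of the three signals can drive playback, a thin dispatcher can route whichever interaction information arrives to the corresponding handler. The sketch below shows only the routing; the handler calls named in the comments refer to the hypothetical functions from the earlier sketches.

      # Hypothetical dispatcher combining the FIG. 3-5 embodiments: route each
      # piece of current interaction information to the matching handler.
      def handle_current_interaction(info: dict) -> None:
          kind = info.get("kind")
          if kind == "face":
              # e.g. match_expression(...) then play at the mapped level
              print("face feature ->", info["expression"])
          elif kind == "voice":
              # e.g. voice_sound_effect_level(repetitions, decibels)
              print("voice feature ->", info["keyword"], info["decibels"], "dB")
          elif kind == "barrage":
              # e.g. barrage_effect_and_level(message)
              print("barrage ->", info["message"])

      handle_current_interaction({"kind": "voice", "keyword": "applause", "decibels": 50})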
  • FIG. 6 is a block diagram of an apparatus for playing sound effects according to an exemplary embodiment. As shown in FIG. 6, the apparatus includes:
  • an interaction information obtaining module 61 configured to obtain current interaction information in a live room;
  • an interactive sound effect determining module 62 configured to determine an interactive sound effect corresponding to the current interaction information acquired by the interaction information obtaining module 61; and
  • an interactive sound effect playing module 63 configured to play the interactive sound effect determined by the interactive sound effect determining module 62.
  • FIG. 7 is a block diagram of another apparatus for playing sound effects according to an exemplary embodiment. As shown in FIG. 7, on the basis of the embodiment shown in FIG. 6, in an embodiment the apparatus further includes:
  • a sound effect level determining module 64 configured to determine a corresponding sound effect level according to the current interaction information acquired by the interaction information obtaining module 61 before the interactive sound effect playing module 63 plays the interactive sound effect;
  • the interactive sound effect playing module 63 being configured to play the interactive sound effect according to the sound effect level determined by the sound effect level determining module 64.
  • In an embodiment, the sound effect level determining module 64 includes: a first determining sub-module 641 configured to determine the sound effect level corresponding to the current interaction information according to the facial feature in the current interaction information acquired by the interaction information obtaining module 61, where the facial feature includes a facial-expression change feature.
  • In an embodiment, the sound effect level determining module 64 includes: a second determining sub-module 642 configured to determine the sound effect level corresponding to the current interaction information according to the voice feature in the current interaction information acquired by the interaction information obtaining module 61, where the voice feature includes the voice content and voice intensity of the anchor user.
  • In an embodiment, the second determining sub-module 642 is further configured to: obtain the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity; and determine the sound effect level corresponding to the voice feature according to the keyword repetition count and the decibel level.
  • In an embodiment, the sound effect level determining module 64 includes: a third determining sub-module 643 configured to determine the sound effect level corresponding to the current interaction information according to the barrage information in the current interaction information acquired by the interaction information obtaining module 61, where the barrage information includes a keyword repetition count or an emoji repetition count.
  • In an embodiment, the sound effect level includes any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect.
  • FIG. 8 is a block diagram of an apparatus suitable for playing sound effects according to an exemplary embodiment. For example, the device 800 can be an electronic device with a camera, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • Referring to FIG. 8, the device 800 can include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
  • The processing component 802 typically controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 can include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. In addition, the processing component 802 can include one or more modules to facilitate interaction between the processing component 802 and the other components; for example, it can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
  • The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • The power component 806 provides power to the various components of the device 800. It can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 800.
  • The multimedia component 808 includes a screen providing an output interface between the device 800 and the user. In some embodiments, the screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the device 800 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
  • The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, such as a keyboard, a click wheel, or buttons. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • The sensor component 814 includes one or more sensors for providing the device 800 with status assessments of various aspects. For example, the sensor component 814 can detect the open/closed state of the device 800 and the relative positioning of components (for example, the display and keypad of the device 800); it can also detect a change in position of the device 800 or one of its components, the presence or absence of contact between the user and the device 800, the orientation or acceleration/deceleration of the device 800, and changes in the temperature of the device 800. The sensor component 814 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact. It can also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 can also include an accelerometer, a gyroscope, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above methods.
  • In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions executable by the processor 820 of the device 800 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
  • The processor 820 is configured to: obtain current interaction information in a live room; determine an interactive sound effect corresponding to the current interaction information; and play the interactive sound effect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Geometry (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to a method and device for playing sound effects. The method includes: determining interaction information of a live video scene; when the interaction information matches preset reference information, determining a sound effect type corresponding to the preset reference information; and playing a sound effect according to the sound effect type. The technical solution of the present disclosure can automatically play sound effects adapted to the live video scene, achieving the effect of rendering the live atmosphere and making it relaxed and pleasant, while also sparing the anchor user from manually playing sound effects consistent with the live video scene, simplifying the anchor user's operation.

Description

Method and device for playing sound effects

TECHNICAL FIELD

The present disclosure relates to the field of live video streaming technologies, and in particular to a method and device for playing sound effects.

BACKGROUND

During a live video broadcast, the anchor inserts jokes or expressive gestures that fit the live content in order to attract viewers and make the broadcast more entertaining. In the related art, viewers interact with the anchor through text, and the anchor has to play sound effects manually so that both the anchor and the viewers can sense, through background sound effects, an atmosphere that matches the live content. However, because the related art requires the anchor to select a sound effect matching the live content and play it manually, the anchor's operation is cumbersome and the anchor's attention is easily distracted.

SUMMARY

To overcome the problems in the related art, the embodiments of the present disclosure provide a method and device for playing sound effects, which automatically play background sound effects adapted to the live content, thereby rendering the live atmosphere.
According to a first aspect of the embodiments of the present disclosure, a method of playing sound effects is provided, including:
obtaining current interaction information in a live room;
determining an interactive sound effect corresponding to the current interaction information; and
playing the interactive sound effect.
In an embodiment, before the playing of the interactive sound effect, the method further includes:
determining a corresponding sound effect level according to the current interaction information;
and the playing of the interactive sound effect includes:
playing the interactive sound effect according to the sound effect level.
In an embodiment, the determining of the corresponding sound effect level according to the current interaction information includes:
determining the sound effect level corresponding to the current interaction information according to a facial feature in the current interaction information, the facial feature including a facial-expression change feature.
In an embodiment, the determining of the corresponding sound effect level according to the current interaction information includes:
determining the sound effect level corresponding to the current interaction information according to a voice feature in the current interaction information, the voice feature including the voice content and voice intensity of the anchor user.
In an embodiment, the determining of the sound effect level corresponding to the current interaction information according to the voice feature includes:
obtaining the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity; and
determining the sound effect level corresponding to the voice feature according to the keyword repetition count and the decibel level.
In an embodiment, in the determining of the corresponding sound effect level according to the current interaction information, the current interaction information is barrage information, the content-related information includes the text content in the barrage information, and the determining includes:
determining the sound effect level corresponding to the current interaction information according to the barrage information in the current interaction information, the barrage information including a keyword repetition count or an emoji repetition count.
In an embodiment, the sound effect level includes any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect.
According to a second aspect of the embodiments of the present disclosure, an apparatus for playing sound effects is provided, including:
an interaction information obtaining module configured to obtain current interaction information in a live room;
an interactive sound effect determining module configured to determine an interactive sound effect corresponding to the current interaction information acquired by the interaction information obtaining module; and
an interactive sound effect playing module configured to play the interactive sound effect determined by the interactive sound effect determining module.
In an embodiment, the apparatus further includes:
a sound effect level determining module configured to determine a corresponding sound effect level according to the current interaction information before the interactive sound effect playing module plays the interactive sound effect;
the interactive sound effect playing module being configured to:
play the interactive sound effect according to the sound effect level.
In an embodiment, the sound effect level determining module includes:
a first determining sub-module configured to determine the sound effect level corresponding to the current interaction information according to a facial feature in the current interaction information acquired by the interaction information obtaining module, the facial feature including a facial-expression change feature.
In an embodiment, the sound effect level determining module includes:
a second determining sub-module configured to determine the sound effect level corresponding to the current interaction information according to a voice feature in the current interaction information acquired by the interaction information obtaining module, the voice feature including the voice content and voice intensity of the anchor user.
In an embodiment, the second determining sub-module is further configured to: obtain the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity; and determine the sound effect level corresponding to the voice feature according to the keyword repetition count and the decibel level.
In an embodiment, the sound effect level determining module includes:
a fourth determining sub-module configured to determine the sound effect level corresponding to the current interaction information according to the barrage information in the current interaction information acquired by the interaction information obtaining module, the barrage information including a keyword repetition count or an emoji repetition count.
In an embodiment, the sound effect level includes any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect.
According to a third aspect of the embodiments of the present disclosure, an apparatus for playing sound effects is provided, including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to:
obtain current interaction information in a live room;
determine an interactive sound effect corresponding to the current interaction information; and
play the interactive sound effect.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
By playing an interactive sound effect corresponding to the current interaction information in the live room, the played sound effect is adapted to the atmosphere of the live room, making it relaxed and enjoyable and rendering the live room, while also sparing the anchor user from manually playing sound effects consistent with the live video scene, which simplifies the anchor user's operation.
In addition, by determining a corresponding sound effect level from the current interaction information and playing the interactive sound effect at that level, the interactive sound effect can be controlled to be consistent with the atmosphere expressed by the current interaction information, rendering the live atmosphere.
In addition, by determining the reference expression feature matching the facial feature as the facial-expression change feature and determining the corresponding sound effect level from it, the anchor user can control the interactive sound effect to be played simply through facial expressions; because the interactive sound effect is consistent with how exaggerated the expression is, the live atmosphere can be rendered very well.
By detecting the level of the anchor user's voice feature to determine the sound effect level corresponding to the current interaction information, interactive sound effects of different levels are played according to the different degrees of voice content and voice intensity reflected in the anchor user's voice, so the interactive sound effect can be controlled to match the voice feature and render the live atmosphere.
By detecting the sound effect level of audience users' barrage information and playing an interactive sound effect consistent with that level, the interactive sound effect is played according to the degree of interaction reflected in the barrage information and controlled to match the audience users' degree of interaction, rendering the live atmosphere.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the present invention.
FIG. 1A is a flowchart of a method of playing sound effects according to an exemplary embodiment.
FIG. 1B is a scene diagram of a method of playing sound effects according to an exemplary embodiment.
FIG. 2 is a flowchart of a method of playing sound effects according to a first exemplary embodiment.
FIG. 3 is a flowchart of a method of playing sound effects according to a second exemplary embodiment.
FIG. 4 is a flowchart of a method of playing sound effects according to a third exemplary embodiment.
FIG. 5 is a flowchart of a method of playing sound effects according to a fourth exemplary embodiment.
FIG. 6 is a block diagram of an apparatus for playing sound effects according to an exemplary embodiment.
FIG. 7 is a block diagram of another apparatus for playing sound effects according to an exemplary embodiment.
FIG. 8 is a block diagram of an apparatus suitable for playing sound effects according to an exemplary embodiment.

DETAILED DESCRIPTION

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present invention as detailed in the appended claims.
FIG. 1A is a flowchart of a method of playing sound effects according to an exemplary embodiment, and FIG. 1B is a scene diagram of the same method; the method can be applied to an electronic device (for example, a smartphone or a tablet). As shown in FIG. 1A, the method includes the following steps 101-103:
In step 101, the current interaction information in the live room is obtained.
In an embodiment, the live room can be a video platform or video application used by the anchor user during the live broadcast, and the anchor user's real-time video scene can be captured in real time by the camera or imaging device on the electronic device. In an embodiment, the current interaction information may include a facial feature of the anchor user, a voice feature of the anchor user, and barrage information that audience users send in response to the anchor user's live content. For example, the facial feature may be a funny-face expression of the anchor user; the voice feature may be the word "applause" in "there should be applause here" spoken by the anchor user after telling a story; and the barrage information may be the text message "applause" or a "rose" pattern that audience users send when the anchor user sings a song well.
In step 102, an interactive sound effect corresponding to the current interaction information is determined.
In an embodiment, the interactive sound effects may include laughter, applause, funny noises, and the like. In an embodiment, different interactive sound effects may be set according to the anchor user's own preferences.
In step 103, the interactive sound effect is played.
In an exemplary scenario, as shown in FIG. 1B, the anchor user registers with the live broadcast application on electronic device 11 and obtains a live room, through which the anchor user broadcasts video. The camera 111 captures the anchor user's live video scene during the broadcast, electronic device 11 uploads the video captured in real time by the camera 111 to the server 10, and the server 10 transmits the video, as a video stream, to the electronic device 12 of audience user A and the electronic device 13 of audience user B; electronic devices 12 and 13 each play the anchor user's live video scene through the live broadcast application. During the broadcast, the anchor user's current interaction information can be monitored in real time. For example, when the anchor user makes a funny face, the expression can be detected by face recognition and treated as one piece of current interaction information; by determining the interactive sound effect corresponding to the funny face, that sound effect can then be played. Or, after the anchor user tells a very interesting story and says "there should be applause here", that phrase can be detected by voice recognition and treated as one piece of current interaction information; by determining the interactive sound effect corresponding to "there should be applause here", the applause sound effect can be played. Or, audience user A logs into the anchor user's live room through the live broadcast application on electronic device 12; when audience user A finds the anchor user's story inspiring, audience user A sends the barrage message "applause applause" from the electronic device, and "applause applause" is displayed on the user interface of electronic device 11 on the anchor user's side. The text "applause applause" can be treated as one piece of current interaction information, and after it is recognized from the text, the applause sound effect is played.
In this embodiment, by playing an interactive sound effect corresponding to the current interaction information in the live room, the played sound effect is adapted to the atmosphere of the live room, making it relaxed and pleasant and rendering the live room, while also sparing the anchor user from manually playing sound effects consistent with the live video scene, which simplifies the anchor user's operation.
In an embodiment, determining the corresponding sound effect level according to the current interaction information includes:
determining the sound effect level corresponding to the current interaction information according to a facial feature in the current interaction information, the facial feature including a facial-expression change feature.
In an embodiment, determining the corresponding sound effect level according to the current interaction information includes:
determining the sound effect level corresponding to the current interaction information according to a voice feature in the current interaction information, the voice feature including the voice content and voice intensity of the anchor user.
In an embodiment, determining the sound effect level corresponding to the current interaction information according to the voice feature includes:
obtaining the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity; and
determining the sound effect level corresponding to the voice feature according to the keyword repetition count and the decibel level.
In an embodiment, in determining the corresponding sound effect level according to the current interaction information, the current interaction information is barrage information, the content-related information includes the text content in the barrage information, and the determining includes:
determining the sound effect level corresponding to the current interaction information according to the barrage information in the current interaction information, the barrage information including a keyword repetition count or an emoji repetition count.
In an embodiment, the sound effect level includes any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect.
For details of how the sound effects are played, refer to the embodiments that follow.
Thus, the method provided by the embodiments of the present disclosure ensures that the played sound effect is adapted to the atmosphere of the live room, makes the atmosphere relaxed and enjoyable, renders the live room, and spares the anchor user from manually playing sound effects consistent with the live video scene, simplifying the anchor user's operation.
The technical solutions provided by the embodiments of the present disclosure are described below with specific embodiments.
FIG. 2 is a flowchart of a method of playing sound effects according to a first exemplary embodiment; this embodiment uses the method provided by the embodiments of the present disclosure to illustrate, with reference to FIG. 1B, how sound effects of different sound effect levels can be played. As shown in FIG. 2, the method includes the following steps:
In step 201, the current interaction information in the live room is obtained.
In step 202, an interactive sound effect corresponding to the current interaction information is determined.
For steps 201 and 202, refer to the related description of the embodiment shown in FIG. 1A, which is not repeated here.
In step 203, the corresponding sound effect level is determined according to the current interaction information.
In step 204, the interactive sound effect is played according to the sound effect level.
In an embodiment, the sound effect level may include any one or any combination of sound effect intensity, sound effect content, and the number of voices in the sound effect. The sound effect content may be applause, funny noises, laughter, and so on; the sound effect intensity corresponds to the loudness of the applause, funny noise, or laughter; and the number of voices may be one or more. Taking a facial feature in the current interaction information as an example, different facial-expression change features may correspond to different sound effect levels. For example, if the detected facial feature is a grimace, the grimace can be treated as a facial-expression change feature and the sound effect level corresponding to it determined. If that sound effect level includes a decibel level of 60, sound content of "hehe" laughter, and one voice, then "hehe" laughter with one voice at an intensity of 60 decibels can be played; if it includes a decibel level of 65, sound content of "haha" laughter, and five voices, then "haha" laughter with five voices at an intensity of 65 decibels can be played.
On the basis of the embodiment shown in FIG. 1A, this embodiment determines a corresponding sound effect level from the current interaction information and plays the interactive sound effect at that level, so the interactive sound effect can be controlled to be consistent with the atmosphere expressed by the current interaction information, rendering the live atmosphere.
FIG. 3 is a flowchart of a method of playing sound effects according to a second exemplary embodiment; this embodiment uses the method provided by the embodiments of the present disclosure to illustrate, with reference to FIG. 1B, how the sound effect level corresponding to the current interaction information is determined from a facial feature when the current interaction information includes a facial feature in the live room. As shown in FIG. 3, the method includes the following steps:
In step 301, the current interaction information in the live room is obtained.
For step 301, refer to the related description of the embodiment shown in FIG. 1A, which is not repeated here.
In step 302, a facial feature is obtained from the current interaction information.
In an embodiment, the facial feature in the live room can be detected in real time by a face recognition method in the related art, which is not detailed in this disclosure.
In step 303, the facial feature is matched against the reference expression features in a first preset feature library, which stores the anchor user's reference expression features under different facial expressions.
In an embodiment, the camera can capture the anchor user's reference expression features under various expressions (for example, grimaces with different degrees of furrowed brows, crying faces with the corners of the mouth turned down and open to different degrees, and smiling faces with the corners of the mouth turned up and open to different degrees), and these reference expression features are stored in the first preset feature library. Because the first preset feature library stores only the facial expression features of the anchor user involved in the live video scene, the anchor user's facial features captured by the camera under various expressions can all be matched against the reference expression features in the library, improving the accuracy of expression recognition.
In step 304, the reference expression feature matching the facial feature is determined as the facial-expression change feature.
In an embodiment, suppose the currently recognized facial feature is [α1 α2 α3 α4]. Similarity is computed between the facial feature [α1 α2 α3 α4] and the reference expression features in the first preset feature library, such as [β1 β2 β3 β4] and [χ1 χ2 χ3 χ4]. When the reference expression feature [β1 β2 β3 β4] is determined to be similar to the facial feature [α1 α2 α3 α4], the two can be considered to match, and the reference expression feature [β1 β2 β3 β4] matching the facial feature [α1 α2 α3 α4] can be determined as the facial-expression change feature.
In step 305, the corresponding sound effect level is determined according to the facial-expression change feature.
In an embodiment, the sound effect levels corresponding to the different reference expression features may be preset in the first preset feature library.
In step 306, the interactive sound effect is played according to the sound effect level.
For example, if the sound effect level corresponding to the reference expression feature [β1 β2 β3 β4] includes a decibel level of 65, sound content of "haha" laughter, and five voices, then "haha" laughter with five voices at an intensity of 65 decibels can be played.
On the basis of the beneficial technical effects of the foregoing embodiments, this embodiment determines the reference expression feature matching the facial feature as the facial-expression change feature and determines the corresponding sound effect level from it, so the anchor user can control the interactive sound effect to be played simply through facial expressions; because the interactive sound effect is consistent with how exaggerated the expression is, the live atmosphere can be rendered very well.
FIG. 4 is a flowchart of a method of playing sound effects according to a third exemplary embodiment; this embodiment uses the method provided by the embodiments of the present disclosure to illustrate, with reference to FIG. 1B, how the sound effect level corresponding to the current interaction information is determined from a voice feature when the current interaction information includes a voice feature in the live room. As shown in FIG. 4, the method includes the following steps:
In step 401, the current interaction information in the live room is obtained.
For step 401, refer to the related description of the embodiment shown in FIG. 1A, which is not repeated here.
In step 402, a voice feature is obtained from the current interaction information.
In an embodiment, the voice feature may include the voice content and voice intensity of the anchor user. In an embodiment, the anchor user's voice content in the current interaction information may be recognized by a speech recognition method in the related art, which is not detailed in this disclosure. For example, if the anchor user says "there should be applause here" during the live broadcast, the voice content is "here", "should", "have applause". In an embodiment, the anchor user's voice intensity can be detected by a sound sensor and expressed as a decibel level.
In step 403, an interactive sound effect corresponding to the voice feature is determined.
For example, after the anchor user tells a story and says "there should be applause here", speech recognition yields the voice content "here", "should", "applause", which is matched against the voice reference keywords in a second preset feature library; once it is determined that a voice reference keyword stored in the second preset feature library is "applause", the interactive sound effect can be determined to be applause. In an embodiment, electronic device 11 may collect the reference keywords with which the anchor user wants to trigger interactive sound effects during the live broadcast, for example storing "applause" and "music" as voice reference keywords in the second preset feature library. The second preset feature library thus stores only the voice reference keywords that this anchor user needs to trigger interactive sound effects during the live broadcast, ensuring that even though different anchor users have different wording habits, each can still trigger the interactive sound effects they need according to their own expression habits, making the control of interactive sound effects more targeted.
In step 404, the sound effect level corresponding to the current interaction information is determined according to the voice feature.
In an embodiment, the number of keyword repetitions contained in the voice content and the decibel level corresponding to the voice intensity may be obtained, and the sound effect level corresponding to the voice feature is determined from the repetition count and the decibel level. For example, the recognized voice content of "there should be applause here" contains one "applause", while that of "there should be applause applause applause here" contains three; the sound effect level corresponding to the voice content with three "applause" is higher than that corresponding to the voice content with one. The anchor user's voice intensity can also be factored in: for example, voice content containing one "applause" at a voice intensity of 50 decibels and voice content containing three "applause" at a voice intensity of 40 decibels correspond to different sound effect levels.
In step 405, the interactive sound effect is played according to the sound effect level.
For example, if voice content containing one "applause" at a voice intensity of 50 decibels corresponds to a sound effect level with a decibel level of 65, sound content of applause, and five voices, then applause with five voices at an intensity of 65 decibels can be played; if voice content containing three "applause" at a voice intensity of 60 decibels corresponds to a sound effect level with a decibel level of 70, sound content of applause, and ten voices, then applause with ten voices at an intensity of 70 decibels can be played.
On the basis of the beneficial technical effects of the foregoing embodiments, this embodiment determines the sound effect level corresponding to the current interaction information by detecting the level of the anchor user's voice feature, playing interactive sound effects of different levels according to the different degrees of voice content and voice intensity reflected in the anchor user's voice, so the interactive sound effect can be controlled to match the voice feature and render the live atmosphere.
图5是根据一示例性实施例四示出的播放音效的方法的流程图;本实施例利用本公开实施例提供的上述方法,以当前互动信息包括直播房间中的弹幕信息的情形下如何根据弹幕信息确定当前互动信息对应的音效等级为例并结合图1B进行示例性说明,如图5所示,包括如下步骤:
步骤501中,获取直播房间中的当前互动信息;
步骤501中的相关描述可以参见上述图1A实施例的相关描述,在此不再详述。
步骤502中,从当前互动信息中获取弹幕信息。
在一实施例中,弹幕信息可以包括观众用户发送的文本信息以及图案表情信息,文本信息例如为文字,图案表情信息例如为玫瑰花、各种不同开心级别的笑脸、拥抱等。
步骤503中,确定与弹幕信息相对应的互动音效。
在一实施例中,可以对与视频直播场景相关的弹幕信息进行识别,得到至少一个文本关键词,将至少一个文本关键词与第三预设特征库中的参考关键词进行匹配,第三预设特征库用于存储观众用户的参考关键词。在一实施例中,可以通过服务器10收集海量的观众用户在视频直播过程中发送的需要触发互动音效的关键词,例如,将“掌声”、“欢呼声”等,并将参考关键词存储在第三预设特征库中,服务器10将该第三预设特征库下发给电子设备11。在一实施例中,可以通过相关技术中的语义识别方法识别出视频直播场景中观众用户的文本关键词,本公开不再详述,例如,观众用户B通过电子设备12向主播用户的电子设备11发送了“太好听了,鼓掌鼓掌”,该文本关键词为“好听”、“鼓掌”。
步骤504中,根据弹幕信息,确定当前互动信息对应的音效等级。
当观众用户所使用的文本信息具有不同的紧凑程度时(例如,一条文本信息中包含两个“掌声”与包含一个“掌声”,或者,一条文本信息中包含一朵“玫瑰花”与三朵“玫瑰花”),对应的音效等级度也不同,在一实施例中,可以通过关键词在文本信息中出现的次数来确定弹幕信息对应的音效等级,例如,观众用户A发送的弹幕信息为“太好听了,鼓掌鼓掌鼓掌”,对应的音效等级包括:分贝级别为65,音效内容对应掌声,发声人数为5人,再例如,观众用户B发送的弹幕信息为三个大笑脸的图案表情符号,对应的音效等级包括:分贝级别为55,音效内容对应音乐,发声人数为2人。
In step 505, the interactive sound effect is played at that sound effect level.
Corresponding to step 504 above, in response to viewer user A's bullet-screen comment, applause of 5 people at an intensity of 65 decibels may be played; in response to viewer user B's bullet-screen comment, music of 2 people at an intensity of 55 decibels may be played, and the music may be set by the anchor user.
Building on the beneficial technical effects of the above embodiments, this embodiment detects the sound effect level of viewer users' bullet-screen information and plays an interactive sound effect consistent with that level, so that interactive sound effects are played according to the degree of interaction reflected in the viewers' bullet-screen comments. The interactive sound effect is thereby controlled to match the viewers' degree of interaction, achieving the effect of enhancing the live-streaming atmosphere.
Those skilled in the art will appreciate that various combinations of the embodiments shown in FIG. 3 to FIG. 5 above can form new embodiments; that is, interactive sound effects may be played on the basis of any one of, or any combination of, facial features, speech features, and bullet-screen information.
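A minimal sketch of such a combination, assuming each cue type is handled by its own routine (the cue names and handler registry below are assumptions for the sketch, not structures defined by the disclosure):

    def play_interactive_effects(interaction, handlers):
        """Dispatch whichever cues are present in the current interaction
        information to their handlers; any one cue, or any combination,
        may trigger an interactive sound effect."""
        for cue in ("face_feature", "speech", "danmaku"):
            if cue in interaction and cue in handlers:
                handlers[cue](interaction[cue])

    # Trivial stand-in handlers for the embodiments of FIGS. 3-5:
    handlers = {cue: print for cue in ("face_feature", "speech", "danmaku")}
    play_interactive_effects({"speech": ("applause", 50), "danmaku": "clap clap"}, handlers)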
FIG. 6 is a block diagram of an apparatus for playing a sound effect according to an exemplary embodiment. As shown in FIG. 6, the apparatus includes:
an interaction information acquiring module 61, configured to acquire the current interaction information in a live-streaming room;
an interactive sound effect determining module 62, configured to determine the interactive sound effect corresponding to the current interaction information acquired by the interaction information acquiring module 61; and
an interactive sound effect playing module 63, configured to play the interactive sound effect determined by the interactive sound effect determining module 62.
FIG. 7 is a block diagram of another apparatus for playing a sound effect according to an exemplary embodiment. As shown in FIG. 7, on the basis of the embodiment shown in FIG. 6 above, in one embodiment the apparatus further includes:
a sound effect level determining module 64, configured to determine the corresponding sound effect level from the current interaction information acquired by the interaction information acquiring module 61 before the interactive sound effect playing module 63 plays the interactive sound effect;
where the interactive sound effect playing module 63 is configured to:
play the interactive sound effect at the sound effect level determined by the sound effect level determining module 64.
In one embodiment, the sound effect level determining module 64 includes:
a first determining submodule 641, configured to determine the sound effect level corresponding to the current interaction information from the facial features in the current interaction information acquired by the interaction information acquiring module 61, the facial features including facial-organ variation features.
In one embodiment, the sound effect level determining module 64 includes:
a second determining submodule 642, configured to determine the sound effect level corresponding to the current interaction information from the speech features in the current interaction information acquired by the interaction information acquiring module 61, the speech features including the speech content and speech intensity of the anchor user.
In one embodiment, the second determining submodule 642 is further configured to: obtain the number of keyword repetitions contained in the speech content and the decibel level corresponding to the speech intensity, and determine the sound effect level corresponding to the speech features from the keyword repetition count and the decibel level.
In one embodiment, the sound effect level determining module 64 includes:
a third determining submodule 643, configured to determine the sound effect level corresponding to the current interaction information from the bullet-screen information in the current interaction information acquired by the interaction information acquiring module 61, the bullet-screen information including a keyword repetition count or an emoticon repetition count.
In one embodiment, the sound effect level includes any one of, or any combination of, sound effect intensity, sound effect content, and the number of voices in the sound effect.
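Because a sound effect level may carry any subset of these three fields, one natural way to model it is as a record with optional fields (a modeling assumption for illustration, not a structure defined by the disclosure):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SoundEffectLevel:
        """A sound effect level: any one of, or any combination of, the
        fields may be populated, matching the definition above."""
        intensity_db: Optional[int] = None  # sound effect intensity (decibels)
        content: Optional[str] = None       # sound effect content, e.g. "applause"
        voices: Optional[int] = None        # number of voices in the effect

    level = SoundEffectLevel(intensity_db=65, content="applause", voices=5)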
FIG. 8 is a block diagram of an apparatus suitable for playing sound effects according to an exemplary embodiment. For example, the apparatus 800 may be an electronic device with a camera, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to FIG. 8, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or some of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the apparatus 800. Examples of such data include instructions for any application or method operating on the apparatus 800, contact data, phonebook data, messages, pictures, videos, and so on. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 806 supplies power to the various components of the apparatus 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touchscreen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the apparatus 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect the open/closed state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800; it may also detect a change in position of the apparatus 800 or one of its components, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and changes in the temperature of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the methods described above.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, executable by the processor 820 of the apparatus 800 to perform the methods described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like. The processor 820 is configured to:
acquire the current interaction information in a live-streaming room;
determine the interactive sound effect corresponding to the current interaction information; and
play the interactive sound effect.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure provided herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include such departures from the present disclosure as come within common knowledge or customary technical practice in the art. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise constructions described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

  1. A method for playing a sound effect, characterized in that the method comprises:
    acquiring current interaction information in a live-streaming room;
    determining an interactive sound effect corresponding to the current interaction information; and
    playing the interactive sound effect.
  2. The method according to claim 1, characterized in that before the playing of the interactive sound effect, the method further comprises:
    determining a corresponding sound effect level from the current interaction information;
    wherein the playing of the interactive sound effect comprises:
    playing the interactive sound effect at the sound effect level.
  3. The method according to claim 2, characterized in that the determining of the corresponding sound effect level from the current interaction information comprises:
    determining, from facial features in the current interaction information, the sound effect level corresponding to the current interaction information, the facial features comprising facial-organ variation features.
  4. The method according to claim 2, characterized in that the determining of the corresponding sound effect level from the current interaction information comprises:
    determining, from speech features in the current interaction information, the sound effect level corresponding to the current interaction information, the speech features comprising speech content and speech intensity of an anchor user.
  5. The method according to claim 4, characterized in that the determining, from the speech features in the current interaction information, of the sound effect level corresponding to the current interaction information comprises:
    obtaining a keyword repetition count contained in the speech content and a decibel level corresponding to the speech intensity; and
    determining the sound effect level corresponding to the speech features from the keyword repetition count and the decibel level.
  6. The method according to claim 2, characterized in that the current interaction information comprises bullet-screen information, and the determining of the corresponding sound effect level from the current interaction information comprises:
    determining, from the bullet-screen information in the current interaction information, the sound effect level corresponding to the current interaction information, the bullet-screen information comprising a keyword repetition count or an emoticon repetition count.
  7. The method according to claim 1, characterized in that the sound effect level comprises any one of, or any combination of, sound effect intensity, sound effect content, and a number of voices in the sound effect.
  8. An apparatus for playing a sound effect, characterized in that the apparatus comprises:
    an interaction information acquiring module, configured to acquire current interaction information in a live-streaming room;
    an interactive sound effect determining module, configured to determine an interactive sound effect corresponding to the current interaction information acquired by the interaction information acquiring module; and
    an interactive sound effect playing module, configured to play the interactive sound effect determined by the interactive sound effect determining module.
  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a sound effect level determining module, configured to determine a corresponding sound effect level from the current interaction information before the interactive sound effect playing module plays the interactive sound effect;
    wherein the interactive sound effect playing module is configured to:
    play the interactive sound effect at the sound effect level determined by the sound effect level determining module.
  10. The apparatus according to claim 9, characterized in that the sound effect level determining module comprises:
    a first determining submodule, configured to determine, from facial features in the current interaction information acquired by the interaction information acquiring module, the sound effect level corresponding to the current interaction information, the facial features comprising facial-organ variation features.
  11. The apparatus according to claim 9, characterized in that the sound effect level determining module comprises:
    a second determining submodule, configured to determine, from speech features in the current interaction information acquired by the interaction information acquiring module, the sound effect level corresponding to the current interaction information, the speech features comprising speech content and speech intensity of an anchor user.
  12. The apparatus according to claim 11, characterized in that the second determining submodule is further configured to: obtain a keyword repetition count contained in the speech content and a decibel level corresponding to the speech intensity, and determine the sound effect level corresponding to the speech features from the keyword repetition count and the decibel level.
  13. The apparatus according to claim 9, characterized in that the sound effect level determining module comprises:
    a third determining submodule, configured to determine, from bullet-screen information in the current interaction information acquired by the interaction information acquiring module, the sound effect level corresponding to the current interaction information, the bullet-screen information comprising a keyword repetition count or an emoticon repetition count.
  14. The apparatus according to claim 8, characterized in that the sound effect level comprises any one of, or any combination of, sound effect intensity, sound effect content, and a number of voices in the sound effect.
  15. An apparatus for playing a sound effect, characterized in that the apparatus comprises:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to:
    acquire current interaction information in a live-streaming room;
    determine an interactive sound effect corresponding to the current interaction information; and
    play the interactive sound effect.
PCT/CN2016/091996 2016-07-28 2016-07-28 Method and apparatus for playing sound effects WO2018018482A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/091996 WO2018018482A1 (zh) 2016-07-28 2016-07-28 Method and apparatus for playing sound effects
CN201680000631.0A CN106464939B (zh) 2016-07-28 2016-07-28 Method and apparatus for playing sound effects

Publications (1)

Publication Number Publication Date
WO2018018482A1 true WO2018018482A1 (zh) 2018-02-01

Family

ID=58215564


Country Status (2)

Country Link
CN (1) CN106464939B (zh)
WO (1) WO2018018482A1 (zh)


Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108076392A * 2017-03-31 2018-05-25 北京市商汤科技开发有限公司 Live-streaming interaction method, apparatus and electronic device
CN109165005B * 2018-09-04 2020-08-25 Oppo广东移动通信有限公司 Sound effect enhancement method and apparatus, electronic device, and storage medium
CN109286772B * 2018-09-04 2021-03-12 Oppo广东移动通信有限公司 Sound effect adjustment method and apparatus, electronic device, and storage medium
CN109766473B * 2018-11-30 2019-12-24 北京达佳互联信息技术有限公司 Information interaction method and apparatus, electronic device, and storage medium
CN109739464A * 2018-12-20 2019-05-10 Oppo广东移动通信有限公司 Sound effect setting method and apparatus, terminal, and storage medium
CN109951652A * 2019-03-20 2019-06-28 合肥科塑信息科技有限公司 Portrait voice and video synchronization calibration apparatus and system
CN110113256B * 2019-05-14 2022-11-11 北京达佳互联信息技术有限公司 Information interaction method and apparatus, server, user terminal, and readable storage medium
CN111263227B * 2020-02-10 2023-12-08 腾讯科技(深圳)有限公司 Multimedia playing method and apparatus, storage medium, and terminal
CN111696565B * 2020-06-05 2023-10-10 北京搜狗科技发展有限公司 Speech processing method, apparatus, and medium
CN111696564B * 2020-06-05 2023-08-18 北京搜狗科技发展有限公司 Speech processing method, apparatus, and medium
CN111696566B * 2020-06-05 2023-10-13 北京搜狗智能科技有限公司 Speech processing method, apparatus, and medium
CN112423143B * 2020-09-30 2024-02-20 腾讯科技(深圳)有限公司 Live-streaming message interaction method, apparatus, and storage medium
CN112911324B * 2021-01-29 2022-10-28 北京达佳互联信息技术有限公司 Content display method and apparatus for a live-streaming room, server, and storage medium
CN113031906A * 2021-04-23 2021-06-25 腾讯科技(深圳)有限公司 Audio playing method and apparatus in live streaming, device, and storage medium
CN113573143B * 2021-07-21 2023-09-19 维沃移动通信有限公司 Audio playing method and electronic device
CN113490011A * 2021-08-20 2021-10-08 云知声(上海)智能科技有限公司 ASR-based system and method for enhancing the atmosphere of a live-streaming room
CN113810729B * 2021-09-16 2024-02-02 中国平安人寿保险股份有限公司 Live-streaming atmosphere special-effect matching method and apparatus, device, and medium
CN114866791A * 2022-03-31 2022-08-05 北京达佳互联信息技术有限公司 Sound effect switching method and apparatus, electronic device, and storage medium


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103634681B * 2013-11-29 2017-10-10 腾讯科技(成都)有限公司 Live-streaming interaction method, apparatus, client, server, and system
CN105763922B * 2016-04-28 2019-01-04 徐文波 Video processing method and apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101836219A * 2007-11-01 2010-09-15 索尼爱立信移动通讯有限公司 Generating music playlists based on facial expression
CN102355527A * 2011-07-22 2012-02-15 深圳市无线开锋科技有限公司 Mobile phone mood-sensing apparatus and method
CN202150884U * 2011-07-22 2012-02-22 深圳市无线开锋科技有限公司 Mobile phone mood-sensing apparatus
CN102541259A * 2011-12-26 2012-07-04 鸿富锦精密工业(深圳)有限公司 Electronic device and method for providing mood service according to facial expression
US20130262634A1 * 2012-03-29 2013-10-03 Ikala Interactive Media Inc. Situation command system and operating method thereof
CN104484045A * 2014-12-26 2015-04-01 小米科技有限责任公司 Audio playback control method and apparatus
CN105227550A * 2015-09-18 2016-01-06 广州酷狗计算机科技有限公司 Scene display method, apparatus, and system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110392273A * 2019-07-16 2019-10-29 北京达佳互联信息技术有限公司 Audio and video processing method and apparatus, electronic device, and storage medium
CN110392273B * 2019-07-16 2023-08-08 北京达佳互联信息技术有限公司 Audio and video processing method and apparatus, electronic device, and storage medium
CN110536166A * 2019-08-30 2019-12-03 北京字节跳动网络技术有限公司 Interaction triggering method and apparatus for a live-streaming application, device, and storage medium
CN111757174A * 2020-06-01 2020-10-09 青岛海尔多媒体有限公司 Method and apparatus for matching video sound and picture quality, and electronic device
US20220248107A1 * 2020-12-29 2022-08-04 Alibaba Group Holding Limited Method, apparatus, electronic device, and storage medium for sound effect processing during live streaming
CN114915853A * 2021-02-08 2022-08-16 中国电信股份有限公司 Interaction information processing method and apparatus, terminal, and storage medium

Also Published As

Publication number Publication date
CN106464939B (zh) 2019-10-25
CN106464939A (zh) 2017-02-22

Similar Documents

Publication Publication Date Title
WO2018018482A1 (zh) Method and apparatus for playing sound effects
US11620984B2 (en) Human-computer interaction method, and electronic device and storage medium thereof
CN110505491B (zh) Live-streaming processing method and apparatus, electronic device, and storage medium
EP3125530B1 (en) Video recording method and device
CN107644646B (zh) Speech processing method and apparatus, and apparatus for speech processing
CN107172497B (zh) Live-streaming method, apparatus and system
WO2020042827A1 (zh) Method and apparatus for sharing network works, server, and storage medium
CN109348239B (zh) Live-streaming clip processing method and apparatus, electronic device, and storage medium
WO2018036392A1 (zh) Method and apparatus for sharing information based on voice, and mobile terminal
CN109151565B (zh) Method and apparatus for playing voice, electronic device, and storage medium
CN106911967A (zh) Live-streaming playback method and apparatus
WO2016078394A1 (zh) Method and apparatus for reminding of a voice call
CN110309327B (zh) Audio generation method and apparatus, and apparatus for audio generation
WO2017215133A1 (zh) Photographing prompt method and apparatus
CN111739530A (zh) Interaction method and apparatus, earphone, and earphone storage apparatus
US20220078221A1 (en) Interactive method and apparatus for multimedia service
CN111696538A (zh) Speech processing method, apparatus, and medium
US20210287011A1 (en) Information interaction method and apparatus, electronic device, and storage medium
CN109862421A (zh) Video information recognition method and apparatus, electronic device, and storage medium
CN115273831A (zh) Speech conversion model training method, speech conversion method, and apparatus
CN112532931A (zh) Video processing method and apparatus, and electronic device
CN111739529A (zh) Interaction method and apparatus, earphone, and server
CN107959751A (zh) Audio playing method and apparatus
CN112988956B (zh) Method and apparatus for automatically generating dialogue, and method and apparatus for detecting information recommendation effect
CN111696536A (zh) Speech processing method, apparatus, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16910063; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16910063; Country of ref document: EP; Kind code of ref document: A1)