WO2021004362A1 - Audio data processing method and apparatus, and electronic device - Google Patents

Audio data processing method and apparatus, and electronic device

Info

Publication number
WO2021004362A1
WO2021004362A1 (PCT/CN2020/099864; CN2020099864W)
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
data
audio
main
feedback data
Prior art date
Application number
PCT/CN2020/099864
Other languages
French (fr)
Chinese (zh)
Inventor
贾锦杰 (Jia Jinjie)
廖多依 (Liao Duoyi)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2021004362A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 Querying
    • G06F 16/432 Query formulation
    • G06F 16/433 Query formulation using audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 Querying
    • G06F 16/432 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 Querying
    • G06F 16/435 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/437 Administration of user profiles, e.g. generation, initialisation, adaptation, distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • This application relates to the field of Internet technology, and more specifically, to an audio data processing method and apparatus, an electronic device, and a computer-readable storage medium.
  • An objective of the embodiments of the present application is to provide a new technical solution for processing audio data.
  • an audio data processing method which includes:
  • the audio feedback data and the main audio data are combined to generate combined audio data for playback.
  • the combining the audio feedback data with the main audio data includes:
  • the audio feedback data generated in the set play period is merged with the main audio data.
  • the combining the audio feedback data with the main audio data includes:
  • the audio feedback data is combined with the main audio data through audio track synthesis.
  • the acquiring audio feedback data generated during the playback of the main audio data includes:
  • the generating of the combined audio data for playback includes:
  • the combined audio data is generated for playback by terminal devices that meet the target classification.
  • the method further includes:
  • the target classification corresponding to the terminal device is determined.
  • the set user characteristics include set characteristics corresponding to audio feedback data generated by the user of the terminal device during the playback process of the main audio data.
  • the main audio data is audio data of a video file
  • the method further includes:
  • the audio waveform representing the audio feedback data is displayed in the form of a bullet screen.
  • the acquiring audio feedback data generated during the playback of the main audio data includes:
  • the text comment is converted into corresponding audio data, and at least the converted audio data is used as the audio feedback data.
  • the acquiring audio feedback data generated during the playback of the main audio data includes:
  • the expression feature is converted into corresponding audio data, and at least the converted audio data is used as the audio feedback data.
  • the main audio data is audio data of a live media file.
  • the method further includes:
  • a method for processing audio data which is implemented by a terminal device, and the method includes:
  • the live audio data further includes audio feedback data that the user corresponding to the terminal device feeds back on the main audio data.
  • a method for processing audio data which is implemented by a terminal device, and the method includes:
  • in response to an operation of playing the target media file, playing the target media file, where the target media file includes main audio data;
  • the acquiring live audio data corresponding to the main audio data includes:
  • the method further includes:
  • an audio data processing device including:
  • a data acquisition module for acquiring audio feedback data generated during the playback of the main audio data; and
  • an audio processing module for combining the audio feedback data with the main audio data to generate the combined audio data for playback.
  • an electronic device including the processing device according to the fourth aspect of the present application; or, including:
  • a memory used to store executable instructions;
  • a processor configured to control the electronic device, according to the executable instructions, to execute the processing method according to the first, second, or third aspect of the present application.
  • the electronic device is a terminal device without a display device.
  • the electronic device is a terminal device
  • the terminal device further includes an input device for the corresponding user to input feedback content for the main audio data
  • the feedback content is sent to the processing device or the processor, so that the processing device or the processor generates audio feedback data of the corresponding user for the main audio data according to the feedback content.
  • the electronic device is a terminal device, and the terminal device further includes an audio output device, and the audio output device is configured, under the control of the processing device or the processor, to play the corresponding audio feedback data while playing the main audio data.
  • a computer-readable storage medium stores a computer program that can be read and executed by a computer; when the computer program is run by the computer, it executes the processing method according to the first, second, or third aspect of the present application.
  • the audio data processing method of this embodiment combines the main audio data with the audio feedback data generated during its playback, so that any terminal device, while playing the main audio data, can also play the audio feedback data from other users. In this way, any user listening to the main audio data through a personal terminal device can experience the effect of enjoying it together with other users who post comments while listening, and thus obtain an on-site experience.
  • Fig. 1a is a schematic diagram of an application scenario illustrating the effects of an embodiment of the present application;
  • Fig. 1b is a hardware configuration diagram of an alternative data processing system that can be used to implement the audio data processing method of the embodiment of the present application;
  • Fig. 2 is a schematic flowchart of a processing method according to an embodiment of the present application;
  • Fig. 3 is a schematic diagram of an example of guiding the user to input audio feedback data in the play window of the target media file;
  • Fig. 4 is a schematic diagram of inserting audio feedback data into adjacent free gaps of the main audio data during audio mixing;
  • Fig. 5 is a schematic diagram of an example of guiding the user to input an instruction to turn on the live sound effect function;
  • Fig. 6a is an interactive schematic diagram of a processing method according to an example of the present application;
  • Fig. 6b is an interactive schematic diagram of a processing method according to another example of the present application;
  • Fig. 7 is a schematic flowchart of a processing method according to another embodiment of the present application;
  • Fig. 8 is a schematic flowchart of a processing method according to a third embodiment of the present application;
  • Fig. 9 is a schematic functional block diagram of an audio data processing device according to an embodiment of the present application;
  • Fig. 10a is a schematic block diagram of an electronic device according to an embodiment of the present application;
  • Fig. 10b is a schematic block diagram of an electronic device according to another embodiment of the present application.
  • media files have become the main medium of information transmission.
  • people can not only choose to enjoy the content of media files together with others in the place where the media files are played, but can also use their own terminal devices to independently enjoy the content of the media files in various places.
  • the above media files can be video files that contain audio data and image data.
  • the terminal equipment that supports the playback of video files requires a display device and an audio output device.
  • the above media files can also be pure audio files that only contain audio data; a terminal device supporting the playback of pure audio files requires an audio output device but may not have a display device, for example, a smart speaker.
  • everyone can feel the various voice feedbacks of other people on the media files.
  • these voice feedbacks include feedback of language comments and, for example, feedback of expression characteristics such as happiness, sighs, sadness, and silence, so that people can get a rich and three-dimensional sensory experience on the spot.
  • when a user enjoys the content of a media file through a personal terminal device, the embodiments of the present application enable at least the audio feedback data of others for the same media file to be combined with the main audio data of the media file and played together, so as to obtain a sensory experience equivalent to the live mode.
  • An application scenario is shown in Fig. 1a, where user A, user B, user C, and user D enjoy the content of the same media file in different spaces through their respective terminal devices 1200 at the same time or at different times.
  • User A, User B, and User C all posted language comments during the same set play time period. Due to the spatial separation, each user cannot actually perceive the sound feedback of other users on the media file.
  • with the embodiments of the present application, the audio feedback data of other users for the same media file can be combined and played together with the main audio data of the media file, so that each user can experience the sound feedback of the other users on the media file; this is equivalent to the on-site effect of user A, user B, user C, and user D enjoying the media file through the same terminal device in the same place, as shown in the lower part of Fig. 1a.
  • Fig. 1b is a schematic diagram of the structure of a data processing system to which the audio data processing method according to an embodiment of the present application can be applied.
  • the data processing system 1000 of this embodiment includes a server 1100, a terminal device 1200, and a network 1300.
  • the server 1100 may be, for example, a blade server, a rack server, etc.
  • the server 1100 may also be a server cluster deployed in the cloud, which is not limited here.
  • the server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160.
  • the processor 1110 may be, for example, a central processing unit (CPU) or the like.
  • the memory 1120 includes, for example, ROM (Read Only Memory), RAM (Random Access Memory), nonvolatile memory such as a hard disk, and the like.
  • the interface device 1130 includes, for example, a USB interface, a serial interface, and the like.
  • the communication device 1140 can perform wired or wireless communication, for example.
  • the display device 1150 is, for example, a liquid crystal display.
  • the input device 1160 may include, for example, a touch screen, a keyboard, and the like.
  • the server 1100 can be used to participate in implementing the data processing method according to any embodiment of the present application.
  • the memory 1120 of the server 1100 is used to store instructions, and the instructions are used to control the processor 1110 to operate to support the implementation of the processing method according to any embodiment of the present application.
  • those skilled in the art can design the instructions according to the solution disclosed in this application. How the instructions control the processor to operate is well known in the art, so it will not be described in detail here.
  • the server 1100 in the embodiment of the present application may involve only some of these devices, for example, only the processor 1110 and the memory 1120.
  • the terminal device 1200 may include a processor 1210, a memory 1220, an interface device 1230, a communication device 1240, a display device 1250, an input device 1260, an audio output device 1270, an audio pickup device 1280, and so on.
  • the processor 1210 may be a central processing unit (CPU), a microcontroller unit (MCU), or the like.
  • the memory 1220 includes, for example, ROM (Read Only Memory), RAM (Random Access Memory), nonvolatile memory such as a hard disk, and the like.
  • the interface device 1230 includes, for example, a USB interface, a headphone interface, and the like.
  • the communication device 1240 can perform wired or wireless communication, for example.
  • the display device 1250 is, for example, a liquid crystal display, a touch display, or the like.
  • the input device 1260 may include, for example, a touch screen, a keyboard, and the like.
  • the terminal device 1200 may output audio information through an audio output device 1270, which includes, for example, a speaker.
  • the terminal device 1200 may pick up the voice information input by the user through an audio pickup device 1280, which includes, for example, a microphone.
  • the terminal device 1200 may be a smart phone, a portable computer, a desktop computer, a tablet computer, a wearable device, a smart speaker, a set-top box, a smart TV, a voice recorder, a camcorder, etc. The terminal device 1200 may have a built-in audio output device 1270 to play media files, or may be connected to an external audio output device 1270 to play media files.
  • the terminal device 1200 can be used to participate in implementing the data processing method according to any embodiment of the present application.
  • the memory 1220 of the terminal device 1200 is used to store instructions, and the instructions are used to control the processor 1210 to operate to support the implementation of the processing method according to any embodiment of the present application.
  • those skilled in the art can design the instructions according to the solution disclosed in this application. How the instructions control the processor to operate is well known in the art, so it will not be described in detail here.
  • the terminal device 1200 in the embodiment of the present application may involve only some of these devices, for example, only the processor 1210 and the memory 1220.
  • the network 1300 may be a wireless network or a wired network, and may be a local area network or a wide area network.
  • the terminal device 1200 may communicate with the server 1100 through the network 1300.
  • the data processing system 1000 shown in FIG. 1b is only for explanatory purposes, and is by no means intended to limit the application, its applications or uses.
  • although Fig. 1b shows only one server 1100 and one terminal device 1200, this is not meant to limit their respective numbers.
  • the data processing system 1000 may include multiple servers 1100 and/or multiple terminal devices 1200.
  • the audio data processing method may be implemented by the server 1100 as required, may also be implemented by the terminal device 1200, or jointly implemented by the server 1100 and the terminal device 1200, which is not limited herein.
  • Fig. 2 is a schematic flowchart of a method for processing audio data according to an embodiment of the present application.
  • the processing method of this embodiment may include the following steps S2100 to S2200.
  • Step S2100: Obtain audio feedback data generated during the playback of the main audio data.
  • the main audio data is the audio data of the target media file to be played.
  • the target media file can be a pure audio file or a video file.
  • the target media file can be a live broadcast file or a recorded broadcast file, which is not limited here.
  • in step S2100, all the audio feedback data generated during the playback of the main audio data may be acquired and combined in the following step S2200; alternatively, only the part of the audio feedback data generated during the playback of the main audio data that meets set conditions may be acquired and merged in the following step S2200, which is not limited here.
  • any feedback content published by any user during the playback of the main audio data corresponds to one piece of audio feedback data.
  • the content of any one feedback may be a voice comment, in which case the voice comment itself forms a piece of audio feedback data;
  • the content of any one feedback may also be a text comment, in which case the text comment can be converted into corresponding audio data, and the converted audio data forms a piece of audio feedback data;
  • the content of any one feedback may also be an input expression feature, for example an input emoji or a voice expression, in which case the expression feature can be converted into corresponding audio data, which likewise forms a piece of audio feedback data.
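The three feedback types above can be sketched as a small dispatch routine. This is an illustrative sketch only: the `AudioFeedback` structure, the stand-in `text_to_speech` and `expression_to_audio` converters, and their placeholder outputs are assumptions for illustration, not part of this application.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AudioFeedback:
    user_id: str
    play_time_s: float    # playback position of the main audio when generated
    samples: List[float]  # PCM samples of the feedback audio

def text_to_speech(text: str) -> List[float]:
    """Stand-in TTS: a real system would call a speech synthesizer here."""
    return [0.0] * (len(text) * 10)  # placeholder samples

def expression_to_audio(expression: str) -> List[float]:
    """Stand-in lookup of a pre-recorded sound for an expression feature."""
    return [0.0] * 100  # placeholder samples

def make_feedback(user_id, play_time_s, kind, payload):
    """Turn one piece of feedback content into one piece of audio feedback data."""
    if kind == "voice":          # voice comment: already audio
        samples = payload
    elif kind == "text":         # text comment: convert via TTS
        samples = text_to_speech(payload)
    elif kind == "expression":   # emoji / voice expression: convert to audio
        samples = expression_to_audio(payload)
    else:
        raise ValueError(f"unknown feedback kind: {kind}")
    return AudioFeedback(user_id, play_time_s, samples)
```

Each branch ends in the same `AudioFeedback` shape, so the later merging step can treat all three feedback types uniformly.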
  • obtaining audio feedback data generated during the playback of the main audio data in step S2100 may include: obtaining voice comments fed back during the playback of the main audio data, and at least using the voice comments as audio feedback data generated during the playback of the main audio data.
  • users can input voice comments through their respective terminal devices.
  • an entry for guiding users to input voice comments (for example: press and hold to send voice comments) can be provided in the play window of the target media file to be played.
  • the terminal device can collect the voice comment through an audio input device such as a microphone to form audio feedback data.
  • the user can also post a voice comment by operating physical buttons provided on the terminal device, which is not limited here.
  • obtaining audio feedback data generated during the playback of the main audio data in step S2100 may also include: obtaining text comments fed back during the playback of the main audio data; converting the text comments into corresponding audio data; and at least using the converted audio data as audio feedback data generated during the playback of the main audio data.
  • when converting a text comment, the user’s voice feature can be obtained according to the corresponding user’s voice collected in advance, and then the text comment is converted based on the voice feature, so that the converted audio data reflects the voice characteristics of the user.
  • the text comment can also be converted based on the default voice feature.
  • the default voice feature can be a voice feature set by the system or a voice feature selected by the user, which is not limited here.
  • the emotional characteristics expressed by the text comment can also be recognized, so that the converted audio data reflects the emotional characteristics intended by the text comment.
  • the user can input the content of the text comment through the physical keyboard, virtual keyboard or touch screen provided by the terminal device, and can also post the text comment by simply selecting the preset text content provided by the terminal device.
  • obtaining the audio feedback data generated during the playback of the main audio data in step S2100 may also include: obtaining the expression characteristics fed back during the playback of the main audio data; converting the expression characteristics into corresponding audio data; and at least using the converted audio data as audio feedback data generated during the playback of the main audio data.
  • the expression features can be pre-stored in the terminal device, and the user can perform emotional feedback on the main audio data being played by selecting the expression features that can express their emotions.
  • the expression features may include symbolic expressions and voice expressions.
  • voice expressions may include spoken-voice expressions and/or sound-effect expressions.
  • Symbolic expressions are symbols, static pictures or dynamic pictures, etc. that express emotions or themes, and are used for users to choose to express their emotions or feelings in the process of voice communication.
  • the corresponding audio data can be converted according to the emotions or feelings expressed by the symbolic expressions.
  • Voice expressions are voice content expressing specific emotions or topics, and are used for users to choose to express their emotions or feelings in the process of voice communication.
  • for voice expression conversion, the voice content in the voice expression can be directly extracted as the converted audio data.
  • the voice content of the voice expression is the voice corresponding to the emotion or theme expressed by the voice expression, and is the voice expression with language content.
  • the voice content of the voice expression may be recorded by a specific person, such as a celebrity or a voice actor, according to a preset theme or content, or may be recorded by the user according to his or her own emotional expression needs.
  • the sound content of the sound effect expression is the sound effect corresponding to the emotional feature of the sound effect expression, and is the sound expression without language content. Users usually expect to express their feelings or emotions through the sound effects generated when the sound expression is played.
  • the sound content of the sound expression can be recorded for various preset themes or emotional expression needs.
  • this step S2100 may obtain all the audio feedback data accumulated for the corresponding playback period, or may obtain only the audio feedback data for the playback period that was generated within a set time.
  • for example, step S2100 may obtain all the audio feedback data accumulated for the first play period, or may obtain only the audio feedback data generated on the current day for the second play period, etc.; there is no limitation here.
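The two acquisition strategies described above, all accumulated feedback for a play period versus only feedback generated after a cutoff time, can be sketched as a single filter. The tuple layout `(period_index, generated_epoch_s, content)` is an assumption for illustration.

```python
# Illustrative filter for step S2100: select the feedback for one play period,
# optionally restricted to feedback generated at or after a cutoff timestamp.
def feedback_for_period(all_feedback, period_index, since_epoch_s=None):
    """all_feedback: list of (period_index, generated_epoch_s, content) tuples."""
    return [
        fb for fb in all_feedback
        if fb[0] == period_index
        and (since_epoch_s is None or fb[1] >= since_epoch_s)
    ]
```

Passing `since_epoch_s=None` corresponds to the accumulated case; passing, say, the start of the current day restricts the merge to recent feedback only.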
  • when the target media file is a live media file, the audio feedback data generated during any playback period is the audio feedback data generated during the playback time corresponding to that playback period. Therefore, for a live media file, the combined audio data will be able to reflect the live effect during the broadcast.
  • this step S2100 may be implemented by a server, such as the server 1100 in Fig. 1b.
  • obtaining audio feedback data generated during the playback of the main audio data in step S2100 may include: obtaining audio feedback data generated by the corresponding user during the playback of the main audio data from each terminal device.
  • this step S2100 may also be implemented by a terminal device, such as the terminal device 1200 in Fig. 1b.
  • in this case, obtaining audio feedback data generated during the playback of the main audio data in step S2100 may include: obtaining, from the server, audio feedback data generated by other users during the playback of the main audio data.
  • Step S2200: Combine the acquired audio feedback data with the main audio data to generate a combined audio file for playback.
  • the combination in this embodiment may use any existing mixing means to mix the audio feedback data with the main audio data to form an audio file mixed with the audio feedback data.
  • the merging in this embodiment may also refer to establishing a temporal correspondence between the audio feedback data and the main audio data to form an audio file embodying the mapping relationship, so that at least the main audio data and the audio feedback data can be played through different channels to achieve the effect of "mixing" for the user.
  • all the audio feedback data can be mixed to occupy one channel, or all the audio feedback data can be processed into multiple audio files occupying multiple channels; this is not limited here, as long as the user can feel the "mixing" effect of the audio feedback data being played together with the main audio data.
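A single-channel variant of the mixing described here can be sketched as summing scaled feedback samples into the main track. The gain value and the sample format (floats in [-1.0, 1.0]) are assumptions for illustration; a real implementation would use any existing mixing means.

```python
def mix(main, feedbacks, feedback_gain=0.5):
    """Mix feedback clips into a copy of the main audio samples.

    main:      list of float samples in [-1.0, 1.0]
    feedbacks: list of (offset_samples, samples) pairs
    Each feedback sample is scaled by feedback_gain, added to the main
    track at its offset, and the result is clipped to [-1.0, 1.0].
    """
    out = list(main)
    for offset, samples in feedbacks:
        for i, s in enumerate(samples):
            j = offset + i
            if j >= len(out):
                break  # drop feedback that runs past the end of the main audio
            out[j] = max(-1.0, min(1.0, out[j] + feedback_gain * s))
    return out
```

The same routine could be run once per channel to produce the multi-channel variant mentioned above.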
  • the merging process of this step can be performed continuously as the target media file is played and audio feedback data is continuously generated, so that the target media file continues to be played based on the continuously generated merged audio file until the playback ends.
  • this step S2200 can be implemented by a server or a terminal device, which is not limited here.
  • combining the acquired audio feedback data with the main audio data in this step S2200 may include: merging each piece of audio feedback data with the main audio data according to the playback period of the main audio data that the audio feedback data corresponded to when it was generated.
  • the play period of the main audio data is divided based on the relative play time of the target media file, where the relative reference point of the relative play time is the start play point of the target media file. For example, playing 0-5 minutes is the first playing period, playing 5-10 minutes is the second playing period, and so on.
  • the length of the playback period can be set according to needs, and the length can be fixed and can also be adjusted adaptively.
  • for example, when the length of the set playing period is 5 minutes, the playing period of the main audio data corresponding to a piece of audio feedback data at the time it is generated may be the 5-10 minute playing period of the target media file;
  • when the length of the set playing period is 2 minutes, the playing period of the main audio data corresponding to the audio feedback data when it is generated may be the 1-2 minute playing period of the target media file.
  • when each terminal device obtains the audio feedback data generated by the corresponding user, it can record the generation time of the audio feedback data and the corresponding playback period.
  • the starting position of each audio feedback data can be set to be aligned with the starting position of the corresponding main audio data play period.
  • the start position of each audio feedback data can be allowed to lag behind the corresponding play period of the main audio data.
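The period bucketing and the two alignment options above can be sketched as follows. The 300-second default mirrors the 5-minute example; the function names are illustrative.

```python
def play_period(play_time_s, period_len_s=300):
    """Index of the playing period (relative to the file's start point) in
    which a piece of audio feedback was generated; 300 s = 5-minute periods."""
    return int(play_time_s // period_len_s)

def merged_start(play_time_s, period_len_s=300, align_to_period_start=True):
    """Start position of the feedback in the merged timeline: either snapped
    to the start of its playing period, or kept lagging at its original
    generation time within that period."""
    if align_to_period_start:
        return play_period(play_time_s, period_len_s) * period_len_s
    return play_time_s
```

With 5-minute periods, feedback generated at the 6th minute of playback (360 s) falls in period 1 (the 5-10 minute period) and is either snapped to 300 s or left at 360 s.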
  • when the user plays the merged audio data through a personal terminal device, the user can feel that the audio feedback of all users changes with the playback of the main audio data, including changes in the number of feedbacks and/or changes in the feedback content, etc., providing a more realistic on-site experience.
  • combining the acquired audio feedback data with the main audio data in this step S2200 may include the following steps S2211 to S2213:
  • Step S2211: Obtain the number of pieces of audio feedback data generated within the set playing period of the main audio data.
  • the set play period can be preset according to real-time requirements. For example, if quantity statistics are performed every 5 minutes, the divided set periods include: the first playing period of 0 to 5 minutes, the playing period of 5 to 10 minutes, ..., and so on.
  • Step S2212: Determine the corresponding merging effect according to the number, where the merging effect at least reflects the volume ratio of each piece of data participating in the merging.
  • the mapping data representing the correspondence between the quantity and the merging effect may be pre-stored, so as to find the merging effect corresponding to the quantity obtained in step S2211 in the mapping data.
  • the combined effect includes a living room scene effect, a theater scene effect, a square scene effect, and so on.
  • in terms of the number of audience members corresponding to each scene: the living room scene is smaller than the theater scene, which in turn is smaller than the square scene.
  • in terms of the volume ratio of the audio feedback data to the main audio data reflected by each scene effect: the living room scene is smaller than the theater scene, which in turn is smaller than the square scene.
  • the corresponding number is, for example, less than or equal to 20 people.
  • each user in such a scene can clearly hear the audio feedback of the other users. Therefore, the volume ratio reflected by the living room scene effect can be set as follows: after merging, while listening to the content of the main audio data, the content of each audio feedback data participating in the merging can also be heard.
  • the corresponding number is, for example, greater than 20 people and less than or equal to 200 people.
  • the various audio feedback in the scene can only be vaguely heard. Therefore, the volume ratio reflected by the theater scene effect can be set as follows: after merging, while listening to the content of the main audio data, the content of each audio feedback data participating in the merging can only be heard vaguely.
  • the corresponding number is, for example, more than 200 people.
  • the audio feedback in the scene is not individually audible, and only a general noise can be heard. Therefore, the volume ratio reflected by the square scene effect can be set as follows: after merging, only the content of the main audio data can be heard clearly, while the audio feedback data participating in the merging is perceived only as the noise of many people giving audio feedback at once.
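  • As an illustration of the count-to-effect mapping above, the following sketch (not from the patent; the thresholds and ratio values are assumptions built on the 20-person and 200-person examples) selects a merging effect together with an aggregate feedback-to-main volume ratio, and derives a per-clip gain so that the aggregate ratio grows from living room to square while each individual voice's share shrinks, which is one possible way to reconcile the per-scene volume ratio with the per-voice audibility:

```python
def select_scene_effect(feedback_count):
    """Map the number of audio feedback items in a set playing period to a
    merging effect. The second value is the assumed aggregate volume ratio
    of all feedback relative to the main audio (illustrative numbers)."""
    if feedback_count <= 20:
        return "living_room", 0.3   # few voices, each clearly audible
    if feedback_count <= 200:
        return "theater", 0.6       # voices only vaguely audible
    return "square", 1.0            # only the noise of the crowd

def per_clip_gain(feedback_count):
    """Gain applied to each individual feedback clip: the aggregate ratio
    is shared among the clips, so single voices blur as the crowd grows."""
    _, ratio = select_scene_effect(feedback_count)
    return ratio / max(1, feedback_count)
```

  • With these illustrative numbers, 15 feedback clips select the living room effect with each clip clearly weighted, while 500 clips select the square effect in which any single clip is nearly inaudible.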
  • in a set playing period during which no audio feedback data is generated, the corresponding part of the merged audio data (the part aligned with that playing period, or lagging slightly behind it) contains only the main audio data.
  • when the terminal device plays this part of the content, the user hears only the audio content of the main audio data without any audio feedback content, and can thus feel the hushed atmosphere of all users enjoying this part of the content in silence.
  • Step S2213 According to the combination effect determined in step S2212, the audio feedback data generated during the set playing period is combined with the main audio data.
  • the combination according to step S2213 can be performed in the corresponding playback period of the main audio data according to the determined combination effect, or the combination according to step S2213 can be performed in the next playback period of the corresponding playback period of the main audio data. Not limited.
  • taking the audio feedback data generated during the 0-to-5-minute playing period of the main audio data as an example: the number obtained in step S2211 is 15, and the merging effect determined from that number in step S2212 is the living room scene effect. Then, in step S2213, the audio feedback data generated during the 0-to-5-minute playing period is merged with the main audio data according to the living room scene effect. Because the merging takes time, the playing time of each audio feedback data item is delayed relative to its generation time.
  • for example, the audio feedback data generated during the 0-to-5-minute playing period may be merged with the part of the main audio data corresponding to the 5-to-10-minute playing period, or with the part corresponding to the 2-to-7-minute playing period. The specific delay depends on the processing speed and on the set sampling interval for reading the audio feedback data, and is not limited here.
  • the merging process of this example can make the merged audio data reflect the impact of the amount of audio feedback data on the auditory effect, and then realize the simulation of the live effect of the corresponding number of audiences on the main audio data. , To enhance the user's on-site experience.
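  • In outline, the delayed merging described in this example could look like the following sketch, where samples are floats in [-1.0, 1.0] and simple additive mixing with clipping stands in for whatever mixing method an implementation actually uses (the delay and gain parameters are assumptions):

```python
def merge_feedback(main, clips, delay, gain):
    """Additively mix each feedback clip into `main`, shifted by `delay`
    samples to model the lag between generation time and playing time.
    `clips` is a list of (start_index, samples) pairs."""
    out = list(main)
    for start, clip in clips:
        for i, s in enumerate(clip):
            j = start + delay + i
            if 0 <= j < len(out):
                # clamp the mixed sample into the valid range
                out[j] = max(-1.0, min(1.0, out[j] + gain * s))
    return out
```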
  • combining the audio feedback data with the main audio data in this step S2200 may include the following steps S2221 to S2222.
  • Step S2221 Detect the idle gap of the main audio data adjacent to each audio feedback data according to the play period of the main audio data corresponding to each audio feedback data when it is generated.
  • This idle gap is a time gap in the main audio data where there is no audio content.
  • the hatched portions indicate parts of the main audio data that have audio content, while the blank portions indicate its idle gaps; audio feedback data can be inserted into these idle gaps.
  • each free gap of the main audio data can be used as a combined slot to perform a combined operation in each combined slot.
  • Step S2222 align each audio feedback data with an adjacent idle gap, and merge the audio feedback data with the main audio data.
  • the start position of each audio feedback data can be aligned with any position within the adjacent idle gap for merging; for example, the start position of each audio feedback data can be aligned with the start position of the adjacent idle gap. This is not limited here.
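  • Detecting the idle gaps of step S2221 amounts to finding runs of near-silent samples; a minimal sketch, in which the silence threshold and the minimum gap length are assumed values rather than values from the text:

```python
def find_idle_gaps(samples, threshold=0.01, min_len=3):
    """Return (start, length) pairs for runs of near-silent samples, which
    can serve as the merging slots for audio feedback data."""
    gaps, start = [], None
    for i, s in enumerate(list(samples) + [1.0]):  # sentinel closes a final gap
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start, i - start))
            start = None
    return gaps
```

  • A feedback clip can then be merged in starting at a gap's start index, aligning its start position with the start of the idle gap.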
  • combining the audio feedback data with the main audio data in step S2200 may include the following steps S2231 to S2232.
  • Step S2231 setting that each data including the main audio data and the audio feedback data occupies a different audio track.
  • Step S2232 Combine the audio feedback data with the main audio data through audio track synthesis.
  • the audio processing according to this example can use audio track synthesis technology to merge audio data, which is beneficial to reduce the difficulty of audio merging and obtain a good merging effect.
  • generating the merged audio data for playback in step S2200 may mean that, for a terminal device currently playing the target media file, the audio file is updated through the merging, and playback then continues from the current playback position in the updated audio file.
  • the generation of the merged audio data for playback in step S2200 may be implemented by the terminal device or the server.
  • generating the combined audio data for playback in this step may include: sending the combined audio data to the terminal device for playback.
  • generating the combined audio data for playback in this step may include: generating the combined audio data to drive the audio output device to play.
  • the audio data processing method of this embodiment merges the main audio data of the target media file selected by the user for playback with the audio feedback data generated during the playback of the main audio data, so as to obtain merged audio data for playback. In this way, when any user plays the target media file through his or her own terminal device, that user obtains the listening effect of enjoying the target media file live together with other people, and thereby an on-site experience.
  • the processing method may further include a step of detecting whether the live sound effect is turned on, so as to execute the above step S2200 in response to the instruction to enable the live sound effect function.
  • the above additional steps in this embodiment can be implemented by the terminal device, that is, the terminal device merges the acquired audio feedback data with the main audio data in response to the user's input to enable the live sound effect function.
  • the instruction may be triggered by the user through a physical button of the terminal device, or may be triggered by a virtual button (control) provided by the application playing the target media file.
  • the instruction may be triggered by a virtual button for enabling live sound effects as shown in FIG. 5.
  • the server, in response to the instruction sent by the terminal device to turn on the live sound effect function, either provides the merged audio data to the terminal device for playback, or provides the terminal device with the audio feedback data to be merged with the main audio data into the merged audio data for playback.
  • the instruction sent by the terminal device may be generated based on the instruction triggered by the user.
  • this embodiment allows the user to choose whether to play the merged audio data; a user who does not wish to hear the audio feedback data can choose to play only the main audio data of the target media file, achieving diversified choices.
  • the audio feedback data participating in the merging may be the same.
  • all audio feedback data generated during the playback of the main audio data may be acquired.
  • alternatively, part of the audio feedback data, filtered according to set filtering conditions, may be acquired and merged; this is not limited here.
  • the audio feedback data participating in the merging may be different, that is, according to user preferences, different audio feedback data can be filtered out for different types of users and merged, so as to obtain a personalized, "a thousand faces for a thousand people" live effect.
  • obtaining the audio feedback data generated during the playback of the main audio data in the above step S2100 may include: obtaining the audio feedback data conforming to a target classification that is generated during the playback of the main audio data.
  • each target classification may be set in advance, which may be a classification based on at least one of the user's age, gender, education, and preferences, for example, five target classifications are set according to the user's age.
  • for example, the audio feedback data generated by users under 20 years old can be obtained in this step S2100 to form the audio feedback data conforming to that target classification.
  • generating the combined audio data for playback in the above step S2200 includes: generating the combined audio data for playback by terminal devices that meet the target classification.
  • a terminal device that conforms to the target classification means a terminal device whose corresponding user, that is, the user who uses the terminal device, conforms to the target classification.
  • step S2110 may include: for each set target classification, the server obtains the audio feedback data conforming to that target classification that is generated during the playback of the target media file.
  • the server may deliver the acquired audio feedback data that meets the target classification to a terminal device that matches the target classification to merge with the main audio data.
  • the server can also merge the audio feedback data with the main audio data after acquiring the audio feedback data that meets the target classification, and send the combined audio data to the terminal device that matches the target classification for playback .
  • step S2110 may include: the terminal device obtains from the server the audio feedback data conforming to the target classification to which its corresponding user belongs, so as to merge it with the main audio data.
  • the target classification to which the corresponding user belongs can be selected by the user from the provided target classifications, or determined according to the user characteristics of that user.
  • different on-site effects can be provided for different types of users, thereby improving the fit between the provided on-site effects and users, and improving user experience.
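  • For the age-based grouping described above, filtering the feedback pool per target classification could be sketched as follows (the text only says five classifications by age; the bracket boundaries and the field names are assumptions):

```python
# Hypothetical age brackets; the text only specifies "five classifications".
AGE_BRACKETS = [(0, 20), (20, 30), (30, 45), (45, 60), (60, 200)]

def target_class(age):
    """Index of the age-based target classification a user falls into."""
    for idx, (lo, hi) in enumerate(AGE_BRACKETS):
        if lo <= age < hi:
            return idx
    return None

def feedback_for_class(feedback_items, cls):
    """Keep only feedback whose author belongs to the given classification."""
    return [f for f in feedback_items if target_class(f["age"]) == cls]
```

  • Each terminal device would then receive (or request) only the slice of the feedback pool matching its user's classification, yielding different merged audio per user group.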
  • the processing method of the present application may further include: acquiring a characteristic value of a set user characteristic corresponding to a terminal device that plays the main audio data; and, according to the characteristic value, determining the target category to which the terminal device belongs.
  • the set user characteristic corresponding to the terminal device refers to the set user characteristic of the user corresponding to the terminal device, that is, the set user characteristic of the user who uses the terminal device.
  • the set user characteristics include any one or more of age, education, gender, hobbies, and preferred language types. The feature values of these set user characteristics can be determined from the user's registration information, from the historical usage data generated by the user in this application (the application providing the target media file), or from the historical usage data generated by the user in other applications; this is not limited here.
  • the set user characteristics may include the set characteristics of the audio feedback data generated by the user during the playback of the main audio data.
  • the set features include, for example, either or both of sound features and emotional features.
  • based on the set features, a corresponding user can be assigned to a target classification of users with similar features, or to a target classification of users with opposite features; this is not limited here.
  • Sound characteristics refer to characteristics related to sound attributes embodied in audio feedback data.
  • the sound characteristics may include volume characteristics, rhythm characteristics, pitch characteristics, and the like.
  • Emotional features refer to the features related to the user's emotions or feelings reflected in the audio feedback data.
  • the emotional features can include the type of emotion, the degree of emotion, and the theme of expression.
  • the emotion type can be a preset type according to human emotion and emotion classification.
  • the emotion type can include anger, happiness, sadness, joy, etc.
  • the emotion degree can reflect the intensity of the corresponding emotion type; for example, the emotion type of anger can include various degrees such as furious, angry, and slightly angry.
  • voice analysis can be performed on the audio feedback data to extract the corresponding volume characteristics and rhythm characteristics.
  • voice signal analysis methods can be used to determine the volume and rhythm of the audio feedback data, and correspondingly obtain the volume characteristics and rhythm characteristics of the audio feedback data.
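  • As a toy stand-in for the voice-signal analysis mentioned here (a real implementation would use proper signal processing), the overall RMS level can serve as the volume feature, and the count of high-energy frames as a crude rhythm feature; the frame size and the 1.5x threshold are assumptions:

```python
import math

def sound_features(samples, frame=4):
    """Crude volume (overall RMS) and rhythm (number of frames whose RMS
    exceeds 1.5x the overall RMS) features of a feedback clip."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    beats = 0
    for i in range(0, n - frame + 1, frame):
        w = samples[i:i + frame]
        if math.sqrt(sum(s * s for s in w) / frame) > 1.5 * rms:
            beats += 1
    return {"volume": rms, "beats": beats}
```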
  • the content of the audio feedback data can be converted into corresponding text, emotional keywords can be extracted from the text according to a pre-built emotional vocabulary, and the emotional keywords can then be structurally analyzed through the emotional structured model to obtain their emotion type and emotion degree as the emotional features of the audio feedback data.
  • the audio feedback data can be passed through a speech recognition engine or speech-to-text tools, plug-ins, etc., to obtain the corresponding text.
  • the emotional vocabulary includes a plurality of emotional words that respectively reflect different human emotions or feelings.
  • these emotional words can be mined manually or by machine in advance to construct the emotional vocabulary.
  • the words obtained by segmenting the text of the audio feedback data can be compared for similarity with the words in the emotional vocabulary through cosine similarity or other methods, and words whose similarity exceeds a preset similarity threshold can be extracted as emotional keywords.
  • the emotional structured model can be a vocabulary model obtained by classifying and structuring the collected emotional vocabulary related to emotions.
  • Each emotion vocabulary included in the emotion structure model has a corresponding emotion type and emotion degree.
  • the emotional words obtained in advance through manual or machine mining can be classified into different levels according to human emotions or feelings.
  • for example, each emotion type forms a major category, and within each major category the emotional words of the same emotion type are further subdivided into different sub-categories according to their emotion degree.
  • the emotional words can also be sorted according to emotion degree to form different classification levels.
  • organizing the emotional vocabulary into this structure yields the emotional structured model.
  • in the structured analysis of an emotional keyword, the emotional word corresponding to the keyword is looked up in the emotional structured model, and the emotion type and emotion degree of that word determine the emotion type and emotion degree of the keyword, which serve as the emotional features of the audio feedback data.
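  • The keyword extraction and structured lookup described above might be sketched as follows, with a toy lexicon standing in for the emotional structured model and character-bigram cosine similarity standing in for the text's unspecified similarity measure (all words, degree numbers, and the threshold are invented for illustration):

```python
# Toy stand-in for the emotional structured model: word -> (type, degree).
EMOTION_MODEL = {
    "furious": ("anger", 3), "angry": ("anger", 2), "annoyed": ("anger", 1),
    "overjoyed": ("happy", 3), "happy": ("happy", 2), "pleased": ("happy", 1),
}

def cosine(a, b):
    """Cosine similarity over character-bigram counts of two words."""
    def grams(w):
        d = {}
        for i in range(len(w) - 1):
            d[w[i:i + 2]] = d.get(w[i:i + 2], 0) + 1
        return d
    da, db = grams(a), grams(b)
    dot = sum(v * db.get(g, 0) for g, v in da.items())
    na = sum(v * v for v in da.values()) ** 0.5
    nb = sum(v * v for v in db.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def emotional_features(words, threshold=0.8):
    """Return (emotion_type, degree) of the first word similar enough to a
    model entry, or None if no emotional keyword is found."""
    for w in words:
        for key, tag in EMOTION_MODEL.items():
            if cosine(w.lower(), key) >= threshold:
                return tag
    return None
```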
  • the required feature value of a set feature can also be determined directly from an expression feature; for example, for an angry expression represented by an expression feature, the feature value of the corresponding emotional feature can be determined directly from that expression feature.
  • this step of this embodiment can be implemented by the server according to the feature values of the set user characteristics that each terminal device provides for its corresponding user, or it can be implemented by the terminal device.
  • in the latter case, each terminal device determines the target classification to which its corresponding user belongs.
  • determining the target classification to which the user or the terminal device belongs according to the feature values of user characteristics can improve the accuracy of the determination, and does not require the user to set the desired target classification through additional operations, thereby achieving intelligent classification.
  • the main audio data is the audio data of the video file.
  • the processing method of this embodiment may further include: displaying the audio waveform representing the audio feedback data in the form of a bullet screen in the video playback window of the video file.
  • the audio waveform representing the audio feedback data is a graphical representation of the audio feedback data.
  • the sound characteristics and emotional characteristics of the audio feedback data may be acquired first, and then the audio waveform is generated according to the sound characteristics and emotional characteristics of the audio feedback data.
  • the display shape of the audio waveform can be set according to the sound characteristics of the audio feedback data.
  • the display shape may include the amplitude of the audio waveform, the interval of the waveform period, and the duration of the waveform.
  • the sound characteristics of the audio feedback data include rhythm characteristics and volume characteristics.
  • the waveform period interval of the audio waveform can be set according to the rhythm reflected by the rhythm features (for example, the faster the rhythm, the shorter the waveform period interval), and the waveform amplitude of the audio waveform can be set according to the volume reflected by the volume features (for example, the louder the volume, the larger the waveform amplitude).
  • the display color of the audio waveform can be set according to the emotional characteristics of the audio feedback data.
  • different types of display colors can be set according to different emotion types.
  • for example, when the emotion type is "angry", the display color is set to red; when the emotion type is "happy", the display color is set to green.
  • for the same emotion type, different emotion degrees can be given different shades of the same display color: for example, when the emotion type is "happy" and the emotion degree is "great joy", the display color is dark green, while when the emotion degree is "a little happy", the display color is light green; and so on.
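  • Combining the shape and color rules above, a hypothetical renderer could derive its display parameters as below; the color codes, the 50 ms period floor, and the degree-to-shade scale are all invented for illustration:

```python
# Shades from light to dark per emotion type, indexed by degree 1..3.
COLORS = {
    "angry": ["#ff9999", "#ff4d4d", "#cc0000"],
    "happy": ["#99e699", "#4dcc4d", "#1f7a1f"],
}

def waveform_style(volume, beats_per_minute, emotion_type, degree):
    """Louder -> larger amplitude; faster rhythm -> shorter waveform
    period; emotion type -> hue; emotion degree -> darker shade."""
    return {
        "amplitude": min(1.0, volume),
        "period_ms": max(50, 60000 // max(1, beats_per_minute)),
        "color": COLORS[emotion_type][max(0, min(2, degree - 1))],
    }
```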
  • by displaying the audio waveform in the form of a bullet screen in the video playback window, the user, while obtaining the on-site auditory effect, can also intuitively perceive the sound features and emotional features of other users through this graphical expression of the audio feedback data.
  • Fig. 6a is an exemplary flowchart of a method for processing audio data according to an example of the present application.
  • the audio feedback data provided by the server to each terminal device may be the same. Therefore, only one terminal device is shown in the figure.
  • the processing method may include the following steps:
  • step S1210 the terminal device 1200 collects the audio feedback data generated by the corresponding user during the playback process of the target media file, that is, during the playback process of the main audio data, and uploads it to the server 1100.
  • the terminal device 1200 shown in the figure may not generate audio feedback data. Instead, other terminal devices 1200 collect the audio feedback data generated by the corresponding user during the playback of the target media file and upload it to the server 1100.
  • step S1110 the server 1100 obtains the audio feedback data uploaded by each terminal device including the terminal device shown in the figure.
  • step S1120 the server 1100 delivers the acquired audio feedback data to each terminal device 1200 that is playing the target media file to merge the audio data.
  • step S1220 the terminal device 1200 obtains audio feedback data provided by the server 1100.
  • step S1230 the terminal device 1200 merges the acquired audio feedback data with the main audio data of the target media file to generate a merged target media file.
  • the terminal device 1200 combines the main audio data and the acquired audio feedback data, for example, by means of audio mixing.
  • Step S1240 When the terminal device 1200 plays the target media file, it plays the merged audio data instead of the main audio data alone. That is, the user corresponding to the terminal device 1200, while listening to the main audio data, can at least also hear the audio feedback data generated by other users during the playback of the main audio data.
  • Fig. 6b is an exemplary flowchart of a method for processing audio data according to another example of the present application.
  • the audio feedback data provided by the server to each terminal device may be different.
  • the figure shows two terminal devices conforming to different target categories, namely terminal device 1200-1 and terminal device 1200-2.
  • the processing method may include the following steps:
  • step S1210-1 the terminal device 1200-1 collects and uploads the audio feedback data generated by the corresponding user during the playback of the target media file to the server 1100.
  • step S1210-2 the terminal device 1200-2 collects and uploads the audio feedback data generated by the corresponding user during the playback of the target media file to the server 1100.
  • the terminal device 1200-1 and/or the terminal device 1200-2 shown in the figure may not generate audio feedback data; instead, other terminal devices 1200 collect the audio feedback data generated by their corresponding users during the playback of the target media file and upload it to the server 1100.
  • step S1110 the server 1100 obtains audio feedback data uploaded by each terminal device including the terminal device 1200-1 and the terminal device 1200-2.
  • step S1120-1 the server 1100 delivers to the terminal device 1200-1 the audio feedback data that is generated during the playback of the target media file and conforms to the target classification to which the terminal device 1200-1 belongs, for merging the audio data.
  • step S1120-2 the server 1100 delivers to the terminal device 1200-2 the audio feedback data that is generated during the playback of the target media file and conforms to the target classification to which the terminal device 1200-2 belongs, for merging the audio data.
  • Step S1220-1 The terminal device 1200-1 obtains the audio feedback data provided by the server 1100.
  • step S1230-1 the terminal device 1200-1 combines the acquired audio feedback data with the main audio data of the target media file to generate combined audio data A.
  • step S1240-1 the terminal device 1200-1 plays the combined audio data A during the process of playing the target media file, where the auditory effect of playing the combined audio data A is: the user corresponding to the terminal device 1200-1 While listening to the main audio data, it is also possible to listen to the audio feedback data that matches the target classification of the terminal device 1200-1.
  • step S1220-2 the terminal device 1200-2 obtains the audio feedback data provided by the server 1100.
  • step S1230-2 the terminal device 1200-2 combines the acquired audio feedback data with the main audio data of the target media file to generate combined audio data B.
  • step S1240-2 the terminal device 1200-2 plays the combined audio data B during the process of playing the target media file, where the auditory effect of playing the combined audio data B is: the user corresponding to the terminal device 1200-2 While listening to the main audio data, it is also possible to listen to the audio feedback data that matches the target classification of the terminal device 1200-2.
  • the merged audio data A and the merged audio data B will be different, achieving a personalized, "a thousand faces for a thousand people" scene effect.
  • FIG. 7 is a schematic flowchart of a method for processing audio data according to this embodiment.
  • the processing method is implemented by a terminal device, such as the terminal device 1200 in FIG. 1. The terminal device in this embodiment may have a display device, or it may be a device without one; it may have its own audio output device, or an external audio output device may be connected wirelessly or by wire.
  • the method of this embodiment may include the following steps S7100 to S7300:
  • step S7100 the terminal device 1200 obtains the main audio data selected to be played.
  • the main audio data selected to be played is the audio data of the target media file selected by the user using the terminal device 1200, and the target media file may be a pure audio file or a video file.
  • step S7200 the terminal device 1200 obtains live audio data corresponding to the main audio data, where the live audio data includes at least other users' audio feedback data for the main audio data.
  • the live audio data may also include audio feedback data generated for the main audio data by the user corresponding to the terminal device 1200. That is, for any terminal device 1200, not only can the audio feedback data of other users participate in the merging of the audio data, but the audio feedback data generated by the user of that terminal device 1200 can also participate in the merging.
  • the acquired live audio data may be live audio data conforming to the target classification of the terminal device 1200, or may be live audio data that is the same for any terminal device 1200, which is not limited herein.
  • the terminal device 1200 may obtain all audio feedback data from the server, including audio feedback data generated by other users, and may also include audio feedback data generated by the user corresponding to the terminal device 1200.
  • the terminal device 1200 may only obtain audio feedback data generated by other users from the server, and obtain audio feedback data generated by the corresponding user locally.
  • after the terminal device 1200 obtains the live audio data, the live audio data and the main audio data are merged to obtain the merged audio data.
  • alternatively, the merging may be performed by the server, and the merged audio data provided to the terminal device.
  • in that case, the above steps S7100 and S7200 amount to acquiring the merged audio data, where the merged audio data includes the main audio data and the live audio data.
  • step S7300 the terminal device 1200 performs a processing operation of playing the corresponding live audio data while playing the main audio data.
  • the processing operation includes the combining process, and driving the audio output device according to the combined audio data to play the corresponding live audio data while playing the main audio data.
  • any one or more of the methods provided in Embodiment 1 of the above method can be used for the merging process, which will not be repeated here.
  • the processing operation includes: according to the merged audio data, the audio output device is driven to play the main audio data while playing the corresponding live audio data.
  • the terminal device 1200 may drive the audio output device according to the merged audio data, for example according to the mixed audio data or according to the correspondence between the main audio data and the live audio data, so that the corresponding live audio data is played along with the main audio data, realizing the live effect of enjoying the target media file together with other people.
  • the terminal device may be a smart phone, a laptop computer, a desktop computer, a tablet computer, a wearable device, a smart speaker, a set-top box, a smart TV, a voice recorder, a camcorder, etc., which are not limited here.
  • the terminal device can play the acquired live audio data along with the main audio data of the target media file while playing the target media file selected by the user, so that the user obtains the live listening experience of the main audio data and the live audio data mixed together. Therefore, according to the processing method of this embodiment, when any user plays the target media file through his or her terminal device, the user obtains the live listening effect of enjoying the target media file together with other people, and thereby an on-site experience.
  • FIG. 8 is a schematic flowchart of a method for processing audio data according to this embodiment.
  • the processing method is implemented by a terminal device, such as the terminal device 1200 in FIG. 1. The terminal device in this embodiment may have a display device, or it may be a device without one; it may have its own audio output device, or an external audio output device may be connected wirelessly or by wire.
  • the processing method of this embodiment may include the following steps S8100 to S8300:
  • Step S8100 In response to the operation of playing the target media file, the terminal device 1200 plays the target media file, where the target media file includes main audio data.
  • Step S8200 Acquire live audio data corresponding to the main audio data, where the live audio data includes at least other users' audio feedback data for the main audio data.
  • the live audio data may also include the audio feedback data for the main audio data from the user corresponding to the terminal device, that is, the local user.
  • the audio feedback data of the local user can be obtained from the server together with the audio feedback data of other users, or it can be obtained locally, which is not limited here.
  • obtaining live audio data corresponding to the main audio data in step S8200 may include: obtaining audio feedback data of other users for the main audio data from the server to form live audio data.
  • obtaining live audio data corresponding to the main audio data in step S8200 may further include: obtaining audio feedback data of the user corresponding to the terminal device for the main audio data from the server or locally to form live audio data.
  • step S8300 the terminal device 1200 performs a processing operation of playing live audio data along with the main audio data of the target media file during the process of playing the target media file.
  • the processing operation may include: the terminal device 1200 performing merging processing, that is, merging the live audio data with the main audio data, and driving the audio output device according to the merged audio data to play the live audio data while playing the main audio data.
  • the merging process can adopt any one or more of the methods provided in Embodiment 1 of the above method, which will not be repeated here.
  • the processing operation may alternatively include: the terminal device 1200 obtaining the combined audio data provided by the server 1100, where the combined audio data is audio data obtained by combining the main audio data and the live audio data, and driving the audio output device according to the combined audio data to play the live audio data while playing the main audio data.
  • in step S8300, the terminal device 1200 drives the audio output device to play the live audio data at the same time as the main audio data according to the combined form, such as a mixed-audio form or a multi-channel form, so that the live audio data is played along with the main audio data, realizing the live effect of enjoying the target media file with others.
  • the processing method may further include: obtaining audio feedback data fed back by the user corresponding to the terminal device for the main audio data; uploading the user's audio feedback data to the server.
  • after the user's audio feedback data is uploaded to the server, the server can send it to the terminal devices of other users, so that other users who are also playing the target media file can receive the user's audio feedback data.
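The upload-and-forward behavior described above can be sketched as a minimal in-memory relay. The class and method names are illustrative assumptions, not the claimed implementation; a real server would use persistent sessions and push channels:

```python
class FeedbackRelay:
    """Minimal sketch of the server-side fan-out: a terminal uploads a
    user's audio feedback for a media file, and the server forwards it
    to every other terminal currently playing that file."""

    def __init__(self):
        self.playing = {}  # media_id -> set of terminal ids

    def register(self, media_id, terminal_id):
        """Record that a terminal is playing the given target media file."""
        self.playing.setdefault(media_id, set()).add(terminal_id)

    def upload(self, media_id, sender_id, feedback):
        # Everyone playing the same target media file, except the sender,
        # receives the sender's audio feedback data.
        return {t: feedback
                for t in self.playing.get(media_id, set())
                if t != sender_id}
```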
  • Fig. 9 is a schematic block diagram of an audio data processing device according to an embodiment of the present application.
  • the processing device 9000 of this embodiment includes a data acquisition module 9100 and an audio processing module 9200.
  • the data acquisition module 9100 is used to acquire audio feedback data generated during the playback of the main audio data.
  • the audio processing module 9200 is used to combine the audio feedback data with the main audio data, and generate the combined audio data for playback.
  • when the audio processing module 9200 merges the audio feedback data with the main audio data, it can be used to: obtain the quantity of audio feedback data generated during a set playing period of the main audio data; determine the corresponding merging effect according to that quantity, where the merging effect at least reflects the volume ratio of each piece of data participating in the merging; and, according to the merging effect, merge the audio feedback data generated during the set playing period with the main audio data.
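The count-dependent merging effect described above can be sketched as follows. This is an illustrative, non-limiting example (the function name, the `base_ratio` cap, and the equal-sharing rule are assumptions, not the claimed implementation): the more feedback clips fall in the set playing period, the lower the volume allotted to each clip relative to the main audio.

```python
def merge_with_count_ratio(main, feedbacks, base_ratio=0.5):
    """Mix feedback clips (lists of float samples) into the main audio.

    Assumed rule: the total feedback loudness is capped at base_ratio of
    the main signal and shared equally among the clips in the period, so
    the volume ratio depends on the quantity of feedback data.
    """
    mixed = list(main)
    if not feedbacks:
        return mixed
    per_clip_gain = base_ratio / len(feedbacks)
    for clip in feedbacks:
        for i in range(min(len(clip), len(mixed))):
            mixed[i] += per_clip_gain * clip[i]
    # Normalize if the sum exceeds full scale, to avoid clipping.
    peak = max(abs(s) for s in mixed)
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```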
  • when the audio processing module 9200 combines the audio feedback data with the main audio data, it can be used to: detect, according to the play period of the main audio data corresponding to each piece of audio feedback data when it was generated, the idle gaps of the main audio data adjacent to each piece of audio feedback data, and align each piece of audio feedback data with the adjacent idle gap for merging.
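A minimal sketch of the idle-gap alignment described above, assuming amplitude-thresholded silence detection (the threshold, minimum gap length, and function names are illustrative assumptions, not the claimed implementation):

```python
def find_idle_gaps(main, threshold=0.05, min_len=3):
    """Return (start, end) index pairs where the main audio stays below
    an amplitude threshold for at least min_len samples -- candidate
    idle gaps into which a feedback clip can be aligned."""
    gaps, start = [], None
    for i, s in enumerate(main):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start, i))
            start = None
    if start is not None and len(main) - start >= min_len:
        gaps.append((start, len(main)))
    return gaps


def nearest_gap(gaps, position):
    """Pick the gap whose start is closest to the playback position at
    which the feedback was produced."""
    return min(gaps, key=lambda g: abs(g[0] - position)) if gaps else None
```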
  • when the audio processing module 9200 combines the audio feedback data with the main audio data, it can be used to: set each piece of data, including the main audio data and the audio feedback data, to occupy a different audio track; and merge the audio feedback data with the main audio data through audio track synthesis.
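The track-based merging described above can be illustrated with a minimal sketch in which each track is a plain list of samples and "audio track synthesis" is a sample-wise sum — an assumed simplification of real multi-track mixing, not the claimed implementation:

```python
def synthesize_tracks(tracks):
    """Each entry in `tracks` is one audio track (the main audio, or one
    feedback stream), kept on its own track until this final mix-down.
    The synthesized output is the sample-wise sum, padded to the length
    of the longest track."""
    length = max((len(t) for t in tracks), default=0)
    out = [0.0] * length
    for t in tracks:
        for i, s in enumerate(t):
            out[i] += s
    return out
```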
  • the processing device 9000 may further include a detection module configured to detect whether the live sound effect function is enabled and, in response to an instruction to enable the live sound effect function, notify the audio processing module 9200 to perform the operation of merging the audio feedback data with the main audio data.
  • when the data acquisition module 9100 acquires the audio feedback data generated during the playback of the main audio data, it may: acquire the voice comments fed back during the playback of the main audio data, and use at least the voice comments as audio feedback data.
  • when the data acquisition module 9100 acquires the audio feedback data generated during the playback of the main audio data, it may: acquire the text comments fed back during the playback of the main audio data, convert the text comments into corresponding audio data, and use at least the converted audio data as audio feedback data.
  • when the data acquisition module 9100 acquires the audio feedback data generated during the playback of the main audio data, it may: acquire the expression features fed back during the playback of the main audio data, convert the expression features into corresponding audio data, and use at least the converted audio data as audio feedback data.
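As a hedged illustration of converting expression features into audio feedback data: the mapping table and placeholder waveforms below are invented for the example; a real system would use recorded or synthesized clips for each expression.

```python
# Assumed mapping from expression features to canned audio clips
# (placeholder sample values, not real recordings).
EXPRESSION_SOUNDS = {
    "laugh": [0.2, -0.2, 0.3, -0.3],
    "sigh":  [0.1, 0.05, 0.0, -0.05],
}


def expression_to_audio(expression):
    """Convert an expression feature reported during playback into audio
    feedback data; unknown expressions yield no audio."""
    return EXPRESSION_SOUNDS.get(expression)
```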
  • when the data acquisition module 9100 acquires audio feedback data generated during the playback of the main audio data, it may be used to: acquire audio feedback data, generated during the playback of the main audio data, that meets a target classification, so that the audio processing module 9200 generates merged audio data for playback by terminal devices that meet the target classification.
  • the processing device 9000 may further include a classification module configured to: obtain a characteristic value of a set user characteristic corresponding to a terminal device that plays the main audio data; and, according to the characteristic value, determine the target classification corresponding to the terminal device.
  • the set user characteristics may include: set characteristics of the audio feedback data generated by the user of the terminal device during the playback of the main audio data.
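A minimal sketch of how the classification module might map a set user characteristic to a target classification. The characteristic chosen here (the number of feedback events the user produced during playback) and the class names are assumptions for illustration only:

```python
def classify_terminal(feedback_events, boundaries=(1, 5)):
    """Assign a terminal device to a target classification from the
    characteristic value of a set user characteristic -- here, an
    assumed characteristic: how many feedback events its user produced
    while the main audio data was playing."""
    count = len(feedback_events)
    if count <= boundaries[0]:
        return "quiet"
    if count <= boundaries[1]:
        return "active"
    return "very_active"
```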
  • when the main audio data is the audio data of a video file, the processing device 9000 may further include a display processing module, which is used to display an audio waveform representing the audio feedback data in the form of a bullet screen in the video playback window.
  • the electronic device 100 includes a processing device 9000 according to any embodiment of the present application.
  • the electronic device 100 may include a memory 110 and a processor 120.
  • the memory 110 is used to store executable instructions; the processor 120 is used to run, under the control of the executable instructions, the processing method according to any method embodiment of this application.
  • the electronic device 100 may be a server, such as the server 1100 in FIG. 1, or any terminal device, such as the terminal device 1200 in FIG. 1; it may also include both a server and a terminal device, such as the server 1100 and the terminal device 1200 in FIG. 1, which is not limited here.
  • in some embodiments, the electronic device 100 is a terminal device, which may be a device with a display device or a device without one, for example a set-top box, a smart speaker, etc.
  • in some embodiments, the electronic device 100 is a terminal device that may also include an input device for the corresponding user to post feedback content for the main audio data and send the feedback content to the above processing device 9000 or the processor 120, so that the processing device 9000 or the processor 120 generates the corresponding user's audio feedback data for the main audio data according to the feedback content.
  • the input device may include at least one of an audio input device, a physical keyboard, a virtual keyboard, and a touch screen.
  • the processing device or processor of the terminal device can also be used to control the communication device to send the corresponding user's audio feedback data to the server, so that the server can forward it to the terminal devices of other users, and those users can receive the user's audio feedback data while playing the same target media file.
  • in some embodiments, the electronic device 100 is a terminal device that may also include an audio output device, which is used, under the control of the processing device or the processor, to play the corresponding audio feedback data while playing the main audio data.
  • the terminal device may also be connected to the audio output device in a wired or wireless manner to play the combined audio data.
  • a computer-readable storage medium stores a computer program that can be read and run by a computer; when read and run by the computer, the program executes the audio data processing method described in any of the above embodiments of this application.
  • the computer program product may include a computer readable storage medium loaded with computer readable program instructions for enabling a processor to implement various aspects of the present application.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, and mechanical encoding device, such as a punch card or a raised structure in a groove with instructions stored thereon.
  • the computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through wires.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in the computer-readable storage medium in each computing/processing device.
  • the computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages; the programming languages include object-oriented programming languages such as Smalltalk, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages.
  • computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, over the Internet using an Internet service provider).
  • an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to realize various aspects of the present application.
  • these computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device to produce a machine, so that when these instructions are executed by the processor of the computer or other programmable data processing device, a device that implements the functions/actions specified in one or more blocks of the flowchart and/or block diagram is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the drawings; for example, two consecutive blocks can actually be executed in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and the combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Disclosed are an audio data processing method and apparatus, and an electronic device. The processing method comprises: acquiring audio feedback data generated during the playing of main audio data (S2100); and merging the audio feedback data with the main audio data to generate merged audio data for playing (S2200). When different users, in different spaces, play a media file containing the main audio data through their respective terminal devices, each user can achieve the on-site effect of enjoying the media file in the same space as the others.

Description

Audio data processing method, device and electronic equipment
This application claims the priority of the Chinese patent application with application number 201910619886.0, entitled "Audio data processing method, device and electronic equipment", filed on July 10, 2019, the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of Internet technology, and more specifically, to an audio data processing method, device, electronic equipment, and computer-readable storage medium.
Background technique
With the rapid development of playback technologies for audio, video, and other media files, applications that provide media file playback services nowadays usually offer users a comment function so that users can post comments while a media file is playing. In the prior art, these comments are arranged linearly: for the recipient of any media file, receiving the media file and receiving the comment content are sensorily separate, so the recipient cannot obtain the sense of presence of multiple people receiving and commenting on the media file together.
Summary of the invention
An objective of the embodiments of the present application is to provide a new technical solution for a processing method of audio data.
According to the first aspect of the present application, there is provided an audio data processing method, which includes:
obtaining audio feedback data generated during the playback of main audio data;
merging the audio feedback data with the main audio data to generate merged audio data for playback.
Optionally, the merging of the audio feedback data with the main audio data includes:
obtaining the quantity of audio feedback data generated within a set playing period of the main audio data;
determining the corresponding merging effect according to the quantity, where the merging effect at least reflects the volume ratio of each piece of data participating in the merging;
according to the merging effect, merging the audio feedback data generated in the set playing period with the main audio data.
Optionally, the merging of the audio feedback data with the main audio data includes:
detecting idle gaps of the main audio data adjacent to each piece of audio feedback data, according to the play period of the main audio data corresponding to each piece of audio feedback data when it was generated;
aligning each piece of audio feedback data with the adjacent idle gap to perform the merging.
Optionally, the merging of the audio feedback data with the main audio data includes:
setting each piece of data, including the main audio data and the audio feedback data, to occupy a different audio track;
merging the audio feedback data with the main audio data through audio track synthesis.
Optionally, the obtaining of audio feedback data generated during the playback of the main audio data includes:
obtaining audio feedback data, generated during the playback of the main audio data, that meets a target classification;
and the generating of merged audio data for playback includes:
generating merged audio data for playback by terminal devices that meet the target classification.
Optionally, the method further includes:
obtaining the characteristic value of a set user characteristic corresponding to a terminal device playing the main audio data;
determining, according to the characteristic value, the target classification corresponding to the terminal device.
Optionally, the set user characteristics include set characteristics of the audio feedback data generated by the user of the terminal device during the playback of the main audio data.
Optionally, the main audio data is the audio data of a video file, and the method further includes:
displaying, in the video playback window of the video file, an audio waveform representing the audio feedback data in the form of a bullet screen.
Optionally, the obtaining of audio feedback data generated during the playback of the main audio data includes:
obtaining voice comments posted during the playback of the main audio data, and using at least the voice comments as the audio feedback data.
Optionally, the obtaining of audio feedback data generated during the playback of the main audio data includes:
obtaining text comments posted during the playback of the main audio data;
converting the text comments into corresponding audio data, and using at least the converted audio data as the audio feedback data.
Optionally, the obtaining of audio feedback data generated during the playback of the main audio data includes:
obtaining expression features posted during the playback of the main audio data;
converting the expression features into corresponding audio data, and using at least the converted audio data as the audio feedback data.
Optionally, the main audio data is the audio data of a live media file.
Optionally, the method further includes:
in response to an instruction to enable the live sound effect function, performing the operation of merging the audio feedback data with the main audio data.
According to the second aspect of the present application, there is also provided an audio data processing method, implemented by a terminal device, the method including:
obtaining the main audio data selected for playback;
obtaining live audio data corresponding to the main audio data, where the live audio data includes at least other users' audio feedback data for the main audio data;
performing the processing operation of playing the live audio data while playing the main audio data.
Optionally, the live audio data further includes audio feedback data for the main audio data fed back by the user corresponding to the terminal device.
According to the third aspect of the present application, there is also provided an audio data processing method, implemented by a terminal device, the method including:
in response to an operation of playing a target media file, playing the target media file, where the target media file includes main audio data;
obtaining live audio data corresponding to the main audio data, where the live audio data includes at least other users' audio feedback data for the main audio data;
performing the processing operation of playing the live audio data along with the main audio data during the playback of the target media file.
Optionally, the obtaining of live audio data corresponding to the main audio data includes:
obtaining other users' audio feedback data for the main audio data from the server, as the live audio data.
Optionally, the method further includes:
obtaining audio feedback data of the user corresponding to the terminal device for the main audio data;
uploading the user's audio feedback data to the server.
According to the fourth aspect of the present application, there is also provided an audio data processing device, including:
a data acquisition module for obtaining audio feedback data generated during the playback of main audio data; and
an audio processing module for merging the audio feedback data with the main audio data to generate merged audio data for playback.
According to the fifth aspect of the present application, there is also provided an electronic device, including the processing device according to the fourth aspect of the present application; or including:
a memory for storing executable instructions;
a processor for running the electronic device, under the control of the executable instructions, to execute the processing method according to the first, second, or third aspect of the present application.
Optionally, the electronic device is a terminal device without a display device.
Optionally, the electronic device is a terminal device, and the terminal device further includes an input device for the corresponding user to input feedback content for the main audio data and send the feedback content to the processing device or processor, so that the processing device or processor generates the corresponding user's audio feedback data for the main audio data according to the feedback content.
Optionally, the electronic device is a terminal device, and the terminal device further includes an audio output device, which is used, under the control of the processing device or the processor, to play the corresponding audio feedback data while playing the main audio data.
According to the sixth aspect of the present application, there is also provided a computer-readable storage medium storing a computer program that can be read and executed by a computer; when read and run by the computer, the computer program executes the processing method according to the first, second, or third aspect of the present application.
One beneficial effect of the embodiments of the present application is that the audio data processing method of the embodiments merges the main audio data with the audio feedback data generated during its playback, so that any terminal device can play audio feedback data from other users while playing the main audio data. In this way, any user listening to the main audio data alone through his or her own terminal device can still obtain the live auditory effect of listening to the main audio data, and commenting on it, together with other users, thereby obtaining a live experience.
Through the following detailed description of exemplary embodiments of the present application with reference to the accompanying drawings, other features of the present application and their advantages will become clear.
Description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate the embodiments of the present application and, together with the description, serve to explain the principles of the present application:
Figure 1a is a schematic diagram of an application scenario illustrating the effects of an embodiment of the present application;
Figure 1b is a hardware configuration diagram of an alternative data processing system that can be used to implement the audio data processing method of an embodiment of the present application;
Figure 2 is a schematic flowchart of a processing method according to an embodiment of the present application;
Figure 3 is a schematic diagram of an example of guiding the user to input audio feedback data in the play window of the target media file;
Figure 4 is a schematic diagram of inserting audio feedback data into adjacent idle gaps of the main audio data during mixing;
Figure 5 is a schematic diagram of an example of guiding the user to input an instruction to enable the live sound effect function;
Figure 6a is an interaction diagram of a processing method according to one example of the present application;
Figure 6b is an interaction diagram of a processing method according to another example of the present application;
Figure 7 is a schematic flowchart of a processing method according to another embodiment of the present application;
Figure 8 is a schematic flowchart of a processing method according to a third embodiment of the present application;
Figure 9 is a schematic functional block diagram of an audio data processing device according to an embodiment of the present application;
Figure 10a is a schematic block diagram of an electronic device according to an embodiment of the present application;
Figure 10b is a schematic block diagram of an electronic device according to another embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present application or its applications or uses.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate such techniques, methods, and devices should be regarded as part of this specification.
In all the examples shown and discussed herein, any specific value should be interpreted as merely exemplary rather than limiting; other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item has been defined in one drawing, it need not be discussed further in subsequent drawings.
At present, media files have become a primary medium of information transmission. With the development of Internet technology, people can not only enjoy the content of a media file together with others at the venue where it is played, but can also enjoy it alone in various places through their own terminal devices. The media file may be a video file containing both audio data and image data, in which case the terminal device that plays it must have a display device and an audio output device; it may also be a pure audio file containing only audio data, in which case the terminal device must have an audio output device but need not have a display device, for example a smart speaker. In the live mode, where many people enjoy the content together, everyone can perceive the various sound reactions of the others to the media file. These reactions include spoken comments as well as expressive reactions such as delight, sighs, sadness, or complete silence, giving people a rich, immersive sensory experience on site. In the online mode, where individuals watch alone through their own terminal devices, users can currently only post text comments over the network and cannot obtain the on-site sensory experience; yet this mode offers a convenience that the live mode cannot match.
To compensate for the online mode's lack of a live sensory experience, the embodiments of the present application can, when a user enjoys the content of a media file through a personal terminal device, merge at least the audio feedback data of other users for the same media file and play it together with the main audio data of the media file, thereby providing a sensory experience comparable to the live mode. In an example application scenario shown in Fig. 1a, user A, user B, user C, and user D enjoy the content of the same media file in different spaces through their respective terminal devices 1200, at the same time or at different times. User A, user B, and user C each posted a spoken comment during the same set play period. Because the users are spatially separated, none of them can actually perceive the others' sound reactions to the media file. Through the processing of the embodiments of the present application, however, while the media file is played through each user's terminal device 1200, the audio feedback data of the other users for the same media file can be merged and played together with the main audio data. Each user can thus perceive the other users' sound reactions, just as if user A, user B, user C, and user D were enjoying the media file together through the same terminal device in the same place, as shown in the lower part of Fig. 1a.
<Hardware Configuration>
Fig. 1b is a schematic structural diagram of a data processing system to which the audio data processing method according to an embodiment of the present application can be applied.
As shown in Fig. 1b, the data processing system 1000 of this embodiment includes a server 1100, a terminal device 1200, and a network 1300.
The server 1100 may be, for example, a blade server or a rack server; it may also be a server cluster deployed in the cloud, which is not limited here.
As shown in Fig. 1b, the server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160. The processor 1110 may be, for example, a central processing unit (CPU). The memory 1120 includes, for example, ROM (read-only memory), RAM (random-access memory), and non-volatile memory such as a hard disk. The interface device 1130 includes, for example, a USB interface and a serial interface. The communication device 1140 can perform, for example, wired or wireless communication. The display device 1150 is, for example, a liquid-crystal display. The input device 1160 may include, for example, a touch screen and a keyboard.
In this embodiment, the server 1100 can participate in implementing the data processing method according to any embodiment of the present application.
As applied in the embodiments of the present application, the memory 1120 of the server 1100 stores instructions for controlling the processor 1110 to operate so as to support the implementation of the processing method according to any embodiment of the present application. Technical personnel can design the instructions according to the solution disclosed in this application. How instructions control a processor to operate is well known in the art and will not be described in detail here.
Those skilled in the art should understand that, although multiple devices of the server 1100 are shown in Fig. 1b, the server 1100 of the embodiments of the present application may involve only some of them, for example only the processor 1110 and the memory 1120.
As shown in Fig. 1b, the terminal device 1200 may include a processor 1210, a memory 1220, an interface device 1230, a communication device 1240, a display device 1250, an input device 1260, an audio output device 1270, an audio pickup device 1280, and so on. The processor 1210 may be a central processing unit (CPU), a microcontroller (MCU), or the like. The memory 1220 includes, for example, ROM (read-only memory), RAM (random-access memory), and non-volatile memory such as a hard disk. The interface device 1230 includes, for example, a USB interface and a headphone interface. The communication device 1240 can perform, for example, wired or wireless communication. The display device 1250 is, for example, a liquid-crystal display or a touch display. The input device 1260 may include, for example, a touch screen and a keyboard. The terminal device 1200 can output audio information through the audio output device 1270, which includes, for example, a speaker, and can pick up voice information input by the user through the audio pickup device 1280, which includes, for example, a microphone.
The terminal device 1200 may be a smartphone, a portable computer, a desktop computer, a tablet computer, a wearable device, a smart speaker, a set-top box, a smart TV, a voice recorder, a camcorder, or the like. The terminal device 1200 may have its own audio output device 1270 for playing media files, or may be connected to an audio output device 1270 for playing media files.
In this embodiment, the terminal device 1200 can participate in implementing the data processing method according to any embodiment of the present application.
As applied in the embodiments of the present application, the memory 1220 of the terminal device 1200 stores instructions for controlling the processor 1210 to operate so as to support the implementation of the processing method according to any embodiment of the present application. Technical personnel can design the instructions according to the solution disclosed in this application. How instructions control a processor to operate is well known in the art and will not be described in detail here.
Those skilled in the art should understand that, although multiple devices of the terminal device 1200 are shown in Fig. 1b, the terminal device 1200 of the embodiments of the present application may involve only some of them, for example only the processor 1210 and the memory 1220.
The network 1300 may be a wireless or a wired network, and may be a local area network or a wide area network. The terminal device 1200 can communicate with the server 1100 through the network 1300.
The data processing system 1000 shown in Fig. 1b is merely explanatory and is by no means intended to limit the present application, its applications, or its uses. For example, although Fig. 1b shows only one server 1100 and one terminal device 1200, this does not limit their respective numbers: the data processing system 1000 may include multiple servers 1100 and/or multiple terminal devices 1200.
The audio data processing method according to any embodiment of the present application may be implemented by the server 1100, by the terminal device 1200, or jointly by the server 1100 and the terminal device 1200 as needed, which is not limited here.
<Method Embodiment 1>
Fig. 2 is a schematic flowchart of an audio data processing method according to an embodiment of the present application.
As shown in Fig. 2, the processing method of this embodiment may include the following steps S2100 to S2200.
Step S2100: obtain audio feedback data generated during the playback of the main audio data.
In this embodiment, the main audio data is the audio data of the target media file being played. The target media file may be a pure audio file or a video file, and may be a live-broadcast file or a recorded file, which is not limited here.
In this embodiment, step S2100 may obtain all the audio feedback data generated during the playback of the main audio data for merging in the following step S2200, or may obtain, according to set conditions, only part of the audio feedback data generated during the playback of the main audio data for that merging, which is not limited here.
In this embodiment, each piece of feedback content posted by any user during the playback of the main audio data, that is, during the playback of the target media file, corresponds to one piece of audio feedback data. For example, the feedback content may be a voice comment, which directly forms a piece of audio feedback data. It may also be a text comment, which can be converted into corresponding audio data; the converted audio data then forms a piece of audio feedback data. It may further be an input expression feature, such as an input emoticon or sound expression, which can likewise be converted into corresponding audio data to form a piece of audio feedback data.
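As a purely illustrative sketch (not part of the claimed embodiments; all field and function names are assumptions), one piece of audio feedback data described above could be modeled as a record that tags its source type and the playback position at which it was produced:

```python
from dataclasses import dataclass


# Hypothetical record for one piece of audio feedback; the fields are
# illustrative, not taken from the specification.
@dataclass
class AudioFeedback:
    user_id: str
    source: str           # "voice", "text", or "expression"
    audio: bytes          # audio payload after any conversion
    play_offset_s: float  # position in the main audio when feedback was made


def from_text_comment(user_id: str, text: str, offset_s: float) -> AudioFeedback:
    """Wrap a text comment as audio feedback; a real system would run
    text-to-speech here (stubbed with the raw UTF-8 bytes)."""
    synthesized = text.encode("utf-8")  # placeholder for TTS output
    return AudioFeedback(user_id, "text", synthesized, offset_s)


fb = from_text_comment("user_a", "great scene!", 312.5)
```

A voice comment would be stored the same way with `source="voice"` and the microphone capture as the payload, so the merging step in S2200 can treat all three feedback types uniformly.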
In an example, obtaining the audio feedback data generated during the playback of the main audio data in step S2100 may include: obtaining voice comments fed back during the playback of the main audio data, and using at least those voice comments as the audio feedback data generated during the playback of the main audio data.
In this example, users can input voice comments through their respective terminal devices. Taking Fig. 3 as an example, an entry for guiding the user to input a voice comment (for example, "press and hold to send a voice comment") can be provided in the play window of the target media file. The user posts a voice comment through that entry, for example by long-pressing it, and the terminal device collects the voice comment through an audio input device such as a microphone to form the audio feedback data.
In this example, the user may also post a voice comment by operating a physical button of the terminal device, which is not limited here.
In an example, obtaining the audio feedback data generated during the playback of the main audio data in step S2100 may also include: obtaining text comments fed back during the playback of the main audio data; converting the text comments into corresponding audio data; and using at least the converted audio data as the audio feedback data generated during the playback of the main audio data.
In this example, when a text comment is converted into corresponding audio data, the user's voice characteristics can be obtained from the user's previously collected voice, and the conversion can then be performed according to those characteristics so that the converted audio data reflects the user's voice. Alternatively, the conversion can use default voice characteristics, which may be set by the system or selected by the user, and are not limited here.
In this example, when a text comment is converted into corresponding audio data, the emotional characteristics expressed by the text comment can also be recognized, so that the converted audio data reflects the emotion the text comment is intended to express.
In this example, the user can input the content of the text comment through a physical keyboard, a virtual keyboard, or a touch screen provided by the terminal device, or can post a text comment by simply selecting preset text content provided by the terminal device.
In an example, obtaining the audio feedback data generated during the playback of the main audio data in step S2100 may further include: obtaining expression features fed back during the playback of the main audio data; converting the expression features into corresponding audio data; and using at least the converted audio data as the audio feedback data generated during the playback of the main audio data.
In this example, the expression features can be pre-stored in the terminal device, and the user can give emotional feedback on the main audio data being played by selecting an expression feature that conveys the user's emotion. Expression features may include symbol expressions and sound expressions; sound expressions may in turn include voice expressions and/or sound-effect expressions.
A symbol expression is a symbol, static picture, or animated picture that conveys an emotion or a theme, provided for the user to express his or her feelings during voice interaction. A symbol expression can be converted into corresponding audio data according to the emotion or feeling it conveys.
A sound expression is sound content that conveys a specific emotion or theme, provided for the user to express his or her feelings during voice interaction. For a sound expression, its sound content can be extracted directly as the converted audio data.
The sound content of a voice expression is speech corresponding to the emotion or theme the expression conveys; it is a sound expression with language content. It may be recorded by a specific person, such as a celebrity, a star, or a voice actor, according to a preset theme or content, or recorded by the user according to the user's own expressive needs.
Users usually expect to express their feelings or emotions through the language content heard when a voice expression is played.
The sound content of a sound-effect expression is a sound effect corresponding to the emotional characteristics of the expression; it is a sound expression without language content. Users usually expect to express their feelings or emotions through the sound effect produced when it is played. The sound content of a sound-effect expression can be recorded for various preset themes or expressive needs.
In this embodiment, for any play period of the target media file divided at any time interval, step S2100 may obtain all the audio feedback data accumulated for that play period, or may obtain only the audio feedback data for that play period generated within a set time, which is not limited here.
For example, the target media file is divided at 5-minute intervals: minutes 0 to 5 form the first play period, minutes 5 to 10 form the second play period, and so on. Taking the first play period as an example, step S2100 may obtain all the audio feedback data accumulated for that play period, or only the audio feedback data for that play period generated on the current day, and so on, which is not limited here.
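The period division above can be sketched as a simple offset-to-index mapping (an illustration only; the function name and defaults are assumptions, with 5-minute periods expressed in seconds):

```python
def play_period_index(offset_s: float, period_len_s: float = 300.0) -> int:
    """Map a playback offset (seconds from the start of the target media
    file) to the index of its play period: with 5-minute periods,
    period 0 covers minutes 0-5, period 1 covers minutes 5-10, and so on."""
    if offset_s < 0:
        raise ValueError("offset must be non-negative")
    return int(offset_s // period_len_s)
```

A terminal device could call this with the playback position at which a comment was posted to record the play period that the feedback belongs to.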
As another example, the target media file is a live-broadcast media file. The audio feedback data generated in any play period is then the audio feedback data generated during the broadcast time corresponding to that play period; for a live media file, the merged audio data can therefore reflect the live atmosphere while the broadcast is in progress.
In an example, step S2100 may be carried out with the participation of a server, for example the server 1100 in Fig. 1b. In this example, obtaining the audio feedback data generated during the playback of the main audio data in step S2100 may include: obtaining, from each terminal device, the audio feedback data generated by the corresponding user during the playback of the main audio data.
In an example, step S2100 may be carried out with the participation of a terminal device, for example the terminal device 1200 in Fig. 1b. In this example, obtaining the audio feedback data generated during the playback of the main audio data in step S2100 may include: obtaining, from the server, the audio feedback data generated by other users during the playback of the main audio data.
Step S2200: merge the obtained audio feedback data with the main audio data to generate a merged audio file for playback.
The merging in this embodiment may use any existing mixing technique to mix the audio feedback data with the main audio data, forming an audio file in which the audio feedback data is mixed in.
The merging in this embodiment may also mean establishing a temporal correspondence between the audio feedback data and the main audio data, forming an audio file that embodies this mapping, so that at least the main audio data and the audio feedback data can be played through different channels and the user still perceives a "mixed" result. Here, for the audio feedback part, all the audio feedback data may be mixed down to occupy one channel, or processed into multiple audio files occupying multiple channels; this is not limited, as long as the user perceives the "mixing" effect of the audio feedback data being played together with the main audio data.
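The first kind of merging above, additive mixing, can be illustrated with a minimal sketch that sums samples and clips to the signed 16-bit range (an assumption-laden toy: samples are plain integers rather than a real PCM buffer, and a production mixer would also resample and apply gain):

```python
def mix_into(main, feedback, start):
    """Additively mix a feedback clip into the main track beginning at
    sample index `start`, clipping each sum to the signed 16-bit range.
    Feedback samples that run past the end of the main track are dropped."""
    out = list(main)
    for i, sample in enumerate(feedback):
        j = start + i
        if j >= len(out):
            break
        out[j] = max(-32768, min(32767, out[j] + sample))
    return out
```

The second kind of merging, by contrast, would leave `main` and `feedback` as separate channel streams and only store the `start` offsets, letting the player align them at playback time.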
In this embodiment, for any user who is receiving the target media file, the merging of this step can continue as the target media file plays and audio feedback data keeps being generated, so that playback of the target media file continues from the continuously generated merged audio file until playback ends.
In an example, step S2200 may be carried out by the server or by the terminal device, which is not limited here.
In an example, merging the obtained audio feedback data with the main audio data in step S2200 may include: merging each piece of audio feedback data with the main audio data according to the play period of the main audio data during which that piece of audio feedback data was generated.
In this example, the play periods of the main audio data are divided based on the relative play time of the target media file, where the reference point of the relative play time is the starting play point of the target media file. For example, minutes 0 to 5 form the first play period, minutes 5 to 10 form the second play period, and so on.
In this example, the length of the play period can be set as needed; it may be fixed or adaptively adjusted.
For example, if the set play period length is 5 minutes, the play period of the main audio data corresponding to a piece of audio feedback data may be the period from minute 5 to minute 10 of the target media file. As another example, if the set play period length is 2 minutes, the play period of the main audio data corresponding to a piece of audio feedback data may be the period from minute 1 to minute 2 of the target media file.
In this example, when each terminal device obtains the audio feedback data generated by its corresponding user, it can record the generation time of the audio feedback data and the corresponding play period.
In this example, when the audio data is merged, at least for part of the audio feedback data, the starting position of each piece of audio feedback data can be aligned with the starting position of the corresponding play period of the main audio data.
In this example, when the audio data is merged, at least for part of the audio feedback data, the starting position of each piece of audio feedback data may be allowed to lag behind the corresponding play period of the main audio data. Through the merging of this example, when the user plays the merged audio data through a personal terminal device, the user can perceive how all users' audio feedback changes as the main audio data plays, including changes in the amount of feedback and/or in its content, providing a more realistic live experience.
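The alignment rule described above, a feedback clip starts at the start of its play period and may be shifted later but never earlier, can be sketched as follows (illustrative only; the function name, the 5-minute default period, and the 44.1 kHz sample rate are assumptions):

```python
def feedback_start_sample(period_index, lag_s=0.0,
                          period_len_s=300.0, sample_rate=44100):
    """Start sample for a feedback clip within the merged track: the start
    of the play period it was generated in, optionally shifted later by a
    lag, never earlier."""
    if lag_s < 0:
        raise ValueError("feedback may lag its play period, not precede it")
    return int((period_index * period_len_s + lag_s) * sample_rate)
```

The returned sample index is exactly what a mixer would use as the insertion point when combining that clip with the main audio data.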
In an example, merging the obtained audio feedback data with the main audio data in step S2200 may include the following steps S2211 to S2213.
Step S2211: obtain the number of pieces of audio feedback data generated within a set play period of the main audio data.
The set play period can be configured in advance according to real-time requirements. For example, if the count is taken in 5-minute units, the resulting set periods include: a first play period from minute 0 to minute 5, a play period from minute 5 to minute 10, and so on.
Step S2212: determine the corresponding merging effect according to that number, where the merging effect at least reflects the volume ratio of the pieces of data participating in the merge.
In step S2212, mapping data representing the correspondence between the number and the merging effect can be pre-stored, so that the merging effect corresponding to the number obtained in step S2211 can be looked up in the mapping data.
For example, the merging effects include a living-room scene effect, a theater scene effect, a square scene effect, and so on. The numbers corresponding to these scenes satisfy: living-room scene < theater scene < square scene. The volume ratios reflected by these scene effects satisfy: for the ratio of the volume of the audio feedback data to that of the main audio data, living-room scene < theater scene < square scene.
For the living-room scene effect, the corresponding number is, for example, at most 20 people. In a living-room scene, every user in the scene can clearly hear the audio feedback of the other users, so the volume ratio reflected by the living-room scene effect can be set such that, after merging, the content of each piece of merged audio feedback data can be heard on top of the content of the main audio data.
For the theater scene effect, the corresponding number is, for example, more than 20 and at most 200 people. In a theater scene, the various audio feedback in the scene is only faintly audible, so the volume ratio reflected by the theater scene effect can be set such that, after merging, the content of each piece of merged audio feedback data can be heard only faintly on top of the content of the main audio data.
For the square scene effect, the corresponding number is, for example, more than 200 people. In a square scene, individual audio feedback is unintelligible and only a general hubbub can be heard, so the volume ratio reflected by the square scene effect can be set such that, after merging, only the content of the main audio data can be made out while the content of the individual merged audio feedback data cannot be distinguished; that is, in the square scene, one perceives only the murmur of many people giving audio feedback.
In addition, if no audio feedback data is produced for a given playback period during the playback of the main audio data, the part of the merged audio data corresponding to (or lagging behind) that playback period will contain only the main audio data. When a terminal device plays that part, the user hears only the audio content of the main audio data, without any audio feedback content, and can thus experience the live atmosphere of an audience enjoying that part in complete silence.
Step S2213: according to the merging effect determined in step S2212, merge the audio feedback data produced during the set playback period with the main audio data.
In this example, the merging of step S2213 may be performed, according to the determined merging effect, in the corresponding playback period of the main audio data, or in the playback period following the corresponding playback period, which is not limited here.
Take obtaining the quantity of audio feedback data produced during the 0–5 minute playback period of the main audio data as an example: the quantity obtained in step S2211 is 15, and the merging effect determined from that quantity in step S2212 is the living-room scene effect. In step S2213, the audio feedback data produced during the 0–5 minute period is then merged with the main audio data according to the living-room scene effect. Because the merging process delays the playback time of a piece of audio feedback data relative to the time it was produced, the feedback produced during the 0–5 minute period may be merged with the part of the main audio data corresponding to the 5–10 minute period, or with the part corresponding to the 2–7 minute period, and so on. The specific delay depends on the processing speed and on the set sampling interval for reading the audio feedback data, and is not limited here.
According to steps S2211–S2213 above, the merging in this example makes the merged audio data reflect the influence of the quantity of audio feedback data on the auditory effect, thereby simulating the live effect of a corresponding number of audience members giving audio feedback on the main audio data and enhancing the user's sense of being on site.
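Steps S2211–S2213 can be sketched as follows; the thresholds (20 and 200 people), the gain values, and the function names are illustrative assumptions, not part of the disclosed method:

```python
# Hypothetical sketch of steps S2211-S2213; scene thresholds and volume
# ratios are illustrative assumptions.

SCENES = [  # (max feedback count, scene name, feedback-to-main volume ratio)
    (20, "living_room", 1.0),        # feedback clearly audible
    (200, "theater", 0.3),           # feedback only faintly audible
    (float("inf"), "square", 0.05),  # only an indistinct murmur
]

def pick_scene(feedback_count):
    """S2212: look up the merging effect corresponding to the quantity."""
    for max_count, name, ratio in SCENES:
        if feedback_count <= max_count:
            return name, ratio

def merge(main_samples, feedback_tracks):
    """S2213: mix each feedback track into the main audio at the scene ratio."""
    scene, ratio = pick_scene(len(feedback_tracks))  # S2211: quantity = len()
    mixed = list(main_samples)
    for track in feedback_tracks:
        for i, sample in enumerate(track):
            if i < len(mixed):
                mixed[i] += ratio * sample
    return scene, mixed
```

With 15 feedback tracks, `pick_scene` selects the living-room scene and the feedback is mixed at full relative volume, matching the example above.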
In an example, merging the audio feedback data with the main audio data in step S2200 may include the following steps S2221 to S2222.
Step S2221: according to the playback period of the main audio data to which each piece of audio feedback data corresponds at the time it was produced, detect the idle gaps of the main audio data adjacent to each piece of audio feedback data.
An idle gap is a time gap in the main audio data that contains no audio content.
Taking FIG. 4 as an example, in the data stream of the main audio data in FIG. 4, the hatched parts indicate audio content and the blank parts indicate the idle gaps present in the main audio data; audio feedback data can be inserted into these idle gaps.
Through step S2221, each idle gap of the main audio data can serve as a merge slot, so that a merge operation is performed in each merge slot.
Step S2222: align each piece of audio feedback data with an adjacent idle gap, and merge the audio feedback data with the main audio data.
The alignment in step S2222 may align the start position of each piece of audio feedback data with any position within the adjacent idle gap for merging; for example, the start position of each piece of audio feedback data may be aligned with the start position of the adjacent idle gap, which is not limited here.
According to steps S2221–S2222 above, merging the data by aligning each piece of audio feedback data with an adjacent idle gap of the main audio data reduces, as far as possible, the influence of the audio feedback data on the main audio data.
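A minimal sketch of steps S2221–S2222, assuming the main audio is a list of float samples; the silence threshold and minimum gap length are illustrative assumptions:

```python
# Illustrative sketch of steps S2221-S2222: find runs of near-silent
# samples in the main audio, then align a feedback clip with the start
# of the gap nearest to where the feedback was produced.

def find_idle_gaps(samples, threshold=0.01, min_len=3):
    """S2221: return (start, length) of runs with no audio content."""
    gaps, start = [], None
    for i, s in enumerate(samples + [1.0]):  # sentinel closes a trailing gap
        if abs(s) <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                gaps.append((start, i - start))
            start = None
    return gaps

def insert_at_gap(main, feedback, produced_at):
    """S2222: mix the feedback in, aligned with the nearest gap's start."""
    gaps = find_idle_gaps(main)
    if not gaps:
        return main
    start, _ = min(gaps, key=lambda g: abs(g[0] - produced_at))
    out = list(main)
    for i, s in enumerate(feedback):
        if start + i < len(out):
            out[start + i] += s
    return out
```

Here each detected gap plays the role of a merge slot, and alignment is with the gap's start position, one of the options described above.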
In an example, merging the audio feedback data with the main audio data in step S2200 may include the following steps S2231 to S2232.
Step S2231: arrange for each piece of data, including the main audio data and the audio feedback data, to occupy a different audio track.
For example, if 10 pieces of audio feedback data are produced during the set playback period of the main audio data, merging these 10 pieces with the main audio data amounts to merging 11 pieces of audio data. In this case, 11 audio tracks can be set up so that each piece of data occupies its own track for merging.
Step S2232: merge the audio feedback data with the main audio data through audio track synthesis.
According to steps S2231–S2232 above, the audio processing of this example can use audio-track synthesis technology to merge the audio data, which helps reduce the difficulty of audio merging while achieving a good merging result.
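Steps S2231–S2232 can be sketched as follows, assuming each track is a list of float samples; a real audio-track synthesis engine would additionally handle resampling, channel layout, and clipping:

```python
# Minimal sketch of steps S2231-S2232: one track per piece of data,
# synthesized by summing the tracks sample by sample.
from itertools import zip_longest

def synthesize(main_track, feedback_tracks):
    """S2231 places each piece of data on its own track; S2232 mixes them."""
    tracks = [main_track] + feedback_tracks
    return [sum(samples) for samples in zip_longest(*tracks, fillvalue=0.0)]
```

`zip_longest` pads shorter feedback tracks with silence so the output keeps the length of the longest track.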
The various merging examples enumerated above, which are non-limiting, may be used individually or in any combination as needed.
In this embodiment, generating the merged audio data for playback in step S2200 may be: for a terminal device currently playing the target media file, after each update of the audio file by merging, playback of the updated audio file continues from the current playback position.
Generating the merged audio data for playback in step S2200 may be implemented with the participation of the terminal device or of the server.
In an example implemented with the participation of the server, generating the merged audio data for playback in this step may include: delivering the merged audio data to the terminal device for playback.
In an example implemented with the participation of the terminal device, generating the merged audio data for playback in this step may include: generating the merged audio data to drive an audio output apparatus to play it.
According to steps S2100–S2200 above, the audio data processing method of this embodiment merges the main audio data of the target media file that the user has chosen to play with the audio feedback data produced during the playback of the main audio data, so as to obtain merged audio data for playback. In this way, any user playing the target media file on their own terminal device can obtain the auditory effect of enjoying the target media file together with other people, and thus obtain a live experience.
In one embodiment, the user may be allowed to choose whether to enable the live sound effect. Therefore, in this embodiment, the processing method may further include a step of detecting whether the live sound effect is enabled, so that the operation in step S2200 of merging the acquired audio feedback data with the main audio data is performed in response to an instruction to enable the live-sound-effect function.
The above additional step in this embodiment may be implemented with the participation of the terminal device; that is, the terminal device merges the acquired audio feedback data with the main audio data in response to a user-input instruction to enable the live-sound-effect function. The instruction may be triggered by the user through a physical button of the terminal device, or through a virtual button (control) provided by the application playing the target media file; for example, the instruction may be triggered through a virtual button for enabling the live sound effect as shown in FIG. 5.
The above additional step in this embodiment may also be implemented with the participation of the server; that is, in response to an instruction sent by the terminal device to enable the live-sound-effect function, the server provides the merged audio data to the terminal device for playback, or provides the audio feedback data to the terminal device to be merged with the main audio data to form merged audio data for playback. The instruction sent by the terminal device may be generated based on an instruction triggered by the user.
This embodiment allows the user to choose whether the merged audio data should be played; a user who does not wish to hear the audio feedback data may instead choose to play only the main audio data of the target media file, enabling diversified choices.
In one embodiment, the audio feedback data participating in the merge may be the same for every user receiving the target media file. In step S2100, all audio feedback data produced during the playback of the main audio data may be acquired for merging, or a filtered subset of the audio feedback data may be acquired for merging according to set filtering conditions, which is not limited here.
In another embodiment, the audio feedback data participating in the merge may differ for different types of users; that is, according to user preferences, different audio feedback data may be filtered and merged for different types of users, producing a live effect personalized to each user.
In this other embodiment, acquiring the audio feedback sound data produced during the playback of the main audio data in step S2100 above may include: acquiring the audio feedback data, produced during the playback of the main audio data, that matches a target classification.
In this embodiment, the target classifications may be set in advance, for example based on at least one of the user's age, gender, education, preferences, and so on; for instance, five target classifications may be set according to user age.
For example, for a target classification of users aged 20 or younger (inclusive), the audio feedback data produced by such users may be acquired in step S2100 to form the audio feedback data matching that target classification.
In this other embodiment, generating the merged audio data for playback in step S2200 above includes: generating the merged audio data for playback by terminal devices matching the target classification.
Where the target classification is based on user attributes, a terminal device matching the target classification means that the user corresponding to the terminal device, i.e. the user using the terminal device, matches the target classification.
In one example, this other embodiment may be implemented with the participation of the server. In this example, step S2110 may include: for each set target classification, the server acquires the audio feedback data, produced during the playback of the target media file, that matches the corresponding target classification.
Further, the server may deliver the acquired audio feedback data matching a target classification to the terminal devices matching that target classification for merging with the main audio data.
Further, after acquiring the audio feedback data matching a target classification, the server may also merge this audio feedback data with the main audio data and deliver the merged audio data to the terminal devices matching that target classification for playback.
In one example, this other embodiment may also be implemented with the participation of the terminal device. In this example, step S2110 may include: the terminal device acquires from the server the audio feedback data matching the target classification to which its corresponding user belongs, for merging with the main audio data.
In this example, the target classification to which the corresponding user belongs, i.e. the target classification matching the terminal device, may be selected and determined by the user from the provided target classifications, or may be determined according to the user characteristics of the corresponding user.
According to the processing method of this embodiment, different live effects can be provided for different types of users, thereby improving the fit between the provided live effect and the user and enhancing the user experience.
In one embodiment, the processing method of the present application may further include: acquiring feature values of set user features corresponding to a terminal device playing the main audio data; and determining, according to the feature values, the target classification to which the terminal device belongs.
In this embodiment, the set user features corresponding to a terminal device refer to the set user features of the user corresponding to the terminal device, i.e. the set user features of the user using the terminal device.
In one example, the set user features include any one or more of age, education, gender, hobbies, and preferred language style. The feature values of these set user features may be determined from the user's registration information, from the historical usage data produced by the user using the present application (the application providing the target media file), or from the historical usage data produced by the user using other applications, which is not limited here.
In one example, the set user features may include set features of the audio feedback data produced by the user during the playback of the main audio data, for example any one or both of sound features and emotion features. In this example, the corresponding user may be assigned, according to these set features, to a target classification with a similar language style, or to a target classification with an opposite language style, which is not limited here.
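Determining the target classification from a feature value can be sketched as follows, using age and five classifications as in the example above; the brackets and labels are purely illustrative assumptions:

```python
# Hypothetical sketch: map a user's feature value (here, age) to one of
# five preset target classifications. Brackets and labels are invented.

AGE_BRACKETS = [(20, "age<=20"), (30, "21-30"), (45, "31-45"),
                (60, "46-60"), (float("inf"), "60+")]

def classify(features):
    """Return the target classification for the given feature values."""
    age = features["age"]
    for upper, label in AGE_BRACKETS:
        if age <= upper:
            return label
```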
Sound features are features related to sound attributes embodied in the audio feedback data, and may include volume features, tempo features, pitch features, and so on.
Emotion features are features related to the user's emotions or feelings embodied in the audio feedback data, and may include emotion type, emotion degree, expression subject, and so on. The emotion types may be preset according to classifications of human emotions and moods; for example, the emotion types may include anger, happiness, sadness, joy, and so on, and the emotion degree may be the degree of the corresponding emotion type. For example, the emotion type of anger may include different degrees of anger such as fury, anger, and mild annoyance.
When extracting the sound features of the audio feedback data, speech analysis may be performed on the audio feedback data to extract the corresponding volume features, tempo features, and so on. For example, common speech-signal analysis means may be used to determine the volume level, tempo, and so on of the audio feedback data, thereby obtaining its volume features, tempo features, and so on.
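As a minimal sketch of such speech-signal analysis, assuming the feedback is a list of float samples, root-mean-square energy can serve as a volume feature and the zero-crossing rate as a crude activity cue; both are common illustrative choices, not the specific means prescribed here:

```python
# Illustrative sound-feature extraction: RMS energy as the volume
# feature, zero-crossing rate as a rough activity/tempo indicator.
import math

def sound_features(samples):
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))  # volume
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0) != (b < 0))  # sign changes across the clip
    return {"volume": rms, "activity": crossings / len(samples)}
```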
When extracting the emotion features of the audio feedback data, the content of the audio feedback data may be converted into corresponding text, emotion keywords may be extracted from the text according to a pre-built emotion lexicon, and the emotion keywords may be structurally analyzed through an emotion structuring model to obtain their emotion type and emotion degree as the emotion features of the audio feedback data.
For example, the audio feedback data may be passed through a speech recognition engine, or a speech-to-text tool or plug-in, to obtain the corresponding text.
The emotion lexicon includes multiple emotion words that each reflect a different human emotion or mood. For example, these emotion words may be mined manually or by machine to build the emotion lexicon in advance.
According to the emotion lexicon, the words obtained by segmenting the text of the audio feedback data may be compared for similarity with the emotion words included in the lexicon, for example by cosine similarity, and the emotion words whose similarity exceeds a preset similarity threshold are extracted as emotion keywords.
The emotion structuring model may be a lexical model obtained by classifying and structurally organizing the collected emotion-related words. Each emotion word included in the emotion structuring model has a corresponding emotion type and emotion degree.
In one example, the emotion words previously obtained by manual or machine mining may be classified at different levels according to human emotions or moods. For example, each emotion type forms a major category, each major category includes the emotion words belonging to that emotion type, and within each major category the words are further subdivided into sub-categories by emotion degree; within each sub-category, the emotion words may be sorted by emotion degree. This forms a hierarchy of classification levels by which the emotion words are organized, yielding the emotion structuring model.
By structurally analyzing an emotion keyword through the emotion structuring model, the emotion word corresponding to the emotion keyword can be found in the model, and the emotion type and emotion degree of the keyword are determined according to the emotion type and emotion degree of that word, thereby obtaining the emotion features of the audio feedback data.
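The keyword lookup against the emotion structuring model can be sketched as follows; the lexicon entries and degrees are invented for illustration, and exact word matching stands in for the cosine-similarity comparison described above:

```python
# Hypothetical sketch of the emotion-feature pipeline: keywords are
# matched against a structured lexicon mapping each emotion word to its
# (emotion type, emotion degree). All entries below are illustrative.

EMOTION_MODEL = {
    "furious": ("anger", 3), "angry": ("anger", 2), "annoyed": ("anger", 1),
    "overjoyed": ("happiness", 3), "happy": ("happiness", 2),
}

def emotion_features(text):
    """Extract emotion keywords from text and look up type and degree."""
    keywords = [w for w in text.lower().split() if w in EMOTION_MODEL]
    return [(w,) + EMOTION_MODEL[w] for w in keywords]
```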
Where the audio feedback data takes the form of an expression feature, the required feature value of the set feature may be determined directly from that expression feature; for example, for an angry expression represented by an expression feature, the feature value of the corresponding emotion feature can be determined directly from it.
This step of this embodiment may be implemented by the server according to the feature values of the set user features of the corresponding user provided by the terminal device, or may be implemented by the terminal device. In the example implemented by the terminal device, each terminal device determines the target classification to which its corresponding user belongs.
According to the processing method of this embodiment, determining the target classification to which a user or terminal device belongs from the feature values of user features can improve the accuracy of determining the target classification, and frees the user from setting the desired target classification through additional operations, thereby achieving intelligent classification.
In one embodiment, the main audio data is the audio data of a video file. In this case, the processing method of this embodiment may further include: displaying, in the video playback window of the video file, audio waveforms representing the audio feedback data in the form of a bullet screen.
An audio waveform representing audio feedback data is a graphical expression of that audio feedback data, for example the audio waveforms displayed in the playback window shown in FIG. 5.
In one example, the sound features and emotion features of the audio feedback data may first be acquired, and the audio waveform may then be generated according to those sound features and emotion features.
In one example, the display shape of the audio waveform may be set according to the sound features of the audio feedback data.
In this example, the display shape may include the amplitude of the audio waveform, the waveform period interval, the waveform duration, and so on. For example, where the sound features of the audio feedback data include a tempo feature and a volume feature, the waveform period interval may be set according to the tempo reflected by the tempo feature (e.g. the faster the tempo, the shorter the period interval), and the waveform amplitude may be set according to the volume reflected by the volume feature (e.g. the louder the volume, the larger the amplitude).
In one example, the display color of the audio waveform may be set according to the emotion features of the audio feedback data.
In this example, different display colors may be set for different emotion types: for instance, the emotion type "anger" is displayed in red and the emotion type "happiness" in green. For different degrees of the same emotion type, different shades of the same color may be set: for example, for the emotion type "happiness", the degree "overjoyed" is displayed in dark green while the degree "slightly happy" is displayed in light green, and so on.
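Mapping the sound and emotion features to the waveform's display attributes can be sketched as follows; the color table and scaling constants are illustrative assumptions:

```python
# Hypothetical sketch: derive waveform display attributes from the
# feedback's sound and emotion features. Color names and constants are
# invented for illustration.

COLOR = {("happiness", "high"): "dark green",
         ("happiness", "low"): "light green",
         ("anger", "high"): "dark red",
         ("anger", "low"): "light red"}

def waveform_style(volume, tempo, emotion_type, emotion_degree):
    return {
        "amplitude": volume * 10.0,  # louder -> larger waveform amplitude
        "period": 1.0 / tempo,       # faster tempo -> shorter period interval
        "color": COLOR.get((emotion_type, emotion_degree), "gray"),
    }
```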
According to the processing method of this embodiment, displaying the audio waveforms in the video playback window in the form of a bullet screen lets the user, while obtaining the live auditory effect, also intuitively perceive the sound features and emotion features of other users through the graphical expression of the audio feedback data.
<Example 1>
FIG. 6a is an exemplary flowchart of a method for processing audio data according to one example of the present application. In this example, the audio feedback data provided by the server to each terminal device may be the same; therefore, only one terminal device is shown in the figure. In this example, the processing method may include the following steps.
Step S1210: the terminal device 1200 collects the audio feedback data produced by its corresponding user during the playback of the target media file, i.e. during the playback of the main audio data, and uploads it to the server 1100.
In other examples, the terminal device 1200 shown in the figure may produce no audio feedback data; instead, other terminal devices 1200 collect the audio feedback data produced by their corresponding users during the playback of the target media file and upload it to the server 1100.
Step S1110: the server 1100 acquires the audio feedback data uploaded by the terminal devices, including the terminal device shown in the figure.
Step S1120: the server 1100 delivers the acquired audio feedback data to each terminal device 1200 currently playing the target media file for merging of the audio data.
Step S1220: the terminal device 1200 acquires the audio feedback data provided by the server 1100.
Step S1230: the terminal device 1200 merges the acquired audio feedback data with the main audio data of the target media file to produce the merged target media file.
The terminal device 1200 merges the main audio data and the acquired audio feedback data, for example by means of audio mixing.
Step S1240: when playing the target media file, the terminal device 1200 plays the merged audio data instead of the main audio data alone; that is, while listening to the main audio data, the user corresponding to the terminal device 1200 can at least also hear the audio feedback data produced by other users during the playback of the main audio data.
<例子2><Example 2>
Fig. 6b is an exemplary flowchart of an audio data processing method according to another example of the present application. In this example, the audio feedback data that the server provides to each terminal device may differ. The figure shows two terminal devices belonging to different target categories, terminal device 1200-1 and terminal device 1200-2. In this example, the processing method may include the following steps:
Step S1210-1: the terminal device 1200-1 collects the audio feedback data produced by its user during playback of the target media file and uploads it to the server 1100.
Step S1210-2: the terminal device 1200-2 collects the audio feedback data produced by its user during playback of the target media file and uploads it to the server 1100.
In other examples, the terminal device 1200-1 and/or the terminal device 1200-2 shown in the figure may produce no audio feedback data; instead, other terminal devices 1200 collect the audio feedback data produced by their users during playback of the target media file and upload it to the server 1100.
Step S1110: the server 1100 obtains the audio feedback data uploaded by the terminal devices, including the terminal device 1200-1 and the terminal device 1200-2.
Step S1120-1: the server 1100 delivers the user sound data that was produced during playback of the target media file and that matches the target category of the terminal device 1200-1 to the terminal device 1200-1 for audio data merging.
Step S1120-2: the server 1100 delivers the user sound data that was produced during playback of the target media file and that matches the target category of the terminal device 1200-2 to the terminal device 1200-2 for audio data merging.
Step S1220-1: the terminal device 1200-1 obtains the audio feedback data provided by the server 1100.
Step S1230-1: the terminal device 1200-1 merges the acquired audio feedback data with the main audio data of the target media file to produce combined audio data A.
Step S1240-1: while playing the target media file, the terminal device 1200-1 plays the combined audio data A. The resulting listening effect is that the user of the terminal device 1200-1, while listening to the main audio data, also hears the audio feedback data that matches the target category of the terminal device 1200-1.
Step S1220-2: the terminal device 1200-2 obtains the audio feedback data provided by the server 1100.
Step S1230-2: the terminal device 1200-2 merges the acquired audio feedback data with the main audio data of the target media file to produce combined audio data B.
Step S1240-2: while playing the target media file, the terminal device 1200-2 plays the combined audio data B. The resulting listening effect is that the user of the terminal device 1200-2, while listening to the main audio data, also hears the audio feedback data that matches the target category of the terminal device 1200-2.
In this example, because the terminal device 1200-1 and the terminal device 1200-2 belong to different target categories, the combined audio data A differs from the combined audio data B, giving every user a personalized live effect.
<Method Embodiment 2>
Fig. 7 is a schematic flowchart of the audio data processing method according to this embodiment. The processing method is implemented by a terminal device, for example the terminal device 1200 in Fig. 1. The terminal device in this embodiment may or may not have a display apparatus, and it may have its own audio output apparatus or connect to an external audio output apparatus wirelessly or by wire.
As shown in Fig. 7, the method of this embodiment may include the following steps S7100 to S7300:
Step S7100: the terminal device 1200 obtains the main audio data selected for playback.
The main audio data selected for playback is the audio data of the target media file selected by the user of the terminal device 1200; the target media file may be a pure audio file or a video file.
Step S7200: the terminal device 1200 obtains the live audio data corresponding to the main audio data, where the live audio data includes at least other users' audio feedback data for the main audio data.
The live audio data may also include the audio feedback data that the user of the terminal device 1200 produced for the main audio data. That is, for any terminal device 1200, not only the audio feedback data of other users but also the audio feedback data produced by the user of that terminal device 1200 may participate in the merging of audio data.
In step S7200, the acquired live audio data may be live audio data matching the target category of the terminal device 1200, or live audio data that is the same for every terminal device 1200; this is not limited here.
In one example, the terminal device 1200 may obtain all audio feedback data from the server, including audio feedback data produced by other users and, possibly, the audio feedback data produced by the user of the terminal device 1200 itself.
In another example, the terminal device 1200 may obtain only the audio feedback data produced by other users from the server, and obtain the audio feedback data produced by its own user locally.
In one example, after obtaining the live audio data, the terminal device 1200 merges the live audio data with the main audio data to obtain the combined audio data.
In another example, the merging may be performed by the server, which then provides the result to the terminal device. In that case, steps S7100 and S7200 amount to obtaining the combined audio data, which includes the main audio data and the live audio data.
Step S7300: the terminal device 1200 performs a processing operation of playing the corresponding live audio data while playing the main audio data.
In the example in which the terminal device 1200 merges the live audio data with the main audio data, this processing operation includes the merging itself, as well as driving the audio output apparatus according to the combined audio data to play the corresponding live audio data while the main audio data plays. The merging may use any one or more of the approaches provided in method embodiment 1 above, which are not repeated here.
In the example in which the terminal device 1200 directly receives the combined audio data provided by the server 1100, the processing operation includes: driving the audio output apparatus according to the combined audio data to play the corresponding live audio data while the main audio data plays.
In this step, the terminal device 1200 may drive the audio output apparatus according to the combined audio data, for example according to the mixed audio data or according to the correspondence between the main audio data and the live audio data, so that the corresponding live audio data plays while the main audio data plays, achieving the live effect of enjoying the target media file together with other people.
In this embodiment, the terminal device may be a smartphone, a portable computer, a desktop computer, a tablet computer, a wearable device, a smart speaker, a set-top box, a smart TV, a voice recorder, a camcorder, and so on; this is not limited here.
According to the processing method of this embodiment, while playing the target media file selected by the user, the terminal device can play the acquired live audio data along with the main audio data of the target media file, so that the user hears the main audio data and the live audio data mixed together. Therefore, when any user plays the target media file through his or her own terminal device, the user obtains the listening effect of enjoying the target media file together with other people, and thereby obtains a live experience.
<Method Embodiment 3>
Fig. 8 is a schematic flowchart of the audio data processing method according to this embodiment. The processing method is implemented by a terminal device, for example the terminal device 1200 in Fig. 1. The terminal device in this embodiment may or may not have a display apparatus, and it may have its own audio output apparatus or connect to an external audio output apparatus wirelessly or by wire.
As shown in Fig. 8, the processing method of this embodiment may include the following steps S8100 to S8300:
Step S8100: in response to an operation of playing a target media file, the terminal device 1200 plays the target media file, where the target media file includes main audio data.
Step S8200: obtain the live audio data corresponding to the main audio data, where the live audio data includes at least other users' audio feedback data for the main audio data.
In step S8200, the live audio data may also include audio feedback data for the main audio data from the user of the terminal device, that is, the local user. The local user's audio feedback data may be obtained from the server together with the audio feedback data of other users, or obtained locally; this is not limited here.
In one example, obtaining the live audio data corresponding to the main audio data in step S8200 may include: obtaining other users' audio feedback data for the main audio data from the server to form the live audio data.
In one example, obtaining the live audio data corresponding to the main audio data in step S8200 may further include: obtaining, from the server or locally, the audio feedback data of the terminal device's user for the main audio data to form the live audio data.
Step S8300: while playing the target media file, the terminal device 1200 performs a processing operation of playing the live audio data along with the main audio data of the target media file. In one example, this processing operation may include: the terminal device 1200 merges the live audio data with the main audio data, and drives the audio output apparatus according to the combined audio data to play the live audio data while the main audio data plays. The merging may use any one or more of the approaches provided in method embodiment 1 above, which are not repeated here.
In another example, the processing operation may include: the terminal device 1200 obtains the combined audio data provided by the server 1100, where the combined audio data is obtained by merging the main audio data with the live audio data, and drives the audio output apparatus according to the combined audio data to play the live audio data while the main audio data plays.
In step S8300, the terminal device 1200 drives the audio output apparatus according to the form of the merge, for example a mixed-down form or a multi-channel form, to play the live audio data while the main audio data plays, so that during playback of the target media file the corresponding live audio data accompanies the main audio data, achieving the live effect of enjoying the target media file together with other people.
In one embodiment, the processing method may further include: obtaining the audio feedback data that the terminal device's user feeds back for the main audio data, and uploading that user's audio feedback data to the server.
According to this embodiment, after the user's audio feedback data is uploaded to the server, the server can send it to the terminal devices of other users, so that other users who are also playing the target media file can receive this user's audio feedback data.
<Apparatus Embodiment>
Fig. 9 is a schematic block diagram of an audio data processing apparatus according to an embodiment of the present application.
As shown in Fig. 9, the processing apparatus 9000 of this embodiment includes a data acquisition module 9100 and an audio processing module 9200.
The data acquisition module 9100 is configured to obtain the audio feedback data produced during playback of the main audio data.
The audio processing module 9200 is configured to merge the audio feedback data with the main audio data and generate the combined audio data for playback.
In one embodiment, when merging the audio feedback data with the main audio data, the audio processing module 9200 may be configured to: obtain the quantity of audio feedback data produced within a set playback period of the main audio data; determine a corresponding merging effect according to that quantity, where the merging effect reflects at least the volume ratio of the data participating in the merge; and merge the audio feedback data produced within the set playback period with the main audio data according to the merging effect.
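One way such a count-dependent merging effect could be realized is sketched below. The concrete thresholds and gain values are invented for illustration and are not specified by the application; the idea is only that the more feedback clips a period contains, the more each individual clip is attenuated so the main audio stays intelligible.

```python
def merging_effect(feedback_count):
    """Pick a volume ratio (main_gain, feedback_gain) from the number of
    feedback clips produced in the set playback period.

    Thresholds and gains below are illustrative assumptions only.
    """
    if feedback_count == 0:
        return 1.0, 0.0                   # nothing to mix in
    if feedback_count <= 5:
        return 1.0, 0.5                   # few clips: each can be fairly loud
    if feedback_count <= 50:
        return 1.0, 0.2                   # a crowd: attenuate each clip
    return 1.0, 1.0 / feedback_count      # huge crowd: keep the total level bounded

print(merging_effect(3))    # (1.0, 0.5)
print(merging_effect(200))  # (1.0, 0.005)
```

The returned pair can then be applied as per-source gains during the actual sample mixing.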
In one embodiment, when merging the audio feedback data with the main audio data, the audio processing module 9200 may be configured to: detect, according to the playback period of the main audio data during which each piece of audio feedback data was produced, the idle gaps of the main audio data adjacent to that piece of audio feedback data; and align each piece of audio feedback data with its adjacent idle gap for merging.
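The gap-alignment idea can be sketched as follows, under illustrative assumptions (float samples, a fixed silence threshold, and a minimum gap length): scan the main audio for runs of near-silent samples, then shift each feedback clip's start position to the nearest such gap.

```python
def find_gaps(samples, threshold=0.05, min_len=3):
    """Return (start, length) for each run of near-silent samples."""
    gaps, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start, i - start))
            start = None
    if start is not None and len(samples) - start >= min_len:
        gaps.append((start, len(samples) - start))
    return gaps

def align_to_gap(feedback_pos, gaps):
    """Shift a feedback clip's start position to the nearest idle gap."""
    if not gaps:
        return feedback_pos  # no gap available: keep the original position
    return min((g[0] for g in gaps), key=lambda s: abs(s - feedback_pos))

main = [0.9, 0.8, 0.0, 0.01, 0.02, 0.7, 0.9, 0.0, 0.0, 0.01, 0.0]
gaps = find_gaps(main)
print(gaps)                   # [(2, 3), (7, 4)]
print(align_to_gap(6, gaps))  # 7
```

In practice the gap search would run on energy per frame rather than raw samples, but the alignment step is the same.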
In one embodiment, when merging the audio feedback data with the main audio data, the audio processing module 9200 may be configured to: assign each piece of data, including the main audio data and each piece of audio feedback data, its own audio track; and merge the audio feedback data with the main audio data through audio track synthesis.
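A minimal sketch of this track-based variant is shown below; the plain dict used as a track container is an assumption for illustration, where a real implementation would use an audio engine's track objects.

```python
def build_tracks(main, feedback_clips):
    """Give the main audio and each feedback clip its own named track."""
    tracks = {"main": main}
    for n, clip in enumerate(feedback_clips):
        tracks[f"feedback-{n}"] = clip
    return tracks

def synthesize(tracks):
    """Down-mix all tracks by summing time-aligned samples."""
    length = max(len(t) for t in tracks.values())
    out = [0.0] * length
    for t in tracks.values():
        for i, s in enumerate(t):
            out[i] += s
    # clamp to the valid sample range
    return [max(-1.0, min(1.0, s)) for s in out]

tracks = build_tracks([0.25, 0.25, 0.25], [[0.125], [0.0, 0.25]])
print(sorted(tracks))      # ['feedback-0', 'feedback-1', 'main']
print(synthesize(tracks))  # [0.375, 0.5, 0.25]
```

Keeping each source on its own track until the final synthesis makes per-source gains (such as the volume ratio above) easy to apply independently.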
In one embodiment, the processing apparatus 9000 may further include a detection module configured to detect whether the live sound effect function is enabled, and, in response to an instruction to enable the live sound effect function, notify the audio processing module 9200 to perform the operation of merging the audio feedback data with the main audio data.
In one embodiment, when obtaining the audio feedback data produced during playback of the main audio data, the data acquisition module 9100 may be configured to: obtain the voice comments fed back during playback of the main audio data, and use at least the voice comments as the audio feedback data.
In one embodiment, when obtaining the audio feedback data produced during playback of the main audio data, the data acquisition module 9100 may be configured to: obtain the text comments fed back during playback of the main audio data; convert the text comments into corresponding audio data; and use at least the converted audio data as the audio feedback data.
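The text-comment path can be sketched as follows. The `text_to_speech` function here is a hypothetical stand-in that only fabricates silent samples of a plausible length; a real system would call an actual speech synthesis engine at this point.

```python
def text_to_speech(text):
    """Hypothetical TTS stand-in: returns placeholder PCM samples whose
    length grows with the text length (a real engine returns real speech)."""
    return [0.0] * (len(text) * 160)  # assumed ~160 samples per character

def comments_to_feedback(comments):
    """Convert text comments fed back during playback into audio clips
    that can join the merge like any other audio feedback data."""
    return [text_to_speech(c) for c in comments if c.strip()]

clips = comments_to_feedback(["great song!", "  ", "encore"])
print(len(clips))     # 2 (the blank comment is dropped)
print(len(clips[0]))  # 1760
```

Once converted, these clips are indistinguishable from recorded voice comments as far as the merging step is concerned.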
In one embodiment, when obtaining the audio feedback data produced during playback of the main audio data, the data acquisition module 9100 may be configured to: obtain the expression features fed back during playback of the main audio data; convert the expression features into corresponding audio data; and use at least the converted audio data as the audio feedback data.
In one embodiment, when obtaining the audio feedback data produced during playback of the main audio data, the data acquisition module 9100 may be configured to obtain the audio feedback data produced during playback of the main audio data that matches a target category, so that the audio processing module 9200 generates combined audio data for playback by terminal devices matching that target category.
In one embodiment, the processing apparatus 9000 may further include a classification module configured to: obtain the value of a set user characteristic corresponding to a terminal device that plays the main audio data; and determine, according to that value, the target category to which the terminal device belongs.
In one embodiment, the set user characteristic may include a set characteristic of the audio feedback data produced by the terminal device's user during playback of the main audio data.
In one embodiment, the main audio data is the audio data of a video file, and the processing apparatus 9000 may further include a display processing module configured to display, in a display window, audio waveforms representing the audio feedback data in the form of bullet comments.
<Device Embodiment>
This embodiment provides an electronic device. As shown in Fig. 10a, the electronic device 100 includes the processing apparatus 9000 according to any embodiment of the present application.
In another embodiment, as shown in Fig. 10b, the electronic device 100 may include a memory 110 and a processor 120. The memory 110 is configured to store executable instructions; the processor 120 is configured to execute, under the control of the executable instructions, the processing method of any method embodiment of the present application.
In this embodiment, the electronic device 100 may be a server, for example the server 1100 in Fig. 1, or any terminal device, for example the terminal device 1200 in Fig. 1, or may include both a server and a terminal device, for example the server 1100 and the terminal device 1200 in Fig. 1; this is not limited here.
In one embodiment, the electronic device 100 is a terminal device, which may be a device with a display apparatus or a device without one, for example a set-top box or a smart speaker.
In one embodiment, the electronic device 100 is a terminal device that may further include an input apparatus through which the corresponding user publishes feedback content for the main audio data. The input apparatus sends the feedback content to the processing apparatus 9000 or the processor 120, which generates the user's audio feedback data for the main audio data from that feedback content.
The input apparatus may include at least one of an audio input apparatus, a physical keyboard, a virtual keyboard, and a touch screen.
Further, the processing apparatus or processor of the terminal device may also control a communication apparatus to send the corresponding user's audio feedback data to the server, so that the server can send it to the terminal devices of other users. In this way, other users receive this user's audio feedback data while playing the same target media file.
In one embodiment, the electronic device 100 is a terminal device that may further include an audio output apparatus, which, under the control of the processing apparatus or processor, plays the corresponding audio feedback data while the main audio data plays. Of course, in other embodiments, the terminal device may instead connect to an audio output apparatus by wire or wirelessly to play the combined audio data.
<Medium Embodiment>
This embodiment further provides a computer-readable storage medium storing a computer program that can be read and run by a computer, where the computer program, when read and run by the computer, executes the audio data processing method of any of the above embodiments of the present application.
The present application may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present application.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, and a mechanical encoding device, for example a punch card or a raised structure in a groove with instructions stored thereon, as well as any suitable combination of the foregoing. The computer-readable storage medium used here is not to be construed as a transient signal itself, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described here may be downloaded from the computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device over a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in the computer-readable storage medium in that computing/processing device.
The computer program instructions used to perform the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, for example a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized using the state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions to implement various aspects of the present application.
这里参照根据本申请实施例的方法、装置(系统)和计算机程序产品的流程图和/或 框图描述了本申请的各个方面。应当理解,流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合,都可以由计算机可读程序指令实现。Here, various aspects of the present application are described with reference to the flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams and combinations of blocks in the flowcharts and/or block diagrams can be implemented by computer-readable program instructions.
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine such that when these instructions are executed by the processor of the computer or other programmable data processing device , A device that implements the functions/actions specified in one or more blocks in the flowchart and/or block diagram is produced. It is also possible to store these computer-readable program instructions in a computer-readable storage medium. These instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing instructions includes An article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowchart and/or block diagram.
也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。It is also possible to load computer-readable program instructions onto a computer, other programmable data processing device, or other equipment, so that a series of operation steps are executed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process , So that the instructions executed on the computer, other programmable data processing apparatus, or other equipment realize the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
附图中的流程图和框图显示了根据本申请的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或动作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。对于本领域技术人员来说公知的是,通过硬件方式实现、通过软件方式实现以及通过软件和硬件结合的方式实现都是等价的。The flowcharts and block diagrams in the drawings show the possible implementation of the system architecture, functions, and operations of the system, method, and computer program product according to multiple embodiments of the present application. In this regard, each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more functions for implementing the specified logical function. Executable instructions. In some alternative implementations, the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two consecutive blocks can actually be executed in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram and/or flowchart, and the combination of the blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions Or it can be realized by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present application is defined by the appended claims.

Claims (24)

  1. A method for processing audio data, comprising:
    acquiring audio feedback data generated during playback of main audio data;
    merging the audio feedback data with the main audio data to generate merged audio data for playback.
  2. The processing method according to claim 1, wherein merging the audio feedback data with the main audio data comprises:
    acquiring the quantity of audio feedback data generated within a set playback period of the main audio data;
    determining a corresponding merging effect according to the quantity, wherein the merging effect reflects at least the volume ratio of the data items participating in the merge;
    merging, according to the merging effect, the audio feedback data generated within the set playback period with the main audio data.
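The quantity-driven merging effect of claim 2 can be sketched as follows. This is an illustrative interpretation only: the claim prescribes no concrete formula, so the linear mapping from feedback quantity to feedback-track gain, the `max_count` normalizer, and the 0.5 gain cap are all hypothetical choices.

```python
def feedback_gain(feedback_count, max_count=50):
    """Map the number of feedback clips in a playback period to a
    feedback-track gain (hypothetical linear ramp, capped at 0.5)."""
    return min(feedback_count / max_count, 1.0) * 0.5

def merge_samples(main, feedback, feedback_count):
    """Mix main-audio and feedback samples using the volume ratio
    derived from the feedback quantity: more feedback, louder crowd."""
    g = feedback_gain(feedback_count)
    return [(1.0 - g) * m + g * f for m, f in zip(main, feedback)]

# 25 feedback clips in this period -> feedback gain 0.25, main gain 0.75.
mixed = merge_samples([0.8, -0.4], [0.2, 0.6], feedback_count=25)
```

The key property the claim asks for is only that the volume ratio is a function of the quantity; any monotone mapping would serve equally well in this sketch.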
  3. The processing method according to claim 1, wherein merging the audio feedback data with the main audio data comprises:
    detecting idle gaps in the main audio data adjacent to each item of audio feedback data, according to the playback period of the main audio data during which each item of audio feedback data was generated;
    aligning each item of audio feedback data with its adjacent idle gap to perform the merge.
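One way to read claim 3 is silence detection followed by nearest-gap alignment. The sketch below is a hypothetical interpretation: the amplitude threshold, the minimum gap length, and the "nearest gap start" alignment rule are assumptions, not requirements stated in the claim.

```python
def find_idle_gaps(samples, rate, threshold=0.01, min_len=0.5):
    """Return (start, end) times, in seconds, of runs where |sample|
    stays below threshold for at least min_len seconds."""
    gaps, start = [], None
    for i, s in enumerate(samples + [1.0]):  # loud sentinel closes a trailing run
        if abs(s) < threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) / rate >= min_len:
                gaps.append((start / rate, i / rate))
            start = None
    return gaps

def nearest_gap(gaps, t):
    """Pick the idle gap whose start is closest to feedback time t."""
    return min(gaps, key=lambda g: abs(g[0] - t))

# Toy signal at 10 samples/s: loud, quiet, loud, quiet.
samples = [0.5] * 5 + [0.0] * 6 + [0.5] * 5 + [0.0] * 7
gaps = find_idle_gaps(samples, rate=10)
target = nearest_gap(gaps, t=2.0)  # feedback produced at t = 2.0 s
```

A feedback clip would then be shifted so it begins at `target[0]`, keeping it out of the way of the main audio.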
  4. The processing method according to claim 1, wherein merging the audio feedback data with the main audio data comprises:
    assigning each data item, including the main audio data and the audio feedback data, its own separate audio track;
    merging the audio feedback data with the main audio data through audio track synthesis.
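The per-track merging of claim 4 can be sketched as a sample-wise sum over tracks. This is a minimal illustration, not the claimed implementation; real track synthesis would also handle resampling, clipping, and per-track gain.

```python
def synthesize_tracks(tracks):
    """Sum per-track sample lists into one mixed track, padding shorter
    tracks with silence so every track spans the full duration."""
    length = max(len(t) for t in tracks)
    padded = [t + [0.0] * (length - len(t)) for t in tracks]
    return [sum(col) for col in zip(*padded)]

main_track = [0.3, 0.3, 0.3, 0.3]   # the main audio data on its own track
feedback_a = [0.1, 0.1]             # a short cheer on a second track
feedback_b = [0.0, 0.2, 0.2]        # a voice comment on a third track
mixed_track = synthesize_tracks([main_track, feedback_a, feedback_b])
```

Keeping each item on its own track, as the claim does, means any clip can later be re-timed or re-weighted without touching the others.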
  5. The processing method according to claim 1, wherein acquiring the audio feedback data generated during playback of the main audio data comprises:
    acquiring audio feedback data, generated during playback of the main audio data, that matches a target classification;
    and wherein generating the merged audio data for playback comprises:
    generating merged audio data for playback by terminal devices matching the target classification.
  6. The processing method according to claim 5, further comprising:
    acquiring the feature value of a set user feature corresponding to a terminal device playing the main audio data;
    determining, according to the feature value, the target classification corresponding to the terminal device.
  7. The processing method according to claim 6, wherein the set user feature comprises a set feature of the audio feedback data generated, during playback of the main audio data, by the user corresponding to the terminal device.
  8. The processing method according to claim 1, wherein the main audio data is audio data of a video file, and the method further comprises:
    displaying, in the video playback window of the video file, an audio waveform representing the audio feedback data in the form of a bullet-screen comment.
  9. The processing method according to claim 1, wherein acquiring the audio feedback data generated during playback of the main audio data comprises:
    acquiring voice comments fed back during playback of the main audio data, and using at least the voice comments as the audio feedback data.
  10. The processing method according to claim 1, wherein acquiring the audio feedback data generated during playback of the main audio data comprises:
    acquiring text comments fed back during playback of the main audio data;
    converting the text comments into corresponding audio data, and using at least the converted audio data as the audio feedback data.
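In practice the conversion in claim 10 would call a text-to-speech engine; the sketch below substitutes a placeholder synthesizer (a hypothetical 160-samples-per-character silent waveform) so that only the data flow of the claim is shown, not any real TTS API.

```python
def text_to_audio(comment, tts=None):
    """Convert one text comment into audio feedback data. A real system
    would pass a TTS engine as `tts`; the default stub just emits a
    placeholder waveform whose length tracks the comment length."""
    synthesize = tts or (lambda text: [0.0] * (len(text) * 160))
    return synthesize(comment)

def collect_feedback(text_comments):
    """Convert every text comment fed back during playback into audio
    data, to be merged with the main audio as audio feedback data."""
    return [text_to_audio(c) for c in text_comments]

clips = collect_feedback(["nice!", "encore"])
```

Injecting the synthesizer as a parameter keeps the conversion step testable while leaving the choice of TTS engine open, as the claim itself does.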
  11. The processing method according to claim 1, wherein acquiring the audio feedback data generated during playback of the main audio data comprises:
    acquiring expression features fed back during playback of the main audio data;
    converting the expression features into corresponding audio data, and using at least the converted audio data as the audio feedback data.
  12. The processing method according to claim 1, wherein the main audio data is audio data of a live-streaming media file.
  13. The processing method according to any one of claims 1 to 12, further comprising:
    performing the operation of merging the audio feedback data with the main audio data in response to an instruction to enable a live sound effect function.
  14. A method for processing audio data, implemented by a terminal device, the method comprising:
    acquiring main audio data selected for playback;
    acquiring live audio data corresponding to the main audio data, wherein the live audio data comprises at least other users' audio feedback data on the main audio data;
    performing a processing operation of playing the live audio data while playing the main audio data.
  15. The processing method according to claim 14, wherein the live audio data further comprises audio feedback data on the main audio data from the user corresponding to the terminal device.
  16. A method for processing audio data, implemented by a terminal device, the method comprising:
    playing a target media file in response to an operation of playing the target media file, wherein the target media file comprises main audio data;
    acquiring live audio data corresponding to the main audio data, wherein the live audio data comprises at least other users' audio feedback data on the main audio data;
    performing, during playback of the target media file, a processing operation of playing the live audio data along with the main audio data.
  17. The processing method according to claim 16, wherein acquiring the live audio data corresponding to the main audio data comprises:
    acquiring, from a server, other users' audio feedback data on the main audio data as the live audio data.
  18. The processing method according to claim 16, further comprising:
    acquiring audio feedback data on the main audio data from the user corresponding to the terminal device;
    uploading the user's audio feedback data to a server.
  19. An apparatus for processing audio data, comprising:
    a data acquisition module configured to acquire audio feedback data generated during playback of main audio data; and an audio processing module configured to merge the audio feedback data with the main audio data to generate merged audio data for playback.
  20. An electronic device, comprising the processing apparatus according to claim 19; or comprising:
    a memory configured to store executable instructions;
    a processor configured to, under control of the executable instructions, operate the electronic device to perform the processing method according to any one of claims 1 to 18.
  21. The electronic device according to claim 20, wherein the electronic device is a terminal device without a display apparatus.
  22. The electronic device according to claim 20, wherein the electronic device is a terminal device further comprising an input apparatus, the input apparatus being configured to allow a corresponding user to input feedback content on the main audio data and to send the feedback content to the processing apparatus or processor, so that the processing apparatus or processor generates, according to the feedback content, the corresponding user's audio feedback data on the main audio data.
  23. The electronic device according to claim 20, wherein the electronic device is a terminal device further comprising an audio output apparatus, the audio output apparatus being configured to play, under control of the processing apparatus or the processor, the corresponding audio feedback data while playing the main audio data.
  24. A computer-readable storage medium storing a computer program readable and executable by a computer, the computer program being configured to, when read and run by the computer, perform the processing method according to any one of claims 1 to 18.
PCT/CN2020/099864 2019-07-10 2020-07-02 Audio data processing method and apparatus, and electronic device WO2021004362A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910619886.0A CN112287129A (en) 2019-07-10 2019-07-10 Audio data processing method and device and electronic equipment
CN201910619886.0 2019-07-10

Publications (1)

Publication Number Publication Date
WO2021004362A1 true WO2021004362A1 (en) 2021-01-14

Family

ID=74114394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099864 WO2021004362A1 (en) 2019-07-10 2020-07-02 Audio data processing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN112287129A (en)
WO (1) WO2021004362A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112954579A (en) * 2021-01-26 2021-06-11 腾讯音乐娱乐科技(深圳)有限公司 Method and device for reproducing on-site listening effect

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN113160819B (en) * 2021-04-27 2023-05-26 北京百度网讯科技有限公司 Method, apparatus, device, medium, and product for outputting animation

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1293784A (en) * 1998-03-13 2001-05-02 西门子共同研究公司 Apparatus and method for collaborative dynamic video annotation
US20120210348A1 (en) * 2008-03-20 2012-08-16 Verna IP Holdings, LLC. System and methods providing sports event related media to internet-enabled devices synchronized with a live broadcast of the sports event
CN103150325A (en) * 2012-09-25 2013-06-12 圆刚科技股份有限公司 Multimedia comment system and multimedia comment method
US20170163697A1 (en) * 2005-12-16 2017-06-08 At&T Intellectual Property Ii, L.P. Real-time media control for audio and multimedia conferencing services
JP2017219573A (en) * 2016-06-03 2017-12-14 デジタルセンセーション株式会社 Information processing device, information processing method, and program

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8887190B2 (en) * 2009-05-28 2014-11-11 Harris Corporation Multimedia system generating audio trigger markers synchronized with video source data and related methods
US20100306232A1 (en) * 2009-05-28 2010-12-02 Harris Corporation Multimedia system providing database of shared text comment data indexed to video source data and related methods
KR20180105810A (en) * 2017-03-16 2018-10-01 네이버 주식회사 Method and system for generating content using audio comment



Also Published As

Publication number Publication date
CN112287129A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
Vryzas et al. Speech emotion recognition for performance interaction
US11043216B2 (en) Voice feedback for user interface of media playback device
US11003708B2 (en) Interactive music feedback system
US20180077440A1 (en) System and method of creating, analyzing, and categorizing media
JP2017229060A (en) Methods, programs and devices for representing meeting content
CN113709561A (en) Video editing method, device, equipment and storage medium
CN108847214A (en) Method of speech processing, client, device, terminal, server and storage medium
WO2021004362A1 (en) Audio data processing method and apparatus, and electronic device
WO2022253157A1 (en) Audio sharing method and apparatus, device, and medium
US20210232624A1 (en) Interactive Music Feedback System
Vryzas et al. Speech emotion recognition adapted to multimodal semantic repositories
US9286943B2 (en) Enhancing karaoke systems utilizing audience sentiment feedback and audio watermarking
WO2022218027A1 (en) Audio playing method and apparatus, and computer-readable storage medium and electronic device
CN114501103B (en) Live video-based interaction method, device, equipment and storage medium
CN111726696A (en) Application method, device and equipment of sound barrage and readable storage medium
Liem et al. Multimedia technologies for enriched music performance, production, and consumption
CN110324702B (en) Information pushing method and device in video playing process
JP6367748B2 (en) Recognition device, video content presentation system
US20230030502A1 (en) Information play control method and apparatus, electronic device, computer-readable storage medium and computer program product
CN112995530A (en) Video generation method, device and equipment
CN111914115A (en) Sound information processing method and device and electronic equipment
Liaw et al. Live stream highlight detection using chat messages
KR102472921B1 (en) User interfacing method for visually displaying acoustic signal and apparatus thereof
CN113918755A (en) Display method and device, storage medium and electronic equipment
Roininen et al. Modeling the timing of cuts in automatic editing of concert videos

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20837295

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20837295

Country of ref document: EP

Kind code of ref document: A1