WO2019227308A1 - Method and device for selecting an audio track from an audio and video file - Google Patents


Info

Publication number
WO2019227308A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
terminal
language
track
bit rate
Prior art date
Application number
PCT/CN2018/088857
Inventor
余艳辉
李昕
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to US17/058,995 (US20210219028A1)
Priority to PCT/CN2018/088857 (WO2019227308A1)
Priority to CN201880093609.4A (CN112189344A)
Priority to EP18921207.9A (EP3783906A4)
Publication of WO2019227308A1

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
              • H04N21/81 Monomedia components thereof
                • H04N21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
            • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/41 Structure of client; Structure of client peripherals
                • H04N21/414 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
                  • H04N21/41407 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
              • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N21/439 Processing of audio elementary streams
                  • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
                • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
                  • H04N21/44213 Monitoring of end-user related data
                • H04N21/443 OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
                  • H04N21/4433 Implementing client middleware, e.g. Multimedia Home Platform [MHP]
              • H04N21/47 End-user applications
                • H04N21/485 End-user interface for client configuration
                  • H04N21/4856 End-user interface for client configuration for language selection, e.g. for the menu or subtitles

Definitions

  • The embodiments of the present application relate to the technical field of audio and video on terminal devices, and in particular to a method and device for selecting an audio track from an audio and video file on a terminal device.
  • Smartphones have developed rapidly in recent years. As screens grow larger and audio effects improve, it is increasingly common for people to watch movies, TV programs, and other videos through audio and video playback applications on mobile phones.
  • Video resources include video data and audio data.
  • The audio data can have a single track or multiple tracks.
  • An audio track is a track that records audio data.
  • Each audio track has one or more attribute parameters, which include audio format, bit rate, dubbing language, sound effect, number of channels, volume, and so on.
  • When the audio data has multiple tracks, any two different audio tracks differ in at least one attribute parameter, or in the value of at least one attribute parameter.
  • A playback application can display audio track options in its interface, and the user can select an appropriate audio track for playback from these options.
  • However, most general-purpose playback applications lack this function. As a result, when the user watches a video with multiple audio tracks, the audio track actually played may not match the one the user wants.
  • The accuracy of audio track selection is therefore very important.
  • The embodiments of the present application provide a method and device for selecting an audio track from an audio and video file, so that a terminal can automatically select an audio track that meets the user's requirements when playing a video with multiple audio tracks.
  • An embodiment of the present invention provides a method for selecting an audio track from an audio and video file, including:
  • The terminal selects one or more audio tracks that support decoding from the one or more audio tracks of the audio and video file. It then selects one or more language-matching audio tracks from the tracks that support decoding. From the language-matching tracks, it selects one or more audio tracks whose audio format is of a first specification among the supported audio formats. From the tracks of the first specification, it selects an audio track with a first bit rate. Finally, the terminal plays the audio and video file with the selected audio track. The first specification among the supported audio formats is higher than a second specification among the supported audio formats, and the first bit rate is higher than a second bit rate. The second specification refers to Dolby Surround Audio Coding-3 (AC-3), and the second bit rate is 448 kilobits per second (kbps).
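The following Python sketch walks through the four selection steps of the claim above. It is only an illustration, not the patented implementation. Several pieces are assumptions made here: the `AudioTrack` type, the `SPEC_RANK` table (DTS above AC-3 above other formats, following the document's example), the language codes, and the fallback to all decodable tracks when no track matches the language. The sketch also simply keeps the highest-ranked format and the highest bit rate, instead of comparing against the AC-3 and 448 kbps thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class AudioTrack:
    track_id: int
    audio_format: str    # e.g. "DTS", "AC-3"
    bit_rate_kbps: int   # kilobits per second
    language: str        # dubbing language code, e.g. "en", "zh"


# Assumed ranking taken from the document's example: DTS > AC-3 > other lossy formats.
SPEC_RANK = {"DTS": 2, "AC-3": 1}


def spec_rank(audio_format: str) -> int:
    return SPEC_RANK.get(audio_format, 0)


def select_track(tracks: Iterable[AudioTrack],
                 decoding_set: Set[str],
                 language: str) -> Optional[AudioTrack]:
    # Step 1: exclude tracks whose decoding format is not in the preset decoding set.
    candidates: List[AudioTrack] = [t for t in tracks if t.audio_format in decoding_set]
    if not candidates:
        return None
    # Step 2: keep language-matching tracks. If none match, keep all decodable
    # tracks (this fallback is an assumption, not stated in the claim).
    matching = [t for t in candidates if t.language == language]
    if matching:
        candidates = matching
    # Step 3: keep only the tracks whose audio format has the highest specification.
    best = max(spec_rank(t.audio_format) for t in candidates)
    candidates = [t for t in candidates if spec_rank(t.audio_format) == best]
    # Step 4: among those, choose the track with the highest bit rate.
    return max(candidates, key=lambda t: t.bit_rate_kbps)
```

Consider the three example tracks the document lists for the framework layer: DTS 1509 kbps English, DTS 1509 kbps Chinese, and AC-3 448 kbps English. With these, a judged language of Chinese selects the Chinese DTS track. English selects the English DTS track ahead of the AC-3 one.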
  • The decoding format of the one or more audio tracks is compared with a preset decoding set. If an audio track's decoding format is in the preset decoding set, that audio track is considered to support decoding.
  • The step in which the terminal selects one or more audio tracks that support decoding from the audio tracks of the audio and video file includes: the terminal selects, from those audio tracks, the audio tracks whose decoding format is in the preset decoding set.
  • The step in which the terminal selects one or more language-matching audio tracks from the audio tracks that support decoding includes two parts. First, the terminal determines a first language evaluation result according to one or more of the system language, the input method setting, the voice assistant input, and historical viewing habits. Second, the terminal uses the first language evaluation result to select one or more language-matching audio tracks from the audio tracks that support decoding.
  • The step in which the terminal selects, from the language-matching audio tracks, one or more audio tracks whose supported audio format is of the first specification includes: the terminal selects, from the language-matching audio tracks, one or more audio tracks whose supported audio format has a higher sampling rate than the second specification.
  • The step in which the terminal selects an audio track with the first bit rate from the audio tracks of the first specification includes: the terminal determines a second language evaluation result according to one or more of the audio format and the bit rate.
  • The terminal determines the language of the audio track to be played according to the first language evaluation result and the second language evaluation result.
  • An embodiment of the present invention provides a terminal device, including: a display; an audio playback or output element; one or more processors; a memory; a plurality of application programs; and one or more computer programs. The one or more computer programs are stored in the memory and include instructions that, when executed by the terminal device, cause the terminal device to perform the following steps:
  • The first specification among the supported audio formats is higher than the second specification among the supported audio formats, and the first bit rate is higher than the second bit rate.
  • The second specification refers to Dolby Surround Audio Coding-3 (AC-3).
  • The second bit rate is 448 kilobits per second (kbps).
  • The decoding format of the one or more audio tracks is compared with a preset decoding set. If an audio track's decoding format is in the preset decoding set, that audio track is considered to support decoding.
  • The step in which the terminal selects one or more audio tracks that support decoding from the audio tracks of the audio and video file includes: the terminal selects, from those audio tracks, the audio tracks whose decoding format is in the preset decoding set.
  • The step in which the terminal selects one or more language-matching audio tracks from the audio tracks that support decoding includes two parts. First, the terminal determines a first language evaluation result according to one or more of the system language, the input method setting, the voice assistant input, and historical viewing habits. Second, the terminal uses the first language evaluation result to select one or more language-matching audio tracks from the audio tracks that support decoding.
  • The step in which the terminal selects, from the language-matching audio tracks, one or more audio tracks whose supported audio format is of the first specification includes: the terminal selects, from the language-matching audio tracks, one or more audio tracks whose supported audio format has a higher sampling rate than the second specification.
  • The step in which the terminal selects an audio track with the first bit rate from the audio tracks of the first specification supported by the terminal includes: the terminal determines a second language evaluation result according to one or more of the audio format and the bit rate.
  • The terminal determines the language of the audio track to be played according to the first language evaluation result and the second language evaluation result.
  • The one or more computer programs include instructions that, when executed by the electronic device, cause the electronic device to determine the first language evaluation result according to one or more of the system language, the input method setting, the voice assistant input, and historical viewing habits.
  • The one or more computer programs include instructions that, when executed by the electronic device, cause the electronic device to determine the second language evaluation result according to one or more of the audio format and the bit rate.
  • The one or more computer programs include instructions that, when executed by the electronic device, cause the electronic device to determine the language of the audio track to be played according to the first language evaluation result and the second language evaluation result.
  • An embodiment of the present invention provides a computer program product. When the computer program product runs on a terminal, it causes the terminal to execute the method according to any one of the foregoing.
  • An embodiment of the present invention provides a computer-readable storage medium including instructions.
  • When the instructions are executed on a terminal, they cause the terminal to execute the method according to any one of the foregoing.
  • The solution provided by the present invention enables the terminal to automatically select an audio track that meets the user's needs.
  • FIG. 1 is a schematic diagram of the language and input method settings in an Android system according to an embodiment of the present invention.
  • FIG. 2 is a structural diagram of an Android system provided by an embodiment of the present invention.
  • FIG. 3 is a first schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a processor according to an embodiment of the present invention.
  • FIG. 5 is a second schematic structural diagram of a terminal according to an embodiment of the present invention.
  • The terms "first" and "second" are used for descriptive purposes only. They cannot be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features concerned. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include one or more such features. In the description of the embodiments of the present application, unless otherwise stated, "a plurality of" means two or more.
  • The method for selecting an audio track provided in the embodiments of the present application can be applied to a terminal.
  • The terminal can be a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR)/virtual reality (VR) device, a laptop, an ultra-mobile personal computer (UMPC), a personal digital assistant (PDA), or any other terminal with a display function; this is not limited in the embodiments of the present application.
  • A terminal generally has one or more applications for playing audio and video.
  • The playback application may be pre-installed before the terminal leaves the factory, or may be installed on the terminal later.
  • The terminal can play audio and video files through the playback application; that is, the user can watch movies, TV programs, or other audio and video programs on the terminal.
  • An audio track is a track for recording audio data.
  • Each audio track has one or more attribute parameters.
  • The attribute parameters may include audio format, bit rate, language, sound effect, number of channels, volume, and so on.
  • When the audio data has multiple audio tracks, any two different audio tracks differ in at least one attribute parameter, or in the value of at least one attribute parameter.
  • Here, the language refers to the dubbing (voice-over) language.
  • The audio and video file to be played can be stored on the terminal or downloaded from the network in real time. Playing a file downloaded in real time is called online playback, and playing a file stored on the terminal is called local playback. For online playback, the network side needs to inform the terminal whether the audio and video file has multiple audio tracks, and it can do so with a notification message. After the terminal selects a specific audio track, it must inform the network side of the selection, for example in a notification message stating that a specific audio track has been selected. For local playback, no interaction with the network side is needed, and the terminal selects the audio track to play locally.
  • The embodiments of the present invention do not limit the format of the notification message.
  • For example, the network side may use a dedicated notification message to inform the terminal that the audio and video file has multiple audio tracks, or it may inform the terminal of this in another way.
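The sketch below illustrates the difference between online and local playback. The embodiments do not limit the format of the notification message, so the message classes `MultiTrackNotice` and `TrackSelectedNotice` and the function `start_playback` are hypothetical. They only show which side sends which message, and that local playback needs no message exchange.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class MultiTrackNotice:          # hypothetical network -> terminal message
    file_id: str
    track_ids: List[int]


@dataclass
class TrackSelectedNotice:       # hypothetical terminal -> network message
    file_id: str
    track_id: int


def start_playback(file_id: str,
                   online: bool,
                   choose: Callable[[List[int]], int],
                   notice: Optional[MultiTrackNotice] = None,
                   local_track_ids: Optional[List[int]] = None
                   ) -> Tuple[int, List[TrackSelectedNotice]]:
    """Pick a track; return it together with any messages owed to the network side."""
    if online:
        # Online playback: the network side tells the terminal about the tracks.
        if notice is None:
            raise ValueError("online playback expects a multi-track notice")
        chosen = choose(notice.track_ids)
        # The terminal reports its choice back to the network side.
        return chosen, [TrackSelectedNotice(file_id, chosen)]
    # Local playback: choose from the locally stored tracks; nothing is sent.
    chosen = choose(local_track_ids or [])
    return chosen, []
```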
  • Playing video and audio requires a playback application.
  • Playback applications are used in different ways. For example, a user opens an application and selects a program to play from an album in the application. For a video saved in the album, the first playback can use the default settings of the playback application.
  • The terminal selects one audio track from the multiple audio tracks of the audio and video file for playback.
  • Audio track selection includes the following rules:
  • Select the audio track whose audio format has a high specification among the supported audio formats. For example, the DTS (Digital Theater Systems) audio format has a higher specification than the AC-3 (Dolby Surround Audio Coding-3, Dolby AC-3) audio format; and
  • The order in which these rules are applied is not limited; any one of them may be applied first.
  • For the language, select (1) the language the user uses on the mobile phone, or (2) a language consistent with the videos the user has watched previously.
  • Different scenarios can also be distinguished. Suppose the user's viewing history includes videos whose original version is in a first language (such as Chinese) and videos whose original version is in a second language (such as English). In that case, the audio track language must be determined together with the characteristics of the video the user is currently watching. For example, if the user now wants to watch a video whose original version is in Chinese, and has previously watched videos whose original version is in Chinese, Chinese is selected as the audio track language based on the user's viewing history.
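One possible reading of the scenario-dependent rule above: look only at past videos whose original language matches that of the current video, and take the audio language the user most often chose for them. The history format and the function name below are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One history entry: (original language of the video, audio language the user watched it in).
HistoryEntry = Tuple[str, str]


def language_from_history(history: Iterable[HistoryEntry],
                          current_original_language: str) -> Optional[str]:
    # Only consider past videos whose original version is in the same
    # language as the video about to be watched.
    relevant = [watched for original, watched in history
                if original == current_original_language]
    if not relevant:
        return None  # no comparable history; other signals must decide
    # The audio language the user most often chose for such videos.
    return Counter(relevant).most_common(1)[0][0]
```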
  • Audio track specifications fall into the following tiers: lossless audio formats, lossy audio formats with small compression loss, and lossy audio formats with large compression loss.
  • For example, the audio track specifications are ordered as: lossless audio format > lossy audio format with small compression loss > lossy audio format with large compression loss.
  • As another example, the order is: DTS decoding > Dolby AC-3 decoding > ordinary lossy audio formats.
  • A higher bit rate is generally preferred, but this also depends on whether the terminal's hardware or software supports it. If the highest bit rate is not supported, the next-highest bit rate can be chosen. If even that is not supported, a still lower bit rate is chosen, and so on.
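The bit-rate fallback described above amounts to walking through the available bit rates from highest to lowest and stopping at the first one the terminal can handle. In this sketch, the terminal's hardware and software support is modeled as a caller-supplied predicate `is_supported`, which is an assumption.

```python
from typing import Callable, Iterable, Optional


def choose_bit_rate(available_kbps: Iterable[int],
                    is_supported: Callable[[int], bool]) -> Optional[int]:
    """Return the highest available bit rate the terminal supports, or None."""
    # Try the highest bit rate, then the next highest, and so on.
    for rate in sorted(set(available_kbps), reverse=True):
        if is_supported(rate):
            return rate
    return None  # nothing playable at any offered bit rate
```

For example, `choose_bit_rate([448, 1509, 768], lambda r: r <= 1000)` skips 1509 and returns 768.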
  • Audio track selection can also depend on factors other than those above, for example:
  • Sound effect: the sound effect can be selected according to the user's historical viewing habits, or the terminal can choose according to its built-in sound effect setting.
  • With historical viewing habits, the terminal can record which sound effect the user has watched most often. If such a record exists, that sound effect can be selected.
  • Historical viewing habits can refer to the user's habits within the same playback application or across all playback applications.
  • Here, users are distinguished by user account, or different terminals correspond to different users.
  • When the terminal selects according to its built-in sound effect settings, it works as follows. Some terminals have built-in sound effects, such as Dolby, with modes such as music and theater. If an audio track matches the built-in setting, that track can also be selected.
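Below is a sketch of the two sound-effect rules above. It assumes the history-based rule is tried before the built-in setting; the text presents them as alternatives without fixing an order. Effect names such as "theater" and "music" and the function signature are illustrative.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def choose_sound_effect(track_effects: Set[str],
                        history_effects: Iterable[str],
                        builtin_effect: Optional[str]) -> Optional[str]:
    # Rule 1 (historical habits): the effect the user has watched most often,
    # used only if this file actually offers a track with that effect.
    counts = Counter(history_effects)
    if counts:
        most_viewed = counts.most_common(1)[0][0]
        if most_viewed in track_effects:
            return most_viewed
    # Rule 2 (built-in setting): e.g. a Dolby "music" or "theater" mode on the terminal.
    if builtin_effect is not None and builtin_effect in track_effects:
        return builtin_effect
    return None
```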
  • Number of channels: the choice can be based on the number of channels supported by the terminal or on the maximum number of channels.
  • When an audio and video file needs to be played, the terminal selects a suitable audio track from the file to accompany its video. The algorithm in the embodiments of the present invention determines how that track is selected.
  • The terminal can select the audio track of the matching language according to language parameters. One such parameter is the language judgment result obtained from the user's system language setting, input method setting, voice assistant input, and historical viewing habits.
  • The terminal can also decide the playback language according to the audio format and bit rate, which are determined by the system bottom layer and hardware.
  • Combining the two preceding points gives the terminal's final rules for selecting the audio track to play:
    1. Exclude audio tracks that do not support decoding.
    2. From the remaining tracks, select those that match the language.
    3. From the language-matching tracks, select those whose audio format has a high specification among the supported audio formats. In this example, the DTS audio format has a higher specification than the AC-3 audio format.
    4. From the high-specification tracks, select the audio track with a high bit rate.
  • The algorithm can be applied to a variety of operating systems.
  • These operating systems include the Android operating system, Apple operating systems, and various other operating systems. The following uses the Android operating system as an example to illustrate how the algorithm works.
  • The algorithm can place audio track selection in the application framework layer of the Android operating system (hereinafter "Android"). Specifically, the application framework layer receives and stores the multiple audio tracks, selects a specific audio track to use, and sends the selected audio track to the application layer for playback.
  • A terminal software system developed according to the algorithm may include two recording modules in the Android framework layer: a language judgment recording module 105 and a decoding capability recording module 106.
  • The language judgment recording module makes its judgment based on at least three parameters from the application layer: the system language setting 101, the language 102 of the user's input method, and the language 103 the user inputs through a voice assistant.
  • The system language setting 101 is the language option in the terminal's settings.
  • The input method setting 102 consists of the settings of the input method application. For example, FIG. 1 shows the language and input method settings of the Android system. There, the uppermost box "Language" is the system language setting, and the lower box "Baidu input method Huawei version" is the input method setting.
  • Voice assistant input refers to input given through voice interaction with the terminal.
  • Historical viewing habits refer to the user's past viewing habits. They can refer to the user's habits within the same playback application or across all playback applications.
  • Here, users of a playback application are distinguished by user account, or different terminals correspond to different users.
  • The user's language judgment result is obtained by comprehensively considering the user's system language setting, input method setting, voice assistant input, and historical viewing habits. It can also be regarded as the language that the selected audio track should use.
  • The specific judgment process is shown in Table 1 below.
  • In addition to the three parameters above, Table 1 also includes the user's historical viewing habits.
  • The language judgment recording module of the framework layer derives the language judgment result from the combinations of parameter values in the four middle columns of the table.
  • Example 1: the system language is Chinese, the input method setting is Chinese, the voice assistant input is Chinese, and the historical viewing habits are Chinese. The language judgment result is Chinese.
  • Example 2: the system language is Chinese, the input method setting is English, the voice assistant input is Chinese, and the historical viewing habits are English. Based on the historical viewing habits, the language judgment result is English.
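Both rows of Table 1 are consistent with a simple rule: take the majority language across the four inputs, and break ties in favor of historical viewing habits. The text does not state the exact rule, so the function below is only one hypothetical reading that reproduces both examples.

```python
from collections import Counter
from typing import Optional


def judge_language(system: Optional[str], input_method: Optional[str],
                   voice_assistant: Optional[str], history: Optional[str]) -> Optional[str]:
    # Count one vote per available input signal.
    votes = Counter(lang for lang in (system, input_method, voice_assistant, history) if lang)
    if not votes:
        return None
    top = max(votes.values())
    leaders = [lang for lang, n in votes.items() if n == top]
    if len(leaders) == 1:
        return leaders[0]
    # Tie: defer to historical viewing habits, as in the second example of Table 1.
    if history in leaders:
        return history
    # Otherwise take the first leader in input order (system language first).
    return leaders[0]
```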
  • The application framework layer of Android also includes the decoding capability recording module 106.
  • The records of the decoding capability recording module 106 are determined by the system bottom layer and hardware 108, as shown in the following table:
  • In that table, the audio format represents the decoding capability.
  • The bit rate represents the playback quality of the audio track.
  • The framework layer obtains information on all audio tracks from the player in the application layer, for example:
  • Track 1: audio format DTS (Digital Theater Systems), bit rate 1509 kbps (kilobits per second), language English.
  • Track 2: identical to track 1 except that the language is Chinese.
  • Another track: audio format AC-3 (Dolby Surround Audio Coding-3, Dolby AC-3), bit rate 448 kbps, language English.
  • The audio format and bit rate in the table above determine the playback language.
  • The audio tracks of the above video are ranked by score according to the following rules:
    1. Exclude audio tracks that do not support decoding.
    2. From the remaining tracks, select those that match the language.
    3. From the language-matching tracks, select those whose audio format has a high specification among the supported formats. In this example, the DTS audio format has a higher specification than the AC-3 audio format.
    4. From the high-specification tracks, select the audio track with the high bit rate.
  • Different users can correspond to different terminals or to different accounts on the same terminal.
  • Each account acts as an independent user and maintains its own language judgment records, decoding capability records, and video audio track score sorting module.
  • The video audio track score sorting module 107 of the framework layer obtains all audio track information in the audio and video file from the player 104 of the application layer. It sorts the tracks and returns the optimal default audio track to the player 104 for playback.
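The interplay of modules 104 to 107 can be sketched as follows. The class names map loosely onto the modules described: language judgment record 105, decoding capability record 106, and score sorting module 107. Two choices here are assumptions: decoding capability is modeled as a set of formats plus a single bit-rate ceiling, and the ranking uses one composite sort key. That key is equivalent to applying the language, specification, and bit-rate filters in sequence. Keeping one sorter per account follows the statement that each account maintains independent records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class TrackInfo:                    # track information reported by player 104
    track_id: int
    audio_format: str
    bit_rate_kbps: int
    language: str


@dataclass
class LanguageRecord:               # stands in for module 105
    judged_language: str


@dataclass
class DecodingCapabilityRecord:     # stands in for module 106
    formats: Set[str]               # formats the bottom layer/hardware 108 can decode
    max_bit_rate_kbps: int          # assumed single bit-rate ceiling


SPEC_RANK = {"DTS": 2, "AC-3": 1}   # assumed: DTS > AC-3 > other lossy formats


@dataclass
class TrackScoreSorter:             # stands in for module 107
    language: LanguageRecord
    capability: DecodingCapabilityRecord

    def best_default_track(self, tracks: List[TrackInfo]) -> Optional[TrackInfo]:
        # Exclude tracks this terminal cannot decode.
        playable = [t for t in tracks
                    if t.audio_format in self.capability.formats
                    and t.bit_rate_kbps <= self.capability.max_bit_rate_kbps]
        if not playable:
            return None
        # Lexicographic score: language match, then format specification, then bit rate.
        return max(playable, key=lambda t: (t.language == self.language.judged_language,
                                            SPEC_RANK.get(t.audio_format, 0),
                                            t.bit_rate_kbps))


# One sorter per account, since each account keeps independent records.
SORTERS: Dict[str, TrackScoreSorter] = {}


def default_track_for(account: str, tracks: List[TrackInfo]) -> Optional[TrackInfo]:
    sorter = SORTERS.get(account)
    return sorter.best_default_track(tracks) if sorter is not None else None
```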
  • The process of selecting the optimal default audio track is the same as the function of the video audio track score sorting module described above and is not repeated here.
  • The embodiments of the present invention also include the following solution:
  • The algorithm described in the embodiments of the present invention may also be applied in scenarios where the user can select the audio track.
  • The terminal can first select automatically and then prompt the user: "This video includes multiple audio tracks, and the XX audio track has been selected automatically." If the user is not satisfied with the selected audio track, the user can select one manually.
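The auto-select-then-prompt flow can be sketched as below. The prompt wording follows the example in the text. The function names and the track-label dictionary are hypothetical.

```python
from typing import Dict, Optional, Tuple


def auto_select_with_prompt(track_labels: Dict[int, str],
                            auto_choice: int) -> Tuple[int, Optional[str]]:
    """Return the automatically chosen track and the prompt to show, if any."""
    if len(track_labels) < 2:
        return auto_choice, None  # single-track file: nothing to announce
    prompt = ("This video includes multiple audio tracks, and the "
              f"{track_labels[auto_choice]} audio track has been selected automatically.")
    return auto_choice, prompt


def apply_manual_choice(current: int, user_choice: Optional[int],
                        track_labels: Dict[int, str]) -> int:
    """Keep the automatic choice unless the user picks another valid track."""
    if user_choice is not None and user_choice in track_labels:
        return user_choice
    return current
```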
  • FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the terminal 300 includes a processor 301, a memory 302, a camera 303, an RF circuit 304, an audio circuit 305, a speaker 306, a microphone 307, an input device 308, other input devices 309, a display screen 310, a touch panel 311, and a display panel 312 , Output device 313, and power supply 314.
  • the display screen 310 is at least composed of a touch panel 311 as an input device and a display panel 312 as an output device.
  • the terminal structure shown in FIG. 3 does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement, which is not limited here.
  • a radio frequency (RF) circuit 304 can be used to receive and send signals while sending and receiving information or during a call. For example, if the terminal 300 is an in-vehicle device, the terminal 300 can receive downlink information sent by a base station through the RF circuit 304 and pass it to the processor 301 for processing; in addition, it sends uplink data to the base station.
  • the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • the RF circuit 304 can also communicate with a network and other devices through wireless communication.
  • this wireless communication can use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
  • the memory 302 may be used to store software programs and modules.
  • the processor 301 executes various functional applications and data processing of the terminal 300 by running the software programs and modules stored in the memory 302.
  • the memory 302 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function (for example, a sound playback function or an image playback function), and the data storage area may store data (such as audio data and video data) created through use of the terminal 300.
  • the memory 302 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • Other input devices 309 may be used to receive inputted numeric or character information, and generate key signal inputs related to user settings and function control of the terminal 300.
  • other input devices 309 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys and power keys), a trackball, a mouse, a joystick, and an optical mouse (a touch-sensitive surface that does not display visual output, or an extension of a touch-sensitive surface formed by a touch screen).
  • Other input devices 309 may also include sensors built into the terminal 300, such as gravity sensors, acceleration sensors, etc. The terminal 300 may also use parameters detected by the sensors as input data.
  • the display screen 310 may be used to display information input by the user or information provided to the user and various menus of the terminal 300, and may also accept user input.
  • the display panel 312 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like; the touch panel 311 is also referred to as a touch screen, a touch-sensitive screen, or the like.
  • the touch panel 311 may collect contact or non-contact operations performed by the user on or near it (for example, operations performed on or near the touch panel 311 with a finger, a stylus, or any suitable object or accessory, including motion-sensing operations; operation types include single-point and multi-point control operations), and drives the corresponding connected apparatus according to a preset program.
  • the touch panel 311 may further include a touch detection device and a touch controller.
  • the touch detection device detects the user's touch position and posture, and detects signals brought by the touch operation, and transmits the signals to the touch controller;
  • the touch controller receives touch information from the touch detection device, converts it into information that the processor 301 can process, and transmits it to the processor 301; it can also receive and execute commands from the processor 301.
  • the touch panel 311 may be implemented using various types such as resistive, capacitive, infrared, and surface acoustic wave, and may also be implemented using any technology developed in the future.
  • the touch panel 311 can cover the display panel 312. Based on the content displayed by the display panel 312 (including, but not limited to, a soft keyboard, a virtual mouse, virtual keys, and icons), the user performs operations on or near the touch panel 311 that covers the display panel 312. After detecting an operation on or near it, the touch panel 311 transmits the operation to the processor 301 to determine the user input, and the processor 301 then provides corresponding visual output on the display panel 312 according to the user input.
  • although in FIG. 3 the touch panel 311 and the display panel 312 implement the input and output functions of the terminal 300 as two independent components, in some embodiments the touch panel 311 and the display panel 312 may be integrated to implement the input and output functions of the terminal 300.
  • the audio circuit 305, the speaker 306, and the microphone 307 may provide an audio interface between the user and the terminal 300.
  • the audio circuit 305 can convert received audio data into an electrical signal and transmit it to the speaker 306, which converts it into a sound signal for output.
  • the microphone 307 can convert a collected sound signal into an electrical signal, which the audio circuit 305 receives and converts into audio data.
  • the audio data is output to the RF circuit 304 for transmission to a device such as another terminal, or the audio data is output to the memory 302, so that the processor 301 performs further processing in combination with the content stored in the memory 302.
  • the camera 303 can collect image frames in real time and transmit them to the processor 301 for processing, and store the processed results in the memory 302 and / or present the processed results to the user through the display panel 312.
  • the processor 301 is the control center of the terminal 300, and uses various interfaces and lines to connect various parts of the entire terminal 300.
  • the processor 301 runs or executes software programs and/or modules stored in the memory 302 and calls data stored in the memory 302 to execute various functions of the terminal 300 and process data, thereby monitoring the terminal 300 as a whole.
  • the processor 301 may include one or more processing units; the processor 301 may also integrate an application processor and a modem processor. The application processor mainly handles the operating system, the user interface (UI), applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 301.
  • the terminal 300 may further include a power source 314 (for example, a battery) for supplying power to various components.
  • the power source 314 may be logically connected to the processor 301 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
  • the terminal 300 may further include a Bluetooth module, a sensor, and the like, and details are not described herein again.
  • the processor 301 in the terminal 300 is used to select audio tracks that support decoding; select audio tracks that match a language; select audio tracks in a higher-specification supported audio format (for example, the DTS (Digital Theater Systems) audio format is of a higher specification than the AC-3 (Dolby Surround Audio Coding-3, Dolby AC-3) audio format); and select an audio track with a high bit rate.
  • the above-mentioned audio track supporting decoding means that the audio track can be decoded.
  • the selection criteria may be applied in any order. For example: select the audio tracks that support decoding from all audio tracks; then select the audio tracks that match the language from the multiple decodable audio tracks; then, among the multiple language-matched audio tracks, select the audio tracks in the higher-specification supported audio format (for example, the DTS (Digital Theater Systems) audio format is of a higher specification than the AC-3 (Dolby Surround Audio Coding-3) audio format); and finally, within the higher-specification audio format, select the audio track with a high bit rate.
  • the processor includes four high-speed processing cores and four low-speed processing cores. The four high-speed processing cores, together with a corresponding level-2 cache, form a high-speed core processing area. The four low-speed processing cores, together with a corresponding level-2 cache, form a low-speed core processing area.
  • the high-speed processing core may refer to a processing core with a processing frequency of 2.1 GHz.
  • the low-speed processing core may refer to a processing core with a processing frequency of 1.7 GHz.
  • All the steps executed by the processor 301 are completed by a high-speed processing core or a low-speed processing core.
  • in addition to the high-speed processing cores, the low-speed processing cores, and the corresponding level-2 caches, the processor includes other components.
  • these include the modem baseband part; a baseband part connected to the RF transceiver for processing radio frequency signals; a display subsystem connected to the display; an image signal processing subsystem connected to the CPU; a single-channel DDR controller connected to the DDR memory; an embedded multimedia card interface connected to the embedded multimedia card; a USB interface connected to a personal computer; an SDIO input/output interface connected to a short-range communication module; a UART interface connected to Bluetooth and GPS; an I2C interface connected to sensors; and a smart card interface connected to the SIM card.
  • the CPU also contains a video processing subsystem, a Sensor Hub subsystem, a low-power microcontroller, a high-resolution video codec, dual security engines, and an image processing unit formed by the image processor and the level-2 cache.
  • the foregoing terminal and the like include a hardware structure and / or a software module corresponding to executing each function.
  • the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application and design constraints of the technical solution. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the embodiments of the present application.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 5 shows a possible schematic structural diagram of the terminal involved in the foregoing embodiments, including a processing module 1001, a communication module 1002, an input/output module 1003, and a storage module 1004.
  • the processing module 1001 is configured to control and manage the actions of the terminal.
  • the communication module 1002 is configured to support communication between the terminal and other network entities.
  • the input / output module 1003 is used to receive information input by the user or output information provided to the user and various menus of the terminal.
  • the storage module 1004 is configured to store program codes and data of the terminal.
  • the processing module 1001 may be a processor or a controller.
  • the processing module 1001 may be a central processing unit (CPU), a GPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of this application.
  • the processor may also be a combination that realizes computing functions, for example, a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication module 1002 may be a transceiver, a transceiver circuit, an input / output device, or a communication interface.
  • the communication module 1002 may be a Bluetooth device, a Wi-Fi device, a peripheral interface, or the like.
  • the storage module 1004 may be a memory. The memory may include high-speed random access memory (RAM) and DDR (double data rate) memory, and may also include non-volatile memory, such as a disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input / output module 1003 may be an input / output device such as a touch screen, a keyboard, a microphone, and a display.
  • the display may be configured by a liquid crystal display, an organic light emitting diode, or the like.
  • a touchpad can also be integrated on the display to collect touch events on or near it, and send the collected touch information to other devices (such as a processor, etc.).
  • all or part can be implemented by software, hardware, firmware, or any combination thereof.
  • when implemented using a software program, the embodiments may be implemented completely or partially in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

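Embodiments of this application provide a method for selecting an audio track from an audio/video file. In the method, a terminal selects one or more decodable audio tracks from the one or more audio tracks of the audio/video file, and then selects a language-matched audio track from the decodable tracks. Among the multiple language-matched tracks, the terminal selects a supported audio format of a first specification. From the audio format of the first specification, the terminal selects an audio track having a first bit rate. The terminal then plays the audio/video file according to the selected audio track. The first specification of the supported audio formats is higher than a second specification of the supported audio formats, and the first bit rate of the track having the first bit rate is higher than the second bit rate of a track having the second bit rate.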

Description

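Method and Apparatus for Selecting an Audio Track from an Audio/Video File
Technical Field
Embodiments of this application relate to the field of audio/video technology for terminal devices, and in particular, to a method and apparatus for selecting an audio track from an audio/video file on a terminal device.
Background
Smartphones have developed rapidly in recent years. Screens have grown larger and audio has improved, so it has become increasingly common for people to watch films, TV shows, and other videos through audio/video playback applications on their phones.
A video resource includes video data and audio data, and the audio data may have a single audio track or multiple audio tracks. An audio track is a track on which audio data is recorded. Each track has one or more attribute parameters, including audio format, bit rate, dubbing language, sound effect, number of channels, volume, and so on. When the audio data has multiple tracks, any two different tracks have at least one different attribute parameter, or differ in the value of at least one attribute parameter.
A few specialized playback applications can display track options in their interface when a user plays a multi-track video on a phone, and the user can then choose a suitable track. Most ordinary playback applications lack this function. As a result, when the user watches a multi-track video, the track actually played may not match the track the user wants. Accurate track selection is especially important when a video has several tracks in different languages.
Summary
Embodiments of this application provide a method and apparatus for selecting an audio track from an audio/video file, so that a terminal playing a multi-track video can automatically select a track that meets the user's needs.
According to one aspect, an embodiment of the present invention provides a method for selecting an audio track from an audio/video file, including the following:
In a possible design, the terminal selects one or more decodable tracks from the one or more tracks of the audio/video file. The terminal then selects one or more language-matched tracks from the one or more decodable tracks. From the one or more language-matched tracks, the terminal selects one or more tracks whose supported audio format is of a first specification, and from those tracks it selects a track having a first bit rate. Finally, the terminal plays the audio/video file according to the selected track. The first specification of the supported audio formats is higher than a second specification of the supported audio formats. The first bit rate of the track having the first bit rate is higher than the second bit rate of a track having the second bit rate. The second specification is Dolby Surround Audio Coding-3 (AC-3), and the second bit rate is 448 kbit/s (kilobits per second).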
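In a possible design, the decoding format of each of the one or more tracks is compared with a preset decoding set. If a track's decoding format is in the preset decoding set, that track is considered decodable. Accordingly, when the terminal selects one or more decodable tracks from the one or more tracks of the audio/video file, it selects the tracks whose decoding format is in the preset decoding set.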
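The decodability check described above amounts to a set-membership filter. The following Python sketch shows one way to write it. It assumes that each track exposes its decoding format as a codec-name string and that the preset decoding set is a plain set of such names. The codec names and the helper `decodable_tracks` are illustrative assumptions; they are not taken from the application.

```python
from dataclasses import dataclass

# Hypothetical preset decoding set; a real terminal would populate it from
# the decoders its system and hardware actually provide.
PRESET_DECODING_SET = {"AAC", "AC-3", "DTS"}


@dataclass(frozen=True)
class Track:
    index: int
    codec: str  # decoding format of the track, e.g. "DTS"


def decodable_tracks(tracks, decoding_set=PRESET_DECODING_SET):
    """Keep only the tracks whose decoding format is in the preset decoding set."""
    return [t for t in tracks if t.codec in decoding_set]


tracks = [Track(1, "DTS"), Track(2, "TrueHD"), Track(3, "AC-3")]
print([t.index for t in decodable_tracks(tracks)])  # [1, 3]
```

A set keeps each membership test O(1) on average, so the filter stays cheap even for files with many tracks.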
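In a possible design, selecting one or more language-matched tracks from the one or more decodable tracks works as follows. The terminal determines a first language evaluation result based on one or more of the system language, the input method setting, voice assistant input, and historical viewing habits. It then uses the first language evaluation result to select one or more language-matched tracks from the decodable tracks.
In a possible design, selecting, from the one or more language-matched tracks, one or more tracks whose supported audio format is of the first specification includes: the terminal selects, from the one or more language-matched tracks, one or more tracks whose supported audio format has a higher sampling rate than the second specification.
In a possible design, selecting a track having the first bit rate from the one or more tracks whose supported audio format is of the first specification includes: the terminal determines a second language evaluation result based on one or more of the audio format and the bit rate.
In a possible design, the terminal determines the language of the track to be played based on the first language evaluation result and the second language evaluation result.
According to another aspect, an embodiment of the present invention provides a terminal device. The terminal device includes a display, an audio playback or output component, one or more processors, a memory, a plurality of applications, and one or more computer programs. The one or more computer programs are stored in the memory and include instructions that, when executed by the terminal device, cause the terminal device to perform the following steps:
selecting one or more decodable tracks from the one or more tracks of an audio/video file; selecting one or more language-matched tracks from the one or more decodable tracks; selecting, from the one or more language-matched tracks, one or more tracks whose supported audio format is of a first specification; selecting, from those tracks, a track having a first bit rate; and playing the audio/video file according to the selected track, with output through the display and the audio playback or output component. The first specification of the supported audio formats is higher than a second specification of the supported audio formats. The first bit rate of the track having the first bit rate is higher than the second bit rate of a track having the second bit rate. The second specification is Dolby Surround Audio Coding-3 (AC-3), and the second bit rate is 448 kbit/s (kilobits per second).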
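In a possible design, the decoding format of each of the one or more tracks is compared with a preset decoding set. If a track's decoding format is in the preset decoding set, that track is considered decodable. Accordingly, when the terminal selects one or more decodable tracks from the one or more tracks of the audio/video file, it selects the tracks whose decoding format is in the preset decoding set.
In a possible design, selecting one or more language-matched tracks from the one or more decodable tracks works as follows. The terminal determines a first language evaluation result based on one or more of the system language, the input method setting, voice assistant input, and historical viewing habits. It then uses the first language evaluation result to select one or more language-matched tracks from the decodable tracks.
In a possible design, selecting, from the one or more language-matched tracks, one or more tracks whose supported audio format is of the first specification includes: the terminal selects, from the one or more language-matched tracks, one or more tracks whose supported audio format has a higher sampling rate than the second specification.
In a possible design, selecting a track having the first bit rate from the one or more tracks whose supported audio format is of the first specification includes: the terminal determines a second language evaluation result based on one or more of the audio format and the bit rate.
In a possible design, the terminal determines the language of the track to be played based on the first language evaluation result and the second language evaluation result. In a possible design, the one or more computer programs include instructions that, when executed by the terminal device, cause the terminal device to perform the following step: determining a first language evaluation result based on one or more of the system language, the input method setting, voice assistant input, and historical viewing habits.
In a possible design, the one or more computer programs include instructions that, when executed by the terminal device, cause the terminal device to perform the following step: determining a second language evaluation result based on one or more of the audio format and the bit rate.
In a possible design, the one or more computer programs include instructions that, when executed by the terminal device, cause the terminal device to perform the following step: determining the language of the track to be played based on the first language evaluation result and the second language evaluation result.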
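According to still another aspect, an embodiment of the present invention provides a computer program product that, when run on a terminal, causes the terminal to perform any one of the foregoing methods.
According to yet another aspect, an embodiment of the present invention provides a computer-readable storage medium that includes instructions. When the instructions are run on a terminal, they cause the terminal to perform any one of the foregoing methods.
Compared with the prior art, the solution provided by the present invention allows a terminal to automatically select a track that meets the user's needs, even when the installed playback application has no track selection function.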
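Brief Description of Drawings
FIG. 1 is a schematic diagram of the language and input method settings in an Android system according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of an Android system according to an embodiment of the present invention;
FIG. 3 is a first schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a processor according to an embodiment of the present invention;
FIG. 5 is a second schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In the following, the terms "first" and "second" are used only for description. They shall not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the embodiments of this application, unless otherwise specified, "a plurality of" means two or more.
The following detailed description describes various features and functions of the disclosed systems and methods with reference to the accompanying drawings. In the drawings, identical symbols identify identical components unless the context indicates otherwise. Certain aspects of the disclosed systems and methods can be arranged and combined in many different configurations, all of which are contemplated herein.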
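The track selection method provided in the embodiments of this application can be applied to a terminal. The terminal may be any terminal with a display function, such as a mobile phone, tablet computer, wearable device, in-vehicle device, augmented reality (AR)/virtual reality (VR) device, notebook computer, ultra-mobile personal computer (UMPC), or personal digital assistant (PDA). The embodiments of this application impose no limitation on this.
A terminal generally has one or more applications for playing audio and video. Such a playback application may be preinstalled at the factory or installed later. The terminal can play audio/video files through the playback application, so the user can watch films, TV shows, or other audio/video programs on the terminal.
An audio track is a track on which audio data is recorded. Each track has one or more attribute parameters, which may include audio format, bit rate, language, sound effect, number of channels, volume, and so on. When the audio data has multiple tracks, any two different tracks have at least one different attribute parameter, or differ in the value of at least one attribute parameter. Here, language may refer to the dubbing language.
The audio/video file to be played may be stored on the terminal or downloaded from the network in real time. Playing a file downloaded in real time is called online playback, and playing a file stored on the terminal is called local playback. For online playback, the network side needs to tell the terminal whether the audio/video has multiple tracks, which it may do through a notification message. After selecting a specific track, the terminal informs the network side of its selection, which it may also do through a notification message. For local playback, no interaction with the network side is needed, and the terminal can select a specific track locally. The embodiments of the present invention do not limit the format of the notification message. For example, the network side may tell the terminal that the audio/video file has multiple tracks through a dedicated notification message, or through the header information of the audio/video file.
Playing audio/video requires a playback application, and such applications are used in different ways. For example, the user opens the application and selects a program to play from an album in the application. When a video saved in the album is played for the first time, the default settings of the playback application may be used.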
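According to the method provided in the embodiments of the present invention, the terminal selects one track from the multiple tracks of an audio/video file for playback. Track selection follows these rules:
selecting the tracks that the terminal can decode;
selecting the tracks whose language matches;
selecting the tracks in the higher-specification audio format among the supported audio formats (for example, the DTS (Digital Theater Systems) audio format is of a higher specification than the AC-3 (Dolby Surround Audio Coding-3, Dolby AC-3) audio format); and
selecting the track with the higher bit rate.
The order in which these selection rules are applied is not limited; any of them may be applied first.
For example, the following order may be used. First, select the tracks that the terminal can decode from all tracks of the audio/video file. Next, select the language-matched tracks from the decodable tracks. Then, among the language-matched tracks, select the tracks in the higher-specification supported audio format. Finally, within that format, select the track with the higher bit rate.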
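The track language is selected according to the following principles: (1) it is the same as the language the user uses on the phone; or (2) it is consistent with the language of videos the user has watched in the past. Viewing history can be further divided by scenario. Suppose the user's history includes videos originally in a first language (such as Chinese) and videos originally in a second language (such as English). In that case, the track language is determined together with the characteristics of the video the user currently wants to watch. For example, if the current video was originally in Chinese and the user has previously watched videos originally in Chinese, Chinese is selected as the track language based on the viewing history.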
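One possible reading of these two principles is sketched below in Python. It assumes that the viewing history is summarized as the set of original languages of previously watched videos. It also assumes that principle (2) takes precedence when the current video's original language appears in that history, with principle (1) as the fallback. The function name and this precedence are hypothetical.

```python
def choose_language(device_language, history_original_languages,
                    current_original_language=None):
    """Pick the track language (hypothetical precedence).

    Principle (2): if the current video's original language appears among the
    original languages of videos the user has watched, use it.
    Principle (1): otherwise, use the language the user uses on the phone.
    """
    if current_original_language in history_original_languages:
        return current_original_language
    return device_language


# The example from the text: a Chinese-original video, with Chinese-original
# videos in the user's history, yields Chinese even on an English-language phone.
print(choose_language("en", {"zh", "en"}, "zh"))  # zh
```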
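The principle for selecting the audio format is generally to choose an audio format that the terminal supports. If the terminal supports several formats, the track with the higher-specification format is chosen. Track specifications include lossless audio formats, lossy formats with small compression loss, and lossy formats with large compression loss. They rank from highest to lowest as: lossless > lossy with small compression loss > lossy with large compression loss. For example, there are DTS-decoded audio formats, Dolby-decoded audio formats, and ordinary audio formats, which rank as: DTS > Dolby AC-3 > ordinary lossy formats with large compression loss.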
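The specification ranking can be encoded as an ordered list, after which the best supported format is the maximum by rank. The following Python sketch follows the example ordering in the text (DTS > Dolby AC-3 > ordinary lossy). "GENERIC_LOSSY" is a placeholder name for an ordinary lossy format, and both helper names are hypothetical.

```python
# Highest specification first, following the example ordering in the text:
# DTS > Dolby AC-3 > ordinary lossy formats with large compression loss.
# "GENERIC_LOSSY" is a placeholder name for such an ordinary format.
SPEC_ORDER = ["DTS", "AC-3", "GENERIC_LOSSY"]


def spec_rank(codec):
    """Return a rank where a larger value means a higher specification; unlisted codecs rank 0."""
    if codec in SPEC_ORDER:
        return len(SPEC_ORDER) - SPEC_ORDER.index(codec)
    return 0


def best_supported_format(track_formats, supported_formats):
    """Among the formats offered by the tracks, return the highest-specification
    one that the terminal supports, or None if none is supported."""
    usable = [f for f in track_formats if f in supported_formats]
    return max(usable, key=spec_rank, default=None)


print(best_supported_format(["GENERIC_LOSSY", "AC-3", "DTS"], {"AC-3", "GENERIC_LOSSY"}))  # AC-3
```

Keeping the order in a single list means that inserting a new tier, such as a lossless format above DTS, requires only one edit.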
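The principle for selecting the bit rate is generally to choose the highest bit rate, provided that the terminal's hardware or software supports it. If it does not, the next-highest bit rate may be chosen. If that is also unsupported, a still lower one is tried, and so on.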
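This step-down rule is a descending scan that stops at the first supported rate. In the Python sketch below, the terminal's hardware/software support is modelled as a hypothetical predicate `is_supported`, because the text does not say how support is determined.

```python
def pick_bitrate(available_kbps, is_supported):
    """Try the offered bit rates from highest to lowest and return the first
    one the terminal accepts, or None if none is accepted."""
    for rate in sorted(available_kbps, reverse=True):
        if is_supported(rate):
            return rate
    return None


# Example: a terminal that (hypothetically) cannot handle more than 1000 kbps.
print(pick_bitrate([448, 1509, 754], lambda kbps: kbps <= 1000))  # 754
```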
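Track determination may also depend on factors other than those above, for example:
Sound effect: chosen according to the user's historical viewing habits or the terminal's built-in sound effect settings. When choosing by viewing habits, the terminal can record the sound effect used most often in the past and choose it if a track offers it. Historical viewing habits may refer to the user's habits in the same viewing application or across all viewing applications. "User" here refers to users distinguished by user account, or to different users corresponding to different terminals. When choosing by built-in settings, note that some terminals have built-in sound effects, such as Dolby, with settings such as Music and Cinema; if a track matches one of these, it may be chosen accordingly.
Number of channels: chosen according to the number of channels the terminal supports, or the largest number of channels may be chosen.
Volume: the phone's current media volume setting may be used.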
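The sound-effect and channel-count factors can be expressed as two small tie-breakers. The following Python sketch has hypothetical function names and makes two assumptions. First, past viewing is recorded as a list of effect names. Second, channel support is a single maximum count, with `None` meaning that no limit is known.

```python
from collections import Counter


def choose_sound_effect(history_effects, track_effects, builtin_effect=None):
    """Sound effect: prefer the effect used most often in past viewing if a
    track offers it; otherwise use the terminal's built-in effect setting
    (for example "music" or "cinema") if a track matches it; otherwise None."""
    available = set(track_effects)
    if history_effects:
        most_used, _count = Counter(history_effects).most_common(1)[0]
        if most_used in available:
            return most_used
    if builtin_effect in available:
        return builtin_effect
    return None


def choose_channels(track_channels, device_max_channels=None):
    """Channel count: the largest count the terminal supports, or simply the
    largest count offered when no device limit is known."""
    usable = [c for c in track_channels
              if device_max_channels is None or c <= device_max_channels]
    return max(usable, default=None)


print(choose_sound_effect(["cinema", "music", "cinema"], ["music", "cinema"]))  # cinema
print(choose_channels([2, 6, 8], 6))  # 6
```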
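When an audio/video file needs to be played, the terminal selects a suitable track from the file to accompany the video in the file. The algorithm in the embodiments of the present invention is used to choose that track.
The terminal can select a track in the matching language based on several language parameters. Specifically, it uses the language decision result obtained by combining the user's system language setting, input method setting, voice assistant input, and historical viewing habits.
Meanwhile, the terminal can determine the played language based on the audio format and bit rate. The audio format and bit rate, in turn, are determined by the underlying system and the hardware.
Finally, the terminal combines the two preceding paragraphs into the final rule for choosing the track's playback language. It first excludes tracks that cannot be decoded. It then selects the language-matched tracks and, among them, the higher-specification supported audio format (in this example, DTS is of a higher specification than AC-3). Within that format, it selects the track with the higher bit rate.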
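The algorithm can be applied to a variety of operating systems, including the Android operating system, Apple's iOS operating system, and various other operating systems. The following uses the Android operating system as an example to illustrate how the algorithm works.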
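The algorithm can place track selection in the application framework layer of the Android operating system (hereinafter "Android"). Specifically, the application framework layer receives and stores the multiple tracks, selects the specific track to be used, and sends the selected track to the application layer for playback.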
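A terminal software system developed according to this algorithm may specifically include two recording modules. For example, as shown in FIG. 2, the Android framework layer of the terminal may include a language determination recording module 105 and a decoding capability recording module 106. The language determination recording module makes its determination based on at least three parameters from the application layer: the language of the system language setting 101, the language of the user's input method 102, and the language the user inputs through the voice assistant 103. The system language setting 101 is the language option in the terminal's settings, and the input method setting 102 is the setting of the input method application. For example, in the Android language and input method settings shown in FIG. 1, the top box, "Language", is the system language setting, and "Baidu Input Method Huawei Edition" in the lower box is the input method setting. Voice assistant input refers to input made through voice interaction with the terminal. Historical viewing habits are the user's viewing habits before the current viewing session. They may cover the same playback application or all playback applications. "User" here refers to users of the playback application distinguished by user account, or to different users corresponding to different terminals. The user language decision result is the language decision obtained by combining the user's system language setting, input method setting, voice assistant input, and historical viewing habits; it may also be the language that the selected track should use.
Specifically, the language determination recording module makes its determination based on at least three parameters from the application layer: the language of the user setting 101, the language of the user's input method 102, and the language the user inputs through the voice assistant 103. For the specific determination process, refer to Table 1 below: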
Figure PCTCN2018088857-appb-000005
Table 1
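Table 1 also includes the user's historical viewing habits. The language determination recording module of the framework layer derives the language decision result from the combination of the parameters in the four middle columns of the table. For user 1, the system language, the input method, the voice assistant input, and the historical viewing habit are all Chinese, so the language decision result is Chinese. For user 4, the system language is Chinese, the input method is English, the voice assistant input is Chinese, and the historical viewing habit is English. When no language clearly dominates among the four middle columns, the decision follows the user's historical viewing habit, so the result for user 4 is English.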
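The two worked examples for Table 1 are consistent with a simple vote that falls back to the viewing history on a tie. The following Python sketch shows that reading. Treating a strict majority (at least 3 of the 4 signals) as a "clear advantage" is an assumption, because Table 1 itself is not reproduced here. The function name is also hypothetical.

```python
from collections import Counter


def decide_language(system, input_method, voice_assistant, history):
    """Combine the four Table 1 signals into one language decision.

    A language backed by a strict majority of the signals wins; otherwise no
    language has a clear advantage and the historical viewing habit decides.
    """
    signals = [system, input_method, voice_assistant, history]
    language, count = Counter(signals).most_common(1)[0]
    if count * 2 > len(signals):  # strict majority: at least 3 of 4
        return language
    return history


print(decide_language("zh", "zh", "zh", "zh"))  # user 1 -> zh
print(decide_language("zh", "en", "zh", "en"))  # user 4 -> en (2-2 tie, history decides)
```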
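In addition, the application framework layer of Android includes the decoding capability recording module 106, whose records are determined by the underlying system and hardware 108. For details, refer to the following table:
Track No.   Audio format   Bit rate    Language
1           DTS            1509 kbps   English
2           DTS            1509 kbps   Chinese
3           DTS            754 kbps    Chinese
4           AC-3           448 kbps    English
5           AC-3           448 kbps    Chinese
Table 2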
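In the table above, the audio format represents the decoding capability, and the bit rate represents the audio quality of the track. The framework layer obtains all track information from the player in the application layer. Track 1 uses the DTS (Digital Theater Systems) audio format at a bit rate of 1509 kbps (kilobits per second), and its language is English. Track 2 differs from track 1 only in that its language is Chinese. Track 4 uses the AC-3 (Dolby Surround Audio Coding-3, Dolby AC-3) audio format at 448 kbps, and its language is English.
The audio formats and bit rates in the table above determine the language that is played.
Next, the language determination records and decoding capability records from Tables 1 and 2 are combined to obtain the track score ranking for the video, as shown in the following table:
Default track No.          User 1   User 2   User 3   User 4
DTS decoding supported     2        1        1        1
DTS decoding unsupported   5        4        4        4
Table 3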
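For example, for user 1, the second entry in Table 2 is selected if DTS decoding is supported, and the fifth entry if it is not. The track score ranking of the video follows these rules. First, tracks that cannot be decoded are excluded, and the language-matched tracks are selected from the rest. Among the language-matched tracks, the higher-specification supported audio format is chosen (in this example, DTS is of a higher specification than AC-3). Finally, within that format, the track with the higher bit rate is selected.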
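The score ranking can also be computed in a single pass. The following Python sketch scores each decodable track with the tuple (language match, format rank, bit rate) and compares the tuples lexicographically, which reproduces the rule order above. It is a sketch under stated assumptions, not the module 107 implementation. The language decisions are those implied by Table 3: Chinese for user 1 and English for users 2 to 4. The numeric format ranks are illustrative.

```python
from dataclasses import dataclass

FORMAT_RANK = {"DTS": 2, "AC-3": 1}  # DTS ranked above AC-3, as in the text


@dataclass(frozen=True)
class Track:
    index: int
    codec: str
    bitrate_kbps: int
    language: str


# Tracks from Table 2.
TABLE_2 = [
    Track(1, "DTS", 1509, "en"),
    Track(2, "DTS", 1509, "zh"),
    Track(3, "DTS", 754, "zh"),
    Track(4, "AC-3", 448, "en"),
    Track(5, "AC-3", 448, "zh"),
]


def default_track(tracks, decodable, language):
    """Return the index of the best decodable track. The key is compared
    lexicographically: language match first, then format rank, then bit rate."""
    candidates = [t for t in tracks if t.codec in decodable]
    if not candidates:
        return None
    best = max(candidates, key=lambda t: (t.language == language,
                                          FORMAT_RANK.get(t.codec, 0),
                                          t.bitrate_kbps))
    return best.index


# Assumed language decisions implied by Table 3.
USERS = {"User 1": "zh", "User 2": "en", "User 3": "en", "User 4": "en"}
for name, lang in USERS.items():
    print(name, default_track(TABLE_2, {"DTS", "AC-3"}, lang),
          default_track(TABLE_2, {"AC-3"}, lang))
```

The loop prints one line per user. The two numbers on each line are the default track with and without DTS support, and they match the two rows of Table 3. Because the language-match flag dominates the key, a file with no matching language still yields a track rather than None.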
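Different users here may correspond to different terminals or to different accounts on the same terminal. When different users correspond to different accounts on the same terminal, each account acts as an independent user. Each account separately maintains its own language determination records, decoding capability records, and video track score ranking module.
It should be noted that the video track score ranking module 107 of the framework layer obtains all track information in the audio/video file from the player 104 of the application layer. After module 107 ranks the tracks, the optimal default track is obtained and returned to the player 104 of the application layer for playback. The process of selecting the optimal default track is the same as the function implemented by the video track score ranking module described above and is not repeated here.
In addition to the foregoing content, the embodiments of the present invention further include the following solutions:
After the selection is complete, the terminal may prompt the user on its display: "This video contains multiple audio tracks; the terminal has automatically selected track XX", where XX identifies the selected track.
Alternatively, the algorithm described in the embodiments of the present invention may also be applied in scenarios where the user can choose. The terminal first selects a track automatically and then prompts the user: "This video contains multiple audio tracks; the terminal has automatically selected track XX". If the user is not satisfied with the selected track, the user can select one manually.
In this way, the player in the application layer needs no changes. The tracks in the audio/video file only need to be sent to the framework layer for analysis. The framework layer obtains the track best suited to the user and sends it to the application layer player for playback. This achieves automatic track selection independent of the player and improves the user experience.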
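FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention. The terminal 300 includes components such as a processor 301, a memory 302, a camera 303, an RF circuit 304, an audio circuit 305, a speaker 306, a microphone 307, an input device 308, other input devices 309, a display screen 310, a touch panel 311, a display panel 312, an output device 313, and a power supply 314. The display screen 310 consists of at least the touch panel 311 as an input device and the display panel 312 as an output device. The terminal structure shown in FIG. 3 does not limit the terminal, which may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement; this is not limited here.
The components of the terminal 300 are described in detail below with reference to FIG. 3: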
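The radio frequency (RF) circuit 304 may be used to receive and send signals while sending and receiving information or during a call. For example, if the terminal 300 is an in-vehicle device, the terminal 300 can use the RF circuit 304 to receive downlink information sent by a base station and pass it to the processor 301 for processing; it also sends uplink data to the base station. Generally, the RF circuit includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), and a duplexer. In addition, the RF circuit 304 can communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Messaging Service (SMS).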
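The memory 302 may be used to store software programs and modules. The processor 301 runs the software programs and modules stored in the memory 302 to perform the various functional applications and data processing of the terminal 300. The memory 302 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the applications required by at least one function (for example, a sound playback function and an image playback function). The data storage area may store data created through use of the terminal 300 (for example, audio data and video data). In addition, the memory 302 may include a high-speed random access memory and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.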
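The other input devices 309 may be used to receive input digits or character information and to generate key signal input related to user settings and function control of the terminal 300. Specifically, the other input devices 309 may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys and power keys), a trackball, a mouse, a joystick, and an optical mouse. An optical mouse is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by a touchscreen. The other input devices 309 may further include sensors built into the terminal 300, such as a gravity sensor and an acceleration sensor, and the terminal 300 may use the parameters detected by these sensors as input data.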
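The display screen 310 may be used to display information input by the user, information provided to the user, and various menus of the terminal 300, and may also accept user input. The display panel 312 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. The touch panel 311, also called a touchscreen or touch-sensitive screen, can collect contact or non-contact operations by the user on or near it and drive the corresponding connected apparatus according to a preset program. Examples include operations performed on or near the touch panel 311 with a finger, a stylus, or any other suitable object or accessory, as well as motion-sensing operations; operation types include single-point and multi-point control operations. The touch panel 311 may further include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position and gesture of the user's touch and the signal generated by the touch operation, and transmits the signal to the touch controller. The touch controller receives touch information from the touch detection apparatus, converts it into information that the processor 301 can process, and sends it to the processor 301; it can also receive and execute commands sent by the processor 301. The touch panel 311 may be implemented as a resistive, capacitive, infrared, surface acoustic wave, or other type of panel, or with any technology developed in the future. Generally, the touch panel 311 may cover the display panel 312. Based on the content displayed on the display panel 312 (including but not limited to a soft keyboard, a virtual mouse, virtual keys, and icons), the user can operate on or near the touch panel 311 that covers the display panel 312. After detecting an operation on or near it, the touch panel 311 transmits the operation to the processor 301 to determine the user input, and the processor 301 then provides corresponding visual output on the display panel 312 based on that input. Although in FIG. 3 the touch panel 311 and the display panel 312 are two independent components that implement the input and output functions of the terminal 300, in some embodiments they may be integrated to implement those functions.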
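The audio circuit 305, the speaker 306, and the microphone 307 can provide an audio interface between the user and the terminal 300. The audio circuit 305 can convert received audio data into an electrical signal and transmit it to the speaker 306, which converts it into a sound signal for output. Conversely, the microphone 307 converts collected sound into an electrical signal, which the audio circuit 305 receives and converts into audio data. The audio data is then output either to the RF circuit 304 for transmission to a device such as another terminal, or to the memory 302 so that the processor 301 can process it further in combination with the content stored in the memory 302. In addition, the camera 303 can capture image frames in real time and send them to the processor 301 for processing. The processed results are stored in the memory 302 and/or presented to the user through the display panel 312.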
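The processor 301 is the control center of the terminal 300 and connects all parts of the terminal 300 through various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 302 and calling the data stored in the memory 302, the processor 301 performs the various functions of the terminal 300 and processes data, thereby monitoring the terminal 300 as a whole. The processor 301 may include one or more processing units. It may also integrate an application processor and a modem processor: the application processor mainly handles the operating system, the user interface (UI), applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 301.
The terminal 300 may further include a power supply 314 (for example, a battery) that powers the components. In this embodiment of the present invention, the power supply 314 may be logically connected to the processor 301 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
In addition, the terminal 300 may include components not shown in FIG. 3, such as a Bluetooth module and sensors, which are not described here.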
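The processor 301 in the terminal 300 is configured to select the decodable tracks; select the language-matched tracks; select the tracks in the higher-specification supported audio format (for example, the DTS (Digital Theater Systems) audio format is of a higher specification than the AC-3 (Dolby Surround Audio Coding-3, Dolby AC-3) audio format); and select the track with the higher bit rate.
A decodable track is a track that can be decoded.
The criteria in the track selection rules need not be applied in any particular order; any of them may be checked first. For example, the decodable tracks are first selected from all tracks, and the language-matched tracks are then selected from the decodable tracks. Among the language-matched tracks, those in the higher-specification supported audio format are selected (for example, the DTS (Digital Theater Systems) audio format is of a higher specification than the AC-3 (Dolby Surround Audio Coding-3, Dolby AC-3) audio format). Finally, within that format, the track with the higher bit rate is selected.
For the principles of selecting the language, the audio format, and the bit rate, and for the other factors, refer to the description in the method embodiments of this application; details are not repeated here.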
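FIG. 4 is a block diagram of the internal implementation of a processor. As the figure shows, the processor includes four high-speed processing cores and four low-speed processing cores. The four high-speed cores, together with a corresponding level-2 cache, form a high-speed core processing area. The four low-speed cores, together with a corresponding level-2 cache, form a low-speed core processing area. Here, a high-speed core may be a core with a processing frequency of 2.1 GHz, and a low-speed core may be a core with a processing frequency of 1.7 GHz.
All steps performed by the processor 301 are completed by the high-speed or low-speed processing cores.
Besides the high-speed cores, the low-speed cores, and their level-2 caches, the processor includes other components:
- the modem baseband part;
- a baseband part connected to the RF transceiver for processing RF signals;
- a display subsystem connected to the display;
- an image signal processing subsystem connected externally to the CPU;
- a single-channel DDR controller connected to the DDR memory;
- an embedded multimedia card interface connected to the embedded multimedia card;
- a USB interface connected to a personal computer;
- an SDIO input/output interface connected to the short-range communication module;
- a UART interface connected to Bluetooth and GPS;
- an I2C interface connected to sensors;
- a smart card interface connected to the SIM card.
The CPU also internally includes a video processing subsystem, a Sensor Hub subsystem, a low-power microcontroller, a high-resolution video codec, dual security engines, and an image processing unit formed by the image processor and the level-2 cache. A coherent bus inside the CPU connects all interfaces and processing units in the CPU.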
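It can be understood that, to implement the foregoing functions, the terminal and related devices include hardware structures and/or software modules corresponding to each function. A person skilled in the art should readily recognize that the example units and algorithm steps described with the embodiments disclosed herein can be implemented by hardware, or by a combination of hardware and computer software. Whether a function is performed by hardware or by hardware driven by computer software depends on the particular application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the embodiments of this application.
In the embodiments of this application, the terminal and related devices may be divided into functional modules based on the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. An integrated module may be implemented in the form of hardware or in the form of a software functional module. The module division in the embodiments of this application is an example and merely a logical function division; other divisions may be used in an actual implementation.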
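FIG. 5 shows a possible schematic structural diagram of the terminal involved in the foregoing embodiments. The terminal includes a processing module 1001, a communication module 1002, an input/output module 1003, and a storage module 1004.
The processing module 1001 is configured to control and manage the actions of the terminal. The communication module 1002 is configured to support communication between the terminal and other network entities. The input/output module 1003 is configured to receive information input by the user, or to output information provided to the user and the various menus of the terminal. The storage module 1004 is configured to store the program code and data of the terminal.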
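For example, the processing module 1001 may be a processor or a controller, such as a central processing unit (CPU), a GPU, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various example logical blocks, modules, and circuits described in connection with the disclosure of this application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication module 1002 may be a transceiver, a transceiver circuit, an input/output device, a communication interface, or the like. For example, the communication module 1002 may specifically be a Bluetooth apparatus, a Wi-Fi apparatus, a peripheral interface, or the like.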
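The storage module 1004 may be a memory. The memory may include high-speed random access memory (RAM) and DDR (double data rate) memory, and may further include non-volatile memory such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input/output module 1003 may be an input/output device such as a touchscreen, a keyboard, a microphone, or a display. The display may be a liquid crystal display, an organic light-emitting diode display, or the like. A touchpad may also be integrated on the display to collect touch events on or near it and send the collected touch information to other components (such as the processor).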
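All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented by a software program, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced completely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The foregoing descriptions are merely specific implementations of this application and are not intended to limit its protection scope. Any variation or replacement readily conceived within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.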

Claims (10)

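  1. A method for selecting an audio track from an audio/video file, comprising:
    selecting, by a terminal, one or more decodable audio tracks from one or more audio tracks of the audio/video file;
    selecting, by the terminal, a language-matched audio track from the one or more decodable audio tracks;
    selecting, by the terminal from a plurality of language-matched audio tracks, a supported audio format of a first specification;
    selecting, by the terminal from the audio format of the first specification, an audio track having a first bit rate;
    playing, by the terminal, the audio/video file according to the selected audio track;
    wherein the first specification of the supported audio formats is higher than a second specification of the supported audio formats;
    the first bit rate of the audio track having the first bit rate is higher than a second bit rate of an audio track having the second bit rate;
    the second specification is Dolby Surround Audio Coding-3 (AC-3); and
    the second bit rate is 448 kbit/s (kilobits per second).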
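  2. The method according to claim 1, wherein the selecting, by the terminal, a language-matched audio track from the plurality of decodable audio tracks, and the selecting, by the terminal from the plurality of language-matched audio tracks, a supported audio format of the first specification comprise: determining, by the terminal, a first language evaluation result based on one or more of a system language, an input method setting, voice assistant input, and historical viewing habits.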
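  3. The method according to claim 1, wherein the selecting, by the terminal from the audio format of the first specification, an audio track having the first bit rate comprises: determining, by the terminal, a second language evaluation result based on one or more of the audio format and the bit rate.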
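  4. The method according to claims 2 and 3, further comprising:
    determining, by the terminal, the language of the audio track to be played based on the first language evaluation result and the second language evaluation result.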
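  5. A terminal device, comprising:
    a touchscreen, wherein the touchscreen comprises a touch-sensitive surface and a display;
    one or more processors;
    a memory;
    a plurality of applications;
    and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions that, when executed by the terminal device, cause the terminal device to perform the following steps:
    selecting one or more decodable audio tracks from one or more audio tracks of an audio/video file;
    selecting a language-matched audio track from the one or more decodable audio tracks;
    selecting, from a plurality of language-matched audio tracks, a supported audio format of a first specification;
    selecting an audio track having a first bit rate from the audio format of the first specification;
    playing the audio/video file according to the selected audio track; wherein the first specification of the supported audio formats is higher than a second specification of the supported audio formats;
    the first bit rate of the audio track having the first bit rate is higher than a second bit rate of an audio track having the second bit rate;
    the second specification is Dolby Surround Audio Coding-3 (AC-3); and
    the second bit rate is 448 kbit/s (kilobits per second).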
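  6. The terminal according to claim 5, wherein the one or more computer programs comprise instructions that, when executed by the terminal device, cause the terminal device to perform the following step: determining a first language evaluation result based on one or more of a system language, an input method setting, voice assistant input, and historical viewing habits.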
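  7. The terminal according to claim 5, wherein the one or more computer programs comprise instructions that, when executed by the terminal device, cause the terminal device to perform the following step:
    determining a second language evaluation result based on one or more of the audio format and the bit rate.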
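  8. The terminal device according to claims 6 and 7, wherein the one or more computer programs comprise instructions that, when executed by the terminal device, cause the terminal device to perform the following step: determining a language of the audio track to be played according to the first language evaluation result and the second language evaluation result.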
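  9. A computer program product comprising instructions, wherein when the computer program product runs on a terminal, the terminal is caused to perform the method according to any one of claims 1 to 4.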
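  10. A computer-readable storage medium comprising instructions, wherein when the instructions run on a terminal, the terminal is caused to perform the method according to any one of claims 1 to 4.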
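The selection procedure in claims 1 to 4 can be read as a chain of filters. First, drop undecodable tracks. Next, keep the matched language. Then prefer a format ranked above AC-3, and finally prefer a bit rate above 448 kbit/s. The Python sketch below shows one way that chain could be coded. It is illustrative only and rests on several assumptions that the claims do not state. All of the following are hypothetical:

- the `AudioTrack` type and its codec strings;
- the `FORMAT_RANK` ordering of formats above AC-3;
- the `SIGNAL_WEIGHTS` given to the claim 2 signals;
- the fallback that keeps every candidate when a filter would leave none;
- the claim 4 combination rule, in which the signal score decides first and audio quality only breaks ties.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical quality ranking. The claims only say the "first specification"
# ranks above AC-3; which formats sit above it is an assumption.
FORMAT_RANK = {"aac": 0, "ac3": 1, "eac3": 2, "truehd": 3}
AC3_RANK = FORMAT_RANK["ac3"]    # the "second specification"
AC3_BITRATE_KBPS = 448           # the "second bit rate", read as kbit/s

# Hypothetical weights for the language signals listed in claim 2.
SIGNAL_WEIGHTS = {"system": 3, "input_method": 2, "voice_assistant": 2, "history": 1}


@dataclass
class AudioTrack:
    codec: str
    language: str
    bitrate_kbps: int


def rank(track: AudioTrack) -> int:
    """Format rank; unknown codecs rank below everything else."""
    return FORMAT_RANK.get(track.codec, -1)


def prefer(tracks, predicate):
    """Keep tracks satisfying predicate; if none do, keep all (assumed fallback)."""
    kept = [t for t in tracks if predicate(t)]
    return kept or tracks


def first_language_evaluation(signals: dict) -> dict:
    """Weighted vote, e.g. {"system": "zh", "history": "en"} -> {"zh": 3, "en": 1}."""
    scores: dict = {}
    for source, lang in signals.items():
        scores[lang] = scores.get(lang, 0) + SIGNAL_WEIGHTS.get(source, 0)
    return scores


def second_evaluation(tracks) -> dict:
    """Best (format rank, bit rate) available for each language."""
    best: dict = {}
    for t in tracks:
        key = (rank(t), t.bitrate_kbps)
        if t.language not in best or key > best[t.language]:
            best[t.language] = key
    return best


def decide_language(tracks, signals: dict) -> str:
    """Assumed rule: signal score first, audio quality only breaks ties."""
    first = first_language_evaluation(signals)
    second = second_evaluation(tracks)
    return max(second, key=lambda lang: (first.get(lang, 0), second[lang]))


def select_track(tracks, supported_codecs, signals) -> Optional[AudioTrack]:
    decodable = [t for t in tracks if t.codec in supported_codecs]         # step 1
    if not decodable:
        return None
    language = decide_language(decodable, signals)
    candidates = [t for t in decodable if t.language == language]           # step 2
    candidates = prefer(candidates, lambda t: rank(t) > AC3_RANK)            # step 3
    candidates = prefer(candidates, lambda t: t.bitrate_kbps > AC3_BITRATE_KBPS)  # step 4
    return max(candidates, key=lambda t: (rank(t), t.bitrate_kbps))


tracks = [
    AudioTrack("ac3", "zh", 448),
    AudioTrack("eac3", "zh", 640),
    AudioTrack("eac3", "zh", 384),
    AudioTrack("truehd", "en", 3000),
    AudioTrack("dts", "zh", 1536),  # not decodable on this hypothetical terminal
]
chosen = select_track(tracks, {"aac", "ac3", "eac3", "truehd"},
                      {"system": "zh", "input_method": "zh", "history": "en"})
print(chosen)  # AudioTrack(codec='eac3', language='zh', bitrate_kbps=640)
```

The sketch filters with a fallback rather than a hard cut, so playback is still possible when a file offers only AC-3 at 448 kbit/s or below. The claims do not say whether the claimed method behaves this way.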
PCT/CN2018/088857 2018-05-29 2018-05-29 Method and apparatus for selecting audio track from audio and video file WO2019227308A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US17/058,995 US20210219028A1 (en) 2018-05-29 2018-05-29 Method and apparatus for selecting audio track from audio and video file
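PCT/CN2018/088857 WO2019227308A1 (zh) 2018-05-29 2018-05-29 Method and apparatus for selecting audio track from audio and video file
CN201880093609.4A CN112189344A (zh) 2018-05-29 2018-05-29 Method and apparatus for selecting audio track from audio and video file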
EP18921207.9A EP3783906A4 (en) 2018-05-29 2018-05-29 METHOD AND DEVICE FOR SELECTING THE AUDIO TRACK FROM AUDIO AND VIDEO FILES

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/088857 WO2019227308A1 (zh) 2018-05-29 2018-05-29 Method and apparatus for selecting audio track from audio and video file

Publications (1)

Publication Number Publication Date
WO2019227308A1 true WO2019227308A1 (zh) 2019-12-05

Family

ID=68697740

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/088857 WO2019227308A1 (zh) 2018-05-29 2018-05-29 Method and apparatus for selecting audio track from audio and video file

Country Status (4)

Country Link
US (1) US20210219028A1 (zh)
EP (1) EP3783906A4 (zh)
CN (1) CN112189344A (zh)
WO (1) WO2019227308A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
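CN112735445A (zh) * 2020-12-25 2021-04-30 Guangzhou Langguo Electronic Technology Co., Ltd. Method, apparatus and storage medium for adaptively selecting an audio track
CN115460438A (zh) * 2022-09-22 2022-12-09 Xi'an NovaStar Tech Co., Ltd. Video push method and apparatus, non-volatile storage medium, and electronic device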

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
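CN112860361A (zh) * 2021-01-12 2021-05-28 Guangzhou Langguo Electronic Technology Co., Ltd. Method, device and storage medium for automatically selecting audio tracks and subtitles
TWI777771B (zh) * 2021-09-15 2022-09-11 Inventec Corporation Mobile audio/video device and audio/video playback control method
CN117597936A (zh) * 2022-06-17 2024-02-23 Beijing Xiaomi Mobile Software Co., Ltd. Method and apparatus for determining an audio signal format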

Citations (4)

Publication number Priority date Publication date Assignee Title
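CN1937609A (zh) * 2006-08-29 2007-03-28 Huawei Technologies Co., Ltd. Method and system for supporting multi-audio-track content on a streaming media platform, and streaming media server
CN103093776A (zh) * 2011-11-04 2013-05-08 Tencent Technology (Shenzhen) Co., Ltd. Method and system for playing multi-audio-track content in network audio-visual services
CN105872727A (zh) * 2016-03-31 2016-08-17 Le Holdings (Beijing) Co., Ltd. Video stream transcoding method and apparatus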
WO2017158538A1 (en) * 2016-03-16 2017-09-21 Patil Mrityunjay A method and a system for enabling an user to consume a video or audio content understandable with respect to a preferred language

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
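CN102347042B (zh) * 2010-07-28 2014-05-07 TCL Corporation Audio track switching method and system, and audio/video file playback device
WO2014179003A1 (en) * 2013-04-30 2014-11-06 Dolby Laboratories Licensing Corporation System and method of outputting multi-lingual audio and associated audio from a single container
US10506295B2 (en) * 2014-10-09 2019-12-10 Disney Enterprises, Inc. Systems and methods for delivering secondary content to viewers
CN105025319B (zh) * 2015-07-09 2019-03-12 Wuxi Tianmai Juyuan Media Technology Co., Ltd. Video push method and apparatus
CN105744347A (zh) * 2016-02-17 2016-07-06 Sichuan Changhong Electric Co., Ltd. Method for improving the user's audio-visual experience on a network media terminal
CN105898617B (zh) * 2016-04-28 2018-09-14 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and apparatus for matching an audio track for video playback
CN106210927A (zh) * 2016-09-05 2016-12-07 Qingdao Hisense Electric Co., Ltd. Audio track switching method and apparatus
CN107872703B (zh) * 2017-11-22 2020-03-06 Qingdao Hisense Electric Co., Ltd. Method for optimizing the preferred audio track language setting, and digital system thereof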
US10923135B2 (en) * 2018-10-14 2021-02-16 Tyson York Winarski Matched filter to selectively choose the optimal audio compression for a metadata file

Non-Patent Citations (1)

Title
See also references of EP3783906A4 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
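CN112735445A (zh) * 2020-12-25 2021-04-30 Guangzhou Langguo Electronic Technology Co., Ltd. Method, apparatus and storage medium for adaptively selecting an audio track
CN115460438A (zh) * 2022-09-22 2022-12-09 Xi'an NovaStar Tech Co., Ltd. Video push method and apparatus, non-volatile storage medium, and electronic device
CN115460438B (zh) * 2022-09-22 2024-05-10 Xi'an NovaStar Tech Co., Ltd. Video push method and apparatus, non-volatile storage medium, and electronic device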

Also Published As

Publication number Publication date
US20210219028A1 (en) 2021-07-15
EP3783906A1 (en) 2021-02-24
CN112189344A (zh) 2021-01-05
EP3783906A4 (en) 2021-02-24

Similar Documents

Publication Publication Date Title
WO2019227308A1 (zh) Method and apparatus for selecting audio track from audio and video file
KR102207208B1 (ko) Method and apparatus for visualizing music information
US9784797B2 (en) Method for controlling and an electronic device thereof
US10264053B2 (en) Method, apparatus, and system for data transmission between multiple devices
US9843667B2 (en) Electronic device and call service providing method thereof
US20150213127A1 (en) Method for providing search result and electronic device using the same
US20150067521A1 (en) Method and apparatus for presenting content using electronic devices
US20150156300A1 (en) Method for filtering spam in electronic device and the electronic device
US10283168B2 (en) Audio file re-recording method, device and storage medium
US9977646B2 (en) Broadcast control and accrued history of media
US12015733B2 (en) Do-not-disturb method and terminal
US9728226B2 (en) Method for creating a content and electronic device thereof
CN109003194A (zh) Comment sharing method, terminal and storage medium
KR102128088B1 (ko) Method for sharing broadcast channel information of an electronic device, and electronic device thereof
TW201901523A (zh) Fingerprint collection method and related products
US20230035128A1 (en) Concurrent streaming of content to multiple devices
WO2016029351A1 (zh) Method and terminal for processing a media file
CN110086941B (zh) Voice playback method and apparatus, and terminal device
US20160240223A1 (en) Electronic device and method for playing back image data
CN104794139B (zh) Information retrieval method, apparatus and system
US20180131736A1 (en) Streaming service method and device
CN111491292A (zh) Internet access mode adjustment method and apparatus, storage medium, and mobile terminal
US10375370B2 (en) Audio capture on mobile client devices
CN108464008B (zh) Electronic device and content reproduction method controlled by the electronic device
US11468887B2 (en) Electronic device and control method thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18921207

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018921207

Country of ref document: EP

Effective date: 20201116

NENP Non-entry into the national phase

Ref country code: DE