WO2018040102A1 - 音频处理方法及设备 - Google Patents

音频处理方法及设备 Download PDF

Info

Publication number
WO2018040102A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
data source
module
audio data
interception service
Prior art date
Application number
PCT/CN2016/098112
Other languages
English (en)
French (fr)
Inventor
蒋钟寅
陈坤芳
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to US16/322,809 (US11042587B2)
Priority to CN201680056198.2A (CN108140013B)
Priority to PCT/CN2016/098112 (WO2018040102A1)
Priority to EP16914667.7A (EP3480707A4)
Publication of WO2018040102A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/165Management of the audio stream, e.g. setting of volume, audio stream path
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams

Definitions

  • the present invention relates to the field of audio processing, and in particular, to an audio processing method and apparatus.
  • The present invention provides an audio processing method. With this method, audio track data can be collected in a targeted manner, and user behavior samples can be obtained by analyzing the audio track data, which reduces the difficulty of collecting user behavior samples.
  • The method includes: determining an audio data source (for example, through an APP running on the terminal device, a media player provided by the operating system, or a network phone), and performing audio interception service registration on the audio data source to obtain audio interception service registration information, where the registration information includes identification information of the audio interception service for the audio data source (for example, one or more of a code stream type, a process number, and a play mode).
  • When it is determined according to the identification information that the audio data source needs to be intercepted, the audio data source is soft-decoded to obtain audio track data (a PCM code stream), and behavior analysis is performed according to the audio track data.
  • Because the audio track data is collected in a targeted manner, the collected audio track data can be converted into text by speech recognition for semantic analysis, or the behavior represented by the recorded audio can be recognized. The text converted from the audio track data, or the recognized internally recorded audio, can then be used to analyze the user's behavior and collect user behavior samples, which reduces the difficulty of user behavior analysis.
  • The behavior analysis can be performed based on the audio track data together with one or more of the following: an operation command, or the name of the application package corresponding to the audio data source (for example, the APP package name).
  • The operation command may be a start, pause, end, fast-forward, or fast-rewind command of the audio playback; these commands can be collected from the audio data source together with the times at which they occur.
  • Determining, according to the identification information, that the audio data source needs to be intercepted may include: determining a set of identification information of audio data sources that need to be intercepted; determining whether the identification information in the audio interception service registration information is in that set; and, when it is, determining that the audio data source needs to be intercepted.
  • The set of identification information that needs to be intercepted can be determined in advance. When an audio data source is registered, it is determined whether the identification information in the current registration information is in the set; when it is, the audio data source corresponding to the current registration information needs to be intercepted. In this way, audio data sources can be treated differently: for audio data sources that do not need to be intercepted, after the audio interception service is registered, the normal playback process is no longer interfered with. This yields more accurate user behavior samples and saves resources.
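As an illustrative sketch only (the field names and set contents below are hypothetical, not taken from the patent), the membership check described above can be written as:

```python
# Hypothetical sketch of the interception decision: the registration's
# identification information is looked up in a pre-determined set of
# identifiers marking the audio data sources that should be intercepted.

def needs_interception(registration, intercept_set):
    """Return True if this registration's identifier is in the set."""
    ident = (registration["stream_type"], registration["module_id"])
    return ident in intercept_set

intercept_set = {("MUSIC", "MediaPlayer"), ("MUSIC", "SoundPool")}

reg_music = {"pid": 4321, "stream_type": "MUSIC", "module_id": "MediaPlayer"}
reg_alarm = {"pid": 4322, "stream_type": "ALARM", "module_id": "ToneGenerator"}

print(needs_interception(reg_music, intercept_set))  # True: intercept
print(needs_interception(reg_alarm, intercept_set))  # False: play normally
```

Sources that miss the set are simply left to the normal playback path, which is what lets the scheme avoid interfering with uninteresting audio.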
  • an embodiment of the present invention provides a terminal device.
  • the terminal device includes a processor and a memory.
  • the memory is used to store programs.
  • The processor runs the program in the memory and is configured to: determine an audio data source; perform audio interception service registration on the audio data source to obtain audio interception service registration information, where the registration information includes identification information of the audio interception service for the audio data source; when it is determined according to the identification information that the audio data source needs to be intercepted, soft-decode the audio data source to obtain audio track data; and perform behavior analysis according to the audio track data.
  • The processor is further configured to: determine a set of identification information of audio data sources that need to be intercepted; determine whether the identification information in the audio interception service registration information is in that set; and, when it is, determine that the audio data source needs to be intercepted.
  • an embodiment of the present invention provides an audio processing device.
  • The audio processing device includes an audio interception service module, an audio track module, and a behavior analysis module.
  • The audio interception service module is configured to determine audio interception service registration information, where the registration information includes identification information of the audio interception service for the audio data source. When it is determined according to the identification information that the audio data source needs to be intercepted, the audio interception service module sends first interception indication information to the audio track module. The audio track module is configured to receive the first interception indication information and, according to it, send the audio track data corresponding to the audio data source to the audio interception service module. The audio interception service module is further configured to send the audio track data received from the audio track module to the behavior analysis module, and the behavior analysis module is configured to perform behavior analysis according to the audio track data.
  • The audio interception service module is further configured to send the audio interception service registration information to the behavior analysis module; the behavior analysis module is further configured to perform behavior analysis according to the audio track data and the audio interception service registration information.
  • the device further includes an operating system including an audio interception service module.
  • The operating system also includes the audio track module.
  • the apparatus further includes a first application for determining an audio data source and performing an audio intercept service registration with the audio intercept service module.
  • The device further includes a second application, configured to send second interception indication information to the audio interception service module, where the second interception indication information carries identification information, so that audio interception is performed on the audio data source.
  • The second interception indication information is used to instruct the audio interception service module to intercept the audio data source whose identification information in the audio interception service registration information matches the identification information carried in the second interception indication information.
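A minimal sketch of this two-step indication flow (the class and method names are hypothetical illustrations, not the patent's interfaces):

```python
# Hypothetical sketch: a second application sends "second interception
# indication information" naming identifiers to intercept; the service then
# derives which registered sources receive a "first interception indication".

class AudioInterceptService:
    def __init__(self):
        self.registrations = {}   # identifier -> audio data source
        self.requested = set()    # identifiers named by second applications

    def register(self, identifier, source):
        self.registrations[identifier] = source

    def second_indication(self, identifier):
        self.requested.add(identifier)

    def first_indications(self):
        # sources whose registered identifier matches a requested one
        return sorted(i for i in self.registrations if i in self.requested)

svc = AudioInterceptService()
svc.register("pid42/MUSIC", "song.mp3")
svc.register("pid43/ALARM", "beep.wav")
svc.second_indication("pid42/MUSIC")
print(svc.first_indications())  # ['pid42/MUSIC']
```

Only the sources matched this way are handed to the track module for interception; the alarm source above keeps playing normally.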
  • An embodiment of the present invention provides a computer storage medium for storing computer software instructions. When run by a computer, the instructions cause it to:
  • determine an audio data source and perform audio interception service registration on the audio data source to obtain audio interception service registration information, where the registration information includes identification information of the audio interception service for the audio data source;
  • when it is determined according to the identification information that the audio data source needs to be intercepted, soft-decode the audio data source to obtain audio track data; and
  • perform behavior analysis based on the audio track data.
  • Through the present invention, the track data can be collected in a targeted manner, and user behavior samples can be obtained by analyzing the track data, thereby reducing the difficulty of collecting user behavior samples.
  • FIG. 1 is a schematic diagram of a data source playing process.
  • FIG. 2 is a schematic structural diagram of an audio processing device according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of another audio processing device according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a playback process of an audio data source according to an embodiment of the present invention.
  • FIG. 5 is an information interaction diagram provided by an embodiment of the present invention.
  • FIG. 6 is a flowchart of an audio processing method according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
  • The first application involved in the present invention is an application that determines an audio data source and performs audio interception service registration with the audio interception service module, that is, an application that plays an audio data source, for example, a third-party application, a media player provided by the operating system, or a network phone.
  • the first application can be an APP running on the terminal device.
  • The “first interception indication information” is interception indication information sent by the audio interception service module to the audio track module; it indicates which audio data source's audio track data is to be intercepted.
  • the “second intercept indication information” is interception indication information sent by the second application to the audio interception service module, and the second intercept indication information is used to indicate which audio data source the audio interception service module intercepts.
  • the second application may be an APP running on the terminal device.
  • The terminal device generally runs an operating system, such as an Android OS (Operating System), a Windows OS, or iOS.
  • The operating system is mainly used to manage and control the computer's hardware and software resources; it is the most basic system software.
  • The operating system can be understood as the interface between the user and the computer, and between the computer hardware and other software; other software runs with the support of the operating system.
  • an audio interception service may be provided through an operating system, and the audio interception service may intercept the decoded audio track data of the audio data source.
  • Other software programs can intercept the audio track data by calling the audio interception service, and perform user behavior analysis according to the intercepted audio track data.
  • The terminal device can distinguish audio data sources by the interception service identification information, intercept an audio data source, and soft-decode it to obtain the audio track data (a Pulse Code Modulation (PCM) code stream). Behavior analysis is then performed based on the track data and the interception service identification information: for example, speech recognition, song recognition, or scene perception is performed on the track data, or further comprehensive analysis is performed based on the recognized or perceived information.
  • the terminal device involved in the embodiment of the present invention may be a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS), a vehicle-mounted computer, or the like.
  • FIG. 1 is a schematic diagram of a data source playing process.
  • an audio resource to be played may be specified by a Uniform Resource Identifier (or URI) or a file descriptor (Fd) of an audio data source.
  • For example, the URI or Fd of the audio data source may be a Hypertext Transfer Protocol (HTTP) or Real Time Streaming Protocol (RTSP) uniform resource locator (URL), the address (URI) of a local file, or a local file descriptor (Fd).
  • The audio data source may include audio data (e.g., an audio file, a video file, or a VoIP data stream) and may also include operation commands (e.g., start, pause, end, fast forward, and fast rewind).
  • The audio data corresponds to a URI or an Fd, and an operation corresponds to a command.
  • The audio data source (DataSource) corresponding to the URI or Fd is determined by setDataSource, which provides the data for the next step, parsing (demux).
  • A DataSource is generally encoded in a certain encoding format, and the decoder information needs to be parsed out by an extractor module before the DataSource can be decoded.
  • The DataSource can be audio and video data compressed (encapsulated) together; because the audio needs to be played by the speaker and the video needs to be shown on the display screen, at playback time it must be parsed to obtain independent audio data and video data together with their respective decoder information. The DataSource can also be compressed audio data only; in that case it likewise needs to be parsed to obtain its decoder information.
  • Because encapsulated DataSources come in many formats, different extractors need to be generated for the DataSource produced by setDataSource. For example, a DataSource in WMV format (Windows Media Video, a series of video codecs developed by Microsoft and the associated encoding format) needs to be demultiplexed by a WVMExtractor, while a DataSource in the AMR (Adaptive Multi-Rate, an adaptive multi-rate audio compression) package format requires an AMRExtractor for parsing, and so on.
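The format-to-extractor dispatch described above can be sketched as follows; only the WMV and AMR pairings come from the text, and the dispatch-by-extension rule is an illustrative simplification (a real media framework sniffs container headers):

```python
# Illustrative extractor dispatch keyed on the container format,
# mirroring the WMV and AMR examples above.

EXTRACTORS = {
    "wmv": "WVMExtractor",   # WMV container, as named in the text
    "amr": "AMRExtractor",   # AMR container
}

def pick_extractor(uri):
    """Choose an extractor from the file extension of the URI."""
    ext = uri.rsplit(".", 1)[-1].lower()
    if ext not in EXTRACTORS:
        raise ValueError(f"no extractor registered for .{ext}")
    return EXTRACTORS[ext]

print(pick_extractor("media/clip.WMV"))  # WVMExtractor
print(pick_extractor("voice.amr"))       # AMRExtractor
```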
  • After parsing, the independent video data with its corresponding decoder information and the audio data with its corresponding decoder information are obtained.
  • The extractor splits the audio and video streams in the package format and sends them to the audio and video decoders separately; a DataSource that contains only compressed audio data passes through the extractor to obtain the decoder information corresponding to the audio data source.
  • the terminal device needs to generate a decoder based on the parsed decoder information. Different types of DataSources match different decoders.
  • Processing of the audio data source can be divided into hard decoding and soft decoding (soft decoding uses a decoder implemented in software, and hard decoding uses a decoder implemented in hardware).
  • After decoding, the audio track data or video track data is obtained and, after rendering, can be output as audio or video.
  • When the terminal device needs to play multiple audio data sources, multiple streams of audio track data are obtained after decoding; after they are mixed by the mixer, the speaker is driven to play the mixed audio.
  • the track data before the mixing is used for behavior analysis.
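To make the mixer stage concrete, here is a minimal sketch of mixing decoded PCM tracks by summing per-sample values and clamping to the signed 16-bit range (illustrative only; a real mixer also handles resampling and volume):

```python
# Minimal PCM mixing sketch: sum the per-sample values of all decoded
# tracks and clamp the result to the signed 16-bit range.

def mix_tracks(tracks):
    """tracks: equal-length lists of signed 16-bit PCM samples."""
    mixed = []
    for samples in zip(*tracks):
        s = sum(samples)
        mixed.append(max(-32768, min(32767, s)))  # clamp to int16
    return mixed

track_a = [1000, -2000, 30000]
track_b = [500, -500, 10000]
print(mix_tracks([track_a, track_b]))  # [1500, -2500, 32767]
```

The behavior analysis described in the text taps the individual tracks *before* this summation, so each source's audio can still be attributed to its application.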
  • FIG. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
  • The terminal device 100 includes an application (Application, APP) 111 (the first application), a media service module 120, an audio interception service module 130, and a behavior analysis module 140 (which may be the second application, a service of the operating system, or another application).
  • the media service module 120 and the audio interception service module 130 may be services provided by an operating system running on the terminal device 100.
  • The APP 111 running on the terminal device 100 may invoke an API (Application Programming Interface) provided by the operating system to call the service corresponding to that API.
  • A Voice over Internet Protocol (VoIP) module 112 may also be included (e.g., the VoIP module 112 may be the first application), and the VoIP module 112 may be provided by the operating system running on the terminal device 100.
  • The media service module 120 can be configured to decode audio data sources of multiple formats to obtain audio track data, mix the audio track data, and transmit it to the hardware abstraction layer.
  • The media service module 120 may include an audio processing module 121, an audio track module 122, and a mixing module 123.
  • Audio data sources come in multiple formats, and the terminal device 100 can process audio data sources of different formats through one or more audio processing modules 121 provided by the operating system to obtain audio track data (that is, implement the process of obtaining track data through the decoder shown in FIG. 1) and provide it to the audio track module 122.
  • The audio track module 122 provides the audio track data to the mixing module 123; after the mixing module 123 mixes it, the hardware is driven to play it.
  • The audio track module 122 outputs independent track data, and the mixing module 123 outputs the mixed track data of all the independent tracks.
  • The terminal device 100 can play audio data sources such as telephone tones, audio or video files, streaming media, game sound effects, key tones, audio synchronized with video, audio effects interacting with game animations, and recordings made with the microphone.
  • Different types of audio data sources can correspond to different encoding formats and audio processing modules. For example, in the Android OS, ToneGenerator is used to play telephone tones; MediaPlayer is used to play audio files, video files, and streaming media; SoundPool is used for low-latency playback; AudioPlayer and JetPlayer are audio players (JetPlayer plays JET sound effects through the JETEngine); and AudioRecord is used to control microphone recording.
  • By calling these modules, the APP 111 can play the audio data sources it needs to play through the operating system.
  • The VoIP module 112 is generally a service provided by the operating system, which can directly play, through the operating system, the audio data source received from the network side.
  • The operating system can also play an audio data source directly, for example through MediaPlayer (the system's default player).
  • To play an audio data source, an audio track module (e.g., AudioTrack in the Android OS) 122 is created.
  • The audio processing module 121 creates an audio decoder and the track module 122, and decodes the audio data source through the audio decoder to obtain a Pulse Code Modulation (PCM) code stream, i.e., the track data; the mixing module 123 mixes the track data, which is then converted by a digital-to-analog converter and played by the speaker.
  • The application 111 or the VoIP module 112 can also decode the audio data source directly to obtain the track data.
  • In this case the terminal device still needs to create the track module 122, which is used for playback.
  • For example, the terminal device receives an audio data source from a network (e.g., a mobile communication network), and the VoIP module 112 creates a decoder and decodes the audio data source to obtain the audio track data.
  • The terminal device may further provide a package manager (PackageManager) through the operating system; the package manager can be used to determine the APP package name corresponding to an audio data source, by which the application playing that audio data source can be identified.
  • The audio processing device 100 may provide an audio interception service through the audio interception service module 130, which may be disposed in the operating system; all audio data sources that need to be played must register with the audio interception service module 130, and the operating system can provide an API for the audio interception service module 130.
  • An APP with a behavior analysis module can call the audio interception service module 130 through its API to intercept the audio data sources that need to be intercepted.
  • the audio interception service module 130 can maintain an identification information table of an audio data source that needs to be intercepted.
  • The identification information table may be determined according to the identification information carried in the second interception indication information sent by the second application.
  • When second interception indication information sent by multiple second applications is received, the identification information table may record the correspondence between the identification information carried in each second interception indication and the corresponding second application.
  • The audio interception service registration may be performed when the audio processing device creates the audio track module 122.
  • The registration information may include the track module identifier corresponding to the audio data source, the stream type (StreamType), the process number (PID), the audio processing module identifier (ModuleID), and so on.
  • The audio processing device may include multiple first applications for determining audio data sources, and an audio track module may be created separately for each first application.
  • the audio processing module identifier can identify the foregoing audio processing module.
  • For example, MediaPlayer, ToneGenerator, SoundPool, AudioPlayer, and JetPlayer in the Android system are identified by different ModuleIDs.
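The registration record described above, with its StreamType, PID, and ModuleID fields, might be modeled as follows (the record shape and field names are illustrative, not the patent's data structures):

```python
# Hypothetical shape of one interception-service registration record,
# carrying the fields listed above.

from dataclasses import dataclass

@dataclass(frozen=True)
class InterceptRegistration:
    track_id: int      # identifier of the created audio track module
    stream_type: str   # StreamType, e.g. "MUSIC" or "ALARM"
    pid: int           # process number of the registering application
    module_id: str     # e.g. "MediaPlayer", "SoundPool", "JetPlayer"

reg = InterceptRegistration(track_id=7, stream_type="MUSIC",
                            pid=4321, module_id="SoundPool")
print(reg.stream_type, reg.module_id)  # MUSIC SoundPool
```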
  • The audio interception service module 130 may intercept, according to the registration information, the audio track data (PCM code stream) corresponding to the audio data sources that need to be intercepted; the behavior analysis module 140, such as a context-awareness module, then performs behavior analysis based on the audio track data and the registration information (for example, speech recognition, song recognition, or scene perception based on the PCM stream, or further analysis based on the recognized or perceived information).
  • The behavior analysis module 140 can also perform comprehensive analysis in combination with other data, such as the APP package name, the user's state inferred from motion-sensor data (e.g., running), or behavior analysis results provided by other applications (for example, fitness data provided by a fitness app). For instance, the song recognition module determines from the track data that piano music is playing, the interception service registration information indicates that the APP is a music player, and the behavior analysis provided by the fitness app indicates that the user is jogging.
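The fusion of the example signals above, i.e. the song-recognition result, the APP package name from the registration info, and the user state reported by a fitness app, can be sketched like this (all concrete values are hypothetical):

```python
# Illustrative fusion of the signals named above into one comprehensive
# behavior description.

def comprehensive_analysis(recognized_audio, package_name, user_state):
    return (f"user is {user_state}; app '{package_name}' "
            f"is playing {recognized_audio}")

result = comprehensive_analysis("piano music",
                                "com.example.musicplayer",
                                "jogging")
print(result)
# user is jogging; app 'com.example.musicplayer' is playing piano music
```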
  • Alternatively, the audio track module 220 may be created, and the APP 210 may directly decode the audio data source to obtain a PCM code stream and use the audio track module 220 to play it.
  • In this case, playback may not go through the audio service provided by the operating system running on the audio processing device 100.
  • When the APP 210 determines that an audio data source needs to be played, it can directly create the audio track module 220 and at the same time register with the audio interception service module 130; during audio playback, the audio interception service module 130 intercepts the audio track data that needs to be intercepted.
  • The embodiments of the present invention are further described below with reference to FIG. 4, taking the modules provided by an Android OS-based audio processing device as an example.
  • the operating system of the audio processing device includes an application layer, a Framework layer, and a hardware abstraction layer.
  • the application layer includes various APIs provided by the operating system;
  • the Framework layer includes an audio processing module, an audio track module, an audio module (AudioFlinger), and an audio interception service module;
  • the hardware abstraction layer includes hardware interfaces such as wired headphones, Bluetooth, speaker, and earpiece.
  • the Android OS provides the MediaPlayer API, the AudioTrack API, the SoundPool API, and the Intercept Service API.
  • the Android OS can also provide an API of more modules.
  • The embodiment of the present invention is described by taking only the above APIs as an example. Among them, MediaPlayer, SoundPool, and the interception service module are provided by the media service module (MediaServer) of the Android OS.
  • A game APP calls the SoundPool API to play game sound effects.
  • The game APP generates a requirement for audio playback, such as background music or sound effects. The background music or sound-effect files are generally stored in the APP package corresponding to the game APP (an APP package contains the program the APP runs, its data, and the like). The game APP generally calls the SoundPool API in its own process, and SoundPool processes the audio data source that needs to be played.
  • To execute the playback process, SoundPool registers the audio data source to be played and the process ID of the corresponding APP.
  • The MediaServer creates an AudioTrack to play the audio stream transmitted by SoundPool.
  • At the same time, interception service registration is performed through the audio interception service module provided by the MediaServer, registering the identification information of the audio interception service for the audio data source: the PID of the application, the URI or Fd of the audio data source, the StreamType (for example, ALARM or MUSIC), and the identifier of the processing module that handles the audio data source (the identifier of SoundPool).
  • The identification information of the audio interception service corresponding to the audio data source may be used for behavior analysis. In other words, a set of audio interception service identification information can identify an audio playback behavior of the audio processing device, for example, which APP is playing which audio (e.g., listening to a song in NetEase Cloud Music).
• In another example, the VoIP module does not need to be processed by an audio processing module and directly calls the AudioTrack API to play voice. Specifically, the VoIP module generates audio playback requirements at runtime; for example, during a VoIP call, voice needs to be played, and the audio data source corresponding to the voice is the VoIP data received through the network. The audio processing device caches the VoIP data received from the network in local memory, and the VoIP module decodes the VoIP data in the memory to obtain the audio track data. While creating an AudioTrack to play the audio track data transmitted by the VoIP module, the MediaServer performs interception service registration through the audio interception service module provided by the MediaServer, registering the PID of the application of the audio data source, the URI or Fd of the audio data source, the Stream Type (code stream type), the identifier of the processing module that handles the audio data source (the identifier of AudioTrack), and so on.
• MediaPlayer can be used directly to play audio data sources, and a third-party APP can also play audio by calling the MediaPlayer API. Specifically, the third-party APP generates an audio playback requirement at runtime, or the user triggers MediaPlayer to play the audio. MediaPlayerService then creates StagefrightPlayer, AwesomePlayer, and AudioTrack to play the audio data source. The audio data source is parsed by a parsing module (Extractor) to obtain decoder information, and decoded by the decoder (OMXCodec) to obtain audio track data.
• An audio interception APP calls the audio interception service module through the interception service API, and the audio interception service module intercepts the audio data source that needs to be intercepted. The audio interception APP provides the audio interception service module with the identification information of the audio interception service for the audio data source that needs to be intercepted; according to the registered identification information of the audio interception service, the audio interception service module notifies the audio track module which audio data source needs to be intercepted, and the audio track data corresponding to that audio data source is provided to the audio interception APP through the audio interception service module.
• The audio track module provides the audio track data to the mixing module, and the mixing module mixes the sound and outputs it to the hardware abstraction layer for playback.
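The mixing step can be illustrated with a minimal sketch: summing 16-bit PCM samples from two tracks with saturation before handing the result on. This is only a conceptual illustration of what a mixing module such as AudioFlinger does, not its actual implementation:

```java
// Conceptual sketch of mixing two 16-bit PCM tracks. Each output sample is
// the sum of the corresponding input samples, clamped to the 16-bit range
// to avoid wrap-around distortion.
class Mixer {
    static short[] mix(short[] a, short[] b) {
        int n = Math.min(a.length, b.length);
        short[] out = new short[n];
        for (int i = 0; i < n; i++) {
            int sum = a[i] + b[i];
            if (sum > Short.MAX_VALUE) sum = Short.MAX_VALUE;
            if (sum < Short.MIN_VALUE) sum = Short.MIN_VALUE;
            out[i] = (short) sum;
        }
        return out;
    }
}
```

Because interception happens before this step, each intercepted buffer still belongs to exactly one source, which is what makes per-source behavior analysis possible.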
• It can be seen that the audio processing device can intercept audio data sources by providing an audio interception service module; when intercepting, it intercepts only the audio data sources that need to be intercepted according to the audio interception service identification information, achieving targeted collection of user behavior samples. Analyzing user behavior through these samples reduces the difficulty of user behavior analysis.
  • FIG. 5 is an information interaction diagram provided by an embodiment of the present invention. As shown in FIG. 5, the embodiment of the present invention may specifically include the following steps:
• The APP determines the requirement to play an audio data source and registers with the media service module. Specifically, the PID of the APP and the URI or Fd of the audio data source to be played need to be registered with the media service module.
• The APP usually needs to play audio when running; for example, game APPs need to play sound effects, music APPs need to play songs in playlists, video APPs need to play audio synchronized with video, and even communication APPs need to play reminder sounds.
• The APP may request an audio processing module (for example, MediaPlayer or SoundPool) in the media service module to play the audio by calling the API of the audio processing module provided by the media service module.
• The audio processing module registers with the media service module, carrying the process number (PID) of the APP and the URI or Fd of the audio data source to be played; this allows the media service to determine which audio data source to play.
  • the media service module performs media analysis on the audio data source to determine decoder information.
• The media service creates an audio track module (for example, AudioTrack) and establishes an IPC (Inter-Process Communication) channel between the audio track module and the mixing module (for example, AudioFlinger).
• The media service performs audio interception service registration with an audio interception service module (for example, the audio interception service module 130 in FIG. 1). The audio interception service registration can be performed with the audio interception service module at the same time the audio track module is created, carrying identification information for the audio interception service such as the PID, StreamType, and ModuleID.
• StreamType can include: VOICE_CALL (voice call), SYSTEM (system sound), RING (incoming call ringtone), MUSIC (music), ALARM (alarm), NOTIFICATION (notification ringtone), BLUETOOTH_SCO (Bluetooth audio), DTMF (Dual-Tone Multi-Frequency), TTS (Text To Speech), and so on.
  • ModuleID can identify audio processing modules and audio track modules such as ToneGenerator, MediaPlayer, SoundPool, AudioPlayer, and JetPlayer.
  • the audio interception service module determines, according to the identification information of the audio interception service, whether to intercept the audio data source.
• The audio interception service module can perform interception screening based on the identification information; for example, only the audio data source corresponding to specific identification information is intercepted.
• The specific identification information may be set in advance, or may be set by the behavior analysis module, which notifies the audio interception service module.
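The screening rule above reduces to a set-membership check: a source is intercepted only when its identification information is in the preset set. A minimal sketch, assuming a simple string encoding of the identification information (the format "StreamType/ModuleID" below is purely illustrative):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of interception screening: the behavior analysis side fills a set
// of identification strings, and registration-time screening checks
// membership. The identification string format is a hypothetical encoding.
class InterceptFilter {
    private final Set<String> wanted = new HashSet<>();

    // Called when specific identification information is set in advance or
    // provided by the behavior analysis module.
    void allow(String idInfo) {
        wanted.add(idInfo);
    }

    // Returns true when the registered identification info should be intercepted.
    boolean shouldIntercept(String idInfo) {
        return wanted.contains(idInfo);
    }
}
```

Sources not in the set are left to their normal playback flow, which is why the scheme consumes few resources.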
  • the media service module receives a start play indication sent by the APP.
• The APP can send start, pause, and stop playback instructions to the media service module according to its own needs, and the media service module operates according to these instructions.
• According to the indication to intercept the audio data source, the media service module soft-decodes the audio data source that needs to be intercepted according to the decoder information to obtain the audio track data.
• The decoding manner of an audio data source may include soft decoding and hard decoding. If the audio track data of an audio data source that needs to be intercepted cannot otherwise be obtained (for example, because the source would be hard decoded), the audio interception service module may notify the media service to soft-decode the audio data source that needs to be intercepted. In addition, for audio data sources that do not need to be intercepted, the audio interception service module need not intervene in their decoding manner.
• The audio track module sends a copy (buffer) of the audio track data (PCM code stream) decoded from the audio data source that needs to be intercepted to the audio interception service module.
• The process of passing the PCM buffer to the audio interception service module is to write shared memory through the Binder mechanism and then notify the audio interception service module to read it through the shared memory pointer; the process can be implemented by inter-process communication or inter-thread communication.
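The write-then-notify handoff can be pictured as a producer/consumer exchange of buffer copies. Since the real mechanism (Binder plus shared memory) is Android-specific, the sketch below simulates it in-process with a bounded queue; the defensive copy mirrors the "buffer (copy)" wording above, and all names are illustrative:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Framework-independent simulation of handing a copy of a PCM buffer from
// the audio track module to the interception service. A bounded queue
// stands in for the shared-memory write plus notification.
class PcmHandoff {
    private final BlockingQueue<byte[]> shared = new ArrayBlockingQueue<>(8);

    // Track side: enqueue a defensive copy so playback buffers stay untouched.
    // Returns false if the "shared memory" is full.
    boolean write(byte[] pcm) {
        return shared.offer(pcm.clone());
    }

    // Interception-service side: read the next buffer, or null if none pending.
    byte[] read() {
        return shared.poll();
    }
}
```

Copying before the handoff is the design point: playback continues undisturbed while the interception side analyzes its own copy.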
  • the audio interception service module sends the buffer of the audio track data to the behavior analysis module, so that the behavior analysis module performs behavior analysis according to the buffer of the audio track data.
• For example, track-to-text technology can convert the audio track data into text for semantic analysis, or a song recognition module can recognize internally recorded audio, and the like.
  • the media service module outputs the audio track data to a hardware abstraction layer (HAL) for playing.
• S205 and S206, as well as S209 and S210, can be executed in any order or simultaneously.
• The audio interception service module may further obtain the package name of the APP by using the following steps and provide it to the behavior analysis module for analysis.
  • the audio interception service module queries the package manager for the APP package name.
  • the audio interception service module can provide a PID, and the package manager can query the APP package name according to the PID.
  • the package manager returns the APP package name.
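The two-step query above (provide a PID, get back a package name) can be reduced to a lookup table. A toy stand-in in plain Java; the real lookup is performed by the Android PackageManager, and the mapping below is fabricated for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the package-manager query: resolve an APP package name
// from a PID. In Android this would be a PackageManager call; here a map
// with fabricated entries illustrates the contract.
class PackageLookup {
    private final Map<Integer, String> pidToPackage = new HashMap<>();

    void record(int pid, String packageName) {
        pidToPackage.put(pid, packageName);
    }

    String queryPackageName(int pid) {
        return pidToPackage.getOrDefault(pid, "unknown");
    }
}
```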
• The audio interception service module may determine, from the media service module, the audio action corresponding to the audio data source that needs to be intercepted, and provide it to the behavior analysis module for analysis.
  • Table 1 provides some examples of the results of the behavior analysis module.
• It can be seen that the embodiment of the present invention can implement user behavior analysis by intercepting the audio data source through the audio interception service module and combining other information corresponding to the audio data source, for example, the package name of the APP playing the audio data source. Intercepting the audio data source by acquiring the audio track data reduces the difficulty of collecting user behavior samples, and also reduces the difficulty of user behavior analysis.
  • FIG. 6 is a flowchart of an audio processing method according to an embodiment of the present invention. Specifically, the following steps are included:
• S610: Determine an audio data source, and perform audio interception service registration on the audio data source to obtain audio interception service registration information, where the audio interception service registration information includes identification information for performing an audio interception service on the audio data source.
• The audio data source may be an audio file or a video file stored in local memory or extended memory, may be media to be played, or may be a VoIP audio data packet or an audio data packet received by an application from the network side. The corresponding audio data source can be determined by the URI or the Fd; for the specific process, refer to the embodiment shown in FIG. 1, which is not described again.
• After the audio data source is determined, the audio data source needs to be registered with the audio interception service. During registration, the process identifier (PID), the code stream type, the API type, the play mode, and the like can be provided as identification information for this audio data source. The play mode may refer to which audio processing module plays the audio data source; for the specific process, refer to the embodiments shown in FIG. 2 and FIG. 3, which are not described again.
• After the audio interception service registration is performed for the audio data source, whether the audio data source needs to be intercepted can be determined according to the identification information in the registration information.
• The decoding process differs across audio processing modules and play modes. An audio data source that does not need to be intercepted can go through either the soft decoding process or the hard decoding process; an audio data source that needs to be intercepted must go through the soft decoding process. By decoding the audio data source, a PCM code stream, that is, the audio track data, is obtained.
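To make "track data" concrete: after soft decoding, the result is simply a stream of raw PCM samples. The decoder itself is out of scope here, so the sketch below generates a sine tone as a stand-in for decoded output; the function name and parameters are illustrative:

```java
// Illustration of what audio track data looks like after soft decoding:
// a buffer of raw 16-bit PCM samples. A sine tone at half amplitude
// stands in for real decoder output.
class PcmExample {
    static short[] decodeStub(int sampleRate, double freqHz, int numSamples) {
        short[] pcm = new short[numSamples];
        for (int i = 0; i < numSamples; i++) {
            double t = (double) i / sampleRate;
            // Half of full scale keeps headroom for later mixing.
            pcm[i] = (short) (Math.sin(2 * Math.PI * freqHz * t) * Short.MAX_VALUE * 0.5);
        }
        return pcm;
    }
}
```

It is this uniform PCM representation, independent of the source's original container or codec, that lets one interception service serve every play mode.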
• An identification information set of the audio data sources that need to be intercepted can also be determined; when the identification information in the current audio interception service registration information is in the identification information set, it is determined that the audio data source needs to be intercepted. For the process, refer to the related description of the identification information table in the embodiment shown in FIG. 2; details are not described herein.
• The audio track data can be converted into text by speech recognition, which can be used for semantic analysis, or a song recognition module can recognize internally recorded audio, and the like.
• Context awareness can also be performed based on the audio track data.
• It can be seen that interception of the audio data source can be realized by intercepting the audio track data, which reduces the difficulty of collecting user behavior samples; moreover, intercepting audio data sources selectively according to the audio interception service identifier consumes fewer resources and improves the user experience.
  • FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. Taking the terminal device as a mobile phone as an example, FIG. 7 is a block diagram showing a part of the structure of the mobile phone 500 related to the embodiment of the present invention.
• Referring to FIG. 7, the mobile phone 500 includes a radio frequency (RF) circuit 510, a memory 520, other input devices 530, a display screen 540, a sensor 550, an audio circuit 560, an I/O subsystem 570, a processor 580, a power supply 590, and other components.
• The display screen 540 belongs to the user interface (UI), and the mobile phone 500 may include more or fewer user interface components than illustrated.
• The components of the mobile phone 500 are specifically described below with reference to FIG. 7:
• The RF circuit 510 can be used for receiving and transmitting signals during information transmission and reception or during a call (for example, a VoIP call); in particular, it receives downlink information from the base station (e.g., an audio data source) and hands it to the processor 580 for processing, and it also sends uplink data (e.g., an audio data source acquisition request) to the base station.
  • RF circuits include, but are not limited to, an antenna, at least one amplifier, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like.
  • RF circuitry 510 can also communicate with the network and other devices via wireless communication.
• The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
  • the memory 520 can be used to store software programs (eg, music players, VoIP modules, operating systems, etc.) and data, and the processor 580 executes various functions of the mobile phone 500 and data processing by running software programs stored in the memory 520.
• The memory 520 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, application programs required by at least one function (for example, an audio playing function and a video playing function), and the like; the data storage area may store data created according to the use of the mobile phone 500 (such as audio data and VoIP call durations and times).
• In addition, the memory 520 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • Other input devices 530 can be used to receive input numeric or character information, as well as generate key signal inputs (eg, start, pause, end, switch, fast forward, rewind, etc.) related to user settings and function controls of handset 500.
• Specifically, other input devices 530 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons and switch buttons), a trackball, a mouse, a joystick, and a light mouse (a touch-sensitive surface that does not display visual output, or an extension of a touch-sensitive surface formed by a touch screen).
• Other input devices 530 are connected to the other input device controller 571 of the I/O subsystem 570 and exchange signals with the processor 580 under the control of the other input device controller 571. It should be appreciated that in this embodiment of the invention, other input devices 530 can carry interaction with the user, and user behavior samples can be obtained from the information generated by other input devices 530 in conjunction with the programs running in the mobile phone 500.
  • the display screen 540 can be used to display information entered by the user or information provided to the user as well as various menus of the handset 500 (eg, playlists and playback progress, etc.), and can also accept user input.
• Specifically, the display screen 540 can include a display panel 541 and a touch panel 542.
  • the display panel 541 can be configured by using an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
• The touch panel 542, also referred to as a touch screen or touch-sensitive screen, can collect contact or non-contact operations by the user on or near it (for example, operations performed by the user on or near the touch panel 542 using a finger, a stylus, or any other suitable object or accessory; operations near the touch panel 542 may also include somatosensory operations), where the operations include single-point control operations, multi-point control operations, and the like.
• Optionally, the touch panel 542 can include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch orientation and posture, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into information that the processor can process, sends it to the processor 580, and can receive and execute commands sent by the processor 580.
• In addition, the touch panel 542 can be implemented using various types such as resistive, capacitive, infrared, and surface acoustic wave, or by any technology developed in the future.
• Further, the touch panel 542 can cover the display panel 541, and the user can, according to the content displayed on the display panel 541 (the displayed content including but not limited to a soft keyboard, a virtual mouse, virtual buttons, icons, and the like), operate on or near the touch panel 542 covering the display panel 541. After detecting an operation on or near it, the touch panel 542 transmits the operation to the processor 580 through the I/O subsystem 570 to determine the user input, and the processor 580 then provides a corresponding visual output on the display panel 541 through the I/O subsystem 570 according to the user input.
• Although in FIG. 7 the touch panel 542 and the display panel 541 are shown as two independent components to implement the input and output functions of the mobile phone 500, in some embodiments the touch panel 542 and the display panel 541 may be integrated to implement the input and output functions of the mobile phone 500.
  • the handset 500 can also include at least one type of sensor 550, such as a light sensor, motion sensor, and other sensors.
• Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display panel 541 according to the brightness of the ambient light, and the proximity sensor may turn off the display panel 541 and/or the backlight when the mobile phone 500 moves to the ear.
• As one type of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes), and can detect the magnitude and direction of gravity when stationary; it can be used for applications that identify the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer attitude calibration), vibration-recognition related functions (such as a pedometer or tapping), and so on.
• The mobile phone 500 can also be configured with a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and other sensors, and details are not described herein.
• It should be noted that, in this embodiment of the present invention, data on the user's behavior can be determined by the sensor 550 for analysis.
• The audio circuit 560, the speaker 561, and the microphone 562 can provide an audio interface between the user and the mobile phone 500.
• The audio circuit 560 can transmit the electrical signal converted from received audio data to the speaker 561, and the speaker 561 converts it into a sound signal for output; on the other hand, the microphone 562 converts a collected sound signal into an electrical signal, which the audio circuit 560 receives and converts into audio data after analog-to-digital conversion. The audio data is then output to the RF circuit 510 to be sent to, for example, another mobile phone, or output to the memory 520 for further processing (for example, for playback).
  • the I/O subsystem 570 is used to control external devices for input and output, and may include other device input controllers 571, sensor controllers 572, and display controllers 573.
  • one or more other input control device controllers 571 receive signals from other input devices 530 and/or send signals to other input devices 530, and other input devices 530 may include physical buttons (press buttons, rocker buttons, etc.) , dial, slide switch, joystick, click wheel, light mouse (light mouse is a touch-sensitive surface that does not display visual output, or an extension of a touch-sensitive surface formed by a touch screen). It is worth noting that other input control device controllers 571 can be connected to any one or more of the above devices.
  • Display controller 573 in I/O subsystem 570 receives signals from display 540 and/or transmits signals to display screen 540. After the display screen 540 detects the user input, the display controller 573 converts the detected user input into an interaction with the user interface object displayed on the display screen 540, ie, implements human-computer interaction. Sensor controller 572 can receive signals from one or more sensors 550 and/or send signals to one or more sensors 550.
• The processor 580 is the control center of the mobile phone 500; it connects the various parts of the entire mobile phone using various interfaces and lines, and executes the various functions of the mobile phone 500 and processes data by running or executing software programs and/or modules stored in the memory 520 (for example, the modules shown in FIG. 2 or FIG. 3 all run in the processor 580) and invoking data stored in the memory 520, performing the steps of: determining an audio data source, performing audio interception service registration on the audio data source, and obtaining audio interception service registration information, where the audio interception service registration information includes identification information for performing the audio interception service on the audio data source; when it is determined according to the identification information that the audio data source needs to be intercepted, soft-decoding the audio data source to obtain audio track data; and performing behavior analysis based on the audio track data.
• Optionally, the processor 580 may include one or more processing units; preferably, the processor 580 may integrate an application processor and a modem processor, where the application processor mainly processes the operating system, user interfaces, applications, and the like, and the modem processor mainly processes wireless communication (mobile communication). It can be understood that the modem processor may also not be integrated into the processor 580.
• The mobile phone 500 also includes a power supply 590 (such as a battery) that supplies power to the various components. Preferably, the power supply can be logically connected to the processor 580 via a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
• Although not shown, the mobile phone 500 may further include a camera, a Bluetooth module, and the like, and details are not described herein again.

Abstract

An audio processing method, including: determining an audio data source, and performing audio interception service registration on the audio data source to obtain audio interception service registration information (S610), where the audio interception service registration information includes identification information of the audio interception service corresponding to the audio data source. When it is determined according to the identification information that the audio data source needs to be intercepted, the audio data source is soft-decoded to obtain audio track data (S620); behavior analysis is performed according to the audio track data (S630). Targeted collection of audio track data can thus be achieved; the collected audio track data can be converted into text by speech recognition for semantic analysis, or internally recorded audio can be recognized; the user's behavior can be analyzed from the converted text or the recognized audio, enabling the collection of user behavior samples and reducing the difficulty of user behavior analysis.

Description

Audio Processing Method and Device

Technical Field
The present invention relates to the field of audio processing, and in particular, to an audio processing method and device.
Background Art

As the functions of electronic devices change with each passing day, electronic devices have become indispensable tools; users can use various multimedia applications on a terminal to play network or local media. While rich mobile Internet applications bring convenience to users, they also generate a large amount of context information, such as audio information from audio/video players, phone calls, alarm clocks, schedules, navigation, voice broadcasts, and WeChat voice messages; users can also transmit Voice over Internet Protocol (VoIP) calls, instant messaging voice messages, and the like to the network through a microphone (mic). Users generally use the above applications according to their own habits or preferences; correspondingly, behavior samples such as the frequency, time, and duration with which a user uses the above applications can reflect the user's habits or preferences. However, the diversity of electronic device functions makes how to collect user behavior samples a technical problem that urgently needs to be solved.
Summary of the Invention
The present invention provides an audio processing method. Through the present invention, targeted collection of audio track data can be achieved; user behavior samples can be obtained by analyzing the audio track data, reducing the difficulty of collecting user behavior samples.
In one aspect, the present invention provides an audio processing method. The method includes: determining an audio data source (for example, through an APP running on a terminal device, a media player provided by the operating system, or a VoIP call), and performing audio interception service registration on the audio data source to obtain audio interception service registration information, where the audio interception service registration information includes identification information for performing the audio interception service on the audio data source (for example, one or more of the code stream type, the process number, and the play mode). When it is determined according to the identification information that the audio data source needs to be intercepted, the audio data source is soft-decoded to obtain audio track data (a PCM code stream); behavior analysis is performed according to the audio track data. Through the present invention, targeted collection of audio track data can be achieved; the collected audio track data can be converted into text by speech recognition, which can be used for behavior analysis such as semantic analysis or recognition of internally recorded audio; the user's behavior can be analyzed from the converted text or the recognized audio, enabling the collection of user behavior samples and reducing the difficulty of user behavior analysis.
In an optional implementation, behavior analysis may be performed according to the audio track data and one or more of the following: an operation command, and the name of the audio data packet corresponding to the audio data source (for example, the APP package name). The operation command may be a command such as start, pause, stop, fast forward, or rewind of audio playback, and the above commands may be collected, together with their times, according to the audio data source. Through comprehensive analysis, the present invention can determine user behavior samples more accurately, so as to perform more precise behavior analysis.
In another optional implementation, determining according to the identification information that the audio data source needs to be intercepted may include: determining an identification information set of audio data sources that need to be intercepted; judging whether the identification information in the audio interception service registration information is in the identification information set; and, when it is, determining that the audio data source needs to be intercepted. Through the present invention, the identification information set to be intercepted can be determined in advance; when an audio data source is registered, it is judged whether the identification information in the current registration information is in the identification information set, and when it is, the audio data source corresponding to the current registration information is intercepted. In this way, audio data sources that need to be intercepted can be intercepted selectively; for audio data sources that do not need to be intercepted, after audio interception service registration their normal playback flow is no longer disturbed, so user behavior samples can be determined more accurately and resources are saved.
In a second aspect, an embodiment of the present invention provides a terminal device. The terminal device includes a processor and a memory. The memory is configured to store a program. The processor runs the program in the memory to: determine an audio data source, perform audio interception service registration on the audio data source, and obtain audio interception service registration information, where the audio interception service registration information includes identification information for performing the audio interception service on the audio data source; when it is determined according to the identification information that the audio data source needs to be intercepted, soft-decode the audio data source to obtain audio track data; and perform behavior analysis according to the audio track data.
In an optional implementation, the processor is further configured to: determine an identification information set of audio data sources that need to be intercepted; judge whether the identification information in the audio interception service registration information is in the identification information set; and, when it is, determine that the audio data source needs to be intercepted.
In a third aspect, an embodiment of the present invention provides an audio processing device. The audio processing device includes an audio interception service module, an audio track module, and a behavior analysis module. The audio interception service module is configured to determine audio interception service registration information, where the audio interception service registration information includes identification information for performing the audio interception service on an audio data source; when it is determined according to the identification information that the audio data source needs to be intercepted, the audio interception service module sends first interception indication information to the audio track module. The audio track module is configured to receive the first interception indication information and, as indicated by it, send the audio track data corresponding to the audio data source to the audio interception service module. The audio interception service module is further configured to send the audio track data sent by the audio track module to the behavior analysis module. The behavior analysis module is configured to perform behavior analysis according to the audio track data.
In an optional implementation, the audio interception service module is further configured to send the audio interception service registration information to the behavior analysis module; the behavior analysis module is further configured to perform behavior analysis according to the audio track data and the audio interception service registration information.
In another optional implementation, the device further includes an operating system, and the operating system includes the audio interception service module.
In yet another optional implementation, the operating system further includes the audio track module.
In yet another optional implementation, the device further includes a first application program configured to determine the audio data source and perform audio interception service registration with the audio interception service module.
In yet another optional implementation, the device further includes a second application program configured to send second interception indication information to the audio interception service module, where the second interception indication information carries identification information; after audio interception service registration is performed for an audio data source, the second interception indication information instructs the audio interception service module to intercept those audio data sources whose audio interception service registration information is contained in the identification information carried by the second indication information.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions which, when run by a computer, cause the computer to: determine an audio data source, perform audio interception service registration on the audio data source, and obtain audio interception service registration information, where the audio interception service registration information includes identification information for performing the audio interception service on the audio data source; when it is determined according to the identification information that the audio data source needs to be intercepted, soft-decode the audio data source to obtain audio track data; and perform behavior analysis according to the audio track data.
As can be seen from the above, the present invention can collect audio track data in a targeted manner; user behavior samples can be obtained by analyzing the audio track data, reducing the difficulty of collecting user behavior samples.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of a data source playback process;

FIG. 2 is a schematic structural diagram of an audio processing device according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of another audio processing device according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an audio data source playback process according to an embodiment of the present invention;

FIG. 5 is an information interaction diagram according to an embodiment of the present invention;

FIG. 6 is a flowchart of an audio processing method according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description of the Embodiments
It should be understood that although exemplary implementations of one or more embodiments are provided below, the systems and/or methods disclosed in the present invention may be implemented by various other known or existing technologies. The present invention should in no way be limited to the illustrative embodiments, drawings, and techniques described below, including the exemplary designs and implementations described herein, but may be modified within the scope of the appended claims and the full scope of their equivalents.
It should be noted that the first application program involved in the present invention may be used to determine an audio data source and to perform audio interception service registration with the audio interception service module; it is the application program that plays the audio data source, for example, a third-party application, the media player provided by the operating system, or a VoIP application. The second application program is used to indicate to the audio interception service module which audio data sources to intercept. The first application may be an APP running on a terminal device.
The "first interception indication information" is interception indication information sent by the audio interception service module to the audio track module, indicating to the audio track module which audio data source's corresponding audio track data to intercept. The "second interception indication information" is interception indication information sent by the second application program to the audio interception service module, indicating to the audio interception service module which audio data source to intercept. The second application may be an APP running on the terminal device.
Here, "first" and "second" are used only for distinction and do not constitute a limitation.
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
A terminal device generally runs an operating system (OS), such as the Android OS, Windows OS, or iOS. The operating system is mainly used to manage and control computer hardware and software resources and is the most basic system software; it can be understood as the interface between the user and the computer, and between the computer hardware and other software, which can run only with the support of the operating system.
In the embodiments of the present invention, an audio interception service may be provided by the operating system; this audio interception service can intercept the decoded audio track data of an audio data source. Another software program (the second application program) can intercept audio track data by calling the audio interception service, and user behavior analysis can be performed according to the intercepted audio track data.
It should be noted that the terminal device can distinguish audio data sources by interception service identification information. For an audio data source that needs to be intercepted, the terminal device intercepts the audio track data (a Pulse Code Modulation (PCM) code stream) obtained by soft-decoding the audio data source, and performs behavior analysis according to the audio track data and the interception service identification information, for example, speech recognition, song recognition, or context awareness based on the audio track data, or further comprehensive analysis based on the recognized or perceived information.
The terminal device involved in the embodiments of the present invention may be a mobile phone, a tablet computer, a personal digital assistant (PDA), a point of sales (POS) terminal, an in-vehicle computer, and so on.
FIG. 1 is a schematic diagram of a data source playback process. As shown in FIG. 1, the audio data source to be played can be specified by a Uniform Resource Identifier (URI) or a file descriptor (Fd) of the audio data source. For example, the URI or Fd of the audio data source may be a Uniform Resource Locator (URL) of the HyperText Transfer Protocol (HTTP) or the Real Time Streaming Protocol (RTSP), the address (URI) of a local file, or a local file descriptor (Fd).
In the embodiments of the present invention, an audio data source may include audio data (for example, an audio file, a video file, a VoIP data stream, and so on) and may also include operation commands issued while playing the audio data (for example, start, pause, stop, fast forward, and rewind). The audio data corresponds to a URI or Fd, and the actions correspond to commands.
For example, in a terminal device based on the Android OS, the audio data source (DataSource) corresponding to a URI or Fd is determined through setDataSource, providing data support for the next step of parsing (demuxing). It should be noted that a DataSource is generally encoded according to a certain encoding format, and the decoder information must be parsed out by a parsing module (extractor) before the DataSource can be decoded. In addition, a DataSource may contain audio and video data compressed (encapsulated) together; since audio is played by a speaker while video is shown on a display, playback requires parsing out independent audio data and video data together with their respective decoder information. A DataSource may also be compressed audio data only, in which case parsing is still needed to obtain the decoder information of the DataSource.
Because an encapsulated DataSource can come in many formats, different parsing modules (extractors) are generated according to the DataSource produced by setDataSource. For example, a DataSource whose container format is WMV (Windows Media Video, the collective name of a series of video codecs developed by Microsoft and their related video encoding formats) needs a WVMExtractor for demultiplexing; a DataSource whose container format is AMR (Adaptive Multi-Rate audio compression) needs an AMRExtractor for parsing; and so on.
For a DataSource with audio and video compressed together, after the extractor, independent video data with its corresponding decoder information and audio data with its corresponding decoder information are obtained; the extractor splits the audio and video streams out of the container format and sends them to the audio and video decoders respectively. For a DataSource that is compressed audio data only, after the extractor, the decoder information corresponding to the audio data source is obtained.
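The format-to-extractor dispatch described above can be sketched as a simple mapping. The mapping below only mirrors the examples given in the text (WMV to WVMExtractor, AMR to AMRExtractor) plus hypothetical entries; the real selection logic in the media framework sniffs the data rather than switching on a format name:

```java
// Sketch of selecting a parsing module (extractor) by container format.
// Only the WMV and AMR pairs come from the text; the others are
// illustrative placeholders.
class ExtractorFactory {
    static String extractorFor(String containerFormat) {
        switch (containerFormat) {
            case "WMV": return "WVMExtractor";
            case "AMR": return "AMRExtractor";
            default:    return "GenericExtractor"; // hypothetical fallback
        }
    }
}
```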
Next, the terminal device generates a decoder according to the parsed decoder information. Different types of DataSource match different decoders.
In addition, the processing of an audio data source can be divided into hard decoding and soft decoding (soft decoding requires a decoder in software form, and hard decoding requires a decoder in hardware form).
After decoding, audio track data or video track data is obtained, which can be rendered and output as audio or video. For example, since the terminal device may need to play multiple audio data sources, decoding yields multiple sets of audio track data; after a mixer mixes them, the speaker is driven to play the mixed audio.
In the embodiments of the present invention, the audio track data after decoding and before mixing is intercepted for behavior analysis.
The present invention is further described below with reference to the structure of the terminal device provided by the embodiments of the present invention.
FIG. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. As shown in FIG. 2, the terminal device 100 involved in this embodiment includes: an application (APP) 111 (the first application program), a media service module 120, an audio interception service module 130, and a behavior analysis module 140 (which may be the second application program, a service of the operating system, or another application program). In this embodiment, the media service module 120 and the audio interception service module 130 may be services provided by the operating system running on the terminal device 100; the APP 111 running on the terminal device 100 can call the service corresponding to an API (Application Programming Interface) by calling the API provided by the operating system. In addition, a Voice over Internet Protocol (VoIP) module 112 may also be included (for example, the VoIP module 112 may be the first application program); the VoIP module 112 may be provided by the operating system running on the terminal device 100.
The media service module 120 may be used to decode audio data sources in multiple formats into audio track data, mix the audio track data, and transmit the result to the hardware abstraction layer. The media service module 120 may include an audio processing module 121, an audio track module 122, and a mixing module 123. Audio data sources come in multiple formats; through one or more audio processing modules 121 provided by the operating system, the terminal device 100 can process audio data sources of different formats into audio track data (that is, implement the process shown in FIG. 1 of passing an audio data source through a decoder to obtain audio track data) and provide it to the audio track module 122. The audio track module 122 provides the audio track data to the mixing module 123, which mixes it and then drives the hardware for playback. The output of the audio track module 122 is independent audio track data; the output of the mixing module 123 is the audio track data obtained by mixing all the independent audio tracks.
For example, the terminal device 100 may provide audio processing modules for playing types of audio data sources such as telephone signal tones, audio or video files, streaming media, game sound effects, key tones, audio synchronized with video, sound effects interacting with game animation, and microphone recording; different types of audio data sources may correspond to different encoding formats. As another example, the Android OS provides: ToneGenerator for playing telephone signal tones; MediaPlayer for playing audio, video files, and streaming media; SoundPool, capable of low-latency playback, usable for game sound effects or key tones; AudioPlayer for audio playback synchronized with video; JetPlayer for playing JETEngine sound effects that can interact with game animation; and AudioRecord for controlling microphone (MIC) recording.
The APP 111 can call these modules to play the audio data source to be played through the operating system. The VoIP module 112 is generally a service provided by the operating system; it can play the audio data source received from the network side directly through the operating system. The operating system can also directly play the audio data source to be played, for example, through MediaPlayer (the system's default player).
When the terminal device plays an audio data source using the above audio processing module 121, the terminal device needs to create an audio track module 122 (for example, AudioTrack in the Android OS) to play the audio data source. Specifically, the audio processing module 121 creates an audio decoder and the audio track module 122; the audio decoder decodes the audio data source into a Pulse Code Modulation (PCM) code stream, which is the audio track data. After the mixing module 123 performs mixing and other processing on the audio track data, the result is converted by a digital-to-analog converter and played by the speaker.
The application 111 or the VoIP module 112 may also directly decode the audio data source to obtain audio track data; in this case, the terminal device also needs to create the audio track module 122 and use it for playback. For example, during a network call using the VoIP module 112, the terminal device receives an audio data source from a network (for example, a mobile communication network), and the VoIP module 112 creates a decoder and decodes the audio data source into audio track data.
The terminal device can also provide a package manager (PackageManager) through the operating system; the package manager can be used to determine the APP package name corresponding to an audio data source, and the application playing the audio data source can be determined from the package name. For example, if the APP package name is com.baidu.music, the application can be determined to be the Baidu music player.
In the embodiments of the present invention, the audio processing device 100 may provide an audio interception service module 130 for the audio interception service; the audio interception service module 130 may be set inside the operating system. All audio data sources to be played need to be registered with the audio interception service module 130, and the operating system may provide the API of the audio interception service module 130. An APP with a behavior analysis module can call the audio interception service module 130 through its API to intercept the audio data sources that need to be intercepted. For example, the audio interception service module 130 may maintain a table of identification information of audio data sources that need to be intercepted; when an audio data source needs to be registered with the audio interception service module 130, it determines whether the identification information of the audio data source is in that table; if so, the audio data source is intercepted, and if not, it is not intercepted. The table may be determined according to the identification information carried in the second interception indication information sent by the second application program. When second interception indication information sent by multiple second application programs is received, the table may include the correspondence between the identification information carried in each second interception indication information and the corresponding second application program. When performing the audio interception service, the intercepted audio data source can be sent to the corresponding second application program according to the table, so that the second application program performs behavior analysis.
例如,可在音频处理设备创建音轨模块122时进行音频拦截服务注册,注册信息可以包括音频数据源对应的音轨模块标识、音轨模块中码流类型(StreamType)、进程号(PID)以及音频处理模块标识(ModuleID)等等。
需要说明的是，当音频处理设备包括多个用于确定音频数据源的第一应用程序时，可分别创建与每个第一应用程序对应的音轨模块。
其中,音频处理模块标识能够标识前述音频处理模块,例如,安卓系统中的MediaPlayer、Tone Generator、Sound Pool、AudioPlayer、JetPlayer分别通过不同的ModuleID标识。
音频拦截服务模块130可根据注册信息，拦截需要拦截的音频数据源对应的音轨数据(PCM码流)；情景感知模块等其他行为分析模块140根据音轨数据以及注册信息进行行为分析(例如，根据PCM码流进行语音识别、歌曲识别或者情景感知等等，或进一步根据识别或感知出的信息等等进行综合分析)。其中，情景感知模块等其他行为分析模块140还可结合其他的数据进行综合分析，例如，APP包名，或者根据运动传感器提供的数据分析用户的状态(例如，跑步)，或者其他应用程序提供的行为分析结果(例如，健身APP提供的用户健身的数据)。进一步地，再例如歌曲识别模块根据音轨数据分析出正在播放的为钢琴曲，根据拦截服务注册信息确定APP为音乐播放器，健身APP提供的行为分析结果为用户在慢跑。其他应用程序或者功能模块便可以根据这些信息分析得到综合的结果，根据这些结果可分析用户的行为习惯等信息。
在另一个实施例中，如图3所示，APP210确定播放音频数据源需求后，可创建音轨模块220，APP210可直接对音频数据源进行解码得到PCM码流，利用音轨模块220来进行播放。在本发明实施例中，可不由音频处理设备100上运行的操作系统提供的音频服务器来进行播放。具体地，APP210确定播放音频数据源需求后可直接创建音轨模块220，并同时向音频拦截服务模块130进行注册，在音频播放时，音频拦截服务模块130对需要拦截的音轨数据进行拦截。
下面结合图4以基于Android OS的音频处理设备提供的各个模块为例，对本发明实施例作进一步地介绍。音频处理设备的操作系统包括应用层、Framework层以及硬件抽象层。其中应用层包括操作系统提供的各个API；Framework层包括音频处理模块、音轨模块、混音模块(AudioFlinger)以及音频拦截服务模块；硬件抽象层包括有线耳机、蓝牙、扬声器以及earpiece等硬件的接口。
Android OS提供了MediaPlayer API,AudioTrack API,SoundPool API以及拦截服务API。当然,Android OS还可以提供更多模块的API,本发明实施例仅是以上述API为例进行说明。其中,MediaPlayer,SoundPool以及拦截服务模块为Android OS通过媒体服务模块(MediaServer)提供。
在一个示例中，游戏类APP调用SoundPool API来实现游戏音效的播放。具体地，游戏类APP在运行时会产生音频播放的需求，例如背景音乐或者音效等等，这些背景音乐或音效的文件(音频数据源)一般储存在游戏类APP对应的APP包(该APP包一般包括该APP运行的程序以及数据等)中，游戏类APP一般会在其进程中调用SoundPool API，由SoundPool来对需要播放的音频数据源进行处理。SoundPool在收到游戏类APP的调用请求后，会进行注册，注册需要播放的音频数据源以及对应的APP的进程号等信息，以便执行播放流程。此时，MediaServer会创建AudioTrack用来播放SoundPool传输过来的音频码流，在本发明实施例中，在MediaServer创建AudioTrack的同时，通过MediaServer提供的音频拦截服务模块进行拦截服务注册，注册音频数据源的应用程序的PID，音频数据源的URI或Fd，Stream Type(码流类型)(例如，ALARM(闹铃)或MUSIC(音乐)等等)，处理该音频数据源的处理模块的标识(SoundPool的标识)，还可以注册其他能够对音频数据源进行音频拦截服务的标识信息。其中，该音频数据源对应的音频拦截服务的标识信息可用于行为分析。换句话说，一组音频拦截服务的标识信息能够标识音频处理设备的一次音频播放行为，例如，哪个APP在播放哪个音频(网易云音乐听歌中)。
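一组音频拦截服务的标识信息(PID、URI或Fd、Stream Type、ModuleID)标识一次音频播放行为，可用如下数据结构示意（类型与字段名为本文假设，并非实际实现）：

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InterceptRegistration:
    """示意：拦截服务注册时记录的一组标识信息。"""
    pid: int           # 播放音频的应用程序进程号
    uri_or_fd: str     # 音频数据源的URI或文件描述符
    stream_type: str   # 码流类型，例如 MUSIC、ALARM
    module_id: str     # 处理模块标识，例如 SoundPool

reg = InterceptRegistration(1234, "assets/effect.ogg", "MUSIC", "SoundPool")
print(reg.module_id)  # SoundPool
```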
在另一个示例中,VoIP模块不需要经过音频处理模块处理,直接调用AudioTrack API来实现语音播放。具体地,VoIP模块在运行时会产生音频播放的需求,例如,网络电话通话中,需要播放语音,语音对应的音频数据源为通过网络接收到的VoIP数据,音频处理设备将从网络接收的VoIP数据缓存在本地存储器中,VoIP模块对存储器中的VoIP数据进行解码得到音轨数据。Media Server在创建AudioTrack用来播放VoIP模块传输过来的音轨数据的同时,通过Media Server提供的音频拦截服务模块进行拦截服务注册,注册音频数据源的应用程序的PID,音频数据源的URI或Fd,Stream Type(码流类型),处理该音频数据源的处理模块的标识(AudioTrack的标识)等等。
在又一个示例中，MediaPlayer可以直接用来播放音频数据源，第三方APP也可通过调用MediaPlayer API来实现音频播放。具体地，第三方APP在运行时会产生音频播放的需求，或者用户触发MediaPlayer来播放音频。MediaPlayerService会创建StagefrightPlayer，AwesomePlayer以及AudioTrack用来播放音频数据源。其中，通过解析模块(Extractor)对音频数据源进行解析得到解码器信息，由解码器(OMXCodec)对音频数据源进行解码得到音轨数据。在Media Server创建AudioTrack的同时，通过Media Server提供的音频拦截服务进行拦截服务注册，注册音频数据源的应用程序的PID，音频数据源的统一资源标识符(Uniform Resource Identifier,URI)或文件描述符(file descriptor,Fd)，Stream Type(码流类型)(例如，ALARM(闹铃)或MUSIC(音乐)等等)，处理该音频数据源的处理模块的标识(MediaPlayer的标识)等等。
音频拦截App通过拦截服务API调用音频拦截服务模块，通过音频拦截服务模块对需要拦截的音频数据源进行拦截。具体地，音频拦截App将需要拦截的音频数据源的音频拦截服务的标识信息提供给音频拦截服务模块，由音频拦截服务模块根据注册的音频拦截服务的标识信息，通知AudioTrack将需要拦截的音频数据源对应的音轨数据通过音频拦截服务模块提供给音频拦截App。
音轨模块将音轨数据提供给混音模块由混音模块混音后提供给硬件抽象层,以便进行播放。
通过本发明实施例，音频处理设备可通过提供音频拦截服务模块来实现对音频数据源进行拦截，且在进行拦截时，可通过音频拦截服务标识信息，对有需要的音频数据源进行拦截，可实现有针对性地对用户行为样本进行收集，通过用户行为样本可分析出用户的行为，降低了用户行为分析的难度。
结合上述模块的划分,以对来自APP(第一应用程序)的音频数据源的拦截为例,对本发明实施例做进一步的介绍。图5为本发明实施例提供的信息交互图。如图5所示,本发明实施例具体可以包括如下步骤:
S201,APP确定播放音频数据源需求,向媒体服务模块注册。具体地,需要向媒体服务模块注册APP的PID以及需要播放的音频数据源的URI或Fd等信息。
APP在运行时一般会有播放音频的需求，例如，游戏类APP需要播放音效，音乐类APP需要播放播放列表中的歌曲，视频类APP也需要与视频同步播放音频，或者即时通信APP需要来信提醒等等。APP通过调用媒体服务模块提供的音频处理模块的API可请求到媒体服务模块中的音频处理模块来播放音频(例如，MediaPlayer或SoundPool等等)。音频处理模块收到APP的调用请求后会向媒体服务模块注册，携带该APP的进程号(PID)以及需要播放的音频数据源的URI或Fd等信息。这样媒体服务便可确定需要播放哪个音频数据源。
S202,媒体服务模块对音频数据源进行媒体解析,确定解码器信息。
媒体服务会创建音轨模块(例如,AudioTrack),建立音轨模块与混音模块(例如,AudioFlinger)的IPC(Inter-Process Communication,进程间通信)通道。
S203,媒体服务向音频拦截服务模块(AudioInterceptor)(例如,图1中的音频拦截服务模块130)进行音频拦截服务注册。
可在音轨模块创建时同时向音频拦截服务模块进行音频拦截服务注册。携带PID、StreamType以及ModuleID等用于音频拦截服务的标识信息。
可以理解的,StreamType可以包括:VOICE_CALL(语音通话),SYSTEM(系统声音),RING(来电铃声),MUSIC(音乐),ALARM(闹铃),NOTIFICATION(通知铃声),BLUETOOTH_SCO(蓝牙音频),DTMF(dual-tone multifrequency,双音多频),TTS(Text to Speech,语音合成)等等。
需要说明的是,ModuleID可以标识:ToneGenerator、MediaPlayer、SoundPool、AudioPlayer、以及JetPlayer等音频处理模块和音轨模块。
S204,音频拦截服务模块根据音频拦截服务的标识信息判断是否要拦截音频数据源。
音频拦截服务模块可根据标识信息进行拦截筛选。例如,只拦截特定的标识信息对应的音频数据源。其中,该特定的标识信息可提前设定,也可由行为分析模块设定,通知音频拦截服务模块。
S205,当音频拦截服务模块判断要拦截音频数据源时,向音轨模块发送拦截该音频数据源的指示(第一拦截指示信息)。
S206,媒体服务模块接收APP发送的开始播放指示。
APP可根据自身的需求，向媒体服务模块发送开始播放、暂停播放、终止播放等音频动作指示。媒体服务模块根据上述指示进行工作。
S207,媒体服务模块根据解码器信息,根据拦截该音频数据源的指示对需要拦截的音频数据源进行软解码得到音轨数据。
其中，音频数据源的解码方式可包括软解码和硬解码。对于需要拦截的音频数据源，如果进行硬解码的话无法获取音轨数据，所以可由音频拦截服务模块通知媒体服务模块对需要拦截的音频数据源都进行软解码。另外，对于不需要拦截的音频数据源，音频拦截服务模块可不对其解码方式进行干预。
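上述"需拦截则强制软解码、不需拦截则不干预"的决策可示意如下（函数choose_decoder为本文虚构，仅用于说明逻辑）：

```python
def choose_decoder(need_intercept, hw_decoder_available):
    """示意解码方式的选择：需要拦截时强制软解码以便获取PCM码流；
    不需要拦截时拦截服务不干预，可用硬解码则用硬解码。"""
    if need_intercept:
        return "software"
    return "hardware" if hw_decoder_available else "software"

print(choose_decoder(True, True))   # software：保证能拿到音轨数据
print(choose_decoder(False, True))  # hardware：拦截服务不干预
```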
S208,音轨模块向音频拦截服务模块发送需要拦截的音频数据源解码后的音轨数据(PCM码流)的buffer(副本)。
其中,PCM Buffer到音频拦截服务模块的过程是通过Binder机制写入共享内存,再通知音频拦截服务模块通过共享内存指针读取,该过程可通过进程间通信,也可能是线程间通信来实现。
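S208中"写入共享内存、再通知拦截服务读取"的过程，可用进程内队列近似示意（实际实现依赖Binder机制与共享内存，此处仅为概念性类比，函数名为本文假设）：

```python
import queue

shared_buffer = queue.Queue()  # 近似"共享内存"的缓冲区

def audiotrack_write(pcm_chunk):
    """音轨模块一侧：写入PCM码流的副本(buffer)并通知读取方。"""
    shared_buffer.put(bytes(pcm_chunk))  # 传副本，不影响正常播放链路

def interceptor_read():
    """音频拦截服务一侧：收到通知后从缓冲区读取PCM数据。"""
    return shared_buffer.get()

audiotrack_write(b"\x01\x02\x03\x04")
print(interceptor_read())  # b'\x01\x02\x03\x04'
```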
S209,音频拦截服务模块将音轨数据的buffer发送给行为分析模块,以便行为分析模块根据音轨数据的buffer进行行为分析。例如,通过Speech To Text技术将音轨数据转成文本用于语义分析或歌曲识别模块识别内录音频等等。
S210，媒体服务模块将音轨数据输出到硬件抽象层(Hardware Abstraction Layer,HAL)进行播放。
需要说明的是，S205与S206、S209与S210可以以任意先后顺序或同时执行。
可选地，音频拦截服务模块还可通过下述步骤获取APP包名，提供给行为分析模块进行分析。
S211,音频拦截服务模块向包管理器查询APP包名。例如,音频拦截服务模块可提供PID,包管理器可根据PID查询APP包名。
S212,包管理器返回APP包名。
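S211~S212的包名查询可示意为按PID索引的映射（数据为举例，并非真实进程表）：

```python
PID_TO_PACKAGE = {  # 示例数据：包管理器维护的"进程号 -> APP包名"信息
    1234: "com.baidu.music",
    5678: "com.netease.cloudmusic",
}

def query_package_name(pid):
    """示意PackageManager按进程号返回APP包名，未知进程返回None。"""
    return PID_TO_PACKAGE.get(pid)

print(query_package_name(1234))  # com.baidu.music
print(query_package_name(9999))  # None
```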
另外,音频数据源若不包括音频动作,音频拦截服务模块可从媒体服务模块确定需要拦截的音频数据源对应的音频动作,提供给行为分析模块进行分析。
表1为行为分析模块分析出的结果的一些举例。
表1
Package Name | Module ID | Stream Type | Context
com.netease.cloudmusic | MediaPlayer | MUSIC | 网易云音乐听歌中
com.qiyi.video | AudioPlayer | MUSIC | 爱奇艺看视频
com.autonavi.minimap | | TTS | 高德导航中
com.baidu.music | AudioTrack | MUSIC | 百度音乐播放中
| AudioPlayer | | 视频播放器播放中
| | VOICE_CALL | IP通话中
| | RING | 来电状态
com.android.phone | | DTMF | 拨号中
| ToneGenerator | | 在IP拨号
com.halfbrick.fruitninjahd | JetPlayer | | 水果忍者游戏中
tencent.qqgame.lord | SoundPool | | 玩斗地主
com.htc.task | | ALARM | 正在日程提醒
com.karakal.musicalarm | | ALARM | 青橙听闹钟正在提醒
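表1中"由包名、Module ID与Stream Type推断情景(Context)"的行为分析，可示意为按规则表匹配（规则与数据取自表1，匹配函数为本文虚构，None表示该列任意）：

```python
RULES = [  # (Package Name, Module ID, Stream Type, Context)
    ("com.netease.cloudmusic", "MediaPlayer", "MUSIC", "网易云音乐听歌中"),
    (None, None, "VOICE_CALL", "IP通话中"),
    (None, None, "RING", "来电状态"),
]

def infer_context(package, module_id, stream_type):
    """按顺序匹配第一条命中的规则，返回对应情景；无命中返回None。"""
    for pkg, mod, st, ctx in RULES:
        if (pkg in (None, package) and mod in (None, module_id)
                and st in (None, stream_type)):
            return ctx
    return None

print(infer_context("com.netease.cloudmusic", "MediaPlayer", "MUSIC"))  # 网易云音乐听歌中
print(infer_context("app.voip", "AudioTrack", "VOICE_CALL"))            # IP通话中
```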
通过本发明实施例可以实现：通过音频拦截服务模块对音频数据源进行拦截，结合该音频数据源对应的其他信息，例如播放该音频数据源的APP包名，可实现用户行为分析。且通过获取音轨数据的方式拦截音频数据源，降低了用户行为样本收集的难度，同时也降低了用户行为分析的难度。
需要说明的是，上述实施例中的功能模块以及对应的流程仅为本发明的一种实现方式，并不构成限定。上述实施例中的功能模块可以通过软件与硬件结合的方式实现。
图6,为本发明实施例提供的一种音频处理方法的流程图。具体包括如下步骤:
S610,确定音频数据源,对该音频数据源进行音频拦截服务注册,得到音频拦截服务注册信息,该音频拦截服务注册信息包括对该音频数据源进行音频拦截服务的标识信息。
需要说明的是，音频数据源可以是本地存储器或拓展存储器存储的音频文件或视频文件，还可以是VoIP的音频数据包，或者应用程序从网络侧接收的音频数据包。可通过URI或Fd确定对应的音频数据源，具体过程参见前述图1所示的实施例，不再赘述。
在确定音频数据源后，需要对该音频数据源进行音频拦截服务注册，在注册时，可提供进程号(Process identifier,PID)，还可提供码流类型、API类型、播放方式等等其他能够标识该音频数据源的信息。其中，播放方式可指音频数据源通过哪个播放音频处理模块播放。具体过程参见前述图2、3所示的实施例，不再赘述。
S620,当根据标识信息确定需要拦截音频数据源时,对音频数据源进行软解码,得到音轨数据。
在对音频数据源进行音频拦截服务注册后,便可以根据注册信息中的标识信息来确定是否需要拦截该音频数据源。
对于不同的播放音频处理模块的播放方式,解码的流程也不尽相同。一些音频数据源在不需要拦截时,可走软解码流程,也可走硬解码流程。对于需要拦截的音频数据源,则需走软解码流程。通过对音频数据源进行解码,可得到PCM码流,也就是音轨数据。
另外,还可通过确定需要拦截的音频数据源的标识信息集;
判断当前音频拦截服务注册信息中的标识信息是否在标识信息集中;
当当前音频拦截服务注册信息中的标识信息在标识信息集中时，确定需要拦截该音频数据源。该过程可参见图2所示的实施例中标识信息表的相关描述，不再赘述。
S630,根据音轨数据进行行为分析。
通过对音轨数据进行语音识别,将音轨数据转换成文本,该文本可用于语义分析或歌曲识别模块识别内录音频等等。还可根据音轨数据进行情景感知。
还可以结合注册信息和/或音频数据源的数据包信息等等进行综合分析。
通过本发明实施例，可以实现通过拦截音轨数据实现对音频数据源的拦截，降低了收集用户行为样本的难度，且通过音频拦截服务标识有针对性地对音频数据源进行拦截，资源消耗较少，提高了用户体验。
图7为本发明实施例提供的一种终端设备结构示意图。以终端设备为手机为例，图7示出的是与本发明实施例相关的手机500的部分结构的框图。参考图7，手机500包括RF(Radio Frequency,射频)电路510、存储器520、其他输入设备530、显示屏540、传感器550、音频电路560、I/O子系统570、处理器580、以及电源590等部件。本领域技术人员可以理解，图7中示出的手机结构并不构成对手机的限定，可以包括比图示更多或更少的部件，或者组合某些部件，或者拆分某些部件，或者不同的部件布置。本领域技术人员可以理解显示屏540属于用户界面(UI,User Interface)，且手机500可以包括比图示更多或者更少的用户界面。
下面结合图7对手机500的各个构成部件进行具体的介绍：
RF电路510可用于收发信息或通话过程(例如，VoIP通话)中，信号的接收和发送，特别地，将基站的下行信息(音频数据源)接收后，给处理器580处理；另外，将涉及上行的数据(例如，音频数据源获取请求等等)发送给基站。通常，RF电路包括但不限于天线、至少一个放大器、收发信机、耦合器、LNA(Low Noise Amplifier,低噪声放大器)、双工器等。此外，RF电路510还可以通过无线通信与网络和其他设备通信。该无线通信可以使用任一通信标准或协议，包括但不限于GSM(Global System of Mobile communication,全球移动通讯系统)、GPRS(General Packet Radio Service,通用分组无线服务)、CDMA(Code Division Multiple Access,码分多址)、WCDMA(Wideband Code Division Multiple Access,宽带码分多址)、LTE(Long Term Evolution,长期演进)、电子邮件、SMS(Short Messaging Service,短消息服务)等。
存储器520可用于存储软件程序(例如,音乐播放器、VoIP模块以及操作系统等等)以及数据,处理器580通过运行存储在存储器520的软件程序,从而执行手机500的各种功能以及数据处理。存储器520可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能(例如,音频播放功能以及视频播放功能等)所需的应用程序等;存储数据区可存储根据手机500的使用所创建的数据(比如音频数据以及VoIP通话时长和时间等)等。此外,存储器520可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
其他输入设备530可用于接收输入的数字或字符信息，以及产生与手机500的用户设置以及功能控制有关的键信号输入(例如，开始、暂停、结束、切换、快进以及快退等等)。具体地，其他输入设备530可包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆、光鼠(光鼠是不显示可视输出的触摸敏感表面，或者是由触摸屏形成的触摸敏感表面的延伸)等中的一种或多种。其他输入设备530与I/O子系统570的其他输入设备控制器571相连接，在其他设备输入控制器571的控制下与处理器580进行信号交互。应该知道的是，在本发明实施例中，其他输入设备530可承担与用户的交互，根据其他输入设备530产生的信息，并结合手机500中运行的程序，可获得用户行为样本。
显示屏540可用于显示由用户输入的信息或提供给用户的信息以及手机500的各种菜单(例如，播放列表以及播放进度等等)，还可以接受用户输入。具体的显示屏540可包括显示面板541，以及触控面板542。其中显示面板541可以采用LCD(Liquid Crystal Display,液晶显示器)、OLED(Organic Light-Emitting Diode,有机发光二极管)等形式来配置。触控面板542，也称为触摸屏、触敏屏等，可收集用户在其上或附近的接触或者非接触操作(比如用户使用手指、触笔等任何适合的物体或附件在触控面板542上或在触控面板542附近的操作，也可以包括体感操作)；该操作包括单点控制操作、多点控制操作等操作类型。可选的，触控面板542可包括触摸检测装置和触摸控制器两个部分。其中，触摸检测装置检测用户的触摸方位、姿势，并检测触摸操作带来的信号，将信号传送给触摸控制器；触摸控制器从触摸检测装置上接收触摸信息，并将它转换成处理器能够处理的信息，再送给处理器580，并能接收处理器580发来的命令并加以执行。此外，可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板542，也可以采用未来发展的任何技术实现触控面板542。进一步的，触控面板542可覆盖显示面板541，用户可以根据显示面板541显示的内容(该显示内容包括但不限于，软键盘、虚拟鼠标、虚拟按键、图标等等)，在显示面板541上覆盖的触控面板542上或者附近进行操作，触控面板542检测到在其上或附近的操作后，通过I/O子系统570传送给处理器580以确定用户输入，随后处理器580根据用户输入通过I/O子系统570在显示面板541上提供相应的视觉输出。虽然在图7中，触控面板542与显示面板541是作为两个独立的部件来实现手机500的输入和输出功能，但是在某些实施例中，可以将触控面板542与显示面板541集成而实现手机500的输入和输出功能。需要说明的是，在本发明实施例中，可根据该用户输入获得用户行为样本。
手机500还可包括至少一种传感器550，比如光传感器、运动传感器以及其他传感器。具体地，光传感器可包括环境光传感器及接近传感器，其中，环境光传感器可根据环境光线的明暗来调节显示面板541的亮度，接近传感器可在手机500移动到耳边时，关闭显示面板541和/或背光。作为运动传感器的一种，加速计传感器可检测各个方向上(一般为三轴)加速度的大小，静止时可检测出重力的大小及方向，可用于识别手机姿态的应用(比如横竖屏切换、相关游戏、磁力计姿态校准)、振动识别相关功能(比如计步器、敲击)等；至于手机500还可配置的陀螺仪、气压计、湿度计、温度计、红外线传感器等其他传感器，在此不再赘述。通过传感器550可以确定用户的行为数据，以便进行分析。
音频电路560、扬声器561,麦克风562可提供用户与手机500之间的音频接口。音频电路560可将接收到的音轨数据数模转换后的模拟信号,传输到扬声器561,由扬声器561转换为声音信号输出;另一方面,麦克风562将收集的声音信号转换为模拟信号,由音频电路560接收后模数转换为音频数据,再将音频数据输出至RF电路510以发送给比如另一手机,或者将音频数据输出至存储器520以便进一步处理(例如,进行播放)。
I/O子系统570用来控制输入输出的外部设备,可以包括其他设备输入控制器571、传感器控制器572、显示控制器573。可选的,一个或多个其他输入控制设备控制器571从其他输入设备530接收信号和/或者向其他输入设备530发送信号,其他输入设备530可以包括物理按钮(按压按钮、摇臂按钮等)、拨号盘、滑动开关、操纵杆、点击滚轮、光鼠(光鼠是不显示可视输出的触摸敏感表面,或者是由触摸屏形成的触摸敏感表面的延伸)。值得说明的是,其他输入控制设备控制器571可以与任一个或者多个上述设备连接。所述I/O子系统570中的显示控制器573从显示屏540接收信号和/或者向显示屏540发送信号。显示屏540检测到用户输入后,显示控制器573将检测到的用户输入转换为与显示在显示屏540上的用户界面对象的交互,即实现人机交互。传感器控制器572可以从一个或者多个传感器550接收信号和/或者向一个或者多个传感器550发送信号。
处理器580是手机500的控制中心，利用各种接口和线路连接整个手机的各个部分，通过运行或执行存储在存储器520内的软件程序和/或模块(例如，图1或图3中所示的模块都可运行在处理器580中)，以及调用存储在存储器520内的数据，执行以下步骤：确定音频数据源，对所述音频数据源进行音频拦截服务注册，得到音频拦截服务注册信息，所述音频拦截服务注册信息包括对所述音频数据源进行音频拦截服务的标识信息。当根据所述标识信息确定需要拦截所述音频数据源时，对所述音频数据源进行软解码，得到音轨数据；根据所述音轨数据进行行为分析。可选的，处理器580可包括一个或多个处理单元；优选的，处理器580可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信(移动通信)。可以理解的是，上述调制解调处理器也可以不集成到处理器580中。
手机500还包括给各个部件供电的电源590(比如电池),优选的,电源可以通过电源管理系统与处理器580逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗等功能。
尽管未示出,手机500还可以包括摄像头、蓝牙模块等,在此不再赘述。
本领域的普通技术人员应该还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。本领域的普通技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分步骤是可以通过程序来指令处理器完成,所述的程序可以存储于计算机可读存储介质中,所述存储介质是非短暂性(英文:non-transitory)介质,例如随机存取存储器,只读存储器,快闪存储器,硬盘,固态硬盘,磁带(英文:magnetic tape),软盘(英文:floppy disk),光盘(英文:optical disc)及其任意组合。
以上所述,仅为本发明较佳的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应该以权利要求的保护范围为准。

Claims (14)

  1. 一种音频处理方法,其特征在于,包括:
    确定音频数据源,对所述音频数据源进行音频拦截服务注册,得到音频拦截服务注册信息,所述音频拦截服务注册信息包括对所述音频数据源进行音频拦截服务的标识信息;
    当根据所述标识信息确定需要拦截所述音频数据源时,对所述音频数据源进行软解码,得到音轨数据;
    根据所述音轨数据进行行为分析。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述音轨数据进行行为分析包括根据所述音轨数据以及下述一项或多项进行行为分析:
    操作命令,所述音频数据源对应的音频数据包的名称。
  3. 根据权利要求1所述的方法,其特征在于,所述标识信息包括下述一项或多项:码流类型、进程号和播放方式。
  4. 根据权利要求1-3任意一项所述的方法,其特征在于,所述根据所述标识信息确定需要拦截所述音频数据源包括:
    确定需要拦截的音频数据源的标识信息集;
    判断所述标识信息是否在所述标识信息集中;
    当所述标识信息在所述标识信息集中时,确定需要拦截所述音频数据源。
  5. 一种终端设备,其特征在于,包括:处理器和存储器,存储器用于存储程序,所述处理器通过运行存储器中的程序,用于:
    确定音频数据源,对所述音频数据源进行音频拦截服务注册,得到音频拦截服务注册信息,所述音频拦截服务注册信息包括对所述音频数据源进行音频拦截服务的标识信息;
    当根据所述标识信息确定需要拦截所述音频数据源时,对所述音频数据源进行软解码,得到音轨数据;
    根据所述音轨数据进行行为分析。
  6. 根据权利要求5所述的终端设备,其特征在于,所述根据所述音轨数据进行行为分析包括根据所述音轨数据以及下述一项或多项进行行为分析:
    音频动作进行行为分析,所述音频数据源对应的音频数据包的名称。
  7. 根据权利要求5所述的终端设备,其特征在于,所述标识信息包括下述一项或多项:码流类型、进程号、播放方式。
  8. 根据权利要求5-7任意一项所述的终端设备,所述处理器还用于:
    确定需要拦截的音频数据源的标识信息集;
    判断所述标识信息是否在所述标识信息集中;
    当所述标识信息在所述标识信息集中时,确定需要拦截所述音频数据源。
  9. 一种音频处理设备,其特征在于,所述音频处理设备包括音频拦截服务模块,音轨模块以及行为分析模块;
    所述音频拦截服务模块,用于确定音频拦截服务注册信息,所述音频拦截服务注册信息包括对音频数据源进行音频拦截服务的标识信息;当根据所述标识信息确定需要拦截所述音频数据源时,所述音频拦截服务模块向所述音轨模块发送第一拦截指示信息;
    所述音轨模块,用于接收所述第一拦截指示信息,根据所述第一拦截指示信息的指示将所述音频数据源对应的音轨数据发送给所述音频拦截服务模块;
    所述音频拦截服务模块,还用于将所述音轨模块发送的音轨数据发送给所述行为分析模块;
    所述行为分析模块用于根据所述音轨进行行为分析。
  10. 根据权利要求9所述的设备,其特征在于,
    所述音频拦截服务模块还用于,将所述音频拦截服务注册信息发送给所述行为分析模块;
    所述行为分析模块还用于,根据所述音轨以及音频拦截服务注册信息进行行为分析。
  11. 根据权利要求9或10所述的设备,其特征在于,还包括:操作系统,所述操作系统包括所述音频拦截服务模块。
  12. 根据权利要求11所述的设备,其特征在于,所述操作系统还包括所述音轨模块。
  13. 根据权利要求9-11任意一项所述的设备,其特征在于,还包括第一应用程序,用于确定所述音频数据源,向所述音频拦截服务模块进行音频拦截服务注册。
  14. 根据权利要求9-13所述的设备,其特征在于,还包括第二应用程序,用于向所述音频拦截服务模块发送第二拦截指示信息,所述第二拦截指示信息携带有标识信息;在所述音频数据源进行音频拦截服务注册后,所述第二拦截指示信息用于指示所述音频拦截服务模块对音频拦截服务注册信息中包含于所述第二指示信息携带的标识信息中的音频数据源进行拦截。
PCT/CN2016/098112 2016-09-05 2016-09-05 音频处理方法及设备 WO2018040102A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US16/322,809 US11042587B2 (en) 2016-09-05 2016-09-05 Performing behavior analysis on audio track data to obtain a name of an application
CN201680056198.2A CN108140013B (zh) 2016-09-05 2016-09-05 音频处理方法及设备
PCT/CN2016/098112 WO2018040102A1 (zh) 2016-09-05 2016-09-05 音频处理方法及设备
EP16914667.7A EP3480707A4 (en) 2016-09-05 2016-09-05 AUDIO PROCESSING METHOD AND DEVICE

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/098112 WO2018040102A1 (zh) 2016-09-05 2016-09-05 音频处理方法及设备

Publications (1)

Publication Number Publication Date
WO2018040102A1 true WO2018040102A1 (zh) 2018-03-08

Family

ID=61299874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/098112 WO2018040102A1 (zh) 2016-09-05 2016-09-05 音频处理方法及设备

Country Status (4)

Country Link
US (1) US11042587B2 (zh)
EP (1) EP3480707A4 (zh)
CN (1) CN108140013B (zh)
WO (1) WO2018040102A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114051194A (zh) * 2021-10-15 2022-02-15 赛因芯微(北京)电子科技有限公司 一种音频轨道元数据和生成方法、电子设备及存储介质

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN113766278B (zh) * 2020-08-11 2024-04-12 北京沃东天骏信息技术有限公司 音频播放方法、音频播放装置和音频播放系统
CN112206520B (zh) * 2020-10-21 2022-09-02 深圳市欢太科技有限公司 实时音频采集方法、系统、服务端、客户端及存储介质
CN113778856B (zh) * 2021-07-27 2023-12-08 浙江大学 基于流媒体语义服务器的app检测方法和系统

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101231660A (zh) * 2008-02-19 2008-07-30 林超 电话自然对话中关键信息的挖掘系统及其方法
CN101894114A (zh) * 2009-05-18 2010-11-24 骅讯电子企业股份有限公司 在线信息个性化方法与系统
CN102456344A (zh) * 2010-10-22 2012-05-16 中国电信股份有限公司 基于语音识别技术分析客户行为特征的系统及方法
CN103221948A (zh) * 2010-08-16 2013-07-24 诺基亚公司 用于基于情境感知来执行设备动作的方法和装置
CN105005578A (zh) * 2015-05-21 2015-10-28 中国电子科技集团公司第十研究所 多媒体目标信息可视化分析系统
WO2016025277A1 (en) * 2014-08-11 2016-02-18 Nuance Communications, Inc. Dialog flow management in hierarchical task dialogs

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US8528019B1 (en) * 1999-11-18 2013-09-03 Koninklijke Philips N.V. Method and apparatus for audio/data/visual information
US8463612B1 (en) 2005-11-08 2013-06-11 Raytheon Company Monitoring and collection of audio events
CN102176731A (zh) 2010-12-27 2011-09-07 华为终端有限公司 一种截取音频文件或视频文件的方法及手机
US9619811B2 (en) * 2011-12-20 2017-04-11 Bitly, Inc. Systems and methods for influence of a user on content shared via 7 encoded uniform resource locator (URL) link
RU2530210C2 (ru) * 2012-12-25 2014-10-10 Закрытое акционерное общество "Лаборатория Касперского" Система и способ выявления вредоносных программ, препятствующих штатному взаимодействию пользователя с интерфейсом операционной системы
US9195432B2 (en) * 2013-02-26 2015-11-24 Sonos, Inc. Pre-caching of audio content
CN105335498A (zh) * 2015-10-23 2016-02-17 广东小天才科技有限公司 一种基于语音信息进行信息推荐的方法和系统
RU2628925C1 (ru) * 2016-04-25 2017-08-22 Акционерное общество "Лаборатория Касперского" Система и способ защищенной передачи аудиоданных от микрофона к процессам


Also Published As

Publication number Publication date
US11042587B2 (en) 2021-06-22
EP3480707A4 (en) 2019-08-14
EP3480707A1 (en) 2019-05-08
CN108140013A (zh) 2018-06-08
US20190205338A1 (en) 2019-07-04
CN108140013B (zh) 2020-07-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16914667

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016914667

Country of ref document: EP

Effective date: 20190130

NENP Non-entry into the national phase

Ref country code: DE