WO2016179921A1 - Method, apparatus and device for processing audio promotion information, and non-volatile computer storage medium

Publication number
WO2016179921A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
promotion information
feature
obtaining
original
Prior art date
Application number
PCT/CN2015/087978
Other languages
English (en)
Chinese (zh)
Inventor
田彪
Original Assignee
北京音之邦文化科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京音之邦文化科技有限公司
Publication of WO2016179921A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to audio processing technology, and in particular, to a method, an apparatus, a device and a non-volatile computer storage medium for processing audio promotion information.
  • the presentation of the audio promotion information may be determined based on the text content attribute such as the title and content of the audio promotion information, for example, whether the audio promotion information is displayed, the presentation position, the presentation time, and the like.
  • however, when the presentation is determined only from text content attributes, the audio promotion information may be displayed inaccurately, resulting in a decrease in the conversion rate of the audio promotion information.
  • aspects of the present invention provide a method, an apparatus, and a device for processing audio promotion information, and a non-volatile computer storage medium for improving the conversion rate of audio promotion information.
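  • An aspect of the present invention provides a method for processing audio promotion information, including: obtaining an audio feature of the audio promotion information according to acquired original audio data of the audio promotion information; obtaining a text feature of the audio promotion information according to at least one of the original audio data and the audio feature; and obtaining a presentation of the audio promotion information according to at least one of the audio feature and the text feature.
  • acquiring the original audio data of the audio promotion information includes: collecting the original audio data in real time; or acquiring the audio promotion information and decoding it to obtain the original audio data.
  • a text feature of the audio promotion information is obtained from the original audio data by using a voice recognition technology.
  • the promotion attribute feature comprises at least one of the following features: attribute features of the page displaying the audio promotion information; attribute features of the website to which that page belongs; and attribute features of the user to whom the audio promotion information is pushed.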
  • Another aspect of the present invention provides an apparatus for processing audio promotion information, including:
  • an obtaining unit configured to obtain original audio data of the audio promotion information;
  • an audio unit configured to obtain an audio feature of the audio promotion information according to the original audio data;
  • a mapping unit configured to obtain a text feature of the audio promotion information according to at least one of the original audio data and the audio feature;
  • a presentation unit configured to obtain a presentation of the audio promotion information according to at least one of the audio feature and the text feature.
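  • the acquiring unit is specifically configured to collect the original audio data in real time, or to acquire the audio promotion information and decode it to obtain the original audio data.
  • the mapping unit is specifically configured to obtain a text feature of the audio promotion information by using a pre-established correspondence between audio features and text features according to the audio feature, and/or by using a voice recognition technology according to the original audio data.
  • the promotion attribute feature comprises at least one of the following features: attribute features of the page displaying the audio promotion information; attribute features of the website to which that page belongs; and attribute features of the user to whom the audio promotion information is pushed.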
  • an apparatus comprising:
  • one or more processors;
  • one or more programs, the one or more programs being stored in the memory and, when executed by the one or more processors, performing the method described above.
  • a non-volatile computer storage medium storing one or more programs which, when executed by a device, cause the device to perform the method described above.
  • the embodiment of the present invention obtains an audio feature of the audio promotion information according to the acquired original audio data of the audio promotion information, and then obtains a text feature of the audio promotion information according to at least one of the original audio data and the audio feature, so that the presentation of the audio promotion information can be obtained according to at least one of the audio feature and the text feature.
  • because the presentation no longer depends entirely on the text content attributes of the audio promotion information but also considers its audio features, the attributes of the audio promotion information can be described more accurately, which ensures accurate display of the audio promotion information and thus improves its conversion rate.
  • the audio promotion information can be pushed automatically without manual participation, and therefore the pushing cost of the audio promotion information can be effectively reduced.
  • the technical solution provided by the present invention is simple in operation, and therefore, the efficiency of processing the audio promotion information can be effectively improved.
  • FIG. 1 is a schematic flowchart of a method for processing audio promotion information according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of an apparatus for processing audio promotion information according to another embodiment of the present invention.
  • the terminals involved in the embodiments of the present invention may include, but are not limited to, a mobile phone, a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a personal computer (PC), an MP3 player, an MP4 player, and a wearable device (for example, smart glasses, a smart watch, or a smart bracelet).
  • FIG. 1 is a schematic flowchart of a method for processing audio promotion information according to an embodiment of the present invention.
  • the so-called audio promotion information may refer to a complete audio file, which may be pre-stored in the storage device of the terminal.
  • the audio promotion information may include audio files in any of the various existing encoding formats, for example, a Moving Picture Experts Group (MPEG) Layer-3 (MP3) format audio file, a Windows Media Audio (WMA) format audio file, an Advanced Audio Coding (AAC) format audio file, or an APE format audio file; this embodiment imposes no particular limitation.
  • the storage device of the terminal may be a slow storage device, which may be a hard disk of a computer system, or the non-running memory of a mobile phone, that is, physical storage, for example, a read-only memory (ROM), a memory card, and the like; this embodiment imposes no particular limitation.
  • the storage device of the terminal may also be a fast storage device, which may be the memory of a computer system, or the running memory of a mobile phone, that is, system memory, for example, a random access memory (RAM); this embodiment imposes no particular limitation.
  • part or all of the execution entities of 101 to 104 may be applications located in the local terminal, or plug-ins or software development kits (SDKs) in an application of the local terminal, or a processing engine in a server on the network side, or a distributed system on the network side; this embodiment imposes no particular limitation.
  • the application may be a local application installed on the terminal (nativeApp), or may be a web application (webApp) of a browser on the terminal.
  • This embodiment is not particularly limited.
  • in this way, the audio feature of the audio promotion information is obtained according to the acquired original audio data of the audio promotion information, and the text feature of the audio promotion information is then obtained according to at least one of the original audio data and the audio feature, so that the presentation of the audio promotion information is obtained according to at least one of the audio feature and the text feature. Because the presentation no longer depends entirely on the text content attributes of the audio promotion information but also considers its audio features, the attributes of the audio promotion information can be described more accurately, which ensures accurate display of the audio promotion information and thereby improves its conversion rate.
  • the original audio data may be collected in real time.
  • the sound signal of the audio promotion information may be specifically collected, and then the sound signal is converted into original audio data.
  • the sound signal is sampled, quantized, and encoded to obtain Pulse Code Modulation (PCM) data.
  • the audio promotion information may be specifically acquired, and the audio promotion information is decoded to obtain the original audio data.
  • the original audio data may be obtained by performing decoding processing on the data block of the audio promotion information.
  • the so-called original audio data is a digital signal converted from an audio signal, for example, the audio signal is sampled, quantized, and encoded to obtain PCM data.
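As an illustration of the sampling, quantization and encoding steps mentioned above, the following minimal sketch turns a synthetic sine tone into 16-bit PCM bytes. The function name `sine_to_pcm16`, the 8 kHz sample rate and the 16-bit little-endian layout are assumptions chosen for the example; the source does not specify any of them.

```python
import math
import struct


def sine_to_pcm16(freq_hz, duration_s, sample_rate=8000):
    """Sample a sine tone, quantize it to 16 bits, and encode it as PCM bytes."""
    n_samples = int(duration_s * sample_rate)
    pcm = bytearray()
    for n in range(n_samples):
        # Sampling: evaluate the continuous signal at the instant n / sample_rate.
        value = math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        # Quantization: map [-1.0, 1.0] onto the signed 16-bit integer range.
        quantized = max(-32768, min(32767, int(round(value * 32767))))
        # Encoding: pack the sample as a little-endian signed 16-bit word.
        pcm += struct.pack("<h", quantized)
    return bytes(pcm)


# 10 ms of a 440 Hz tone at 8 kHz: 80 samples, 2 bytes each.
pcm_bytes = sine_to_pcm16(440.0, 0.01)
```

A real terminal would obtain the samples from a microphone or a decoder rather than from `math.sin`; only the quantize-and-pack part carries over.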
  • the obtained original audio data may be the original audio data corresponding to one channel; if the audio promotion information has multiple channels, the subsequent processing, that is, 102 to 104, may be performed separately on the original audio data corresponding to each channel.
  • the number of channels of the audio promotion information may be determined, and the data block of the audio promotion information is decoded to obtain original audio data. Then, the original audio data corresponding to each channel can be obtained according to the number of channels and the original audio data.
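The per-channel split described above can be sketched as follows, assuming the decoded data is interleaved 16-bit little-endian PCM (sample 0 of channel 0, sample 0 of channel 1, and so on). That layout and the helper name `split_channels` are assumptions for illustration; the source does not state how the channels are arranged.

```python
import struct


def split_channels(pcm, num_channels):
    """Split interleaved 16-bit little-endian PCM into one sample list per channel."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    frame_size = 2 * num_channels
    # Ignore a trailing partial frame, if any.
    usable = len(pcm) - len(pcm) % frame_size
    samples = struct.unpack("<%dh" % (usable // 2), pcm[:usable])
    # Every num_channels-th sample, starting at offset ch, belongs to channel ch.
    return [list(samples[ch::num_channels]) for ch in range(num_channels)]


stereo = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
left, right = split_channels(stereo, 2)
```

Each returned list can then be passed through the later steps (102 to 104) independently.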
  • the frame header of the audio promotion information may be parsed to determine the number of channels of the audio promotion information.
  • the file header of the audio promotion information may be parsed to determine the number of channels of the audio promotion information.
  • the other parts of the audio promotion information may be parsed to determine the number of channels of the audio promotion information, which is not specifically limited in this embodiment.
  • the number of channels of the audio promotion information may be obtained from a configuration file.
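As a hedged example of reading the channel count from a file header, the sketch below uses Python's standard `wave` module on a WAV container. WAV is used only because the standard library can parse it; the formats named in the source (MP3, WMA, AAC, APE) would need a suitable decoder, and `channel_count` is a hypothetical helper name.

```python
import io
import wave


def channel_count(wav_bytes):
    """Return the number of channels declared in a WAV file header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        return reader.getnchannels()


# Build a tiny stereo WAV in memory so the example is self-contained.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(2)      # stereo
    writer.setsampwidth(2)      # 16-bit samples
    writer.setframerate(8000)   # 8 kHz
    writer.writeframes(b"\x00\x00" * 2 * 10)  # 10 silent stereo frames

channels = channel_count(buffer.getvalue())
```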
  • the processing device may first perform the step of "determining the number of channels of the audio promotion information" and then the step of "decoding the data block of the audio promotion information to obtain original audio data", or may first perform the step of "decoding the data block of the audio promotion information to obtain original audio data" and then the step of "determining the number of channels of the audio promotion information", or may perform both steps simultaneously.
  • This embodiment is not particularly limited.
  • the original audio data may be subjected to framing to obtain at least one frame of data, and audio analysis processing is then performed on each frame of the at least one frame of data to obtain the audio features of each frame of data.
  • the original audio data may be framed according to a preset time interval, for example, 20 ms, with some data overlap between adjacent frames, for example, 50% overlap, so that at least one frame of data of the original audio data can be obtained.
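The framing described above (20 ms frames with 50% overlap) can be sketched as below. The function name `frame_signal`, the 8 kHz sample rate in the usage lines, and the choice to drop a trailing partial frame are assumptions for the example.

```python
def frame_signal(samples, sample_rate, frame_ms=20, overlap=0.5):
    """Cut a sample sequence into fixed-length frames that overlap by `overlap`."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("frame length must cover at least one sample")
    # With 50% overlap the next frame starts half a frame later.
    hop = max(1, int(frame_len * (1.0 - overlap)))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


samples = list(range(800))           # 100 ms of placeholder samples at 8 kHz
frames = frame_signal(samples, 8000)  # 160-sample frames, 80-sample hop
```

With these values each frame holds 160 samples and consecutive frames share 80 of them.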
  • the audio feature may include, but is not limited to, at least one of a time domain audio feature of the original audio data and a frequency domain audio feature of the original audio data; this embodiment imposes no particular limitation.
  • the time domain audio feature of the original audio data may include at least one of the following parameters:
  • Time domain waveform intensity, zero-crossing rate, Linear Prediction Coding (LPC) coefficient, Linear Prediction Cepstrum Coefficient (LPCC), Mel Frequency Cepstrum Coefficient (MFCC) or Perceptual Linear Predictive (PLP) coefficients, beats, tones, and tonality.
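Two of the listed parameters are simple enough to sketch directly on one frame of samples. The example below computes the zero-crossing rate and uses root-mean-square amplitude as one possible reading of "time domain waveform intensity"; that reading, and both function names, are assumptions rather than definitions taken from the source.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_intensity(frame):
    """Root-mean-square amplitude of the frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


alternating = [3, -3, 3, -3]
zcr = zero_crossing_rate(alternating)
intensity = rms_intensity(alternating)
```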
  • the frequency domain audio features of the original audio data may include, but are not limited to, spectrum information of the original audio data.
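One way to obtain spectrum information for a frame is a discrete Fourier transform. The sketch below is a deliberately naive O(N²) DFT written with the standard library only, so it is suitable for illustration rather than production use; the source does not say which transform is used, and `magnitude_spectrum` is a hypothetical name.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 for a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    if n == 0:
        return []
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


# A sine with exactly 4 cycles in 64 samples should peak in bin 4.
n = 64
tone = [math.sin(2.0 * math.pi * 4 * i / n) for i in range(n)]
mags = magnitude_spectrum(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
```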
  • the text feature of the audio promotion information may be obtained by using a correspondence between the pre-established audio feature and the text feature according to the audio feature.
  • the so-called text feature may be any textual description of the audio promotion information, for example:
  • the audio promotion information has a fast rhythm
  • the audio promotion information has a slow rhythm
  • the audio promotion information has a high sound quality
  • the audio promotion information has a low sound quality.
  • the sound quality of the so-called audio promotion information refers to the fidelity of the original audio data after the compression processing.
  • a beat threshold, expressed for example in beats per minute (BPM), may be preset as a representation of the correspondence between audio features and text features. If the obtained beat is less than or equal to the beat threshold, it may be mapped to a text feature indicating relief; conversely, if the obtained beat is greater than the beat threshold, it may be mapped to a text feature indicating joy.
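The beat-threshold mapping can be sketched as follows. The threshold value of 100 BPM, the helper `bpm_from_beat_times` (which assumes beat timestamps are already available), and the label strings are hypothetical choices for illustration; the source only states the comparison rule.

```python
def bpm_from_beat_times(beat_times_s):
    """Estimate beats per minute from a list of beat timestamps in seconds."""
    if len(beat_times_s) < 2:
        raise ValueError("need at least two beats")
    span = beat_times_s[-1] - beat_times_s[0]
    if span <= 0:
        raise ValueError("beat times must be increasing")
    return 60.0 * (len(beat_times_s) - 1) / span


def rhythm_text_feature(bpm, beat_threshold=100.0):
    """Map a beat value to a text feature using a preset beat threshold."""
    # At or below the threshold maps to "relief"; above it maps to "joy".
    return "relief" if bpm <= beat_threshold else "joy"


bpm = bpm_from_beat_times([0.0, 0.5, 1.0, 1.5])  # one beat every half second
label = rhythm_text_feature(bpm)
```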
  • as another example, a correspondence may be pre-established between a time domain waveform without clipping distortion and the text feature indicating high sound quality, and between a time domain waveform with clipping distortion and the text feature indicating low sound quality. If the obtained time domain waveform has no clipping distortion, it can be mapped to the text feature indicating high sound quality; conversely, if the obtained time domain waveform has clipping distortion, it can be mapped to the text feature indicating low sound quality.
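A simple heuristic for the clipping check is to look for a run of consecutive samples stuck at full scale. The run length of three, the 16-bit full-scale value and the function names below are assumptions for the sketch; the source does not describe how clipping distortion is detected.

```python
def has_clipping(samples, full_scale=32767, min_run=3):
    """Return True if at least `min_run` consecutive samples sit at full scale."""
    run = 0
    for s in samples:
        if abs(s) >= full_scale:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def quality_text_feature(samples):
    """Map the clipping check to a sound-quality text feature."""
    return "low sound quality" if has_clipping(samples) else "high sound quality"


clean = [0, 1000, -1000, 20000, -20000]
clipped = [0, 32767, 32767, 32767, 0]
clean_label = quality_text_feature(clean)
clipped_label = quality_text_feature(clipped)
```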
  • a pre-specified training sample set may be used to perform training to construct a learning model, which is used to describe a correspondence between an audio feature and a text feature.
  • the training samples included in the training sample set may all be labeled known samples, so that the known samples can be used directly for training to construct the learning model; or some may be labeled known samples and the others unlabeled unknown samples, in which case the known samples are first used for training to construct an initial learning model, and the initial learning model is then used to recognize the unknown samples to obtain recognition results.
  • the unknown samples can then be labeled according to their recognition results to form newly added known samples, and the learning model can be retrained with the newly added known samples and the original known samples until a stop condition is satisfied, for example, the recognition accuracy is greater than or equal to a preset accuracy threshold or the number of known samples is greater than or equal to a preset number threshold; this embodiment imposes no particular limitation.
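The label-and-retrain loop can be sketched with a very small stand-in model. The example below uses a nearest-centroid classifier over feature vectors and stops on the number-of-known-samples condition; the classifier, the batch size and all function names are assumptions made for illustration, since the source leaves the learning model unspecified.

```python
import math


def train_centroids(samples):
    """Train a nearest-centroid model from (feature_vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def self_train(known, unknown, number_threshold, batch_size=2):
    """Label unknown samples with the current model and retrain until enough are known."""
    known = list(known)
    pending = list(unknown)
    model = train_centroids(known)  # initial learning model
    while pending and len(known) < number_threshold:
        batch, pending = pending[:batch_size], pending[batch_size:]
        # Recognition results become labels for the newly added known samples.
        known.extend([(vec, predict(model, vec)) for vec in batch])
        model = train_centroids(known)  # retrain on old plus new known samples
    return model, known


known = [([60.0], "relief"), ([70.0], "relief"), ([140.0], "joy"), ([150.0], "joy")]
unknown = [[65.0], [145.0], [80.0]]
model, all_known = self_train(known, unknown, number_threshold=7)
```

Here each feature vector holds a single beat value; a practical model would use richer audio features and would also track recognition accuracy as a stop condition.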
  • a text feature of the audio promotion information may be obtained by using a voice recognition technology according to the original audio data.
  • the specific voice recognition technology may be any existing technology, as long as the specific keyword can be identified as the text feature of the audio promotion information, and details are not described herein again.
  • the text feature of the audio promotion information may be obtained both by using the pre-established correspondence between audio features and text features according to the audio feature, and by using a voice recognition technology according to the original audio data.
  • that is, the technical solutions of the foregoing two implementation manners may be combined to obtain the text features of the audio promotion information.
  • a matching degree between the promotion attribute feature and at least one of the audio feature and the text feature may be calculated as the presentation score of the audio promotion information, and the presentation of the audio promotion information may then be obtained according to the presentation score.
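The source does not say how the matching degree is computed. As one plausible sketch, the example below represents both the promotion attribute feature and the audio/text features as weighted term dictionaries and uses cosine similarity as the presentation score; the similarity measure, the 0.3 threshold and the names `cosine_similarity` and `presentation` are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def presentation(promotion_attributes, audio_text_features, score_threshold=0.3):
    """Compute a presentation score and decide whether to show the promotion."""
    score = cosine_similarity(promotion_attributes, audio_text_features)
    return {"score": score, "show": score >= score_threshold}


page = {"game": 1.0, "teenagers": 1.0}   # page and target-user attributes
lively_ad = {"game": 1.0, "joy": 1.0}    # features derived from the audio
calm_ad = {"news": 1.0, "relief": 1.0}

lively_result = presentation(page, lively_ad)
calm_result = presentation(page, calm_ad)
```

The score could equally drive the presentation position or time rather than a yes/no decision, as the surrounding text allows.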
  • the so-called promotion attribute feature can be described by the topic model of this promotion.
  • the topic model is a modeling method for implicit topics in text, audio, and so on.
  • for example, the word "Apple" involves both the topic of the company Apple and the topic of fruit.
  • the promotion attribute feature may include, but is not limited to, at least one of the following features:
  • Attribute characteristics of a page displaying audio promotion information such as a shopping page, a game page, a news page, and the like;
  • the attribute characteristics of the website to which the page displaying the audio promotion information belongs such as a shopping website, a game website, a news website, etc.;
  • attribute characteristics of the user to whom the audio promotion information is pushed, such as teenagers, seniors, and the like.
  • RTB Real Time Bidding
  • in this embodiment, the audio feature of the audio promotion information is obtained according to the acquired original audio data of the audio promotion information, and the text feature of the audio promotion information is then obtained according to at least one of the original audio data and the audio feature, so that the presentation of the audio promotion information can be obtained according to at least one of the audio feature and the text feature. Because the presentation no longer depends entirely on the text content attributes of the audio promotion information but also considers its audio features, the attributes of the audio promotion information can be described more accurately, which ensures accurate display of the audio promotion information and thereby improves its conversion rate.
  • the audio promotion information can be pushed automatically without manual participation, and therefore the pushing cost of the audio promotion information can be effectively reduced.
  • the technical solution provided by the present invention is simple in operation, and therefore, the efficiency of processing the audio promotion information can be effectively improved.
  • the processing apparatus of the audio promotion information of the embodiment may include an acquisition unit 21, an audio unit 22, a mapping unit 23, and a presentation unit 24.
  • the obtaining unit 21 is configured to obtain original audio data of the audio promotion information; the audio unit 22 is configured to obtain an audio feature of the audio promotion information according to the original audio data; the mapping unit 23 is configured to obtain a text feature of the audio promotion information according to at least one of the original audio data and the audio feature; and the presentation unit 24 is configured to obtain a presentation of the audio promotion information according to at least one of the audio feature and the text feature.
  • part or all of the processing apparatus for audio promotion information provided by this embodiment may be an application located in the local terminal, or a plug-in or software development kit (SDK) in an application of the local terminal, or a processing engine in a server on the network side, or a distributed system on the network side; this embodiment imposes no particular limitation.
  • the application may be a local application (nativeApp) installed on the terminal, or may be a web application (webApp) of the browser on the terminal, which is not specifically limited in this embodiment.
  • the acquiring unit 21 may be specifically configured to collect the original audio data in real time.
  • the acquiring unit 21 may be specifically configured to acquire the audio promotion information, and perform decoding processing on the audio promotion information to obtain the original audio data.
  • the mapping unit 23 may be specifically configured to obtain a text feature of the audio promotion information by using a pre-established correspondence between audio features and text features according to the audio feature; and/or to obtain a text feature of the audio promotion information by using a voice recognition technology according to the original audio data.
  • the presentation unit 24 may be specifically configured to calculate a matching degree between the promotion attribute feature and at least one of the audio feature and the text feature as a presentation score of the audio promotion information, and to obtain a presentation of the audio promotion information according to the presentation score.
  • the promotion attribute feature may include, but is not limited to, at least one of the following features:
  • Attribute characteristics of a page displaying audio promotion information such as a shopping page, a game page, a news page, and the like;
  • the attribute characteristics of the website to which the page displaying the audio promotion information belongs such as a shopping website, a game website, a news website, etc.;
  • attribute characteristics of the user to whom the audio promotion information is pushed, such as teenagers, seniors, and the like.
  • in this embodiment, the audio unit obtains the audio feature of the audio promotion information according to the original audio data acquired by the acquiring unit, and the mapping unit then obtains the text feature of the audio promotion information according to at least one of the original audio data and the audio feature, so that the presentation unit can obtain the presentation of the audio promotion information according to at least one of the audio feature and the text feature. Because the presentation no longer depends entirely on the text content attributes of the audio promotion information but also considers its audio features, the attributes of the audio promotion information can be described more accurately, which ensures accurate display of the audio promotion information and thereby improves its conversion rate.
  • the audio promotion information can be pushed automatically without manual participation, and therefore the pushing cost of the audio promotion information can be effectively reduced.
  • the technical solution provided by the present invention is simple in operation, and therefore, the efficiency of processing the audio promotion information can be effectively improved.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the above-described integrated unit implemented in the form of a software functional unit can be stored in a computer readable storage medium.
  • the software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an audio processing engine, or a network device, etc.) or a processor to perform part of the steps of the methods of the embodiments of the present invention.
  • the foregoing storage medium includes: a USB flash drive, a mobile hard disk, a read-only memory (ROM), and a random access memory (RAM).

Abstract

The present invention relates to a method, an apparatus and a device for processing audio promotion information, and a non-volatile computer storage medium. The method comprises the following steps: obtaining an audio feature of the audio promotion information according to acquired original audio data of the audio promotion information (102); obtaining a text feature of the audio promotion information according to at least one of the original audio data and the audio feature (103); and obtaining a presentation of the audio promotion information according to at least one of the audio feature and the text feature (104). The presentation of the audio promotion information does not depend entirely on the text content attribute of the audio promotion information but also takes into account the audio features of the audio promotion information, which can describe the attribute of the audio promotion information more accurately, so that accurate presentation of the audio promotion information is ensured and the conversion rate of the audio promotion information is increased.
PCT/CN2015/087978 2015-05-12 2015-08-25 Method, apparatus and device for processing audio promotion information, and non-volatile computer storage medium WO2016179921A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510237646.6A CN104882146B (zh) 2015-05-12 2015-05-12 音频推广信息的处理方法及装置
CN201510237646.6 2015-05-12

Publications (1)

Publication Number Publication Date
WO2016179921A1 true WO2016179921A1 (fr) 2016-11-17

Family

ID=53949614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/087978 WO2016179921A1 (fr) 2015-05-12 2015-08-25 Method, apparatus and device for processing audio promotion information, and non-volatile computer storage medium

Country Status (2)

Country Link
CN (1) CN104882146B (fr)
WO (1) WO2016179921A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919662B (zh) * 2017-02-14 2021-08-31 复旦大学 一种音乐识别方法及系统
CN107808305A (zh) * 2017-09-28 2018-03-16 百度在线网络技术(北京)有限公司 信息流推广信息的推广实况实现方法、装置及存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1582444A (zh) * 1999-12-30 2005-02-16 诺基亚有限公司 选择性媒体流广告技术
CN101034455A (zh) * 2006-03-06 2007-09-12 腾讯科技(深圳)有限公司 一种实现在线广告的方法及系统
WO2007133754A2 (fr) * 2006-05-12 2007-11-22 Owl Multimedia, Inc. Procédé et système de recherche d'informations musicales
CN102254265A (zh) * 2010-05-18 2011-11-23 北京首家通信技术有限公司 一种富媒体互联网广告内容匹配、效果评估方法
US20130339343A1 (en) * 2012-06-18 2013-12-19 Ian Paul Hierons Systems and methods to facilitate media search
CN103631802A (zh) * 2012-08-24 2014-03-12 腾讯科技(深圳)有限公司 歌曲信息检索方法、装置及相应的服务器
CN103685520A (zh) * 2013-12-13 2014-03-26 深圳Tcl新技术有限公司 基于语音识别的歌曲推送的方法和装置
CN103853778A (zh) * 2012-12-04 2014-06-11 大陆汽车投资(上海)有限公司 音乐标签信息更新、音乐推送的方法及相应装置、系统

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111818225A (zh) * 2020-06-30 2020-10-23 深圳传音控股股份有限公司 音频数据的处理方法、终端设备及存储
CN112863518A (zh) * 2021-01-29 2021-05-28 深圳前海微众银行股份有限公司 一种语音数据主题识别的方法及装置
CN112863518B (zh) * 2021-01-29 2024-01-09 深圳前海微众银行股份有限公司 一种语音数据主题识别的方法及装置

Also Published As

Publication number Publication date
CN104882146B (zh) 2018-05-15
CN104882146A (zh) 2015-09-02

Similar Documents

Publication Publication Date Title
US11132172B1 (en) Low latency audio data pipeline
US10614803B2 (en) Wake-on-voice method, terminal and storage medium
US20200402500A1 (en) Method and device for generating speech recognition model and storage medium
WO2020173134A1 (fr) Procédé et dispositif de synthèse vocale fondée sur un mécanisme d'attention
TWI711967B (zh) 播報語音的確定方法、裝置和設備
WO2017084360A1 (fr) Procédé et système de reconnaissance vocale
JP2019527371A (ja) 声紋識別方法及び装置
WO2017031846A1 (fr) Procédé, appareil et dispositif d'élimination de bruit et de reconnaissance vocale, et support d'informations non volatil pour ordinateur
CN103943104B (zh) 一种语音信息识别的方法及终端设备
US20170092261A1 (en) System and method for crowd-sourced data labeling
US20240021202A1 (en) Method and apparatus for recognizing voice, electronic device and medium
WO2020237769A1 (fr) Procédé d'évaluation de pureté d'accompagnement et dispositif associé
WO2016179921A1 (fr) Procédé, appareil et dispositif de traitement d'informations de vulgarisation audio, et support de stockage informatique non volatile
Bahat et al. Self-content-based audio inpainting
WO2022178969A1 (fr) Procédé et appareil de traitement de données vocales de conversation, dispositif informatique et support de stockage
WO2021259300A1 (fr) Procédé et appareil d'ajout d'effet sonore, support de stockage et dispositif électronique
JP2008170820A (ja) コンテンツ提供システム及び方法
CN108877779B (zh) 用于检测语音尾点的方法和装置
US20230127787A1 (en) Method and apparatus for converting voice timbre, method and apparatus for training model, device and medium
WO2021227308A1 (fr) Procédé et appareil de génération de ressource vidéo
JP2023507889A (ja) オーディオ相互作用における感情検出
CN112116903A (zh) 语音合成模型的生成方法、装置、存储介质及电子设备
WO2023169258A1 (fr) Procédé et appareil de détection audio, support de stockage et dispositif électronique
CN107680584B (zh) 用于切分音频的方法和装置
CN114598933B (zh) 一种视频内容处理方法、系统、终端及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15891624

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.04.2018)

122 Ep: pct application non-entry in european phase

Ref document number: 15891624

Country of ref document: EP

Kind code of ref document: A1