WO2014079324A1 - Voice interaction method and apparatus - Google Patents

Voice interaction method and apparatus

Info

Publication number
WO2014079324A1
WO2014079324A1 (PCT/CN2013/086734)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
picture material
keyword
file
interaction
Prior art date
Application number
PCT/CN2013/086734
Other languages
English (en)
French (fr)
Inventor
周彬
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2014079324A1 publication Critical patent/WO2014079324A1/zh
Priority to US14/719,981 priority Critical patent/US9728192B2/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/22 - Interactive procedures; Man-machine interfaces
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L2015/0638 - Interactive procedures
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L2015/088 - Word spotting

Definitions

  • The present invention relates to the field of information processing technologies, and in particular to a voice interaction method and apparatus.
  • Background of the invention: in current picture display methods, the picture material is usually provided directly by the picture provider itself, and the picture is actively displayed on the network in one direction only.
  • Embodiments of the present invention provide a voice interaction method to improve the success rate of interaction.
  • Embodiments of the present invention also provide a voice interaction device to improve the success rate of interaction.
  • Embodiments of the present invention further provide a mobile terminal to improve the success rate of interaction.
  • The specific schemes of the embodiments of the present invention are as follows:
  • A voice interaction method, which sets a correspondence between picture material movement commands and interaction keywords, the method further including:
  • displaying the picture material; recording a user voice file, and analyzing the user voice file to parse out an interaction keyword; determining the picture material movement command corresponding to the parsed interaction keyword, and controlling the movement of the picture material based on the determined picture material movement command.
  • A voice interaction device, including one or more processors and a memory, where the memory contains a plurality of units executable by the one or more processors, the plurality of units including: a correspondence setting unit, a picture material display unit, an interaction keyword parsing unit, and a picture material moving unit, wherein:
  • the correspondence setting unit is configured to set the correspondence between picture material movement commands and interaction keywords;
  • the picture material display unit is configured to display the picture material;
  • the interaction keyword parsing unit is configured to record a user voice file and analyze the user voice file to parse out an interaction keyword;
  • the picture material moving unit is configured to determine the picture material movement command corresponding to the parsed interaction keyword, and to control the movement of the picture material based on the determined picture material movement command.
  • A mobile terminal, including one or more processors and a memory, where the memory contains a plurality of units executable by the one or more processors, the plurality of units including: a display unit, a voice recording unit, and a computing unit, wherein:
  • the display unit is configured to display the picture material;
  • the voice recording unit is configured to record a user voice file;
  • the computing unit is configured to save the correspondence between picture material movement commands and interaction keywords, analyze the user voice file to parse out an interaction keyword, determine the picture material movement command corresponding to the parsed interaction keyword, and control the movement of the picture material based on the determined picture material movement command.
  • In summary, the embodiments set the correspondence between picture material movement commands and interaction keywords; display the picture material; record a user voice file and analyze it to parse out an interaction keyword; determine the picture material movement command corresponding to the parsed keyword; and control the movement of the picture material based on the determined command.
  • As a result, the picture-browsing audience can control the movement of the picture material by voice, can therefore interact effectively with the picture material, and the success rate of interaction is improved.
  • Moreover, by controlling the picture material through sensed user voice, the embodiments of the present invention also increase the exposure of the picture material, which further improves its delivery effect.
  • FIG. 1 is a flowchart of a voice interaction method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a picture material of a car type according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of interaction movement of picture materials of a car type according to an embodiment of the present invention.
  • FIG. 4 is a structural diagram of a voice interaction apparatus according to an embodiment of the present invention.
  • FIG. 5 is a structural diagram of another voice interaction apparatus according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of another mobile terminal according to an embodiment of the present invention.
  • In current picture display methods, the picture material is usually provided directly by the picture provider itself, and the picture is actively displayed on the network in one direction only.
  • This kind of display does not take the individual participation of the picture audience into account; it is only a one-sided picture display that lacks effective interaction with the picture-browsing audience, so the success rate of interaction is very low.
  • To this end, an embodiment of the present invention provides a voice interaction method.
  • In embodiments of the present invention: set the correspondence between picture material movement commands and interaction keywords; display the picture material; record a user voice file and analyze it to parse out an interaction keyword; determine the picture material movement command corresponding to the parsed interaction keyword, and control the movement of the picture material based on the determined command.
  • FIG. 1 is a flow chart of a voice interaction method according to an embodiment of the present invention.
  • As shown in FIG. 1, the method includes:
  • Step 101: Set the correspondence between picture material movement commands and interaction keywords.
  • Here, a picture material movement command is used to control the movement of the picture material. Correspondences with interaction keywords can be set for various commands, such as a picture material acceleration command, a deceleration command, a start command, a stop command, a movement speed hold command, or a movement trajectory command.
  • When the user's voice contains an interaction keyword, the movement of the picture material can be controlled based on the picture material movement command corresponding to that keyword. For example (a minimal mapping sketch follows this list):
  • the interaction keyword "start" can be set to correspond to the picture material start command;
  • the interaction keyword "stop" to the picture material stop command;
  • the interaction keyword "acceleration" to the picture material acceleration command;
  • the interaction keyword "deceleration" to the picture material deceleration command;
  • the interaction keyword "curve" to the command that sets the picture material movement trajectory to a curve;
  • the interaction keyword "straight line" to the command that sets the picture material movement trajectory to a straight line; and so on.
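  • To make the table concrete, here is a minimal sketch (in Python) of how such a keyword-to-command correspondence could be modeled and dispatched. It is an illustration only, not the patent's implementation; the names `Command`, `COMMANDS`, and `dispatch` are hypothetical.

        from enum import Enum, auto

        class Command(Enum):
            START = auto()
            STOP = auto()
            ACCELERATE = auto()
            DECELERATE = auto()
            TRAJECTORY_CURVE = auto()
            TRAJECTORY_LINE = auto()

        # Step 101: correspondence between interaction keywords and
        # picture material movement commands (hypothetical values).
        COMMANDS = {
            "start": Command.START,
            "stop": Command.STOP,
            "acceleration": Command.ACCELERATE,
            "deceleration": Command.DECELERATE,
            "curve": Command.TRAJECTORY_CURVE,
            "straight line": Command.TRAJECTORY_LINE,
        }

        def dispatch(keyword: str) -> Command | None:
            """Step 103: map a parsed interaction keyword to its movement command."""
            return COMMANDS.get(keyword.lower())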
  • In one embodiment, the correspondence between picture material movement speeds and interaction keywords may be saved on the mobile terminal.
  • Mobile terminals may include, but are not limited to: feature phones, smart phones, palmtop computers, personal computers (PCs), tablets, or personal digital assistants (PDAs), and the like.
  • The mobile terminal can run any smart terminal operating system, including but not limited to: Android, Palm OS, Symbian, Windows Mobile, Linux, iPhone OS, BlackBerry OS 6.0, the Windows Phone series, and so on.
  • Preferably, the mobile terminal may adopt the Android operating system, with any Android version, including but not limited to: Astro Boy (Android Beta), Clockwork Robot (Android 1.0), Cupcake (Android 1.5), Donut (Android 1.6), Muffin (Android 2.0/2.1), Frozen Yogurt (Android 2.2), Gingerbread (Android 2.3), Honeycomb (Android 3.0), Ice Cream Sandwich (Android 4.0), Jelly Bean (Android 4.1), and other versions.
  • Specific Android versions are listed above; those skilled in the art will appreciate that the embodiments of the present invention are not limited to the listed versions and can also be applied to any other version based on the Android software architecture.
  • Note that after the correspondence has been set, the next time the user controls the picture material by voice, step 102 can be executed directly and step 101 need not be repeated.
  • Step 102: Display the picture material, record a user voice file, and analyze the user voice file to parse out the interaction keyword.
  • Here, the picture audience, the picture provider, or any third party can upload the picture material to a server on the network side through various information transmission methods; the mobile terminal then obtains the picture material from the server and displays it.
  • The specific content of the picture material depends on what is to be presented to the user. For example, to push information about a brand of car, a physical model image of that car can be uploaded; to push information about an electronic product, a physical model image of that product can be uploaded; and so on.
  • A background image corresponding to the picture material can also be displayed on the mobile terminal.
  • The background image and the picture material can each be of various kinds: a bitmap; a Joint Photographic Experts Group (JPEG) image; a Tagged Image File Format (TIFF) image; a Graphics Interchange Format (GIF) image; a Portable Network Graphics (PNG) image; a three-dimensional image; and so on.
  • The data of a GIF file is compressed, using a variable-length compression algorithm.
  • Another feature of the GIF format is that multiple color images can be stored in one GIF file; if the images stored in a single file are read out one by one and displayed on the screen, they form a simple animation.
  • In embodiments of the present invention, joint display can be realized by superimposing the picture material onto the background image.
  • For example, FIG. 2 is a schematic view of picture material of a car type according to an embodiment of the present invention.
  • In FIG. 2, the picture material is a car model and the background image is a house.
  • The car model is shown fused together with the house, and the car model rests inside the house.
  • The information push audience, the picture provider, or any third party can also upload the picture material in a variety of ways.
  • For example, the picture material can be uploaded to the server directly from the mobile terminal by the picture audience over a wireless Internet connection, or uploaded by the picture provider from a personal computer (PC), and so on.
  • Preferably, the mobile terminal can obtain the background image and the picture material from the server through the Common Gateway Interface (CGI), and display the background image and the picture material in the browser window of the mobile terminal.
  • The mobile terminal browser is a browser running on the mobile terminal, which can browse Internet content through various access methods such as the General Packet Radio Service (GPRS).
  • Some mobile terminal browsers require support from JAVA or from the mobile terminal system (such as Apple's iOS or the Android platform).
  • The server may provide the background image to the mobile terminal, or the background image may be pre-stored locally on the mobile terminal. Saving background images on the server is preferable, because the server can have far more storage space than the mobile terminal and can therefore hold a large library of background images.
  • For example, background images may include: blue sky, white clouds, bridges, highways, and so on.
  • In one embodiment, in addition to the picture material itself, the picture audience, the picture provider, or any third party further uploads image attribute information describing the type of the picture material.
  • The image attribute information may be text; the type can also be described directly through the naming of the picture material.
  • The server can determine the type of the picture material based on the image attribute information, and retrieve a background image that matches the picture material.
  • For example, if the picture material is an information push for a car, the server can retrieve a background image suitable for a car (such as a racetrack); if the picture material is an information push for an electronic product, the server can retrieve a background image suitable for an electronic product (such as a desk).
  • Here, when the server provides the background image, it may first send the uploaded picture material and its own stored background image to the mobile terminal, preferably together with order information and/or advertisement slot information, for the mobile terminal to display accordingly.
  • The background image and the picture material are displayed together on the mobile terminal to realize a fused display.
  • Preferably, the picture material is displayed above or in front of the background image.
  • The user can speak while browsing the picture material, or after browsing it.
  • At this point, the user voice file is recorded and analyzed to parse out the interaction keyword.
  • The user voice file can be analyzed with various speech recognition technologies to resolve the interaction keywords.
  • Speech recognition technology mainly involves feature extraction, pattern matching criteria, and model training.
  • Many kinds of speech recognition can be used in embodiments of the present invention, such as continuous speech recognition, keyword spotting, speaker identification, speaker verification, speech synthesis, and audio retrieval. More specifically, continuous speech recognition may employ hidden Markov models, and embodiments of the present invention may also employ various speech recognition algorithms such as dynamic time warping, neural networks, support vector machines, and vector quantization.
  • In a concrete implementation, the various speech recognition technologies can be embedded into the browser windows of various terminals through built-in plug-ins or interfaces, so that the browser window itself has the corresponding speech recognition capability.
  • For example, the voice file input by the user can be converted into a text file, and the text file compared against text-format keywords in a database; if the match succeeds, the interaction keyword is determined. Alternatively, the waveform of the user's voice file can be compared against interaction keywords stored in voice format; if the waveforms are consistent, the interaction keyword is likewise determined.
  • In one embodiment, a voice training file and a text training file may first be acquired and used to estimate speech parameters for a speech recognizer in a speaker-adaptive manner; the speech recognizer with the estimated parameters then recognizes the user voice file, converting it into a text file, and the interaction keyword is retrieved from the text file.
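  • As a rough sketch of this text-matching route (recognize speech into text, then search the transcript for known keywords), the following assumes the recognizer is available as a black-box function returning a transcript string; longer keywords are tried first so that "straight line" is preferred over any shorter keyword it contains.

        def parse_interaction_keyword(transcript: str, keywords: list[str]) -> str | None:
            """Return the first (longest) known interaction keyword found in the transcript."""
            for kw in sorted(keywords, key=len, reverse=True):
                if kw in transcript.lower():
                    return kw
            return None

        # Hypothetical usage; `recognize(voice_file)` stands in for any speech
        # recognizer that converts the recorded voice file into text:
        # keyword = parse_interaction_keyword(recognize(voice_file), list(COMMANDS))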
  • In another embodiment, the voice waveform of the user voice file may be determined; it is then judged whether this waveform contains a region consistent with the voice waveform of an interaction keyword, and if so, the interaction keyword is determined from the matching keyword waveform.
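  • The waveform route can be pictured as a normalized cross-correlation between the recorded waveform and each stored keyword template: if the best-matching region exceeds a threshold, that keyword is taken as the match. This is only one plausible reading of the "consistent waveform region" test; the 0.8 threshold and the stride are assumptions.

        import numpy as np

        def match_keyword_waveform(voice: np.ndarray,
                                   templates: dict[str, np.ndarray],
                                   threshold: float = 0.8) -> str | None:
            """Return the keyword whose template best matches a region of the voice waveform."""
            best_kw, best_score = None, threshold
            for kw, tpl in templates.items():
                if len(tpl) > len(voice):
                    continue
                tpl_n = (tpl - tpl.mean()) / (tpl.std() + 1e-9)
                # Slide the template across the recorded waveform.
                for start in range(0, len(voice) - len(tpl) + 1, max(len(tpl) // 4, 1)):
                    seg = voice[start:start + len(tpl)]
                    seg_n = (seg - seg.mean()) / (seg.std() + 1e-9)
                    score = float(np.dot(tpl_n, seg_n)) / len(tpl)  # correlation in [-1, 1]
                    if score > best_score:
                        best_kw, best_score = kw, score
            return best_kw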
  • In embodiments of the present invention, the picture material can also be changed based on voice.
  • Specifically, a correspondence between picture material change commands and picture material change keywords may be set; the user voice file is then recorded and analyzed to parse out a picture material change keyword; the updated picture material corresponding to the parsed change keyword is determined and displayed.
  • Step 103: Determine the picture material movement command corresponding to the parsed interaction keyword, and control the movement of the picture material based on the determined picture material movement command.
  • Here, the picture material movement command corresponding to the parsed interaction keyword can be determined according to the correspondence set in step 101.
  • For example, suppose step 101 set the keyword "start" to correspond to the picture material start command, "stop" to the stop command, "acceleration" to the acceleration command, "deceleration" to the deceleration command, "curve" to the command that sets the movement trajectory to a curve, and "straight line" to the command that sets the movement trajectory to a straight line.
  • Then, when step 102 parses out the keyword "start", the movement command is determined to be the picture material start command; when "stop" is parsed out, the stop command; when "acceleration" is parsed out, the acceleration command; when "deceleration" is parsed out, the deceleration command; when "curve" is parsed out, the command that sets the movement trajectory to a curve; when "straight line" is parsed out, the command that sets the movement trajectory to a straight line; and so on.
  • Once the picture material movement command corresponding to the parsed interaction keyword has been determined, the movement of the picture material can be controlled based on that command.
  • In embodiments of the present invention, the correspondence between interaction keywords and picture material movement speeds may also be set in advance in step 101.
  • For example, the interaction keyword "high-speed motion" can be set to correspond to the picture material holding a preset high movement speed, i.e. a high-speed hold command for the picture material; "medium-speed motion" to the material holding a preset medium movement speed, i.e. a medium-speed hold command;
  • and "low-speed motion" to the material holding a preset low movement speed, i.e. a low-speed hold command.
  • When step 102 parses out "high-speed motion", the movement command is determined to be: the picture material holds the preset high movement speed; the material's movement speed is then brought to that preset high speed and held there.
  • When "medium-speed motion" is parsed out, the movement command is: the picture material holds the preset medium movement speed; the speed is brought to that preset medium speed and held there.
  • When "low-speed motion" is parsed out, the movement command is: the picture material holds the preset low movement speed; the speed is brought to that preset low speed and held there. (A small sketch of this speed-hold behavior follows.)
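  • A minimal sketch of the speed-hold behavior: each keyword maps to a preset target speed, and the material's current speed is ramped toward that target and then held. The preset values and the ramp step are assumptions for illustration.

        # Hypothetical preset speeds (e.g. pixels per frame) for the hold commands.
        SPEED_PRESETS = {
            "high-speed motion": 12.0,
            "medium-speed motion": 6.0,
            "low-speed motion": 2.0,
        }

        def step_speed(current: float, keyword: str, ramp: float = 0.5) -> float:
            """Move the current speed one ramp step toward the preset, then hold it."""
            target = SPEED_PRESETS[keyword]
            if abs(target - current) <= ramp:
                return target  # preset speed reached: hold it
            return current + ramp if target > current else current - ramp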
  • FIG. 3 is a schematic diagram of the interactive movement of car-type picture material according to an embodiment of the present invention.
  • In FIG. 3, the picture material is a car model and the background image is a house.
  • The car model is shown fused together with the house, resting at the house. After the user's voice is sensed, the car model can be moved away from the house according to the picture material movement command corresponding to the interaction keyword contained in the voice.
  • Moreover, when the user approaches the camera, the camera's thermal sensor can detect the user's body heat and start playing a preset engine start sound, indicating that the car model has been started.
  • Based on the above detailed analysis, an embodiment of the present invention also proposes a voice interaction device.
  • FIG. 4 is a structural diagram of a voice interaction device according to an embodiment of the present invention.
  • As shown in FIG. 4, the device includes a correspondence setting unit 401, a picture material display unit 402, an interaction keyword parsing unit 403, and a picture material moving unit 404, wherein:
  • the correspondence setting unit 401 is configured to set the correspondence between picture material movement commands and interaction keywords, where a picture material movement command is used to control the movement of the picture material;
  • the picture material display unit 402 is configured to display the picture material;
  • the interaction keyword parsing unit 403 is configured to record a user voice file and analyze the user voice file to parse out the interaction keyword;
  • the picture material moving unit 404 is configured to determine the picture material movement command corresponding to the parsed interaction keyword, and to control the movement of the picture material based on the determined command.
  • In one embodiment, the interaction keyword parsing unit 403 is configured to acquire a voice training file and a text training file, use them to estimate speech parameters for a speech recognizer in a speaker-adaptive manner, recognize the user voice file with the recognizer using the estimated parameters so as to convert it into a text file, and retrieve the interaction keyword from the text file.
  • In another embodiment, the interaction keyword parsing unit 403 is configured to determine the voice waveform of the user voice file, judge whether that waveform contains a region consistent with the voice waveform of an interaction keyword, and, if so, determine the interaction keyword from the matching keyword waveform.
  • Preferably, the correspondence setting unit 401 is configured to set the correspondence between interaction keywords and a picture material acceleration command, deceleration command, start command, stop command, movement speed hold command, or movement trajectory. (An illustrative wiring of these units is sketched below.)
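  • The division into units can be pictured as in the sketch below: the parsing unit turns a recording into a keyword, and the moving unit resolves it against the table kept by the setting unit before driving the display. Class and method names are illustrative only, not the patent's API.

        class VoiceInteractionDevice:
            """Illustrative wiring of units 401-404 (all names hypothetical)."""

            def __init__(self, recognizer, renderer):
                self.commands = {}        # unit 401: the correspondence table
                self.recognizer = recognizer
                self.renderer = renderer  # unit 402: displays the picture material

            def set_correspondence(self, keyword, command):  # unit 401
                self.commands[keyword] = command

            def on_voice(self, voice_file):                  # units 403 and 404
                transcript = self.recognizer(voice_file)     # 403: parse the keyword
                for keyword, command in self.commands.items():
                    if keyword in transcript:
                        self.renderer.apply(command)         # 404: move the material
                        break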
  • FIG. 5 is a schematic structural diagram of another voice interaction apparatus according to an embodiment of the present invention.
  • As shown in FIG. 5, in addition to the correspondence setting unit 401, the picture material display unit 402, the interaction keyword parsing unit 403, and the picture material moving unit 404, the apparatus may further include a picture material changing unit 405, wherein:
  • the correspondence setting unit 401 is further configured to set the correspondence between picture material change commands and picture material change keywords;
  • the interaction keyword parsing unit 403 is further configured to record a user voice file and analyze it to parse out a picture material change keyword;
  • the picture material changing unit 405 is configured to determine, according to the parsed picture material change keyword, the updated picture material corresponding to that keyword, and to send the updated picture material to the picture material display unit 402;
  • the picture material display unit 402 is further configured to display the updated picture material.
  • Embodiments of the invention may be practiced in a variety of application environments based on the methods and apparatus described above.
  • For example, embodiments of the present invention can be applied in an advertising application on a mobile terminal.
  • The interaction keyword "Audi start" can be set in advance. After the user taps the advertisement, the user is prompted to say "Audi start" into the microphone; when the user says it, the voice keyword is parsed into text and compared with the interaction keyword set for the advertisement.
  • If the keywords match, the advertisement's behavior is triggered: the engine sound of a starting car is played, indicating that the car in the advertisement has started; the car's tires in the advertisement banner rotate, and the car quickly drives out of the advertisement's visible area, thereby improving the interactivity and novelty of the advertisement.
  • Based on the above analysis, an embodiment of the present invention further provides a mobile terminal.
  • FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
  • As shown in FIG. 6, the mobile terminal includes a display unit 601, a voice recording unit 602, and a computing unit 603, wherein:
  • the display unit 601 is configured to display the picture material;
  • the voice recording unit 602 is configured to record a user voice file;
  • the computing unit 603 is configured to save the correspondence between picture material movement commands and interaction keywords, analyze the user voice file to parse out the interaction keyword, determine the picture material movement command corresponding to the parsed keyword, and control the movement of the picture material based on the determined command.
  • FIG. 7 is a schematic structural diagram of another mobile terminal according to an embodiment of the present invention.
  • As shown in FIG. 7, the mobile terminal may further include a camera unit 604, configured to sense the user's body heat and to send a heat prompt message to the display unit 601 once the user's heat is sensed;
  • the display unit 601 is further configured to play the picture-material movement start audio after receiving the heat prompt message. (A callback sketch of this handoff follows.)
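  • The heat-sensing handoff between camera unit 604 and display unit 601 could look like the callback sketch below; the sensor reading, the threshold, and the audio call are all assumptions for illustration.

        HEAT_THRESHOLD = 30.0  # hypothetical trigger level, in arbitrary sensor units

        def poll_thermal_sensor(read_heat, on_heat_prompt):
            """Camera unit 604: send a heat prompt message once the user's body heat is sensed."""
            if read_heat() >= HEAT_THRESHOLD:
                on_heat_prompt()  # display unit 601 then plays the movement start audio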
  • The units in the embodiments of the present invention may be integrated into one unit or deployed separately; they may be combined into a single unit or further split into multiple sub-units. These units may be implemented in software (for example, computer-readable instructions stored in a non-volatile storage medium and executed by one or more processors), in hardware, or in a combination of software and hardware.
  • Here, as before, the picture audience, the picture provider, or any third party can upload the picture material to the server on the network side through various information transmission methods; the mobile terminal then acquires the picture material from the server and displays it.
  • In practice, the voice interaction method and apparatus proposed by the embodiments of the present invention can be embodied in various forms.
  • For example, following a certain application programming interface specification, the voice interaction method can be written as a plug-in installed in a mobile terminal, or packaged as an application for users to download and use.
  • When written as a plug-in, it can be implemented in plug-in formats such as ocx, dll, and cab.
  • The voice interaction method proposed by the embodiments of the present invention can also be implemented using specific technologies such as a Flash plug-in, a RealPlayer plug-in, an MMS plug-in, a MIDI staff plug-in, or an ActiveX plug-in.
  • The voice interaction method proposed by the embodiments of the present invention can be stored on various storage media as stored instructions or instruction sets.
  • These storage media include, but are not limited to: floppy disks, optical discs, DVDs, hard disks, flash memory, USB flash drives, CF cards, SD cards, MMC cards, SM cards, Memory Sticks, xD cards, and so on.
  • The voice interaction method may also be applied to storage media based on Nand flash, such as USB flash drives, CF cards, SD cards, SDHC cards, MMC cards, SM cards, Memory Sticks, xD cards, and so on.
  • In summary, in embodiments of the present invention, a correspondence between picture material movement commands and interaction keywords is set; the picture material is displayed; a user voice file is recorded and analyzed to parse out an interaction keyword; the picture material movement command corresponding to the parsed keyword is determined; and the movement of the picture material is controlled based on the determined command.
  • Moreover, by controlling the picture material through sensed user voice, the embodiments of the present invention also increase the exposure of the picture material, and can thus further improve the delivery effect of the picture material.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A voice interaction method and apparatus. The method sets a correspondence between picture material movement commands and interaction keywords, and further includes: displaying the picture material; recording a user voice file and analyzing it to parse out an interaction keyword; determining the picture material movement command corresponding to the parsed interaction keyword, and controlling the movement of the picture material based on the determined picture material movement command.

Description

Voice interaction method and apparatus

Technical Field

The present invention relates to the field of information processing technologies, and in particular to a voice interaction method and apparatus.

Background of the Invention

With the rapid development of computer and network technologies, the Internet and instant messaging play an ever larger role in people's daily life, study, and work. Moreover, with the development of the mobile Internet, the Internet itself is going mobile.

Today's society has entered a highly developed information age, and the form of competition between enterprises has shifted from a single form centered mainly on product function and quality to a composite form in which corporate image, goods, and brands serve as important means and main tendencies. This shift is inseparable from the rapid development of the modern picture display business (for example, advertising).

In current picture display methods, the picture material is usually provided directly by the picture provider itself, and the picture is actively displayed on the network in one direction only.

Summary of the Invention
Embodiments of the present invention provide a voice interaction method to improve the success rate of interaction. Embodiments of the present invention also provide a voice interaction apparatus to improve the success rate of interaction. Embodiments of the present invention further provide a mobile terminal to improve the success rate of interaction. The specific schemes of the embodiments of the present invention are as follows:

A voice interaction method, in which a correspondence between picture material movement commands and interaction keywords is set, the method further including:

displaying the picture material; recording a user voice file, and analyzing the user voice file to parse out an interaction keyword; determining the picture material movement command corresponding to the parsed interaction keyword, and controlling the movement of the picture material based on the determined picture material movement command.

A voice interaction apparatus, including one or more processors and a memory, where the memory contains a plurality of units executable by the one or more processors, the plurality of units including: a correspondence setting unit, a picture material display unit, an interaction keyword parsing unit, and a picture material moving unit, wherein:

the correspondence setting unit is configured to set the correspondence between picture material movement commands and interaction keywords;

the picture material display unit is configured to display the picture material;

the interaction keyword parsing unit is configured to record a user voice file, and to analyze the user voice file to parse out an interaction keyword;

the picture material moving unit is configured to determine the picture material movement command corresponding to the parsed interaction keyword, and to control the movement of the picture material based on the determined picture material movement command.

A mobile terminal, including one or more processors and a memory, where the memory contains a plurality of units executable by the one or more processors, the plurality of units including: a display unit, a voice recording unit, and a computing unit, wherein:

the display unit is configured to display the picture material;

the voice recording unit is configured to record a user voice file;

the computing unit is configured to save the correspondence between picture material movement commands and interaction keywords, to analyze the user voice file to parse out an interaction keyword, to determine the picture material movement command corresponding to the parsed interaction keyword, and to control the movement of the picture material based on the determined picture material movement command.
As can be seen from the above technical solutions, in the embodiments of the present invention, a correspondence between picture material movement commands and interaction keywords is set; the picture material is displayed; a user voice file is recorded and analyzed to parse out an interaction keyword; the picture material movement command corresponding to the parsed interaction keyword is determined; and the movement of the picture material is controlled based on the determined command. It follows that, after applying the embodiments of the present invention, and unlike the one-sided picture display of the picture provider in the prior art, the picture-browsing audience can control the movement of the picture material by voice, can therefore interact effectively with the picture material by voice, and the success rate of interaction is improved.

Moreover, by controlling the picture material through sensed user voice, the embodiments of the present invention also increase the exposure of the picture material, and can thus further improve the delivery effect of the picture material.

Brief Description of the Drawings

FIG. 1 is a flowchart of a voice interaction method according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of car-type picture material according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of the interactive movement of car-type picture material according to an embodiment of the present invention.

FIG. 4 is a structural diagram of a voice interaction apparatus according to an embodiment of the present invention.

FIG. 5 is a structural diagram of another voice interaction apparatus according to an embodiment of the present invention.

FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.

FIG. 7 is a schematic structural diagram of another mobile terminal according to an embodiment of the present invention.

Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is elaborated in further detail below with reference to the accompanying drawings.

In current picture display methods, the picture material is usually provided directly by the picture provider itself, and the picture is actively displayed on the network in one direction only. However, this kind of display does not take the individual participation of the picture audience into account; it is only a one-sided picture display that lacks effective interaction with the picture-browsing audience, so the success rate of interaction is very low.

To this end, an embodiment of the present invention provides a voice interaction method. In embodiments of the present invention, a correspondence between picture material movement commands and interaction keywords is set; the picture material is displayed; a user voice file is recorded and analyzed to parse out an interaction keyword; the picture material movement command corresponding to the parsed interaction keyword is determined; and the movement of the picture material is controlled based on the determined command. Thus, unlike the one-sided picture display of the picture provider in the prior art, the picture-browsing audience can control the movement of the picture material by voice and interact effectively with the picture material by voice, improving the success rate of interaction.

FIG. 1 is a flowchart of a voice interaction method according to an embodiment of the present invention.

As shown in FIG. 1, the method includes:

Step 101: Set the correspondence between picture material movement commands and interaction keywords.

Here, a picture material movement command is used to control the movement of the picture material. Correspondences with interaction keywords can be set for various picture material movement commands, such as a picture material acceleration command, a picture material deceleration command, a picture material start command, a picture material stop command, a picture material movement speed hold command, or a picture material movement trajectory command.

When the user's voice contains an interaction keyword, the movement of the picture material can be controlled based on the picture material movement command corresponding to that keyword. For example, the interaction keyword "start" can be set to correspond to the picture material start command; "stop" to the picture material stop command; "acceleration" to the picture material acceleration command; "deceleration" to the picture material deceleration command; "curve" to the command that sets the picture material movement trajectory to a curve; "straight line" to the command that sets the picture material movement trajectory to a straight line; and so on.

Although specific examples of picture material movement commands and interaction keywords are listed above in detail, those skilled in the art will appreciate that this list is merely exemplary and is not intended to limit the protection scope of the embodiments of the present invention.

In one embodiment, the correspondence between picture material movement speeds and interaction keywords can be saved on the mobile terminal. Mobile terminals may include, but are not limited to: feature phones, smart phones, palmtop computers, personal computers (PCs), tablets, or personal digital assistants (PDAs), and so on.

Although specific examples of mobile terminals are listed above in detail, those skilled in the art will appreciate that they are listed for illustration only and are not intended to limit the protection scope of the embodiments of the present invention.

The mobile terminal can run any smart terminal operating system, including but not limited to: Android, Palm OS, Symbian, Windows Mobile, Linux, iPhone OS, BlackBerry OS 6.0, the Windows Phone series, and so on.

Preferably, the mobile terminal may adopt the Android operating system, with any Android version, including but not limited to: Astro Boy (Android Beta), Clockwork Robot (Android 1.0), Cupcake (Android 1.5), Donut (Android 1.6), Muffin (Android 2.0/2.1), Frozen Yogurt (Android 2.2), Gingerbread (Android 2.3), Honeycomb (Android 3.0), Ice Cream Sandwich (Android 4.0), Jelly Bean (Android 4.1), and other versions. Although specific Android versions are listed above in detail, those skilled in the art will appreciate that the embodiments of the present invention are not limited to the listed versions and can also be applied to any other version based on the Android software architecture.

Note that after the correspondence between picture material movement commands and interaction keywords has been set, the next time the user uses voice to control the movement of the picture material, step 102 can be executed directly, and step 101 need not be executed again.
Step 102: Display the picture material, record a user voice file, and analyze the user voice file to parse out the interaction keyword. Here, the picture audience, the picture provider, or any third party can upload the picture material to a server on the network side through various information transmission methods; the mobile terminal then obtains the picture material from the server and displays it.

The specific content of the picture material is related to the material that is expected to be presented to the user. For example, to push information about a brand of car, a physical model image of that car can be uploaded; to push information about a certain electronic product, a physical model image of that product can be uploaded; and so on.

A background image corresponding to the picture material can also be displayed on the mobile terminal. There can be many kinds of background images and picture materials; for example, the background image and the picture material can each be: a bitmap; a Joint Photographic Experts Group (JPEG) image; a Tagged Image File Format (TIFF) image; a Graphics Interchange Format (GIF) image; a Portable Network Graphics (PNG) image; or a three-dimensional image, and so on. The data of a GIF file is compressed, using a variable-length compression algorithm. Another feature of the GIF format is that multiple color images can be stored in one GIF file; if the images stored in a single file are read out one by one and displayed on the screen, they form a simple animation.

In embodiments of the present invention, joint display can be realized by superimposing the picture material of this picture type onto the background image.

Illustratively, FIG. 2 is a schematic diagram of car-type picture material according to an embodiment of the present invention. As can be seen from FIG. 2, the picture material is a car model and the background image is a house. The car model is displayed fused together with the house, and the car model rests inside the house.

Moreover, the information push audience, the picture provider, or any third party can upload the picture material in a variety of ways. For example, the picture material can be uploaded to the server directly from the mobile terminal by the picture audience over a wireless Internet connection, or uploaded to the server by the picture provider from a personal computer (PC), and so on.

Preferably, the mobile terminal can obtain the background image and the picture material from the server through the Common Gateway Interface (CGI), and display the background image and the picture material in the browser window of the mobile terminal.

The mobile terminal browser is a browser running on the mobile terminal, which can browse Internet content through various access methods such as the General Packet Radio Service (GPRS). At present, some mobile terminal browsers require support from JAVA or from the mobile terminal system (such as Apple's iOS or the Android platform).

The server may provide the background image to the mobile terminal, or the background image may be pre-stored locally on the mobile terminal. Saving background images on the server is preferable, because the server can have more storage space than the mobile terminal and can therefore hold a massive library of background images. Illustratively, background images may include: blue sky, white clouds, bridges, highways, and so on.

In one embodiment, in addition to uploading the picture material, the picture audience, the picture provider, or any third party further uploads image attribute information describing the type of the picture material. The image attribute information may be text; the type of the picture material can also be described directly through its naming. The server can determine the type of the picture material based on the image attribute information, and retrieve a background image that matches the picture material.

For example, if the picture material is an information push for a car, the server can retrieve a background image suitable for a car (such as a racetrack); if the picture material is an information push for an electronic product, the server can retrieve a background image suitable for an electronic product (such as a desk).

Here, when the server provides the background image, the server may first send the uploaded picture material and its own stored background image to the mobile terminal, preferably together with order information and/or advertisement slot information, for the mobile terminal to display accordingly.

The background image and the picture material are displayed together on the mobile terminal to realize a fused display. Preferably, the picture material is displayed above or in front of the background image. After the picture material is presented to the user, an interactive process for the picture material can be realized based on the user's voice.
The user can speak while browsing the picture material, or after browsing it. At this point the user voice file is recorded and analyzed to parse out the interaction keyword.

Here, the user voice file can be analyzed with various speech recognition technologies to resolve the interaction keyword. Speech recognition technology mainly involves feature extraction, pattern matching criteria, and model training.

Many kinds of speech recognition can be used in embodiments of the present invention, such as continuous speech recognition, keyword spotting, speaker identification, speaker verification, speech synthesis, and audio retrieval. More specifically, continuous speech recognition may employ hidden Markov models, and embodiments of the present invention may also employ various speech recognition algorithms such as dynamic time warping, neural networks, support vector machines, and vector quantization.

In a concrete implementation, the various speech recognition technologies can be embedded into the browser windows of various terminals through built-in plug-ins or interfaces, so that the browser window itself has the corresponding speech recognition capability.

For example, the voice file input by the user can be converted into a text file, and the text file compared against text-format keywords in a database; if the match succeeds, the interaction keyword is determined. Alternatively, the waveform of the voice file input by the user can be compared against interaction keywords in voice format; if it is consistent with the waveform of an interaction keyword in voice format, the interaction keyword is determined.

In one embodiment, a voice training file and a text training file may first be acquired and used to estimate speech parameters for a speech recognizer in a speaker-adaptive manner; the speech recognizer with the estimated parameters then recognizes the user voice file, converting it into a text file, and the interaction keyword is retrieved from the text file. In another embodiment, the voice waveform of the user voice file may be determined; it is judged whether the voice waveform of the user voice file contains a waveform region consistent with the voice waveform of an interaction keyword, and if so, the interaction keyword is determined based on the contained keyword waveform.

In embodiments of the present invention, the picture material can also be changed based on voice.

Specifically, a correspondence between picture material change commands and picture material change keywords may be set; the user voice file is then recorded and analyzed to parse out a picture material change keyword; the updated picture material corresponding to the parsed change keyword is determined and displayed.

Step 103: Determine the picture material movement command corresponding to the parsed interaction keyword, and control the movement of the picture material based on the determined picture material movement command.
Here, the picture material movement command corresponding to the parsed interaction keyword can be determined according to the correspondence between picture material movement commands and interaction keywords set in step 101.

For example, suppose step 101 set the keyword "start" to correspond to the picture material start command, "stop" to the picture material stop command, "acceleration" to the picture material acceleration command, "deceleration" to the picture material deceleration command, "curve" to the command that sets the picture material movement trajectory to a curve, and "straight line" to the command that sets the picture material movement trajectory to a straight line.

Then, when step 102 parses out the interaction keyword "start", the picture material movement command is determined to be the picture material start command; when "stop" is parsed out, the picture material stop command; when "acceleration" is parsed out, the picture material acceleration command; when "deceleration" is parsed out, the picture material deceleration command; when "curve" is parsed out, the command that sets the movement trajectory to a curve; when "straight line" is parsed out, the command that sets the movement trajectory to a straight line; and so on.

Once the picture material movement command corresponding to the parsed interaction keyword has been determined, the movement of the picture material can be controlled based on the determined picture material movement command.

In embodiments of the present invention, the correspondence between interaction keywords and picture material movement speeds may also be set in advance in step 101. For example: the interaction keyword "high-speed motion" is set to correspond to the picture material holding a preset high movement speed, i.e. a high-speed hold command for the picture material; "medium-speed motion" to the material holding a preset medium movement speed, i.e. a medium-speed hold command; and "low-speed motion" to the material holding a preset low movement speed, i.e. a low-speed hold command.

When step 102 parses out the interaction keyword "high-speed motion", the picture material movement command is determined to be: the picture material holds the preset high movement speed; the material's movement speed is then brought to that preset high speed and held there.

When step 102 parses out the interaction keyword "medium-speed motion", the picture material movement command is determined to be: the picture material holds the preset medium movement speed; the material's movement speed is then brought to that preset medium speed and held there.

When step 102 parses out the interaction keyword "low-speed motion", the picture material movement command is determined to be: the picture material holds the preset low movement speed; the material's movement speed is then brought to that preset low speed and held there.

FIG. 3 is a schematic diagram of the interactive movement of car-type picture material according to an embodiment of the present invention. As can be seen from FIG. 3, the picture material is a car model and the background image is a house. The car model is displayed fused together with the house, resting at the house. After the user's voice is sensed, the car model moves away from the house according to the picture material movement command corresponding to the interaction keyword contained in the voice.

Moreover, when the user approaches the camera, the camera's thermal sensor can detect the user's body heat and start playing a preset engine start sound, indicating that the car model has been started.
Based on the above detailed analysis, an embodiment of the present invention also proposes a voice interaction apparatus. FIG. 4 is a structural diagram of a voice interaction apparatus according to an embodiment of the present invention.

As shown in FIG. 4, the apparatus includes a correspondence setting unit 401, a picture material display unit 402, an interaction keyword parsing unit 403, and a picture material moving unit 404, wherein:

the correspondence setting unit 401 is configured to set the correspondence between picture material movement commands and interaction keywords, where a picture material movement command is used to control the movement of the picture material;

the picture material display unit 402 is configured to display the picture material;

the interaction keyword parsing unit 403 is configured to record a user voice file and analyze the user voice file to parse out the interaction keyword;

the picture material moving unit 404 is configured to determine the picture material movement command corresponding to the parsed interaction keyword, and to control the movement of the picture material based on the determined picture material movement command.

In one embodiment, the interaction keyword parsing unit 403 is configured to acquire a voice training file and a text training file, use them to estimate speech parameters for a speech recognizer in a speaker-adaptive manner, recognize the user voice file with the recognizer using the estimated parameters so as to convert the user voice file into a text file, and retrieve the interaction keyword from the text file.

In another embodiment, the interaction keyword parsing unit 403 is configured to determine the voice waveform of the user voice file, judge whether it contains a waveform region consistent with the voice waveform of an interaction keyword, and, if so, determine the interaction keyword based on the contained keyword waveform.

Preferably, the correspondence setting unit 401 is configured to set the correspondence between interaction keywords and a picture material acceleration command, a picture material deceleration command, a picture material start command, a picture material stop command, a picture material movement speed hold command, or a picture material movement trajectory.

FIG. 5 is a schematic structural diagram of another voice interaction apparatus according to an embodiment of the present invention. As shown in FIG. 5, in addition to the correspondence setting unit 401, the picture material display unit 402, the interaction keyword parsing unit 403, and the picture material moving unit 404, the apparatus may further include a picture material changing unit 405, wherein:

the correspondence setting unit 401 is further configured to set the correspondence between picture material change commands and picture material change keywords;

the interaction keyword parsing unit 403 is further configured to record a user voice file and analyze the user voice file to parse out a picture material change keyword;

the picture material changing unit 405 is configured to determine, according to the parsed picture material change keyword, the updated picture material corresponding to that keyword, and to send the updated picture material to the picture material display unit 402;

the picture material display unit 402 is further configured to display the updated picture material.

Embodiments of the present invention may be practiced in a variety of application environments based on the methods and apparatus described above. For example, embodiments of the present invention can be applied in an advertising application on a mobile terminal. The interaction keyword "Audi start" can be set in advance. After the user taps the advertisement, the user is prompted to say "Audi start" into the microphone; when the user says it, the voice keyword is parsed into text and compared with the interaction keyword set for the advertisement. If the keywords match, the advertisement's behavior is triggered: the engine sound of a starting car is played, indicating that the car in the advertisement has started; the car's tires in the advertisement banner rotate, and the car quickly drives out of the advertisement's visible area, thereby further improving the interactivity and novelty of the advertisement.
Based on the above analysis, an embodiment of the present invention further proposes a mobile terminal.

FIG. 6 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.

As shown in FIG. 6, the mobile terminal includes a display unit 601, a voice recording unit 602, and a computing unit 603, wherein:

the display unit 601 is configured to display the picture material;

the voice recording unit 602 is configured to record a user voice file;

the computing unit 603 is configured to save the correspondence between picture material movement commands and interaction keywords, analyze the user voice file to parse out the interaction keyword, determine the picture material movement command corresponding to the parsed interaction keyword, and control the movement of the picture material based on the determined picture material movement command.

FIG. 7 is a schematic structural diagram of another mobile terminal according to an embodiment of the present invention. As shown in FIG. 7, in addition to the display unit 601, the voice recording unit 602, and the computing unit 603, the mobile terminal may further include a camera unit 604, configured to sense the user's body heat and, once the user's heat is sensed, to send a heat prompt message to the display unit 601;

the display unit 601 is further configured to play the picture-material movement start audio after receiving the heat prompt message.

The units in the embodiments of the present invention may be integrated into one unit or deployed separately; they may be combined into a single unit or further split into multiple sub-units. These units may be implemented in software (for example, computer-readable instructions stored in a non-volatile storage medium and executed by one or more processors), in hardware, or in a combination of software and hardware.

Moreover, here, the picture audience, the picture provider, or any third party can upload the picture material to the server on the network side through various information transmission methods; the mobile terminal then acquires the picture material from the server and displays it.

In practice, the voice interaction method and apparatus proposed by the embodiments of the present invention can be embodied in many forms. For example, following a certain application programming interface specification, the voice interaction method can be written as a plug-in installed in a mobile terminal, or packaged as an application for users to download and use. When written as a plug-in, it can be implemented in plug-in formats such as ocx, dll, and cab. The voice interaction method proposed by the embodiments of the present invention can also be implemented using specific technologies such as a Flash plug-in, a RealPlayer plug-in, an MMS plug-in, a MIDI staff plug-in, or an ActiveX plug-in.

The voice interaction method proposed by the embodiments of the present invention can be stored on various storage media as stored instructions or instruction sets. These storage media include, but are not limited to: floppy disks, optical discs, DVDs, hard disks, flash memory, USB flash drives, CF cards, SD cards, MMC cards, SM cards, Memory Sticks, xD cards, and so on.

In addition, the voice interaction method proposed by the embodiments of the present invention can also be applied to storage media based on Nand flash, such as USB flash drives, CF cards, SD cards, SDHC cards, MMC cards, SM cards, Memory Sticks, xD cards, and so on.
In summary, in the embodiments of the present invention, a correspondence between picture material movement commands and interaction keywords is set, and the method further includes: displaying the picture material; recording a user voice file, and analyzing the user voice file to parse out an interaction keyword; determining the picture material movement command corresponding to the parsed interaction keyword, and controlling the movement of the picture material based on the determined picture material movement command. It follows that, after applying the embodiments of the present invention, instead of the one-sided picture display of the picture provider in the prior art, the movement of the picture material is controlled by the interaction keywords of a voice file, so the picture-browsing audience can interact effectively with the picture material by voice, and the success rate of interaction is improved.

Moreover, by controlling the picture material through sensed user voice, the embodiments of the present invention also increase the exposure of the picture material, and can thus further improve the delivery effect of the picture material.

The above are only preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims

1. A voice interaction method, wherein a correspondence between picture material movement commands and interaction keywords is set, the method further comprising:

displaying the picture material;

recording a user voice file, and analyzing the user voice file to parse out an interaction keyword; determining the picture material movement command corresponding to the parsed interaction keyword, and controlling the movement of the picture material based on the determined picture material movement command.

2. The voice interaction method according to claim 1, wherein analyzing the user voice file to parse out the interaction keyword comprises:

acquiring a voice training file and a text training file, using the voice training file and the text training file to estimate speech parameters for a speech recognizer in a speaker-adaptive manner, and recognizing the user voice file with the speech recognizer using the estimated speech parameters, so as to convert the user voice file into a text file;

retrieving the interaction keyword from the text file.

3. The voice interaction method according to claim 1, wherein analyzing the user voice file to parse out the interaction keyword comprises:

determining the voice waveform of the user voice file;

judging whether the voice waveform of the user voice file contains a waveform region consistent with the voice waveform of an interaction keyword, and if so, determining the interaction keyword based on the contained voice waveform of that interaction keyword.

4. The voice interaction method according to claim 1, wherein setting the correspondence between picture material movement commands and interaction keywords is: setting the correspondence between interaction keywords and a picture material acceleration command, a picture material deceleration command, a picture material start command, a picture material stop command, a picture material movement speed hold command, or a picture material movement trajectory.

5. The voice interaction method according to claim 1, wherein a correspondence between picture material change commands and picture material change keywords is further set, and the method further comprises: recording a user voice file, and analyzing the user voice file to parse out a picture material change keyword;

determining, according to the parsed picture material change keyword, the updated picture material corresponding to that picture material change keyword, and displaying the updated picture material.
6. A voice interaction apparatus, comprising one or more processors and a memory, wherein the memory contains a plurality of units executable by the one or more processors, the plurality of units comprising: a correspondence setting unit, a picture material display unit, an interaction keyword parsing unit, and a picture material moving unit, wherein:

the correspondence setting unit is configured to set the correspondence between picture material movement commands and interaction keywords;

the picture material display unit is configured to display the picture material;

the interaction keyword parsing unit is configured to record a user voice file, and to analyze the user voice file to parse out an interaction keyword;

the picture material moving unit is configured to determine the picture material movement command corresponding to the parsed interaction keyword, and to control the movement of the picture material based on the determined picture material movement command.

7. The voice interaction apparatus according to claim 6, wherein:

the interaction keyword parsing unit is configured to acquire a voice training file and a text training file, use the voice training file and the text training file to estimate speech parameters for a speech recognizer in a speaker-adaptive manner, recognize the user voice file with the speech recognizer using the estimated speech parameters so as to convert the user voice file into a text file, and retrieve the interaction keyword from the text file.

8. The voice interaction apparatus according to claim 6, wherein: the interaction keyword parsing unit is configured to determine the voice waveform of the user voice file, judge whether the voice waveform of the user voice file contains a waveform region consistent with the voice waveform of an interaction keyword, and, if so, determine the interaction keyword based on the contained voice waveform of that interaction keyword.

9. The voice interaction apparatus according to claim 6, wherein:

the correspondence setting unit is configured to set the correspondence between interaction keywords and a picture material acceleration command, a picture material deceleration command, a picture material start command, a picture material stop command, a picture material movement speed hold command, or a picture material movement trajectory.

10. The voice interaction apparatus according to claim 6, further comprising a picture material changing unit, wherein:

the correspondence setting unit is further configured to set the correspondence between picture material change commands and picture material change keywords;

the interaction keyword parsing unit is further configured to record a user voice file and analyze the user voice file to parse out a picture material change keyword;

the picture material changing unit is configured to determine, according to the parsed picture material change keyword, the updated picture material corresponding to that picture material change keyword, and to send the updated picture material to the picture material display unit;

the picture material display unit is further configured to display the updated picture material.
11. A mobile terminal, comprising one or more processors and a memory, wherein the memory contains a plurality of units executable by the one or more processors, the plurality of units comprising: a display unit, a voice recording unit, and a computing unit, wherein: the display unit is configured to display the picture material;

the voice recording unit is configured to record a user voice file;

the computing unit is configured to save the correspondence between picture material movement commands and interaction keywords, analyze the user voice file to parse out an interaction keyword, determine the picture material movement command corresponding to the parsed interaction keyword, and control the movement of the picture material based on the determined picture material movement command.

12. The mobile terminal according to claim 11, wherein the computing unit is further configured to:

acquire a voice training file and a text training file, use the voice training file and the text training file to estimate speech parameters for a speech recognizer in a speaker-adaptive manner, and recognize the user voice file with the speech recognizer using the estimated speech parameters so as to convert the user voice file into a text file;

retrieve the interaction keyword from the text file.

13. The mobile terminal according to claim 11, wherein the computing unit is further configured to:

determine the voice waveform of the user voice file;

judge whether the voice waveform of the user voice file contains a waveform region consistent with the voice waveform of an interaction keyword, and if so, determine the interaction keyword based on the contained voice waveform of that interaction keyword.

14. The mobile terminal according to claim 11, wherein the correspondence between picture material movement commands and interaction keywords includes: the correspondence between interaction keywords and a picture material acceleration command, a picture material deceleration command, a picture material start command, a picture material stop command, a picture material movement speed hold command, or a picture material movement trajectory.

15. The mobile terminal according to claim 11, wherein the computing unit is further configured to:

analyze the user voice file to parse out a picture material change keyword;

determine, according to the parsed picture material change keyword, the updated picture material corresponding to that picture material change keyword, and control the display unit to display the updated picture material.

16. The mobile terminal according to claim 11, further comprising a camera unit, the camera unit being further configured to sense the user's body heat and, once the user's heat is sensed, to send a heat prompt message to the display unit;

the display unit being further configured to play the picture-material movement start audio after receiving the heat prompt message.
PCT/CN2013/086734 2012-11-26 2013-11-08 Voice interaction method and apparatus WO2014079324A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/719,981 US9728192B2 (en) 2012-11-26 2015-05-22 Method and apparatus for voice interaction control of movement base on material movement

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210487130.3A 2012-11-26 2012-11-26 Voice interaction method, apparatus, system and mobile terminal
CN201210487130.3 2012-11-26

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/719,981 Continuation US9728192B2 (en) 2012-11-26 2015-05-22 Method and apparatus for voice interaction control of movement base on material movement

Publications (1)

Publication Number Publication Date
WO2014079324A1 true WO2014079324A1 (zh) 2014-05-30

Family

ID=50775525

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/086734 WO2014079324A1 (zh) 2012-11-26 2013-11-08 一种语音交互方法和装置

Country Status (3)

Country Link
US (1) US9728192B2 (zh)
CN (1) CN103839548B (zh)
WO (1) WO2014079324A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104253902A * 2014-07-21 2014-12-31 宋婉毓 Method for voice interaction with an intelligent voice device
CN105528393A * 2015-11-30 2016-04-27 何磊 Method and apparatus for editing files
CN107659603B * 2016-09-22 2020-11-27 腾讯科技(北京)有限公司 Method and apparatus for user interaction with pushed information
CN109041258A * 2018-06-07 2018-12-18 安徽爱依特科技有限公司 Piggy-bank network task issuing and dynamic interactive display method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1371090A * 2002-03-25 2002-09-25 苏州孔雀电器集团有限责任公司 Method for converting a voice file into a text file
JP2002318132A * 2001-04-23 2002-10-31 Hitachi Ltd Voice interactive navigation system, mobile terminal device, and voice interaction server
CN201117218Y * 2007-11-15 2008-09-17 邹鹏 Automatically controlled advertisement playing device
CN101742110A * 2008-11-10 2010-06-16 天津三星电子有限公司 Camera configured by means of a speech recognition system
US20110003777A1 * 2008-03-07 2011-01-06 Topotarget A/S Methods of Treatment Employing Prolonged Continuous Infusion of Belinostat
CN102253710A * 2010-05-21 2011-11-23 台达电子工业股份有限公司 Electronic device with multi-mode interactive operation and multi-mode interactive operation method thereof
CN102374864A * 2010-08-13 2012-03-14 国基电子(上海)有限公司 Voice navigation device and voice navigation method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7725307B2 (en) * 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Query engine for processing voice based queries including semantic decoding
US9076448B2 (en) * 1999-11-12 2015-07-07 Nuance Communications, Inc. Distributed real time speech recognition system
US7710457B2 (en) * 2001-01-10 2010-05-04 Ip Holdings, Inc. Motion detector camera having a flash
US7668718B2 (en) * 2001-07-17 2010-02-23 Custom Speech Usa, Inc. Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile
CN1991982A * 2005-12-29 2007-07-04 摩托罗拉公司 Method for driving an image using voice data
CN101013571A * 2007-01-30 2007-08-08 无敌科技(西安)有限公司 Interactive method using voice commands and system thereof
US8718262B2 (en) * 2007-03-30 2014-05-06 Mattersight Corporation Method and system for automatically routing a telephonic communication base on analytic attributes associated with prior telephonic communication
DE102008051756A1 2007-11-12 2009-05-14 Volkswagen Ag Multimodal user interface of a driver assistance system for inputting and presenting information
US20090182562A1 (en) * 2008-01-14 2009-07-16 Garmin Ltd. Dynamic user interface for automated speech recognition
US8898568B2 (en) * 2008-09-09 2014-11-25 Apple Inc. Audio user interface
TW201142686A (en) * 2010-05-21 2011-12-01 Delta Electronics Inc Electronic apparatus having multi-mode interactive operation method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002318132A * 2001-04-23 2002-10-31 Hitachi Ltd Voice interactive navigation system, mobile terminal device, and voice interaction server
CN1371090A * 2002-03-25 2002-09-25 苏州孔雀电器集团有限责任公司 Method for converting a voice file into a text file
CN201117218Y * 2007-11-15 2008-09-17 邹鹏 Automatically controlled advertisement playing device
US20110003777A1 * 2008-03-07 2011-01-06 Topotarget A/S Methods of Treatment Employing Prolonged Continuous Infusion of Belinostat
CN101742110A * 2008-11-10 2010-06-16 天津三星电子有限公司 Camera configured by means of a speech recognition system
CN102253710A * 2010-05-21 2011-11-23 台达电子工业股份有限公司 Electronic device with multi-mode interactive operation and multi-mode interactive operation method thereof
CN102374864A * 2010-08-13 2012-03-14 国基电子(上海)有限公司 Voice navigation device and voice navigation method

Also Published As

Publication number Publication date
CN103839548B (zh) 2018-06-01
US9728192B2 (en) 2017-08-08
US20150255072A1 (en) 2015-09-10
CN103839548A (zh) 2014-06-04

Similar Documents

Publication Publication Date Title
US20200302179A1 (en) Method for labeling performance segment, video playing method, apparaus and system
US11238635B2 (en) Digital media editing
US11877016B2 (en) Live comments generating
EP3146446B1 (en) Media stream cue point creation with automated content recognition
US9715901B1 (en) Video preview generation
US7673238B2 (en) Portable media device with video acceleration capabilities
US20150160853A1 (en) Video transition method and video transition system
US20160099023A1 (en) Automatic generation of compilation videos
US20150243326A1 (en) Automatic generation of compilation videos
WO2018149176A1 (zh) 视频自动录制方法及装置、终端
US20180075487A1 (en) Advertisement management
JP2009543497A (ja) オーディオ−ビデオコンテンツを再生するための装置及び方法
CN110267113B (zh) 视频文件加工方法、系统、介质和电子设备
US11445144B2 (en) Electronic device for linking music to photography, and control method therefor
WO2021031733A1 (zh) 视频特效生成方法及终端
US20200351467A1 (en) Video recording method and video recording terminal
US9558784B1 (en) Intelligent video navigation techniques
US20210117471A1 (en) Method and system for automatically generating a video from an online product representation
US9564177B1 (en) Intelligent video navigation techniques
WO2014079324A1 (zh) 一种语音交互方法和装置
WO2022000991A1 (zh) 表情包生成方法及设备、电子设备和介质
CN104065977A (zh) 音/视频文件的处理方法及装置
TWI599220B (zh) 緩存資料管理系統及方法
US20140297285A1 (en) Automatic page content reading-aloud method and device thereof
US9084011B2 (en) Method for advertising based on audio/video content and method for creating an audio/video playback application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13856788

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 05.10.2015)

122 Ep: pct application non-entry in european phase

Ref document number: 13856788

Country of ref document: EP

Kind code of ref document: A1