WO2015165257A1 - Speech recognition method, device, system and computer storage medium - Google Patents

Speech recognition method, device, system and computer storage medium

Info

Publication number
WO2015165257A1
WO2015165257A1 PCT/CN2014/092162 CN2014092162W WO2015165257A1 WO 2015165257 A1 WO2015165257 A1 WO 2015165257A1 CN 2014092162 W CN2014092162 W CN 2014092162W WO 2015165257 A1 WO2015165257 A1 WO 2015165257A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
voice recognition
list
supported
speech
Prior art date
Application number
PCT/CN2014/092162
Other languages
English (en)
French (fr)
Inventor
刘海军
缪川扬
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to EP14890694.4A (EP3139376B1)
Priority to US15/307,023 (US20170047066A1)
Publication of WO2015165257A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226: Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228: Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Definitions

  • The invention relates to speech recognition technology in the field of communication and information, and in particular to a speech recognition method, device, system and computer storage medium.
  • The development of digital multimedia and networks has enriched users' everyday entertainment experience.
  • Current technology allows users to watch HDTV at home.
  • Television programs may come from digital discs, cable TV, the Internet and so on, and users can experience stereo, 5.1-channel, 7.1-channel and even more realistic sound effects; users can also enjoy these experiences on a tablet (PAD) or mobile phone.
  • In related technologies, users can transfer digital content between devices over the network for playback, and can control playback on a device with a remote control or by voice, for example switching to the previous channel or the next channel.
  • Traditionally, each of multiple devices is controlled with its own remote control. These remote controls are often not interchangeable, and most of them, such as those for traditional televisions and stereos, have no network function.
  • Some network-capable controllers, such as mobile phones and other devices with computing and network communication capabilities, load software that supports interworking protocols in order to control another device.
  • Voice control is a relatively new control method.
  • Typically, the microphone on a device collects voice, the voice is analyzed, and it is finally converted into a corresponding executable command that controls the device.
  • Techniques and products based on speech recognition require the device to be controlled to have a microphone for collecting voice. In some environments, such as the home, some devices have no microphone because of size, cost or other constraints, yet users still need to control these microphone-less devices by voice.
  • The embodiments of the present invention provide a voice recognition method, device, system and computer storage medium that enable a device without voice collection capability to accept voice control, which makes it easier for the user to control devices by voice and improves the user experience.
  • An embodiment of the present invention provides a voice recognition method, where the method includes:
  • the speech recognition device issues a list of supported speech, and/or a list of instructions corresponding to the supported speech.
  • An embodiment of the present invention further provides a voice recognition method, where the method includes:
  • the voice recognition control device acquires a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to voices supported by the voice recognition device.
  • the embodiment of the present invention further provides a voice recognition device, where the voice recognition device includes:
  • the first communication unit is configured to issue a list of supported voices, and/or a list of instructions corresponding to the supported voices.
  • the embodiment of the invention further provides a voice recognition control device, and the voice recognition control device includes:
  • the second communication unit is configured to acquire a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to voices supported by the voice recognition device.
  • An embodiment of the present invention further provides a voice recognition system, where the voice recognition system includes a voice recognition device, and/or a voice recognition control device;
  • the voice recognition device is configured to issue a list of supported voices, and/or a list of instructions corresponding to the supported voices;
  • the voice recognition control device is configured to acquire a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to voices supported by the voice recognition device.
  • the embodiment of the invention further provides a computer storage medium storing executable instructions configured to execute the voice recognition method described above.
  • The technical solution provided by the embodiments of the present invention publishes, in the network, a list of voices supported by the voice recognition device and/or a list of instructions corresponding to the supported voices. As a result, a device that is provided with the voice recognition device but has no voice collection capability can also accept voice control.
  • This helps users control devices within a certain range using simpler and more natural operations, without having to learn how to operate multiple devices, so that control is quick and easy while production costs and user spending are reduced.
  • FIG. 1 is a first schematic diagram of a voice recognition method according to an embodiment of the present invention.
  • FIG. 2 is a second schematic diagram of a voice recognition method according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a voice recognition apparatus according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a voice recognition control apparatus according to an embodiment of the present invention.
  • FIG. 5a is a schematic diagram of a scenario in an embodiment of the present invention.
  • FIG. 5b is a flowchart showing the operation of the voice recognition device and the voice recognition control device according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of message interaction when voice control is implemented in an embodiment of the present invention.
  • The inventors have found that related technologies already transmit control information between devices over a network to achieve mutual discovery and control. For example, the related Universal Plug and Play (UPnP) technology specifies how devices send and receive network messages to implement discovery and control.
  • This technology uses network addresses and digital codes as device identifiers, which are machine identifiers; the final control requires the user to select and operate a device according to its machine identifier.
  • If a voice recognition method can be provided, the user can control more devices within a certain range using a simpler and more natural mode of operation, without having to learn and master additional usage methods, and production costs and user spending can also be reduced.
  • the embodiment of the present invention describes a voice recognition method.
  • A voice recognition device issues, for example in a network, a list of supported voices and/or a list of instructions corresponding to the supported voices.
  • The voice recognition device is disposed in the device to be controlled. The device to be controlled may be any conventional device and needs neither voice collection capability nor voice recognition capability. Both of the above lists include the identifier of the device to be controlled and the instructions supported by the voice recognition device.
  • Because the voice recognition device has a one-to-one correspondence with the device to be controlled, and the instructions supported by the voice recognition device are used to control that device, the identifier of the device to be controlled may serve as the identifier of the voice recognition device, and the instructions supported by the voice recognition device may be regarded as the instructions supported by the device to be controlled. An example of the list of voices supported by the voice recognition device is:
  • An example of a list of instructions corresponding to speech supported by the speech recognition device is:
  • The "wav" file name denotes an encoded voice data file, which stores the encoded digital data of a voice such as "shutdown".
  • the voice recognition apparatus may issue a list corresponding to the form of any of the above examples, or may issue a list including the forms corresponding to the above two examples.
  • the same list can be preset, or different lists can be preset.
  • the device identification (local identification) in the list is unique to distinguish different devices to be controlled.
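  • The list entries described above can be pictured as a small data structure. The following is only a minimal sketch: the field names, the JSON serialisation and the instruction names (`POWER_ON`, `POWER_OFF`) are assumptions made for illustration and do not come from the patent text, whose example tables are not reproduced here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VoiceEntry:
    device_id: str    # unique (local) identifier of the device to be controlled
    voice_text: str   # supported voice in text form, e.g. "shutdown"
    voice_file: str   # name of an encoded voice data file (hypothetical naming)
    instruction: str  # instruction the voice corresponds to (hypothetical name)


def build_capability_list(device_id, commands):
    """Build one entry per supported voice; `commands` maps voice text to instruction."""
    return [
        VoiceEntry(device_id, text, f"{text}.wav", instruction)
        for text, instruction in commands.items()
    ]


def to_message(entries):
    """Serialise the list so that it could be carried in a network message."""
    return json.dumps([asdict(entry) for entry in entries], ensure_ascii=False)


entries = build_capability_list(
    "living-room-tv", {"power on": "POWER_ON", "shutdown": "POWER_OFF"}
)
message = to_message(entries)
```

  • A device could issue only the voice fields, only the instruction field, or both, which corresponds to the "and/or" wording of the description.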
  • The voice recognition device further receives the collected voice, and either executes the instruction corresponding to the collected voice, or forwards the collected voice or the instruction corresponding to the collected voice.
  • Before the voice recognition device executes the instruction corresponding to the collected voice, it needs to recognize the collected voice to obtain the corresponding instruction.
  • By executing instructions such as start and stop, the voice recognition device controls the device to be controlled in which it is disposed. When forwarding, the voice recognition device can forward all collected voices (or the instructions corresponding to them).
  • Alternatively, the voice recognition device forwards the collected voice, or the instruction corresponding to the collected voice, according to a preset policy.
  • The forwarding may be implemented by sending a message in the network, or through a communication interface between voice recognition devices. Messages sent in the network include multicast, broadcast and unicast messages.
  • The preset policy includes at least one of the following: when the collected voice received is a preset specific voice, forward the collected voice or the instruction corresponding to it; when the collected voice is not supported, forward the collected voice or the instruction corresponding to it. That is, when the voice recognition device receives voice from the voice recognition control device and either cannot recognize it, or recognizes it but does not support the recognized instruction, the target voice recognition device of the received voice is not this voice recognition device, and accordingly,
  • the voice recognition device forwards the received voice or the command corresponding to the collected voice to other voice recognition devices to enable the target recognition device that receives the voice or command to perform processing; for example, when receiving the voice "power on” and voice When "power off”, if the voice recognition device only supports the power-on command corresponding to "
  • the acquisition of speech can be implemented by a speech recognition control device such that the speech recognition device receives the speech acquired by the speech recognition control device.
  • The voice described herein is represented by computer-coded data, for example data including the sampling frequency of the sound, and the encoding format may be a standard such as G.711 established by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
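  • To give a feel for what an encoding such as G.711 does, the sketch below applies the continuous mu-law companding curve (mu = 255) to a sample and quantises it to 8 bits. It is an illustration under stated assumptions only: G.711 itself defines a segmented, bit-exact approximation of this curve (and an A-law variant), which this sketch does not reproduce, and the function names are made up for the example.

```python
import math

MU = 255  # companding parameter of the mu-law curve


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Clamp, compress and quantise a sample to an integer code in 0..255."""
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8bit(code: int) -> float:
    """Turn an 8-bit code back into an approximate sample value."""
    return mulaw_expand(code / 255 * 2.0 - 1.0)
```

  • Quiet samples receive finer quantisation steps than loud ones, which is why this family of codecs suits speech.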
  • the voice recognition device issues a list of supported voices, and/or a list of instructions corresponding to the supported voices, including:
  • the voice recognition device spontaneously issues (e.g., publishes in the network) a list of supported voices and/or a list of instructions corresponding to the supported voices; or
  • the voice recognition device, after receiving a request message querying its voice recognition capability, responds with the list of supported voices and/or the list of instructions corresponding to the supported voices; that is, it responds passively, for example with a unicast, multicast or broadcast message over the network.
  • The list of supported voices and/or the list of instructions corresponding to the supported voices may be issued periodically or aperiodically. The voices in the list take at least one of the following forms: voice text; encoded voice data; voice text of the device identifier and/or encoded voice data of the device identifier.
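  • The two issuing modes, spontaneous publication and passive response to a query, might be sketched as follows. The message fields (`type`, `device_id`, `voices`), the `capability_query` message name and the in-memory `outbox` that stands in for the network are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceRecognitionDevice:
    device_id: str
    supported: dict  # voice text -> instruction
    outbox: list = field(default_factory=list)  # stands in for the network

    def capability_message(self):
        """Build the list of supported voices as a message payload."""
        return {
            "type": "capability",
            "device_id": self.device_id,
            "voices": sorted(self.supported),
        }

    def announce(self):
        """Spontaneous issuing, e.g. at power-on or on a periodic timer."""
        self.outbox.append(("multicast", self.capability_message()))

    def on_message(self, sender, message):
        """Passive issuing: answer a voice recognition capability query."""
        if message.get("type") == "capability_query":
            self.outbox.append((sender, self.capability_message()))


tv = VoiceRecognitionDevice("living-room-tv", {"power on": "POWER_ON"})
tv.announce()
tv.on_message("controller", {"type": "capability_query"})
tv.on_message("controller", {"type": "something_else"})
```

  • Only the query message triggers a response; unrelated messages are ignored.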
  • When voice recognition devices are provided in each device to be controlled, the voices supported by each voice recognition device may differ. The voice recognition control device may recognize the collected voice, that is, determine the one or more voice recognition devices that support the collected voice, and send the instruction corresponding to the voice to the target voice recognition device.
  • Correspondingly, as an implementation, the method further includes: the voice recognition device receives the instruction corresponding to the collected voice and executes the instruction;
  • the instruction corresponding to the collected voice received by the voice recognition device is an instruction supported by the voice recognition device, and therefore, the received command can be directly executed.
  • the voice recognition device can be set in the device to be controlled, and uses its own voice recognition capability for voice recognition.
  • the list of supported voices issued by the voice recognition device in the network, and/or the list of instructions corresponding to the supported voices further includes an identifier of the voice recognition device; the identifier includes at least one of the following forms :
  • encoded voice data corresponding to the voice recognition device.
  • the embodiment of the invention further describes a voice recognition method, as shown in FIG. 2, the method includes:
  • the voice recognition control device acquires a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to voices supported by the voice recognition device.
  • The voice recognition control device also collects voice (via a microphone) and transmits the collected voice to the voice recognition device. Thus, a device to be controlled that has no voice collection capability, by receiving the voice collected by the voice recognition control device, is equivalent to having voice collection capability;
  • the voice includes at least one of the following forms of voice: voice text; encoded voice data.
  • The voice recognition control device collects voice and sends the collected voice to the voice recognition device; that is, the voice recognition control device sends the collected voice to all voice recognition devices, and each voice recognition device performs recognition itself. Of course, the voice recognition control device can also recognize the collected voice, obtain the instruction corresponding to it, and send the recognized instruction to all voice recognition devices.
  • When a voice recognition device is disposed in each device to be controlled, the voice recognition control device, on collecting voice, may recognize the instruction corresponding to the voice and the target voice recognition device of the voice. Since voice recognition devices correspond one-to-one with devices to be controlled, identifying the target voice recognition device is equivalent to identifying the target device to be controlled.
  • The collected voice (or the instruction corresponding to it) is then sent to the target voice recognition device;
  • the list of voices supported by the voice recognition device and the list of commands corresponding to voices supported by the voice recognition device all include an identifier of the voice recognition device;
  • When the voice recognition control device determines the target voice recognition device that the collected voice indicates is to be controlled, it recognizes the collected voice and matches the recognition result against the identifiers of the voice recognition devices; the matched voice recognition device is determined to be the target voice recognition device that the collected voice indicates is to be controlled.
  • the identifier of the voice recognition device includes at least one of the following forms:
  • Encoded voice data; for example, when the encoded voice data is "living room television.wav", the target voice recognition device of the voice is the voice recognition device provided in the living room television.
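  • Matching a recognition result against device identifiers could look like the sketch below. It assumes that recognition has already produced text and that each device publishes a spoken-name identifier; the substring match, the function name `find_target` and the device ids are illustrative choices, not the method prescribed by the patent.

```python
def find_target(recognized_text, device_identifiers):
    """Return the ids of devices whose spoken identifier occurs in the recognized text.

    `device_identifiers` maps a device id to its spoken-name identifier.
    """
    text = recognized_text.lower()
    return [
        device_id
        for device_id, spoken_name in device_identifiers.items()
        if spoken_name.lower() in text
    ]


identifiers = {
    "tv-01": "living room television",
    "nas-01": "home storage server",
}
targets = find_target("Living room television power on", identifiers)
```

  • When no identifier matches, the returned list is empty, and a control device could then fall back to sending the voice to all voice recognition devices.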
  • the voice recognition control device acquires (for example, can obtain through the network) a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to the supported voice, including:
  • the voice recognition control device receives (for example, through the network) a list of supported voices issued by the voice recognition device, and/or a list of instructions corresponding to the supported voices; that is, the voice recognition control device receives the list that the voice recognition device actively issues; or
  • the voice recognition control device transmits (eg, may transmit over the network) a voice recognition capability request message to the voice recognition device to receive a list of supported voices responded by the voice recognition device, and/or the supported voice corresponding List of instructions.
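  • On the control device side, both acquisition paths end in the same place: a table of which device supports which voices. The sketch below keeps such a table and builds capability queries for devices that have not yet issued a list. The class name, message fields and `capability_query` message type are assumptions for illustration.

```python
class ControlDeviceRegistry:
    """Keeps the lists acquired from voice recognition devices."""

    def __init__(self):
        self.lists = {}  # device id -> set of supported voice texts

    def on_capability(self, message):
        """Store a list that a device issued spontaneously or sent as a response."""
        self.lists[message["device_id"]] = set(message["voices"])

    def build_queries(self, known_devices):
        """Build capability request messages for devices with no stored list."""
        return [
            (device_id, {"type": "capability_query"})
            for device_id in known_devices
            if device_id not in self.lists
        ]

    def devices_supporting(self, voice_text):
        """Return the ids of devices whose list contains the given voice."""
        return sorted(
            device_id
            for device_id, voices in self.lists.items()
            if voice_text in voices
        )


registry = ControlDeviceRegistry()
registry.on_capability({"device_id": "tv-01", "voices": ["power on", "power off"]})
queries = registry.build_queries(["tv-01", "nas-01"])
```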
  • the embodiment of the invention further describes a computer storage medium, wherein the computer storage medium stores executable instructions, and the executable instructions are configured to perform the voice recognition method shown in FIG. 1 or FIG.
  • the embodiment of the present invention further describes a voice recognition device.
  • the voice recognition device includes:
  • the first communication unit 31 is configured to issue (eg, may be published in the network) a list of supported voices, and/or a list of instructions corresponding to the supported voices.
  • the voice recognition device further includes:
  • the first receiving unit 32 is configured to receive the collected voice;
  • the first execution unit 33 is configured to execute an instruction corresponding to the collected voice.
  • the first execution unit 33 is further configured to: recognize the collected voice to obtain the instruction corresponding to it; and, on determining that the collected voice is supported, determine the instruction corresponding to the collected voice and execute the determined instruction.
  • the first execution unit 33 is further configured to: forward the collected voice or the instruction corresponding to the collected voice according to a preset policy; the preset policy includes at least one of the following policies:
  • the collected voice or the instruction corresponding to the collected voice is forwarded.
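  • The two forwarding policies named in the description (forward a preset specific voice; forward a voice that is not supported) can be sketched as a single decision function. The function name and return format are hypothetical, and the choice to both execute and forward a supported voice that is also on the preset list is an assumption; the patent text does not settle that case.

```python
def handle_voice(voice_text, supported, forward_always=frozenset()):
    """Decide what a voice recognition device does with a received voice.

    `supported` maps voice text to instruction; `forward_always` holds the
    preset specific voices that are forwarded regardless of local support.
    """
    actions = []
    if voice_text in supported:
        # The voice is supported locally: execute the corresponding instruction.
        actions.append(("execute", supported[voice_text]))
    if voice_text in forward_always or voice_text not in supported:
        # Policy 1: preset specific voice. Policy 2: unsupported voice.
        actions.append(("forward", voice_text))
    return actions


supported = {"power on": "POWER_ON"}
```

  • For a device that only supports "power on", the voice "power off" is therefore passed on so that another voice recognition device can process it.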
  • the first communication unit 31 is further configured to spontaneously issue (e.g., publish in the network) a list of supported voices, and/or a list of instructions corresponding to the supported voices; or
  • upon receiving a request message querying the voice recognition capability, to respond (e.g., in the network) with a list of supported voices, and/or a list of instructions corresponding to the supported voices.
  • the voice recognition device further includes:
  • the second receiving unit 34 is configured to receive an instruction corresponding to the collected voice;
  • the second execution unit 35 is configured to execute an instruction received by the second receiving unit 34.
  • the voice in the list of voices includes at least one of the following forms of voice:
  • voice text; encoded voice data.
  • the list of supported voices that the voice recognition device publishes in the network, and/or the list of instructions corresponding to the supported voices, further includes an identifier of the voice recognition device; the identifier takes at least one of the following forms:
  • encoded voice data corresponding to the voice recognition device.
  • the first communication unit 31, the first receiving unit 32 and the second receiving unit 34 may be implemented by a chip in the voice recognition device that supports a corresponding communication protocol, such as IEEE 802.11b/g/n or IEEE 802.3; the first execution unit 33 and the second execution unit 35 may be implemented by a central processing unit (CPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA) in the voice recognition device.
  • the embodiment of the present invention further describes a voice recognition control device.
  • the voice recognition control device includes:
  • the second communication unit 41 is configured to acquire (eg, obtain through the network) a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to voices supported by the voice recognition device.
  • the voice recognition control device further includes:
  • the first collecting unit 42 is configured to collect voice, and send the collected voice to the voice recognition device through the second communication unit 41.
  • the voice includes at least one of the following forms of voice: voice text; encoded voice data.
  • the voice recognition control device further includes:
  • the second collecting unit 43 is configured to collect voice;
  • the first identification unit 44 is configured to recognize the instruction corresponding to the voice collected by the second collecting unit 43, and to send the recognized instruction to the voice recognition device through the second communication unit 41.
  • the third collecting unit 45 is configured to collect voice;
  • the second identification unit 46 is configured to identify the target voice recognition device that the voice collected by the third collecting unit 45 indicates is to be controlled, and to trigger the second communication unit 41 to send the voice collected by the third collecting unit 45, or the instruction corresponding to that voice, to the target voice recognition device.
  • the list of voices supported by the voice recognition device and the list of commands corresponding to voices supported by the voice recognition device all include an identifier of the voice recognition device;
  • the second identification unit 46 is further configured to recognize the voice collected by the third collecting unit 45, and to match the recognition result against the identifier of the voice recognition device;
  • the matched voice recognition device is determined to be the target voice recognition device that the voice collected by the third collecting unit 45 indicates is to be controlled.
  • the identifier of the voice recognition device includes at least one of the following forms:
  • encoded voice data corresponding to the voice recognition device.
  • the second communication unit 41 is further configured to receive (for example, may receive through the network) a list of supported voices issued by the voice recognition device, and/or a list of instructions corresponding to the supported voices; or
  • a voice recognition capability request message is transmitted (e.g., may be transmitted over the network) to the voice recognition device to receive a list of supported voices that the voice recognition device responds, and/or a list of instructions corresponding to the supported voice.
  • the second communication unit 41 may be implemented by a chip in the voice recognition control device that supports a corresponding communication protocol, such as IEEE 802.11b/g/n or IEEE 802.3; the first collecting unit 42, the second collecting unit 43 and the third collecting unit 45 may be implemented by a microphone with a voice collection function in the voice recognition control device; the first identification unit 44 and the second identification unit 46 may be implemented by a CPU, DSP or FPGA in the voice recognition control device.
  • the embodiment of the invention further describes a voice recognition system, where the voice recognition system includes a voice recognition device, and/or a voice recognition control device;
  • the voice recognition device is configured to issue a list of supported voices, and/or a list of instructions corresponding to the supported voices;
  • the voice recognition control device is configured to acquire a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to voices supported by the voice recognition device.
  • the voice recognition device is further configured to receive the collected voice
  • the voice recognition device is further configured to identify the collected voice, and obtain an instruction corresponding to the collected voice.
  • the voice recognition device is further configured to: forward the collected voice or the command corresponding to the collected voice according to a preset policy; the preset policy includes at least one of the following policies:
  • the collected voice or the instruction corresponding to the collected voice is forwarded.
  • the voice recognition device is further configured to spontaneously issue a list of supported voices, and/or a list of instructions corresponding to the supported voices;
  • the voice recognition device responds to the list of supported voices and/or the list of instructions corresponding to the supported voices upon receiving the request message for querying the voice recognition capability.
  • the voice recognition device is further configured to receive an instruction corresponding to the collected voice and execute the instruction.
  • the speech in the list of speeches includes at least one of the following forms of speech:
  • voice text; encoded voice data.
  • the list of supported voices issued by the voice recognition device, and/or the list of instructions corresponding to the supported voices, further includes an identifier of the voice recognition device; the identifier takes at least one of the following forms:
  • encoded voice data corresponding to the voice recognition device.
  • the voice recognition control device is further configured to collect voice and send the collected voice to the voice recognition device.
  • the voice recognition control device is further configured to collect voice, identify an instruction corresponding to the collected voice, and send the recognized command to the voice recognition device.
  • the speech includes at least one of the following forms of speech: speech text; encoded speech data.
  • the voice recognition control device is further configured to collect voice, and
  • to determine the target voice recognition device that the collected voice indicates is to be controlled.
  • the list of voices supported by the voice recognition device and the list of commands corresponding to voices supported by the voice recognition device each include an identifier of the voice recognition device.
  • the voice recognition control device is further configured to recognize the collected voice, and to match the recognition result against the identifier of the voice recognition device;
  • the matched voice recognition device is determined to be the target voice recognition device that the collected voice indicates is to be controlled.
  • the identifier of the voice recognition device includes at least one of the following forms:
  • encoded voice data corresponding to the voice recognition device.
  • the voice recognition control device is further configured to receive a list of supported voices issued by the voice recognition device, and/or a list of instructions corresponding to the supported voices; or
  • the voice recognition control device transmits a voice recognition capability request message to the voice recognition device to receive a list of supported voices responded by the voice recognition device, and/or a list of instructions corresponding to the supported voice.
  • FIG. 5a is a schematic diagram of a scenario in the embodiment of the present invention.
  • The four devices shown in FIG. 5a are a voice recognition control device, a television, a DVD player and a home storage server. The television and the home storage server support voice control but have no microphone to support voice recognition. For convenience of description, the DVD player does not support voice control and can only be controlled by a conventional remote control.
  • the ability of these four devices to discover, connect, send and receive messages on the network can be implemented using the related UPnP technology, or using the multicast domain name system (mDNS) or domain name system based service discovery (DNS-SD).
  • this class of technology is used in IP networks and, through unicast or multicast queries, responds to queries and provides function calls according to a predefined message format.
  • the UPnP technology specifies how media display devices (such as televisions), servers (such as DVD players, home storage servers) respond to queries, and which calling functions are provided.
  • the voice recognition control device performs voice collection through a microphone to implement voice recognition; and can also implement functions of data storage, control, and network services.
  • the voice recognition control device may also be a wearable device, such as a ring-type device worn on the hand or a watch-type device worn on the arm, which can collect, recognize, or encode the voice uttered by the user and also has network capability.
  • the voice recognition control device can, based on the received capability information, recognize the identifier of a device and look up information such as its network address and unique identifier, so that the target voice recognition device can be determined and the collected voice, or the instruction corresponding to the collected voice, sent to the target voice recognition device.
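  • when a device to be controlled, such as the television or the home storage server, is powered on, the voice recognition device in it sends a message by multicast, where the message includes:
  • the unique identifier of the voice recognition device, which indicates that the device is a voice recognition device and may use a predefined coding type, such as a network address, or an identifier other than the network address, such as a character string;
  • the list of instructions corresponding to the voices supported by the voice recognition device; for example, when the voice is in text form, one example of the list is "local identifier = living-room TV; instruction 1 = power off; instruction 2 = power on; 3 = volume up; 4 = volume down", and when the voice is encoded data, one example is "local identifier = living-room TV.wav; instruction 1 = power off.wav; instruction 2 = power on.wav; 3 = volume up.wav; 4 = volume down.wav";
  • the message may further include instruction parameters corresponding to the voices supported by the voice recognition device, such as the duration that the voice represents.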
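The message contents listed above can be sketched in Python as follows. The patent does not define a wire format, so the JSON layout, the field names (`id`, `commands`, `durations`), and the function name `build_capability_message` are assumptions made purely for illustration.

```python
import json


def build_capability_message(device_id, commands, durations=None):
    """Build a capability announcement as a JSON string.

    The JSON layout is a hypothetical encoding; the source only says the
    message carries an identifier, a command list, and optional parameters.
    """
    message = {
        "id": device_id,             # unique identifier of the recognizer
        "commands": dict(commands),  # command label -> supported voice phrase
    }
    if durations:
        # Optional instruction parameters, e.g. a duration per command.
        message["durations"] = dict(durations)
    return json.dumps(message, ensure_ascii=False, sort_keys=True)


tv_message = build_capability_message(
    "living-room TV",
    {"1": "power off", "2": "power on", "3": "volume up", "4": "volume down"},
)
```

A receiver would decode the string with `json.loads` and index the result by `id`; actually sending the string by multicast is left out of the sketch.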
  • FIG. 5b is a flowchart of the operation of the voice recognition device and the voice recognition control device in the embodiment of the present invention. As shown in Figure 5b, the following steps are included:
  • Step 501, the voice recognition device in the device to be controlled starts up, or receives a query request.
  • the query request is issued by the voice recognition control device in FIG. 5b and requests the voice recognition capability of the voice recognition devices set in the devices of FIG. 5a (the home storage server, the television, and the DVD player);
  • the voice recognition capability is expressed as a list of voices supported by the voice recognition device, and/or a list of instructions corresponding to the supported voices.
  • Step 502, the voice recognition device issues a voice recognition capability message.
  • the voice recognition capability message includes the identifier of the voice recognition device (as text or as encoded voice data) and a set of voice description information. The voice description information includes a list of instructions corresponding to the voices supported by the voice recognition device, and/or a list of supported voices. Voices in the list of voices take the form of voice text or encoded voice data. Since the voice recognition devices in FIG. 5a correspond one-to-one with the devices to be controlled, the identifier of a voice recognition device can also serve as the identifier of its device to be controlled.
  • the voice recognition device may actively send the voice recognition capability message as a broadcast or multicast message; or, on receiving a query message asking whether the device to be controlled supports voice recognition, it may send the voice recognition capability message as a unicast, multicast, or broadcast message.
  • Step 503 the voice recognition control device receives the voice recognition capability message.
  • Step 504, the voice recognition control device collects voice.
  • the collection may be done by a computer, for example by capturing voice data through a microphone and then analyzing and recognizing the voice, or by collecting voice data through a wearable device and then analyzing and recognizing the voice.
  • Step 505, the voice recognition control device determines the instruction corresponding to the collected voice, or determines description information of the collected voice, and sends the determined instruction or voice description information to the voice recognition device.
  • after collecting the voice, the voice recognition control device determines the target voice recognition device of the collected voice. Since the voice recognition devices in FIG. 5b correspond one-to-one with the devices to be controlled, determining the target voice recognition device is equivalent to determining which device the collected voice is meant to control. The target voice recognition device may be determined by matching the collected voice against the identifiers of the voice recognition devices in the lists;
  • the description information of the collected voice is in the form of text or encoded voice data.
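The matching in step 505 can be illustrated with a small Python sketch. It assumes the recognition result is plain text and that matching is a simple substring test; the patent does not prescribe a matching algorithm, and the names `find_target` and `resolve_command` are hypothetical.

```python
def find_target(recognized_text, capability_lists):
    """Return the identifier of the device named in the text, or None.

    capability_lists maps a device identifier to its supported phrases.
    Substring matching is an assumption; the longest identifier wins so
    that "living-room TV" is preferred over a shorter name such as "TV".
    """
    matches = [dev for dev in capability_lists if dev in recognized_text]
    return max(matches, key=len) if matches else None


def resolve_command(recognized_text, capability_lists):
    """Return (target, phrase), (target, None), or None if no device matches."""
    target = find_target(recognized_text, capability_lists)
    if target is None:
        return None
    for phrase in capability_lists[target]:
        if phrase in recognized_text:
            return target, phrase
    return target, None


lists = {
    "living-room TV": ["power off", "volume up"],
    "storage server": ["power off"],
}
```

With these lists, an utterance that names a device selects only that device, even though both devices support the same "power off" phrase.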
  • Step 506a the voice recognition control device transmits the determined instruction or description information of the voice to the target voice recognition device.
  • that is, it is sent to the voice recognition device in the device that the voice is meant to control.
  • Step 507a when the target speech recognition device receives the instruction, execute the received instruction; when the target speech recognition device receives the speech description information, perform secondary recognition according to the description information of the speech, determine a corresponding instruction, and execute the instruction.
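Step 507a can be sketched as a small dispatch function. The `(kind, value)` payload shape and the lookup table used for the secondary recognition are assumptions for illustration; the patent does not specify how the description information is mapped to an instruction.

```python
def handle_payload(payload, phrase_to_command):
    """Return the command to execute for a received payload, or None.

    payload is a hypothetical (kind, value) pair:
      ("command", c)     -> the controller already recognized the command
      ("description", d) -> the recognizer must map the description itself
    """
    kind, value = payload
    if kind == "command":
        return value
    if kind == "description":
        # Secondary recognition, reduced here to a table lookup.
        return phrase_to_command.get(value)
    return None


table = {"power off": "CMD_POWER_OFF", "power on": "CMD_POWER_ON"}
```

A real recognizer would replace the table lookup with its own recognition of the text or encoded voice data.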
  • Steps 506a and 507a may also be replaced by steps 506b and 507b, respectively.
  • Step 506b the voice recognition control device transmits the determined instruction or description information of the voice to the voice recognition device.
  • Step 507b, the voice recognition device processes the received instruction or voice description information according to a preset policy.
  • the preset policy includes: when the collected voice is a preset specific voice (for example, a voice that the voice recognition device has already forwarded), forwarding the collected voice; and when the collected voice is not supported, forwarding the collected voice.
  • take as an example the voice recognition device set in the television (denoted voice recognition device 1) receiving an instruction (that is, the instruction determined by the voice recognition control device in step 505). If voice recognition device 1 supports the received instruction, it identifies the television as the device the user's voice is meant to control; accordingly, voice recognition device 1 controls the television to execute the instruction, completing the response to the user's voice control. If voice recognition device 1 does not support the received instruction, it identifies that the device the user's voice is meant to control is not the television, and forwards the received instruction to the voice recognition devices set in the other devices in FIG. 5a (the home storage server and the DVD player); the voice recognition devices in the other devices each determine whether they support the received instruction and, if so, execute it, completing the response to the user's voice control;
  • when voice recognition device 1 receives voice description information (that is, the voice description information determined by the voice recognition control device in step 505), it must first determine the corresponding instruction from the voice description information; the rest of the processing is the same as described above and is not repeated here;
  • when voice recognition device 1 receives an instruction (that is, the instruction determined by the voice recognition control device in step 505) that it has forwarded before, it identifies the instruction as one it does not support and forwards it to the voice recognition devices set in the other devices in FIG. 5a (the home storage server and the DVD player); the voice recognition devices in the other devices each determine whether they support the received instruction and, if so, execute it, completing the response to the user's voice control.
  • the voice recognition device controls the device in which it is located to respond to the received command, thereby implementing voice control of the device.
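The forwarding policy of step 507b can be sketched as follows. The class `VoiceRecognizer`, its attributes, and the `visited` set are illustrative assumptions; in particular, the `visited` set is not in the source and is added only so that two recognizers that both lack support do not forward a command back and forth forever.

```python
class VoiceRecognizer:
    """Toy model of a recognizer that executes or forwards commands."""

    def __init__(self, name, supported, always_forward=()):
        self.name = name
        self.supported = set(supported)
        self.always_forward = set(always_forward)  # "preset specific" commands
        self.peers = []      # other recognizers reachable on the network
        self.executed = []   # commands this recognizer acted on

    def receive(self, command, visited=None):
        # visited is an assumed loop guard, not part of the described policy.
        visited = set() if visited is None else visited
        visited.add(self.name)
        if command in self.supported and command not in self.always_forward:
            self.executed.append(command)
            return
        # Unsupported or preset-specific command: forward to the other peers.
        self.always_forward.add(command)
        for peer in self.peers:
            if peer.name not in visited:
                peer.receive(command, visited)


tv = VoiceRecognizer("tv", {"power on"})
server = VoiceRecognizer("server", {"power off"})
tv.peers, server.peers = [server], [tv]
tv.receive("power off")   # tv cannot handle it, so server executes it
tv.receive("power on")    # tv handles this one itself
tv.receive("eject")       # nobody supports it; forwarding stops quietly
```

Recording a forwarded command in `always_forward` mirrors the example in the text, where a command the recognizer has forwarded before is forwarded again without being re-examined.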
  • this embodiment can also prevent multiple voice recognition devices from acting wrongly on a voice uttered by the user. For example, the voice recognition devices in several devices may all support the same voice (corresponding to the power-off instruction) while the user intends to turn off only one device; confirming the target device in the steps above avoids an incorrect response to the user's voice.
  • FIG. 6 is a schematic diagram of message interaction when voice control is implemented in the embodiment of the present invention.
  • the voice recognition devices described above are provided in device 1 and device 2 respectively, and the voice recognition control device described above is provided in a piece of voice recognition control equipment;
  • the voice control in the embodiment of the present invention includes the following steps:
  • Step 601, device 1 sends a multicast message.
  • the multicast message includes a list of instructions corresponding to the voice supported by the voice recognition device in the device 1.
  • the voice recognition control device in the network receives the list of instructions corresponding to the voice supported by the device 1.
  • Step 602 The voice recognition control device sends a request message for querying the voice recognition capability to the device 2.
  • the message sent in step 602 can be sent in the form of a broadcast, multicast, or unicast message.
  • Step 603, device 2 sends a unicast message.
  • the unicast message includes a list of instructions corresponding to the voice supported by the device 2.
  • Step 604 the voice recognition control device collects the voice.
  • Step 605 the voice recognition control device sends a voice control instruction to the device 1.
  • this instruction is issued because the voice recognition control device determines that the user's voice collected in step 604 is meant to control device 1, and that device 1 supports the collected voice.
  • as a result, device 1 supports voice control even though it has no microphone, wearable device, or similar component.
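The message sequence of FIG. 6 can be traced with a short simulation. The capability lists assigned to device 1 and device 2, the log wording, and the substring matching are all assumptions made for the sketch; only the order of steps 601 to 605 comes from the text.

```python
def simulate_fig6(collected_voice):
    """Return a log of the FIG. 6 exchange for one utterance."""
    log = []
    known = {}  # controller's view: device name -> supported phrases

    # Step 601: device 1 announces its list by multicast.
    known["device 1"] = ["power on", "power off"]
    log.append("601 device 1 -> multicast: capability list")

    # Steps 602-603: the controller queries device 2, which replies by unicast.
    log.append("602 controller -> device 2: capability query")
    known["device 2"] = ["play", "pause"]
    log.append("603 device 2 -> controller: capability list (unicast)")

    # Step 604: the controller collects the user's voice.
    log.append("604 controller collects voice: " + collected_voice)

    # Step 605: send the command only to the device that is both named
    # in the utterance and supports the phrase.
    for device, phrases in known.items():
        if device in collected_voice:
            for phrase in phrases:
                if phrase in collected_voice:
                    log.append("605 controller -> " + device + ": " + phrase)
    return log


trace = simulate_fig6("device 1 power off")
```

An utterance that names an unknown device, or a phrase the named device does not support, produces no step 605 entry.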
  • device 1 and device 2 above may be devices to be controlled such as a television, a player, or a storage server. The devices to be controlled in the embodiments of the present invention are not limited to those mentioned above: other devices such as computers, audio systems, speakers, projectors, and set-top boxes can all serve as devices to be controlled, and even industrial equipment such as automobiles, machine tools, and ships can be controlled by the voice recognition control device described in the embodiments of the present invention.
  • the microphones in the voice recognition control device may be of various specifications such as a monaural acquisition microphone, a microphone array, and the like.
  • the foregoing process is one embodiment of the present invention; implementation is not limited to this embodiment, nor does the embodiment limit how each specific step is carried out. The embodiments of the present invention may also be implemented in similar ways, for example by replacing a device with a unit or by changing the names or types of the messages described in the embodiments; such changes are merely changes of naming and still fall within the protection scope of the present invention.
  • the network-related aspects of the above embodiments are applicable to IP networks supported by communication networks based on IEEE 802.3, IEEE 802.11b/g/n, power line networks (POWERLINE), cable (CABLE), the public switched telephone network (PSTN), 3rd Generation Partnership Project (3GPP) networks, 3GPP2 networks, and so on; the operating system of each device may be a UNIX-type, WINDOWS-type, or ANDROID-type operating system or the IOS operating system; and the consumer-facing interface may be a JAVA language interface or the like.
  • in the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function.
  • in an actual implementation there may be other ways of dividing, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • in addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
  • the units described above as separate components may or may not be physically separated, and the components displayed as the unit may or may not be physical units, that is, may be located in one place or distributed to multiple network units; Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • in addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may stand alone as one unit, or two or more units may be integrated into one unit;
  • the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • a person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware directed by program instructions; the program may be stored in a computer readable storage medium and, when executed, performs the steps of the above method embodiments.
  • the foregoing storage medium includes any medium that can store program code, such as a mobile storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disk.
  • the above-described integrated unit of the present invention may be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a standalone product.
  • on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the related art, may be embodied as a software product that is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a mobile storage device, a RAM, a ROM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Selective Calling Equipment (AREA)

Abstract

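Disclosed are a voice recognition method, device, system, and computer storage medium. One voice recognition method includes: a voice recognition device publishes a list of supported voices and/or a list of instructions corresponding to the supported voices. Another voice recognition method includes: a voice recognition control device obtains a list of supported voices and/or a list of instructions corresponding to the voices supported by the voice recognition device.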

Description

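Voice recognition method, device, system, and computer storage medium
Technical Field
The present invention relates to voice recognition technology in the field of communications and information, and in particular to a voice recognition method, device, system, and computer storage medium.
Background
The development of digital multimedia and networks has enriched users' everyday entertainment. Current technology lets users watch high-definition television at home, with programs coming from digital discs, cable television, the Internet, and so on, and enjoy stereo, 5.1-channel, 7.1-channel, or even more realistic sound. Users can also get these experiences on tablets (PAD) and mobile phones. Related technology further lets users transfer digital content between devices over a network for playback, and control playback on one device with a remote control or by voice, for example switching to the previous or next channel.
Traditionally, multiple devices are each controlled with their own remote control, and these remote controls are usually not interchangeable. Most remote controls, such as those for traditional televisions and audio systems, have no network capability. There are also network-capable remote controls: for example, software supporting an interworking protocol is loaded on a device with computing and network communication capability, such as a mobile phone, to control another device.
As technology develops, the demand for sharing and transferring content playback among multiple devices keeps growing, and the control methods above are not convenient enough. For example, the user has to pick the right remote control out of a pile and keep switching remote controls when controlling different devices; or someone familiar with basic computer operation has to use a PAD or mobile phone to control the devices; or a single device is controlled with simple voice commands. Using different devices often means learning different control tools.
Voice control is a relatively new approach: a microphone on a device collects voice, the voice is analyzed and recognized, and it is finally converted into a corresponding executable instruction to control the device.
Related technologies and some products let users control devices by voice. For example, a microphone is added to a television to collect (human) voice; the voice is recognized, the corresponding operation instruction is determined according to a predefined mapping between voices and control instructions, and the instruction is executed, so that the television is controlled by voice. Controls already implemented include power on, power off, and so on.
Such voice recognition technologies and products require the controlled device to have a microphone to collect voice. In some environments, such as the home, some devices have no microphone because of size, cost, or other reasons, yet users still need to control these devices by voice.
In summary, the related art offers no effective solution for helping users control more devices within a small area in a simpler and more natural way, without having to learn more usage methods, while also lowering production costs for manufacturers and purchase costs for users.
Summary
Embodiments of the present invention provide a voice recognition method, device, system, and computer storage medium that allow devices without voice collection capability to be controlled by voice, making it easier for users to control devices by voice and improving the user experience.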
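An embodiment of the present invention provides a voice recognition method, the method including:
a voice recognition device publishes a list of supported voices and/or a list of instructions corresponding to the supported voices.
An embodiment of the present invention further provides a voice recognition method, the method including:
a voice recognition control device obtains a list of voices supported by a voice recognition device and/or a list of instructions corresponding to the voices supported by the voice recognition device.
An embodiment of the present invention further provides a voice recognition device, the voice recognition device including:
a first communication unit, configured to publish a list of supported voices and/or a list of instructions corresponding to the supported voices.
An embodiment of the present invention further provides a voice recognition control device, the voice recognition control device including:
a second communication unit, configured to obtain a list of voices supported by a voice recognition device and/or a list of instructions corresponding to the voices supported by the voice recognition device.
An embodiment of the present invention further provides a voice recognition system, the voice recognition system including a voice recognition device and/or a voice recognition control device, wherein
the voice recognition device is configured to publish a list of supported voices and/or a list of instructions corresponding to the supported voices;
the voice recognition control device is configured to obtain a list of voices supported by the voice recognition device and/or a list of instructions corresponding to the voices supported by the voice recognition device.
An embodiment of the present invention further provides a computer storage medium storing executable instructions configured to perform the voice recognition method described above.
In the technical solution provided by the embodiments of the present invention, the list of voices supported by a voice recognition device and/or the list of instructions corresponding to the supported voices is published on the network. This allows a device that is fitted with a voice recognition device but has no voice collection capability to be controlled by voice. Users can therefore control devices within a certain range in a simpler and more natural way, quickly and conveniently and without learning how to operate each device, while production costs for manufacturers and purchase costs for users are reduced.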
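Brief Description of the Drawings
FIG. 1 is a first schematic diagram of the voice recognition method in an embodiment of the present invention;
FIG. 2 is a second schematic diagram of the voice recognition method in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the composition of the voice recognition device in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the composition of the voice recognition control device in an embodiment of the present invention;
FIG. 5a is a schematic diagram of a scenario in an embodiment of the present invention;
FIG. 5b is a flowchart of the operation of the voice recognition device and the voice recognition control device in an embodiment of the present invention;
FIG. 6 is a schematic diagram of message interaction when voice control is implemented in an embodiment of the present invention.
Detailed Description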
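In the course of implementing the present invention, the inventors found that related technologies already pass control information between devices over a network so that devices can discover and control one another. For example, Universal Plug and Play (UPnP) technology specifies how devices send and receive network messages to achieve discovery and control. That technology identifies a device by its network address and a numeric code, which is a machine identifier, so in the end the user has to select a device by its machine identifier before operating it. It would be desirable to provide a voice recognition method that helps users control more devices within a certain range in a simpler and more natural way, so that users need not learn more usage methods and the production costs for manufacturers and purchase costs for users are lowered.
An embodiment of the present invention describes a voice recognition method. As shown in FIG. 1, a voice recognition device publishes (for example, on a network) a list of supported voices and/or a list of instructions corresponding to the supported voices.
It should be noted that the voice recognition device is set in a device to be controlled. The device to be controlled may be any conventional device and need not have voice collection or voice recognition capability. Both lists above include the identifier of the device to be controlled in which the voice recognition device is located, and the instructions supported by the voice recognition device. Since voice recognition devices correspond one-to-one with devices to be controlled, and the instructions supported by the voice recognition device are used to control the device to be controlled, the identifier of the device to be controlled can serve as the identifier of the voice recognition device, and the instructions supported by the voice recognition device can be treated as the instructions supported by the device to be controlled. One example of the list of voices supported by the voice recognition device is:
local device (the device to be controlled) identifier = living-room TV; power-off.wav; power-on.wav; volume-up.wav; volume-down.wav;
One example of the list of instructions corresponding to the voices supported by the voice recognition device is:
local device (the device to be controlled) identifier = living-room TV; instruction 1 = power off; instruction 2 = power on; 3 = volume up; 4 = volume down;
Another example of the list of instructions corresponding to the voices supported by the voice recognition device is:
local identifier = living-room TV.wav; instruction 1 = power-off.wav; instruction 2 = power-on.wav; 3 = volume-up.wav; 4 = volume-down.wav;
Here, a ".wav" file is an encoded voice data file that stores the encoded digital data of a voice such as "power off".
As described above, the voice recognition device may publish a list in the form of any one of the examples above, or may publish lists in the forms of two of the examples above.
The same list or different lists may be preset for different devices to be controlled; the device identifier (local identifier) in a list is unique so that different devices to be controlled can be distinguished.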
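The semicolon-separated list examples above can be parsed with a few lines of Python. This is only a sketch: the patent gives examples rather than a grammar, so the rule that the first field always carries the identifier, and the function name `parse_capability_list`, are assumptions.

```python
def parse_capability_list(text):
    """Split a published list into (device identifier, entries).

    Assumptions: fields are separated by ";" (ASCII) or ";" (full-width),
    the first field is "<label>=<identifier>", and each later field is
    either "<label>=<value>" or a bare value such as a .wav file name.
    """
    normalized = text.replace(";", ";")
    fields = [f.strip() for f in normalized.split(";") if f.strip()]
    if not fields:
        return None, []
    device_id = fields[0].split("=", 1)[-1].strip()
    entries = [f.split("=", 1)[-1].strip() for f in fields[1:]]
    return device_id, entries


command_list = "id=living-room TV;cmd 1=power off;cmd 2=power on;3=volume up;4=volume down;"
voice_list = "id=living-room TV;power-off.wav;power-on.wav"
```

Because the identifier must be unique per device, a controller could use the first element of the result as the key of a dictionary of known devices.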
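In one implementation, the voice recognition device further receives the collected voice, and executes the instruction corresponding to the collected voice; or
forwards the collected voice or the instruction corresponding to the collected voice. Before executing the instruction corresponding to the collected voice, the voice recognition device must also recognize the collected voice to obtain the corresponding instruction.
By executing instructions, the voice recognition device controls the device to be controlled in which it is located, for example starting or stopping it. When forwarding, the voice recognition device may forward all collected voices (or the instructions corresponding to them).
In one implementation, forwarding the collected voice or the instruction corresponding to the collected voice includes: the voice recognition device forwards the collected voice or the corresponding instruction according to a preset policy;
Here, forwarding may be done by sending messages on the network or through a communication interface between voice recognition devices; the messages sent on the network include multicast, broadcast, and unicast messages. The preset policy includes at least one of the following. First, when the received collected voice is a preset specific voice, forward the collected voice or the corresponding instruction. Second, when the collected voice is not supported, forward the collected voice or the corresponding instruction. That is, when the voice recognition device receives a voice recognized by the voice recognition control device and either cannot recognize the received voice, or can recognize the instruction corresponding to the voice but does not support that instruction, the target voice recognition device of the received voice is not this voice recognition device. Accordingly, the voice recognition device forwards the collected voice or the corresponding instruction to other voice recognition devices so that the target voice recognition device that receives it can process it. For example, when the voices "power on" and "power off" are received and the voice recognition device supports only the power-on instruction corresponding to "power on", it publishes the "power off" voice or the "power off" instruction on the network so that other voice recognition devices can process it.
Voice collection may be performed by the voice recognition control device, so that the voice recognition device receives the voice collected by the voice recognition control device. The voice here is represented as computer-encoded data, for example including sampling-frequency data of the sound; the encoding format may follow a standard such as G.711 defined by the ITU Telecommunication Standardization Sector (ITU-T). On receiving a voice, the voice recognition device recognizes the instruction corresponding to the received voice and triggers the device to be controlled in which it is located to execute the recognized instruction, thereby controlling that device.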
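In one implementation, the voice recognition device publishing a list of supported voices and/or a list of instructions corresponding to the supported voices includes:
the voice recognition device publishes (for example, on a network) the list of supported voices and/or the list of instructions corresponding to the supported voices, that is, the voice recognition device publishes on its own initiative;
or, after receiving a request message querying voice recognition capability, the voice recognition device responds with the list of supported voices and/or the list of instructions corresponding to the supported voices, that is, the voice recognition device sends passively in response on the network; for example, it may respond on the network with a unicast, multicast, or broadcast message;
The voice recognition device may publish the list of supported voices and/or the list of instructions corresponding to the supported voices periodically or aperiodically. The list of voices includes at least one of the following: voice text; encoded voice data; the voice text of the device identifier and/or the encoded voice data of the device identifier.
Some usage scenarios may involve multiple devices to be controlled, each with its own voice recognition device, and the voices supported by each voice recognition device may differ. The voice recognition control device can recognize the collected voice, that is, determine the one or more voice recognition devices that support the voice it collected, and send the instruction corresponding to the voice to the target voice recognition device. Accordingly, in one implementation, the method further includes: the voice recognition device receives the instruction corresponding to the collected voice and executes the instruction;
In this implementation, the instruction received by the voice recognition device is one that the voice recognition device supports, so the received instruction can be executed directly.
The voice recognition device may be set in the device to be controlled and perform voice recognition with its own voice recognition capability.
In one implementation, since some usage scenarios may involve multiple devices to be controlled, each with its own voice recognition device, the voice recognition devices in different devices to be controlled need to be distinguished. Accordingly, the list of supported voices and/or the list of instructions corresponding to the supported voices that the voice recognition device publishes on the network further includes the identifier of the voice recognition device; the identifier includes at least one of the following forms:
voice text corresponding to the identifier of the voice recognition device;
encoded voice data corresponding to the identifier of the voice recognition device.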
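An embodiment of the present invention further describes a voice recognition method. As shown in FIG. 2, the method includes:
a voice recognition control device obtains a list of voices supported by a voice recognition device and/or a list of instructions corresponding to the voices supported by the voice recognition device.
In one implementation, the voice recognition control device further collects voice (through a microphone) and sends the collected voice to the voice recognition device. In this way, a device to be controlled that has no voice collection capability effectively gains that capability by receiving the voice collected by the voice recognition control device;
The voice includes at least one of the following forms: voice text; encoded voice data.
In one implementation, the voice recognition control device collects voice and sends the collected voice to the voice recognition device; that is, the voice recognition control device sends the collected voice to all voice recognition devices, and each voice recognition device performs recognition itself. Alternatively, the voice recognition control device may recognize the collected voice, determine the corresponding instruction, and send the recognized instruction to all voice recognition devices.
In one implementation, since some usage scenarios may involve multiple devices to be controlled, each with its own voice recognition device, the voice recognition control device may, on collecting a voice, recognize both the instruction corresponding to the voice and the target voice recognition device of the voice (since voice recognition devices correspond one-to-one with devices to be controlled, identifying the target voice recognition device is equivalent to identifying the device the voice is meant to control), and send the collected voice (or the corresponding instruction) to the target voice recognition device;
The list of voices supported by the voice recognition device and the list of instructions corresponding to the voices supported by the voice recognition device both include the identifier of the voice recognition device;
Accordingly, the voice recognition control device may determine the target voice recognition device that the collected voice is meant to control as follows: it recognizes the collected voice, matches the recognition result against the identifiers of the voice recognition devices, and determines the matched voice recognition device to be the target.
The identifier of the voice recognition device includes at least one of the following forms:
voice text corresponding to the voice recognition device (or to the device to be controlled in which it is located);
encoded voice data corresponding to the voice recognition device (or to the device to be controlled in which it is located); for example, when the encoded voice data is "living-room TV.wav", the target voice recognition device of the voice is the one set in the living-room TV.
In one implementation, the voice recognition control device obtaining (for example, over a network) the list of voices supported by the voice recognition device and/or the list of instructions corresponding to the supported voices includes:
the voice recognition control device receives (for example, over a network) the list of supported voices and/or the list of instructions corresponding to the supported voices published by the voice recognition device, that is, it receives the lists that the voice recognition device actively publishes; or
the voice recognition control device sends (for example, over a network) a voice recognition capability request message to the voice recognition device, so as to receive the list of supported voices and/or the list of instructions corresponding to the supported voices with which the voice recognition device responds.
An embodiment of the present invention further describes a computer storage medium storing executable instructions configured to perform the voice recognition method shown in FIG. 1 or FIG. 2.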
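An embodiment of the present invention further describes a voice recognition device. As shown in FIG. 3, the voice recognition device includes:
a first communication unit 31, configured to publish (for example, on a network) a list of supported voices and/or a list of instructions corresponding to the supported voices.
The voice recognition device further includes:
a first receiving unit 32, configured to receive the collected voice;
a first execution unit 33, configured to execute the instruction corresponding to the collected voice; or
forward the collected voice or the instruction corresponding to the collected voice.
The first execution unit 33 is further configured to recognize the collected voice to obtain the corresponding instruction and, when it determines that the collected voice is supported, to determine the instruction corresponding to the collected voice and execute it.
The first execution unit 33 is further configured to forward the collected voice or the corresponding instruction according to a preset policy; the preset policy includes at least one of the following:
when the collected voice is a preset specific voice, forward the collected voice or the corresponding instruction;
when the collected voice is not supported, forward the collected voice or the corresponding instruction.
The first communication unit 31 is further configured to publish on its own initiative (for example, on a network) the list of supported voices and/or the list of instructions corresponding to the supported voices;
or, on receiving a request message querying voice recognition capability, to respond (for example, on the network) with the list of supported voices and/or the list of instructions corresponding to the supported voices.
The voice recognition device further includes:
a second receiving unit 34, configured to receive the instruction corresponding to the collected voice;
a second execution unit 35, configured to execute the instruction received by the second receiving unit 34.
The voices in the list of voices include at least one of the following forms:
voice text; encoded voice data.
The list of supported voices and/or the list of instructions corresponding to the supported voices that the voice recognition device publishes on the network further includes the identifier of the voice recognition device; the identifier includes at least one of the following forms:
voice text corresponding to the identifier of the voice recognition device;
encoded voice data corresponding to the identifier of the voice recognition device.
In practice, the first communication unit 31, the first receiving unit 32, and the second receiving unit 34 may be implemented by a chip in the voice recognition device that supports the relevant communication protocol, such as IEEE 802.11b/g/n or IEEE 802.3; the first execution unit 33 and the second execution unit 35 may be implemented by a central processing unit (CPU), digital signal processor (DSP), or field programmable gate array (FPGA) in the voice recognition device.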
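An embodiment of the present invention further describes a voice recognition control device. As shown in FIG. 4, the voice recognition control device includes:
a second communication unit 41, configured to obtain (for example, over a network) a list of voices supported by a voice recognition device and/or a list of instructions corresponding to the voices supported by the voice recognition device.
The voice recognition control device further includes:
a first collection unit 42, configured to collect voice and send the collected voice to the voice recognition device through the second communication unit 41.
The voice includes at least one of the following forms: voice text; encoded voice data.
The voice recognition control device further includes:
a second collection unit 43, configured to collect voice;
a first recognition unit 44, configured to recognize the instruction corresponding to the voice collected by the second collection unit 43 and send the recognized instruction to the voice recognition device through the second communication unit 41.
a third collection unit 45, configured to collect voice;
a second recognition unit 46, configured to identify the target voice recognition device that the voice collected by the third collection unit 45 is meant to control, and to trigger the second communication unit 41 to send the voice collected by the third collection unit 45, or the instruction corresponding to that voice, to the target voice recognition device.
The list of voices supported by the voice recognition device and the list of instructions corresponding to the voices supported by the voice recognition device both include the identifier of the voice recognition device;
Accordingly, the second recognition unit 46 is further configured to recognize the voice collected by the third collection unit 45 and match the recognition result against the identifier of the voice recognition device;
and to determine the matched voice recognition device to be the target voice recognition device that the voice collected by the third collection unit 45 is meant to control.
The identifier of the voice recognition device includes at least one of the following forms:
voice text corresponding to the voice recognition device;
encoded voice data corresponding to the voice recognition device.
The second communication unit 41 is further configured to receive (for example, over a network) the list of supported voices and/or the list of instructions corresponding to the supported voices published by the voice recognition device; or
to send (for example, over a network) a voice recognition capability request message to the voice recognition device, so as to receive the list of supported voices and/or the list of instructions corresponding to the supported voices with which the voice recognition device responds.
In practice, the second communication unit 41 may be implemented by a chip in the voice recognition control device that supports the relevant communication protocol, such as IEEE 802.11b/g/n or IEEE 802.3; the first collection unit 42, the second collection unit 43, and the third collection unit 45 may be implemented by a microphone with voice collection capability in the voice recognition control device; the first recognition unit 44 and the second recognition unit 46 may be implemented by a CPU, DSP, or FPGA in the voice recognition control device.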
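An embodiment of the present invention further describes a voice recognition system, which includes a voice recognition device and/or a voice recognition control device;
The voice recognition device is configured to publish a list of supported voices and/or a list of instructions corresponding to the supported voices;
the voice recognition control device is configured to obtain the list of voices supported by the voice recognition device and/or the list of instructions corresponding to the voices supported by the voice recognition device.
The voice recognition device is further configured to receive the collected voice;
execute the instruction corresponding to the collected voice; or
forward the collected voice or the instruction corresponding to the collected voice.
The voice recognition device is further configured to recognize the collected voice to obtain the corresponding instruction.
The voice recognition device is further configured to forward the collected voice or the corresponding instruction according to a preset policy; the preset policy includes at least one of the following:
when the collected voice is a preset specific voice, forward the collected voice or the corresponding instruction;
when the collected voice is not supported, forward the collected voice or the corresponding instruction.
The voice recognition device is further configured to publish on its own initiative the list of supported voices and/or the list of instructions corresponding to the supported voices;
or, on receiving a request message querying voice recognition capability, the voice recognition device responds with the list of supported voices and/or the list of instructions corresponding to the supported voices.
The voice recognition device is further configured to receive the instruction corresponding to the collected voice and execute the instruction.
The voices in the list of voices include at least one of the following forms:
voice text; encoded voice data.
The list of supported voices and/or the list of instructions corresponding to the supported voices published by the voice recognition device further includes the identifier of the voice recognition device; the identifier includes at least one of the following forms:
voice text corresponding to the identifier of the voice recognition device;
encoded voice data corresponding to the identifier of the voice recognition device.
The voice recognition control device is further configured to collect voice and send the collected voice to the voice recognition device.
The voice recognition control device is further configured to collect voice, recognize the instruction corresponding to the collected voice, and send the recognized instruction to the voice recognition device.
The voice includes at least one of the following forms: voice text; encoded voice data.
The voice recognition control device is further configured to collect voice;
determine the target voice recognition device that the collected voice is meant to control;
send the collected voice, or the instruction corresponding to the collected voice, to the target voice recognition device.
The list of voices supported by the voice recognition device and the list of instructions corresponding to the voices supported by the voice recognition device both include the identifier of the voice recognition device.
The voice recognition control device is further configured to recognize the collected voice and match the recognition result against the identifier of the voice recognition device;
and determine the matched voice recognition device to be the target voice recognition device that the collected voice is meant to control.
The identifier of the voice recognition device includes at least one of the following forms:
voice text corresponding to the voice recognition device;
encoded voice data corresponding to the voice recognition device.
The voice recognition control device is further configured to receive the list of supported voices and/or the list of instructions corresponding to the supported voices published by the voice recognition device; or
the voice recognition control device sends a voice recognition capability request message to the voice recognition device, so as to receive the list of supported voices and/or the list of instructions corresponding to the supported voices with which the voice recognition device responds.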
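The method described in the embodiments of the present invention is explained below with a concrete usage scenario. FIG. 5a is a schematic diagram of a scenario in an embodiment of the present invention. The four devices shown in FIG. 5a are a voice recognition control device, a television, a DVD player, and a home storage server. The television and the home storage server support voice control but have no microphone for voice recognition. For ease of description, the DVD player does not support voice control and can only be controlled with a conventional remote control.
All four devices have network interfaces, for example supporting IEEE 802.11b/g/n or IEEE 802.3, so that they can connect to an Internet Protocol (IP) network; any of the four devices can communicate with the others and can process or forward instructions.
The ability of the four devices to discover one another, connect, and send and receive messages on the network can be implemented with the related UPnP technology, or with multicast Domain Name System (mDNS) or DNS-based Service Discovery (DNS-SD) technology. This class of technology is used in IP networks and, through unicast or multicast queries, responds to queries and provides function calls according to a predefined message format. For example, UPnP specifies how media display devices (such as televisions) and servers (such as DVD players and home storage servers) respond to queries and which functions they expose.
The voice recognition control device collects voice through a microphone to implement voice recognition; it can also provide data storage, control, and network service functions.
In the embodiments of the present invention, the voice recognition control device may also be a wearable device, such as a ring-type device worn on the hand or a watch-type device worn on the arm, which can collect, recognize, or encode the voice uttered by the user and also has network capability.
In the embodiments of the present invention, the voice recognition control device can, based on the received capability information, recognize the identifier of a device and look up information such as its network address and unique identifier, so that the target voice recognition device can be determined and the collected voice, or the corresponding instruction, sent to the target voice recognition device.
In the embodiments of the present invention, when a device to be controlled, such as the television or the home storage server, is powered on, the voice recognition device in it sends a message by multicast, and the message includes:
the unique identifier of the voice recognition device, which indicates that the device is a voice recognition device and may use a predefined coding type, such as a network address, or an identifier other than the network address, such as a character string;
the list of instructions corresponding to the voices supported by the voice recognition device; for example, when the voice is in text form, one example of the list is: "local identifier = living-room TV; instruction 1 = power off; instruction 2 = power on; 3 = volume up; 4 = volume down";
when the voice is encoded data, one example of the list is: "local identifier = living-room TV.wav; instruction 1 = power-off.wav; instruction 2 = power-on.wav; 3 = volume-up.wav; 4 = volume-down.wav";
The message may further include instruction parameters corresponding to the voices supported by the voice recognition device, such as the duration that the voice represents.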
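The following explains how the voice recognition device and the voice recognition control device in FIG. 5a cooperate to complete voice control of a device. FIG. 5b is a flowchart of the operation of the voice recognition device and the voice recognition control device in an embodiment of the present invention. As shown in FIG. 5b, it includes the following steps:
Step 501, the voice recognition device in the device to be controlled starts up, or receives a query request.
The query request is issued by the voice recognition control device in FIG. 5b and requests the voice recognition capability of the voice recognition devices set in the devices of FIG. 5a (the home storage server, the television, and the DVD player). The voice recognition capability is expressed as the list of voices supported by the voice recognition device and/or the list of instructions corresponding to the supported voices.
Step 502, the voice recognition device issues a voice recognition capability message.
The voice recognition capability message includes the identifier of the voice recognition device (as text or as encoded voice data) and a set of voice description information. The voice description information includes the list of instructions corresponding to the voices supported by the voice recognition device and/or the list of supported voices. Voices in the list of voices take the form of voice text or encoded voice data. Since the voice recognition devices in FIG. 5a correspond one-to-one with the devices to be controlled, the identifier of a voice recognition device can also serve as the identifier of its device to be controlled.
The voice recognition device may actively send the voice recognition capability message as a broadcast or multicast message; or, on receiving a query message asking whether the device to be controlled supports voice recognition, it may send the voice recognition capability message as a unicast, multicast, or broadcast message.
Step 503, the voice recognition control device receives the voice recognition capability message.
Step 504, the voice recognition control device collects voice.
Here, the collection may be done by a computer, for example by capturing voice data through a microphone and then analyzing and recognizing the voice, or by collecting voice data through a wearable device and then analyzing and recognizing the voice.
Step 505, the voice recognition control device determines the instruction corresponding to the collected voice, or determines description information of the collected voice, and sends the determined instruction or voice description information to the voice recognition device.
After collecting the voice, the voice recognition control device determines the target voice recognition device of the collected voice. Since the voice recognition devices in FIG. 5b correspond one-to-one with the devices to be controlled, determining the target voice recognition device is equivalent to determining which device the collected voice is meant to control. The target voice recognition device may be determined by matching the collected voice against the identifiers of the voice recognition devices in the lists;
The description information of the collected voice is in the form of text or encoded voice data.
Step 506a, the voice recognition control device sends the determined instruction or the description information of the voice to the target voice recognition device.
That is, it is sent to the voice recognition device in the device that the voice is meant to control.
Step 507a, when the target voice recognition device receives an instruction, it executes the received instruction; when it receives voice description information, it performs a secondary recognition based on the description information, determines the corresponding instruction, and executes it.
Steps 506a and 507a may also be replaced by steps 506b and 507b, respectively.
Step 506b, the voice recognition control device sends the determined instruction or the description information of the voice to the voice recognition devices.
That is, it is sent to the voice recognition devices set in the devices in FIG. 5a (the home storage server, the television, and the DVD player).
Step 507b, the voice recognition device processes the received instruction or voice description information according to a preset policy.
The preset policy includes: when the collected voice is a preset specific voice (for example, a voice that the voice recognition device has already forwarded), forward the collected voice; when the collected voice is not supported, forward the collected voice.
Take as an example the voice recognition device set in the television (denoted voice recognition device 1) receiving an instruction (that is, the instruction determined by the voice recognition control device in step 505). If voice recognition device 1 supports the received instruction, it identifies the television as the device the user's voice is meant to control; accordingly, voice recognition device 1 controls the television to execute the instruction, completing the response to the user's voice control. If voice recognition device 1 does not support the received instruction, it identifies that the device the user's voice is meant to control is not the television, and forwards the received instruction to the voice recognition devices set in the other devices in FIG. 5a (the home storage server and the DVD player); the voice recognition devices in the other devices each determine whether they support the received instruction and, if so, execute it, completing the response to the user's voice control;
When voice recognition device 1 receives voice description information (that is, the voice description information determined by the voice recognition control device in step 505), it must first determine the corresponding instruction from the voice description information; the rest of the processing is the same as described above and is not repeated here;
When voice recognition device 1 receives an instruction (that is, the instruction determined by the voice recognition control device in step 505) that it has forwarded before, it identifies the instruction as one it does not support and forwards it to the voice recognition devices set in the other devices in FIG. 5a (the home storage server and the DVD player); the voice recognition devices in the other devices each determine whether they support the received instruction and, if so, execute it, completing the response to the user's voice control.
The voice recognition device controls the device in which it is located to respond to the received instruction, thereby achieving voice control of the device.
This embodiment can also prevent multiple voice recognition devices from acting wrongly on a voice uttered by the user. For example, the voice recognition devices in several devices may all support the same voice (corresponding to the power-off instruction) while the user intends to turn off only one device; confirming the target device in the steps above avoids an incorrect response to the user's voice.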
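FIG. 6 is a schematic diagram of message interaction when voice control is implemented in an embodiment of the present invention. The voice recognition devices described above are provided in device 1 and device 2 respectively, and the voice recognition control device described above is provided in the voice recognition control equipment. As shown in FIG. 6, voice control in the embodiment of the present invention includes the following steps:
Step 601, device 1 sends a multicast message.
The multicast message includes the list of instructions corresponding to the voices supported by the voice recognition device in device 1.
As a result, the voice recognition control equipment on the network receives the list of instructions corresponding to the voices supported by device 1.
Step 602, the voice recognition control equipment sends device 2 a request message querying its voice recognition capability.
The message sent in step 602 may be sent as a broadcast, multicast, or unicast message.
Step 603, device 2 sends a unicast message.
The unicast message includes the list of instructions corresponding to the voices supported by device 2.
Step 604, the voice recognition control equipment collects voice.
Step 605, the voice recognition control equipment sends a voice control instruction to device 1.
This instruction is issued because the voice recognition control equipment determines that the user's voice collected in step 604 is meant to control device 1, and that device 1 supports the collected voice.
As a result, device 1 supports voice control even though it has no microphone, wearable device, or similar component.
Device 1 and device 2 above may be devices to be controlled such as a television, a player, or a storage server. The devices to be controlled in the embodiments of the present invention are not limited to those mentioned above: other devices such as computers, audio systems, speakers, projectors, and set-top boxes can all serve as devices to be controlled, and even industrial equipment such as automobiles, machine tools, and ships can be controlled by the voice recognition control device described in the embodiments of the present invention.
In the above embodiments, the microphone in the voice recognition control device may be of various specifications, for example a mono capture microphone or a microphone array.
The foregoing process is one embodiment of the present invention; implementation is not limited to this embodiment, nor does the embodiment limit how each specific step is carried out. The embodiments of the present invention may also be implemented in similar ways, for example by replacing a device with a unit or by changing the names or types of the messages described in the embodiments; such changes are merely changes of naming and still fall within the protection scope of the present invention.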
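For clarity, not all conventional features of the devices are shown and described in the embodiments of the present invention. It should of course be understood that in developing any actual device, implementation-specific decisions must be made to achieve the developer's particular goals, such as complying with application-related and business-related constraints, and that these goals vary from one implementation to another and from one developer to another. It should also be understood that such development work is complex and time-consuming, but is nevertheless routine technical work for a person of ordinary skill who has the benefit of this disclosure.
According to the subject matter described here, the various components, systems, devices, processing steps, and/or data structures can be made, operated, and/or executed using various types of operating systems, computing platforms, computer programs, and/or general-purpose machines. A person of ordinary skill in the art will also appreciate that less general-purpose devices may be used without departing from the scope and spirit of the inventive concept disclosed here. The methods involved are executed by a computer, device, or machine and may be stored as machine-readable instructions on a defined medium, for example a computer storage device, including but not limited to ROM (for example, read-only memory, FLASH memory, transfer devices), magnetic storage media (for example, magnetic tape, disk drives), optical storage media (for example, CD-ROM, DVD-ROM, paper cards, paper tape), and other well-known types of program memory. It should further be recognized that the method may be executed by a human operator using a selection of software tools, without requiring human or creative judgment.
The network-related aspects of the above embodiments are applicable to IP networks supported by communication networks based on IEEE 802.3, IEEE 802.11b/g/n, power line networks (POWERLINE), cable (CABLE), the public switched telephone network (PSTN), 3rd Generation Partnership Project (3GPP) networks, 3GPP2 networks, and so on; the operating system of each device may be a UNIX-type, WINDOWS-type, or ANDROID-type operating system or the IOS operating system; and the consumer-facing interface may be a JAVA language interface or the like.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and in an actual implementation there may be other ways of dividing, for example: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected as actually needed to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may stand alone as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware directed by program instructions; the program may be stored in a computer readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a mobile storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the integrated unit of the present invention is implemented as a software functional module and sold or used as a standalone product, it may also be stored in a computer readable storage medium. On this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the related art, may be embodied as a software product that is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a mobile storage device, RAM, ROM, a magnetic disk, or an optical disk.
The above is only a specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.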

Claims (33)

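  1. A voice recognition method, the method comprising:
    a voice recognition device publishing a list of supported voices and/or a list of instructions corresponding to the supported voices.
  2. The voice recognition method according to claim 1, wherein the method further comprises:
    the voice recognition device receiving a collected voice;
    executing the instruction corresponding to the collected voice; or
    forwarding the collected voice or the instruction corresponding to the collected voice.
  3. The voice recognition method according to claim 2, wherein before executing the instruction corresponding to the collected voice, the method further comprises:
    recognizing the collected voice to obtain the instruction corresponding to the collected voice.
  4. The voice recognition method according to claim 2, wherein forwarding the collected voice or the instruction corresponding to the collected voice comprises:
    forwarding the collected voice or the instruction corresponding to the collected voice according to a preset policy, the preset policy comprising at least one of the following:
    when the collected voice is a preset specific voice, forwarding the collected voice or the instruction corresponding to the collected voice;
    when the collected voice is not supported, forwarding the collected voice or the instruction corresponding to the collected voice.
  5. The voice recognition method according to claim 1, wherein the voice recognition device publishing a list of supported voices and/or a list of instructions corresponding to the supported voices comprises:
    the voice recognition device publishing on its own initiative the list of supported voices and/or the list of instructions corresponding to the supported voices;
    or, on receiving a request message querying voice recognition capability, the voice recognition device responding with the list of supported voices and/or the list of instructions corresponding to the supported voices.
  6. The voice recognition method according to claim 1, wherein the method further comprises:
    the voice recognition device receiving the instruction corresponding to a collected voice and executing the instruction.
  7. The voice recognition method according to claim 1, wherein the voices in the list of voices comprise at least one of the following forms:
    voice text; encoded voice data.
  8. The voice recognition method according to any one of claims 1 to 7, wherein
    the list of supported voices and/or the list of instructions corresponding to the supported voices published by the voice recognition device further comprises an identifier of the voice recognition device, the identifier comprising at least one of the following forms:
    voice text corresponding to the identifier of the voice recognition device;
    encoded voice data corresponding to the identifier of the voice recognition device.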
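  9. A voice recognition method, the method comprising:
    acquiring, by a voice recognition control device, a list of voices supported by a voice recognition device and/or a list of instructions corresponding to the voices supported by the voice recognition device.
  10. The voice recognition method according to claim 9, wherein the method further comprises:
    collecting, by the voice recognition control device, a voice, and sending the collected voice to the voice recognition device.
  11. The voice recognition method according to claim 9, wherein the method further comprises:
    collecting, by the voice recognition control device, a voice, recognizing the instruction corresponding to the collected voice, and sending the recognized instruction to the voice recognition device.
  12. The voice recognition method according to claim 9, wherein
    the voice comprises at least one of the following forms of voice: voice text; encoded voice data.
  13. The voice recognition method according to claim 9, wherein the method further comprises:
    collecting, by the voice recognition control device, a voice;
    determining the target voice recognition device that the collected voice indicates is to be controlled;
    sending the collected voice, or the instruction corresponding to the collected voice, to the target voice recognition device.
  14. The voice recognition method according to claim 13, wherein the list of voices supported by the voice recognition device and the list of instructions corresponding to the voices supported by the voice recognition device each comprise an identifier of the voice recognition device.
  15. The voice recognition method according to claim 14, wherein the determining of the target voice recognition device that the collected voice indicates is to be controlled comprises:
    recognizing the collected voice, and matching the recognition result against the identifier of the voice recognition device;
    determining the matched voice recognition device as the target voice recognition device that the collected voice indicates is to be controlled.
  16. The voice recognition method according to claim 9, wherein the identifier of the voice recognition device comprises at least one of the following forms of identifier:
    voice text corresponding to the voice recognition device;
    encoded voice data corresponding to the voice recognition device.
  17. The voice recognition method according to any one of claims 9 to 16, wherein the acquiring, by the voice recognition control device, of the list of voices supported by the voice recognition device and/or the list of instructions corresponding to the supported voices comprises:
    receiving, by the voice recognition control device, the list of supported voices and/or the list of instructions corresponding to the supported voices published by the voice recognition device; or
    sending, by the voice recognition control device, a voice recognition capability request message to the voice recognition device, so as to receive the list of supported voices and/or the list of instructions corresponding to the supported voices with which the voice recognition device responds.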
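  18. A voice recognition device, the voice recognition device comprising:
    a first communication unit, configured to publish a list of supported voices and/or a list of instructions corresponding to the supported voices.
  19. The voice recognition device according to claim 18, wherein the voice recognition device further comprises:
    a first receiving unit, configured to receive a collected voice;
    a first execution unit, configured to execute an instruction corresponding to the collected voice; or
    to forward the collected voice or the instruction corresponding to the collected voice.
  20. The voice recognition device according to claim 19, wherein
    the first execution unit is further configured to recognize the collected voice to obtain the instruction corresponding to the collected voice.
  21. The voice recognition device according to claim 19, wherein
    the first execution unit is further configured to forward the collected voice or the instruction corresponding to the collected voice according to a preset policy, the preset policy comprising at least one of the following policies:
    when the collected voice is a preset specific voice, forwarding the collected voice or the instruction corresponding to the collected voice;
    when the collected voice is not supported, forwarding the collected voice or the instruction corresponding to the collected voice.
  22. The voice recognition device according to any one of claims 18 to 21, wherein
    the first communication unit is further configured to publish spontaneously in the network the list of supported voices and/or the list of instructions corresponding to the supported voices;
    or, after receiving a request message querying the voice recognition capability, to respond with the list of supported voices and/or the list of instructions corresponding to the supported voices.
  23. The voice recognition device according to any one of claims 18 to 21, wherein the voice recognition device further comprises:
    a second receiving unit, configured to receive an instruction corresponding to a collected voice;
    a second execution unit, configured to execute the instruction received by the second receiving unit.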
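  24. A voice recognition control device, the voice recognition control device comprising:
    a second communication unit, configured to acquire a list of voices supported by a voice recognition device and/or a list of instructions corresponding to the voices supported by the voice recognition device.
  25. The voice recognition control device according to claim 24, wherein the voice recognition control device further comprises:
    a first collection unit, configured to collect a voice and to trigger the second communication unit to send the collected voice to the voice recognition device.
  26. The voice recognition control device according to claim 24, wherein the voice recognition control device further comprises:
    a second collection unit, configured to collect a voice;
    a first recognition unit, configured to recognize the instruction corresponding to the voice collected by the second collection unit, and to trigger the second communication unit to send the recognized instruction to the voice recognition device.
  27. The voice recognition control device according to claim 24, wherein the voice recognition control device further comprises:
    a third collection unit, configured to collect a voice;
    a second recognition unit, configured to identify the target voice recognition device that the voice collected by the third collection unit indicates is to be controlled, and to trigger the second communication unit to send the voice collected by the third collection unit, or the instruction corresponding to the voice collected by the third collection unit, to the target voice recognition device.
  28. The voice recognition control device according to claim 27, wherein the list of voices supported by the voice recognition device and the list of instructions corresponding to the voices supported by the voice recognition device each comprise an identifier of the voice recognition device.
  29. The voice recognition control device according to claim 28, wherein
    the second recognition unit is further configured to recognize the collected voice and to match the recognition result against the identifier of the voice recognition device;
    and to determine the matched voice recognition device as the target voice recognition device that the collected voice indicates is to be controlled.
  30. The voice recognition control device according to any one of claims 24 to 29, wherein
    the second communication unit is further configured to receive the list of supported voices and/or the list of instructions corresponding to the supported voices published by the voice recognition device; or
    to send a voice recognition capability request message to the voice recognition device, so as to receive the list of supported voices and/or the list of instructions corresponding to the supported voices with which the voice recognition device responds.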
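  31. A voice recognition system, the voice recognition system comprising a voice recognition device and/or a voice recognition control device, wherein
    the voice recognition device is configured to publish a list of supported voices and/or a list of instructions corresponding to the supported voices;
    the voice recognition control device is configured to acquire the list of voices supported by the voice recognition device and/or the list of instructions corresponding to the voices supported by the voice recognition device.
  32. A computer storage medium, the computer storage medium storing executable instructions, the executable instructions being configured to execute the voice recognition method according to any one of claims 1 to 8.
  33. A computer storage medium, the computer storage medium storing executable instructions, the executable instructions being configured to execute the voice recognition method according to any one of claims 9 to 17.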
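
As an illustration only, the interaction described in claims 1, 4, 5 and 15 might be sketched as follows. The class names, the `QUERY_VOICE_CAPABILITY` message name and the dictionary layout of the published list are assumptions made for this sketch and are not defined in the application. Speech recognition itself is not modelled: every voice is represented by its already-recognized voice text, and the substring match used to pick a target device is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

QUERY = "QUERY_VOICE_CAPABILITY"  # assumed message name, not taken from the application


@dataclass
class VoiceRecognitionDevice:
    """Hypothetical device holding a voice-text -> instruction table."""
    device_id: str                    # identifier in voice-text form (claim 8)
    commands: Dict[str, str]          # supported voice text -> instruction
    always_forward: Set[str] = field(default_factory=set)  # preset specific voices
    executed: List[str] = field(default_factory=list)
    forwarded: List[str] = field(default_factory=list)

    def publish(self) -> dict:
        """Build the list of supported voices and instructions (claim 1)."""
        voices = sorted(self.commands)
        return {
            "id": self.device_id,
            "voices": voices,
            "instructions": [self.commands[v] for v in voices],
        }

    def handle_query(self, message: str) -> Optional[dict]:
        """Respond with the list only to a capability query (claim 5)."""
        return self.publish() if message == QUERY else None

    def receive(self, voice: str) -> str:
        """Execute a supported voice, or forward it under the preset policy (claim 4)."""
        if voice in self.always_forward or voice not in self.commands:
            self.forwarded.append(voice)
            return "forwarded"
        self.executed.append(self.commands[voice])
        return "executed"


class VoiceRecognitionController:
    """Hypothetical control device that stores the published lists."""

    def __init__(self) -> None:
        self.capabilities: Dict[str, dict] = {}

    def learn(self, capability: dict) -> None:
        self.capabilities[capability["id"]] = capability

    def pick_target(self, recognized_text: str) -> Optional[str]:
        """Match the recognition result against device identifiers (claim 15)."""
        for device_id in self.capabilities:
            if device_id in recognized_text:
                return device_id
        return None


tv = VoiceRecognitionDevice(
    "television",
    {"volume up": "VOL_UP", "power off": "POWER_OFF"},
    always_forward={"power off"},
)
controller = VoiceRecognitionController()
controller.learn(tv.handle_query(QUERY))
print(controller.pick_target("television volume up"))  # television
print(tv.receive("volume up"))    # executed
print(tv.receive("open window"))  # forwarded (voice not supported)
print(tv.receive("power off"))    # forwarded (preset specific voice)
```

In this sketch the device answers a query rather than announcing itself; the spontaneous publication of claim 5 would simply call `publish()` without waiting for a request.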
PCT/CN2014/092162 2014-04-30 2014-11-25 Voice recognition method, device, and system, and computer storage medium WO2015165257A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP14890694.4A EP3139376B1 (en) 2014-04-30 2014-11-25 Voice recognition method and device
US15/307,023 US20170047066A1 (en) 2014-04-30 2014-11-25 Voice recognition method, device, and system, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410182842.3 2014-04-30
CN201410182842.3A CN105023575B (zh) 2014-04-30 2014-04-30 Voice recognition method, device and system

Publications (1)

Publication Number Publication Date
WO2015165257A1 (zh) 2015-11-05

Family

ID=54358122

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/092162 WO2015165257A1 (zh) 2014-04-30 2014-11-25 Voice recognition method, device, and system, and computer storage medium

Country Status (4)

Country Link
US (1) US20170047066A1 (zh)
EP (1) EP3139376B1 (zh)
CN (1) CN105023575B (zh)
WO (1) WO2015165257A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10866783B2 (en) * 2011-08-21 2020-12-15 Transenterix Europe S.A.R.L. Vocally activated surgical control system
US11561762B2 (en) * 2011-08-21 2023-01-24 Asensus Surgical Europe S.A.R.L. Vocally actuated surgical control system
US10419497B2 (en) * 2015-03-31 2019-09-17 Bose Corporation Establishing communication between digital media servers and audio playback devices in audio systems
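CN107801413B (zh) * 2016-06-28 2020-01-31 华为技术有限公司 Terminal for controlling an electronic device and processing method thereof
CN108831448B (zh) * 2018-03-22 2021-03-02 北京小米移动软件有限公司 Method, device and storage medium for voice control of smart devices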
US20210319782A1 (en) * 2018-08-23 2021-10-14 Huawei Technologies Co., Ltd. Speech recognition method, wearable device, and electronic device
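KR20200043075A (ko) * 2018-10-17 2020-04-27 삼성전자주식회사 Electronic device and control method thereof, and sound output control system of the electronic device
JP6966501B2 (ja) * 2019-03-25 2021-11-17 ファナック株式会社 Machine tool and management system
CN111726667A (zh) * 2020-05-25 2020-09-29 福建新大陆通信科技股份有限公司 Method and system for interconnecting a smart speaker and a set-top box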

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
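CN1324175A (zh) * 2000-05-11 2001-11-28 松下电工株式会社 Voice control system for operating household appliances
CN1356688A (zh) * 2000-11-27 2002-07-03 佳能株式会社 Speech recognition system, speech recognition server, speech recognition client, and control methods thereof
CN1501233A (zh) * 2002-11-13 2004-06-02 三星电子株式会社 Home robot using a home server and home network system thereof
US7197455B1 (en) * 1999-03-03 2007-03-27 Sony Corporation Content selection system
WO2013179985A1 (ja) * 2012-05-30 2013-12-05 日本電気株式会社 Information processing system, information processing method, communication terminal, information processing device, and control method and control program therefor
CN103714816A (zh) * 2012-09-28 2014-04-09 三星电子株式会社 Electronic device, server and control method thereof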

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6052666A (en) * 1995-11-06 2000-04-18 Thomson Multimedia S.A. Vocal identification of devices in a home environment
DE69712485T2 (de) * 1997-10-23 2002-12-12 Sony Int Europe Gmbh Voice interface for a home network
US6330676B1 (en) * 1998-09-08 2001-12-11 International Business Machines Corporation Method and system for the automatic initiation of power application and start-up activities in a computer system
CN1196324C (zh) * 2000-08-21 2005-04-06 皇家菲利浦电子有限公司 Voice-controlled remote control with a downloadable set of voice commands
EP1184841A1 (de) * 2000-08-31 2002-03-06 Siemens Aktiengesellschaft Voice-controlled arrangement and method for voice input and voice recognition
KR100438838B1 (ko) * 2002-01-29 2004-07-05 삼성전자주식회사 Apparatus and method for interpreting voice commands with a dialogue focus tracking function
US7698566B1 (en) * 2004-07-12 2010-04-13 Sprint Spectrum L.P. Location-based voice-print authentication method and system
US20080154612A1 (en) * 2006-12-26 2008-06-26 Voice Signal Technologies, Inc. Local storage and use of search results for voice-enabled mobile communications devices
US9026447B2 (en) * 2007-11-16 2015-05-05 Centurylink Intellectual Property Llc Command and control of devices and applications by voice using a communication base system
CN201194160Y (zh) * 2008-03-21 2009-02-11 广州汉音电子科技有限公司 Voice recognition device
US8364481B2 (en) * 2008-07-02 2013-01-29 Google Inc. Speech recognition with parallel recognition tasks
US9729628B2 (en) * 2011-03-09 2017-08-08 Ortiz And Associates Consulting, Llc Systems and methods for enabling temporary, user-authorized cloning of mobile phone functionality on a secure server accessible via a remote client
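CN102427558A (zh) * 2011-09-27 2012-04-25 深圳市九洲电器有限公司 Voice control method for a set-top box, and set-top box
CN103685393A (zh) * 2012-09-13 2014-03-26 大陆汽车投资(上海)有限公司 Vehicle-mounted voice control terminal, voice control system and data processing system
KR20140060040A (ko) * 2012-11-09 2014-05-19 삼성전자주식회사 Display apparatus, voice acquiring apparatus and voice recognition method thereof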
US20160078864A1 (en) * 2014-09-15 2016-03-17 Honeywell International Inc. Identifying un-stored voice commands
JP6516585B2 (ja) * 2015-06-24 2019-05-22 Panasonic Intellectual Property Corporation of America Control device, method thereof and program
KR20170032096A (ko) * 2015-09-14 2017-03-22 삼성전자주식회사 Electronic device, method for driving the electronic device, voice recognition device, method for driving the voice recognition device, and computer-readable recording medium
US10498882B2 (en) * 2016-05-20 2019-12-03 T-Mobile Usa, Inc. Secure voice print authentication for wireless communication network services

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3139376A4 *

Also Published As

Publication number Publication date
EP3139376A4 (en) 2017-05-10
EP3139376A1 (en) 2017-03-08
EP3139376B1 (en) 2022-06-29
CN105023575B (zh) 2019-09-17
CN105023575A (zh) 2015-11-04
US20170047066A1 (en) 2017-02-16

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 14890694; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 15307023; Country of ref document: US
NENP Non-entry into the national phase. Ref country code: DE
REEP Request for entry into the European phase. Ref document number: 2014890694; Country of ref document: EP
WWE WIPO information: entry into national phase. Ref document number: 2014890694; Country of ref document: EP