WO2019169591A1 - Method and device for voice interaction

Method and device for voice interaction

Info

Publication number
WO2019169591A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
space
voice device
people
information
Prior art date
Application number
PCT/CN2018/078362
Other languages
French (fr)
Chinese (zh)
Inventor
魏建宾
余尚春
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2018/078362 priority Critical patent/WO2019169591A1/en
Priority to CN201880090636.6A priority patent/CN111819626A/en
Publication of WO2019169591A1 publication Critical patent/WO2019169591A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces; the user being prompted to utter a password or a predefined phrase

Definitions

  • The embodiments of the present invention relate to the field of communications technologies, and in particular, to a voice interaction method and apparatus.
  • An integrated wake-up word plus speech semantic recognition method can achieve a zero-interval, zero-delay, seamless connection between the wake-up word and voice control, abandon the traditional one-question-one-answer form, and reduce the user's voice-control steps and complicated wake-up actions. For the user, however, whether the device is first woken up and then asked a question, or the wake-up and the question are integrated, every use of a function of the smart device still involves a wake-up action, so the interaction between the in-vehicle device and the user remains too complicated.
  • Embodiments of the present application provide a voice interaction method and apparatus, which can reduce the wake-up actions required on a smart device when the smart device needs to be operated.
  • The embodiment of the present invention provides a method for voice interaction, including: a voice device determines the number of people in the space where the voice device is located; and when the voice device determines that the number of people in the space is one, the voice device enters the wake-free voice interaction mode.
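  • The following is a minimal sketch, in Python, of the mode decision described above; count_people() and the mode labels are hypothetical placeholders introduced here for illustration, not terms from the claims.

```python
WAKE_FREE = "wake-free"      # no wake-up word is required before a command
WAKE_WORD = "wake-up word"   # a wake-up word is required before a command

def select_interaction_mode(count_people) -> str:
    """Enter the wake-free voice interaction mode only when exactly one
    person is detected in the space where the voice device is located;
    otherwise keep requiring a wake-up word (the multi-person case is
    described further below)."""
    return WAKE_FREE if count_people() == 1 else WAKE_WORD
```

  • For example, select_interaction_mode(lambda: 1) would return the wake-free mode, while any other count keeps the device in the wake-up-word mode.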
  • The voice device determines the number of people in the space where the voice device is located according to one or more of voiceprint information, iris information, portrait information, fingerprint information, and sensing data.
  • By recognizing the number of people in the space where the voice device is located in a plurality of ways and judging comprehensively, the accuracy of identifying the number of people in the space where the voice device is located is improved.
  • The voice device determines the number of people in the space where the voice device is located according to the voiceprint information as follows: the voice device collects a first voice in the space; the voice device determines whether a second voice is received within a first time period after the first voice is received, where the second voice and the first voice have different voiceprint characteristics; and if the voice device does not receive the second voice within the first time period, the voice device determines that there is one person in the space. Identifying the number of people in the space through different voiceprints is a common approach for a voice device; in this example, there is one person in the space.
  • The voice device determines the number of people in the space where the voice device is located according to the voiceprint information as follows: the voice device collects a first voice in the space; if the first voice is not a specific instruction, the voice device determines whether a second voice is received within a first time period after the first voice is received, where the second voice and the first voice have different voiceprint characteristics; and if the voice device does not receive the second voice within the first time period, the voice device determines that there is one person in the space. By first determining that the collected first voice is not a specific command and then determining whether the second voice is received within the first time period, the voice device can more accurately determine the number of people in the space where it is located.
  • The first voice herein may be the voice that is collected first in the space.
  • The voice device determines the number of people in the space where the voice device is located according to the voiceprint information as follows: the voice device collects a first voice in the space; if the first voice is not a specific instruction, the voice device determines whether a second voice is received within a first time period after the first voice is received, where the second voice and the first voice have different voiceprint characteristics; and if the voice device receives the second voice within the first time period, the voice device determines that there are multiple people in the space. By first determining that the collected first voice is not a specific command and then determining whether the second voice is received within the first time period, the voice device can more accurately determine the number of people in the space where it is located.
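  • A hedged sketch of the time-window judgment described above is shown below; collect_voice(), is_specific_instruction(), and voiceprint_of() are hypothetical helpers, and the length of the first time period is an arbitrary example value.

```python
import time

def estimate_people_by_voiceprint(collect_voice, is_specific_instruction,
                                  voiceprint_of, first_time_period_s=10.0):
    """Collect a first voice; if it is not a specific instruction, listen for a
    second voice with a different voiceprint within the first time period."""
    first_voice = collect_voice()                       # first voice collected in the space
    if is_specific_instruction(first_voice):
        return None                                     # specific instructions are handled elsewhere
    first_print = voiceprint_of(first_voice)
    deadline = time.monotonic() + first_time_period_s   # the "first time period"
    while time.monotonic() < deadline:
        # collect_voice is assumed to accept an optional timeout and return None on timeout
        second_voice = collect_voice(timeout=deadline - time.monotonic())
        if second_voice is not None and voiceprint_of(second_voice) != first_print:
            return "multiple people"                    # a different voiceprint was heard in time
    return "one person"                                 # no different voiceprint within the window
```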
  • The voice device determines the number of people in the space where the voice device is located according to the iris information as follows: the voice device obtains an image for iris recognition through a camera in the space where the voice device is located; the voice device determines whether different iris information exists in the image; when the voice device determines that different iris information exists, the voice device determines that there are multiple people in the space where the voice device is located; and when the voice device determines that only one kind of iris information exists, the voice device determines that there is one person in the space where the voice device is located. In this example, the voice device determines the number of people according to the iris information, which adds another way of determining the number of people in the space where the voice device is located.
  • The voice device determines the number of people in the space where the voice device is located according to the portrait information as follows: the voice device obtains an image containing portrait information through a camera in the space where the voice device is located; the voice device determines whether different portrait information exists in the image; when the voice device determines that different portrait information exists, the voice device determines that there are multiple people in the space where the voice device is located; and when the voice device determines that only one kind of portrait information exists, the voice device determines that there is one person in the space where the voice device is located. In this example, the voice device determines the number of people according to the portrait information, which adds another way of determining the number of people in the space where the voice device is located.
  • The voice device determines the number of people in the space where the voice device is located according to the fingerprint information as follows: the voice device obtains fingerprint information through a fingerprint identification device in the space where the voice device is located; the voice device determines whether different fingerprint information exists in the collected fingerprint information; when the voice device determines that different fingerprint information exists, the voice device determines that there are multiple people in the space where the voice device is located; and when the voice device determines that only one kind of fingerprint information exists, the voice device determines that there is one person in the space where the voice device is located. In this example, the voice device determines the number of people according to the fingerprint information, which adds another way of determining the number of people in the space where the voice device is located.
  • The voice device determines the number of people in the space where the voice device is located according to the sensing data as follows: the voice device obtains sensing data through a sensing device in the space where the voice device is located; the voice device determines whether different sensing data exists in the collected sensing data; when the voice device determines that different sensing data exists, the voice device determines that there are multiple people in the space where the voice device is located; and when the voice device determines that only one kind of sensing data exists, the voice device determines that there is one person in the space where the voice device is located. In this example, the voice device determines the number of people according to the sensing data, which adds another way of determining the number of people in the space where the voice device is located.
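  • The summary does not fix how the different judgments are combined; the sketch below shows one possible (assumed) fusion rule, in which each modality reports its own person-count estimate and the largest estimate is used, since a modality can only miss people it failed to observe.

```python
def fuse_person_counts(counts: dict) -> int:
    """counts maps a modality name ('voiceprint', 'iris', 'portrait',
    'fingerprint', 'sensing') to that modality's estimated number of
    people, or None when the modality is unavailable."""
    valid = [n for n in counts.values() if n is not None and n > 0]
    return max(valid) if valid else 0
```

  • For example, fuse_person_counts({'voiceprint': 1, 'portrait': 2, 'fingerprint': None}) yields 2, so the device would use the wake-up voice interaction mode rather than the wake-free mode.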
  • The method further includes: the voice device receives a third voice, where the third voice does not include a wake-up word; and the voice device performs the function corresponding to the third voice.
  • The third voice that does not include the wake-up word is recognized and the corresponding function is executed, so that the number of voice interactions that require a wake-up word can be reduced.
  • When the voice device determines that there are multiple people in the space, the voice device enters a wake-up voice interaction mode; the voice device receives a wake-up word, or a fourth voice that includes a wake-up word; and the voice device then enters the voice interaction mode, or performs voice recognition and executes the function corresponding to the fourth voice. In this example, the voice device enters the wake-up voice interaction mode, and voice interaction based on the wake-up word can be implemented.
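  • Combining the two modes, a minimal sketch of how incoming voices could be dispatched is given below; contains_wake_word() and recognize_and_execute() are hypothetical helpers, and the mode strings mirror the earlier sketch.

```python
def handle_utterance(mode: str, utterance, contains_wake_word, recognize_and_execute):
    """Dispatch a received voice according to the current interaction mode."""
    if mode == "wake-free":
        # Third voice: no wake-up word is needed; recognize it and execute
        # the corresponding function directly.
        recognize_and_execute(utterance)
    else:
        # Wake-up voice interaction mode (multiple people in the space):
        # only a wake-up word, or a fourth voice that includes the wake-up
        # word, is recognized and acted upon.
        if contains_wake_word(utterance):
            recognize_and_execute(utterance)
        # voices without the wake-up word are ignored in this mode
```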
  • the space in which the voice device is located is an enclosed space, a semi-enclosed space, or an open space.
  • The semi-enclosed space or the open space may be regarded as a spherical space whose radius is the communication distance of the voice device.
  • the space in which the voice device is located is the closed space.
  • The embodiment of the invention provides a voice device, which has the function of implementing the behavior of the voice device in the above method.
  • The functions may be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • an embodiment of the present invention provides a computer storage medium for storing computer software instructions for use in the voice device, including a program designed to perform the above aspects.
  • the solution provided by the present invention can reduce the wake-up action on the smart device when the smart device needs to be operated.
  • FIG. 1 is a system architecture diagram of a mobile vehicle communication system according to an embodiment of the present invention.
  • FIG. 2 is a functional block diagram of an exemplary vehicle 12 according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural view of the audio circuit 305 of FIG. 3;
  • FIG. 5 is a flowchart of voice interaction according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of determining a number of people in a vehicle according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of interaction between a person and a vehicle-mounted smart rearview mirror according to an embodiment of the present invention.
  • FIG. 8 is another schematic diagram of interaction between a person and a vehicle-mounted smart rearview mirror according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a vehicle-mounted smart speaker according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a vehicle-mounted smart TV according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a home smart speaker according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of interaction between a person and a home smart television according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of two enclosed spaces provided by an embodiment of the present invention.
  • FIG. 14 is a schematic structural diagram of a processor according to an embodiment of the present invention.
  • The terms “first” and “second” are used for descriptive purposes only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with “first” or “second” may explicitly or implicitly include one or more of the features. In the description of the embodiments of the present application, “multiple” means two or more unless otherwise stated.
  • FIG. 1 is a system architecture diagram of a mobile vehicle communication system according to an embodiment of the present invention.
  • communication system 10 includes vehicle 12, one or more wireless carrier systems 14, terrestrial communication network 16, computer 18, and call center 20.
  • the disclosed methods can be used with any number of different systems and are not specifically limited to the operating environments shown herein.
  • the architecture, construction, setup, and operation of system 10, as well as its individual components are generally known in the art.
  • the following paragraphs simply provide an overview of an example communication system 10, and other systems not shown herein can also use the disclosed methods.
  • the vehicle 12 can be implemented on a car or in the form of a car.
  • The example system can also be implemented on, or take the form of, other vehicles, such as trucks, motorcycles, buses, boats, airplanes, helicopters, lawn mowers, snowplows, recreational vehicles, amusement park vehicles, agricultural equipment, construction equipment, trams, golf carts, and trains.
  • the robotic device can also be used to perform the methods and systems described herein.
  • Some vehicle hardware 28 is shown in FIG. 1, including an information communication unit 30, a microphone 32, one or more buttons or other control inputs 34, an audio system 36, a visual display 38, a global positioning system (GPS) module 40, and a number of vehicle security modules (VSMs) 42.
  • Suitable network connections include a controller area network (CAN), media oriented systems transport (MOST), a local interconnect network (LIN), a local area network (LAN), and other suitable connections, such as Ethernet or connections conforming to the standards and specifications of the International Organization for Standardization (ISO), the Society of Automotive Engineers (SAE), and the Institute of Electrical and Electronics Engineers (IEEE), just to name a few.
  • The information communication unit 30 may be an original equipment manufacturer (OEM) installed (embedded) device or an aftermarket device that is installed in the vehicle and is capable of wireless voice and/or data communication over the wireless carrier system 14. This enables the vehicle to communicate with the call center 20, other information-enabled vehicles, or some other entity or device.
  • The information communication unit preferably uses radio transmissions to establish a communication channel (a voice channel and/or a data channel) with the wireless carrier system 14, so that voice and/or data transmissions can be sent and received over the channel.
  • The information communication unit 30 enables the vehicle to provide a variety of different services, including those associated with navigation, telephony, emergency assistance, diagnostics, infotainment, and the like.
  • Data can be transmitted over a data connection (e.g., via packet data over a data channel, or via a voice channel using techniques known in the art).
  • For a combined service involving both voice communication (e.g., with a live advisor or a voice response unit at the call center 20) and data communication (e.g., providing GPS location data or vehicle diagnostic data to the call center 20), the system can utilize a single call over a voice channel and switch between voice and data transmission over the voice channel as needed, which can be done using techniques known to those skilled in the art.
  • The short message service (SMS) can also be used to send and receive data (e.g., via a packet data protocol (PDP)); the information communication unit can be configured as mobile terminated and/or mobile originated, or configured as application terminated and/or application originated.
  • The information communication unit 30 utilizes cellular communication according to the global system for mobile communication (GSM) or code division multiple access (CDMA) standard, and thus supports voice communication (e.g., hands-free calling).
  • the modem can be implemented by software stored in the information communication unit and executed by the processor 52, or it can be a separate hardware component located inside or outside the information communication unit 30.
  • The modem can operate using any number of different standards or protocols (e.g., EVDO (CDMA2000 1xEV-DO), CDMA, general packet radio service (GPRS), and enhanced data rates for GSM evolution (EDGE)).
  • Wireless networking between the vehicle and other networked devices can also be performed using the information communication unit 30.
  • The information communication unit 30 can be configured to communicate wirelessly according to one or more wireless protocols (e.g., any of the IEEE 802.11 protocols, worldwide interoperability for microwave access (WiMAX), or Bluetooth).
  • The information communication unit can be configured with a static IP address, or can be set up to automatically receive an assigned IP address from another device on the network (such as a router) or from a network address server.
  • Processor 52 can be any type of device capable of processing electronic instructions, including microprocessors, microcontrollers, host processors, controllers, vehicle communication processors, and application specific integrated circuits (ASICs). It can be a dedicated processor for the information communication unit 30 only or can be shared with other vehicle systems. Processor 52 executes various types of digital storage instructions, such as software or firmware programs stored in memory 54, which enables the information communication unit to provide a wide variety of services. For example, processor 52 can execute a program or process data to perform at least a portion of the methods discussed herein.
  • The information communication unit 30 can be used to provide a range of vehicle services, including wireless communication to and/or from the vehicle.
  • Such vehicle services include: turn-by-turn directions and other navigation-related services provided in conjunction with the GPS-based vehicle navigation module 40; airbag deployment notifications and other emergency or roadside assistance services provided in conjunction with interface modules for one or more collision sensors (e.g., a body control module, not shown); diagnostic reports using one or more diagnostic modules; and infotainment-related services, where music, web pages, movies, television shows, video games, and/or other information are downloaded by an infotainment module and stored for current or later playback.
  • The services listed above are by no means an exhaustive list of all the capabilities of the information communication unit 30, but are merely an enumeration of some of the services that the information communication unit can provide.
  • The above modules can be implemented in the form of software instructions stored inside or outside the information communication unit 30, or as hardware components located inside or outside the information communication unit 30, or they may be integrated and/or shared with each other or with other systems located throughout the vehicle, just to name a few possibilities.
  • When the VSMs 42 located outside of the information communication unit 30 are in operation, they can exchange data and commands with the information communication unit 30 using the vehicle bus 44.
  • the GPS module 40 receives radio signals from the GPS satellites 60. From these signals, the GPS module 40 is able to determine the location of the vehicle that is used to provide the vehicle driver with navigation and other location-associated services.
  • The navigation information can be presented on the display 38 (or another display within the vehicle), or can be presented audibly, such as when providing turn-by-turn navigation.
  • A dedicated in-vehicle navigation module (which may be part of the GPS module 40) can be used to provide the navigation services, or some or all of the navigation services can be completed via the information communication unit 30, where the location information is sent to a remote location in order to provide the vehicle with navigation maps, map annotations (points of interest, restaurants, etc.), route calculations, and the like.
  • the location information can be provided to call center 20 or other remote computer system, such as computer 18, for other purposes, such as fleet management. And, new or updated map data can be downloaded from the call center 20 to the GPS module 40 via the information communication unit 30.
  • the vehicle 12 can include other vehicle security modules VSM in the form of electronic hardware components.
  • The other vehicle security modules (VSMs) 42 are located throughout the vehicle and typically receive input from one or more sensors, and use the sensed input to perform diagnostics, monitoring, control, reporting, and/or other functions.
  • Each of the VSMs 42 is preferably connected to other VSMs via a communication bus 44, also to the information communication unit 30, and can be programmed to run vehicle system and subsystem diagnostic tests.
  • For example, one VSM 42 can be an engine control module (ECM) that controls various aspects of engine operation (e.g., fuel ignition and ignition timing), another VSM 42 can be a powertrain control module that regulates the operation of one or more components of the vehicle powertrain, and another VSM 42 can be a body control module that manages various electrical components located throughout the vehicle (such as the vehicle's power door locks and headlights).
  • The engine control module is equipped with on-board diagnostics (OBD) features that provide a large amount of real-time data, such as data received from various sensors (including vehicle emission sensors), and provide a standardized series of diagnostic trouble codes (DTCs) that allow technicians to quickly identify and repair faults within the vehicle.
  • the vehicle electronics 28 also includes a plurality of vehicle user interfaces that provide vehicle occupants with means for providing and/or receiving information, including a microphone 32, buttons 34, an audio system 36, and a visual display 38.
  • The term vehicle user interface broadly includes any suitable form of electronic device, including hardware and software components, that is located on the vehicle and enables a vehicle user to communicate with or through a component of the vehicle.
  • The microphone 32 provides audio input to the information communication unit to enable the driver or another occupant to provide voice commands and carry out hands-free calling via the wireless carrier system 14. For this purpose, it can be connected to an in-vehicle automated voice processing unit that utilizes human machine interface (HMI) technology known in the art.
  • The buttons 34 allow manual user input to the information communication unit 30 to initiate a wireless telephone call and to provide other data, response, or control input. Separate buttons can be used to initiate an emergency call and a regular service assistance call to the call center 20.
  • The audio system 36 provides audio output to the vehicle occupants and can be a dedicated stand-alone system or part of the host vehicle audio system. In accordance with the particular embodiment illustrated herein, the audio system 36 is operatively coupled to the vehicle bus 44 and the entertainment bus 46, and is capable of providing amplitude modulation (AM) and frequency modulation (FM) radio, satellite broadcast, digital versatile disc (DVD), and other multimedia functionality. This functionality can be provided in conjunction with or separately from the infotainment module described above.
  • Visual display 38 is preferably a graphical display, such as a touch screen on a dashboard or a heads-up display that is reflected from a windshield, and can be used to provide a variety of input and output functions.
  • Various other vehicle user interfaces can also be utilized, as the interface in Figure 1 is merely an example of a particular implementation.
  • The wireless carrier system 14 is preferably a cellular telephone system comprising a plurality of cellular towers 70 (only one shown), one or more mobile switching centers (MSCs) 72, and any other networking components required to connect the wireless carrier system 14 with the terrestrial network 16.
  • Each of the cellular towers 70 includes transmit and receive antennas and base stations, and base stations from different cellular towers are directly connected to the MSC 72 or to the MSC 72 via intermediate devices (e.g., base station controllers).
  • Cellular system 14 may implement any suitable communication technology including, for example, analog technologies (e.g., an advanced mobile phone system (AMPS)) or newer digital technologies (e.g., CDMA (e.g., CDMA2000) or GSM/GPRS).
  • Each base station and cellular tower can be co-located at the same site, or they can be located remotely from each other; each base station can be responsible for a single cellular tower, or a single base station can serve various cellular towers; and each base station can be coupled to a single MSC, to name just a few of the possible arrangements.
  • different wireless carrier systems in the form of satellite communications can be used to provide one-way or two-way communication with the vehicle. This can be done using one or more communication satellites 62 and an uplink transmitting station 64.
  • the one-way communication can be, for example, a satellite broadcast service in which program content (news, music, etc.) is received by the transmitting station 64, packaged for uploading, and then transmitted to the satellite 62, which broadcasts the program to the user.
  • the two-way communication can be, for example, a satellite telephone service that relays telephone communications between the vehicle 12 and the station 64 using the satellite 62. If used, such a satellite phone can be attached to or used in place of the wireless carrier system 14.
  • the terrestrial network 16 may be a conventional land-based radio communication network that is coupled to one or more fixed telephones and that connects the wireless carrier system 14 to the call center 20.
  • terrestrial network 16 may include a public switched telephone network (PSTN), such as a PSTN that is used to provide wired telephone, packet switched data communications, and Internet infrastructure.
  • PSTN public switched telephone network
  • One or more portions of the terrestrial network 16 can be implemented using a standard wired network, a fiber optic or other optical network, a cable network, power lines, other wireless networks (e.g., wireless local area networks (WLANs)), a network providing broadband wireless access (BWA), or any combination thereof.
  • The terrestrial network 16 may also include one or more short message service centers (SMSCs) for storing, uploading, converting, and/or transmitting short messages (SMS) between senders and receivers.
  • the SMSC can receive an SMS message from the call center 20 or a content provider (eg, an external short message entity or ESME), and the SMSC can transmit the SMS message to the vehicle 12 (eg, a mobile terminal device). SMSCs and their functions are known to the skilled person.
  • call center 20 need not be connected via terrestrial network 16, but may include a wireless telephone device such that it can communicate directly with a wireless network (e.g., wireless carrier system 14).
  • The computer 18 can be one of a number of computers that are accessible via a private or public network, such as the Internet. Each such computer 18 can be used for one or more purposes, such as a web server that the vehicle can access via the information communication unit 30 and the wireless carrier 14. Other such accessible computers 18 can be, for example: a service center computer, to which diagnostic information and other vehicle data can be uploaded from the vehicle via the information communication unit 30; a client computer used by the vehicle owner or another user for purposes such as accessing or receiving vehicle data, setting or configuring user parameters, or controlling functions of the vehicle; or a third-party repository to which or from which vehicle data or other information is provided, whether by communicating with the vehicle 12, with the call center 20, or with both.
  • The computer 18 can also be used to provide Internet connectivity, such as domain name server (DNS) services, or can act as a network address server that uses a dynamic host configuration protocol (DHCP) or another suitable protocol to assign an IP address to the vehicle 12.
  • The call center 20 is designed to provide a variety of different system back-end functions to the vehicle electronics 28, and according to the exemplary embodiment shown herein, the call center 20 typically includes one or more switches 80, servers 82, databases 84, live advisors 86, and an automated voice response system (VRS) 88, all of which are known in the art. These various call center components are preferably coupled to each other via a wired or wireless local area network 90.
  • The switch 80 can be a private branch exchange (PBX) that routes incoming signals such that voice transmissions are typically sent to the live advisor 86 via a regular telephone or to the automated voice response system 88 using voice over Internet protocol (VoIP).
  • The live advisor's phone can also use VoIP, as indicated by the dashed line in Figure 1.
  • VoIP and other data communications through switch 80 are implemented via a modem (not shown) connected between switch 80 and network 90.
  • Data transfer is passed to server 82 and/or database 84 via a modem.
  • the database 84 is capable of storing account information such as user authentication information, vehicle identifiers, data profile records, behavioral patterns, and other related user information.
  • Data transmission can also be performed by a wireless system, such as 802.11x, GPRS, and the like.
  • The call center 20 can be configured as mobile terminated and/or mobile originated, or configured as application terminated and/or application originated.
  • The call center can instead use the VRS 88 as an automated advisor, or a combination of the VRS 88 and the live advisor 86 can be used.
  • FIG. 2 is a functional block diagram of an example vehicle 12 provided by an embodiment of the present invention.
  • Components coupled to or included in vehicle 12 may include propulsion system 102, sensor system 104, control system 106, peripherals 108, power source 110, computing device 111, and user interface 112.
  • Computing device 111 can include a processor 113 and a memory 114.
  • Computing device 111 may be part of a controller or controller of vehicle 12.
  • the memory 114 can include instructions 115 that the processor 113 can run, and can also store map data 116.
  • the components of the vehicle 12 can be configured to operate in a manner interconnected with each other and/or with other components coupled to the various systems.
  • power source 110 can provide power to all components of vehicle 12.
  • Computing device 111 can be configured to receive data from, and control, propulsion system 102, sensor system 104, control system 106, and peripherals 108. Computing device 111 can be configured to generate a display of images on user interface 112 and receive input from user interface 112.
  • vehicle 12 may include more, fewer, or different systems, and each system may include more, fewer, or different components. Moreover, the systems and components shown may be combined or divided in any number of ways.
  • The propulsion system 102 can be used to provide powered motion to the vehicle 12. As shown, the propulsion system 102 includes an engine/engine 118, an energy source 120, a transmission 122, and a wheel/tire 124.
  • Engine/engine 118 may be or include any combination of internal combustion engine, electric motor, steam engine, and Stirling engine. Other engines and engines are also possible.
  • propulsion system 102 can include multiple types of engines and/or engines.
  • a gas-electric hybrid car may include a gasoline engine and an electric motor. Other examples are possible.
  • Energy source 120 may be a source of energy that is fully or partially powered to engine/engine 118. That is, the engine/engine 118 can be used to convert the energy source 120 to mechanical energy. Examples of energy source 120 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas based fuels, ethanol, solar panels, batteries, and other sources of electrical power. The energy source(s) 120 may additionally or alternatively include any combination of fuel tanks, batteries, capacitors, and/or flywheels. In some examples, energy source 120 may also provide energy to other systems of vehicle 12 .
  • Transmission 122 can be used to transfer mechanical power from engine/engine 118 to wheel/tire 124.
  • the transmission 122 can include a gearbox, a clutch, a differential, a drive shaft, and/or other components.
  • the drive shaft includes one or more shafts for coupling to the wheel/tire 124.
  • The wheel/tire 124 of the vehicle 12 can be configured in a variety of forms, including a unicycle, bicycle/motorcycle, tricycle, or car/truck four-wheel form. Other wheel/tire forms are also possible, such as forms that include six or more wheels.
  • the wheel/tire 124 of the vehicle 12 can be configured to rotate differentially relative to the other wheels/tires 124.
  • The wheel/tire 124 can include at least one wheel fixedly attached to the transmission 122 and at least one tire, coupled to a rim of the wheel, that makes contact with the driving surface.
  • Wheel/tire 124 may comprise any combination of metal and rubber, or a combination of other materials.
  • Propulsion system 102 may additionally or alternatively include components in addition to those shown.
  • The sensor system 104 may include a number of sensors for sensing information about the environment in which the vehicle 12 is located. As shown, the sensors of the sensor system include a GPS 126, an inertial measurement unit (IMU) 128, a radio detection and ranging (RADAR) unit 130, a laser ranging (LIDAR) unit 132, a camera 134, and an actuator 136 for modifying the position and/or orientation of the sensors. The sensor system 104 may also include additional sensors, including, for example, sensors that monitor the internal systems of the vehicle 12 (e.g., an O2 monitor, a fuel gauge, an oil temperature sensor, etc.). The sensor system 104 may also include other sensors.
  • the GPS module 126 can be any sensor for estimating the geographic location of the vehicle 12.
  • the GPS module 126 may include a transceiver that estimates the position of the vehicle 12 relative to the earth based on satellite positioning data.
  • computing device 111 can be used in conjunction with map data 116 to use GPS module 126 to estimate the location of a lane boundary on a road on which vehicle 12 can travel.
  • the GPS module 126 can take other forms as well.
  • IMU 128 may be used to sense changes in position and orientation of vehicle 12 based on inertial acceleration and any combination thereof.
  • the combination of sensors can include, for example, an accelerometer and a gyroscope. Other combinations of sensors are also possible.
  • the RADAR unit 130 can be viewed as an object detection system for detecting the characteristics of an object using radio waves, such as the distance, height, direction or speed of the object.
  • the RADAR unit 130 can be configured to transmit radio waves or microwave pulses that can bounce off any object in the course of the wave.
  • the object may return a portion of the energy of the wave to a receiver (eg, a dish or antenna), which may also be part of the RADAR unit 130.
  • the RADAR unit 130 can also be configured to perform digital signal processing on the received signal (bounce from the object) and can be configured to identify the object.
  • The LIDAR unit 132 includes a sensor that uses light to sense or detect objects in the environment in which the vehicle 12 is located.
  • LIDAR (light detection and ranging) is an optical remote sensing technique that can measure the distance to a target, or other properties of a target, by illuminating the target with light.
  • LIDAR unit 132 can include a laser source and/or a laser scanner configured to emit laser pulses, and a detector for receiving reflections of the laser pulses.
  • For example, the LIDAR unit 132 can include a laser range finder reflected by a rotating mirror, with the laser scanned around the scene being digitized in one or two dimensions to acquire distance measurements at specified angular intervals.
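  • As a generic illustration (not the specific processing of the LIDAR unit 132), a single 2D sweep of per-angle range readings can be converted to Cartesian points in the sensor frame as follows; the angular step is an assumed example value.

```python
import math

def scan_to_points(ranges_m, start_angle_rad=0.0, angle_step_rad=math.radians(0.5)):
    """Convert one range reading per fixed angular interval into (x, y) points."""
    points = []
    for i, r in enumerate(ranges_m):
        theta = start_angle_rad + i * angle_step_rad
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points
```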
  • LIDAR unit 132 may include components such as light (eg, laser) sources, scanners and optical systems, photodetectors, and receiver electronics, as well as position and navigation systems.
  • the LIDAR unit 132 can be configured to image an object using ultraviolet (UV), visible, or infrared light, and can be used for a wide range of targets, including non-metallic objects.
  • a narrow laser beam can be used to map physical features of an object with high resolution.
  • wavelengths in the range of from about 10 microns (infrared) to about 250 nanometers (UV) can be used.
  • Light is typically reflected via backscattering.
  • Different types of scattering are used for different LIDAR applications such as Rayleigh scattering, Mie scattering and Raman scattering, and fluorescence.
  • LIDAR can thus be referred to as Rayleigh laser RADAR, Mie LIDAR, Raman LIDAR, and sodium/iron/potassium fluorescent LIDAR.
  • Appropriate combinations of wavelengths may allow remote mapping of objects, for example by looking for wavelength dependent changes in the intensity of the reflected signal.
  • Three-dimensional (3D) imaging can be achieved using both a scanned LIDAR system and a non-scanning LIDAR system.
  • “3D gated viewing laser radar” is an example of a non-scanning laser ranging system that uses a pulsed laser and a fast gating camera.
  • Imaging LIDAR can also be performed using high-speed detector arrays and modulation-sensitive detector arrays, which are typically built on a single chip using complementary metal oxide semiconductor (CMOS) and hybrid complementary metal oxide semiconductor/charge coupled device (CCD) fabrication techniques.
  • each pixel can be locally processed by high speed demodulation or gating such that the array can be processed to represent an image from the camera.
  • thousands of pixels can be acquired simultaneously to create a 3D point cloud representing the object or scene detected by the LIDAR unit 132.
  • a point cloud can include a set of vertices in a 3D coordinate system. These vertices may be defined, for example, by X, Y, Z coordinates and may represent the outer surface of the object.
  • the LIDAR unit 132 can be configured to create a point cloud by measuring a large number of points on the surface of the object, and can output the point cloud as a data file. As a result of the 3D scanning process of the object through the LIDAR unit 132, the point cloud can be used to identify and visualize the object.
  • the point cloud can be rendered directly to visualize the object.
  • a point cloud may be converted to a polygonal or triangular mesh model by a process that may be referred to as surface reconstruction.
  • Example techniques for converting a point cloud to a 3D surface may include a Delaunay triangulation, an alpha shape, and a rotating sphere. These techniques include building a network of triangles on existing vertices of a point cloud.
  • Other example techniques may include converting a point cloud to a volumetric distance field, and reconstructing such an implicit surface as defined by a moving cube algorithm.
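  • As an illustration of the Delaunay triangulation step mentioned above (and only one simple variant of it), a roughly height-field-like point cloud can be meshed by triangulating its x-y projection and lifting the triangles back to 3D; scipy.spatial.Delaunay is used here purely as an example tool, not as the method of the embodiment.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate_height_field(points_xyz: np.ndarray) -> np.ndarray:
    """points_xyz is an (N, 3) array with roughly one z value per (x, y).
    Returns an (M, 3) array of vertex indices forming a triangle mesh."""
    tri = Delaunay(points_xyz[:, :2])   # triangulate on the x-y projection
    return tri.simplices                # triangle vertex indices into points_xyz
```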
  • The camera 134 can be any camera (e.g., a still camera, a video camera, etc.) used to capture images of the environment in which the vehicle 12 is located. To this end, the camera can be configured to detect visible light, or can be configured to detect light from other portions of the spectrum, such as infrared or ultraviolet light. Other types of cameras are also possible. The camera 134 can be a two-dimensional detector or can have a three-dimensional spatial extent. In some examples, the camera 134 can be, for example, a distance detector configured to generate a two-dimensional image indicating the distance from the camera 134 to a number of points in the environment. To this end, the camera 134 can use one or more distance detection techniques.
  • For example, the camera 134 can be configured to use structured light technology, in which the vehicle 12 illuminates an object in the environment with a predetermined light pattern, such as a grid or checkerboard pattern, and uses the camera 134 to detect reflections of the predetermined light pattern from the object. Based on the distortion in the reflected light pattern, the vehicle 12 can be configured to determine the distance to points on the object.
  • the predetermined light pattern may include infrared light or light of other wavelengths.
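  • The passage above does not spell out the distance computation; the textbook triangulation relation below (treating the pattern projector as a second viewpoint separated from the camera by a baseline) is shown only to illustrate how a pattern shift maps to depth, and all parameter names are assumptions rather than terms from the embodiment.

```python
def depth_from_pattern_shift(focal_length_px: float,
                             baseline_m: float,
                             observed_shift_px: float) -> float:
    """Depth is inversely proportional to the apparent shift (disparity)
    of a pattern feature between its expected and observed position."""
    if observed_shift_px <= 0:
        raise ValueError("shift must be positive for a finite depth")
    return focal_length_px * baseline_m / observed_shift_px
```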
  • Actuator 136 can be configured, for example, to modify the position and/or orientation of the sensor.
  • Sensor system 104 may additionally or alternatively include components in addition to those shown.
  • Control system 106 can be configured to control the operation of vehicle 12 and its components. To this end, control system 106 can include steering unit 138, throttle 140, braking unit 142, sensor fusion algorithm 144, computer vision system 146, navigation or routing system 148, and obstacle avoidance system 150.
  • The steering unit 138 may be any combination of mechanisms configured to adjust the heading or direction of travel of the vehicle 12.
  • the throttle 140 may be any combination of mechanisms configured to control the operating speed and acceleration of the engine/engine 118 and thereby control the speed and acceleration of the vehicle 12.
  • Brake unit 142 may be any combination of mechanisms configured to decelerate vehicle 12.
  • the brake unit 142 can use friction to slow the wheel/tire 124.
  • the braking unit 142 can be configured to regeneratively convert the kinetic energy of the wheel/tire 124 into a current.
  • Brake unit 142 can take other forms as well.
  • The sensor fusion algorithm 144 may include, for example, an algorithm (or a computer program product storing the algorithm) that the computing device 111 may run. The sensor fusion algorithm 144 can be configured to accept data from the sensor system 104 as input. The data may include, for example, data representing information sensed at the sensors of the sensor system 104. The sensor fusion algorithm 144 may include, for example, a Kalman filter, a Bayesian network, or another algorithm. The sensor fusion algorithm 144 may also be configured to provide various assessments based on the data from the sensor system 104, including, for example, an assessment of individual objects and/or features in the environment in which the vehicle 12 is located, an assessment of a particular situation, and/or an assessment of the possible impact of a particular situation. Other assessments are also possible.
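  • As a minimal illustration of the kind of filtering the sensor fusion algorithm 144 may use, a one-dimensional Kalman filter for a slowly varying quantity is sketched below; the state model and noise parameters are assumptions, not values from the embodiment.

```python
def kalman_1d(measurements, meas_var=1.0, process_var=0.01):
    """Fuse a stream of noisy scalar measurements into a smoothed estimate."""
    x, p = 0.0, 1.0                 # state estimate and its variance
    estimates = []
    for z in measurements:
        p += process_var            # predict: state assumed constant, uncertainty grows
        k = p / (p + meas_var)      # Kalman gain
        x += k * (z - x)            # update with measurement z
        p *= (1.0 - k)              # updated uncertainty
        estimates.append(x)
    return estimates
```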
  • The computer vision system 146 may be any system configured to process and analyze images captured by the camera 134 in order to identify objects and/or features in the environment in which the vehicle 12 is located, such as lane information, traffic signals, and obstacles. To this end, the computer vision system 146 may use an object recognition algorithm, a structure from motion (SFM) algorithm, video tracking, or other computer vision techniques. In some examples, the computer vision system 146 may additionally be configured to map the environment, follow objects, estimate the speed of objects, and the like.
  • Navigation and route control system 148 may be any system configured to determine the driving route of vehicle 12.
  • the navigation and route control system 148 can additionally be configured to dynamically update the driving route while the vehicle 12 is in operation.
  • navigation and route control system 148 can be configured to combine data from sensor fusion algorithm 144, GPS module 126, and one or more predetermined maps to determine a driving route for vehicle 12.
  • the obstacle avoidance system 150 can be any system configured to identify, evaluate, and avoid or otherwise cross obstacles in the environment in which the vehicle 12 is located.
  • Control system 106 may additionally or alternatively include components in addition to those shown.
  • Peripheral device 108 can be configured to allow vehicle 12 to interact with external sensors, other vehicles, and/or users.
  • peripheral device 108 can include, for example, wireless communication system 152, touch screen 154, microphone 156, and/or speaker 158.
  • Wireless communication system 152 can be any system configured to be wirelessly coupled to one or more other vehicles, sensors, or other entities, either directly or via a communication network.
  • the wireless communication system 152 can include an antenna and chipset for communicating with other vehicles, sensors, or other entities, either directly or through an air interface.
  • The chipset or the entire wireless communication system 152 can be arranged to communicate in accordance with one or more other types of wireless communication (e.g., protocols), such as Bluetooth or IEEE 802.11 (including any IEEE 802.11 revisions).
  • Wireless communication system 152 can take other forms as well.
  • Other possible wireless technologies include cellular technologies such as GSM, CDMA, universal mobile telecommunications system (UMTS), EV-DO, WiMAX, or long term evolution (LTE), as well as ZigBee, dedicated short range communications (DSRC), radio frequency identification (RFID) communications, and the like.
  • Touch screen 154 can be used by a user to enter commands into vehicle 12.
  • the touch screen 154 can be configured to sense at least one of a position and a movement of a user's finger via a capacitive sensing, a resistive sensing, or a surface acoustic wave process or the like.
  • The touch screen 154 may be capable of sensing finger movement in a direction parallel or planar to the touch screen surface, in a direction perpendicular to the touch screen surface, or in both directions, and may also be capable of sensing the level of pressure applied to the touch screen surface.
  • Touch screen 154 may be formed from one or more translucent or transparent insulating layers and one or more translucent or transparent conductive layers. Touch screen 154 can take other forms as well.
  • Microphone 156 can be configured to receive audio (eg, a voice command or other audio input) from a user of vehicle 12.
  • the speaker 158 can be configured to output audio to a user of the vehicle 12.
  • Peripheral device 108 may additionally or alternatively include components in addition to those shown.
  • the power source 110 can be configured to provide power to some or all of the components of the vehicle 12.
  • the power source 110 can include, for example, a rechargeable lithium ion or lead acid battery.
  • one or more battery packs can be configured to provide power.
  • Other power materials and configurations are also possible.
  • power source 110 and energy source 120 can be implemented together, as in some all-electric vehicles.
  • Processor 113 included in computing device 111 may include one or more general purpose processors and/or one or more special purpose processors (eg, image processors, digital signal processors, etc.). Insofar as the processor 113 includes more than one processor, such processors can work individually or in combination. Computing device 111 may implement the function of controlling vehicle 12 based on input received through user interface 112.
  • The memory 114 can include one or more volatile storage components and/or one or more non-volatile storage components, such as optical, magnetic, and/or organic storage devices, and the memory 114 can be wholly or partially integrated with the processor 113.
  • Memory 114 may include instructions 115 (eg, program logic) executable by processor 113 to perform various vehicle functions, including any of the functions or methods described herein.
  • the components of the vehicle 12 can be configured to operate in a manner interconnected with other components internal and/or external to their respective systems. To this end, the components and systems of the vehicle 12 can be communicatively linked together via a system bus, network, and/or other connection mechanism.
  • FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • The terminal 300 (taking the vehicle as an example) includes a processor 301, a memory 302, a camera 303, an RF circuit 304, an audio circuit 305, a speaker 306, a microphone 307, an input device 308, other input devices 309, a display screen 310, and the like.
  • the display screen 310 is composed of at least a touch panel 311 as an input device and a display panel 312 as an output device.
  • The terminal structure shown in FIG. 3 does not constitute a limitation on the terminal; the terminal may include more or fewer components than those illustrated, or combine some components, or split some components, or use a different component arrangement, which is not limited herein.
  • the components of the terminal 300 will be specifically described below with reference to FIG. 3:
  • The radio frequency (RF) circuit 304 can be used to send and receive information, or to receive and send signals during a call. For example, if the terminal 300 is an in-vehicle device, the terminal 300 can receive downlink information sent by a base station through the RF circuit 304 and then pass it to the processor 301 for processing, and can send related uplink data to the base station.
  • RF circuits include, but are not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like.
  • RF circuitry 304 can also communicate with the network and other devices via wireless communication.
  • the wireless communication can use any communication standard or protocol, including but not limited to global system for mobile communication (GSM), general packet radio service (GPRS), code division multiple access (code division) Multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), e-mail, short messaging service (SMS), and the like.
  • the memory 302 can be used to store software programs and modules, and the processor 301 executes various functional applications and data processing of the terminal 300 by running software programs and modules stored in the memory 302.
  • the memory 302 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (for example, a sound playing function, an image playing function, etc.), and the like; the storage data area may be stored. Data (such as audio data, video data, etc.) created according to the use of the terminal 300, and the like.
  • The memory 302 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • Other input devices 309 can be used to receive input numeric or character information, as well as to generate key signal inputs related to user settings and function control of terminal 300.
  • Other input devices 309 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control buttons, switch buttons, etc.), a trackball, a mouse, a joystick, and a light mouse (a light mouse is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by a touch screen).
  • Other input devices 309 may further include a sensor built into the terminal 300, such as a gravity sensor or an acceleration sensor, and the terminal 300 may also use parameters detected by the sensor as input data.
  • the display screen 310 can be used to display information input by the user or information provided to the user as well as various menus of the terminal 300, and can also accept user input.
  • the display panel 312 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel 311 is also called a touch screen or a touch sensitive screen.
  • the touch panel 311 may further include two parts: a touch detection device and a touch controller.
  • The touch detection device detects the user's touch position and gesture, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into information that the processor 301 can process, and transmits that information to the processor 301; it can also receive and execute commands sent by the processor 301.
  • The touch panel 311 can be implemented using various types such as resistive, capacitive, infrared, and surface acoustic wave types, or using any technology developed in the future.
  • The touch panel 311 can cover the display panel 312, and the user can perform an operation on or near the touch panel 311 according to the content displayed by the display panel 312 (the displayed content includes, but is not limited to, a soft keyboard, a virtual mouse, virtual buttons, icons, and the like).
  • After the touch panel 311 detects an operation on or near it, the touch panel 311 transmits the operation to the processor 301 to determine the user input, and the processor 301 then provides a corresponding visual output on the display panel 312 according to the user input.
  • In FIG. 3, the touch panel 311 and the display panel 312 are shown as two independent components to implement the input and output functions of the terminal 300; in some embodiments, the touch panel 311 can be integrated with the display panel 312 to implement the input and output functions of the terminal 300.
  • The audio circuit 305, the speaker 306, and the microphone 307 can provide an audio interface between the user and the terminal 300.
  • The audio circuit 305 can transmit the electrical signal converted from received audio data to the speaker 306, and the speaker 306 converts it into a sound signal for output.
  • Conversely, the microphone 307 converts a collected sound signal into an electrical signal, which is received by the audio circuit 305 and converted into audio data; the audio data is then output to the RF circuit 304 for transmission to a device such as another terminal, or output to the memory 302 so that the processor 301 can perform further processing in conjunction with the content stored in the memory 302.
  • the camera 303 can acquire image frames in real time and transmit them to the processor 301 for processing, and store the processed results to the memory 302 and/or present the processed results to the user via the display panel 312.
  • The processor 301 is the control center of the terminal 300. It connects the various parts of the entire terminal 300 using various interfaces and lines, and performs the various functions of the terminal 300 and processes data by running or executing the software programs and/or modules stored in the memory 302 and invoking the data stored in the memory 302, thereby monitoring the terminal 300 as a whole.
  • The processor 301 may include one or more processing units; the processor 301 may further integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface (UI), applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 301.
  • the terminal 300 may further include a power source 314 (for example, a battery) for supplying power to the respective components.
  • The power source 314 may be logically connected to the processor 301 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • the terminal 300 may further include a Bluetooth module, a sensor, and the like, and details are not described herein again.
  • The audio circuit 305 may specifically include a digital signal processing (DSP) sub-module and a codec 401 sub-module, wherein the codec sub-module implements analog-to-digital/digital-to-analog (AD/DA) conversion.
  • the DSP sub-module implements the processing of the speech algorithm.
  • In the following, the terminal is described by taking an in-vehicle device (referred to as the car machine) as an example.
  • The booting process of the car machine is as follows:
  • the car machine determines the number of people in the car through voiceprint recognition technology.
  • Voiceprint recognition is a kind of biometric technology, also known as speaker recognition. It has two types, namely speaker identification and speaker verification; what is involved here is speaker verification.
  • Voiceprint recognition converts sound signals into electrical signals, which are then recognized by a computer. Specifically, in this document, voiceprint recognition technology is used to identify how many people are in the car by judging how many different voiceprints are present in the car; a sketch of such counting is given after the voiceprint background below.
  • The so-called voiceprint is the sound-wave spectrum, displayed by an electro-acoustic instrument, that carries speech information.
  • the generation of human language is a complex physiological and physical process between the human language center and the vocal organs.
  • The vocal organs used in speech (the tongue, teeth, larynx, lungs, and nasal cavity) vary greatly among individuals in size and shape, so the voiceprints of any two people are different.
  • Each person's acoustic speech characteristics are relatively stable but also variable; they are not absolute and immutable. This variation can come from physiology, pathology, psychology, imitation, or disguise, and is also related to environmental disturbances.
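  • A minimal sketch of such voiceprint-based counting, under stated assumptions, is given below: each utterance is assumed to be summarised as a fixed-length voiceprint embedding (how the embedding is produced is outside this sketch), and the cosine-similarity measure and threshold are illustrative choices rather than values taken from this disclosure.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.75  # assumed value; would be tuned per deployment


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprint embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def count_speakers(embeddings: list[np.ndarray]) -> int:
    """Count distinct voiceprints by greedy matching against speakers seen so far.

    An embedding that is not similar enough to any previously seen speaker
    is treated as a new speaker, so the result approximates the number of
    different voiceprints heard in the car.
    """
    known_speakers: list[np.ndarray] = []
    for emb in embeddings:
        if not any(cosine_similarity(emb, known) >= SIMILARITY_THRESHOLD
                   for known in known_speakers):
            known_speakers.append(emb)
    return len(known_speakers)
```

  • In use, the embeddings would come from a speaker-embedding model applied to the utterances picked up by the in-car microphone, and the returned count would then be compared with one to decide the interaction mode.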
  • Voiceprint recognition is only one way of identification; the number of people in the car can also be determined by one or more of, for example, iris information, facial information, fingerprint information, infrared detection, sensing data, and voice inquiry.
  • An image for iris recognition can be obtained by a camera on the vehicle, and the car machine determines whether there are several people in the vehicle by judging whether different irises are present.
  • An image for facial recognition can be obtained by a camera on the vehicle, and the car machine determines whether there are several people in the vehicle by judging whether different facial features are present.
  • the number of people in the vehicle can be determined by an infrared sensor on the vehicle or in the vehicle.
  • The number of people in the car can also be determined from sensing data; specifically, it can be determined by the pressure sensors of the seats in the car.
  • the number of people in the car can be judged by voice inquiry.
  • For example, the car machine can ask how many people are in the car and determine the number from the answer given by the driver or a passenger.
  • If voiceprint recognition recognizes only one person while another way of identifying the number of people in the car recognizes multiple people, the situation is handled as a multi-person case; a fusion sketch is given below.
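  • One way to reconcile conflicting estimates that is consistent with the rule above is to take the largest count reported by any modality, so that a single multi-person signal is enough to fall back to the wake-word mode. The sketch below is illustrative only; the modality names are assumptions.

```python
def fuse_person_counts(estimates: dict[str, int]) -> int:
    """Fuse per-modality person-count estimates conservatively.

    If any modality (voiceprint, face, seat pressure, ...) reports more
    than one person, the fused result is multi-person.
    """
    valid = [count for count in estimates.values() if count > 0]
    return max(valid) if valid else 0


# Example: voiceprint only heard one speaker, but two seats are occupied.
assert fuse_person_counts({"voiceprint": 1, "seat_pressure": 2, "face": 2}) == 2
```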
  • If the car machine determines that there is only one person in the vehicle (i.e., only the driver), step 404 is performed; otherwise, step 405 is performed.
  • In step 404, the car machine enters the wake-free interaction mode.
  • The wake-free interaction mode here refers to an interaction mode that does not require a wake-up word. For example, the driver can say "navigate home" to the car machine, and in response to this wake-free command, the car machine performs the operation of navigating home.
  • In step 405, the car machine enters the wake-up question-and-answer interaction mode.
  • The wake-up question-and-answer interaction mode here refers to human-computer interaction that requires a wake-up word. For example, the driver or a passenger in the car can wake up the car machine by saying "Hello, Xiaochi". In response to this wake-up command, the car machine replies "What can I help you with?". After that, the driver or passenger can say "Please navigate home", and the car machine performs the operation of navigating home in response to the instruction.
  • Alternatively, the driver or a passenger in the car can directly speak a voice that carries the wake-up word, such as "Xiaochi, I am going to the airport", where "Xiaochi" is the wake-up word. The car machine then performs the operation of navigating to the airport in response to this command.
  • In this way, the car machine decides whether to activate the wake-free mode according to the number of people in the vehicle, which reduces the number of times users in the vehicle have to interact with the car machine through the wake-up word and thereby improves the user experience. A sketch of this mode selection is given below.
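  • The overall decision of steps 404 and 405 can be summarised in a few lines. The sketch below, offered only as an illustration, assumes the occupancy estimate is already available as an integer.

```python
def select_interaction_mode(people_in_car: int) -> str:
    """Choose the interaction mode from the occupancy estimate."""
    # One person (the driver alone): commands are accepted without a wake-up word.
    if people_in_car == 1:
        return "wake-free"
    # Several people: require the wake-up word to avoid responding to conversation.
    return "wake-word"


assert select_interaction_mode(1) == "wake-free"
assert select_interaction_mode(3) == "wake-word"
```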
  • S501: user A speaks the first piece of voice information.
  • The car machine collects the voice of the first user A and marks user A's voiceprint features.
  • the first voice information refers to the first voice message received by the voice device just after voice recognition is turned on.
  • S502: the car machine determines whether the first piece of voice information is a specific instruction of the car machine.
  • Determining whether the first piece of voice information is a specific instruction of the car machine means determining whether it contains a command strongly related to the functions of the car machine.
  • Some specific instructions may be pre-stored in a local or cloud database of the car machine; when the car machine receives a voice, it can determine whether the voice completely matches a pre-stored instruction in the database, or whether the matching degree is greater than a certain threshold. If the match is complete or the matching degree is high, the voice belongs to the (pre-defined) instructions strongly related to the device, i.e., it is a specific instruction. For example, the car machine judges whether the instruction words "immediately" and "navigation" appear in the first piece of voice information.
  • If both words appear, the car machine determines that the first piece of voice information is a device-specific instruction; if the car machine determines that only the word "navigation" appears in the first piece of voice information, it determines that the first piece of voice information is not a specific instruction of the car machine. If the car machine determines that the first piece of voice information is a device-specific instruction, S503 is performed; if it determines that it is not a device-specific instruction, S504 is performed. For example, "navigate immediately to a certain place" is a strongly device-related command and is likely to be addressed to the car machine, whereas "Did you sleep well last night?" is not a strongly related instruction and is more likely to be addressed to the person sitting nearby. An illustrative matching sketch follows.
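  • The matching step can be illustrated with the sketch below. It assumes the transcribed text of the first piece of voice information is available and that the pre-stored instructions are kept as simple keyword sets; the keyword lists and the threshold are assumptions made for the example, not values taken from this disclosure.

```python
# Assumed pre-stored instruction keywords (the "local or cloud database" above).
PRE_STORED_INSTRUCTIONS = [
    {"immediately", "navigation"},  # e.g. "navigation to ... immediately"
    {"play", "music"},
    {"call"},
]

MATCH_THRESHOLD = 1.0  # fraction of an instruction's keywords that must appear


def is_device_specific(transcript: str) -> bool:
    """Return True when the transcript strongly matches a pre-stored instruction."""
    words = set(transcript.lower().split())
    for keywords in PRE_STORED_INSTRUCTIONS:
        hit_ratio = len(keywords & words) / len(keywords)
        if hit_ratio >= MATCH_THRESHOLD:
            return True
    return False


assert is_device_specific("navigation to the office immediately")
assert not is_device_specific("did you sleep well last night")
```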
  • S503: when the car machine determines that the first piece of voice information is a device-specific instruction, the car machine gives the corresponding voice response.
  • S502 and S503 are optional.
  • S504: when the car machine determines that the first piece of voice information is not a device-specific instruction, the car machine displays the recognized content on the display screen. This step is optional.
  • If the car machine determines that the first piece of voice information is not a specific instruction, S505 is then performed.
  • S504 may or may not be executed.
  • S505: the car machine delays for X seconds.
  • X can be 3 seconds, or 4 seconds, or 5 seconds.
  • The purpose of the delay is to determine whether an answer from another user in the car is received within the first time period after the first piece of voice information is received.
  • the first voice message and the answers of other users in the car have different voiceprint information.
  • S506: the car machine uses voiceprint technology to determine whether another user in the car has answered. If no other user answers, S507 is performed; otherwise, S508 is performed.
  • S507: the car machine feeds back a voice response to user A and records that there is only one person in the car.
  • The voice response that the car machine feeds back to user A can be, for example, "OK, the XX operation will be performed for you".
  • S508: the car machine abandons the voice response and records that there are multiple people in the car. That is, if a voice reply from a second user B is collected through voiceprint recognition during the delay, the voice response is discarded and a multi-person state is recorded.
  • In this way, the car machine determines how many people are in the vehicle and records that number, so that the car machine can decide whether the wake-up word is required according to the number of people in the vehicle, thereby improving the efficiency of the interaction between the car machine and the occupants. A sketch of this delayed-response flow is given below.
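  • Putting S501 to S508 together, a minimal sketch of the delayed-response logic might look as follows. The `device` object and its methods (`is_device_specific`, `display`, `listen`, `same_voiceprint`, `respond`, `record_occupancy`) are placeholders for whatever interfaces the car machine actually provides; this is an illustrative sketch under those assumptions, not the claimed implementation.

```python
import time

DELAY_SECONDS = 3  # X seconds; 3, 4 or 5 seconds in the description above


def handle_first_voice(device, first_voice) -> None:
    """Decide whether to answer the first voice and record the occupancy."""
    if device.is_device_specific(first_voice.text):
        # S503: a strongly device-related command is answered right away.
        device.respond(first_voice)
        return

    # S504 (optional): show the recognized content, then wait (S505).
    device.display(first_voice.text)
    deadline = time.monotonic() + DELAY_SECONDS

    # S506: listen for an answer from a different voiceprint within X seconds.
    while time.monotonic() < deadline:
        reply = device.listen(timeout=deadline - time.monotonic())
        if reply is not None and not device.same_voiceprint(first_voice, reply):
            # S508: another speaker answered, so user A was talking to a person.
            device.record_occupancy(multiple=True)
            return

    # S507: nobody else replied within the window; answer user A ourselves.
    device.record_occupancy(multiple=False)
    device.respond(first_voice)
```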
  • the space in which an intelligent voice device is located is an enclosed space, a semi-enclosed space, or an open space.
  • the enclosed space may be an enclosed space formed by a continuous curved surface.
  • The semi-enclosed space may be a semi-enclosed space formed by a non-closed surface, for example, the space formed by a room whose door is open.
  • the open space can be an open space or a space that is not enclosed by any spatial surface.
  • The voice device may specifically be an intelligent voice device, and is used for implementing voice input (i.e., receiving external voice and converting the voice into an electrical signal), voice recognition, and the functions requested by voice.
  • The semi-enclosed space or the open space may be a spherical space whose radius is the communication distance of the intelligent voice device.
  • If the radius of the enclosed space is less than or equal to the communication distance of the intelligent voice device, the space where the intelligent voice device is located is the enclosed space.
  • The radius of the enclosed space here may refer to half the length of the longest side of the enclosed space.
  • the car is an enclosed space.
  • the car has a radius of half the length of the car.
  • For example, the communication distance of the smart voice device in the vehicle is the radius of the sphere C1, and the radius of the car is exactly equal to the radius of the sphere C1.
  • In this case, the space in which the smart voice device is located is the enclosed space, that is, the enclosed space surrounded by the vehicle body, rather than a spherical space whose radius is the communication distance of the intelligent voice device (the radius of C2).
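  • The relationship between the communication radius and the enclosed space can be expressed directly. The sketch below treats the enclosed-space radius as half of its longest side, as described above; the function and parameter names are illustrative assumptions.

```python
from typing import Optional


def effective_space(enclosed_radius_m: Optional[float],
                    communication_radius_m: float) -> str:
    """Decide which space the voice device should reason about.

    enclosed_radius_m is half of the longest side of the enclosing body
    (e.g. half the car length), or None for semi-enclosed / open spaces.
    """
    if enclosed_radius_m is not None and enclosed_radius_m <= communication_radius_m:
        # The whole cabin lies within the device's reach: use the cabin itself.
        return "enclosed space"
    # Otherwise fall back to a sphere whose radius is the communication distance.
    return "sphere of communication radius"


assert effective_space(2.5, 10.0) == "enclosed space"   # car cabin example
assert effective_space(None, 10.0) == "sphere of communication radius"
```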
  • There are many types of in-vehicle devices, including in-vehicle smart rearview mirrors, in-vehicle smart speakers, and in-vehicle smart TVs.
  • the following uses examples to describe how the wake-up words are specifically used in the embodiments of the present invention.
  • the vehicle's smart rearview mirror is turned on and begins to collect ambient sound.
  • The in-vehicle smart rearview mirror collects the surrounding sound through the microphone 307 in FIG. 3.
  • the car smart rearview mirror determines the number of people in the car through voiceprint recognition technology.
  • Voiceprint recognition is a kind of biometric technology, also known as speaker recognition. It has two types, namely speaker identification and speaker verification. Different voiceprint recognition techniques are used for different tasks and applications; for example, narrowing the scope of a criminal investigation may require identification technology, whereas a bank transaction requires verification technology.
  • Voiceprint recognition converts sound signals into electrical signals, which are then recognized by a computer. Specifically, in this document, voiceprint recognition technology can be used to identify and confirm how many people are in the car.
  • If the vehicle-mounted smart rearview mirror determines that there is only one person in the car (i.e., only the driver), the vehicle-mounted smart rearview mirror enters the wake-free interaction mode.
  • the wake-free interaction mode here refers to the interaction mode that does not require wake-up words.
  • For example, the driver can say "navigate to the airport" to the on-board smart rearview mirror.
  • the in-vehicle smart rearview mirror will perform navigation to the airport in response to this wake-free command.
  • the onboard smart rearview mirror can report "Start to navigate to the airport for you" in response to the driver's instructions.
  • If the vehicle-mounted smart rearview mirror determines that there are multiple people in the vehicle, the vehicle-mounted smart rearview mirror enters the wake-up question-and-answer interaction mode.
  • The wake-up question-and-answer interaction mode here refers to human-computer interaction that requires a wake-up word.
  • the driver or the passenger in the car can wake up the car smart rearview mirror and say "Hello, Xiaochi".
  • In response to this wake-up command, the in-vehicle smart rearview mirror replies "What can I help you with?".
  • the driver or the passenger in the car can say "I am going to the airport.”
  • the in-vehicle smart rearview mirror performs an operation of navigating to the airport in response to the instruction.
  • the onboard smart rearview mirror can report "Start to navigate to the airport for you" in response to the driver's instructions.
  • FIG. 9 is a schematic diagram showing the in-vehicle smart speaker in the interior of a car.
  • the car smart speaker is integrated into the car center control panel.
  • the in-vehicle smart speaker is different from the existing speaker, and the in-vehicle smart speaker includes a sound collecting device (for example, a microphone).
  • the driver of the car or the passenger in the car can communicate with the car smart speaker via the microphone and speaker on the car smart speaker.
  • the car smart speaker uses the voiceprint recognition technology to determine the number of people in the car.
  • the car smart TV is integrated on the top of the car.
  • the in-vehicle smart TV differs from the existing television in that the in-vehicle smart television includes a sound pickup device (for example, a microphone).
  • the driver of the car or the passenger in the car can communicate with the car smart TV via the microphone and speaker on the car smart TV.
  • the car smart TV uses the voiceprint recognition technology to determine the number of people in the car.
  • The in-vehicle smart TV and the in-vehicle smart speaker can also be used in the home scenario, corresponding respectively to a home smart TV and a home smart speaker.
  • FIG. 11 is a schematic diagram of a home smart speaker.
  • The home smart speaker differs from an existing speaker in that the home smart speaker includes a sound collecting device (for example, a microphone).
  • People in the home can communicate with the home smart speaker through the microphone and speaker on the home smart speaker.
  • The home smart speaker determines the number of people in the home through voiceprint recognition technology.
  • Similarly, the home smart TV differs from an existing television in that the home smart TV includes a sound pickup device (for example, a microphone). People in the home can communicate with the home smart TV via the microphone and speaker on the home smart TV, and the home smart TV determines the number of people in the home through voiceprint recognition technology.
  • Whether and how the home smart TV uses the wake-up word when it judges that there is one person or several people in the home can follow the processing method of the in-vehicle smart rearview mirror. For example, when it is judged that there are two people in the home, as in FIG. 12, the home smart TV enters the wake-up question-and-answer interaction mode.
  • The wake-up question-and-answer interaction mode here refers to human-computer interaction that requires a wake-up word. For example, a mother or daughter can wake up the home smart TV by saying "Hello, TV". In response to this wake-up command, the home smart TV can reply "What can I help you with?".
  • After that, the home smart TV performs the operation of playing a live television broadcast in response to the instruction.
  • the home smart TV can broadcast "Start selecting a live channel for you" in response to a mother or daughter's instruction.
  • The smart voice device may include a processor 301 configured to determine the number of people in the space where the smart voice device is located and, when determining that the number of people in the space is one, to control the smart voice device to enter the wake-free voice interaction mode.
  • the wake-free voice interaction method here refers to a voice interaction method without using a wake-up word.
  • For example, the user can say "navigate to place XX", and the voice device (for example, the smart voice device) will perform the operation of navigating to place XX without first replying "What can I help you with?".
  • the smart voice device further includes:
  • a collector (e.g., a microphone 307) for collecting a first voice in the space;
  • the processor 301 is configured to determine the number of people in the space, including:
  • the processor is configured to: determine whether the first voice is a specific instruction of the smart voice device; if the first voice is not a specific instruction of the smart voice device, delay X seconds; determine whether there is a second voice whose voiceprint characteristics differ from those of the first voice; and, when it is determined that such a second voice with different voiceprint characteristics exists, determine that there are multiple people in the space;
  • the smart voice device further includes:
  • a collector (e.g., a microphone 307) for collecting a first voice in the space;
  • the processor 301 is configured to determine the number of people in the closed space, including:
  • the processor is configured to: determine whether the first voice is a specific instruction of the smart voice device; if the first voice is not a specific instruction of the smart voice device, delay X seconds; determine whether there is a second voice whose voiceprint characteristics differ from those of the first voice; and, when it is determined that no second voice with different voiceprint characteristics is received, determine that there is only one person in the closed space;
  • the space in which the intelligent voice device is located is an enclosed space, a semi-enclosed space or an open space.
  • The semi-enclosed space or open space is a spherical space whose radius is the communication distance of the intelligent voice device.
  • If the radius of the enclosed space is less than or equal to the communication distance of the intelligent voice device, the space where the intelligent voice device is located is the enclosed space.
  • the processor includes 4 high-speed processing cores and 4 low-speed processing cores.
  • Each of the four high-speed processing cores and a corresponding secondary cache are combined to form a high-speed core processing area.
  • Each of the four low-speed processing cores and a corresponding second-level buffer are combined to form a low-speed core processing area.
  • the high speed processing core may refer to a processing core having a processing frequency of 2.1 GHz (hertz).
  • the low speed processing core may refer to a processing core having a processing frequency of 1.7 GHz (hertz).
  • All of the steps performed by the processor 301 can be performed by a high-speed processing core or a low-speed processing core.
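  • As an illustration only (the description above does not specify how steps are assigned to cores), latency-critical steps could be scheduled on the high-speed cores and background steps on the low-speed cores; the task names and the mapping below are assumptions.

```python
HIGH_SPEED_CORES = [0, 1, 2, 3]  # 2.1 GHz processing cores
LOW_SPEED_CORES = [4, 5, 6, 7]   # 1.7 GHz processing cores

# Assumed mapping from processing step to the core pool that runs it.
TASK_CORE_POOL = {
    "speech_recognition": HIGH_SPEED_CORES,    # latency-critical
    "voiceprint_matching": HIGH_SPEED_CORES,
    "occupancy_bookkeeping": LOW_SPEED_CORES,  # background work
    "display_update": LOW_SPEED_CORES,
}


def core_pool_for(task: str) -> list[int]:
    """Return the core pool a given processing step would be scheduled on."""
    return TASK_CORE_POOL.get(task, LOW_SPEED_CORES)


assert core_pool_for("speech_recognition") == HIGH_SPEED_CORES
```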
  • The processor may further include: a modem baseband portion for processing the baseband portion of the radio frequency signal; a display subsystem coupled to the display; an image signal processing subsystem coupled to an external CPU; a single-channel DDR controller coupled to the DDR memory; an embedded multimedia card interface connected with an embedded multimedia card; a USB interface connected with a personal computer; an SDIO input/output interface connected with a short-distance communication module; a UART interface connected with Bluetooth and GPS; an I2C interface connected with sensors; and a SIM card interface for a smart card.
  • The processor may further include a video processing subsystem, a Sensor Hub subsystem, a low-power microcontroller, a high-resolution video codec, a dual-security engine, an image processor, and an image processing unit formed together with a second-level cache.
  • a coherent bus that is placed inside the CPU to connect all the interfaces and processing units in the CPU.
  • the above terminal and the like include hardware structures and/or software modules corresponding to each function.
  • Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the technical solutions of the embodiments of the present application can be implemented by hardware or by a combination of hardware and computer software. Whether a function is implemented by hardware or by computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered as going beyond the scope of the embodiments of the present application.
  • In the embodiments of the present application, the terminal and the like may be divided into function modules according to the foregoing method examples.
  • each function module may be divided according to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the module in the embodiment of the present application is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • The computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions can be transferred from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave, etc.).
  • the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
  • the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (such as a solid state disk (SSD)).


Abstract

Provided in an embodiment of the present application is a method for voice interaction, comprising: a voice device determining the number of people within the space in which the voice device is located; and, when the voice device determines that the number of people in the space is one, the voice device entering a wake-free voice interaction mode. Compared with the prior art, the solution provided by the present invention can reduce the wake-up actions performed on a smart device when the smart device needs to be operated.

Description

一种语音交互的方法及装置Method and device for voice interaction 技术领域Technical field
本申请实施例涉及通信技术领域,尤其涉及一种语音提示的方法及装置。The embodiments of the present invention relate to the field of communications technologies, and in particular, to a voice prompting method and apparatus.
背景技术Background technique
人工智能技术在车载智能设备上广泛使用,但当前市场中的智能车载设备虽然搭载有语音操控功能,但均需在使用前进行语音唤醒(如“小驰你好”),且有识别不够灵敏、交互过于繁杂的普遍现象,已成为车载智能后视镜等产品用户最大的诟病之一。Artificial intelligence technology is widely used in in-vehicle smart devices, but the smart car devices in the current market are equipped with voice control functions, but all need to wake up before use (such as "Xiao Chi Hello"), and the recognition is not sensitive enough. The common phenomenon of too complicated interaction has become one of the biggest problems for users of car smart rearview mirrors and other products.
如果不使用唤醒词进行唤醒,又存在被频繁误唤醒的情况,特别是与其他人聊天时,设备误以为是对其下发的指令而进行响应,非常尴尬。目前有厂家提出采用“唤醒词+语音语义识别”一体化方式,实现唤醒词与语音操控之间零间隔、零延迟、无缝对接,摒弃传统的一问一答的形式,减少用户语音操控的步骤及繁杂的唤醒动作。对于用户来说,使用智能设备每个功能的时候,不论先唤醒再问答还是唤醒问答一体化,因为都会有唤醒动作,所以会使车载设备和用户的交互过于复杂。If you do not use wake-up words to wake up, there are cases of frequent false wake-ups, especially when chatting with other people, the device mistakenly thinks that it is responding to the instructions issued by it, which is very embarrassing. At present, some manufacturers propose to adopt the "wake-up word + speech semantic recognition" integration method to achieve zero-interval, zero-delay, seamless docking between wake-up words and voice manipulation, abandon the traditional one-question-answer form, and reduce user voice control. Steps and complicated wake-up actions. For the user, when using each function of the smart device, whether it is first wake up and then answer the question or wake up the question and answer integration, because there will be a wake-up action, the interaction between the in-vehicle device and the user is too complicated.
发明内容Summary of the invention
本申请的实施例提供一种语音提示的方法及装置,在需要对智能设备操作时,可减少对智能设备的唤醒动作。Embodiments of the present application provide a method and apparatus for voice prompting, which can reduce the wake-up action to the smart device when the smart device needs to be operated.
一方面,本发明实施例提供了一种语音交互的方法,包括:语音设备判断所述语音设备所在空间内的人数;当所述语音设备判断所述空间内的人数为一时,所述语音设备进入免唤醒语音交互方式。In one aspect, the embodiment of the present invention provides a method for voice interaction, including: a voice device determining a number of people in a space where the voice device is located; and when the voice device determines that the number of people in the space is one, the voice device Enter the wake-free voice interaction mode.
在一个可能的设计中,所述语音设备判断所述语音设备所在空间内的人数,包括:所述语音设备根据声纹信息、虹膜信息、人像信息、指纹信息、感应数据中的一个或多个判断所述语音设备所在空间内的人数。语音设备通过多种方式识别语音设备所在空间内的人数,并综合判断,提高了识别语音设备所在空间内的人数的准确性。In a possible design, the voice device determines the number of people in the space where the voice device is located, including: the voice device according to one or more of voiceprint information, iris information, portrait information, fingerprint information, and sensing data. Judging the number of people in the space where the voice device is located. The voice device recognizes the number of people in the space where the voice device is located in a plurality of ways, and comprehensively judges the accuracy of identifying the number of people in the space where the voice device is located.
在一个可能的设计中,所述语音设备根据声纹信息判断所述语音设备所在空间内的人数,包括:所述语音设备采集所述空间内的第一语音;所述语音设备判断在接收到所述第一语音之后的第一时间段之内是否接收到第二语音,所述第二语音与所述第一语音具备不同的声纹特性;如果所述语音设备没有在所述第一时间段之内接收到所述第二语音,则确定所述空间内有一人。语音设备通过不同的声纹识别语音设备所在空间内的人数,是一种常用的识别方式,此例中判断空间中有一人。In a possible design, the voice device determines the number of people in the space where the voice device is located according to the voiceprint information, including: the voice device collects the first voice in the space; and the voice device determines that the voice device is received. Whether the second voice is received within the first time period after the first voice, the second voice and the first voice have different voiceprint characteristics; if the voice device is not in the first time Receiving the second voice within the segment determines that there is one person in the space. It is a common way for a voice device to identify the number of people in the space where the voice device is located through different voiceprints. In this example, there is one person in the space.
在一个可能的设计中,所述语音设备根据声纹信息判断所述语音设备所在空间内的人数,包括:所述语音设备采集所述空间内的第一语音;如果所述第一语音不是所述特定指令,则所述语音设备判断在接收到所述第一语音之后的第一时间段之内是否接收到第二语音,所述第二语音与所述第一语音具备不同的声纹特性;如果所述语音设备没有在所述第一时间段之内接收到所述第二语音,则确定所述空间内有一人。语音设备通过判断采集的第一语音不是特定指令,进一步判断在接收到所述第一语音之 后的第一时间段之内是否接收到第二语音,能更准确的确定该语音设备所在空间内的人数。本文中第一语音可以是首条语音。In a possible design, the voice device determines the number of people in the space where the voice device is located according to the voiceprint information, including: the voice device collects a first voice in the space; if the first voice is not Describe a specific instruction, the voice device determines whether a second voice is received within a first time period after receiving the first voice, and the second voice and the first voice have different voiceprint characteristics And if the voice device does not receive the second voice within the first time period, determining that there is a person in the space. The voice device further determines whether the second voice is received within the first time period after receiving the first voice, by determining that the collected first voice is not a specific command, and can more accurately determine the space in the space where the voice device is located. Number of people. The first voice in this article can be the first voice.
在一个可能的设计中,所述语音设备根据声纹信息判断所述语音设备所在空间内的人数,包括:所述语音设备采集所述空间内的第一语音;如果所述第一语音不是所述特定的指令,则所述语音设备判断在接收到所述第一语音之后的第一时间段之内是否接收到第二语音,所述第二语音与所述第一语音具备不同的声纹特性;如果所述语音设备在所述第一时间段之内接收到所述第二语音,则所述智能语音设备确定所述空间内有多人。语音设备通过判断采集的第一语音不是特定指令,进一步判断在接收到所述第一语音之后的第一时间段之内是否接收到第二语音,能更准确的确定该语音设备所在空间内的人数。In a possible design, the voice device determines the number of people in the space where the voice device is located according to the voiceprint information, including: the voice device collects a first voice in the space; if the first voice is not Describe a specific instruction, the voice device determines whether a second voice is received within a first time period after receiving the first voice, and the second voice has a different voiceprint with the first voice Characteristic; if the voice device receives the second voice within the first time period, the smart voice device determines that there are multiple people in the space. The voice device further determines whether the second voice is received within the first time period after receiving the first voice, by determining that the collected first voice is not a specific command, and can more accurately determine the space in the space where the voice device is located. Number of people.
在一个可能的设计中,语音设备根据虹膜信息判断所述语音设备所在空间内的人数,包括:所述语音设备通过所述语音设备所在空间内的摄像头摄影得到虹膜识别的图像;所述语音设备判断所述图像中是否有不同的虹膜信息;当所述语音设备判断有不同的虹膜信息时,所述语音设备确定所述语音设备所在空间内有多人;当所述语音设备判断只有一种虹膜信息时,所述语音设备确定所述语音设备所在空间内为一人。此例中,语音设备根据虹膜信息判断所述语音设备所在空间内的人数,增加了一种确定所述语音设备所在空间内人数的方式。In a possible design, the voice device determines the number of people in the space where the voice device is located according to the iris information, including: the voice device obtains an image recognized by the iris through a camera in a space where the voice device is located; the voice device Determining whether there is different iris information in the image; when the voice device determines that there is different iris information, the voice device determines that there are multiple people in the space where the voice device is located; when the voice device determines that there is only one type In the case of iris information, the voice device determines that the space in which the voice device is located is one person. In this example, the voice device determines the number of people in the space where the voice device is located according to the iris information, and adds a method for determining the number of people in the space where the voice device is located.
在一个可能的设计中,所述语音设备根据人像信息判断所述语音设备所在空间内的人数,包括:所述语音设备通过所述语音设备所在空间内的摄像头摄影得到人像信息;所述语音设备判断所述图像中是否有不同的人像信息;当所述语音设备判断有不同的人像信息时,所述语音设备确定所述语音设备所在空间内有多人;当所述语音设备判断只有一种人像信息时,所述语音设备确定所述语音设备所在空间内为一人。此例中,语音设备根据人像信息判断所述语音设备所在空间内的人数,增加了一种确定所述语音设备所在空间内人数的方式。In a possible design, the voice device determines the number of people in the space where the voice device is located according to the portrait information, including: the voice device obtains portrait information through a camera in a space where the voice device is located; the voice device Determining whether there is different portrait information in the image; when the voice device determines that there is different portrait information, the voice device determines that there are multiple people in the space where the voice device is located; when the voice device determines that there is only one type In the case of portrait information, the voice device determines that the space in which the voice device is located is one person. In this example, the voice device determines the number of people in the space where the voice device is located according to the portrait information, and adds a method for determining the number of people in the space where the voice device is located.
在一个可能的设计中,语音设备根据指纹信息判断所述语音设备所在空间内的人数,包括:所述语音设备通过所述语音设备所在空间内的指纹识别装置获得得到指纹信息;所述语音设备判断所述图像中是否有不同的指纹信息;当所述语音设备判断有不同的指纹信息时,所述语音设备确定所述语音设备所在空间内有多人;当所述语音设备判断只有一种指纹信息时,所述语音设备确定所述语音设备所在空间内为一人。此例中,语音设备根据指纹信息判断所述语音设备所在空间内的人数,增加了一种确定所述语音设备所在空间内人数的方式。In a possible design, the voice device determines the number of people in the space where the voice device is located according to the fingerprint information, including: the voice device obtains fingerprint information by using a fingerprint identification device in a space where the voice device is located; the voice device Determining whether there is different fingerprint information in the image; when the voice device determines that there is different fingerprint information, the voice device determines that there are multiple people in the space where the voice device is located; when the voice device determines that there is only one type of In the case of fingerprint information, the voice device determines that the space in which the voice device is located is one person. In this example, the voice device determines the number of people in the space where the voice device is located according to the fingerprint information, and adds a method for determining the number of people in the space where the voice device is located.
在一个可能的设计中,语音设备根据感应数据判断所述语音设备所在空间内的人数,包括:所述语音设备通过所述语音设备所在空间内的感应装置获得得到感应数据;所述语音设备判断所述图像中是否有不同的感应数据;当所述语音设备判断有不同的感应数据时,所述语音设备确定所述语音设备所在空间内有多人;当所述语音设备判断只有一种感应数据时,所述语音设备确定所述语音设备所在空间内为一人。此例中,语音设备根据感应数据判断所述语音设备所在空间内的人数,增加了一种确定所述语音设备所在空间内人数的方式。In a possible design, the voice device determines the number of people in the space where the voice device is located according to the sensing data, including: the voice device obtains the sensing data through the sensing device in the space where the voice device is located; the voice device determines Whether there is different sensing data in the image; when the voice device determines that there is different sensing data, the voice device determines that there are multiple people in the space where the voice device is located; when the voice device determines that there is only one sensing In the case of data, the voice device determines that the space in which the voice device is located is one person. In this example, the voice device determines the number of people in the space where the voice device is located according to the sensing data, and adds a method for determining the number of people in the space where the voice device is located.
在一个可能的设计中,所述语音设备进入免唤醒语音交互方式之后,所述方法还 包括:所述语音设备接收第三语音,所述第三语音不包括唤醒词;所述语音设备识别并执行所述第三语音对应的功能。此例中,语音设备进入免唤醒语音交互方式之后识别不包括唤醒词的第三语音,并执行相应的功能,能实现减少唤醒词的语音交互次数。In a possible design, after the voice device enters the wake-free voice interaction mode, the method further includes: the voice device receives a third voice, the third voice does not include an awakening word; Performing a function corresponding to the third voice. In this example, after the voice device enters the wake-free voice interaction mode, the third voice that does not include the wake-up word is recognized, and the corresponding function is executed, so that the number of voice interactions of the wake-up word can be reduced.
在一个可能的设计中,当所述语音设备判断所述空间内有多人时,所述语音设备进入唤醒语音交互方式;所述语音设备接收唤醒词或者包括唤醒词的第四语音;所述语音设备进入语音交互方式或者语音识别并执行所述第四语音对应的功能。此例中,语音设备进入唤醒语音交互方式,可实现基于唤醒词的语音交互。In a possible design, when the voice device determines that there are multiple people in the space, the voice device enters a wake-up voice interaction mode; the voice device receives an wake-up word or a fourth voice including an wake-up word; The voice device enters a voice interaction mode or voice recognition and performs a function corresponding to the fourth voice. In this example, the voice device enters the wake-up voice interaction mode, and the voice interaction based on the wake-up word can be implemented.
在一个可能的设计中,所述语音设备所在的空间为封闭空间、半封闭空间或开放空间。In one possible design, the space in which the voice device is located is an enclosed space, a semi-enclosed space, or an open space.
在一个可能的设计中,半封闭空间或开放空间为以所述语音设备的通信距离为半径的球状空间。In one possible design, the semi-enclosed space or open space is a spherical space having a radius of communication distance of the voice device.
在一个可能的设计中,若所述封闭空间的半径小于或等于所述语音设备的通信距离,则所述语音设备所在的空间为所述封闭空间。In a possible design, if the radius of the closed space is less than or equal to the communication distance of the voice device, the space in which the voice device is located is the closed space.
另一方面,本发明实施例提供了一种语音设备,该语音设备具有实现上述方法实际中语音设备行为的功能。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。On the other hand, the embodiment of the invention provides a voice device, which has the function of realizing the behavior of the voice device in the actual method. The functions may be implemented by hardware or by corresponding software implemented by hardware. The hardware or software includes one or more modules corresponding to the functions described above.
再一方面,本发明实施例提供了一种计算机存储介质,用于储存为上述语音设备所用的计算机软件指令,其包含用于执行上述方面所设计的程序。In still another aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for use in the voice device, including a program designed to perform the above aspects.
相较于现有技术,本发明提供的方案可以在需要对智能设备操作时,可减少对智能设备的唤醒动作。Compared with the prior art, the solution provided by the present invention can reduce the wake-up action on the smart device when the smart device needs to be operated.
附图说明DRAWINGS
图1为本发明实施例提供的一种移动交通工具通信系统的系统架构图;1 is a system architecture diagram of a mobile vehicle communication system according to an embodiment of the present invention;
图2为本发明实施例提供的一种示例车辆12的一种功能框图;2 is a functional block diagram of an exemplary vehicle 12 according to an embodiment of the present invention;
图3为本发明实施例提供的终端的结构示意图;FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present disclosure;
图4为图3中音频电路305的结构示意图;4 is a schematic structural view of the audio circuit 305 of FIG. 3;
图5为本发明实施例提供的语音交互流程图;FIG. 5 is a flowchart of voice interaction according to an embodiment of the present invention;
图6为本发明实施例提供的确定车内人数的流程图;FIG. 6 is a flowchart of determining a number of people in a vehicle according to an embodiment of the present invention;
图7为本发明实施例提供的人与车载智能后视镜交互的示意图;FIG. 7 is a schematic diagram of interaction between a person and a vehicle-mounted smart rearview mirror according to an embodiment of the present invention; FIG.
图8为本发明实施例提供的人与车载智能后视镜交互的另一示意图;FIG. 8 is another schematic diagram of interaction between a person and a vehicle-mounted smart rearview mirror according to an embodiment of the present invention; FIG.
图9为本发明实施例提供的车载智能音箱的示意图;FIG. 9 is a schematic diagram of a vehicle-mounted smart speaker according to an embodiment of the present invention; FIG.
图10为本发明实施例提供的车载智能电视的示意图;FIG. 10 is a schematic diagram of a vehicle-mounted smart TV according to an embodiment of the present invention; FIG.
图11为本发明实施例提供的家庭智能音箱的示意图;FIG. 11 is a schematic diagram of a home smart speaker according to an embodiment of the present invention; FIG.
图12为本发明实施例提供的人与家庭智能电视交互的示意图;FIG. 12 is a schematic diagram of interaction between a person and a home smart television according to an embodiment of the present invention; FIG.
图13为本发明实施例提供的包括两个封闭空间的示意图;Figure 13 is a schematic diagram of two enclosed spaces provided by an embodiment of the present invention;
图14为本发明实施例提供的处理器的结构示意图。FIG. 14 is a schematic structural diagram of a processor according to an embodiment of the present invention.
具体实施方式Detailed ways
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的 特征可以明示或者隐含地包括一个或者更多个该特征。在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。In the following, the terms "first" and "second" are used for descriptive purposes only, and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, features defining "first" and "second" may explicitly or implicitly include one or more of the features. In the description of the embodiments of the present application, "multiple" means two or more unless otherwise stated.
以下详细描述参考附图对所公开的系统和方法的各种特征和功能进行了描述。在图中,除非上下文另外指出,否则相似的符号标识相似的组件。本文中所描述的说明性系统和方法实施例并非意图进行限制。可容易理解,所公开的系统和方法的某些方面可以按多种不同的配置进行布置和组合,所有这些都在本文中被设想到。The detailed description below describes various features and functions of the disclosed systems and methods with reference to the drawings. In the figures, similar symbols identify similar components unless the context indicates otherwise. The illustrative system and method embodiments described herein are not intended to be limiting. It will be readily understood that certain aspects of the disclosed systems and methods can be arranged and combined in a variety of different configurations, all of which are contemplated herein.
请参阅图1,图1是本发明实施例提供的一种移动交通工具通信系统的系统架构图。其中,通信系统10包括交通工具12、一个或多个无线载波系统14、地面通信网络16、计算机18以及呼叫中心20。应该理解的是,所公开的方法能够与任何数量的不同系统一起使用,并不特定地限于此处示出的运行环境。同样,系统10的架构、构造、设置和运行以及它的单独部件在现有技术中通常是已知的。因此,以下的段落仅仅简单地提供了一个示例通信系统10的概述,本文没有示出的其它系统也能够使用所公开的方法。Referring to FIG. 1, FIG. 1 is a system architecture diagram of a mobile vehicle communication system according to an embodiment of the present invention. Among other things, communication system 10 includes vehicle 12, one or more wireless carrier systems 14, terrestrial communication network 16, computer 18, and call center 20. It should be understood that the disclosed methods can be used with any number of different systems and are not specifically limited to the operating environments shown herein. Likewise, the architecture, construction, setup, and operation of system 10, as well as its individual components, are generally known in the art. Thus, the following paragraphs simply provide an overview of an example communication system 10, and other systems not shown herein can also use the disclosed methods.
交通工具12可实现在汽车上或可采取汽车的形式。然而,示例系统还可实现在其它车辆上或采取其它车辆的形式,诸如轿车、卡车、摩托车、公交车、船、飞机、直升机、割草机、铲雪车、休旅车、游乐园车辆、农业设备、施工设备、有轨电车、高尔夫球车、火车和电车等其它车辆。此外,机器人装置也可用于执行本文描述的方法和系统。The vehicle 12 can be implemented on a car or in the form of a car. However, the example system can also be implemented on other vehicles or in the form of other vehicles, such as cars, trucks, motorcycles, buses, boats, airplanes, helicopters, lawn mowers, snow shovels, recreational vehicles, amusement park vehicles. Other equipment such as agricultural equipment, construction equipment, trams, golf carts, trains and trams. In addition, the robotic device can also be used to perform the methods and systems described herein.
一些交通工具硬件28在图1中示出,包括信息通讯单元30、麦克风32、一个或多个按钮或者其它控制输入34、音频系统36、可视显示器38、以及全球定位系统(global position system,GPS)模块40和多个交通工具安全单元(vehicle security module,VSM)42。这些设备中的一些能够直接连接到信息通讯单元,例如麦克风32和按钮34,而其它的使用一个或多个网络连接实现间接连接,例如通信总线44或者娱乐总线46。合适的网络连接的实例包括控制器局域网(controller area network,CAN)、媒体导向系统转移(media oriented systems transport,MOST)、局部互联网络(local Interconnect network,LIN)、局域网(local area network,LAN)以及其它合适的连接,例如以太网或者符合已知的国际标准化组织(International organization for standardization,ISO)、美国机动车工程师学会(society of automotive engineers,SAE)和国际电气与电子工程师学会(institute of electrical and electronics engineers,IEEE)标准和规定的其它连接,这仅仅列举一小部分。Some vehicle hardware 28 is shown in FIG. 1, including an information communication unit 30, a microphone 32, one or more buttons or other control inputs 34, an audio system 36, a visual display 38, and a global position system (global position system, GPS) module 40 and a plurality of vehicle security modules (VSMs) 42. Some of these devices can be directly connected to an information communication unit, such as microphone 32 and button 34, while others use one or more network connections to make an indirect connection, such as communication bus 44 or entertainment bus 46. Examples of suitable network connections include controller area network (CAN), media oriented systems transport (MOST), local interconnect network (LIN), local area network (LAN) And other suitable connections, such as Ethernet or conforming to the International Organization for Standardization (ISO), the Society of Automotive Engineers (SAE), and the Institute of Electrical and Electronics Engineers (institute of electrical) And electronics engineers, IEEE) other connections to the standards and regulations, just to name a few.
信息通讯单元30可以是原始设备制造商(original equipment manufacturer,OEM)安装(嵌入)或者配件市场设备,它安装在交通工具中,且能够在无线载波系统14上且经无线联网进行无线声音和/或数据通信。这能使交通工具与呼叫中心20、其它启用信息通讯的交通工具、或者一些其它实体或者设备通信。信息通讯单元优选地使用无线电广播来与无线载波系统14建立通信信道(声音信道和/或数据信道),使得声音和/或数据传输能够在信道上被发送和接收。通过提供声音和数据通信,信息通讯单元30能使交通工具提供多种不同的服务,包括与导航、电话、紧急救援、诊断、信息娱乐等相关联的那些服务。数据能够经数据连接(例如经数据信道上的分组数据传输,或者经使用现有技术中已知技术的声音信道)被发送。对于包括声音通信(例如,在呼叫中心 20处具有现场顾问或者声音响应单元)和数据通信(例如,提供GPS位置数据或者车辆诊断数据至呼叫中心20)两者的组合服务,系统可利用在声音信道上的单个呼叫,并根据需要在声音信道上在声音和数据传输之间切换,这可以使用本领域技术人员已知的技术来完成。此外,可使用短消息服务SMS发送和接收数据(例如,分组数据协议(packet data protocol,PDP));信息通讯单元可被配置为移动终止和/或发起,或者被配置为应用终止和/或发起。The information communication unit 30 may be an original equipment manufacturer (OEM) installation (embedded) or an accessory market device installed in the vehicle and capable of wirelessly sounding and/or wirelessly networked on the wireless carrier system 14. Or data communication. This enables the vehicle to communicate with the call center 20, other information-enabled vehicles, or some other entity or device. The information communication unit preferably uses radio broadcasts to establish a communication channel (sound channel and/or data channel) with the wireless carrier system 14 such that sound and/or data transmissions can be transmitted and received over the channel. By providing voice and data communications, the messaging unit 30 enables the vehicle to provide a variety of different services, including those associated with navigation, telephony, emergency, diagnostics, infotainment, and the like. Data can be transmitted over a data connection (e.g., via packet data over a data channel, or via a voice channel using techniques known in the art). For combined services including voice communication (eg, having a field advisor or voice response unit at call center 20) and data communication (eg, providing GPS location data or vehicle diagnostic data to call center 20), the system can utilize the sound A single call on the channel and switching between sound and data transmission on the sound channel as needed, which can be done using techniques known to those skilled in the art. In addition, the short message service SMS can be used to send and receive data (eg, a packet data protocol (PDP)); the information communication unit can be configured to be mobile terminated and/or initiated, or configured to apply termination and/or initiate.
信息通讯单元30根据全球移动通信系统(global system for mobile communication,GSM)或者码分多址(code division multiple access,CDMA)标准利用蜂窝通信,因此包括用于声音通信(例如免提呼叫)的标准蜂窝芯片集50、用于数据传输的无线调制解调器、电子处理设备52、一个或多个数字存储器设备54以及双天线56。应该明白,调制解调器能够通过存储在信息通讯单元内的软件实施且由处理器52执行,或者它能够是位于信息通讯单元30内部或者外部的分开的硬件部件。调制解调器能够使用任何数量的不同标准或者协议(例如EVDO(CDMA20001xEV-DO,EVDO)、CDMA、通用分组无线服务技术(general packet radio service,GPRS)和增强型数据速率GSM演进技术(enhanced data rate for GSM evolution,EDGE))来运行。交通工具和其它联网设备之间的无线联网也能够使用信息通讯单元30来执行。为此目的,信息通讯单元30能够被配置为根据一个或多个无线协议(例如,IEEE 802.11协议、全球微波互联接入(worldwide interoperability for microwave access,WiMAX)或者蓝牙中的任何一种)无线通信。当用于例如传输控制协议/因特网互联协议(transmission control protocol/Internet protocol,TCP/IP)的分组交换数据通信时,信息通讯单元能够被配置具有静态IP地址,或者能够被设置以从网络上的另一个设备(例如路由器)或者从网络地址服务器自动接收所分配的IP地址。The information communication unit 30 utilizes cellular communication according to a global system for mobile communication (GSM) or code division multiple access (CDMA) standard, and thus includes standards for voice communication (eg, hands-free calling). A set of cells 50, a wireless modem for data transmission, an electronic processing device 52, one or more digital memory devices 54, and a dual antenna 56. It should be understood that the modem can be implemented by software stored in the information communication unit and executed by the processor 52, or it can be a separate hardware component located inside or outside the information communication unit 30. The modem can use any number of different standards or protocols (eg EVDO (CDMA2000 1xEV-DO, EVDO), CDMA, general packet radio service (GPRS) and enhanced data rate for GSM) (enhanced data rate for GSM) Evolution, EDGE)) to run. Wireless networking between the vehicle and other networked devices can also be performed using the information communication unit 30. To this end, the information communication unit 30 can be configured to wirelessly communicate according to one or more wireless protocols (eg, any of IEEE 802.11 protocols, worldwide interoperability for microwave access (WiMAX), or Bluetooth) . When used for packet switched data communication such as transmission control protocol/Internet protocol (TCP/IP), the information communication unit can be configured to have a static IP address or can be set to be from the network. Another device (such as a router) or automatically receives the assigned IP address from a network address server.
处理器52可以是能够处理电子指令的任何类型的设备,包括微处理器、微控制器、主处理器、控制器、交通工具通信处理器、以及专用集成电路(application specific integrated circuit,ASIC)。它能够是仅用于信息通讯单元30的专用处理器或者能够与其它交通工具系统共享。处理器52执行各种类型的数字存储指令,例如存储在存储器54中的软件或者固件程序,它能使信息通讯单元提供较宽的多种服务。例如,处理器52能够执行程序或者处理数据,以执行本文讨论的方法的至少一部分。 Processor 52 can be any type of device capable of processing electronic instructions, including microprocessors, microcontrollers, host processors, controllers, vehicle communication processors, and application specific integrated circuits (ASICs). It can be a dedicated processor for the information communication unit 30 only or can be shared with other vehicle systems. Processor 52 executes various types of digital storage instructions, such as software or firmware programs stored in memory 54, which enables the information communication unit to provide a wide variety of services. For example, processor 52 can execute a program or process data to perform at least a portion of the methods discussed herein.
The information communication unit 30 can be used to provide a range of vehicle services, including wireless communication with other parts of the vehicle. Such services include: turn-by-turn directions and other navigation-related services provided in conjunction with the GPS-based vehicle navigation module 40; airbag deployment notification and other emergency or roadside assistance related services provided in conjunction with one or more crash sensor interface modules (such as a body control module, not shown); diagnostic reports that use one or more diagnostic modules; and infotainment-related services, in which music, web pages, movies, television programs, video games, and/or other information is downloaded by an infotainment module and stored for current or later playback. The services listed above are by no means an exhaustive list of all the capabilities of the information communication unit 30, but are merely an enumeration of some of the services that the information communication unit is able to provide. In addition, it should be understood that at least some of the above modules can be implemented in the form of software instructions stored inside or outside the information communication unit 30, they can be hardware components located inside or outside the information communication unit 30, or they can be integrated with each other and/or shared, or integrated and/or shared with other systems located throughout the vehicle, to name only a few possibilities. Where the VSMs 42 located outside the information communication unit 30 are in operation, they can use the vehicle bus 44 to exchange data and commands with the information communication unit 30.
The GPS module 40 receives radio signals from GPS satellites 60. From these signals, the GPS module 40 can determine the position of the vehicle, which is used to provide the vehicle driver with navigation and other position-related services. Navigation information can be presented on the display 38 (or on another display in the vehicle), or it can be presented verbally, for example when turn-by-turn navigation is provided. Navigation services can be provided using a dedicated in-vehicle navigation module (which can be part of the GPS module 40), or some or all of the navigation services can be carried out via the information communication unit 30, in which case position information is sent to a remote location in order to provide the vehicle with navigation maps, map annotations (points of interest, restaurants, and the like), route calculation, and so on. Position information can be provided to the call center 20 or to another remote computer system, such as the computer 18, for other purposes, such as fleet management. In addition, new or updated map data can be downloaded from the call center 20 to the GPS module 40 via the information communication unit 30.
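As an illustration only, and not as part of the disclosed embodiment, the following minimal sketch shows the kind of computation such position-related services rest on: estimating the great-circle distance between a GPS fix and a destination. The coordinates, function name and parameters are assumptions introduced purely for illustration.

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two latitude/longitude pairs, in kilometres.
        r = 6371.0  # mean Earth radius in km
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp = math.radians(lat2 - lat1)
        dl = math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    # Example: distance from a hypothetical GPS fix to a hypothetical point of interest.
    print(round(haversine_km(39.9042, 116.4074, 39.9928, 116.3972), 1))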
In addition to the audio system 36 and the GPS module 40, the vehicle 12 can include other vehicle safety modules (VSM) 42 in the form of electronic hardware components that are located throughout the vehicle, typically receive input from one or more sensors, and use the sensed input to perform diagnostic, monitoring, control, reporting, and/or other functions. Each of the VSMs 42 is preferably connected to the other VSMs and to the information communication unit 30 by the communication bus 44, and can be programmed to run vehicle system and subsystem diagnostic tests. For example, one VSM 42 can be an engine control module (ECM) that controls various aspects of engine operation (for example, fuel ignition and ignition timing), another VSM 42 can be a powertrain control module that regulates the operation of one or more components of the vehicle powertrain, and another VSM 42 can be a body control module that manages various electrical components located throughout the vehicle (such as the vehicle's power door locks and headlights). According to one embodiment, the engine control module is equipped with an on-board diagnostics (OBD) feature that provides a large amount of real-time data, such as data received from various sensors (including vehicle emission sensors), and provides a standardized series of diagnostic trouble codes that allow a technician to quickly identify and repair faults within the vehicle. As will be understood by those skilled in the art, the VSMs mentioned above are only examples of some of the modules that may be used in the vehicle 12; many other modules are also possible.
The vehicle electronics 28 also include a number of vehicle user interfaces that provide vehicle occupants with a means of providing and/or receiving information, including a microphone 32, buttons 34, an audio system 36, and a visual display 38. As used herein, the term "vehicle user interface" broadly includes any suitable form of electronic device, including both hardware and software components, which is located on the vehicle and enables a vehicle user to communicate with a component of the vehicle or through a component of the vehicle. The microphone 32 provides audio input to the information communication unit so that the driver or another occupant can provide voice commands and carry out hands-free calling via the wireless carrier system 14. For this purpose, it can be connected to an on-board automated voice processing unit that uses human machine interface (HMI) technology known in the art. The buttons 34 allow manual user input to the information communication unit 30 in order to initiate wireless telephone calls and to provide other data, response, or control input. Separate buttons can be used to initiate an emergency call and a regular service assistance call to the call center 20. The audio system 36 provides audio output to the vehicle occupants and can be a dedicated stand-alone system or part of the primary vehicle audio system. According to the specific embodiment shown here, the audio system 36 is operatively coupled to the vehicle bus 44 and the entertainment bus 46 and can provide amplitude modulation (AM), frequency modulation (FM) and satellite radio, digital versatile disc (DVD), and other multimedia functions. This functionality can be provided together with the infotainment module described above, or independently. The visual display 38 is preferably a graphic display, such as a touch screen on the instrument panel or a head-up display reflected off the windshield, and can be used to provide a variety of input and output functions. Various other vehicle user interfaces can also be used, as the interfaces of FIG. 1 are only an example of one specific implementation.
无线载波系统14优选地是蜂窝电话系统,包括多个蜂窝塔70(仅示出一个)、一个或多个移动交换中心(mobile switching center,MSC)72以及将无线载波系统14与地面网络16连接所要求的任何其它的联网部件。每个蜂窝塔70包括发送和接收天线以及基站,来自不同蜂窝塔的基站直接连接到MSC 72或者经中间装置(例如基站控制器)连接到MSC 72。蜂窝系统14可实施任何合适的通信技术,包括例如模拟技术(例如模拟移动通信系统(advanced mobile phone system,AMPS))或者更新的数字技术(例如CDMA(例如CDMA2000)或GSM/GPRS)。如本领域的技术人员将会明白的,各种蜂窝塔/基站/MSC设置都是可能的,且可与无线系统14一起使用。例如,基站和蜂窝塔能够共同位于相同的地点,或者它们能够彼此定位较远,每个基站能够响应单个的蜂窝塔或者单个基站能够服务各个蜂窝塔,各个基站能够联接到单个MSC,这仅仅例举一小部分可能的设置。The wireless carrier system 14 is preferably a cellular telephone system comprising a plurality of cellular towers 70 (only one shown), one or more mobile switching centers (MSCs) 72, and a wireless carrier system 14 coupled to the terrestrial network 16. Any other networking components required. Each of the cellular towers 70 includes transmit and receive antennas and base stations, and base stations from different cellular towers are directly connected to the MSC 72 or to the MSC 72 via intermediate devices (e.g., base station controllers). Cellular system 14 may implement any suitable communication technology including, for example, analog technologies (e.g., an advanced mobile phone system (AMPS)) or newer digital technologies (e.g., CDMA (e.g., CDMA2000) or GSM/GPRS). As will be appreciated by those skilled in the art, various cellular tower/base station/MSC settings are possible and can be used with wireless system 14. For example, the base station and the cellular tower can be co-located at the same location, or they can be located farther from each other, each base station can respond to a single cellular tower or a single base station can serve each cellular tower, and each base station can be coupled to a single MSC, which is merely an example Give a small set of possible settings.
除了使用无线载波系统14之外,卫星通信形式的不同无线载波系统能够被用于提供与交通工具的单向或者双向通信。这能够使用一个或多个通信卫星62和上行链路发射站64来完成。单向通信能够是例如卫星广播服务,其中节目内容(新闻、音乐等)被发射站64接收、打包用于上传、且接下来发送到卫星62,卫星62将节目广播到用户。双向通信能够是例如使用卫星62在交通工具12和站64之间中继电话通信的卫星电话服务。如果使用,这种卫星电话能够被附加到无线载波系统14或者代替无线载波系统14使用。In addition to using the wireless carrier system 14, different wireless carrier systems in the form of satellite communications can be used to provide one-way or two-way communication with the vehicle. This can be done using one or more communication satellites 62 and an uplink transmitting station 64. The one-way communication can be, for example, a satellite broadcast service in which program content (news, music, etc.) is received by the transmitting station 64, packaged for uploading, and then transmitted to the satellite 62, which broadcasts the program to the user. The two-way communication can be, for example, a satellite telephone service that relays telephone communications between the vehicle 12 and the station 64 using the satellite 62. If used, such a satellite phone can be attached to or used in place of the wireless carrier system 14.
地面网络16可以是常规的陆基无线电通信网络,它连接到一个或多个固定电话,并将无线载波系统14连接到呼叫中心20。例如,地面网络16可包括公共交换电话网络(public switched telephone network,PSTN),例如被用于提供有线电话、分组交换数据通信以及互联网基础设施的PSTN。地面网络16的一个或多个部分能够通过使用标准的有线网络、光纤或者其它光学网络、电缆网络、电力线、其它无线网络(例如无线局域网(wireless local area networks,WLAN))、或者提供宽带无线访问(broadband wireless access,BWA)的网络及其任何组合来实施。地面网络16还可以包括用于存储、上传、转换和/或在发送者和接收者之间传输短消息(short message service,SMS)的一个或多个短消息服务中心(short message service center,SMSC)。例如,SMSC可以从呼叫中心20或者内容提供商(例如,外部短消息实体或者ESME)接收SMS消息,且SMSC可以将SMS消息传输给交通工具12(例如,移动终端设备)。SMSC和它们的功能对于技术人员来说是已知的。此外,呼叫中心20不必经地面网络16连接,但是可以包括无线电话设备,使得它能够直接与无线网络(例如无线载波系统14)通信。The terrestrial network 16 may be a conventional land-based radio communication network that is coupled to one or more fixed telephones and that connects the wireless carrier system 14 to the call center 20. For example, terrestrial network 16 may include a public switched telephone network (PSTN), such as a PSTN that is used to provide wired telephone, packet switched data communications, and Internet infrastructure. One or more portions of terrestrial network 16 can be accessed using standard wired networks, fiber optic or other optical networks, cable networks, power lines, other wireless networks (eg, wireless local area networks (WLAN)), or providing broadband wireless access. (broadband wireless access, BWA) network and any combination thereof to implement. The terrestrial network 16 may also include one or more short message service centers (SMSCs) for storing, uploading, converting, and/or transmitting short messages (SMS) between the sender and the receiver. ). For example, the SMSC can receive an SMS message from the call center 20 or a content provider (eg, an external short message entity or ESME), and the SMSC can transmit the SMS message to the vehicle 12 (eg, a mobile terminal device). SMSCs and their functions are known to the skilled person. Moreover, call center 20 need not be connected via terrestrial network 16, but may include a wireless telephone device such that it can communicate directly with a wireless network (e.g., wireless carrier system 14).
计算机18能够是多个计算机中的一个,这多个计算机可经私人或者公共网络(例如互联网)访问。每个这样的计算机18都能够被用于一个或多个目的,例如交通工具可经信息通讯单元30和无线载波器14访问网页服务器。其它这样的可访问计算机18能够是例如:服务中心计算机,其中诊断信息和其它交通工具数据能够经信息通讯单元30从交 通工具上传;交通工具所有者或者其他用户为例如如下目的而使用的客户端计算机:访问或者接收交通工具数据,或者设置或配置用户参数,或者控制交通工具的功能;或者第三方库,无论是通过与交通工具12还是呼叫中心20通信,或者与两者通信,交通工具数据或者其它信息被提供至或者来自该第三方库。计算机18还能够被用于提供互联网连接,例如域名服务器(domain name server,DNS)服务,或者作为使用动态主机配置协议(dynamic host configuration protocol,DHCP)或者其它合适的协议来分配IP地址给交通工具12的网络地址服务器。 Computer 18 can be one of a plurality of computers that are accessible via a private or public network, such as the Internet. Each such computer 18 can be used for one or more purposes, such as a vehicle that can access a web server via the information communication unit 30 and the wireless carrier 14. Other such accessible computers 18 can be, for example, a service center computer in which diagnostic information and other vehicle data can be uploaded from the vehicle via the information communication unit 30; the vehicle owner or other user is a client for use, for example, for the following purposes Computer: accessing or receiving vehicle data, or setting or configuring user parameters, or controlling the functionality of the vehicle; or third party library, whether by communicating with the vehicle 12 or the call center 20, or communicating with both, vehicle data Or other information is provided to or from the third party library. The computer 18 can also be used to provide an internet connection, such as a domain name server (DNS) service, or as a means of assigning an IP address to a vehicle using a dynamic host configuration protocol (DHCP) or other suitable protocol. 12 network address server.
呼叫中心20被设计以提供多种不同的系统后端功能给交通工具电子件28,并且根据在此示出的示例性实施例,呼叫中心20通常包括一个或多个交换机80、服务器82、数据库84、现场顾问86、以及自动声音响应系统(automatic voice response system,VRS)88,它们在现有技术中全部都是已知的。这些各种呼叫中心部件优选地经有线或者无线局域网90彼此联接。交换机80能够是专用交换分机(private branch exchange,PBX),路由进入的信号,使得声音传输通常通过普通电话发送到现场顾问86或者使用VoIP发送到自动声音响应系统88。现场顾问电话也能够使用网络语音电话业务(voice over Internet phone,VoIP),如图1中的虚线所指示。VoIP和通过交换机80的其它的数据通信经连接在交换机80和网络90之间的调制解调器(未图示)来实施。数据传输经调制解调器传递到服务器82和/或数据库84。数据库84能够存储账户信息,例如用户身份验证信息、交通工具标识符、数据图表(profile)记录、行为模式以及其它有关的用户信息。数据传输也可以由无线系统来执行,例如802.1lx,GPRS等等。此外,可使用短消息服务(SMS)发送和/或接收数据(例如,PDP);且呼叫中心20可被配置为移动终止和/或发起,或者被配置为应用终止和/或发起。虽然所阐述的实施例已经被描述为它将会与使用现场顾问86的有人控制的呼叫中心20一起使用,但是将会明白呼叫中心可代替使用VRS 88作为自动顾问,或者VRS 88和现场顾问86的组合可以被使用。The call center 20 is designed to provide a variety of different system backend functions to the vehicle electronics 28, and according to the exemplary embodiment shown herein, the call center 20 typically includes one or more switches 80, servers 82, databases 84. On-site consultant 86, and automatic voice response system (VRS) 88, all of which are known in the prior art. These various call center components are preferably coupled to each other via a wired or wireless local area network 90. Switch 80 can be a private branch exchange (PBX) that routes incoming signals such that voice transmissions are typically sent to field consultant 86 via a regular telephone or to automated voice response system 88 using VoIP. The on-site advisory phone can also use voice over Internet phone (VoIP), as indicated by the dashed line in Figure 1. VoIP and other data communications through switch 80 are implemented via a modem (not shown) connected between switch 80 and network 90. Data transfer is passed to server 82 and/or database 84 via a modem. The database 84 is capable of storing account information such as user authentication information, vehicle identifiers, data profile records, behavioral patterns, and other related user information. Data transmission can also be performed by a wireless system, such as 802.1lx, GPRS, and the like. In addition, short message service (SMS) can be used to send and/or receive data (eg, PDP); and call center 20 can be configured to terminate and/or initiate mobile, or configured to terminate and/or initiate applications. Although the illustrated embodiment has been described as being used with a human-controlled call center 20 using a field advisor 86, it will be appreciated that the call center can instead use VRS 88 as an automated advisor, or VRS 88 and on-site consultant 86. The combination can be used.
图2是本发明实施例提供的一种示例车辆12的一种功能框图。耦合到车辆12或包括在车辆12中的组件可包括推进系统102、传感器系统104、控制系统106、外围设备108、电源110、计算装置111以及用户接口112。计算装置111可包括处理器113和存储器114。计算装置111可以是车辆12的控制器或控制器的一部分。存储器114可包括处理器113可运行的指令115,并且还可存储地图数据116。车辆12的组件可被配置为以与彼此互连和/或与耦合到各系统的其它组件互连的方式工作。例如,电源110可向车辆12的所有组件提供电力。计算装置111可被配置为从推进系统102、传感器系统104、控制系统106和外围设备108接收数据并对它们进行控制。计算装置111可被配置为在用户接口112上生成图像的显示并从用户接口112接收输入。2 is a functional block diagram of an example vehicle 12 provided by an embodiment of the present invention. Components coupled to or included in vehicle 12 may include propulsion system 102, sensor system 104, control system 106, peripherals 108, power source 110, computing device 111, and user interface 112. Computing device 111 can include a processor 113 and a memory 114. Computing device 111 may be part of a controller or controller of vehicle 12. The memory 114 can include instructions 115 that the processor 113 can run, and can also store map data 116. The components of the vehicle 12 can be configured to operate in a manner interconnected with each other and/or with other components coupled to the various systems. For example, power source 110 can provide power to all components of vehicle 12. Computing device 111 can be configured to receive data from, and control, propulsion system 102, sensor system 104, control system 106, and peripherals 108. Computing device 111 can be configured to generate a display of images on user interface 112 and receive input from user interface 112.
在其它示例中,车辆12可包括更多、更少或不同的系统,并且每个系统可包括更多、更少或不同的组件。此外,示出的系统和组件可以按任意种的方式进行组合或划分。In other examples, vehicle 12 may include more, fewer, or different systems, and each system may include more, fewer, or different components. Moreover, the systems and components shown may be combined or divided in any number of ways.
推进系统102可用于车辆12提供动力运动。如图所示,推进系统102包括引擎/发动机118、能量源120、传动装置(transmission)122和车轮/轮胎124。The propulsion system 102 can be used to provide power motion to the vehicle 12. As shown, the propulsion system 102 includes an engine/engine 118, an energy source 120, a transmission 122, and a wheel/tire 124.
引擎/发动机118可以是或包括内燃机、电动机、蒸汽机和斯特林发动机等的任意组合。其它发动机和引擎也是可能的。在一些示例中,推进系统102可包括多种类型 的引擎和/或发动机。例如,气电混合轿车可包括汽油发动机和电动机。其它示例是可能的。Engine/engine 118 may be or include any combination of internal combustion engine, electric motor, steam engine, and Stirling engine. Other engines and engines are also possible. In some examples, propulsion system 102 can include multiple types of engines and/or engines. For example, a gas-electric hybrid car may include a gasoline engine and an electric motor. Other examples are possible.
能量源120可以是全部或部分向引擎/发动机118供能的能量的来源。也就是说,引擎/发动机118可用于为将能量源120转换为机械能。能量源120的示例包括汽油、柴油、其它基于石油的燃料、丙烷、其它基于压缩气体的燃料、乙醇、太阳能电池板、电池和其它电力来源。(一个或多个)能量源120可以额外地或可替换地包括燃料箱、电池、电容器和/或飞轮的任意组合。在一些示例中,能量源120也可以为车辆12的其它系统提供能量。Energy source 120 may be a source of energy that is fully or partially powered to engine/engine 118. That is, the engine/engine 118 can be used to convert the energy source 120 to mechanical energy. Examples of energy source 120 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas based fuels, ethanol, solar panels, batteries, and other sources of electrical power. The energy source(s) 120 may additionally or alternatively include any combination of fuel tanks, batteries, capacitors, and/or flywheels. In some examples, energy source 120 may also provide energy to other systems of vehicle 12 .
传动装置122可用于为把机械动力从引擎/发动机118传送到车轮/轮胎124。为此,传动装置122可包括变速箱、离合器、差速器、驱动轴和/或其它元件。在传动装置122包括驱动轴的示例中,驱动轴包括用于耦合到车轮/轮胎124的一个或多个轴。Transmission 122 can be used to transfer mechanical power from engine/engine 118 to wheel/tire 124. To this end, the transmission 122 can include a gearbox, a clutch, a differential, a drive shaft, and/or other components. In the example where the transmission 122 includes a drive shaft, the drive shaft includes one or more shafts for coupling to the wheel/tire 124.
车辆12的车轮/轮胎124可配置为各种形式,包括单轮车、自行车/摩托车、三轮车或者轿车/卡车四轮形式。其它车轮/轮胎形式也是可能的,诸如包括六个或更多个车轮的那些。车辆12的车轮/轮胎124可被配置为相对于其它车轮/轮胎124差速地旋转。在一些示例中,车轮/轮胎124可包括固定地附着到传动装置122的至少一个车轮和与驾驶表面接触的耦合到车轮的边缘的至少一个轮胎。车轮/轮胎124可包括金属和橡胶的任意组合,或者其它材料的组合。The wheel/tire 124 of the vehicle 12 can be configured in a variety of forms, including a single wheeled vehicle, a bicycle/motorcycle, a tricycle, or a car/truck four wheel form. Other wheel/tire forms are also possible, such as those that include six or more wheels. The wheel/tire 124 of the vehicle 12 can be configured to rotate differentially relative to the other wheels/tires 124. In some examples, the wheel/tire 124 can include at least one wheel that is fixedly attached to the transmission 122 and at least one tire that is coupled to the driving surface and that is coupled to the edge of the wheel. Wheel/tire 124 may comprise any combination of metal and rubber, or a combination of other materials.
推进系统102可以额外地或可替换地包括除了所示出的那些以外的组件。Propulsion system 102 may additionally or alternatively include components in addition to those shown.
传感器系统104可包括用于感测关于车辆12所位于的环境的信息的若干个传感器。如图所示,传感器系统的传感器包括GPS126、惯性测量单元(inertial measurement unit,IMU)128、无线电检测和雷达测距(RADAR)单元130、激光测距(LIDAR)单元132、相机134以及用于为修改传感器的位置和/或朝向的致动器136。传感器系统104也可包括额外的传感器,包括例如监视车辆12的内部系统的传感器(例如,O2监视器、燃油量表、机油温度,等等)。传感器系统104也可以包括其它传感器。 Sensor system 104 may include a number of sensors for sensing information regarding the environment in which vehicle 12 is located. As shown, the sensors of the sensor system include a GPS 126, an inertial measurement unit (IMU) 128, a radio detection and radar ranging (RADAR) unit 130, a laser ranging (LIDAR) unit 132, a camera 134, and Actuator 136 to modify the position and/or orientation of the sensor. Sensor system 104 may also include additional sensors including, for example, sensors that monitor the internal system of vehicle 12 (eg, O2 monitor, fuel gauge, oil temperature, etc.). Sensor system 104 may also include other sensors.
GPS模块126可以为用于估计车辆12的地理位置的任何传感器。为此,GPS模块126可能包括收发器,基于卫星定位数据,估计车辆12相对于地球的位置。在示例中,计算装置111可用于结合地图数据116使用GPS模块126来估计车辆12可在其上行驶的道路上的车道边界的位置。GPS模块126也可采取其它形式。The GPS module 126 can be any sensor for estimating the geographic location of the vehicle 12. To this end, the GPS module 126 may include a transceiver that estimates the position of the vehicle 12 relative to the earth based on satellite positioning data. In an example, computing device 111 can be used in conjunction with map data 116 to use GPS module 126 to estimate the location of a lane boundary on a road on which vehicle 12 can travel. The GPS module 126 can take other forms as well.
IMU 128可以是用于基于惯性加速度及其任意组合来感测车辆12的位置和朝向变化。在一些示例中,传感器的组合可包括例如加速度计和陀螺仪。传感器的其它组合也是可能的。 IMU 128 may be used to sense changes in position and orientation of vehicle 12 based on inertial acceleration and any combination thereof. In some examples, the combination of sensors can include, for example, an accelerometer and a gyroscope. Other combinations of sensors are also possible.
RADAR单元130可以被看作物体检测系统,其用于使用无线电波来检测物体的特性,诸如物体的距离、高度、方向或速度。RADAR单元130可被配置为传送无线电波或微波脉冲,其可从波的路线中的任何物体反弹。物体可将波的一部分能量返回至接收器(例如,碟形天线或天线),该接收器也可以是RADAR单元130的一部分。RADAR单元130还可被配置为对接收到的信号(从物体反弹)执行数字信号处理,并且可被配置为识别物体。The RADAR unit 130 can be viewed as an object detection system for detecting the characteristics of an object using radio waves, such as the distance, height, direction or speed of the object. The RADAR unit 130 can be configured to transmit radio waves or microwave pulses that can bounce off any object in the course of the wave. The object may return a portion of the energy of the wave to a receiver (eg, a dish or antenna), which may also be part of the RADAR unit 130. The RADAR unit 130 can also be configured to perform digital signal processing on the received signal (bounce from the object) and can be configured to identify the object.
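As a rough illustration of what such an echo yields, and not as the implementation of the RADAR unit 130, the sketch below shows the two quantities a pulsed radar can derive from a return: range from the round-trip delay, and radial speed from the Doppler frequency shift. The numeric examples are assumptions for illustration only.

    C = 299_792_458.0  # speed of light, m/s

    def echo_range_m(round_trip_delay_s: float) -> float:
        # The pulse travels to the object and back, hence the factor of 2.
        return C * round_trip_delay_s / 2.0

    def radial_speed_ms(doppler_shift_hz: float, carrier_hz: float) -> float:
        # Non-relativistic Doppler approximation for a reflecting target.
        return C * doppler_shift_hz / (2.0 * carrier_hz)

    print(echo_range_m(1.0e-6))           # roughly 150 m for a 1 microsecond round trip
    print(radial_speed_ms(4000.0, 24e9))  # roughly 25 m/s at an assumed 24 GHz carrier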
其它类似于RADAR的系统已用在电磁波谱的其它部分上。一个示例是LIDAR(光检测和测距),其可使用来自激光的可见光,而非无线电波。Other systems similar to RADAR have been used on other parts of the electromagnetic spectrum. One example is LIDAR (Light Detection and Ranging), which can use visible light from a laser instead of radio waves.
LIDAR单元132包括传感器,该传感器使用光感测或检测车辆12所位于的环境中的物体。通常,LIDAR是可通过利用光照射目标来测量到目标的距离或目标的其它属性的光学遥感技术。作为示例,LIDAR单元132可包括被配置为发射激光脉冲的激光源和/或激光扫描仪,和用于为接收激光脉冲的反射的检测器。例如,LIDAR单元132可包括由转镜反射的激光测距仪,并且以一维或二维围绕数字化场景扫描激光,从而以指定角度间隔采集距离测量值。在示例中,LIDAR单元132可包括诸如光(例如,激光)源、扫描仪和光学系统、光检测器和接收器电子器件之类的组件,以及位置和导航系统。The LIDAR unit 132 includes a sensor that uses light to sense or detect objects in the environment in which the vehicle 12 is located. In general, LIDAR is an optical remote sensing technique that can measure the distance to a target or other attribute of a target by illuminating the target with light. As an example, LIDAR unit 132 can include a laser source and/or a laser scanner configured to emit laser pulses, and a detector for receiving reflections of the laser pulses. For example, the LIDAR unit 132 can include a laser range finder that is reflected by a rotating mirror and scans the laser around the digitized scene in one or two dimensions to acquire distance measurements at specified angular intervals. In an example, LIDAR unit 132 may include components such as light (eg, laser) sources, scanners and optical systems, photodetectors, and receiver electronics, as well as position and navigation systems.
在示例中,LIDAR单元132可被配置为使用紫外光(UV)、可见光或红外光对物体成像,并且可用于广泛的目标,包括非金属物体。在一个示例中,窄激光波束可用于以高分辨率对物体的物理特征进行地图绘制。In an example, the LIDAR unit 132 can be configured to image an object using ultraviolet (UV), visible, or infrared light, and can be used for a wide range of targets, including non-metallic objects. In one example, a narrow laser beam can be used to map physical features of an object with high resolution.
在示例中,从约10微米(红外)至约250纳米(UV)的范围中的波长可被使用。光通常经由后向散射被反射。不同类型的散射被用于不同的LIDAR应用,诸如瑞利散射、米氏散射和拉曼散射以及荧光。基于不同种类的后向散射,作为示例,LIDAR可因此被称为瑞利激光RADAR、米氏LIDAR、拉曼LIDAR以及钠/铁/钾荧光LIDAR。波长的适当组合可允许例如通过寻找反射信号的强度的依赖波长的变化对物体进行远程地图绘制。In an example, wavelengths in the range of from about 10 microns (infrared) to about 250 nanometers (UV) can be used. Light is typically reflected via backscattering. Different types of scattering are used for different LIDAR applications such as Rayleigh scattering, Mie scattering and Raman scattering, and fluorescence. Based on different kinds of backscattering, as an example, LIDAR can thus be referred to as Rayleigh laser RADAR, Mie LIDAR, Raman LIDAR, and sodium/iron/potassium fluorescent LIDAR. Appropriate combinations of wavelengths may allow remote mapping of objects, for example by looking for wavelength dependent changes in the intensity of the reflected signal.
使用扫描LIDAR系统和非扫描LIDAR系统两者可实现三维(3D)成像。“3D选通观测激光RADAR(3D gated viewing laser radar)”是非扫描激光测距系统的示例,其应用脉冲激光和快速选通相机。成像LIDAR也可使用通常使用互补金属氧化物半导体(complementary metal oxide semiconductor,CMOS)和混合互补金属氧化物半导体/电荷耦合器件(charge coupled device,CCD)制造技术在单个芯片上构建的高速检测器阵列和调制敏感检测器阵列来执行。在这些装置中,每个像素可通过以高速解调或选通来被局部地处理,以使得阵列可被处理成表示来自相机的图像。使用此技术,可同时获取上千个像素以创建表示LIDAR单元132检测到的物体或场景的3D点云。Three-dimensional (3D) imaging can be achieved using both a scanned LIDAR system and a non-scanning LIDAR system. "3D gated viewing laser radar" is an example of a non-scanning laser ranging system that uses a pulsed laser and a fast gating camera. Imaging LIDAR can also use high speed detector arrays that are typically built on a single chip using complementary metal oxide semiconductor (CMOS) and hybrid complementary metal oxide semiconductor/charge coupled device (CCD) fabrication techniques. And modulating the sensitive detector array to perform. In these devices, each pixel can be locally processed by high speed demodulation or gating such that the array can be processed to represent an image from the camera. Using this technique, thousands of pixels can be acquired simultaneously to create a 3D point cloud representing the object or scene detected by the LIDAR unit 132.
点云可包括3D坐标系统中的一组顶点。这些顶点例如可由X、Y、Z坐标定义,并且可表示物体的外表面。LIDAR单元132可被配置为通过测量物体的表面上的大量点来创建点云,并可将点云作为数据文件输出。作为通过LIDAR单元132的对物体的3D扫描过程的结果,点云可用于识别并可视化物体。A point cloud can include a set of vertices in a 3D coordinate system. These vertices may be defined, for example, by X, Y, Z coordinates and may represent the outer surface of the object. The LIDAR unit 132 can be configured to create a point cloud by measuring a large number of points on the surface of the object, and can output the point cloud as a data file. As a result of the 3D scanning process of the object through the LIDAR unit 132, the point cloud can be used to identify and visualize the object.
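For concreteness, the following is a minimal sketch, under assumptions stated in the comments, of how range measurements from a scanning sensor can be converted into the X, Y, Z vertices of such a point cloud; the data format and function name are illustrative only and are not specified by this application.

    import math

    def returns_to_point_cloud(returns):
        # 'returns' is assumed to be an iterable of (range_m, azimuth_rad, elevation_rad)
        # tuples measured in the sensor frame; the output is a list of (x, y, z) vertices
        # of the kind that make up the point cloud described above.
        cloud = []
        for r, az, el in returns:
            x = r * math.cos(el) * math.cos(az)
            y = r * math.cos(el) * math.sin(az)
            z = r * math.sin(el)
            cloud.append((x, y, z))
        return cloud

    # Example: three hypothetical returns from one scan line.
    print(returns_to_point_cloud([(10.0, 0.0, 0.0), (10.0, 0.1, 0.0), (9.8, 0.2, 0.05)]))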
在一个示例中,点云可被直接渲染以可视化物体。在另一示例中,点云可通过可被称为曲面重建的过程被转换为多边形或三角形网格模型。用于将点云转换为3D曲面的示例技术可包括德洛内三角剖分、阿尔法形状和旋转球。这些技术包括在点云的现有顶点上构建三角形的网络。其它示例技术可包括将点云转换为体积距离场,以及通过移动立方体算法重建这样定义的隐式曲面。In one example, the point cloud can be rendered directly to visualize the object. In another example, a point cloud may be converted to a polygonal or triangular mesh model by a process that may be referred to as surface reconstruction. Example techniques for converting a point cloud to a 3D surface may include a Delaunay triangulation, an alpha shape, and a rotating sphere. These techniques include building a network of triangles on existing vertices of a point cloud. Other example techniques may include converting a point cloud to a volumetric distance field, and reconstructing such an implicit surface as defined by a moving cube algorithm.
The camera 134 can be any camera (for example, a still camera, a video camera, and so on) used to capture images of the environment in which the vehicle 12 is located. To this end, the camera can be configured to detect visible light, or it can be configured to detect light from other portions of the spectrum (such as infrared or ultraviolet light). Other types of cameras are also possible. The camera 134 can be a two-dimensional detector, or it can have a three-dimensional spatial range. In some examples, the camera 134 can be, for example, a range detector configured to generate a two-dimensional image indicating the distance from the camera 134 to a number of points in the environment. To this end, the camera 134 can use one or more range detection techniques. For example, the camera 134 can be configured to use a structured light technique, in which the vehicle 12 illuminates an object in the environment with a predetermined light pattern, such as a grid or checkerboard pattern, and uses the camera 134 to detect the reflection of the predetermined light pattern from the object. Based on the distortion in the reflected light pattern, the vehicle 12 can be configured to detect the distance to points on the object. The predetermined light pattern can include infrared light or light of another wavelength.
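One common way such pattern distortion is turned into a distance is triangulation, in which the apparent shift of a projected feature is inversely proportional to depth. The sketch below is an illustration under assumed calibration values (focal length and projector-camera baseline); these numbers are not parameters of the camera 134.

    def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
        # Classic triangulation: depth = f * B / d. A larger shift (disparity) of the
        # projected pattern or image feature means the surface is closer to the camera.
        if disparity_px <= 0:
            raise ValueError("disparity must be positive")
        return focal_px * baseline_m / disparity_px

    # Example with an assumed 700-pixel focal length and a 10 cm baseline.
    for d in (70.0, 35.0, 7.0):
        print(d, "px ->", depth_from_disparity(700.0, 0.10, d), "m")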
致动器136例如可被配置为修改传感器的位置和/或朝向。传感器系统104可额外地或可替换地包括除了所示出的那些以外的组件。Actuator 136 can be configured, for example, to modify the position and/or orientation of the sensor. Sensor system 104 may additionally or alternatively include components in addition to those shown.
控制系统106可被配置为控制车辆12及其组件的操作。为此,控制系统106可包括转向单元138、油门140、制动单元142、传感器融合算法144、计算机视觉系统146、导航或路线控制(pathing)系统148以及避障系统150。 Control system 106 can be configured to control the operation of vehicle 12 and its components. To this end, control system 106 can include steering unit 138, throttle 140, braking unit 142, sensor fusion algorithm 144, computer vision system 146, navigation or routing system 148, and obstacle avoidance system 150.
转向单元138可以是被配置为调整车辆12的前进方向或方向的机构的任意组合。 Steering unit 138 may be any combination of mechanisms configured to adjust the direction or direction of advancement of vehicle 12.
油门140可以是被配置为控制引擎/发动机118的操作速度和加速度并进而控制车辆12的速度和加速度的机构的任意组合。The throttle 140 may be any combination of mechanisms configured to control the operating speed and acceleration of the engine/engine 118 and thereby control the speed and acceleration of the vehicle 12.
制动单元142可以是被配置为使车辆12减速的机构的任意组合。例如,制动单元142可使用摩擦来减慢车轮/轮胎124。作为另一示例,制动单元142可被配置为再生的(regenerative)并且将车轮/轮胎124的动能转换为电流。制动单元142也可采取其它形式。 Brake unit 142 may be any combination of mechanisms configured to decelerate vehicle 12. For example, the brake unit 142 can use friction to slow the wheel/tire 124. As another example, the braking unit 142 can be configured to regeneratively convert the kinetic energy of the wheel/tire 124 into a current. Brake unit 142 can take other forms as well.
传感器融合算法144可以包括例如计算装置111可运行的算法(或者存储算法的计算机程序产品)。传感器融合算法144可被配置为接受来自传感器104的数据作为输入。所述数据可包括例如表示在传感器系统104的传感器处感测到的信息的数据。传感器融合算法144可包括例如卡尔曼滤波器、贝叶斯网络或者另外的算法。传感器融合算法144还可被配置为基于来自传感器系统104的数据来提供各种评价,包括例如对车辆12所位于的环境中的个体物体和/或特征的评估、对具体情形的评估和/或基于特定情形的可能影响的评估。其它评价也是可能的。 Sensor fusion algorithm 144 may include, for example, an algorithm (or a computer program product that stores the algorithm) that computing device 111 may operate. Sensor fusion algorithm 144 can be configured to accept data from sensor 104 as an input. The data may include, for example, data representing information sensed at the sensors of sensor system 104. Sensor fusion algorithm 144 may include, for example, a Kalman filter, a Bayesian network, or another algorithm. The sensor fusion algorithm 144 may also be configured to provide various ratings based on data from the sensor system 104, including, for example, an assessment of individual objects and/or features in the environment in which the vehicle 12 is located, an assessment of a particular situation, and/or An assessment based on the likely impact of a particular situation. Other evaluations are also possible.
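For concreteness, the following is a minimal one-dimensional Kalman filter of the general kind the sensor fusion algorithm 144 could include. The noise values, the constant-state model, and the choice of a scalar example are assumptions for illustration, not the disclosed algorithm.

    def kalman_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
        # q: process-noise variance, r: measurement-noise variance,
        # x0 / p0: initial state estimate and its variance.
        x, p = x0, p0
        estimates = []
        for z in measurements:
            # Predict (constant-state model): the estimate keeps its value, uncertainty grows.
            p = p + q
            # Update: blend prediction and measurement according to the Kalman gain.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
            estimates.append(x)
        return estimates

    # Example: noisy observations of a quantity whose true value is about 5.
    print(kalman_1d([5.2, 4.7, 5.1, 4.9, 5.3]))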
计算机视觉系统146可以是被配置为处理和分析由相机134捕捉的图像以便识别车辆12所位于的环境中的物体和/或特征的任何系统,所述物体和/或特征包括例如车道信息、交通信号和障碍物。为此,计算机视觉系统146可使用物体识别算法、从运动中恢复结构(structure from motion,SFM)算法、视频跟踪或其它计算机视觉技术。在一些示例中,计算机视觉系统146可以额外地被配置为地图绘制环境、跟随物体、估计物体的速度,等等。 Computer vision system 146 may be any system configured to process and analyze images captured by camera 134 to identify objects and/or features in the environment in which vehicle 12 is located, such as lane information, traffic, for example Signals and obstacles. To this end, computer vision system 146 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, or other computer vision techniques. In some examples, computer vision system 146 may additionally be configured as a mapping environment, following an object, estimating the speed of an object, and the like.
导航和路线控制系统148可以是被配置为确定车辆12的驾驶路线的任何系统。导航和路线控制系统148可以额外地被配置为在车辆12处于操作中的同时动态地更新驾驶路线。在一些示例中,导航和路线控制系统148可被配置为结合来自传感器融合算法144、GPS模块126和一个或多个预定地图的数据以便为车辆12确定驾驶路线。Navigation and route control system 148 may be any system configured to determine the driving route of vehicle 12. The navigation and route control system 148 can additionally be configured to dynamically update the driving route while the vehicle 12 is in operation. In some examples, navigation and route control system 148 can be configured to combine data from sensor fusion algorithm 144, GPS module 126, and one or more predetermined maps to determine a driving route for vehicle 12.
避障系统150可以是被配置为识别、评估和避免或者以其它方式越过车辆12所位于的环境中的障碍物的任何系统。The obstacle avoidance system 150 can be any system configured to identify, evaluate, and avoid or otherwise cross obstacles in the environment in which the vehicle 12 is located.
控制系统106可以额外地或可替换地包括除了所示出的那些以外的组件。 Control system 106 may additionally or alternatively include components in addition to those shown.
外围设备108可被配置为允许车辆12与外部传感器、其它车辆和/或用户交互。为此,外围设备108可包括例如无线通信系统152、触摸屏154、麦克风156和/或扬声器158。Peripheral device 108 can be configured to allow vehicle 12 to interact with external sensors, other vehicles, and/or users. To this end, peripheral device 108 can include, for example, wireless communication system 152, touch screen 154, microphone 156, and/or speaker 158.
无线通信系统152可以是被配置为直接地或经由通信网络无线耦合至一个或多个其它车辆、传感器或其它实体的任何系统。为此,无线通信系统152可包括用于直接或通过空中接口与其它车辆、传感器或其它实体通信的天线和芯片集。芯片集或整个无 线通信系统152可被布置为根据一个或多个其它类型的无线通信(例如,协议)来通信,所述无线通信诸如蓝牙、IEEE 802.11(包括任何IEEE 802.11修订版)中描述的通信协议、蜂窝技术(诸如GSM、CDMA、通用移动通信系统(universal mobile telecommunications system,UMTS)、EV-DO、WiMAX或长期演进(long term evolution,LTE))、紫蜂、专用短程通信(dedicated short range communications,DSRC)以及射频识别(radio frequency identification,RFID)通信,等等。无线通信系统152也可采取其它形式。Wireless communication system 152 can be any system configured to be wirelessly coupled to one or more other vehicles, sensors, or other entities, either directly or via a communication network. To this end, the wireless communication system 152 can include an antenna and chipset for communicating with other vehicles, sensors, or other entities, either directly or through an air interface. The chipset or the entire wireless communication system 152 can be arranged to communicate in accordance with one or more other types of wireless communications (e.g., protocols) such as those described in Bluetooth, IEEE 802.11 (including any IEEE 802.11 revision). Communication protocol, cellular technology (such as GSM, CDMA, universal mobile telecommunications system (UMTS), EV-DO, WiMAX or long term evolution (LTE)), ZigBee, dedicated short-range communication (dedic short) Range communications, DSRC) and radio frequency identification (RFID) communications, and the like. Wireless communication system 152 can take other forms as well.
触摸屏154可被用户用来向车辆12输入命令。为此,触摸屏154可被配置为经由电容感测、电阻感测或者表面声波过程等等来感测用户的手指的位置和移动中的至少一者。触摸屏154可能够感测在与触摸屏表面平行或与触摸屏表面在同一平面内的方向上、在与触摸屏表面垂直的方向上或者在这两个方向上的手指移动,并且还可能够感测施加到触摸屏表面的压力的水平。触摸屏154可由一个或多个半透明或透明绝缘层和一个或多个半透明或透明导电层形成。触摸屏154也可采取其它形式。Touch screen 154 can be used by a user to enter commands into vehicle 12. To this end, the touch screen 154 can be configured to sense at least one of a position and a movement of a user's finger via a capacitive sensing, a resistive sensing, or a surface acoustic wave process or the like. The touch screen 154 may be capable of sensing finger movement in a direction parallel to the touch screen surface or in the same plane as the touch screen surface, in a direction perpendicular to the touch screen surface, or in both directions, and may also be capable of sensing application to The level of pressure on the surface of the touch screen. Touch screen 154 may be formed from one or more translucent or transparent insulating layers and one or more translucent or transparent conductive layers. Touch screen 154 can take other forms as well.
麦克风156可被配置为从车辆12的用户接收音频(例如,声音命令或其它音频输入)。类似地,扬声器158可被配置为向车辆12的用户输出音频。Microphone 156 can be configured to receive audio (eg, a voice command or other audio input) from a user of vehicle 12. Similarly, the speaker 158 can be configured to output audio to a user of the vehicle 12.
外围设备108可以额外地或可替换地包括除了所示出的那些以外的组件。Peripheral device 108 may additionally or alternatively include components in addition to those shown.
电源110可被配置为向车辆12的一些或全部组件提供电力。为此,电源110可包括例如可再充电锂离子或铅酸电池。在一些示例中,一个或多个电池组可被配置为提供电力。其它电源材料和配置也是可能的。在一些示例中,电源110和能量源120可一起实现,如一些全电动车中那样。The power source 110 can be configured to provide power to some or all of the components of the vehicle 12. To this end, the power source 110 can include, for example, a rechargeable lithium ion or lead acid battery. In some examples, one or more battery packs can be configured to provide power. Other power materials and configurations are also possible. In some examples, power source 110 and energy source 120 can be implemented together, as in some all-electric vehicles.
包括在计算装置111中的处理器113可包括一个或多个通用处理器和/或一个或多个专用处理器(例如,图像处理器、数字信号处理器等)。就处理器113包括多于一个处理器而言,这种处理器可单独工作或组合工作。计算装置111可实现基于通过用户接口112接收的输入控制车辆12的功能。 Processor 113 included in computing device 111 may include one or more general purpose processors and/or one or more special purpose processors (eg, image processors, digital signal processors, etc.). Insofar as the processor 113 includes more than one processor, such processors can work individually or in combination. Computing device 111 may implement the function of controlling vehicle 12 based on input received through user interface 112.
存储器114进而可包括一个或多个易失性存储组件和/或一个或多个非易失性存储组件,诸如光、磁和/或有机存储装置,并且存储器114可全部或部分与处理器113集成。存储器114可包含可由处理器113运行的指令115(例如,程序逻辑),以运行各种车辆功能,包括本文中描述的功能或方法中的任何一个。The memory 114, in turn, can include one or more volatile storage components and/or one or more non-volatile storage components, such as optical, magnetic, and/or organic storage devices, and the memory 114 can be fully or partially coupled to the processor 113. integrated. Memory 114 may include instructions 115 (eg, program logic) executable by processor 113 to perform various vehicle functions, including any of the functions or methods described herein.
车辆12的组件可被配置为以与在其各自的系统内部和/或外部的其它组件互连的方式工作。为此,车辆12的组件和系统可通过系统总线、网络和/或其它连接机制通信地链接在一起。The components of the vehicle 12 can be configured to operate in a manner interconnected with other components internal and/or external to their respective systems. To this end, the components and systems of the vehicle 12 can be communicatively linked together via a system bus, network, and/or other connection mechanism.
FIG. 3 is a schematic structural diagram of an in-vehicle device according to an embodiment of the present invention. The terminal 300 (taking an in-vehicle device as an example) includes components such as a processor 301, a memory 302, a camera 303, an RF circuit 304, an audio circuit 305, a speaker 306, a microphone 307, an input device 308, other input devices 309, a display screen 310, a touch panel 311, a display panel 312, an output device 313, and a power supply 314. The display screen 310 is composed of at least the touch panel 311 serving as an input device and the display panel 312 serving as an output device. It should be noted that the terminal structure shown in FIG. 3 does not constitute a limitation on the terminal, which may include more or fewer components than shown, or combine certain components, or split certain components, or use a different arrangement of components; this is not limited herein.
下面结合图3对终端300的各个构成部件进行具体的介绍:The components of the terminal 300 will be specifically described below with reference to FIG. 3:
射频(radio frequency,RF)电路304可用于收发信息或通话过程中,信号的接收 和发送,比如,若该终端300为车载设备,那么该终端300可以通过RF电路304,将基站发送的下行信息接收后,传送给处理器301处理;另外,将涉及上行的数据发送给基站。通常,RF电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器(low noise amplifier,LNA)、双工器等。此外,RF电路304还可以通过无线通信与网络和其他设备通信。该无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统(global system for mobile communication,GSM)、通用分组无线服务(general packet radio service,GPRS)、码分多址(code division multiple access,CDMA)、宽带码分多址(wideband code division multiple access,WCDMA)、长期演进(long term evolution,LTE)、电子邮件、短消息服务(short messaging service,SMS)等。The radio frequency (RF) circuit 304 can be used for transmitting and receiving information or during the call, and receiving and transmitting the signal. For example, if the terminal 300 is an in-vehicle device, the terminal 300 can send the downlink information sent by the base station through the RF circuit 304. After receiving, it is transmitted to the processor 301 for processing; in addition, data related to the uplink is transmitted to the base station. Generally, RF circuits include, but are not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, RF circuitry 304 can also communicate with the network and other devices via wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to global system for mobile communication (GSM), general packet radio service (GPRS), code division multiple access (code division) Multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), e-mail, short messaging service (SMS), and the like.
存储器302可用于存储软件程序以及模块,处理器301通过运行存储在存储器302的软件程序以及模块,从而执行终端300的各种功能应用以及数据处理。存储器302可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如,声音播放功能、图像播放功能等)等;存储数据区可存储根据终端300的使用所创建的数据(比如,音频数据、视频数据等)等。此外,存储器302可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。The memory 302 can be used to store software programs and modules, and the processor 301 executes various functional applications and data processing of the terminal 300 by running software programs and modules stored in the memory 302. The memory 302 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (for example, a sound playing function, an image playing function, etc.), and the like; the storage data area may be stored. Data (such as audio data, video data, etc.) created according to the use of the terminal 300, and the like. Moreover, memory 302 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
其他输入设备309可用于接收输入的数字或字符信息,以及产生与终端300的用户设置以及功能控制有关的键信号输入。具体地,其他输入设备309可包括但不限于物理键盘、功能键(比如,音量控制按键、开关按键等)、轨迹球、鼠标、操作杆、光鼠(光鼠是不显示可视输出的触摸敏感表面,或者是由触摸屏形成的触摸敏感表面的延伸)等中的一种或多种。其他输入设备309还可以包括终端300内置的传感器,比如,重力传感器、加速度传感器等,终端300还可以将传感器所检测到的参数作为输入数据。Other input devices 309 can be used to receive input numeric or character information, as well as to generate key signal inputs related to user settings and function control of terminal 300. Specifically, other input devices 309 may include, but are not limited to, a physical keyboard, function keys (eg, volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, light rats (light mice are touches that do not display visual output) One or more of a sensitive surface, or an extension of a touch sensitive surface formed by a touch screen. The other input device 309 may further include a sensor built in the terminal 300, such as a gravity sensor, an acceleration sensor, etc., and the terminal 300 may also use the parameter detected by the sensor as input data.
显示屏310可用于显示由用户输入的信息或提供给用户的信息以及终端300的各种菜单,还可以接受用户输入。此外,显示面板312可以采用液晶显示器(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)等形式来配置显示面板312;触控面板311,也称为触摸屏、触敏屏等,可收集用户在其上或附近的接触或者非接触操作(比如,用户使用手指、触笔等任何适合的物体或附件在触控面板311上或在触控面板311附近的操作,也可以包括体感操作;该操作包括单点控制操作、多点控制操作等操作类型),并根据预先设定的程式驱动相应的连接装置。需要说明的是,触控面板311还可以包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位、姿势,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成处理器301能够处理的信息,再传送给处理器301,并且,还能接收处理器301发来的命令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型实现触控面板311,也可以采用未来发展的任何技术实现触控面板311。一般情况下,触控面板311可覆盖显示面板312,用户可以根据显示面板312显示的内容(该显示内容包括但不限于软键盘、虚拟鼠标、虚拟按键、图标等),在显示面板312上覆盖的触控面板 311上或者附近进行操作,触控面板111检测到在其上或附近的操作后,传送给处理器301以确定用户输入,随后处理器301根据用户输入,在显示面板312上提供相应的视觉输出。虽然在图3中,触控面板311与显示面板312是作为两个独立的部件来实现终端300的输入和输出功能,但是在某些实施例中,可以将触控面板311与显示面板312集成,以实现终端300的输入和输出功能。The display screen 310 can be used to display information input by the user or information provided to the user as well as various menus of the terminal 300, and can also accept user input. In addition, the display panel 312 can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. The touch panel 311 is also called a touch screen or a touch sensitive screen. Etc., it is possible to collect contact or non-contact operations on or near the user (for example, the user can use any suitable object or accessory such as a finger or a stylus on the touch panel 311 or in the vicinity of the touch panel 311, or Including the somatosensory operation; the operation includes a single point control operation, a multi-point control operation and the like, and drives the corresponding connection device according to a preset program. It should be noted that the touch panel 311 may further include two parts: a touch detection device and a touch controller. Wherein, the touch detection device detects the touch orientation and posture of the user, and detects a signal brought by the touch operation, and transmits a signal to the touch controller; the touch controller receives the touch information from the touch detection device, and converts it into the processor 301. The information that can be processed is transmitted to the processor 301, and the commands sent from the processor 301 can also be received and executed. In addition, the touch panel 311 can be implemented by using various types such as resistive, capacitive, infrared, and surface acoustic waves, and the touch panel 311 can be implemented by any technology developed in the future. In general, the touch panel 311 can cover the display panel 312, and the user can cover the display panel 312 according to the content displayed by the display panel 312 (including but not limited to a soft keyboard, a virtual mouse, a virtual button, an icon, etc.). The operation is performed on or near the touch panel 311. After the touch panel 111 detects the operation thereon or nearby, the touch panel 111 transmits to the processor 301 to determine the user input, and then the processor 301 provides the display panel 312 according to the user input. Corresponding visual output. Although in FIG. 3, the touch panel 311 and the display panel 312 are used as two independent components to implement the input and output functions of the terminal 300, in some embodiments, the touch panel 311 can be integrated with the display panel 312. To implement the input and output functions of the terminal 300.
RF电路304、扬声器306,话筒307可提供用户与终端300之间的音频接口。音频电路305可将接收到的音频数据转换后的信号,传输到扬声器306,由扬声器306转换为声音信号输出;另一方面,话筒307可以将收集的声音信号转换为信号,由音频电路305接收后转换为音频数据,再将音频数据输出至RF电路304以发送给诸如另一终端的设备,或者将音频数据输出至存储器302,以便处理器301结合存储器302中存储的内容进行进一步的处理。另外,摄像头303可以实时采集图像帧,并传送给处理器301处理,并将处理后的结果存储至存储器302和/或将处理后的结果通过显示面板312呈现给用户。The RF circuit 304, the speaker 306, and the microphone 307 can provide an audio interface between the user and the terminal 300. The audio circuit 305 can transmit the converted audio data to the speaker 306 and convert it into a sound signal output by the speaker 306. On the other hand, the microphone 307 can convert the collected sound signal into a signal, which is received by the audio circuit 305. It is then converted to audio data, which is then output to RF circuitry 304 for transmission to a device such as another terminal, or audio data is output to memory 302 for processor 301 to perform further processing in conjunction with the content stored in memory 302. In addition, the camera 303 can acquire image frames in real time and transmit them to the processor 301 for processing, and store the processed results to the memory 302 and/or present the processed results to the user via the display panel 312.
处理器301是终端300的控制中心,利用各种接口和线路连接整个终端300的各个部分,通过运行或执行存储在存储器302内的软件程序和/或模块,以及调用存储在存储器302内的数据,执行终端300的各种功能和处理数据,从而对终端300进行整体监控。需要说明的是,处理器301可以包括一个或多个处理单元;处理器301还可以集成应用处理器和调制解调处理器,其中,应用处理器主要处理操作系统、用户界面(user interface,UI)和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器301中。Processor 301 is the control center of terminal 300, which connects various portions of the entire terminal 300 using various interfaces and lines, by running or executing software programs and/or modules stored in memory 302, and recalling data stored in memory 302. The various functions and processing data of the terminal 300 are executed to perform overall monitoring of the terminal 300. It should be noted that the processor 301 may include one or more processing units; the processor 301 may further integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface (UI) And the application, etc., the modem processor mainly handles wireless communication. It can be understood that the above modem processor may not be integrated into the processor 301.
终端300还可以包括给各个部件供电的电源314(比如,电池),在本发明实施例中,电源314可以通过电源管理系统与处理器301逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗等功能。The terminal 300 may further include a power source 314 (for example, a battery) for supplying power to the respective components. In the embodiment of the present invention, the power source 314 may be logically connected to the processor 301 through the power management system, thereby managing charging, discharging, and the like through the power management system. And power consumption and other functions.
此外,图3中还存在未示出的部件,比如,终端300还可以包括蓝牙模块、传感器等,在此不再赘述。In addition, there are components not shown in FIG. 3, for example, the terminal 300 may further include a Bluetooth module, a sensor, and the like, and details are not described herein again.
如图4所示,音频电路305具体可以包括数字信号处理(digital signal processing,DSP)和编解码器401模块,其中编解码器子模块实现模拟到数字/数字到模拟(AD/DA)的转换,DSP子模块实现语音算法的处理。As shown in FIG. 4, the audio circuit 305 may specifically include a digital signal processing (DSP) and a codec 401 module, wherein the codec sub-module implements analog to digital/digital to analog (AD/DA) conversion. The DSP sub-module implements the processing of the speech algorithm.
The following description takes the terminal being an in-vehicle device as an example. As shown in FIG. 5, the power-on procedure of the in-vehicle device is as follows:
S401: The in-vehicle device is powered on and starts to collect surrounding sound. In this step, the in-vehicle device collects the surrounding sound through the microphone 307 in FIG. 3.
S402: The in-vehicle device determines the number of people in the vehicle by means of voiceprint recognition. Voiceprint recognition is a type of biometric technology, also known as speaker recognition, and has two categories, namely speaker identification and speaker verification; what is involved herein is speaker verification. Voiceprint recognition converts an acoustic signal into an electrical signal, which is then recognized by a computer. Specifically, herein, voiceprint recognition can be used to identify how many people are in the vehicle by determining how many different voiceprints are present in the vehicle. A voiceprint is a spectrum of sound waves carrying speech information, displayed by an electro-acoustic instrument. The production of human speech is a complex physiological and physical process between the human speech centre and the vocal organs. The vocal organs a person uses when speaking, the tongue, teeth, larynx, lungs and nasal cavity, differ greatly from person to person in size and shape, so the voiceprint spectrograms of any two people differ. Each person's acoustic speech characteristics are relatively stable yet also variable; they are not absolute or invariant. Such variation may come from physiology, pathology, psychology, imitation or disguise, and is also related to environmental interference. Nevertheless, because each person's vocal organs are different, people can generally still distinguish the voices of different persons or judge whether two voices come from the same person. Voiceprint is one way of identification; the number of people in the vehicle can also be identified by one or more of, for example, iris information, facial information, fingerprint information, infrared sensing, sensor data, and voice inquiry.
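Before turning to those alternative ways, the following is only a rough sketch of how "counting different voiceprints" could be carried out. It assumes a hypothetical embed() function that maps a speech segment to a fixed-length voiceprint vector (no such function is specified in this application) and counts speakers by greedy cosine-similarity matching; the 0.75 threshold is likewise an assumption.

    import math

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def count_speakers(segments, embed, threshold=0.75):
        # 'segments' are speech snippets picked up by the microphone; 'embed' is a
        # hypothetical voiceprint extractor returning one vector per snippet.
        known = []  # one representative voiceprint per distinct speaker seen so far
        for seg in segments:
            v = embed(seg)
            if not any(cosine(v, k) >= threshold for k in known):
                known.append(v)  # dissimilar to all known voiceprints: a new speaker
        return len(known)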
For example, an image for iris recognition can be obtained by a camera on the in-vehicle device, and the in-vehicle device determines whether there are multiple people near it by checking whether different irises are present.
As another example, an image for facial recognition can be obtained by a camera on the in-vehicle device, and the in-vehicle device determines whether there are multiple people near it by checking whether different facial features are present.
As another example, a fingerprint recognition apparatus can be arranged on a door handle of the vehicle to detect whether different fingerprints are present, and whether there are multiple people in the vehicle is determined from whether different fingerprints are found.
As another example, the number of people in the vehicle can be determined by an infrared sensing apparatus on the in-vehicle device or in the vehicle.
As another example, the number of people in the vehicle can be determined from sensor data. Specifically, the number of people can be detected by the pressure sensors in the vehicle seats.
As another example, the number of people in the vehicle can be determined by voice inquiry. Specifically, the in-vehicle device can ask how many people are in the vehicle and judge the number from the answer given by a passenger or the driver.
The foregoing methods for identifying the number of people in the vehicle can be combined in various ways.
For example, even if voiceprint recognition has identified only one person, when another method identifies multiple people in the vehicle, the multi-person case is used. One possible combination rule is sketched below.
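By way of illustration only, the following minimal sketch shows one way such a combination could be expressed; the detector names and the max-based fusion rule are assumptions made for the example rather than a prescribed implementation.

```python
# Illustrative sketch: fuse person-count estimates from several detection methods.
# Each detector reports its best estimate of how many people it observed,
# or None if it produced no usable reading.

def fuse_person_count(estimates):
    """Return the fused number of people from per-detector estimates.

    Following the rule described above, if any method reports more than one
    person, the multi-person result wins even when voiceprint recognition
    has only heard a single speaker so far.
    """
    valid = [n for n in estimates if n is not None and n > 0]
    if not valid:
        return 0           # no usable information yet
    return max(valid)      # the largest count any single method produced


# Example with assumed detector outputs:
estimates = {
    "voiceprint": 1,        # only one distinct voiceprint heard so far
    "seat_pressure": 2,     # two occupied seats detected
    "camera_faces": None,   # no usable camera frame
}
print(fuse_person_count(estimates.values()))   # -> 2, handled as the multi-person case
```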
S403: If the in-vehicle device determines that there is only one person in the vehicle (that is, only the driver), go to step S404; otherwise, go to step S405.
S404: The in-vehicle device enters the wake-free interaction mode. The wake-free interaction mode here is an interaction mode that does not require a wake-up word. For example, the driver can say "navigate home" to the in-vehicle device, and the device, in response to this wake-free command, performs the operation of navigating home.
S405: The in-vehicle device enters the wake-up question-and-answer interaction mode. The wake-up question-and-answer interaction mode here is human-machine interaction that requires a wake-up word. For example, the driver or a passenger first wakes up the device by saying "Hello, Xiaochi". In response to this wake-up command, the device replies "What can I help you with?". The driver or passenger can then say "Please navigate home", and the device performs the operation of navigating home in response to that instruction. As another example, the driver or a passenger can directly speak a sentence that contains the wake-up word, such as "Xiaochi, I want to go to the airport", where "Xiaochi" is the wake-up word; in response to this command, the device performs the operation of navigating to the airport.
In this embodiment of the present invention, the in-vehicle device decides whether to enable the wake-free mode according to the number of people in the vehicle, which reduces the number of times users in the vehicle must address the device with a wake-up word and thereby improves the user experience. A minimal sketch of this mode selection follows.
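As a minimal sketch of the decision in S403 to S405, the following assumes a fused person count is already available; the wake-up word, the mode names, and the process_command callback are illustrative assumptions rather than a prescribed implementation.

```python
# Illustrative sketch of S403-S405: choose the interaction mode from the
# number of people detected in the vehicle and dispatch one utterance.

WAKE_WORD = "xiaochi"   # assumed wake-up word for this example

def choose_mode(num_people):
    """Wake-free interaction for a single occupant, wake-word otherwise."""
    return "wake_free" if num_people == 1 else "wake_word"

def handle_utterance(utterance, num_people, process_command):
    """Dispatch one utterance according to the current interaction mode."""
    text = utterance.strip().lower()
    if choose_mode(num_people) == "wake_free":
        # S404: no wake-up word is required, every utterance is a command.
        return process_command(text)
    # S405: only react when the utterance contains the wake-up word.
    if WAKE_WORD in text:
        command = text.replace(WAKE_WORD, "", 1).strip(" ,")
        return process_command(command) if command else "What can I help you with?"
    return None   # speech not addressed to the device is ignored


# Example: a single occupant, so "navigate home" is executed directly.
print(handle_utterance("Navigate home", 1, lambda c: f"executing: {c}"))
```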
The in-vehicle device may determine the number of people in the vehicle in the following manner; for details, refer to the method flowchart shown in FIG. 6.
S501: User A speaks the first piece of voice information. The first voice of user A is collected, and the voiceprint feature of user A is marked. The first piece of voice information is the first voice message received after the voice device has just enabled voice recognition.
S502: The in-vehicle device determines whether the first piece of voice information is a device-specific instruction, that is, whether it is an instruction strongly related to the device's functions. Specifically, some specific instructions may be pre-stored in a local or cloud database of the in-vehicle device. When the device receives a piece of voice, it can check whether the voice exactly matches a pre-stored instruction in the database or whether the matching degree exceeds a certain threshold; if there is an exact match or the matching degree is high, the voice is a strongly device-related instruction (predefined in advance), that is, a specific instruction. For example, the device checks whether the first piece of voice information contains both the instruction words "immediately" and "navigate". If both are present, the device determines that the first piece of voice information is a device-specific instruction; if only the word "navigate" is present, the device determines that it is not. If the first piece of voice information is a device-specific instruction, S503 is performed; otherwise, S504 is performed. For instance, "immediately navigate to such-and-such place" is strongly device-related and is most likely addressed to the device, whereas "did you sleep well last night" is not strongly related and is more likely addressed to another person.
S503: When the in-vehicle device determines that the first piece of voice information is a device-specific instruction, the device gives the corresponding voice response. S502 and S503 are optional.
S504: If the in-vehicle device determines that the first piece of voice information is not a device-specific instruction, the device displays the instruction content on its display screen. This step is optional; when this embodiment does not include S504 and the device determines that the first piece of voice information is not a specific instruction, S505 is performed.
If S502 and S503 are not performed, S504 is not performed either.
If S502 and S503 are performed, S504 may or may not be performed.
S505: The in-vehicle device delays for X seconds. Specifically, X may be 3 seconds, 4 seconds, or 5 seconds. The delay here may consist of determining whether an answer from another user in the vehicle is received within a first time period after the first piece of voice information is received, where the first piece of voice information and the answer of the other user carry different voiceprint information.
S506: Using voiceprint technology, the in-vehicle device determines whether another user in the vehicle has already answered. If no other user has answered, go to S507; otherwise, go to S508.
S507: If no other user in the vehicle has answered, the in-vehicle device returns a voice response to user A and records that there is only one person in the vehicle. The voice response fed back to user A may be, for example, "OK, the XX operation will be performed for you".
S508: If an answer from a user other than user A is present in the vehicle, the in-vehicle device abandons the voice response and records that there are multiple people in the vehicle. That is, if a voice answer from a second user B is collected through voiceprint recognition during the delayed response period, this voice response is abandoned and the vehicle is marked as containing multiple people.
In this embodiment of the present invention, the in-vehicle device determines how many people are in the vehicle and then records that number, so that the device can decide whether a wake-up word is required according to the number of people present, which improves the efficiency of interaction between the device and its users. A minimal sketch of this flow follows.
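As an illustration of S501 to S508 only, the following sketch assumes a voiceprint embedding is available for each utterance and that speakers can be separated by a cosine-similarity threshold; the keyword set, the 0.75 threshold, and the listen_during helper are assumptions introduced for the example.

```python
# Illustrative sketch of S501-S508: after a first utterance, wait up to X
# seconds and use voiceprints to decide whether a second speaker answered.

import math

SPECIFIC_KEYWORD_SETS = ({"immediately", "navigate"},)   # stand-in for the pre-stored instructions
DELAY_X_SECONDS = 3                                       # X may be 3, 4 or 5 seconds
SAME_SPEAKER_THRESHOLD = 0.75                             # assumed similarity threshold

def is_specific_instruction(text):
    """A voice counts as a device-specific instruction when it contains every
    keyword of at least one predefined set (S502's database match)."""
    words = set(text.lower().split())
    return any(keys <= words for keys in SPECIFIC_KEYWORD_SETS)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def same_speaker(embedding_a, embedding_b):
    """Treat two voiceprint embeddings as the same speaker above the threshold."""
    return cosine_similarity(embedding_a, embedding_b) >= SAME_SPEAKER_THRESHOLD

def count_people(first_text, first_embedding, listen_during):
    """Return 1 or "many" following the delayed-answer rule.

    listen_during(seconds) is assumed to block for up to the given number of
    seconds and return (text, embedding) for any utterance heard in that
    window, or None if nobody spoke.
    """
    if is_specific_instruction(first_text):
        return 1                                  # S502/S503: respond immediately
    answer = listen_during(DELAY_X_SECONDS)       # S505/S506: wait X seconds for an answer
    if answer is None or same_speaker(first_embedding, answer[1]):
        return 1                                  # S507: respond and record one occupant
    return "many"                                 # S508: abandon the response, record many
```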
The space in which an intelligent voice device (for example, an in-vehicle device) is located may be a closed space, a semi-closed space, or an open space. A closed space may be a space enclosed by a continuous closed surface. A semi-closed space may be a space bounded by a non-closed surface, for example, the space formed by a room whose door is open. An open space may be an outdoor space, or a space not enclosed by any surface. The voice device may specifically be an intelligent voice device, that is, a device that implements voice input (receiving external voice and converting it into an electrical signal) and voice recognition and that carries out the functions requested by voice.
Specifically, a semi-closed space or an open space may be taken to be a spherical space whose radius is the communication distance of the intelligent voice device.
If the radius of a closed space is less than or equal to the communication distance of the intelligent voice device, the space in which the intelligent voice device is located is that closed space. The radius of a closed space here may refer to half the length of the longest side of the closed space.
As shown in FIG. 13, a car is a closed space whose radius is half the length of the car. When the communication distance of the intelligent voice device in the car equals the radius of sphere C1, the radius of the car is exactly equal to the radius of C1, and the space in which the intelligent voice device is located is the closed space.
When the communication distance of the intelligent voice device in the car equals the radius of sphere C2, because the radius of C2 is larger than the radius of C1 (that is, the radius of the car), the space in which the intelligent voice device is located is the closed space bounded by the car body rather than a spherical space whose radius is the communication distance (the radius of C2). A small sketch of this comparison follows.
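The comparison illustrated by FIG. 13 reduces to comparing two radii; the following small sketch assumes both values are known in metres and that an open or semi-closed surrounding is represented by passing None for the enclosure radius.

```python
# Illustrative sketch of the FIG. 13 rule: decide which region the device
# should treat as its space by comparing the enclosure radius (half the
# longest side of the enclosure) with the device's communication distance.

def effective_space(enclosure_radius_m, communication_distance_m):
    """Return a description of the space the detection applies to."""
    if enclosure_radius_m is not None and enclosure_radius_m <= communication_distance_m:
        # The enclosure (for example the car body, matching sphere C1) lies
        # within the device's reach, so the enclosed space is the target space.
        return "closed space bounded by the enclosure"
    # Open or semi-closed surroundings (or an enclosure larger than the reach):
    # use the sphere whose radius is the communication distance.
    return "sphere with radius equal to the communication distance"


print(effective_space(2.5, 2.5))   # car radius equals the reach (sphere C1) -> closed space
print(effective_space(2.5, 4.0))   # reach like sphere C2, larger than the car -> still the closed space
print(effective_space(None, 3.0))  # open space -> communication-distance sphere
```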
There are many types of in-vehicle devices, including vehicle-mounted smart rearview mirrors, vehicle-mounted smart speakers, and vehicle-mounted smart televisions. The following examples describe how the wake-up word is used in the embodiments of the present invention.
As shown in FIG. 7, the vehicle-mounted smart rearview mirror is powered on and starts to collect surrounding sound. In this step, the rearview mirror collects the surrounding sound through the microphone 307 in FIG. 3.
The vehicle-mounted smart rearview mirror determines the number of people in the vehicle through voiceprint recognition. Voiceprint recognition is a type of biometric technology, also called speaker recognition, and has two categories: speaker identification and speaker verification. Different tasks and applications use different voiceprint recognition techniques; for example, identification may be needed to narrow the scope of a criminal investigation, while verification is needed for bank transactions. Voiceprint recognition converts an acoustic signal into an electrical signal, which is then recognized by a computer. Specifically, in this document, voiceprint recognition can be used to identify and confirm how many people are in the vehicle.
In this example, the vehicle-mounted smart rearview mirror determines that there is only one person in the vehicle (that is, only the driver), and it enters the wake-free interaction mode, that is, an interaction mode that does not require a wake-up word. For example, the driver can say "navigate to the airport" to the rearview mirror, and the rearview mirror, in response to this wake-free command, performs the operation of navigating to the airport. Optionally, the rearview mirror may announce "starting navigation to the airport for you" in response to the driver's instruction.
As shown in FIG. 8, if the vehicle-mounted smart rearview mirror determines that there are multiple people in the vehicle, it enters the wake-up question-and-answer interaction mode, that is, human-machine interaction that requires a wake-up word. For example, the driver or a passenger first wakes up the rearview mirror by saying "Hello, Xiaochi". In response to this wake-up command, the rearview mirror replies "What can I help you with?". The driver or passenger can then say "I want to go to the airport", and the rearview mirror performs the operation of navigating to the airport in response to that instruction. Optionally, the rearview mirror may announce "starting navigation to the airport for you".
FIG. 9 is a schematic diagram of the interior of a car in which a vehicle-mounted smart speaker is integrated into the center console. In FIG. 9, unlike an existing speaker, the vehicle-mounted smart speaker includes a sound pickup device (for example, a microphone). The driver or a passenger can communicate with the vehicle-mounted smart speaker by voice through its microphone and loudspeaker. The vehicle-mounted smart speaker determines the number of people in the vehicle through voiceprint recognition.
After the vehicle-mounted smart speaker determines whether there is one person or multiple people in the vehicle, for whether and how the wake-up word is used, refer to the processing of the vehicle-mounted smart rearview mirror. Details are not repeated here.
FIG. 10 is a schematic diagram of the interior of a car in which a vehicle-mounted smart television is integrated into the roof of the car. In FIG. 10, unlike an existing television, the vehicle-mounted smart television includes a sound pickup device (for example, a microphone). The driver or a passenger can communicate with the vehicle-mounted smart television by voice through its microphone and loudspeaker. The vehicle-mounted smart television determines the number of people in the vehicle through voiceprint recognition.
After the vehicle-mounted smart television determines whether there is one person or multiple people in the vehicle, for whether and how the wake-up word is used, refer to the processing of the vehicle-mounted smart rearview mirror. Details are not repeated here.
The vehicle-mounted smart television and vehicle-mounted smart speaker can also be used in the home, corresponding to a home smart speaker and a home smart television.
FIG. 11 is a schematic diagram of a home smart speaker. In FIG. 11, unlike an existing speaker, the home smart speaker includes a sound pickup device (for example, a microphone). People at home can communicate with the home smart speaker by voice through its microphone and loudspeaker. The home smart speaker determines the number of people at home through voiceprint recognition.
After the home smart speaker determines whether there is one person or multiple people at home, for whether and how the wake-up word is used, refer to the processing of the vehicle-mounted smart rearview mirror in the foregoing embodiments. Details are not repeated here.
FIG. 12 is a schematic diagram of a home smart television. In FIG. 12, unlike an existing television, the home smart television includes a sound pickup device (for example, a microphone). People at home can communicate with the home smart television by voice through its microphone and loudspeaker. The home smart television determines the number of people at home through voiceprint recognition.
After the home smart television determines whether there is one person or multiple people at home, for whether and how the wake-up word is used, refer to the processing of the vehicle-mounted smart rearview mirror. For example, when it is determined that there are two people at home as in FIG. 12, the home smart television enters the wake-up question-and-answer interaction mode, that is, human-machine interaction that requires a wake-up word. For example, the mother or daughter first wakes up the television by saying "Hello, TV". In response to this wake-up command, the television may reply "What can I help you with?". The mother or daughter can then say "I want to watch a live broadcast", and the television performs the operation of playing the live broadcast in response to that instruction. Optionally, the television may announce "starting to select a live channel for you" in response to the instruction.
An embodiment of the present invention further discloses an intelligent voice device; for details, refer to FIG. 3. The intelligent voice device may include a processor 301 configured to determine the number of people in the space in which the intelligent voice device is located and, when it is determined that there is one person in the space, to control the intelligent voice device to enter the wake-free voice interaction mode. The wake-free voice interaction mode here is a voice interaction mode that does not use a wake-up word. For example, the user can say "navigate to place XX", and the voice device (for example, the intelligent voice device) performs the operation of navigating to place XX without replying "What can I help you with?".
The intelligent voice device further includes:
a collector (for example, the microphone 307), configured to collect a first voice in the space.
That the processor 301 is configured to determine the number of people in the space includes:
the processor is configured to determine whether the first voice is a specific instruction of the intelligent voice device; if the first voice is not a specific instruction of the intelligent voice device, delay for X seconds; determine whether there is a second voice whose voiceprint characteristics differ from those of the first voice; and, when it is determined that such a second voice exists, determine that there are multiple people in the space;
where X is a number greater than zero.
The intelligent voice device further includes:
a collector (for example, the microphone 307), configured to collect a first voice in the space.
That the processor 301 is configured to determine the number of people in the closed space includes:
the processor is configured to determine whether the first voice is a specific instruction of the intelligent voice device; if the first voice is not a specific instruction of the intelligent voice device, delay for X seconds; determine whether there is a second voice whose voiceprint characteristics differ from those of the first voice; and, when it is determined that no such second voice exists, determine that there is only one person in the closed space;
where X is a number greater than zero.
The space in which the intelligent voice device is located is a closed space, a semi-closed space, or an open space.
A semi-closed space or an open space is a spherical space whose radius is the communication distance of the intelligent voice device.
If the radius of the closed space is less than or equal to the communication distance of the intelligent voice device, the space in which the intelligent voice device is located is the closed space.
FIG. 14 is a block diagram of the internal implementation of a processor. As can be seen from the figure, the processor includes four high-speed processing cores and four low-speed processing cores. The four high-speed processing cores, together with a corresponding level-2 cache, form a high-speed core processing area; the four low-speed processing cores, together with a corresponding level-2 cache, form a low-speed core processing area. A high-speed processing core here may be a processing core with an operating frequency of 2.1 GHz, and a low-speed processing core may be a processing core with an operating frequency of 1.7 GHz.
All steps performed by the processor 301 are carried out by the high-speed processing cores or the low-speed processing cores.
In addition to the high-speed processing cores, the low-speed processing cores, and the corresponding level-2 caches, the processor contains other components: a modem baseband part connected to the radio-frequency transceiver and used to process the baseband part of radio-frequency signals; a display subsystem connected to the display; an image signal processing subsystem connected to components outside the CPU; a single-channel DDR controller connected to the DDR memory; an embedded multimedia card interface connected to an embedded multimedia card; a USB interface connected to a personal computer; an SDIO input/output interface connected to a short-range communication module; a UART interface connected to Bluetooth and GPS; an I2C interface connected to sensors; and a smart card interface for the SIM card. The CPU further includes a video processing subsystem, a Sensor Hub subsystem, a low-power microcontroller, a high-resolution video codec, a dual security engine, and an image processing unit formed by an image processor and a level-2 cache. A coherence bus inside the CPU connects all the interfaces and processing units of the CPU. A sketch of how software might direct work onto the two core clusters is given below.
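Purely as an aside, the following sketch shows one way software could steer the speech-processing steps onto the high-speed core cluster described above; the core numbering (0 to 3 low-speed, 4 to 7 high-speed) is an assumption about a particular chip layout, and os.sched_setaffinity is available on Linux only.

```python
# Illustrative sketch: confine the current process to one of the two core
# clusters so that latency-sensitive speech processing runs on the
# high-speed cores. The core numbering is assumed, not taken from FIG. 14.

import os

LOW_SPEED_CORES = {0, 1, 2, 3}    # assumed 1.7 GHz cluster
HIGH_SPEED_CORES = {4, 5, 6, 7}   # assumed 2.1 GHz cluster

def run_on(cores):
    """Restrict this process to the given set of CPU cores (Linux only)."""
    os.sched_setaffinity(0, cores)

# Speech recognition and voiceprint comparison go to the fast cluster;
# background housekeeping could be confined to the slow cluster instead.
run_on(HIGH_SPEED_CORES)
```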
It can be understood that, to implement the foregoing functions, the terminal and the like include corresponding hardware structures and/or software modules for performing the functions. A person skilled in the art should be readily aware that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the technical solutions of the embodiments of this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of the embodiments of this application.
In the embodiments of this application, functional modules of the terminal and the like may be divided according to the foregoing method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of this application is schematic and is merely a logical function division; there may be other division manners in actual implementation.
In the foregoing embodiments, the implementation may be realized entirely or partly by software, hardware, firmware, or any combination thereof. When a software program is used for implementation, the implementation may appear entirely or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated entirely or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive (SSD)).
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (28)

  1. A method for voice interaction, comprising:
    determining, by a voice device, the number of people in the space in which the voice device is located; and
    when the voice device determines that the number of people in the space is one, entering, by the voice device, a wake-free voice interaction mode.
  2. The method according to claim 1, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located comprises:
    determining, by the voice device, the number of people in the space in which the voice device is located according to one or more of voiceprint information, iris information, portrait information, fingerprint information, and sensing data.
  3. The method according to claim 2, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located according to the voiceprint information comprises:
    collecting, by the voice device, a first voice in the space;
    determining, by the voice device, whether a second voice is received within a first time period after the first voice is received, wherein the second voice and the first voice have different voiceprint characteristics; and
    if the voice device does not receive the second voice within the first time period, determining that there is one person in the space.
  4. The method according to claim 2, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located according to the voiceprint information comprises:
    collecting, by the voice device, a first voice in the space;
    if the first voice is not a specific instruction of the voice device, determining, by the voice device, whether a second voice is received within a first time period after the first voice is received, wherein the second voice and the first voice have different voiceprint characteristics; and
    if the voice device does not receive the second voice within the first time period, determining that there is one person in the space.
  5. The method according to claim 2, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located according to the voiceprint information comprises:
    collecting, by the voice device, a first voice in the space;
    if the first voice is not a specific instruction of the voice device, determining, by the voice device, whether a second voice is received within a first time period after the first voice is received, wherein the second voice and the first voice have different voiceprint characteristics; and
    if the voice device receives the second voice within the first time period, determining, by the voice device, that there are multiple people in the space.
  6. The method according to claim 2, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located according to the iris information comprises:
    obtaining, by the voice device, an image for iris recognition through a camera in the space in which the voice device is located;
    determining, by the voice device, whether the image contains different iris information;
    when the voice device determines that different iris information is present, determining, by the voice device, that there are multiple people in the space in which the voice device is located; and
    when the voice device determines that only one kind of iris information is present, determining, by the voice device, that there is one person in the space in which the voice device is located.
  7. The method according to claim 2, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located according to the portrait information comprises:
    obtaining, by the voice device, portrait information through a camera in the space in which the voice device is located;
    determining, by the voice device, whether different portrait information is present;
    when the voice device determines that different portrait information is present, determining, by the voice device, that there are multiple people in the space in which the voice device is located; and
    when the voice device determines that only one kind of portrait information is present, determining, by the voice device, that there is one person in the space in which the voice device is located.
  8. The method according to claim 2, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located according to the fingerprint information comprises:
    obtaining, by the voice device, fingerprint information through a fingerprint recognition apparatus in the space in which the voice device is located;
    determining, by the voice device, whether different fingerprint information is present;
    when the voice device determines that different fingerprint information is present, determining, by the voice device, that there are multiple people in the space in which the voice device is located; and
    when the voice device determines that only one kind of fingerprint information is present, determining, by the voice device, that there is one person in the space in which the voice device is located.
  9. The method according to claim 2, wherein the determining, by the voice device, of the number of people in the space in which the voice device is located according to the sensing data comprises:
    obtaining, by the voice device, sensing data through a sensing apparatus in the space in which the voice device is located;
    determining, by the voice device, whether different sensing data is present;
    when the voice device determines that different sensing data is present, determining, by the voice device, that there are multiple people in the space in which the voice device is located; and
    when the voice device determines that only one kind of sensing data is present, determining, by the voice device, that there is one person in the space in which the voice device is located.
  10. The method according to any one of claims 1 to 9, wherein after the voice device enters the wake-free voice interaction mode, the method further comprises:
    receiving, by the voice device, a third voice, wherein the third voice does not comprise a wake-up word; and
    recognizing and performing, by the voice device, the function corresponding to the third voice.
  11. The method according to any one of claims 1 to 9, further comprising:
    when the voice device determines that there are multiple people in the space, entering, by the voice device, a wake-up voice interaction mode;
    receiving, by the voice device, a wake-up word or a fourth voice comprising a wake-up word; and
    entering, by the voice device, a voice interaction mode, or recognizing and performing, by the voice device, the function corresponding to the fourth voice.
  12. The method according to any one of claims 1 to 11, wherein the space in which the voice device is located is a closed space, a semi-closed space, or an open space.
  13. The method according to claim 12, wherein the semi-closed space or the open space is a spherical space whose radius is the communication distance of the voice device.
  14. The method according to claim 12, wherein, if the radius of the closed space is less than or equal to the communication distance of the voice device, the space in which the voice device is located is the closed space.
  15. A voice device, comprising:
    a collector, configured to collect information in the space in which the voice device is located; and
    a processor, configured to determine the number of people in the space according to the information collected by the collector, and, when it is determined that the number of people in the space is one, to control the voice device to enter a wake-free voice interaction mode.
  16. The voice device according to claim 15, wherein that the processor is configured to determine the number of people in the space in which the voice device is located according to the information collected by the collector comprises:
    the processor is configured to determine the number of people in the space in which the voice device is located according to one or more of voiceprint information, iris information, portrait information, fingerprint information, and sensing data.
  17. The voice device according to claim 16, wherein that the processor is configured to determine the number of people in the space in which the voice device is located according to the voiceprint information comprises:
    the collector is configured to collect a first voice in the space; and
    the processor is specifically configured to determine whether a second voice is received within a first time period after the first voice is received, wherein the second voice and the first voice have different voiceprint characteristics, and, if the voice device does not receive the second voice within the first time period, to determine that there is one person in the space.
  18. The voice device according to claim 16, wherein
    the collector is specifically configured to collect a first voice in the space; and
    that the processor is configured to determine the number of people in the space comprises:
    the processor is configured to: if the first voice is not a specific instruction of the voice device, determine whether a second voice whose voiceprint characteristics differ from those of the first voice is received within a first time period after the first voice is received, and, when it is determined that such a second voice is received, determine that there are multiple people in the space.
  19. The voice device according to claim 16, wherein the collector is specifically configured to collect a first voice in the space; and
    that the processor is configured to determine the number of people in the closed space comprises:
    the processor is configured to: if the first voice is not a specific instruction of the voice device, determine whether a second voice whose voiceprint characteristics differ from those of the first voice is received within a first time period after the first voice is received, and, when it is determined that the second voice is not received within the first time period, determine that there is one person in the closed space.
  20. The voice device according to claim 16, wherein that the processor is configured to determine the number of people in the space in which the voice device is located according to the iris information comprises:
    the processor is specifically configured to: obtain an image for iris recognition through a camera in the space in which the voice device is located; determine whether the image contains different iris information; when it is determined that different iris information is present, determine that there are multiple people in the space in which the voice device is located; and, when it is determined that only one kind of iris information is present, determine that there is one person in the space in which the voice device is located.
  21. The voice device according to claim 16, wherein that the processor is configured to determine the number of people in the space in which the voice device is located according to the portrait information comprises:
    the processor is specifically configured to: obtain portrait information through a camera in the space in which the voice device is located; determine whether different portrait information is present; when it is determined that different portrait information is present, determine that there are multiple people in the space in which the voice device is located; and, when it is determined that only one kind of portrait information is present, determine that there is one person in the space in which the voice device is located.
  22. The voice device according to claim 16, wherein that the processor is configured to determine the number of people in the space in which the voice device is located according to the fingerprint information comprises:
    the processor is specifically configured to: obtain fingerprint information through a fingerprint recognition apparatus in the space in which the voice device is located; determine whether different fingerprint information is present; when it is determined that different fingerprint information is present, determine that there are multiple people in the space in which the voice device is located; and, when it is determined that only one kind of fingerprint information is present, determine that there is one person in the space in which the voice device is located.
  23. The voice device according to claim 16, wherein that the processor is configured to determine the number of people in the space in which the voice device is located according to the sensing data comprises:
    the processor is specifically configured to: obtain sensing data through a sensing apparatus in the space in which the voice device is located; determine whether different sensing data is present; when it is determined that different sensing data is present, determine that there are multiple people in the space in which the voice device is located; and, when it is determined that only one kind of sensing data is present, determine that there is one person in the space in which the voice device is located.
  24. The voice device according to any one of claims 15 to 23, wherein
    the collector is further configured to receive a third voice, wherein the third voice does not comprise a wake-up word; and
    the processor is further configured to recognize and perform the function corresponding to the third voice.
  25. The voice device according to any one of claims 15 to 23, wherein
    the processor is further configured to: when it is determined that there are multiple people in the space, control the voice device to enter a wake-up voice interaction mode;
    the collector is further configured to receive a wake-up word or a fourth voice comprising a wake-up word; and
    the processor is further configured to control the voice device to enter a voice interaction mode, or to recognize and perform the function corresponding to the fourth voice.
  26. The voice device according to any one of claims 15 to 25, wherein the space in which the voice device is located is a closed space, a semi-closed space, or an open space.
  27. The voice device according to claim 26, wherein the semi-closed space or the open space is a spherical space whose radius is the communication distance of the voice device.
  28. The voice device according to claim 26, wherein, if the radius of the closed space is less than or equal to the communication distance of the voice device, the space in which the voice device is located is the closed space.
PCT/CN2018/078362 2018-03-07 2018-03-07 Method and device for voice interaction WO2019169591A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/078362 WO2019169591A1 (en) 2018-03-07 2018-03-07 Method and device for voice interaction
CN201880090636.6A CN111819626A (en) 2018-03-07 2018-03-07 Voice interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/078362 WO2019169591A1 (en) 2018-03-07 2018-03-07 Method and device for voice interaction

Publications (1)

Publication Number Publication Date
WO2019169591A1 true WO2019169591A1 (en) 2019-09-12

Family

ID=67845477

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/078362 WO2019169591A1 (en) 2018-03-07 2018-03-07 Method and device for voice interaction

Country Status (2)

Country Link
CN (1) CN111819626A (en)
WO (1) WO2019169591A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112201246B (en) * 2020-11-19 2023-11-28 深圳市欧瑞博科技股份有限公司 Intelligent control method and device based on voice, electronic equipment and storage medium
CN114758654B (en) * 2022-03-14 2024-04-12 重庆长安汽车股份有限公司 Automobile voice control system and control method based on scene


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR112017021673B1 (en) * 2015-04-10 2023-02-14 Honor Device Co., Ltd VOICE CONTROL METHOD, COMPUTER READABLE NON-TRANSITORY MEDIUM AND TERMINAL
CN107437415B (en) * 2017-08-09 2020-06-02 科大讯飞股份有限公司 Intelligent voice interaction method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103035240A (en) * 2011-09-28 2013-04-10 苹果公司 Speech recognition repair using contextual information
US20140350924A1 (en) * 2013-05-24 2014-11-27 Motorola Mobility Llc Method and apparatus for using image data to aid voice recognition
CN105320726A (en) * 2014-05-30 2016-02-10 苹果公司 Reducing the need for manual start/end-pointing and trigger phrases
CN105654943A (en) * 2015-10-26 2016-06-08 乐视致新电子科技(天津)有限公司 Voice wakeup method, apparatus and system thereof

Also Published As

Publication number Publication date
CN111819626A (en) 2020-10-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18908322

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18908322

Country of ref document: EP

Kind code of ref document: A1