
Voice interaction method and device

Info

Publication number: CN111819626A
Application number: CN201880090636.6A
Authority: CN (China)
Prior art keywords: voice, space, people, equipment, voice equipment
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 魏建宾, 余尚春
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • G10L 17/24 Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase

Abstract

An embodiment of the present application provides a voice interaction method, including: a voice device determines the number of people in the space in which it is located; and when the voice device determines that there is one person in the space, the voice device enters a wake-up-free voice interaction mode. Compared with the prior art, the provided scheme reduces the wake-up actions required when a smart device needs to be operated.

Description

Voice interaction method and device
Technical Field
Embodiments of this application relate to the field of communication technologies, and in particular to a voice interaction method and device.
Background
Although artificial intelligence technology is widely used in vehicle-mounted smart devices, the smart vehicle-mounted devices currently on the market, while equipped with a voice control function, all need to be woken up by a spoken wake-up phrase before use. Insensitive recognition and overly complex interaction are common, and have become some of the biggest complaints from users of products such as vehicle-mounted smart rearview mirrors.
If the device does not rely on a wake-up word, it is frequently woken up by mistake; in particular, when the user chats with other people, the device mistakenly treats the conversation as an instruction issued to it and responds, which is very embarrassing for the user. At present, some manufacturers propose an integrated "wake-up word plus speech semantic recognition" mode, which achieves zero interval, zero delay and seamless connection between the wake-up word and the voice command, abandons the traditional question-and-answer mode, and reduces the steps and complicated wake-up actions of voice control. For the user, however, whether the device is woken up first and then queried, or woken up and queried in a single utterance, a wake-up action is still required each time a function of the smart device is used, so the interaction between the vehicle-mounted device and the user remains too cumbersome.
Disclosure of Invention
Embodiments of this application provide a voice interaction method and device, which can reduce the wake-up actions required when a smart device needs to be operated.
In one aspect, an embodiment of the present invention provides a voice interaction method, including: a voice device determines the number of people in the space in which it is located; and when the voice device determines that there is one person in the space, the voice device enters a wake-up-free voice interaction mode.
In one possible design, the voice device determining the number of people in the space in which it is located includes: the voice device determines the number of people in the space according to one or more of voiceprint information, iris information, portrait information, fingerprint information and sensor data. By identifying the number of people in the space in multiple ways and making a comprehensive judgment, the voice device improves the accuracy of the count.
In one possible design, the voice device determining the number of people in the space according to the voiceprint information includes: the voice device collects a first voice in the space; the voice device determines whether a second voice, whose voiceprint features differ from those of the first voice, is received within a first time period after the first voice is received; and if the voice device does not receive such a second voice within the first time period, it determines that there is one person in the space. Distinguishing speakers by their voiceprints is a common recognition method; in this example, the space is judged to contain one person.
In one possible design, the voice device determining the number of people in the space according to the voiceprint information includes: the voice device collects a first voice in the space; if the first voice is not a specific instruction, the voice device determines whether a second voice, whose voiceprint features differ from those of the first voice, is received within a first time period after the first voice is received; and if the voice device does not receive such a second voice within the first time period, it determines that there is one person in the space. By first checking that the collected first voice is not a specific instruction and only then checking for a second voice within the first time period, the number of people in the space can be determined more accurately. The first voice here may be the earliest voice collected.
In one possible design, the voice device determining the number of people in the space according to the voiceprint information includes: the voice device collects a first voice in the space; if the first voice is not a specific instruction, the voice device determines whether a second voice, whose voiceprint features differ from those of the first voice, is received within a first time period after the first voice is received; and if the voice device receives such a second voice within the first time period, it determines that there are multiple people in the space. By first checking that the collected first voice is not a specific instruction and only then checking for a second voice within the first time period, the number of people in the space can be determined more accurately.
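The voiceprint-based designs above amount to a timer-driven check: record a first voice, then watch for a differently-voiced second voice during the first time period. The following Python sketch illustrates that logic; the similarity threshold, the five-second window, and the device hooks (capture_voice, voiceprint_of, is_specific_instruction) are illustrative assumptions, not values or APIs taken from the patent.

```python
import time
import numpy as np

SIMILARITY_THRESHOLD = 0.8   # assumed voiceprint-match threshold (not from the patent)
FIRST_TIME_PERIOD_S = 5.0    # assumed length of the "first time period"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def count_people_by_voiceprint(device) -> str:
    """`device` is an assumed object exposing capture_voice(), voiceprint_of()
    and is_specific_instruction(); these hooks are illustrative, not a real API."""
    first_voice = device.capture_voice()
    if device.is_specific_instruction(first_voice):
        return "undetermined"            # a specific instruction is handled elsewhere
    first_vp = device.voiceprint_of(first_voice)

    # Watch for a second voice with a different voiceprint during the first time period.
    deadline = time.monotonic() + FIRST_TIME_PERIOD_S
    while time.monotonic() < deadline:
        second_voice = device.capture_voice(timeout=deadline - time.monotonic())
        if second_voice is None:
            continue
        second_vp = device.voiceprint_of(second_voice)
        if cosine_similarity(first_vp, second_vp) < SIMILARITY_THRESHOLD:
            return "multiple"            # a different voiceprint was heard within the window
    return "one"                         # no different voiceprint heard: one person in the space
```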
In one possible design, the voice device determining the number of people in the space according to the iris information includes: the voice device captures an image for iris recognition with a camera in the space; the voice device determines whether the image contains different iris information; if different iris information is present, the voice device determines that there are multiple people in the space; and if only one kind of iris information is present, the voice device determines that there is one person in the space. This example adds a way of determining the number of people in the space based on iris information.
In one possible design, the voice device determining the number of people in the space according to the portrait information includes: the voice device captures an image with a camera in the space; the voice device determines whether the image contains different portrait information; if different portrait information is present, the voice device determines that there are multiple people in the space; and if only one kind of portrait information is present, the voice device determines that there is one person in the space. This example adds a way of determining the number of people in the space based on portrait information.
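The iris-based and portrait-based designs follow the same pattern: capture a frame with the camera in the space, extract identity features, and check whether more than one distinct identity appears. A minimal sketch of that reading follows; the extract_identity and same_identity callables are hypothetical stand-ins for any iris or face recognition component, not a specific library API.

```python
def count_people_in_frame(frame, extract_identity, same_identity) -> str:
    """frame: an image captured by the camera in the space.
    extract_identity(frame) -> list of identity features (iris codes or face
    embeddings) found in the frame; same_identity(a, b) -> bool.
    Both callables are assumptions made for illustration."""
    identities = extract_identity(frame)
    if not identities:
        return "unknown"                      # nobody detected in this frame
    distinct = [identities[0]]
    for feature in identities[1:]:
        if not any(same_identity(feature, seen) for seen in distinct):
            distinct.append(feature)          # a different iris/portrait was found
    return "one" if len(distinct) == 1 else "multiple"
```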
In one possible design, the voice device determining the number of people in the space according to the fingerprint information includes: the voice device obtains fingerprint information through a fingerprint recognition apparatus in the space; the voice device determines whether different fingerprint information is present; if different fingerprint information is present, the voice device determines that there are multiple people in the space; and if only one kind of fingerprint information is present, the voice device determines that there is one person in the space. This example adds a way of determining the number of people in the space based on fingerprint information.
In one possible design, the voice device determining the number of people in the space according to the sensor data includes: the voice device obtains sensor data through a sensing apparatus in the space; the voice device determines whether different sensor readings are present; if different sensor readings are present, the voice device determines that there are multiple people in the space; and if only one kind of sensor reading is present, the voice device determines that there is one person in the space. This example adds a way of determining the number of people in the space based on sensor data.
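The comprehensive judgment mentioned earlier can combine the per-modality results. One hedged interpretation, not prescribed by the patent, is that any modality observing more than one identity is sufficient evidence of multiple people, while agreement on a single identity yields a count of one:

```python
def fuse_person_counts(modality_results: dict) -> str:
    """modality_results maps a modality name ("voiceprint", "iris", "portrait",
    "fingerprint", "sensor") to "one", "multiple", or None when unavailable.
    This 'any modality seeing several people wins' rule is an assumption."""
    available = [result for result in modality_results.values() if result is not None]
    if not available:
        return "unknown"
    return "multiple" if "multiple" in available else "one"

# Example: the camera sees one face but the seat sensors report two occupied seats.
print(fuse_person_counts({"portrait": "one", "sensor": "multiple", "voiceprint": None}))
# -> multiple
```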
In one possible design, after the voice device enters the wake-up-free voice interaction mode, the method further includes: the voice device receives a third voice that does not include a wake-up word; and the voice device recognizes the third voice and executes the corresponding function. In this example, after entering the wake-up-free voice interaction mode, the voice device recognizes a third voice containing no wake-up word and executes the corresponding function, which reduces the number of interactions that require a wake-up word.
In one possible design, when the voice device determines that there are multiple people in the space, the voice device enters a wake-up voice interaction mode; the voice device receives a wake-up word, or a fourth voice that includes the wake-up word; and the voice device enters the voice interaction mode, or recognizes the fourth voice and executes the corresponding function. In this example, the voice device enters the wake-up voice interaction mode, so that voice interaction based on the wake-up word can be realized.
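Taken together, the wake-up-free and wake-up designs behave like a small state machine: one person in the space selects the wake-up-free mode, several people select the wake-up mode, and incoming speech is dispatched accordingly. The sketch below illustrates that reading; recognize_and_execute, contains_wake_word and the other methods are hypothetical hooks into the device's speech stack, not part of the patent.

```python
class VoiceDevice:
    def __init__(self):
        self.mode = "wake-up"                           # default: wake-up word required

    def update_mode(self, people_in_space: str) -> None:
        # One person in the space -> wake-up-free interaction; otherwise keep the wake-up mode.
        self.mode = "wake-up-free" if people_in_space == "one" else "wake-up"

    def on_speech(self, speech) -> None:
        if self.mode == "wake-up-free":
            # Third voice: no wake-up word is needed, recognize and execute directly.
            self.recognize_and_execute(speech)
            return
        # Wake-up mode: act only on a wake-up word or a fourth voice containing it.
        if self.contains_wake_word(speech):
            command = self.strip_wake_word(speech)
            if command:
                self.recognize_and_execute(command)     # "wake-up word + command" in one utterance
            else:
                self.enter_listening_state()            # bare wake-up word: wait for the command

    # Placeholders for the device's recognition stack (assumptions, not a real API).
    def recognize_and_execute(self, speech) -> None: ...
    def contains_wake_word(self, speech) -> bool: ...
    def strip_wake_word(self, speech): ...
    def enter_listening_state(self) -> None: ...
```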
In one possible design, the space in which the voice device is located is a closed space, a semi-closed space or an open space.
In one possible design, the semi-closed space or the open space is taken to be a spherical space whose radius is the communication distance of the voice device.
In one possible design, if the radius of the closed space is less than or equal to the communication distance of the voice device, the space in which the voice device is located is the closed space.
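These three designs bound the "space in which the voice device is located": a closed space is used as-is when its radius does not exceed the device's communication distance, while a semi-closed or open space is approximated by a sphere whose radius equals that distance. A small sketch of the rule, with illustrative parameter names:

```python
from typing import Optional

def effective_space_radius(communication_distance_m: float,
                           closed_space_radius_m: Optional[float]) -> float:
    """Return the radius of the space the voice device considers.
    closed_space_radius_m is None for a semi-closed or open space."""
    if closed_space_radius_m is not None and closed_space_radius_m <= communication_distance_m:
        # A closed space that fits within the communication distance (e.g. a car cabin).
        return closed_space_radius_m
    # Semi-closed or open space: a sphere whose radius is the communication distance.
    # (Applying this fallback to an oversized closed space is an assumption; the
    # patent only defines the case handled above.)
    return communication_distance_m

# Example: a 1.5 m cabin with a 5 m communication distance, then an open space.
print(effective_space_radius(5.0, 1.5))   # -> 1.5
print(effective_space_radius(5.0, None))  # -> 5.0
```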
In another aspect, an embodiment of the present invention provides a voice device that has the function of implementing the behavior of the voice device in the above method. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
In yet another aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the above voice device, including a program designed to perform the above aspects.
Compared with the prior art, the scheme provided by the invention reduces the wake-up actions of the smart device when the smart device needs to be operated.
Drawings
FIG. 1 is a system architecture diagram of a mobile vehicle communication system according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of an example vehicle 12 according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the audio circuit 305 in FIG. 3;
FIG. 5 is a flowchart of voice interaction according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for determining the number of people in a vehicle according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of human interaction with a vehicle-mounted smart rearview mirror according to an embodiment of the present invention;
FIG. 8 is another schematic diagram of human interaction with a vehicle-mounted smart rearview mirror according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a vehicle-mounted smart speaker according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a vehicle-mounted smart television according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a home smart speaker according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of human interaction with a home smart television according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of an embodiment of the present invention including two enclosures;
FIG. 14 is a schematic structural diagram of a processor according to an embodiment of the present invention.
Detailed Description
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, "a plurality" means two or more unless otherwise specified.
The following detailed description describes various features and functions of the disclosed systems and methods with reference to the accompanying drawings. In the drawings, like numerals identify like components unless context dictates otherwise. The illustrative system and method embodiments described herein are not intended to be limiting. It will be readily understood that certain aspects of the disclosed systems and methods may be arranged and combined in a wide variety of different configurations, all of which are contemplated herein.
Referring to fig. 1, fig. 1 is a system architecture diagram of a mobile vehicle communication system according to an embodiment of the present invention. Communication system 10 includes, among other things, a vehicle 12, one or more wireless carrier systems 14, a terrestrial communication network 16, a computer 18, and a call center 20. It should be understood that the disclosed methods can be used with any number of different systems and are not particularly limited to the operating environments illustrated herein. As such, the architecture, construction, arrangement, and operation of the system 10, as well as its individual components, are generally known in the art. Thus, the following paragraphs simply provide an overview of one example communication system 10, and other systems not shown herein can also use the disclosed methods.
The vehicle 12 may be implemented on an automobile or may take the form of an automobile. However, the example systems may also be implemented on or take the form of other vehicles, such as cars, trucks, motorcycles, buses, boats, airplanes, helicopters, lawn mowers, snowplows, recreational vehicles, amusement park vehicles, agricultural equipment, construction equipment, golf carts, trains, and trams, among other vehicles. Further, robotic devices may also be used to perform the methods and systems described herein.
Some vehicle hardware 28 is shown in fig. 1, including a telematics unit 30, a microphone 32, one or more buttons or other control inputs 34, an audio system 36, a visual display 38, and a Global Positioning System (GPS) module 40 and a plurality of Vehicle Security Modules (VSMs) 42. Some of these devices can be directly connected to the information communication unit, such as microphone 32 and buttons 34, while others make indirect connections using one or more network connections, such as communication bus 44 or entertainment bus 46. Examples of suitable network connections include Controller Area Networks (CAN), Media Oriented Systems Transfer (MOST), Local Interconnect Networks (LIN), Local Area Networks (LAN), and other suitable connections such as Ethernet or other connections consistent with the known International organization for standardization (ISO), the Society of Automotive Engineers (SAE), and the Institute of Electrical and Electronics Engineers (IEEE) standards and specifications, to name a few.
The telematics unit 30 may be an Original Equipment Manufacturer (OEM) installed (embedded) or aftermarket device that is installed in the vehicle and is capable of wireless voice and/or data communication over the wireless carrier system 14 and via wireless networking. This enables the vehicle to communicate with call center 20, other information-enabled vehicles, or some other entity or device. The information communication unit preferably uses radio broadcasting to establish a communication channel (voice channel and/or data channel) with wireless carrier system 14 so that voice and/or data transmissions can be sent and received over the channel. By providing both voice and data communications, telematics unit 30 enables the vehicle to provide a variety of different services, including those associated with navigation, telephony, emergency rescue, diagnostics, infotainment, and the like. Data can be sent via a data connection, e.g. via packet data transmission on a data channel, or via a voice channel using techniques known in the art. For a combination service that includes both voice communication (e.g., having a live advisor or voice response unit at the call center 20) and data communication (e.g., providing GPS location data or vehicle diagnostic data to the call center 20), the system may utilize a single call over a voice channel and switch between voice and data transmission over the voice channel as needed, which may be accomplished using techniques known to those skilled in the art. In addition, data (e.g., Packet Data Protocol (PDP)) may be transmitted and received using a short message service SMS; the information communication unit may be configured as a mobile termination and/or origination or as an application termination and/or origination.
The information communication unit 30 utilizes cellular communication in accordance with global system for mobile communication (GSM) or Code Division Multiple Access (CDMA) standards and thus includes a standard cellular chipset 50 for voice communication (e.g., hands-free calling), a wireless modem for data transmission, an electronic processing device 52, one or more digital memory devices 54, and a dual antenna 56. It should be understood that the modem can be implemented in software stored within the information communication unit and executed by processor 52, or it can be a separate hardware component located either internal or external to information communication unit 30. The modem can operate using any number of different standards or protocols, such as EVDO (CDMA20001xEV-DO, EVDO), CDMA, General Packet Radio Service (GPRS), and enhanced data rates for GSM evolution (EDGE). Wireless networking between the vehicle and other networked devices can also be performed using the information communication unit 30. For this purpose, the information communication unit 30 can be configured to wirelessly communicate according to one or more wireless protocols (e.g., any of IEEE 802.11 protocol, Worldwide Interoperability for Microwave Access (WiMAX), or bluetooth). When used for packet-switched data communication such as transmission control protocol/Internet protocol (TCP/IP), the information communication unit can be configured with a static IP address, or can be set to automatically receive an assigned IP address from another device (e.g., a router) on the network or from a network address server.
The processor 52 may be any type of device capable of processing electronic instructions, including a microprocessor, a microcontroller, a main processor, a controller, a vehicle communication processor, and an Application Specific Integrated Circuit (ASIC). It can be a dedicated processor for the information communication unit 30 only or can be shared with other vehicle systems. Processor 52 executes various types of digitally stored instructions, such as software or firmware programs stored in memory 54, which enable the information communication unit to provide a wide variety of services. For example, the processor 52 can execute programs or process data to perform at least a portion of the methods discussed herein.
The information communication unit 30 can be used to provide a diverse range of vehicle services, including wireless communication with other parts of the vehicle. Such services include turn-by-turn directions and other navigation-related services provided in conjunction with the GPS-based vehicle navigation module 40; airbag deployment notification and other emergency or roadside assistance associated services provided in conjunction with one or more crash sensor interface modules, such as a body control module (not shown); diagnostic reporting using one or more diagnostic modules; and infotainment-associated services, in which music, web pages, movies, television programs, video games, and/or other information is downloaded by the infotainment module and stored for current or later playback. The above listed services are by no means an exhaustive list of all capabilities of the information communication unit 30, but merely an enumeration of some of the services that the unit is capable of providing. Further, it should be understood that at least some of the above modules may be implemented in the form of software instructions stored within or external to information communication unit 30, they may be hardware components located within or external to information communication unit 30, or they may be integrated and/or shared with each other or with other systems located throughout the vehicle, to name just a few possibilities. In the operational state, VSMs 42 located outside of telematics unit 30 can utilize vehicle bus 44 to exchange data and commands with telematics unit 30.
The GPS module 40 receives radio signals from GPS satellites 60. From these signals, the GPS module 40 is able to determine the location of the vehicle, which is used to provide navigation and other location-related services to the vehicle driver. The navigation information can be presented on the display 38 (or other display within the vehicle) or can be presented in language, such as is done when providing turn-by-turn navigation. Navigation services can be provided using a navigation module within the dedicated vehicle (which may be part of the GPS module 40), or some or all of the navigation services can be accomplished via the telematics unit 30, where location information is transmitted to a remote location in order to provide a navigation map, map labeling (points of interest, restaurants, etc.), route calculation, etc. for the vehicle. The location information can be provided to call center 20 or other remote computer system, such as computer 18, for other purposes, such as fleet management. And, new or updated map data can be downloaded from the call center 20 to the GPS module 40 via the information communication unit 30.
In addition to the audio system 36 and the GPS module 40, the vehicle 12 can include other vehicle safety modules VSMs 42 in the form of electronic hardware components, the other vehicle safety modules VSMs 42 being located throughout the vehicle, typically receiving input from one or more sensors, and using the sensed input to perform diagnostic, monitoring, control, reporting and/or other functions. Each of the VSMs 42 is preferably connected to other VSMs, also connected to the telematics unit 30, by a communications bus 44, and can be programmed to run vehicle system and subsystem diagnostic tests. For example, one VSM42 can be an Engine Control Module (ECM) that controls various aspects of engine operation (e.g., fuel ignition and ignition timing), another VSM42 can be a powertrain control module that regulates operation of one or more components of a powertrain of the vehicle, and another VSM42 can be a body control module that manages various electrical components located throughout the vehicle, such as power door locks and headlights of the vehicle. According to one embodiment, the engine control module is equipped with an On Board Diagnostics (OBD) feature that provides a large amount of real-time data, such as data received from various sensors, including vehicle emissions sensors, and provides a standardized set of diagnostic trouble codes (DTSs) that allow technicians to quickly identify and repair faults within the vehicle. As will be appreciated by those skilled in the art, the above-mentioned VSMs are merely examples of some of the modules that may be used within the vehicle 12, and many others are possible.
The vehicle electronics 28 also includes a number of vehicle user interfaces that provide a means for vehicle occupants to provide and/or receive information, including a microphone 32, buttons 34, an audio system 36, and a visual display 38. As used herein, the term "vehicle user interface" broadly includes any suitable form of electronic device, including hardware and software components, that is located on the vehicle and enables a vehicle user to communicate with or through components of the vehicle. Microphone 32 provides an audio input to the information communication unit to enable the driver or other occupant to provide voice commands and perform hands-free calling via wireless carrier system 14. For this purpose, it can be connected to an on-board automated sound processing unit, which makes use of Human Machine Interface (HMI) technology known in the art. Buttons 34 allow manual user input to the messaging unit 30 to initiate a wireless telephone call and provide other data, response or control inputs. Separate buttons can be used to initiate emergency calls as well as regular service help calls to call center 20. The audio system 36 provides audio output to the vehicle occupants and can be a dedicated stand-alone system or part of the host vehicle audio system. According to the particular embodiment shown herein, audio system 36 is operably coupled to vehicle bus 44 and entertainment bus 46 and is capable of providing Amplitude Modulation (AM), Frequency Modulation (FM), and satellite radio, Digital Versatile Disc (DVD), and other multimedia functions. This functionality can be provided in conjunction with the infotainment module described above or independently. The visual display 38 is preferably a graphical display, such as a touch screen on the dashboard or a heads-up display that reflects off the windshield, and can be used to provide a variety of input and output functions. Various other vehicle user interfaces can also be utilized, as the interface in FIG. 1 is merely an example of one specific embodiment.
Wireless carrier system 14 is preferably a cellular telephone system that includes a plurality of cell towers 70 (only one shown), one or more Mobile Switching Centers (MSCs) 72, and any other networking components required to connect wireless carrier system 14 with land network 16. Each cell tower 70 includes transmit and receive antennas and a base station, with base stations from different cell towers being connected directly to the MSC 72 or to the MSC 72 via an intermediate device (e.g., a base station controller). Cellular system 14 may implement any suitable communication technology including, for example, analog technology (e.g., analog mobile telephone system (AMPS)) or more recent digital technology (e.g., CDMA2000) or GSM/GPRS). As will be appreciated by those skilled in the art, various cell tower/base station/MSC arrangements are possible and may be used with the wireless system 14. For example, the base station and cell tower can be co-located at the same site, or they can be remotely located from each other, each base station can respond to a single cell tower or a single base station can serve each cell tower, each base station can be coupled to a single MSC, to name just a few of the possible arrangements.
In addition to using wireless carrier system 14, a different wireless carrier system in the form of satellite communication can be used to provide one-way or two-way communication with the vehicle. This can be accomplished using one or more communication satellites 62 and an uplink transmitting station 64. The one-way communication can be, for example, a satellite broadcast service in which program content (news, music, etc.) is received by a transmitting station 64, packaged for upload, and then transmitted to a satellite 62, which satellite 62 broadcasts the program to the users. The two-way communication can be, for example, a satellite telephone service that relays telephone communications between the vehicle 12 and the station 64 using the satellite 62. Such satellite phones, if used, can be used in addition to wireless carrier system 14 or in place of wireless carrier system 14.
Land network 16 may be a conventional land-based radio communication network that connects to one or more landline telephones and connects wireless carrier system 14 to call center 20. For example, land network 16 may include a Public Switched Telephone Network (PSTN), such as a PSTN used to provide wireline telephony, packet-switched data communications, and internet infrastructure. One or more portions of land network 16 can be implemented using a standard wired network, fiber optic or other optical network, cable network, power line, other wireless networks such as Wireless Local Area Networks (WLANs), or networks providing Broadband Wireless Access (BWA), as well as any combination thereof. The land network 16 may also include one or more Short Message Service Centers (SMSCs) for storing, uploading, converting, and/or transmitting Short Messages (SMS) between senders and recipients. For example, the SMSC can receive SMS messages from the call center 20 or a content provider (e.g., an external short message entity or ESME), and the SMSC can transmit the SMS messages to the vehicle 12 (e.g., a mobile terminal device). SMSCs and their functionality are known to the skilled person. In addition, call center 20 need not be connected via land network 16, but may include wireless telephony equipment so that it can communicate directly with a wireless network, such as wireless carrier system 14.
The computer 18 can be one of a plurality of computers accessible via a private or public network (e.g., the internet). Each such computer 18 can be used for one or more purposes, such as a vehicle accessing a web server via the telematics unit 30 and the wireless carrier 14. Other such accessible computers 18 can be, for example, service center computers, wherein diagnostic information and other vehicle data can be uploaded from the vehicle via the information communication unit 30; a client computer used by the vehicle owner or other user for purposes such as accessing or receiving vehicle data, or setting or configuring user parameters, or controlling functions of the vehicle; or a third party library to or from which vehicle data or other information is provided, whether by communication with the vehicle 12 or the call center 20, or both. The computer 18 can also be used to provide internet connectivity, such as a Domain Name Server (DNS) service, or as a network address server that uses a Dynamic Host Configuration Protocol (DHCP) or other suitable protocol to assign IP addresses to the vehicles 12.
The call center 20 is designed to provide a variety of different system back-end functions to the vehicle electronics 28, and according to the exemplary embodiment shown here, the call center 20 generally includes one or more switches 80, servers 82, databases 84, live advisors 86, and automated Voice Response Systems (VRS) 88, all of which are known in the art. These various call center components are preferably coupled to each other via a wired or wireless local area network 90. The switch 80 can be a private branch exchange (PBX) that routes incoming signals so that voice transmissions are typically sent over ordinary telephone to the live advisor 86 or to an automated voice response system 88 using VoIP. The live advisor phone can also use voice over Internet phone (VoIP) as indicated by the dashed line in fig. 1. VoIP and other data communications through the switch 80 are implemented via a modem (not shown) connected between the switch 80 and the network 90. The data transmission is passed via the modem to the server 82 and/or database 84. The database 84 can store account information such as user authentication information, vehicle identifiers, data graph (profile) records, behavior patterns, and other relevant user information. Data transmission may also be performed by wireless systems, such as 802.1lx, GPRS, etc. In addition, Short Message Service (SMS) may be used to send and/or receive data (e.g., PDP); and call center 20 may be configured for mobile termination and/or origination or for application termination and/or origination. While the illustrated embodiment has been described as it would be used with a manned call center 20 using a live advisor 86, it will be understood that the call center may instead use VRS 88 as an automated advisor, or a combination of VRS 88 and the live advisor 86 may be used.
FIG. 2 is a functional block diagram of an example vehicle 12 provided by embodiments of the present invention. The components coupled to the vehicle 12 or included in the vehicle 12 may include a propulsion system 102, a sensor system 104, a control system 106, peripherals 108, a power source 110, a computing device 111, and a user interface 112. Computing device 111 may include a processor 113 and a memory 114. The computing device 111 may be a controller or a portion of a controller of the vehicle 12. The memory 114 may include instructions 115 that the processor 113 may execute and may also store map data 116. The components of the vehicle 12 may be configured to operate in interconnected fashion with each other and/or with other components coupled to the various systems. For example, the power source 110 may provide power to all components of the vehicle 12. The computing device 111 may be configured to receive data from and control the propulsion system 102, the sensor system 104, the control system 106, and the peripherals 108. The computing device 111 may be configured to generate a display of images on the user interface 112 and receive input from the user interface 112.
In other examples, the vehicle 12 may include more, fewer, or different systems, and each system may include more, fewer, or different components. Further, the systems and components shown may be combined or divided in any number of ways.
The propulsion system 102 may be used to power movement of the vehicle 12. As shown, the propulsion system 102 includes an engine/motor 118, an energy source 120, a transmission 122, and wheels/tires 124.
The engine/motor 118 may be or include any combination of an internal combustion engine, an electric motor, a steam engine, a Stirling engine, and the like. Other motors and engines are also possible. In some examples, the propulsion system 102 may include multiple types of engines and/or motors. For example, a hybrid gas-electric vehicle may include a gasoline engine and an electric motor. Other examples are possible.
The energy source 120 may be a source of energy that powers all or a portion of the engine/motor 118. That is, the engine/motor 118 may be used to convert the energy source 120 into mechanical energy. Examples of energy source 120 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power. Energy source(s) 120 may additionally or alternatively include any combination of fuel tanks, batteries, capacitors, and/or flywheels. In some examples, the energy source 120 may also provide energy to other systems of the vehicle 12.
The transmission 122 may be used to transmit mechanical power from the engine/generator 118 to the wheels/tires 124. To this end, the transmission 122 may include a gearbox, a clutch, a differential, a drive shaft, and/or other elements. In examples where the transmission 122 includes a drive shaft, the drive shaft includes one or more shafts for coupling to wheels/tires 124.
The wheels/tires 124 of the vehicle 12 may be configured in a variety of forms including a unicycle, bicycle/motorcycle, tricycle, or sedan/truck four-wheel form. Other wheel/tire forms are also possible, such as those comprising six or more wheels. The wheels/tires 124 of the vehicle 12 may be configured to rotate differentially with respect to the other wheels/tires 124. In some examples, the wheels/tires 124 may include at least one wheel fixedly attached to the transmission 122 and at least one tire coupled to an edge of the wheel in contact with the driving surface. The wheel/tire 124 may include any combination of metal and rubber, or other material combinations.
The propulsion system 102 may additionally or alternatively include components other than those shown.
The sensor system 104 may include a number of sensors for sensing information about the environment in which the vehicle 12 is located. As shown, the sensors of the sensor system include a GPS 126, an Inertial Measurement Unit (IMU) 128, a radio detection and ranging (RADAR) unit 130, a light detection and ranging (LIDAR) unit 132, a camera 134, and an actuator 136 for modifying the position and/or orientation of the sensors. The sensor system 104 may also include additional sensors, including, for example, sensors that monitor internal systems of the vehicle 12 (e.g., an O2 monitor, fuel gauge, oil temperature, etc.). The sensor system 104 may also include other sensors.
The GPS module 126 may be any sensor for estimating the geographic location of the vehicle 12. To this end, the GPS module 126 may include a transceiver to estimate the position of the vehicle 12 relative to the Earth based on satellite positioning data. In an example, the computing device 111 may be used to estimate the location of lane boundaries on a road on which the vehicle 12 may travel using the GPS module 126 in conjunction with the map data 116. The GPS module 126 may take other forms as well.
The IMU 128 may be a sensor for sensing position and orientation changes of the vehicle 12 based on inertial acceleration, and any combination thereof. In some examples, the combination of sensors may include, for example, an accelerometer and a gyroscope. Other combinations of sensors are also possible.
The RADAR unit 130 may be regarded as an object detection system for detecting characteristics of an object, such as a distance, height, direction, or speed of the object, using radio waves. The RADAR unit 130 may be configured to transmit radio waves or microwave pulses that may bounce off any object in the path of the waves. The object may return a portion of the energy of the wave to a receiver (e.g., a dish or antenna), which may also be part of RADAR unit 130. The RADAR unit 130 may also be configured to perform digital signal processing on the received signal (bouncing off the object) and may be configured to identify the object.
Other systems similar to RADAR have been used on other parts of the electromagnetic spectrum. One example is LIDAR (light detection and ranging), which may use visible light from a laser, rather than radio waves.
The LIDAR unit 132 includes a sensor that uses light sensing or detects objects in the environment in which the vehicle 12 is located. In general, LIDAR is an optical remote sensing technology that can measure the distance to a target or other properties of a target by illuminating the target with light. As an example, the LIDAR unit 132 may include a laser source and/or a laser scanner configured to emit laser pulses, and a detector for receiving reflections of the laser pulses. For example, the LIDAR unit 132 may include a laser range finder that is reflected by a turning mirror and scans the laser in one or two dimensions around the digitized scene to acquire distance measurements at specified angular intervals. In an example, the LIDAR unit 132 may include components such as a light (e.g., laser) source, a scanner and optics system, a light detector and receiver electronics, and a position and navigation system.
In an example, the LIDAR unit 132 may be configured to image objects using Ultraviolet (UV), visible, or infrared light, and may be used for a wide range of targets, including non-metallic objects. In one example, a narrow laser beam may be used to map physical features of an object at high resolution.
In an example, wavelengths in the range from about 10 micrometers (infrared) to about 250 nanometers (UV) may be used. Light is typically reflected via backscattering. Different types of scattering are used for different LIDAR applications, such as rayleigh scattering, mie scattering and raman scattering, and fluorescence. Based on different kinds of back scattering, the LIDAR may thus be referred to as rayleigh laser RADAR, mie LIDAR, raman LIDAR and sodium/iron/potassium fluorescence LIDAR, as examples. A suitable combination of wavelengths may allow remote mapping of objects, for example by looking for wavelength dependent changes in the intensity of the reflected signal.
Three-dimensional (3D) imaging can be achieved using both scanning and non-scanning LIDAR systems. A "3D gated viewing laser RADAR" is an example of a non-scanning laser ranging system that employs a pulsed laser and a fast gated camera. Imaging LIDAR may also be performed using high-speed detector arrays and modulation-sensitive detector arrays that are typically built on a single chip using Complementary Metal Oxide Semiconductor (CMOS) and hybrid complementary metal oxide semiconductor/Charge Coupled Device (CCD) fabrication techniques. In these devices, each pixel can be processed locally by demodulation or gating at high speed so that the array can be processed to represent an image from the camera. Using this technique, thousands of pixels may be acquired simultaneously to create a 3D point cloud representing an object or scene detected by the LIDAR unit 132.
The point cloud may include a set of vertices in a 3D coordinate system. These vertices may be defined by, for example, X, Y, Z coordinates, and may represent the outer surface of the object. The LIDAR unit 132 may be configured to create a point cloud by measuring a large number of points on the surface of the object, and may output the point cloud as a data file. As a result of the 3D scanning process of the object by the LIDAR unit 132, the point cloud may be used to identify and visualize the object.
In one example, the point cloud may be directly rendered to visualize the object. In another example, the point cloud may be converted to a polygonal or triangular mesh model by a process that may be referred to as surface reconstruction. Example techniques for converting a point cloud to a 3D surface may include delaunay triangulation, alpha shapes, and rolling spheres. These techniques include building a network of triangles on existing vertices of a point cloud. Other example techniques may include converting the point cloud to a volumetric distance field, and reconstructing the thus defined implicit surface by a marching cubes algorithm.
The camera 134 may be any camera (e.g., still camera, video camera, etc.) that acquires images of the environment in which the vehicle 12 is located. To this end, the camera may be configured to detect visible light, or may be configured to detect light from other parts of the spectrum (such as infrared or ultraviolet light). Other types of cameras are also possible. The camera 134 may be a two-dimensional detector, or may have a three-dimensional spatial extent. In some examples, the camera 134 may be, for example, a distance detector configured to generate a two-dimensional image indicative of distances from the camera 134 to several points in the environment. To this end, the camera 134 may use one or more distance detection techniques. For example, the camera 134 may be configured to use structured light technology, wherein the vehicle 12 illuminates objects in the environment with a predetermined light pattern, such as a grid or checkerboard pattern, and uses the camera 134 to detect reflections of the predetermined light pattern from the objects. Based on the distortion in the reflected light pattern, the vehicle 12 may be configured to detect the distance of a point on the object. The predetermined light pattern may include infrared light or other wavelengths of light.
The actuator 136 may be configured to modify the position and/or orientation of the sensor, for example. The sensor system 104 may additionally or alternatively include components other than those shown.
The control system 106 may be configured to control operation of the vehicle 12 and its components. To this end, the control system 106 may include a steering unit 138, a throttle 140, a braking unit 142, a sensor fusion algorithm 144, a computer vision system 146, a navigation or routing control (routing) system 148, and an obstacle avoidance system 150.
The steering unit 138 may be any combination of mechanisms configured to adjust the heading or direction of the vehicle 12.
The throttle 140 may be any combination of mechanisms configured to control the operating speed and acceleration of the engine/motor 118 and, in turn, the speed and acceleration of the vehicle 12.
The brake unit 142 may be any combination of mechanisms configured to decelerate the vehicle 12. For example, the brake unit 142 may use friction to slow the wheel/tire 124. As another example, the brake unit 142 may be configured to be regenerative and convert kinetic energy of the wheel/tire 124 into electrical current. The brake unit 142 may also take other forms.
The sensor fusion algorithm 144 may comprise, for example, an algorithm (or a computer program product storing an algorithm) executable by the computing device 111. The sensor fusion algorithm 144 may be configured to accept data from the sensors 104 as input. The data may include, for example, data representing information sensed at sensors of the sensor system 104. The sensor fusion algorithm 144 may include, for example, a kalman filter, a bayesian network, or another algorithm. The sensor fusion algorithm 144 may also be configured to provide various evaluations based on data from the sensor system 104, including, for example, an evaluation of individual objects and/or features in the environment in which the vehicle 12 is located, an evaluation of a specific situation, and/or an evaluation based on the likely impact of a particular situation. Other evaluations are also possible.
The computer vision system 146 may be any system configured to process and analyze images captured by the camera 134 in order to identify objects and/or features in the environment in which the vehicle 12 is located, including, for example, lane information, traffic signals, and obstacles. To this end, the computer vision system 146 may use object recognition algorithms, Structure From Motion (SFM) algorithms, video tracking, or other computer vision techniques. In some examples, the computer vision system 146 may additionally be configured to map the environment, follow objects, estimate the speed of objects, and so forth.
The navigation and route control system 148 may be any system configured to determine a driving route of the vehicle 12. The navigation and route control system 148 may additionally be configured to dynamically update the driving route while the vehicle 12 is in operation. In some examples, the navigation and route control system 148 may be configured to combine data from the sensor fusion algorithm 144, the GPS module 126, and one or more predetermined maps to determine a driving route for the vehicle 12.
The obstacle avoidance system 150 may be any system configured to identify, evaluate, and avoid or otherwise negotiate obstacles in the environment in which the vehicle 12 is located.
The control system 106 may additionally or alternatively include components other than those shown.
The peripheral devices 108 may be configured to allow the vehicle 12 to interact with external sensors, other vehicles, and/or users. To this end, the peripheral devices 108 may include, for example, a wireless communication system 152, a touch screen 154, a microphone 156, and/or a speaker 158.
The wireless communication system 152 may be any system configured to wirelessly couple to one or more other vehicles, sensors, or other entities, either directly or via a communication network. To this end, the wireless communication system 152 may include an antenna and chipset for communicating with other vehicles, sensors, or other entities, either directly or over an air interface. The chipset, or the entire wireless communication system 152, may be arranged to communicate in accordance with one or more other types of wireless communications (e.g., protocols), such as bluetooth, communication protocols described in IEEE 802.11 (including any IEEE 802.11 revisions), cellular technologies (such as GSM, CDMA, Universal Mobile Telecommunications System (UMTS), EV-DO, WiMAX, or Long Term Evolution (LTE)), zigbee, Dedicated Short Range Communications (DSRC), and Radio Frequency Identification (RFID) communications, among others. The wireless communication system 152 may take other forms as well.
The touch screen 154 may be used by a user to input commands to the vehicle 12. To this end, the touch screen 154 may be configured to sense at least one of a position and a movement of a user's finger via capacitive sensing, resistive sensing, or a surface acoustic wave process, among others. The touch screen 154 may be capable of sensing finger movement in a direction parallel to or in the same plane as the touch screen surface, in a direction perpendicular to the touch screen surface, or both, and may also be capable of sensing a level of pressure applied to the touch screen surface. The touch screen 154 may be formed of one or more translucent or transparent insulating layers and one or more translucent or transparent conductive layers. The touch screen 154 may take other forms as well.
The microphone 156 may be configured to receive audio (e.g., voice commands or other audio input) from a user of the vehicle 12. Similarly, the speaker 158 may be configured to output audio to a user of the vehicle 12.
Peripheral devices 108 may additionally or alternatively include components other than those shown.
The power supply 110 may be configured to provide power to some or all of the components of the vehicle 12. To this end, the power source 110 may include, for example, a rechargeable lithium ion or lead acid battery. In some examples, one or more battery packs may be configured to provide power. Other power supply materials and configurations are also possible. In some examples, the power source 110 and the energy source 120 may be implemented together, as in some all-electric vehicles.
The processor 113 included in the computing device 111 may include one or more general purpose processors and/or one or more special purpose processors (e.g., image processors, digital signal processors, etc.). To the extent that the processor 113 includes more than one processor, such processors may operate alone or in combination. The computing device 111 may implement functionality to control the vehicle 12 based on inputs received through the user interface 112.
The memory 114, in turn, may include one or more volatile memory components and/or one or more non-volatile memory components, such as optical, magnetic, and/or organic memory devices, and the memory 114 may be integrated in whole or in part with the processor 113. The memory 114 may contain instructions 115 (e.g., program logic) executable by the processor 113 to perform various vehicle functions, including any of the functions or methods described herein.
The components of the vehicle 12 may be configured to operate in an interconnected manner with other components within and/or outside of their respective systems. To this end, the components and systems of the vehicle 12 may be communicatively linked together via a system bus, network, and/or other connection mechanism.
Fig. 3 is a schematic structural diagram of a vehicle-mounted terminal according to an embodiment of the present invention. The terminal 300 (for example, an in-vehicle head unit) includes a processor 301, a memory 302, a camera 303, an RF circuit 304, an audio circuit 305, a speaker 306, a microphone 307, an input device 308, another input device 309, a display screen 310, a touch panel 311, a display panel 312, an output device 313, and a power supply 314. The display screen 310 includes at least the touch panel 311 as an input device and the display panel 312 as an output device. It should be noted that the structure shown in fig. 3 does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components, and this is not limited herein.
The various components of the terminal 300 will now be described in detail with reference to fig. 3:
A Radio Frequency (RF) circuit 304 may be used for receiving and transmitting signals during information transmission and reception or during a call. For example, if the terminal 300 is a vehicle-mounted device, the terminal 300 may receive, through the RF circuit 304, downlink information transmitted by a base station and then deliver it to the processor 301 for processing; in addition, uplink data is transmitted to the base station. Typically, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 304 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short message service (SMS), and the like.
The memory 302 may be used to store software programs and modules, and the processor 301 executes various functional applications and data processing of the terminal 300 by running the software programs and modules stored in the memory 302. The memory 302 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (e.g., a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (e.g., audio data, video data, etc.) created according to the use of the terminal 300, and the like. Further, the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
Other input devices 309 may be used to receive entered numeric or character information and generate key signal inputs relating to user settings and function control of terminal 300. In particular, other input devices 309 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, a light mouse (a light mouse is a touch-sensitive surface that does not display visual output, or is an extension of a touch-sensitive surface formed by a touch screen), and the like. Other input devices 309 may also include sensors built into terminal 300, such as gravity sensors, acceleration sensors, etc., and terminal 300 may also use parameters detected by the sensors as input data.
The display screen 310 may be used to display information input by or provided to the user and various menus of the terminal 300, and may also accept user input. The display panel 312 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. The touch panel 311, also referred to as a touch screen or touch-sensitive screen, may collect contact or non-contact operations of the user on or near it (for example, operations performed by the user on or near the touch panel 311 using any suitable object or accessory such as a finger or a stylus, which may also include body-sensing operations; the operations include single-point control operations, multi-point control operations, and the like) and drive the corresponding connection device according to a preset program. It should be noted that the touch panel 311 may further include two parts, namely a touch detection device and a touch controller. The touch detection device detects the touch direction and gesture of the user, detects signals brought by the touch operation, and transmits the signals to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into information that the processor 301 can process, transmits the information to the processor 301, and receives and executes commands sent by the processor 301. In addition, the touch panel 311 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave types, and any technology developed in the future may also be used to implement the touch panel 311. In general, the touch panel 311 covers the display panel 312; the user can operate on or near the touch panel 311 covering the display panel 312 according to the content displayed on the display panel 312 (the displayed content includes, but is not limited to, a soft keyboard, a virtual mouse, virtual keys, icons, etc.); the touch panel 311 detects the operation on or near it and transmits it to the processor 301 to determine the user input, and the processor 301 then provides a corresponding visual output on the display panel 312 according to the user input. Although in fig. 3 the touch panel 311 and the display panel 312 are two separate components to implement the input and output functions of the terminal 300, in some embodiments the touch panel 311 and the display panel 312 may be integrated to implement the input and output functions of the terminal 300.
The audio circuit 305, the speaker 306, and the microphone 307 may provide an audio interface between a user and the terminal 300. The audio circuit 305 may transmit a signal converted from the received audio data to the speaker 306, where it is converted into a sound signal and output; conversely, the microphone 307 may convert collected sound signals into electrical signals, which are received by the audio circuit 305 and converted into audio data, and the audio data are then output to the RF circuit 304 to be transmitted to another device such as another terminal, or output to the memory 302 for further processing by the processor 301 in conjunction with the content stored in the memory 302. In addition, the camera 303 may capture image frames in real time and transmit them to the processor 301 for processing, and the processed results may be stored in the memory 302 and/or presented to the user via the display panel 312.
The processor 301 is a control center of the terminal 300, connects various parts of the entire terminal 300 using various interfaces and lines, performs various functions of the terminal 300 and processes data by running or executing software programs and/or modules stored in the memory 302 and calling data stored in the memory 302, thereby monitoring the terminal 300 as a whole. It is noted that processor 301 may include one or more processing units; the processor 301 may also integrate an application processor, which mainly handles operating systems, User Interfaces (UIs), application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 301.
The terminal 300 may further include a power supply 314 (e.g., a battery) for supplying power to various components, and in embodiments of the present invention, the power supply 314 may be logically connected to the processor 301 through a power management system, so as to manage charging, discharging, and power consumption functions through the power management system.
In addition, there are also components not shown in fig. 3, for example, the terminal 300 may further include a bluetooth module, a sensor, and the like, which are not described herein again.
As shown in fig. 4, the audio circuit 305 may specifically include a digital signal processing (DSP) and codec module 401, where the codec sub-module implements analog-to-digital/digital-to-analog (AD/DA) conversion and the DSP sub-module implements the processing of the speech algorithm.
Taking the terminal as the vehicle-mounted device as an example, as shown in fig. 5, the start-up process of the vehicle-mounted device is as follows:
s401, starting the car machine, and starting to collect surrounding sound. In this step, the car machine collects ambient sound through the microphone 307 in fig. 3.
And S402, the car machine judges the number of people in the car by means of a voiceprint recognition technique. Voiceprint recognition, one of the biometric identification techniques, is also called speaker recognition and has two categories, namely speaker identification and speaker verification; what is referred to here is speaker verification. Voiceprint recognition converts acoustic signals into electrical signals, which are then recognized by a computer. In this context, how many people are in the vehicle can be identified by determining, through voiceprint recognition, how many different voiceprints are present in the vehicle. A voiceprint is a spectrum of sound waves carrying verbal information displayed by an electro-acoustic instrument. The production of human speech is a complex physiological and physical process between the human language centre and the vocal organs, and the voiceprint maps of any two people differ, because the vocal organs used in speaking (the tongue, teeth, larynx, lungs, and nasal cavity) vary greatly in size and shape from person to person. The speech acoustic characteristics of each person are relatively stable but also variable; they are not absolute or invariant. The variation can come from physiology, pathology, psychology, imitation, or disguise, and is also related to environmental interference. Nevertheless, since each person's vocal organs are different, people can in general still be distinguished by their voices, or it can be judged whether two voices come from the same person. Voiceprint recognition is one way of identification; the number of people may also be identified by, for example, iris information, face information, fingerprint information, infrared, sensing data, a voice query, or a combination thereof, as the following examples describe.
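Before turning to those alternatives, the voiceprint-based counting itself can be sketched. This is only an illustration, not the patented implementation: the speaker-embedding model, the cosine-similarity measure, and the threshold value are all assumptions, and the sketch simply counts how many mutually dissimilar voiceprints have been heard.

```python
import numpy as np

def count_speakers(embeddings, threshold=0.75):
    """Estimate how many distinct speakers the collected utterances come from.

    embeddings: list of 1-D numpy arrays, one speaker embedding per utterance
                (in practice produced by a speaker-recognition model).
    threshold:  cosine similarity above which two utterances are treated as
                the same speaker; the value is an illustrative assumption.
    """
    centroids = []                                # one entry per speaker found so far
    for e in embeddings:
        e = e / np.linalg.norm(e)
        for i, c in enumerate(centroids):
            if float(np.dot(e, c)) >= threshold:  # same voiceprint as a known speaker
                merged = c + e
                centroids[i] = merged / np.linalg.norm(merged)
                break
        else:                                     # no centroid matched: a new speaker
            centroids.append(e)
    return len(centroids)
```

With such a count, the device would report one person when every collected utterance maps to a single centroid, and multiple people otherwise.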
For example, an iris-recognition image can be obtained by shooting with a camera on the car machine, and the car machine determines whether multiple people are present nearby by judging whether different irises appear in the image.
For another example, a face-recognition image can be obtained by shooting with a camera on the car machine, and the car machine determines whether multiple people are present nearby by judging whether different facial features appear in the image.
For another example, a fingerprint recognition device may be provided on a door handle of the vehicle, and whether multiple people are present in the vehicle can be determined from whether different fingerprints are detected.
For example, the number of people in the vehicle can be determined by an infrared sensor device on the vehicle.
For another example, the number of people in the vehicle can be determined from sensing data, for instance by detecting how many seats are occupied through pressure sensors in the vehicle seats.
For another example, the number of people in the vehicle can be determined by a voice query. Specifically, the car machine can ask how many people are in the car and determine the number from the answer given by the driver or a passenger.
The above ways of identifying the number of people in the vehicle can also be used in combination. For example, if voiceprint recognition identifies only one person but another method identifies multiple people, the car machine treats the result as multiple people, as sketched below.
S403, if the car machine judges that there is only one person in the vehicle (namely only the driver), the process proceeds to step S404; otherwise, it proceeds to step S405.
S404, the car machine enters the wake-up-free interaction mode. The wake-up-free interaction mode here refers to an interaction mode that does not require a wake word. For example, the driver may simply say "navigate home" to the car machine, and the car machine then performs the navigate-home operation in response to this wake-up-free command.
S405, the car machine enters the question-answer wake-up interaction mode. The wake-up question-answer interaction mode here refers to human-machine interaction that requires a wake word. For example, a driver or passenger in the car may first wake up the car machine by saying "hello, little drive". The car machine then responds to the wake-up command with a reply such as "What can I help you with?". After that, the driver or passenger may say "please navigate home", and the car machine responds to this instruction by performing the navigate-home operation. As another example, a driver or passenger may speak a sentence that directly includes the wake word, such as "little drive, navigate me to the airport", where "little drive" is the wake word; the car machine then responds to this command by performing the operation of navigating to the airport.
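Taken together, steps S403-S405 amount to selecting the interaction mode from the person count. The sketch below is illustrative only; the mode constants and the simple startswith check stand in for whatever mode switch and wake-word detector the car machine actually uses.

```python
WAKE_FREE = "wake-free interaction mode"      # no wake word required (S404)
WAKE_WORD = "question-answer wake-up mode"    # wake word required (S405)

def select_mode(person_count):
    """S403: one person in the car -> wake-free mode, otherwise wake-word mode."""
    return WAKE_FREE if person_count == 1 else WAKE_WORD

def handle_utterance(mode, utterance, wake_word="hello, little drive"):
    """Decide whether an utterance should be acted on in the current mode."""
    if mode == WAKE_FREE:
        return True                                   # every utterance is a command
    return utterance.lower().startswith(wake_word)    # must begin with the wake word
```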
According to the embodiment of the invention, the vehicle machine judges the number of people in the vehicle to decide whether to start the wake-up-free mode, so that the interaction times of the user in the vehicle through the wake-up word and the vehicle machine can be reduced, and the user experience is improved.
How the number of people in the car is judged by the car machine can adopt the following mode. Reference may be made in particular to the method flowchart shown in fig. 6.
S501, user A speaks a first voice message. The car machine collects the voice of user A and marks the voiceprint characteristics of user A. The first voice message is the first voice received after the voice device has just started voice recognition.
S502, the car machine judges whether the first voice message is a specific instruction of the car machine, that is, whether it is an instruction strongly related to the functions of the car machine. Specifically, a database local to the car machine or in the cloud may store some specific instructions in advance; when the car machine receives a voice, it can judge whether the voice completely matches an instruction stored in the database or whether the matching degree exceeds a certain threshold. If the voice matches completely or the matching degree is high, the voice belongs to the (predefined) strongly device-related instructions and is a specific instruction. For example, the car machine judges whether the instruction words "immediately" and "navigate" are both present in the first voice message; if they are, the car machine judges that the first voice message is a specific instruction of the car machine. If the car machine judges that only the word "navigate" is present, it judges that the first voice message is not a specific instruction of the car machine. If the car machine judges that the first voice message is a device-specific instruction, S503 is executed; if not, S504 is executed. For instance, "navigate to a certain place immediately" is a strongly device-related instruction and is very likely addressed to the device, whereas "did you sleep well last night" is not a strongly related instruction and is more likely addressed to another person. A sketch of this check follows.
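The sketch below hedges heavily: the stored templates and the word-overlap score are stand-ins for the instruction database and the matching-degree computation described above, not the actual rules used by the device or the cloud.

```python
# Pre-stored device-specific instruction templates (illustrative entries only).
SPECIFIC_INSTRUCTIONS = [
    "navigate immediately",
    "navigate home",
    "play music",
    "open the sunroof",
]

def is_specific_instruction(utterance, threshold=0.6):
    """S502: decide whether an utterance is a device-specific instruction.

    Returns True on an exact match, or when the word overlap with any stored
    template exceeds the threshold; the scoring rule is a stand-in for the
    matching-degree computation described above.
    """
    words = set(utterance.lower().split())
    for template in SPECIFIC_INSTRUCTIONS:
        t_words = set(template.split())
        if words == t_words:
            return True
        if len(words & t_words) / len(t_words) >= threshold:
            return True
    return False

print(is_specific_instruction("navigate home immediately"))       # True: strongly device-related
print(is_specific_instruction("did you sleep well last night"))   # False: likely addressed to a person
```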
And S503, when the car machine judges that the first voice message is a device-specific instruction, the car machine makes the corresponding voice response. S502 and S503 are optional.
S504, if the car machine judges that the first voice message is not a device-specific instruction, the car machine displays the instruction content on the display screen. This step is optional. When the embodiment does not include S504 and the car machine determines that the first voice message is not a specific instruction, S505 is executed.
If S502 and S503 are not executed, S504 is not executed.
If S502 and S503 are executed, S504 may or may not be executed.
And S505, the car machine delays for X seconds. Specifically, X may be 3 seconds, 4 seconds, or 5 seconds. The purpose of the delay is to determine whether a response from another user in the vehicle is received within a first time period after the first voice message is received, where such a response has voiceprint information different from that of the first voice message.
And S506, the car machine judges, by using voiceprint technology, whether another user in the car answers. If no other user answers, the process proceeds to S507; otherwise, it proceeds to S508.
And S507, if no other user in the car answers, the car machine feeds back a voice response to user A and records that there is only 1 person in the car. The voice response fed back to user A may be, for example, "Good, I will perform the XX operation for you".
And S508, if a user other than user A answers in the vehicle, the car machine abandons the voice response and records that there are multiple people in the vehicle. That is, if a voice response of a second user B is collected by the voiceprint recognition technique during the delay, the car machine abandons its voice response and marks that there are multiple people in the vehicle. A sketch of steps S505-S508 follows.
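A compact sketch of steps S505-S508 is given below, assuming hypothetical helpers `capture_utterance()` (returns the next utterance heard, or None) and `same_voiceprint()` (compares voiceprint features); neither is an API defined in this application.

```python
import time

def count_people_by_dialogue(first_voice, capture_utterance, same_voiceprint,
                             delay_seconds=3):
    """S505-S508: wait briefly after user A's non-specific utterance.

    first_voice:        the utterance from user A (with its voiceprint features)
    capture_utterance:  callable returning the next utterance heard, or None
    same_voiceprint:    callable(a, b) -> True if both utterances share a voiceprint
    Returns the recorded person count: 1 if nobody else answers within the
    delay window, otherwise more than 1 (recorded here simply as 2).
    """
    deadline = time.monotonic() + delay_seconds
    while time.monotonic() < deadline:
        reply = capture_utterance()
        if reply is not None and not same_voiceprint(first_voice, reply):
            return 2          # S508: someone else answered -> multiple people, abandon response
        time.sleep(0.1)
    return 1                  # S507: no other voiceprint heard -> one person, respond to user A
```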
According to this embodiment of the invention, the car machine determines whether there are multiple people in the vehicle and records the number of people, so that it can decide whether a wake word is required according to that number, which improves the efficiency of interaction between the car machine and its users.
The space where the intelligent voice device (for example, a car machine) is located may be a closed space, a semi-closed space, or an open space. A closed space may be a space enclosed by a continuous closed curved surface. A semi-closed space may be a space bounded by a surface that is not fully closed, for example the space formed by a room whose door is open. An open space may be an open area, that is, a space not bounded by any surface. The voice device may specifically be an intelligent voice device, namely a device that implements voice input (receiving external voice and converting it into an electrical signal), voice recognition, and the function required by the voice.
Specifically, the semi-closed space or the open space may be a spherical space with a radius equal to a communication distance of the smart voice device.
And if the radius of the closed space is smaller than or equal to the communication distance of the intelligent voice equipment, the space where the intelligent voice equipment is located is the closed space. Here, the radius of the closed space may refer to a distance of half of the longest side of the closed space.
As shown in fig. 13, the automobile is a closed space. The radius of the automobile is half of the length of the automobile. When the communication distance of the intelligent voice device in the car is the radius of the sphere C1, the radius of the car is also equal to the radius of the sphere C1, and the space where the intelligent voice device is located is the closed space.
When the communication distance of the smart voice device in the vehicle is the radius of the sphere C2, since the radius of the sphere C2 is larger than the radius of the sphere C1 (i.e., the radius of the vehicle), the space in which the smart voice device is located is a closed space surrounded by the vehicle body, not a spherical space having the communication distance of the smart voice device (the radius of C2) as the radius.
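The relationship between the enclosed-space radius and the device's communication distance can be written down directly. In this sketch the "radius" is half of the longest side, as defined above; the fallback for an enclosure larger than the communication range (returning the spherical space) is an assumption not spelled out in the text.

```python
def effective_space(enclosure_longest_side, communication_distance, is_enclosed):
    """Determine which space the voice device should treat as 'its' space.

    enclosure_longest_side: longest dimension of the enclosing body (e.g. car length), metres
    communication_distance: pickup/communication range of the voice device, metres
    is_enclosed: True if the device sits inside a closed surface (e.g. a car body)
    """
    if is_enclosed:
        enclosure_radius = enclosure_longest_side / 2
        if enclosure_radius <= communication_distance:
            return "closed space bounded by the enclosure"
    # Semi-closed or open space: use the sphere whose radius is the communication distance.
    return "spherical space with the communication distance as radius"

# The C2 case of fig. 13: the car radius (2.4 m) is smaller than the communication distance.
print(effective_space(enclosure_longest_side=4.8, communication_distance=5.0, is_enclosed=True))
```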
The vehicle-mounted equipment has multiple types, and specifically can comprise a vehicle-mounted intelligent rearview mirror, a vehicle-mounted intelligent sound box and a vehicle-mounted intelligent television. The following examples are provided to explain how to use the wake-up word in embodiments of the present invention.
As shown in fig. 7, the vehicle-mounted intelligent rearview mirror is turned on to start collecting ambient sound. In this step, the vehicle-mounted smart rearview mirror collects ambient sound through the microphone 307 in fig. 3.
The vehicle-mounted intelligent rearview mirror judges the number of people in the vehicle through a voiceprint recognition technique. Voiceprint recognition, one of the biometric identification techniques, is also called speaker recognition and has two categories, namely speaker identification and speaker verification. Different tasks and applications may use different voiceprint recognition techniques; for example, identification techniques may be needed to narrow the range of suspects in a criminal investigation, while verification techniques are needed for banking transactions. Voiceprint recognition converts acoustic signals into electrical signals, which are then recognized by a computer. In particular, here, how many people are in the car can be identified and confirmed by the voiceprint recognition technique.
In this example, the vehicle-mounted intelligent rearview mirror judges that there is only one person in the vehicle (i.e., only the driver), so it enters the wake-up-free interaction mode. The wake-up-free interaction mode here refers to an interaction mode that does not require a wake word. For example, the driver may say "navigate to the airport" to the vehicle-mounted smart rearview mirror, and the rearview mirror then performs the operation of navigating to the airport in response to this wake-up-free command. Optionally, the vehicle-mounted intelligent rearview mirror may respond to the driver's instruction by announcing "starting to navigate you to the airport".
As shown in fig. 8, if the vehicle-mounted intelligent rearview mirror determines that there are multiple people in the vehicle, it enters the question-answer wake-up interaction mode. The wake-up question-answer interaction mode here refers to human-machine interaction that requires a wake word. For example, a driver or passenger in the vehicle may first wake up the vehicle-mounted smart rearview mirror by saying "hello, little drive". The rearview mirror then responds to the wake-up command with a reply such as "What can I help you with?". After that, the driver or passenger may say "I want to go to the airport", and the vehicle-mounted intelligent rearview mirror responds to this instruction by performing the operation of navigating to the airport. Optionally, the vehicle-mounted intelligent rearview mirror may respond to the instruction by announcing "starting to navigate you to the airport".
Fig. 9 is a schematic view of the interior structure of an automobile. Specifically, the vehicle-mounted intelligent speaker in the figure is integrated in the automobile's central control panel. In fig. 9, the vehicle-mounted smart speaker differs from an existing speaker in that it includes a sound pickup device (e.g., a microphone). The driver or a passenger in the automobile can carry out voice communication with the vehicle-mounted intelligent speaker through the microphone and loudspeaker on it. The vehicle-mounted intelligent speaker judges the number of people in the vehicle through a voiceprint recognition technique.
After the vehicle-mounted intelligent speaker judges whether there is one person or multiple people in the vehicle, the decision of whether to use the wake word can follow the processing of the vehicle-mounted intelligent rearview mirror described above, and is not repeated here.
Fig. 10 is a schematic view of the interior structure of an automobile. Specifically, the vehicle-mounted smart television is integrated in the roof of the automobile. In fig. 10, the vehicle-mounted smart television differs from an existing television in that it includes a sound pickup device (e.g., a microphone). The driver or a passenger in the automobile can perform voice communication with the vehicle-mounted smart television through the microphone and loudspeaker on it. The vehicle-mounted intelligent television judges the number of people in the vehicle through a voiceprint recognition technique.
After the vehicle-mounted intelligent television judges whether there is one person or multiple people in the vehicle, the decision of whether and how to use the wake word can follow the processing of the vehicle-mounted intelligent rearview mirror described above, and is not repeated here.
The vehicle-mounted smart television and vehicle-mounted smart speaker described above can also be used in the home, corresponding to a home smart speaker and a home smart television.
Fig. 11 is a schematic diagram of a home smart speaker. In fig. 11, the home smart speaker differs from an existing speaker in that it includes a sound pickup device (e.g., a microphone). People at home can communicate by voice with the home smart speaker through the microphone and loudspeaker on it. The home smart speaker judges the number of people at home through a voiceprint recognition technique.
After the home smart speaker determines whether there is one person or multiple people at home, the decision of whether to use the wake word can follow the processing of the vehicle-mounted smart rearview mirror in the above embodiment, and is not repeated here.
Fig. 12 is a schematic diagram of a home smart television. In fig. 12, the home smart television differs from an existing television in that it includes a sound pickup device (e.g., a microphone). People at home can perform voice communication with the home smart television through the microphone and loudspeaker on it. The home smart television judges the number of people at home through a voiceprint recognition technique.
After the home smart television judges whether there is one person or multiple people at home, the decision of whether and how to use the wake word can follow the processing of the vehicle-mounted smart rearview mirror. For example, when two people are determined to be at home in fig. 12, the home smart television enters the wake-up question-answer interaction mode, that is, human-machine interaction that requires a wake word. For example, the mother or daughter may first wake up the home smart television by saying "hello, television". The home smart television may then respond with a reply such as "What can I help you with?". After that, the mother or daughter may say "I want to watch a live broadcast", and the home smart television responds to this instruction by performing the operation of playing live television. Optionally, the home smart television may respond to the mother's or daughter's instruction by announcing "starting to select a live channel for you".
The embodiment of the present invention further discloses an intelligent voice device; referring specifically to fig. 3, the intelligent voice device may include: the processor 301, configured to judge the number of people in the space where the intelligent voice device is located, and, when it is judged that there is one person in the space, to control the intelligent voice device to enter the wake-up-free voice interaction mode. The wake-up-free voice interaction mode is a voice interaction mode that does not use a wake word. For example, the user may say "navigate to place XX", and the voice device (e.g., the intelligent voice device) directly performs the operation of navigating to place XX without first replying "What can I help you with?".
The intelligent voice device further comprises:
a collector (e.g., a microphone 307) for collecting a first voice in the space;
the processor 301 is configured to determine the number of people in the space, and includes:
the processor is used for judging whether the first voice is a specific instruction of the intelligent voice device; delaying for X seconds if the first voice is not a specific instruction of the intelligent voice device; judging whether there is a second voice with voiceprint characteristics different from those of the first voice; and, when it is judged that such a second voice exists, determining that there are multiple people in the space;
wherein X is a positive number greater than zero.
The intelligent voice device further comprises:
a collector (e.g., a microphone 307) for collecting a first voice in the space;
the processor 301 is configured to determine the number of people in the enclosed space, and includes:
the processor is used for judging whether the first voice is a specific instruction of the intelligent voice device; delaying for X seconds if the first voice is not a specific instruction of the intelligent voice device; judging whether there is a second voice with voiceprint characteristics different from those of the first voice; and, when it is judged that no such second voice exists, determining that there is only one person in the closed space;
wherein X is a positive number greater than zero.
The space where the intelligent voice equipment is located is a closed space, a semi-closed space or an open space.
The semi-closed space or the open space is a spherical space with the communication distance of the intelligent voice device as a radius.
And if the radius of the closed space is smaller than or equal to the communication distance of the intelligent voice equipment, the space where the intelligent voice equipment is located is the closed space.
FIG. 14 is a block diagram of an internal implementation of the processor. As can be seen in the figure, the processor includes 4 high-speed processing cores and 4 low-speed processing cores. The 4 high-speed processing cores, together with a corresponding level-two cache, form a high-speed core processing region; the 4 low-speed processing cores, together with a corresponding level-two cache, form a low-speed core processing region. A high-speed processing core here refers to a processing core with a clock frequency of 2.1 GHz, and a low-speed processing core refers to a processing core with a clock frequency of 1.7 GHz.
All of the steps performed by the processor 301 may be performed by either the high-speed processing cores or the low-speed processing cores.
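On a Linux-based car machine, the choice between the two core clusters could be made with the standard `os.sched_setaffinity` call; the core numbering below (0-3 for the low-speed cluster, 4-7 for the high-speed cluster) is only an assumed mapping used for illustration.

```python
import os

HIGH_SPEED_CORES = {4, 5, 6, 7}   # assumed mapping of the 2.1 GHz cluster
LOW_SPEED_CORES = {0, 1, 2, 3}    # assumed mapping of the 1.7 GHz cluster

def run_on(cores):
    """Restrict the current process to the given core cluster (Linux only)."""
    os.sched_setaffinity(0, cores)

# For example, keep background sound collection on the low-speed cluster,
# then move to the high-speed cluster before heavy voiceprint processing:
run_on(LOW_SPEED_CORES)
# ... collect audio ...
run_on(HIGH_SPEED_CORES)
# ... compare voiceprints ...
```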
Besides the high-speed processing cores, the low-speed processing cores, and the corresponding level-two caches, there are other components, for example: a modem baseband portion; a baseband part connected to the radio-frequency transceiver for processing the radio-frequency signal; a display subsystem connected to the display; an image signal processing subsystem connected to components outside the CPU; a single-channel DDR controller connected to the DDR memory; an embedded multimedia card interface connected to the embedded multimedia card; a USB interface connected to a personal computer; an SDIO input/output interface connected to the short-range communication module; a UART interface connected to the Bluetooth module and the GPS; an I2C interface connected to the sensors; and a smart card interface connected to the SIM card. The CPU further comprises a film processing subsystem, a Sensor Hub subsystem, a low-power microcontroller, a high-resolution video codec, a dual security engine, an image processor, and an image processing unit together with a level-two cache. A coherency bus arranged in the CPU connects all of the interfaces and processing units in the CPU.
It is to be understood that the above-mentioned terminal and the like include hardware structures and/or software modules corresponding to the respective functions for realizing the above-mentioned functions. Those of skill in the art would readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
In the embodiment of the present application, the terminal and the like may be divided into functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented using a software program, the implementation may take the form of a computer program product, in whole or in part. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (28)

  1. A method of voice interaction, comprising:
    the voice equipment judges the number of people in the space where the voice equipment is located;
    and when the voice equipment judges that the number of people in the space is one, the voice equipment enters a wake-up-free voice interaction mode.
  2. The method of claim 1, wherein the determining, by the speech device, the number of people in the space in which the speech device is located comprises:
    the voice equipment judges the number of people in the space where the voice equipment is located according to one or more of voiceprint information, iris information, portrait information, fingerprint information and sensing data.
  3. The method of claim 2, wherein the determining, by the speech device, the number of people in the space in which the speech device is located according to the voiceprint information comprises:
    the voice equipment collects first voice in the space;
    the voice equipment judges whether a second voice is received within a first time period after the first voice is received, wherein the second voice and the first voice have different voiceprint characteristics;
    determining that there is one person in the space if the speech device does not receive the second speech within the first time period.
  4. The method of claim 2, wherein the determining, by the speech device, the number of people in the space in which the speech device is located according to the voiceprint information comprises:
    the voice equipment collects first voice in the space;
    if the first voice is not the specific instruction, the voice device judges whether a second voice is received within a first time period after the first voice is received, wherein the second voice and the first voice have different voiceprint characteristics;
    determining that there is one person in the space if the speech device does not receive the second speech within the first time period.
  5. The method of claim 2, wherein the determining, by the speech device, the number of people in the space in which the speech device is located according to the voiceprint information comprises:
    the voice equipment collects first voice in the space;
    if the first voice is not the specific instruction, the voice device judges whether a second voice is received within a first time period after the first voice is received, wherein the second voice and the first voice have different voiceprint characteristics;
    and if the voice equipment receives the second voice within the first time period, the voice equipment determines that a plurality of people exist in the space.
  6. The method of claim 2, wherein the determining, by the voice device, the number of people in the space where the voice device is located according to the iris information comprises:
    the voice equipment obtains an iris recognition image through shooting by a camera in the space where the voice equipment is located;
    the voice equipment judges whether different iris information exists in the image or not;
    when the voice equipment judges that different iris information exists, the voice equipment determines that a plurality of people exist in the space where the voice equipment is located;
    and when the voice equipment judges that only one type of iris information exists, the voice equipment determines that one person exists in the space where the voice equipment is located.
  7. The method of claim 2, wherein the determining, by the speech device, the number of people in the space in which the speech device is located according to the portrait information comprises:
    the voice equipment obtains portrait information through the shooting of a camera in the space where the voice equipment is located;
    the voice equipment judges whether different portrait information exists in the image;
    when the voice equipment judges that different portrait information exists, the voice equipment determines that a plurality of people exist in the space where the voice equipment is located;
    and when the voice equipment judges that only one type of portrait information exists, the voice equipment determines that one person exists in the space where the voice equipment is located.
  8. The method of claim 2, wherein the determining, by the speech device, the number of people in the space in which the speech device is located according to the fingerprint information comprises:
    the voice equipment obtains fingerprint information through a fingerprint identification device in the space where the voice equipment is located;
    the voice equipment judges whether different fingerprint information exists or not;
    when the voice equipment judges that different fingerprint information exists, the voice equipment determines that a plurality of people exist in the space where the voice equipment is located;
    and when the voice equipment judges that only one type of fingerprint information exists, the voice equipment determines that one person exists in the space where the voice equipment is located.
  9. The method of claim 2, wherein the voice device determines the number of people in the space where the voice device is located according to the sensing data, comprising:
    the voice equipment obtains sensing data through a sensing device in the space where the voice equipment is located;
    the voice equipment judges whether different sensing data exist or not;
    when the voice equipment judges that different sensing data exist, the voice equipment determines that a plurality of people exist in the space where the voice equipment is located;
    and when the voice equipment judges that only one type of sensing data exists, the voice equipment determines that one person exists in the space where the voice equipment is located.
  10. The method according to any one of claims 1-9, wherein after the voice device enters the wake-up-free voice interaction mode, the method further comprises:
    the voice equipment receives a third voice, wherein the third voice does not comprise a wake-up word;
    and the voice equipment identifies and executes the function corresponding to the third voice.
  11. The method according to any one of claims 1-9, further comprising:
    when the voice equipment judges that a plurality of people exist in the space, the voice equipment enters a voice awakening interaction mode;
    the voice equipment receives a wake-up word or a fourth voice comprising the wake-up word;
    and the voice equipment enters a voice interaction mode or voice recognition and executes the function corresponding to the fourth voice.
  12. The method according to any one of claims 1 to 11, wherein the space in which the speech device is located is a closed space, a semi-closed space or an open space.
  13. The method of claim 12, wherein the semi-enclosed space or the open space is a spherical space with a radius of a communication distance of the speech device.
  14. The method according to claim 12, wherein the space in which the voice device is located is the closed space if the radius of the closed space is smaller than or equal to the communication distance of the voice device.
  15. A speech device, comprising:
    the collector is used for collecting the information in the space where the voice equipment is located;
    the processor is used for judging the number of people in the space according to the information collected by the collector; and when the number of people in the space is judged to be one, controlling the voice equipment to enter a wake-up-free voice interaction mode.
  16. The voice device according to claim 15, wherein the processor is configured to determine the number of people in the space where the voice device is located according to the information collected by the collector, and includes:
    and the processor is used for judging the number of people in the space where the voice equipment is located according to one or more of voiceprint information, iris information, portrait information, fingerprint information and sensing data.
  17. The speech device of claim 16, wherein the processor is configured to determine the number of people in the space where the speech device is located according to the voiceprint information, and comprises:
    the collector is used for collecting first voice in the space;
    a processor, configured to determine whether a second voice is received within a first time period after the first voice is received, where the second voice and the first voice have different voiceprint characteristics; and to determine that there is one person in the space if the speech device does not receive the second voice within the first time period.
  18. The speech device of claim 16,
    the collector is specifically used for collecting the first voice in the space;
    the processor is configured to determine a number of people in the space, including:
    the processor is configured to: if the first voice is not the specific instruction, determine whether a second voice having different voiceprint characteristics from the first voice is received within a first time period after the first voice is received, and, when it is determined that such a second voice exists, determine that there are multiple people in the space.
  19. The speech device of claim 7, wherein the collector is specifically configured to collect the first speech in the space;
    the processor is used for judging the number of people in the closed space, and comprises the following steps:
    the processor is configured to: if the first voice is not the specific instruction, determine whether a second voice having different voiceprint characteristics from the first voice is received within a first time period after the first voice is received, and, when it is determined that no second voice is received within the first time period, determine that there is one person in the enclosed space.
  20. The speech device of claim 16, wherein the processor, configured to determine the number of people in the space where the speech device is located according to the iris information, comprises:
    the processor is specifically configured to obtain an iris-recognition image through camera shooting in the space where the voice device is located; judge whether different iris information exists in the image; when it is judged that different iris information exists, determine that a plurality of people exist in the space where the voice device is located; and when it is judged that only one type of iris information exists, determine that one person exists in the space where the voice device is located.
  21. The speech device of claim 16, wherein the processor is configured to determine the number of people in the space where the speech device is located according to the portrait information, and comprises:
    the processor is specifically used for obtaining portrait information through shooting of a camera in a space where the voice equipment is located; judging whether the image has different portrait information; when different portrait information is judged, the processor determines that a plurality of people exist in the space where the voice equipment is located; and when the fact that only one type of portrait information exists is judged, the processor determines that one person exists in the space where the voice equipment is located.
  22. The speech device of claim 16, wherein the processor is configured to determine the number of people in the space where the speech device is located according to the fingerprint information, and comprises:
    the processor is specifically configured to obtain fingerprint information through a fingerprint identification device in the space where the voice device is located; judge whether different fingerprint information exists; when it is judged that different fingerprint information exists, determine that a plurality of people exist in the space where the voice device is located; and when it is judged that only one type of fingerprint information exists, determine that one person exists in the space where the voice device is located.
  23. The speech device of claim 16, wherein the processor is configured to determine the number of people in the space where the speech device is located according to the sensing data, and the determining includes:
    the processor is specifically configured to obtain sensing data through a sensing device in the space where the voice device is located; judge whether different sensing data exist; when it is judged that different sensing data exist, determine that a plurality of people exist in the space where the voice device is located; and when it is judged that only one type of sensing data exists, determine that one person exists in the space where the voice device is located.
  24. The speech device according to any one of claims 15-23,
    the collector is further configured to receive a third voice, where the third voice does not include a wakeup word;
    the processor is further configured to: and identifying and executing a function corresponding to the third voice.
  25. The speech device according to any one of claims 15-23,
    the processor is further configured to: when the voice equipment judges that a plurality of people exist in the space, the voice equipment enters a voice awakening interaction mode;
    the collector is further used for receiving the awakening words or receiving fourth voice comprising the awakening words;
    the processor is further configured to: and controlling the voice equipment to enter a voice interaction mode, or performing voice recognition and executing a function corresponding to the fourth voice.
  26. The speech device according to any one of claims 15-25, wherein the space in which the speech device is located is a closed space, a semi-closed space, or an open space.
  27. The speech device of claim 26, wherein the semi-enclosed space or the open space is a spherical space with a radius of a communication distance of the speech device.
  28. The speech device according to claim 26, wherein if the radius of the enclosed space is smaller than or equal to the communication distance of the speech device, the space in which the speech device is located is the enclosed space.
CN201880090636.6A 2018-03-07 2018-03-07 Voice interaction method and device Pending CN111819626A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/078362 WO2019169591A1 (en) 2018-03-07 2018-03-07 Method and device for voice interaction

Publications (1)

Publication Number Publication Date
CN111819626A true CN111819626A (en) 2020-10-23

Family

ID=67845477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880090636.6A Pending CN111819626A (en) 2018-03-07 2018-03-07 Voice interaction method and device

Country Status (2)

Country Link
CN (1) CN111819626A (en)
WO (1) WO2019169591A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103035240A (en) * 2011-09-28 2013-04-10 苹果公司 Speech recognition repair using contextual information
US20140350924A1 (en) * 2013-05-24 2014-11-27 Motorola Mobility Llc Method and apparatus for using image data to aid voice recognition
CN105320726A (en) * 2014-05-30 2016-02-10 苹果公司 Reducing the need for manual start/end-pointing and trigger phrases
US20180033436A1 (en) * 2015-04-10 2018-02-01 Huawei Technologies Co., Ltd. Speech recognition method, speech wakeup apparatus, speech recognition apparatus, and terminal
CN105654943A (en) * 2015-10-26 2016-06-08 乐视致新电子科技(天津)有限公司 Voice wakeup method, apparatus and system thereof
CN107437415A (en) * 2017-08-09 2017-12-05 科大讯飞股份有限公司 A kind of intelligent sound exchange method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112201246A (en) * 2020-11-19 2021-01-08 深圳市欧瑞博科技股份有限公司 Intelligent control method and device based on voice, electronic equipment and storage medium
CN112201246B (en) * 2020-11-19 2023-11-28 深圳市欧瑞博科技股份有限公司 Intelligent control method and device based on voice, electronic equipment and storage medium
CN114758654A (en) * 2022-03-14 2022-07-15 重庆长安汽车股份有限公司 Scene-based automobile voice control system and control method
CN114758654B (en) * 2022-03-14 2024-04-12 重庆长安汽车股份有限公司 Automobile voice control system and control method based on scene

Also Published As

Publication number Publication date
WO2019169591A1 (en) 2019-09-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination