WO2022226919A1 - Method for communicating with passengers and related apparatus - Google Patents

Method for communicating with passengers and related apparatus

Info

Publication number
WO2022226919A1
WO2022226919A1 (PCT/CN2021/091121)
Authority
WO
WIPO (PCT)
Prior art keywords
passenger
target
sign language
target passenger
information
Prior art date
Application number
PCT/CN2021/091121
Other languages
English (en)
French (fr)
Inventor
兰睿东
王頔
于佳鹏
黄为
徐文康
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180001483.5A (publication CN113330394A)
Priority to PCT/CN2021/091121 (publication WO2022226919A1)
Publication of WO2022226919A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60R VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R16/00 Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R16/02 Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present application relates to the field of smart cars, and in particular, to a method and related device for communicating with passengers.
  • Receiving and transmitting information is the basic way in which human beings communicate with the world, but hearing-impaired people have lost one of the most important channels for perceiving it: sound.
  • The number of hearing-impaired people is huge, yet the overall penetration rate of barrier-free facilities is only 40.6%. More importantly, apart from using sign language to communicate, hearing-impaired people are no different from ordinary people in daily life, which means their real needs are often overlooked.
  • The embodiments of the present application provide a method and a related device for communicating with passengers, which can solve the problem of communication barriers with hearing-impaired passengers, or passengers who find it inconvenient to speak, during a car ride, and help such passengers communicate normally and without barriers with the driver or other passengers.
  • an embodiment of the present application provides a method for communicating with passengers, the method comprising:
  • The target passenger may be a hearing-impaired passenger or a passenger who finds it inconvenient to speak.
  • the target person is a person other than the target passenger in the cockpit, such as a driver or other passengers.
  • Communication with passengers here specifically refers to scenarios such as communication between a hearing-impaired passenger and the driver in the cockpit, between a hearing-impaired passenger and an unimpaired passenger, or between a passenger who finds it inconvenient to speak and the driver.
  • The sign language information of the target passenger in the cockpit is acquired; the text information corresponding to the sign language information of the target passenger is determined according to that sign language information; and the text information corresponding to the sign language information of the target passenger is played through a multimedia device.
  • Since expressing the meaning of one or more sentences through sign language takes the target passenger a period of time, the sign language information of the target passenger in the cockpit is acquired in real time, ensuring that none of the meaning expressed by the target passenger's sign language is missed.
  • By converting the sign language information of the target passenger into text information and then playing that text information through a multimedia device, the target person can be informed of the meaning expressed by the target passenger's sign language, which solves the problem of communication barriers between the target passenger and the target person, as the sketch below illustrates.
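  • For illustration only, the following minimal Python sketch mirrors this flow; the MultimediaDevice abstraction and the recognize_sign_language() helper are hypothetical placeholders, not the disclosed implementation.

```python
# Minimal sketch of the claimed flow, assuming hypothetical helpers:
# recognize_sign_language() stands in for the sign language recognition
# model, and the multimedia device is reduced to a display/speaker pair.
from dataclasses import dataclass

@dataclass
class MultimediaDevice:
    has_display: bool = True
    has_speaker: bool = True

    def show_text(self, text: str) -> None:
        if self.has_display:
            print(f"[display] {text}")

    def play_audio(self, text: str) -> None:
        if self.has_speaker:
            # A real system would run text-to-speech here.
            print(f"[speaker] (synthesized speech) {text}")

def recognize_sign_language(sign_language_frames: list) -> str:
    """Placeholder for the sign language recognition model (Fig. 1d)."""
    return "Please open the window."  # illustrative output only

def communicate(sign_language_frames: list, device: MultimediaDevice) -> None:
    # 1. Acquire sign language information of the target passenger (input).
    # 2. Determine the corresponding text information.
    text = recognize_sign_language(sign_language_frames)
    # 3. Play the text through the multimedia device for the target person.
    device.show_text(text)
    device.play_audio(text)

communicate(sign_language_frames=[], device=MultimediaDevice())
```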
  • the above-mentioned multimedia device may be a vehicle-mounted device or an additional device, and may also be a terminal device of a passenger in the cockpit, such as a smart phone, a smart watch, or a smart bracelet.
  • the multimedia device includes a display screen and/or a speaker of the target person, and the text information corresponding to the sign language information of the target passenger is played through the multimedia device, including:
  • The display screen of the above-mentioned target person or target passenger may be located on the front windshield; for passengers in the front row (including the target person and the target passenger), the display screen may be located on the center console; for rear-row occupants, the display screens are located on the backs of the front seats.
  • The text information corresponding to the sign language information of the target passenger is displayed on the display screen, or the first audio signal is played through the speaker, so that the target person in the cockpit can learn the meaning expressed by the target passenger; for the driver in particular, playing the first audio signal through the speaker avoids the dangerous driving that could result from being distracted by reading the text information corresponding to the target passenger's sign language information.
  • the multimedia device includes a display screen of the target passenger, and before the text information corresponding to the sign language information of the target passenger is played through the multimedia device, the method of this embodiment further includes:
  • the multimedia device also includes a display screen and/or speaker of the target person, and the text information corresponding to the sign language information of the target passenger is played through the multimedia device, including:
  • the target text is displayed on the display screen of the target person, and/or the second audio signal is played through the speaker to inform the target person of the content expressed by the sign language information of the target passenger, and the second audio signal is obtained based on the target text.
  • Since the text information obtained based on the target passenger's sign language information may be inconsistent with what the target passenger intended to express, the text information is displayed on the display screen of the target passenger so that the target passenger can confirm whether it contains errors; if it does, the target passenger can modify the text to obtain the target text, which improves the practicability of the system.
  • Before the text information corresponding to the sign language information of the target passenger is determined according to the sign language information of the target passenger, the method of the present application further includes:
  • determining whether the target passenger is performing a sign language operation according to the body posture information of the target passenger, or according to the body posture information and gesture information of the target passenger, where the body posture information and gesture information of the target passenger are obtained based on the video of the target passenger; and, when it is determined that the target passenger is performing a sign language operation, entering the hearing-impaired mode.
  • Entering the hearing-impaired mode when the target passenger is judged to be performing a sign language operation avoids the situation in which the system has not entered the hearing-impaired mode while the target passenger is signing and therefore ignores the sign language, improving the user experience of the system; a sketch of this gating follows.
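  • In the sketch below, is_signing() stands in for the pose/gesture classifier, and the frame window and threshold are illustrative assumptions.

```python
# Hedged sketch of the mode-entry gating described above.
from collections import deque

def is_signing(body_pose, gestures=None) -> bool:
    """Placeholder classifier producing the 'first flag' (True = signing)."""
    return bool(body_pose) and (gestures is None or bool(gestures))

class ModeController:
    def __init__(self, window: int = 5, required: int = 3):
        self.flags = deque(maxlen=window)  # recent first-flag values
        self.required = required
        self.hearing_impaired_mode = False

    def update(self, body_pose, gestures=None) -> None:
        self.flags.append(is_signing(body_pose, gestures))
        # Enter the hearing-impaired mode once signing is detected in
        # enough recent frames, so ongoing sign language is not ignored.
        if not self.hearing_impaired_mode and sum(self.flags) >= self.required:
            self.hearing_impaired_mode = True

controller = ModeController()
for frame in [True, True, True, True]:   # stand-ins for per-frame pose data
    controller.update(body_pose=frame)
print(controller.hearing_impaired_mode)  # True once 3 of last 5 frames sign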
  • the method of the present application further includes:
  • The image information of the passengers in the cockpit is obtained, and the target passenger in the cockpit is determined according to that image information, the target passenger being the passenger who is determined, based on the image information, to have made a preset action; or, after an instruction of the target person on the target-passenger key is detected, it is determined that there is a target passenger in the cockpit. In this way, it can be accurately determined whether there is a target passenger in the cockpit, avoiding the problem of the target passenger being unable to communicate with the target person.
  • The target-passenger key may be a physical key, a touch key, or the like in the cockpit; the above-mentioned instruction of the target person for the target-passenger key may be a pressing instruction, a touch instruction, a gesture instruction or a voice instruction.
  • The method of the present application further includes: when it is determined that there is a target passenger in the cockpit, entering the hearing-impaired mode directly without judging whether the target passenger performs a sign language operation, so as to avoid missing the target passenger's sign language while "judging whether the target passenger performs a sign language operation" is being executed.
  • the multimedia device includes a display screen of the target passenger, and the method of the present application further includes:
  • A third audio signal, in which the target person replies to the sign language information of the target passenger, is acquired through the microphone; the first text is displayed on the display screen of the target passenger, where the first text is obtained according to the third audio signal. In this way, the target passenger can be informed of the target person's reply.
  • the method of the present application further includes:
  • A passenger identification is displayed on the display screen, and the passenger identification is used to indicate the passenger who issued the third audio signal. In this way, the target passenger can be made aware of which specific passenger in the cockpit is communicating with him or her.
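  • A compact sketch of this reply path follows, with speech_to_text() and locate_speaker() as hypothetical stand-ins for the speech recognition and speaker-localization steps detailed later.

```python
# Illustrative sketch, not the patented implementation.
def speech_to_text(third_audio_signal: bytes) -> str:
    return "We will arrive in ten minutes."   # placeholder first text

def locate_speaker(third_audio_signal: bytes) -> str:
    return "driver"                            # placeholder passenger id

def show_reply(third_audio_signal: bytes) -> None:
    first_text = speech_to_text(third_audio_signal)
    passenger_id = locate_speaker(third_audio_signal)
    # Display the first text plus a passenger identification on the
    # target passenger's screen so they know who is speaking.
    print(f"[target passenger display] {passenger_id}: {first_text}")

show_reply(b"")
```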
  • an in-vehicle device for communicating with passengers, the in-vehicle device comprising:
  • the acquisition unit is used to acquire the sign language information of the target passenger in the cockpit;
  • a determining unit configured to determine the text information corresponding to the sign language information of the target passenger according to the sign language information of the target passenger;
  • the control unit is configured to control the multimedia device to play the text information corresponding to the sign language information of the target passenger, so as to inform the target personnel in the cockpit of the content expressed by the sign language of the target passenger.
  • the acquiring unit is used to acquire the sign language information of the target passenger in the cockpit in the hearing-impaired mode
  • the multimedia device includes a display screen and/or a speaker of the target person, and in terms of controlling the multimedia device to play text information corresponding to the sign language information of the target passenger, the control unit is specifically used for:
  • the multimedia device includes a display screen of the target passenger, and before the control unit controls the multimedia device to play the text information corresponding to the sign language information of the target passenger,
  • the control unit is further configured to control the display screen of the target passenger to display the text information corresponding to the sign language information of the target passenger, so that the target passenger can confirm whether that text information is correct;
  • the obtaining unit is further configured to obtain the target text according to the operation instructions of the target passenger on the display screen and the text information corresponding to the sign language information of the target passenger;
  • the multimedia device also includes a display screen and/or a speaker of the target person.
  • the control unit is specifically used for:
  • the determining unit, before determining the text information corresponding to the sign language information of the target passenger according to the sign language information of the target passenger, is further configured to:
  • the acquiring unit is further configured to acquire image information of the passenger in the cabin after detecting that the passenger gets on the vehicle;
  • the determining unit is further configured to determine the target passenger in the cabin according to the image information of the passenger in the cabin, and the target passenger is the passenger who is determined to perform the preset action based on the image information of the passenger in the cabin; or,
  • the determining unit is further configured to determine that there is a target passenger in the cockpit after detecting the instruction of the target person on the target-passenger key.
  • the multimedia device includes a display screen of the target passenger, and the acquisition unit is further configured to acquire, through a microphone, a third audio signal in which the target person replies to the sign language information of the target passenger;
  • the control unit is further configured to control the display screen of the target passenger to display the first text, where the first text is obtained according to the third audio signal.
  • control unit is also used for:
  • a passenger identification is displayed on the display screen, and the passenger identification is used to indicate the passenger who sends out the third audio signal.
  • the target passenger is a hearing-impaired passenger, or a passenger who is inconvenient to speak.
  • Embodiments of the present application provide an in-vehicle device, including a processor and a memory, where the processor is connected to the memory, the memory is used to store program code, and the processor is used to call the program code to execute part or all of the method described in the first aspect.
  • An embodiment of the present application provides a chip system applied to an electronic device; the chip system includes one or more interface circuits and one or more processors, interconnected through lines. The interface circuit is configured to receive signals from the memory of the electronic device and send signals to the processor, where the signals include computer instructions stored in the memory; when the processor executes the computer instructions, the electronic device performs part or all of the method described in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement part or all of the method described in the first aspect.
  • In the embodiments of the present application, the sign language information of the target passenger in the cockpit is obtained, the sign language information is recognized to obtain the corresponding text information, and the multimedia device plays the text information corresponding to the sign language information of the target passenger to inform the target personnel in the cockpit of the meaning expressed by the target passenger's sign language, which solves the problem of communication between hearing-impaired passengers, or passengers who find it inconvenient to speak, and other personnel in the cockpit;
  • the text information is displayed on the display screen of the target passenger for the target passenger to confirm whether it is correct; when it is determined that the text information does not express the meaning of the sign language, the target passenger can modify the text information through the display screen to obtain the target text, which improves the practicability of the system; and, when it is judged that the target passenger is performing a sign language operation, the hearing-impaired mode is entered, which avoids ignoring the sign language the target passenger is making before the mode is entered and improves the user experience of the system.
  • FIG. 1a is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Fig. 1b is a schematic flowchart of a method for communicating with passengers according to an embodiment of the present application
  • FIG. 1c is a schematic diagram of another system architecture provided by an embodiment of the present application.
  • FIG. 1d is a schematic structural diagram of a sign language recognition model provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for communicating with passengers according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of an interface display provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a target passenger communication method provided by an embodiment of the present application.
  • Fig. 4a is a schematic diagram of the setting position of the display screen of the front passenger according to the embodiment of the present application.
  • Fig. 4b is a schematic diagram of the setting position of the display screen of a rear passenger according to an embodiment of the present application
  • Fig. 5 is a schematic diagram of speech conversion into text
  • FIG. 6 is a schematic flowchart of a method for communicating with a hearing-impaired passenger according to an embodiment of the present application
  • FIG. 7 is a schematic structural diagram of a vehicle-mounted device for communicating with passengers according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a vehicle-mounted device according to an embodiment of the present application.
  • The target passenger in this application is a hearing-impaired passenger in the cockpit, or a passenger who finds it inconvenient to speak, and the target person is a person other than the target passenger in the cockpit, such as an unimpaired passenger or the driver.
  • FIG. 1a is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system architecture includes: a vehicle-mounted device 101 , a motion collector 102 , an audio collector 103 , a display screen 104 and a speaker 105 .
  • The vehicle-mounted device 101 is the control center of the entire system and includes multiple modules, among them one or more of a central processing unit (CPU), a graphics processing unit (GPU) and a neural network processing unit (NPU); the in-vehicle device 101 also includes a memory.
  • the in-vehicle device 101 mainly includes the following functions:
  • Controlling each link of acquisition by the motion collector 102, including multiple imaging parameters of the motion collector 102 such as brightness, illumination and exposure time, and processing the collected image data into the format required by subsequent algorithms;
  • Controlling each link of collection by the audio collector 103, including parameters of the audio signal collected by the audio collector 103 such as frequency and volume, and processing the collected audio signal into the format required by subsequent algorithms;
  • After the target passenger confirms that the text information is correct, displaying the text information on the display screen 104, or converting the text information into an audio signal and playing that signal through the speaker 105, so that the target person understands the sign language of the target passenger;
  • The action collector 102 is used to collect sign language information of passengers in the cockpit. Optionally, the action collector includes an image camera, a time-of-flight (TOF) camera or a millimeter-wave radar. The image camera is used to collect plane images, red-green-blue (RGB) images or infrared (IR) images in the cockpit, and the method of the present application can obtain the sign language information of passengers in the cockpit based on the images collected by the image camera. The TOF camera is used to collect depth images, which contain both plane images and depth information; with the depth information, more accurate sign language information can be obtained from the depth image. Through the high-frequency signal emitted by the millimeter-wave radar, the passengers' movements can be continuously scanned to obtain the passengers' sign language information.
  • The audio collector 103 includes microphones installed at different positions in the cockpit, which can accurately collect the audio signal in the cockpit; the collected audio signal is then translated into text by the vehicle-mounted device 101 and displayed on the display screen 104 for the target passenger to read.
  • The audio collector 103 may be built into the vehicle or added afterwards.
  • The display screen 104 can be an ordinary display screen or a touch display screen, and can be built into the vehicle or added; it can be used to display text information, where the text information is either obtained from the audio signal collected by the audio collector 103 or expresses the sign language of the target passenger. The number and locations of the display screens 104 are not limited here.
  • The speaker 105 can be a built-in speaker of the vehicle or an additional speaker, and is used to play the audio signal obtained based on the text information, so that the driver or other unimpaired passengers can understand the meaning of the target passenger's sign language.
  • the number and position of the speakers 105 are not limited herein.
  • Each device cooperates to realize the method flow shown in Fig. 1b: after the target passenger enters the cockpit, the motion collector 102 continuously detects the body posture and hand movements of the target passenger; the in-vehicle device 101 uses a deep learning algorithm to accurately recognize the detected body postures and sign language movements and translates the sign language into text in real time. After the translation, the target passenger is provided, on the display screen 104, with an interface that can be modified word by word; after the target passenger confirms that the text is correct, the text is synthesized into speech, and the resulting speech is played through the speaker 105. The in-vehicle device 101 also uses a deep learning algorithm to recognize, in real time, the voice of an unimpaired passenger or the driver collected by the audio collector 103, obtains the corresponding text, and displays that text in real time on the display screen 104 for the target passenger to read, as in the sketch below.
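  • The word-by-word correction step in this flow can be sketched as follows; the console stand-in for the touch screen and the synthesize_speech() placeholder are assumptions.

```python
# Sketch of the word-by-word editable confirmation interface.
def confirm_and_edit(text: str, edits: dict) -> str:
    """Apply word-level corrections {word_index: replacement}."""
    words = text.split()
    for index, replacement in edits.items():
        if 0 <= index < len(words):
            words[index] = replacement
    return " ".join(words)

def synthesize_speech(target_text: str) -> None:
    print(f"[speaker 105] (synthesized) {target_text}")  # TTS placeholder

recognized = "please stop at the next stop"          # model output to confirm
target_text = confirm_and_edit(recognized, {5: "station"})  # passenger edit
synthesize_speech(target_text)  # played only after the passenger confirms
```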
  • an embodiment of the present application provides another system architecture 100 .
  • the data collection device 160 is used to collect training data.
  • The training data includes a video or image recording sign language information together with the text information corresponding to that sign language information. The training data is stored in database 130, and the training device 120 trains the sign language recognition model 113 based on the training data maintained in database 130. The sign language recognition model 113 can be used to implement the sign language recognition method disclosed in the embodiments of the present application: an image or video recording the sign language information of the target passenger is input into the sign language recognition model 113 after relevant preprocessing, and the text information corresponding to the sign language information of the target passenger is obtained.
  • the training data maintained in the database 130 may not necessarily come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 may not necessarily train the sign language recognition model 113 entirely based on the training data maintained in the database 130, and it is also possible to obtain training data from the cloud or other places for model training.
  • Model training is performed on images or videos carrying the sign language information of the target passenger together with the text information corresponding to that sign language information, and the above description should not be taken as a limitation on the embodiments of the present application.
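  • A skeleton of such a training loop is given below, under the stated assumption that each sample pairs a sign-language video or image with its text label; the database access, preprocessing and model state are placeholders.

```python
# Skeleton of the training loop on training device 120 (illustrative only).
import random

def load_samples_from_database():
    # Stand-in for database 130: (video_or_image, text_label) pairs.
    return [("video_0", "hello"), ("video_1", "thank you")]

def preprocess(video):            # the "relevant preprocessing" from the text
    return video

def train_sign_language_model(epochs: int = 2):
    samples = load_samples_from_database()
    model_state = {"step": 0}      # placeholder for model 113 parameters
    for _ in range(epochs):
        random.shuffle(samples)
        for video, label in samples:
            features = preprocess(video)
            # A real implementation would run a forward pass, compute a
            # loss against `label`, and update the parameters here.
            model_state["step"] += 1
    return model_state

print(train_sign_language_model())
```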
  • The sign language recognition model 113 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1c; the execution device 110 can be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, a desktop computer, an AR/VR device or a vehicle-mounted terminal, and it may also be a server or the cloud.
  • The execution device 110 is configured with an I/O interface 112 for data interaction with external devices. Data can be input to the I/O interface 112 through the acquisition device 170; in this embodiment of the present application, the input data may include a video or image recording the sign language information of the target passenger, so that the sign language recognition model 113 can perform sign language recognition on it. For the acquisition device 170, see the action collector 102 described above.
  • When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call the data, code, etc. in the data storage system 150 for the corresponding processing, and the data and instructions obtained by that processing may also be stored in the data storage system 150.
  • The I/O interface 112 returns the processing result, such as the recognition result obtained by the above-mentioned calculation module 111, to the multimedia device 140, so as to provide it to the user.
  • the multimedia device 140 may be a display.
  • the training device 120 can generate corresponding sign language recognition models 113 based on different training data for different goals or different tasks, and the corresponding sign language recognition models 113 can be used to achieve the above goals or complete the above tasks, Thus, text information corresponding to the sign language information is output.
  • the user can manually specify input data, which can be operated through the interface provided by the I/O interface 112 .
  • The user's terminal device can automatically send the input data to the I/O interface 112; if the terminal device is to send the input data automatically, the user's authorization is required, and the user can set the corresponding permissions.
  • the user can view the result output by the execution device 110 on the terminal device, and the specific presentation form can be a specific manner such as display, sound, and action.
  • The collection device 170 can also serve as a data collection terminal, collecting the input data of the I/O interface 112 and the output result of the I/O interface 112 shown in FIG. 1c as new sample data and storing them in the database 130. Alternatively, the collection may be performed without the collection device 170: the I/O interface 112 directly stores its input data and output results, as shown in FIG. 1c, in the database 130 as new sample data.
  • FIG. 1c is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the positional relationship between the devices, devices, modules, etc. shown in FIG. 1c does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110 , and in other cases, the data storage system 150 may also be placed in the execution device 110 .
  • the training device 120 and the execution device 110 may be the same device.
  • FIG. 1d is a schematic structural diagram of a sign language recognition model provided by an embodiment of the present application; the model includes a detection module, a feature extraction module, a semantic recognition module and a text synthesis module.
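  • The four-stage structure of FIG. 1d can be expressed as a simple composition; each stage below is a trivial placeholder callable, and only the chaining reflects the description.

```python
# Structural sketch of Fig. 1d: four chained stages.
from typing import Callable, List

class SignLanguageRecognitionModel:
    def __init__(self,
                 detect: Callable,       # detection module: boxes for hands/arms/body
                 extract: Callable,      # feature extraction: keypoint vectors
                 recognize: Callable,    # semantic recognition: phrases
                 synthesize: Callable):  # text synthesis: full sentence
        self.detect, self.extract = detect, extract
        self.recognize, self.synthesize = recognize, synthesize

    def __call__(self, frames: List) -> str:
        boxes = [self.detect(f) for f in frames]
        features = [self.extract(f, b) for f, b in zip(frames, boxes)]
        phrases = self.recognize(features)
        return self.synthesize(phrases)

# Wiring with trivial placeholders to show the data flow end to end.
model = SignLanguageRecognitionModel(
    detect=lambda f: ("hand_box", "body_box"),
    extract=lambda f, b: [0.0, 1.0],          # feature vector stand-in
    recognize=lambda feats: ["open", "window"],
    synthesize=lambda ps: " ".join(ps))
print(model(frames=[object(), object()]))
```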
  • the present application provides a communication method, including:
  • S1 Receive the passenger's gesture/body information.
  • The passenger is any passenger in the cabin, who may be a hearing-impaired passenger, a passenger who finds it inconvenient to speak, or an unimpaired passenger.
  • the gesture/body information can be preset or updated periodically.
  • cameras, lidars, or other sensors in the cockpit collect image information or other forms of information of the above-mentioned passengers, and then extract gesture/body information based on the passenger's image information or other forms of information.
  • One way is to detect whether the passenger makes a preset gesture or a preset body movement after entering the cockpit; when it is determined that the passenger makes a preset gesture or a preset body movement, the passenger is determined to be a hearing-impaired passenger or a passenger who finds it inconvenient to speak. Another way is to set a button on or near each seat in the cockpit, which can be a physical button or a virtual touch button; after the passenger enters the cockpit, it is detected whether the passenger issues an instruction for the button, and when such an instruction is detected, the passenger is determined to be a hearing-impaired passenger or a passenger who finds it inconvenient to speak.
  • The text information corresponding to the passenger's sign language information is determined and played through the multimedia device.
  • Using the passenger's gesture/body information to determine whether the passenger is performing a sign language operation includes:
  • Whether the passenger is performing a sign language operation is determined according to the passenger's gesture information, or whether the passenger is performing a sign language operation is determined according to the passenger's gesture information and body information.
  • The execution body of the above embodiment, and of the embodiment shown in FIG. 2, may be a vehicle center console, an add-on device with processing capability, or a terminal device of the driver or a passenger, such as a smartphone, a smart watch or a tablet. Before performing the above method, these devices need to establish wired or wireless connections with the on-board sensors and multimedia devices.
  • FIG. 2 is a schematic flowchart of a method for communicating with a target passenger according to an embodiment of the present application. As shown in Figure 2, the method includes:
  • the method of the present application further includes:
  • the preset motion may be a preset body motion or a preset gesture motion.
  • the preset body motion and the preset gesture motion can be updated periodically.
  • The video of the passengers in the cockpit is obtained through the action collector, and the action information of each passenger in the cabin is determined based on that video; based on the action information of each passenger, it is determined whether the passenger has made a preset action, where the preset action can be a preset gesture action or a preset body action. If the passenger makes a preset action, the passenger is determined to be the target passenger; if not, the passenger is determined to be an unimpaired passenger.
  • The motion collector includes a camera, a lidar or a millimeter-wave radar; these devices can be built into the vehicle or retrofitted.
  • the above-mentioned camera may be a camera for collecting plane images, RGB images or infrared images, and may also be a TOF camera.
  • Lidar or millimeter-wave radar can collect point cloud images with depth information.
  • In one example, a hearing-impaired button is installed at each seat in the cockpit, which may be a physical button or a virtual touch button. After the passenger gets into the vehicle, when an operation instruction for any hearing-impaired button is detected, the passenger in the seat corresponding to that button is determined to be the target passenger.
  • the above operation instruction may be a pressing operation instruction, a touch operation instruction or a gesture operation instruction.
  • Optionally, confirmation information is sent to the target passenger to request confirmation of whether the passenger is really a target passenger; feedback information from the passenger is received, and based on the feedback information it is determined whether the passenger is the target passenger.
  • For example, a label is displayed on the display screen corresponding to the passenger's seat, as shown in FIG. 3; the label is used to ask whether the passenger sitting in that seat is the target passenger, and the feedback information for this label determines whether that passenger is the target passenger.
  • The feedback information may be voice information; alternatively, two function buttons are displayed on the display screen at the same time, respectively indicating that the passenger sitting in the seat is, or is not, the target passenger, in which case the feedback information may be a pressing command or a touch command on the two function keys.
  • the in-vehicle device can determine the position information of the target passenger in the cabin through the source of the feedback information.
  • the driver can determine whether the passenger is the target passenger through communication with the passenger; when determining that the passenger is the target passenger, the driver sends an instruction to the vehicle-mounted device, and the instruction can be a voice instruction , gesture commands or other commands; when the vehicle-mounted device receives the driver's command, it determines that there is a target passenger in the cockpit.
  • the specific method can be as follows:
  • the body posture information of the target passenger is extracted from the video of the target passenger, and then whether the target passenger is performing sign language operations is determined based on the body posture information of the target passenger.
  • the gesture information of the target passenger can also be extracted from the video of the target passenger, and then whether the target passenger is performing a sign language operation is determined based on the body posture information and gesture information of the target passenger.
  • Determining whether the target passenger is performing a sign language operation based on the body posture information of the target passenger includes: processing the body posture information to obtain a first flag whose value indicates either that the target passenger is performing a sign language operation or that the target passenger is not. For example, when the first flag is 1 or true, it indicates that the target passenger is performing a sign language operation; when the first flag is 0 or false, it indicates that the target passenger is not performing a sign language operation.
  • the body posture information and gesture information of the target passenger are obtained according to the video of the target passenger.
  • A deep learning algorithm is used in combination with context information to process the current image of the target passenger to obtain the human body key point information and hand key point information of the target passenger, where the context information includes an image acquired before the current image together with the human body key point information and hand key point information of the target passenger obtained from that previous image. The above body posture information includes the human body key point information, and the above gesture information includes the hand key point information.
  • Both the human body key point information and the hand key point information include 2-dimensional (2D) plane information and 3-dimensional (3D) spatial information.
  • The human body key point information is larger-scale key point information, containing, for example, 18 or 32 key points.
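  • As a data-structure sketch of this keypoint information: each keypoint carries 2D image coordinates and 3D spatial coordinates; the 18- or 32-point body counts come from the text above, while the 21-point hand count is an assumption following common skeleton conventions.

```python
# Illustrative keypoint containers, not the disclosed representation.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Keypoint:
    xy: Tuple[float, float]            # 2D plane information
    xyz: Tuple[float, float, float]    # 3D spatial information

@dataclass
class BodyPose:
    points: List[Keypoint]             # e.g. 18 or 32 human body keypoints

@dataclass
class HandPose:
    points: List[Keypoint]             # hand keypoints (e.g. 21 per hand)

pose = BodyPose(points=[Keypoint((0.5, 0.4), (0.1, 0.2, 1.5))] * 18)
print(len(pose.points))
```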
  • When the target passenger uses sign language to express a sentence, he or she needs to make multiple signs in succession; therefore, after the hearing-impaired mode is entered, the video of the target passenger needs to be acquired in real time, and this video includes the sign language information of the target passenger.
  • the step "judging whether the target passenger speaks sign language" is not performed, and the hearing impaired mode is directly entered to acquire the sign language information of the target passenger in real time.
  • The sign language information is used to represent the sign language itself. Since sign language involves not only the hands but also the arms and parts of the body, the sign language information includes both gesture information and body posture information.
  • When the target passenger uses sign language to express the meaning of a sentence, he or she needs to make multiple signs in succession, where one sign or multiple consecutive signs represent one character or one word in the sentence. Specifically, the sign language information of the target passenger is extracted in real time from the video acquired in real time, to obtain one or more pieces of sign language information; optionally, after the video containing the complete sign language information of the target passenger is obtained, the sign language information in the video is extracted to obtain one or more pieces of sign language information. After one or more pieces of sign language information are obtained, one or more phrases are obtained based on them, where a phrase can be a character or a word; multiple phrases are then combined into a sentence with complete intention that expresses the target passenger's sign language, and this sentence is the above-mentioned text information corresponding to the sign language information of the target passenger.
  • The vehicle-mounted device collects images of the target passenger in real time through the camera in the cockpit and inputs them into the sign language recognition model; optionally, after collecting the images in real time, it samples them according to a first preset frequency and inputs the sampling results into the sign language recognition model.
  • The sign language recognition model includes a detection module, a feature extraction module, a semantic recognition module and a text synthesis module. Since the hands, arms and some body parts are needed to perform sign language, when performing sign language recognition it is necessary to determine the position information of the target passenger's hands, arms and body in the input image. Specifically, the detection module detects the input image, determines the positions of the target passenger's hands, arms and body parts, and outputs an image containing one or more detection boxes enclosing them; the image with the detection boxes is then input to the feature extraction module, which extracts the hand key point information and human body key point information of the target passenger within the detection boxes, where the human body key point information includes feature information of the arms. The hand key point information and human body key point information are represented as vectors, so the feature extraction module outputs the hand feature vector and body posture feature vector of the target passenger.
  • Since the target passenger needs a period of time to express the meaning of a sentence in sign language, images of the target passenger signing must be acquired in real time in order to determine the meaning being expressed. After the detection module determines the position information of the target passenger's hands, arms and body parts in the first to Nth images, tracking technology can be used to determine that position information in subsequent images, where N is greater than or equal to 1.
  • The first phrases are obtained by inputting the hand feature vectors and body posture feature vectors of the target passenger in the first through Mth images into the semantic recognition module; multiple first phrases may include identical phrases or may all differ. After the hand feature vector and body posture feature vector of the target passenger in the (M+1)th image are obtained, the hand feature vectors and body posture feature vectors of the target passenger in the images obtained before that image (including the above first image), together with those in the second through (M+1)th images, are input into the semantic recognition module for processing to obtain one or more second phrases; multiple second phrases may likewise include identical phrases or differ from each other. M is a preset value; optionally, M may be 1, 2, 3, 5 or another value.
  • After one or more first phrases, one or more second phrases, ..., and one or more T-th phrases are obtained in this way, they are input into the text synthesis module for processing to obtain the text information, where the text synthesis module may be a neural network model. For example, an RNN, such as a transformer-RNN based on the attention mechanism, can be used to synthesize the text. The above text information is the output of the sign language recognition model, and T is a positive integer.
  • In general, the T-th phrases are obtained by inputting into the semantic recognition module the hand feature vectors and body posture feature vectors of the target passenger in the images obtained before the T-th image (including the above first image, second image, and so on), together with those in the T-th through (T+M-1)th images.
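  • The stride-one windowing described above can be sketched as follows, with recognize_window() as a placeholder for the semantic recognition module.

```python
# Sketch of the stride-M windowing: the T-th phrases are produced from
# all history plus the window [T, T+M-1].
from typing import List

def recognize_window(history: List, window: List) -> List[str]:
    # Placeholder: a real module maps feature vectors to phrases.
    return [f"phrase_{len(history)}"]

def windowed_phrases(feature_vectors: List, M: int = 3) -> List[str]:
    phrases: List[str] = []
    # t ranges over window start positions; window covers images t..t+M-1.
    for t in range(0, len(feature_vectors) - M + 1):
        history = feature_vectors[:t]           # images before the T-th
        window = feature_vectors[t:t + M]       # T-th .. (T+M-1)-th image
        phrases.extend(recognize_window(history, window))
    return phrases

# Feature vectors stand in for (hand vector, body posture vector) pairs.
print(windowed_phrases(list(range(6)), M=3))
```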
  • The feature extraction module may extract the hand features and the body posture features of the target passenger simultaneously or separately.
  • The vehicle-mounted device can obtain the sign language recognition model from the training device according to a preset cycle through over-the-air (OTA) technology, so as to update the sign language recognition model regularly and thereby improve its accuracy.
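  • A hedged sketch of such a periodic model refresh follows; check_for_update() and the cycle length are assumptions, not a documented OTA API.

```python
# Illustrative periodic OTA update loop.
import time

def check_for_update(current_version: int):
    latest = 2                       # stand-in for the training device
    return latest if latest > current_version else None

def ota_update_loop(cycle_seconds: float = 0.1, iterations: int = 2):
    version = 1
    for _ in range(iterations):      # a real device would loop indefinitely
        new_version = check_for_update(version)
        if new_version is not None:
            version = new_version    # download and swap the model here
            print(f"sign language model updated to v{version}")
        time.sleep(cycle_seconds)

ota_update_loop()
```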
  • In one example, the multimedia device includes a display screen corresponding to each seat. After the text information corresponding to the sign language information of the target passenger is acquired, the display screen of the target person displays that text information, in order to inform the target person in the cockpit of the content expressed by the target passenger's sign language.
  • The display screen for a front passenger can be set on the windshield in front of the passenger, on the front console, or on the glass at the passenger's side (that is, the glass of the passenger's side door).
  • the content to be displayed can be projected onto the front windshield or the side glass by means of projection, similar to a projector.
  • the display screen for the rear passenger can be arranged on the backrest of the seat in front of it, or can be arranged on the glass on the side of the passenger (i.e. the glass on the side door).
  • In another example, the multimedia device includes a speaker. After the text information corresponding to the sign language information of the target passenger is acquired, a first audio signal is obtained according to that text information, and the first audio signal is played through the speaker, in order to inform the target person in the cockpit of the content expressed by the target passenger's sign language.
  • the multimedia device includes a speaker and a display screen corresponding to each seat.
  • The first audio signal is obtained according to the text information corresponding to the sign language information of the target passenger; the text information is displayed on the display screen of the target person, and the first audio signal is played through the speaker, to inform the target person in the cockpit of the content expressed by the target passenger's sign language. In this case, the text information corresponding to the target passenger's sign language information may be regarded as subtitles for the first audio signal.
  • the multimedia device includes a display screen of the target passenger, and before the text information corresponding to the sign language information of the target passenger is played through the multimedia device, the method of the present application further includes:
  • displaying the text information corresponding to the sign language information of the target passenger on the display screen of the target passenger, so that the target passenger can confirm whether that text information is correct; and obtaining the target text according to the operation instructions of the target passenger on the display screen and the text information corresponding to the target passenger's sign language information;
  • the multimedia device also includes a speaker and/or a display screen of the target person, and the text information corresponding to the sign language information of the target passenger is played through the multimedia device, including:
  • the target text is displayed on the display screen of the target person, and/or the second audio signal is played through the speaker to inform the target person of the content expressed in the sign language of the target passenger, the second audio signal is obtained based on the target text.
  • Specifically, the target passenger can confirm whether the text information corresponding to his or her sign language information represents the meaning of the sign language; to confirm, the target passenger may make a preset gesture, which can be, for example, an "OK" gesture. Optionally, the display screen can be a touch screen: when the target passenger, while confirming the text information corresponding to the sign language information, finds an error, the target passenger can directly modify the text information on the display screen to obtain the target text, or click the "Modify" function key on the display screen to enter the modification mode.
  • The target passenger modifies the text information corresponding to his or her sign language information on the display screen to obtain the target text, where the target text represents the meaning the target passenger wants to express through sign language;
  • Optionally, the text information corresponding to the sign language information of the target passenger is sent to the terminal device of the target passenger, so that the target passenger can confirm whether it represents the meaning the target passenger wants to express through sign language; the target text is then obtained according to the above method, which is not described again here.
  • Optionally, the target text and the video recording the body posture and/or sign language of the target passenger are saved; optionally, a desensitization operation is performed on the video, such as removing the target passenger's face information, to obtain a desensitized video, and the desensitized video and the target text are saved as training samples for subsequent training of the sign language recognition model, improving the accuracy of the sign language recognition model.
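  • The desensitize-and-save step might look like the following; blur_faces() abstracts the face-removal operation, and the file layout is an illustrative assumption.

```python
# Sketch of desensitizing a clip and saving it as a training sample.
import json, pathlib

def blur_faces(frames: list) -> list:
    # A real pipeline would detect faces and mask them; here we only tag
    # frames to show where the operation happens.
    return [f"desensitized({f})" for f in frames]

def save_training_sample(frames: list, target_text: str,
                         out_dir: str = "samples") -> None:
    path = pathlib.Path(out_dir)
    path.mkdir(exist_ok=True)
    sample = {"video": blur_faces(frames), "text": target_text}
    (path / "sample.json").write_text(json.dumps(sample))

save_training_sample(["frame0", "frame1"], "please open the window")
```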
  • After the target person is informed of the content expressed by the target passenger's sign language, the target person can respond directly to the target passenger's sign language information through ordinary speech. At this time, the microphone in the cockpit collects the third audio signal of the target person, the in-vehicle device converts the third audio signal into the first text, and the first text is displayed on the display screen of the target passenger so that the target passenger knows the response made by the target person.
  • The in-vehicle device converts the third audio signal into the first text by using speech recognition technology; the process of converting the third audio signal into the first text through speech recognition is shown in FIG. 5:
  • The vehicle-mounted device performs feature extraction on the third audio signal to obtain a feature vector of the third audio signal, then processes the feature vector according to a speech decoding and search algorithm, using an acoustic model, a language model and a dictionary, to obtain the first text. Before this, the on-board device obtains the acoustic model, language model and dictionary either from the training device or by training them itself in the following way: the vehicle-mounted device obtains audio signals from a speech database, performs feature extraction on them to obtain feature vectors, and trains the acoustic model according to those feature vectors; it obtains text information from a text database, performs feature extraction on the text information to obtain feature vectors, and trains the language model according to those feature vectors.
  • the specific training process is not described here.
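  • A structural sketch of the FIG. 5 pipeline follows, with toy scoring functions standing in for the acoustic model, language model and decoder; this is not a real ASR implementation.

```python
# Feature extraction, then decoding with acoustic model + language model
# + dictionary (all placeholders, structure only).
def extract_features(audio: bytes) -> list:
    return [float(b) for b in audio[:4]]          # feature vector stand-in

def acoustic_score(features: list, phones: str) -> float:
    return -len(phones)                            # placeholder score

def language_score(text: str) -> float:
    return -len(text.split())                      # placeholder score

def decode(audio: bytes, dictionary: dict) -> str:
    features = extract_features(audio)
    # Search over candidate words, combining acoustic and language scores.
    best = max(dictionary,
               key=lambda w: acoustic_score(features, dictionary[w])
                             + language_score(w))
    return best

print(decode(b"\x01\x02", {"hello": "HH EH L OW", "yes": "Y EH S"}))
```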
  • The vehicle-mounted device also obtains the position information of the target person in the cockpit. Specifically, it can collect an image or video of the target person in the cockpit and analyze it through lip-reading recognition technology to determine the target person's position information; alternatively, when acquiring the audio signal of the target person in the cockpit, it can obtain the third audio signal through the microphone array in the cockpit and then analyze the third audio signal to determine the target person's position in the cockpit. After the position information of the target person is obtained, a first label is displayed at the same time as the first text on the display screen of the target passenger; the first label indicates the above position information, so that the target passenger knows which passenger in the cockpit is speaking.
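  • For the microphone-array alternative, the speaker's direction can be estimated from the time difference of arrival (TDOA) between two microphones; the following toy sketch uses cross-correlation, and the seat mapping is an illustrative assumption.

```python
# Toy TDOA speaker localization with a two-microphone array.
import numpy as np

def tdoa_samples(mic_a: np.ndarray, mic_b: np.ndarray) -> int:
    # Lag of the cross-correlation peak = arrival-time difference.
    corr = np.correlate(mic_a, mic_b, mode="full")
    return int(np.argmax(corr)) - (len(mic_b) - 1)

def seat_from_tdoa(lag: int) -> str:
    # Positive lag: sound reached mic B first (e.g. right side).
    return "right-side passenger" if lag > 0 else "left-side passenger"

rng = np.random.default_rng(0)
signal = rng.standard_normal(256)
mic_a = np.concatenate([np.zeros(5), signal])   # signal delayed at mic A
mic_b = np.concatenate([signal, np.zeros(5)])
lag = tdoa_samples(mic_a, mic_b)
print(lag, seat_from_tdoa(lag))
```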
  • the vehicle-mounted device may send the first text to the terminal device of the target passenger, so that the target passenger knows the response made by the target person; and send the location information to the terminal device of the target passenger, so that the target passenger knows which passenger in the cockpit is speaking.
  • In a specific scenario, the target passenger enters the cockpit, expresses himself or herself in sign language, and after a brief confirmation on the display screen the content is broadcast to the driver and other passengers in real time.
  • the driver and other passengers reply by voice, and the translation of different voices is displayed on the screen in real time for the target passengers to check.
  • the target passenger can communicate with other people in the cockpit completely without hindrance during the entire ride.
  • FIG. 6 is a schematic flowchart of a method for communicating with a hearing-impaired passenger according to an embodiment of the present application. As shown in FIG. 6, the method includes:
  • after the hearing-impaired passenger boards, that is, enters the cockpit, the on-board device enters the hearing-impaired passenger detection mode and detects the hearing-impaired passenger's body posture and hands in real time. Specifically, the motion collector in the cockpit collects
  • video or images of the hearing-impaired passenger in real time, and the in-vehicle device obtains the body posture and/or gestures of the hearing-impaired passenger from the collected video or images, for example by inputting them into the detection network for processing to obtain the position information of the hearing-impaired passenger's hands, arms and body, and then using the feature extraction network, based on that position information, to perform feature extraction on the video or images and obtain
  • the body posture information and/or hand information of the hearing-impaired passenger; it then determines, according to the body posture information and/or hand information, whether the hearing-impaired passenger is performing sign language;
  • when the body posture indicated by the acquired body posture information is a preset body posture and/or the gesture indicated by the hand information is a preset gesture, the vehicle-mounted device determines that the hearing-impaired passenger is expressing sign language;
  • when the indicated body posture is not a preset body posture or the indicated gesture is not a preset gesture, the on-board device determines that the hearing-impaired passenger is not performing sign language;
  • in that case, the body posture information and/or hand information of the hearing-impaired passenger is obtained again from the video or images collected in real time, and whether the hearing-impaired passenger is performing sign language is re-determined based on the newly acquired body posture information and/or hand information;
  • when it is determined that the hearing-impaired passenger is expressing sign language, the gestures and body posture of the hearing-impaired passenger are obtained in the above manner, the acquired gesture information and body posture information are input into the semantic recognition network for processing to obtain multiple phrases, and the multiple phrases are input into the text synthesis network to obtain the text information corresponding to the hearing-impaired passenger's sign language information.
  • the above detection network, feature extraction network, semantic recognition network and text synthesis network can be regarded as the above sign language recognition model; the detection network, feature extraction network and semantic recognition network can be implemented based on convolutional neural networks, and the text synthesis network is implemented based on an LSTM network.
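The following structural sketch shows how the four networks named above could be composed into one sign language recognition model. The layer sizes, keypoint counts and phrase vocabulary are assumptions; only the module composition follows the description (CNN-based detection, feature extraction and semantic recognition stages, and an LSTM-based text synthesis stage).

```python
# Sketch only: module composition, not trained weights or real architectures.
import torch
import torch.nn as nn

class Detector(nn.Module):                 # locates hands, arms and body
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                                      nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1))
        self.boxes = nn.Linear(32, 3 * 4)  # 3 boxes (hands/arms/body) x (x,y,w,h)

    def forward(self, img):
        return self.boxes(self.backbone(img).flatten(1)).view(-1, 3, 4)

class FeatureExtractor(nn.Module):         # per-frame keypoint feature vector
    def __init__(self, n_kpts=17):
        super().__init__()
        self.proj = nn.Linear(3 * 4, 2 * n_kpts)

    def forward(self, boxes):
        return self.proj(boxes.flatten(1))

class SemanticRecognizer(nn.Module):       # window of frame features -> phrase logits
    def __init__(self, feat=34, n_phrases=200):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(feat, 64, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool1d(1))
        self.cls = nn.Linear(64, n_phrases)

    def forward(self, window):             # (batch, frames, feat)
        return self.cls(self.conv(window.transpose(1, 2)).squeeze(-1))

class TextSynthesizer(nn.Module):          # phrase sequence -> sentence tokens (LSTM)
    def __init__(self, n_phrases=200, n_tokens=500):
        super().__init__()
        self.emb = nn.Embedding(n_phrases, 64)
        self.lstm = nn.LSTM(64, 64, batch_first=True)
        self.out = nn.Linear(64, n_tokens)

    def forward(self, phrase_ids):         # (batch, phrases_in_sentence)
        h, _ = self.lstm(self.emb(phrase_ids))
        return self.out(h)

frame = torch.randn(1, 3, 128, 128)
feats = FeatureExtractor()(Detector()(frame))   # one per-frame feature vector
```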
  • the in-vehicle device displays the text information corresponding to the sign language information of the hearing-impaired passenger on the hearing-impaired passenger's display screen and provides the ability to modify it word by word.
  • the display screen is a touch screen;
  • the hearing-impaired passenger can operate the display screen to modify the text displayed on it and obtain the target text.
  • the target text correctly expresses the meaning the hearing-impaired passenger conveys through sign language; when all the text information corresponding to the hearing-impaired passenger's sign language information is expressed correctly, the correct text is synthesized into speech and played through the speakers in the cockpit;
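A minimal sketch of this confirm-then-speak step is shown below. The pyttsx3 engine and the get_user_edit callback are illustrative assumptions standing in for the cockpit speech synthesis component and the touch-screen editing interface, neither of which is named in the patent.

```python
# Sketch only: show recognized text for word-by-word correction, then speak
# the confirmed target text. pyttsx3 is just an example offline TTS engine.
import pyttsx3

def confirm_and_play(recognized_text, get_user_edit):
    # get_user_edit is a placeholder for the touch-screen editing UI; it
    # returns the (possibly corrected) target text once the passenger confirms.
    target_text = get_user_edit(recognized_text)
    engine = pyttsx3.init()
    engine.say(target_text)       # played through the cockpit speakers
    engine.runAndWait()
    return target_text
```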
  • the microphones in the cockpit listen in real time for the speech of the unimpaired occupants of the cockpit (including the driver and passengers) responding to the meaning expressed by the hearing-impaired passenger's sign language.
  • after a start word is detected, the speech is determined to be such a response; the speech is acquired and converted into text, and the text is displayed on the hearing-impaired passenger's display screen in real time;
  • at this point the system has completed one exchange between the hearing-impaired passenger and the unimpaired persons, after which the system continues to detect and process in the above manner to realize communication between them, until the hearing-impaired passenger leaves the cockpit.
  • the communication method disclosed in this application can be used not only in the driving environment but also in the home environment.
  • for example, where family members include a hearing-impaired person, the hearing-impaired person and unimpaired persons can communicate according to the above method.
  • or, in the home environment, when someone (such as an infant) is resting and it is inconvenient for others to speak, communication can also proceed according to the above method.
  • an indoor camera collects the sign language information of the hearing-impaired person, and then sends the hearing-impaired person's sign language information to a smart TV or a smart phone of an indoor user.
  • the smart TV or smartphone determines, based on the sign language information of the hearing-impaired person, the text information corresponding to that sign language information.
  • the text information corresponding to the sign language information of the hearing-impaired person is then displayed on the smart TV or smartphone, or converted into a voice signal that is played through the smart TV, smartphone or another device.
  • the smartphone or smart TV can collect the voice signal of an unimpaired person, convert the voice signal into text information, and display that text information on the smart TV or smartphone.
  • FIG. 7 is a schematic structural diagram of an in-vehicle device for communicating with passengers according to an embodiment of the present application.
  • the vehicle-mounted device 700 includes:
  • an acquiring unit 701 configured to acquire sign language information of a target passenger in the cockpit
  • a determining unit 702 configured to determine the text information corresponding to the sign language information of the target passenger according to the sign language information of the target passenger;
  • the control unit 703 is configured to control the multimedia device to play the text information corresponding to the sign language information of the target passenger, so as to inform the target person in the cockpit of the content expressed by the sign language of the target passenger.
  • the multimedia device includes a display screen and/or a speaker of the target person.
  • the control unit 703 is specifically configured to: control the display screen of the target person to display the text information corresponding to the sign language information of the target passenger, and/or control the speaker to play a first audio signal, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language, the first audio signal being obtained based on the text information corresponding to the sign language information of the target passenger.
  • the multimedia device includes a display screen of the target passenger. Before the control unit 703 controls the multimedia device to play the text information corresponding to the sign language information of the target passenger,
  • the control unit 703 is further configured to control to display the text information corresponding to the sign language information of the target passenger on the display screen of the target passenger, so that the target passenger can confirm whether the text information corresponding to the sign language information of the target passenger is correct;
  • the obtaining unit 701 is further configured to obtain the target text according to the operation instruction of the target passenger on the display screen and the text information corresponding to the sign language information of the target passenger;
  • the multimedia device further includes a display screen and/or a speaker of the target person.
  • the control unit 703 is specifically configured to: display the target text through the display screen of the target person, and/or play a second audio signal through the speaker to inform the target person in the cockpit of the content expressed by the target passenger's sign language, the second audio signal being obtained based on the target text.
  • the determining unit 702 before determining the text information corresponding to the sign language information of the target passenger according to the sign language information of the target passenger, is further configured to:
  • the obtaining unit 701 is further configured to obtain image information of the passengers in the cabin after detecting that the passengers get on the vehicle;
  • the determining unit 702 is also used to determine a target passenger in the cabin according to the image information of the passengers in the cabin, the target passenger being a passenger determined, based on that image information, to have made a preset action; or,
  • the determining unit 702 is further configured to determine that there is a target passenger in the cockpit after detecting the target person's instruction on a target passenger key.
  • the multimedia device includes a display screen of the target passenger
  • the obtaining unit 701 is further configured to obtain the third audio signal, collected through the microphone, of the target person responding to the sign language information of the target passenger;
  • the control unit 703 is further configured to control the display screen of the target passenger to display the first text, where the first text is obtained according to the third audio signal.
  • the control unit 703 is further configured to:
  • when controlling the display screen of the target passenger to display the third text information, display a passenger identification on the display screen, the passenger identification being used to indicate the passenger who produced the third audio signal.
  • the above-mentioned units (acquiring unit 701, determining unit 702, and control unit 703) are configured to execute the relevant steps of the above-mentioned method.
  • the acquiring unit 701 is used to execute the relevant content of S201
  • the determining unit 702 is used to execute the relevant content of step S202
  • the control unit 703 is used to execute the relevant content of S203.
  • the vehicle-mounted device 700 is presented in the form of units.
  • a "unit" here may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above-described functions.
  • the above acquisition unit 701, determination unit 702 and control unit 703 may be implemented by the processor 801 of the in-vehicle device shown in FIG. 8.
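As a rough sketch, the unit decomposition of the vehicle-mounted device 700 could be expressed as the following skeleton, with the three units mapping onto steps S201 to S203; the method bodies are placeholders, not an implementation from the patent.

```python
# Sketch only: unit decomposition of device 700 mapped to S201-S203.
class VehicleDevice700:
    def acquire(self, cabin_video):            # acquiring unit 701 -> S201
        """Extract the target passenger's sign language information."""
        raise NotImplementedError

    def determine(self, sign_language_info):   # determining unit 702 -> S202
        """Map sign language information to its corresponding text information."""
        raise NotImplementedError

    def control(self, text_info):              # control unit 703 -> S203
        """Drive the multimedia device to play/display the text information."""
        raise NotImplementedError
```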
  • FIG. 8 is a schematic structural diagram of a vehicle-mounted device provided by an embodiment of the present application; the vehicle-mounted device 800 shown in FIG. 8 (the device 800 may specifically be a computer device) includes a memory 802, a processor 801, a display screen 803 and a communication interface 804.
  • the memory 802, the processor 801, the display screen 803 and the communication interface 804 realize the communication connection among each other through the bus.
  • the memory 802 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 802 may store programs. When the program stored in the memory 802 is executed by the processor 801, the processor 801, the display screen 803 and the communication interface 804 are used to execute various steps of the target passenger communication method of the embodiment of the present application.
  • the processor 801 may adopt a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (graphics processing unit, GPU) or one or more integrated circuits,
  • used to execute the relevant programs so as to realize the functions required to be performed by the units in the vehicle-mounted device of the embodiment of the present application, or to execute the target passenger communication method of the method embodiments of the present application.
  • the processor 801 may also be an integrated circuit chip, which has signal processing capability. In the implementation process, each step of the target passenger communication method of the present application may be completed by an integrated logic circuit of hardware in the processor 801 or instructions in the form of software.
  • the above-mentioned processor 801 can also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, and can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of the present application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 802, and the processor 801 reads the information in the memory 802 and, in combination with its hardware, completes the functions required to be performed by the units included in the vehicle-mounted device of the embodiment of the present application, or performs the method of communicating with passengers of the method embodiments of the present application.
  • the display screen 803 may be an LCD display screen, an LED display screen, an OLED display screen, a 3D display screen or other display screens.
  • the communication interface 804 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the in-vehicle device 800 and other devices or a communication network. For example, sign language information and the like of the target passenger can be acquired through the communication interface 804.
  • the bus may include a pathway for communicating information between the various components of the in-vehicle device 800 (eg, memory 802, processor 801, display screen 803, communication interface 804).
  • the acquisition unit 701 , the determination unit 702 and the control unit 703 in the vehicle-mounted device for target passenger communication may be equivalent to the processor 801 .
  • the display screen 803 is used to display the text information in the above embodiment.
  • although the in-vehicle device 800 shown in FIG. 8 only shows a memory, a processor, a display screen, and a communication interface, in the specific implementation process, those skilled in the art should understand that the device 800 also includes other components necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 800 may further include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the apparatus 800 may include only the devices necessary for implementing the embodiments of the present application, and need not include all the devices shown in FIG. 8.
  • it can be understood that the vehicle-mounted device 800 is equivalent to the execution device 110 in FIG. 1c.
  • Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
  • Embodiments of the present application further provide a computer storage medium, wherein the computer storage medium can store a program, and when the program is executed, it can implement some or all of the steps of any communication method with passengers described in the above method embodiments.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, read-only memory (English: read-only memory, ROM), random access memory (English: random access memory, RAM), removable hard disks, magnetic disks or optical discs.
  • the disclosed apparatus may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features can be ignored, or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

Abstract

A method for communicating with passengers, specifically comprising: acquiring sign language information of a target passenger in a cockpit (S201); determining, according to the sign language information of the target passenger, text information corresponding to the sign language information of the target passenger (S202); and playing, through a multimedia device, the text information corresponding to the sign language information of the target passenger, so as to inform a target person in the cockpit of the content expressed by the target passenger's sign language (S203). The method solves the problem of communication barriers with special passengers, such as hearing-impaired passengers and other speech-impaired passengers, or passengers for whom speaking is inconvenient.

Description

Method for communicating with passengers and related apparatus — Technical Field
This application relates to the field of intelligent vehicles, and in particular to a method for communicating with passengers and a related apparatus.
Background
Receiving and transmitting information is the basic way in which human beings communicate with the whole world, but hearing-impaired people have lost sound, one of the most important channels for perceiving the world. At present, the hearing-impaired population is very large, yet the overall penetration rate of barrier-free facilities is only 40.6%. More importantly, apart from communicating in sign language, hearing-impaired people live no differently from other people, so the real needs of the hearing-impaired are often overlooked. Riding in a vehicle is an even more special scenario for speech-impaired passengers such as hearing-impaired passengers, mainly in two respects: first, most drivers cannot read sign language directly, which makes communication between them and hearing-impaired passengers very difficult; second, unless a hearing-impaired passenger uses a special sound-producing device, information can only be conveyed in writing, which directly affects the driver's concentration.
Summary
Embodiments of this application provide a method for communicating with passengers and a related apparatus, which can solve the problem of communication barriers, in a riding scenario, with hearing-impaired passengers or passengers for whom speaking is inconvenient, and help such passengers communicate normally and without barriers with the driver or other passengers.
In a first aspect, an embodiment of this application provides a method for communicating with passengers, the method including:
acquiring sign language information of a target passenger in a cockpit; determining, according to the sign language information of the target passenger, text information corresponding to the sign language information of the target passenger; and playing the text information corresponding to the sign language information of the target passenger through a multimedia device, so as to inform a target person in the cockpit of the content expressed by the target passenger's sign language.
The target passenger may be a hearing-impaired passenger or a passenger for whom speaking is inconvenient. The target person is a person in the cockpit other than the target passenger, such as the driver or another passenger.
Communicating with passengers specifically refers to various scenarios such as communication between a hearing-impaired passenger and the driver in the cockpit, between a hearing-impaired passenger and an unimpaired passenger, or between an unimpaired passenger and the driver.
In one example, in the hearing-impaired mode, the sign language information of the target passenger in the cockpit is acquired; the text information corresponding to the sign language information of the target passenger is determined according to that sign language information; and the text information corresponding to the sign language information of the target passenger is played through the multimedia device.
Since expressing the meaning of one or more sentences through sign language requires the target user to sign for a period of time, the sign language information of the target passenger in the cockpit is acquired in real time, so that no part of what the target passenger wants to express through sign language is missed.
By converting the sign language information of the target passenger into text information and then playing that text information through the multimedia device to inform the target person of the meaning expressed by the target passenger's sign language, the problem of communication barriers between the target passenger and the target person is solved.
The multimedia device may be vehicle-mounted, may be an add-on device, or may be a terminal device of a passenger in the cockpit, such as a smartphone, a smart watch or a smart band.
In a feasible embodiment, the multimedia device includes a display screen and/or a speaker of the target person, and playing the text information corresponding to the sign language information of the target passenger through the multimedia device includes:
displaying the text information corresponding to the sign language information of the target passenger through the display screen of the target person, and/or playing a first audio signal through the speaker, so as to inform the target person of the content expressed by the sign language information of the target passenger, where the first audio signal is obtained based on the text information corresponding to the sign language information of the target passenger.
Optionally, the display screen of the target person or the target passenger may be located on the front windshield; or, for front-row occupants (including the target person and the target passenger), the display screen is located on the center console, while for rear-row passengers the display screen is located on the front-row seat.
Displaying the text information corresponding to the sign language information of the target passenger on a display screen, or playing the first audio signal through a speaker, makes it easy for the target person in the cockpit to know the meaning expressed by the target passenger; and for the driver, playing the first audio signal through the speaker avoids dangerous driving caused by the driver being distracted while reading the text information corresponding to the sign language information of the target passenger.
In a feasible embodiment, the multimedia device includes a display screen of the target passenger, and before the text information corresponding to the sign language information of the target passenger is played through the multimedia device, the method of this embodiment further includes:
displaying the text information corresponding to the sign language information of the target passenger on the display screen of the target passenger, for the target passenger to confirm whether the text information corresponding to his or her sign language information is correct; and obtaining target text according to the target passenger's operation instruction on the display screen and the text information corresponding to the sign language information of the target passenger;
the multimedia device further includes a display screen and/or a speaker of the target person, and playing the text information corresponding to the sign language information of the target passenger through the multimedia device includes:
displaying the target text through the display screen of the target person, and/or playing a second audio signal through the speaker, so as to inform the target person of the content expressed by the sign language information of the target passenger, where the second audio signal is obtained based on the target text.
Since the text information obtained from the sign language information of the target passenger may be inconsistent with what the target passenger expressed, after that text information is obtained it is displayed on the display screen of the target passenger for confirmation; if it is found to be wrong, the target passenger can modify the text to obtain the target text, which improves the practicality of the system.
In a feasible embodiment, before the text information corresponding to the sign language information of the target passenger is determined according to the sign language information of the target passenger, the method of this application further includes:
determining whether the target passenger is performing a sign language operation according to the body posture information of the target passenger, or determining whether the target passenger is performing a sign language operation according to the body posture information and gesture information of the target passenger, where the body posture information and gesture information of the target passenger are obtained from the image information of the target passenger; and entering the hearing-impaired mode when it is determined that the target passenger is performing a sign language operation.
Entering the hearing-impaired mode when the target passenger is judged to be performing sign language avoids the situation where the target passenger is signing but the hearing-impaired mode has not been entered and the sign language is ignored, which improves the user experience of the system.
In a feasible embodiment, the method of this application further includes:
after detecting that passengers have boarded, acquiring image information of the passengers in the cockpit; and determining the target passenger in the cockpit according to the image information of the passengers in the cockpit, the target passenger being a passenger determined, based on the image information of the passengers in the cockpit, to have made a preset action; or, after detecting the target person's instruction on a target passenger key, determining that there is a target passenger in the cockpit. In this way, whether there is a target passenger in the cockpit can be determined accurately, thereby avoiding the problem of the target passenger and the target person being unable to communicate.
Optionally, the target passenger key may be a physical key or a touch key in the cockpit; the target person's instruction on the target passenger key may be a pressing instruction, a touch instruction, a gesture instruction or a voice instruction.
In a feasible embodiment, the method of this application further includes: entering the hearing-impaired mode when it is determined that there is a target passenger in the cockpit, without judging whether the target passenger is performing a sign language operation, so that the target passenger's sign language is not missed because of the step of "judging whether the target passenger is performing a sign language operation".
In a feasible embodiment, the multimedia device includes a display screen of the target passenger, and the method of this application further includes:
acquiring a third audio signal, collected through a microphone, of the target person responding to the sign language information of the target passenger; and displaying first text through the display screen of the target passenger, the first text being obtained from the third audio signal. In this way, the target passenger can know the response made by the target person.
In a feasible embodiment, the method of this application further includes:
when third text information is displayed on the display screen of the target passenger, displaying a passenger identifier on that display screen, the passenger identifier being used to indicate the passenger who produced the third audio signal. In this way, the target passenger can know exactly which passenger in the cockpit is communicating with him or her.
In a second aspect, an embodiment of this application provides a vehicle-mounted apparatus for communicating with passengers, the vehicle-mounted apparatus including:
an acquiring unit, configured to acquire sign language information of a target passenger in a cockpit;
a determining unit, configured to determine, according to the sign language information of the target passenger, text information corresponding to the sign language information of the target passenger;
a control unit, configured to control a multimedia device to play the text information corresponding to the sign language information of the target passenger, so as to inform a target person in the cockpit of the content expressed by the target passenger's sign language.
Optionally, the acquiring unit is configured to acquire the sign language information of the target passenger in the cockpit in the hearing-impaired mode.
In a feasible embodiment, the multimedia device includes a display screen and/or a speaker of the target person, and in respect of controlling the multimedia device to play the text information corresponding to the sign language information of the target passenger, the control unit is specifically configured to:
control the display screen of the target person to display the text information corresponding to the sign language information of the target passenger, and/or control the speaker to play a first audio signal, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language, the first audio signal being obtained based on the text information corresponding to the sign language information of the target passenger.
In a feasible embodiment, the multimedia device includes a display screen of the target passenger, and before the control unit controls the multimedia device to play the text information corresponding to the sign language information of the target passenger:
the control unit is further configured to control the text information corresponding to the sign language information of the target passenger to be displayed on the display screen of the target passenger, for the target passenger to confirm whether the text information is correct;
the acquiring unit is further configured to obtain target text according to the target passenger's operation instruction on the display screen and the text information corresponding to the sign language information of the target passenger;
the multimedia device further includes a display screen and/or a speaker of the target person, and in respect of controlling the multimedia device to play the text information corresponding to the sign language information of the target passenger, the control unit is specifically configured to:
control the display screen of the target person to display the target text, and/or play a second audio signal through the speaker, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language, the second audio signal being obtained based on the target text.
In a feasible embodiment, before the text information corresponding to the sign language information of the target passenger is determined according to the sign language information of the target passenger, the determining unit is further configured to:
determine whether the target passenger is performing a sign language operation according to the body posture information of the target passenger, or,
determine whether the target passenger is performing a sign language operation according to the body posture information and gesture information of the target passenger, where the body posture information and gesture information of the target passenger are obtained from the image information of the target passenger, and the hearing-impaired mode is entered when it is determined that the target passenger is performing a sign language operation.
In a feasible embodiment, the acquiring unit is further configured to acquire image information of the passengers in the cockpit after detecting that passengers have boarded;
the determining unit is further configured to determine the target passenger in the cockpit according to the image information of the passengers in the cockpit, the target passenger being a passenger determined, based on that image information, to have made a preset action; or,
the determining unit is further configured to determine that there is a target passenger in the cockpit after detecting the target person's instruction on a target passenger key.
In a feasible embodiment, the multimedia device includes the display screen of the target passenger, and the acquiring unit is further configured to acquire a third audio signal, collected through a microphone, of the target person responding to the sign language information of the target passenger;
the control unit is further configured to control the display screen of the target passenger to display first text, the first text being obtained from the third audio signal.
In a feasible embodiment, the control unit is further configured to:
when controlling the display screen of the target passenger to display third text information, display a passenger identifier on that display screen, the passenger identifier being used to indicate the passenger who produced the third audio signal.
In a feasible embodiment, the target passenger is a hearing-impaired passenger or a passenger for whom speaking is inconvenient.
In a third aspect, an embodiment of this application provides a vehicle-mounted apparatus including a processor and a memory, where the processor is connected to the memory, the memory is configured to store program code, and the processor is configured to call the program code to perform some or all of the method described in the first aspect.
In a fourth aspect, an embodiment of this application provides a chip system applied to an electronic device; the chip system includes one or more interface circuits and one or more processors; the interface circuits and the processors are interconnected through lines; the interface circuits are configured to receive signals from a memory of the electronic device and send the signals to the processors, the signals including computer instructions stored in the memory; and when the processors execute the computer instructions, the electronic device performs some or all of the method described in the first aspect.
In a fifth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement some or all of the method described in the first aspect.
It can be seen that, in the embodiments of this application, in the hearing-impaired mode, the sign language information of the target passenger in the cockpit is acquired and recognized to obtain the text information corresponding to that sign language information, and that text information is played through the multimedia device to inform the target person in the cockpit of the meaning expressed by the target passenger's sign language, which solves the problem of communication between hearing-impaired occupants, or occupants for whom speaking is inconvenient, and other persons in the cockpit. After the text information corresponding to the sign language information of the target passenger is obtained, it is displayed on the display screen of the target passenger for confirmation; if it is determined that the text information does not express the meaning of the sign language, the target passenger can modify it through the display screen to obtain the target text, improving the practicality of the system. The hearing-impaired mode is entered when the target passenger is judged to be performing sign language, so that sign language made by the target passenger is not ignored for lack of entering the hearing-impaired mode, improving the user experience of the system.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below.
FIG. 1a is a schematic diagram of a system architecture provided by an embodiment of this application;
FIG. 1b is a schematic flowchart of a method for communicating with passengers provided by an embodiment of this application;
FIG. 1c is a schematic diagram of another system architecture provided by an embodiment of this application;
FIG. 1d is a schematic architectural diagram of a sign language recognition model provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a method for communicating with passengers provided by an embodiment of this application;
FIG. 3 is a schematic diagram of an interface display provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of a target passenger communication method provided by an embodiment of this application;
FIG. 4a is a schematic diagram of display screen placement for front-row occupants provided by an embodiment of this application;
FIG. 4b is a schematic diagram of display screen placement for rear-row passengers provided by an embodiment of this application;
FIG. 5 is a schematic diagram of the principle of converting speech into text;
FIG. 6 is a schematic flowchart of a method for communicating with a hearing-impaired passenger provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a vehicle-mounted apparatus for communicating with passengers provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a vehicle-mounted apparatus provided by an embodiment of this application.
Detailed Description
To enable those skilled in the art to better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application.
It should first be noted that the target passenger in this application is a hearing-impaired passenger in the cockpit, or a passenger for whom speaking is inconvenient, and the target person is a person in the cockpit other than the target passenger, such as an unimpaired passenger or the driver.
Referring to FIG. 1a, FIG. 1a is a schematic diagram of a system architecture provided by an embodiment of this application. As shown in FIG. 1a, the system architecture includes: a vehicle-mounted apparatus 101, a motion collector 102, an audio collector 103, a display screen 104 and a speaker 105.
The vehicle-mounted apparatus 101 is the control center of the entire system and includes multiple modules, the modules including one or more of a central processing unit (CPU), a graphics processing unit (GPU) and a neural network processing unit (NPU); the vehicle-mounted apparatus 101 also includes a memory. The vehicle-mounted apparatus 101 mainly provides the following functions:
controlling the motion collector 102 and every stage of collection, including multiple imaging parameters of the motion collector 102 such as brightness, illumination and exposure time, and processing the collected image data into the format required by subsequent algorithms;
controlling the audio collector 103 and every stage of collection, including parameters of the audio signals collected by the audio collector 103 such as frequency and volume, and processing the collected audio signals into the format required by subsequent algorithms;
performing feature extraction on the collected images to obtain the sign language information of the target passenger; determining the sign language semantics of the target passenger from the sign language information based on a deep learning algorithm, converting the sign language semantics into text information, and displaying it on the display screen 104 for the target person to understand the target passenger's sign language. Optionally, the sign language semantics of the target passenger are converted into an audio signal played through the speaker 105, so that the target person can conveniently understand the target passenger's sign language; optionally, the sign language semantics of the target passenger are converted into text information displayed on the display screen 104 for the target passenger to modify, and after the target passenger confirms that the text information is correct, the text information is displayed on the display screen 104, or converted into an audio signal played through the speaker 105, so that the target person can understand the target passenger's sign language.
The motion collector 102 is used to collect sign language information of passengers in the cockpit. Optionally, the motion collector includes an image camera, a time of flight (TOF) camera or a millimeter-wave radar; the image camera is used to collect plan views, red-green-blue (RGB) images or infrared (infrared radiation, IR) images of the cockpit, and the method of this application can obtain the sign language information of passengers in the cockpit based on images collected by the image camera; the TOF camera is used to collect depth images containing both plane images and depth information, and with depth information available, more accurate sign language information can be obtained based on the depth images; by continuously scanning the passenger's movements with the high-frequency signals emitted by the millimeter-wave radar, the passenger's sign language information can be obtained.
The audio collector 103 includes microphones installed at different positions in the cockpit, which can accurately collect audio signals in the cockpit; the collected audio signals are then transcribed into text by the vehicle-mounted apparatus 101 and displayed on the display screen 104 for the target passenger to read. The audio collector 103 may come with the vehicle or be an add-on.
The display screen 104 may be an ordinary display screen or a touch display screen, and may come with the vehicle or be an add-on; it can display text information converted from the audio signals collected by the audio collector 103, or text information representing the sign language semantics of the target passenger. If it is a touch display screen, it can also allow the target passenger to modify the text information obtained from the sign language semantics determined from the target passenger's gesture information, so that the target passenger's sign language is expressed correctly. The number and positions of display screens 104 are not limited here.
The speaker 105 may be a speaker that comes with the vehicle or an add-on speaker, and is used to play the audio signal obtained based on the text information, so that the driver or other unimpaired passengers understand the meaning of the target passenger's sign language. The number and positions of speakers 105 are not limited here.
The devices in the system architecture shown in FIG. 1a cooperate to implement the method flow shown in FIG. 1b: after the target passenger enters the cockpit, the motion collector 102 continuously detects the target passenger's body posture and hand movements; the vehicle-mounted apparatus 101 uses a deep learning algorithm to accurately recognize the detected body posture and sign language movements and translates the sign language into text in real time; after translation, an interface allowing word-by-word correction is provided to the target passenger on the display screen 104, and after the target passenger confirms that the text is correct, the text is synthesized into speech played through the speaker 105; the vehicle-mounted apparatus 101 uses a deep learning algorithm to recognize, in real time, the speech of unimpaired passengers or the driver collected by the audio collector 103 in the cockpit, obtains the corresponding text, and displays it in real time on the display screen 104 for the target passenger to read.
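A compact sketch of the FIG. 1b loop, tying the components of FIG. 1a together, might look as follows. Every function parameter is a placeholder for the corresponding component (motion collector 102, sign language recognition in the vehicle-mounted apparatus 101, display screen 104, speaker 105 and audio collector 103); none of these names comes from the patent itself.

```python
# Sketch only: orchestration of the FIG. 1b flow with placeholder callbacks.
def cockpit_loop(capture_motion, recognize_sign, confirm_on_screen,
                 speak, listen, transcribe, show_text):
    while True:
        frames = capture_motion()                  # motion collector 102
        if frames is None:                         # target passenger left the cockpit
            break
        text = recognize_sign(frames)              # deep-learning sign recognition
        if text:
            target_text = confirm_on_screen(text)  # word-by-word correction on 104
            speak(target_text)                     # synthesized speech via 105
        reply = listen()                           # microphones of audio collector 103
        if reply is not None:
            show_text(transcribe(reply))           # real-time transcript on 104
```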
Referring to FIG. 1c, an embodiment of this application provides another system architecture 100. As shown in the system architecture 100, a data collection device 160 is used to collect training data; in this embodiment the training data includes videos or images recording sign language information and the text information corresponding to that sign language information, and the training data is stored in a database 130. A training device 120 trains a sign language recognition model 113 based on the training data maintained in the database 130; the sign language recognition model 113 can be used to implement the sign language recognition method disclosed in the embodiments of this application, that is, an image or video recording the sign language information of the target passenger is input, after relevant preprocessing, into the sign language recognition model 113 to obtain the text information corresponding to the sign language information of the target passenger. It should be noted that in practical applications the training data maintained in the database 130 does not necessarily all come from the data collection device 160 and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the sign language recognition model 113 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training, for example by training directly on images or videos recording the target passenger's sign language information and the corresponding text information. The above description should not be taken as a limitation on the embodiments of this application.
The sign language recognition model 113 trained by the training device 120 can be applied to different systems or devices, for example to the execution device 110 shown in FIG. 1c. The execution device 110 may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, an AR/VR device or a vehicle-mounted terminal, or may be a server or the cloud. In FIG. 1c, the execution device 110 is configured with an I/O interface 112 for data interaction with external devices; data can be input to the I/O interface 112 through a collection device 170. In this embodiment the input data may include videos or images recording the sign language information of the target passenger, so that the sign language recognition model 113 performs sign language recognition on them; in this case the collection device 170 may be an image collection device such as a camera, for which see the motion collector 102 above.
While the execution device 110 preprocesses the input data, or while the computing module 111 of the execution device 110 performs computation and other related processing, the execution device 110 may call data, code and the like in a data storage system 150 for the corresponding processing, and may also store data, instructions and the like obtained by the processing into the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the detection result and/or heat map of the product under test obtained by the computing module 111, to a multimedia device 140 to be provided to the user; in this case the multimedia device 140 may be a display.
It is worth noting that the training device 120 can generate corresponding sign language recognition models 113 based on different training data for different goals or tasks, and the corresponding sign language recognition model 113 can then be used to achieve the above goal or complete the above task, thereby outputting the text information corresponding to the sign language information.
In the situation shown in FIG. 1c, the user can manually give the input data, and this manual input can be operated through the interface provided by the I/O interface 112. In another case, the user's terminal device can automatically send input data to the I/O interface 112; if automatically sending the input data requires the user's authorization, the user can set the corresponding permission in the terminal device. The user can view the result output by the execution device 110 on the terminal device, presented in a specific form such as display, sound or action. The collection device 170 can also serve as a data collection end, collecting the input data of the I/O interface 112 and the output result of the I/O interface 112 shown in FIG. 1c as new sample data and storing them in the database 130. Of course, collection may also bypass the collection device 170, with the I/O interface 112 directly storing its input data and output results, as shown in FIG. 1c, in the database 130 as new sample data.
It is worth noting that FIG. 1c is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the positional relationships among the devices, components and modules shown in FIG. 1c do not constitute any limitation; for example, in FIG. 1c the data storage system 150 is external memory relative to the execution device 110, while in other cases the data storage system 150 may be placed in the execution device 110. In addition, the training device 120 and the execution device 110 may be the same device.
As shown in FIG. 1c, the sign language recognition model 113 is obtained by training with the training device 120. Specifically, referring to FIG. 1d, FIG. 1d is a schematic structural diagram of the sign language recognition model provided by an embodiment of this application; the sign language recognition model provided by this embodiment may include: a detection module, a feature extraction module, a semantic recognition module and a text synthesis module. For the specific description of the sign language recognition model, refer to the description of FIG. 2 in Embodiment 1.
In a feasible embodiment, this application provides a communication method, including:
S1: receiving gesture/body information of a passenger.
The passenger is any passenger in the cockpit, and may be a hearing-impaired passenger, a passenger for whom speaking is inconvenient, or an unimpaired passenger.
Optionally, the gesture/body information may be preset or updated periodically. For example, a camera, lidar or other sensor in the cockpit collects image information or other forms of information of the passenger, and the gesture/body information is then extracted based on that information.
Optionally, before receiving the passenger's gesture/body information, it is necessary to judge whether the passenger is a hearing-impaired passenger or a passenger for whom speaking is inconvenient. One way is to check whether the passenger makes a preset gesture or preset body movement after entering the cockpit; when it is determined that the passenger makes the preset gesture or preset body movement, the passenger is determined to be a hearing-impaired passenger or a passenger for whom speaking is inconvenient. Another way is to provide a key, which may be a physical key or a virtual touch key, on or near each seat in the cockpit; after the passenger enters the cockpit, if an instruction from the passenger on the key is detected, the passenger is determined to be a hearing-impaired passenger or a passenger for whom speaking is inconvenient.
S2: according to the passenger's gesture/body information, expressing or playing, through the multimedia device, the text information corresponding to the passenger's sign language information.
Specifically, whether the passenger is performing a sign language operation is judged according to the passenger's gesture/body information; when it is determined that the passenger is performing a sign language operation, the passenger's sign language information is acquired in real time, the text information corresponding to the sign language information is determined according to that sign language information, and the text information is played through the multimedia device.
Judging whether the passenger is performing a sign language operation according to the passenger's gesture/body information includes:
judging whether the passenger is performing a sign language operation according to the passenger's gesture information, or judging whether the passenger is performing a sign language operation according to the passenger's gesture information and body information.
It should be pointed out here that for the specific implementation of this embodiment, refer to the related description of the embodiment shown in FIG. 2 below, which is not repeated here.
It should be pointed out that the execution subject of the above embodiment and the embodiment shown in FIG. 2 may be a vehicle center console, an add-on device with processing capability, or a terminal device of the driver or a passenger, such as a smartphone, smart watch or tablet computer of the driver or a passenger. However, before executing the above method, these devices need to establish a wired or wireless connection with the vehicle-mounted sensors and the multimedia device.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a target passenger communication method provided by an embodiment of this application. As shown in FIG. 2, the method includes:
S201: in the hearing-impaired mode, acquiring sign language information of a target passenger in the cockpit.
In a feasible embodiment, the method of this application further includes:
after detecting that passengers have boarded, acquiring video of the passengers in the cockpit and determining the target passenger in the cockpit according to that video, the target passenger being a passenger determined, based on the video of the passengers in the cockpit, to have made a preset action. Optionally, the preset action may be a preset body action or a preset gesture action. Of course, the preset body action and preset gesture action can be updated periodically.
Specifically, video of the passengers in the cockpit is acquired through the motion collector, and the action information of each passenger in the cockpit is determined based on that video; based on each passenger's action information it is judged whether the passenger makes the preset action, which may be a preset gesture action or a preset body action; if the passenger makes the preset action, the passenger is determined to be a target passenger; if the passenger does not make the preset action, the passenger is determined to be an unimpaired passenger.
Optionally, the motion collector includes a camera, a lidar or a millimeter-wave radar. These devices may come with the vehicle or be add-ons. Optionally, the camera may be a camera for collecting plane images, RGB images or infrared images, or a TOF camera. The lidar or millimeter-wave radar can collect point cloud images with depth information.
In a feasible embodiment, a hearing-impaired key, which may be a physical key or a virtual touch key, is installed on each seat in the cockpit; after a passenger boards, when an operation instruction on any hearing-impaired key is detected, the passenger on the seat corresponding to that key is determined to be the target passenger. Optionally, the operation instruction may be a pressing operation instruction, a touch operation instruction or a gesture operation instruction.
Optionally, when it is determined according to the above method that there is a target passenger in the cockpit, in order to avoid misjudgment, confirmation information is sent to the target passenger to request confirmation of whether the passenger really is a target passenger; feedback information from the passenger is received, and whether the passenger is a target passenger is determined based on the feedback information.
In a feasible embodiment, after a passenger enters the cockpit, a label is displayed on the display screen corresponding to the passenger's seat, as shown in FIG. 3, the label being used to ask whether the passenger sitting in that seat is a target passenger; whether the passenger sitting in that position is a target passenger is determined according to the passenger's feedback on the label. For an unimpaired passenger, the feedback information may be voice information; for a target passenger, two function keys are also displayed on the display screen, respectively indicating that the passenger in the seat is and is not a target passenger, in which case the feedback information may be a pressing or touch instruction on the two function keys. Further, the vehicle-mounted apparatus can determine the position information of the target passenger in the cockpit from the source of the feedback information.
In another feasible embodiment, after a passenger enters the cockpit, the driver can determine whether the passenger is a target passenger by communicating with the passenger; when determining that the passenger is a target passenger, the driver sends an instruction, which may be a voice instruction, a gesture instruction or another instruction, to the vehicle-mounted apparatus; when the vehicle-mounted apparatus receives the driver's instruction, it determines that there is a target passenger in the cockpit.
In a feasible embodiment, before the text information corresponding to the sign language information of the target passenger is determined according to that sign language information, it is necessary to determine whether the target passenger is performing a sign language operation; when it is determined that the target passenger is performing a sign language operation, the sign language information of the target passenger in the cockpit is acquired in real time, and the corresponding text information is determined according to it. Determining whether the target passenger is performing a sign language operation can specifically be done as follows:
extracting the body posture information of the target passenger from video of the target passenger, and then determining whether the target passenger is performing a sign language operation based on the body posture information. To improve the judgment accuracy, gesture information of the target passenger can also be extracted from the video, and whether the target passenger is performing a sign language operation is then determined based on both the body posture information and the gesture information.
Optionally, determining whether the target passenger is performing a sign language operation based on the body posture information of the target passenger includes:
inputting the body posture information of the target passenger, or the body posture information and gesture information of the target passenger, into a two-class deep neural network for processing to obtain an output result, the output result including a first flag whose different values indicate that the target passenger is or is not performing a sign language operation. For example, when the first flag is 1 or true, it indicates that the target passenger is performing a sign language operation; when the first flag is 0 or false, it indicates that the target passenger is not performing a sign language operation.
Optionally, the body posture information and gesture information of the target passenger are obtained from video of the target passenger. Specifically, a deep learning algorithm is used, in combination with context information, to process the current image information of the target passenger to obtain the human-body keypoint information and hand keypoint information of the target passenger, where the context information includes images collected before the current image and the human-body and hand keypoint information of the target passenger obtained from those earlier images; the body posture information includes the human-body keypoint information, and the gesture information includes the hand keypoint information. Both human-body keypoint information and hand keypoint information contain 2-dimensional (2d) plane information and 3-dimensional (3d) information; optionally, the human-body keypoint information is larger-scale keypoint information, comprising 18-keypoint information or 32-keypoint information.
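A sketch of the two-class deep neural network described above might look as follows: keypoint vectors for the body posture (and optionally the hands) go in, and the output flag indicates whether the target passenger is performing a sign language operation. The 18 body keypoints follow the description; the three-coordinate representation per keypoint, the 21 hand keypoints and the hidden-layer width are assumptions for illustration.

```python
# Sketch only: an untrained binary "is the passenger signing" classifier.
import torch
import torch.nn as nn

class SigningClassifier(nn.Module):
    def __init__(self, n_body=18, n_hand=21, use_hands=True):
        super().__init__()
        d_in = 3 * n_body + (3 * 2 * n_hand if use_hands else 0)
        self.net = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                 nn.Linear(64, 2))   # {not signing, signing}

    def forward(self, keypoints):                    # (batch, d_in)
        return self.net(keypoints)

clf = SigningClassifier()
first_flag = clf(torch.randn(1, 3 * 18 + 3 * 2 * 21)).argmax(-1)  # 1 -> signing
```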
It should be pointed out that when the target passenger expresses a sentence in sign language, multiple signs need to be made in succession; therefore, after entering the hearing-impaired mode, video of the target passenger, which includes the target passenger's sign language information, needs to be acquired in real time.
In a feasible embodiment, when it is determined that there is a target passenger in the cockpit, the step of "judging whether the target passenger is signing" is not executed; the hearing-impaired mode is entered directly, and the sign language information of the target passenger is acquired in real time.
Sign language information is information that characterizes sign language. Since expressing sign language requires not only gesture information but also the arms and parts of the body, sign language information includes gesture information and body posture information.
S202: determining, according to the sign language information of the target passenger, the text information corresponding to the sign language information of the target passenger.
Specifically, when the target passenger expresses the meaning of a sentence in sign language, multiple signs need to be made in succession, where one sign or multiple consecutive signs represent a character or a word in that sentence; optionally, one sign or multiple consecutive signs represent a word in that sentence. Therefore, the sign language information of the target passenger is extracted in real time from the video of the target passenger acquired in real time, to obtain one or more pieces of sign language information; optionally, after video containing the complete sign language information of the target passenger is acquired, the sign language information of the target passenger in that video is extracted to obtain one or more pieces of sign language information. After one or more pieces of sign language information are obtained, one or more phrases are obtained based on them, where a phrase may be a character or a word; the multiple phrases are combined into a sentence with complete intent, which can represent the target passenger's sign language and is the text information corresponding to the sign language information of the target passenger.
Optionally, the multiple phrases are input into a neural network model for processing so as to combine them into a sentence with complete intent; in implementation, a recurrent neural network (RNN) model such as a long short term memory (LSTM) neural network model can be used.
The following is a specific description with reference to the drawings. After entering the hearing-impaired mode, the vehicle-mounted apparatus collects images of the target passenger in real time through the camera in the cockpit and inputs them into the sign language recognition model; optionally, after collecting images of the target passenger in real time, it samples the collected images at a first preset frequency and inputs the sampling results into the sign language recognition model; optionally, the vehicle-mounted apparatus collects images of the target passenger through the camera in the cockpit at a second preset frequency and inputs them into the sign language recognition model. As shown in FIG. 4, the sign language recognition model includes a detection module, a feature extraction module, a semantic recognition module and a text synthesis module. Since sign language expression involves the hands, both arms and parts of the body, sign language recognition requires determining the position information of the target passenger's hands, arms and body in the input image. Specifically, the detection module detects the input image and determines the position information of the target passenger's hands, arms and body in the image; the detection module outputs an image containing one or more detection boxes whose content covers the target passenger's hands, arms and body parts. The image containing the detection boxes is then input into the feature extraction module, which extracts the hand keypoint information and human-body keypoint information of the target passenger within the one or more detection boxes, the human-body keypoint information containing feature information of the arms; since hand keypoint information and human-body keypoint information are represented as vectors, the feature extraction module outputs the hand feature vector and body posture feature vector of the target passenger.
Since the target passenger needs to sign for a period of time to express the meaning of a sentence, determining what the target passenger expresses requires acquiring images of the signing in real time. After the detection module determines the position information of the target passenger's hands, arms and partial body parts in the first through N-th images, tracking techniques can subsequently be used to determine that position information in later images, where N is greater than or equal to 1.
After the hand feature vectors and body posture feature vectors of the target passenger in the 1st through M-th images are acquired in the above manner, the hand feature vectors and body posture feature vectors of the target passenger in images acquired before the 1st image, together with those in the 1st through M-th images, are input into the semantic recognition module for sign language recognition, obtaining one or more first phrases; the multiple first phrases may contain identical phrases, or may all differ from one another. After the hand feature vector and body posture feature vector of the target passenger in the (M+1)-th image are acquired, the hand feature vectors and body posture feature vectors in images acquired before the 2nd image (including the 1st image), together with those in the 2nd through (M+1)-th images, are input into the semantic recognition module for processing, obtaining one or more second phrases; the multiple second phrases may contain identical phrases, or may all differ from one another; the one or more first phrases may contain phrases identical to the one or more second phrases. M is a preset value; optionally, M may be 1, 2, 3, 5 or another value. After one or more first phrases, one or more second phrases, ..., and one or more T-th phrases are obtained in this manner, they are input into the text synthesis module for processing to obtain the text information, where the text synthesis model may be a neural network model; for example, in implementation an RNN, such as an attention-based transformer-RNN, can be used to synthesize the text. The text information is the output of the sign language recognition model, and T is a positive integer. The T-th phrases are obtained by inputting, into the semantic recognition module, the hand feature vectors and body posture feature vectors of the target passenger in the images acquired before the T-th image (including the 1st, 2nd, ..., T-th images) together with those in the T-th through (T+M-1)-th images.
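The sliding-window phrase recognition just described can be sketched as follows: the feature vectors of frames 1..M yield the first phrase(s), frames 2..M+1 the second, and so on, after which the phrase sequence is fused into one sentence. The recognize and synthesize arguments stand for the semantic recognition and text synthesis modules; M is the preset window length.

```python
# Sketch only: M-frame sliding window over per-frame feature vectors.
def windowed_recognition(frame_features, recognize, synthesize, M=5):
    feats, phrases = [], []
    for feat in frame_features:          # per-frame hand + body feature vectors
        feats.append(feat)
        if len(feats) >= M:
            start = len(feats) - M       # current window: frames start..start+M-1
            prior = feats[:start]        # context acquired before the window
            window = feats[start:]       # the M most recent frames
            phrases.extend(recognize(prior, window))
    return synthesize(phrases)           # fuse phrases into one full sentence
```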
In an optional example, the feature extraction module may extract the target passenger's hand features and body posture features simultaneously, or separately.
In a feasible embodiment, the vehicle-mounted apparatus can obtain the sign language recognition model from the training device at a preset period through over-the-air technology (OTA), so as to update the sign language recognition model regularly and thereby improve the accuracy of the sign language recognition model.
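A sketch of such a periodic OTA refresh is given below. The URL, file name and seven-day period are invented for illustration; the patent only states that the model is fetched from the training device at a preset period.

```python
# Sketch only: periodically download and hot-swap the recognition model.
import time
import requests

MODEL_URL = "https://training-device.example/sign_language_model.pt"  # hypothetical

def ota_update_loop(load_model, period_s=7 * 24 * 3600):
    while True:
        resp = requests.get(MODEL_URL, timeout=60)
        if resp.ok:
            with open("sign_language_model.pt", "wb") as f:
                f.write(resp.content)
            load_model("sign_language_model.pt")   # swap in the new recognizer
        time.sleep(period_s)
```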
S203: playing the text information corresponding to the sign language information of the target passenger through the multimedia device, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language.
In an optional embodiment, the multimedia device includes a display screen provided for each seat; after the text information corresponding to the sign language information of the target passenger is obtained, it is displayed through the display screen of the target person, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language.
As shown in FIG. 4a, the display screen for a front-row occupant can be set on the windshield in front, on the front console, or on the glass at the side of the occupant (that is, the glass of the adjacent door). Optionally, the content to be displayed can be projected, in the manner of a projector, onto the front windshield or the side glass.
As shown in FIG. 4b, the display screen for a rear-row passenger can be set on the backrest of the seat in front, or on the glass at the side of the passenger (that is, the glass of the adjacent door).
In an optional embodiment, the multimedia device includes a speaker; after the text information corresponding to the sign language information of the target passenger is obtained, a first audio signal is obtained from it and played through the speaker, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language.
In an optional embodiment, the multimedia device includes a speaker and a display screen provided for each seat; after the text information corresponding to the sign language information of the target passenger is obtained, a first audio signal is obtained from it, the text information is displayed through the display screen of the target person and, at the same time, the first audio signal is played through the speaker, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language; in this case, the text information corresponding to the sign language information of the target passenger can be regarded as subtitles for the first audio signal.
In an optional embodiment, the multimedia device includes the display screen of the target passenger, and before the text information corresponding to the sign language information of the target passenger is played through the multimedia device, the method of this application further includes:
displaying the text information corresponding to the sign language information of the target passenger on the display screen of the target passenger, for the target passenger to confirm whether the text information corresponding to his or her sign language information is correct; and obtaining the target text according to the target passenger's operation instruction on the target passenger's display screen and the text information corresponding to the sign language information of the target passenger;
the multimedia device further includes a speaker and/or the display screen of the target person, and playing the text information corresponding to the sign language information of the target passenger through the multimedia device includes:
displaying the target text through the display screen of the target person, and/or playing a second audio signal through the speaker, so as to inform the target person of the content expressed by the target passenger's sign language, the second audio signal being obtained based on the target text.
Specifically, because the video of the target passenger signing may be collected at low precision, or for other reasons, the determined text information corresponding to the sign language information of the target passenger may not accurately represent the content expressed by the sign language. To avoid this and inform the target person of the content expressed by the target passenger's sign language more accurately, after the text information is obtained from the sign language information it is displayed on the display screen of the target passenger, for the target passenger to confirm whether the text information can represent the meaning he or she wants to express through sign language. When the target passenger confirms that it can, the target passenger makes a preset gesture, which may be an "OK" gesture; optionally, the display screen may be a touch screen, and when confirming that the text information can represent the intended meaning the target passenger taps a "Confirm" function key displayed on the screen. After the target passenger's confirmation instruction is received, the text information corresponding to the sign language information of the target passenger is taken as the target text, where the confirmation instruction includes the target passenger making the preset gesture or tapping the "Confirm" function key displayed on the screen. When the target passenger confirms that the text information cannot represent the intended meaning, the target passenger can modify the text information directly on the display screen to obtain the target text, or tap a "Modify" function key on the screen to enter the modification mode and modify the text information on the screen to obtain the target text, where the target text can represent the meaning the target passenger wants to express through sign language.
Optionally, the text information corresponding to the sign language information of the target passenger is sent to the terminal device of the target passenger, for the target passenger to confirm whether it can represent the meaning he or she wants to express through sign language; the target text is then obtained in the manner described above, which is not repeated here.
In an optional embodiment, after the target text is obtained, the target text and the video recording the target passenger's body posture and/or sign language are saved; optionally, a desensitization operation, such as removing the target passenger's face information, is performed on the video recording the target passenger's body posture and/or sign language to obtain a desensitized video, and that video and the target text are saved as subsequent training samples, so that the sign language recognition model can later be trained to improve the accuracy of the sign language recognition model.
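The desensitization step mentioned above could, for example, blur detected faces in each saved frame before the (video, target text) pair is stored as a training sample. OpenCV's Haar face detector in the sketch below is only an illustrative mechanism; the patent does not specify how the face information is removed.

```python
# Sketch only: blur faces in a frame before saving it as training data.
import cv2

_face = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def desensitize(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in _face.detectMultiScale(gray, 1.1, 5):
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(
            frame[y:y + h, x:x + w], (51, 51), 0)   # kernel size must be odd
    return frame
```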
In an optional embodiment, after the target person is informed of the content expressed by the target passenger's sign language, the target person can respond to the target passenger's sign language information directly by speaking normally; at this time the microphone in the cockpit collects a third audio signal of that target person, and the vehicle-mounted apparatus converts the third audio signal into first text and displays the first text on the display screen of the target passenger, so that the target passenger knows the response made by the target person.
Specifically, the vehicle-mounted apparatus converts the third audio signal into the first text using speech recognition technology; the process of converting the third audio signal into the first text through speech recognition technology is shown in FIG. 5:
the vehicle-mounted apparatus performs feature extraction on the third audio signal to obtain a feature vector of the third audio signal, and processes that feature vector using an acoustic model, a language model and a dictionary, according to a speech decoding and search algorithm, to obtain the first text. Before this, the vehicle-mounted apparatus obtains the acoustic model, language model and dictionary, either by obtaining the acoustic model and language model from the training device, or by training them in the following way:
the vehicle-mounted apparatus obtains audio signals from a speech database, performs feature extraction on them to obtain their feature vectors, and trains the acoustic model on those feature vectors; it obtains text information from a text database, performs feature extraction on it to obtain its feature vectors, and trains the language model on those feature vectors. The specific training process is not described here.
Optionally, the vehicle-mounted apparatus also obtains the position information of the speaking target person in the cockpit. Specifically, it can collect images or video of the target person in the cockpit and analyze them through lip-reading technology to determine the position information of the speaking target person; or, when acquiring the audio signal of the target person in the cockpit, it obtains the third audio signal through a microphone array in the cockpit and then analyzes the third audio signal to determine the position information of the speaking target person in the cockpit. After obtaining the position information of the target person in the cockpit, when the first text is displayed on the display screen of the target passenger, a first label is displayed at the same time, the first label being used to indicate the position information of the speaking target person, so that the target passenger knows which occupant of the cockpit is talking.
Optionally, after obtaining the first text, the vehicle-mounted apparatus may send the first text to the terminal device of the target passenger so that the target passenger knows the response made by the target person; the vehicle-mounted apparatus may also obtain the position information of the speaking target person in the cockpit in the above manner and send that position information to the terminal device of the target passenger, so that the target passenger knows which occupant of the cockpit is talking.
It can be seen that, in the solution of the embodiments of this application, the target passenger enters the cockpit, expresses in sign language, and after a brief confirmation on the display screen the expression is broadcast in real time to the driver and other passengers. The driver and other passengers reply by voice, and the transcriptions of the different voices are displayed on the screen in real time for the target passenger to read. Throughout the ride, the target passenger can communicate with the other occupants of the cockpit completely without hindrance.
Taking a hearing-impaired passenger as a specific example, see FIG. 6, which is a schematic flowchart of a method for communicating with a hearing-impaired passenger provided by an embodiment of this application. As shown in FIG. 6, the method includes:
after the hearing-impaired passenger boards, that is, enters the cockpit, the vehicle-mounted apparatus enters the hearing-impaired passenger detection mode and detects the hearing-impaired passenger's body posture and hands in real time. Specifically, the motion collector in the cockpit collects video or images of the hearing-impaired passenger in real time, and the vehicle-mounted apparatus obtains the body posture and/or gestures of the hearing-impaired passenger from the collected video or images, for example by inputting the collected video or images of the hearing-impaired passenger into the detection network for processing to obtain the position information of the hearing-impaired passenger's hands, arms and body, and then using the feature extraction network, based on that position information, to perform feature extraction on the video or images of the hearing-impaired passenger and obtain the body posture information and/or hand information of the hearing-impaired passenger; it then judges, according to the body posture information and/or hand information, whether the hearing-impaired passenger is expressing sign language. Optionally, specifically, when the body posture indicated by the acquired body posture information of the hearing-impaired passenger is a preset body posture and/or the gesture indicated by the hand information is a preset gesture, the vehicle-mounted apparatus determines that the hearing-impaired passenger is expressing sign language; when the indicated body posture is not a preset body posture or the indicated gesture is not a preset gesture, the vehicle-mounted apparatus determines that the hearing-impaired passenger is not expressing sign language, obtains the body posture information and/or hand information of the hearing-impaired passenger again from the video or images collected in real time, and re-judges, based on the newly acquired body posture information and/or hand information, whether the hearing-impaired passenger is expressing sign language.
When it is determined that the hearing-impaired passenger is expressing sign language, the gestures and body posture of the hearing-impaired passenger are obtained in the above manner, the acquired gesture information and body posture information of the hearing-impaired passenger are input into the semantic recognition network for processing to obtain multiple phrases, and the multiple phrases are input into the text synthesis network to obtain the text information corresponding to the hearing-impaired passenger's sign language information. It should be pointed out here that the above detection network, feature extraction network, semantic recognition network and text synthesis network can be regarded as the sign language recognition model described above; the detection network, feature extraction network and semantic recognition network can be implemented based on convolutional neural networks, and the text synthesis network is implemented based on an LSTM network.
Since the translated text may contain some errors, the vehicle-mounted apparatus displays the text information corresponding to the hearing-impaired passenger's sign language information on the hearing-impaired passenger's display screen, which is a touch screen, and provides the ability to modify it word by word. When the text information corresponding to the hearing-impaired passenger's sign language information is wrong and does not correctly express the meaning the hearing-impaired passenger conveys in sign language, the hearing-impaired passenger can operate the display screen to modify the displayed text and obtain the target text, which correctly represents the meaning expressed by the hearing-impaired passenger in sign language. When the text information corresponding to the sign language information is all expressed correctly, the correct text is synthesized into speech and played through the speakers in the cockpit.
The microphones in the cockpit listen in real time for the speech of the unimpaired occupants of the cockpit (including the driver and passengers) responding to the meaning expressed by the hearing-impaired passenger's sign language; after a start word is detected, the speech is determined to be a response to the meaning expressed by the sign language, the speech is acquired and converted into text, and the text is displayed in real time on the hearing-impaired passenger's display screen.
At this point the system has completed one exchange between the hearing-impaired passenger and the unimpaired persons, after which the system continues to detect and can process in the above manner to realize communication between the hearing-impaired passenger and the unimpaired persons, until the hearing-impaired passenger leaves the cockpit.
After the correct text is obtained, the correct text and the video recording the hearing-impaired passenger's sign language and body posture are saved and, as annotated data, enter the subsequent training of a new sign language recognition model, so as to train the sign language recognition model and thereby improve its accuracy.
It should be pointed out that for the specific implementation of the embodiment shown in FIG. 6, refer to the related description of the embodiment shown in FIG. 2, which is not repeated here.
It should be pointed out here that the communication method disclosed in this application can be used not only in driving environments but also in home environments. For example, if family members include a hearing-impaired person, the hearing-impaired person and unimpaired persons can communicate according to the above communication method; or, in a home environment, when someone (such as an infant) is resting and it is inconvenient for others to speak, communication can proceed according to the above method. For example, an indoor camera collects the sign language information of the hearing-impaired person and then sends it to a smart TV or the smartphone of a user in the room; the smart TV or smartphone determines, from the sign language information of the hearing-impaired person, the text information corresponding to that sign language information, and then displays that text information through the smart TV or smartphone, or converts the text information into a voice signal played through the smart TV, smartphone or other device; the smartphone or smart TV can collect the voice signal of an unimpaired person, convert the voice signal into text information, and display that text information on the smart TV or smartphone.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a vehicle-mounted apparatus for communicating with passengers provided by an embodiment of this application. As shown in FIG. 7, the vehicle-mounted apparatus 700 includes:
an acquiring unit 701, configured to acquire sign language information of a target passenger in the cockpit;
a determining unit 702, configured to determine, according to the sign language information of the target passenger, text information corresponding to the sign language information of the target passenger;
a control unit 703, configured to control the multimedia device to play the text information corresponding to the sign language information of the target passenger, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language.
In a feasible embodiment, the multimedia device includes a display screen and/or a speaker of the target person, and in respect of controlling the multimedia device to play the text information corresponding to the sign language information of the target passenger, the control unit 703 is specifically configured to:
control the display screen of the target person to display the text information corresponding to the sign language information of the target passenger, and/or control the speaker to play a first audio signal, so as to inform the target person in the cockpit of the content expressed by the target passenger's sign language, the first audio signal being obtained based on the text information corresponding to the sign language information of the target passenger.
In a feasible embodiment, the multimedia device includes a display screen of the target passenger, and before the control unit 703 controls the multimedia device to play the text information corresponding to the sign language information of the target passenger:
the control unit 703 is further configured to control the text information corresponding to the sign language information of the target passenger to be displayed on the display screen of the target passenger, for the target passenger to confirm whether it is correct;
the acquiring unit 701 is further configured to obtain the target text according to the target passenger's operation instruction on the display screen and the text information corresponding to the sign language information of the target passenger;
the multimedia device further includes a display screen and/or a speaker of the target person, and in respect of controlling the multimedia device to play the text information corresponding to the sign language information of the target passenger, the control unit 703 is specifically configured to:
display the target text through the display screen of the target person, and/or play a second audio signal through the speaker to inform the target person in the cockpit of the content expressed by the target passenger's sign language, the second audio signal being obtained based on the target text.
In a feasible embodiment, before the text information corresponding to the sign language information of the target passenger is determined according to the sign language information of the target passenger, the determining unit 702 is further configured to:
determine whether the target passenger is performing a sign language operation according to the body posture information of the target passenger, or,
determine whether the target passenger is performing a sign language operation according to the body posture information and gesture information of the target passenger, where the body posture information and gesture information of the target passenger are obtained from the image information of the target passenger, and the hearing-impaired mode is entered when it is determined that the target passenger is performing a sign language operation.
In a feasible embodiment, the acquiring unit 701 is further configured to acquire image information of the passengers in the cockpit after detecting that passengers have boarded;
the determining unit 702 is further configured to determine the target passenger in the cockpit according to the image information of the passengers in the cockpit, the target passenger being a passenger determined, based on that image information, to have made a preset action; or,
the determining unit 702 is further configured to determine that there is a target passenger in the cockpit after detecting the target person's instruction on a target passenger key.
In a feasible embodiment, the multimedia device includes the display screen of the target passenger,
the acquiring unit 701 is further configured to acquire a third audio signal, collected through the microphone, of the target person responding to the sign language information of the target passenger;
the control unit 703 is further configured to control the display screen of the target passenger to display first text, the first text being obtained from the third audio signal.
In a feasible embodiment, the control unit 703 is further configured to:
when controlling the display screen of the target passenger to display third text information, display a passenger identifier on that display screen, the passenger identifier being used to indicate the passenger who produced the third audio signal.
It should be noted that the above units (acquiring unit 701, determining unit 702 and control unit 703) are configured to execute the relevant steps of the above method. For example, the acquiring unit 701 is used to execute the relevant content of S201, the determining unit 702 the relevant content of S202, and the control unit 703 the relevant content of S203.
In this embodiment, the vehicle-mounted apparatus 700 is presented in the form of units. A "unit" here may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or another device that can provide the above functions. In addition, the above acquiring unit 701, determining unit 702 and control unit 703 may be implemented by the processor 801 of the vehicle-mounted apparatus shown in FIG. 8.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a vehicle-mounted apparatus provided by an embodiment of this application; the vehicle-mounted apparatus 800 shown in FIG. 8 (the apparatus 800 may specifically be a computer device) includes a memory 802, a processor 801, a display screen 803 and a communication interface 804, where the memory 802, processor 801, display screen 803 and communication interface 804 are communicatively connected to one another through a bus.
The memory 802 may be a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device or a random access memory (Random Access Memory, RAM). The memory 802 may store a program; when the program stored in the memory 802 is executed by the processor 801, the processor 801, the display screen 803 and the communication interface 804 are used to execute the steps of the target passenger communication method of the embodiments of this application.
The processor 801 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (graphics processing unit, GPU) or one or more integrated circuits, used to execute relevant programs so as to implement the functions required of the units in the vehicle-mounted apparatus of the embodiments of this application, or to execute the target passenger communication method of the method embodiments of this application.
The processor 801 may also be an integrated circuit chip with signal processing capability. In implementation, the steps of the target passenger communication method of this application may be completed by integrated logic circuits of hardware in the processor 801 or by instructions in the form of software. The above processor 801 may also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, and can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 802, and the processor 801 reads the information in the memory 802 and, in combination with its hardware, completes the functions required of the units included in the vehicle-mounted apparatus of the embodiments of this application, or executes the method for communicating with passengers of the method embodiments of this application.
The display screen 803 may be an LCD display screen, an LED display screen, an OLED display screen, a 3D display screen or another display screen.
The communication interface 804 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the vehicle-mounted apparatus 800 and other devices or communication networks. For example, the sign language information of the target passenger, among other things, can be acquired through the communication interface 804.
The bus may include a pathway for transferring information between the components of the vehicle-mounted apparatus 800 (for example, the memory 802, the processor 801, the display screen 803 and the communication interface 804).
It should be understood that the acquiring unit 701, determining unit 702 and control unit 703 in the vehicle-mounted apparatus for target passenger communication may be equivalent to the processor 801. The display screen 803 is used to display the text information in the above embodiments.
It should be noted that although the vehicle-mounted apparatus 800 shown in FIG. 8 only shows a memory, a processor, a display screen and a communication interface, in specific implementation those skilled in the art should understand that the apparatus 800 also includes other components necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 800 may further include hardware components implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 800 may include only the components necessary for implementing the embodiments of this application, and need not include all the components shown in FIG. 8.
It can be understood that the vehicle-mounted apparatus 800 is equivalent to the execution device 110 in FIG. 1c. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
Embodiments of this application further provide a computer storage medium, where the computer storage medium may store a program; when the program is executed, some or all of the steps of any method for communicating with passengers described in the above method embodiments can be implemented. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (English: read-only memory), random access memory (English: random access memory, RAM), removable hard disks, magnetic disks or optical discs.
It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that this application is not limited by the described order of actions, because according to this application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other division methods in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them; although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments, or replace some of the technical features with equivalents; and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of this application.

Claims (19)

  1. A method for communicating with passengers, characterized in that it comprises:
    acquiring sign language information of a target passenger in a cockpit;
    determining, according to the sign language information of the target passenger, text information corresponding to the sign language information of the target passenger;
    playing the text information corresponding to the sign language information of the target passenger through a multimedia device.
  2. The method according to claim 1, characterized in that the multimedia device comprises a display screen and/or a speaker of a target person, the target person being a person in the cockpit other than the target passenger, and the playing, through a multimedia device, of the text information corresponding to the sign language information of the target passenger comprises:
    displaying the text information corresponding to the sign language information of the target passenger through the display screen of the target person, so as to inform the target person of the content expressed by the sign language information of the target passenger, and/or;
    playing a first audio signal through the speaker, so as to inform the target person of the content expressed by the sign language information of the target passenger, wherein the first audio signal is obtained based on the text information corresponding to the sign language information of the target passenger.
  3. The method according to claim 1, characterized in that the multimedia device comprises a display screen of the target passenger, and before the playing, through a multimedia device, of the text information corresponding to the sign language information of the target passenger, the method further comprises:
    displaying the text information corresponding to the sign language information of the target passenger on the display screen of the target passenger, for the target passenger to confirm whether the text information corresponding to the sign language information of the target passenger is correct;
    obtaining target text according to an operation instruction of the target passenger on the display screen of the target passenger and the text information corresponding to the sign language information of the target passenger;
    the multimedia device further comprises a display screen and/or a speaker of the target person, and the playing, through a multimedia device, of the text information corresponding to the sign language information of the target passenger comprises:
    displaying the target text through the display screen of the target person, so as to inform the target person of the content expressed by the sign language information of the target passenger, and/or;
    playing a second audio signal through the speaker, so as to inform the target person of the content expressed by the sign language information of the target passenger, the second audio signal being obtained based on the target text.
  4. The method according to any one of claims 1 to 3, characterized in that before the determining, according to the sign language information of the target passenger, of the text information corresponding to the sign language information of the target passenger, the method further comprises:
    determining whether the target passenger performs a sign language operation according to body posture information of the target passenger, or,
    determining whether the target passenger performs a sign language operation according to body posture information and gesture information of the target passenger; wherein the body posture information and gesture information of the target passenger are obtained from image information of the target passenger;
    when it is determined that the target passenger performs a sign language operation, acquiring the sign language information of the target passenger in the cockpit.
  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    after detecting that passengers have boarded, acquiring image information of the passengers in the cockpit; determining the target passenger in the cockpit according to the image information of the passengers in the cockpit, the target passenger being a passenger determined, based on the image information of the passengers in the cockpit, to have made a preset action; or,
    after detecting an instruction of the target person on a target passenger key, determining that there is a target passenger in the cockpit.
  6. The method according to any one of claims 1 to 5, characterized in that the multimedia device comprises a display screen of the target passenger, and the method further comprises:
    acquiring a third audio signal, collected through a microphone, of the target person responding to the sign language information of the target passenger;
    displaying first text through the display screen of the target passenger, the first text being obtained from the third audio signal.
  7. The method according to claim 6, characterized in that the method further comprises:
    when the third text information is displayed on the display screen of the target passenger, displaying a passenger identifier on that display screen, the passenger identifier being used to indicate the passenger who produced the third audio signal.
  8. The method according to any one of claims 1 to 7, characterized in that the target passenger is a hearing-impaired passenger or a passenger for whom speaking is inconvenient.
  9. An apparatus for communicating with passengers, characterized in that it comprises:
    an acquiring unit, configured to acquire sign language information of a target passenger in a cockpit;
    a determining unit, configured to determine, according to the sign language information of the target passenger, text information corresponding to the sign language information of the target passenger;
    a control unit, configured to control a multimedia device to play the text information corresponding to the sign language information of the target passenger.
  10. The apparatus according to claim 9, characterized in that the multimedia device comprises a display screen and/or a speaker of a target person, the target person being a person in the cockpit other than the target passenger, and in respect of controlling the multimedia device to play the text information corresponding to the sign language information of the target passenger, the control unit is specifically configured to:
    control the display screen of the target person to display the text information corresponding to the sign language information of the target passenger, so as to inform the target person of the content expressed by the sign language information of the target passenger, and/or;
    control the speaker to play a first audio signal, so as to inform the target person of the content expressed by the sign language information of the target passenger, the first audio signal being obtained based on the text information corresponding to the sign language information of the target passenger.
  11. The apparatus according to claim 9, characterized in that the multimedia device comprises a display screen of the target passenger, and before the control unit controls the multimedia device to play the text information corresponding to the sign language information of the target passenger:
    the control unit is further configured to control the text information corresponding to the sign language information of the target passenger to be displayed on the display screen of the target passenger, for the target passenger to confirm whether the text information corresponding to the sign language information of the target passenger is correct;
    the acquiring unit is further configured to obtain target text according to an operation instruction of the target passenger on the display screen and the text information corresponding to the sign language information of the target passenger;
    the multimedia device further comprises a display screen and/or a speaker of the target person, and in respect of controlling the multimedia device to play the text information corresponding to the sign language information of the target passenger, the control unit is specifically configured to:
    control the display screen of the target person to display the target text, so as to inform the target person of the content expressed by the sign language information of the target passenger, and/or;
    control the speaker to play a second audio signal, so as to inform the target person of the content expressed by the sign language information of the target passenger, wherein the second audio signal is obtained based on the target text.
  12. The apparatus according to any one of claims 9 to 11, characterized in that before the text information corresponding to the sign language information of the target passenger is obtained according to the sign language information of the target passenger, the determining unit is further configured to:
    determine whether the target passenger performs a sign language operation according to body posture information of the target passenger, or,
    determine whether the target passenger performs a sign language operation according to body posture information and gesture information of the target passenger; wherein the body posture information and gesture information of the target passenger are obtained from image information of the target passenger;
    when it is determined that the target passenger performs a sign language operation, the acquiring unit acquires the sign language information of the target passenger.
  13. The apparatus according to any one of claims 9 to 12, characterized in that:
    the acquiring unit is further configured to acquire image information of the passengers in the cockpit after detecting that passengers have boarded;
    the determining unit is further configured to determine the target passenger in the cockpit according to the image information of the passengers in the cockpit, the target passenger being a passenger determined, based on the image information of the passengers in the cockpit, to have made a preset action,
    or,
    the determining unit is further configured to determine that there is a target passenger in the cockpit after detecting an instruction of the target person on a target passenger key.
  14. The apparatus according to any one of claims 9 to 13, characterized in that the multimedia device comprises the display screen of the target passenger,
    the acquiring unit is further configured to acquire a third audio signal, collected through a microphone, of the target person responding to the sign language information of the target passenger;
    the control unit is further configured to control the display screen of the target passenger to display first text, the first text being obtained from the third audio signal.
  15. The apparatus according to claim 13, characterized in that the control unit is further configured to:
    when controlling the display screen of the target passenger to display the third text information, display a passenger identifier on that display screen, the passenger identifier being used to indicate the passenger who produced the third audio signal.
  16. The apparatus according to any one of claims 9 to 15, characterized in that the target passenger is a hearing-impaired passenger or a passenger for whom speaking is inconvenient.
  17. A vehicle-mounted apparatus for communicating with passengers, characterized by comprising a processor and a memory, wherein the processor is connected to the memory, the memory is configured to store program code, and the processor is configured to call the program code to perform the method for communicating with passengers according to any one of claims 1 to 8.
  18. A chip system, characterized in that the chip system is applied to an electronic device; the chip system comprises one or more interface circuits and one or more processors; the interface circuits and the processors are interconnected through lines; the interface circuits are configured to receive signals from a memory of the electronic device and send the signals to the processors, the signals comprising computer instructions stored in the memory; and when the processors execute the computer instructions, the electronic device performs the method for communicating with passengers according to any one of claims 1 to 8.
  19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program being executed by a processor to implement the method for communicating with passengers according to any one of claims 1 to 8.
PCT/CN2021/091121 2021-04-29 2021-04-29 Method for communicating with passengers and related apparatus WO2022226919A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180001483.5A CN113330394A (zh) 2021-04-29 2021-04-29 Method for communicating with passengers and related apparatus
PCT/CN2021/091121 WO2022226919A1 (zh) 2021-04-29 2021-04-29 Method for communicating with passengers and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/091121 WO2022226919A1 (zh) 2021-04-29 2021-04-29 Method for communicating with passengers and related apparatus

Publications (1)

Publication Number Publication Date
WO2022226919A1 true WO2022226919A1 (zh) 2022-11-03

Family

ID=77427026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091121 WO2022226919A1 (zh) 2021-04-29 2021-04-29 Method for communicating with passengers and related apparatus

Country Status (2)

Country Link
CN (1) CN113330394A (zh)
WO (1) WO2022226919A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203819120U (zh) * 2013-11-19 2014-09-10 Zhejiang Geely Automobile Research Institute Co., Ltd. Riding assistance device for deaf-mute passengers
CN108170266A (zh) * 2017-12-25 2018-06-15 Zhuhai Juntian Electronic Technology Co., Ltd. Smart device control method, apparatus and device
CN108764172A (zh) * 2018-05-31 2018-11-06 BOE Technology Group Co., Ltd. Sign language communication method and system in a vehicle, and vehicle-mounted tag or smart rearview mirror
CN110992783A (zh) * 2019-10-29 2020-04-10 Dongguan Yilian Interaction Information Technology Co., Ltd. Machine-learning-based sign language translation method and translation device
CN111752387A (zh) * 2020-06-11 2020-10-09 Wang Zixiang Information interaction method, apparatus, system and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3034374B1 (en) * 2014-12-19 2017-10-25 Volvo Car Corporation Vehicle safety arrangement, vehicle and a method for increasing vehicle safety
DE102019003785A1 (de) * 2019-05-29 2020-01-02 Daimler Ag Method for operating a vehicle

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203819120U (zh) * 2013-11-19 2014-09-10 Zhejiang Geely Automobile Research Institute Co., Ltd. Riding assistance device for deaf-mute passengers
CN108170266A (zh) * 2017-12-25 2018-06-15 Zhuhai Juntian Electronic Technology Co., Ltd. Smart device control method, apparatus and device
CN108764172A (zh) * 2018-05-31 2018-11-06 BOE Technology Group Co., Ltd. Sign language communication method and system in a vehicle, and vehicle-mounted tag or smart rearview mirror
CN110992783A (zh) * 2019-10-29 2020-04-10 Dongguan Yilian Interaction Information Technology Co., Ltd. Machine-learning-based sign language translation method and translation device
CN111752387A (zh) * 2020-06-11 2020-10-09 Wang Zixiang Information interaction method, apparatus, system and device

Also Published As

Publication number Publication date
CN113330394A (zh) 2021-08-31

Similar Documents

Publication Publication Date Title
US10381003B2 (en) Voice acquisition system and voice acquisition method
WO2020134858A1 (zh) Face attribute recognition method and apparatus, electronic device and storage medium
EP3593958A1 (en) Data processing method and nursing robot device
US20150325240A1 (en) Method and system for speech input
US11289074B2 (en) Artificial intelligence apparatus for performing speech recognition and method thereof
US20220139389A1 (en) Speech Interaction Method and Apparatus, Computer Readable Storage Medium and Electronic Device
CN113835522A (zh) Sign language video generation, translation and customer service methods, device and readable medium
JP6759445B2 (ja) Information processing device, information processing method and computer program
WO2018107489A1 (zh) Assistance method and apparatus for deaf-mute persons, and electronic device
CN102903362A (zh) Integrated local and cloud-based speech recognition
Arsan et al. Sign language converter
US11355101B2 (en) Artificial intelligence apparatus for training acoustic model
US11468247B2 (en) Artificial intelligence apparatus for learning natural language understanding models
KR20210058152A (ko) Method for controlling an intelligent security device
EP3671699A1 (en) Electronic apparatus and controlling method thereof
CN113822187A (zh) Sign language translation, customer service and communication methods, device and readable medium
CN113851029B (zh) Barrier-free communication method and apparatus
US20210334461A1 (en) Artificial intelligence apparatus and method for generating named entity table
WO2022226919A1 (zh) Method for communicating with passengers and related apparatus
CN113409770A (zh) Pronunciation feature processing method, apparatus, server and medium
WO2019150708A1 (ja) Information processing device, information processing system, information processing method, and program
KR20180012192A (ko) Learning device for young children and operation method thereof
JP2018144534A (ja) Driving support system, driving support method and driving support program
CN111081120A (zh) Smart wearable device for assisting persons with hearing and speech impairments in communication
CN111611812A (zh) Translation into braille

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21938398

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21938398

Country of ref document: EP

Kind code of ref document: A1