WO2020006711A1 - Method and terminal for playing a message (一种消息的播放方法及终端) - Google Patents
Method and terminal for playing a message
- Publication number
- WO2020006711A1, from PCT application PCT/CN2018/094517 (CN2018094517W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- terminal
- voice
- message
- user
- text
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/04—Real-time or near real-time messaging, e.g. instant messaging [IM]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/42—Mailbox-related aspects, e.g. synchronisation of mailboxes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/226—Delivery according to priorities
Definitions
- the present application relates to the field of communication technologies, and in particular, to a method and terminal for playing a message.
- a method and terminal for playing a message provided by the present application can learn a user's voice command, thereby identifying the user's intention and performing corresponding operations, which is beneficial to improving the interaction efficiency between the user and the terminal and the user experience.
- the method provided in the embodiment of the present application is applicable to a terminal, and the method includes: the terminal receives a first message, where the first message is text information; in response to receiving the first message, the terminal plays a first voice, where the first voice is used to ask the user whether to play the first message by voice; the terminal detects the user's second voice; the terminal converts the second voice into a first text; if the first text does not match a first keyword, the terminal continues to detect the user's voice, where the first keyword is a positive keyword; when the terminal detects the user's third voice, the terminal converts the third voice into a second text; if the second text matches the first keyword, the terminal plays the first message by voice and records the number of times of the first text; and if the number of times of the first text is greater than a first threshold, the terminal adds the first text to the first keyword.
- the technical solution provided in the embodiment of the present application can learn a non-preset answer from the user and determine whether it is an affirmative answer, that is, whether the user wishes to have the message played.
- this improves the accuracy of the commands executed by the terminal and the success rate of voice playback of messages, making the terminal more intelligent and helping to improve the user experience.
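- The learning step described above can be sketched in Python as follows. This is an illustrative sketch, not code from the application: the seed keyword set, the threshold value, and all names are assumptions. Replies that match no stored affirmative keyword are remembered; when the user later confirms with a known keyword, each remembered reply's count is incremented, and a reply whose count exceeds the first threshold is added to the affirmative keywords.

```python
# Illustrative sketch of learning affirmative answers. The seed keywords and
# the threshold are assumptions, not values taken from the application.
AFFIRMATIVE_KEYWORDS = {"yes", "ok", "play it"}
FIRST_THRESHOLD = 3

pending: list = []        # unrecognized replies heard for the current message
reply_counts: dict = {}   # how often each unrecognized reply has been heard

def handle_reply(text: str) -> bool:
    """Return True when the message should be played by voice."""
    if text in AFFIRMATIVE_KEYWORDS:
        # The user eventually gave a known affirmative, so earlier unrecognized
        # replies in this exchange were probably affirmative too: count them.
        for earlier in pending:
            reply_counts[earlier] = reply_counts.get(earlier, 0) + 1
            if reply_counts[earlier] > FIRST_THRESHOLD:
                AFFIRMATIVE_KEYWORDS.add(earlier)  # learned a new keyword
        pending.clear()
        return True
    pending.append(text)  # not recognized yet: keep detecting the user's voice
    return False
```

After the same unrecognized reply (e.g. "sure") has preceded a known affirmative more than the threshold number of times, it is treated as affirmative on its own.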
- the method further includes: the terminal converts the first message into a fourth voice; and the playing of the first message by voice is specifically: the terminal plays the fourth voice.
- that is, after determining that the second text matches the first keyword, the terminal converts the text information of the first message into a voice message (that is, the fourth voice), and then plays the voice message.
- alternatively, the terminal may convert the text information of the first message into a voice message (that is, the fourth voice) before determining that the second text matches the first keyword. In that case the terminal can play the voice message directly, which helps reduce the time the user waits for the terminal to play the first message and improves the user experience.
- the embodiment of the present application does not limit the time for the terminal to convert the text information of the first message into a voice message.
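- One way to realize the early conversion described above is to start text-to-speech as soon as the message arrives, in parallel with asking the user. This is an illustrative sketch only: the `IncomingMessage` class, the placeholder `text_to_speech` function, and the threading approach are assumptions, not the application's implementation.

```python
import threading

def text_to_speech(text: str) -> bytes:
    """Placeholder TTS: a real terminal would call its speech-synthesis engine."""
    return text.encode("utf-8")

class IncomingMessage:
    """Starts converting the message text to voice as soon as it arrives."""

    def __init__(self, text: str):
        self.text = text
        self._voice = None
        self._done = threading.Event()
        # Conversion runs in parallel with asking the user whether to play.
        threading.Thread(target=self._convert, daemon=True).start()

    def _convert(self) -> None:
        self._voice = text_to_speech(self.text)
        self._done.set()

    def voice(self) -> bytes:
        # If the user confirms after conversion has finished, this returns at
        # once; otherwise it waits only for the remaining conversion time.
        self._done.wait()
        return self._voice
```

The design choice is simply overlap: the conversion latency is hidden behind the question-and-answer exchange with the user.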
- the method further includes: the terminal receives a second message, where the second message is text information; in response to receiving the second message, the terminal plays a fifth voice, where the fifth voice is used to ask the user whether to play the second message by voice; the terminal detects the user's sixth voice; the terminal converts the sixth voice into a third text; and if the third text matches the added first keyword, the terminal plays the second message by voice.
- the terminal can quickly recognize the user's intention and play the second message by voice.
- the interaction efficiency between the user and the terminal is improved, and the user experience is improved.
- before the terminal plays the first voice, the method further includes: if the terminal determines that the first message belongs to a preset application, and/or that a sender of the first message belongs to a preset contact group, and/or that the first message contains a second keyword, the terminal determines to play the first voice.
- that is, the terminal can also filter the messages. In this way, the user can select specific messages for voice playback as needed, and too many messages being played by voice and disturbing the user can be avoided, which helps improve the user experience.
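- The filtering condition above can be sketched as a single predicate. This is illustrative only: the dict-based message representation, the function name, and the set arguments are assumptions for the sketch.

```python
# Illustrative filter: the asking voice is played only when at least one of the
# three conditions from the application holds (preset application, preset
# contact group, or second keyword contained in the message text).
def should_ask_to_play(message: dict,
                       preset_apps: set,
                       preset_contacts: set,
                       second_keywords: set) -> bool:
    return (message["app"] in preset_apps
            or message["sender"] in preset_contacts
            or any(k in message["text"] for k in second_keywords))
```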
- before the terminal plays the first voice, the method further includes: the terminal receives a third message at the same time as it receives the first message; and the terminal determines, according to a preset priority order, that the priority of the first message is higher than the priority of the third message.
- when the terminal receives multiple messages at the same time, it can determine the playback order of the messages according to a preset priority order, which helps meet the diverse needs of users and improve the user experience.
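- The preset priority order above might be represented as a simple ranking table. This is an illustrative sketch; the application names and rank values are hypothetical, not taken from the application.

```python
# Illustrative priority table: a lower rank plays first. Unknown applications
# fall to the end of the order.
APP_PRIORITY = {"instant_messaging": 0, "sms": 1, "email": 2}

def playback_order(messages: list) -> list:
    """Sort simultaneously received messages by the preset priority order."""
    return sorted(messages,
                  key=lambda m: APP_PRIORITY.get(m["app"], len(APP_PRIORITY)))
```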
- the method further includes: the terminal displays prompt information for prompting the user that the terminal has updated the first keyword.
- the method further includes: if the terminal does not detect the user's voice within a preset time period, or does not detect a user voice that matches the first keyword, the terminal determines not to play the first message by voice.
- the method further includes: if the number of user voices that do not match the first keyword detected by the terminal within a preset time period is greater than a second threshold, the terminal determines not to play the first message by voice.
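- The two give-up conditions above (no matching voice within the preset period, and more non-matching voices than the second threshold) can be combined into one decision function. The timeout and threshold values below are assumptions for illustration.

```python
# Illustrative decision for abandoning voice playback. The default timeout and
# "second threshold" values are assumptions, not values from the application.
def decide_no_playback(elapsed_s: float,
                       replies: list,
                       keywords: set,
                       timeout_s: float = 10.0,
                       second_threshold: int = 2) -> bool:
    """True when the terminal should determine not to play the message by voice."""
    # Condition 1: no matching user voice within the preset time period.
    if elapsed_s > timeout_s and not any(r in keywords for r in replies):
        return True
    # Condition 2: too many detected voices that match no keyword.
    mismatches = sum(1 for r in replies if r not in keywords)
    return mismatches > second_threshold
```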
- the first message is a message of an instant messaging application.
- a method for playing a message provided by an embodiment of the present application is applicable to a terminal.
- the method includes: the terminal receives a first message, where the first message is text information; in response to receiving the first message, the terminal plays a first voice, where the first voice is used to ask the user whether to play the first message by voice; the terminal detects the user's second voice; the terminal converts the second voice into a first text; if the first text does not match a first keyword, the terminal continues to detect the user's voice, where the first keyword includes positive keywords and negative keywords; when the terminal detects the user's third voice, the terminal converts the third voice into a second text; if the second text matches a positive keyword, the terminal plays the first message by voice and records the number of times of the first text; and if the number of times of the first text is greater than the first threshold, the terminal adds the first text to the positive keywords.
- if the second text matches a negative keyword, the terminal determines not to play the first message by voice and records the number of times of the first text; if the number of times of the first text is greater than the first threshold, the terminal adds the first text to the negative keywords.
- the technical solution provided in the embodiments of the present application can learn the user's non-preset answer and determine whether the answer is affirmative or negative, that is, whether the user wishes to have the message played.
- this improves the accuracy of the commands executed by the terminal and the success rate of voice playback of messages, making the terminal more intelligent and helping to improve the user experience.
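- The second aspect's handling of both positive and negative keywords can be sketched as a three-way classifier that learns unrecognized replies into whichever set the user's eventual known answer belongs to. All seed keywords, the threshold, and the return labels are assumptions for illustration.

```python
# Illustrative sketch for the second aspect: the first keyword includes both
# positive and negative keywords, and non-preset replies can be learned into
# either set once heard more often than the threshold.
POSITIVE = {"yes", "ok"}
NEGATIVE = {"no", "not now"}
FIRST_THRESHOLD = 3

pending: list = []   # unrecognized replies for the current message
counts: dict = {}    # occurrence counts of unrecognized replies

def classify_reply(text: str) -> str:
    """Return 'play', 'skip', or 'unknown' for one spoken reply."""
    if text in POSITIVE or text in NEGATIVE:
        target = POSITIVE if text in POSITIVE else NEGATIVE
        for earlier in pending:  # earlier unrecognized replies inherit the answer
            counts[earlier] = counts.get(earlier, 0) + 1
            if counts[earlier] > FIRST_THRESHOLD:
                target.add(earlier)
        pending.clear()
        return "play" if text in POSITIVE else "skip"
    pending.append(text)         # keep detecting the user's voice
    return "unknown"
```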
- the method further includes: the terminal receives a second message, where the second message is text information; in response to receiving the second message, the terminal plays a fourth voice, where the fourth voice is used to ask the user whether to play the second message by voice; the terminal detects the user's fifth voice; the terminal converts the fifth voice into a third text; if the third text matches an added positive keyword, the terminal plays the second message by voice; and if the third text matches an added negative keyword, the terminal determines not to play the second message by voice.
- before the terminal plays the first voice, the method further includes: if the terminal determines that the first message belongs to a preset application, and/or that a sender of the first message belongs to a preset contact group, and/or that the first message contains a second keyword, the terminal determines to play the first voice.
- before the terminal plays the first voice, the method further includes: the terminal receives a third message at the same time as it receives the first message; and the terminal determines, according to a preset priority order, that the priority of the first message is higher than the priority of the third message.
- the method further includes: the terminal displays prompt information for prompting the user that the first keyword has been updated.
- the method further includes: if the terminal does not detect the user's voice within a preset time period, or does not detect a user voice that matches the first keyword, the terminal determines not to play the first message by voice.
- the method further includes: if the number of user voices that do not match the first keyword detected by the terminal within a preset time period is greater than a second threshold, the terminal determines not to play the first message by voice.
- the first message is a message of an instant messaging application.
- in a third aspect, a terminal includes a processor, a memory, and a touch screen, where the memory and the touch screen are coupled to the processor, and the memory is used to store computer program code that includes computer instructions; when the processor executes the computer instructions, the terminal performs the method described in the first aspect and any possible implementation manner thereof.
- in a fourth aspect, a terminal includes a processor, a memory, and a touch screen, where the memory and the touch screen are coupled to the processor, and the memory is used to store computer program code that includes computer instructions; when the processor executes the computer instructions, the terminal performs the method described in the second aspect and any possible implementation manner thereof.
- in a fifth aspect, a computer storage medium includes computer instructions, and when the computer instructions are run on a terminal, the terminal is caused to execute the method as described in the first aspect and any possible implementation manner thereof.
- in a sixth aspect, a computer storage medium includes computer instructions, and when the computer instructions are run on a terminal, the terminal is caused to execute the method as described in the second aspect and any possible implementation manner thereof.
- in a seventh aspect, a computer program product, when run on a computer, causes the computer to perform the method described in the first aspect and any one of its possible implementations.
- in an eighth aspect, a computer program product, when run on a computer, causes the computer to perform the method described in the second aspect and any one of its possible implementations.
- FIG. 1 is a first schematic structural diagram of a terminal according to an embodiment of the present application.
- FIG. 2 is a second schematic structural diagram of a terminal according to an embodiment of the present application.
- FIG. 3 is a first schematic flowchart of a method for playing a message according to an embodiment of the present application.
- FIG. 4 is a second schematic flowchart of a method for playing a message according to an embodiment of the present application.
- FIG. 5 is a third flowchart of a method for playing a message according to an embodiment of the present application.
- FIG. 6 is a fourth flowchart of a method for playing a message according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of some terminal interfaces provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of still other terminal interfaces according to an embodiment of the present application.
- first and second are used for descriptive purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, the features defined as “first” and “second” may explicitly or implicitly include one or more of the features. In the description of the embodiments of the present application, unless otherwise stated, the meaning of "a plurality" is two or more.
- the terminal in this application may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, or the like; this application does not specifically limit the specific form of the terminal.
- FIG. 1 is an example of a structural block diagram of a terminal 100 according to an embodiment of the present application.
- the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a radio frequency module 150, a communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a SIM card interface 195.
- the sensor module 180 can include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
- the structure illustrated in the embodiment of the present application does not constitute a limitation on the terminal 100; the terminal may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components.
- the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
- the processor 110 may include one or more processing units.
- the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
- different processing units can be independent devices or integrated in the same processor.
- the controller may be a decision maker that instructs each component of the terminal 100 to coordinate its work according to instructions; it is the nerve center and command center of the terminal 100.
- the controller generates operation control signals according to the instruction operation code and timing signals, and completes the control of instruction fetching and instruction execution.
- the application processor is configured to obtain the user's voice and convert the obtained voice into text; the text may be matched against pre-stored keywords, and the number of times the text occurs is recorded. When the number of times the text occurs reaches a preset number, the text is added to the corresponding keywords, and so on.
- the application processor may also be used to obtain a text message sent by another terminal or server to the terminal through a radio frequency module or a communication module, and convert the received text message into a voice, and the like.
- the processor 110 may further include a memory for storing instructions and data.
- the memory in the processor is a cache memory, which can hold instructions or data that the processor has just used or cycled through. If the processor needs to use the instruction or data again, it can be fetched directly from this memory, avoiding repeated accesses and reducing processor waiting time, thus improving system efficiency.
- the terminal may store keywords preset by the user in the memory in the processor 110, such as keywords for positive answers and / or keywords for negative answers, and the like.
- the terminal may also store the content of the recorded voice command and the number of times of the voice command in the memory.
- the terminal may also store the data in the internal memory 121 or the external memory, which is not specifically limited in the embodiments of the present application.
- the processor 110 may include an interface.
- the interface may include an inter-integrated circuit (I2C) interface, an inter-IC sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
- the I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
- the processor may include multiple sets of I2C buses.
- the processor can be coupled to touch sensors, chargers, flashes, cameras, etc. through different I2C bus interfaces.
- the processor may couple the touch sensor through the I2C interface, so that the processor and the touch sensor communicate through the I2C bus interface to implement the touch function of the terminal 100.
- the I2S interface can be used for audio communication.
- the processor may include multiple sets of I2S buses.
- the processor can be coupled with the audio module through the I2S bus to achieve communication between the processor and the audio module.
- the audio module can pass audio signals to the communication module through the I2S interface, so as to implement the function of receiving calls through a Bluetooth headset.
- the PCM interface can also be used for audio communications, sampling, quantizing, and encoding analog signals.
- the audio module and the communication module may be coupled through a PCM bus interface.
- the audio module may also pass audio signals to the communication module through the PCM interface, so as to implement the function of receiving calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication, and the sampling rates of the two interfaces are different.
- the UART interface is a universal serial data bus used for asynchronous communication; it is a two-way communication bus that converts the data to be transmitted between serial form and parallel form.
- a UART interface is typically used to connect the processor and the communication module 160.
- the processor communicates with the Bluetooth module through a UART interface to implement the Bluetooth function.
- the audio module can transmit audio signals to the communication module through the UART interface to implement the function of playing music through a Bluetooth headset.
- the terminal may implement any one or more of the I2S interface, the PCM interface, and the UART interface to implement voice playback of a message, and transfer the recorded user voice to a processor.
- the MIPI interface can be used to connect processors with peripheral devices such as displays and cameras.
- the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like.
- the processor and the camera communicate through a CSI interface to implement a shooting function of the terminal 100.
- the processor and the display screen communicate through a DSI interface to implement a display function of the terminal 100.
- the terminal may display, through a MIPI interface, interface diagrams involved in the process of the terminal performing voice playback, for example, a user's setting interface and the like.
- the GPIO interface can be configured by software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface may be used to connect the processor with a camera, a display screen, a communication module, an audio module, a sensor, and the like.
- GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
- the USB interface 130 may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
- the USB interface can be used to connect a charger to charge the terminal 100, and can also be used to transfer data between the terminal 100 and a peripheral device. It can also be used to connect headphones and play audio through headphones. It can also be used to connect other electronic devices, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiments of the present invention is only a schematic description, and does not constitute a limitation on the structure of the terminal 100.
- the terminal 100 may use different interface connection modes or a combination of multiple interface connection modes in the embodiments of the present invention.
- the charging management module 140 is configured to receive a charging input from a charger.
- the charger may be a wireless charger or a wired charger.
- the power management module 141 is used to connect the battery 142, the charge management module 140 and the processor 110.
- the power management module receives input from the battery and/or the charging management module and supplies power to the processor, the internal memory, the external memory, the display screen, the camera, the communication module, and the like.
- the wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the radio frequency module 150, the communication module 160, the modem processor, and the baseband processor.
- the antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals.
- Each antenna in the terminal 100 can be used to cover a single or multiple communication frequency bands.
- the radio frequency module 150 can provide a communication processing module, applied to the terminal 100, that includes wireless communication solutions for 2G/3G/4G/5G. It may include at least one filter, switch, power amplifier, low-noise amplifier (LNA), and the like.
- the modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
- the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
- the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
- the application processor outputs sound signals through audio devices (not limited to the speaker, the receiver, and the like), or displays images or videos through the display screen. The communication module 160 can provide communication processing modules, applied to the terminal 100, for wireless communication solutions including wireless local area network (WLAN), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
- the communication module 160 may be one or more devices that integrate at least one communication processing module.
- the communication module receives electromagnetic waves through the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor.
- the communication module 160 may also receive a signal to be transmitted from the processor, frequency-modulate and amplify it, and radiate it as electromagnetic waves through the antenna 2.
- a user's voice can be recorded through a microphone in a Bluetooth headset (or a Bluetooth speaker, etc.), and the recorded voice is transmitted to the processor 110 via the Bluetooth communication processing module and the audio module 170.
- the terminal may also play the voice through the audio module 170 and the Bluetooth communication processing module through a Bluetooth headset (or a Bluetooth speaker, etc.).
- the antenna 1 of the terminal 100 is coupled to the radio frequency module, and the antenna 2 is coupled to the communication module, so that the terminal 100 can communicate with networks and other devices through wireless communication technologies.
- the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology.
- the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or the satellite-based augmentation system (SBAS).
- the terminal may receive messages sent by other terminals, such as a short message, through the antenna 1 and the radio frequency module.
- the terminal can also receive messages sent by other terminals through the antenna 2 and the communication module, such as WeChat messages and QQ messages.
- the embodiment of the present application does not specifically limit the message.
- the terminal 100 implements a display function through a GPU, a display screen 194, and an application processor.
- the GPU is a microprocessor for image processing, which connects the display and the application processor.
- the GPU is used to perform mathematical and geometric calculations for graphics rendering.
- the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
- the display screen 194 is used to display images, videos, and the like.
- the display includes a display panel.
- the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc.
- the terminal 100 may include one or N display screens, where N is a positive integer greater than 1.
- the terminal 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen, and an application processor.
- the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to realize the expansion of the storage capacity of the terminal 100.
- the external memory card communicates with the processor through an external memory interface to implement a data storage function. For example, save music, videos and other files on an external memory card.
- the internal memory 121 may be used to store computer executable program code, where the executable program code includes instructions.
- the processor 110 executes various functional applications and data processing of the terminal 100 by executing instructions stored in the internal memory 121.
- the memory 121 may include a program storage area and a data storage area.
- the storage program area may store an operating system, at least one application required by a function (such as a sound playback function, an image playback function, etc.) and the like.
- the storage data area may store data (such as audio data, phone book, etc.) created during the use of the terminal 100.
- the memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, other volatile solid-state storage devices, a universal flash storage (UFS), etc.
- the terminal 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.
- the audio module 170 is used to convert digital audio information into an analog audio signal and output, and is also used to convert an analog audio input into a digital audio signal.
- the audio module can also be used to encode and decode audio signals.
- the audio module may be disposed in the processor 110, or some functional modules of the audio module may be disposed in the processor 110.
- the speaker 170A, also called a "horn", is used to convert an audio electrical signal into a sound signal.
- the terminal 100 can listen to music through a speaker, or listen to a hands-free call.
- the receiver 170B, also referred to as a "handset", is used to convert an audio electrical signal into a sound signal.
- when the terminal 100 answers a call or plays a voice message, the receiver 170B can be placed close to the human ear to listen to the voice.
- the microphone 170C, also called a "mike", is used to convert sound signals into electrical signals.
- the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone.
- the terminal 100 may be provided with at least one microphone.
- the terminal 100 may be provided with two microphones, which can implement a noise reduction function in addition to collecting sound signals.
- the terminal 100 may further be provided with three, four, or more microphones to collect sound signals, reduce noise, identify sound sources, and implement a directional recording function.
- the headset interface 170D is used to connect a wired headset.
- the headphone interface may be a USB interface, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the terminal may record a user voice through the microphone 170C, and transmit the recorded voice to the processor 110 through the audio module 170. After the processor 110 converts the received text message into a voice, the terminal may also pass the voice through the audio module 170 and play the voice through a speaker. In other embodiments of the present application, the terminal may record user voice through a microphone in a wired headset, and transmit the recorded voice to the processor 110 via the headset interface 170D and the audio module 170. After the processor 110 converts the received text message into a voice, the terminal may also pass the voice through the audio module 170 and the headphone interface 170D, and the voice is played by the wired headset.
- the touch sensor 180K is also called a "touch panel". It may be disposed on the display screen 194 and is used to detect a touch operation on or near it. The detected touch operation can be passed to the application processor to determine the type of the touch event, and a corresponding visual output is provided through the display screen.
- the keys 190 include a power-on key, a volume key, and the like.
- the keys may be mechanical keys or touch keys.
- the terminal 100 receives a key input and generates a key signal input related to user settings and function control of the terminal 100.
- the motor 191 may generate a vibration prompt.
- the motor can be used for incoming vibration alert and touch vibration feedback.
- the touch operation applied to different applications can correspond to different vibration feedback effects.
- Touch operations on different areas of the display can also correspond to different vibration feedback effects.
- Different application scenarios (for example, time reminder, receiving information, alarm clock, and games) may also correspond to different vibration feedback effects.
- Touch vibration feedback effect can also support customization.
- the indicator 192 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, and so on.
- the SIM card interface 195 is used to connect to a subscriber identity module (SIM).
- the SIM card can be brought into contact with or separated from the terminal 100 by being inserted into or removed from the SIM card interface 195.
- the terminal 100 may support one or N SIM card interfaces, and N is a positive integer greater than 1.
- the SIM card interface can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple SIM cards can be inserted into the same SIM card interface at the same time. The types of the multiple cards may be the same or different.
- the SIM card interface can also be compatible with different types of SIM cards.
- the SIM card interface is also compatible with external memory cards.
- the terminal 100 interacts with the network through a SIM card to implement functions such as calling and data communication.
- the terminal 100 uses an eSIM, that is, an embedded SIM card.
- the eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.
- the software system of the terminal 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture.
- the Android system with a layered architecture is taken as an example to exemplify the software structure of the terminal 100.
- the layered architecture divides the software into several layers, each of which has a clear role and division of labor. Layers communicate with each other through interfaces.
- the Android system is divided into four layers, which are an application layer, an application framework layer, an Android runtime and a system library, and a kernel layer from top to bottom.
- the application layer can include a series of application packages.
- the application package can include camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, SMS, WeChat, QQ, settings and other applications.
- the application package involved mainly includes instant messaging applications, including but not limited to applications such as short message, WeChat, and QQ.
- it also relates to a setting application, which provides a user with an interface for setting a voice playback message.
- the set content includes, but is not limited to, a preset application, a preset contact, a preset contact group, a preset second keyword, and a playback priority.
- the application framework layer provides an application programming interface (API) and a programming framework for applications at the application layer.
- the application framework layer includes some predefined functions.
- the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and so on.
- the window manager is used to manage window programs.
- the window manager can obtain the display size, determine whether there is a status bar, lock the screen, take a screenshot, etc.
- Content providers are used to store and retrieve data and make it accessible to applications.
- the data may include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, and so on.
- the view system includes visual controls, such as controls that display text, and controls that display pictures.
- the view system can be used to build applications.
- the display interface can consist of one or more views.
- the display interface that includes the SMS notification icon can include a view that displays text and a view that displays pictures.
- the phone manager is used to provide a communication function of the terminal 100. For example, management of call status (including connection, hang up, etc.).
- the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
- the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages that can disappear automatically after a short stay without user interaction.
- the notification manager is used to inform the download completion, message reminder, etc.
- the notification manager can also be a notification that appears in the status bar at the top of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
- for example, text information is prompted in the status bar, a prompt tone is emitted, the terminal vibrates, or the indicator light flashes.
- the application framework layer may further include a voice playback system, which provides a service for voice playback of the instant message.
- the voice playback system may be an independent module in the application framework layer, or the voice playback system may call other modules in the application framework layer to jointly complete the voice playback function of the instant message; this is not specifically limited in the embodiment of the present application.
- Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
- the core library contains two parts: one is the functions that the Java language needs to call, and the other is the core library of Android.
- the application layer and the application framework layer run in a virtual machine.
- the virtual machine executes the java files of the application layer and the application framework layer as binary files.
- Virtual machines are used to perform object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
- the system library can include multiple functional modules, for example, a surface manager, a media library, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
- the surface manager is used to manage the display subsystem, and provides a fusion of 2D and 3D layers for multiple applications.
- the media library supports a variety of commonly used audio and video formats for playback and recording, as well as still image files.
- the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
- OpenGL ES is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
- SGL is a drawing engine for 2D drawing.
- the kernel layer is the layer between hardware and software.
- the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
- the following uses a message received by the short message application as an example to describe the software and hardware workflow of the terminal 100.
- when a short message application in the application layer receives a message, it can call the display driver of the kernel layer and display prompt information of the message on the touch screen of the hardware layer to prompt the user to view the message. Then, after the user taps the control corresponding to the prompt information of the message on the touch screen of the hardware layer, the touch screen can be triggered to report the touch event (such as the position of the touch point and the time) generated by the user's touch action to the kernel layer through the corresponding driver.
- the kernel layer encapsulates the touch event and calls the corresponding API to distribute the touch event to the short message application. Then, the terminal opens the short message application and displays an interface for viewing the message. This allows users to view the contents of the message.
- the voice playback system can play a voice asking the user whether to play the message by calling an audio driver of the kernel layer and an audio output device (for example, a Bluetooth headset, a speaker, etc.). Then, the audio input device (for example: Bluetooth headset, microphone, etc.) records the user's voice, and then reports the recorded user's voice to the kernel layer through the corresponding driver.
- the kernel layer encapsulates the event and calls the corresponding API to the framework layer.
- the speech playback system distributes the event. Then, the voice playback system determines whether the message is played by voice according to the event.
- the voice playback system can convert the reported user's voice into text, and match the converted text with pre-stored keywords (affirmative reply keywords and / or negative reply keywords). If the keywords of the positive answer are matched, it is determined that the message is played by voice. Then, the voice playback system converts the message into a voice message, and calls an audio driver in the kernel layer to play the voice message through an audio output device. If a negatively answered keyword is matched, it is determined that the message is not played by voice. In this way, users can process messages when it is not convenient to manually operate the terminal.
- the voice playback system may also record the number of times of the converted text. When the number of times of the converted text reaches a predetermined number, the text may also be added to the keywords, thereby achieving the effect of learning the user's voice.
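The matching-and-learning process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation; all names (`VoicePlaybackSystem`, `LEARN_THRESHOLD`, the keyword strings) are hypothetical.

```python
# Hypothetical sketch of the voice playback system's keyword matching and
# learning counter; LEARN_THRESHOLD stands in for the "predetermined number".
LEARN_THRESHOLD = 3  # assumed preset number of occurrences before learning

class VoicePlaybackSystem:
    def __init__(self, positive_keywords, negative_keywords):
        self.positive = set(positive_keywords)  # affirmative reply keywords
        self.negative = set(negative_keywords)  # negative reply keywords
        self.counts = {}  # unmatched converted text -> occurrence count

    def decide(self, text):
        """Return True to play by voice, False to skip, None if unrecognized."""
        if text in self.positive:
            return True
        if text in self.negative:
            return False
        return None

    def record_unmatched(self, text):
        """Count an unmatched text; learn it as an affirmative reply at threshold."""
        self.counts[text] = self.counts.get(text, 0) + 1
        if self.counts[text] >= LEARN_THRESHOLD:
            self.positive.add(text)

vps = VoicePlaybackSystem(["yes", "play"], ["no"])
assert vps.decide("play") is True       # set affirmative reply
assert vps.decide("please say") is None  # unrecognized, learn further
```

The counter mirrors the "number of times of the converted text" described above; a real system would persist these counts across sessions.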
- the voice playback system may also call a display driver of the kernel layer to display the interfaces involved in the embodiments of the present application through a touch screen, for example, the interface diagrams shown in FIG. 7 and FIG. 8.
- the embodiment of the present application provides a method for playing an instant message by voice. Further, considering that the user may forget the preset voice command or the language habits of the user, the user's voice command is not a preset voice command of the terminal, so the terminal cannot recognize the user's intention and cannot perform the operation desired by the user. For this reason, in the technical solution provided in the embodiment of the present application, the user's voice command can be learned, the true meaning of the user's voice command can be automatically recognized, the use efficiency of the terminal is improved, and the user experience is improved.
- the terminal asks the user whether to play the newly received message by voice.
- FIG. 3 it is a schematic flowchart of a method for playing a voice message according to an embodiment of the present application, and specifically includes:
- the terminal receives a first message.
- the terminal receives a first message sent by another terminal or a server.
- the first message may be a message of an instant messaging application, for example, a message of a short message application, a message of a WeChat application, a message of a QQ application, and the like.
- the terminal voice inquires whether to play the first message.
- after the terminal newly receives the first message, the terminal displays the prompt information of the first message on the terminal interface.
- the terminal may ask the user whether to play the first message before, or at the same time as, or after the prompt information of the first message is displayed.
- the terminal can perform voice playback through audio devices such as a speaker, a wired headset, a wireless headset, a Bluetooth speaker, and a Bluetooth vehicle-mounted device, which are not specifically limited in this embodiment of the present application.
- an interface 801 displayed by the terminal may display a status bar 802, a message prompt box 803, a pattern 804, and a time widget.
- the status bar 802 may include the name of the operator (for example, China Mobile), time, WiFi icon, signal strength, and current remaining power.
- the interface 801 is an interface diagram of the terminal voice asking the user whether to play the first message.
- the terminal may dynamically display the pattern 804, or change the color and grayscale of the pattern 804 to remind the user that the terminal is playing a voice.
- the terminal may also display text information to prompt the terminal to ask the user whether to play the first message. It should be noted that the embodiment of the present application does not limit the manner of prompting the terminal.
- the terminal detects the first voice of the user.
- the terminal records the first voice of the user through the audio input device, and sends the recorded first voice of the user to the terminal's application processor for processing.
- an interface 805 displayed by the terminal may display a status bar 802, a message prompt box 803, a pattern 806, and a time widget.
- the interface 805 is an interface where the terminal detects a user's voice.
- the terminal may dynamically display the pattern 806, or change the color or grayscale of the pattern 806 to prompt the user terminal to detect the user voice, or to process the detected user voice.
- the terminal may also display text information to prompt the terminal to detect the user's voice and process the detected user's voice. It should be noted that, the embodiment of the present application does not limit the prompting manner when the terminal detects the user voice (or is processing the user voice).
- the terminal converts the first voice into text information and records it as a first command.
- the terminal matches the first command with a first keyword stored in advance by the terminal.
- the first keyword may include a command preset by the terminal, for example, a command for a positive reply, a command for a negative reply, and the like.
- the first keyword may be a terminal default or a user setting.
- the first keyword may also be learned by the terminal. For a specific learning method, refer to the following description.
- the preset first keyword may be an affirmative reply keyword; that is, if the first command matches the first keyword, it can be determined that the first command indicates that the user wants to play the first message by voice. Then, if the first command does not match the first keyword, the method of the embodiment of the present application (as shown in FIG. 3) needs to be used to learn the first command to determine whether the first command is an affirmative reply.
- the preset first keyword may also be a negative reply keyword; that is, if the first command matches the first keyword, it can be determined that the user does not want to play the first message by voice. Then, if the first command does not match the first keyword, the terminal learns the first command to determine whether the first command is a negative reply.
- the preset first keyword may further include both a positive answer keyword and a negative answer keyword. Then, the terminal needs to separately process according to which type of first keyword the first command matches. This situation is explained below. This embodiment of the present application is not limited to this.
- steps S306-S313 take the first keyword as an affirmative answer as an example to describe the process of the terminal learning the first command. If the first command matches the first keyword, it is determined that the first command is an affirmative reply, and the user wants the voice to play the first message. Then, the terminal plays the first message in a voice manner, that is, step S306 is performed. Otherwise, step S307 is performed.
- the terminal voice plays the first message.
- playing the first message by voice specifically includes: playing the content of the first message by voice, and may also play the application name to which the first message belongs, the name of the sender of the first message, and the like.
- the terminal may convert the text information of the first message into a voice message, and then play the voice message.
- the terminal may convert the text information of the first message into a voice message before determining that the first command is a positive reply. After determining that the first command is an affirmative answer, the terminal may directly play the voice message. In this way, it is beneficial to reduce the time for the user to wait for the terminal to play the first message, and improve the user experience.
- the terminal may convert the text information of the first message into a voice message after receiving the first message, after receiving the user's first voice, after converting the user's first voice into the first command, or after matching the first command with the preset first keyword.
- the embodiment of the present application does not limit the time for the terminal to convert the text information of the first message into a voice message.
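Converting the message text to speech before the user's affirmative reply, as described above, shortens the wait before playback. The following is an illustrative sketch only; `MessagePlayer` and `synthesize` are hypothetical stand-ins for the terminal's text-to-speech path.

```python
# Illustrative sketch: synthesize the voice message as soon as the message
# arrives, so playback can start immediately after an affirmative reply.
class MessagePlayer:
    def __init__(self, synthesize):
        self.synthesize = synthesize  # stand-in for a text-to-speech engine
        self.cache = {}               # message id -> prepared voice message

    def on_message(self, msg_id, text):
        # Convert to a voice message on arrival instead of waiting for the reply.
        self.cache[msg_id] = self.synthesize(text)

    def on_affirmative_reply(self, msg_id):
        # The voice message is already prepared, so it can be played at once.
        return self.cache.pop(msg_id)

# Fake synthesizer for demonstration: returns placeholder audio bytes.
player = MessagePlayer(lambda text: b"PCM:" + text.encode())
player.on_message(1, "hello")
assert player.on_affirmative_reply(1) == b"PCM:hello"
```

This is one way to realize the "reduce the time for the user to wait" benefit; converting after the reply is matched is equally valid under the embodiment.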
- an interface 807 displayed by the terminal may display a status bar 802, a message prompt box 803, a pattern 804, and a time widget.
- the interface 807 is an interface where the terminal is playing the first message.
- the terminal may dynamically display the pattern 804, or change the color and grayscale of the pattern 804 to prompt the user terminal that the first message is being played.
- the terminal may also display prompt information to prompt the user terminal that the first message is being played. It should be noted that the embodiment of the present application does not limit the manner in which the terminal is playing the first message.
- S307 The terminal temporarily does not play the first message, and continues to monitor the user's voice.
- an interface 809 displayed by the terminal may display a status bar 802, a message prompt box 803, a pattern 806, and a time widget.
- the interface 809 is an interface where the terminal does not recognize the user's voice command and continues to monitor the user's voice.
- the terminal may dynamically display the pattern 806, or change the color and grayscale of the pattern 806 to prompt the user that the terminal does not recognize the user's voice command.
- the terminal may also display text information to prompt the user that the terminal does not recognize the user's voice command and continue to monitor the user's voice.
- the terminal may also use a voice mode to prompt the user that the terminal does not recognize the user's instruction, and then continue to monitor the user's voice. It should be noted that the embodiment of the present application does not limit the specific prompt form of the terminal.
- S308 The terminal detects the second voice of the user.
- S309 The terminal converts the detected second voice into text information, and records the text information as a second command.
- step S310 The terminal matches the second command with a first keyword preset by the terminal. If the second command does not match the preset first keyword, step S311 is performed. If the second command matches the preset first keyword, step S312 is performed.
- S311 The terminal does not play the first message temporarily, and the terminal continues to monitor the user's voice.
- the terminal may also end the process; that is, the terminal determines by default that the user does not want to play the first message by voice.
- an interface 808 displayed by the terminal may display a status bar 802, a message prompt box 803, a pattern 804, and a time widget.
- the interface 808 is an interface that the terminal determines not to play the first message.
- the terminal may change the color, gray scale, etc. of the pattern 804 to prompt the user terminal not to play the first message.
- the terminal may also display text information to prompt the user terminal not to play the first message. It should be noted that the embodiment of the present application does not limit the manner in which the terminal prompts the user that the terminal does not play the first message.
- S312 The terminal plays the first message in a voice manner. In addition, the terminal records the content of the first command and the number of times of the first command.
- the first command does not match the preset first keyword, which may be due to the following reasons:
- the user wants to play the first message, but may forget the content of the preset positive reply, so the first command of the first voice conversion is different from the preset positive reply.
- the first voice of the user is not a response to a terminal inquiry.
- the first voice may be a conversation between the user and another person. After the terminal detects the first voice, it mistakes the first voice for the user's first command.
- the terminal needs to record the content of the first command and the number of times the user uses the first command.
- when the recorded number of times of the first command reaches a preset number, the terminal automatically adds the first command to the first keywords.
- the terminal learns that the first command of the user is an affirmative reply.
- an interface 810 displayed by the terminal may display a status bar 802, a message prompt box 803, a pattern 806, and a time widget.
- the interface 810 is an interface where the terminal has successfully learned the first command.
- the terminal may change the color, grayscale, etc. of the pattern 806 to prompt the user that the terminal has learned the first command.
- the terminal may also display text information to prompt the user that the terminal has successfully learned the first command, or add the first command to a keyword of a positive answer. It should be noted that the embodiment of the present application does not limit the manner of prompting the terminal.
- the terminal may display the prompt information of successfully learning the first command after playing the message, or may display the prompt information of successfully learning the first command before playing the message, or may not display the prompt information of successful learning. This embodiment of the present application does not limit this.
- for example, the terminal asks by voice whether to play the message, detects the user's voice, and converts the user's voice into a first command (that is, the content of the user's voice at this time is the first voice). The first command is then matched against the first keywords together with the learned results, and if the matching is successful, the message is played by voice.
- suppose the set affirmative replies are "Yes" and "Play". When the user answers "Please say" for the first time, the terminal matches "Please say" with the set affirmative replies. Since "Please say" is not a set affirmative reply, the terminal temporarily does not play the message by voice, and the terminal continues to monitor the user's response. The user responds with "Play" the second time, and the terminal matches "Play" with the set affirmative replies. Since "Play" is a set affirmative reply, the terminal plays the message by voice. In addition, the terminal records "Please say" once.
- after that, each time the terminal asks the user whether to play a message by voice, if the user still answers "Please say" first and then responds with an affirmative reply set by the terminal, the terminal records "Please say" once more. After the terminal has recorded "Please say" a preset number of times, the terminal learns "Please say" as an affirmative reply and can set "Please say" as an affirmative reply. After that, when the terminal receives the user's "Please say" again and matches "Please say" with the set affirmative replies, it can be determined that "Please say" is an affirmative reply, and the terminal plays the message by voice.
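The "Please say" scenario above can be walked through with a small standalone sketch. The keyword strings and the threshold of three recordings are illustrative assumptions; the embodiment leaves the preset number open.

```python
def learn_replies(dialogs, positives, threshold=3):
    """Simulate the learning scenario described above.

    dialogs: one reply sequence per incoming message; a sequence ends once a
    reply matches a set affirmative reply. Returns the updated affirmative set.
    """
    positives = set(positives)
    counts = {}
    for replies in dialogs:
        unmatched = []
        for reply in replies:
            if reply in positives:
                # Message is played by voice; record each earlier unmatched
                # reply once, and learn it when the threshold is reached.
                for text in unmatched:
                    counts[text] = counts.get(text, 0) + 1
                    if counts[text] >= threshold:
                        positives.add(text)
                break
            unmatched.append(reply)
    return positives

# The user answers "please say" before a set reply three times;
# afterwards "please say" itself becomes an affirmative reply.
learned = learn_replies(
    [["please say", "play"], ["please say", "yes"], ["please say", "play"]],
    ["yes", "play"],
)
assert "please say" in learned
```

Once learned, a later "please say" matches immediately and the message is played without a second round of monitoring.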
- the technical solution provided in the embodiments of the present application can learn the user's non-preset answers to determine the user's true intention, that is, whether the user wishes the message to be played. This improves the accuracy of the commands executed by the terminal and the success rate of voice playback, makes the terminal more intelligent, and helps improve the user's experience of using the terminal.
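The learning behavior described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the keyword set, the promotion threshold, and all function names are assumptions made for the sketch. A reply that is not yet a recognized affirmative keyword (for example "please say") is counted each time it precedes a recognized affirmative reply, and is promoted to an affirmative keyword once its count reaches the preset number of times.

```python
# Hypothetical sketch of the keyword-learning loop; names and values are illustrative.
POSITIVE_KEYWORDS = {"yes", "play"}
PROMOTE_THRESHOLD = 2  # assumed "preset number of times"

reply_counts = {}  # occurrence count for each not-yet-recognized reply text

def handle_reply(text, previous_text=None):
    """Return True if the message should be played by voice."""
    if text in POSITIVE_KEYWORDS:
        # The user eventually answered with a set affirmative keyword; credit
        # the earlier unrecognized reply as a candidate affirmative keyword.
        if previous_text is not None and previous_text not in POSITIVE_KEYWORDS:
            reply_counts[previous_text] = reply_counts.get(previous_text, 0) + 1
            if reply_counts[previous_text] >= PROMOTE_THRESHOLD:
                POSITIVE_KEYWORDS.add(previous_text)  # learned as affirmative
        return True
    return False

# First conversation: "please say" is not recognized, then "play" is.
handle_reply("please say")            # → False: message not played yet
handle_reply("play", "please say")    # → True: played; "please say" counted once
# Second conversation: same pattern; the count reaches 2 and promotes the phrase.
handle_reply("please say")
handle_reply("yes", "please say")
# Now "please say" alone is accepted as an affirmative reply:
assert handle_reply("please say") is True
```

After the promotion, the terminal no longer needs the follow-up "Play"; the learned phrase alone triggers voice playback, which matches the scenario described above.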
- FIG. 4 is a schematic flowchart of another method for playing a voice message according to an embodiment of the present application, which specifically includes steps S101-S113, as follows:
- the terminal newly receives a message.
- the terminal voice asks the user whether to play the message.
- n is used to mark the number of times that the terminal detects the user's voice command from this step to the end of the process.
- time is used for timing.
- the terminal may also initialize time in step S101 and start timing from step S101, which is not limited in this embodiment of the present application.
- m is used to mark the number of records of text learning of the voice during the process of the terminal learning the user's voice command.
- the terminal detects the user voice, and records the number of times n that the user voice is detected.
- the initial value of n is 0.
- each time the terminal detects the user's voice, it updates the value of n by adding 1; for example, this can be implemented with the statement "n = n + 1".
- the user voice detected this time is recorded as the user voice detected for the nth time.
- the terminal converts the detected voice for the nth time into text.
- step S105 The terminal matches the converted text with a set first keyword. If they do not match, step S106 is performed, and if they match, step S108 is performed.
- the first keyword set here is a positive answer. That is, if the converted text matches the first keyword of the positive answer, it is considered that the user wants the message played by voice. Otherwise, the terminal needs to learn further whether the user's voice indicates that the message should be played.
- the terminal temporarily does not play messages by voice.
- step S107 The terminal determines whether the time has reached a preset time. If the timing reaches the preset time, the process ends. If the timing does not reach the preset time, step S103 is continued.
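The listening-and-timeout loop of steps S103-S107 can be sketched as follows. This is a minimal sketch, not the patent's implementation: the voice-capture callback `get_user_text`, the keyword set, and the timeout value are all assumptions; the clock is injectable only so the loop can be tested.

```python
import time

def wait_for_affirmative(get_user_text, positive_keywords,
                         preset_seconds=10.0, clock=time.monotonic):
    """Loop of steps S103-S107: keep listening until an affirmative keyword
    is heard or the timer (started when the terminal asked) expires."""
    start = clock()
    n = 0  # number of detected user voices, as initialized in step S102
    while clock() - start < preset_seconds:   # S107: has timing reached the preset time?
        text = get_user_text()                # S103/S104: detect voice, convert to text
        if text is None:
            continue                          # nothing detected yet; keep listening
        n += 1
        if text in positive_keywords:         # S105: match against the first keyword
            return True, n                    # S108: play the message by voice
        # S106: do not play yet, and loop back to S103
    return False, n                           # timed out: process ends without playing
```

A caller would pass a microphone-plus-speech-recognition callback as `get_user_text`; here any function returning text (or `None` when nothing was heard) works.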
- S108 The terminal plays a voice message.
- step S109 The terminal determines whether n is greater than or equal to 2. If it is greater than or equal to 2, step S110 is performed; otherwise, the process ends.
- if n is not greater than or equal to 2, it means that the first time the user issued a voice command after the message was received, the user responded with a preset first keyword of a positive answer. The terminal then recognized the user's voice command and played the message by voice. Therefore, there is no need to learn the user's previous voice command, that is, the following learning process does not need to be performed, and the process ends.
- if n is greater than or equal to 2, it means that after the message was received, the user responded with a preset first keyword of a positive answer only on the second or a later voice command. That is to say, the user issued at least one earlier voice command that did not use a preset first keyword of a positive answer, so the terminal needs to learn from that at least one voice command to determine whether it was a positive answer; that is, step S110 and the subsequent steps are performed.
- step S110 The terminal performs semantic analysis on the text of the voice detected last time (that is, the n-1th time) to determine whether the text of the voice detected last time is a positive answer. If the text of the last detected voice is an affirmative answer, step S111 is performed. Otherwise, the process ends.
- in this step, semantic analysis can determine whether the text of the last detected voice is a positive answer, a negative answer, or neither, which gives the terminal more basis for determining whether that text is an affirmative reply.
- the terminal may perform semantic analysis on the text of the voice detected before this time (from the first time to the n-1th time). This is because the voices detected from the 1st to the n-1th time may be different expressions of the message that the user wants the voice to play. In this way, it is beneficial to improve the ability and efficiency of terminal learning.
- the terminal may also directly execute step S111 without executing step S110, which is not specifically limited in the embodiments of the present application.
- the terminal may separately record the text of all voices detected before this time (from the 1st time to the n-1th time).
- the text of each detected voice corresponds to a value of m; each time the same text is recorded, its corresponding m is increased by 1.
- step S112 The terminal determines whether m is a predetermined number of times. If m is a predetermined number of times, step S113 is performed. Otherwise, the process ends.
- the terminal records the text of the voice detected from the first time to the n-1th time.
- the terminal may determine, for the text of each detected voice, whether its corresponding m value has reached the predetermined number of times.
- the terminal adds the text of the voice detected last time (that is, the n-1th time) to the first keyword.
- the terminal records the text of the voice detected from the first time to the n-1th time.
- the terminal may set, as the first keyword, the text of any detected voice whose m value has reached the predetermined number of times.
- the first keyword set includes a positive response and a negative response as an example.
- step S201 The terminal determines whether the text of the voice detected for the nth time matches a set keyword. If it is determined that the text of the detected voice for the nth time matches neither the first keyword of the positive reply nor the first keyword of the negative reply, step S202 is performed. If it is determined that the text of the detected voice for the nth time matches the first keyword of the positive answer, step S204 is performed. If it is determined that the text of the detected voice for the nth time matches the first keyword of the negative reply, step S210 is performed.
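The three-way dispatch of step S201 can be sketched as a small function. This is a minimal illustration under assumed names; the real terminal would match recognized speech text rather than exact strings.

```python
# Hypothetical dispatch of step S201: match text against both keyword sets.
def classify_reply(text, positive_keywords, negative_keywords):
    if text in positive_keywords:
        return "play"         # S204: play the message by voice
    if text in negative_keywords:
        return "do_not_play"  # S210: do not play; learn toward negative replies
    return "keep_listening"   # S202/S203: wait for the next user voice or timeout

assert classify_reply("play", {"yes", "play"}, {"no"}) == "play"
assert classify_reply("no", {"yes", "play"}, {"no"}) == "do_not_play"
assert classify_reply("please say", {"yes", "play"}, {"no"}) == "keep_listening"
```

The "keep_listening" branch is where the learning described in the surrounding steps (recording the unmatched text and its count m) takes place.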
- the terminal temporarily does not play messages by voice.
- step S203 The terminal determines whether the time has reached a preset time. If the timing reaches the preset time, the process ends. If the timing does not reach the preset time, step S103 is continued.
- the terminal plays a voice message.
- step S205 The terminal determines whether n is greater than or equal to 2. If it is greater than or equal to 2, step S206 is performed; otherwise, the process ends.
- step S206 The terminal performs semantic analysis on the text of the voice detected at the n-1th time to determine whether the text of the voice detected at the n-1th time is a positive answer. If the text of the detected voice of the n-1th time is a positive answer, step S207 is performed. Otherwise, the process ends.
- step S208 The terminal determines whether m is a predetermined number of times. If m is a predetermined number of times, step S209 is performed. Otherwise, the process ends.
- the terminal adds the text of the voice detected at the n-1th time to the first keyword of the positive reply.
- steps S204-S209 reference may be made to steps S108-S113, and details are not repeated.
- S210 The terminal does not play messages by voice.
- step S211 The terminal determines whether n is greater than or equal to 2. If it is greater than or equal to 2, step S212 is performed; otherwise, the process ends.
- step S212 The terminal performs semantic analysis on the text of the voice detected at the n-1th time to determine whether the text of the voice detected at the n-1th time is a positive answer. If the text of the voice detected at the n-1th time is a positive answer, step S213 is performed. Otherwise, the process ends.
- in this step, semantic analysis can determine whether the text of the voice detected at the n-1th time is a positive answer, a negative answer, or neither, which gives the terminal more basis for determining whether that text is a negative reply.
- the terminal may perform semantic analysis on the text of the voices detected before this time (from the 1st time to the n-1th time). This is because the voices detected from the 1st to the n-1th time may be different expressions indicating that the user does not want the message played by voice. In this way, it is beneficial to improve the ability and efficiency of terminal learning.
- the terminal may also directly execute S213 without performing this step, which is not specifically limited in the embodiments of the present application.
- the terminal may separately record the text of the voice detected before this time (from the first time to the n-1th time).
- the text of each detected speech corresponds to a value of m
- the text of each detected speech corresponds to m plus 1.
- step S214 The terminal determines whether m is a predetermined number of times. If m is a predetermined number of times, step S215 is performed. Otherwise, the process ends.
- the terminal separately records the text of the voice detected from the first time to the n-1th time.
- the terminal may determine, for the text of each detected voice, whether its corresponding m value has reached the predetermined number of times.
- the terminal adds the text of the voice detected at the n-1th time to the first keyword of the negative response.
- the terminal separately records the text of the voice detected from the first time to the n-1th time.
- the terminal may set, as the first keyword, the text of any detected voice whose m value has reached the predetermined number of times. This process ends.
- with the voice playback method provided by the embodiments of the present application, when the user's voice command is not a preset command, the terminal can learn the user's voice command, thereby identifying the user's intention and performing the corresponding operation. This makes the interaction between the user and the terminal more personalized and intelligent, which helps improve the efficiency of using the terminal and the user experience.
- the terminal automatically plays the newly received message through voice.
- a schematic flowchart of a method for playing a voice message includes steps S501a-S505, as follows:
- the terminal displays a setting interface.
- the terminal detects a user's setting operation.
- the terminal sets a function of automatically playing a voice message.
- a user may set a message automatically played by the terminal.
- the terminal can set an application that automatically plays a message (that is, a preset application), and then when the terminal receives a message of the preset application, the terminal can automatically play the message.
- the terminal may also set a contact (that is, a preset contact) corresponding to the message that is automatically played, or a group of contacts (that is, a preset contact group) that corresponds to the message that is automatically played. Then, when the terminal receives a message sent by a preset contact or a preset contact group, the terminal can automatically play the message by voice.
- the terminal may also set a second keyword (a preset second keyword) included in the message content that is automatically played, and then when the message received by the terminal includes a preset second keyword, the terminal automatically plays the message.
- the terminal can also set the type of the message to be played automatically (such as chat messages in WeChat, messages from friends circles, and system messages, etc.), the time period of the message to be played automatically, and the location of the message to be played automatically.
- these settings are not described in detail one by one in this embodiment of the present application.
- the terminal may also set a priority for playing a message.
- the user may determine the playing priority of the message according to the frequency of use of each application, the importance of each contact or group of contacts, and the specific setting content of the second keyword. For example, if a user uses WeChat more frequently, then the priority of WeChat can be set higher than the priority of SMS. Another example: Set the priority of star contacts in WeChat to be higher than that of ordinary contacts. For another example: if the second keyword is set to "urgent", the priority of the message containing the second keyword can be set to the highest. This embodiment of the present application does not specifically limit this.
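The priority examples above can be sketched as a scoring function. The concrete priority values, the star-contact bonus, and the "urgent" second keyword are assumptions for illustration; the text only says the user may rank messages by application, contact importance, and the second keyword.

```python
# Illustrative priority scoring; all numeric values are assumed, not from the text.
APP_PRIORITY = {"WeChat": 2, "SMS": 1}   # user uses WeChat more frequently
STAR_CONTACT_BONUS = 10                  # starred contacts rank above ordinary ones
URGENT_KEYWORD = "urgent"                # preset second keyword

def message_priority(app, starred, content):
    # Messages containing the preset second keyword rank highest of all.
    if URGENT_KEYWORD in content.lower():
        return float("inf")
    return APP_PRIORITY.get(app, 0) + (STAR_CONTACT_BONUS if starred else 0)

msgs = [
    ("SMS", False, "hi"),
    ("WeChat", True, "meeting at 9"),
    ("WeChat", False, "Urgent: call me"),
]
ordered = sorted(msgs, key=lambda m: message_priority(*m), reverse=True)
# The urgent message plays first, then the starred WeChat contact, then SMS.
```

With simultaneous messages, the terminal would play them in `ordered` order.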
- the interface 701 shown in (1) in FIG. 7 can be used to set an application that can automatically play messages.
- the interface 701 may include a status bar 712, a plurality of controls 702, a plurality of controls 703, and a control 704.
- the status bar 712 may include the name of the operator (for example, China Mobile), time, WiFi icon, signal strength, current remaining power, and the like.
- the control 702 can be used to delete an application that automatically plays a message
- the control 703 can be used to add an application that automatically plays a message
- the control 704 can be used to further set the selected application.
- when the terminal receives an operation of the user clicking the control 704 corresponding to "WeChat", the terminal displays an interface 705 as shown in (2) in FIG. 7, for example.
- the interface 705 can be used to set contacts in "WeChat".
- Contact settings include group settings and settings for specific contacts.
- the group setting is taken as an example for description.
- the control 706 can be used to enable a function of automatically playing a message of a group in a contact. That is, the terminal can set a group for automatically playing messages.
- the control 707 is used to further set the selected group. For example, when the terminal receives a user operation on the control 707 corresponding to a group, the terminal displays an interface 708 shown in (3) of FIG. 7.
- a user can select a group for automatically playing a message, and perform specific settings for the selected group.
- when the terminal receives a user's operation on the control 709 corresponding to "family", the terminal displays an interface 710 as shown in (4) of FIG. 7, for example.
- the control 711 can be used to receive keywords input by a user.
- the keyword function may also be independent of any application or contact; that is, the terminal may be set to automatically play a message whenever its content contains certain keywords, regardless of which application the message belongs to or which contact sent it. This embodiment of the present application does not specifically limit this.
- the terminal receives a fourth message.
- step S503 The terminal determines whether the fourth message belongs to a preset application. If it belongs to a preset application, step S505 is performed, and if it does not belong to a preset application, step S504 is performed.
- the terminal plays a prompt tone to prompt the user to receive a fourth message.
- step S505 The terminal determines whether the fourth message is sent by a preset contact. If yes, step S506 is performed. Otherwise, step S504 is performed.
- the terminal determines whether the content of the fourth message includes a second keyword. If yes, go to step S507. If not, go to step S504.
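The judgment chain of steps S503-S507 can be sketched as follows. This is a minimal sketch: the message fields and the particular order of the checks are assumptions, and, as the surrounding text notes, the order and content of the judgments are not limited by the embodiments.

```python
# Hypothetical auto-play filter for steps S503-S507; field names are illustrative.
def should_auto_play(msg, preset_apps, preset_contacts, second_keywords):
    if msg["app"] not in preset_apps:          # S503: preset application?
        return False                           # S504: only play a prompt tone
    if msg["sender"] not in preset_contacts:   # S505: preset contact?
        return False
    if not any(k in msg["content"] for k in second_keywords):  # S506: second keyword?
        return False
    return True                                # S507: play the message by voice

msg = {"app": "WeChat", "sender": "Mom", "content": "dinner is urgent"}
should_auto_play(msg, {"WeChat"}, {"Mom"}, {"urgent"})   # → True
```

A `False` result corresponds to step S504, where the terminal merely plays a prompt tone instead of reading the message aloud.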
- the order of the judgment steps (for example, steps S503-S506) and the specific judgment content in each step are not limited in the embodiments of the present application.
- the specific judgment process and judgment content of the terminal are related to the user's specific settings in step S501. This embodiment of the present application does not specifically limit this.
- the terminal voice plays the fourth message.
- the terminal may also ask the user whether to play the fourth message by voice, and the inquiry process may refer to related content in Application Scenario 1, which is not repeated here.
- the terminal may also receive the user's setting of whether to enable the voice playback function. If the user enables the voice playback function, the terminal is permitted to play messages by voice and, once the conditions are met, can play a message by voice. If the user does not enable the voice playback function, the terminal is not permitted to play messages by voice and cannot do so.
- the terminal can also receive the user's voice command for replying to the message.
- the user's voice command may use the third keyword as a voice prefix to identify the user's voice as the user's reply to the message.
- the third keyword can be, for example, "please reply".
- the terminal receives the fourth voice of the user.
- the terminal converts the fourth voice into text information.
- if the terminal determines, according to the text information, that the fourth voice starts with the third keyword, the terminal determines that the fourth voice is the user's reply to the first message or the fourth message.
- the terminal returns the voice information after the third keyword in the fourth voice to the contact who sent the message.
- the terminal may also convert the voice information after the third keyword in the fourth voice into text information, and reply to the contact who sent the message.
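The prefix-based reply detection described above can be sketched in a few lines. This is an illustration under assumptions: the third keyword, the function name, and the plain string matching all stand in for the terminal's speech recognition.

```python
# Hypothetical reply parsing: a voice starting with the third keyword is a reply.
THIRD_KEYWORDS = ("please reply",)  # assumed third keyword (voice prefix)

def parse_reply(voice_text):
    """If the voice text starts with a third keyword, return the reply content
    that follows it (to be sent back to the contact); otherwise return None."""
    for prefix in THIRD_KEYWORDS:
        if voice_text.lower().startswith(prefix):
            return voice_text[len(prefix):].strip()
    return None

parse_reply("please reply I'm on my way")  # → "I'm on my way"
parse_reply("play the message")            # → None (not a reply)
```

The returned content could then be sent back either as voice or, as the text above notes, converted to text information first.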
- this is not specifically limited in the embodiments of the present application.
- the terminal may convert the fourth voice into text information after receiving the fourth voice.
- the terminal may convert part of the received fourth voice into text information.
- the terminal may learn the third keyword, and the learning method is similar to the method in which the terminal learns the first keyword in application scenario 1. I will not repeat them here.
- the terminal may reply to the sender, according to the user's voice, with certain pictures from the input method (for example, a smiley picture or an angry-face picture).
- the user may preset the correspondence between the fourth keyword and pictures in the input method. For example, "Smile" corresponds to a smiley picture, and "Angry" corresponds to an angry-face picture.
- when the terminal has determined that the fourth voice is the user's reply to the first message or the fourth message, and detects that the user's fourth voice includes a fourth keyword, the terminal replies to the sender with the corresponding picture according to the correspondence between the fourth keyword and the picture. This enriches the diversity of the user's reply messages and improves the user experience.
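The keyword-to-picture lookup can be sketched as a small mapping. The mapping contents and file names are hypothetical; only the correspondence idea ("Smile" → smiley picture) comes from the text above.

```python
# Hypothetical fourth-keyword → picture mapping preset by the user.
KEYWORD_TO_PICTURE = {"smile": "smiley.png", "angry": "angry_face.png"}

def picture_reply(reply_text):
    """Return the picture to send if the user's reply contains a fourth keyword,
    otherwise None (the reply is then sent as voice or text)."""
    for keyword, picture in KEYWORD_TO_PICTURE.items():
        if keyword in reply_text.lower():
            return picture
    return None

picture_reply("Smile")  # → "smiley.png"
picture_reply("ok")     # → None
```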
- when receiving a message, the terminal may also perform natural semantic analysis on the content of the message.
- the terminal can call related application modules or functional modules according to the result of the natural semantic analysis, and provide the user with more relevant information through voice playback. For example, the content of the message is "I'm coming to your place on business tomorrow, how is the weather over there?".
- if the terminal determines through semantic analysis that the message asks about the weather, it can call the weather-related application in the terminal, obtain weather information from it, and play the information to the user.
- the content of the message is "Where to eat today?"
- if the terminal determines through natural language analysis that the message asks about restaurants, the terminal can call the map application to query restaurants near the terminal, or call a restaurant-review application to query information about restaurants the user frequents, and play it to the user. In this way, the interaction between the user and the terminal is more efficient and the user experience is improved.
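The intent-to-module routing described above can be sketched as a dispatcher. This is a minimal sketch under stated assumptions: the intent classifier and the application-module callbacks are hypothetical stand-ins for the terminal's real semantic analysis and app integrations.

```python
# Hypothetical routing of a message's analyzed intent to a related app module.
def answer_for_message(content, intent_of, weather_app, restaurant_app):
    intent = intent_of(content)      # natural semantic analysis (assumed callback)
    if intent == "weather":
        return weather_app()         # e.g. query the weather-related application
    if intent == "restaurant":
        return restaurant_app()      # e.g. query the map or a review application
    return None                      # no extra information to play

reply = answer_for_message(
    "How is the weather over there?",
    lambda text: "weather" if "weather" in text.lower() else "other",
    lambda: "Sunny, 25°C tomorrow",
    lambda: "3 restaurants nearby",
)
assert reply == "Sunny, 25°C tomorrow"
```

The returned string is what the terminal would play to the user by voice alongside the original message.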
Abstract
This application provides a message playing method and a terminal, relates to the field of communication technologies, and helps improve the interaction efficiency between a user and a terminal and the user experience. The method specifically includes: the terminal receives a first message; the terminal asks by voice whether to play the first message; if the user's first voice does not match a keyword of an affirmative reply, the terminal continues to detect the user's voice; if a detected second voice of the user matches a keyword of an affirmative reply, the terminal plays the first message by voice and records the number of occurrences of the text corresponding to the first voice; when the recorded number of occurrences of the text corresponding to the first voice is greater than a first threshold, the terminal adds the text to the keywords of the affirmative reply.
Description
This application relates to the field of communication technologies, and in particular, to a message playing method and a terminal.
With the development of terminal technologies, instant messaging applications on mobile phones (for example, SMS, WeChat, and QQ) have gradually become indispensable communication tools in users' life, work, and study. When a terminal receives a message of an instant messaging application, the user needs to operate the mobile phone manually to view and process the message. In some scenarios where it is inconvenient for the user to operate the mobile phone, for example, when the user is driving, the terminal and the user can interact by voice to determine whether to play the message by voice.
However, during voice interaction between the terminal and the user, the terminal often fails to recognize the user's true intention in the voice. As a result, the terminal cannot play or otherwise process the message according to the user's intention, which affects the user experience.
Summary of the Invention
This application provides a message playing method and a terminal that can learn the user's voice commands, so as to recognize the user's intention and perform a corresponding operation, which helps improve the interaction efficiency between the user and the terminal and the user experience.
According to a first aspect, the method provided in the embodiments of this application may be applied to a terminal, and includes: the terminal receives a first message, where the first message is text information; in response to receiving the first message, the terminal plays a first voice, where the first voice is used to ask the user whether to play the first message by voice; the terminal detects a second voice of the user; the terminal converts the second voice into first text; if the first text does not match a first keyword, the terminal continues to detect the user's voice, where the first keyword is an affirmative keyword; when the terminal detects a third voice of the user, the terminal converts the third voice into second text; if the second text matches the first keyword, the terminal plays the first message by voice and records the number of occurrences of the first text; and if the number of occurrences of the first text is greater than a first threshold, the terminal adds the first text to the first keyword.
It can be seen that the technical solution provided in the embodiments of this application can learn the user's non-preset answers to determine whether an answer is affirmative, that is, whether the user wants the message played. This improves the accuracy of the commands executed by the terminal and the success rate of voice playback, makes the terminal more intelligent, and helps improve the user's experience of using the terminal.
In a possible implementation, the method further includes: the terminal converts the first message into a fourth voice; and that the terminal plays the first message by voice is specifically: the terminal plays the fourth voice.
In some embodiments of this application, the terminal may convert the text information of the first message into a voice message (that is, the fourth voice) after determining that the second text matches the first keyword, and then play the voice message. In other embodiments of this application, the terminal may convert the text information of the first message into the voice message (that is, the fourth voice) before determining that the second text matches the first keyword. After determining that the second text matches the first keyword, the terminal can directly play the voice message. This helps reduce the time the user waits for the terminal to play the first message by voice and improves the user experience. For example, the terminal may convert the text information of the first message into the fourth voice after receiving the first message, after receiving the user's first voice, after converting the user's third voice into the second text, or after matching the second text against the preset first keyword. The embodiments of this application do not limit the time at which the terminal converts the text information of the first message into a voice message.
In a possible implementation, after the terminal adds the first text to the first keyword, the method further includes: the terminal receives a second message, where the second message is text information; in response to receiving the second message, the terminal plays a fifth voice, where the fifth voice is used to ask the user whether to play the second message by voice; the terminal detects a sixth voice of the user; the terminal converts the sixth voice into third text; and if the third text matches the updated first keyword, the terminal plays the second message by voice.
It can be seen that after the terminal successfully learns the user's intention in the first text, when the user uses the voice corresponding to the first text again, the terminal can quickly recognize the user's intention and play the second message by voice. This improves the interaction efficiency between the user and the terminal and the user experience.
In a possible implementation, before the terminal plays the first voice, the method further includes: if the terminal determines that the first message belongs to a preset application, and/or the sender of the first message belongs to a preset contact group, and/or the first message contains a second keyword, the terminal determines to play the first voice.
It can be seen that the terminal can also filter messages before playing them by voice. This helps the user select specific messages for voice playback as required, prevents the user from being disturbed by too many messages being played by voice, and helps improve the user experience.
In a possible implementation, before the terminal plays the first voice, the method further includes: the terminal receives a third message at the same time as the first message; and the terminal determines, according to a preset priority order, that the priority of the first message is higher than that of the third message.
It can be seen that, when receiving multiple messages at the same time, the terminal can determine the playing order of the messages according to a preset priority order, which helps meet the diverse needs of users and improves the user experience.
In a possible implementation, after the terminal adds the first text to the first keyword, the method further includes: the terminal displays prompt information used to indicate that the terminal has updated the first keyword.
In a possible implementation, after the terminal plays the first voice, the method further includes: if the terminal does not detect the user's voice within a preset time period, or does not detect a user voice matching the first keyword within the preset time period, the terminal determines not to play the first message by voice.
In a possible implementation, after the terminal plays the first voice, the method further includes: if, within a preset time period, the number of times the terminal detects a user voice that does not match the first keyword is greater than a second threshold, the terminal determines not to play the first message by voice.
In a possible implementation, the first message is a message of an instant messaging application.
According to a second aspect, an embodiment of this application provides a message playing method applicable to a terminal, including: the terminal receives a first message, where the first message is text information; in response to receiving the first message, the terminal plays a first voice, where the first voice is used to ask the user whether to play the first message by voice; the terminal detects a second voice of the user; the terminal converts the second voice into first text; if the first text does not match a first keyword, the terminal continues to detect the user's voice, where the first keyword includes affirmative keywords and negative keywords; when the terminal detects a third voice of the user, the terminal converts the third voice into second text; if the second text matches an affirmative keyword, the terminal plays the first message by voice and records the number of occurrences of the first text, and if the number of occurrences of the first text is greater than a first threshold, the terminal adds the first text to the affirmative keywords; if the second text matches a negative keyword, the terminal determines not to play the first message by voice and records the number of occurrences of the first text, and if the number of occurrences of the first text is greater than the first threshold, the terminal adds the first text to the negative keywords.
It can be seen that the technical solution provided in the embodiments of this application can learn the user's non-preset answers to determine whether an answer is affirmative or negative, that is, whether the user wants the message played. This improves the accuracy of the commands executed by the terminal and the success rate of voice playback, makes the terminal more intelligent, and helps improve the user's experience of using the terminal.
In a possible implementation, after the terminal adds the first text to the affirmative keywords or the negative keywords, the method further includes: the terminal receives a second message, where the second message is text information; in response to receiving the second message, the terminal plays a fourth voice, where the fourth voice is used to ask the user whether to play the second message by voice; the terminal detects a fifth voice of the user; the terminal converts the fifth voice into third text; if the third text matches the updated affirmative keywords, the terminal plays the second message by voice; and if the third text matches the updated negative keywords, the terminal determines not to play the second message by voice.
In a possible implementation, before the terminal plays the first voice, the method further includes: if the terminal determines that the first message belongs to a preset application, and/or the sender of the first message belongs to a preset contact group, and/or the first message contains a second keyword, the terminal determines to play the first voice.
In a possible implementation, before the terminal plays the first voice, the method further includes: the terminal receives a third message at the same time as the first message; and the terminal determines, according to a preset priority order, that the priority of the first message is higher than that of the third message.
In a possible implementation, after the terminal adds the first text to the affirmative keywords or the negative keywords, the method further includes: the terminal displays prompt information used to prompt the user that the first keyword has been updated.
In a possible implementation, after the terminal plays the first voice, the method further includes: if the terminal does not detect the user's voice within a preset time period, or does not detect a user voice matching the first keyword within the preset time period, the terminal determines not to play the first message by voice.
In a possible implementation, after the terminal plays the first voice, the method further includes: if, within a preset time period, the number of times the terminal detects a user voice that does not match the first keyword is greater than a second threshold, the terminal determines not to play the first message by voice.
In a possible implementation, the first message is a message of an instant messaging application.
According to a third aspect, a terminal includes a processor, a memory, and a touchscreen, where the memory and the touchscreen are coupled to the processor, the memory is configured to store computer program code, and the computer program code includes computer instructions; when the processor reads the computer instructions from the memory, the terminal performs the method according to the first aspect and any one of its possible implementations.
According to a fourth aspect, a terminal includes a processor, a memory, and a touchscreen, where the memory and the touchscreen are coupled to the processor, the memory is configured to store computer program code, and the computer program code includes computer instructions; when the processor reads the computer instructions from the memory, the terminal performs the method according to the second aspect and any one of its possible implementations.
According to a fifth aspect, a computer storage medium includes computer instructions that, when run on a terminal, cause the terminal to perform the method according to the first aspect and any one of its possible implementations.
According to a sixth aspect, a computer storage medium includes computer instructions that, when run on a terminal, cause the terminal to perform the method according to the second aspect and any one of its possible implementations.
According to a seventh aspect, a computer program product, when run on a computer, causes the computer to perform the method according to the first aspect and any one of its possible implementations.
According to an eighth aspect, a computer program product, when run on a computer, causes the computer to perform the method according to the second aspect and any one of its possible implementations.
FIG. 1 is a first schematic structural diagram of a terminal according to an embodiment of this application;
FIG. 2 is a second schematic structural diagram of a terminal according to an embodiment of this application;
FIG. 3 is a first schematic flowchart of a message playing method according to an embodiment of this application;
FIG. 4 is a second schematic flowchart of a message playing method according to an embodiment of this application;
FIG. 5 is a third schematic flowchart of a message playing method according to an embodiment of this application;
FIG. 6 is a fourth schematic flowchart of a message playing method according to an embodiment of this application;
FIG. 7 is a schematic diagram of some terminal interfaces according to an embodiment of this application;
FIG. 8 is a schematic diagram of some other terminal interfaces according to an embodiment of this application.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of this application, unless otherwise specified, "multiple" means two or more.
For example, the terminal in this application may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a smart watch, a netbook, a wearable electronic device, an augmented reality (AR) device, a virtual reality (VR) device, or the like. This application does not particularly limit the specific form of the terminal.
FIG. 1 is an example of a structural block diagram of a terminal 100 according to an embodiment of the present invention.
The terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a radio frequency module 150, a communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a SIM card interface 195, and the like. The sensor module may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The structure illustrated in this embodiment of the present invention does not constitute a limitation on the terminal 100. The terminal may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in the same processor.
The controller may be the decision maker that directs the components of the terminal 100 to work in coordination according to instructions. It is the nerve center and command center of the terminal 100. The controller generates operation control signals according to the instruction operation code and timing signals, and controls instruction fetching and execution.
In some embodiments of this application, the application processor is configured to obtain the user's voice, convert the obtained voice into text, match the converted text against pre-stored keywords, and record the number of occurrences of the text; when the number of occurrences reaches a preset number, the text is added to the corresponding keywords. The application processor may also obtain, through the radio frequency module or the communication module, text messages sent to the terminal by other terminals or servers, and convert the received text messages into voice.
In addition, a memory may be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor is a cache, which can store instructions or data that the processor has just used or uses cyclically. If the processor needs the instructions or data again, they can be called directly from the memory. This avoids repeated access and reduces the waiting time of the processor, thereby improving system efficiency.
In some embodiments of this application, the terminal may store user-preset keywords, such as keywords of affirmative replies and/or keywords of negative replies, in the memory in the processor 110. The terminal may also store the content of recorded voice commands and the number of times of the voice commands in the memory. In other embodiments of this application, the terminal may store such data in the internal memory 121 or an external memory, which is not specifically limited in the embodiments of this application.
In some embodiments, the processor 110 may include interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor may include multiple sets of I2C buses. The processor may be separately coupled to the touch sensor, a charger, a flash, a camera, and the like through different I2C bus interfaces. For example, the processor may be coupled to the touch sensor through the I2C interface, so that the processor and the touch sensor communicate through the I2C bus interface to implement the touch function of the terminal 100.
The I2S interface may be used for audio communication. In some embodiments, the processor may include multiple sets of I2S buses. The processor may be coupled to the audio module through the I2S bus to implement communication between the processor and the audio module. In some embodiments, the audio module may transfer an audio signal to the communication module through the I2S interface, to implement the function of answering calls through a Bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module and the communication module may be coupled through a PCM bus interface. In some embodiments, the audio module may also transfer an audio signal to the communication module through the PCM interface, to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication, and the two interfaces have different sampling rates.
The UART interface is a universal serial data bus used for asynchronous communication. The bus is a bidirectional communication bus that converts the data to be transmitted between serial and parallel form. In some embodiments, the UART interface is typically used to connect the processor and the communication module 160. For example, the processor communicates with the Bluetooth module through the UART interface to implement the Bluetooth function. In some embodiments, the audio module may transfer an audio signal to the communication module through the UART interface, to implement the function of playing music through a Bluetooth headset.
In this embodiment of this application, the terminal may implement voice playback of messages, and pass recorded user voice to the processor, through any one or more of the I2S interface, the PCM interface, and the UART interface.
The MIPI interface may be used to connect the processor to peripheral devices such as the display screen and the camera. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor and the camera communicate through the CSI interface to implement the shooting function of the terminal 100, and the processor and the display screen communicate through the DSI interface to implement the display function of the terminal 100.
In this embodiment of this application, the terminal may display, through the MIPI interface, the interface diagrams involved in the voice playback process, for example, the user's setting interface.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or a data signal. In some embodiments, the GPIO interface may be used to connect the processor to the camera, the display screen, the communication module, the audio module, the sensors, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
The USB interface 130 may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface may be used to connect a charger to charge the terminal 100, to transmit data between the terminal 100 and peripheral devices, to connect a headset and play audio through it, and to connect other electronic devices such as AR devices.
The interface connection relationships between the modules illustrated in this embodiment of the present invention are merely schematic and do not constitute a structural limitation on the terminal 100. The terminal 100 may use interface connection manners different from those in this embodiment of the present invention, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless or wired charger. The power management module 141 is configured to connect the battery 142, the charging management module 140, and the processor 110. The power management module receives input from the battery and/or the charging management module, and supplies power to the processor, the internal memory, the external memory, the display screen, the camera, the communication module, and the like. The wireless communication function of the terminal 100 may be implemented through the antenna 1, the antenna 2, the radio frequency module 150, the communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the terminal 100 may be configured to cover a single or multiple communication frequency bands. The radio frequency module 150 may provide a communication processing module for wireless communication solutions applied to the terminal 100, including 2G/3G/4G/5G, and may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The modem processor may include a modulator and a demodulator. The modulator modulates the low-frequency baseband signal to be transmitted into a medium-high-frequency signal; the demodulator demodulates the received electromagnetic wave signal into a low-frequency baseband signal and then transmits it to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor, which outputs a sound signal through an audio device (not limited to the speaker and the receiver) or displays an image or video through the display screen. The communication module 160 may provide a communication processing module for wireless communication solutions applied to the terminal 100, including wireless local area networks (WLAN), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The communication module 160 may be one or more devices integrating at least one communication processing module. The communication module receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor. The communication module 160 may also receive a signal to be sent from the processor, perform frequency modulation and amplification on it, and convert it into electromagnetic waves for radiation via the antenna 2.
In some embodiments of this application, the user's voice may be recorded by a microphone in a Bluetooth headset (or a Bluetooth speaker, etc.), and the recorded voice is passed to the processor 110 through the Bluetooth communication processing module and the audio module 170. After the processor 110 converts a received text message into voice, the terminal may also pass the voice through the audio module 170 and the Bluetooth communication processing module, so that the Bluetooth headset (or Bluetooth speaker, etc.) plays the voice.
In some embodiments, the antenna 1 of the terminal 100 is coupled to the radio frequency module and the antenna 2 is coupled to the communication module, so that the terminal 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
In this embodiment of this application, the terminal may receive messages sent by other terminals, for example SMS messages, through the antenna 1 and the radio frequency module. The terminal may also receive messages sent by other terminals, for example WeChat and QQ messages, through the antenna 2 and the communication module. The embodiments of this application do not specifically limit the messages.
The terminal 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing that connects the display screen and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen includes a display panel, which may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light emitting diodes (QLED), or the like. In some embodiments, the terminal 100 may include 1 or N display screens, where N is a positive integer greater than 1.
The terminal 100 may implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen, the application processor, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal 100. The external memory card communicates with the processor through the external memory interface to implement a data storage function, for example, to save files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the terminal 100 by running the instructions stored in the internal memory 121. The memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the applications required by at least one function (such as a sound playback function and an image playback function). The data storage area may store data created during use of the terminal 100 (such as audio data and a phone book). In addition, the memory 121 may include a high-speed random access memory, and may also include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, other volatile solid-state storage devices, or a universal flash storage (UFS).
The terminal 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is configured to convert digital audio information into an analog audio signal for output, and also to convert analog audio input into a digital audio signal. The audio module may also be used to encode and decode audio signals. In some embodiments, the audio module may be provided in the processor 110, or some functional modules of the audio module may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", is configured to convert an audio electrical signal into a sound signal. The terminal 100 can play music or hands-free calls through the speaker.
The receiver 170B, also called an "earpiece", is configured to convert an audio electrical signal into a sound signal. When the terminal 100 answers a call or a voice message, the voice can be heard by placing the receiver close to the ear.
The microphone 170C, also called a "mic", is configured to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak close to the microphone to input the sound signal. The terminal 100 may be provided with at least one microphone. In some embodiments, the terminal 100 may be provided with two microphones to implement a noise reduction function in addition to collecting sound signals. In some embodiments, the terminal 100 may be provided with three, four, or more microphones to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and the like.
The headset jack 170D is configured to connect a wired headset. The headset jack may be a USB interface, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
In some embodiments of this application, the terminal may record the user's voice through the microphone 170C and pass the recorded voice to the processor 110 through the audio module 170. After the processor 110 converts a received text message into voice, the terminal may also pass the voice through the audio module 170 and play it through the speaker. In other embodiments of this application, the terminal may record the user's voice through the microphone of a wired headset and pass the recorded voice to the processor 110 through the headset jack 170D and the audio module 170. After the processor 110 converts a received text message into voice, the terminal may also pass the voice through the audio module 170 and the headset jack 170D and play it through the wired headset.
The touch sensor 180K is also called a "touch panel". It may be provided on the display screen and is used to detect touch operations on or near it. A detected touch operation may be passed to the application processor to determine the touch event type, and corresponding visual output is provided through the display screen.
The keys 190 include a power key, volume keys, and the like. The keys may be mechanical keys or touch keys. The terminal 100 receives key input and generates key signal input related to user settings and function control of the terminal 100.
The motor 191 can generate a vibration prompt. The motor may be used for incoming-call vibration prompts or touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects, and touch operations acting on different areas of the display screen may also correspond to different vibration feedback effects. Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.
The indicator 192 may be an indicator light, which may be used to indicate the charging status and battery level changes, and may also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is configured to connect a subscriber identity module (SIM). A SIM card can be inserted into or removed from the SIM card interface to contact or separate from the terminal 100. The terminal 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface can support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface at the same time; the multiple cards may be of the same or different types. The SIM card interface can also be compatible with different types of SIM cards and with external memory cards. The terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal 100 uses an eSIM, that is, an embedded SIM card; the eSIM card can be embedded in the terminal 100 and cannot be separated from it.
终端100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本发明实施例以分层架构的Android系统为例,示例性说明终端100的软件结构。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图2所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息,微信,QQ,设置等应用程序。
在本申请的一些实施例中,涉及到的应用程序包主要包括即时通信类的应用,包括但不限于短信息、微信、QQ等应用程序。在本申请另一些实施例中,还涉及到设置应用,为用户提供对语音播放消息进行设置的界面。设置的内容包括且不限于预设应用、预设联系人、预设联系人群组、预设的第二关键词、以及播放优先级等。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图2所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
其中,窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供终端100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,终端振动,指示灯闪烁等。
在本申请一些实施例中,应用程序框架层还可包括语音播放系统,该语音播放系统提供对即时消息进行语音播放的服务。需要说明的是,语音播放系统可以是应用框架层中独立的一个模块,语音播放系统也可以调用应用程序框架层的其他模块,共同完成即时消息的语音播放功能,本申请实施例不做具体限定。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(OpenGL ES),2D图形引擎(SGL)等。
其中,表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
OpenGL ES用于实现三维图形绘图,图像渲染,合成,和图层处理等。
SGL是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面以处理短信息应用的消息为例,示例性说明终端100软件以及硬件的工作流程。
当应用程序层中的短信息应用接收到一条消息时,可通过调用内核层的显示驱动,通过硬件层的触摸屏中显示一条消息的提示信息,提示用户查看该消息。那么,用户通过硬件层的触摸屏点击该消息的提示信息所对应的控件后,可触发触摸屏通过相应的驱动向内核层上报用户这一触摸动作产生的触摸事件(例如触摸点位置、时间等参数),内核层将该触摸事件封装后调用相应的API向短信息应用分发该触摸事件。而后,终端打开短消息应用,显示查看该消息的界面。这样,用户便可查看消息的内容。
考虑到用户不方便手动操作终端情况,应用程序层中的短信息应用接收到一条消息时,可以调用框架层的语音播放系统。语音播放系统可以通过调用内核层的音频驱动,通过音频输出设备(例如:蓝牙耳机、扬声器等),播放询问用户是否播放该消息的语音。而后,音频输入设备(例如:蓝牙耳机、麦克风等)录制用户的语音,然后将录制的用户的语音通过相应的驱动向内核层上报,内核层将该事件封装后调用相应的API向框架层的语音播放系统分发该事件。而后,语音播放系统根据该事件确定是否语音播放消息。具体的,语音播放系统可将上报的用户语音转化成文本,并将转化后的文本与预先存储的关键词(肯定答复关键词和/或否定答复关键词)进行匹配。若匹配肯定答复的关键词,则确定语音播放该消息。那么,语音播放系统将该消息转化成语音消息,并调用内核层中的音频驱动,通过音频输出设备播放该语音消息。若匹配否定答复的关键词,则确定不语音播放该消息。这样,用户在不方便手动操作终端的时候,也可以处理消息了。
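上述语音播放系统将用户语音转化为文本、并与关键词匹配的决策逻辑,可以用如下示意性的Python代码勾勒(关键词集合与函数名均为假设的示例,并非本申请的实际实现):

```python
# 示意性代码:语音播放系统根据用户答复的文本决定是否语音播放消息。
# POSITIVE_KEYWORDS / NEGATIVE_KEYWORDS 为假设的预存关键词集合。

POSITIVE_KEYWORDS = {"是", "播放"}    # 肯定答复关键词(假设值)
NEGATIVE_KEYWORDS = {"否", "不播放"}  # 否定答复关键词(假设值)

def decide_playback(user_text: str) -> str:
    """将识别出的用户语音文本与预存关键词匹配,返回决策结果。"""
    if user_text in POSITIVE_KEYWORDS:
        return "play"          # 匹配肯定答复:语音播放该消息
    if user_text in NEGATIVE_KEYWORDS:
        return "skip"          # 匹配否定答复:不语音播放该消息
    return "listen_again"      # 均不匹配:暂不播放,继续监听用户语音
```

例如,对文本"播放"返回播放决策,对未预存的"请说"则继续监听。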
再有,在本申请实施例中,语音播放系统还可以记录转化后文本的次数,当转化后文本的次数达到预定次数后,还可以将该文本添加到关键词中,以达到学习用户语音的效果。此外,语音播放系统还可以调用内核层的显示驱动,通过触摸屏显示本申请实施例中涉及的界面,例如:图7和图8中所示的界面图。
以下实施例中的方法均可以在具有上述硬件结构和软件结构的终端100中实现。
为了使得用户在使用终端时,能在不方便手动操作终端的情况下,也能够及时处理一些重要的紧急的即时消息,本申请实施例提供了一种通过语音播放即时消息的方法。进一步的,考虑到用户可能忘记预先设置的语音命令或者用户的语言习惯等因素,造成用户的语音命令不是终端预先设置的语音命令,从而终端不能识别出用户的意图,不能执行用户希望的操作。为此,在本申请实施例提供的技术方案中,能够对用户的语音命令进行学习,自动识别出用户语音命令的真实意思,提升终端的使用效率,提升用户体验。
下面结合附图和具体的应用场景,对本申请实施例提供的技术方案进行介绍。
第一种应用场景,终端询问用户是否语音播放新接收到的消息。
如图3所示,为本申请实施例提供的一种语音播放消息的方法的流程示意图,具体包括:
S301、终端接收到第一消息。
在本申请的一些实施例中,终端接收其他终端或服务器发送的第一消息。其中,第一消息可以是即时通信类应用的消息,例如:短信应用的消息、微信应用的消息、QQ应用的消息等。
S302、终端语音询问是否播放第一消息。
终端新接收到第一消息后,在终端界面中显示第一消息的提示信息。终端可以在显示第一消息的提示信息之前、或同时、或之后,通过语音的方式询问用户是否播放该第一消息。
在本申请实施例中,终端可以通过扬声器、有线耳机、无线耳机、蓝牙音箱、蓝牙车载设备等音频设备进行语音播放,本申请实施例对此不做具体限定。
示例性的,如图8中(1)所示,为终端显示的一种界面801。其中,界面801中可以显示有状态栏802、消息提示框803、图案804、以及时间Widget等。其中,状态栏802可以包括运营商的名称(例如中国移动)、时间、WiFi图标、信号强度和当前的剩余电量等。界面801为终端语音询问用户是否播放第一消息的界面图。其中,终端可以动态显示图案804,或者改变图案804的颜色、灰度等方式,以提示用户终端正在播放语音。终端也可以显示文本信息,提示终端正在询问用户是否播放第一消息。需要说明的是,本申请实施例对终端的提示方式不做限定。
S303、终端检测到用户的第一语音。
具体的,终端通过音频输入设备录制用户的第一语音,并将录制的用户的第一语音发送到终端的应用处理器中进行处理。
示例性的,如图8中(2)所示,为终端显示的一种界面805。其中,界面805中可以显示有状态栏802、消息提示框803、图案806、以及时间Widget等。界面805为终端检测到用户语音的界面。其中,终端可以动态显示图案806,或者改变图案806的颜色、灰度等方式,以提示用户终端检测到用户语音,或者正在处理检测到的用户语音。终端也可以显示文本信息,以提示用户终端已检测到用户语音,或正在对检测到的用户语音进行处理。需要说明的是,本申请实施例对终端检测到用户语音(或正在处理用户语音)的提示方式不做限定。
S304、终端将第一语音转化为文本信息,记为第一命令。
S305、终端将第一命令与终端预先存储的第一关键词进行匹配。
其中,第一关键词可以包括终端预先设置的命令,例如:肯定答复的命令、否定答复的命令等。第一关键词可以是终端默认的,也可以是用户设置的。第一关键词还可以是终端学习到的,具体学习方法可参考下面的描述。
具体的,在本申请的一些实施例中,预先设置的第一关键词可以为肯定答复的关键词,也就是说,第一命令若与第一关键词匹配,则能确定第一命令为用户希望语音播放第一消息。那么,第一命令若与第一关键词不匹配,则需要采用本申请实施例的方法(如图3所示),对第一命令进行学习,以确定第一命令是否为肯定答复。
在本申请的另一些实施例中,预先设置的第一关键词还可以为否定答复的关键词,也就是说,第一命令若与第一关键词匹配,则能确定第一命令为用户不希望语音播放第一消息。那么,终端对第一命令进行学习,以确定第一命令为否定答复。在本申请的又一些实施例中,预先设置的第一关键词还可以既包括肯定答复的关键词,又包括否定答复的关键词。那么,终端需要根据第一命令与哪类第一关键词匹配,分别处理。这种情况会在下文进行阐述。本申请实施例对此不限定。
以下步骤S306-S313以第一关键词是肯定答复为例,对终端学习第一命令的过程进行说明。若第一命令与第一关键词匹配,则确定第一命令为肯定答复,用户希望语音播放第一消息,于是终端通过语音方式播放第一消息,即执行步骤S306。否则,执行步骤S307。
S306、终端语音播放第一消息。
其中,语音播放第一消息具体包括:语音播放第一消息的内容,也可以播放第一消息所属的应用名称,第一消息的发件人的名称等。
在本申请的一些实施例中,终端可以在确定第一命令为肯定答复后,将第一消息的文本信息转换成语音消息,然后播放该语音消息。
在本申请的另一些实施例中,终端可以在确定第一命令为肯定答复之前,将第一消息的文本信息转换成语音消息。当确定第一命令为肯定答复后,终端可以直接播放该语音消息。这样,有利于减少用户等待终端语音播放第一消息的时间,提升用户体验。例如:终端可以在接收到第一消息后,或者接收到用户的第一语音后,或将用户的第一语音转换成第一命令后,或将第一命令与预先设置的第一关键词进行匹配后,将第一消息的文本信息转换为语音消息。本申请实施例对终端将第一消息的文本信息转化为语音消息的时间不做限定。
示例性的,如图8中(3)所示,为终端显示的一种界面807。其中,界面807中可以显示有状态栏802、消息提示框803、图案804、以及时间Widget等。界面807为终端正在播放第一消息的界面。其中,终端可以动态显示图案804,或者改变图案804的颜色、灰度等方式,以提示用户终端正在播放第一消息。终端也可以显示提示信息,提示用户终端正在播放第一消息。需要说明的是,本申请实施例对终端正在播放第一消息的提示方式不做限定。
S307、终端暂不播放第一消息,继续监听用户的语音。
示例性的,如图8中(5)所示,为终端显示的一种界面809。其中,界面809中可以显示有状态栏802、消息提示框803、图案806、以及时间Widget等。界面809为终端未识别出用户语音命令,继续监听用户语音的界面。其中,终端可以动态显示图案806,或者改变图案806的颜色、灰度等方式,以提示用户终端未识别出用户语音命令。终端也可以显示文本信息,提示用户终端未识别出用户语音命令,继续监听用户语音。终端还可以通过语音的方式,提示用户终端未识别出用户的指示,而后再继续监测用户语音。需要说明的是,本申请实施例对终端具体的提示形式不做限定。
S308、终端检测用户的第二语音。
S309、终端将检测到的第二语音转换成文本信息,记为第二命令。
S310、终端将第二命令与终端预先设置的第一关键词进行匹配。若第二命令与预先设置的第一关键词不匹配,则执行步骤S311。若第二命令与预先设置的第一关键词匹配,则执行步骤S312。
S311、终端暂不播放第一消息,终端继续监听用户的语音。
在本申请的另一些实施例中,若终端在预设的时间段内(例如:30秒,该预设时间段可以是终端默认的,也可以是用户设置的)一直未接收到用户的语音,或者在预设的时间段内一直未接收到与设置的第一关键词相匹配的用户的语音时,终端可以结束本次流程。也就是说,终端默认用户不希望通过语音播放该第一消息。
在本申请的又一些实施例中,若终端在检测到与设置的第一关键词不匹配的用户语音达到预设次数时,终端可以结束本次流程。也就是说,终端默认用户不希望通过语音播放该第一消息。
示例性的,如图8中(4)所示,为终端显示的一种界面808。其中,界面808中可以显示有状态栏802、消息提示框803、图案804、以及时间Widget等。界面808为终端确定不播放第一消息的界面。其中,终端可以改变图案804的颜色、灰度等方式,以提示用户终端不播放第一消息。终端也可以显示文本信息,以提示用户终端不播放第一消息。需要说明的是,本申请实施例对终端提示用户不播放第一消息的方式不做限定。
S312、终端通过语音方式播放第一消息。并且,终端记录第一命令的内容以及第一命令的次数。
需要说明的是,当第一命令与预先设置的第一关键词不匹配时存在两种可能的情况。一种可能是,用户希望播放第一消息的,但可能忘记预先设置的肯定答复的内容,故第一语音转化的第一命令与预先设置的肯定答复不同。另一种可能是,用户的第一语音不是针对终端询问进行的回答。例如:第一语音可能是用户与其他人的对话。终端接收到第一语音后,误认为是用户的第一命令。
此时,终端需要记录第一命令的内容,以及用户使用第一命令的次数。
S313、当终端记录第一命令的次数为预定次数(或者第一命令的次数大于预定次数)时,终端自动将第一命令添加到第一关键词。
若在预定次数(例如:M次,M为大于2的自然数)的场景中,用户都是先使用第一命令回复终端询问是否播放消息,之后再使用终端设置的肯定答复进行回答的,则终端可以认为用户的第一命令为肯定答复,用户是希望终端通过语音播放消息的。于是,终端学习到用户的第一命令是肯定答复。
示例性的,如图8中(6)所示,为终端显示的一种界面810。其中,界面810中可以显示有状态栏802、消息提示框803、图案806、以及时间Widget等。界面810为终端已成功学习第一命令的界面。其中,终端可以改变图案806的颜色、灰度等方式,以提示用户终端已学习第一命令。终端也可以显示文本信息,以提示用户终端已成功学习第一命令,或将第一命令添加到肯定答复的关键词中。需要说明的是,本申请实施例对终端的提示方式不做限定。
需要说明的是,终端可以在播放消息之后,显示已成功学习第一命令的提示信息,也可以在播放消息之前,显示已成功学习第一命令的提示信息,也可以不显示成功学习的提示信息,本申请实施例对此不做限定。
而后,当终端再次接收到消息(例如第三消息)时,终端语音询问是否播放该消息。当终端检测到用户的语音,且用户的语音被转化为第一命令(也就是说此时用户语音与第一语音的内容相同)。终端将第一命令与设置的第一关键词进行匹配。此时,由于终端设置的第一关键词中包含有第一命令(学习的结果),故匹配成功。终端确定用户希望语音播放消息,故终端语音播放第三消息。
例如:用户设置的肯定答复为“是”和“播放”。那么,当用户第一次回答“请说”时,终端将“请说”与设置的肯定答复进行匹配,确定“请说”不是设置的肯定答复,则终端暂不语音播放消息,继续监听用户的回答。用户第二次回答“播放”,终端将“播放”与设置的肯定答复进行匹配,确定“播放”是设置的肯定答复,则终端语音播放消息,并且记录“请说”一次。之后,在终端询问用户是否需要语音播放消息后,若用户仍然先回答“请说”,后再使用终端设置的肯定答复进行回答,则终端记录“请说”两次。一直到终端记录“请说”达到预设次数后,终端学习到“请说”为肯定答复,则终端可以将“请说”设置为肯定答复。再之后,当终端再次接收到用户的“请说”,将“请说”与设置的肯定答复进行匹配时,能够确定“请说”为肯定答复,则终端语音播放消息。
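上述“记录未识别命令的次数、达到预定次数后将其添加为肯定答复关键词”的学习逻辑,可以用如下示意性的Python代码表达(阈值M与数据结构均为假设,仅用于说明步骤S312-S313,并非本申请的实际实现):

```python
# 示意性代码:对未识别的语音命令计数,达到预定次数M后将其学习为肯定答复关键词。
M = 3  # 预定次数(假设值,文中为大于2的自然数)

positive_keywords = {"是", "播放"}  # 预先设置的肯定答复关键词(示例)
pending_counts = {}                 # 记录各未识别命令出现的次数

def record_and_learn(first_command: str) -> bool:
    """记录一次未识别命令;若次数达到M,将其添加为肯定答复关键词并返回True。"""
    pending_counts[first_command] = pending_counts.get(first_command, 0) + 1
    if pending_counts[first_command] >= M:
        positive_keywords.add(first_command)  # 学习成功:下次可直接匹配
        return True
    return False
```

按文中示例,“请说”出现第M次时被学习为肯定答复,此后终端可直接据此语音播放消息。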
由此可见,本申请实施例提供的技术方案能够通过对用户的非预先设置的回答进行学习,从而确定用户的真实意图,用户是否希望播放消息。这样,提升终端执行命令的准确性,提高终端语音播放消息的成功率,使得终端更加智能化,有利于提升用户使用终端的体验。
如图4所示,为本申请实施例提供的另一种语音播放消息的方法的流程示意图,具体包括:S101-S113,如下:
S101、终端新接收到消息。
S102、终端语音询问用户是否播放该消息。
示例性的,在S102的执行前,执行时或者执行后,终端初始化各个参数,n=0,time=0。其中,n用于标记终端从本步骤开始,一直到该流程结束的过程中,检测到用户语音命令的次数。time用于计时。在预设时间段内,若终端一直未检测到用户语音命令,或者用户的语音一直未确定是命令播放消息,则终端结束本流程,默认用户不希望语音播放该消息。在本申请的一些实施例中,终端也可以在步骤S101中初始化time,从步骤S101开始计时,本申请实施例对此不做限定。
需要说明的是,m用于标记终端学习用户语音命令过程中,对语音的文本学习的记录的次数。在本申请的一些实施例中,m值的初始化(m=1),可以是终端在第一次记录需要学习的语音命令的时候。在本申请的另一些实施例中,m值的初始化(m=0),可以是终端第一次开启语音播放消息的功能的时候,也可以是终端第一次开启学习语音命令功能的时候。本申请实施例对此不做限定。
S103、终端检测到用户语音,记录检测到用户语音的次数n。
需要说明的是,本实施例中n的初始值为0,当终端每次在检测到用户语音后,更新n的值,将n加1,具体实现时,可以通过代码“n=n+1”实现。并将本次检测到的用户语音记录为第n次检测到的用户语音。
S104、终端将第n次检测到的语音转化为文本。
S105、终端将转化的文本与设置的第一关键词进行匹配。若不匹配则执行步骤S106,若匹配,则执行步骤S108。
其中,这里设置的第一关键词为肯定答复。也就是说,若转化的文本与肯定答复的第一关键词匹配,则认为用户希望语音播放消息。否则,需进一步学习用户语音是否希望语音播放消息。
S106、终端暂不语音播放消息。
S107、终端判断计时(time)是否达到预设时间。若计时达到预设时间,则流程结束。若计时未达到预设时间,继续执行步骤S103。
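步骤S107中的超时判断可以用如下示意性的Python代码表达(预设时间取文中示例的30秒,函数名为假设,并非本申请的实际实现):

```python
# 示意性代码:步骤S107的超时判断。若自开始监听起超过预设时间
# 仍未得到可识别的命令,则结束本流程,默认用户不希望语音播放消息。
import time

PRESET_SECONDS = 30  # 预设时间段(假设为30秒,可为终端默认或用户设置)

def is_timed_out(start_time: float, now: float = None) -> bool:
    """判断计时是否达到预设时间;now 省略时取当前单调时钟。"""
    now = time.monotonic() if now is None else now
    return now - start_time >= PRESET_SECONDS
```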
S108、终端语音播放消息。
S109、终端判断n是否大于或等于2。若大于或等于2,则执行步骤S110,否则,本流程结束。
若n不是大于或等于2,那么,说明用户在接收到该消息后,第一次发出语音命令时,就使用了预先设置的肯定答复的第一关键词进行答复。那么,终端识别出用户的语音命令,语音播放该消息。故也就不存在要对用户上一次语音命令进行学习,即不用再执行下面的学习过程,流程结束。
若n大于或等于2,那么,说明用户在接收到该消息后,是在第二次或再之后发出语音命令时,才使用了预先设置的肯定答复的第一关键词进行答复的。也就是说,在这之前,用户有至少一次语音命令没有使用预先设置的肯定答复的第一关键词,于是,终端需要对这至少一次的语音命令进行学习,以确定这至少一次的语音命令是否是肯定答复,即执行S110及之后的步骤。
S110、终端对上一次(即第n-1次)检测到的语音的文本进行语义分析,确定上一次检测到的语音的文本是否是肯定答复。若上一次检测到的语音的文本是肯定答复,则执行步骤S111。否则,本流程结束。
在本申请的一些实施例中,本步骤可以通过语义分析对上一次检测到的语音的文本是肯定答复、还是否定答复,或者两者都不是进行确定,能够为终端确定上一次检测到的语音的文本是否是肯定答复提供更多依据。
在本申请的另一些实施例中,终端可以对本次之前(从第1次到第n-1次)检测的语音的文本都进行语义分析。这是由于第1次到第n-1次检测到的语音都有可能是用户希望语音播放消息的不同表达方式。这样,有利于提高终端学习的能力和效率。
在本申请的又一些实施例中,终端也可以不执行步骤S110,直接执行S111,本申请实施例不做具体限定。
S111、终端记录上一次(即第n-1次)检测到语音的文本,m=m+1。
在本申请的另一些实施例中,终端可以对本次之前(从第1次到第n-1次)检测到的所有语音的文本进行分别记录。在这种情况下,每一次检测到的语音的文本对应一个m值,每次检测到的语音的文本对应的m加1。
S112、终端判断m是否为预定次数。若m为预定次数,则执行步骤S113。否则,本流程结束。
在本申请的另一些实施例中,终端记录从第1次到第n-1次检测到的语音的文本。在这种情况下,终端可以针对每一次检测到的语音的文本对应的m值,分别确定每次检测到的语音的文本对应的m值是否为预定次数。
S113、终端将上一次(即第n-1次)检测到的语音的文本添加到第一关键词中。
在本申请的另一些实施例中,终端记录从第1次到第n-1次检测到的语音的文本。在这种情况下,终端可以将对应的m值满足预定次数的那一次或几次检测到的语音的文本设置为第一关键词。
如图5所示,为本申请实施例提供的另一种语音播放消息的方法的流程示意图,该流程中以设置的第一关键词包括肯定答复和否定答复为例进行说明,该流程具体包括步骤S101-S104,以及步骤S201-S215。具体如下:
S201、终端判断第n次检测到的语音的文本是否与设置的关键词匹配。若确定第n次检测到的语音的文本与肯定答复的第一关键词以及否定答复的第一关键词均不匹配,则执行步骤S202。若确定第n次检测到的语音的文本与肯定答复的第一关键词匹配,则执行步骤S204。若确定第n次检测到的语音的文本与否定答复的第一关键词匹配,则执行步骤S210。
S202、终端暂不语音播放消息。
S203、终端判断计时(time)是否达到预设时间。若计时达到预设时间,则流程结束。若计时未达到预设时间,继续执行步骤S103。
S204、终端语音播放消息。
S205、终端判断n是否大于或等于2。若大于或等于2,则执行步骤S206,否则,本流程结束。
S206、终端对第n-1次检测到的语音的文本进行语义分析,确定第n-1次检测到的语音的文本是否是肯定答复。若第n-1次检测到的语音的文本是肯定答复,则执行步骤S207。否则,本流程结束。
S207、终端记录第n-1次检测到语音的文本,记录m=m+1。
S208、终端判断m是否为预定次数。若m为预定次数,则执行步骤S209。否则,本流程结束。
S209、终端将第n-1次检测到的语音的文本添加到肯定答复的第一关键词中。
其中,步骤S204-S209可参考步骤S108-S113,不再重复赘述。
S210、终端不语音播放消息。
S211、终端判断n是否大于或等于2。若大于或等于2,则执行步骤S212,否则,本流程结束。
S212、终端对第n-1次检测到的语音的文本进行语义分析,确定第n-1次检测到的语音的文本是否是否定答复。若第n-1次检测到的语音的文本是否定答复,则执行步骤S213。否则,本流程结束。
在本申请的一些实施例中,本步骤可以通过语义分析对第n-1次检测到的语音的文本是肯定答复、还是否定答复,或者两者都不是进行确定,能够为终端确定第n-1次检测到的语音的文本是否是否定答复提供更多依据。
在本申请的另一些实施例中,终端可以对本次之前(从第1次到第n-1次)检测到的语音的文本都进行语义分析。这是由于第1次到第n-1次检测到的语音都有可能是用户希望不语音播放消息的不同表达方式。这样,有利于提高终端学习的能力和效率。
在本申请的又一些实施例中,终端也可以不执行本步骤,直接执行S213,本申请实施例不做具体限定。
S213、终端记录第n-1次检测到语音的文本,记录m=m+1。
在本申请的另一些实施例中,终端可以分别记录本次之前(从第1次到第n-1次)检测到的语音的文本。在这种情况下,每一次检测到的语音的文本对应一个m值,每次检测到的语音的文本对应的m加1。
S214、终端判断m是否为预定次数。若m为预定次数,则执行步骤S215。否则,本流程结束。
在本申请的另一些实施例中,终端分别记录从第1次到第n-1次检测到的语音的文本。在这种情况下,终端可以针对每一次检测到的语音的文本对应的m值,分别确定每次检测到的语音的文本对应的m值是否为预定次数。
S215、终端将第n-1次检测到的语音的文本添加到否定答复的第一关键词中。
在本申请的另一些实施例中,终端分别记录从第1次到第n-1次检测到的语音的文本。在这种情况下,终端可以将对应的m值满足预定次数的那一次或几次检测到的语音的文本设置为第一关键词。本流程结束。
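图5所示的流程中,依据第n次命中的是肯定还是否定关键词,将第n-1次的文本分别学习到对应的关键词集合中。以下为这一双向学习逻辑的示意性Python代码(阈值与数据结构均为假设,并非本申请的实际实现):

```python
# 示意性代码:按命中类别("positive"肯定 / "negative"否定)分别计数并学习关键词。
PREDEFINED_M = 3  # 预定次数(假设值)

counts = {"positive": {}, "negative": {}}
keywords = {"positive": {"是", "播放"}, "negative": {"否", "不播放"}}

def learn(prev_text: str, matched_kind: str) -> bool:
    """记录第n-1次的文本;达到预定次数后将其添加到对应类别的关键词中。"""
    c = counts[matched_kind]
    c[prev_text] = c.get(prev_text, 0) + 1
    if c[prev_text] >= PREDEFINED_M:
        keywords[matched_kind].add(prev_text)  # 学习为肯定或否定关键词
        return True
    return False
```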
由上可见,本申请实施例提供的一种语音播放的方法,当用户的语音命令不是预先设置的命令时,终端可以对用户的语音命令进行学习,从而识别出用户的意图,执行相应的操作。使得用户与终端的交互更加个性化和智能化,有利于提升终端的使用效率,提升用户体验。
第二种应用场景,终端自动通过语音播放新接收到的消息。
如图6所示,为本申请实施例提供一种语音播放消息的方法的流程示意图,包括步骤S501a-S505,具体如下:
S501a、终端显示设置界面。
S501b、终端检测到用户的设置操作。
S501c、响应于用户的设置操作,终端设置自动语音播放消息的功能。
在本申请的一些实施例中,用户可以对终端自动播放的消息进行设置。例如:终端可以设置自动播放消息的应用(即预设应用),那么终端在接收到预设应用的消息时,可以自动播放该消息。终端也可以设置自动播放的消息对应的联系人(即预设联系人)、或者自动播放的消息对应的联系人的群组(即预设联系人群组)。那么,终端接收到预设联系人或预设联系人群组发送的消息时,可以自动语音播放该消息。终端还可以设置自动播放的消息内容中包含的第二关键词(预设的第二关键词),那么当终端接收到的消息中包含预设的第二关键词时,终端自动播放该消息。类似的,终端还可以设置自动播放消息的类型(例如微信中的聊天消息、朋友圈消息,以及系统消息等),自动播放消息的时间段、自动播放消息的位置等,本申请实施例不一一赘述。
在本申请的另一些实施例中,终端还可以设置消息的播放优先级。具体的,用户可以根据各个应用的使用频繁程度、各个联系人或联系人群组的重要性、第二关键词的具体设置内容来确定消息的播放优先级。例如:若某用户使用微信更为频繁,那么可以将微信的优先级设置高于短信的优先级。又例如:设置微信中星标联系人的优先级高于普通联系人的优先级。再例如:将第二关键词设置为“紧急”,则包含第二关键词的消息优先级可以设置为最高。本申请实施例对此不做具体限定。
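上述播放优先级的比较可以用如下示意性的Python代码表达。其中应用优先级的权重、“紧急”关键词均取自文中示例,具体数值与函数名为假设:

```python
# 示意性代码:按预设优先级在同时到达的消息中选出先播放的一条。
APP_PRIORITY = {"微信": 2, "短信": 1}  # 应用优先级(示例:微信高于短信)
URGENT_KEYWORD = "紧急"                # 预设的第二关键词(示例:优先级最高)

def message_priority(msg: dict) -> tuple:
    """返回可比较的优先级元组:包含"紧急"的消息最高,其次按应用优先级。"""
    has_urgent = 1 if URGENT_KEYWORD in msg.get("content", "") else 0
    return (has_urgent, APP_PRIORITY.get(msg.get("app", ""), 0))

def pick_first(messages):
    """在同时接收到的多条消息中,选出优先级最高的一条。"""
    return max(messages, key=message_priority)
```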
示例性的,如图7所示,为终端的一些界面示意图。其中,图7中(1)所示的界面701,可用于设置可以自动播放消息的应用。界面701可以包括状态栏712、多个控件702、多个控件703、控件704。其中,状态栏712可以包括运营商的名称(例如中国移动)、时间、WiFi图标、信号强度和当前的剩余电量等。控件702可用于删除自动播放消息的应用,控件703可用于增加自动播放消息的应用,控件704可用于对选中的应用进行进一步的设置。例如:终端接收用户对“微信”对应的控件704的操作,例如点击操作,则终端显示如图7中(2)所示的界面705。界面705可用于对“微信”中的联系人进行设置。联系人的设置包括群组设置和具体联系人的设置。这里以群组设置为例进行说明。控件706可用于开启联系人中群组的自动播放消息的功能。也就是说,终端可以设置自动播放消息的群组。控件707用于对选中的群组进行进一步设置。例如:终端接收用户对群组对应的控件707的操作,例如:点击,则终端显示如图7中(3)所示的界面708。该界面708中可供用户选择自动播放消息的群组,以及对选中的群组进行具体设置。例如:终端接收用户对“家人”对应的控件709的操作,例如:点击,则终端显示如图7中(4)所示的界面710。该界面710中可以设置是否启用关键词的功能。开启关键词的功能后,若消息的内容中包含这些关键词,可认为需要自动播放该消息。控件711可用于接收用户输入的关键词。在本申请的另一些示例中,启用关键词的功能也可以不与应用或联系人进行关联,也就是说,终端也可以设置若消息的内容中包含某些关键词,就自动播放该消息的内容,与该消息属于什么应用、该消息是否为某个联系人发送的无关。本申请实施例对此不做具体限定。
S502、终端接收到第四消息。
S503、终端判断第四消息是否属于预设应用。若属于预设应用,则执行步骤S505,若不属于预设应用,则执行步骤S504。
S504、终端播放提示音,用于提示用户收到第四消息。
本流程结束。
S505、终端判断第四消息是否是预设联系人发送的。若是,则执行步骤S506。否则,执行步骤S504。
S506、终端判断第四消息的内容中是否包含第二关键词。若包含,则执行步骤S507。若不包含,则执行步骤S504。
需要说明的是,本申请实施例中并不限定步骤S503-S506的执行顺序,以及各个步骤中具体的判断内容。终端具体判断过程和具体判断的内容,与用户在步骤S501中的具体设置相关。本申请实施例对此不做具体限定。
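图6中自动播放的逐级判断可以用如下示意性的Python代码勾勒(判断顺序与条件组合取决于用户在设置界面中的具体设置,此处按文中顺序示意;各预设集合均为假设的示例值):

```python
# 示意性代码:依次判断消息是否属于预设应用、是否为预设联系人发送、
# 以及内容中是否包含预设的第二关键词,全部满足才自动语音播放。
PRESET_APPS = {"微信"}        # 预设应用(示例)
PRESET_CONTACTS = {"妈妈"}    # 预设联系人(示例)
PRESET_KEYWORDS = {"紧急"}    # 预设的第二关键词(示例)

def should_autoplay(msg: dict) -> bool:
    """返回是否自动语音播放该消息;任一判断不满足则仅播放提示音。"""
    if msg.get("app") not in PRESET_APPS:
        return False  # 不属于预设应用
    if msg.get("sender") not in PRESET_CONTACTS:
        return False  # 不是预设联系人发送的
    return any(k in msg.get("content", "") for k in PRESET_KEYWORDS)
```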
S507、终端语音播放该第四消息。
在本申请一些实施例中,在本步骤之前,终端也可以通过语音询问用户是否播放第四消息,询问过程可参考应用场景一中的相关内容,在此不赘述。
在第一种应用场景和第二种应用场景中,在一些实施例中,为了保证用户信息的隐私性,在终端语音播放消息(例如:第一消息或第四消息)之前,终端还可以接收用户对是否开启语音播放功能的设置。若用户开启语音播放功能,则终端具有通过语音播放消息的权限,在满足条件后可以通过语音播放消息。若用户未开启语音播放功能,则终端不具有语音播放消息的权限,不能进行语音播放消息。
在第一种应用场景和第二种应用场景中,在另一些实施例中,在终端语音播放消息(例如:第一消息或第四消息)之后,还可以接收用户的语音命令,对该消息进行回复。此时,用户的语音命令可以以第三关键词作为语音的前缀,用于标识该用户语音为用户对消息的回复。例如:第三关键词可以是“请回复”、“请答复”等。
举例来说,终端在播放第一消息或第四消息后,接收到用户的第四语音。终端将第四语音转化成文本信息。当根据文本信息确定第四语音是以第三关键词作为开头的,则确定第四语音是用户对第一消息或第四消息的回复。那么,终端将第四语音中第三关键词后的语音信息,回复给发送消息的联系人。可选的,终端也可以将第四语音中第三关键词后的语音信息转化成文本信息,回复给发送消息的联系人。本申请实施例不做具体限定。
需要说明的是,可选的,终端可以在接收第四语音后,将第四语音转化为文本信息。可选的,终端可以在接收第四语音的同时,将接收到的第四语音的部分语音转化为文本信息。当确定第四语音前缀是第三关键词时,确定第四语音是用户的回复,继续执行后续步骤。当确定第四语音前缀不是第三关键词时,确定第四语音不是用户的回复,可停止后续步骤。这样,有利于减轻终端的处理负荷,提升终端的处理能力。
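以第三关键词作为前缀判断并提取回复正文的逻辑,可以用如下示意性的Python代码表达(关键词取文中示例,函数名为假设,并非本申请的实际实现):

```python
# 示意性代码:若语音文本以第三关键词("请回复"/"请答复")开头,
# 则认定其为用户对消息的回复,并提取关键词之后的回复内容。
REPLY_PREFIXES = ("请回复", "请答复")  # 第三关键词(文中示例)

def extract_reply(voice_text: str):
    """以第三关键词开头时返回其后的回复正文,否则返回 None(非回复)。"""
    for prefix in REPLY_PREFIXES:
        if voice_text.startswith(prefix):
            return voice_text[len(prefix):].lstrip(",:, :")  # 去掉可能的分隔符
    return None
```

前缀不匹配时尽早返回 None,对应文中“可停止后续步骤”以减轻处理负荷的做法。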
在本申请的一些实施例中,终端可以对第三关键词进行学习,学习方法与应用场景一中终端学习第一关键词的方法类似。在此,不再赘述。
在本申请的另一些实施例中,终端可以根据用户的语音回复一些图片(例如:输入法中的笑脸图片、生气脸图片等)给发件人。用户可以预先设置第四关键词与输入法中图片的对应关系。例如:“微笑”对应于笑脸图片,“生气”对应于生气脸图片等。这样,当终端已确定第四语音为用户针对第一消息或第四消息的回复,且检测到用户的第四语音中,包含有第四关键词时,终端根据第四关键词与图片的对应关系,回复相应的图片给发件人。这样,丰富了用户回复消息的多样性,提升用户体验。
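上述第四关键词与图片的对应关系可以用一个简单的映射表示意(映射内容取自文中示例,函数名为假设):

```python
# 示意性代码:在已确认为回复的语音文本中查找第四关键词,返回对应的图片。
KEYWORD_TO_IMAGE = {"微笑": "笑脸图片", "生气": "生气脸图片"}  # 用户预设的对应关系(示例)

def pick_reply_image(reply_text: str):
    """命中第四关键词时返回对应图片的标识,未命中返回 None。"""
    for keyword, image in KEYWORD_TO_IMAGE.items():
        if keyword in reply_text:
            return image
    return None
```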
在本申请的又一些实施例中,终端在接收到消息时,也可以对消息的内容进行自然语义分析。终端可以根据自然语义分析的结果,调用相关的应用模块或功能模块,并通过语音播放的方式为用户提供更多相关信息。例如:消息的内容为“我明天去你们那出差,那边天气怎么样啊?”。终端通过语义分析出询问天气,那么可以调用终端中天气相关的应用,从天气相关的应用中获取天气信息,并播放给用户。又例如:消息的内容为“今天去哪里吃饭?”,终端通过自然语言分析出询问餐馆,那么终端可以调用地图查询终端附近的餐馆,或者终端可以调用例如大众点评应用查询用户常去的餐馆等信息,并播放给用户。这样,用户和终端之间的交互更加高效,提升用户体验。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。
Claims (23)
- 一种消息的播放方法,其特征在于,可应用于终端,所述方法包括:所述终端接收第一消息,所述第一消息为文本信息;响应于接收到所述第一消息,所述终端播放第一语音,所述第一语音用于询问用户是否语音播放所述第一消息;所述终端检测到用户的第二语音;所述终端将所述第二语音转化为第一文本;若所述第一文本不匹配第一关键词,所述终端继续检测用户的语音;所述第一关键词为肯定关键词;当所述终端检测到用户的第三语音,所述终端将所述第三语音转化为第二文本;若所述第二文本匹配所述第一关键词,所述终端语音播放所述第一消息,且所述终端记录所述第一文本的次数;若所述第一文本的次数大于第一阈值,则所述终端将所述第一文本添加到所述第一关键词中。
- 根据权利要求1所述的消息的播放方法,其特征在于,所述方法还包括:所述终端将所述第一消息转换为第四语音;所述终端语音播报所述第一消息具体为:所述终端播放所述第四语音。
- 根据权利要求1或2所述的消息的播放方法,其特征在于,在所述终端将所述第一文本添加到所述第一关键词中之后,所述方法还包括:所述终端接收第二消息,所述第二消息为文本信息;响应于接收到所述第二消息,所述终端播放第五语音,所述第五语音用于询问用户是否语音播放所述第二消息;所述终端检测到用户的第六语音;所述终端将所述第六语音转化为所述第三文本;若所述第三文本匹配添加后的第一关键词,所述终端语音播报所述第二消息。
- 根据权利要求1-3任一项所述的消息的播放方法,其特征在于,在所述终端播放第一语音之前,所述方法还包括:若所述终端确定所述第一消息属于预设应用、和/或所述第一消息的发件人属于预设联系人群组、和/或所述第一消息包含第二关键词,所述终端确定播放所述第一语音。
- 根据权利要求1-4任一项所述的消息的播放方法,其特征在于,在所述终端播放第一语音之前,所述方法还包括:所述终端在接收到所述第一消息的同时,还接收到第三消息;所述终端根据预设的优先级顺序,确定所述第一消息的优先级高于所述第三消息的优先级。
- 根据权利要求1-5任一项所述的消息的播放方法,其特征在于,在所述终端将所述第一文本添加到所述第一关键词中之后,所述方法还包括:所述终端显示提示信息,用于提示所述终端已更新所述第一关键词。
- 根据权利要求1所述的消息的播放方法,其特征在于,在所述终端播放第一语音之后,所述方法还包括:若所述终端在预设时间段内一直未检测到用户的语音,或者所述终端在所述预设时间段内一直未检测到与所述第一关键词匹配的用户的语音,则所述终端确定不语音播放所述第一消息。
- 根据权利要求1所述的消息的播放方法,其特征在于,在所述终端播放第一语音之后,所述方法还包括:若所述终端在预设时间段内检测到,与所述第一关键词不匹配的用户的语音的次数大于第二阈值,则所述终端确定不语音播放所述第一消息。
- 根据权利要求1-8任一项所述的消息的播放方法,其特征在于,所述第一消息为即时通信类应用的消息。
- 一种消息的播放方法,其特征在于,可应用于终端,所述方法包括:所述终端接收第一消息,所述第一消息为文本信息;响应于接收到所述第一消息,所述终端播放第一语音,所述第一语音用于询问用户是否语音播放所述第一消息;所述终端检测到用户的第二语音;所述终端将所述第二语音转化为第一文本;若所述第一文本不匹配第一关键词,所述终端继续检测用户的语音;所述第一关键词包括肯定关键词和否定关键词;当所述终端检测到用户的第三语音,所述终端将所述第三语音转化为第二文本;若所述第二文本匹配所述肯定关键词,则所述终端语音播放所述第一消息,且所述终端记录所述第一文本的次数;若所述第一文本的次数大于第一阈值,则所述终端将所述第一文本添加到所述肯定关键词中。若所述第二文本匹配所述否定关键词,则所述终端确定不语音播放所述第一消息,且所述终端记录所述第一文本的次数;若所述第一文本的次数大于第一阈值,则所述终端将所述第一文本添加到所述否定关键词中。
- 根据权利要求10所述的消息的播放方法,其特征在于,在所述终端将所述第一文本添加到所述肯定关键词中或所述否定关键词中之后,所述方法还包括:所述终端接收第二消息,所述第二消息为文本信息;响应于接收到所述第二消息,所述终端播放第四语音,所述第四语音用于询问用户是否语音播放所述第二消息;所述终端检测到用户的第五语音;所述终端将所述第五语音转化为所述第三文本;若所述第三文本匹配添加后的所述肯定关键词,所述终端语音播放所述第二消息;若所述第三文本匹配添加后的所述否定关键词,所述终端确定不语音播报所述第二消息。
- 根据权利要求10或11所述的消息的播放方法,其特征在于,在所述终端播放第一语音之前,所述方法还包括:若所述终端确定所述第一消息属于预设应用、和/或所述第一消息的发件人属于预设联系人群组、和/或所述第一消息包含第二关键词,所述终端确定播放所述第一语音。
- 根据权利要求10-12任一项所述的消息的播放方法,其特征在于,在所述终端播放第一语音之前,所述方法还包括:所述终端在接收到第一消息的同时,还接收到第三消息;所述终端根据预设的优先级顺序,确定所述第一消息的优先级高于所述第三消息的优先级。
- 根据权利要求10-13任一项所述的消息的播放方法,其特征在于,在所述终端将所述第一文本添加到所述肯定关键词或所述否定关键词之后,所述方法还包括:所述终端显示提示信息,用于提示用户所述第一关键词已更新。
- 根据权利要求10所述的消息的播放方法,其特征在于,在所述终端播放第一语音之后,所述方法还包括:若所述终端在预设时间段内一直未检测到用户的语音,或者所述终端在所述预设时间段内一直未检测到与所述第一关键词匹配的用户的语音,则所述终端确定不语音播放所述第一消息。
- 根据权利要求10所述的消息的播放方法,其特征在于,在所述终端播放第一语音之后,所述方法还包括:若所述终端在预设时间段内检测到,与所述第一关键词不匹配的用户的语音的次数大于第二阈值,则所述终端确定不语音播放所述第一消息。
- 根据权利要求10-16任一项所述的消息的播放方法,其特征在于,所述第一消息为即时通信类应用的消息。
- 一种终端,其特征在于,包括:处理器、存储器和触摸屏,所述存储器、所述触摸屏与所述处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述处理器从所述存储器中读取所述计算机指令,以使得所述终端执行如权利要求1-9中任一项所述的消息的播放方法。
- 一种终端,其特征在于,包括:处理器、存储器和触摸屏,所述存储器、所述触摸屏与所述处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,当所述处理器从所述存储器中读取所述计算机指令,以使得所述终端执行如权利要求10-17中任一项所述的消息的播放方法。
- 一种计算机存储介质,其特征在于,包括计算机指令,当所述计算机指令在终端上运行时,使得所述终端执行如权利要求1-9中任一项所述的消息的播放方法。
- 一种计算机存储介质,其特征在于,包括计算机指令,当所述计算机指令在终端上运行时,使得所述终端执行如权利要求10-17中任一项所述的消息的播放方法。
- 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求1-9中任一项所述的消息的播放方法。
- 一种计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求10-17中任一项所述的消息的播放方法。
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201880093445.5A CN112154640B (zh) | 2018-07-04 | 2018-07-04 | 一种消息的播放方法及终端 |
US17/257,547 US11837217B2 (en) | 2018-07-04 | 2018-07-04 | Message playing method and terminal |
EP18925355.2A EP3809671A4 (en) | 2018-07-04 | 2018-07-04 | MESSAGE READING PROCESS AND TERMINAL |
PCT/CN2018/094517 WO2020006711A1 (zh) | 2018-07-04 | 2018-07-04 | 一种消息的播放方法及终端 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/094517 WO2020006711A1 (zh) | 2018-07-04 | 2018-07-04 | 一种消息的播放方法及终端 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020006711A1 true WO2020006711A1 (zh) | 2020-01-09 |
Family
ID=69060593
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/094517 WO2020006711A1 (zh) | 2018-07-04 | 2018-07-04 | 一种消息的播放方法及终端 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11837217B2 (zh) |
EP (1) | EP3809671A4 (zh) |
CN (1) | CN112154640B (zh) |
WO (1) | WO2020006711A1 (zh) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113206779A (zh) * | 2021-03-31 | 2021-08-03 | 广州朗国电子科技有限公司 | 消息播放的优化方法、终端以及存储装置 |
CN113364669B (zh) * | 2021-06-02 | 2023-04-18 | 中国工商银行股份有限公司 | 消息处理方法、装置、电子设备及介质 |
CN115695636B (zh) * | 2021-07-27 | 2024-09-24 | 华为技术有限公司 | 一种智能语音交互的方法及电子设备 |
CN114124860A (zh) * | 2021-11-26 | 2022-03-01 | 中国联合网络通信集团有限公司 | 会话管理方法、装置、设备及存储介质 |
CN114822506A (zh) * | 2022-04-15 | 2022-07-29 | 广州易而达科技股份有限公司 | 一种消息播报方法、装置、移动终端及存储介质 |
US11777882B1 (en) * | 2022-07-06 | 2023-10-03 | ph7, Ltd. | Notification sound processing for push notification |
CN115499397B (zh) * | 2022-09-08 | 2023-11-17 | 亿咖通(湖北)技术有限公司 | 一种信息回复方法、装置、设备及存储介质 |
CN115204127B (zh) * | 2022-09-19 | 2023-01-06 | 深圳市北科瑞声科技股份有限公司 | 基于远程流调的表单填写方法、装置、设备及介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150073805A1 (en) * | 2013-09-12 | 2015-03-12 | At&T Intellectual Property I, L.P. | System and method for distributed voice models across cloud and device for embedded text-to-speech |
CN105245729A (zh) * | 2015-11-02 | 2016-01-13 | 北京奇虎科技有限公司 | 移动终端消息阅读方法和装置 |
CN105825856A (zh) * | 2016-05-16 | 2016-08-03 | 四川长虹电器股份有限公司 | 车载语音识别模块的自主学习方法 |
CN106156022A (zh) * | 2015-03-23 | 2016-11-23 | 联想(北京)有限公司 | 一种信息处理方法及电子设备 |
CN106899946A (zh) * | 2015-12-17 | 2017-06-27 | 北京奇虎科技有限公司 | 消息的语音播放处理方法、装置及系统 |
CN107360320A (zh) * | 2017-06-30 | 2017-11-17 | 维沃移动通信有限公司 | 一种移动终端控制方法及移动终端 |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101378530A (zh) | 2007-08-30 | 2009-03-04 | 乐金电子(中国)研究开发中心有限公司 | 一种短信收发方法、系统及短信服务器 |
CN101800800A (zh) | 2009-02-06 | 2010-08-11 | 沈阳晨讯希姆通科技有限公司 | 实现语音化短信接收的移动终端及其工作方法 |
CN101778154A (zh) | 2009-12-28 | 2010-07-14 | 中兴通讯股份有限公司 | 一种短信语音播报屏蔽的方法和装置 |
US8798995B1 (en) * | 2011-09-23 | 2014-08-05 | Amazon Technologies, Inc. | Key word determinations from voice data |
US9026176B2 (en) * | 2013-05-12 | 2015-05-05 | Shyh-Jye Wang | Message-triggered voice command interface in portable electronic devices |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
CN103929537B (zh) * | 2014-04-03 | 2017-02-15 | 北京深思数盾科技股份有限公司 | 基于不同级别信息的实时提醒方法 |
CN104159206A (zh) | 2014-08-19 | 2014-11-19 | 广州市久邦数码科技有限公司 | 一种可穿戴式设备的短信系统及其短信处理方法 |
CN104991894A (zh) | 2015-05-14 | 2015-10-21 | 深圳市万普拉斯科技有限公司 | 即时聊天信息浏览方法和系统 |
CN106656732A (zh) * | 2015-11-04 | 2017-05-10 | 陈包容 | 一种基于场景信息获取聊天回复内容的方法及装置 |
CN107644640A (zh) | 2016-07-22 | 2018-01-30 | 佛山市顺德区美的电热电器制造有限公司 | 一种信息处理方法及家电设备 |
JP6922920B2 (ja) | 2016-08-26 | 2021-08-18 | ソニーグループ株式会社 | 情報処理装置及び情報処理方法 |
CN106412282B (zh) | 2016-09-26 | 2019-08-20 | 维沃移动通信有限公司 | 一种实时消息语音提示方法及移动终端 |
CN106506804B (zh) | 2016-09-29 | 2020-02-21 | 维沃移动通信有限公司 | 一种通知消息的提醒方法及移动终端 |
CN107220292A (zh) * | 2017-04-25 | 2017-09-29 | 上海庆科信息技术有限公司 | 智能对话装置、反馈式智能语音控制系统及方法 |
CN107452373A (zh) | 2017-07-26 | 2017-12-08 | 上海与德通讯技术有限公司 | 机器人交互方法及系统 |
CN107612814A (zh) | 2017-09-08 | 2018-01-19 | 北京百度网讯科技有限公司 | 用于生成候选回复信息的方法和装置 |
- 2018
- 2018-07-04 US US17/257,547 patent/US11837217B2/en active Active
- 2018-07-04 WO PCT/CN2018/094517 patent/WO2020006711A1/zh unknown
- 2018-07-04 EP EP18925355.2A patent/EP3809671A4/en active Pending
- 2018-07-04 CN CN201880093445.5A patent/CN112154640B/zh active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150073805A1 (en) * | 2013-09-12 | 2015-03-12 | At&T Intellectual Property I, L.P. | System and method for distributed voice models across cloud and device for embedded text-to-speech |
CN106156022A (zh) * | 2015-03-23 | 2016-11-23 | 联想(北京)有限公司 | 一种信息处理方法及电子设备 |
CN105245729A (zh) * | 2015-11-02 | 2016-01-13 | 北京奇虎科技有限公司 | 移动终端消息阅读方法和装置 |
CN106899946A (zh) * | 2015-12-17 | 2017-06-27 | 北京奇虎科技有限公司 | 消息的语音播放处理方法、装置及系统 |
CN105825856A (zh) * | 2016-05-16 | 2016-08-03 | 四川长虹电器股份有限公司 | 车载语音识别模块的自主学习方法 |
CN107360320A (zh) * | 2017-06-30 | 2017-11-17 | 维沃移动通信有限公司 | 一种移动终端控制方法及移动终端 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3809671A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP3809671A4 (en) | 2021-06-02 |
CN112154640B (zh) | 2024-04-30 |
EP3809671A1 (en) | 2021-04-21 |
US11837217B2 (en) | 2023-12-05 |
US20210210068A1 (en) | 2021-07-08 |
CN112154640A (zh) | 2020-12-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020006711A1 (zh) | 一种消息的播放方法及终端 | |
WO2020192456A1 (zh) | 一种语音交互方法及电子设备 | |
WO2020207326A1 (zh) | 一种对话消息的发送方法及电子设备 | |
US20220360942A1 (en) | Bluetooth-based object searching method and electronic device | |
CN113169760A (zh) | 无线短距离音频共享方法及电子设备 | |
CN113225423B (zh) | 一种联系人的推荐方法及电子设备 | |
WO2021000817A1 (zh) | 环境音处理方法及相关装置 | |
EP3923617A1 (en) | Method for reducing power consumption of mobile terminal and mobile terminal | |
CN113170279B (zh) | 基于低功耗蓝牙的通信方法及相关装置 | |
EP4213489A1 (en) | Device recommendation method and electronic device | |
CN113747374B (zh) | 一种消息推送方法及装置 | |
CN113806105A (zh) | 消息处理方法、装置、电子设备和可读存储介质 | |
WO2022037480A1 (zh) | 任务处理方法及相关电子设备 | |
CN111382418A (zh) | 应用程序权限管理方法、装置、存储介质与电子设备 | |
US20240272865A1 (en) | Audio playing method, electronic device, and system | |
WO2021042881A1 (zh) | 消息通知方法及电子设备 | |
US20230224398A1 (en) | Audio output channel switching method and apparatus and electronic device | |
WO2020216144A1 (zh) | 一种添加邮件联系人的方法和电子设备 | |
CN114664306A (zh) | 一种编辑文本的方法、电子设备和系统 | |
CN117118970B (zh) | 文件的下载方法、电子设备及存储介质 | |
CN115883714B (zh) | 消息回复方法及相关设备 | |
CN117041465B (zh) | 一种视频通话的优化方法、电子设备及存储介质 | |
CN114449103B (zh) | 提醒方法、图形用户界面及终端 | |
CN115942253B (zh) | 一种提示方法及相关装置 | |
US20240334531A1 (en) | Electronic device interaction method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18925355 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2018925355 Country of ref document: EP Effective date: 20210112 |