CN111556197B - Method and device for realizing voice assistant and computer storage medium


Info

Publication number: CN111556197B (application CN202010337041.5A)
Authority: CN (China)
Prior art keywords: type, operation task, voice, call, task
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202010337041.5A
Other languages: Chinese (zh)
Other versions: CN111556197A (en)
Inventor: 张浩波
Current Assignee: Beijing Xiaomi Pinecone Electronic Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Beijing Xiaomi Pinecone Electronic Co Ltd
Application filed by Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202010337041.5A
Publication of CN111556197A
Application granted; publication of CN111556197B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G10L17/00: Speaker identification or verification
    • G10L17/22: Interactive procedures; Man-machine interfaces
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

The disclosure relates to a method and a device for implementing a voice assistant, and a computer storage medium, in the field of intelligent voice technology for terminal devices. The method is applied to a terminal device and includes: detecting, from the audio stream of an ongoing call service of the terminal device, audio information for waking up a voice assistant; waking up the voice assistant when the audio information is detected; when a voice instruction is detected, identifying the type of the operation task triggered by the voice instruction; when the type is the voice broadcast result type, performing the corresponding operation according to the content of the operation task and inserting the operation result, in audio format, into the audio stream of the call service for broadcasting; and when the type is the data processing type, performing the corresponding data processing operation according to the content of the operation task. The technical solution of this embodiment enriches the application scenarios of the voice assistant and improves the user's experience of using it.

Description

Method and device for realizing voice assistant and computer storage medium
Technical Field
The present disclosure relates to intelligent voice technology of terminal devices, and in particular, to a method and an apparatus for implementing a voice assistant, and a computer storage medium.
Background
Intelligent voice assistants are widely used in devices such as mobile phones, cars, and televisions. In the related art, an intelligent voice assistant generally performs intention recognition and intention processing on voice-format input from the user by using technologies such as automatic speech recognition (ASR), text-to-speech (TTS) synthesis, natural language processing (NLP), and voiceprint recognition.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a method and an apparatus for implementing a voice assistant, and a computer storage medium.
According to a first aspect of the embodiments of the present disclosure, a method for implementing a voice assistant is provided, which is applied to a terminal device, and includes:
detecting audio information for awakening a voice assistant from an audio stream of an ongoing call service of the terminal equipment;
when the audio information is detected, waking up a voice assistant;
when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type;
when the type of the operation task is a voice broadcast result type, performing corresponding operation according to the content of the operation task, and inserting the operation result into the audio stream of the call service in an audio format for broadcasting;
and when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
In the implementation method of the voice assistant, the recognizing the type of the operation task triggered by the voice instruction includes:
when the operation task triggered by the voice instruction belongs to a preset first type task, recognizing that the type of the operation task triggered by the voice instruction is a voice broadcast result type, wherein the first type task at least comprises intelligent voice output and/or conference host;
and when the operation task triggered by the voice instruction is determined to belong to a preset second type task, identifying the type of the operation task triggered by the voice instruction as a data processing type, wherein the second type task at least comprises any one of multimedia information transmission, text recording and call recording.
The implementation method of the voice assistant further comprises the following steps:
after the voice assistant is awakened, detecting the types of ongoing call services of the terminal equipment, wherein the types of the call services comprise a one-to-one call type and a one-to-many call type;
the corresponding operation is carried out according to the content of the operation task, and an operation result is inserted into the audio stream of the call service in an audio format for broadcasting, and the method comprises the following steps:
when the type of the call service is a one-to-one call type or a one-to-many call type, if the operation task is determined to be intelligent voice output, extracting keywords contained in the content of the operation task, determining intelligent voice output information corresponding to the keywords, converting the intelligent voice output information into audio data, and inserting the audio data into an audio stream of the call service for broadcasting;
and when the type of the conversation service is a one-to-many conversation type, if the operation task is determined to be a conference host, converting preset conference flow information into audio data according to the content of the operation task, and inserting the audio data into an audio stream of the conversation service for broadcasting.
In the implementation method of the voice assistant, the performing corresponding data processing operation according to the content of the operation task includes:
when the operation task is determined to be information transmission, data transmission is carried out with an opposite end of a call service according to the content of the operation task;
when the operation task is determined to be a text record, calling a preset application with a text recording function, and storing the content of the operation task into a preset position;
and calling a preset application with a recording function when the operation task is determined to be call recording, and recording the audio stream of the call service according to the content of the operation task.
The implementation method of the voice assistant further comprises the following steps:
pre-storing mapping information between keywords and intelligent voice output, wherein each keyword is stored as an index, and the intelligent voice output information corresponding to the keyword is stored as the output value for that index.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for implementing a voice assistant, the apparatus including:
the first detection module is used for detecting audio information for awakening the voice assistant from the audio stream of the ongoing call service of the terminal equipment;
the awakening module is used for awakening the voice assistant when the audio information is detected;
the recognition module is used for, when the voice assistant detects a voice instruction, identifying the type of the operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type;
the first processing module is used for performing corresponding operation according to the content of the operation task when the type of the operation task is a voice broadcast result type, and inserting the operation result into an audio stream of the call service in an audio format for broadcasting;
and the second processing module is used for performing corresponding data processing operation according to the content of the operation task when the type of the operation task is a data processing type.
In the implementation apparatus of the voice assistant, the recognition module includes:
the first type identification submodule is used for identifying the type of the operation task triggered by the voice instruction as a voice broadcast result type when the operation task triggered by the voice instruction is determined to belong to a preset first type task, and the first type task at least comprises intelligent voice output and/or conference host;
and the second type identification submodule is used for identifying the type of the operation task triggered by the voice instruction as a data processing type when the operation task triggered by the voice instruction belongs to a preset second type task, wherein the second type task at least comprises any one of multimedia information transmission, text recording and call recording.
The device for implementing the voice assistant further comprises:
the second detection module is used for detecting the types of the ongoing call services of the terminal equipment after the voice assistant is awakened, wherein the types of the call services comprise a one-to-one call type and a one-to-many call type;
the first processing module comprises:
the intelligent voice output processing submodule is used for extracting keywords contained in the content of the operation task if the operation task is determined to be intelligent voice output when the type of the call service is a one-to-one call type or a one-to-many call type, determining intelligent voice output information corresponding to the keywords, converting the intelligent voice output information into audio data, and inserting the audio data into an audio stream of the call service for broadcasting;
and the conference host processing submodule is used for converting preset conference flow information into audio data according to the content of the operation task and inserting the audio data into the audio stream of the call service for broadcasting if the operation task is determined to be a conference host when the type of the call service is a one-to-many call type.
In the implementation apparatus of the voice assistant, the second processing module includes:
the information transmission submodule is used for carrying out data transmission with an opposite end of the communication service according to the content of the operation task when the operation task is determined to be information transmission;
the text storage sub-module is used for calling a preset application with a text recording function and storing the content of the operation task to a preset position when the operation task is determined to be a text record;
and the recording submodule is used for calling a preset application with a recording function when the operation task is determined to be call recording, and recording the audio stream of the call service according to the content of the operation task.
The device for implementing the voice assistant further comprises:
the setting module is used for pre-storing mapping information between keywords and intelligent voice output, wherein each keyword is stored as an index, and the intelligent voice output information corresponding to the keyword is stored as the output value for that index.
According to a third aspect of the embodiments of the present disclosure, there is provided an apparatus for implementing a voice assistant, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
detecting audio information for awakening a voice assistant from an audio stream of an ongoing call service of the terminal equipment;
when the audio information is detected, waking up a voice assistant;
when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type;
when the type of the operation task is a voice broadcast result type, performing corresponding operation according to the content of the operation task, and inserting the operation result into an audio stream of the call service in an audio format for broadcasting;
and when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of a terminal device, enable the terminal device to perform a method for implementing a voice assistant, the method including:
detecting audio information for awakening a voice assistant from an audio stream of an ongoing call service of the terminal equipment;
when the audio information is detected, waking up a voice assistant;
when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type;
when the type of the operation task is a voice broadcast result type, performing corresponding operation according to the content of the operation task, and inserting the operation result into an audio stream of the call service in an audio format for broadcasting;
and when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the technical scheme, the voice assistant is called in the call process, and services of voice broadcasting result types and data processing types are provided for the user through the added types of the operation tasks triggered by the user voice. Therefore, the user can conveniently trigger various operation tasks through voice, application scenes of the voice assistant are enriched, and the experience of the user using the voice assistant is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method for implementing a voice assistant in accordance with an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method for implementing a voice assistant according to an example embodiment.
FIG. 3 is a block diagram illustrating an apparatus for implementing a voice assistant in accordance with an example embodiment.
FIG. 4 is a block diagram illustrating an apparatus for implementing a voice assistant in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations set forth in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatuses and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a method for implementing a voice assistant in accordance with an exemplary embodiment. The method can be applied to terminal equipment and comprises the following operations:
step S101, detecting audio information for waking up a voice assistant from an audio stream of a call service in progress of a terminal device;
step S102, when audio information is detected, a voice assistant is awakened;
step S103, when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type;
step S104, when the type of the operation task is a voice broadcast result type, corresponding operation is carried out according to the content of the operation task, and the operation result is inserted into an audio stream of the call service in an audio format for broadcasting;
and step S105, when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
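The flow of steps S103 to S105 can be sketched as a small dispatcher. This is a minimal illustrative sketch in Python, assuming a fixed task-type table and a placeholder TTS stage; all names and task identifiers are hypothetical, not part of the patent.

```python
BROADCAST_TYPE = "voice_broadcast_result"
DATA_TYPE = "data_processing"

# Illustrative task-type table (the patent leaves the concrete division
# of tasks configurable).
TASK_TYPES = {
    "smart_voice_output": BROADCAST_TYPE,
    "conference_host": BROADCAST_TYPE,
    "info_transfer": DATA_TYPE,
    "text_record": DATA_TYPE,
    "call_record": DATA_TYPE,
}


def to_audio(text):
    # Placeholder for TTS conversion into the call's audio format.
    return ("audio", text)


def run_task(task, content):
    # Placeholder for actually performing the operation task.
    return f"result of {task}: {content}"


def handle_instruction(task, content, call_audio_stream):
    """Dispatch an operation task per steps S103-S105 (sketch)."""
    task_type = TASK_TYPES[task]                    # S103: identify type
    if task_type == BROADCAST_TYPE:                 # S104: run, then broadcast
        result = run_task(task, content)
        call_audio_stream.append(to_audio(result))  # insert into call audio
        return "broadcast"
    run_task(task, content)                         # S105: data processing only
    return "data_processed"
```

A broadcast-type task appends synthesized audio to the call stream, while a data-processing task leaves the stream untouched, matching the two branches above.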
In step S101, the ongoing call service of the terminal device may include various call services that carry audio data, such as mobile voice telephone service and mobile video telephone service, as well as calls carrying audio data in various instant messaging applications, such as video calls, voice calls, and multi-person video calls. The audio stream of the call service consists of the uplink and downlink audio streams of the current call.
Herein, the audio information for waking up the voice assistant may include various forms. For example, it may be a preset wake-up word for waking up the voice assistant. At this time, the operation of step S101 may detect a preset wake-up word from an audio stream of a call service being executed by the terminal device through a wake-up word detection module of the voice assistant, so as to determine whether to wake up the voice assistant.
In step S103, the voice command may be detected from the uplink audio stream of the call service being executed by the terminal device; in that case, the initiator of the voice command is the user of the terminal device. The voice command may also be detected from the downlink audio stream, in which case the initiator is the user at the opposite end of the call. The type of the operation task triggered by the voice instruction indicates how the operation result should be fed back, and thus determines the subsequent processing flow of the voice assistant. For example, when the type is the voice broadcast result type, the operation result is fed back in the form of a voice broadcast, and the voice assistant follows the conventional processing of step S104. When the type is the data processing type, the operation result is not voice-broadcast, and the voice assistant instead performs the data processing operation of step S105.
In this embodiment, in step S105, the format of the processed data may include pictures, text, audio, and other formats. The data processing may be local to the terminal device, or it may be a data sharing operation with a remote end. For example, if the operation task triggered by the user's voice instruction is to create a log text, the log text may be created and stored locally on the terminal device according to the instruction. As another example, if the operation task is to share a specified text with the opposite end of the call, the text can be transmitted over the network connection between the terminal device and the opposite end of the call service. This network connection may be established before the call service, or established in real time, according to the operation task triggered by the voice instruction, once the voice assistant detects that instruction. As yet another example, the operation task may be to transmit data specified by the user to a specified remote end, such as a remote user terminal or a remote server.
In addition, the type of the operation task need not be distinguished in non-call scenarios. That is, when the voice assistant is awakened, it may detect whether a call service is in progress; if so, it operates according to steps S103 to S105 above. If no call service is in progress, there is no need to identify the type of the operation task triggered by the voice instruction, and the corresponding processing is performed directly according to the identified operation task and its content.
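The gating just described (identify the task type only when a call is in progress) can be sketched as follows; the `parse` helper and the task names are hypothetical stand-ins for the assistant's NLP front end.

```python
def parse(instruction):
    # Hypothetical NLP stand-in: instructions are encoded "task:content".
    task, _, content = instruction.partition(":")
    return task, content


def handle_in_call(task, content, call_stream):
    # During a call, the task type decides broadcast vs. data processing.
    broadcast_tasks = {"smart_voice_output", "conference_host"}
    if task in broadcast_tasks:
        call_stream.append(("audio", content))  # insert into call audio
        return "broadcast"
    return "data_processed"


def on_wakeup(instruction, in_call, call_stream=None):
    """After wake-up: classify the task type only if a call is in progress."""
    task, content = parse(instruction)
    if in_call:
        return handle_in_call(task, content, call_stream)
    # Not in a call: no type identification, process the task directly.
    return f"processed {task} directly"
```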
As can be seen from the above embodiments, the technical solution herein can invoke the voice assistant during a call, making it convenient for the user to trigger various operation tasks by voice, enriching the application scenarios of the voice assistant, and improving the user experience. Moreover, by adding types to the operation tasks triggered by the user's voice, the voice assistant can select different processing modes according to the task type: it can either insert the operation result into the call audio so that the result is broadcast to the user by voice, or perform the corresponding data processing operation without any voice broadcast. In this way, without affecting the normal call service, the voice assistant can provide both voice-broadcast-result services and data-processing services to the user, improving the user experience.
The embodiment also provides an implementation method of a voice assistant, wherein the identifying the type of the operation task triggered by the voice instruction includes:
when the operation task triggered by the voice instruction belongs to a preset first type task, recognizing the type of the operation task triggered by the voice instruction as a voice broadcast result type, wherein the first type task at least comprises intelligent voice output and/or conference host;
and when the operation task triggered by the voice instruction is determined to belong to a preset second type task, identifying the type of the operation task triggered by the voice instruction as a data processing type, wherein the second type task at least comprises any one of information transmission, text recording and call recording.
The operation task triggered by the voice instruction can be identified by using an NLP (Natural Language Processing) function in the voice assistant, so as to determine whether the operation task belongs to the first type task or the second type task.
The first type of task covers tasks whose results need to be voice-broadcast. That is, when an operation task triggered by a voice instruction is recognized as belonging to the first type, the operation result needs to be converted into audio format so that the user is notified of it by voice broadcast. For example, a user may need to query information immediately during a call, so intelligent voice output can be set as a first-type task: after the user wakes the voice assistant during the call and the assistant detects an intelligent voice output task, it can promptly broadcast the content the user queried, saving the user's time without interrupting the call. As another example, a user may need a conference host during a multi-party call. In that case, the conference flow must be announced to all users participating in the call, so the conference flow information can be broadcast by voice, notifying every participant of information such as the current conference progress. This simplifies user operations in multi-party calls, prompts the call progress in a timely manner, and improves the multi-party call experience.
The second type of task covers data-processing tasks, i.e., tasks whose results do not need to be voice-broadcast, in contrast to the first type. For example, information transmission, text recording, and call recording are all data processing operations whose results need not be broadcast, so the ongoing call service is not affected.
In this embodiment, the division into first-type and second-type tasks may be a default of the voice assistant's operating system: the one or more operation tasks included in each type are set in the voice assistant's initial configuration. The division may also be set by the user according to the user's needs. Consequently, the division of first-type and second-type tasks may be the same or different across different terminal devices, and may change over time on the same terminal device; the type of an operation task triggered by a voice instruction changes with that division. For example, suppose the two parties to a call need to complete a purchase with a bank card, i.e., the voice assistant detects a voice instruction for the operation task "tell the other party my bank card number". The voice assistant must consult the division into first-type and second-type tasks to determine the type of this task. If the current first-type tasks include this operation task, the bank card information is announced to the opposite end by voice broadcast during the call, which reduces the user's touch operations, informs the opposite end in a timely manner, and improves communication efficiency. If the current second-type tasks include this operation task, the bank card information is instead sent to the opposite end in a non-broadcast form such as a text or audio message, which can improve the user's information security while still reducing touch operations.
Therefore, in this embodiment, the operation tasks detected by the voice assistant are divided into different types by the predefined first type of task and second type of task, so that the corresponding processing mode is selected according to the type. Voice assistant functions are thus added sensibly during a call, improving the intelligent voice interaction experience.
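The division described above can be sketched as a small configuration object. This is a hypothetical illustration, not the patent's implementation: task names, the default sets, and the `TaskConfig` API are all assumptions introduced here to show how a default division can be overridden by the user, changing how a given task is classified.

```python
# Illustrative sketch of the first-type / second-type task division.
# The task names and defaults below are assumed, not taken from the patent.

DEFAULT_FIRST_TYPE = {"intelligent_voice_output", "conference_host"}
DEFAULT_SECOND_TYPE = {"information_transmission", "text_record", "call_record"}

class TaskConfig:
    """Holds the current division of first-type (voice-broadcast result)
    and second-type (data-processing) tasks; the user may move tasks
    between the two sets, changing how later instructions are classified."""

    def __init__(self):
        self.first_type = set(DEFAULT_FIRST_TYPE)
        self.second_type = set(DEFAULT_SECOND_TYPE)

    def move_to_first_type(self, task):
        self.second_type.discard(task)
        self.first_type.add(task)

    def move_to_second_type(self, task):
        self.first_type.discard(task)
        self.second_type.add(task)

    def classify(self, task):
        if task in self.first_type:
            return "voice_broadcast_result"
        if task in self.second_type:
            return "data_processing"
        return "unknown"

cfg = TaskConfig()
print(cfg.classify("text_record"))        # data_processing (default division)
cfg.move_to_first_type("tell_bank_card")  # user-defined task, broadcast aloud
print(cfg.classify("tell_bank_card"))     # voice_broadcast_result
```

With a different user configuration (moving `tell_bank_card` into the second set), the same instruction would instead be routed to the data-processing branch.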
The embodiment also provides an implementation method of the voice assistant, and the method further includes:
after the voice assistant is awakened, detecting the types of ongoing call services of the terminal equipment, wherein the types of the call services comprise a one-to-one call type and a one-to-many call type;
carrying out corresponding operation according to the content of the operation task, and inserting the operation result into the audio stream of the call service in an audio format for broadcasting, wherein the method comprises the following steps:
when the type of the call service is a one-to-one call type or a one-to-many call type, if the operation task is determined to be intelligent voice output, extracting keywords contained in the content of the operation task, determining intelligent voice output information corresponding to the keywords, converting the intelligent voice output information into audio data, and inserting the audio data into an audio stream of the call service for broadcasting;
and when the type of the conversation service is a one-to-many conversation type, if the operation task is determined to be a conference host, converting preset conference flow information into audio data according to the content of the operation task, and inserting the audio data into an audio stream of the conversation service for broadcasting.
The voiceprint recognition function in the voice assistant can be used to recognize the number of users participating in the call and thereby determine the type of the call service. For a one-to-one call, a first-type task initiated by the user may include intelligent voice output. For a one-to-many call, a first-type task initiated by the user may include intelligent voice output and/or conference hosting.
When the type of the call service is determined to be a one-to-one call type or a one-to-many call type, a voice instruction input by the user can be recognized with the NLP function in the voice assistant to trigger the intelligent voice output task. At this time, keywords can be recognized from the content of the operation task, the intelligent voice output information corresponding to the keywords queried, the queried information converted into audio data, and the audio data inserted into the audio stream of the call service for broadcasting. In this way, different keywords can trigger different information queries, and the required information is announced to the user in time during the call, meeting the user's personalized requirements.
When the type of the call service is determined to be a one-to-many call type, a voice instruction input by the user can be recognized with the NLP function in the voice assistant to trigger the conference host task. At this time, it can be assumed that all users participating in the call need to be informed of the conference flow. Therefore, the preset conference flow information can be extracted, converted into audio format with the text-to-speech function in the voice assistant, and inserted into the audio stream of the call service, so that all users participating in the multi-party call are informed of the current conference progress by voice broadcast. The conference flow information may include several agenda items; in that case the items are converted into audio format one by one in their listed order and inserted in sequence into the audio stream of the call service for broadcasting. While the items are being broadcast in sequence, a combination of the preset audio information (for example, the spoken phrase "conference flow") and the voice instruction "next item" can trigger the voice broadcast of the next agenda item, thus completing the conference host task. This embodiment thereby simplifies user operations in a multi-party call, announces the call progress in time, and improves the user's experience of multi-party calls.
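The sequential agenda broadcast can be sketched as follows. This is a minimal illustration under stated assumptions: `text_to_audio` stands in for a real text-to-speech engine, and the command string `"next item"` and agenda contents are invented for the example.

```python
# Sketch of the conference-host flow: each agenda item is converted to
# "audio" in turn, and the voice command "next item" advances the broadcast.

def text_to_audio(text):
    # placeholder for a real TTS conversion step
    return f"<audio:{text}>"

class ConferenceHost:
    def __init__(self, agenda):
        self.agenda = list(agenda)  # preset conference flow information
        self.index = -1             # no item broadcast yet

    def handle_command(self, command):
        """Advance to the next agenda item when 'next item' is heard;
        return the audio to insert into the call's audio stream, or None."""
        if command == "next item" and self.index + 1 < len(self.agenda):
            self.index += 1
            return text_to_audio(self.agenda[self.index])
        return None

host = ConferenceHost(["opening remarks", "Q1 review", "action items"])
print(host.handle_command("next item"))  # <audio:opening remarks>
print(host.handle_command("next item"))  # <audio:Q1 review>
```

A real implementation would inject the returned audio into the uplink/downlink streams so every conference participant hears the item.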
The embodiment further provides an implementation method of a voice assistant, wherein the performing corresponding data processing operations according to the content of the operation task includes:
when the operation task is determined to be information transmission, data transmission is carried out with an opposite end of the conversation service according to the content of the operation task;
when the operation task is determined to be a text record, calling a preset application with a text recording function, and storing the content of the operation task into a preset position;
and when the operation task is determined to be call recording, calling a preset application with a recording function, and recording the audio stream of the call service according to the content of the operation task.
When the operation task is information transmission, all parties to the call support a network transmission function during the call. Data can then be transmitted to the opposite end of the call service according to the content of the operation task over a network link between the terminal device and the opposite end. When the data reaches the opposite end, it can be displayed by a multimedia application there. The network link may be established before the call service, or established in real time according to the operation task triggered by the voice instruction after the voice assistant detects the instruction. The transmission may be unidirectional or bidirectional: the user triggering the information transmission task may transmit data to the opposite end of the call, or the parties to the call may exchange data with each other. When the type of the ongoing call service is a one-to-many call, data can be transmitted to one or more users in the call according to the transmission target indicated in the voice instruction detected by the voice assistant. In a call scenario, data can thus be shared among the call's users, reducing the communication cost between the parties and improving communication efficiency.
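The target selection for a one-to-many call can be sketched as below. This is an assumption-laden illustration: `send` merely stands in for transmission over an established network link, and the peer names are invented.

```python
# Sketch of the information-transmission task: the transfer target(s) named
# in the voice instruction select one peer or several peers of the call.

def send(peer, payload):
    # placeholder for real transmission over the network link to `peer`
    return (peer, payload)

def transmit_information(content, peers, targets=None):
    """Send `content` to all call peers, or only to the peers the voice
    instruction named in `targets`."""
    recipients = peers if targets is None else [p for p in peers if p in targets]
    return [send(p, content) for p in recipients]

# one-to-many call: send the data only to the peer named "alice"
sent = transmit_information("card: **** 1234", ["alice", "bob"], targets=["alice"])
print(sent)  # [('alice', 'card: **** 1234')]
```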
When the operation task is determined to be a text record, a preset application with a text-recording function is called. For example, the preset application may be a calendar/schedule application; in this case, the time and the content of the operation task (i.e., the content to record) are written into the calendar schedule and stored, and the storage location of the calendar schedule is the preset position. As another example, the preset application may be a memo; in this case, the content of the operation task is stored as one memo record, and the storage location of the memo is the preset position. The text-recording operation can thus be triggered by a voice instruction, reducing the user's touch operations during the call, freeing the user's hands, and making better use of the information exchanged in the call.
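The routing to a preset application can be sketched as follows. This is a hypothetical model: the two "applications" are plain in-memory lists standing in for a real calendar and memo store, and the field names are assumptions.

```python
# Sketch of the text-record task: route the task content to a preset
# application (calendar schedule or memo) and store it at that
# application's preset position (modeled here as in-memory lists).
import datetime

storage = {"calendar": [], "memo": []}  # the "preset positions"

def record_text(content, app="memo", when=None):
    if app == "calendar":
        # calendar entries record both the time and the content
        entry = {"time": when or datetime.datetime.now().isoformat(),
                 "content": content}
    else:
        entry = {"content": content}
    storage[app].append(entry)
    return entry

record_text("follow up with supplier", app="memo")
print(storage["memo"][0]["content"])  # follow up with supplier
```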
When the operation task is determined to be call recording, a preset application with a recording function is called. For example, the preset application may be the recorder application of the mobile terminal. When the content of the recognized operation task includes "start call recording", the recorder application is called to start recording the call. When the content of the recognized operation task includes "end call recording", the recorder application is called to stop recording, and the audio captured from the time recording was most recently started until the time it ended is saved as a call recording of the current call. During the call, the recording operation can thus be triggered by a voice instruction, reducing the user's touch operations, meeting the user's call-recording needs in time, and improving the user experience.
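The start/end segmentation can be sketched as a small state machine. This is an illustrative model only: audio frames are simulated as strings, and the trigger phrases follow the ones quoted in the text.

```python
# Sketch of the call-recording task: "start call recording" opens a
# segment and "end call recording" closes it, saving the audio captured
# in between as one recording of the current call.

class CallRecorder:
    def __init__(self):
        self.recording = False
        self.buffer = []   # frames captured since recording started
        self.saved = []    # completed call-recording segments

    def handle_task(self, content):
        if "start call recording" in content:
            self.recording, self.buffer = True, []
        elif "end call recording" in content:
            if self.recording:
                self.saved.append(list(self.buffer))
            self.recording = False

    def on_audio_frame(self, frame):
        if self.recording:
            self.buffer.append(frame)

rec = CallRecorder()
rec.handle_task("start call recording")
rec.on_audio_frame("frame1")
rec.on_audio_frame("frame2")
rec.handle_task("end call recording")
print(rec.saved)  # [['frame1', 'frame2']]
```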
The embodiment also provides an implementation method of the voice assistant, and the method further includes:
pre-storing mapping information of intelligent voice output and key words;
the keywords are stored as indexes, and the intelligent voice output information corresponding to the keywords is stored as output values corresponding to the indexes.
In this embodiment, the mapping information between the intelligent speech output and the keyword may be pre-stored according to a user operation.
There are various ways to store the keyword as an index and the corresponding intelligent voice output information as the output value of that index. For example, the mapping between intelligent voice output and keywords can be stored as a key-value structure. Suppose the user stores in advance the mapping "first address" → "a certain building number on a certain street", where "first address" is the key and "a certain building number on a certain street" is the value. Then, during a call, when the voice assistant recognizes that the operation task triggered by a voice instruction includes querying the "first address", it can use "first address" as the keyword (key) to look up the corresponding value, namely "a certain building number on a certain street", convert that value into audio data, and insert the audio data into the audio stream of the call service for broadcasting.
Therefore, according to the embodiment, different information queries can be triggered by using different keywords according to the user requirements, and personalized voice operation setting of the user is realized.
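The key-value storage described above maps naturally onto a dictionary. The sketch below is illustrative; the stored address value is a made-up placeholder, as in the text's anonymized example.

```python
# Key-value storage for intelligent voice output: the keyword is the key
# (index) and the intelligent voice output information is the value.

voice_output_map = {}  # keyword -> intelligent voice output information

def store_mapping(keyword, output_info):
    voice_output_map[keyword] = output_info

def query_output(keyword):
    # returns None when no mapping was stored for this keyword
    return voice_output_map.get(keyword)

store_mapping("first address", "Building N on X Street")  # placeholder value
print(query_output("first address"))   # Building N on X Street
print(query_output("second address"))  # None: no mapping stored
```

A real assistant would pass the queried value through text-to-speech before inserting it into the call's audio stream.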
FIG. 2 is a flow diagram illustrating a method for implementing a voice assistant according to an exemplary embodiment. Taking the Android operating system as an example, the method explains the process by which a terminal device calls the voice assistant during a call. As shown in fig. 2, the process includes the following operations:
step S201, in the process of executing the call service, the terminal device obtains the uplink and downlink audio streams of the call, and detects the wakeup word of the voice assistant in real time.
In this step, a wakeup word detection module of the voice assistant application function may be used to detect a wakeup word, that is, preset audio information for waking up the voice assistant.
The ongoing call service of the terminal device may be any of various services that include communication audio, such as mobile voice call services and mobile video call services, as well as audio-bearing services in instant messaging applications, such as WeChat video calls, voice calls, and multi-person teleconferences.
Detecting the wakeup word from the uplink and downlink audio streams of the call means detecting the wakeup word in either the uplink or the downlink audio stream.
Step S202, when the awakening word is detected, the voice assistant is awakened, voiceprint recognition is started, and the user awakening the voice assistant and the type of the ongoing call service are recorded.
In this step, voiceprint recognition may be performed by the voiceprint recognition module of the voice assistant application. The user who spoke the wakeup word can be identified by voiceprint recognition and recorded as the initiator of subsequent operation tasks. In addition, the type of the call service may be a one-to-one call or a one-to-many call; the number of people participating in the current call, and hence the type of the call service, can be determined through voiceprint recognition. The type of the call service is then used in subsequent steps to determine the type of the operation task.
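The call-type decision can be sketched by counting distinct speakers. This is a hedged illustration: a real voiceprint model would produce the speaker labels, which are hard-coded strings here.

```python
# Sketch of call-type detection via voiceprint recognition: count the
# distinct speakers heard so far, then classify the call service.
# Speaker labels stand in for real voiceprint-model output.

def classify_call(voiceprint_labels):
    """voiceprint_labels: one speaker label per detected utterance."""
    speakers = len(set(voiceprint_labels))
    return "one-to-one" if speakers <= 2 else "one-to-many"

print(classify_call(["alice", "bob", "alice"]))         # one-to-one
print(classify_call(["alice", "bob", "carol", "bob"]))  # one-to-many
```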
Step S203, detecting a voice instruction in real time, identifying the type and content of an operation task triggered by the voice instruction when the voice instruction is detected, entering step S204 if the type of the operation task is a voice broadcasting result type, and entering step S205 if the type of the operation task is a data processing type.
The operation of step S203 may be implemented by various functional modules of the voice assistant application. For example, when a voice instruction is detected, it may be recognized by an ASR (Automatic Speech Recognition) module and converted into text format. When the VAD (Voice Activity Detection) module determines that the voice instruction has ended, the text-format instruction is sent to the NLP (Natural Language Processing) module, which identifies the type and content of the operation task triggered by the voice instruction.
When the type of the operation task triggered by the voice instruction is identified, the operation task can be distinguished according to the preset first-class task and the preset second-class task. Namely, when the identified operation task belongs to a preset first type of task, the type of the operation task can be determined to be a voice broadcast result type. When the identified operation task belongs to the preset second type task, the type of the operation task can be determined to be the data processing type.
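The ASR → VAD → NLP chain of step S203 can be sketched as below. Every stage here is a stub under stated assumptions: chunks are already text, the VAD decision is modeled as a `None` marker, and the toy parse treats the first word as the task name. Real modules would replace each stub.

```python
# Sketch of step S203's recognition chain: accumulate "transcribed"
# chunks until voice activity ends, then parse the completed instruction
# into (task, content).

def asr(audio_chunk):
    # stand-in: chunks are already text in this sketch
    return audio_chunk

def nlp(text):
    # toy parse: first word names the task, the rest is the content
    task, _, content = text.partition(" ")
    return task, content

def run_pipeline(chunks):
    """Feed chunks until VAD (here: a None chunk) marks end of speech."""
    words = []
    for chunk in chunks:
        if chunk is None:          # VAD: voice activity ended
            return nlp(" ".join(words))
        words.append(asr(chunk))
    return None                    # instruction not yet finished

print(run_pipeline(["text_record", "buy", "milk", None]))
# ('text_record', 'buy milk')
```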
Step S204, generating an operation result corresponding to the audio format according to the content of the operation task, inserting the operation result in the audio format into the audio stream of the ongoing call service, and returning to step S201.
In this step, the operation result in audio format may be generated by the TTS (Text To Speech) module of the voice assistant application.
Because step S202 records the user who woke the voice assistant, the operation result can be broadcast into the audio stream corresponding to that user as required, preventing users other than the initiator from hearing it. For example, if the user who woke the voice assistant is the user of the terminal device, the operation result in audio format can be inserted into the downlink audio stream of the call service, so that the opposite end of the call cannot hear the broadcast result. Of course, when the user wants to share the broadcast result with the opposite end of the call service, the operation result in audio format can be inserted into both the uplink and downlink audio streams, so that both parties hear it.
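The stream-routing decision can be sketched as follows. This is an assumption-labeled model: the two streams are plain lists, and the downlink is taken to be the audio played to the local user, matching the text's example.

```python
# Sketch of step S204's routing: if the initiator is the local user, the
# audio result is inserted only into the downlink stream (the far end
# does not hear it); if sharing is requested, it goes into both streams.

def insert_result(result_audio, uplink, downlink,
                  initiator_is_local, share=False):
    if share:
        uplink.append(result_audio)
        downlink.append(result_audio)
    elif initiator_is_local:
        downlink.append(result_audio)  # only the local user hears it
    else:
        uplink.append(result_audio)    # only the far-end initiator hears it
    return uplink, downlink

up, down = insert_result("<audio:result>", [], [], initiator_is_local=True)
print(up, down)  # [] ['<audio:result>']
```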
Step S205, corresponding data processing is performed according to the content of the operation task, and the process returns to step S201.
In this step, the data processing mode may be determined according to the operation task itself. For example, when the operation task is information transmission, a network link may be established with a sharing party (for example, an opposite end of a call service), and then the content of the operation task is used as a sharing object and transmitted to the sharing party through the established network link to implement sharing. In step S202, the type of ongoing call service is recorded. Therefore, when the type of the call service is one-to-many call, information transmission can be performed with one or more opposite terminals in the call according to the user requirements.
For another example, when the operation task is a text record, a preset application with a text recording function may be called, and the content of the operation task is stored in a preset location. At this time, according to the user requirement, the user can be prompted in a text information mode that the text recording is completed.
FIG. 3 illustrates a block diagram of an apparatus for implementing a voice assistant in accordance with an exemplary embodiment. As shown in fig. 3, the apparatus includes at least a first detection module 31, a wake-up module 32, an identification module 33, a first processing module 34, and a second processing module 35.
A first detection module 31, configured to detect audio information for waking up a voice assistant from an audio stream of an ongoing call service of a terminal device;
a wake-up module 32 configured to wake up the voice assistant upon detecting the audio information;
the recognition module 33 is configured to, when the voice assistant detects a voice instruction, recognize the type of an operation task triggered by the voice instruction, where the type of the operation task includes a voice broadcast result type and a data processing type;
the first processing module 34 is configured to, when the type of the operation task is a voice broadcast result type, perform corresponding operation according to the content of the operation task, and insert the operation result into an audio stream of the call service in an audio format for broadcast;
and the second processing module 35 is configured to, when the type of the operation task is a data processing type, perform a corresponding data processing operation according to the content of the operation task.
The embodiment also provides an implementation apparatus of a voice assistant, in which the recognition module includes:
the first type recognition sub-module is configured to recognize that the type of the operation task triggered by the voice instruction is a voice broadcast result type when the operation task triggered by the voice instruction is determined to belong to a preset first type task, and the first type task at least comprises intelligent voice output and/or conference host;
and the second type identification submodule is configured to identify the type of the operation task triggered by the voice instruction as a data processing type when the operation task triggered by the voice instruction is determined to belong to a preset second type task, wherein the second type task at least comprises any one of multimedia information transmission, text recording and call recording.
The embodiment also provides an implementation apparatus of a voice assistant, and the apparatus further includes:
the second detection module is configured to detect the types of ongoing call services of the terminal device after the voice assistant is awakened, wherein the types of the call services comprise a one-to-one call type and a one-to-many call type;
a first processing module comprising:
the intelligent voice output processing submodule is configured to extract keywords contained in the content of the operation task when the type of the call service is a one-to-one call type or a one-to-many call type and determine that the operation task is intelligent voice output, determine intelligent voice output information corresponding to the keywords, convert the intelligent voice output information into audio data and insert the audio data into an audio stream of the call service for broadcasting;
and the conference host processing submodule is configured to, when the type of the call service is a one-to-many call type, convert preset conference flow information into audio data according to the content of the operation task and insert the audio data into an audio stream of the call service for broadcasting if the operation task is determined to be a conference host.
The embodiment also provides an apparatus for implementing a voice assistant, in which the second processing module includes:
the information transmission submodule is configured to transmit data with an opposite end of the communication service according to the content of the operation task when the operation task is determined to be information transmission;
the text storage sub-module is configured to call a preset application with a text recording function and store the content of the operation task to a preset position when the operation task is determined to be a text record;
and the recording submodule is configured to call a preset application with a recording function when the operation task is determined to be call recording, and record the audio stream of the call service according to the content of the operation task.
The embodiment also provides a device for implementing the voice assistant, and the device further comprises:
the setting module is configured to store mapping information of the intelligent voice output and the keywords in advance;
the keywords are stored as indexes, and the intelligent voice output information corresponding to the keywords is stored as output values corresponding to the indexes.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 4 is a block diagram illustrating an apparatus 400 for implementing a voice assistant, according to an example embodiment. For example, the apparatus 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 4, the apparatus 400 may include one or more of the following components: processing components 402, memory 404, power components 406, multimedia components 408, audio components 410, input/output (I/O) interfaces 412, sensor components 414, and communication components 416.
The processing component 402 generally controls overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support operations at the device 400. Examples of such data include instructions for any application or method operating on the device 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply components 406 provide power to the various components of device 400. The power components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power supplies for the apparatus 400.
The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 400 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when apparatus 400 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing status assessments of various aspects of the apparatus 400. For example, the sensor component 414 can detect the open/closed state of the device 400 and the relative positioning of components, such as the display and keypad of the apparatus 400. It can also detect a change in the position of the apparatus 400 or of a component of the apparatus 400, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in the temperature of the apparatus 400. The sensor component 414 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 404 comprising instructions, executable by the processor 420 of the apparatus 400 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium having instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method for implementing a voice assistant, the method comprising:
detecting audio information for awakening a voice assistant from an audio stream of an ongoing call service of the terminal equipment;
when the audio information is detected, the voice assistant is awakened;
when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type;
when the type of the operation task is a voice broadcasting result type, performing corresponding operation according to the content of the operation task, and inserting the operation result into an audio stream of the call service in an audio format for broadcasting;
and when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (12)

1. A method for implementing a voice assistant is applied to a terminal device, and is characterized by comprising the following steps:
detecting audio information for awakening a voice assistant from an audio stream of an ongoing call service of the terminal equipment;
when the audio information is detected, waking up a voice assistant;
after the voice assistant is awakened, detecting the type of the ongoing call service of the terminal equipment in a voiceprint recognition mode, wherein the type of the call service comprises a one-to-one call type and a one-to-many call type;
when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type; the type of the operation task indicates a feedback mode of an operation result of the operation task;
when the type of the operation task is a voice broadcast result type, performing corresponding operation according to the content of the operation task, and inserting the operation result into the audio stream of the call service in an audio format for broadcasting; the audio stream of the call service comprises an uplink audio stream and/or a downlink audio stream in the call service;
and when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
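The dispatch described in claim 1, together with the task grouping of claim 2, can be sketched as follows. All identifiers here (`TaskType`, `FIRST_TYPE_TASKS`, the task-name strings) are illustrative assumptions, not names from the patent:

```python
from enum import Enum, auto

class TaskType(Enum):
    VOICE_BROADCAST = auto()   # result is spoken into the call's audio stream
    DATA_PROCESSING = auto()   # result is handled silently (record, transfer, ...)

# Preset task sets, mirroring the "first type" / "second type" tasks of claim 2
FIRST_TYPE_TASKS = {"smart_voice_output", "conference_host"}
SECOND_TYPE_TASKS = {"multimedia_transfer", "text_record", "call_record"}

def classify_task(task_name: str) -> TaskType:
    """Map an operation task to its feedback mode (claim 1's two branches)."""
    if task_name in FIRST_TYPE_TASKS:
        return TaskType.VOICE_BROADCAST
    if task_name in SECOND_TYPE_TASKS:
        return TaskType.DATA_PROCESSING
    raise ValueError(f"unknown task: {task_name}")
```

The task type thus selects the feedback path: broadcast tasks are rendered into the call's audio stream, while data-processing tasks run without audible output.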
2. The method of claim 1, wherein the identifying the type of the operation task triggered by the voice instruction comprises:
when the operation task triggered by the voice instruction belongs to a preset first type task, recognizing that the type of the operation task triggered by the voice instruction is a voice broadcast result type, wherein the first type task at least comprises intelligent voice output and/or conference host;
and when the operation task triggered by the voice instruction is determined to belong to a preset second type task, identifying the type of the operation task triggered by the voice instruction as a data processing type, wherein the second type task at least comprises any one of multimedia information transmission, text recording and call recording.
3. The method of claim 2, wherein the performing a corresponding operation according to the content of the operation task, and inserting the operation result into the audio stream of the call service in an audio format for broadcasting, comprises:
when the type of the call service is a one-to-one call type or a one-to-many call type, if the operation task is determined to be intelligent voice output, extracting keywords contained in the content of the operation task, determining intelligent voice output information corresponding to the keywords, converting the intelligent voice output information into audio data, and inserting the audio data into an audio stream of the call service for broadcasting;
and when the type of the conversation service is a one-to-many conversation type, if the operation task is determined to be a conference host, converting preset conference flow information into audio data according to the content of the operation task, and inserting the audio data into an audio stream of the conversation service for broadcasting.
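Inserting the synthesized result into the call (claim 3) amounts to splicing TTS audio into the uplink or downlink frame sequence. This frame-list model and the function name are assumptions made purely for illustration:

```python
def insert_audio(call_frames: list, tts_frames: list, position: int) -> list:
    """Splice synthesized (TTS) audio frames into the call's uplink or
    downlink frame sequence at the given frame position, so the other
    party hears the broadcast inside the ongoing call."""
    return call_frames[:position] + tts_frames + call_frames[position:]
```

A real implementation would also need to match sample rate and codec between the TTS output and the call stream; the sketch above only shows the splice itself.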
4. The method according to claim 2, wherein the performing corresponding data processing operations according to the content of the operation task comprises:
when the operation task is determined to be information transmission, data transmission is carried out with an opposite end of a call service according to the content of the operation task;
when the operation task is determined to be a text record, calling a preset application with a text recording function, and storing the content of the operation task into a preset position;
and calling a preset application with a recording function when the operation task is determined to be call recording, and recording the audio stream of the call service according to the content of the operation task.
5. The method of claim 1, further comprising:
pre-storing mapping information between keywords and intelligent voice output, wherein the keywords are stored as indexes, and the intelligent voice output information corresponding to the keywords is stored as output values corresponding to the indexes.
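Claim 5's keyword-indexed mapping can be modeled as a simple key-value store, with keywords as the indexes and the output text as the stored values. The sample entries and names below are hypothetical:

```python
from typing import Optional

# Keywords act as indexes; the intelligent voice output text is the
# stored value for each index (hypothetical sample entries).
smart_output_map = {
    "weather": "Today's forecast is sunny.",
    "next meeting": "Your next meeting starts at 3 pm.",
}

def lookup_smart_output(keyword: str) -> Optional[str]:
    # Return the pre-stored output for a keyword, or None if unmapped.
    return smart_output_map.get(keyword)
```

The looked-up text would then be converted to audio data and inserted into the call stream as in claim 3.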
6. An apparatus for implementing a voice assistant, comprising:
the first detection module is used for detecting audio information for awakening the voice assistant from the audio stream of the ongoing call service of the terminal equipment;
the awakening module is used for awakening the voice assistant when the audio information is detected;
the second detection module is used for detecting the types of ongoing call services of the terminal equipment in a voiceprint recognition mode after the voice assistant is awakened, wherein the types of the call services comprise a one-to-one call type and a one-to-many call type;
the identification module is used for identifying the type of an operation task triggered by a voice instruction when the voice assistant detects the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type; the type of the operation task indicates a feedback mode of an operation result of the operation task;
the first processing module is used for performing corresponding operation according to the content of the operation task when the type of the operation task is a voice broadcast result type, and inserting the operation result into an audio stream of the call service in an audio format for broadcasting; the audio stream of the call service comprises an uplink audio stream and/or a downlink audio stream in the call service;
and the second processing module is used for performing corresponding data processing operation according to the content of the operation task when the type of the operation task is a data processing type.
7. The apparatus of claim 6, wherein the identification module comprises:
the first type identification submodule is used for identifying the type of the operation task triggered by the voice instruction as a voice broadcast result type when the operation task triggered by the voice instruction is determined to belong to a preset first type task, and the first type task at least comprises intelligent voice output and/or conference host;
and the second type identification submodule is used for identifying the type of the operation task triggered by the voice instruction as a data processing type when the operation task triggered by the voice instruction belongs to a preset second type task, wherein the second type task at least comprises any one of multimedia information transmission, text recording and call recording.
8. The apparatus of claim 7, wherein the first processing module comprises:
the intelligent voice output processing submodule is used for extracting keywords contained in the content of the operation task if the operation task is determined to be intelligent voice output when the type of the call service is a one-to-one call type or a one-to-many call type, determining intelligent voice output information corresponding to the keywords, converting the intelligent voice output information into audio data, and inserting the audio data into an audio stream of the call service for broadcasting;
and the conference host processing submodule is used for converting preset conference flow information into audio data according to the content of the operation task and inserting the audio data into the audio stream of the call service for broadcasting if the operation task is determined to be a conference host when the type of the call service is a one-to-many call type.
9. The apparatus of claim 7, wherein the second processing module comprises:
the information transmission submodule is used for carrying out data transmission with an opposite end of the communication service according to the content of the operation task when the operation task is determined to be information transmission;
the text storage sub-module is used for calling a preset application with a text recording function and storing the content of the operation task to a preset position when the operation task is determined to be a text record;
and the recording submodule is used for calling a preset application with a recording function when the operation task is determined to be call recording, and recording the audio stream of the call service according to the content of the operation task.
10. The apparatus of claim 6, further comprising:
the setting module is used for pre-storing mapping information between keywords and intelligent voice output, wherein the keywords are stored as indexes, and the intelligent voice output information corresponding to the keywords is stored as output values corresponding to the indexes.
11. An apparatus for implementing a voice assistant, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
detecting audio information for awakening a voice assistant from an audio stream of an ongoing call service of the terminal equipment;
when the audio information is detected, waking up a voice assistant;
after the voice assistant is awakened, detecting the type of the ongoing call service of the terminal equipment in a voiceprint recognition mode, wherein the type of the call service comprises a one-to-one call type and a one-to-many call type;
when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type; the type of the operation task indicates a feedback mode of an operation result of the operation task;
when the type of the operation task is a voice broadcast result type, performing corresponding operation according to the content of the operation task, and inserting the operation result into an audio stream of the call service in an audio format for broadcasting; the audio stream of the call service comprises an uplink audio stream and/or a downlink audio stream in the call service;
and when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
12. A non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor of a terminal device, enable the terminal device to perform a method for implementing a voice assistant, the method comprising:
detecting audio information for awakening a voice assistant from an audio stream of an ongoing call service of the terminal equipment;
when the audio information is detected, waking up a voice assistant;
after the voice assistant is awakened, detecting the type of the ongoing call service of the terminal equipment in a voiceprint recognition mode, wherein the type of the call service comprises a one-to-one call type and a one-to-many call type;
when the voice assistant detects a voice instruction, identifying the type of an operation task triggered by the voice instruction, wherein the type of the operation task comprises a voice broadcast result type and a data processing type; the type of the operation task indicates a feedback mode of an operation result of the operation task;
when the type of the operation task is a voice broadcast result type, performing corresponding operation according to the content of the operation task, and inserting the operation result into an audio stream of the call service in an audio format for broadcasting; the audio stream of the call service comprises an uplink audio stream and/or a downlink audio stream in the call service;
and when the type of the operation task is a data processing type, performing corresponding data processing operation according to the content of the operation task.
CN202010337041.5A 2020-04-26 2020-04-26 Method and device for realizing voice assistant and computer storage medium Active CN111556197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010337041.5A CN111556197B (en) 2020-04-26 2020-04-26 Method and device for realizing voice assistant and computer storage medium


Publications (2)

Publication Number Publication Date
CN111556197A CN111556197A (en) 2020-08-18
CN111556197B true CN111556197B (en) 2022-06-03

Family

ID=72004367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010337041.5A Active CN111556197B (en) 2020-04-26 2020-04-26 Method and device for realizing voice assistant and computer storage medium

Country Status (1)

Country Link
CN (1) CN111556197B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112153223B (en) * 2020-10-23 2021-12-14 北京蓦然认知科技有限公司 Method for voice assistant to recognize and execute called user instruction and voice assistant
CN112291438B (en) * 2020-10-23 2021-10-01 北京蓦然认知科技有限公司 Method for controlling call and voice assistant
CN112530398A (en) * 2020-11-14 2021-03-19 国网河南省电力公司检修公司 Portable human-computer interaction operation and maintenance device based on voice conversion function
CN112565659B (en) * 2020-12-07 2023-08-18 康佳集团股份有限公司 Method for executing voice command during audio and video application work
CN114302197A (en) * 2021-03-19 2022-04-08 海信视像科技股份有限公司 Voice separation control method and display device

Citations (1)

Publication number Priority date Publication date Assignee Title
CN106791071A (en) * 2016-12-15 2017-05-31 珠海市魅族科技有限公司 Call control method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11348586B2 (en) * 2018-06-21 2022-05-31 Dell Products L.P. Systems and methods for extending and enhancing voice assistant and/or telecommunication software functions to a remote endpoint device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant