CN113038063B - Method, apparatus, device, medium and product for outputting a prompt - Google Patents


Info

Publication number
CN113038063B
CN113038063B (application CN202110313078.9A)
Authority
CN
China
Prior art keywords
sound
state
information
preset
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110313078.9A
Other languages
Chinese (zh)
Other versions
CN113038063A (en)
Inventor
褚长森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110313078.9A priority Critical patent/CN113038063B/en
Publication of CN113038063A publication Critical patent/CN113038063A/en
Application granted granted Critical
Publication of CN113038063B publication Critical patent/CN113038063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a method, an apparatus, a device, a medium and a product for outputting a prompt, relating to the field of computers and further to the technical field of artificial intelligence. The specific implementation scheme is as follows: acquiring state information of a target window; acquiring sound information in response to determining that the state information satisfies a preset state condition; determining a sound category of the sound information based on the sound information and a preset model; and outputting prompt information based on the sound category. This implementation can prompt the user about the microphone state, thereby improving the effect of a remote conference.

Description

Method, apparatus, device, medium and product for outputting a prompt
Technical Field
The present disclosure relates to the field of computers, in particular to the technical field of artificial intelligence, and more particularly to methods, apparatus, devices, media and products for outputting a prompt.
Background
Teleconferencing is becoming increasingly popular. In particular, in some special situations employees need to work remotely, and most work communication must then be carried out by teleconference.
In a teleconference, especially a multi-person teleconference, participants are usually relied upon to control the on/off state of their own microphones: an employee turns the microphone on when speaking is required and turns it off otherwise. In practice, however, employees frequently fail to set their microphone state correctly, which degrades the effect of the teleconference.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, medium, and article of manufacture for outputting a prompt.
According to a first aspect, there is provided a method for outputting a prompt, comprising: acquiring state information of a target window; acquiring sound information in response to determining that the state information satisfies a preset state condition; determining a sound category of the sound information based on the sound information and a preset model; and outputting prompt information based on the sound category.
According to a second aspect, there is provided an apparatus for outputting a prompt, comprising: a status acquisition unit configured to acquire status information of the target window; a sound acquisition unit configured to acquire sound information in response to determining that the state information satisfies a preset state condition; a category determination unit configured to determine a sound category of the sound information based on the sound information and a preset model; a prompt output unit configured to output prompt information based on the sound category.
According to a third aspect, there is provided an electronic device for performing the method for outputting a prompt, comprising: one or more computing units; and a storage unit storing one or more programs which, when executed by the one or more computing units, cause the one or more computing units to implement the method for outputting a prompt as described above.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method for outputting a prompt as any one of the above.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a computing unit, implements a method for outputting a prompt as any one of the above.
According to the technology of the application, a method for outputting a prompt is provided. By acquiring the state information of the target window, sound information can be acquired when the state information satisfies a preset state condition, and prompt information is output based on the sound category of the sound information. Thus, when the display of the target window is occluded, the user's speaking intent is determined from the sound information and the microphone state is prompted accordingly: if the target window is occluded and the user is speaking, prompt information indicating that the microphone is currently muted is output. This reduces the probability that a user speaks while muted and thereby lowers the efficiency of conference communication, improving the effect of the teleconference.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for outputting a prompt, according to the present application;
FIG. 3 is a schematic diagram of one application scenario of a method for outputting a prompt according to the present application;
FIG. 4 is a flow diagram of another embodiment of a method for outputting a prompt in accordance with the present application;
FIG. 5 is a schematic diagram illustrating one embodiment of an apparatus for outputting alerts, according to the present application;
FIG. 6 is a block diagram of an electronic device used to implement a method for outputting a prompt of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is an exemplary system architecture diagram according to a first embodiment of the present disclosure, showing an exemplary system architecture 100 to which embodiments of the method for outputting a hint of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. The terminal devices 101, 102, and 103 may be electronic devices such as a mobile phone, a computer, and a tablet, and an application for performing a teleconference may be installed in the terminal devices 101, 102, and 103, and an online conference may be implemented by running the application. And the application may be configured to allow access to the microphones of the terminal devices 101, 102, 103, so that the microphones of the terminal devices 101, 102, 103 can be invoked to enable the user to speak online. In addition, in order to reduce the noise interference in the online conference scenario, the user may manually configure the on/off of the microphone to manually turn on the microphone when the user needs to speak in the online conference, and manually turn off the microphone when the user does not need to speak.
Further, the terminal devices 101, 102, 103 may acquire state information of a window of the teleconference application; acquire sound information if the window is minimized or not displayed frontmost; determine a sound category of the sound information based on the sound information and a preset model, for example determining that the sound information is a human voice; and output prompt information for indicating the microphone state based on the sound category. For example, when the sound information is a human voice, prompt information is output to indicate that the microphone is muted. Optionally, after the terminal devices 101, 102, 103 output the prompt information, a touch operation from the user may be received to configure the microphone state of the application.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, televisions, smart phones, tablet computers, e-book readers, car-mounted computers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, and for example, may acquire microphone configuration information of an application for performing a teleconference in the terminal apparatuses 101, 102, 103 and synchronize the microphone configuration information to other apparatuses participating in the teleconference.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for outputting a prompt provided in the embodiment of the present application may be executed by the terminal devices 101, 102, and 103, or may be executed by the server 105. Accordingly, the means for outputting the prompt may be provided in the terminal apparatuses 101, 102, 103, or may be provided in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting a prompt in accordance with the present application is illustrated. The method for outputting the prompt of the embodiment comprises the following steps:
step 201, obtaining the state information of the target window.
In the present embodiment, the execution subject (e.g., the terminal device 101, 102, 103 or the server 105 in fig. 1) may detect the state of each currently running window through a preset system interface. The preset system interface may include, but is not limited to, the native interface of the Windows system or the native interface of the macOS system. Further, the target window may be the window corresponding to the teleconference application; this window may be used to enter a conference interface, and the conference interface may include, but is not limited to, participant information, speech information, the microphone state of each participant, and the like, which is not limited in this embodiment. The state information of the target window indicates the current display state of the target window and may include, but is not limited to, a window display size and a window display position. The window display size describes the size of the target window relative to the display screen of the execution subject, such as full-screen display, minimized display, or display at a user-defined size; the window display position describes the positional relationship between the target window and other windows, such as frontmost display, rearmost display, or display at an intermediate position, which is not limited in this embodiment.
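The acquisition of window state information in step 201 can be sketched as follows. This is only a platform-neutral illustration: the `WindowState` dataclass, the `registry` dict and `get_window_state` are hypothetical stand-ins for the native Windows/macOS window-manager interfaces the text refers to, not part of the patent.

```python
from dataclasses import dataclass

# Hypothetical model of the "state information" described in step 201:
# a window's display size and its display position (z-order).
@dataclass
class WindowState:
    minimized: bool   # window display size: minimized vs. shown
    frontmost: bool   # window display position: frontmost vs. behind others

def get_window_state(window_id: str, registry: dict) -> WindowState:
    """Look up the current state of the target window.

    A real system would query a platform interface (e.g. the native
    Windows or macOS window manager); a plain dict stands in for it here.
    """
    return registry[window_id]

# Example: the teleconference window is minimized to the taskbar.
registry = {"conference": WindowState(minimized=True, frontmost=False)}
state = get_window_state("conference", registry)
print(state.minimized)  # True
```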
Step 202, acquiring sound information in response to determining that the state information satisfies the preset state condition.
In this embodiment, after acquiring the state information of the target window, the execution subject may match the state information against a preset state condition, where the preset state condition may be used to determine whether the display of the window is occluded. For example, if a window is minimized, or its displayed content is occluded by another window, the window is said to be in a state where its display is occluded. When the display of the window is occluded, it is usually difficult to learn the current microphone state from the content displayed in the window; at this point sound information can be acquired and prompt information output based on the sound category of the sound information, so that the microphone state is prompted with a better effect. The sound information may be sound collected by the execution subject at the current time and may include, but is not limited to, human voices, environmental sounds, and the like.
Optionally, determining whether the state information satisfies the preset state condition may be implemented as follows: determining icon coordinates corresponding to the microphone icon in the target window; determining, based on the state information, whether the icon coordinates are in an occluded state; if the icon coordinates are in the occluded state, determining that the state information satisfies the preset state condition and acquiring sound information; and if the icon coordinates are not in the occluded state, determining that the state information does not satisfy the preset state condition and not executing the step of acquiring sound information.
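The optional occlusion check on the microphone icon's coordinates can be illustrated with simple rectangle geometry. The `Rect` type and `rect_covered` helper are hypothetical names; a real implementation would obtain these rectangles from the window system rather than construct them by hand.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    left: int
    top: int
    right: int
    bottom: int

def rect_covered(icon: Rect, occluders: list) -> bool:
    """Return True if any occluding window rectangle fully covers the icon.

    If the microphone icon is covered, the preset state condition is
    satisfied and sound information should be acquired.
    """
    return any(o.left <= icon.left and o.top <= icon.top and
               o.right >= icon.right and o.bottom >= icon.bottom
               for o in occluders)

# Example: another window covers the whole icon area.
icon = Rect(10, 10, 30, 30)
others = [Rect(0, 0, 100, 100)]
print(rect_covered(icon, others))  # True
```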
Step 203, determining the sound category of the sound information based on the sound information and a preset model.
In this embodiment, the preset model is a model for determining a sound category based on sound recognition, and may be a binary classification model or a multi-class model, which is not limited in this embodiment. Where the preset model is a binary classification model, the sound category of the sound information can be identified, based on the preset model, as human voice or non-human sound. Where the preset model is a multi-class model, the voice of the target user can be distinguished from the voices of other users during model training; the sound category of the sound information is then recognized, based on the preset model, as the voice of the target user, the voice of another user, or a non-human sound, so that whether the currently detected sound information is the voice uttered by the target user can be determined more accurately.
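Mapping the preset model's output to a sound category, in both the binary and multi-class cases, reduces to taking the highest-scoring class. The `sound_category` helper and the score dictionaries below are illustrative assumptions, not the patent's actual interface.

```python
def sound_category(scores: dict) -> str:
    """Pick the most probable category from a (hypothetical) model's scores.

    For the binary model, scores has keys "human" / "non-human"; for the
    multi-class model, "target-user" / "other-user" / "non-human".
    """
    return max(scores, key=scores.get)

# Binary classification: human voice vs. non-human sound.
print(sound_category({"human": 0.92, "non-human": 0.08}))  # human

# Multi-class: distinguish the target user's voice from other voices.
print(sound_category({"target-user": 0.7, "other-user": 0.2,
                      "non-human": 0.1}))  # target-user
```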
Step 204, outputting prompt information based on the sound category.
In this embodiment, if the preset model is the binary classification model described above, the prompt information is output when the sound category is human voice; if the preset model is the multi-class model described above, the prompt information is output when the sound category is the voice of the target user. The prompt information is used for indicating the microphone state in the conference corresponding to the target window, such as microphone on or microphone off. Thus, when the user speaks in the conference, information indicating the microphone state is output, so that the user can learn in time whether the speech is effective. Optionally, prompt information indicating speaking in the mute state may be output only when the microphone is off and the user speaks; alternatively, prompt information indicating the current microphone state may be output whenever the user speaks, regardless of whether the microphone is off or on, which is not limited in this embodiment.
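The decision logic of step 204, covering both variants just described (prompting only when muted, or prompting regardless of microphone state), can be sketched as a small predicate. `should_prompt` and its parameter names are hypothetical.

```python
def should_prompt(category: str, mic_muted: bool,
                  prompt_only_when_muted: bool = True) -> bool:
    """Decide whether to output prompt information, per step 204.

    With prompt_only_when_muted=True, a prompt is emitted only when the
    user is speaking while muted; with False, any speech prompts the
    current microphone state. Both variants appear in the text.
    """
    # "human" covers the binary model; "target-user" the multi-class model.
    speaking = category in ("human", "target-user")
    if not speaking:
        return False
    return mic_muted or not prompt_only_when_muted

print(should_prompt("human", mic_muted=True))  # True
```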
With continued reference to FIG. 3, a schematic diagram of one application scenario of a method for outputting a prompt in accordance with the present application is shown. In the application scenario of fig. 3, the remote conference information can be presented through the target window 3011 displayed on the display screen 301 of the terminal device. At this time, the target window 3011 is in a minimized display state: an icon of the target window 3011 is displayed on the bottom taskbar of the display screen 301, and the window content of the target window 3011 is not displayed in the display area of the display screen 301. The execution subject may obtain the state information of the target window 3011 and determine that the target window 3011 is in the minimized display state, that is, the window content is occluded, so the state information satisfies the preset state condition. The execution subject then collects sound information and determines the sound category based on the sound information and the preset model. If the sound category is human voice and the microphone in the teleconference is configured to be muted at this time, prompt information 3012 is output to indicate that the user is speaking in the mute state. Optionally, a virtual key for releasing the mute state may be displayed in the prompt information 3012, and the user may directly click the virtual key to turn on the microphone in the teleconference. In this process, there is no need to open the target window 3011 and then find a virtual key for configuring the microphone state inside it, so the steps for turning on the microphone are simplified and microphone control is more flexible and convenient.
According to the method for outputting a prompt provided by the embodiment of the application, by acquiring the state information of the target window, sound information can be acquired when the state information satisfies a preset state condition, and prompt information is output based on the sound category of the sound information. Thus, when the display of the target window is occluded, the user's speaking intent is determined from the sound information and the microphone state is prompted accordingly: if the target window is occluded and the user is speaking, prompt information indicating that the current microphone is muted is output. This reduces the probability that a user speaks while muted and thereby lowers conference communication efficiency, improving the effect of the teleconference.
With continued reference to FIG. 4, a flow 400 of another embodiment of a method for outputting a prompt in accordance with the present application is shown. As shown in fig. 4, the method for outputting a prompt of the present embodiment may include the following steps:
step 401, obtaining the state information of the target window.
In this embodiment, please refer to the detailed description of step 201 for the detailed description of step 401, which is not repeated herein.
Step 402, in response to determining that the state information satisfies a preset state condition and detecting that sound exists, obtaining sound information.
In this embodiment, after determining that the state information satisfies the preset state condition, the execution subject may further determine whether sound exists in the current environment, and if so, acquire the sound information. If no sound is present, the subsequent steps of outputting the prompt information are not performed. For a detailed description of step 402, refer to the detailed description of step 202, which is not repeated herein.
In some optional implementations of this embodiment, the preset state condition includes any one of the following: the window is not displayed frontmost; or, the window is in a minimized state.
In this implementation, non-frontmost display of the window means that another window lies in front of the target window, that is, another window occludes the target window. Optionally, the preset state condition may further include, but is not limited to: the microphone status icon in the window is occluded; or the window does not have a microphone status icon.
Step 403, determining the sound category of the sound information based on the sound information and a preset model.
In this embodiment, please refer to the detailed description of step 203 for the detailed description of step 403, which is not repeated herein.
In some optional implementations of this embodiment, the preset model is obtained by training through the following steps: acquiring a sound sample set, wherein each sound sample in the sound sample set has a labeled category; inputting each sound sample in the sound sample set into a model to be trained, and determining a sample sound category corresponding to each sound sample; and adjusting the model parameters of the model to be trained based on the sound classes of the samples and the labeled classes of the sound samples to obtain a preset model.
In this implementation, where the preset model is a binary classification model, the sound sample set may include human-voice samples and non-human-sound samples, the human-voice samples being labeled as human voice and the non-human-sound samples being labeled as non-human sound. Each sound sample in the sound sample set can then be input into the model to be trained, and the model to be trained outputs a corresponding sample sound category; a loss function is constructed based on the difference between the sample sound category and the labeled category, iterative training is performed, and the model parameters of the model to be trained are adjusted until the loss function converges, yielding the trained preset model. Where the preset model is a multi-class model, the sound sample set may include target-user voice samples, other-user voice samples and non-human-sound samples, labeled as the target user's voice, another user's voice and non-human sound respectively. The subsequent model training steps are the same as for the binary classification model and are not described again here.
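The training procedure above can be illustrated with a deliberately simple, runnable stand-in. The patent describes iterative, loss-driven training of a classification model; here a nearest-centroid classifier over toy feature vectors plays that role, so `train_preset_model`, `predict` and the sample features are illustrative assumptions only, not the patent's model.

```python
def train_preset_model(samples):
    """Fit one centroid per labeled category from (features, label) pairs.

    Stands in for the loss-driven parameter adjustment described in the
    text: the "model" is simply the per-class mean feature vector.
    """
    sums, counts = {}, {}
    for feats, label in samples:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

def predict(model, feats):
    """Classify by nearest centroid (squared Euclidean distance)."""
    def dist(lbl):
        return sum((a - b) ** 2 for a, b in zip(model[lbl], feats))
    return min(model, key=dist)

# Toy labeled sound samples: [energy, periodicity] features, binary case.
samples = [([0.9, 0.8], "human"), ([0.8, 0.9], "human"),
           ([0.1, 0.2], "non-human"), ([0.2, 0.1], "non-human")]
model = train_preset_model(samples)
print(predict(model, [0.85, 0.85]))  # human
```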
Step 404, determining an application corresponding to the target window.
In this embodiment, the target window may be a window corresponding to a remote conference application, and in a case where the execution subject receives an instruction for instructing to open the target window, the execution subject may establish a correspondence between the target window and the application. Thereafter, the execution subject may determine an application corresponding to the target window based on the correspondence between each window and the application. Optionally, if there are multiple windows corresponding to the teleconference application, the window including the microphone configuration information is determined as the target window, so as to output corresponding microphone state prompt information under the condition that the microphone configuration information is not directly displayed, and prompt timing is more accurate.
Step 405, microphone configuration information of an application is obtained.
In this embodiment, the microphone configuration information of the application may include a microphone mute configuration and a microphone on configuration, and is used to describe whether the conference currently conducted by the teleconference application is permitted to use the microphone of the execution subject.
It should be noted that, steps 404 to 405 may be executed after determining the sound type of the sound information, or may be executed before determining the sound type of the sound information, which is not limited in this embodiment.
Step 406, in response to determining that the microphone configuration information is in a mute configuration state and the sound type is voice, outputting prompt information and/or a target virtual key in a target area according to a preset display form; the target virtual key is used for releasing the mute configuration state.
In this embodiment, the mute configuration state is used to indicate that the conference currently conducted by the teleconference application is not permitted to use the microphone of the execution subject. If the microphone configuration information is in the mute configuration state and the sound category is human voice, the user is speaking while the conference is muted; at this time, prompt information may be output in a target area on the display screen of the execution subject. Optionally, the target virtual key may also be output, or the prompt information and the target virtual key may be output simultaneously. The user can release the mute configuration state by performing a touch operation on the target virtual key. The target area may be the lower left corner, the lower right corner, the middle of the screen, the upper left corner, the upper right corner, and the like, which is not limited in this embodiment. The preset display form may be a pop-up window, a floating window, and the like, which is not limited in this embodiment.
Step 407, in response to detecting the touch operation for the target virtual key, the mute configuration state is released.
In this embodiment, the execution subject may further detect whether a touch operation on the target virtual key is received and, in response to detecting the touch operation, release the mute configuration state in the application and update the microphone configuration information.
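Steps 405 to 407 together can be sketched as a tiny state machine: the prompt and target virtual key are produced only while the configuration is muted and a human voice is detected, and a touch on the key releases the mute state. `MicrophoneConfig`, `render_prompt` and the prompt text are hypothetical names, not the patent's interface.

```python
class MicrophoneConfig:
    """Tracks the application's microphone configuration (steps 405-407)."""

    def __init__(self, muted: bool = True):
        self.muted = muted

    def on_target_key_touched(self):
        # Per step 407: a touch on the target virtual key releases the
        # mute configuration state and updates the configuration info.
        self.muted = False

def render_prompt(config: MicrophoneConfig, category: str):
    """Return (prompt text, whether the target virtual key is shown)."""
    if config.muted and category == "human":
        return ("You are speaking while muted", True)
    return (None, False)

cfg = MicrophoneConfig(muted=True)
text, show_key = render_prompt(cfg, "human")
cfg.on_target_key_touched()   # user taps the virtual key
print(cfg.muted)  # False
```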
The method for outputting a prompt provided by the above embodiment of the application may further output the prompt information in the target area according to a preset display form when the microphone configuration information of the teleconference application is in the mute configuration state and the sound category is human voice, so that the prompt information appears in a prominent display form at a conspicuous position, making the prompt more noticeable and effective. In addition, the target virtual key can be output, so that the user can directly release the mute configuration state via the output target virtual key, making unmuting more convenient.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for outputting a prompt, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various servers.
As shown in fig. 5, the apparatus 500 for outputting a prompt of the present embodiment includes: a state acquisition unit 501, a sound acquisition unit 502, a category determination unit 503, and a presentation output unit 504.
A state acquisition unit 501 configured to acquire state information of the target window.
A sound obtaining unit 502 configured to obtain sound information in response to determining that the state information satisfies a preset state condition.
A category determination unit 503 configured to determine a sound category of the sound information based on the sound information and a preset model.
A prompt output unit 504 configured to output prompt information based on the sound category.
In some optional implementations of this embodiment, the preset state condition includes any one of: the window being displayed at the non-foremost position; or, the window being in a minimized state.
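The preset state condition above is a simple disjunction, which can be expressed directly. This sketch uses a hypothetical `WindowState` record as input; a real implementation would query the operating system's window manager for the foreground and minimized flags instead.

```python
# Sketch of the preset state condition: the condition holds when the window
# is not displayed foremost, or when it is minimized. WindowState is a
# hypothetical stand-in for data obtained from the OS window manager.
from dataclasses import dataclass


@dataclass
class WindowState:
    foremost: bool   # whether the window is displayed at the front
    minimized: bool  # whether the window is in a minimized state


def satisfies_preset_condition(state: WindowState) -> bool:
    # Either branch of the disjunction suffices.
    return (not state.foremost) or state.minimized


print(satisfies_preset_condition(WindowState(foremost=False, minimized=False)))  # True
print(satisfies_preset_condition(WindowState(foremost=True, minimized=False)))   # False
```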
In some optional implementations of this embodiment, the apparatus further includes: an application determining unit configured to determine an application corresponding to the target window; an information acquisition unit configured to acquire microphone configuration information of an application.
In some optional implementations of the present embodiment, the prompt output unit is further configured to: in response to determining that the microphone configuration information is in a mute configuration state and the sound category is human voice, output the prompt information and/or the target virtual key in a target area in a preset display form, where the target virtual key is used to release the mute configuration state.
In some optional implementations of this embodiment, the apparatus further includes: a state release unit configured to release the mute configuration state in response to detecting a touch operation for the target virtual key.
In some optional implementations of the present embodiment, the sound acquisition unit is further configured to: acquire sound information in response to determining that the state information satisfies the preset state condition and detecting the presence of sound.
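"Detecting the presence of sound" can be approximated with a simple energy gate before the heavier classification step. The frame format and the threshold value below are illustrative assumptions, not taken from the patent; production systems would more likely use a dedicated voice-activity detector.

```python
# Sketch of sound-presence detection via an RMS energy threshold. The
# threshold (0.01) and the float-sample frame format are assumptions for
# illustration only.
import math


def sound_present(frame, threshold=0.01):
    """Return True if the RMS energy of a frame of audio samples exceeds
    the threshold, i.e. sound is considered to be present."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms > threshold


silence = [0.0] * 160                       # a silent frame
speech = [0.05, -0.04, 0.06, -0.05] * 40    # a frame with audible energy
print(sound_present(silence))  # False
print(sound_present(speech))   # True
```

Gating on energy first means the preset model only runs when there is something to classify, which keeps the background cost of the feature low.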
In some optional implementations of this embodiment, the preset model is obtained by training through the following steps: acquiring a sound sample set, wherein each sound sample in the sound sample set has a labeled category; inputting each sound sample in the sound sample set into a model to be trained, and determining a sample sound category corresponding to each sound sample; and adjusting the model parameters of the model to be trained based on the sample sound categories and the labeled categories of the sound samples to obtain the preset model.
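The training steps above can be sketched as a toy supervised classifier over labeled sound samples. Everything here is an illustrative assumption: the two-dimensional features, the category names, and the nearest-centroid model are stand-ins for whatever acoustic features and model architecture an actual implementation would use.

```python
# Toy nearest-centroid sketch of the training steps: fit per-class centroids
# from labeled samples, then classify by closest centroid. Features and
# labels are made up for illustration.

def train(samples):
    """samples: list of (feature_vector, label). Returns per-class centroids."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, vec):
    """Assign the class whose centroid is closest to the feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], vec))


# A labeled sound sample set (feature vectors are illustrative).
sample_set = [([0.9, 0.1], "human voice"), ([0.8, 0.2], "human voice"),
              ([0.1, 0.9], "background noise"), ([0.2, 0.8], "background noise")]
model = train(sample_set)
print(predict(model, [0.85, 0.15]))  # human voice
```

The parameter-adjustment step in the patent corresponds here to computing the centroids; a gradient-trained model would iterate that step against the labeled categories instead.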
According to the apparatus for outputting a prompt provided by this embodiment of the application, by acquiring the state information of the target window, sound information can be acquired when the state information satisfies the preset state condition, and prompt information is output based on the sound category of the sound information. Thus, when the display state of the target window is occluded, the user's speaking intent is determined from the sound information and the microphone state is prompted accordingly: if the target window is occluded and the user is speaking, prompt information indicating that the microphone is currently muted is output. This reduces the probability that conference communication efficiency drops because the user speaks while muted, and improves the teleconference experience.
It should be understood that the units 501 to 504 recited in the apparatus 500 for outputting a prompt correspond to the respective steps in the method described with reference to fig. 2, respectively. Thus, the operations and features described above for the method of outputting a prompt are equally applicable to the apparatus 500 and the units included therein, and are not described in detail here.
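The cooperation of the four units (state acquisition, sound acquisition, category determination, prompt output) can be sketched end to end. All function bodies here are illustrative stand-ins injected as callables; the dictionary keys and the prompt text are assumptions, not the patent's data model.

```python
# End-to-end sketch of the pipeline formed by units 501-504. The four
# callables stand in for the units; their concrete implementations are
# hypothetical.

def output_prompt(get_state, get_sound, classify, notify):
    """Acquire window state; if the preset condition holds, acquire sound,
    determine its category, and output a prompt for human voice."""
    state = get_state()
    if not state["occluded"]:        # preset state condition not met
        return None
    sound = get_sound()
    category = classify(sound)
    if category == "human voice":
        return notify("You appear to be speaking while muted")
    return None


prompt = output_prompt(
    get_state=lambda: {"occluded": True},          # window is occluded
    get_sound=lambda: [0.05, -0.04, 0.06],         # captured sound frame
    classify=lambda sound: "human voice",          # preset model stand-in
    notify=lambda msg: msg,                        # prompt output stand-in
)
print(prompt)
```

Because the condition is checked first, no sound is captured or classified while the conference window is visible, matching the claim that sound information is not acquired when the preset state condition is unsatisfied.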
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present application.
FIG. 6 shows a block diagram of an electronic device 600 used to implement a method for outputting a prompt of an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, and the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the method for outputting a prompt. For example, in some embodiments, the method for outputting a prompt may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for outputting a prompt described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the method for outputting a prompt.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, without limitation herein, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (16)

1. A method for outputting a prompt, comprising:
acquiring state information of a target window, wherein the state information is used for indicating the current display state of the target window;
judging whether the state information satisfies a preset state condition, wherein the judging whether the state information satisfies the preset state condition comprises: determining icon coordinates corresponding to a microphone icon in the target window; determining, based on the state information, whether the icon coordinates are in an occluded state; and if the icon coordinates are in the occluded state, or the target window does not have the microphone icon, determining that the current microphone state cannot be acquired from the content displayed by the window and that the state information satisfies the preset state condition;
in response to determining that the state information meets the preset state condition, acquiring sound information; in response to determining that the state information does not satisfy the preset state condition, not acquiring the sound information;
determining the sound category of the sound information based on the sound information and a preset model;
and outputting prompt information based on the sound category.
2. The method of claim 1, wherein the preset state condition comprises any one of:
the window being displayed at the non-foremost position; or
the window being in a minimized state.
3. The method of claim 1, wherein the method further comprises:
determining an application corresponding to the target window;
microphone configuration information of the application is obtained.
4. The method of claim 3, wherein outputting a prompt based on the sound category comprises:
in response to determining that the microphone configuration information is in a mute configuration state and the sound category is human voice, outputting the prompt information and/or the target virtual key in a target area in a preset display form, wherein the target virtual key is used for releasing the mute configuration state.
5. The method of claim 4, wherein the method further comprises:
in response to detecting a touch operation on the target virtual key, releasing the mute configuration state.
6. The method of claim 1, wherein the obtaining sound information in response to determining that the status information satisfies a preset status condition comprises:
acquiring the sound information in response to determining that the state information satisfies the preset state condition and detecting the presence of sound.
7. The method of claim 1, wherein the preset model is trained by:
acquiring a sound sample set, wherein each sound sample in the sound sample set has a labeled category;
inputting each sound sample in the sound sample set into a model to be trained, and determining a sample sound category corresponding to each sound sample;
and adjusting the model parameters of the model to be trained based on the sample sound categories and the labeled categories of the sound samples to obtain the preset model.
8. An apparatus for outputting a prompt, comprising:
a state obtaining unit configured to obtain state information of a target window, the state information indicating a current display state of the target window;
a state determination unit configured to determine whether the state information satisfies a preset state condition, the state determination unit further configured to: determine icon coordinates corresponding to a microphone icon in the target window; determine, based on the state information, whether the icon coordinates are in an occluded state; and if the icon coordinates are in the occluded state, or the target window does not have the microphone icon, determine that the current microphone state cannot be acquired from the content displayed by the window and that the state information satisfies the preset state condition; a sound acquisition unit configured to acquire sound information in response to determining that the state information satisfies the preset state condition, and not to acquire the sound information in response to determining that the state information does not satisfy the preset state condition;
a category determination unit configured to determine a sound category of the sound information based on the sound information and a preset model;
a prompt output unit configured to output prompt information based on the sound category.
9. The apparatus of claim 8, wherein the preset state condition comprises any one of:
the window being displayed at the non-foremost position; or
the window being in a minimized state.
10. The apparatus of claim 8, wherein the apparatus further comprises:
an application determining unit configured to determine an application corresponding to the target window;
an information acquisition unit configured to acquire microphone configuration information of the application.
11. The apparatus of claim 10, wherein the cue output unit is further configured to:
in response to determining that the microphone configuration information is in a mute configuration state and the sound category is human voice, output the prompt information and/or the target virtual key in a target area in a preset display form, wherein the target virtual key is used for releasing the mute configuration state.
12. The apparatus of claim 11, wherein the apparatus further comprises:
a state release unit configured to release the mute configuration state in response to detecting a touch operation on the target virtual key.
13. The apparatus of claim 8, wherein the sound acquisition unit is further configured to:
acquire the sound information in response to determining that the state information satisfies the preset state condition and detecting the presence of sound.
14. The apparatus of claim 8, wherein the preset model is trained by:
acquiring a sound sample set, wherein each sound sample in the sound sample set has a labeled category;
inputting each sound sample in the sound sample set into a model to be trained, and determining a sample sound category corresponding to each sound sample;
and adjusting the model parameters of the model to be trained based on the sample sound categories and the labeled categories of the sound samples to obtain the preset model.
15. An electronic device that performs a method for outputting a prompt, comprising:
at least one computing unit; and
a storage unit communicatively coupled to the at least one computing unit; wherein
the storage unit stores instructions executable by the at least one computing unit to enable the at least one computing unit to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN202110313078.9A 2021-03-24 2021-03-24 Method, apparatus, device, medium and product for outputting a prompt Active CN113038063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110313078.9A CN113038063B (en) 2021-03-24 2021-03-24 Method, apparatus, device, medium and product for outputting a prompt


Publications (2)

Publication Number Publication Date
CN113038063A CN113038063A (en) 2021-06-25
CN113038063B true CN113038063B (en) 2023-03-03

Family

ID=76473218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110313078.9A Active CN113038063B (en) 2021-03-24 2021-03-24 Method, apparatus, device, medium and product for outputting a prompt

Country Status (1)

Country Link
CN (1) CN113038063B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102025972A (en) * 2010-12-16 2011-04-20 中兴通讯股份有限公司 Mute indication method and device applied for video conference
CN111787263A (en) * 2020-07-13 2020-10-16 睿魔智能科技(深圳)有限公司 Control method, system, storage medium and equipment for video communication

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20160037129A1 (en) * 2014-08-01 2016-02-04 Cisco Technology, Inc. Method and Apparatus for Enhanced Caller ID
CN108111701A (en) * 2016-11-24 2018-06-01 北京中创视讯科技有限公司 Silence processing method and device
CN111343410A (en) * 2020-02-14 2020-06-26 北京字节跳动网络技术有限公司 Mute prompt method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN113038063A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
US10811008B2 (en) Electronic apparatus for processing user utterance and server
CN113325954B (en) Method, apparatus, device and medium for processing virtual object
JP2021196599A (en) Method and apparatus for outputting information
CN113841118B (en) Activation management for multiple voice assistants
CN111862987B (en) Speech recognition method and device
US20170177298A1 (en) Interacting with a processing stsyem using interactive menu and non-verbal sound inputs
CN111863036B (en) Voice detection method and device
US20220068267A1 (en) Method and apparatus for recognizing speech, electronic device and storage medium
CN113242358A (en) Audio data processing method, device and system, electronic equipment and storage medium
US10936823B2 (en) Method and system for displaying automated agent comprehension
CN113038063B (en) Method, apparatus, device, medium and product for outputting a prompt
CN112669855A (en) Voice processing method and device
EP4030424A2 (en) Method and apparatus of processing voice for vehicle, electronic device and medium
CN114238821B (en) Popup window processing method and device
CN115312042A (en) Method, apparatus, device and storage medium for processing audio
CN113808585A (en) Earphone awakening method, device, equipment and storage medium
CN114333017A (en) Dynamic pickup method and device, electronic equipment and storage medium
CN111724805A (en) Method and apparatus for processing information
CN112969000A (en) Control method and device of network conference, electronic equipment and storage medium
CN111986682A (en) Voice interaction method, device, equipment and storage medium
CN114286343B (en) Multi-way outbound system, risk identification method, equipment, medium and product
US20230117749A1 (en) Method and apparatus for processing audio data, and electronic device
JP2020024310A (en) Speech processing system and speech processing method
CN114221940B (en) Audio data processing method, system, device, equipment and storage medium
CN108417208A (en) A kind of pronunciation inputting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant